# The train.in input file

## Purpose

• The train.in file contains all the training data for the NEP potential.
• The test.in file contains all the testing data for the NEP potential.
• These two files have the same format. Therefore, we only refer to the train.in file below for simplicity.
• This file format is only applicable up to GPUMD-v3.3.1.

## Data format

The data format in this file is fixed:

Nc
N_1  has_virial [weight_1]
N_2  has_virial [weight_2]
...
N_Nc has_virial [weight_Nc]
Data for configuration 1
Data for configuration 2
...
Data for configuration Nc


Here,

• Nc is the total number of configurations (systems, or structures).
• N_i is the number of atoms in configuration i.
• The has_virial flag (which can only be 0 or 1) indicates whether virial information is provided for the current configuration.
• [weight_i] is optional and gives the relative weight of configuration i in the total loss function. It was introduced in GPUMD-v3.0 and can be present or absent for each configuration independently. When it is absent, the relative weight of the corresponding configuration takes the default value of 1.
• The data for one configuration occupy N_i + 2 lines:
  • The first line should have 1 or 7 numbers. If has_virial for the current configuration is 0, this line has only one number, which is the total energy of the current configuration. If has_virial for the current configuration is 1, this line has 7 numbers, which are the total energy of the current configuration followed by its 6 virial components in the order xx, yy, zz, xy, yz, zx. Note that this order differs from the Voigt convention (xx, yy, zz, yz, zx, xy).
  • The second line should have 9 numbers defining the cell vectors ($\vec{a}$, $\vec{b}$, $\vec{c}$): ax ay az bx by bz cx cy cz.
  • Each of the remaining N_i lines contains 7 items, corresponding to the atom type, the position components (x, y, z), and the force components (fx, fy, fz): type x y z fx fy fz.
    • In GPUMD-v2.6, the atom type is the atomic number (that is, the number of protons).
    • In GPUMD-v2.7, the atom type is a non-negative integer defined by the user. In an $n$-element system, these numbers should be integers from 0 to $n-1$. For example, in PbTe, one can assign type 0 to Te and type 1 to Pb.
    • In GPUMD-v2.8 and later, the atom type is the atom symbol (such as H, He, Li).
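The layout described above can be read with a short script. The following is a minimal sketch, not part of GPUMD: the names `Configuration` and `parse_train_in` are hypothetical, blank lines are assumed to be ignorable, and the atom type is kept as a string so that the same code accepts atomic numbers, user-defined integers, or element symbols. The numbers in `sample` are made up purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Configuration:  # hypothetical container, not a GPUMD type
    energy: float                  # total energy of the box (eV)
    virial: Optional[List[float]]  # xx, yy, zz, xy, yz, zx (eV), or None
    cell: List[float]              # ax ay az bx by bz cx cy cz (Angstrom)
    weight: float                  # relative weight, 1.0 when absent
    types: List[str]               # atom type, kept as written in the file
    positions: List[Vec3]          # Angstrom
    forces: List[Vec3]             # eV/Angstrom


def parse_train_in(text: str) -> List[Configuration]:
    # Assumption: blank lines carry no meaning and can be skipped.
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    nc = int(lines[0][0])

    # Header block: one line per configuration, "N_i has_virial [weight_i]".
    headers = []
    for tokens in lines[1:1 + nc]:
        n_atoms, has_virial = int(tokens[0]), int(tokens[1])
        if has_virial not in (0, 1):
            raise ValueError("has_virial must be 0 or 1")
        weight = float(tokens[2]) if len(tokens) > 2 else 1.0
        headers.append((n_atoms, has_virial, weight))

    # Data block: N_i + 2 lines per configuration.
    configs = []
    pos = 1 + nc
    for n_atoms, has_virial, weight in headers:
        first = [float(x) for x in lines[pos]]
        if len(first) != (7 if has_virial else 1):
            raise ValueError("energy/virial line has the wrong length")
        cell = [float(x) for x in lines[pos + 1]]
        if len(cell) != 9:
            raise ValueError("cell line must have 9 numbers")
        atom_lines = lines[pos + 2:pos + 2 + n_atoms]
        if len(atom_lines) != n_atoms or any(len(t) != 7 for t in atom_lines):
            raise ValueError("atom lines are missing or malformed")
        configs.append(Configuration(
            energy=first[0],
            virial=first[1:] if has_virial else None,
            cell=cell,
            weight=weight,
            types=[t[0] for t in atom_lines],
            positions=[(float(t[1]), float(t[2]), float(t[3])) for t in atom_lines],
            forces=[(float(t[4]), float(t[5]), float(t[6])) for t in atom_lines],
        ))
        pos += n_atoms + 2
    return configs


# Illustrative input with invented numbers: two configurations, the first
# with virial data and an explicit weight, the second with neither.
sample = """\
2
2 1 2.0
1 0
-10.0 0.1 0.2 0.3 0.0 0.0 0.0
5.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 5.0
Te 0.0 0.0 0.0 0.01 0.0 0.0
Pb 2.5 2.5 2.5 -0.01 0.0 0.0
-3.0
4.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 4.0
Pb 0.0 0.0 0.0 0.0 0.0 0.0
"""
configs = parse_train_in(sample)
```

Reading the whole header block before the data block mirrors the file layout: the number of lines each configuration occupies is only known once its N_i has been read.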

## Units

In this file:

• Length and position are in units of Angstrom.
• Energy is in units of eV.
• Force is in units of eV/Angstrom.
• Virial is in units of eV (so virial divided by volume gives pressure).
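Because the virial is stored in eV, dividing it by the box volume (in cubic Angstrom) gives a pressure-like quantity in eV/Angstrom^3. The sketch below illustrates that arithmetic and converts the result to GPa using 1 eV/Angstrom^3 = 160.2176634 GPa. The function names are hypothetical, and the sign convention for the resulting stress is not addressed here; it should be checked against the code that produced the reference data.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa


def cell_volume(cell):
    """Volume |a . (b x c)| from the 9 numbers ax ay az bx by bz cx cy cz."""
    ax, ay, az, bx, by, bz, cx, cy, cz = cell
    return abs(
        ax * (by * cz - bz * cy)
        + ay * (bz * cx - bx * cz)
        + az * (bx * cy - by * cx)
    )


def virial_to_gpa(virial, cell):
    """Divide each per-box virial component (eV) by the volume, in GPa.

    The component order is kept as in the file: xx, yy, zz, xy, yz, zx.
    """
    volume = cell_volume(cell)
    return [w / volume * EV_PER_A3_TO_GPA for w in virial]


# Invented numbers: a cubic box of side 10 Angstrom (volume 1000 Angstrom^3).
cubic_cell = [10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0]
stress_gpa = virial_to_gpa([10.0, 10.0, 10.0, 0.0, 0.0, 0.0], cubic_cell)
```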

## Tips

• Periodic boundary conditions are always assumed in all directions for each configuration. When the box thickness in a direction is smaller than twice the radial cutoff distance, the code will replicate the box in that direction.
• The minimum number of atoms in a configuration is 1. The user is responsible for choosing a reference energy when preparing the energy data. The particular choice is not critical, because absolute energies have no physical meaning; only energy differences do.
• The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
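To see in advance which directions of a configuration would be affected by the replication rule in the first tip, one can compare the box thickness with twice the radial cutoff. The sketch below assumes that "box thickness" means the perpendicular distance between opposite cell faces, that is, the volume divided by the area of the face spanned by the other two cell vectors. That definition and the function names are assumptions made for illustration, not taken from the GPUMD source.

```python
import math


def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def box_thicknesses(cell):
    """Perpendicular thickness along a, b, c for ax ay az bx by bz cx cy cz."""
    a, b, c = cell[0:3], cell[3:6], cell[6:9]
    volume = abs(_dot(a, _cross(b, c)))

    def face_area(u, v):
        w = _cross(u, v)
        return math.sqrt(_dot(w, w))

    return (
        volume / face_area(b, c),  # thickness along a
        volume / face_area(c, a),  # thickness along b
        volume / face_area(a, b),  # thickness along c
    )


def needs_replication(cell, radial_cutoff):
    """True for each direction whose thickness is below twice the cutoff."""
    return tuple(t < 2.0 * radial_cutoff for t in box_thicknesses(cell))


# Invented example: a slab-like box that is thin along a only.
thin_cell = [5.0, 0.0, 0.0, 0.0, 20.0, 0.0, 0.0, 0.0, 20.0]
flags = needs_replication(thin_cell, radial_cutoff=4.0)
```

For an orthogonal box the thicknesses reduce to the three box lengths; the cross-product form also covers triclinic cells, where the thickness along a direction is smaller than the length of the corresponding cell vector.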