# The train.in input file

## Purpose

- The `train.in` file contains all the training data for the NEP potential.
- The `test.in` file contains all the testing data for the NEP potential.
- These two files have the same format. Therefore, we only refer to the `train.in` file below for simplicity.
- This file format is only applicable **up to GPUMD-v3.3.1**.

## Data format

The data format in this file is fixed:

    Nc
    N_1 has_virial [weight_1]
    N_2 has_virial [weight_2]
    ...
    N_Nc has_virial [weight_Nc]
    Data for configuration 1
    Data for configuration 2
    ...
    Data for configuration Nc

Here,

- `Nc` is the total number of configurations (systems, or structures).
- `N_i` is the number of atoms in configuration `i`.
- The `has_virial` flag (can only be 0 or 1) dictates whether or not there is virial information for the current configuration.
- `[weight_i]` is **optional** and it is the relative weight for configuration `i` in the total loss function. This is introduced in **GPUMD-v3.0**. This item can be present or absent for each configuration. When it is absent, the relative weight for the corresponding configuration will be the default value of 1.
- Data for one configuration occupy `N_i + 2` lines:
    - The first line should have 1 or 7 numbers. If `has_virial` for the current configuration is 0, this line only has one number, which is the **total energy** of the current configuration. If `has_virial` for the current configuration is 1, this line has 7 numbers, which are the **total energy** of the current configuration followed by 6 virial components (in the order of `xx`, `yy`, `zz`, `xy`, `yz`, and `zx`) of the current configuration. **Note that this is different from the Voigt convention.**
    - The second line should have nine numbers defining the cell vectors ($\vec{a}$, $\vec{b}$, $\vec{c}$): `ax ay az bx by bz cx cy cz`.
    - In the remaining `N_i` lines, each line contains 7 numbers, corresponding to the atom type, position components (`x`, `y`, `z`), and force components (`fx`, `fy`, `fz`): `type x y z fx fy fz`.
        - In **GPUMD-v2.6**, the atom type is the atomic number (that is, number of protons).
        - In **GPUMD-v2.7**, the atom type is a non-negative integer defined by the user. In an $n$-element system, these numbers should be integers from 0 to $n-1$. For example, in PbTe, one can assign type 0 to Te and type 1 to Pb.
        - In **GPUMD-v2.8 and later**, the atom type is the atom symbol (such as H, He, Li).
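
The layout described above can be read back with a short script. The following is a minimal sketch, not part of GPUMD: the names `Configuration`, `parse_train_in`, and `SAMPLE` are hypothetical, the numbers in `SAMPLE` are made up for illustration, and the parser assumes whitespace-separated tokens with blank lines ignored. The atom type is kept as a string so that both the integer types of older versions and the atom symbols of GPUMD-v2.8 and later are accepted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Configuration:
    n_atoms: int
    weight: float                        # defaults to 1 when absent in the header
    energy: float                        # total energy of the box
    virial: Optional[Tuple[float, ...]]  # (xx, yy, zz, xy, yz, zx) or None
    cell: Tuple[float, ...]              # ax ay az bx by bz cx cy cz
    types: List[str] = field(default_factory=list)
    positions: List[Tuple[float, ...]] = field(default_factory=list)
    forces: List[Tuple[float, ...]] = field(default_factory=list)


def parse_train_in(text: str) -> List[Configuration]:
    """Parse the contents of a train.in file (format up to GPUMD-v3.3.1)."""
    # Assumption: tokens are whitespace-separated and blank lines carry no meaning.
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    n_conf = int(lines[0][0])
    headers = lines[1:1 + n_conf]
    cursor = 1 + n_conf  # index of the first data line of configuration 1
    configs = []
    for head in headers:
        n_atoms, has_virial = int(head[0]), int(head[1])
        if has_virial not in (0, 1):
            raise ValueError("has_virial can only be 0 or 1")
        weight = float(head[2]) if len(head) > 2 else 1.0

        # First data line: energy alone, or energy followed by 6 virial components.
        first = [float(x) for x in lines[cursor]]
        if len(first) != (7 if has_virial else 1):
            raise ValueError("energy/virial line has the wrong number of values")
        # Second data line: the nine cell-vector components.
        cell = tuple(float(x) for x in lines[cursor + 1])
        if len(cell) != 9:
            raise ValueError("cell line must have 9 numbers")

        conf = Configuration(
            n_atoms=n_atoms,
            weight=weight,
            energy=first[0],
            virial=tuple(first[1:]) if has_virial else None,
            cell=cell,
        )
        # Remaining N_i lines: type x y z fx fy fz
        for row in lines[cursor + 2:cursor + 2 + n_atoms]:
            if len(row) != 7:
                raise ValueError("atom line must have 7 entries")
            conf.types.append(row[0])
            conf.positions.append(tuple(float(x) for x in row[1:4]))
            conf.forces.append(tuple(float(x) for x in row[4:7]))
        if len(conf.types) != n_atoms:
            raise ValueError("file ended before all atoms were read")

        cursor += n_atoms + 2  # each configuration occupies N_i + 2 lines
        configs.append(conf)
    return configs


# Made-up example: two configurations, the first with virial and weight 2.
SAMPLE = """\
2
2 1 2.0
1 0
-10.5 0.1 0.2 0.3 0.0 0.0 0.0
5 0 0 0 5 0 0 0 5
Pb 0 0 0 0.01 0 0
Te 2.5 2.5 2.5 -0.01 0 0
-3.2
4 0 0 0 4 0 0 0 4
Te 0 0 0 0 0 0
"""

configs = parse_train_in(SAMPLE)
```

Here the second header line omits the weight, so `configs[1].weight` falls back to the default of 1, and `configs[1].virial` is `None` because its `has_virial` flag is 0.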

## Units

In this file:

- Length and position are in units of Angstrom.
- Energy is in units of eV.
- Force is in units of eV/Angstrom.
- Virial is in units of eV (so virial divided by volume gives pressure).
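
Because the virial is stored in eV, dividing it by the cell volume in cubic Angstrom gives a pressure-like quantity in eV/Angstrom^3. The sketch below illustrates that conversion and then rescales to GPa. The function names are hypothetical, the cell and virial values are made up, and the conversion factor follows from 1 eV = 1.602176634e-19 J and 1 Angstrom^3 = 1e-30 m^3. It covers only the virial term and makes no assumption about sign conventions or kinetic contributions.

```python
# 1 eV/Angstrom^3 expressed in GPa.
EV_PER_A3_TO_GPA = 160.2176634


def cell_volume(cell):
    """Volume |a . (b x c)| from the nine numbers ax ay az bx by bz cx cy cz."""
    ax, ay, az, bx, by, bz, cx, cy, cz = cell
    return abs(
        ax * (by * cz - bz * cy)
        + ay * (bz * cx - bx * cz)
        + az * (bx * cy - by * cx)
    )


def virial_to_gpa(virial, cell):
    """Divide each virial component (eV) by the volume and convert to GPa."""
    volume = cell_volume(cell)
    return [v / volume * EV_PER_A3_TO_GPA for v in virial]


# Made-up example: a cubic box of side 10 Angstrom (volume 1000 Angstrom^3).
cell = (10, 0, 0, 0, 10, 0, 0, 0, 10)
virial = (1.0, 1.0, 1.0, 0.0, 0.0, 0.0)  # order: xx, yy, zz, xy, yz, zx
stress_gpa = virial_to_gpa(virial, cell)
```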

## Tips

- Periodic boundary conditions are always assumed in all directions for each configuration. When the box thickness in a direction is smaller than twice the radial cutoff distance, the code will replicate the box in that direction.
- The minimal number of atoms in a configuration is 1. The user is responsible for choosing a **good** reference energy when preparing the energy data. But this is not really important, because the absolute energy is meaningless.
- The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
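
To see in advance which directions of a box fall under the replication rule above, one can compare the box thickness with twice the radial cutoff. The sketch below is an illustration only: it assumes that "box thickness" along a direction means the cell volume divided by the area of the face spanned by the other two cell vectors, which may differ in detail from what the code does. The function names and the cutoff of 6 Angstrom are hypothetical.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def box_thicknesses(cell):
    """Thickness along a, b, c: volume divided by the opposite face area (assumed definition)."""
    a, b, c = cell[0:3], cell[3:6], cell[6:9]
    volume = abs(dot(a, cross(b, c)))

    def area(u, v):
        w = cross(u, v)
        return math.sqrt(dot(w, w))

    return (volume / area(b, c), volume / area(c, a), volume / area(a, b))


def below_twice_cutoff(cell, radial_cutoff):
    """True for each direction whose thickness is smaller than twice the cutoff."""
    return tuple(t < 2.0 * radial_cutoff for t in box_thicknesses(cell))


# Made-up orthorhombic box with sides 5, 12, and 20 Angstrom.
cell = (5, 0, 0, 0, 12, 0, 0, 0, 20)
flags = below_twice_cutoff(cell, radial_cutoff=6.0)
```

With these numbers only the first direction is thinner than twice the cutoff, so only that direction would be flagged.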