The train.xyz input file
Purpose
- The train.xyz file contains all the training data for the NEP potential.
- The test.xyz file contains all the testing data for the NEP potential.
- These two files have the same format. Therefore, we only refer to the train.xyz file below for simplicity.
- This file format will be used starting from GPUMD-v3.4.
Overall data format
- This file follows the so-called extended XYZ file format, as proposed here: https://github.com/libAtoms/extxyz
- Each structure (or configuration, or frame) occupies [math]N+2[/math] lines, where [math]N[/math] is the number of atoms in the structure.
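The N+2 layout means a file can be split into structures by reading the atom count and skipping ahead. Below is a minimal sketch in Python; the function name split_frames is hypothetical, and tolerating blank lines between structures is an assumption made here, not something the format description states.

```python
def split_frames(lines):
    """Split the lines of an extended XYZ file into frames of N + 2 lines."""
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # assumption: blank lines between frames are skipped
            continue
        n_atoms = int(lines[i].strip())  # line 1 holds only the atom count N
        frame = lines[i : i + n_atoms + 2]
        if len(frame) != n_atoms + 2:
            raise ValueError(f"truncated frame starting at line {i + 1}")
        frames.append(frame)
        i += n_atoms + 2
    return frames
```

A typical call would be split_frames(open("train.xyz").read().splitlines()).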
Data format for a single structure
Line 1
The first line should have one item only, which is the number of atoms in the structure [math]N[/math].
Line 2
- This line consists of a number of keyword=value pairs separated by spaces. Spaces before and after = are allowed. All the characters are case-insensitive. value can be a single item or a number of items enclosed by double quotes, such as keyword="value_1 value_2 value_3". Here, the different values are separated by spaces, and spaces after the left " and before the right " are allowed. For example, one can write keyword=" value_1 value_2 value_3 ".
- Essentially any keyword is allowed, but we only read the following ones:
- lattice="ax ay az bx by bz cx cy cz" gives the box vectors:
$$\boldsymbol{a} = a_x \boldsymbol{e}_x + a_y \boldsymbol{e}_y + a_z \boldsymbol{e}_z;$$
$$\boldsymbol{b} = b_x \boldsymbol{e}_x + b_y \boldsymbol{e}_y + b_z \boldsymbol{e}_z;$$
$$\boldsymbol{c} = c_x \boldsymbol{e}_x + c_y \boldsymbol{e}_y + c_z \boldsymbol{e}_z.$$
  This is mandatory.
- energy=energy_value, such as energy=-123.4, gives the target energy of the structure, which is -123.4 eV in this example. This is mandatory.
- virial="vxx vxy vxz vyx vyy vyz vzx vzy vzz" gives the 3*3 virial tensor of the structure. This is optional.
- weight=relative_weight gives the relative weight for the current structure in the total loss function. This is optional.
- properties=property_name:data_type:number_of_columns. This is mandatory. We only read the following items (mandatory):
  - species:S:1 atom symbol in the periodic table (case-sensitive)
  - pos:R:3 position vector
  - force:R:3 or forces:R:3 target force vector
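The rules for line 2 can be sketched as a small Python parser. It is only an illustration: the names parse_line2 and MANDATORY are hypothetical, the set of mandatory keywords is assumed to be lattice, energy and properties, and only the keywords are lowercased, so the values stay as written.

```python
import re

# keyword, optional spaces, '=', optional spaces, then a quoted or a bare value
_PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')

# Assumption: these three keywords are the mandatory ones.
MANDATORY = ("lattice", "energy", "properties")

def parse_line2(line):
    """Return {lowercased keyword: list of value strings} for line 2."""
    info = {}
    for key, quoted, bare in _PAIR.findall(line):
        # split() discards spaces just inside the double quotes
        info[key.lower()] = (bare or quoted).split()
    missing = [k for k in MANDATORY if k not in info]
    if missing:
        raise ValueError("missing mandatory keyword(s): " + ", ".join(missing))
    for key in ("lattice", "virial"):
        if key in info and len(info[key]) != 9:
            raise ValueError(f"{key} needs exactly 9 numbers")
    return info
```

Keywords that the parser does not know are still stored in the dictionary, which mirrors the statement that any keyword is allowed and only some are read.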
Starting from line 3
- Each line will have the same number of items, which is determined by the properties keyword in line 2.
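To illustrate how the properties value fixes the column layout of the atom lines, here is a hedged Python sketch. The helper names are hypothetical, the sample structure uses made-up numbers, and the sketch assumes the usual extended XYZ convention that several name:type:columns triplets are joined by colons.

```python
# Illustrative two-atom structure; all numbers are made up for this example.
SAMPLE = """\
2
lattice="10 0 0 0 10 0 0 0 10" energy=-6.5 properties=species:S:1:pos:R:3:force:R:3
H 0.0 0.0 0.00 0.0 0.0 0.1
H 0.0 0.0 0.74 0.0 0.0 -0.1
"""

def parse_properties(spec):
    """Turn 'species:S:1:pos:R:3:force:R:3' into [(name, type, ncols), ...]."""
    parts = spec.split(":")
    if len(parts) % 3 != 0:
        raise ValueError("properties must be name:type:ncols triplets")
    return [
        (parts[i].lower(), parts[i + 1].upper(), int(parts[i + 2]))
        for i in range(0, len(parts), 3)
    ]

def parse_atom_line(line, layout):
    """Split one atom line into named column groups according to layout."""
    items = line.split()
    expected = sum(ncols for _, _, ncols in layout)
    if len(items) != expected:
        raise ValueError(f"expected {expected} items, got {len(items)}")
    atom, col = {}, 0
    for name, dtype, ncols in layout:
        chunk = items[col : col + ncols]
        if dtype == "R":  # real-valued columns; 'S' columns stay strings
            chunk = [float(x) for x in chunk]
        atom[name] = chunk
        col += ncols
    return atom

layout = parse_properties("species:S:1:pos:R:3:force:R:3")
atoms = [parse_atom_line(row, layout) for row in SAMPLE.splitlines()[2:]]
```

Here every atom line has 1 + 3 + 3 = 7 items, so a line with a different item count is rejected.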
Units
In this file:
- Length and position are in units of Angstrom.
- Energy is in units of eV.
- Force is in units of eV/Angstrom.
- Virial is in units of eV (so virial divided by volume gives pressure).
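As a worked example of the unit convention, the sketch below divides the virial by the box volume and converts the result from eV/Angstrom^3 to GPa. The function names are hypothetical, and sign conventions for stress versus pressure, which differ between codes, are not addressed here.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa

def cell_volume(lattice):
    """Volume |a . (b x c)| from the 9 lattice numbers, in Angstrom^3."""
    ax, ay, az, bx, by, bz, cx, cy, cz = lattice
    return abs(
        ax * (by * cz - bz * cy)
        + ay * (bz * cx - bx * cz)
        + az * (bx * cy - by * cx)
    )

def virial_to_pressure_gpa(virial, lattice):
    """Divide each virial component (eV) by the volume and convert to GPa."""
    volume = cell_volume(lattice)
    return [v / volume * EV_PER_A3_TO_GPA for v in virial]
```

For a cubic box of side 10 Angstrom (volume 1000 Angstrom^3), a virial component of 1 eV corresponds to about 0.16 GPa.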
Tips
- Periodic boundary conditions are always assumed for all directions in each configuration. When the box thickness in a direction is smaller than twice the radial cutoff distance, the code will replicate the box in that direction.
- The minimal number of atoms in a configuration is 1. The user is responsible for choosing a good reference energy when preparing the energy data. This choice is not critical, however, because absolute energy is meaningless.
- The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
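The box-thickness condition in the first tip can be checked before training with a short Python sketch. It assumes that the thickness in a direction means the perpendicular distance between the two opposite box faces (volume divided by face area). The function names are hypothetical, and the sketch only flags the directions; it does not reproduce how the code performs the replication.

```python
import math

def _cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )

def _dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def box_thicknesses(lattice):
    """Perpendicular box thickness along a, b and c, in Angstrom."""
    a, b, c = lattice[0:3], lattice[3:6], lattice[6:9]
    volume = abs(_dot(a, _cross(b, c)))
    faces = (_cross(b, c), _cross(c, a), _cross(a, b))
    return tuple(volume / math.sqrt(_dot(f, f)) for f in faces)

def thin_directions(lattice, radial_cutoff):
    """True for each direction whose thickness is below twice the cutoff."""
    return tuple(t < 2.0 * radial_cutoff for t in box_thicknesses(lattice))
```

For example, a 5 x 20 x 20 Angstrom orthogonal box with a radial cutoff of 6 Angstrom is thin only along the first direction.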