# The train.xyz input file

## Purpose

- The `train.xyz` file contains all the training data for the NEP potential.
- The `test.xyz` file contains all the testing data for the NEP potential.
- These two files have the same format. Therefore, we only refer to the `train.xyz` file below for simplicity.
- This file format is used **starting from GPUMD-v3.4**.

## Overall data format

- This file follows the so-called extended XYZ file format, as proposed here: https://github.com/libAtoms/extxyz
- Each structure (or configuration, or frame) occupies $N+2$ lines, where $N$ is the number of atoms in the structure.
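
The $N+2$ layout means a file can be split into frames by reading the atom count and then taking the next $N+1$ lines. The following is a minimal sketch of that idea, not the reader used by GPUMD; the function name `split_frames` and the tolerance for blank lines between frames are assumptions made for illustration.

```python
def split_frames(text):
    """Split extended XYZ text into frames of N + 2 lines each."""
    lines = text.splitlines()
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # skip blank lines between frames (an assumption, not part of the format)
            continue
        n_atoms = int(lines[i].strip())  # line 1 holds the number of atoms only
        frame = lines[i : i + n_atoms + 2]
        if len(frame) != n_atoms + 2:
            raise ValueError(f"frame starting at line {i + 1} is truncated")
        frames.append(frame)
        i += n_atoms + 2
    return frames


# Two made-up frames with 2 atoms and 1 atom, respectively.
example = """2
lattice="10 0 0 0 10 0 0 0 10" energy=-1.0 properties=species:S:1:pos:R:3:force:R:3
H 0.0 0.0 0.0 0.0 0.0 0.1
H 0.0 0.0 0.74 0.0 0.0 -0.1
1
lattice="10 0 0 0 10 0 0 0 10" energy=-0.5 properties=species:S:1:pos:R:3:force:R:3
He 5.0 5.0 5.0 0.0 0.0 0.0
"""
frames = split_frames(example)
print([len(frame) for frame in frames])
```

Each returned frame is a list of raw lines: the atom count, the `keyword=value` line, and then one line per atom.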

## Data format for a single structure

### Line 1

The first line should have one item only, which is the number of atoms in the structure, $N$.

### Line 2

- This line consists of a number of `keyword=value` pairs separated by spaces. Spaces before and after `=` are **allowed**. All the characters are **case-insensitive**.
- `value` can be a single item or a number of items enclosed by double quotes, such as `keyword="value_1 value_2 value_3"`. Here, the different values are separated by spaces, and spaces after the left `"` and before the right `"` are **allowed**. For example, one **can** write `keyword=" value_1 value_2 value_3 "`.
- Essentially any keyword is allowed, but we only read the following ones:
  - `lattice="ax ay az bx by bz cx cy cz"` gives the box vectors:

    $$\boldsymbol{a} = a_x \boldsymbol{e}_x + a_y \boldsymbol{e}_y + a_z \boldsymbol{e}_z;$$
    $$\boldsymbol{b} = b_x \boldsymbol{e}_x + b_y \boldsymbol{e}_y + b_z \boldsymbol{e}_z;$$
    $$\boldsymbol{c} = c_x \boldsymbol{e}_x + c_y \boldsymbol{e}_y + c_z \boldsymbol{e}_z.$$

    This is **mandatory**.
  - `energy=energy_value`, such as `energy=-123.4`, gives the target energy of the structure, which is -123.4 eV in this example. This is **mandatory**.
  - `virial="vxx vxy vxz vyx vyy vyz vzx vzy vzz"` gives the $3 \times 3$ virial tensor of the structure. This is **optional**.
  - `weight=relative_weight` gives the relative weight for the current structure in the total loss function. This is **optional**.
  - `properties=property_name:data_type:number_of_columns`. This is **mandatory**. We only read the following items (mandatory):
    - `species:S:1`: atom symbol in the periodic table (**case-sensitive**)
    - `pos:R:3`: position vector
    - `force:R:3` or `forces:R:3`: target force vector
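
The rules for line 2 (optional spaces around `=`, case-insensitive keywords, optionally quoted values with padding inside the quotes) can be illustrated with a small regular-expression parser. This is a sketch under those stated rules only; the names `PAIR`, `parse_comment_line`, and `check_mandatory` are hypothetical and do not come from GPUMD.

```python
import re

# One keyword=value pair: the value is either a double-quoted group or a bare token.
PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')


def parse_comment_line(line):
    """Return {keyword: value string}, with keywords lower-cased."""
    info = {}
    for key, quoted, bare in PAIR.findall(line):
        # At most one of the two groups is non-empty; strip padding inside quotes.
        info[key.lower()] = (quoted or bare).strip()
    return info


def check_mandatory(info):
    """Raise if a keyword that the text marks as mandatory is absent."""
    missing = [k for k in ("lattice", "energy", "properties") if k not in info]
    if missing:
        raise ValueError("missing mandatory keyword(s): " + ", ".join(missing))


# A made-up line that exercises spaces around '=', mixed case, and padded quotes.
line = 'Energy = -123.4 lattice=" 10 0 0 0 10 0 0 0 10 " properties=species:S:1:pos:R:3:force:R:3'
info = parse_comment_line(line)
check_mandatory(info)

lattice = [float(x) for x in info["lattice"].split()]  # ax ay az bx by bz cx cy cz
energy = float(info["energy"])                         # target energy in eV
virial = info.get("virial")                            # optional, so it may be None
```

Keeping the values as strings until they are needed makes it easy to ignore keywords that are present in the file but not read.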

### Starting from line 3

- Each line has the same number of items, which is determined by the `properties` keyword in line 2.
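
To make the link between the `properties` value and the per-atom columns concrete, the sketch below turns a `properties` string into column ranges and uses them to read one atom line. The helper names are hypothetical, and skipping over extra properties that are not read (here a made-up `mass:R:1`) is an assumption based on the statement that only `species`, `pos`, and `force`/`forces` are read.

```python
def parse_properties(spec):
    """Map each property name to its (start, stop) column range."""
    fields = spec.split(":")
    if len(fields) % 3 != 0:
        raise ValueError("properties must be name:type:count triples")
    layout = {}
    column = 0
    for name, _dtype, count in zip(fields[0::3], fields[1::3], fields[2::3]):
        name = name.lower()      # keywords on line 2 are case-insensitive
        if name == "forces":     # 'force' and 'forces' are both accepted
            name = "force"
        layout[name] = (column, column + int(count))
        column += int(count)
    return layout, column        # column is now the total number of items per line


def parse_atom_line(line, layout, n_columns):
    """Return (species, position, force) for one atom line."""
    items = line.split()
    if len(items) != n_columns:
        raise ValueError(f"expected {n_columns} items, got {len(items)}")
    species = items[layout["species"][0]]  # kept as written: symbols are case-sensitive
    position = [float(x) for x in items[slice(*layout["pos"])]]
    force = [float(x) for x in items[slice(*layout["force"])]]
    return species, position, force


layout, n_columns = parse_properties("species:S:1:pos:R:3:mass:R:1:forces:R:3")
atom = parse_atom_line("Si 0.0 0.0 0.0 28.085 0.1 -0.2 0.3", layout, n_columns)
```

Because the column ranges come from line 2, the same reader works whatever order the properties are listed in.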

## Units

In this file:

- Length and position are in units of Angstrom.
- Energy is in units of eV.
- Force is in units of eV/Angstrom.
- Virial is in units of eV (so virial divided by volume gives pressure).
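
As a worked example of the virial unit, the sketch below divides each virial component by the box volume and converts the result from eV/Angstrom^3 to GPa, using 1 eV/Angstrom^3 = 160.2176634 GPa. The function names are hypothetical, and the result is only the virial contribution divided by volume; no sign convention or kinetic term is implied.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa


def cell_volume(lattice):
    """Volume |a . (b x c)| from the nine lattice components, in Angstrom^3."""
    ax, ay, az, bx, by, bz, cx, cy, cz = lattice
    return abs(
        ax * (by * cz - bz * cy)
        - ay * (bx * cz - bz * cx)
        + az * (bx * cy - by * cx)
    )


def virial_over_volume_gpa(virial, lattice):
    """Divide each virial component (eV) by the box volume and convert to GPa."""
    volume = cell_volume(lattice)
    return [v / volume * EV_PER_A3_TO_GPA for v in virial]


# Made-up numbers: a 10 Angstrom cubic box with a diagonal virial of 10 eV.
lattice = [10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0]
virial = [10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0]
tensor_gpa = virial_over_volume_gpa(virial, lattice)
```

With these numbers the volume is 1000 Angstrom^3, so each diagonal component is 0.01 eV/Angstrom^3, or about 1.6 GPa.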

## Tips

- Periodic boundary conditions are always assumed for all directions in each configuration. When the box thickness in a direction is smaller than twice the radial cutoff distance, the code will replicate the box in that direction.
- The minimal number of atoms in a configuration is 1. The user is responsible for choosing a **good** reference energy when preparing the energy data. However, the exact choice is not critical, because absolute energies are meaningless.
- The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
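
When preparing data, it can help to know in advance which directions fall below the thickness threshold in the first tip. The sketch below assumes that "box thickness" means the perpendicular distance between opposite faces of the box (volume divided by face area); that interpretation, the helper names, and the cutoff value in the example are all assumptions made for illustration.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def box_thicknesses(lattice):
    """Perpendicular thickness along a, b and c: volume / area of the opposite face."""
    a, b, c = lattice[0:3], lattice[3:6], lattice[6:9]
    volume = abs(sum(x * y for x, y in zip(a, cross(b, c))))
    faces = ((b, c), (c, a), (a, b))
    return tuple(volume / math.hypot(*cross(u, v)) for u, v in faces)


def thinner_than_twice_cutoff(lattice, radial_cutoff):
    """Flag each direction whose thickness is below twice the radial cutoff."""
    return tuple(t < 2.0 * radial_cutoff for t in box_thicknesses(lattice))


# Made-up box: 6 x 20 x 20 Angstrom, with a hypothetical 5 Angstrom radial cutoff.
lattice = [6.0, 0.0, 0.0, 0.0, 20.0, 0.0, 0.0, 0.0, 20.0]
flags = thinner_than_twice_cutoff(lattice, radial_cutoff=5.0)
```

Here only the first direction (thickness 6 Angstrom, below the 10 Angstrom threshold) is flagged.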