The train.xyz input file

From GPUMD

Purpose

  • The train.xyz file contains all the training data for the NEP potential.
  • The test.xyz file contains all the testing data for the NEP potential.
  • These two files have the same format. Therefore, we only refer to the train.xyz file below for simplicity.
  • This file format will be used starting from GPUMD-v3.4.

Overall data format

  • This file follows the so-called extended XYZ file format, as proposed here: https://github.com/libAtoms/extxyz
  • Each structure (or configuration, or frame) occupies [math]N+2[/math] lines, where [math]N[/math] is the number of atoms in the structure.
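Because every structure occupies exactly [math]N+2[/math] lines, a file can be cut into frames by reading the atom count and skipping ahead. The following is a minimal sketch of that idea, not the reader used by GPUMD; the function name split_frames, the sample data, and the tolerance for blank lines between frames are assumptions made for illustration.

```python
def split_frames(lines):
    """Split the lines of an extended XYZ file into frames of N + 2 lines each."""
    frames = []
    i = 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1  # assumption: blank lines between frames are skipped
            continue
        n_atoms = int(lines[i].strip())  # line 1 of a frame: number of atoms N
        frame = lines[i : i + n_atoms + 2]
        if len(frame) != n_atoms + 2:
            raise ValueError(f"incomplete frame starting at line {i + 1}")
        frames.append(frame)
        i += n_atoms + 2  # jump to the first line of the next frame
    return frames


# Hypothetical two-frame file content, used only to exercise the function.
example = """2
lattice="5 0 0 0 5 0 0 0 5" energy=-1.0 properties=species:S:1:pos:R:3:force:R:3
H 0.0 0.0 0.0 0.0 0.0 0.1
H 0.0 0.0 0.74 0.0 0.0 -0.1
1
lattice="4 0 0 0 4 0 0 0 4" energy=-0.5 properties=species:S:1:pos:R:3:force:R:3
He 0.0 0.0 0.0 0.0 0.0 0.0
""".splitlines()

frames = split_frames(example)
frame_lengths = [len(frame) for frame in frames]
```

Here frame_lengths holds the number of lines in each frame, which is the atom count plus two.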

Data format for a single structure

Line 1

The first line should contain a single item: the number of atoms [math]N[/math] in the structure.

Line 2

  • This line consists of keyword=value pairs separated by spaces. Spaces before and after = are allowed, and all characters are case-insensitive. A value can be a single item or several items enclosed in double quotes, such as keyword="value_1 value_2 value_3". The items inside the quotes are separated by spaces, and spaces after the opening " and before the closing " are allowed, so one can also write keyword=" value_1 value_2 value_3 ".
  • Essentially any keyword is allowed, but we only read the following ones:
    • lattice="ax ay az bx by bz cx cy cz" (mandatory) gives the box vectors:

$$\boldsymbol{a} = a_x \boldsymbol{e}_x + a_y \boldsymbol{e}_y + a_z \boldsymbol{e}_z;$$
$$\boldsymbol{b} = b_x \boldsymbol{e}_x + b_y \boldsymbol{e}_y + b_z \boldsymbol{e}_z;$$
$$\boldsymbol{c} = c_x \boldsymbol{e}_x + c_y \boldsymbol{e}_y + c_z \boldsymbol{e}_z.$$

    • energy=energy_value (mandatory), such as energy=-123.4, gives the target energy of the structure, which is -123.4 eV in this example.
    • virial="vxx vxy vxz vyx vyy vyz vzx vzy vzz" (optional) gives the 3×3 virial tensor of the structure.
    • weight=relative_weight (optional) gives the relative weight of the current structure in the total loss function.
    • properties=property_name:data_type:number_of_columns (mandatory) describes the per-atom columns. We only read the following items, all of which are mandatory:
      • species:S:1 chemical symbol of the atom (case-sensitive)
      • pos:R:3 position vector
      • force:R:3 or forces:R:3 target force vector
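The quoting and spacing rules for line 2 can be illustrated with a small parser. This is a sketch under stated assumptions rather than the parser used by GPUMD: the name parse_header and the regular expression are hypothetical, keywords are lower-cased to reflect case-insensitivity, and values are left as string tokens for the caller to convert.

```python
import re

# A keyword, optional spaces around '=', then either a double-quoted
# value (which may contain spaces) or a bare token without spaces.
_PAIR = re.compile(r'(\w+)\s*=\s*(?:"([^"]*)"|(\S+))')


def parse_header(line):
    """Return a dict mapping lower-cased keywords to lists of string tokens."""
    header = {}
    for match in _PAIR.finditer(line):
        key = match.group(1).lower()
        quoted, bare = match.group(2), match.group(3)
        raw = quoted if quoted is not None else bare
        header[key] = raw.split()  # also drops spaces next to the quotes
    return header


# Hypothetical line 2 that uses spaces around '=' and inside the quotes.
line = ('Lattice = " 5 0 0 0 5 0 0 0 5 " energy=-123.4 '
        'properties=species:S:1:pos:R:3:force:R:3')
header = parse_header(line)

lattice = [float(x) for x in header["lattice"]]  # nine box-vector components
energy = float(header["energy"][0])              # target energy in eV
has_virial = "virial" in header                  # optional keyword
```

Returning every value as a list of tokens keeps the single-item and quoted multi-item cases uniform; the mandatory keywords can then be checked with simple membership tests.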

Starting from line 3

  • Each line describes one atom and has the same number of items, which is determined by the properties keyword in line 2.
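To show how the properties value fixes the column layout of the atom lines, here is a minimal sketch. It assumes the extended XYZ convention in which the name:type:columns triplets are joined by colons into one string (for example species:S:1:pos:R:3:force:R:3); the function names are hypothetical, and only the S (string) and R (real) data types mentioned above are converted.

```python
def parse_properties(spec):
    """Turn 'species:S:1:pos:R:3:force:R:3' into [(name, type, ncols), ...]."""
    fields = spec.split(":")
    if len(fields) % 3 != 0:
        raise ValueError("properties must be name:type:ncols triplets")
    return [
        (fields[i].lower(), fields[i + 1].upper(), int(fields[i + 2]))
        for i in range(0, len(fields), 3)
    ]


def parse_atom_line(line, layout):
    """Split one atom line into named fields according to the layout."""
    tokens = line.split()
    expected = sum(ncols for _, _, ncols in layout)
    if len(tokens) != expected:
        raise ValueError(f"expected {expected} items, found {len(tokens)}")
    atom = {}
    col = 0
    for name, dtype, ncols in layout:
        chunk = tokens[col : col + ncols]
        col += ncols
        # 'R' columns are real numbers; everything else stays a string.
        atom[name] = [float(t) for t in chunk] if dtype == "R" else chunk
    return atom


layout = parse_properties("species:S:1:pos:R:3:force:R:3")
atom = parse_atom_line("H 0.0 0.0 0.74 0.0 0.0 -0.1", layout)
```

With this layout every atom line must contain 1 + 3 + 3 = 7 items, and a line with a different count is rejected.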

Units

In this file:

  • Length and position are in units of Angstrom.
  • Energy is in units of eV.
  • Force is in units of eV/Angstrom.
  • Virial is in units of eV (so virial divided by volume gives pressure).
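As a worked example of the unit relation above, the sketch below divides a virial tensor (in eV) by the box volume (in cubic Angstrom) and converts the result to GPa using 1 eV/Angstrom^3 = 160.2176634 GPa. The function names and the sample numbers are hypothetical, and the code simply follows the statement that virial divided by volume gives pressure, without addressing sign conventions.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa


def box_volume(lattice):
    """Volume |a . (b x c)| from the nine lattice components, in Angstrom^3."""
    ax, ay, az, bx, by, bz, cx, cy, cz = lattice
    return abs(
        ax * (by * cz - bz * cy)
        + ay * (bz * cx - bx * cz)
        + az * (bx * cy - by * cx)
    )


def virial_to_gpa(virial, lattice):
    """Divide each virial component (eV) by the volume and convert to GPa."""
    volume = box_volume(lattice)
    return [v / volume * EV_PER_A3_TO_GPA for v in virial]


# Hypothetical cubic box of side 10 Angstrom with a diagonal virial of 1 eV.
lattice = [10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0]
virial = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]

tensor_gpa = virial_to_gpa(virial, lattice)
# Scalar pressure taken as one third of the trace (xx, yy, zz components).
pressure_gpa = (tensor_gpa[0] + tensor_gpa[4] + tensor_gpa[8]) / 3.0
```

The components are indexed in the same order as in the virial keyword, so indices 0, 4, and 8 are the xx, yy, and zz entries.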

Tips

  • Periodic boundary conditions are always assumed in all directions for each configuration. When the box thickness in a direction is smaller than twice the radial cutoff distance, the code will replicate the box in that direction.
  • The minimum number of atoms in a configuration is 1. The user is responsible for choosing a reasonable reference energy when preparing the energy data, although the exact choice is not critical because absolute energies are meaningless.
  • The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
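The replication rule in the first tip can be made concrete with a short sketch. Two assumptions are made here that the text above does not state: the box thickness in a direction is taken as the perpendicular distance between the two opposite box faces (volume divided by face area), and the number of replicas is taken as the smallest integer that makes the replicated thickness at least twice the cutoff. The function names and sample values are hypothetical, and the actual code may differ in detail.

```python
import math


def cross(u, v):
    return [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def box_thicknesses(lattice):
    """Perpendicular thickness of the box along a, b, and c (Angstrom)."""
    a, b, c = lattice[0:3], lattice[3:6], lattice[6:9]
    volume = abs(dot(a, cross(b, c)))
    areas = [cross(b, c), cross(c, a), cross(a, b)]  # face-normal vectors
    return [volume / math.sqrt(dot(n, n)) for n in areas]


def replication_counts(lattice, cutoff):
    """Assumed rule: smallest count n with n * thickness >= 2 * cutoff."""
    return [
        max(1, math.ceil(2.0 * cutoff / t)) for t in box_thicknesses(lattice)
    ]


# Hypothetical orthogonal box of 4 x 10 x 20 Angstrom and a 5 Angstrom cutoff.
lattice = [4.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 20.0]
thicknesses = box_thicknesses(lattice)
counts = replication_counts(lattice, cutoff=5.0)
```

In this example only the first direction is thinner than twice the cutoff, so it is the only one that would be replicated under the assumed rule.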