The train.in input file

From GPUMD

Purpose

  • The train.in file contains all the training data for the NEP potential.
  • The test.in file contains all the testing data for the NEP potential.
  • These two files have the same format. Therefore, we only refer to the train.in file below for simplicity.
  • This file format is only applicable up to GPUMD-v3.3.1.

Data format

The data format in this file is fixed:

Nc
N_1  has_virial [weight_1]
N_2  has_virial [weight_2]
...
N_Nc has_virial [weight_Nc]
Data for configuration 1
Data for configuration 2
...
Data for configuration Nc

Here,

  • Nc is the total number of configurations (systems, or structures).
  • N_i is the number of atoms in configuration i.
  • The has_virial flag (can only be 0 or 1) dictates whether or not there is virial information for the current configuration.
  • [weight_i] is optional and gives the relative weight of configuration i in the total loss function. It was introduced in GPUMD-v3.0 and can be present or absent independently for each configuration. When it is absent, the configuration gets the default weight of 1.
  • Data for one configuration occupies N_i + 2 lines:
    • The first line has either 1 or 7 numbers. If has_virial for the current configuration is 0, this line has one number: the total energy of the configuration. If has_virial is 1, this line has 7 numbers: the total energy followed by the 6 virial components in the order xx, yy, zz, xy, yz, zx. Note that this ordering differs from the Voigt convention (xx, yy, zz, yz, zx, xy).
    • The second line has nine numbers defining the cell vectors $\vec{a}$, $\vec{b}$, and $\vec{c}$: ax ay az bx by bz cx cy cz.
    • In the remaining N_i lines, each line contains 7 numbers, corresponding to the atom type, position components (x, y, z), and force components (fx, fy, fz): type x y z fx fy fz.
      • In GPUMD-v2.6, the atom type is the atomic number (that is, number of protons).
      • In GPUMD-v2.7, the atom type is a non-negative integer defined by the user. In an $n$-element system, these numbers should be integers from 0 to $n-1$. For example, in PbTe, one can assign type 0 to Te and type 1 to Pb.
      • In GPUMD-v2.8 and later, the atom type is the atom symbol (such as H, He, Li).
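As an illustration of the layout above, here is a minimal Python sketch that serializes configurations into this format for GPUMD-v2.8 and later (atom symbols as types). The `Config` class, its field names, and `format_train_in` are hypothetical helpers, not part of GPUMD, and the choice of 8 decimal places is arbitrary.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Config:
    """One configuration; field names are illustrative, not part of GPUMD."""
    energy: float                             # total (per-box) energy, eV
    cell: Sequence[float]                     # ax ay az bx by bz cx cy cz, Angstrom
    symbols: List[str]                        # atom symbols (GPUMD-v2.8 and later)
    positions: List[Vec3]                     # x y z, Angstrom
    forces: List[Vec3]                        # fx fy fz, eV/Angstrom
    virial: Optional[Sequence[float]] = None  # xx yy zz xy yz zx, eV; None -> has_virial = 0
    weight: Optional[float] = None            # optional relative weight; None -> omitted


def _fmt(values) -> str:
    return " ".join(f"{v:.8f}" for v in values)


def format_train_in(configs: List[Config]) -> str:
    """Return the contents of a train.in file (format up to v3.3.1) as a string."""
    lines = [str(len(configs))]  # Nc
    # Header block: "N_i has_virial [weight_i]" for every configuration
    for c in configs:
        header = f"{len(c.symbols)} {0 if c.virial is None else 1}"
        if c.weight is not None:
            header += f" {c.weight:g}"
        lines.append(header)
    # Data blocks: N_i + 2 lines per configuration
    for c in configs:
        n = len(c.symbols)
        if len(c.cell) != 9 or len(c.positions) != n or len(c.forces) != n:
            raise ValueError("inconsistent cell, position, or force data")
        if c.virial is None:
            lines.append(_fmt([c.energy]))                # energy only
        elif len(c.virial) == 6:
            lines.append(_fmt([c.energy, *c.virial]))     # energy + 6 virial components
        else:
            raise ValueError("virial must have 6 components")
        lines.append(_fmt(c.cell))                         # nine cell-vector components
        for s, pos, frc in zip(c.symbols, c.positions, c.forces):
            lines.append(f"{s} {_fmt(pos)} {_fmt(frc)}")  # type x y z fx fy fz
    return "\n".join(lines) + "\n"
```

The header block is written in a first pass and the per-configuration data in a second, mirroring the file layout. The returned string can be written out with an ordinary `open("train.in", "w")` call; the same function would produce test.in since both files share the format.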

Units

In this file:

  • Length and position are in units of Angstrom.
  • Energy is in units of eV.
  • Force is in units of eV/Angstrom.
  • Virial is in units of eV (so virial divided by volume gives pressure).
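Since the virial is in eV and the virial divided by the volume gives the pressure, a pressure tensor obtained elsewhere can be converted by multiplying it by the cell volume. The sketch below assumes the input is a pressure tensor in GPa, already in the order xx, yy, zz, xy, yz, zx and with the same sign convention as the virial here; some codes report stress with the opposite sign or in Voigt order, so check your source. The function names are hypothetical.

```python
from typing import List, Sequence

# 1 eV/Angstrom^3 in GPa: 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa
EV_PER_A3_IN_GPA = 160.2176634


def cell_volume(cell: Sequence[float]) -> float:
    """Volume in Angstrom^3 from the nine numbers ax ay az bx by bz cx cy cz."""
    a, b, c = cell[0:3], cell[3:6], cell[6:9]
    # Triple product a . (b x c); abs() makes the result independent of handedness
    bxc = (b[1] * c[2] - b[2] * c[1],
           b[2] * c[0] - b[0] * c[2],
           b[0] * c[1] - b[1] * c[0])
    return abs(a[0] * bxc[0] + a[1] * bxc[1] + a[2] * bxc[2])


def virial_from_pressure_gpa(pressure_gpa: Sequence[float],
                             cell: Sequence[float]) -> List[float]:
    """Assumed input: 6 pressure components in GPa (xx yy zz xy yz zx); output in eV."""
    volume = cell_volume(cell)
    # pressure (eV/Angstrom^3) * volume (Angstrom^3) = virial (eV)
    return [p / EV_PER_A3_IN_GPA * volume for p in pressure_gpa]
```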

Tips

  • Periodic boundary conditions are always assumed in all directions for every configuration. When the box thickness in a direction is smaller than twice the radial cutoff distance, the code replicates the box in that direction.
  • The minimal number of atoms in a configuration is 1. The user is responsible for choosing a good reference energy when preparing the energy data. This choice is not critical, however, because absolute energies have no physical meaning; only energy differences do.
  • The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
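To check in advance which directions of a (possibly triclinic) box fall below the twice-the-cutoff threshold, one can compute the box thickness along each cell vector as the volume divided by the area of the opposite face, for example $V / |\vec{b} \times \vec{c}|$ for the $\vec{a}$ direction. The sketch below does this; the replication count `ceil(2 * rc / thickness)` is an assumption for illustration only, since the exact number of copies GPUMD makes is not stated here. All function names are hypothetical.

```python
import math
from typing import List, Sequence


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v) -> float:
    return sum(x * y for x, y in zip(u, v))


def norm(u) -> float:
    return math.sqrt(dot(u, u))


def box_thicknesses(cell: Sequence[float]) -> List[float]:
    """Distances between opposite faces of the box, in the same order as a, b, c."""
    a, b, c = cell[0:3], cell[3:6], cell[6:9]
    volume = abs(dot(a, cross(b, c)))
    # Thickness along one vector = volume / area of the face spanned by the other two
    return [volume / norm(cross(b, c)),
            volume / norm(cross(c, a)),
            volume / norm(cross(a, b))]


def replication_factors(cell: Sequence[float], rc_radial: float) -> List[int]:
    """Hypothetical rule: smallest n with n * thickness >= 2 * rc_radial (n >= 1)."""
    return [max(1, math.ceil(2.0 * rc_radial / t)) for t in box_thicknesses(cell)]
```

For a 10 x 10 x 3 Angstrom orthorhombic box and a radial cutoff of 8 Angstrom, this gives factors of 2, 2, and 6, so only a box whose thickness is already at least 16 Angstrom in a direction is left unreplicated there.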