# The train.in input file

## Purpose

This file contains all the training data, for example from density functional theory (DFT) calculations.

## Data format

The data format in this file is fixed:

    Nc
    N_1  has_virial
    N_2  has_virial
    ...
    N_Nc has_virial
    Data for configuration 1
    Data for configuration 2
    ...
    Data for configuration Nc


Here,

• Nc is the total number of configurations (systems).
• N_i is the number of atoms in configuration i.
• The has_virial flag (0 or 1) indicates whether virial data are provided for configuration i.
• The data for configuration i occupy N_i + 2 lines:
  • The first line has 1 or 7 numbers. If has_virial is 0 for this configuration, the line contains a single number, the total energy of the configuration. If has_virial is 1, the line contains 7 numbers: the total energy followed by the 6 virial components in the order xx, yy, zz, xy, yz, zx.
  • The second line has 9 numbers defining the cell vectors ($\vec{a}$, $\vec{b}$, $\vec{c}$):
    ax ay az bx by bz cx cy cz
  • Each of the remaining N_i lines has 7 numbers: the atomic number (the number of protons, Z), the position components (x, y, z), and the force components (fx, fy, fz):
    Z x y z fx fy fz
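
The layout above can be checked with a small reader. The following is a minimal Python sketch, not the program's own parser: the function name `parse_train_in`, the dictionary keys, and the choice to skip blank lines are assumptions made for illustration, and the numbers in `example_text` are made up.

```python
def parse_train_in(text):
    """Parse the contents of a train.in file given as one string."""
    # Assumption: blank lines carry no meaning and can be skipped.
    rows = [line.split() for line in text.splitlines() if line.strip()]

    num_configs = int(rows[0][0])
    # One header line per configuration: N_i and has_virial.
    headers = [(int(r[0]), int(r[1])) for r in rows[1:1 + num_configs]]

    cursor = 1 + num_configs
    configs = []
    for num_atoms, has_virial in headers:
        # Line 1: energy, optionally followed by 6 virial components.
        first = [float(x) for x in rows[cursor]]
        if len(first) != (7 if has_virial else 1):
            raise ValueError("energy/virial line has the wrong number of entries")

        # Line 2: ax ay az bx by bz cx cy cz
        cell = [float(x) for x in rows[cursor + 1]]
        if len(cell) != 9:
            raise ValueError("cell line must have 9 numbers")

        # Remaining N_i lines: Z x y z fx fy fz
        atoms = []
        for r in rows[cursor + 2:cursor + 2 + num_atoms]:
            if len(r) != 7:
                raise ValueError("atom line must have 7 numbers")
            atoms.append({
                "Z": int(r[0]),
                "position": [float(x) for x in r[1:4]],
                "force": [float(x) for x in r[4:7]],
            })
        if len(atoms) != num_atoms:
            raise ValueError("file ended before all atoms were read")

        configs.append({
            "energy": first[0],
            "virial": first[1:] if has_virial else None,  # xx yy zz xy yz zx
            "cell": cell,
            "atoms": atoms,
        })
        cursor += num_atoms + 2  # each configuration occupies N_i + 2 lines
    return configs


# Made-up example: two 2-atom configurations, the first one with virial data.
example_text = """
2
2 1
2 0
-10.5 0.1 0.2 0.3 0.0 0.0 0.0
5.0 0.0 0.0 0.0 5.0 0.0 0.0 0.0 5.0
14 0.0 0.0 0.0 0.01 0.0 0.0
14 1.0 1.0 1.0 -0.01 0.0 0.0
-3.2
6.0 0.0 0.0 0.0 6.0 0.0 0.0 0.0 6.0
1 0.0 0.0 0.0 0.0 0.0 0.0
1 0.74 0.0 0.0 0.0 0.0 0.0
"""
configs = parse_train_in(example_text)
```

Reading all header lines first and then advancing a cursor by N_i + 2 per configuration mirrors the structure of the file, so a miscounted line shows up as an error at the configuration where it occurs.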


## Units

In this file:

• Length and position are in units of Angstrom.
• Energy is in units of eV.
• Force is in units of eV/Angstrom.
• Virial is in units of eV (so virial divided by volume gives pressure).
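
As an illustration of the virial unit, the sketch below divides each virial component by the cell volume and converts the result from eV/Angstrom^3 to GPa. The helper names are hypothetical, and the code only performs the division stated above; it makes no claim about sign conventions or about which contributions the virial includes. The conversion factor follows from 1 eV = 1.602176634e-19 J and 1 Angstrom^3 = 1e-30 m^3.

```python
EV_PER_A3_TO_GPA = 160.2176634  # 1 eV/Angstrom^3 expressed in GPa


def cell_volume(cell):
    """Volume |a . (b x c)| from the 9 cell numbers ax ay az bx by bz cx cy cz."""
    ax, ay, az, bx, by, bz, cx, cy, cz = cell
    det = (ax * (by * cz - bz * cy)
           - ay * (bx * cz - bz * cx)
           + az * (bx * cy - by * cx))
    return abs(det)


def virial_to_pressure_gpa(virial, cell):
    """Divide each virial component (eV) by the volume and convert to GPa."""
    volume = cell_volume(cell)
    return [v / volume * EV_PER_A3_TO_GPA for v in virial]


# Made-up numbers: a cubic 10 Angstrom box and a virial of 10 eV in xx, yy, zz.
cubic_cell = [10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0]
pressure = virial_to_pressure_gpa([10.0, 10.0, 10.0, 0.0, 0.0, 0.0], cubic_cell)
```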

## Tips

• Periodic boundary conditions are always assumed in all directions for each configuration. The minimum image convention is used, and it is the user's responsibility to make sure that the box is large enough for the chosen cutoff distance.
• The minimum number of atoms in a configuration is 2. The user is responsible for choosing a good reference energy when preparing the energy data.
• The energy and virial data refer to the total energy and virial for the system. They are not per-atom but per-box quantities.
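
One way to check the box size before training is sketched below. It assumes the usual minimum image criterion that the box thickness perpendicular to each pair of cell vectors is at least twice the cutoff; the exact check performed by the program is not stated here, so treat the criterion and the function names as assumptions.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def box_thicknesses(cell):
    """Perpendicular thicknesses of the box along a, b, and c (in Angstrom)."""
    a, b, c = cell[0:3], cell[3:6], cell[6:9]
    volume = abs(sum(x * y for x, y in zip(a, cross(b, c))))
    # Thickness = volume divided by the area of the opposite face.
    return (volume / norm(cross(b, c)),
            volume / norm(cross(c, a)),
            volume / norm(cross(a, b)))


def box_is_large_enough(cell, cutoff):
    """Assumed criterion: every thickness is at least 2 * cutoff."""
    return all(t >= 2.0 * cutoff for t in box_thicknesses(cell))


# Made-up example: a cubic 12 Angstrom box.
cell_12 = [12.0, 0.0, 0.0, 0.0, 12.0, 0.0, 0.0, 0.0, 12.0]
ok_for_5 = box_is_large_enough(cell_12, 5.0)
ok_for_7 = box_is_large_enough(cell_12, 7.0)
```

For non-orthogonal cells the thickness can be noticeably smaller than the length of the cell vectors, which is why the sketch uses face areas rather than vector lengths.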