Quick start#
Running a fit with pacemaker requires at least two components: a fitting dataset and a configuration input file.
The fitting dataset contains structural information together with the corresponding energies and forces that are fitted with ACE.
The input file specifies the desired ACE potential configuration and the parameters that control the optimization process.
In this section we describe the format of the fitting dataset, run a fit with an example dataset, and give an
overview of the output produced by pacemaker. Input parameters are detailed in the section below.
Fitting dataset preparation#
To use your data for fitting with pacemaker, provide it in the form of a pandas DataFrame.
An example DataFrame can be read as:
import pandas as pd
# read_pickle takes no protocol argument; the protocol is only chosen when writing
df = pd.read_pickle("../data/exmpl_df.pckl.gzip", compression="gzip")
And it contains the following entries:
index | energy | forces | ase_atoms | energy_corrected |
---|---|---|---|---|
0 | -3.69679 | [[0.0, 0.0, 0.0]] | Atoms(symbols='Al', pbc=True, cell=[[0.0, 1.949947, 1.949947], [1.949947, 0.0, 1.949947], [1.949947, 1.949947, 0.0]]) | -3.69679 |
1 | -3.71569 | [[0.0, 0.0, 0.0]] | Atoms(symbols='Al', pbc=True, cell=[[0.0, 1.964285, 1.964285], [1.964285, 0.0, 1.964285], [1.964285, 1.964285, 0.0]]) | -3.71569 |
2 | -3.72955 | [[0.0, 0.0, 0.0]] | Atoms(symbols='Al', pbc=True, cell=[[0.0, 1.978417, 1.978417], [1.978417, 0.0, 1.978417], [1.978417, 1.978417, 0.0]]) | -3.72955 |
3 | -3.7389 | [[0.0, 0.0, 0.0]] | Atoms(symbols='Al', pbc=True, cell=[[0.0, 1.99235, 1.99235], [1.99235, 0.0, 1.99235], [1.99235, 1.99235, 0.0]]) | -3.7389 |
4 | -3.74421 | [[0.0, 0.0, 0.0]] | Atoms(symbols='Al', pbc=True, cell=[[0.0, 2.006091, 2.006091], [2.006091, 0.0, 2.006091], [2.006091, 2.006091, 0.0]]) | -3.74421 |
Columns have the following meaning:
- ase_atoms: an instance of the ASE Atoms class. This is the main form of storing structural information that pacemaker relies on. It must contain the atomic positions, the corresponding atom types, pbc and lattice vectors.
- energy: total energy of the corresponding ase_atoms structure (in eV).
- forces: corresponding atomic forces as a 2D array with dimensions [NumberOfAtoms, 3] (in eV/Å).
- energy_corrected: total energy of a structure minus a reference energy.
The reference energy may differ depending on the dataset at hand. In general, one would prefer to reference energy
against the free atom energies. In this case energy_corrected corresponds to the cohesive energy. If the free atom
energies are not available, the reference energy can be any constant shift or 0. In this example energy is already
the cohesive energy.
NOTE: regardless of how energy_corrected is produced, this is the energy that will be used for fitting.
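The referencing described above can be sketched in a few lines. This is only an illustration: the free-atom energies below are hypothetical placeholder values, not real reference data, and corrected_energy is a helper name introduced here, not part of pacemaker.

```python
from collections import Counter

# Hypothetical free-atom energies in eV (placeholders, NOT real reference data)
FREE_ATOM_ENERGY = {"Al": -0.3, "Ni": -0.5}


def corrected_energy(total_energy, symbols, free_atom_energy):
    """Return the total energy minus the summed free-atom energies of all atoms."""
    counts = Counter(symbols)
    reference = sum(free_atom_energy[sym] * n for sym, n in counts.items())
    return total_energy - reference


# Total energy and atomic symbols of a four-atom Al2Ni2 cell
e_corr = corrected_energy(-21.07723361, ["Al", "Al", "Ni", "Ni"], FREE_ATOM_ENERGY)
```

With a reference energy of 0, as in the DataFrame example that follows, energy_corrected is simply equal to energy.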
One could create such a DataFrame from raw data following this example:
import pandas as pd
from ase import Atoms
# Collect raw data for the first structure
# Positions
pos1 = [[2.04748516, 2.04748516, 0. ],
[0. , 0. , 0. ],
[2.04748516, 0. , 1.44281847],
[0. , 2.04748516, 1.44475745]]
# Matrix of lattice vectors
lattice1 = [[4.09497 , 0. , 0. ],
[0. , 4.09497 , 0. ],
[0. , 0. , 2.887576]]
# Atomic symbols
symbls1 = ['Al', 'Al', 'Ni', 'Ni']
# energy
e1 = -21.07723361
# Forces
f1 = [[0.0, 0.0, 0.0],
[0.0, 0.0, 0.0],
[0.0, 0.0, 0.00725587],
[0.0, 0.0, -0.00725587]]
# create ASE atoms
at1 = Atoms(symbols=symbls1, positions=pos1, cell=lattice1, pbc=True)
#Collect raw data for the second structure
pos2 = [[0., 0., 0.]]
lattice2 = [[0. , 1.781758, 1.781758],
[1.781758, 0. , 1.781758],
[1.781758, 1.781758, 0. ]]
symbls2 = ['Ni']
e2 = -5.45708644
f2 = [[0.0, 0.0, 0.0]]
at2 = Atoms(symbols=symbls2, positions=pos2, cell=lattice2, pbc=True)
# set reference energy to 0
reference_energy = 0
# collect all the data into a dictionary
data = {'energy': [e1, e2],
'forces': [f1, f2],
'ase_atoms': [at1, at2],
'energy_corrected': [e1 - reference_energy, e2 - reference_energy]}
# create a DataFrame
df = pd.DataFrame(data)
# and save it
df.to_pickle('my_data.pckl.gzip', compression='gzip', protocol=4)
The resulting DataFrame can be used for fitting with pacemaker.
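Before saving a dataset it can help to check that every entry is consistent with the column description above: one energy value per structure and one force row of three components per atom. The following is a minimal sketch using plain Python lists. check_entry is a hypothetical helper, not a pacemaker function, and in practice n_atoms would be the number of atoms in the corresponding ase_atoms entry.

```python
def check_entry(energy, forces, n_atoms):
    """Return a list of problems found in one dataset entry (empty if none)."""
    problems = []
    if not isinstance(energy, (int, float)):
        problems.append("energy must be a single number (eV)")
    if len(forces) != n_atoms:
        problems.append(f"expected {n_atoms} force rows, got {len(forces)}")
    if any(len(row) != 3 for row in forces):
        problems.append("every force row must have 3 components")
    return problems


# Forces of the two structures from the example above
f1 = [[0.0, 0.0, 0.0],
      [0.0, 0.0, 0.0],
      [0.0, 0.0, 0.00725587],
      [0.0, 0.0, -0.00725587]]
f2 = [[0.0, 0.0, 0.0]]

ok_1 = check_entry(-21.07723361, f1, n_atoms=4)
ok_2 = check_entry(-5.45708644, f2, n_atoms=1)
# A deliberately wrong atom count to show what a reported problem looks like
bad = check_entry(-5.45708644, f2, n_atoms=2)
```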
Creating an input file#
To fit an ACE potential to the data prepared in the previous section, one needs to create a configuration
file with the relevant settings. pacemaker uses the .yaml format for configuration. An input file template can be created
by running pacemaker --template (or pacemaker -t). Doing so will produce an input.yaml file with the most general
settings, which can be adjusted for a particular task. A detailed overview of the input file parameters can be found in the
section below.
In this example we will use the template as it is; however, one needs to provide a path to the
example dataset exmpl_df.pckl.gzip. This can be done by changing the filename parameter in the data section of
input.yaml:
data:
  filename: /path/to/the/pyace/data/exmpl_df.pckl.gzip
Run fitting#
Running a fit is as easy as executing the command:
pacemaker input.yaml
or to run the fitting process in the background:
nohup pacemaker input.yaml &
For more pacemaker command options see the corresponding section.
By default, pacemaker uses GPU-accelerated fitting of ACE via tensorpotential. However, parallelization across
multiple GPUs is not supported at the moment. Therefore, if your machine has a multi-GPU setup, select
a single GPU before running pacemaker. This can be done by executing export CUDA_VISIBLE_DEVICES=ind in the shell,
replacing ind with the GPU index (e.g. 0, 1, ...) or with -1 to disable GPU usage.
Note that tensorpotential can be used without a GPU as well.
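When the fit is launched from a Python script rather than from the shell, the same GPU selection can be made by setting CUDA_VISIBLE_DEVICES in the environment of the child process. The sketch below assumes that the pacemaker executable is on the PATH. gpu_env and run_fit are hypothetical helper names introduced for illustration.

```python
import os
import subprocess


def gpu_env(gpu_index, base_env=None):
    """Return a copy of the environment with CUDA_VISIBLE_DEVICES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def run_fit(input_file="input.yaml", gpu_index=0):
    """Run `pacemaker input_file` with one visible GPU (use -1 to disable GPUs)."""
    return subprocess.run(["pacemaker", input_file], env=gpu_env(gpu_index), check=True)


# Environment for a CPU-only run, built from an empty base for demonstration
cpu_only = gpu_env(-1, base_env={})
```

Calling run_fit("input.yaml", gpu_index=0) would then start the fit on the first GPU only.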
Analysis#
During and after the fitting, pacemaker produces several outputs, including:
- interim_potential_X.yaml: current state of the potential at each iteration of the fit cycle (e.g. X=0, 1, ...)
- interim_potential_best_cycle.yaml: best of the X interim potentials
- log.txt: log file containing all current information, including a summary of the optimization steps.
- report: folder containing figures displaying various error statistics and distributions.
- output_potential.yaml: final fitted potential.
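While a fit is still running, it can be convenient to pick up the most recent interim potential. The sketch below selects the interim_potential_X.yaml file with the largest X from a list of file names. It relies only on the naming pattern listed above, and pick_latest and latest_in_folder are hypothetical helpers, not part of pacemaker.

```python
import os
import re

# Matches interim_potential_X.yaml with a numeric X only
INTERIM_RE = re.compile(r"^interim_potential_(\d+)\.yaml$")


def pick_latest(names):
    """Return the interim_potential_X.yaml name with the largest X, or None."""
    indexed = []
    for name in names:
        match = INTERIM_RE.match(name)
        if match is not None:  # skips interim_potential_best_cycle.yaml, log.txt, ...
            indexed.append((int(match.group(1)), name))
    return max(indexed)[1] if indexed else None


def latest_in_folder(folder="."):
    """Apply pick_latest to the contents of a folder."""
    return pick_latest(os.listdir(folder))


example = pick_latest([
    "interim_potential_0.yaml",
    "interim_potential_2.yaml",
    "interim_potential_10.yaml",
    "interim_potential_best_cycle.yaml",
    "log.txt",
])
```

The index is compared as an integer, so X=10 is ranked after X=2, which a plain alphabetical sort would get wrong.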
There are two main types of information in the log file:
- optimization step log:
Iteration #999 (1052 evals): Loss: 0.000192 | RMSE Energy(low): 17.95 (16.79) meV/at | Forces(low): 7.89 (7.04) meV/A | Time/eval: 517.83 mcs/at
where Iteration is the index of the optimization step performed by the optimizer
(the number in parentheses is the number of function evaluation calls made by the optimizer), Loss
is the current value of the loss function, and RMSE Energy/Forces is the current root-mean-square error
for energy/forces with respect to the training dataset (numbers in parentheses show the corresponding values for structures whose
energy is not greater than e_min + 1 eV, where e_min is the lowest energy in the training set). Time/eval
shows the computational time spent on evaluating the loss function and its gradient for the training dataset,
averaged across evaluations and divided by the number of atoms. This timing does not include the optimization step itself.
- fit statistics:
--------------------------------------------FIT STATS--------------------------------------------
Iteration: #1000 Loss: Total: 1.9159e-04 (100%)
Energy: 1.6074e-04 ( 84%)
Force: 3.0859e-05 ( 16%)
L1: 0.0000e+00 ( 0%)
L2: 0.0000e+00 ( 0%)
Number of params./funcs: 232/86 Avg. time: 526.93 mcs/at
-------------------------------------------------------------------------------------------------
Energy/at, meV/at Energy_low/at, meV/at Force, meV/A Force_low, meV/A
RMSE: 17.93 16.73 7.86 7.06
MAE: 12.22 11.11 5.31 3.30
MAX_AE: 53.19 38.30 35.19 20.32
-------------------------------------------------------------------------------------------------
Every display_step optimization steps, a summary of fit statistics is printed. It displays the total loss function value and the contributions to it from energy, forces and the regularization terms (L1, L2). In addition to RMSE, the mean absolute error (MAE) and maximum absolute error (MAX_AE) are also printed.
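To follow convergence, the optimization step lines can be extracted from log.txt with a regular expression. The sketch below is written against the single example line shown above, so the exact layout is an assumption that may need adjusting for other pacemaker versions. parse_step and the dictionary keys are names introduced here for illustration.

```python
import re

# Pattern written against the example log line; the layout is an assumption
STEP_RE = re.compile(
    r"Iteration\s+#(?P<iteration>\d+)\s+\((?P<evals>\d+)\s+evals\):\s+"
    r"Loss:\s+(?P<loss>\S+)\s+\|\s+"
    r"RMSE Energy\(low\):\s+(?P<rmse_e>\S+)\s+\((?P<rmse_e_low>[^)]+)\)\s+meV/at\s+\|\s+"
    r"Forces\(low\):\s+(?P<rmse_f>\S+)\s+\((?P<rmse_f_low>[^)]+)\)\s+meV/A\s+\|\s+"
    r"Time/eval:\s+(?P<time>\S+)\s+mcs/at"
)


def parse_step(line):
    """Parse one optimization step line into a dict of numbers, or return None."""
    match = STEP_RE.search(line)
    if match is None:
        return None
    values = match.groupdict()
    parsed = {key: float(value) for key, value in values.items()}
    # The two counters are integers, everything else stays a float
    parsed["iteration"] = int(values["iteration"])
    parsed["evals"] = int(values["evals"])
    return parsed


line = ("Iteration #999 (1052 evals): Loss: 0.000192 | RMSE Energy(low): 17.95 (16.79) meV/at "
        "| Forces(low): 7.89 (7.04) meV/A | Time/eval: 517.83 mcs/at")
step = parse_step(line)
```

Applying parse_step to every line of log.txt and keeping the results that are not None gives the loss and RMSE history of the fit.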
Using fitted potential#
The fitted potential can be used for calculations both in Python/ASE and in LAMMPS.
ASE#
The Python interface to the ACE potential is provided via an ASE calculator:
from ase import Atoms
from pyace import PyACECalculator
# use the example of the Atoms from the first section
# Positions
pos1 = [[2.04748516, 2.04748516, 0. ],
[0. , 0. , 0. ],
[2.04748516, 0. , 1.44281847],
[0. , 2.04748516, 1.44475745]]
# Matrix of lattice vectors
lattice1 = [[4.09497 , 0. , 0. ],
[0. , 4.09497 , 0. ],
[0. , 0. , 2.887576]]
# Atomic symbols
symbls1 = ['Al', 'Al', 'Ni', 'Ni']
# create ASE atoms
at1 = Atoms(symbols=symbls1, positions=pos1, cell=lattice1, pbc=True)
# Create calculator
calc = PyACECalculator('output_potential.yaml')
# Attach it to the Atoms (set_calculator is deprecated in recent ASE versions)
at1.calc = calc
# Evaluate properties
energy = at1.get_potential_energy()
forces = at1.get_forces()
LAMMPS#
Using the potential with LAMMPS requires converting it into the YACE format with the command
pace_yaml2yace output_potential.yaml
This generates the output_potential.yace file, which can be used in a LAMMPS input file:
## in.lammps
pair_style pace
pair_coeff * * output_potential.yace Al Ni
LAMMPS compilation:#
You can get the supported version of LAMMPS from the GitHub repository.
Build with make#
Follow the LAMMPS installation instructions:
- Go to the lammps/src folder
- Compile the ML-PACE library by running
make lib-pace args="-b"
- Include ML-PACE in the compilation by running make yes-ml-pace
- Compile LAMMPS as usual, i.e. make serial or make mpi.
Build with cmake#
- Create build directory and go there with
cd lammps
mkdir build
cd build
- Configure the lammps build with
cmake -DCMAKE_BUILD_TYPE=Release -DPKG_ML-PACE=ON ../cmake
or
cmake -DCMAKE_BUILD_TYPE=Release -D BUILD_MPI=ON -DPKG_ML-PACE=ON ../cmake
For more information see here.
- Build LAMMPS using
cmake --build .
or make