Performance

Several parts of dynasor can be computationally expensive, and which of them limits performance depends on the use case. The most time-consuming part of a dynamic structure factor calculation is usually the calculation of the Fourier-transformed densities. This part is implemented in numba and is significantly accelerated by parallelization. Run times for typical use cases are given below.

Here, we try to provide some rough estimates for how long various types of calculations may take.

If trajectory reading and other overheads are disregarded and the incoherent part is not evaluated (calculate_incoherent=False), calculations should scale as N_qpts * N_atoms * max_frames. Here, N_qpts and N_atoms refer to the number of \(\boldsymbol{q}\)-points and atoms, respectively. Correlating the densities between different frames in the window is often very fast, so in this case the run time often does not scale noticeably with time_window.

If the incoherent part is evaluated (calculate_incoherent=True), calculations will typically scale as N_qpts * N_atoms * max_frames * time_window. Calculations therefore generally become slower when the incoherent intermediate scattering function is computed, and more so for long time windows.
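The two scaling relations above can be turned into a rough, unitless cost estimate for comparing setups before launching a run. The sketch below is only an illustration: `relative_cost` is a hypothetical helper and not part of dynasor, it ignores trajectory reading and all prefactors, and it assumes that every snapshot of the trajectory is processed (max_frames equal to the number of snapshots).

```python
def relative_cost(n_qpts, n_atoms, max_frames, time_window=1,
                  calculate_incoherent=False):
    """Unitless cost estimate following the scaling relations in the text.

    Hypothetical helper (not a dynasor function). Trajectory reading and
    other overheads are ignored, so only ratios between setups are meaningful.
    """
    cost = n_qpts * n_atoms * max_frames
    if calculate_incoherent:
        # The incoherent part adds a factor of time_window.
        cost *= time_window
    return cost


# Few q-points: 5 q-points, about 40,000 atoms, 10,000 frames (assumed).
few = relative_cost(5, 40_000, 10_000)

# Many q-points: 1000 q-points, about 10,000 atoms, 10,000 frames (assumed).
many = relative_cost(1000, 10_000, 10_000)

# Ratio of the two coherent-only estimates.
ratio = many / few

# Extra factor from enabling the incoherent part with time_window=1000.
incoherent_factor = relative_cost(
    5, 40_000, 10_000, time_window=1000, calculate_incoherent=True
) / few

print(f"many / few q-points: {ratio:.0f}x")
print(f"incoherent / coherent (few q-points): {incoherent_factor:.0f}x")
```

Such estimates only indicate trends. Actual run times also depend on trajectory reading, the number of cores, and hardware.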

Parallelization

Parallelization is done both over \(\boldsymbol{q}\)-points (N_qpts) when calculating the densities, and over the time lags within time_window when calculating the dynamic correlation functions. By default, the calculations are parallelized over all available cores.
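On shared machines it can be useful to restrict the number of cores. Since the parallel parts are implemented in numba, one option is numba's `NUMBA_NUM_THREADS` environment variable, which numba reads when it is first imported. The sketch below assumes that dynasor relies on numba's default threading configuration and does not override it; the choice of half the available cores is arbitrary and only for illustration.

```python
import os

# Number of logical cores reported by the operating system (may be None).
n_cores = os.cpu_count() or 1

# Illustrative choice: use half of the available cores, but at least one.
n_threads = max(1, n_cores // 2)

# numba reads NUMBA_NUM_THREADS at import time, so this must be set
# before numba (and therefore dynasor) is imported in this process.
os.environ["NUMBA_NUM_THREADS"] = str(n_threads)

print(f"Requesting {n_threads} of {n_cores} cores for numba")

# Import dynasor only after this point, for example:
# import dynasor
```

The variable can equally be set in the shell or in a job script before starting Python. If numba has already been imported, `numba.set_num_threads` can lower the thread count at run time, but not above the value that was active at import.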

Typical use cases

Here, we consider a few typical use cases to give a rough estimate of how long calculations may take and how they scale with the number of available cores.

Few \(\boldsymbol{q}\)-points

Below are some time estimates for a three-component system of about 40,000 atoms with a trajectory containing 10,000 snapshots. Here, only 5 high-symmetry \(\boldsymbol{q}\)-points are used, as is typical when analyzing solids. The time window is set to 1000. Note that reading the trajectory (xyz format) accounts for a large part of the total computational time.

_images/CsPbI3_natoms40000_xyz_few_qpts.svg

Many thousands of \(\boldsymbol{q}\)-points

Below are some time estimates for a three-component system of about 10,000 atoms with a trajectory containing 10,000 snapshots. Here, we consider 1000 \(\boldsymbol{q}\)-points and carry out a spherical average over \(\boldsymbol{q}\)-points. The time window is set to 400.

_images/CsPbI3_natoms8640_xyz_many_qpts.svg

Trajectory reading

The time spent on trajectory reading can be significant for plain-text formats (e.g., xyz and lammps-dump). If performance is essential and trajectory reading is the bottleneck, we therefore recommend using a binary format where possible.

_images/dynasor_trajectory_reader_benchmark.svg

Time for reading a trajectory containing 10,000 snapshots in common trajectory formats.

The internal extxyz-reader is multithreaded and scales reasonably well with the number of cores. In the benchmark above, however, only one core was used.
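The gap between plain-text and binary formats comes largely from having to parse every number from its text representation. The following sketch illustrates this with the Python standard library only. It does not use dynasor's readers or any real trajectory format: the file names and the flat layout of 30,000 coordinates (roughly one frame of 10,000 atoms) are assumptions made for the illustration.

```python
import array
import os
import tempfile
import time

# Hypothetical single frame: 10,000 atoms x 3 coordinates, stored flat.
n_values = 30_000
coords = array.array("d", (0.001 * i for i in range(n_values)))

with tempfile.TemporaryDirectory() as tmp:
    text_path = os.path.join(tmp, "frame.txt")
    bin_path = os.path.join(tmp, "frame.bin")

    # Write the same data as plain text (one number per line) and as raw doubles.
    with open(text_path, "w") as f:
        f.write("\n".join(repr(x) for x in coords))
    with open(bin_path, "wb") as f:
        coords.tofile(f)

    # Plain text: every value has to be parsed from a string.
    t0 = time.perf_counter()
    with open(text_path) as f:
        from_text = array.array("d", (float(line) for line in f))
    t_text = time.perf_counter() - t0

    # Binary: the bytes are copied directly into the array.
    t0 = time.perf_counter()
    from_bin = array.array("d")
    with open(bin_path, "rb") as f:
        from_bin.fromfile(f, n_values)
    t_bin = time.perf_counter() - t0

print(f"text: {t_text:.4f} s, binary: {t_bin:.4f} s")
```

The absolute timings depend on the machine and say nothing about specific trajectory formats. The point is only that the text path performs one string-to-float conversion per value, while the binary path does not.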