
Output data formats

Urs Ganse edited this page Nov 27, 2020 · 16 revisions

Three kinds of data

Vlasiator produces three kinds of output files during a simulation run, the contents of which vary based on simulation parameters:

  1. logfile.txt, the simulation run log. This timestamped ASCII file provides basic diagnostic output of the run, including memory usage and time steps.
  2. diagnostic.txt. The contents of this file can be configured with the diagnostic = options in the run config file. In general, this ASCII file contains one line per simulation timestep, with the columns determined by the selected data reducers. These include, for example, simple scalar values such as the overall plasma mass, the number of velocity space blocks in the simulation, the charge balance and the divergence of the magnetic field.
  3. VLSV files are the main output data products. These files come in multiple varieties:
     • Restart files. These contain the whole simulation state, including the full phase space density, all relevant electromagnetic fields and metadata. Simulations can be restarted from them (hence the name), but they tend to be very large, easily reaching multiple terabytes for production runs.
     • Bulk files. In these, reduced spatial simulation data is written for further scientific analysis. Usually, this includes moments of the distribution functions and electromagnetic fields, but bulk files can also contain the output of much more complex data reducer operators, as listed below. It is also possible to configure a subset of the velocity distribution functions to be written for further analysis.
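
Because diagnostic.txt holds one line per timestep with one column per selected data reducer, it can be read with a few lines of Python. The following is a minimal sketch, assuming the columns are whitespace-separated numbers; the column names and sample rows are hypothetical and have to be replaced by whatever the diagnostic = options of the run actually select.

```python
import io


def read_diagnostic(fileobj, column_names):
    """Parse whitespace-separated numeric rows into one dict per timestep.

    Assumption: every non-empty line holds exactly one value per column.
    """
    rows = []
    for line_number, line in enumerate(fileobj, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(column_names):
            raise ValueError(
                f"line {line_number}: expected {len(column_names)} columns, "
                f"got {len(fields)}"
            )
        rows.append(dict(zip(column_names, (float(f) for f in fields))))
    return rows


# Synthetic stand-in for a diagnostic.txt with three hypothetical columns.
sample = io.StringIO("0 1.5e9 1200\n1 1.5e9 1210\n")
rows = read_diagnostic(sample, ["step", "mass", "blocks"])
```

For a real run, `open("diagnostic.txt")` would take the place of the `io.StringIO` object.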

The VLSV file format

The VLSV library is used to write this versatile container format. Analysator can be used to load and handle these files in Python.

The file format is optimized for parallel write performance: data is dumped to disk as binary blobs, in the same memory layout it has in the Vlasiator simulation. Once all data is written, an XML footer describing the data is appended to the end of the file.
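
The write pattern described above (raw blobs first, descriptive footer last) can be illustrated with a toy writer. This is only a sketch of the idea, not the VLSV library's implementation: `write_container` and its tuple layout are hypothetical, only float64 data is handled, little-endian byte order is assumed, and anything else the real library writes is left out.

```python
import io
import struct
import xml.etree.ElementTree as ET


def write_container(fileobj, datasets):
    """Write each dataset's raw bytes, then append an XML footer with offsets.

    `datasets` is a list of (tag, name, vectorsize, values) tuples holding
    float64 values. Hypothetical toy layout, not the real VLSV layout.
    """
    root = ET.Element("VLSV")
    for tag, name, vectorsize, values in datasets:
        offset = fileobj.tell()  # where this blob starts in the file
        fileobj.write(struct.pack(f"<{len(values)}d", *values))
        entry = ET.SubElement(root, tag, {
            "arraysize": str(len(values) // vectorsize),
            "datasize": "8",
            "datatype": "float",
            "name": name,
            "vectorsize": str(vectorsize),
        })
        entry.text = str(offset)  # tag content = byte offset of the blob
    footer_offset = fileobj.tell()
    fileobj.write(ET.tostring(root))  # footer goes last, after all data
    return footer_offset


buffer = io.BytesIO()
footer_offset = write_container(buffer, [
    ("PARAMETER", "time", 1, [12.5]),
    ("VARIABLE", "fg_b", 3, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
])
```

Since the footer is only produced after all blobs are on disk, the data can be written without knowing the final file layout in advance.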

An example XML footer might look like this:

<VLSV>
   <MESH arraysize="208101" datasize="8" datatype="uint" max_refinement_level="1" name="SpatialGrid" type="amr_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">989580</MESH>
   <MESH arraysize="652800" datasize="8" datatype="uint" name="fsgrid" type="multi_ucd" vectorsize="1" xperiodic="no" yperiodic="no" zperiodic="no">4011008</MESH>
   <PARAMETER arraysize="1" datasize="8" datatype="float" name="time" vectorsize="1">989488</PARAMETER>
   <PARAMETER arraysize="1" datasize="8" datatype="float" name="dt" vectorsize="1">989496</PARAMETER>
   <VARIABLE arraysize="123544" datasize="8" datatype="uint" mesh="SpatialGrid" name="CellID" vectorsize="1">1136</VARIABLE>
   <VARIABLE arraysize="652800" datasize="8" datatype="float" mesh="fsgrid" name="fg_b" unit="T" unitConversion="1.0" unitLaTeX="$\mathrm{T}$" variableLaTeX="$B$" vectorsize="3">9558184</VARIABLE>
</VLSV>

Each XML tag describes one dataset in the file, with arraysize, datatype, datasize and vectorsize describing the array. The tag's content is the byte offset in the file at which this dataset's raw binary data begins.
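
To make the role of these attributes concrete, the following sketch parses a footer and uses offset, arraysize, vectorsize and datasize to pull one dataset out of the binary data. It is a minimal illustration, not how Analysator or the VLSV library do it: the helper names are hypothetical, the payload is synthetic, little-endian byte order is assumed, and only the datatype and datasize combinations in `STRUCT_CODES` are handled.

```python
import io
import struct
import xml.etree.ElementTree as ET

# struct format characters keyed by (datatype, datasize in bytes).
# Assumption: these combinations are enough for the illustration.
STRUCT_CODES = {
    ("float", 4): "f", ("float", 8): "d",
    ("int", 4): "i", ("int", 8): "q",
    ("uint", 4): "I", ("uint", 8): "Q",
}


def describe_datasets(footer_xml):
    """Return one dict per footer tag with its array shape and byte range."""
    datasets = []
    for tag in ET.fromstring(footer_xml):
        arraysize = int(tag.get("arraysize"))
        vectorsize = int(tag.get("vectorsize"))
        datasize = int(tag.get("datasize"))
        datasets.append({
            "tag": tag.tag,
            "name": tag.get("name"),
            "datatype": tag.get("datatype"),
            "arraysize": arraysize,
            "vectorsize": vectorsize,
            "datasize": datasize,
            "offset": int(tag.text),
            "nbytes": arraysize * vectorsize * datasize,
        })
    return datasets


def read_dataset(fileobj, ds):
    """Read one dataset as a list of arraysize tuples of vectorsize values."""
    fileobj.seek(ds["offset"])
    raw = fileobj.read(ds["nbytes"])
    code = STRUCT_CODES[(ds["datatype"], ds["datasize"])]
    count = ds["arraysize"] * ds["vectorsize"]
    flat = struct.unpack(f"<{count}{code}", raw)  # "<": little-endian assumed
    step = ds["vectorsize"]
    return [flat[i:i + step] for i in range(0, count, step)]


# Synthetic payload: 16 unused bytes, one float64, then two 3-vectors.
payload = (bytes(16) + struct.pack("<d", 12.5)
           + struct.pack("<6d", 1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
footer = (
    '<VLSV>'
    '<PARAMETER arraysize="1" datasize="8" datatype="float" name="time" '
    'vectorsize="1">16</PARAMETER>'
    '<VARIABLE arraysize="2" datasize="8" datatype="float" mesh="fsgrid" '
    'name="fg_b" vectorsize="3">24</VARIABLE>'
    '</VLSV>'
)

by_name = {ds["name"]: ds for ds in describe_datasets(footer)}
time_value = read_dataset(io.BytesIO(payload), by_name["time"])
field = read_dataset(io.BytesIO(payload), by_name["fg_b"])
```

Applied to the fg_b entry of the example footer, the same arithmetic gives 652800 × 3 × 8 = 15667200 bytes of raw data starting at byte offset 9558184.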

The two most important tag types are PARAMETER, for single numbers describing the file as a whole (such as resolutions or timesteps), and VARIABLE, for spatially varying data produced by the data reducers.

Additional metadata is often added to the datasets, such as their physical units, LaTeX formatted plotting hints, etc.

Spatial ordering: Vlasov vs. FSGrid vs. velocity space variables

Note that the XML tags in the file do not by themselves give enough information to describe the spatial structure of the variable arrays.

Simulation data reducers
