Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Utilities page to readthedocs #69

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions docs/source/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,7 @@ Usage
:hidden:

usage/how_to_run
usage/utilities

Development
-----------
Expand Down
3 changes: 1 addition & 2 deletions docs/source/install/dependencies.rst
Original file line number Diff line number Diff line change
Expand Up @@ -18,5 +18,4 @@ Optional dependencies include:
- `OpenMP 3.1+ <https://www.openmp.org>`__: for threaded CPU execution
- `CCache <https://ccache.dev>`__: to speed up rebuilds (needs 3.7.9+ for CUDA)
- `Ninja <https://ninja-build.org>`__: for faster parallel compiles
- `Parallel HDF5 <https://github.com/HDFGroup/hdf5>`__: to write to HDF5. Parallel HDF5 is required, and modules with pre-built Parallel HDF5 installations are
available on most HPC machines.
- `Parallel HDF5 <https://github.com/HDFGroup/hdf5>`__: to write to HDF5. Parallel HDF5 is required, and modules with pre-built Parallel HDF5 installations are available on most HPC machines.
188 changes: 188 additions & 0 deletions docs/source/usage/utilities.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,188 @@
Utilities
#########

Output Format
=============
ExaEpi writes data on two levels: individual data and community data.

Individual data fields
----------------------
Individual datasets have more fields written on day 0 that are constant across the simulation.
Data is either written as an ``int`` or a ``real``.

- Day 0 only (constant)

- Int data:

- ``age_group``: Ranges from 0 to 4, indicating age group of agent. Age ranges are 0-5, 5-17, 18-29, 30-64, and 65+, respectively.

- ``family``: ID of family of agent.

- ``home_i``: x-coordinate of home location on AMReX grid. Can be used to match particle to home community.

- ``home_j``: y-coordinate of home location on AMReX grid. Can be used to match particle to home community.

- ``work_i``: x-coordinate of work location on AMReX grid. Can be used to match particle to work community.

- ``work_j``: y-coordinate of work location on AMReX grid. Can be used to match particle to work community.

- ``nborhood``: Home neighborhood.

- ``school``: Ranges from -1 to 5, indicating type of school. -1 indicates not of school age, 0 indicates of school age but not in school, 1 indicates high school, 2 indicates middle school, and 3-5 indicate different elementary schools.

- ``work_nborhood``: Work neighborhood.

- ``strain``: Marks which strain of disease (used for multiple diseases).

- Real data:

- ``incubation_period``: (currently misleadingly named) Latent period length, i.e. time between exposure and infectiousness.

- ``infectious_period``: Infections period length, i.e. time during which agent can spread disease

- ``symptomdev_period``: Symptom development period length (known as incubation period in some contexts), i.e. time between exposure and symptoms appearing.

- Every day (time-varying)

- Int data

- ``withdrawn``: Marks whether the agent is withdrawn. 0 for not withdrawn, 1 for withdrawn.

- ``status``: Ranges from 0 to 4, indicating disease status. Corresponds to never infected, infected, immune, susceptible, and dead, respectively.

- ``symptomatic``: Ranges from 0 to 2, indicating symptomaticity. Corresponds to presymptomatic (not yet symptomatic but will be), symptomatic, and a symptomatic.

- Real data

- ``treatment_timer``: Time that agent has been in hospital.

- ``disease_counter``: Time that agent has been infected.

- ``infection_prob``: Probability that agent is infected at end of day.

Community data fields
---------------------
All data is written as a real, although all data should be integers. If any community does not exist, its ``unit``, ``FIPS``, ``Tract``, and ``comm`` values will be -1, while all agent counts will be 0.

- ``total``: Total population in community.

- ``never_infected``: Number of never-infected agents in community.

- ``infected``: Number of infected agents in community.

- ``immune``: Number of immune (recently recovered from disease) agents in community.

- ``susceptible``: Number of susceptible (recovered from disease long enough ago to be able to be infected again) agents in community.

- ``unit``: Unit at cell.

- ``FIPS``: FIPS code (county code) of community.

- ``Tract``: Census tract of community.

- ``comm``: Community ID (unique).

Reading output
==============
Output is read in different ways, depending on I/O format selected.
If ``-DAMReX_HDF5`` flag is set to ``TRUE`` when compiling, HDF5 will be used.
Otherwise, the default AMReX plotfile will be written.

AMReX plotfile
--------------
Any library that has functionality for AMReX plotfiles can be used:
see `AMReX documentation <https://amrex-codes.github.io/amrex/docs_html/Visualization_Chapter.html#chap-visualization>`__ for more.

We suggest using yt with Python:

.. code-block:: python

import yt
from yt.frontends.boxlib.data_structures import AMReXDataset

ds = AMReXDataset("plt00000")
ad = ds.all_data()

community_totals = ad["total"]
statuses = ad["particle_status"]
infection_probs = ad["particle_infection_prob"]

Note that individual-level fields are prefixed by ``particle_``, while community-level fields are as-is.

HDF5
----
The individual-level data is contained in files at ``plt*****/agents/agents.h5``, while the community-level data is conatined in files ``plt*****.h5``. Note that string attributes are written as byte strings, i.e. ``b'FIPS'`` instead of ``"FIPS"``.

Individual-level data
^^^^^^^^^^^^^^^^^^^^^
The data itself is located in the ``level_0`` group, separated by datatype. ``int`` data is at ``level_0/data:datatype=0`` and the ``real`` data is at ``level_0/data:datatype=1``.

Data is written agent-by-agent, so all fields for a given agent will be contiguous. The first two ``int`` components for any agent are its CPU and its particle ID on the CPU, while the first two ``real`` components for any agent are its x- and y-positions on the AMReX grid.

The names of other ``int`` components can be accessed as an attribute of the file named ``"int_component_x"``, where x ranges from 2 to the maximum number of components per agent. Not counting the first two non-labelled components, the number of ``int`` components is an attribute of the file named ``"num_component_int"``, so the total number of components is ``num_component_int + 2``. Thus, the size of the ``level_0/data:datatype=0`` dataset is N * (num_component_int + 2), where N is the number of agents.

The same scheme applies to ``real`` data, with every instance of ``int`` replaced with ``real``.

Community-level data
^^^^^^^^^^^^^^^^^^^^
Each field is written as its own dataset in the ``level_0`` group. The number of datasets can be accessed as an attribute of the file named ``"num_components"``, and the name of each dataset can be accessed as the attribute named ``"component_x"`` where x ranges from 0 to ``num_components - 1``, and the corresponding dataset is found at ``level_0/data:datatype=x``. The order of the data is the same within a plotfile, so one can match the community IDs to the FIPS codes by index, for example.

Python example
^^^^^^^^^^^^^^
We provide an example of parsing files using h5py and python:

.. code-block:: python

import h5py

with h5py.File("plt00000.h5", "r") as comm_file:
# Get the datatype number for each field
community_indicies = {}
for i in range(comm_file.attrs["num_components"][0]):
community_indices[comm_file.attrs["component_" + str(i)]] = i
# community_indices is:
# {b'total': 0, b'never_infected': 1, b'infected': 2, b'immune': 3,
# b'susceptible': 4, b'unit': 5, b'FIPS': 6, b'Tract': 7, b'comm': 8}

# For example, get the totals
community_totals = comm_file["level_0"]["data:datatype=" + str(community_indicies[b'total'])]

with h5py.File("plt00000/agents/agents.h5", "r") as agent_file:
# Get the index for each int field
int_indicies = {}
num_int_comps = agent_file.attrs["num_component_int"][0] + 2
for i in range(2, num_int_comps):
int_indicies[agent_file.attrs["int_component_" + str(i)]] = i

int_data = agent_file["level_0"]["data:datatype=0"][()].reshape(-1, num_int_comps)
statuses = int_data[:, int_indicies[b'status']]

# Get the index for each real field
real_indicies = {}
num_real_comps = agent_file.attrs["num_component_real"][0] + 2
for i in range(2, num_real_comps):
real_indicies[agent_file.attrs["real_component_" + str(i)]] = i

real_data = agent_file["level_0"]["data:datatype=1"][()].reshape(-1, num_real_comps)
statuses = real_data[:, real_indicies[b'infection_prob']]

Processing scripts
==================
We provide a few processing scripts for visualization and data aggregation.

Visualization
-------------
There are some visualization scripts located in ``utilities/plotMovie``.
Primarily, ``generate_frames.py`` allows one to generate still frames from
either native AMReX output or HDF5 output, plotting infections (or an arbitrary
function on community-level data) spatially across a given map (as a .shx file).
Some of these shapefiles (for USA, California, and the Bay Area) are located in
the ``data/`` directory. See the script file and its comments for further details.

Data Processing
---------------
To aggregate data written to HDF5 output, whether by county or census tract,
we have some functions designed to aid processing in ``utilities/hfd5_process.py``.
These functions allow grouping of data and counting various conditions of individuals,
such as counting total number of infections in each census tract in each age group
over all days in a simulation. See the script file and its comments for further details.
Loading
Loading