diff --git a/man/read_data.Rd b/man/read_data.Rd
new file mode 100644
index 00000000..87fcdfdc
--- /dev/null
+++ b/man/read_data.Rd
@@ -0,0 +1,50 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/read_data.R
+\name{read_data}
+\alias{read_data}
+\title{Read in the dataset of incident case counts}
+\usage{
+read_data(
+  data_path,
+  disease = c("COVID-19", "Influenza", "test"),
+  state_abb,
+  report_date,
+  max_reference_date,
+  min_reference_date
+)
+}
+\arguments{
+\item{data_path}{The path to the local file. The path may contain a glob; the
+data must be in Parquet format.}
+
+\item{disease}{One of "COVID-19", "Influenza", or "test"}
+
+\item{state_abb}{A two-letter uppercase abbreviation}
+
+\item{report_date}{The desired single report date}
+
+\item{max_reference_date, min_reference_date}{The last and first reference
+dates, inclusive, of the timeseries}
+}
+\value{
+A dataframe with one or more rows and columns \code{report_date},
+\code{reference_date}, \code{state_abb}, \code{confirm}
+}
+\description{
+Each row of the table corresponds to a single facility's cases for a
+reference-date/report-date/disease tuple. We want to aggregate these counts
+to the level of geographic aggregate/report-date/reference-date/disease.
+}
+\details{
+We handle two distinct cases for geographic aggregates:
+\enumerate{
+\item A single state: Subset to facilities \strong{in that state only} and
+aggregate up to the state level.
+\item The US overall: Aggregate over all facilities without any subsetting.
+}
+
+Note that we do \emph{not} apply exclusions here. The exclusions are applied
+later, after the aggregations. That means that for the US overall, we
+aggregate over points that might be excluded at the state level.
+Our recourse in this case is to exclude the US overall aggregate point.
+}