From 85f47fc0c2c8d5bbc2b8e4f65dd9914a213a87bd Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Wed, 20 Dec 2023 08:36:27 -0500
Subject: [PATCH 01/11] docs: add overview to core documentation

---
 docs/index.md | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index 0c6266a3..8fd8fb21 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,3 +1,14 @@
 # Core Functionality
 
-AutoRA includes core functionality for running AutoRA experiments. 
+AutoRA includes core functionality for running AutoRA experiments, organized into these submodules:
+
+- `autora.state`, which underpins the unified `State` interface for writing experimentalists, experiment runners and 
+  theorists
+- `autora.serializer`, utilities for saving and loading `States`
+- `autora.workflow`, command line tools for running experimentalists, experiment runners and theorists
+- `autora.variable`, for representing experimental metadata describing the type and domain of variables
+- `autora.utils`, utilities and helper functions not linked to any specific core functionality
+
+It also provides some basic experimentalists in the `autora.experimentalist` submodule. However, most 
+genuinely useful experimentalists and theorists are provided as optional dependencies to the `autora` package.
+

From 7fb524ace017d965cd124c568ec9b3c78ece906e Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Wed, 20 Dec 2023 12:23:57 -0500
Subject: [PATCH 02/11] docs: add State mechanism notebook

---
 docs/The State Mechanism.ipynb | 922 +++++++++++++++++++++++++++++++++
 1 file changed, 922 insertions(+)
 create mode 100644 docs/The State Mechanism.ipynb

diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb
new file mode 100644
index 00000000..789dbbde
--- /dev/null
+++ b/docs/The State Mechanism.ipynb	
@@ -0,0 +1,922 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# The `State` mechanism"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. \n",
+    "In the AutoRA framework, experimentalists, experiment runners and theorists are functions which \n",
+    "- operate on `States` and \n",
+    "- return `States`.\n",
+    "\n",
+    "The `autora.state` submodule provides classes and functions to help build these functions. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Basic Aim: $f(S) = S^\\prime$\n",
+    "\n",
+    "The AutoRA State mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n",
+    "- Data – stored as an immutable `State`\n",
+    "- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n",
+    "\n",
+    "Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce, are:\n",
+    "\n",
+    "| Procedure         | Data            |\n",
+    "|-------------------|-----------------|\n",
+    "| Experimentalist   | Conditions      |\n",
+    "| Experiment Runner | Experiment Data |\n",
+    "| Theorist          | Model           |\n",
+    "\n",
+    "The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n",
+    "- Takes in existing Data $S$\n",
+    "- Adds new data $\\Delta S$\n",
+    "- Returns an updated state of the Data $S^\\prime$  \n",
+    "\n",
+    "$$\n",
+    "\\begin{aligned}\n",
+    "f(S) &= S + \\Delta S \\\\\n",
+    "     &= S^\\prime\n",
+    "\\end{aligned}\n",
+    "$$\n",
+    "\n",
+    "AutoRA includes:\n",
+    "- Classes to represent the Data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version \n",
+    "with the common fields needed for cyclical experiments)  \n",
+    "- Functions to make it easier to write procedures of the form $f(S) = S^\\prime$"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "import autora.state\n",
+    "from autora.variable import VariableCollection, Variable"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## `State` objects\n",
+    "TODO: write this part"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s_0 = autora.state.StandardState(\n",
+    "    variables=VariableCollection(\n",
+    "        independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
+    "        dependent_variables=[Variable(\"y\")]\n",
+    "    ),\n",
+    "    conditions=pd.DataFrame({\"x\":[]}),\n",
+    "    experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n",
+    "    models=[]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## `Variable` and `VariableCollection`\n",
+    "TODO: move this to a different file"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Making a function of the form $f(S) = S^\\prime$\n",
+    "\n",
+    "There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. These are (from \n",
+    "simplest but most restrictive, to most complex but with the greatest flexibility):\n",
+    "- Use the `autora.state.on_state` decorator\n",
+    "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`\n",
+    "\n",
+    "There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators.  \n",
+    "\n",
+    "Say you have a function to generate new experimental conditions, given some variables."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
+    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
+    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
+    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
+    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
+    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
+    "    return conditions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "There are several equivalent ways to make this into a function of the form $f(S) = S^\\prime$. These are (from \n",
+    "simplest but most restrictive, to most complex but with the greatest flexibility):\n",
+    "- Decorate it with `autora.state.on_state`\n",
+    "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Use the `autora.state.on_state` decorator\n",
+    "\n",
+    "`autora.state.on_state` is a wrapper which changes a function's signature so that it accepts and returns `State` objects.\n",
+    "\n",
+    "The most concise way to use it is as a decorator on the function where it is defined. You can specify how the \n",
+    "returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
+       "0  5.479121\n",
+       "1 -1.222431\n",
+       "2  7.171958\n",
+       "3  3.947361\n",
+       "4 -8.116453, experiment_data=Empty DataFrame\n",
+       "Columns: [x, y]\n",
+       "Index: [], models=[])"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "@autora.state.on_state(output=[\"conditions\"])\n",
+    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
+    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
+    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
+    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
+    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
+    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
+    "    return conditions\n",
+    "\n",
+    "# Example\n",
+    "generate_conditions(s_0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Fully equivalently, you can modify `generate_conditions` to return a dictionary of values with the appropriate field \n",
+    "names from `State`: "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
+       "0  5.479121\n",
+       "1 -1.222431\n",
+       "2  7.171958\n",
+       "3  3.947361\n",
+       "4 -8.116453, experiment_data=Empty DataFrame\n",
+       "Columns: [x, y]\n",
+       "Index: [], models=[])"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "@autora.state.on_state\n",
+    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
+    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
+    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
+    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
+    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
+    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
+    "    return {\"conditions\": conditions}                       # Return a dictionary with the appropriate name\n",
+    "\n",
+    "# Example\n",
+    "generate_conditions(s_0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Deep dive: `autora.state.on_state`\n",
+    "The decorator notation is equivalent to the following:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
+       "0  1.521127\n",
+       "1  3.362120\n",
+       "2  1.065391\n",
+       "3 -5.844244\n",
+       "4 -6.444732, experiment_data=Empty DataFrame\n",
+       "Columns: [x, y]\n",
+       "Index: [], models=[])"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def generate_conditions_inner(variables, num_samples=5, random_state=42):\n",
+    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
+    "    result = pd.DataFrame()                                 # Create a DataFrame to hold the results  \n",
+    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
+    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
+    "        result[iv.name] = c                                 #  - Save the new values to the DataFrame\n",
+    "    return result\n",
+    "\n",
+    "generate_conditions = autora.state.on_state(generate_conditions_inner, output=[\"conditions\"])\n",
+    "\n",
+    "# Example\n",
+    "generate_conditions(s_0, random_state=180)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following:\n",
+    "- Inspects the signature of `generate_conditions_inner` to see which arguments are required – in this case:\n",
+    "    - `variables`, \n",
+    "    - `num_samples` and \n",
+    "    - `random_state`.\n",
+    "- Looks for fields with those names on `s_0`:\n",
+    "    - Finds a field called `variables`.\n",
+    "- Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the \n",
+    "`generate_conditions` call (here just `random_state`)\n",
+    "- Converts the returned value `result` into `Delta(conditions=result)` using the name specified in `output=[\"conditions\"]`\n",
+    "- Returns `s_0 + Delta(conditions=result)`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Fully equivalently to using the `autora.state.on_state` wrapper, you can construct a function which takes and returns \n",
+    "`State` objects. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
+       "0  5.479121\n",
+       "1 -1.222431\n",
+       "2  7.171958\n",
+       "3  3.947361\n",
+       "4 -8.116453, experiment_data=Empty DataFrame\n",
+       "Columns: [x, y]\n",
+       "Index: [], models=[])"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):\n",
+    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
+    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
+    "    for iv in state.variables.independent_variables:        # Loop through the independent variables\n",
+    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
+    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
+    "    delta = autora.state.Delta(conditions=conditions)       # Construct a new Delta representing the updated data\n",
+    "    new_state = state + delta                               # Construct a new state, \"adding\" the Delta\n",
+    "    return new_state\n",
+    "\n",
+    "# Example\n",
+    "generate_conditions(s_0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Special case: `autora.state.estimator_on_state` for `scikit-learn` estimators\n",
+    "\n",
+    "The \"theorist\" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve \n",
+    "fitting function like a linear, logistic or symbolic regression. `scikit-learn` estimators are classes, and they have\n",
+    " a specific wrapper: `autora.state.estimator_on_state`, used as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Returned models: [LinearRegression()]\n",
+      "Last model's coefficients: y = [3.49729147] x + [1.99930059]\n"
+     ]
+    }
+   ],
+   "source": [
+    "from sklearn.linear_model import LinearRegression\n",
+    "\n",
+    "\n",
+    "estimator = LinearRegression(fit_intercept=True)       # Initialize the regressor with all its parameters\n",
+    "theorist = autora.state.estimator_on_state(estimator)  # Wrap the estimator\n",
+    "\n",
+    "\n",
+    "# Example\n",
+    "variables = s_0.variables          # Reuse the variables from before \n",
+    "xs = np.linspace(-10, 10, 101)     # Make an array of x-values \n",
+    "noise = np.random.default_rng(179).normal(0., 0.5, xs.shape)  # Gaussian noise\n",
+    "ys = (3.5 * xs + 2. + noise)       # Calculate y = 3.5 x + 2 + noise  \n",
+    "\n",
+    "s_1 = autora.state.StandardState(  # Initialize the State with those data\n",
+    "    variables=variables,\n",
+    "    experiment_data=pd.DataFrame({\"x\":xs, \"y\":ys}),\n",
+    ")\n",
+    "s_1_prime = theorist(s_1)         # Run the theorist\n",
+    "print(f\"Returned models: \"\n",
+    "      f\"{s_1_prime.models}\")      \n",
+    "print(f\"Last model's coefficients: \"\n",
+    "      f\"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following:\n",
+    "- Gets the names of the independent and dependent variables from the `s_1.variables`\n",
+    "- Gathers the values of those variables from `s_1.experiment_data`\n",
+    "- Passes those values to the `LinearRegression().fit(x, y)` method\n",
+    "- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n",
+    "- Returns `s_1 + Delta(models=[LinearRegression()])`"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Example\n",
+    "Sebastian wishes to run an experiment. He knows:\n",
+    "- which variables he wants to investigate: \n",
+    "    - $x$, the independent variable, is a number in the range $-10$ to $10$,\n",
+    "    - $y$, the dependent variable, is a number with an unknown range.\n",
+    "\n",
+    "and will use this knowledge to **initialize a `State` object**.\n",
+    "\n",
+    "He has planned procedures for:\n",
+    "- making a list of conditions to observe, \n",
+    "- running the experiment, given the list of conditions,\n",
+    "- generating a model to describe the data\n",
+    "\n",
+    "and he will write each of these down as a **function**."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Initialize the `State` object\n",
+    "Sebastian writes down the current state of his knowledge about the problem in a `State` object.\n",
+    "\n",
+    "However, he doesn't yet know which conditions to look at – those will be generated by his procedures. \n",
+    "Nor does he have any experiment data. So he initializes DataFrames to hold those results, but \n",
+    "leaves both empty. Likewise, he doesn't have any models right now, so he creates an empty list for those."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "from autora.state import StandardState\n",
+    "from autora.variable import VariableCollection, Variable\n",
+    "\n",
+    "s_0 = StandardState(\n",
+    "    variables=VariableCollection(\n",
+    "        independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
+    "        dependent_variables=[Variable(\"y\")]\n",
+    "    ),\n",
+    "    conditions=pd.DataFrame({\"x\":[]}),\n",
+    "    experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n",
+    "    models=[]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Write \"experimentalist\" procedure for generating conditions\n",
+    "\n",
+    "Sebastian writes down the procedure for making a list of conditions to observe. He writes this as a function \n",
+    "which acts on the things he knows from the state, and returns a dataframe with the new conditions. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>x</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>5.479121</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>-1.222431</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>7.171958</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>3.947361</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>-8.116453</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "          x\n",
+       "0  5.479121\n",
+       "1 -1.222431\n",
+       "2  7.171958\n",
+       "3  3.947361\n",
+       "4 -8.116453"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "import numpy as np\n",
+    "\n",
+    "\n",
+    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
+    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
+    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
+    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
+    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
+    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
+    "    return conditions\n",
+    "\n",
+    "# Example\n",
+    "generate_conditions(s_0.variables)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Finally, he \"wraps\" the `generate_conditions` function using a utility from the `autora.state` submodule, to make his\n",
+    " finished experimentalist. The purpose of the wrapper is to turn the basic function he wrote into one which accepts \n",
+    " a `State` object as input and returns a `State` object."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
+       "0  5.479121\n",
+       "1 -1.222431\n",
+       "2  7.171958\n",
+       "3  3.947361\n",
+       "4 -8.116453, experiment_data=Empty DataFrame\n",
+       "Columns: [x, y]\n",
+       "Index: [], models=[])"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from autora.state import on_state\n",
+    "\n",
+    "\n",
+    "experimentalist = on_state(  # Utility which adds the `State` functionality to a function\n",
+    "    generate_conditions,     # Pass in the basic `generate_conditions` function\n",
+    "    output=[\"conditions\"]    # Say that the value returned from `generate_conditions` should be \n",
+    "                             # used as `conditions` on the State\n",
+    ")\n",
+    "\n",
+    "# Example\n",
+    "experimentalist(s_0)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Write \"experiment runner\" procedure for gathering observations "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autora.experimentalist.random_ import random_pool\n",
+    "\n",
+    "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n",
+    "s_1 = experimentalist(s_0, random_state=42)\n",
+    "s_1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": []
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\n",
+    "## Theoretical Overview\n",
+    "\n",
+    "The fundamental idea is this:\n",
+    "- We define a \"state\" object $S$ which can be modified with a \"delta\" (a new result) $\\Delta S$.\n",
+    "- A new state at some point $i+1$ is $$S_{i+1} = S_i + \\Delta S_{i+1}$$\n",
+    "- The cycle state after $n$ steps is thus $$S_n = S_{0} +  \\sum^{n}_{i=1} \\Delta S_{i}$$\n",
+    "\n",
+    "To represent $S$ and $\\Delta S$ in code, you can use `autora.state.State` and `autora.state.Delta`\n",
+    "respectively. To operate on these, we define functions.\n",
+    "\n",
+    "- Each operation in an AER cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a\n",
+    "function with $n$ arguments $s_j$ which are members of $S$ and $m$ others $a_k$ which are not.\n",
+    "  $$ f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta S_{i+1}$$\n",
+    "- There is a wrapper function $w$ (`autora.state.on_state`) which changes the signature of $f$ to\n",
+    "require $S$ and aggregates the resulting $\\Delta S_{i+1}$\n",
+    "  $$w\\left[f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta\n",
+    "S_{i+1}\\right] \\rightarrow \\left[ f^\\prime(S_i, a_0, ..., a_m) \\rightarrow S_{i} + \\Delta\n",
+    "S_{i+1} = S_{i+1}\\right]$$\n",
+    "\n",
+    "- Assuming that the other arguments $a_k$ are provided by partial evaluation of the $f^\\prime$, the full AER cycle can\n",
+    "then be represented as:\n",
+    "  $$S_n = f_n^\\prime(...f_2^\\prime(f_1^\\prime(S_0)))$$\n",
+    "\n",
+    "There are additional helper functions to wrap common experimentalists, experiment runners and theorists so that we\n",
+    "can define a full AER cycle using python notation as shown in the following example."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Example\n",
+    "\n",
+    "First initialize the State. In this case, we use the pre-defined `StandardState` which implements the standard AER\n",
+    "naming convention.\n",
+    "There are two variables: `x`, with a range [-10, 10], and `y`, with an unspecified range."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autora.state import StandardState\n",
+    "from autora.variable import VariableCollection, Variable\n",
+    "\n",
+    "s_0 = StandardState(\n",
+    "    variables=VariableCollection(\n",
+    "        independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
+    "        dependent_variables=[Variable(\"y\")]\n",
+    "    )\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Specify the experimentalist. Use a standard function `random_pool`.\n",
+    "This gets 5 independent random samples (by default, configurable using an argument)\n",
+    "from the value_range of the independent variables, and returns them in a DataFrame.\n",
+    "To make this work as a function on the State objects, we wrap it in the `on_state` function."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autora.experimentalist.random_ import random_pool\n",
+    "from autora.state import on_state\n",
+    "\n",
+    "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n",
+    "s_1 = experimentalist(s_0, random_state=42)\n",
+    "s_1"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Specify the experiment runner. This calculates a linear function, adds noise, and assigns the value to the `y` column\n",
+    " in a new DataFrame."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autora.state import on_state\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "@on_state(output=[\"experiment_data\"])\n",
+    "def experiment_runner(conditions: pd.DataFrame, c=(2, 4), random_state=None):\n",
+    "    rng = np.random.default_rng(random_state)\n",
+    "    x = conditions[\"x\"]\n",
+    "    noise = rng.normal(0, 1, len(x))\n",
+    "    y = c[0] + (c[1] * x) + noise\n",
+    "    observations = conditions.assign(y = y)\n",
+    "    return observations\n",
+    "\n",
+    "# Which does the following:\n",
+    "experiment_runner(s_1, random_state=43)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "A completely analogous definition, using the separate `@inputs_from_state` and `@outputs_to_delta(...)` decorators\n",
+    "rather than the combined `@on_state(...)` decorator would be:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autora.state import inputs_from_state, outputs_to_delta\n",
+    "\n",
+    "\n",
+    "@inputs_from_state\n",
+    "@outputs_to_delta(\"experiment_data\")\n",
+    "def experiment_runner_alt_1(conditions: pd.DataFrame, c=(2, 4), random_state=None):\n",
+    "    x = conditions[\"x\"]\n",
+    "    rng = np.random.default_rng(random_state)\n",
+    "    noise = rng.normal(0, 1, len(x))\n",
+    "    y = c[0] + (c[1] * x) + noise\n",
+    "    xy = conditions.assign(y = y)\n",
+    "    return xy\n",
+    "\n",
+    "# Which does the following:\n",
+    "experiment_runner_alt_1(s_1, random_state=42)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Or alternatively:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def experiment_runner_alt_2_core(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n",
+    "    x = conditions[\"x\"]\n",
+    "    rng = np.random.default_rng(random_state)\n",
+    "    noise = rng.normal(0, 1, len(x))\n",
+    "    y = c[0] + (c[1] * x) + noise\n",
+    "    xy = conditions.assign(y = y)\n",
+    "    return xy\n",
+    "\n",
+    "experiment_runner_alt_2 = on_state(experiment_runner_alt_2_core, output=[\"experiment_data\"])\n",
+    "experiment_runner_alt_2(s_1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Specify a theorist, using a standard LinearRegression from scikit-learn."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.linear_model import LinearRegression\n",
+    "from autora.state import estimator_on_state\n",
+    "\n",
+    "theorist = estimator_on_state(LinearRegression(fit_intercept=True))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now we can run the theorist on the output from the experiment_runner,\n",
+    "which itself uses the output from the experimentalist."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "theorist(experiment_runner(experimentalist(s_0)))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If we like, we can run the experimentalist, experiment_runner and theorist ten times."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s_ = s_0\n",
+    "for i in range(10):\n",
+    "    s_ = experimentalist(s_, random_state=180+i)\n",
+    "    s_ = experiment_runner(s_, random_state=2*180+i)\n",
+    "    s_ = theorist(s_)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The experiment_data has 50 entries (10 cycles and 5 samples per cycle):"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "s_.experiment_data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The fitted coefficients are close to the original intercept = 2, gradient = 4"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "print(s_.model.intercept_, s_.model.coef_)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}

From 6b07b894e099b517b33866eb0589c34ff45c70a3 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Wed, 20 Dec 2023 13:16:03 -0500
Subject: [PATCH 03/11] docs: update State Mechanism notebook with basic
 information about the State

---
 docs/The State Mechanism.ipynb | 685 ++++++++++-----------------------
 1 file changed, 199 insertions(+), 486 deletions(-)

diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb
index 789dbbde..52e160f6 100644
--- a/docs/The State Mechanism.ipynb	
+++ b/docs/The State Mechanism.ipynb	
@@ -61,6 +61,8 @@
    "metadata": {},
    "outputs": [],
    "source": [
+    "from dataclasses import dataclass, field\n",
+    "\n",
     "import numpy as np\n",
     "import pandas as pd\n",
     "import autora.state\n",
@@ -72,7 +74,196 @@
    "metadata": {},
    "source": [
     "## `State` objects\n",
-    "TODO: write this part"
+    "\n",
+    "`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` \n",
+    "object used in an AutoRA cycle will be a subclass of `autora.state.State`, with the necessary fields specified. \n",
+    "(`autora.state.StandardState` provides some sensible defaults.)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "@dataclass(frozen=True)\n",
+    "class BasicState(autora.state.State):\n",
+    "   data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={\"delta\": \"extend\"})\n",
+    "   \n",
+    "s = BasicState()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Because a `State` is a Python dataclass, its fields can be accessed using attribute notation, for example:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "Empty DataFrame\n",
+       "Columns: []\n",
+       "Index: []"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "s.data  # an empty DataFrame with no columns"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the \n",
+    "existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new \n",
+    "`State`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "BasicState(data=   x  y\n",
+       "0  1  1)"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "s + autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]}))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "When carrying out this \"addition\", `s`: \n",
+    "- inspects the `Delta` it has been passed and finds any field names matching fields on `s`, in this case \n",
+    "`data`,\n",
+    "- for each matching field, combines the data in a way determined by the field's metadata. The key options are:\n",
+    "    - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n",
+    "    - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n",
+    "     data are concatenated to the bottom of the existing DataFrame.\n",
+    "    \n",
+    "    For full details on which options are available, see the documentation for the `autora.state` module. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>x</th>\n",
+       "      <th>y</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>1</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>2</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   x  y\n",
+       "0  1  1\n",
+       "1  2  2"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "(s + \n",
+    " autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]})) + \n",
+    " autora.state.Delta(data=pd.DataFrame({\"x\":[2], \"y\":[2]}))\n",
+    " ).data  # Access just the `data` field on the updated State"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### `StandardState`\n",
+    "\n",
+    "For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, \n",
+    "conditions, experiment data and models. You can initialize a `StandardState` object like this:"
    ]
   },
   {
@@ -100,6 +291,13 @@
     "TODO: move this to a different file"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -412,491 +610,6 @@
     "- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n",
     "- Returns `s_1 + Delta(models=[LinearRegression()])`"
    ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Example\n",
-    "Sebastian wishes to run an experiment. He knows:\n",
-    "- which variables he wants to investigate: \n",
-    "    - $x$, the independent variable, is a number in the range $-10$ to $10$,\n",
-    "    - $y$, the dependent variable, is a number with an unknown range.\n",
-    "\n",
-    "and will use this knowledge to **initialize a `State` object**.\n",
-    "\n",
-    "He planned procedures for:\n",
-    "- making a list of conditions to observe, \n",
-    "- running the experiment, given the list of conditions,\n",
-    "- generating a model to describe the data\n",
-    "\n",
-    "and he will write each of these down as a **function**."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Initialize the `State` object\n",
-    "Sebastian writes down the current State of his knowledge about the problem in a `State` object.\n",
-    "\n",
-    "However, he doesn't yet know which conditions to look at – those will be generated by his procedures. \n",
-    "Nor does he have any experiment data. So he initializes DataFrames to hold those results, but \n",
-    "leaves both empty. Likewise, he doesn't have any models right now, so he creates an empty list for those."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "from autora.state import StandardState\n",
-    "from autora.variable import VariableCollection, Variable\n",
-    "\n",
-    "s_0 = StandardState(\n",
-    "    variables=VariableCollection(\n",
-    "        independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
-    "        dependent_variables=[Variable(\"y\")]\n",
-    "    ),\n",
-    "    conditions=pd.DataFrame({\"x\":[]}),\n",
-    "    experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n",
-    "    models=[]\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Write \"experimentalist\" procedure for generating conditions\n",
-    "\n",
-    "Sebastian writes down the procedure for making a list of conditions to observe. He writes this as a function \n",
-    "which acts on the things he knows from the state, and returns a dataframe with the new conditions. "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>x</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <th>0</th>\n",
-       "      <td>5.479121</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>1</th>\n",
-       "      <td>-1.222431</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>2</th>\n",
-       "      <td>7.171958</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>3</th>\n",
-       "      <td>3.947361</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <th>4</th>\n",
-       "      <td>-8.116453</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "          x\n",
-       "0  5.479121\n",
-       "1 -1.222431\n",
-       "2  7.171958\n",
-       "3  3.947361\n",
-       "4 -8.116453"
-      ]
-     },
-     "execution_count": null,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "import numpy as np\n",
-    "\n",
-    "\n",
-    "def generate_conditions(variables, num_samples=5, random_state=42):\n",
-    "    rng = np.random.default_rng(random_state)               # Initialize a random number generator\n",
-    "    conditions = pd.DataFrame()                             # Create a DataFrame to hold the results  \n",
-    "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
-    "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
-    "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
-    "    return conditions\n",
-    "\n",
-    "# Example\n",
-    "generate_conditions(s_0.variables)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Finally, he \"wraps\" the `generate_conditions` function using a utility from the `autora.state` submodule, to make his\n",
-    " finished experimentalist. The purpose of the wrapper is to turn the basic function he wrote into one which accepts \n",
-    " a `State` object as input and returns a `State` object."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions=          x\n",
-       "0  5.479121\n",
-       "1 -1.222431\n",
-       "2  7.171958\n",
-       "3  3.947361\n",
-       "4 -8.116453, experiment_data=Empty DataFrame\n",
-       "Columns: [x, y]\n",
-       "Index: [], models=[])"
-      ]
-     },
-     "execution_count": null,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "from autora.state import on_state\n",
-    "\n",
-    "\n",
-    "experimentalist = on_state(  # Utility which adds the `State` functionality to a function\n",
-    "    generate_conditions,     # Pass in the basic `generate_conditions` function\n",
-    "    output=[\"conditions\"]    # Say that the value returned from `generate_conditions` should be \n",
-    "                             # used as `conditions` on the State\n",
-    ")\n",
-    "\n",
-    "# Example\n",
-    "experimentalist(s_0)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "### Write \"experiment runner\" procedure for gathering observations "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from autora.state import on_state\n",
-    "\n",
-    "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n",
-    "s_1 = experimentalist(s_0, random_state=42)\n",
-    "s_1"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": []
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "\n",
-    "## Theoretical Overview\n",
-    "\n",
-    "The fundamental idea is this:\n",
-    "- We define a \"state\" object $S$ which can be modified with a \"delta\" (a new result) $\\Delta S$.\n",
-    "- A new state at some point $i+1$ is $$S_{i+1} = S_i + \\Delta S_{i+1}$$\n",
-    "- The cycle state after $n$ steps is thus $$S_n = S_{0} +  \\sum^{n}_{i=1} \\Delta S_{i}$$\n",
-    "\n",
-    "To represent $S$ and $\\Delta S$ in code, you can use `autora.state.State` and `autora.state.Delta`\n",
-    "respectively. To operate on these, we define functions.\n",
-    "\n",
-    "- Each operation in an AER cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a\n",
-    "function with $n$ arguments $s_j$ which are members of $S$ and $m$ others $a_k$ which are not.\n",
-    "  $$ f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta S_{i+1}$$\n",
-    "- There is a wrapper function $w$ (`autora.state.wrap_to_use_state`) which changes the signature of $f$ to\n",
-    "require $S$ and aggregates the resulting $\\Delta S_{i+1}$\n",
-    "  $$w\\left[f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta\n",
-    "S_{i+1}\\right] \\rightarrow \\left[ f^\\prime(S_i, a_0, ..., a_m) \\rightarrow S_{i} + \\Delta\n",
-    "S_{i+1} = S_{i+1}\\right]$$\n",
-    "\n",
-    "- Assuming that the other arguments $a_k$ are provided by partial evaluation of the $f^\\prime$, the full AER cycle can\n",
-    "then be represented as:\n",
-    "  $$S_n = f_n^\\prime(...f_2^\\prime(f_1^\\prime(S_0)))$$\n",
-    "\n",
-    "There are additional helper functions to wrap common experimentalists, experiment runners and theorists so that we\n",
-    "can define a full AER cycle using python notation as shown in the following example."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Example\n",
-    "\n",
-    "First initialize the State. In this case, we use the pre-defined `StandardState` which implements the standard AER\n",
-    "naming convention.\n",
-    "There are two variables `x` with a range [-10, 10] and `y` with an unspecified range."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from autora.state import StandardState\n",
-    "from autora.variable import VariableCollection, Variable\n",
-    "\n",
-    "s_0 = StandardState(\n",
-    "    variables=VariableCollection(\n",
-    "        independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n",
-    "        dependent_variables=[Variable(\"y\")]\n",
-    "    )\n",
-    ")"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Specify the experimentalist. Use a standard function `random_pool`.\n",
-    "This gets 5 independent random samples (by default, configurable using an argument)\n",
-    "from the value_range of the independent variables, and returns them in a DataFrame.\n",
-    "To make this work as a function on the State objects, we wrap it in the `on_state` function."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from autora.experimentalist.random_ import random_pool\n",
-    "from autora.state import on_state\n",
-    "\n",
-    "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n",
-    "s_1 = experimentalist(s_0, random_state=42)\n",
-    "s_1"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Specify the experiment runner. This calculates a linear function, adds noise, assigns the value to the `y` column\n",
-    " in a new DataFrame."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from autora.state import on_state\n",
-    "import numpy as np\n",
-    "import pandas as pd\n",
-    "\n",
-    "\n",
-    "@on_state(output=[\"experiment_data\"])\n",
-    "def experiment_runner(conditions: pd.DataFrame, c=[2, 4], random_state = None):\n",
-    "    rng = np.random.default_rng(random_state)\n",
-    "    x = conditions[\"x\"]\n",
-    "    noise = rng.normal(0, 1, len(x))\n",
-    "    y = c[0] + (c[1] * x) + noise\n",
-    "    observations = conditions.assign(y = y)\n",
-    "    return observations\n",
-    "\n",
-    "# Which does the following:\n",
-    "experiment_runner(s_1, random_state=43)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "A completely analogous definition, using the separate `@inputs_from_state` and `@outputs_to_delta(...)` decorators\n",
-    "rather than the combined `@on_state(...)` decorator would be:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from autora.state import inputs_from_state, outputs_to_delta\n",
-    "\n",
-    "\n",
-    "@inputs_from_state\n",
-    "@outputs_to_delta(\"experiment_data\")\n",
-    "def experiment_runner_alt_1(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n",
-    "    x = conditions[\"x\"]\n",
-    "    rng = np.random.default_rng(random_state)\n",
-    "    noise = rng.normal(0, 1, len(x))\n",
-    "    y = c[0] + (c[1] * x) + noise\n",
-    "    xy = conditions.assign(y = y)\n",
-    "    return xy\n",
-    "\n",
-    "# Which does the following:\n",
-    "experiment_runner_alt_1(s_1, random_state=42)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Or alternatively:"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "def experiment_runner_alt_2_core(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n",
-    "    x = conditions[\"x\"]\n",
-    "    rng = np.random.default_rng(random_state)\n",
-    "    noise = rng.normal(0, 1, len(x))\n",
-    "    y = c[0] + (c[1] * x) + noise\n",
-    "    xy = conditions.assign(y = y)\n",
-    "    return xy\n",
-    "\n",
-    "experiment_runner_alt_2 = on_state(experiment_runner_alt_2_core, output=[\"experiment_data\"])\n",
-    "experiment_runner_alt_2(s_1)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Specify a theorist, using a standard LinearRegression from scikit-learn."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "from sklearn.linear_model import LinearRegression\n",
-    "from autora.state import estimator_on_state\n",
-    "\n",
-    "theorist = estimator_on_state(LinearRegression(fit_intercept=True))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "Now we can run the theorist on the output from the experiment_runner,\n",
-    "which itself uses the output from the experimentalist."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "theorist(experiment_runner(experimentalist(s_0)))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "If we like, we can run the experimentalist, experiment_runner and theorist ten times."
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "s_ = s_0\n",
-    "for i in range(10):\n",
-    "    s_ = experimentalist(s_, random_state=180+i)\n",
-    "    s_ = experiment_runner(s_, random_state=2*180+i)\n",
-    "    s_ = theorist(s_)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The experiment_data has 50 entries (10 cycles and 5 samples per cycle):"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "s_.experiment_data"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "The fitted coefficients are close to the original intercept = 2, gradient = 4"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": [
-    "print(s_.model.intercept_, s_.model.coef_)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
   }
  ],
  "metadata": {

From 00457aa3b5b9c21c4ddfae86875774766f07c124 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Wed, 20 Dec 2023 15:45:05 -0500
Subject: [PATCH 04/11] docs: simplify state documentation

---
 docs/The State Mechanism.ipynb | 35 +++++++++-------------------------
 1 file changed, 9 insertions(+), 26 deletions(-)

diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb
index 52e160f6..605fd9ed 100644
--- a/docs/The State Mechanism.ipynb	
+++ b/docs/The State Mechanism.ipynb	
@@ -23,7 +23,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Basic Aim: $f(S) = S^\\prime$\n",
+    "## Core Principle: every procedure accepts a `State` and returns a `State`\n",
     "\n",
     "The AutoRA State mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n",
     "- Data – stored as an immutable `State`\n",
@@ -38,7 +38,7 @@
     "| Theorist          | Model           |\n",
     "\n",
     "The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n",
-    "- Takes in existing Data $S$\n",
+    "- Takes in existing Data in a `State` $S$\n",
     "- Adds new data $\\Delta S$\n",
     "- Returns an updated state of the Data $S^\\prime$  \n",
     "\n",
@@ -287,22 +287,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## `Variable` and `VariableCollection`\n",
-    "TODO: move this to a different file"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {},
-   "outputs": [],
-   "source": []
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {},
-   "source": [
-    "## Making a function of the form $f(S) = S^\\prime$\n",
+    "## Making a function of the correct form\n",
     "\n",
     "There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. These are (from \n",
     "simplest but most restrictive, to most complex but with the greatest flexibility):\n",
@@ -311,7 +296,7 @@
     "\n",
     "There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators.  \n",
     "\n",
-    "Say you have a function to generate new experimental conditions, given some variables."
+    "Say you have a function to generate new experimental conditions, given some variables. "
    ]
   },
   {
@@ -333,10 +318,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "There are several equivalent ways to make this into a function of the form $f(S) = S^\\prime$. These are (from \n",
-    "simplest but most restrictive, to most complex but with the greatest flexibility):\n",
-    "- Decorate it with `autora.state.on_state`\n",
-    "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`"
+    "We'll look at each of the ways you can make this into a function of the required form. "
    ]
   },
   {
@@ -345,7 +327,7 @@
    "source": [
     "### Use the `autora.state.on_state` decorator\n",
     "\n",
-    "`autora.state.on_state` is a wrapper for functions which changes their arguments. \n",
+    "`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument.\n",
     "\n",
     "The most concise way to use it is as a decorator on the function where it is defined. You can specify how the \n",
     "returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument."
@@ -392,7 +374,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Fully equivalently, you can modify `generate_conditions` to return a dictionary of values with the appropriate field \n",
+    "Fully equivalently, you can modify `generate_conditions` to return a Delta of values with the appropriate field \n",
     "names from `State`: "
    ]
   },
@@ -427,7 +409,8 @@
     "    for iv in variables.independent_variables:              # Loop through the independent variables\n",
     "        c = rng.uniform(*iv.value_range, size=num_samples)  #  - Generate a uniform sample from the range\n",
     "        conditions[iv.name] = c                             #  - Save the new values to the DataFrame\n",
-    "    return {\"conditions\": conditions}                       # Return a dictionary with the appropriate name\n",
+    "    return autora.state.Delta(conditions=conditions)        # Return a Delta with the appropriate names\n",
+    "    # return {\"conditions\": conditions}                     # Returning a dictionary is equivalent\n",
     "\n",
     "# Example\n",
     "generate_conditions(s_0)"

From a3e3db8392df46d27157ae576a3184321c0d8698 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Wed, 20 Dec 2023 15:45:15 -0500
Subject: [PATCH 05/11] docs: add basic Variable documentation

---
 docs/Variable.ipynb | 327 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 327 insertions(+)
 create mode 100644 docs/Variable.ipynb

diff --git a/docs/Variable.ipynb b/docs/Variable.ipynb
new file mode 100644
index 00000000..15e0bf18
--- /dev/null
+++ b/docs/Variable.ipynb
@@ -0,0 +1,327 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "6f464ab4d943192c",
+   "metadata": {},
+   "source": [
+    "# `autora.variable`: `Variable` and `VariableCollection`\n",
+    "\n",
+    "`autora.variable.Variable` represents an experimental variable: \n",
+    "- an independent variable, or\n",
+    "- a dependent variable.\n",
+    "\n",
+    "They can be initialized as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "c2bfbd97b0a14547",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from autora.variable import Variable\n",
+    "\n",
+    "x1 = Variable(\n",
+    "    name=\"x1\",\n",
+    ")\n",
+    "x2 = Variable(\n",
+    "    name=\"x2\",\n",
+    ")\n",
+    "y = Variable(\n",
+    "    name=\"y\",\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3d195cbb145dcd58",
+   "metadata": {},
+   "source": [
+    "A group of `Variable` objects representing the domain of an experiment is an `autora.variable.VariableCollection`.\n",
+    "\n",
+    "It can be initialized as follows:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "8f1dce3b50b7984c",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/plain": [
+       "VariableCollection(independent_variables=[Variable(name='x1', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False), Variable(name='x2', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=<ValueType.REAL: 'real'>, variable_label='', rescale=1, is_covariate=False)], covariates=[])"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from autora.variable import VariableCollection\n",
+    "\n",
+    "VariableCollection(\n",
+    "    independent_variables=[x1, x2],\n",
+    "    dependent_variables=[y]\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "80e85b4c6997a5fe",
+   "metadata": {},
+   "source": [
+    "For the full list of arguments, see the documentation in the `autora.variable` submodule.\n",
+    "\n",
+    "Some functions included in AutoRA use specific values stored on the `Variable` objects. For instance, the \n",
+    "`autora.experimentalist.grid.grid_pool` function uses the `allowed_values` field to create a grid of conditions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6eb32ff49345119e",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>x1</th>\n",
+       "      <th>x2</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>-1</td>\n",
+       "      <td>11</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>-1</td>\n",
+       "      <td>12</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>-1</td>\n",
+       "      <td>13</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>-2</td>\n",
+       "      <td>11</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>-2</td>\n",
+       "      <td>12</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>5</th>\n",
+       "      <td>-2</td>\n",
+       "      <td>13</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>6</th>\n",
+       "      <td>-3</td>\n",
+       "      <td>11</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>7</th>\n",
+       "      <td>-3</td>\n",
+       "      <td>12</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>8</th>\n",
+       "      <td>-3</td>\n",
+       "      <td>13</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   x1  x2\n",
+       "0  -1  11\n",
+       "1  -1  12\n",
+       "2  -1  13\n",
+       "3  -2  11\n",
+       "4  -2  12\n",
+       "5  -2  13\n",
+       "6  -3  11\n",
+       "7  -3  12\n",
+       "8  -3  13"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from autora.experimentalist.grid import grid_pool\n",
+    "\n",
+    "grid_pool(\n",
+    "    VariableCollection(independent_variables=[\n",
+    "        Variable(name=\"x1\", allowed_values=[-1, -2, -3]),\n",
+    "        Variable(name=\"x2\", allowed_values=[11, 12, 13])\n",
+    "    ])\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6f3f12554ba12ad",
+   "metadata": {},
+   "source": [
+    "The `autora.experimentalist.random.random_pool` function uses the `value_range` field to sample conditions:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f890f05dd5c601ab",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>x1</th>\n",
+       "      <th>x2</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0.456338</td>\n",
+       "      <td>101.527294</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>1.008636</td>\n",
+       "      <td>101.297280</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>2</th>\n",
+       "      <td>0.319617</td>\n",
+       "      <td>101.962166</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>3</th>\n",
+       "      <td>-1.753273</td>\n",
+       "      <td>101.859696</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>4</th>\n",
+       "      <td>-1.933420</td>\n",
+       "      <td>101.201565</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "         x1          x2\n",
+       "0  0.456338  101.527294\n",
+       "1  1.008636  101.297280\n",
+       "2  0.319617  101.962166\n",
+       "3 -1.753273  101.859696\n",
+       "4 -1.933420  101.201565"
+      ]
+     },
+     "execution_count": null,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "from autora.experimentalist.random import random_pool\n",
+    "\n",
+    "random_pool(\n",
+    "    VariableCollection(independent_variables=[\n",
+    "        Variable(name=\"x1\", value_range=(-3, 3)),\n",
+    "        Variable(name=\"x2\", value_range=(101, 102))\n",
+    "    ]), \n",
+    "    random_state=180\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f4ab2b25903f40a7",
+   "metadata": {},
+   "source": [
+    "The `autora.state.estimator_from_state` function uses the `names` of the variables to pass the correct columns to a \n",
+    "scikit-learn compatible estimator for curve fitting."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "3f4d28f5979fe9cb",
+   "metadata": {},
+   "source": [
+    "Check the documentation for any functions you are using to determine whether you need to include specific metadata."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}

From d4be0824c4af34b90530bce901f3fdef90f30269 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Wed, 20 Dec 2023 15:45:31 -0500
Subject: [PATCH 06/11] docs: add new pages to mkdocs config

---
 mkdocs.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mkdocs.yml b/mkdocs.yml
index 5aee1786..71011cd2 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -12,6 +12,8 @@ theme:
     - content.code.copy
 nav:
 - Home: 'index.md'
+- State: 'The State Mechanism.ipynb'
+- Variable: 'Variable.ipynb'
 - Experimentalist Pipeline: 'pipeline/Experimentalist Pipeline Examples.ipynb'
 - Experimentalists:
   - Pooler:

From 8085f9a7b46378aec6c6b2a8908a524256481dbf Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Fri, 19 Jan 2024 11:31:17 -0500
Subject: [PATCH 07/11] Update docs/index.md

Co-authored-by: Ben Andrew <benwallaceandrew@gmail.com>
---
 docs/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/index.md b/docs/index.md
index 8fd8fb21..3602a90e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -7,7 +7,7 @@ AutoRA includes core functionality for running AutoRA experiments organized into
 - `autora.serializer`, utilities for saving and loading `States`
 - `autora.workflow`, command line tools for running experimentalists, experiment runners and theorists
 - `autora.variable`, for representing experimental metadata describing the type and domain of variables
-- `autora.utils`, utilities and helper functions not specifically linked to any specific core functionality  
+- `autora.utils`, utilities and helper functions not linked to any specific core functionality  
 
 It also provides some basic experimentalists in the `autora.experimentalist` submodule. However, most 
 genuinely useful experimentalists and theorists are provided as optional dependencies to the `autora` package.

From dd84c6e94a4edb41488c557a50d9f38b24e5d097 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Fri, 19 Jan 2024 11:31:29 -0500
Subject: [PATCH 08/11] Update docs/The State Mechanism.ipynb

Co-authored-by: Ben Andrew <benwallaceandrew@gmail.com>
---
 docs/The State Mechanism.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb
index 605fd9ed..f5e379bd 100644
--- a/docs/The State Mechanism.ipynb	
+++ b/docs/The State Mechanism.ipynb	
@@ -25,7 +25,7 @@
    "source": [
     "## Core Principle: every procedure accepts a `State` and returns a `State`\n",
     "\n",
-    "The AutoRA State mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n",
+    "The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n",
     "- Data – stored as an immutable `State`\n",
     "- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n",
     "\n",

From 861528857fa827c6925f532301ad29f566a34144 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Fri, 19 Jan 2024 11:31:49 -0500
Subject: [PATCH 09/11] Update docs/Variable.ipynb

Co-authored-by: Ben Andrew <benwallaceandrew@gmail.com>
---
 docs/Variable.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/Variable.ipynb b/docs/Variable.ipynb
index 15e0bf18..c3ff1058 100644
--- a/docs/Variable.ipynb
+++ b/docs/Variable.ipynb
@@ -292,7 +292,7 @@
    "metadata": {},
    "source": [
     "The `autora.state.estimator_from_state` function uses the `names` of the variables to pass the correct columns to a \n",
-    "scikit-learn compatible estimator for curve fitting."
+    "`scikit-learn` compatible estimator for curve fitting."
    ]
   },
   {

From 9ea437ca37ba7153a8aa183083d7bb2494fd9be9 Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Fri, 19 Jan 2024 11:32:01 -0500
Subject: [PATCH 10/11] Update docs/The State Mechanism.ipynb

Co-authored-by: Ben Andrew <benwallaceandrew@gmail.com>
---
 docs/The State Mechanism.ipynb | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb
index f5e379bd..e10c1ccf 100644
--- a/docs/The State Mechanism.ipynb	
+++ b/docs/The State Mechanism.ipynb	
@@ -40,7 +40,7 @@
     "The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n",
     "- Takes in existing Data in a `State` $S$\n",
     "- Adds new data $\\Delta S$\n",
-    "- Returns an updated state of the Data $S^\\prime$  \n",
+    "- Returns an updated `State` $S^\\prime$  \n",
     "\n",
     "$$\n",
     "\\begin{aligned}\n",

From 7654d44bc4638a20688506b32eea9bcb909de8bf Mon Sep 17 00:00:00 2001
From: John Gerrard Holland <john_holland1@brown.edu>
Date: Fri, 19 Jan 2024 11:34:47 -0500
Subject: [PATCH 11/11] docs: fix reference names

---
 docs/The State Mechanism.ipynb | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
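Note for reviewers (placed in the patch notes area, so it is not applied to the notebook): the paragraph touched by this patch describes how a `State` combines with a `Delta`: fields are matched by name, then each is merged according to its metadata. A minimal standard-library sketch of that idea follows. `SketchState` and `SketchDelta` are hypothetical stand-ins, the `"delta"` metadata key is an assumption, and plain lists stand in for DataFrames.

```python
from dataclasses import dataclass, field, fields, replace
from typing import List


class SketchDelta(dict):
    """Hypothetical stand-in for a Delta: a mapping of field name to new data."""


@dataclass(frozen=True)
class SketchState:
    """Hypothetical immutable state whose field metadata says how to merge."""

    data: List[int] = field(default_factory=list, metadata={"delta": "extend"})
    model: str = field(default="", metadata={"delta": "replace"})

    def __add__(self, delta):
        updates = {}
        for f in fields(self):
            if f.name not in delta:
                continue  # only fields named in the delta are updated
            mode = f.metadata["delta"]
            if mode == "replace":
                # new value completely replaces the old one
                updates[f.name] = delta[f.name]
            elif mode == "extend":
                # new values are appended to a copy of the old ones
                updates[f.name] = getattr(self, f.name) + list(delta[f.name])
            else:
                raise NotImplementedError(mode)
        # return a new object; the original state is left unchanged
        return replace(self, **updates)


s = SketchState(data=[1, 2])
s_prime = s + SketchDelta(data=[3], model="linear", unrelated="ignored")
```

In this sketch, keys in the delta that match no field on the state are silently ignored, and `s` itself is never modified.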

diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb
index e10c1ccf..d3393cc6 100644
--- a/docs/The State Mechanism.ipynb	
+++ b/docs/The State Mechanism.ipynb	
@@ -182,9 +182,9 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "When carrying out this \"addition\", `s_0`: \n",
-    "- inspects the `Delta` it has been passed and finds any field names matching fields on `s_0`, in this case \n",
-    "`experiment_data`.\n",
+    "When carrying out this \"addition\", `s`: \n",
+    "- inspects the `Delta` it has been passed and finds any field names matching fields on `s`, in this case \n",
+    "`data`.\n",
     "- For each matching field it combines the data in a way determined by the field's metadata. The key options are:\n",
     "    - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n",
     "    - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n",