From 85f47fc0c2c8d5bbc2b8e4f65dd9914a213a87bd Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Wed, 20 Dec 2023 08:36:27 -0500 Subject: [PATCH 01/11] docs: add overview to core documentation --- docs/index.md | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 0c6266a3..8fd8fb21 100644 --- a/docs/index.md +++ b/docs/index.md @@ -1,3 +1,14 @@ # Core Functionality -AutoRA includes core functionality for running AutoRA experiments. +AutoRA includes core functionality for running AutoRA experiments, organized into these submodules: + +- `autora.state`, which underpins the unified `State` interface for writing experimentalists, experiment runners and + theorists +- `autora.serializer`, utilities for saving and loading `States` +- `autora.workflow`, command line tools for running experimentalists, experiment runners and theorists +- `autora.variable`, for representing experimental metadata describing the type and domain of variables +- `autora.utils`, utilities and helper functions not tied to any particular core functionality + +It also provides some basic experimentalists in the `autora.experimentalist` submodule. However, most +genuinely useful experimentalists and theorists are provided as optional dependencies to the `autora` package. 
+ From 7fb524ace017d965cd124c568ec9b3c78ece906e Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Wed, 20 Dec 2023 12:23:57 -0500 Subject: [PATCH 02/11] docs: add State mechanism notebook --- docs/The State Mechanism.ipynb | 922 +++++++++++++++++++++++++++++++++ 1 file changed, 922 insertions(+) create mode 100644 docs/The State Mechanism.ipynb diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb new file mode 100644 index 00000000..789dbbde --- /dev/null +++ b/docs/The State Mechanism.ipynb @@ -0,0 +1,922 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# The `State` mechanism" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A `State` is an object representing data from an experiment, like the conditions, observed experiment data and models. \n", + "In the AutoRA framework, experimentalists, experiment runners and theorists are functions which \n", + "- operate on `States` and \n", + "- return `States`.\n", + "\n", + "The `autora.state` submodule provides classes and functions to help build these functions. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Basic Aim: $f(S) = S^\\prime$\n", + "\n", + "The AutoRA State mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n", + "- Data – stored as an immutable `State`\n", + "- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n", + "\n", + "Procedures generate data. Some common procedures which appear in AutoRA experiments, and the data they produce are:\n", + "\n", + "| Procedure | Data |\n", + "|-------------------|-----------------|\n", + "| Experimentalist | Conditions |\n", + "| Experiment Runner | Experiment Data |\n", + "| Theorist | Model |\n", + "\n", + "The data produced by each procedure $f$ can be seen as additions to the existing data. 
Each procedure $f$:\n", + "- Takes in existing Data $S$\n", + "- Adds new data $\\Delta S$\n", + "- Returns an updated state of the Data $S^\\prime$ \n", + "\n", + "$$\n", + "\\begin{aligned}\n", + "f(S) &= S + \\Delta S \\\\\n", + " &= S^\\prime\n", + "\\end{aligned}\n", + "$$\n", + "\n", + "AutoRA includes:\n", + "- Classes to represent the Data $S$ – the `State` object (and the derived `StandardState` – a pre-defined version \n", + "with the common fields needed for cyclical experiments) \n", + "- Functions to make it easier to write procedures of the form $f(S) = S^\\prime$" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import numpy as np\n", + "import pandas as pd\n", + "import autora.state\n", + "from autora.variable import VariableCollection, Variable" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `State` objects\n", + "TODO: write this part" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s_0 = autora.state.StandardState(\n", + " variables=VariableCollection(\n", + " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", + " dependent_variables=[Variable(\"y\")]\n", + " ),\n", + " conditions=pd.DataFrame({\"x\":[]}),\n", + " experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n", + " models=[]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## `Variable` and `VariableCollection`\n", + "TODO: move this to a different file" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Making a function of the form $f(S) = S^\\prime$\n", + "\n", + "There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. 
These are (from \n", + "simplest but most restrictive, to most complex but with the greatest flexibility):\n", + "- Use the `autora.state.on_state` decorator\n", + "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`\n", + "\n", + "There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators. \n", + "\n", + "Say you have a function to generate new experimental conditions, given some variables." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def generate_conditions(variables, num_samples=5, random_state=42):\n", + " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", + " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", + " for iv in variables.independent_variables: # Loop through the independent variables\n", + " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", + " conditions[iv.name] = c # - Save the new values to the DataFrame\n", + " return conditions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are several equivalent ways to make this into a function of the form $f(S) = S^\\prime$. These are (from \n", + "simplest but most restrictive, to most complex but with the greatest flexibility):\n", + "- Decorate it with `autora.state.on_state`\n", + "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Use the `autora.state.on_state` decorator\n", + "\n", + "`autora.state.on_state` is a wrapper for functions which changes their arguments. \n", + "\n", + "The most concise way to use it is as a decorator on the function where it is defined. 
You can specify how the \n", + "returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data=Empty DataFrame\n", + "Columns: [x, y]\n", + "Index: [], models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@autora.state.on_state(output=[\"conditions\"])\n", + "def generate_conditions(variables, num_samples=5, random_state=42):\n", + " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", + " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", + " for iv in variables.independent_variables: # Loop through the independent variables\n", + " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", + " conditions[iv.name] = c # - Save the new values to the DataFrame\n", + " return conditions\n", + "\n", + "# Example\n", + "generate_conditions(s_0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Fully equivalently, you can modify `generate_conditions` to return a dictionary of values with the appropriate field \n", + "names from `State`: " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + 
"StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data=Empty DataFrame\n", + "Columns: [x, y]\n", + "Index: [], models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@autora.state.on_state\n", + "def generate_conditions(variables, num_samples=5, random_state=42):\n", + " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", + " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", + " for iv in variables.independent_variables: # Loop through the independent variables\n", + " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", + " conditions[iv.name] = c # - Save the new values to the DataFrame\n", + " return {\"conditions\": conditions} # Return a dictionary with the appropriate name\n", + "\n", + "# Example\n", + "generate_conditions(s_0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Deep dive: `autora.state_on_state`\n", + "The decorator notation is equivalent to the following:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, 
is_covariate=False)], covariates=[]), conditions= x\n", + "0 1.521127\n", + "1 3.362120\n", + "2 1.065391\n", + "3 -5.844244\n", + "4 -6.444732, experiment_data=Empty DataFrame\n", + "Columns: [x, y]\n", + "Index: [], models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def generate_conditions_inner(variables, num_samples=5, random_state=42):\n", + " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", + " result = pd.DataFrame() # Create a DataFrame to hold the results \n", + " for iv in variables.independent_variables: # Loop through the independent variables\n", + " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", + " result[iv.name] = c # - Save the new values to the DataFrame\n", + " return result\n", + "\n", + "generate_conditions = autora.state.on_state(generate_conditions_inner, output=[\"conditions\"])\n", + "\n", + "# Example\n", + "generate_conditions(s_0, random_state=180)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "During the `generate_conditions(s_0, random_state=180)` call, `autora.state.on_state` does the following:\n", + "- Inspects the signature of `generate_conditions_inner` to see which variables are required – in this case:\n", + " - `variables`, \n", + " - `num_samples` and \n", + " - `random_state`.\n", + "- Looks for fields with those names on `s_0`:\n", + " - Finds a field called `variables`.\n", + "- Calls `generate_conditions_inner` with those fields as arguments, plus any arguments specified in the \n", + "`generate_conditions` call (here just `random_state`)\n", + "- Converts the returned value `result` into `Delta(conditions=result)` using the name specified in `output=[\"conditions\"]`\n", + "- Returns `s_0 + Delta(conditions=result)`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Modify `generate_conditions` to 
accept a `StandardState` and update this with a `Delta`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Fully equivalently to using the `autora.state.on_state` wrapper, you can construct a function which takes and returns \n", + "`State` objects. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data=Empty DataFrame\n", + "Columns: [x, y]\n", + "Index: [], models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def generate_conditions(state: autora.state.StandardState, num_samples=5, random_state=42):\n", + " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", + " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", + " for iv in state.variables.independent_variables: # Loop through the independent variables\n", + " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", + " conditions[iv.name] = c # - Save the new values to the DataFrame\n", + " delta = autora.state.Delta(conditions=conditions) # Construct a new Delta representing the updated data\n", + " new_state = state + delta # Construct a new state, \"adding\" the Delta\n", + " return new_state\n", + "\n", + "# Example\n", + "generate_conditions(s_0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Special case: 
`autora.state.estimator_on_state` for `scikit-learn` estimators\n", + "\n", + "The \"theorist\" component in an AutoRA cycle is often a `scikit-learn` compatible estimator which implements a curve \n", + "fitting function like a linear, logistic or symbolic regression. `scikit-learn` estimators are classes, and they have\n", + " a specific wrapper: `autora.state.estimator_on_state`, used as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Returned models: [LinearRegression()]\n", + "Last model's coefficients: y = [3.49729147] x + [1.99930059]\n" + ] + } + ], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "\n", + "\n", + "estimator = LinearRegression(fit_intercept=True) # Initialize the regressor with all its parameters\n", + "theorist = autora.state.estimator_on_state(estimator) # Wrap the estimator\n", + "\n", + "\n", + "# Example\n", + "variables = s_0.variables # Reuse the variables from before \n", + "xs = np.linspace(-10, 10, 101) # Make an array of x-values \n", + "noise = np.random.default_rng(179).normal(0., 0.5, xs.shape) # Gaussian noise\n", + "ys = (3.5 * xs + 2. 
+ noise) # Calculate y = 3.5 x + 2 + noise \n", + "\n", + "s_1 = autora.state.StandardState( # Initialize the State with those data\n", + " variables=variables,\n", + " experiment_data=pd.DataFrame({\"x\":xs, \"y\":ys}),\n", + ")\n", + "s_1_prime = theorist(s_1) # Run the theorist\n", + "print(f\"Returned models: \"\n", + " f\"{s_1_prime.models}\") \n", + "print(f\"Last model's coefficients: \"\n", + " f\"y = {s_1_prime.models[-1].coef_[0]} x + {s_1_prime.models[-1].intercept_}\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "During the `theorist(s_1)` call, `autora.state.estimator_on_state` does the following:\n", + "- Gets the names of the independent and dependent variables from the `s_1.variables`\n", + "- Gathers the values of those variables from `s_1.experiment_data`\n", + "- Passes those values to the `LinearRegression().fit(x, y)` method\n", + "- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n", + "- Returns `s_1 + Delta(models=[LinearRegression()])`" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example\n", + "Sebastian wishes to run an experiment. He knows:\n", + "- which variables he wants to investigate: \n", + " - $x$, the independent variable, is a number in the range $-10$ to $10$,\n", + " - $y$, the dependent variable, is a number with an unknown range.\n", + "\n", + "and will use this knowledge to **initialize a `State` object**.\n", + "\n", + "He planned procedures for:\n", + "- making a list of conditions to observe, \n", + "- running the experiment, given the list of conditions,\n", + "- generating a model to describe the data\n", + "\n", + "and he will write each of these down as a **function**." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Initialize the `State` object\n", + "Sebastian writes down the current State of his knowledge about the problem in a `State` object.\n", + "\n", + "However, he doesn't yet know which conditions to look at – those will be generated by his procedures. \n", + "Nor does he have any experiment data. So he initializes DataFrames to hold those results, but \n", + "leaves both empty. Likewise, he doesn't have any models right now, so he creates an empty list for those." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "from autora.state import StandardState\n", + "from autora.variable import VariableCollection, Variable\n", + "\n", + "s_0 = StandardState(\n", + " variables=VariableCollection(\n", + " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", + " dependent_variables=[Variable(\"y\")]\n", + " ),\n", + " conditions=pd.DataFrame({\"x\":[]}),\n", + " experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n", + " models=[]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write \"experimentalist\" procedure for generating conditions\n", + "\n", + "Sebastian writes down the procedure for making a list of conditions to observe. He writes this as a function \n", + "which acts on the things he knows from the state, and returns a dataframe with the new conditions. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<table>\n", + " <thead>\n", + " <tr><th></th><th>x</th></tr>\n", + " </thead>\n", + " <tbody>\n", + " <tr><th>0</th><td>5.479121</td></tr>\n", + " <tr><th>1</th><td>-1.222431</td></tr>\n", + " <tr><th>2</th><td>7.171958</td></tr>\n", + " <tr><th>3</th><td>3.947361</td></tr>\n", + " <tr><th>4</th><td>-8.116453</td></tr>\n", + " </tbody>\n", + "</table>
" + ], + "text/plain": [ + " x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "\n", + "\n", + "def generate_conditions(variables, num_samples=5, random_state=42):\n", + " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", + " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", + " for iv in variables.independent_variables: # Loop through the independent variables\n", + " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", + " conditions[iv.name] = c # - Save the new values to the DataFrame\n", + " return conditions\n", + "\n", + "# Example\n", + "generate_conditions(s_0.variables)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, he \"wraps\" the `generate_conditions` function using a utility from the `autora.state` submodule, to make his\n", + " finished experimentalist. The purpose of the wrapper is to turn the basic function he wrote into one which accepts \n", + " a `State` object as input and returns a `State` object." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data=Empty DataFrame\n", + "Columns: [x, y]\n", + "Index: [], models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.state import on_state\n", + "\n", + "\n", + "experimentalist = on_state( # Utility which adds the `State` functionality to a function\n", + " generate_conditions, # Pass in the basic `generate_conditions` function\n", + " output=[\"conditions\"] # Say that the value returned from `generate_conditions` should be \n", + " # used as `conditions` on the State\n", + ")\n", + "\n", + "# Example\n", + "experimentalist(s_0)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Write \"experiment runner\" procedure for gathering observations " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autora.state import on_state\n", + "\n", + "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n", + "s_1 = experimentalist(s_0, random_state=42)\n", + "s_1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "## Theoretical Overview\n", + "\n", + "The fundamental idea is this:\n", + "- We define a \"state\" object $S$ which can be modified with a \"delta\" (a new result) 
$\\Delta S$.\n", + "- A new state at some point $i+1$ is $$S_{i+1} = S_i + \\Delta S_{i+1}$$\n", + "- The cycle state after $n$ steps is thus $$S_n = S_{0} + \\sum^{n}_{i=1} \\Delta S_{i}$$\n", + "\n", + "To represent $S$ and $\\Delta S$ in code, you can use `autora.state.State` and `autora.state.Delta`\n", + "respectively. To operate on these, we define functions.\n", + "\n", + "- Each operation in an AER cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a\n", + "function with $n$ arguments $s_j$ which are members of $S$ and $m$ others $a_k$ which are not.\n", + " $$ f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta S_{i+1}$$\n", + "- There is a wrapper function $w$ (`autora.state.wrap_to_use_state`) which changes the signature of $f$ to\n", + "require $S$ and aggregates the resulting $\\Delta S_{i+1}$\n", + " $$w\\left[f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta\n", + "S_{i+1}\\right] \\rightarrow \\left[ f^\\prime(S_i, a_0, ..., a_m) \\rightarrow S_{i} + \\Delta\n", + "S_{i+1} = S_{i+1}\\right]$$\n", + "\n", + "- Assuming that the other arguments $a_k$ are provided by partial evaluation of the $f^\\prime$, the full AER cycle can\n", + "then be represented as:\n", + " $$S_n = f_n^\\prime(...f_2^\\prime(f_1^\\prime(S_0)))$$\n", + "\n", + "There are additional helper functions to wrap common experimentalists, experiment runners and theorists so that we\n", + "can define a full AER cycle using python notation as shown in the following example." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example\n", + "\n", + "First initialize the State. In this case, we use the pre-defined `StandardState` which implements the standard AER\n", + "naming convention.\n", + "There are two variables `x` with a range [-10, 10] and `y` with an unspecified range." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autora.state import StandardState\n", + "from autora.variable import VariableCollection, Variable\n", + "\n", + "s_0 = StandardState(\n", + " variables=VariableCollection(\n", + " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", + " dependent_variables=[Variable(\"y\")]\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Specify the experimentalist. Use a standard function `random_pool`.\n", + "This gets 5 independent random samples (by default, configurable using an argument)\n", + "from the value_range of the independent variables, and returns them in a DataFrame.\n", + "To make this work as a function on the State objects, we wrap it in the `on_state` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autora.experimentalist.random_ import random_pool\n", + "from autora.state import on_state\n", + "\n", + "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n", + "s_1 = experimentalist(s_0, random_state=42)\n", + "s_1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Specify the experiment runner. This calculates a linear function, adds noise, assigns the value to the `y` column\n", + " in a new DataFrame." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autora.state import on_state\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "\n", + "@on_state(output=[\"experiment_data\"])\n", + "def experiment_runner(conditions: pd.DataFrame, c=[2, 4], random_state = None):\n", + " rng = np.random.default_rng(random_state)\n", + " x = conditions[\"x\"]\n", + " noise = rng.normal(0, 1, len(x))\n", + " y = c[0] + (c[1] * x) + noise\n", + " observations = conditions.assign(y = y)\n", + " return observations\n", + "\n", + "# Which does the following:\n", + "experiment_runner(s_1, random_state=43)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A completely analogous definition, using the separate `@inputs_from_state` and `@outputs_to_delta(...)` decorators\n", + "rather than the combined `@on_state(...)` decorator would be:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autora.state import inputs_from_state, outputs_to_delta\n", + "\n", + "\n", + "@inputs_from_state\n", + "@outputs_to_delta(\"experiment_data\")\n", + "def experiment_runner_alt_1(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n", + " x = conditions[\"x\"]\n", + " rng = np.random.default_rng(random_state)\n", + " noise = rng.normal(0, 1, len(x))\n", + " y = c[0] + (c[1] * x) + noise\n", + " xy = conditions.assign(y = y)\n", + " return xy\n", + "\n", + "# Which does the following:\n", + "experiment_runner_alt_1(s_1, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or alternatively:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def experiment_runner_alt_2_core(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n", + " x = conditions[\"x\"]\n", + " rng = np.random.default_rng(random_state)\n", + " noise = 
rng.normal(0, 1, len(x))\n", + " y = c[0] + (c[1] * x) + noise\n", + " xy = conditions.assign(y = y)\n", + " return xy\n", + "\n", + "experiment_runner_alt_2 = on_state(experiment_runner_alt_2_core, output=[\"experiment_data\"])\n", + "experiment_runner_alt_2(s_1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Specify a theorist, using a standard LinearRegression from scikit-learn." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "from autora.state import estimator_on_state\n", + "\n", + "theorist = estimator_on_state(LinearRegression(fit_intercept=True))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can run the theorist on the output from the experiment_runner,\n", + "which itself uses the output from the experimentalist." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "theorist(experiment_runner(experimentalist(s_0)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we like, we can run the experimentalist, experiment_runner and theorist ten times." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s_ = s_0\n", + "for i in range(10):\n", + " s_ = experimentalist(s_, random_state=180+i)\n", + " s_ = experiment_runner(s_, random_state=2*180+i)\n", + " s_ = theorist(s_)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The experiment_data has 50 entries (10 cycles and 5 samples per cycle):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s_.experiment_data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The fitted coefficients are close to the original intercept = 2, gradient = 4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print(s_.model.intercept_, s_.model.coef_)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} From 6b07b894e099b517b33866eb0589c34ff45c70a3 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Wed, 20 Dec 2023 13:16:03 -0500 Subject: [PATCH 03/11] docs: update State Mechanism notebook with basic information about the State --- docs/The State Mechanism.ipynb | 685 ++++++++++----------------------- 1 file changed, 199 insertions(+), 486 deletions(-) diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb index 789dbbde..52e160f6 100644 --- a/docs/The State Mechanism.ipynb +++ b/docs/The State Mechanism.ipynb @@ -61,6 +61,8 @@ "metadata": {}, 
"outputs": [], "source": [ + "from dataclasses import dataclass, field\n", + "\n", "import numpy as np\n", "import pandas as pd\n", "import autora.state\n", @@ -72,7 +74,196 @@ "metadata": {}, "source": [ "## `State` objects\n", - "TODO: write this part" + "\n", + "`State` objects contain metadata describing an experiment, and the data gathered during an experiment. Any `State` \n", + "object used in an AutoRA cycle will be a subclass of the `autora.state.State`, with the necessary fields specified. \n", + "(The `autora.state.StandardState` provides some sensible defaults.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "@dataclass(frozen=True)\n", + "class BasicState(autora.state.State):\n", + " data: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={\"delta\": \"extend\"})\n", + " \n", + "s = BasicState()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because it is a python dataclass, the `State` fields can be accessed using attribute notation, for example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + "Empty DataFrame\n", + "Columns: []\n", + "Index: []" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s.data # an empty DataFrame with no columns yet" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "`State` objects can be updated by adding `Delta` objects. A `Delta` represents new data, and is combined with the \n", + "existing data in the `State` object. The `State` itself is immutable by design, so adding a `Delta` to it creates a new \n", + "`State`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "BasicState(data= x y\n", + "0 1 1)" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s + autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]}))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "When carrying out this \"addition\", `s_0`: \n", + "- inspects the `Delta` it has been passed and finds any field names matching fields on `s_0`, in this case \n", + "`experiment_data`.\n", + "- For each matching field it combines the data in a way determined by the field's metadata. The key options are:\n", + " - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n", + " - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n", + " data are concatenated to the bottom of the existing DataFrame.\n", + " \n", + " For full details on which options are available, see the documentation for the `autora.state` module. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x y\n", + "0 1 1\n", + "1 2 2" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "(s + \n", + " autora.state.Delta(data=pd.DataFrame({\"x\":[1], \"y\":[1]})) + \n", + " autora.state.Delta(data=pd.DataFrame({\"x\":[2], \"y\":[2]}))\n", + " ).data # Access just the `data` field on the updated State" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### `StandardState`\n", + "\n", + "For typical AutoRA experiments, you can use the `autora.state.StandardState` object, which has fields for variables, \n", + "conditions, experiment data and models. You can initialize a `StandardState` object like this:" ] }, { @@ -100,6 +291,13 @@ "TODO: move this to a different file" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": {}, @@ -412,491 +610,6 @@ "- Constructs `Delta(models=[LinearRegression()])` with the fitted regressor\n", "- Returns `s_1 + Delta(models=[LinearRegression()])`" ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Example\n", - "Sebastian wishes to run an experiment. He knows:\n", - "- which variables he wants to investigate: \n", - " - $x$, the independent variable, is a number in the range $-10$ to $10$,\n", - " - $y$, the dependent variable, is a number with an unknown range.\n", - "\n", - "and will use this knowledge to **initialize a `State` object**.\n", - "\n", - "He planned procedures for:\n", - "- making a list of conditions to observe, \n", - "- running the experiment, given the list of conditions,\n", - "- generating a model to describe the data\n", - "\n", - "and he will write each of these down as a **function**."
- ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Initialize the `State` object\n", - "Sebastian writes down the current State of his knowledge about the problem in a `State` object.\n", - "\n", - "However, he doesn't yet know which conditions to look at – those will be generated by his procedures. \n", - "Nor does he have any experiment data. So he initializes DataFrames to hold those results, but \n", - "leaves both empty. Likewise, he doesn't have any models right now, so he creates an empty list for those." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import pandas as pd\n", - "from autora.state import StandardState\n", - "from autora.variable import VariableCollection, Variable\n", - "\n", - "s_0 = StandardState(\n", - " variables=VariableCollection(\n", - " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", - " dependent_variables=[Variable(\"y\")]\n", - " ),\n", - " conditions=pd.DataFrame({\"x\":[]}),\n", - " experiment_data=pd.DataFrame({\"x\":[], \"y\":[]}),\n", - " models=[]\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write \"experimentalist\" procedure for generating conditions\n", - "\n", - "Sebastian writes down the procedure for making a list of conditions to observe. He writes this as a function \n", - "which acts on the things he knows from the state, and returns a dataframe with the new conditions. " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
" - ], - "text/plain": [ - " x\n", - "0 5.479121\n", - "1 -1.222431\n", - "2 7.171958\n", - "3 3.947361\n", - "4 -8.116453" - ] - }, - "execution_count": null, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import numpy as np\n", - "\n", - "\n", - "def generate_conditions(variables, num_samples=5, random_state=42):\n", - " rng = np.random.default_rng(random_state) # Initialize a random number generator\n", - " conditions = pd.DataFrame() # Create a DataFrame to hold the results \n", - " for iv in variables.independent_variables: # Loop through the independent variables\n", - " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", - " conditions[iv.name] = c # - Save the new values to the DataFrame\n", - " return conditions\n", - "\n", - "# Example\n", - "generate_conditions(s_0.variables)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Finally, he \"wraps\" the `generate_conditions` function using a utility from the `autora.state` submodule, to make his\n", - " finished experimentalist. The purpose of the wrapper is to turn the basic function he wrote into one which accepts \n", - " a `State` object as input and returns a `State` object." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", - "0 5.479121\n", - "1 -1.222431\n", - "2 7.171958\n", - "3 3.947361\n", - "4 -8.116453, experiment_data=Empty DataFrame\n", - "Columns: [x, y]\n", - "Index: [], models=[])" - ] - }, - "execution_count": null, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "from autora.state import on_state\n", - "\n", - "\n", - "experimentalist = on_state( # Utility which adds the `State` functionality to a function\n", - " generate_conditions, # Pass in the basic `generate_conditions` function\n", - " output=[\"conditions\"] # Say that the value returned from `generate_conditions` should be \n", - " # used as `conditions` on the State\n", - ")\n", - "\n", - "# Example\n", - "experimentalist(s_0)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "### Write \"experiment runner\" procedure for gathering observations " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from autora.state import on_state\n", - "\n", - "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n", - "s_1 = experimentalist(s_0, random_state=42)\n", - "s_1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "## Theoretical Overview\n", - "\n", - "The fundamental idea is this:\n", - "- We define a \"state\" object $S$ which can be modified with a \"delta\" (a new result) 
$\\Delta S$.\n", - "- A new state at some point $i+1$ is $$S_{i+1} = S_i + \\Delta S_{i+1}$$\n", - "- The cycle state after $n$ steps is thus $$S_n = S_{0} + \\sum^{n}_{i=1} \\Delta S_{i}$$\n", - "\n", - "To represent $S$ and $\\Delta S$ in code, you can use `autora.state.State` and `autora.state.Delta`\n", - "respectively. To operate on these, we define functions.\n", - "\n", - "- Each operation in an AER cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a\n", - "function with $n$ arguments $s_j$ which are members of $S$ and $m$ others $a_k$ which are not.\n", - " $$ f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta S_{i+1}$$\n", - "- There is a wrapper function $w$ (`autora.state.wrap_to_use_state`) which changes the signature of $f$ to\n", - "require $S$ and aggregates the resulting $\\Delta S_{i+1}$\n", - " $$w\\left[f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta\n", - "S_{i+1}\\right] \\rightarrow \\left[ f^\\prime(S_i, a_0, ..., a_m) \\rightarrow S_{i} + \\Delta\n", - "S_{i+1} = S_{i+1}\\right]$$\n", - "\n", - "- Assuming that the other arguments $a_k$ are provided by partial evaluation of the $f^\\prime$, the full AER cycle can\n", - "then be represented as:\n", - " $$S_n = f_n^\\prime(...f_2^\\prime(f_1^\\prime(S_0)))$$\n", - "\n", - "There are additional helper functions to wrap common experimentalists, experiment runners and theorists so that we\n", - "can define a full AER cycle using python notation as shown in the following example." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Example\n", - "\n", - "First initialize the State. In this case, we use the pre-defined `StandardState` which implements the standard AER\n", - "naming convention.\n", - "There are two variables `x` with a range [-10, 10] and `y` with an unspecified range." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from autora.state import StandardState\n", - "from autora.variable import VariableCollection, Variable\n", - "\n", - "s_0 = StandardState(\n", - " variables=VariableCollection(\n", - " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", - " dependent_variables=[Variable(\"y\")]\n", - " )\n", - ")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Specify the experimentalist. Use a standard function `random_pool`.\n", - "This gets 5 independent random samples (by default, configurable using an argument)\n", - "from the value_range of the independent variables, and returns them in a DataFrame.\n", - "To make this work as a function on the State objects, we wrap it in the `on_state` function." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from autora.experimentalist.random_ import random_pool\n", - "from autora.state import on_state\n", - "\n", - "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n", - "s_1 = experimentalist(s_0, random_state=42)\n", - "s_1" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Specify the experiment runner. This calculates a linear function, adds noise, assigns the value to the `y` column\n", - " in a new DataFrame." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from autora.state import on_state\n", - "import numpy as np\n", - "import pandas as pd\n", - "\n", - "\n", - "@on_state(output=[\"experiment_data\"])\n", - "def experiment_runner(conditions: pd.DataFrame, c=[2, 4], random_state = None):\n", - " rng = np.random.default_rng(random_state)\n", - " x = conditions[\"x\"]\n", - " noise = rng.normal(0, 1, len(x))\n", - " y = c[0] + (c[1] * x) + noise\n", - " observations = conditions.assign(y = y)\n", - " return observations\n", - "\n", - "# Which does the following:\n", - "experiment_runner(s_1, random_state=43)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "A completely analogous definition, using the separate `@inputs_from_state` and `@outputs_to_delta(...)` decorators\n", - "rather than the combined `@on_state(...)` decorator would be:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from autora.state import inputs_from_state, outputs_to_delta\n", - "\n", - "\n", - "@inputs_from_state\n", - "@outputs_to_delta(\"experiment_data\")\n", - "def experiment_runner_alt_1(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n", - " x = conditions[\"x\"]\n", - " rng = np.random.default_rng(random_state)\n", - " noise = rng.normal(0, 1, len(x))\n", - " y = c[0] + (c[1] * x) + noise\n", - " xy = conditions.assign(y = y)\n", - " return xy\n", - "\n", - "# Which does the following:\n", - "experiment_runner_alt_1(s_1, random_state=42)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Or alternatively:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "def experiment_runner_alt_2_core(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n", - " x = conditions[\"x\"]\n", - " rng = np.random.default_rng(random_state)\n", - " noise = 
rng.normal(0, 1, len(x))\n", - " y = c[0] + (c[1] * x) + noise\n", - " xy = conditions.assign(y = y)\n", - " return xy\n", - "\n", - "experiment_runner_alt_2 = on_state(experiment_runner_alt_2_core, output=[\"experiment_data\"])\n", - "experiment_runner_alt_2(s_1)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Specify a theorist, using a standard LinearRegression from scikit-learn." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.linear_model import LinearRegression\n", - "from autora.state import estimator_on_state\n", - "\n", - "theorist = estimator_on_state(LinearRegression(fit_intercept=True))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now we can run the theorist on the output from the experiment_runner,\n", - "which itself uses the output from the experimentalist." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "theorist(experiment_runner(experimentalist(s_0)))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "If we like, we can run the experimentalist, experiment_runner and theorist ten times." 
- ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s_ = s_0\n", - "for i in range(10):\n", - " s_ = experimentalist(s_, random_state=180+i)\n", - " s_ = experiment_runner(s_, random_state=2*180+i)\n", - " s_ = theorist(s_)" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The experiment_data has 50 entries (10 cycles and 5 samples per cycle):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "s_.experiment_data" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "The fitted coefficients are close to the original intercept = 2, gradient = 4" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "print(s_.model.intercept_, s_.model.coef_)" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] } ], "metadata": { From 00457aa3b5b9c21c4ddfae86875774766f07c124 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Wed, 20 Dec 2023 15:45:05 -0500 Subject: [PATCH 04/11] docs: simplify state documentation --- docs/The State Mechanism.ipynb | 35 +++++++++------------------------- 1 file changed, 9 insertions(+), 26 deletions(-) diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb index 52e160f6..605fd9ed 100644 --- a/docs/The State Mechanism.ipynb +++ b/docs/The State Mechanism.ipynb @@ -23,7 +23,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Basic Aim: $f(S) = S^\\prime$\n", + "## Core Principle: every procedure accepts a `State` and returns a `State`\n", "\n", "The AutoRA State mechanism is an implementation of the functional programming paradigm. 
It distinguishes between:\n", "- Data – stored as an immutable `State`\n", @@ -38,7 +38,7 @@ "| Theorist | Model |\n", "\n", "The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n", - "- Takes in existing Data $S$\n", + "- Takes in existing Data in a `State` $S$\n", "- Adds new data $\\Delta S$\n", "- Returns an updated state of the Data $S^\\prime$ \n", "\n", @@ -287,22 +287,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## `Variable` and `VariableCollection`\n", - "TODO: move this to a different file" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Making a function of the form $f(S) = S^\\prime$\n", + "## Making a function of the correct form\n", "\n", "There are several equivalent ways to make a function of the form $f(S) = S^\\prime$. These are (from \n", "simplest but most restrictive, to most complex but with the greatest flexibility):\n", @@ -311,7 +296,7 @@ "\n", "There are also special cases, like the `autora.state.estimator_on_state` wrapper for `scikit-learn` estimators. \n", "\n", - "Say you have a function to generate new experimental conditions, given some variables." + "Say you have a function to generate new experimental conditions, given some variables. " ] }, { @@ -333,10 +318,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "There are several equivalent ways to make this into a function of the form $f(S) = S^\\prime$. These are (from \n", - "simplest but most restrictive, to most complex but with the greatest flexibility):\n", - "- Decorate it with `autora.state.on_state`\n", - "- Modify `generate_conditions` to accept a `StandardState` and update this with a `Delta`" + "We'll look at each of the ways you can make this into a function of the required form. 
" ] }, { @@ -345,7 +327,7 @@ "source": [ "### Use the `autora.state.on_state` decorator\n", "\n", - "`autora.state.on_state` is a wrapper for functions which changes their arguments. \n", + "`autora.state.on_state` is a wrapper for functions which allows them to accept `State` objects as the first argument.\n", "\n", "The most concise way to use it is as a decorator on the function where it is defined. You can specify how the \n", "returned values should be mapped to fields on the `State` using the `@autora.state.on_state(output=...)` argument." @@ -392,7 +374,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Fully equivalently, you can modify `generate_conditions` to return a dictionary of values with the appropriate field \n", + "Fully equivalently, you can modify `generate_conditions` to return a Delta of values with the appropriate field \n", "names from `State`: " ] }, @@ -427,7 +409,8 @@ " for iv in variables.independent_variables: # Loop through the independent variables\n", " c = rng.uniform(*iv.value_range, size=num_samples) # - Generate a uniform sample from the range\n", " conditions[iv.name] = c # - Save the new values to the DataFrame\n", - " return {\"conditions\": conditions} # Return a dictionary with the appropriate name\n", + " return autora.state.Delta(conditions=conditions) # Return a Delta with the appropriate names\n", + " # return {\"conditions\": conditions} # Returning a dictionary is equivalent\n", "\n", "# Example\n", "generate_conditions(s_0)" From a3e3db8392df46d27157ae576a3184321c0d8698 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Wed, 20 Dec 2023 15:45:15 -0500 Subject: [PATCH 05/11] docs: add basic Variable documentation --- docs/Variable.ipynb | 327 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 327 insertions(+) create mode 100644 docs/Variable.ipynb diff --git a/docs/Variable.ipynb b/docs/Variable.ipynb new file mode 100644 index 00000000..15e0bf18 --- /dev/null +++ b/docs/Variable.ipynb @@ 
-0,0 +1,327 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "6f464ab4d943192c", + "metadata": {}, + "source": [ + "# `autora.variable`: `Variable` and `VariableCollection`\n", + "\n", + "`autora.variable.Variable` represents an experimental variable: \n", + "- an independent variable, or\n", + "- dependent variable.\n", + "\n", + "They can be initialized as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "c2bfbd97b0a14547", + "metadata": {}, + "outputs": [], + "source": [ + "from autora.variable import Variable\n", + "\n", + "x1 = Variable(\n", + " name=\"x1\",\n", + ")\n", + "x2 = Variable(\n", + " name=\"x2\",\n", + ")\n", + "y = Variable(\n", + " name=\"y\",\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "3d195cbb145dcd58", + "metadata": {}, + "source": [ + "A group of `Variables` representing the domain of an experiment is a `autora.variable.VariableCollection`. \n", + "\n", + "They can be initialized as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "8f1dce3b50b7984c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "VariableCollection(independent_variables=[Variable(name='x1', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False), Variable(name='x2', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.variable import VariableCollection\n", + "\n", + "VariableCollection(\n", + " independent_variables=[x1, x2],\n", + " dependent_variables=[y]\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "80e85b4c6997a5fe", + "metadata": {}, + "source": [ + "For the 
full list of arguments, see the documentation in the `autora.variable` submodule.\n", + "\n", + "Some functions included in AutoRA use specific values stored on the `Variable` objects. For instance, the \n", + "`autora.experimentalist.grid.grid_pool` function uses the `allowed_values` field to create a grid of conditions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6eb32ff49345119e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2\n", + "0 -1 11\n", + "1 -1 12\n", + "2 -1 13\n", + "3 -2 11\n", + "4 -2 12\n", + "5 -2 13\n", + "6 -3 11\n", + "7 -3 12\n", + "8 -3 13" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.experimentalist.grid import grid_pool\n", + "\n", + "grid_pool(\n", + " VariableCollection(independent_variables=[\n", + " Variable(name=\"x1\", allowed_values=[-1, -2, -3]),\n", + " Variable(name=\"x2\", allowed_values=[11, 12, 13])\n", + " ])\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "6f3f12554ba12ad", + "metadata": {}, + "source": [ + "The `autora.experimentalist.random.random_pool` function uses the `value_range` field to sample conditions:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f890f05dd5c601ab", + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2\n", + "0 0.456338 101.527294\n", + "1 1.008636 101.297280\n", + "2 0.319617 101.962166\n", + "3 -1.753273 101.859696\n", + "4 -1.933420 101.201565" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.experimentalist.random import random_pool\n", + "\n", + "random_pool(\n", + " VariableCollection(independent_variables=[\n", + " Variable(name=\"x1\", value_range=(-3, 3)),\n", + " Variable(name=\"x2\", value_range=(101, 102))\n", + " ]), \n", + " random_state=180\n", + ")" + ] + }, + { + "cell_type": "markdown", + "id": "f4ab2b25903f40a7", + "metadata": {}, + "source": [ + "The `autora.state.estimator_from_state` function uses the `names` of the variables to pass the correct columns to a \n", + "scikit-learn compatible estimator for curve fitting." + ] + }, + { + "cell_type": "markdown", + "id": "3f4d28f5979fe9cb", + "metadata": {}, + "source": [ + "Check the documentation for any functions you are using to determine whether you need to include specific metadata." 
+ ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From d4be0824c4af34b90530bce901f3fdef90f30269 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Wed, 20 Dec 2023 15:45:31 -0500 Subject: [PATCH 06/11] docs: add new pages to mkdocs config --- mkdocs.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mkdocs.yml b/mkdocs.yml index 5aee1786..71011cd2 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -12,6 +12,8 @@ theme: - content.code.copy nav: - Home: 'index.md' +- State: 'The State Mechanism.ipynb' +- Variable: 'Variable.ipynb' - Experimentalist Pipeline: 'pipeline/Experimentalist Pipeline Examples.ipynb' - Experimentalists: - Pooler: From 8085f9a7b46378aec6c6b2a8908a524256481dbf Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Fri, 19 Jan 2024 11:31:17 -0500 Subject: [PATCH 07/11] Update docs/index.md Co-authored-by: Ben Andrew --- docs/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/index.md b/docs/index.md index 8fd8fb21..3602a90e 100644 --- a/docs/index.md +++ b/docs/index.md @@ -7,7 +7,7 @@ AutoRA includes core functionality for running AutoRA experiments organized into - `autora.serializer`, utilities for saving and loading `States` - `autora.workflow`, command line tools for running experimentalists, experiment runners and theorists - `autora.variable`, for representing experimental metadata describing the type and domain of variables -- `autora.utils`, utilities and helper functions not specifically linked to any specific core functionality +- `autora.utils`, utilities and helper functions not linked to any specific core functionality It also provides some basic 
experimentalists in the `autora.experimentalist` submodule. However, most genuinely useful experimentalists and theorists are provided as optional dependencies to the `autora` package. From dd84c6e94a4edb41488c557a50d9f38b24e5d097 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Fri, 19 Jan 2024 11:31:29 -0500 Subject: [PATCH 08/11] Update docs/The State Mechanism.ipynb Co-authored-by: Ben Andrew --- docs/The State Mechanism.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb index 605fd9ed..f5e379bd 100644 --- a/docs/The State Mechanism.ipynb +++ b/docs/The State Mechanism.ipynb @@ -25,7 +25,7 @@ "source": [ "## Core Principle: every procedure accepts a `State` and returns a `State`\n", "\n", - "The AutoRA State mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n", + "The AutoRA `State` mechanism is an implementation of the functional programming paradigm. It distinguishes between:\n", "- Data – stored as an immutable `State`\n", "- Procedures – functions which act on `State` objects to add new data and return a new `State`.\n", "\n", From 861528857fa827c6925f532301ad29f566a34144 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Fri, 19 Jan 2024 11:31:49 -0500 Subject: [PATCH 09/11] Update docs/Variable.ipynb Co-authored-by: Ben Andrew --- docs/Variable.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/Variable.ipynb b/docs/Variable.ipynb index 15e0bf18..c3ff1058 100644 --- a/docs/Variable.ipynb +++ b/docs/Variable.ipynb @@ -292,7 +292,7 @@ "metadata": {}, "source": [ "The `autora.state.estimator_from_state` function uses the `names` of the variables to pass the correct columns to a \n", - "scikit-learn compatible estimator for curve fitting." + "`scikit-learn` compatible estimator for curve fitting." 
] }, { From 9ea437ca37ba7153a8aa183083d7bb2494fd9be9 Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Fri, 19 Jan 2024 11:32:01 -0500 Subject: [PATCH 10/11] Update docs/The State Mechanism.ipynb Co-authored-by: Ben Andrew --- docs/The State Mechanism.ipynb | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb index f5e379bd..e10c1ccf 100644 --- a/docs/The State Mechanism.ipynb +++ b/docs/The State Mechanism.ipynb @@ -40,7 +40,7 @@ "The data produced by each procedure $f$ can be seen as additions to the existing data. Each procedure $f$:\n", "- Takes in existing Data in a `State` $S$\n", "- Adds new data $\\Delta S$\n", - "- Returns an updated state of the Data $S^\\prime$ \n", + "- Returns an updated `State` $S^\\prime$ \n", "\n", "$$\n", "\\begin{aligned}\n", From 7654d44bc4638a20688506b32eea9bcb909de8bf Mon Sep 17 00:00:00 2001 From: John Gerrard Holland Date: Fri, 19 Jan 2024 11:34:47 -0500 Subject: [PATCH 11/11] docs: fix reference names --- docs/The State Mechanism.ipynb | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/The State Mechanism.ipynb b/docs/The State Mechanism.ipynb index e10c1ccf..d3393cc6 100644 --- a/docs/The State Mechanism.ipynb +++ b/docs/The State Mechanism.ipynb @@ -182,9 +182,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "When carrying out this \"addition\", `s_0`: \n", - "- inspects the `Delta` it has been passed and finds any field names matching fields on `s_0`, in this case \n", - "`experiment_data`.\n", + "When carrying out this \"addition\", `s`: \n", + "- inspects the `Delta` it has been passed and finds any field names matching fields on `s`, in this case \n", + "`data`.\n", "- For each matching field it combines the data in a way determined by the field's metadata. 
The key options are:\n", " - \"replace\" means that the data in the `Delta` object completely replace the data in the `State`,\n", " - \"extend\" means that the data in the `Delta` object are combined – for pandas DataFrames this means that the new\n",