diff --git a/.github/workflows/test-pytest.yml b/.github/workflows/test-pytest.yml index 3ce245e0..4c7aeab7 100644 --- a/.github/workflows/test-pytest.yml +++ b/.github/workflows/test-pytest.yml @@ -26,4 +26,4 @@ jobs: python-version: ${{ matrix.python-version }} cache: "pip" - run: pip install ".[test]" - - run: pytest --doctest-modules + - run: pytest --doctest-modules --import-mode importlib diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index e3649715..600c860e 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -1,6 +1,6 @@ repos: - repo: https://github.com/ambv/black - rev: 22.12.0 + rev: 23.7.0 hooks: - id: black - repo: https://github.com/pycqa/isort @@ -11,7 +11,7 @@ repos: - "--filter-files" - "--project=autora" - repo: https://github.com/pycqa/flake8 - rev: 6.0.0 + rev: 6.1.0 hooks: - id: flake8 args: @@ -19,7 +19,7 @@ repos: - "--extend-ignore=E203" - "--per-file-ignores=__init__.py:F401" - repo: https://github.com/pre-commit/mirrors-mypy - rev: "v0.991" + rev: "v1.5.1" hooks: - id: mypy additional_dependencies: [types-requests,scipy,pytest] diff --git a/docs/cycle/Basic Introduction to Functions and States.ipynb b/docs/cycle/Basic Introduction to Functions and States.ipynb new file mode 100644 index 00000000..a41bf38a --- /dev/null +++ b/docs/cycle/Basic Introduction to Functions and States.ipynb @@ -0,0 +1,747 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Basic Introduction to Functions and States" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the functions and objects in `autora.state`, we can build flexible pipelines and cycles which operate on state\n", + "objects.\n", + "\n", + "## Theoretical Overview\n", + "\n", + "The fundamental idea is this:\n", + "- We define a \"state\" object $S$ which can be modified with a \"delta\" (a new result) $\\Delta S$.\n", + "- A new state at some point $i+1$ is $$S_{i+1} = S_i + \\Delta S_{i+1}$$\n", + 
"- The cycle state after $n$ steps is thus $$S_n = S_{0} + \\sum^{n}_{i=1} \\Delta S_{i}$$\n", + "\n", + "To represent $S$ and $\\Delta S$ in code, you can use `autora.state.State` and `autora.state.Delta`\n", + "respectively. To operate on these, we define functions.\n", + "\n", + "- Each operation in an AER cycle (theorist, experimentalist, experiment_runner, etc.) is implemented as a\n", + "function with $n$ arguments $s_j$ which are members of $S$ and $m$ others $a_k$ which are not.\n", + " $$ f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta S_{i+1}$$\n", + "- There is a wrapper function $w$ (`autora.state.wrap_to_use_state`) which changes the signature of $f$ to\n", + "require $S$ and aggregates the resulting $\\Delta S_{i+1}$\n", + " $$w\\left[f(s_0, ..., s_n, a_0, ..., a_m) \\rightarrow \\Delta\n", + "S_{i+1}\\right] \\rightarrow \\left[ f^\\prime(S_i, a_0, ..., a_m) \\rightarrow S_{i} + \\Delta\n", + "S_{i+1} = S_{i+1}\\right]$$\n", + "\n", + "- Assuming that the other arguments $a_k$ are provided by partial evaluation of the $f^\\prime$, the full AER cycle can\n", + "then be represented as:\n", + " $$S_n = f_n^\\prime(...f_2^\\prime(f_1^\\prime(S_0)))$$\n", + "\n", + "There are additional helper functions to wrap common experimentalists, experiment runners and theorists so that we\n", + "can define a full AER cycle using python notation as shown in the following example." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Example\n", + "\n", + "First initialize the State. In this case, we use the pre-defined `StandardState` which implements the standard AER\n", + "naming convention.\n", + "There are two variables `x` with a range [-10, 10] and `y` with an unspecified range." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from autora.state import StandardState\n", + "from autora.variable import VariableCollection, Variable\n", + "\n", + "s_0 = StandardState(\n", + " variables=VariableCollection(\n", + " independent_variables=[Variable(\"x\", value_range=(-10, 10))],\n", + " dependent_variables=[Variable(\"y\")]\n", + " )\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Specify the experimentalist. Use a standard function `random_pool`.\n", + "This gets 5 independent random samples (by default, configurable using an argument)\n", + "from the value_range of the independent variables, and returns them in a DataFrame.\n", + "To make this work as a function on the State objects, we wrap it in the `on_state` function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data=None, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.experimentalist.random_ import random_pool\n", + "from autora.state import on_state\n", + "\n", + "experimentalist = on_state(function=random_pool, output=[\"conditions\"])\n", + "s_1 = experimentalist(s_0, random_state=42)\n", + "s_1" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Specify the experiment runner. 
This calculates a linear function, adds noise, assigns the value to the `y` column\n", + " in a new DataFrame." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data= x y\n", + "0 5.479121 24.160713\n", + "1 -1.222431 -2.211546\n", + "2 7.171958 30.102304\n", + "3 3.947361 16.880769\n", + "4 -8.116453 -32.457650, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.state import on_state\n", + "import numpy as np\n", + "import pandas as pd\n", + "\n", + "\n", + "@on_state(output=[\"experiment_data\"])\n", + "def experiment_runner(conditions: pd.DataFrame, c=[2, 4], random_state = None):\n", + " rng = np.random.default_rng(random_state)\n", + " x = conditions[\"x\"]\n", + " noise = rng.normal(0, 1, len(x))\n", + " y = c[0] + (c[1] * x) + noise\n", + " observations = conditions.assign(y = y)\n", + " return observations\n", + "\n", + "# Which does the following:\n", + "experiment_runner(s_1, random_state=43)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A completely analogous definition, using the separate `@inputs_from_state` and `@outputs_to_delta(...)` decorators\n", + "rather than the combined `@on_state(...)` decorator would be:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + 
"StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 -8.116453, experiment_data= x y\n", + "0 5.479121 24.221201\n", + "1 -1.222431 -3.929709\n", + "2 7.171958 31.438285\n", + "3 3.947361 18.730007\n", + "4 -8.116453 -32.416847, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.state import inputs_from_state, outputs_to_delta\n", + "\n", + "\n", + "@inputs_from_state\n", + "@outputs_to_delta(\"experiment_data\")\n", + "def experiment_runner_alt_1(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n", + " x = conditions[\"x\"]\n", + " rng = np.random.default_rng(random_state)\n", + " noise = rng.normal(0, 1, len(x))\n", + " y = c[0] + (c[1] * x) + noise\n", + " xy = conditions.assign(y = y)\n", + " return xy\n", + "\n", + "# Which does the following:\n", + "experiment_runner_alt_1(s_1, random_state=42)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or alternatively:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 5.479121\n", + "1 -1.222431\n", + "2 7.171958\n", + "3 3.947361\n", + "4 
-8.116453, experiment_data= x y\n", + "0 5.479121 24.372288\n", + "1 -1.222431 -1.583178\n", + "2 7.171958 30.032529\n", + "3 3.947361 16.745934\n", + "4 -8.116453 -31.388814, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def experiment_runner_alt_2_core(conditions: pd.DataFrame, c=[2, 4], random_state=None):\n", + " x = conditions[\"x\"]\n", + " rng = np.random.default_rng(random_state)\n", + " noise = rng.normal(0, 1, len(x))\n", + " y = c[0] + (c[1] * x) + noise\n", + " xy = conditions.assign(y = y)\n", + " return xy\n", + "\n", + "experiment_runner_alt_2 = on_state(experiment_runner_alt_2_core, output=[\"experiment_data\"])\n", + "experiment_runner_alt_2(s_1)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Specify a theorist, using a standard LinearRegression from scikit-learn." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LinearRegression\n", + "from autora.state import estimator_on_state\n", + "\n", + "theorist = estimator_on_state(LinearRegression(fit_intercept=True))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can run the theorist on the output from the experiment_runner,\n", + "which itself uses the output from the experimentalist." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-10, 10), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 6.159515\n", + "1 -7.713961\n", + "2 -0.655764\n", + "3 9.297426\n", + "4 2.601009, experiment_data= x y\n", + "0 6.159515 27.502964\n", + "1 -7.713961 -30.950686\n", + "2 -0.655764 -1.488309\n", + "3 9.297426 38.992089\n", + "4 2.601009 13.351848, models=[LinearRegression()])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "theorist(experiment_runner(experimentalist(s_0)))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we like, we can run the experimentalist, experiment_runner and theorist ten times." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s_ = s_0\n", + "for i in range(10):\n", + " s_ = experimentalist(s_, random_state=180+i)\n", + " s_ = experiment_runner(s_, random_state=2*180+i)\n", + " s_ = theorist(s_)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The experiment_data has 50 entries (10 cycles and 5 samples per cycle):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x y\n", + "0 1.521127 8.997542\n", + "1 3.362120 15.339784\n", + "2 1.065391 5.938495\n", + "3 -5.844244 -21.453802\n", + "4 -6.444732 -24.975886\n", + "5 5.724585 24.929289\n", + "6 1.781805 9.555725\n", + "7 -1.015081 -2.632280\n", + "8 2.044083 12.001204\n", + "9 7.709324 30.806166\n", + "10 -6.680454 -24.846327\n", + "11 -3.630735 -11.346701\n", + "12 -0.498322 1.794183\n", + "13 -4.043702 -15.594289\n", + "14 5.772865 25.094876\n", + "15 9.028931 37.677228\n", + "16 8.052637 34.472556\n", + "17 3.774115 16.791553\n", + "18 -8.405662 -31.734315\n", + "19 5.433506 22.975112\n", + "20 -9.644367 -36.919598\n", + "21 1.673131 7.548614\n", + "22 7.600316 32.294054\n", + "23 4.354666 20.998850\n", + "24 6.047273 26.670616\n", + "25 -5.608438 -20.570161\n", + "26 0.733890 5.029705\n", + "27 -2.781912 -9.190651\n", + "28 -2.308464 -6.179939\n", + "29 -3.547105 -12.875100\n", + "30 0.945089 6.013183\n", + "31 2.694897 14.141356\n", + "32 7.445893 31.312279\n", + "33 4.423105 19.647015\n", + "34 2.200961 11.587911\n", + "35 -4.915881 -17.061782\n", + "36 -2.997968 -10.397403\n", + "37 0.099454 4.949820\n", + "38 -3.924786 -13.532503\n", + "39 7.050950 31.085545\n", + "40 -8.077780 -31.084307\n", + "41 4.391481 17.991533\n", + "42 6.749162 30.242121\n", + "43 2.246804 10.411612\n", + "44 4.477989 19.571584\n", + "45 -0.262734 1.181040\n", + "46 -7.187250 -26.718313\n", + "47 -0.790985 0.058681\n", + "48 6.545334 27.510641\n", + "49 -7.185274 -26.510872" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s_.experiment_data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The fitted coefficients are close to the original intercept = 2, gradient = 4" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "[2.08476524] [[4.00471062]]\n" + ] + } + ], +
"source": [ + "print(s_.model.intercept_, s_.model.coef_)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/docs/cycle/Combining Experimentalists with State.ipynb b/docs/cycle/Combining Experimentalists with State.ipynb new file mode 100644 index 00000000..a7d6680a --- /dev/null +++ b/docs/cycle/Combining Experimentalists with State.ipynb @@ -0,0 +1,2643 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Building Mixture Experimentalists\n", + "\n", + "## Introduction\n", + "\n", + "One thing the State/Delta mechanism should support is making more complex experimentalists which combine others.\n", + "One example which has been suggested by the AER group is a \"mixture experimentalist\" which weights the outputs of\n", + "other experimentalists.\n", + "\n", + "How experimentalists are typically defined has a major impact on whether this kind of mixture experimentalist is easy\n", + " or hard to implement. 
Since the research group is currently (August 2023) deciding how experimentalists should\n", + " generally be defined, now seems a good time to look at the different basic options for standards & conventions.\n", + "\n", + "To help the discussion, here we've put together some examples based on some toy experimentalists.\n", + "\n", + "### Outline of the Open Question\n", + "The question is whether \"additional data\" beyond the conditions are included in the same or a different\n", + "data array.\n", + " (\"Additional data\" are data which are generated by the experimentalist and potentially needed by another\n", + " experimentalist down the line, but are not the conditions themselves).\n", + "\n", + "If an experimentalist returns some extra data, the two competing conventions are:\n", + "- They are included in the `conditions` array as additional columns, _or_\n", + "- They are passed as a _different_ array alongside the `conditions`.\n", + "\n", + "### Notebook Outline\n", + "\n", + "The examples are organized as follows:\n", + "\n", + "- A combination experimentalist which aggregates additional measures from the component experimentalists.\n", + " - Where the measure is passed back in the conditions array, or\n", + " - Where the measure is passed back in a separate array\n", + "- A combination experimentalist where the components need the full State as they have complex arguments\n", + "\n", + "\n", + "### Toy Experimentalists\n", + "\n", + "We're combining experimentalists which sample conditions based on whether they are downvoted (or not)\n", + "according to some criteria:\n", + "- The \"Avoid Negative\" experimentalist, which downvotes conditions which have negative values (with one downvote per\n", + "negative value in the conditions $x_i$: if both $x_1$ and $x_2$ are negative, the condition gets 2 downvotes, and so\n", + "on) and returns all the conditions in the \"preferred\" order (fewest downvotes first),\n", + "- The \"Avoid Even\" 
experimentalist, which downvotes conditions more the closer their values are to even numbers (with one downvote\n", + "per even value in the conditions and half a downvote if a value is $1/2$ away from an even number) and returns all the conditions in the \"preferred\" order,\n", + "- The \"Avoid Repeat\" experimentalist, which downvotes conditions which have already been seen based on the number of\n", + "times a condition has been seen and returns all the conditions in the \"preferred\" order,\n", + "- The \"Combine Downvotes\" experimentalist, which sums the downvotes of the others and returns the top $n$ \"preferred\"\n", + "conditions\n", + "(with the fewest downvotes); in the case of a tie, it returns conditions in the order of the original conditions list.\n", + "\n", + "\n", + "We also need to see what happens when we:\n", + "- Try to extend a dataframe with another dataframe which has new columns." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Combination Experimentalist which Aggregates Measures" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Returns an extended conditions array" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "from typing import List, Optional\n", + "\n", + "import numpy as np\n", + "import pandas as pd\n", + "from matplotlib import pyplot as plt\n", + "\n", + "from autora.variable import VariableCollection, Variable" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2\n", + "0 -3.0 -1.0\n", + "1 -2.0 0.0\n", + "2 -1.0 1.0\n", + "3 0.0 2.0\n", + "4 1.0 3.0\n", + "5 2.0 4.0\n", + "6 3.0 5.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "conditions_ = pd.DataFrame({\"x1\": np.linspace(-3, 3, 7), \"x2\": np.linspace(-1, 5, 7)})\n", + "conditions_" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2 downvotes\n", + "3 0.0 2.0 0\n", + "4 1.0 3.0 0\n", + "5 2.0 4.0 0\n", + "6 3.0 5.0 0\n", + "1 -2.0 0.0 1\n", + "2 -1.0 1.0 1\n", + "0 -3.0 -1.0 2" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def avoid_negative(conditions: pd.DataFrame):\n", + "    # One downvote per negative value in each row of the conditions passed in\n", + "    # (use the argument, not the global conditions_)\n", + "    downvotes = (conditions < 0).sum(axis=1)\n", + "    with_votes = conditions.assign(downvotes=downvotes)\n", + "    with_votes_sorted = with_votes.sort_values(by=\"downvotes\", ascending=True)\n", + "    return with_votes_sorted\n", + "\n", + "avoid_negative(conditions_)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Text(0.5, 1.0, 'Avoid-even function')" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "def avoid_even_function(x):\n", + " y = 1 - np.minimum(np.mod(x, 2), np.mod(-x, 2))\n", + " return y\n", + "\n", + "x = np.linspace(-1, 4, 101)\n", + "plt.plot(x, avoid_even_function(x))\n", + "plt.title(\"Avoid-even function\")\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2 downvotes\n", + "0 -3.0 -1.0 0.0\n", + "2 -1.0 1.0 0.0\n", + "4 1.0 3.0 0.0\n", + "6 3.0 5.0 0.0\n", + "1 -2.0 0.0 2.0\n", + "3 0.0 2.0 2.0\n", + "5 2.0 4.0 2.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def avoid_even(conditions: pd.DataFrame):\n", + "    # Sum the per-value downvotes across each row of the conditions passed in\n", + "    # (use the argument, not the global conditions_)\n", + "    downvotes = avoid_even_function(conditions).sum(axis=1)\n", + "    with_votes = conditions.assign(downvotes=downvotes)\n", + "    with_votes_sorted = with_votes.sort_values(by=\"downvotes\", ascending=True)\n", + "    return with_votes_sorted\n", + "\n", + "avoid_even(conditions_)\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2 0.downvotes 1.downvotes downvotes\n", + "0 -3.0 -1.0 1 0 1\n", + "1 -2.0 0.0 1 1 2\n", + "2 -1.0 1.0 1 2 3\n", + "3 0.0 2.0 1 3 4\n", + "4 1.0 3.0 1 4 5\n", + "5 2.0 4.0 1 5 6\n", + "6 3.0 5.0 1 6 7" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def combine_downvotes(conditions, *arrays: pd.DataFrame):\n", + "    result = conditions.copy()\n", + "    for i, a in enumerate(arrays):\n", + "        a_name = a.attrs.get(\"name\", i)\n", + "        # Assignment aligns on the index, so shuffled inputs are handled correctly\n", + "        result[f\"{a_name}.downvotes\"] = a.downvotes\n", + "    # Sum only the per-experimentalist \"<name>.downvotes\" columns\n", + "    result[\"downvotes\"] = result.loc[:, result.columns.str.endswith(\".downvotes\")].sum(axis=1)\n", + "    return result\n", + "\n", + "combine_downvotes(\n", + "    conditions_,\n", + "    conditions_.assign(downvotes=1),\n", + "    conditions_.assign(downvotes=[0, 1, 2, 3, 4, 5, 6]).sample(frac=1)\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2 downvotes\n", + "0 -3.0 -1.0 0.0\n", + "1 -2.0 0.0 0.0\n", + "2 -1.0 1.0 0.0\n", + "3 0.0 2.0 0.0\n", + "4 1.0 3.0 0.0\n", + "5 2.0 4.0 0.0\n", + "6 3.0 5.0 0.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def downvote_order(conditions: pd.DataFrame, experimentalists: List):\n", + " downvoted_conditions = []\n", + " for e in experimentalists:\n", + " new_downvoted_conditions = e(conditions)\n", + " new_downvoted_conditions.attrs[\"name\"] = e.__name__\n", + " downvoted_conditions.append(new_downvoted_conditions)\n", + " result = combine_downvotes(conditions, *downvoted_conditions)\n", + " result = result.sort_values(by=\"downvotes\", ascending=True)\n", + " return result\n", + "\n", + "downvote_order(conditions_, experimentalists=[])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " x1 x2 avoid_negative.downvotes downvotes\n", + "3 0.0 2.0 0 0\n", + "4 1.0 3.0 0 0\n", + "5 2.0 4.0 0 0\n", + "6 3.0 5.0 0 0\n", + "1 -2.0 0.0 1 1\n", + "2 -1.0 1.0 1 1\n", + "0 -3.0 -1.0 2 2" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "downvote_order(conditions_, experimentalists=[avoid_negative])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2avoid_negative.downvotesavoid_even.downvotesdownvotes
41.03.000.00.0
63.05.000.00.0
2-1.01.010.01.0
0-3.0-1.020.02.0
30.02.002.02.0
52.04.002.02.0
1-2.00.012.03.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 avoid_negative.downvotes avoid_even.downvotes downvotes\n", + "4 1.0 3.0 0 0.0 0.0\n", + "6 3.0 5.0 0 0.0 0.0\n", + "2 -1.0 1.0 1 0.0 1.0\n", + "0 -3.0 -1.0 2 0.0 2.0\n", + "3 0.0 2.0 0 2.0 2.0\n", + "5 2.0 4.0 0 2.0 2.0\n", + "1 -2.0 0.0 1 2.0 3.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "downvote_order(conditions_, experimentalists=[avoid_negative, avoid_even])\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Adding this dataframe to a State object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2avoid_negative.downvotesavoid_even.downvotesdownvotes
41.03.000.00.0
63.05.000.00.0
2-1.01.010.01.0
0-3.0-1.020.02.0
30.02.002.02.0
52.04.002.02.0
1-2.00.012.03.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 avoid_negative.downvotes avoid_even.downvotes downvotes\n", + "4 1.0 3.0 0 0.0 0.0\n", + "6 3.0 5.0 0 0.0 0.0\n", + "2 -1.0 1.0 1 0.0 1.0\n", + "0 -3.0 -1.0 2 0.0 2.0\n", + "3 0.0 2.0 0 2.0 2.0\n", + "5 2.0 4.0 0 2.0 2.0\n", + "1 -2.0 0.0 1 2.0 3.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.state import Delta, on_state, State, StandardState, inputs_from_state\n", + "\n", + "s = StandardState() + Delta(conditions=downvote_order(conditions_, experimentalists=[avoid_negative, avoid_even]))\n", + "s.conditions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Return a separate array of additional measures\n", + "\n", + "To ensure we don't mix up the order of return values and to facilitate updating the returned values in future without\n", + " breaking dependent functions when returning multiple objects, we return a structured object –\n", + "in this case a simple dictionary of results. (We could just as well use a `UserDict` or a `Delta` object for this\n", + "purpose – they have the same interface.)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2
30.02.0
41.03.0
52.04.0
63.05.0
1-2.00.0
2-1.01.0
0-3.0-1.0
\n", + "
" + ], + "text/plain": [ + " x1 x2\n", + "3 0.0 2.0\n", + "4 1.0 3.0\n", + "5 2.0 4.0\n", + "6 3.0 5.0\n", + "1 -2.0 0.0\n", + "2 -1.0 1.0\n", + "0 -3.0 -1.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def avoid_negative_separate(conditions: pd.DataFrame):\n", + " downvotes = (conditions < 0).sum(axis=1).sort_values(ascending=True)\n", + " conditions_sorted = pd.DataFrame(conditions, index=downvotes.index)\n", + " return {\"conditions\": conditions_sorted, \"downvotes\": downvotes}\n", + "\n", + "avoid_negative_separate(conditions_)[\"conditions\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "( x1 x2\n", + " 0 -3.0 -1.0\n", + " 2 -1.0 1.0\n", + " 4 1.0 3.0\n", + " 6 3.0 5.0\n", + " 1 -2.0 0.0\n", + " 3 0.0 2.0\n", + " 5 2.0 4.0,\n", + " 0 0.0\n", + " 2 0.0\n", + " 4 0.0\n", + " 6 0.0\n", + " 1 2.0\n", + " 3 2.0\n", + " 5 2.0\n", + " dtype: float64)" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def avoid_even_separate(conditions: pd.DataFrame):\n", + " downvotes = avoid_even_function(conditions).sum(axis=1).sort_values(ascending=True)\n", + " conditions_sorted = pd.DataFrame(conditions, index=downvotes.index)\n", + " return {\"conditions\": conditions_sorted, \"downvotes\": downvotes}\n", + "\n", + "avoid_even_separate(conditions_)[\"conditions\"], avoid_even_separate(conditions_)[\"downvotes\"]" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'conditions': x1 x2\n", + " 0 -3.0 -1.0\n", + " 1 -2.0 0.0\n", + " 2 -1.0 1.0\n", + " 3 0.0 2.0\n", + " 4 1.0 3.0\n", + " 5 2.0 4.0\n", + " 6 3.0 5.0,\n", + " 'downvotes': initial total\n", + " 0 0 0\n", + " 1 0 0\n", + " 2 0 0\n", + " 3 0 0\n", + " 4 0 0\n", + " 5 0 0\n", + " 6 0 0}" + ] + }, + "execution_count":
null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def downvote_order_separate(conditions: pd.DataFrame, experimentalists: List):\n", + " downvote_arrays = {\"initial\": pd.Series(0, index=conditions.index)}\n", + " for e in experimentalists:\n", + " downvote_arrays[e.__name__] = e(conditions)[\"downvotes\"]\n", + " combined_downvotes = pd.DataFrame(downvote_arrays)\n", + " combined_downvotes[\"total\"] = combined_downvotes.sum(axis=1)\n", + " combined_downvotes_sorted = combined_downvotes.sort_values(by=\"total\", ascending=True)\n", + " conditions_sorted = pd.DataFrame(conditions, index=combined_downvotes_sorted.index)\n", + " return {\n", + " \"conditions\": conditions_sorted,\n", + " \"downvotes\": combined_downvotes_sorted,\n", + " }\n", + "\n", + "downvote_order_separate(conditions_, experimentalists=[])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2initialavoid_even_separateavoid_negative_separatetotal
0-3.0-1.000.022.0
1-2.00.002.013.0
2-1.01.000.011.0
30.02.002.002.0
41.03.000.000.0
52.04.002.002.0
63.05.000.000.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 initial avoid_even_separate avoid_negative_separate total\n", + "0 -3.0 -1.0 0 0.0 2 2.0\n", + "1 -2.0 0.0 0 2.0 1 3.0\n", + "2 -1.0 1.0 0 0.0 1 1.0\n", + "3 0.0 2.0 0 2.0 0 2.0\n", + "4 1.0 3.0 0 0.0 0 0.0\n", + "5 2.0 4.0 0 2.0 0 2.0\n", + "6 3.0 5.0 0 0.0 0 0.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "results = downvote_order_separate(conditions_, experimentalists=[avoid_even_separate, avoid_negative_separate])\n", + "\n", + "pd.DataFrame.join(results[\"conditions\"], results[\"downvotes\"]).sort_index()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Combination Experimentalist Needing The Full State\n", + "In this case, we have at least one component-experimentalist which needs the full state.\n", + "\n", + "### Experimentalists Return Combined Results and Measures" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2downvotes
0-3.0-1.02.0
1-2.00.00.0
2-1.01.00.0
30.02.00.0
41.03.00.0
52.04.00.0
63.05.01.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 downvotes\n", + "0 -3.0 -1.0 2.0\n", + "1 -2.0 0.0 0.0\n", + "2 -1.0 1.0 0.0\n", + "3 0.0 2.0 0.0\n", + "4 1.0 3.0 0.0\n", + "5 2.0 4.0 0.0\n", + "6 3.0 5.0 1.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def avoid_repeat(conditions, experiment_data: pd.DataFrame, variables: VariableCollection):\n", + " iv_column_names = [v.name for v in variables.independent_variables]\n", + " count_already_seen = pd.Series(experiment_data.groupby(iv_column_names).size(), name=\"downvotes\")\n", + " conditions = pd.DataFrame.join(conditions, count_already_seen, on=iv_column_names).fillna(0)\n", + " return {\"conditions\": conditions, \"already_seen\": count_already_seen}\n", + "\n", + "experiment_data_ = pd.DataFrame(dict(x1=[-3, 3, -3], x2=[-1, 5, -1]))\n", + "variables_ = VariableCollection(independent_variables=[Variable(\"x1\"), Variable(\"x2\")])\n", + "\n", + "avoid_repeat(\n", + " conditions=conditions_,\n", + " experiment_data=experiment_data_,\n", + " variables=variables_\n", + ")[\"conditions\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We wrap the `avoid_repeat` function with the usual `on_state` wrapper to make it compatible with the state mechanism.\n", + " As it already returns a dictionary, we don't need to specify the output names.\n", + " Then we can call the wrapped function on the State object."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/jholla10/Developer/autora-core/src/autora/state/delta.py:273: UserWarning: These fields: ['already_seen'] could not be used to update StandardState, which has these fields & aliases: ['variables', 'conditions', 'experiment_data', 'models', 'model']\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x1', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False), Variable(name='x2', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[], covariates=[]), conditions= x1 x2 downvotes\n", + "0 -3.0 -1.0 2.0\n", + "1 -2.0 0.0 0.0\n", + "2 -1.0 1.0 0.0\n", + "3 0.0 2.0 0.0\n", + "4 1.0 3.0 0.0\n", + "5 2.0 4.0 0.0\n", + "6 3.0 5.0 1.0, experiment_data= x1 x2\n", + "0 -3 -1\n", + "1 3 5\n", + "2 -3 -1, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "avoid_repeat_state = on_state(avoid_repeat)\n", + "s = StandardState(\n", + " experiment_data=pd.DataFrame(dict(x1=[-3, 3, -3], x2=[-1, 5, -1])),\n", + " variables=VariableCollection(independent_variables=[Variable(\"x1\"), Variable(\"x2\")])\n", + ")\n", + "avoid_repeat_state(s, conditions=conditions_)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The way we handle this is to write a function which operates on the State directly, passing it to\n", + "experimentalists wrapped with `on_state`, then combine their outputs.\n", + "This is done as follows if our conditions are returned with the downvotes in the same dataframe:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": 
"stderr", + "output_type": "stream", + "text": [ + "/Users/jholla10/Developer/autora-core/src/autora/state/delta.py:273: UserWarning: These fields: ['already_seen'] could not be used to update StandardState, which has these fields & aliases: ['variables', 'conditions', 'experiment_data', 'models', 'model']\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2avoid_repeat.downvotesavoid_negative.downvotesavoid_even.downvotesdownvotes
41.03.00.000.00.0
2-1.01.00.010.01.0
63.05.01.000.01.0
30.02.00.002.02.0
52.04.00.002.02.0
1-2.00.00.012.03.0
0-3.0-1.02.020.04.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 avoid_repeat.downvotes avoid_negative.downvotes \\\n", + "4 1.0 3.0 0.0 0 \n", + "2 -1.0 1.0 0.0 1 \n", + "6 3.0 5.0 1.0 0 \n", + "3 0.0 2.0 0.0 0 \n", + "5 2.0 4.0 0.0 0 \n", + "1 -2.0 0.0 0.0 1 \n", + "0 -3.0 -1.0 2.0 2 \n", + "\n", + " avoid_even.downvotes downvotes \n", + "4 0.0 0.0 \n", + "2 0.0 1.0 \n", + "6 0.0 1.0 \n", + "3 2.0 2.0 \n", + "5 2.0 2.0 \n", + "1 2.0 3.0 \n", + "0 0.0 4.0 " + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@on_state()\n", + "def combine_downvotes_state(\n", + " state: State,\n", + " conditions: pd.DataFrame,\n", + " experimentalists: List,\n", + " num_samples: int\n", + "):\n", + " # iv_column_names = [v.name for v in s.variables.independent_variables]\n", + " downvoted_conditions = []\n", + " for e in experimentalists:\n", + " new_state = e(state, conditions=conditions)\n", + " this_downvoted_conditions = new_state.conditions\n", + " this_downvoted_conditions.attrs[\"name\"] = e.__name__\n", + " downvoted_conditions.append(this_downvoted_conditions)\n", + " combined_downvotes = combine_downvotes(conditions, *downvoted_conditions)\n", + " combined_downvotes_sorted_filtered = combined_downvotes\\\n", + " .sort_values(by=\"downvotes\", ascending=True)\\\n", + " .iloc[:num_samples]\n", + "\n", + " d = Delta(conditions=combined_downvotes_sorted_filtered)\n", + " return d\n", + "\n", + "combine_downvotes_state(\n", + " s,\n", + " conditions=conditions_,\n", + " experimentalists=[\n", + " on_state(avoid_repeat),\n", + " on_state(avoid_negative, output=[\"conditions\"]),\n", + " on_state(avoid_even, output=[\"conditions\"])\n", + " ],\n", + " num_samples=7\n", + ").conditions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Experimentalists Return Separate Conditions and Additional Measures\n", + "\n", + "If we return separate conditions and measures, then we need to split up the\n", + "combined downvoted 
conditions from the downvotes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'conditions': x1 x2\n", + " 0 -3.0 -1.0\n", + " 1 -2.0 0.0\n", + " 2 -1.0 1.0\n", + " 3 0.0 2.0\n", + " 4 1.0 3.0\n", + " 5 2.0 4.0\n", + " 6 3.0 5.0,\n", + " 'downvotes': 0 2.0\n", + " 1 0.0\n", + " 2 0.0\n", + " 3 0.0\n", + " 4 0.0\n", + " 5 0.0\n", + " 6 1.0\n", + " Name: downvotes, dtype: float64}" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def avoid_repeat_separate(\n", + " conditions: pd.DataFrame,\n", + " experiment_data: pd.DataFrame,\n", + " variables: VariableCollection\n", + "):\n", + " conditions_with_downvotes = avoid_repeat(\n", + " conditions=conditions,\n", + " experiment_data=experiment_data,\n", + " variables=variables\n", + " )[\"conditions\"]\n", + "\n", + " # Now we split up the results\n", + " iv_column_names = [v.name for v in variables.independent_variables]\n", + " conditions = conditions_with_downvotes[iv_column_names]\n", + " downvotes = conditions_with_downvotes[\"downvotes\"]\n", + "\n", + " return {\"conditions\": conditions, \"downvotes\": downvotes}\n", + "\n", + "avoid_repeat_separate(\n", + " conditions=conditions_,\n", + " experiment_data=experiment_data_,\n", + " variables=variables_\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the aggregation function, we have to gather the \"downvotes\" from the individual experimentalists\n", + "(having passed them the full state as well as some seed conditions), then combine them,\n", + "before we can split off the conditions and downvotes for the result object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/Users/jholla10/Developer/autora-core/src/autora/state/delta.py:273: UserWarning: These fields:
['downvotes'] could not be used to update StandardState, which has these fields & aliases: ['variables', 'conditions', 'experiment_data', 'models', 'model']\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2
41.03.0
2-1.01.0
63.05.0
30.02.0
52.04.0
1-2.00.0
0-3.0-1.0
\n", + "
" + ], + "text/plain": [ + " x1 x2\n", + "4 1.0 3.0\n", + "2 -1.0 1.0\n", + "6 3.0 5.0\n", + "3 0.0 2.0\n", + "5 2.0 4.0\n", + "1 -2.0 0.0\n", + "0 -3.0 -1.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@on_state()\n", + "def combine_downvotes_separate_state(\n", + " state: State,\n", + " conditions: pd.DataFrame,\n", + " experimentalists: List,\n", + " variables: VariableCollection,\n", + " num_samples: int\n", + "):\n", + " # iv_column_names = [v.name for v in s.variables.independent_variables]\n", + " all_downvotes = []\n", + " for e in experimentalists:\n", + " delta = e(state, conditions=conditions)\n", + " this_downvotes_series = delta[\"downvotes\"]\n", + " this_downvotes_series.attrs[\"name\"] = e.__name__\n", + " all_downvotes.append(this_downvotes_series.to_frame(\"downvotes\"))\n", + " combined_downvotes = combine_downvotes(conditions, *all_downvotes)\n", + "\n", + " combined_downvotes_sorted_filtered = combined_downvotes\\\n", + " .sort_values(by=\"downvotes\", ascending=True)\\\n", + " .iloc[:num_samples]\n", + "\n", + " iv_column_names = [v.name for v in variables.independent_variables]\n", + " result_conditions = combined_downvotes_sorted_filtered[iv_column_names]\n", + " result_downvotes = combined_downvotes_sorted_filtered[\"downvotes\"]\n", + "\n", + " d = Delta(conditions=result_conditions, downvotes=result_downvotes)\n", + " return d\n", + "\n", + "combine_downvotes_separate_state(\n", + " s,\n", + " conditions=conditions_,\n", + " experimentalists=[\n", + " # Here we have to use `inputs_from_state` but return our dictionary.\n", + " # There isn't a `downvotes` field we can update,\n", + " # so if we try to use the state mechanism, we lose the downvotes data\n", + " inputs_from_state(avoid_repeat_separate),\n", + " inputs_from_state(avoid_negative_separate),\n", + " inputs_from_state(avoid_even_separate)\n", + " ],\n", + " num_samples=7\n", + ").conditions" + ] + }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "### Chained Experimentalists\n", + "We can also define experimentalists which add their vote to the existing vote, if it exists:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def combine_downvotes(a, b, *arrays):\n", + " if isinstance(b, pd.Series):\n", + " new_downvotes = b\n", + " elif isinstance(b, pd.DataFrame):\n", + " new_downvotes = b.downvotes\n", + " if \"downvotes\" in a.columns:\n", + " result = a.assign(downvotes=a.downvotes + new_downvotes)\n", + " else:\n", + " result = a.assign(downvotes=new_downvotes)\n", + " if len(arrays) == 0:\n", + " return result\n", + " else:\n", + " return combine_downvotes(result, arrays[0], *arrays[1:])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If we pass in some conditions with no downvotes (`conditions_`)\n", + "and then combine with a DataFrame with constant downvotes `conditions_.assign(downvotes=1)`\n", + "we get constant total downvotes:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2downvotes
0-3.0-1.01
1-2.00.01
2-1.01.01
30.02.01
41.03.01
52.04.01
63.05.01
\n", + "
" + ], + "text/plain": [ + " x1 x2 downvotes\n", + "0 -3.0 -1.0 1\n", + "1 -2.0 0.0 1\n", + "2 -1.0 1.0 1\n", + "3 0.0 2.0 1\n", + "4 1.0 3.0 1\n", + "5 2.0 4.0 1\n", + "6 3.0 5.0 1" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "combine_downvotes(\n", + " conditions_,\n", + " conditions_.assign(downvotes=1)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can add another set of downvotes, which are summed with the existing ones:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2downvotes
0-3.0-1.01
1-2.00.02
2-1.01.03
30.02.04
41.03.05
52.04.06
63.05.07
\n", + "
" + ], + "text/plain": [ + " x1 x2 downvotes\n", + "0 -3.0 -1.0 1\n", + "1 -2.0 0.0 2\n", + "2 -1.0 1.0 3\n", + "3 0.0 2.0 4\n", + "4 1.0 3.0 5\n", + "5 2.0 4.0 6\n", + "6 3.0 5.0 7" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "combine_downvotes(\n", + " conditions_,\n", + " conditions_.assign(downvotes=1),\n", + " conditions_.assign(downvotes=[0, 1, 2, 3, 4, 5, 6]).sample(frac=1)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using these, we can build functions which are aware of and add to existing downvotes if they exist." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x1', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False), Variable(name='x2', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[], covariates=[]), conditions= x1 x2 downvotes avoid_even.downvotes\n", + "0 -3.0 -1.0 0.0 0.0\n", + "1 -2.0 0.0 2.0 2.0\n", + "2 -1.0 1.0 0.0 0.0\n", + "3 0.0 2.0 2.0 2.0\n", + "4 1.0 3.0 0.0 0.0\n", + "5 2.0 4.0 2.0 2.0\n", + "6 3.0 5.0 0.0 0.0, experiment_data= x1 x2\n", + "0 -3 -1\n", + "1 3 5\n", + "2 -3 -1, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@on_state()\n", + "def avoid_even_chainable(conditions: pd.DataFrame, variables: VariableCollection):\n", + " iv_names = [v.name for v in variables.independent_variables]\n", + " downvotes = avoid_even_function(conditions[iv_names]).sum(axis=1)\n", + " result = combine_downvotes(conditions, downvotes)\n", + " result[\"avoid_even.downvotes\"] = downvotes\n", + " return {\"conditions\": result}\n", + "avoid_even_chainable(s,
conditions=conditions_)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x1', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False), Variable(name='x2', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[], covariates=[]), conditions= x1 x2 downvotes avoid_negative.downvotes\n", + "0 -3.0 -1.0 2 2\n", + "1 -2.0 0.0 1 1\n", + "2 -1.0 1.0 1 1\n", + "3 0.0 2.0 0 0\n", + "4 1.0 3.0 0 0\n", + "5 2.0 4.0 0 0\n", + "6 3.0 5.0 0 0, experiment_data= x1 x2\n", + "0 -3 -1\n", + "1 3 5\n", + "2 -3 -1, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@on_state()\n", + "def avoid_negative_chainable(conditions: pd.DataFrame, variables: VariableCollection):\n", + " iv_names = [v.name for v in variables.independent_variables]\n", + " downvotes = (conditions[iv_names] < 0).sum(axis=1)\n", + " result = combine_downvotes(conditions, downvotes)\n", + " result[\"avoid_negative.downvotes\"] = downvotes\n", + " return {\"conditions\": result}\n", + "avoid_negative_chainable(s, conditions=conditions_)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x1', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False), Variable(name='x2', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[], covariates=[]), conditions= x1 x2 downvotes avoid_repeat.downvotes\n", + "0 -3.0 -1.0 2.0 2.0\n", + "1 -2.0 0.0 0.0 0.0\n", + "2 -1.0 1.0 0.0 0.0\n", + "3
0.0 2.0 0.0 0.0\n", + "4 1.0 3.0 0.0 0.0\n", + "5 2.0 4.0 0.0 0.0\n", + "6 3.0 5.0 1.0 1.0, experiment_data= x1 x2\n", + "0 -3 -1\n", + "1 3 5\n", + "2 -3 -1, models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@on_state()\n", + "def avoid_repeat_chainable(\n", + " conditions: pd.DataFrame,\n", + " experiment_data: pd.DataFrame,\n", + " variables: VariableCollection\n", + "):\n", + " iv_column_names = [v.name for v in variables.independent_variables]\n", + " count_already_seen = pd.Series(experiment_data.groupby(iv_column_names).size(), name=\"downvotes\")\n", + " downvotes = pd.DataFrame.join(conditions, count_already_seen, on=iv_column_names).fillna(0)[\"downvotes\"]\n", + " result = combine_downvotes(conditions, downvotes)\n", + " result[\"avoid_repeat.downvotes\"] = downvotes\n", + " return {\"conditions\": result}\n", + "\n", + "\n", + "avoid_repeat_chainable(\n", + " s, conditions=conditions_\n", + ")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2downvotesavoid_repeat.downvotes
1-2.00.00.00.0
2-1.01.00.00.0
30.02.00.00.0
41.03.00.00.0
52.04.00.00.0
63.05.01.01.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 downvotes avoid_repeat.downvotes\n", + "1 -2.0 0.0 0.0 0.0\n", + "2 -1.0 1.0 0.0 0.0\n", + "3 0.0 2.0 0.0 0.0\n", + "4 1.0 3.0 0.0 0.0\n", + "5 2.0 4.0 0.0 0.0\n", + "6 3.0 5.0 1.0 1.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "@on_state()\n", + "def sample_downvotes(conditions: pd.DataFrame, num_samples:Optional[int]=None):\n", + " conditions = conditions.sort_values(by=\"downvotes\").iloc[:num_samples]\n", + " return Delta(conditions=conditions)\n", + "\n", + "sample_downvotes(\n", + " avoid_repeat_chainable(s, conditions=conditions_),\n", + " num_samples=6\n", + ").conditions\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
x1x2downvotesavoid_repeat.downvotesavoid_even.downvotesavoid_negative.downvotes
41.03.00.00.00.00
2-1.01.01.00.00.01
63.05.01.01.00.00
30.02.02.00.02.00
52.04.02.00.02.00
1-2.00.03.00.02.01
0-3.0-1.04.02.00.02
\n", + "
" + ], + "text/plain": [ + " x1 x2 downvotes avoid_repeat.downvotes avoid_even.downvotes \\\n", + "4 1.0 3.0 0.0 0.0 0.0 \n", + "2 -1.0 1.0 1.0 0.0 0.0 \n", + "6 3.0 5.0 1.0 1.0 0.0 \n", + "3 0.0 2.0 2.0 0.0 2.0 \n", + "5 2.0 4.0 2.0 0.0 2.0 \n", + "1 -2.0 0.0 3.0 0.0 2.0 \n", + "0 -3.0 -1.0 4.0 2.0 0.0 \n", + "\n", + " avoid_negative.downvotes \n", + "4 0 \n", + "2 1 \n", + "6 0 \n", + "3 0 \n", + "5 0 \n", + "1 1 \n", + "0 2 " + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s_0 = s + Delta(conditions=conditions_) # add the seed conditions\n", + "s_1 = avoid_repeat_chainable(s_0)\n", + "s_2 = avoid_even_chainable(s_1)\n", + "s_3 = avoid_negative_chainable(s_2)\n", + "s_4 = sample_downvotes(s_3, num_samples=7)\n", + "s_4.conditions" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## What Happens When We Extend a Dataframe With New Columns in the State Mechanism\n", + "If we have an experiment_data field which has particular columns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "s_0 = StandardState(\n", + " experiment_data=pd.DataFrame({\"x1\":[-10], \"x2\":[-10], \"y\":[-10]})\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "... and we add data with extra columns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "new_experiment_data = pd.DataFrame({\"x1\":[5], \"x2\":[5], \"y\":[5], \"new_column\": [15]})\n", + "s_1 = s_0 + Delta(experiment_data=new_experiment_data)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " then the additional columns just\n", + "get added on the end, and any missing values are replaced by NaNs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
    x1   x2    y  new_column
0  -10  -10  -10         NaN
1    5    5    5        15.0
\n", + "
" + ], + "text/plain": [ + " x1 x2 y new_column\n", + "0 -10 -10 -10 NaN\n", + "1 5 5 5 15.0" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "s_1.experiment_data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 2 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython2" + } + }, + "nbformat": 4, + "nbformat_minor": 0 +} diff --git a/docs/cycle/Linear and Cyclical Workflows using Functions and States.ipynb b/docs/cycle/Linear and Cyclical Workflows using Functions and States.ipynb index e2a9babb..f04ee104 100644 --- a/docs/cycle/Linear and Cyclical Workflows using Functions and States.ipynb +++ b/docs/cycle/Linear and Cyclical Workflows using Functions and States.ipynb @@ -46,7 +46,7 @@ "import numpy as np\n", "import pandas as pd\n", "from autora.variable import VariableCollection, Variable\n", - "from autora.state.bundled import StandardState\n", + "from autora.state import StandardState\n", "\n", "s = StandardState(\n", " variables=VariableCollection(independent_variables=[Variable(\"x\", value_range=(-15,15))],\n", @@ -121,12 +121,12 @@ "metadata": {}, "outputs": [], "source": [ - "from autora.state.delta import wrap_to_use_state, Delta\n", + "from autora.state import on_state, Delta\n", "\n", "def ground_truth(x: pd.Series, c=(432, -144, -3, 1)):\n", " return c[0] + c[1] * x + c[2] * x**2 + c[3] * x**3\n", "\n", - "@wrap_to_use_state\n", + "@on_state\n", "def experiment_runner(conditions, std=100., random_state=None):\n", " \"\"\"Coefs from https://www.maa.org/sites/default/files/0025570x28304.di021116.02p0130a.pdf\"\"\"\n", " rng = 
np.random.default_rng(random_state)\n", @@ -178,27 +178,27 @@ " \n", " 0\n", " -15.0\n", - " -1457.949701\n", + " -1457.218119\n", " \n", " \n", " 1\n", " -14.7\n", - " -1275.900522\n", + " -1275.332030\n", " \n", " \n", " 2\n", " -14.4\n", - " -1101.584447\n", + " -1102.558433\n", " \n", " \n", " 3\n", " -14.1\n", - " -938.510951\n", + " -937.742130\n", " \n", " \n", " 4\n", " -13.8\n", - " -780.229165\n", + " -780.935825\n", " \n", " \n", " ...\n", @@ -208,27 +208,27 @@ " \n", " 96\n", " 13.8\n", - " 500.274061\n", + " 501.733867\n", " \n", " \n", " 97\n", " 14.1\n", - " 608.306420\n", + " 607.023667\n", " \n", " \n", " 98\n", " 14.4\n", - " 720.885521\n", + " 721.623458\n", " \n", " \n", " 99\n", " 14.7\n", - " 843.944513\n", + " 843.627156\n", " \n", " \n", " 100\n", " 15.0\n", - " 971.655807\n", + " 973.391517\n", " \n", " \n", "\n", @@ -237,17 +237,17 @@ ], "text/plain": [ " x y\n", - "0 -15.0 -1457.949701\n", - "1 -14.7 -1275.900522\n", - "2 -14.4 -1101.584447\n", - "3 -14.1 -938.510951\n", - "4 -13.8 -780.229165\n", + "0 -15.0 -1457.218119\n", + "1 -14.7 -1275.332030\n", + "2 -14.4 -1102.558433\n", + "3 -14.1 -937.742130\n", + "4 -13.8 -780.935825\n", ".. ... ...\n", - "96 13.8 500.274061\n", - "97 14.1 608.306420\n", - "98 14.4 720.885521\n", - "99 14.7 843.944513\n", - "100 15.0 971.655807\n", + "96 13.8 501.733867\n", + "97 14.1 607.023667\n", + "98 14.4 721.623458\n", + "99 14.7 843.627156\n", + "100 15.0 973.391517\n", "\n", "[101 rows x 2 columns]" ] @@ -268,7 +268,7 @@ "### Defining The Theorist\n", "\n", "Now we define a theorist, which does a linear regression on the polynomial of degree 5. We define a regressor and a\n", - "method to return its feature names and coefficients, and then the theorist to handle it. Here, we use a different wrapper `theorist_from_estimator` that wraps the regressor and returns a function with the same functionality, but operating on `State` fields. 
In this case, we want to use the `State` field `experiment_data` and extend the `State` field `models`." + "method to return its feature names and coefficients, and then the theorist to handle it. Here, we use a different wrapper `estimator_on_state` that wraps the regressor and returns a function with the same functionality, but operating on `State` fields. In this case, we want to use the `State` field `experiment_data` and extend the `State` field `models`." ] }, { @@ -278,13 +278,13 @@ "outputs": [], "source": [ "from sklearn.linear_model import LinearRegression\n", - "from autora.state.wrapper import theorist_from_estimator\n", + "from autora.state import estimator_on_state\n", "from sklearn.pipeline import make_pipeline as make_theorist_pipeline\n", "from sklearn.preprocessing import PolynomialFeatures\n", "\n", "# Completely standard scikit-learn pipeline regressor\n", "regressor = make_theorist_pipeline(PolynomialFeatures(degree=5), LinearRegression())\n", - "theorist = theorist_from_estimator(regressor)\n", + "theorist = estimator_on_state(regressor)\n", "\n", "def get_equation(r):\n", " t = r.named_steps['polynomialfeatures'].get_feature_names_out()\n", @@ -744,7 +744,7 @@ "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -776,7 +776,7 @@ "outputs": [ { "data": { - "image/png": "", + "image/png": "", "text/plain": [ "
" ] @@ -874,17 +874,17 @@ "\n", "v1.model=None, \n", "v1.experiment_data= x y\n", - "0 -15.0 -1386.402949\n", - "1 -14.7 -1073.690228\n", - "2 -14.4 -1072.951606\n", - "3 -14.1 -1096.806703\n", - "4 -13.8 -838.977013\n", + "0 -15.0 -1646.530156\n", + "1 -14.7 -1336.437358\n", + "2 -14.4 -1055.375424\n", + "3 -14.1 -1100.425725\n", + "4 -13.8 -929.288485\n", ".. ... ...\n", - "96 13.8 384.625949\n", - "97 14.1 559.333146\n", - "98 14.4 795.556490\n", - "99 14.7 920.071641\n", - "100 15.0 907.742229\n", + "96 13.8 461.151029\n", + "97 14.1 512.259065\n", + "98 14.4 795.078025\n", + "99 14.7 930.233261\n", + "100 15.0 986.124289\n", "\n", "[101 rows x 2 columns]\n" ] @@ -952,6 +952,118 @@ "v3 = next(cycle_generator)\n", "print(f\"{v3.model=}, \\n{v3.experiment_data.shape=}\")\n" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Adding The Experimentalist\n", + "\n", + "Modifying the code to use a custom experimentalist is simple. We define an experimentalist which adds some observations\n", + "each cycle:\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "StandardState(variables=VariableCollection(independent_variables=[Variable(name='x', value_range=(-15, 15), allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], dependent_variables=[Variable(name='y', value_range=None, allowed_values=None, units='', type=, variable_label='', rescale=1, is_covariate=False)], covariates=[]), conditions= x\n", + "0 -3.681470\n", + "1 13.752780\n", + "2 -4.058959\n", + "3 10.911147\n", + "4 -1.159941, experiment_data=Empty DataFrame\n", + "Columns: [x, y]\n", + "Index: [], models=[])" + ] + }, + "execution_count": null, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from autora.experimentalist.random_ import random_pool\n", + "experimentalist = on_state(random_pool, output=[\"conditions\"])\n", + 
"experimentalist(s)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "u0 = s\n", + "for i in range(5):\n", + " u0 = experimentalist(u0, num_samples=10, random_state=42+i)\n", + " u0 = experiment_runner(u0, random_state=43+i)\n", + " u0 = theorist(u0)\n", + " show_best_fit(u0)\n", + " plt.title(f\"{i=}, {len(u0.experiment_data)=}\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/docs/experimentalists/pooler/grid/index.md b/docs/experimentalists/grid/index.md similarity index 81% rename from docs/experimentalists/pooler/grid/index.md rename to docs/experimentalists/grid/index.md index 97b314aa..474f78f1 100644 --- a/docs/experimentalists/pooler/grid/index.md +++ b/docs/experimentalists/grid/index.md @@ -22,12 +22,14 @@ This means that there are various combinations that these variables can form, th ### Example Code + ```python -from autora.experimentalist.pooler.grid import grid_pool -from autora.variable import Variable +from autora.experimentalist.grid import grid_pool +from autora.variable import Variable, VariableCollection iv_1 = Variable(allowed_values=[1, 2, 3]) iv_2 = Variable(allowed_values=[4, 5, 6]) +variables = VariableCollection(independent_variables=[iv_1, iv_2]) -pool = grid_pool([iv_1, iv_2]) +pool = grid_pool(variables) ``` diff --git a/docs/experimentalists/pooler/grid/quickstart.md b/docs/experimentalists/grid/quickstart.md similarity index 83% rename from docs/experimentalists/pooler/grid/quickstart.md rename to docs/experimentalists/grid/quickstart.md index 740bb904..35777517 100644 --- a/docs/experimentalists/pooler/grid/quickstart.md +++ b/docs/experimentalists/grid/quickstart.md @@ -10,5 +10,5 @@ You will need: you can import the grid pooler via: ```python -from autora.experimentalist.pooler.grid import grid_pool +from autora.experimentalist.grid import grid_pool ``` diff --git a/docs/experimentalists/pooler/random/quickstart.md 
b/docs/experimentalists/pooler/random/quickstart.md deleted file mode 100644 index 4219f89b..00000000 --- a/docs/experimentalists/pooler/random/quickstart.md +++ /dev/null @@ -1,14 +0,0 @@ -# Quickstart Guide - -You will need: - -- `python` 3.8 or greater: [https://www.python.org/downloads/](https://www.python.org/downloads/) - - -*Random Pooler* is part of the `autora-core` package and does not need to be installed separately - -you can import the random pooler via: - -```python -from autora.experimentalist.pooler.random_pooler import random_pool -``` diff --git a/docs/experimentalists/pooler/random/index.md b/docs/experimentalists/random/index.md similarity index 87% rename from docs/experimentalists/pooler/random/index.md rename to docs/experimentalists/random/index.md index 59fe7450..774283ba 100644 --- a/docs/experimentalists/pooler/random/index.md +++ b/docs/experimentalists/random/index.md @@ -22,8 +22,10 @@ This means that there are 9 possible combinations for these variables (3x3), fro | 3 | (3,4) | (3,5) | X | ### Example Code + ```python -from autora.experimentalist.pooler.random_pooler import random_pool -pool = random_pool([1, 2, 3],[4, 5, 6], n=3) +from autora.experimentalist.random import random_pool + +pool = random_pool([1, 2, 3], [4, 5, 6], num_samples=3) ``` diff --git a/docs/experimentalists/random/quickstart.md b/docs/experimentalists/random/quickstart.md new file mode 100644 index 00000000..491c1528 --- /dev/null +++ b/docs/experimentalists/random/quickstart.md @@ -0,0 +1,29 @@ +# Quickstart Guide + +You will need: + +- `python` 3.8 or greater: [https://www.python.org/downloads/](https://www.python.org/downloads/) + + +*Random Pooler* and *Sampler* are part of the `autora-core` package and do not need to be installed separately + +You can import and invoke the pool like this: + +```python +from autora.variable import VariableCollection, Variable +from autora.experimentalist.random import pool + +pool( + 
VariableCollection(independent_variables=[Variable(name="x", allowed_values=range(10))]), + random_state=1 +) +``` + +You can import the sampler like this: + +```python +from autora.experimentalist.random import sample + +sample([1, 1, 2, 2, 3, 3], num_samples=2) +``` + diff --git a/docs/experimentalists/sampler/random/index.md b/docs/experimentalists/sampler/random/index.md deleted file mode 100644 index e20be0d5..00000000 --- a/docs/experimentalists/sampler/random/index.md +++ /dev/null @@ -1,10 +0,0 @@ -# Random Sampler - -Uniform random sampling without replacement from a pool of conditions. - -### Example Code -```python -from autora.experimentalist.sampler.random_sampler import random_sample - -pool = random_sample([1, 1, 2, 2, 3, 3], n=2) -``` diff --git a/docs/experimentalists/sampler/random/quickstart.md b/docs/experimentalists/sampler/random/quickstart.md deleted file mode 100644 index a9337826..00000000 --- a/docs/experimentalists/sampler/random/quickstart.md +++ /dev/null @@ -1,14 +0,0 @@ -# Quickstart Guide - -You will need: - -- `python` 3.8 or greater: [https://www.python.org/downloads/](https://www.python.org/downloads/) - - -*Random Sampler* is part of the `autora-core` package and does not need to be installed separately - -you can import the random sampler via: - -```python -from autora.experimentalist.sampler.random_sampler import random_sample -``` diff --git a/src/autora/experimentalist/grid.py b/src/autora/experimentalist/grid.py new file mode 100644 index 00000000..f605efb5 --- /dev/null +++ b/src/autora/experimentalist/grid.py @@ -0,0 +1,108 @@ +"""Tools to make grids of experimental conditions.""" +from itertools import product + +import pandas as pd + +from autora.variable import VariableCollection + + +def pool(variables: VariableCollection) -> pd.DataFrame: + """Creates exhaustive pool of conditions given a definition of variables with allowed_values. 
+ + Args: + variables: a VariableCollection with `independent_variables` – a sequence of Variable + objects, each of which has an attribute `allowed_values` containing a sequence of + values. + + Returns: the conditions as a pd.DataFrame with one column per independent variable + + Examples: + >>> from autora.state import State + >>> from autora.variable import VariableCollection, Variable + >>> from dataclasses import dataclass, field + >>> import pandas as pd + >>> import numpy as np + + With one independent variable "x", and some allowed values, we get exactly those values + back when running the experimentalist: + >>> pool(VariableCollection( + ... independent_variables=[Variable(name="x", allowed_values=[1, 2, 3])] + ... )) + x + 0 1 + 1 2 + 2 3 + + The allowed_values must be specified: + >>> pool(VariableCollection(independent_variables=[Variable(name="x")])) + Traceback (most recent call last): + ... + AssertionError: grid_pool only supports independent variables with discrete... + + With two independent variables, we get the cartesian product: + >>> pool( + ... VariableCollection(independent_variables=[ + ... Variable(name="x1", allowed_values=[1, 2]), + ... Variable(name="x2", allowed_values=[3, 4]), + ... ])) + x1 x2 + 0 1 3 + 1 1 4 + 2 2 3 + 3 2 4 + + If any of the variables have unspecified allowed_values, we get an error: + >>> pool( + ... VariableCollection(independent_variables=[ + ... Variable(name="x1", allowed_values=[1, 2]), + ... Variable(name="x2"), + ... ])) + Traceback (most recent call last): + ... + AssertionError: grid_pool only supports independent variables with discrete... + + + We can specify arrays of allowed values: + >>> pool( + ... VariableCollection(independent_variables=[ + ... Variable(name="x", allowed_values=np.linspace(-10, 10, 101)), + ... Variable(name="y", allowed_values=[3, 4]), + ... Variable(name="z", allowed_values=np.linspace(20, 30, 11)), + ... 
])) + x y z + 0 -10.0 3 20.0 + 1 -10.0 3 21.0 + 2 -10.0 3 22.0 + 3 -10.0 3 23.0 + 4 -10.0 3 24.0 + ... ... .. ... + 2217 10.0 4 26.0 + 2218 10.0 4 27.0 + 2219 10.0 4 28.0 + 2220 10.0 4 29.0 + 2221 10.0 4 30.0 + + [2222 rows x 3 columns] + + """ + ivs = variables.independent_variables + # Get allowed values for each IV + l_iv_values = [] + l_iv_names = [] + for iv in ivs: + assert iv.allowed_values is not None, ( + f"grid_pool only supports independent variables with discrete allowed values, " + f"but allowed_values is None on {iv=} " + ) + l_iv_values.append(iv.allowed_values) + l_iv_names.append(iv.name) + + # Return Cartesian product of all IV values + pool = product(*l_iv_values) + conditions = pd.DataFrame(pool, columns=l_iv_names) + + return conditions + + +grid_pool = pool +"""Alias for pool""" diff --git a/src/autora/experimentalist/pooler/grid.py b/src/autora/experimentalist/pooler/grid.py deleted file mode 100644 index dadc2a4a..00000000 --- a/src/autora/experimentalist/pooler/grid.py +++ /dev/null @@ -1,19 +0,0 @@ -from itertools import product -from typing import List - -from autora.variable import IV - - -def grid_pool(ivs: List[IV]): - """Creates exhaustive pool from discrete values using a Cartesian product of sets""" - # Get allowed values for each IV - l_iv_values = [] - for iv in ivs: - assert iv.allowed_values is not None, ( - f"gridsearch_pool only supports independent variables with discrete allowed values, " - f"but allowed_values is None on {iv=} " - ) - l_iv_values.append(iv.allowed_values) - - # Return Cartesian product of all IV values - return product(*l_iv_values) diff --git a/src/autora/experimentalist/pooler/random_pooler.py b/src/autora/experimentalist/pooler/random_pooler.py deleted file mode 100644 index 78ad104e..00000000 --- a/src/autora/experimentalist/pooler/random_pooler.py +++ /dev/null @@ -1,52 +0,0 @@ -import random -from typing import Iterable, List, Tuple - -import numpy as np - -from autora.utils.deprecation import 
deprecated_alias -from autora.variable import IV - - -def random_pool( - ivs: List[IV], num_samples: int = 1, duplicates: bool = True -) -> Iterable: - """ - Creates combinations from lists of discrete values using random selection. - Args: - ivs: List of independent variables - n: Number of samples to sample - duplicates: Boolean if duplicate value are allowed. - - """ - l_samples: List[Tuple] = [] - # Create list of pools of values sample from - l_iv_values = [] - for iv in ivs: - assert iv.allowed_values is not None, ( - f"gridsearch_pool only supports independent variables with discrete allowed values, " - f"but allowed_values is None on {iv=} " - ) - l_iv_values.append(iv.allowed_values) - - # Check to ensure infinite search won't occur if duplicates not allowed - if not duplicates: - l_pool_len = [len(set(s)) for s in l_iv_values] - n_combinations = np.product(l_pool_len) - try: - assert num_samples <= n_combinations - except AssertionError: - raise AssertionError( - f"Number to sample n({num_samples}) is larger than the number " - f"of unique combinations({n_combinations})." - ) - - # Random sample from the pools until n is met - while len(l_samples) < num_samples: - l_samples.append(tuple(map(random.choice, l_iv_values))) - if not duplicates: - l_samples = [*set(l_samples)] - - return iter(l_samples) - - -random_pooler = deprecated_alias(random_pool, "random_pooler") diff --git a/src/autora/experimentalist/random.py b/src/autora/experimentalist/random.py new file mode 100644 index 00000000..b4101adf --- /dev/null +++ b/src/autora/experimentalist/random.py @@ -0,0 +1,183 @@ +from typing import Optional, Union + +import numpy as np +import pandas as pd + +from autora.variable import ValueType, VariableCollection + + +def pool( + variables: VariableCollection, + num_samples: int = 5, + random_state: Optional[int] = None, + replace: bool = True, +) -> pd.DataFrame: + """ + Create a sequence of conditions randomly sampled from independent variables. 
+ + Args: + variables: the description of all the variables in the AER experiment. + num_samples: the number of conditions to produce + random_state: the seed value for the random number generator + replace: if True, allow repeated values + + Returns: the generated conditions as a dataframe + + Examples: + >>> from autora.state import State + >>> from autora.variable import VariableCollection, Variable + >>> from dataclasses import dataclass, field + >>> import pandas as pd + >>> import numpy as np + + With one independent variable "x", and some allowed_values we get some of those values + back when running the experimentalist: + >>> pool( + ... VariableCollection( + ... independent_variables=[Variable(name="x", allowed_values=range(10)) + ... ]), random_state=1) + x + 0 4 + 1 5 + 2 7 + 3 9 + 4 0 + + + ... with one independent variable "x", and a value_range, + we get a sample of the range back when running the experimentalist: + >>> pool( + ... VariableCollection(independent_variables=[ + ... Variable(name="x", value_range=(-5, 5)) + ... ]), random_state=1) + x + 0 0.118216 + 1 4.504637 + 2 -3.558404 + 3 4.486494 + 4 -1.881685 + + + + The allowed_values or value_range must be specified: + >>> pool(VariableCollection(independent_variables=[Variable(name="x")])) + Traceback (most recent call last): + ... + ValueError: allowed_values or [value_range and type==REAL] needs to be set... + + With two independent variables, we get independent samples on both axes: + >>> pool(VariableCollection(independent_variables=[ + ... Variable(name="x1", allowed_values=range(1, 5)), + ... Variable(name="x2", allowed_values=range(1, 500)), + ... ]), num_samples=10, replace=True, random_state=1) + x1 x2 + 0 2 434 + 1 3 212 + 2 4 137 + 3 4 414 + 4 1 129 + 5 1 205 + 6 4 322 + 7 4 275 + 8 1 43 + 9 2 14 + + If any of the variables have unspecified allowed_values, we get an error: + >>> pool( + ... VariableCollection(independent_variables=[ + ... 
Variable(name="x1", allowed_values=[1, 2]), + ... Variable(name="x2"), + ... ])) + Traceback (most recent call last): + ... + ValueError: allowed_values or [value_range and type==REAL] needs to be set... + + + We can specify arrays of allowed values: + + >>> pool( + ... VariableCollection(independent_variables=[ + ... Variable(name="x", allowed_values=np.linspace(-10, 10, 101)), + ... Variable(name="y", allowed_values=[3, 4]), + ... Variable(name="z", allowed_values=np.linspace(20, 30, 11)), + ... ]), random_state=1) + x y z + 0 -0.6 3 29.0 + 1 0.2 4 24.0 + 2 5.2 4 23.0 + 3 9.0 3 29.0 + 4 -9.4 3 22.0 + + + """ + rng = np.random.default_rng(random_state) + + raw_conditions = {} + for iv in variables.independent_variables: + if iv.allowed_values is not None: + raw_conditions[iv.name] = rng.choice( + iv.allowed_values, size=num_samples, replace=replace + ) + elif (iv.value_range is not None) and (iv.type == ValueType.REAL): + raw_conditions[iv.name] = rng.uniform(*iv.value_range, size=num_samples) + + else: + raise ValueError( + "allowed_values or [value_range and type==REAL] needs to be set for " + "%s" % (iv) + ) + + return pd.DataFrame(raw_conditions) + + +random_pool = pool +"""Alias for `pool`""" + + +def sample( + conditions: Union[pd.DataFrame, np.ndarray, np.recarray], + num_samples: int = 1, + random_state: Optional[int] = None, + replace: bool = False, +) -> pd.DataFrame: + """ + Take a random sample from some input conditions. + + Args: + conditions: the conditions to sample from + num_samples: the number of conditions to produce + random_state: the seed value for the random number generator + replace: if True, allow repeated values + + Returns: a pd.DataFrame containing the sampled + conditions + + Examples: + From a pd.DataFrame: + >>> import pandas as pd + >>> sample( + ... 
pd.DataFrame({"x": range(100, 200)}), num_samples=5, random_state=180) + x + 67 167 + 71 171 + 64 164 + 63 163 + 96 196 + + From a list (returns a DataFrame): + >>> sample(range(1000), num_samples=5, random_state=180) + 0 + 270 270 + 908 908 + 109 109 + 331 331 + 978 978 + """ + conditions_ = pd.DataFrame(conditions) + return pd.DataFrame.sample( + conditions_, random_state=random_state, n=num_samples, replace=replace + ) + + +random_sample = sample +"""Alias for `sample`""" diff --git a/src/autora/experimentalist/sampler/random_sampler.py b/src/autora/experimentalist/sampler/random_sampler.py deleted file mode 100644 index 7e28d2c3..00000000 --- a/src/autora/experimentalist/sampler/random_sampler.py +++ /dev/null @@ -1,26 +0,0 @@ -import random -from typing import Iterable, Sequence, Union - -from autora.utils.deprecation import deprecated_alias - - -def random_sample(conditions: Union[Iterable, Sequence], num_samples: int = 1): - """ - Uniform random sampling without replacement from a pool of conditions. 
- Args: - conditions: Pool of conditions - n: number of samples to collect - - Returns: Sampled pool - - """ - - if isinstance(conditions, Iterable): - conditions = list(conditions) - random.shuffle(conditions) - samples = conditions[0:num_samples] - - return samples - - -random_sampler = deprecated_alias(random_sample, "random_sampler") diff --git a/src/autora/state.py b/src/autora/state.py new file mode 100644 index 00000000..ab25baac --- /dev/null +++ b/src/autora/state.py @@ -0,0 +1,1365 @@ +"""Classes to represent cycle state $S$ as $S_n = S_{0} + \\sum_{i=1}^n \\Delta S_{i}$.""" + +from __future__ import annotations + +import inspect +import logging +import warnings +from collections import UserDict +from dataclasses import dataclass, field, fields, is_dataclass, replace +from enum import Enum +from functools import singledispatch, wraps +from typing import ( + Callable, + Generic, + List, + Mapping, + Optional, + Protocol, + Sequence, + TypeVar, + Union, +) + +import numpy as np +import pandas as pd +from sklearn.base import BaseEstimator + +from autora.variable import VariableCollection + +_logger = logging.getLogger(__name__) +T = TypeVar("T") +C = TypeVar("C", covariant=True) + + +class DeltaAddable(Protocol[C]): + """A class which a Delta or other Mapping can be added to, returning the same class""" + + def __add__(self: C, other: Union[Delta, Mapping]) -> C: + ... + + +S = TypeVar("S", bound=DeltaAddable) + + +@dataclass(frozen=True) +class State: + """ + Base object for dataclasses which use the Delta mechanism. + + Examples: + >>> from dataclasses import dataclass, field + >>> from typing import List, Optional + + We define a dataclass where each field (which is going to be delta-ed) has additional + metadata "delta" which describes its delta behaviour. + >>> @dataclass(frozen=True) + ... class ListState(State): + ... l: List = field(default_factory=list, metadata={"delta": "extend"}) + ... 
m: List = field(default_factory=list, metadata={"delta": "replace"}) + + Now we instantiate the dataclass... + >>> l = ListState(l=list("abc"), m=list("xyz")) + >>> l + ListState(l=['a', 'b', 'c'], m=['x', 'y', 'z']) + + ... and can add deltas to it. `l` will be extended: + >>> l + Delta(l=list("def")) + ListState(l=['a', 'b', 'c', 'd', 'e', 'f'], m=['x', 'y', 'z']) + + ... whereas `m` will be replaced: + >>> l + Delta(m=list("uvw")) + ListState(l=['a', 'b', 'c'], m=['u', 'v', 'w']) + + ... they can be chained: + >>> l + Delta(l=list("def")) + Delta(m=list("uvw")) + ListState(l=['a', 'b', 'c', 'd', 'e', 'f'], m=['u', 'v', 'w']) + + ... and we update multiple fields with one Delta: + >>> l + Delta(l=list("ghi"), m=list("rst")) + ListState(l=['a', 'b', 'c', 'g', 'h', 'i'], m=['r', 's', 't']) + + A non-existent field will be ignored: + >>> l + Delta(o="not a field") + ListState(l=['a', 'b', 'c'], m=['x', 'y', 'z']) + + ... but will trigger a warning: + >>> with warnings.catch_warnings(record=True) as w: + ... _ = l + Delta(o="not a field") + ... print(w[0].message) # doctest: +NORMALIZE_WHITESPACE + These fields: ['o'] could not be used to update ListState, + which has these fields & aliases: ['l', 'm'] + + We can also use the `.update` method to do the same thing: + >>> l.update(l=list("ghi"), m=list("rst")) + ListState(l=['a', 'b', 'c', 'g', 'h', 'i'], m=['r', 's', 't']) + + We can also define fields which `append` the last result: + >>> @dataclass(frozen=True) + ... class AppendState(State): + ... n: List = field(default_factory=list, metadata={"delta": "append"}) + + >>> m = AppendState(n=list("ɑβɣ")) + >>> m + AppendState(n=['ɑ', 'β', 'ɣ']) + + `n` will be appended: + >>> m + Delta(n="∂") + AppendState(n=['ɑ', 'β', 'ɣ', '∂']) + + The metadata key "converter" is used to coerce types (inspired by + [PEP 712](https://peps.python.org/pep-0712/)): + >>> @dataclass(frozen=True) + ... class CoerceStateList(State): + ... 
o: Optional[List] = field(default=None, metadata={"delta": "replace"}) + ... p: List = field(default_factory=list, metadata={"delta": "replace", + ... "converter": list}) + + >>> r = CoerceStateList() + + If there is no `metadata["converter"]` set for a field, no coercion occurs + >>> r + Delta(o="not a list") + CoerceStateList(o='not a list', p=[]) + + If there is a `metadata["converter"]` set for a field, the data are coerced: + >>> r + Delta(p="not a list") + CoerceStateList(o=None, p=['n', 'o', 't', ' ', 'a', ' ', 'l', 'i', 's', 't']) + + If the input data are of the correct type, they are returned unaltered: + >>> r + Delta(p=["a", "list"]) + CoerceStateList(o=None, p=['a', 'list']) + + With a converter, inputs are converted to the type output by the converter: + >>> @dataclass(frozen=True) + ... class CoerceStateDataFrame(State): + ... q: pd.DataFrame = field(default_factory=pd.DataFrame, + ... metadata={"delta": "replace", + ... "converter": pd.DataFrame}) + + If the type is already correct, the object is passed to the converter, + but should be returned unchanged: + >>> s = CoerceStateDataFrame() + >>> (s + Delta(q=pd.DataFrame([("a",1,"alpha"), ("b",2,"beta")], columns=list("xyz")))).q + x y z + 0 a 1 alpha + 1 b 2 beta + + If the type is not correct, the object is converted if possible. For a dataframe, + we can convert records: + >>> (s + Delta(q=[("a",1,"alpha"), ("b",2,"beta")])).q + 0 1 2 + 0 a 1 alpha + 1 b 2 beta + + ... or an array: + >>> (s + Delta(q=np.linspace([1, 2], [10, 15], 3))).q + 0 1 + 0 1.0 2.0 + 1 5.5 8.5 + 2 10.0 15.0 + + ... or a dictionary: + >>> (s + Delta(q={"a": [1,2,3], "b": [4,5,6]})).q + a b + 0 1 4 + 1 2 5 + 2 3 6 + + ... or a list: + >>> (s + Delta(q=[11, 12, 13])).q + 0 + 0 11 + 1 12 + 2 13 + + ... but not, for instance, a string: + >>> (s + Delta(q="not compatible with pd.DataFrame")).q + Traceback (most recent call last): + ... + ValueError: DataFrame constructor not properly called! 
+ + Without a converter: + >>> @dataclass(frozen=True) + ... class CoerceStateDataFrameNoConverter(State): + ... r: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "replace"}) + + ... there is no coercion – the object is passed unchanged + >>> t = CoerceStateDataFrameNoConverter() + >>> (t + Delta(r=np.linspace([1, 2], [10, 15], 3))).r + array([[ 1. , 2. ], + [ 5.5, 8.5], + [10. , 15. ]]) + + + A converter can cast from a DataFrame to a np.ndarray (with a single datatype), + for instance: + >>> @dataclass(frozen=True) + ... class CoerceStateArray(State): + ... r: Optional[np.ndarray] = field(default=None, + ... metadata={"delta": "replace", + ... "converter": np.asarray}) + + Here we pass a dataframe, but expect a numpy array: + >>> (CoerceStateArray() + Delta(r=pd.DataFrame([("a",1), ("b",2)], columns=list("xy")))).r + array([['a', 1], + ['b', 2]], dtype=object) + + We can define aliases which can transform between different potential field + names. + + >>> @dataclass(frozen=True) + ... class FieldAliasState(State): + ... things: List[str] = field( + ... default_factory=list, + ... metadata={"delta": "extend", + ... "aliases": {"thing": lambda m: [m]}} + ... ) + + In the "normal" case, the Delta object is expected to include a list of data in the + correct format which is used to extend the object: + >>> FieldAliasState(things=["0"]) + Delta(things=["1", "2"]) + FieldAliasState(things=['0', '1', '2']) + + However, say the standard return from a step in AER is a single `thing`, rather than a + sequence of them: + >>> FieldAliasState(things=["0"]) + Delta(thing="1") + FieldAliasState(things=['0', '1']) + + + If a cycle function relies on the existence of the `s.thing` as a property of your state + `s`, rather than accessing `s.things[-1]`, then you could additionally define a `property`: + + >>> class FieldAliasStateWithProperty(FieldAliasState): # inherit from FieldAliasState + ... @property + ... def thing(self): + ... 
return self.things[-1] + + Now you can access both `s.things` and `s.thing` as required by your code. The State only + shows `things` in the string representation... + >>> u = FieldAliasStateWithProperty(things=["0"]) + Delta(thing="1") + >>> u + FieldAliasStateWithProperty(things=['0', '1']) + + ... and exposes `things` as an attribute: + >>> u.things + ['0', '1'] + + ... but also exposes `thing`, always returning the last value. + >>> u.thing + '1' + + """ + + def __add__(self, other: Union[Delta, Mapping]): + updates = dict() + other_fields_unused = list(other.keys()) + for self_field in fields(self): + other_value, key = _get_value(self_field, other) + if other_value is None: + continue + other_fields_unused.remove(key) + + self_field_key = self_field.name + self_value = getattr(self, self_field_key) + delta_behavior = self_field.metadata["delta"] + + if (constructor := self_field.metadata.get("converter", None)) is not None: + coerced_other_value = constructor(other_value) + else: + coerced_other_value = other_value + + if delta_behavior == "extend": + extended_value = _extend(self_value, coerced_other_value) + updates[self_field_key] = extended_value + elif delta_behavior == "append": + appended_value = _append(self_value, coerced_other_value) + updates[self_field_key] = appended_value + elif delta_behavior == "replace": + updates[self_field_key] = coerced_other_value + else: + raise NotImplementedError( + "delta_behaviour=`%s` not implemented" % delta_behavior + ) + + if len(other_fields_unused) > 0: + warnings.warn( + "These fields: %s could not be used to update %s, " + "which has these fields & aliases: %s" + % ( + other_fields_unused, + type(self).__name__, + _get_field_names_and_aliases(self), + ), + ) + + new = replace(self, **updates) + return new + + def update(self, **kwargs): + """ + Return a new version of the State with values updated. + + This is identical to adding a `Delta`. 
+ + If you need to replace values, ignoring the State value aggregation rules, + use `dataclasses.replace` instead. + """ + return self + Delta(**kwargs) + + +def _get_value(f, other: Union[Delta, Mapping]): + """ + Given a `State`'s `dataclasses.field` f, get a value from `other` and report its name. + + Returns: a tuple (the value, the key associated with that value) + + Examples: + >>> from dataclasses import field, dataclass, fields + >>> @dataclass + ... class Example: + ... a: int = field() # base case + ... b: List[int] = field(metadata={"aliases": {"ba": lambda b: [b]}}) # Single alias + ... c: List[int] = field(metadata={"aliases": { + ... "ca": lambda x: x, # pass the value unchanged + ... "cb": lambda x: [x] # wrap the value in a list + ... }}) # Multiple alias + + For a field with no aliases, we retrieve values with the base name: + >>> f_a = fields(Example)[0] + >>> _get_value(f_a, Delta(a=1)) + (1, 'a') + + ... and only the base name: + >>> print(_get_value(f_a, Delta(b=2))) # no match for b + (None, None) + + Any other names are unimportant: + >>> _get_value(f_a, Delta(b=2, a=1)) + (1, 'a') + + For fields with an alias, we retrieve values with the base name: + >>> f_b = fields(Example)[1] + >>> _get_value(f_b, Delta(b=[2])) + ([2], 'b') + + ... or for the alias name, transformed by the alias lambda function: + >>> _get_value(f_b, Delta(ba=21)) + ([21], 'ba') + + We preferentially get the base name, and then any aliases: + >>> _get_value(f_b, Delta(b=2, ba=21)) + (2, 'b') + + ... , regardless of their order in the `Delta` object: + >>> _get_value(f_b, Delta(ba=21, b=2)) + (2, 'b') + + Other names are ignored: + >>> _get_value(f_b, Delta(a=1)) + (None, None) + + and the order of other names is unimportant: + >>> _get_value(f_b, Delta(a=1, b=2)) + (2, 'b') + + For fields with multiple aliases, we retrieve values with the base name: + >>> f_c = fields(Example)[2] + >>> _get_value(f_c, Delta(c=[3])) + ([3], 'c') + + ... 
for any alias: + >>> _get_value(f_c, Delta(ca=31)) + (31, 'ca') + + ... transformed by the alias lambda function : + >>> _get_value(f_c, Delta(cb=32)) + ([32], 'cb') + + ... and ignoring any other names: + >>> print(_get_value(f_c, Delta(a=1))) + (None, None) + + ... preferentially in the order base name, 1st alias, 2nd alias, ... nth alias: + >>> _get_value(f_c, Delta(c=3, ca=31, cb=32)) + (3, 'c') + + >>> _get_value(f_c, Delta(ca=31, cb=32)) + (31, 'ca') + + >>> _get_value(f_c, Delta(cb=32)) + ([32], 'cb') + + >>> print(_get_value(f_c, Delta())) + (None, None) + + This works with dict objects: + >>> _get_value(f_a, dict(a=13)) + (13, 'a') + + ... with multiple keys: + >>> _get_value(f_b, dict(a=13, b=24, c=35)) + (24, 'b') + + ... and with aliases: + >>> _get_value(f_b, dict(ba=222)) + ([222], 'ba') + + This works with UserDicts: + >>> class MyDelta(UserDict): + ... pass + + >>> _get_value(f_a, MyDelta(a=14)) + (14, 'a') + + ... with multiple keys: + >>> _get_value(f_b, MyDelta(a=1, b=4, c=9)) + (4, 'b') + + ... and with aliases: + >>> _get_value(f_b, MyDelta(ba=234)) + ([234], 'ba') + + """ + + key = f.name + aliases = f.metadata.get("aliases", {}) + + value, used_key = None, None + + if key in other.keys(): + value = other[key] + used_key = key + elif aliases: # ... is not an empty dict + for alias_key, wrapping_function in aliases.items(): + if alias_key in other: + value = wrapping_function(other[alias_key]) + used_key = alias_key + break # we only evaluate the first match + + return value, used_key + + +def _get_field_names_and_aliases(s: State): + """ + Get a list of field names and their aliases from a State object + + Args: + s: a State object + + Returns: a list of field names and their aliases on `s` + + Examples: + >>> from dataclasses import field + >>> @dataclass(frozen=True) + ... class SomeState(State): + ... l: List = field(default_factory=list) + ... 
m: List = field(default_factory=list) + >>> _get_field_names_and_aliases(SomeState()) + ['l', 'm'] + + >>> @dataclass(frozen=True) + ... class SomeStateWithAliases(State): + ... l: List = field(default_factory=list, metadata={"aliases": {"l1": None, "l2": None}}) + ... m: List = field(default_factory=list, metadata={"aliases": {"m1": None}}) + >>> _get_field_names_and_aliases(SomeStateWithAliases()) + ['l', 'l1', 'l2', 'm', 'm1'] + + """ + result = [] + + for f in fields(s): + name = f.name + result.append(name) + + aliases = f.metadata.get("aliases", {}) + result.extend(aliases) + + return result + + +class Delta(UserDict, Generic[S]): + """ + Represents a delta where the base object determines the extension behavior. + + Examples: + >>> from dataclasses import dataclass + + First we define the dataclass to act as the basis: + >>> from typing import Optional, List + >>> @dataclass(frozen=True) + ... class ListState: + ... l: Optional[List] = None + ... m: Optional[List] = None + ... + """ + + pass + + +Result = Delta +"""`Result` is an alias for `Delta`.""" + + +@singledispatch +def _extend(a, b): + """ + Function to extend supported datatypes. + + """ + raise NotImplementedError("`_extend` not implemented for %s, %s" % (a, b)) + + +@_extend.register(type(None)) +def _extend_none(_, b): + """ + Implementation of `_extend` to support None-types. + + Examples: + >>> _extend(None, []) + [] + + >>> _extend(None, [3]) + [3] + """ + return b + + +@_extend.register(list) +def _extend_list(a, b): + """ + Implementation of `_extend` to support Lists. + + Examples: + >>> _extend([], []) + [] + + >>> _extend([1,2], [3]) + [1, 2, 3] + """ + return a + b + + +@_extend.register(pd.DataFrame) +def _extend_pd_dataframe(a, b): + """ + Implementation of `_extend` to support DataFrames. 
+ + Examples: + >>> _extend(pd.DataFrame({"a": []}), pd.DataFrame({"a": []})) + Empty DataFrame + Columns: [a] + Index: [] + + >>> _extend(pd.DataFrame({"a": [1,2,3]}), pd.DataFrame({"a": [4,5,6]})) + a + 0 1 + 1 2 + 2 3 + 3 4 + 4 5 + 5 6 + """ + return pd.concat((a, b), ignore_index=True) + + +@_extend.register(np.ndarray) +def _extend_np_ndarray(a, b): + """ + Implementation of `_extend` to support Numpy ndarrays. + + Examples: + >>> _extend(np.array([(1,2,3), (4,5,6)]), np.array([(7,8,9)])) + array([[1, 2, 3], + [4, 5, 6], + [7, 8, 9]]) + """ + return np.row_stack([a, b]) + + +@_extend.register(dict) +def _extend_dict(a, b): + """ + Implementation of `_extend` to support Dictionaries. + + Examples: + >>> _extend({"a": "cats"}, {"b": "dogs"}) + {'a': 'cats', 'b': 'dogs'} + """ + return dict(a, **b) + + +def _append(a: List[T], b: T) -> List[T]: + """ + Function to create a new list with an item appended to it. + + Examples: + Given a starting list `a_`: + >>> a_ = [1, 2, 3] + + ... we can append a value: + >>> _append(a_, 4) + [1, 2, 3, 4] + + `a_` is unchanged + >>> a_ == [1, 2, 3] + True + + Why not just use `list.append`? `list.append` mutates `a` in place, which we can't allow + in the AER cycle – parts of the cycle rely on purely functional code which doesn't + (accidentally or intentionally) manipulate existing data. + >>> list.append(a_, 4) # not what we want + >>> a_ + [1, 2, 3, 4] + """ + return a + [b] + + +def inputs_from_state(f): + """Decorator to make target `f` into a function on a `State` and `**kwargs`. + + This wrapper makes it easier to pass arguments to a function from a State. + + It was inspired by the pytest "fixtures" mechanism. + + Args: + f: a function with arguments that could be fields on a `State` + and that returns a `Delta`. + + Returns: a version of `f` which takes and returns `State` objects. 
+ + Examples: + >>> from dataclasses import dataclass, field + >>> import pandas as pd + >>> from typing import List, Optional + + The `State` it operates on needs to have the metadata described in the state module: + >>> @dataclass(frozen=True) + ... class U(State): + ... conditions: List[int] = field(metadata={"delta": "replace"}) + + We indicate the inputs required by the parameter names. + The output must be (compatible with) a `Delta` object. + >>> @inputs_from_state + ... def experimentalist(conditions): + ... new_conditions = [c + 10 for c in conditions] + ... return new_conditions + + >>> experimentalist(U(conditions=[1,2,3,4])) + [11, 12, 13, 14] + + >>> experimentalist(U(conditions=[101,102,103,104])) + [111, 112, 113, 114] + + A dictionary can be returned and used: + >>> @inputs_from_state + ... def returns_a_dictionary(conditions): + ... new_conditions = [c + 10 for c in conditions] + ... return {"conditions": new_conditions} + >>> returns_a_dictionary(U(conditions=[2])) + {'conditions': [12]} + + >>> from autora.variable import VariableCollection, Variable + >>> from sklearn.base import BaseEstimator + >>> from sklearn.linear_model import LinearRegression + + >>> @inputs_from_state + ... def theorist(experiment_data: pd.DataFrame, variables: VariableCollection, **kwargs): + ... ivs = [vi.name for vi in variables.independent_variables] + ... dvs = [vi.name for vi in variables.dependent_variables] + ... X, y = experiment_data[ivs], experiment_data[dvs] + ... model = LinearRegression(fit_intercept=True).set_params(**kwargs).fit(X, y) + ... return model + + >>> @dataclass(frozen=True) + ... class V(State): + ... variables: VariableCollection # field(metadata={"delta":... }) omitted ∴ immutable + ... experiment_data: pd.DataFrame = field(metadata={"delta": "extend"}) + ... model: Optional[BaseEstimator] = field(metadata={"delta": "replace"}, default=None) + + >>> v = V( + ... variables=VariableCollection(independent_variables=[Variable("x")], + ... 
dependent_variables=[Variable("y")]), + ... experiment_data=pd.DataFrame({"x": [0,1,2,3,4], "y": [2,3,4,5,6]}) + ... ) + >>> model = theorist(v) + >>> model.coef_, model.intercept_ + (array([[1.]]), array([2.])) + + Arguments from the state can be overridden by passing them in as keyword arguments (kwargs): + >>> theorist(v, experiment_data=pd.DataFrame({"x": [0,1,2,3], "y": [12,13,14,15]}))\\ + ... .intercept_ + array([12.]) + + ... and other arguments supported by the inner function can also be passed + (if and only if the inner function allows for and handles `**kwargs` arguments alongside + the values from the state). + >>> theorist(v, fit_intercept=False).intercept_ + 0.0 + + Any parameters not provided by the state must be provided by default values or by the + caller. If the default is specified: + >>> @inputs_from_state + ... def experimentalist(conditions, offset=25): + ... new_conditions = [c + offset for c in conditions] + ... return new_conditions + + ... then it need not be passed. + >>> experimentalist(U(conditions=[1,2,3,4])) + [26, 27, 28, 29] + + If a default isn't specified: + >>> @inputs_from_state + ... def experimentalist(conditions, offset): + ... new_conditions = [c + offset for c in conditions] + ... return new_conditions + + ... then calling the experimentalist without it will throw an error: + >>> experimentalist(U(conditions=[1,2,3,4])) + Traceback (most recent call last): + ... + TypeError: experimentalist() missing 1 required positional argument: 'offset' + + ... which can be fixed by passing the argument as a keyword to the wrapped function. + >>> experimentalist(U(conditions=[1,2,3,4]), offset=2) + [3, 4, 5, 6] + + The state itself is passed through if the inner function requests the `state`: + >>> @inputs_from_state + ... def function_which_needs_whole_state(state, conditions): + ... print("Doing something on: ", state) + ... new_conditions = [c + 2 for c in conditions] + ... 
return new_conditions + >>> function_which_needs_whole_state(U(conditions=[1,2,3,4])) + Doing something on: U(conditions=[1, 2, 3, 4]) + [3, 4, 5, 6] + + """ + # Get the set of parameter names from function f's signature + parameters_ = set(inspect.signature(f).parameters.keys()) + + @wraps(f) + def _f(state_: S, /, **kwargs) -> S: + # Get the parameters needed which are available from the state_. + # All others must be provided as kwargs or default values on f. + assert is_dataclass(state_) + from_state = parameters_.intersection({i.name for i in fields(state_)}) + arguments_from_state = {k: getattr(state_, k) for k in from_state} + if "state" in parameters_: + arguments_from_state["state"] = state_ + arguments = dict(arguments_from_state, **kwargs) + result = f(**arguments) + return result + + return _f + + +def outputs_to_delta(*output: str): + """ + Decorator factory to wrap outputs from a function as Deltas. + + Examples: + >>> @outputs_to_delta("conditions") + ... def add_five(x): + ... return [xi + 5 for xi in x] + + >>> add_five([1, 2, 3]) + {'conditions': [6, 7, 8]} + + >>> @outputs_to_delta("c") + ... def add_six(conditions): + ... return [c + 6 for c in conditions] + + >>> add_six([1, 2, 3]) + {'c': [7, 8, 9]} + + >>> @outputs_to_delta("+1", "-1") + ... def plus_minus_1(x): + ... a = [xi + 1 for xi in x] + ... b = [xi - 1 for xi in x] + ... return a, b + + >>> plus_minus_1([1, 2, 3]) + {'+1': [2, 3, 4], '-1': [0, 1, 2]} + + + If the number of values returned does not match the number of output names, an error may be raised. + If multiple outputs are expected, but only a single output is returned, an `AssertionError` is raised: + >>> @outputs_to_delta("1", "2") + ... def returns_single_result_when_more_expected(): + ... return "a" + >>> returns_single_result_when_more_expected() # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS + Traceback (most recent call last): + ... + AssertionError: function `` + has to return multiple values to match `('1', '2')`. Got `a` instead. 
+ + If multiple outputs are expected, but the wrong number is returned, an `AssertionError` is raised: + >>> @outputs_to_delta("1", "2", "3") + ... def returns_wrong_number_of_results(): + ... return "a", "b" + >>> returns_wrong_number_of_results() # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS + Traceback (most recent call last): + ... + AssertionError: function `` + has to return exactly `3` values to match `('1', '2', '3')`. Got `('a', 'b')` instead. + + However, if a single output is expected, and multiple are returned, these are treated as + a single object and no error occurs: + >>> @outputs_to_delta("foo") + ... def returns_a_tuple(): + ... return "a", "b", "c" + >>> returns_a_tuple() + {'foo': ('a', 'b', 'c')} + + If we fail to specify output names, an error is raised immediately. + >>> @outputs_to_delta() + ... def decorator_missing_arguments(): + ... return "a", "b", "c" + Traceback (most recent call last): + ... + ValueError: `output` names must be specified. + + """ + + def decorator(f): + if len(output) == 0: + raise ValueError("`output` names must be specified.") + + elif len(output) == 1: + + @wraps(f) + def inner(*args, **kwargs): + result = f(*args, **kwargs) + delta = Delta(**{output[0]: result}) + return delta + + else: + + @wraps(f) + def inner(*args, **kwargs): + result = f(*args, **kwargs) + assert isinstance(result, tuple), ( + "function `%s` has to return multiple values " + "to match `%s`. Got `%s` instead." % (f, output, result) + ) + assert len(output) == len(result), ( + "function `%s` has to return " + "exactly `%s` values " + "to match `%s`. " + "Got `%s` instead." + "" % (f, len(output), output, result) + ) + delta = Delta(**dict(zip(output, result))) + return delta + + return inner + + return decorator + + +def delta_to_state(f): + """Decorator to make `f` which takes a `State` and returns a `Delta` return an updated `State`. + + This wrapper handles adding a returned Delta to an input State object. 
+ + Args: + f: the function which returns a `Delta` object + + Returns: the function modified to return a State object + + Examples: + >>> from dataclasses import dataclass, field + >>> import pandas as pd + >>> from typing import List, Optional + + The `State` it operates on needs to have the metadata described in the state module: + >>> @dataclass(frozen=True) + ... class U(State): + ... conditions: List[int] = field(metadata={"delta": "replace"}) + + We indicate the inputs required by the parameter names. + The output must be (compatible with) a `Delta` object. + >>> @delta_to_state + ... @inputs_from_state + ... def experimentalist(conditions): + ... new_conditions = [c + 10 for c in conditions] + ... return Delta(conditions=new_conditions) + + >>> experimentalist(U(conditions=[1,2,3,4])) + U(conditions=[11, 12, 13, 14]) + + >>> experimentalist(U(conditions=[101,102,103,104])) + U(conditions=[111, 112, 113, 114]) + + If the output of the function is not a `Delta` object (or something compatible with its + interface), then an error is thrown. + >>> @delta_to_state + ... @inputs_from_state + ... def returns_bare_conditions(conditions): + ... new_conditions = [c + 10 for c in conditions] + ... return new_conditions + + >>> returns_bare_conditions(U(conditions=[1])) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE + Traceback (most recent call last): + ... + AssertionError: Output of must be a `Delta`, + `UserDict`, or `dict`. + + A dictionary can be returned and used: + >>> @delta_to_state + ... @inputs_from_state + ... def returns_a_dictionary(conditions): + ... new_conditions = [c + 10 for c in conditions] + ... return {"conditions": new_conditions} + >>> returns_a_dictionary(U(conditions=[2])) + U(conditions=[12]) + + ... as can an object which subclasses UserDict (like `Delta`) + >>> class MyDelta(UserDict): + ... pass + >>> @delta_to_state + ... @inputs_from_state + ... def returns_a_userdict(conditions): + ... new_conditions = [c + 10 for c in conditions] + ... 
return MyDelta(conditions=new_conditions) + >>> returns_a_userdict(U(conditions=[3])) + U(conditions=[13]) + + We recommend using the `Delta` object rather than a `UserDict` or `dict` as its + functionality may be expanded in future. + + >>> from autora.variable import VariableCollection, Variable + >>> from sklearn.base import BaseEstimator + >>> from sklearn.linear_model import LinearRegression + + >>> @delta_to_state + ... @inputs_from_state + ... def theorist(experiment_data: pd.DataFrame, variables: VariableCollection, **kwargs): + ... ivs = [vi.name for vi in variables.independent_variables] + ... dvs = [vi.name for vi in variables.dependent_variables] + ... X, y = experiment_data[ivs], experiment_data[dvs] + ... new_model = LinearRegression(fit_intercept=True).set_params(**kwargs).fit(X, y) + ... return Delta(model=new_model) + + >>> @dataclass(frozen=True) + ... class V(State): + ... variables: VariableCollection # field(metadata={"delta":... }) omitted ∴ immutable + ... experiment_data: pd.DataFrame = field(metadata={"delta": "extend"}) + ... model: Optional[BaseEstimator] = field(metadata={"delta": "replace"}, default=None) + + >>> v = V( + ... variables=VariableCollection(independent_variables=[Variable("x")], + ... dependent_variables=[Variable("y")]), + ... experiment_data=pd.DataFrame({"x": [0,1,2,3,4], "y": [2,3,4,5,6]}) + ... ) + >>> v_prime = theorist(v) + >>> v_prime.model.coef_, v_prime.model.intercept_ + (array([[1.]]), array([2.])) + + Arguments from the state can be overridden by passing them in as keyword arguments (kwargs): + >>> theorist(v, experiment_data=pd.DataFrame({"x": [0,1,2,3], "y": [12,13,14,15]}))\\ + ... .model.intercept_ + array([12.]) + + ... and other arguments supported by the inner function can also be passed + (if and only if the inner function allows for and handles `**kwargs` arguments alongside + the values from the state). 
+ >>> theorist(v, fit_intercept=False).model.intercept_ + 0.0 + + Any parameters not provided by the state must be provided by default values or by the + caller. If the default is specified: + >>> @delta_to_state + ... @inputs_from_state + ... def experimentalist(conditions, offset=25): + ... new_conditions = [c + offset for c in conditions] + ... return Delta(conditions=new_conditions) + + ... then it need not be passed. + >>> experimentalist(U(conditions=[1,2,3,4])) + U(conditions=[26, 27, 28, 29]) + + If a default isn't specified: + >>> @delta_to_state + ... @inputs_from_state + ... def experimentalist(conditions, offset): + ... new_conditions = [c + offset for c in conditions] + ... return Delta(conditions=new_conditions) + + ... then calling the experimentalist without it will throw an error: + >>> experimentalist(U(conditions=[1,2,3,4])) + Traceback (most recent call last): + ... + TypeError: experimentalist() missing 1 required positional argument: 'offset' + + ... which can be fixed by passing the argument as a keyword to the wrapped function. + >>> experimentalist(U(conditions=[1,2,3,4]), offset=2) + U(conditions=[3, 4, 5, 6]) + + The state itself is passed through if the inner function requests the `state`: + >>> @delta_to_state + ... @inputs_from_state + ... def function_which_needs_whole_state(state, conditions): + ... print("Doing something on: ", state) + ... new_conditions = [c + 2 for c in conditions] + ... return Delta(conditions=new_conditions) + >>> function_which_needs_whole_state(U(conditions=[1,2,3,4])) + Doing something on: U(conditions=[1, 2, 3, 4]) + U(conditions=[3, 4, 5, 6]) + + """ + + @wraps(f) + def _f(state_: S, **kwargs) -> S: + delta = f(state_, **kwargs) + assert isinstance(delta, Mapping), ( + "Output of %s must be a `Delta`, `UserDict`, " "or `dict`." 
% f + ) + new_state = state_ + delta + return new_state + + return _f + + +def on_state( + function: Optional[Callable] = None, output: Optional[Sequence[str]] = None +): + """Decorator (factory) to make target `function` into a function on a `State` and `**kwargs`. + + This combines the functionality of `outputs_to_delta` and `inputs_from_state` + + Args: + function: the function to be wrapped + output: list specifying State field names for the return values of `function` + + Returns: + + Examples: + >>> from dataclasses import dataclass, field + >>> import pandas as pd + >>> from typing import List, Optional + + The `State` it operates on needs to have the metadata described in the state module: + >>> @dataclass(frozen=True) + ... class W(State): + ... conditions: List[int] = field(metadata={"delta": "replace"}) + + We indicate the inputs required by the parameter names. + >>> def add_ten(conditions): + ... return [c + 10 for c in conditions] + >>> experimentalist = on_state(function=add_ten, output=["conditions"]) + + >>> experimentalist(W(conditions=[1,2,3,4])) + W(conditions=[11, 12, 13, 14]) + + You can wrap functions which return a Delta object natively, by omitting the `output` + argument: + >>> @on_state() + ... def add_five(conditions): + ... return Delta(conditions=[c + 5 for c in conditions]) + + >>> add_five(W(conditions=[1, 2, 3, 4])) + W(conditions=[6, 7, 8, 9]) + + If you fail to declare outputs for a function which doesn't return a Delta: + >>> @on_state() + ... def missing_output_param(conditions): + ... return [c + 5 for c in conditions] + + ... an exception is raised: + >>> missing_output_param(W(conditions=[1])) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE + Traceback (most recent call last): + ... + AssertionError: Output of must be a `Delta`, + `UserDict`, or `dict`. + + You can use the @on_state(output=[...]) as a decorator: + >>> @on_state(output=["conditions"]) + ... def add_six(conditions): + ... 
return [c + 6 for c in conditions] + + >>> add_six(W(conditions=[1, 2, 3, 4])) + W(conditions=[7, 8, 9, 10]) + + """ + + def decorator(f): + f_ = f + if output is not None: + f_ = outputs_to_delta(*output)(f_) + f_ = inputs_from_state(f_) + f_ = delta_to_state(f_) + return f_ + + if function is None: + return decorator + else: + return decorator(function) + + +StateFunction = Callable[[State], State] + + +class StandardStateVariables(Enum): + CONDITIONS = "conditions" + EXPERIMENT_DATA = "experiment_data" + MODELS = "models" + VARIABLES = "variables" + + +@dataclass(frozen=True) +class StandardState(State): + """ + Examples: + The state can be initialized empty: + >>> from autora.variable import VariableCollection, Variable + >>> s = StandardState() + >>> s + StandardState(variables=None, conditions=None, experiment_data=None, models=[]) + + The `variables` can be updated using a `Delta`: + >>> dv1 = Delta(variables=VariableCollection(independent_variables=[Variable("1")])) + >>> s + dv1 + StandardState(variables=VariableCollection(independent_variables=[Variable(name='1',...) + + ... and are replaced by each `Delta`: + >>> dv2 = Delta(variables=VariableCollection(independent_variables=[Variable("2")])) + >>> s + dv1 + dv2 + StandardState(variables=VariableCollection(independent_variables=[Variable(name='2',...) + + The `conditions` can be updated using a `Delta`: + >>> dc1 = Delta(conditions=pd.DataFrame({"x": [1, 2, 3]})) + >>> (s + dc1).conditions + x + 0 1 + 1 2 + 2 3 + + ... and are replaced by each `Delta`: + >>> dc2 = Delta(conditions=pd.DataFrame({"x": [4, 5]})) + >>> (s + dc1 + dc2).conditions + x + 0 4 + 1 5 + + Datatypes other than `pd.DataFrame` will be coerced into a `DataFrame` if possible. + >>> import numpy as np + >>> dc3 = Delta(conditions=np.core.records.fromrecords([(8, "h"), (9, "i")], names="n,c")) + >>> (s + dc3).conditions + n c + 0 8 h + 1 9 i + + If they are passed without column names, no column names are inferred. 
+ This is to ensure that accidental mislabeling of columns cannot occur. + Column names should usually be provided. + >>> dc4 = Delta(conditions=[(6,), (7,)]) + >>> (s + dc4).conditions + 0 + 0 6 + 1 7 + + Datatypes which are incompatible with a pd.DataFrame will throw an error: + >>> s + Delta(conditions="not compatible with pd.DataFrame") + Traceback (most recent call last): + ... + ValueError: ... + + Experiment data can be updated using a Delta: + >>> ded1 = Delta(experiment_data=pd.DataFrame({"x": [1,2,3], "y": ["a", "b", "c"]})) + >>> (s + ded1).experiment_data + x y + 0 1 a + 1 2 b + 2 3 c + + ... and are extended with each Delta: + >>> ded2 = Delta(experiment_data=pd.DataFrame({"x": [4, 5, 6], "y": ["d", "e", "f"]})) + >>> (s + ded1 + ded2).experiment_data + x y + 0 1 a + 1 2 b + 2 3 c + 3 4 d + 4 5 e + 5 6 f + + If they are passed without column names, no column names are inferred. + This is to ensure that accidental mislabeling of columns cannot occur. + >>> ded3 = Delta(experiment_data=pd.DataFrame([(7, "g"), (8, "h")])) + >>> (s + ded3).experiment_data + 0 1 + 0 7 g + 1 8 h + + If there are already data present, the column names must match. + >>> (s + ded2 + ded3).experiment_data + x y 0 1 + 0 4.0 d NaN NaN + 1 5.0 e NaN NaN + 2 6.0 f NaN NaN + 3 NaN NaN 7.0 g + 4 NaN NaN 8.0 h + + `experiment_data` other than `pd.DataFrame` will be coerced into a `DataFrame` if possible. + >>> import numpy as np + >>> ded4 = Delta( + ... experiment_data=np.core.records.fromrecords([(1, "a"), (2, "b")], names=["x", "y"])) + >>> (s + ded4).experiment_data + x y + 0 1 a + 1 2 b + + `experiment_data` which are incompatible with a pd.DataFrame will throw an error: + >>> s + Delta(experiment_data="not compatible with pd.DataFrame") + Traceback (most recent call last): + ... + ValueError: ... 
+ + `models` can be updated using a Delta: + >>> from sklearn.dummy import DummyClassifier + >>> dm1 = Delta(models=[DummyClassifier(constant=1)]) + >>> dm2 = Delta(models=[DummyClassifier(constant=2), DummyClassifier(constant=3)]) + >>> (s + dm1).models + [DummyClassifier(constant=1)] + + >>> (s + dm1 + dm2).models + [DummyClassifier(constant=1), DummyClassifier(constant=2), DummyClassifier(constant=3)] + + The last model is available under the `model` property: + >>> (s + dm1 + dm2).model + DummyClassifier(constant=3) + + If there is no model, `None` is returned: + >>> print(s.model) + None + + `models` can also be updated using a Delta with a single `model`: + >>> dm3 = Delta(model=DummyClassifier(constant=4)) + >>> (s + dm1 + dm3).model + DummyClassifier(constant=4) + + As before, the `models` list is extended: + >>> (s + dm1 + dm3).models + [DummyClassifier(constant=1), DummyClassifier(constant=4)] + + No coercion or validation occurs with `models` or `model`: + >>> (s + dm1 + Delta(model="not a model")).models + [DummyClassifier(constant=1), 'not a model'] + + """ + + variables: Optional[VariableCollection] = field( + default=None, metadata={"delta": "replace"} + ) + conditions: Optional[pd.DataFrame] = field( + default=None, metadata={"delta": "replace", "converter": pd.DataFrame} + ) + experiment_data: Optional[pd.DataFrame] = field( + default=None, metadata={"delta": "extend", "converter": pd.DataFrame} + ) + models: List[BaseEstimator] = field( + default_factory=list, + metadata={"delta": "extend", "aliases": {"model": lambda model: [model]}}, + ) + + @property + def model(self): + """Alias for the last model in the `models`.""" + try: + return self.models[-1] + except IndexError: + return None + + +X = TypeVar("X") +Y = TypeVar("Y") +XY = TypeVar("XY") + + +def estimator_on_state(estimator: BaseEstimator) -> StateFunction: + """ + Convert a scikit-learn compatible estimator into a function on a `State` object. 
+ + Supports passing additional `**kwargs` which are used to update the estimator's params + before fitting. + + Examples: + Initialize a function, `state_fn`, which operates on the state and runs a LinearRegression: + >>> from sklearn.linear_model import LinearRegression + >>> state_fn = estimator_on_state(LinearRegression()) + + Define the state on which to operate (here an instance of the `StandardState`): + >>> from autora.state import StandardState + >>> from autora.variable import Variable, VariableCollection + >>> import pandas as pd + >>> s = StandardState( + ... variables=VariableCollection( + ... independent_variables=[Variable("x")], + ... dependent_variables=[Variable("y")]), + ... experiment_data=pd.DataFrame({"x": [1,2,3], "y":[3,6,9]}) + ... ) + + Run the function, which fits the model and adds the result to the `StandardState`: + >>> state_fn(s).model.coef_ + array([[3.]]) + + """ + + @on_state() + def theorist( + experiment_data: pd.DataFrame, variables: VariableCollection, **kwargs + ): + ivs = [v.name for v in variables.independent_variables] + dvs = [v.name for v in variables.dependent_variables] + X, y = experiment_data[ivs], experiment_data[dvs] + new_model = estimator.set_params(**kwargs).fit(X, y) + return Delta(model=new_model) + + return theorist + + +def experiment_runner_on_state(f: Callable[[X], XY]) -> StateFunction: + """Wrapper for experiment_runner of the form $f(x) \\rightarrow (x,y)$, where `f` + returns both $x$ and $y$ values in a complete dataframe. + + Examples: + The conditions are some x-values in a StandardState object: + >>> from autora.state import StandardState + >>> s = StandardState(conditions=pd.DataFrame({"x": [1, 2, 3]})) + + The function can be defined on a DataFrame, allowing the explicit inclusion of + metadata like column names. + >>> def x_to_xy_fn(c: pd.DataFrame) -> pd.DataFrame: + ... result = c.assign(y=lambda df: 2 * df.x + 1) + ... 
return result + + We apply the wrapped function to `s` and look at the returned experiment_data: + >>> experiment_runner_on_state(x_to_xy_fn)(s).experiment_data + x y + 0 1 3 + 1 2 5 + 2 3 7 + + We can also define functions of several variables: + >>> def xs_to_xy_fn(c: pd.DataFrame) -> pd.DataFrame: + ... result = c.assign(y=c.x0 + c.x1) + ... return result + + With the relevant variables as conditions: + >>> t = StandardState(conditions=pd.DataFrame({"x0": [1, 2, 3], "x1": [10, 20, 30]})) + >>> experiment_runner_on_state(xs_to_xy_fn)(t).experiment_data + x0 x1 y + 0 1 10 11 + 1 2 20 22 + 2 3 30 33 + + """ + + @on_state() + def experiment_runner(conditions: pd.DataFrame, **kwargs): + x = conditions + experiment_data = f(x, **kwargs) + return Delta(experiment_data=experiment_data) + + return experiment_runner diff --git a/src/autora/state/bundled.py b/src/autora/state/bundled.py deleted file mode 100644 index 7a878907..00000000 --- a/src/autora/state/bundled.py +++ /dev/null @@ -1,175 +0,0 @@ -from dataclasses import dataclass, field -from typing import List, Optional - -import pandas as pd -from sklearn.base import BaseEstimator - -from autora.state.delta import State -from autora.variable import VariableCollection - - -@dataclass(frozen=True) -class StandardState(State): - """ - Examples: - The state can be initialized emtpy - >>> from autora.state.delta import Delta - >>> from autora.variable import VariableCollection, Variable - >>> s = StandardState() - >>> s - StandardState(variables=None, conditions=None, experiment_data=None, models=[]) - - The `variables` can be updated using a `Delta`: - >>> dv1 = Delta(variables=VariableCollection(independent_variables=[Variable("1")])) - >>> s + dv1 - StandardState(variables=VariableCollection(independent_variables=[Variable(name='1',...) - - ... 
and are replaced by each `Delta`: - >>> dv2 = Delta(variables=VariableCollection(independent_variables=[Variable("2")])) - >>> s + dv1 + dv2 - StandardState(variables=VariableCollection(independent_variables=[Variable(name='2',...) - - The `conditions` can be updated using a `Delta`: - >>> dc1 = Delta(conditions=pd.DataFrame({"x": [1, 2, 3]})) - >>> (s + dc1).conditions - x - 0 1 - 1 2 - 2 3 - - ... and are replaced by each `Delta`: - >>> dc2 = Delta(conditions=pd.DataFrame({"x": [4, 5]})) - >>> (s + dc1 + dc2).conditions - x - 0 4 - 1 5 - - Datatypes other than `pd.DataFrame` will be coerced into a `DataFrame` if possible. - >>> import numpy as np - >>> dc3 = Delta(conditions=np.core.records.fromrecords([(8, "h"), (9, "i")], names="n,c")) - >>> (s + dc3).conditions - n c - 0 8 h - 1 9 i - - If they are passed without column names, no column names are inferred. - This is to ensure that accidental mislabeling of columns cannot occur. - Column names should usually be provided. - >>> dc4 = Delta(conditions=[(6,), (7,)]) - >>> (s + dc4).conditions - 0 - 0 6 - 1 7 - - Datatypes which are incompatible with a pd.DataFrame will throw an error: - >>> s + Delta(conditions="not compatible with pd.DataFrame") - Traceback (most recent call last): - ... - ValueError: ... - - Experiment data can be updated using a Delta: - >>> ded1 = Delta(experiment_data=pd.DataFrame({"x": [1,2,3], "y": ["a", "b", "c"]})) - >>> (s + ded1).experiment_data - x y - 0 1 a - 1 2 b - 2 3 c - - ... and are extended with each Delta: - >>> ded2 = Delta(experiment_data=pd.DataFrame({"x": [4, 5, 6], "y": ["d", "e", "f"]})) - >>> (s + ded1 + ded2).experiment_data - x y - 0 1 a - 1 2 b - 2 3 c - 3 4 d - 4 5 e - 5 6 f - - If they are passed without column names, no column names are inferred. - This is to ensure that accidental mislabeling of columns cannot occur. 
- >>> ded3 = Delta(experiment_data=pd.DataFrame([(7, "g"), (8, "h")])) - >>> (s + ded3).experiment_data - 0 1 - 0 7 g - 1 8 h - - If there are already data present, the column names must match. - >>> (s + ded2 + ded3).experiment_data - x y 0 1 - 0 4.0 d NaN NaN - 1 5.0 e NaN NaN - 2 6.0 f NaN NaN - 3 NaN NaN 7.0 g - 4 NaN NaN 8.0 h - - `experiment_data` other than `pd.DataFrame` will be coerced into a `DataFrame` if possible. - >>> import numpy as np - >>> ded4 = Delta( - ... experiment_data=np.core.records.fromrecords([(1, "a"), (2, "b")], names=["x", "y"])) - >>> (s + ded4).experiment_data - x y - 0 1 a - 1 2 b - - `experiment_data` which are incompatible with a pd.DataFrame will throw an error: - >>> s + Delta(experiment_data="not compatible with pd.DataFrame") - Traceback (most recent call last): - ... - ValueError: ... - - `models` can be updated using a Delta: - >>> from sklearn.dummy import DummyClassifier - >>> dm1 = Delta(models=[DummyClassifier(constant=1)]) - >>> dm2 = Delta(models=[DummyClassifier(constant=2), DummyClassifier(constant=3)]) - >>> (s + dm1).models - [DummyClassifier(constant=1)] - - >>> (s + dm1 + dm2).models - [DummyClassifier(constant=1), DummyClassifier(constant=2), DummyClassifier(constant=3)] - - The last model is available under the `model` property: - >>> (s + dm1 + dm2).model - DummyClassifier(constant=3) - - If there is no model, `None` is returned: - >>> print(s.model) - None - - `models` can also be updated using a Delta with a single `model`: - >>> dm3 = Delta(model=DummyClassifier(constant=4)) - >>> (s + dm1 + dm3).model - DummyClassifier(constant=4) - - As before, the `models` list is extended: - >>> (s + dm1 + dm3).models - [DummyClassifier(constant=1), DummyClassifier(constant=4)] - - No coercion or validation occurs with `models` or `model`: - >>> (s + dm1 + Delta(model="not a model")).models - [DummyClassifier(constant=1), 'not a model'] - - - """ - - variables: Optional[VariableCollection] = field( - default=None, 
metadata={"delta": "replace"} - ) - conditions: Optional[pd.DataFrame] = field( - default=None, metadata={"delta": "replace", "converter": pd.DataFrame} - ) - experiment_data: Optional[pd.DataFrame] = field( - default=None, metadata={"delta": "extend", "converter": pd.DataFrame} - ) - models: List[BaseEstimator] = field( - default_factory=list, - metadata={"delta": "extend", "aliases": {"model": lambda model: [model]}}, - ) - - @property - def model(self): - """Alias for the last model in the `models`.""" - try: - return self.models[-1] - except IndexError: - return None diff --git a/src/autora/state/delta.py b/src/autora/state/delta.py deleted file mode 100644 index 3763c7d6..00000000 --- a/src/autora/state/delta.py +++ /dev/null @@ -1,619 +0,0 @@ -"""Classes to represent cycle state $S$ as $S_n = S_{0} + \\sum_{i=1}^n \\Delta S_{i}$.""" -from __future__ import annotations - -import dataclasses -import inspect -import logging -from collections import UserDict -from dataclasses import dataclass, fields, replace -from functools import singledispatch, wraps -from typing import Generic, List, TypeVar - -import numpy as np -import pandas as pd - -_logger = logging.getLogger(__name__) -S = TypeVar("S") -T = TypeVar("T") - - -@dataclass(frozen=True) -class State: - """ - Base object for dataclasses which use the Delta mechanism. - - Examples: - >>> from dataclasses import dataclass, field - >>> from typing import List, Optional - - We define a dataclass where each field (which is going to be delta-ed) has additional - metadata "delta" which describes its delta behaviour. - >>> @dataclass(frozen=True) - ... class ListState(State): - ... l: List = field(default_factory=list, metadata={"delta": "extend"}) - ... m: List = field(default_factory=list, metadata={"delta": "replace"}) - - Now we instantiate the dataclass... - >>> l = ListState(l=list("abc"), m=list("xyz")) - >>> l - ListState(l=['a', 'b', 'c'], m=['x', 'y', 'z']) - - ... and can add deltas to it. 
`l` will be extended: - >>> l + Delta(l=list("def")) - ListState(l=['a', 'b', 'c', 'd', 'e', 'f'], m=['x', 'y', 'z']) - - ... wheras `m` will be replaced: - >>> l + Delta(m=list("uvw")) - ListState(l=['a', 'b', 'c'], m=['u', 'v', 'w']) - - ... they can be chained: - >>> l + Delta(l=list("def")) + Delta(m=list("uvw")) - ListState(l=['a', 'b', 'c', 'd', 'e', 'f'], m=['u', 'v', 'w']) - - ... and we update multiple fields with one Delta: - >>> l + Delta(l=list("ghi"), m=list("rst")) - ListState(l=['a', 'b', 'c', 'g', 'h', 'i'], m=['r', 's', 't']) - - A non-existent field will be ignored: - >>> l + Delta(o="not a field") - ListState(l=['a', 'b', 'c'], m=['x', 'y', 'z']) - - We can also use the `.update` method to do the same thing: - >>> l.update(l=list("ghi"), m=list("rst")) - ListState(l=['a', 'b', 'c', 'g', 'h', 'i'], m=['r', 's', 't']) - - We can also define fields which `append` the last result: - >>> @dataclass(frozen=True) - ... class AppendState(State): - ... n: List = field(default_factory=list, metadata={"delta": "append"}) - - >>> m = AppendState(n=list("ɑβɣ")) - >>> m - AppendState(n=['ɑ', 'β', 'ɣ']) - - `n` will be appended: - >>> m + Delta(n="∂") - AppendState(n=['ɑ', 'β', 'ɣ', '∂']) - - The metadata key "converter" is used to coerce types (inspired by - [PEP 712](https://peps.python.org/pep-0712/)): - >>> @dataclass(frozen=True) - ... class CoerceStateList(State): - ... o: Optional[List] = field(default=None, metadata={"delta": "replace"}) - ... p: List = field(default_factory=list, metadata={"delta": "replace", - ... 
"converter": list}) - - >>> r = CoerceStateList() - - If there is no `metadata["converter"]` set for a field, no coercion occurs - >>> r + Delta(o="not a list") - CoerceStateList(o='not a list', p=[]) - - If there is a `metadata["converter"]` set for a field, the data are coerced: - >>> r + Delta(p="not a list") - CoerceStateList(o=None, p=['n', 'o', 't', ' ', 'a', ' ', 'l', 'i', 's', 't']) - - If the input data are of the correct type, they are returned unaltered: - >>> r + Delta(p=["a", "list"]) - CoerceStateList(o=None, p=['a', 'list']) - - With a converter, inputs are converted to the type output by the converter: - >>> import pandas as pd - >>> @dataclass(frozen=True) - ... class CoerceStateDataFrame(State): - ... q: pd.DataFrame = field(default_factory=pd.DataFrame, - ... metadata={"delta": "replace", - ... "converter": pd.DataFrame}) - - If the type is already correct, the object is passed to the converter, - but should be returned unchanged: - >>> s = CoerceStateDataFrame() - >>> (s + Delta(q=pd.DataFrame([("a",1,"alpha"), ("b",2,"beta")], columns=list("xyz")))).q - x y z - 0 a 1 alpha - 1 b 2 beta - - If the type is not correct, the object is converted if possible. For a dataframe, - we can convert records: - >>> (s + Delta(q=[("a",1,"alpha"), ("b",2,"beta")])).q - 0 1 2 - 0 a 1 alpha - 1 b 2 beta - - ... or an array: - >>> (s + Delta(q=np.linspace([1, 2], [10, 15], 3))).q - 0 1 - 0 1.0 2.0 - 1 5.5 8.5 - 2 10.0 15.0 - - ... or a dictionary: - >>> (s + Delta(q={"a": [1,2,3], "b": [4,5,6]})).q - a b - 0 1 4 - 1 2 5 - 2 3 6 - - ... or a list: - >>> (s + Delta(q=[11, 12, 13])).q - 0 - 0 11 - 1 12 - 2 13 - - ... but not, for instance, a string: - >>> (s + Delta(q="not compatible with pd.DataFrame")).q - Traceback (most recent call last): - ... - ValueError: DataFrame constructor not properly called! - - Without a converter: - >>> @dataclass(frozen=True) - ... class CoerceStateDataFrameNoConverter(State): - ... 
r: pd.DataFrame = field(default_factory=pd.DataFrame, metadata={"delta": "replace"}) - - ... there is no coercion – the object is passed unchanged - >>> t = CoerceStateDataFrameNoConverter() - >>> (t + Delta(r=np.linspace([1, 2], [10, 15], 3))).r - array([[ 1. , 2. ], - [ 5.5, 8.5], - [10. , 15. ]]) - - - A converter can cast from a DataFrame to a np.ndarray (with a single datatype), - for instance: - >>> import numpy as np - >>> @dataclass(frozen=True) - ... class CoerceStateArray(State): - ... r: Optional[np.ndarray] = field(default=None, - ... metadata={"delta": "replace", - ... "converter": np.asarray}) - - Here we pass a dataframe, but expect a numpy array: - >>> (CoerceStateArray() + Delta(r=pd.DataFrame([("a",1), ("b",2)], columns=list("xy")))).r - array([['a', 1], - ['b', 2]], dtype=object) - - We can define aliases which can transform between different potential field - names. - - >>> @dataclass(frozen=True) - ... class FieldAliasState(State): - ... things: List[str] = field( - ... default_factory=list, - ... metadata={"delta": "extend", - ... "aliases": {"thing": lambda m: [m]}} - ... ) - - In the "normal" case, the Delta object is expected to include a list of data in the - correct format which is used to extend the object: - >>> FieldAliasState(things=["0"]) + Delta(things=["1", "2"]) - FieldAliasState(things=['0', '1', '2']) - - However, say the standard return from a step in AER is a single `thing`, rather than a - sequence of them: - >>> FieldAliasState(things=["0"]) + Delta(thing="1") - FieldAliasState(things=['0', '1']) - - - If a cycle function relies on the existence of the `s.thing` as a property of your state - `s`, rather than accessing `s.things[-1]`, then you could additionally define a `property`: - - >>> class FieldAliasStateWithProperty(FieldAliasState): # inherit from FieldAliasState - ... @property - ... def thing(self): - ... return self.things[-1] - - Now you can access both `s.things` and `s.thing` as required by your code. 
The State only - shows `things` in the string representation... - >>> s = FieldAliasStateWithProperty(things=["0"]) + Delta(thing="1") - >>> s - FieldAliasStateWithProperty(things=['0', '1']) - - ... and exposes `things` as an attribute: - >>> s.things - ['0', '1'] - - ... but also exposes `thing`, always returning the last value. - >>> s.thing - '1' - - - - - """ - - def __add__(self, other: Delta): - updates = dict() - for self_field in fields(self): - - other_value = _get_value(self_field, other) - if other_value is None: - continue - - self_field_key = self_field.name - self_value = getattr(self, self_field_key) - delta_behavior = self_field.metadata["delta"] - - if (constructor := self_field.metadata.get("converter", None)) is not None: - coerced_other_value = constructor(other_value) - else: - coerced_other_value = other_value - - if delta_behavior == "extend": - extended_value = extend(self_value, coerced_other_value) - updates[self_field_key] = extended_value - elif delta_behavior == "append": - appended_value = append(self_value, coerced_other_value) - updates[self_field_key] = appended_value - elif delta_behavior == "replace": - updates[self_field_key] = coerced_other_value - else: - raise NotImplementedError( - "delta_behaviour=`%s` not implemented" % (delta_behavior) - ) - - new = replace(self, **updates) - return new - - def update(self, **kwargs): - return self + Delta(**kwargs) - - -def _get_value(f, other: Delta): - """ - Given a `State`'s `dataclasses.field` f, get a value from `other` - - Examples: - >>> from dataclasses import field, dataclass, fields - >>> @dataclass - ... class Example: - ... a: int = field() # base case - ... b: List[int] = field(metadata={"aliases": {"ba": lambda b: [b]}}) # Single alias - ... c: List[int] = field(metadata={"aliases": { - ... "ca": lambda x: x, # pass the value unchanged - ... "cb": lambda x: [x] # wrap the value in a list - ... 
}}) # Multiple alias - - For a field with no aliases, we retrieve values with the base name: - >>> f_a = fields(Example)[0] - >>> _get_value(f_a, Delta(a=1)) - 1 - - ... and only the base name: - >>> print(_get_value(f_a, Delta(b=2))) # no match for b - None - - Any other names are unimportant: - >>> _get_value(f_a, Delta(b=2, a=1)) - 1 - - For fields with an alias, we retrieve values with the base name: - >>> f_b = fields(Example)[1] - >>> _get_value(f_b, Delta(b=[2])) - [2] - - ... or for the alias name, transformed by the alias lambda function: - >>> _get_value(f_b, Delta(ba=21)) - [21] - - We preferentially get the base name, and then any aliases: - >>> _get_value(f_b, Delta(b=2, ba=21)) - 2 - - ... , regardless of their order in the `Delta` object: - >>> _get_value(f_b, Delta(ba=21, b=2)) - 2 - - Other names are ignored: - >>> print(_get_value(f_b, Delta(a=1))) - None - - and the order of other names is unimportant: - >>> _get_value(f_b, Delta(a=1, b=2)) - 2 - - For fields with multiple aliases, we retrieve values with the base name: - >>> f_c = fields(Example)[2] - >>> _get_value(f_c, Delta(c=[3])) - [3] - - ... for any alias: - >>> _get_value(f_c, Delta(ca=31)) - 31 - - ... transformed by the alias lambda function : - >>> _get_value(f_c, Delta(cb=32)) - [32] - - ... and ignoring any other names: - >>> print(_get_value(f_c, Delta(a=1))) - None - - ... preferentially in the order base name, 1st alias, 2nd alias, ... 
nth alias: - >>> _get_value(f_c, Delta(c=3, ca=31, cb=32)) - 3 - - >>> _get_value(f_c, Delta(ca=31, cb=32)) - 31 - - >>> _get_value(f_c, Delta(cb=32)) - [32] - - >>> print(_get_value(f_c, Delta())) - None - - """ - - key = f.name - - try: - value = other.data[key] - return value - except KeyError: - pass - - try: - aliases = f.metadata["aliases"] - except KeyError: - return - - for alias_key, wrapping_function in aliases.items(): - try: - value = wrapping_function(other.data[alias_key]) - return value - except KeyError: - pass - - return - - -class Delta(UserDict, Generic[S]): - """ - Represents a delta where the base object determines the extension behavior. - - Examples: - >>> from dataclasses import dataclass - - First we define the dataclass to act as the basis: - >>> from typing import Optional, List - >>> @dataclass(frozen=True) - ... class ListState: - ... l: Optional[List] = None - ... m: Optional[List] = None - ... - """ - - pass - - -Result = Delta -"""`Result` is an alias for `Delta`.""" - - -@singledispatch -def extend(a, b): - """ - Function to extend supported datatypes. 
- - """ - raise NotImplementedError("`extend` not implemented for %s, %s" % (a, b)) - - -@extend.register(type(None)) -def extend_none(a, b): - """ - Examples: - >>> extend(None, []) - [] - - >>> extend(None, [3]) - [3] - """ - return b - - -@extend.register(list) -def extend_list(a, b): - """ - Examples: - >>> extend([], []) - [] - - >>> extend([1,2], [3]) - [1, 2, 3] - """ - return a + b - - -@extend.register(pd.DataFrame) -def extend_pd_dataframe(a, b): - """ - Examples: - >>> extend(pd.DataFrame({"a": []}), pd.DataFrame({"a": []})) - Empty DataFrame - Columns: [a] - Index: [] - - >>> extend(pd.DataFrame({"a": [1,2,3]}), pd.DataFrame({"a": [4,5,6]})) - a - 0 1 - 1 2 - 2 3 - 3 4 - 4 5 - 5 6 - """ - return pd.concat((a, b), ignore_index=True) - - -@extend.register(np.ndarray) -def extend_np_ndarray(a, b): - """ - Examples: - >>> extend(np.array([(1,2,3), (4,5,6)]), np.array([(7,8,9)])) - array([[1, 2, 3], - [4, 5, 6], - [7, 8, 9]]) - """ - return np.row_stack([a, b]) - - -@extend.register(dict) -def extend_dict(a, b): - """ - Examples: - >>> extend({"a": "cats"}, {"b": "dogs"}) - {'a': 'cats', 'b': 'dogs'} - """ - return dict(a, **b) - - -def append(a: List[T], b: T) -> List[T]: - """ - Function to create a new list with an item appended to it. - - Examples: - Given a starting list `a`: - >>> a = [1, 2, 3] - - ... we can append a value: - >>> append(a, 4) - [1, 2, 3, 4] - - `a` is unchanged - >>> a == [1, 2, 3] - True - - Why not just use `list.append`? `list.append` mutates `a` in place, which we can't allow - in the AER cycle – parts of the cycle rely on purely functional code which doesn't - (accidentally or intentionally) manipulate existing data. - >>> list.append(a, 4) # not what we want - >>> a - [1, 2, 3, 4] - """ - return a + [b] - - -def wrap_to_use_state(f): - """Decorator to make target `f` into a function on a `State` and `**kwargs`. - - This wrapper makes it easier to pass arguments to a function from a State. 
- - It was inspired by the pytest "fixtures" mechanism. - - Args: - f: a function with arguments that could be fields on a `State` - and that returns a `Delta`. - - Returns: a version of `f` which takes and returns `State` objects. - - Examples: - >>> from autora.state.delta import State, Delta - >>> from dataclasses import dataclass, field - >>> import pandas as pd - >>> from typing import List, Optional - - The `State` it operates on needs to have the metadata described in the state module: - >>> @dataclass(frozen=True) - ... class S(State): - ... conditions: List[int] = field(metadata={"delta": "replace"}) - - We indicate the inputs required by the parameter names. - The output must be a `Delta` object. - >>> from autora.state.delta import Delta - >>> @wrap_to_use_state - ... def experimentalist(conditions): - ... new_conditions = [c + 10 for c in conditions] - ... return Delta(conditions=new_conditions) - - >>> experimentalist(S(conditions=[1,2,3,4])) - S(conditions=[11, 12, 13, 14]) - - >>> experimentalist(S(conditions=[101,102,103,104])) - S(conditions=[111, 112, 113, 114]) - - >>> from autora.variable import VariableCollection, Variable - >>> from sklearn.base import BaseEstimator - >>> from sklearn.linear_model import LinearRegression - - >>> @wrap_to_use_state - ... def theorist(experiment_data: pd.DataFrame, variables: VariableCollection, **kwargs): - ... ivs = [v.name for v in variables.independent_variables] - ... dvs = [v.name for v in variables.dependent_variables] - ... X, y = experiment_data[ivs], experiment_data[dvs] - ... new_model = LinearRegression(fit_intercept=True).set_params(**kwargs).fit(X, y) - ... return Delta(model=new_model) - - >>> @dataclass(frozen=True) - ... class T(State): - ... variables: VariableCollection # field(metadata={"delta":... }) omitted ∴ immutable - ... experiment_data: pd.DataFrame = field(metadata={"delta": "extend"}) - ... 
model: Optional[BaseEstimator] = field(metadata={"delta": "replace"}, default=None) - - >>> t = T( - ... variables=VariableCollection(independent_variables=[Variable("x")], - ... dependent_variables=[Variable("y")]), - ... experiment_data=pd.DataFrame({"x": [0,1,2,3,4], "y": [2,3,4,5,6]}) - ... ) - >>> t_prime = theorist(t) - >>> t_prime.model.coef_, t_prime.model.intercept_ - (array([[1.]]), array([2.])) - - Arguments from the state can be overridden by passing them in as keyword arguments (kwargs): - >>> theorist(t, experiment_data=pd.DataFrame({"x": [0,1,2,3], "y": [12,13,14,15]}))\\ - ... .model.intercept_ - array([12.]) - - ... and other arguments supported by the inner function can also be passed - (if and only if the inner function allows for and handles `**kwargs` arguments alongside - the values from the state). - >>> theorist(t, fit_intercept=False).model.intercept_ - 0.0 - - Any parameters not provided by the state must be provided by default values or by the - caller. If the default is specified: - >>> @wrap_to_use_state - ... def experimentalist(conditions, offset=25): - ... new_conditions = [c + offset for c in conditions] - ... return Delta(conditions=new_conditions) - - ... then it need not be passed. - >>> experimentalist(S(conditions=[1,2,3,4])) - S(conditions=[26, 27, 28, 29]) - - If a default isn't specified: - >>> @wrap_to_use_state - ... def experimentalist(conditions, offset): - ... new_conditions = [c + offset for c in conditions] - ... return Delta(conditions=new_conditions) - - ... then calling the experimentalist without it will throw an error: - >>> experimentalist(S(conditions=[1,2,3,4])) - Traceback (most recent call last): - ... - TypeError: experimentalist() missing 1 required positional argument: 'offset' - - ... which can be fixed by passing the argument as a keyword to the wrapped function. 
- >>> experimentalist(S(conditions=[1,2,3,4]), offset=2) - S(conditions=[3, 4, 5, 6]) - - """ - # Get the set of parameter names from function f's signature - parameters_ = set(inspect.signature(f).parameters.keys()) - - @wraps(f) - def _f(state_: S, /, **kwargs) -> S: - # Get the parameters needed which are available from the state_. - # All others must be provided as kwargs or default values on f. - assert dataclasses.is_dataclass(state_) - from_state = parameters_.intersection( - {i.name for i in dataclasses.fields(state_)} - ) - arguments_from_state = {k: getattr(state_, k) for k in from_state} - arguments = dict(arguments_from_state, **kwargs) - delta = f(**arguments) - new_state = state_ + delta - return new_state - - return _f diff --git a/src/autora/state/history.py b/src/autora/state/history.py deleted file mode 100644 index fbb33944..00000000 --- a/src/autora/state/history.py +++ /dev/null @@ -1,722 +0,0 @@ -""" Classes for storing and passing a cycle's state as an immutable history. """ -from __future__ import annotations - -from dataclasses import dataclass -from typing import Any, Dict, Iterable, List, Optional, Sequence, Set, Union - -from numpy.typing import ArrayLike -from sklearn.base import BaseEstimator - -from autora.state.delta import Delta -from autora.state.protocol import ( - ResultKind, - SupportsControllerStateHistory, - SupportsDataKind, -) -from autora.state.snapshot import Snapshot -from autora.variable import VariableCollection - - -class History(SupportsControllerStateHistory): - """ - An immutable object for tracking the state and history of an AER cycle. 
- """ - - def __init__( - self, - variables: Optional[VariableCollection] = None, - params: Optional[Dict] = None, - conditions: Optional[List[ArrayLike]] = None, - observations: Optional[List[ArrayLike]] = None, - models: Optional[List[BaseEstimator]] = None, - history: Optional[Sequence[Result]] = None, - ): - """ - - Args: - variables: a single datum to be marked as "variables" - params: a single datum to be marked as "params" - conditions: an iterable of data, each to be marked as "conditions" - observations: an iterable of data, each to be marked as "observations" - models: an iterable of data, each to be marked as "models" - history: an iterable of Result objects to be used as the initial history. - - Examples: - Empty input leads to an empty state: - >>> History() - History([]) - - ... or with values for any or all of the parameters: - >>> from autora.variable import VariableCollection - >>> History(variables=VariableCollection()) # doctest: +ELLIPSIS - History([Result(data=VariableCollection(...), kind=ResultKind.VARIABLES)]) - - >>> History(params={"some": "params"}) - History([Result(data={'some': 'params'}, kind=ResultKind.PARAMS)]) - - >>> History(conditions=["a condition"]) - History([Result(data='a condition', kind=ResultKind.CONDITION)]) - - >>> History(observations=["an observation"]) - History([Result(data='an observation', kind=ResultKind.OBSERVATION)]) - - >>> from sklearn.linear_model import LinearRegression - >>> History(models=[LinearRegression()]) - History([Result(data=LinearRegression(), kind=ResultKind.MODEL)]) - - Parameters passed to the constructor are included in the history in the following order: - `history`, `variables`, `params`, `conditions`, `observations`, `models` - >>> History(models=['m1', 'm2'], conditions=['c1', 'c2'], - ... observations=['o1', 'o2'], params={'a': 'param'}, - ... variables=VariableCollection(), - ... history=[Result("from history", ResultKind.VARIABLES)] - ... 
) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - History([Result(data='from history', kind=ResultKind.VARIABLES), - Result(data=VariableCollection(...), kind=ResultKind.VARIABLES), - Result(data={'a': 'param'}, kind=ResultKind.PARAMS), - Result(data='c1', kind=ResultKind.CONDITION), - Result(data='c2', kind=ResultKind.CONDITION), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='o2', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='m2', kind=ResultKind.MODEL)]) - """ - self.data: List - - if history is not None: - self.data = list(history) - else: - self.data = [] - - self.data += _init_result_list( - variables=variables, - params=params, - conditions=conditions, - observations=observations, - models=models, - ) - - def update( - self, - variables=None, - params=None, - conditions=None, - observations=None, - models=None, - history=None, - ): - """ - Create a new object with updated values. - - Examples: - The initial object is empty: - >>> h0 = History() - >>> h0 - History([]) - - We can update the variables using the `.update` method: - >>> from autora.variable import VariableCollection - >>> h1 = h0.update(variables=VariableCollection()) - >>> h1 # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - History([Result(data=VariableCollection(...), kind=ResultKind.VARIABLES)]) - - ... the original object is unchanged: - >>> h0 - History([]) - - We can update the variables again: - >>> h2 = h1.update(variables=VariableCollection(["some IV"])) - >>> h2._by_kind # doctest: +ELLIPSIS - Snapshot(variables=VariableCollection(independent_variables=['some IV'],...), ...) - - ... and we see that there is only ever one variables object returned. - - Params is treated the same way as variables: - >>> hp = h0.update(params={'first': 'params'}) - >>> hp - History([Result(data={'first': 'params'}, kind=ResultKind.PARAMS)]) - - ... where only the most recent "params" object is returned from the `.params` property. 
- >>> hp = hp.update(params={'second': 'params'}) - >>> hp.params - {'second': 'params'} - - ... however, the full history of the params objects remains available, if needed: - >>> hp # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'first': 'params'}, kind=ResultKind.PARAMS), - Result(data={'second': 'params'}, kind=ResultKind.PARAMS)]) - - When we update the conditions, observations or models, a new entry is added to the - history: - >>> h3 = h0.update(models=["1st model"]) - >>> h3 # doctest: +NORMALIZE_WHITESPACE - History([Result(data='1st model', kind=ResultKind.MODEL)]) - - ... so we can see the history of all the models, for instance. - >>> h3 = h3.update(models=["2nd model"]) # doctest: +NORMALIZE_WHITESPACE - >>> h3 # doctest: +NORMALIZE_WHITESPACE - History([Result(data='1st model', kind=ResultKind.MODEL), - Result(data='2nd model', kind=ResultKind.MODEL)]) - - ... and the full history of models is available using the `.models` parameter: - >>> h3.models - ['1st model', '2nd model'] - - The same for the observations: - >>> h4 = h0.update(observations=["1st observation"]) - >>> h4 - History([Result(data='1st observation', kind=ResultKind.OBSERVATION)]) - - >>> h4.update(observations=["2nd observation"] - ... 
) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - History([Result(data='1st observation', kind=ResultKind.OBSERVATION), - Result(data='2nd observation', kind=ResultKind.OBSERVATION)]) - - - The same for the conditions: - >>> h5 = h0.update(conditions=["1st condition"]) - >>> h5 - History([Result(data='1st condition', kind=ResultKind.CONDITION)]) - - >>> h5.update(conditions=["2nd condition"]) # doctest: +NORMALIZE_WHITESPACE - History([Result(data='1st condition', kind=ResultKind.CONDITION), - Result(data='2nd condition', kind=ResultKind.CONDITION)]) - - You can also update with multiple conditions, observations and models: - >>> h0.update(conditions=['c1', 'c2']) # doctest: +NORMALIZE_WHITESPACE - History([Result(data='c1', kind=ResultKind.CONDITION), - Result(data='c2', kind=ResultKind.CONDITION)]) - - >>> h0.update(models=['m1', 'm2'], variables={'m': 1} - ... ) # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'m': 1}, kind=ResultKind.VARIABLES), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='m2', kind=ResultKind.MODEL)]) - - >>> h0.update(models=['m1'], observations=['o1'], variables={'m': 1} - ... ) # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'m': 1}, kind=ResultKind.VARIABLES), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL)]) - - We can also update with a complete history: - >>> History().update(history=[Result(data={'m': 2}, kind=ResultKind.VARIABLES), - ... Result(data='o1', kind=ResultKind.OBSERVATION), - ... Result(data='m1', kind=ResultKind.MODEL)], - ... conditions=['c1'] - ... 
) # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'m': 2}, kind=ResultKind.VARIABLES), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='c1', kind=ResultKind.CONDITION)]) - - """ - - if history is not None: - history_extension = history - else: - history_extension = [] - - history_extension += _init_result_list( - variables=variables, - params=params, - conditions=conditions, - observations=observations, - models=models, - ) - new_full_history = self.data + history_extension - - return History(history=new_full_history) - - def __add__(self, other: Delta): - """The initial object is empty: - >>> h0 = History() - >>> h0 - History([]) - - We can update the variables using the `.update` method: - >>> from autora.variable import VariableCollection - >>> h1 = h0 + Delta(variables=VariableCollection()) - >>> h1 # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - History([Result(data=VariableCollection(...), kind=ResultKind.VARIABLES)]) - - ... the original object is unchanged: - >>> h0 - History([]) - - We can update the variables again: - >>> h2 = h1 + Delta(variables=VariableCollection(["some IV"])) - >>> h2._by_kind # doctest: +ELLIPSIS - Snapshot(variables=VariableCollection(independent_variables=['some IV'],...), ...) - - ... and we see that there is only ever one variables object returned. - - Params is treated the same way as variables: - >>> hp = h0 + Delta(params={'first': 'params'}) - >>> hp - History([Result(data={'first': 'params'}, kind=ResultKind.PARAMS)]) - - ... where only the most recent "params" object is returned from the `.params` property. - >>> hp = hp + Delta(params={'second': 'params'}) - >>> hp.params - {'second': 'params'} - - ... 
however, the full history of the params objects remains available, if needed: - >>> hp # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'first': 'params'}, kind=ResultKind.PARAMS), - Result(data={'second': 'params'}, kind=ResultKind.PARAMS)]) - - When we update the conditions, observations or models, a new entry is added to the - history: - >>> h3 = h0 + Delta(models=["1st model"]) - >>> h3 # doctest: +NORMALIZE_WHITESPACE - History([Result(data='1st model', kind=ResultKind.MODEL)]) - - ... so we can see the history of all the models, for instance. - >>> h3 = h3 + Delta(models=["2nd model"]) # doctest: +NORMALIZE_WHITESPACE - >>> h3 # doctest: +NORMALIZE_WHITESPACE - History([Result(data='1st model', kind=ResultKind.MODEL), - Result(data='2nd model', kind=ResultKind.MODEL)]) - - ... and the full history of models is available using the `.models` parameter: - >>> h3.models - ['1st model', '2nd model'] - - The same for the observations: - >>> h4 = h0 + Delta(observations=["1st observation"]) - >>> h4 - History([Result(data='1st observation', kind=ResultKind.OBSERVATION)]) - - >>> h4 + Delta(observations=["2nd observation"] - ... 
) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - History([Result(data='1st observation', kind=ResultKind.OBSERVATION), - Result(data='2nd observation', kind=ResultKind.OBSERVATION)]) - - - The same for the conditions: - >>> h5 = h0 + Delta(conditions=["1st condition"]) - >>> h5 - History([Result(data='1st condition', kind=ResultKind.CONDITION)]) - - >>> h5 + Delta(conditions=["2nd condition"]) # doctest: +NORMALIZE_WHITESPACE - History([Result(data='1st condition', kind=ResultKind.CONDITION), - Result(data='2nd condition', kind=ResultKind.CONDITION)]) - - You can also update with multiple conditions, observations and models: - >>> h0 + Delta(conditions=['c1', 'c2']) # doctest: +NORMALIZE_WHITESPACE - History([Result(data='c1', kind=ResultKind.CONDITION), - Result(data='c2', kind=ResultKind.CONDITION)]) - - >>> h0 + Delta(models=['m1', 'm2'], variables={'m': 1} - ... ) # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'m': 1}, kind=ResultKind.VARIABLES), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='m2', kind=ResultKind.MODEL)]) - - >>> h0 + Delta(models=['m1'], observations=['o1'], variables={'m': 1} - ... ) # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'m': 1}, kind=ResultKind.VARIABLES), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL)]) - - We can also update with a complete history: - >>> History() + Delta(history=[Result(data={'m': 2}, kind=ResultKind.VARIABLES), - ... Result(data='o1', kind=ResultKind.OBSERVATION), - ... Result(data='m1', kind=ResultKind.MODEL)], - ... conditions=['c1'] - ... 
) # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'m': 2}, kind=ResultKind.VARIABLES), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='c1', kind=ResultKind.CONDITION)]) - """ - return self.update(**other) - - def __repr__(self): - return f"{type(self).__name__}({self.history})" - - @property - def _by_kind(self): - return _history_to_kind(self.data) - - @property - def variables(self) -> VariableCollection: - """ - - Examples: - The initial object is empty: - >>> h = History() - - ... and returns an emtpy variables object - >>> h.variables - VariableCollection(independent_variables=[], dependent_variables=[], covariates=[]) - - We can update the variables using the `.update` method: - >>> from autora.variable import VariableCollection - >>> h = h.update(variables=VariableCollection(independent_variables=['some IV'])) - >>> h.variables # doctest: +ELLIPSIS - VariableCollection(independent_variables=['some IV'], ...) - - We can update the variables again: - >>> h = h.update(variables=VariableCollection(["some other IV"])) - >>> h.variables # doctest: +ELLIPSIS - VariableCollection(independent_variables=['some other IV'], ...) - - ... and we see that there is only ever one variables object returned.""" - return self._by_kind.variables - - @property - def params(self) -> Dict: - """ - - Returns: - - Examples: - Params is treated the same way as variables: - >>> h = History() - >>> h = h.update(params={'first': 'params'}) - >>> h.params - {'first': 'params'} - - ... where only the most recent "params" object is returned from the `.params` property. - >>> h = h.update(params={'second': 'params'}) - >>> h.params - {'second': 'params'} - - ... 
however, the full history of the params objects remains available, if needed: - >>> h # doctest: +NORMALIZE_WHITESPACE - History([Result(data={'first': 'params'}, kind=ResultKind.PARAMS), - Result(data={'second': 'params'}, kind=ResultKind.PARAMS)]) - """ - return self._by_kind.params - - @property - def conditions(self) -> List[ArrayLike]: - """ - Returns: - - Examples: - View the sequence of models with one conditions: - >>> h = History(conditions=[(1,2,3,)]) - >>> h.conditions - [(1, 2, 3)] - - ... or more conditions: - >>> h = h.update(conditions=[(4,5,6),(7,8,9)]) # doctest: +NORMALIZE_WHITESPACE - >>> h.conditions - [(1, 2, 3), (4, 5, 6), (7, 8, 9)] - - """ - return self._by_kind.conditions - - @property - def observations(self) -> List[ArrayLike]: - """ - - Returns: - - Examples: - The sequence of all observations is returned - >>> h = History(observations=["1st observation"]) - >>> h.observations - ['1st observation'] - - >>> h = h.update(observations=["2nd observation"]) - >>> h.observations # doctest: +ELLIPSIS - ['1st observation', '2nd observation'] - - """ - return self._by_kind.observations - - @property - def models(self) -> List[BaseEstimator]: - """ - - Returns: - - Examples: - View the sequence of models with one model: - >>> s = History(models=["1st model"]) - >>> s.models # doctest: +NORMALIZE_WHITESPACE - ['1st model'] - - ... or more models: - >>> s = s.update(models=["2nd model"]) # doctest: +NORMALIZE_WHITESPACE - >>> s.models - ['1st model', '2nd model'] - - """ - return self._by_kind.models - - @property - def history(self) -> List[Result]: - """ - - Examples: - We initialze some history: - >>> h = History(models=['m1', 'm2'], conditions=['c1', 'c2'], - ... observations=['o1', 'o2'], params={'a': 'param'}, - ... variables=VariableCollection(), - ... 
history=[Result("from history", ResultKind.VARIABLES)]) - - Parameters passed to the constructor are included in the history in the following order: - `history`, `variables`, `params`, `conditions`, `observations`, `models` - - >>> h.history # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - [Result(data='from history', kind=ResultKind.VARIABLES), - Result(data=VariableCollection(...), kind=ResultKind.VARIABLES), - Result(data={'a': 'param'}, kind=ResultKind.PARAMS), - Result(data='c1', kind=ResultKind.CONDITION), - Result(data='c2', kind=ResultKind.CONDITION), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='o2', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='m2', kind=ResultKind.MODEL)] - - If we add a new value, like the params object, the updated value is added to the - end of the history: - >>> h = h.update(params={'new': 'param'}) - >>> h.history # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - [..., Result(data={'new': 'param'}, kind=ResultKind.PARAMS)] - - """ - return self.data - - def filter_by(self, kind: Optional[Set[Union[str, ResultKind]]] = None) -> History: - """ - Return a copy of the object with only data belonging to the specified kinds. - - Examples: - >>> h = History(models=['m1', 'm2'], conditions=['c1', 'c2'], - ... observations=['o1', 'o2'], params={'a': 'param'}, - ... variables=VariableCollection(), - ... 
history=[Result("from history", ResultKind.VARIABLES)]) - - >>> h.filter_by(kind={"MODEL"}) # doctest: +NORMALIZE_WHITESPACE - History([Result(data='m1', kind=ResultKind.MODEL), - Result(data='m2', kind=ResultKind.MODEL)]) - - >>> h.filter_by(kind={ResultKind.OBSERVATION}) # doctest: +NORMALIZE_WHITESPACE - History([Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='o2', kind=ResultKind.OBSERVATION)]) - - If we don't specify any filter criteria, we get the full history back: - >>> h.filter_by() # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS - History([Result(data='from history', kind=ResultKind.VARIABLES), - Result(data=VariableCollection(...), kind=ResultKind.VARIABLES), - Result(data={'a': 'param'}, kind=ResultKind.PARAMS), - Result(data='c1', kind=ResultKind.CONDITION), - Result(data='c2', kind=ResultKind.CONDITION), - Result(data='o1', kind=ResultKind.OBSERVATION), - Result(data='o2', kind=ResultKind.OBSERVATION), - Result(data='m1', kind=ResultKind.MODEL), - Result(data='m2', kind=ResultKind.MODEL)]) - - """ - if kind is None: - return self - else: - kind_ = {ResultKind(s) for s in kind} - filtered_history = _filter_history(self.data, kind_) - new_object = History(history=filtered_history) - return new_object - - -@dataclass(frozen=True) -class Result(SupportsDataKind): - """ - Container class for data and variables. 
- - Examples: - >>> Result() - Result(data=None, kind=None) - - >>> Result("a") - Result(data='a', kind=None) - - >>> Result(None, "MODEL") - Result(data=None, kind=ResultKind.MODEL) - - >>> Result(data="b") - Result(data='b', kind=None) - - >>> Result("c", "OBSERVATION") - Result(data='c', kind=ResultKind.OBSERVATION) - """ - - data: Optional[Any] = None - kind: Optional[ResultKind] = None - - def __post_init__(self): - if isinstance(self.kind, str): - object.__setattr__(self, "kind", ResultKind(self.kind)) - - -def _init_result_list( - variables: Optional[VariableCollection] = None, - params: Optional[Dict] = None, - conditions: Optional[Iterable[ArrayLike]] = None, - observations: Optional[Iterable[ArrayLike]] = None, - models: Optional[Iterable[BaseEstimator]] = None, -) -> List[Result]: - """ - Initialize a list of Result objects - - Returns: - - Args: - variables: a single datum to be marked as "variables" - params: a single datum to be marked as "params" - conditions: an iterable of data, each to be marked as "conditions" - observations: an iterable of data, each to be marked as "observations" - models: an iterable of data, each to be marked as "models" - - Examples: - Empty input leads to an empty state: - >>> _init_result_list() - [] - - ... 
or with values for any or all of the parameters: - >>> from autora.variable import VariableCollection - >>> _init_result_list(variables=VariableCollection()) # doctest: +ELLIPSIS - [Result(data=VariableCollection(...), kind=ResultKind.VARIABLES)] - - >>> _init_result_list(params={"some": "params"}) - [Result(data={'some': 'params'}, kind=ResultKind.PARAMS)] - - >>> _init_result_list(conditions=["a condition"]) - [Result(data='a condition', kind=ResultKind.CONDITION)] - - >>> _init_result_list(observations=["an observation"]) - [Result(data='an observation', kind=ResultKind.OBSERVATION)] - - >>> from sklearn.linear_model import LinearRegression - >>> _init_result_list(models=[LinearRegression()]) - [Result(data=LinearRegression(), kind=ResultKind.MODEL)] - - The input arguments are added to the data in the order `variables`, - `params`, `conditions`, `observations`, `models`: - >>> _init_result_list(variables=VariableCollection(), - ... params={"some": "params"}, - ... conditions=["a condition"], - ... observations=["an observation", "another observation"], - ... models=[LinearRegression()], - ... 
) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS - [Result(data=VariableCollection(...), kind=ResultKind.VARIABLES), - Result(data={'some': 'params'}, kind=ResultKind.PARAMS), - Result(data='a condition', kind=ResultKind.CONDITION), - Result(data='an observation', kind=ResultKind.OBSERVATION), - Result(data='another observation', kind=ResultKind.OBSERVATION), - Result(data=LinearRegression(), kind=ResultKind.MODEL)] - - """ - data = [] - - if variables is not None: - data.append(Result(variables, ResultKind.VARIABLES)) - - if params is not None: - data.append(Result(params, ResultKind.PARAMS)) - - for seq, kind in [ - (conditions, ResultKind.CONDITION), - (observations, ResultKind.OBSERVATION), - (models, ResultKind.MODEL), - ]: - if seq is not None: - for i in seq: - data.append(Result(i, kind=kind)) - - return data - - -def _history_to_kind(history: Sequence[Result]) -> Snapshot: - """ - Convert a sequence of results into a Snapshot instance: - - Examples: - History might be empty - >>> history_ = [] - >>> _history_to_kind(history_) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS - Snapshot(variables=VariableCollection(...), params={}, - conditions=[], observations=[], models=[]) - - ... or with values for any or all of the parameters: - >>> history_ = _init_result_list(params={"some": "params"}) - >>> _history_to_kind(history_) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS - Snapshot(..., params={'some': 'params'}, ...) - - >>> history_ += _init_result_list(conditions=["a condition"]) - >>> _history_to_kind(history_) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS - Snapshot(..., params={'some': 'params'}, conditions=['a condition'], ...) - - >>> _history_to_kind(history_).params - {'some': 'params'} - - >>> history_ += _init_result_list(observations=["an observation"]) - >>> _history_to_kind(history_) # doctest: +NORMALIZE_WHITESPACE +ELLIPSIS - Snapshot(..., params={'some': 'params'}, conditions=['a condition'], - observations=['an observation'], ...) 
- - >>> from sklearn.linear_model import LinearRegression - >>> history_ = [Result(LinearRegression(), kind=ResultKind.MODEL)] - >>> _history_to_kind(history_) # doctest: +ELLIPSIS - Snapshot(..., models=[LinearRegression()]) - - >>> from autora.variable import VariableCollection, IV - >>> variables = VariableCollection(independent_variables=[IV(name="example")]) - >>> history_ = [Result(variables, kind=ResultKind.VARIABLES)] - >>> _history_to_kind(history_) # doctest: +ELLIPSIS - Snapshot(variables=VariableCollection(independent_variables=[IV(name='example', ... - - >>> history_ = [Result({'some': 'params'}, kind=ResultKind.PARAMS)] - >>> _history_to_kind(history_) # doctest: +ELLIPSIS - Snapshot(..., params={'some': 'params'}, ...) - - """ - namespace = Snapshot( - variables=_get_last_data_with_default( - history, kind={ResultKind.VARIABLES}, default=VariableCollection() - ), - params=_get_last_data_with_default( - history, kind={ResultKind.PARAMS}, default={} - ), - observations=_list_data( - _filter_history(history, kind={ResultKind.OBSERVATION}) - ), - models=_list_data(_filter_history(history, kind={ResultKind.MODEL})), - conditions=_list_data(_filter_history(history, kind={ResultKind.CONDITION})), - ) - return namespace - - -def _list_data(data: Sequence[SupportsDataKind]): - """ - Extract the `.data` attribute of each item in a sequence, and return as a list. 
- - Examples: - >>> _list_data([]) - [] - - >>> _list_data([Result("a"), Result("b")]) - ['a', 'b'] - """ - return list(r.data for r in data) - - -def _filter_history(data: Iterable[SupportsDataKind], kind: Set[ResultKind]): - return filter(lambda r: r.kind in kind, data) - - -def _get_last(data: Sequence[SupportsDataKind], kind: Set[ResultKind]): - results_new_to_old = reversed(data) - last_of_kind = next(_filter_history(results_new_to_old, kind=kind)) - return last_of_kind - - -def _get_last_data_with_default(data: Sequence[SupportsDataKind], kind, default): - try: - result = _get_last(data, kind).data - except StopIteration: - result = default - return result diff --git a/src/autora/state/param.py b/src/autora/state/param.py deleted file mode 100644 index 1fca3cfc..00000000 --- a/src/autora/state/param.py +++ /dev/null @@ -1,143 +0,0 @@ -""" Functions for handling cycle-state-dependent parameters. """ -from __future__ import annotations - -import copy -import logging -from typing import Dict, Mapping - -import numpy as np - -from autora.state.protocol import SupportsControllerState -from autora.utils.deprecation import deprecate as deprecate -from autora.utils.dictionary import LazyDict - -_logger = logging.getLogger(__name__) - - -def _get_state_dependent_properties(state: SupportsControllerState): - """ - Examples: - Even with an empty data object, we can initialize the dictionary, - >>> from autora.state.snapshot import Snapshot - >>> state_dependent_properties = _get_state_dependent_properties(Snapshot()) - - ... but it will raise an exception if a value isn't yet available when we try to use it - >>> state_dependent_properties["%models[-1]%"] # doctest: +ELLIPSIS - Traceback (most recent call last): - ... 
- IndexError: list index out of range - - Nevertheless, we can iterate through its keys no problem: - >>> [key for key in state_dependent_properties.keys()] # doctest: +NORMALIZE_WHITESPACE - ['%observations.ivs[-1]%', '%observations.dvs[-1]%', '%observations.ivs%', - '%observations.dvs%', '%experiment_data.conditions[-1]%', - '%experiment_data.observations[-1]%', '%experiment_data.conditions%', - '%experiment_data.observations%', '%models[-1]%', '%models%'] - - """ - - n_ivs = len(state.variables.independent_variables) - n_dvs = len(state.variables.dependent_variables) - state_dependent_property_dict = LazyDict( - { - "%observations.ivs[-1]%": deprecate( - lambda: np.array(state.observations[-1])[:, 0:n_ivs], - "%observations.ivs[-1]% is deprecated, " - "use %experiment_data.conditions[-1]% instead.", - ), - "%observations.dvs[-1]%": deprecate( - lambda: np.array(state.observations[-1])[:, n_ivs:], - "%observations.dvs[-1]% is deprecated, " - "use %experiment_data.observations[-1]% instead.", - ), - "%observations.ivs%": deprecate( - lambda: np.row_stack( - [np.empty([0, n_ivs + n_dvs])] + list(state.observations) - )[:, 0:n_ivs], - "%observations.ivs% is deprecated, use %experiment_data.conditions% instead.", - ), - "%observations.dvs%": deprecate( - lambda: np.row_stack(state.observations)[:, n_ivs:], - "%observations.dvs% is deprecated, " - "use %experiment_data.observations% instead", - ), - "%experiment_data.conditions[-1]%": lambda: np.array( - state.observations[-1] - )[:, 0:n_ivs], - "%experiment_data.observations[-1]%": lambda: np.array( - state.observations[-1] - )[:, n_ivs:], - "%experiment_data.conditions%": lambda: np.row_stack( - [np.empty([0, n_ivs + n_dvs])] + list(state.observations) - )[:, 0:n_ivs], - "%experiment_data.observations%": lambda: np.row_stack(state.observations)[ - :, n_ivs: - ], - "%models[-1]%": lambda: state.models[-1], - "%models%": lambda: state.models, - } - ) - return state_dependent_property_dict - - -def 
_resolve_properties(params: Dict, state_dependent_properties: Mapping): - """ - Resolve state-dependent properties inside a nested dictionary. - - In this context, a state-dependent-property is a string which is meant to be replaced by its - updated, current value before the dictionary is used. A state-dependent property might be - something like "the last theorist available" or "all the experimental results until now". - - Args: - params: a (nested) dictionary of keys and values, where some values might be - "cycle property names" - state_dependent_properties: a dictionary of "property names" and their "real values" - - Returns: a (nested) dictionary where "property names" are replaced by the "real values" - - Examples: - - >>> params_0 = {"key": "%foo%"} - >>> cycle_properties_0 = {"%foo%": 180} - >>> _resolve_properties(params_0,cycle_properties_0) - {'key': 180} - - >>> params_1 = {"key": "%bar%", "nested_dict": {"inner_key": "%foobar%"}} - >>> cycle_properties_1 = {"%bar%": 1, "%foobar%": 2} - >>> _resolve_properties(params_1,cycle_properties_1) - {'key': 1, 'nested_dict': {'inner_key': 2}} - - >>> params_2 = {"key": "baz"} - >>> _resolve_properties(params_2,cycle_properties_1) - {'key': 'baz'} - - """ - params_ = copy.copy(params) - for key, value in params_.items(): - if isinstance(value, dict): - params_[key] = _resolve_properties(value, state_dependent_properties) - elif isinstance(value, str) and ( - value in state_dependent_properties - ): # value is a key in the cycle_properties dictionary - params_[key] = state_dependent_properties[value] - else: - _logger.debug(f"leaving {params=} unchanged") - - return params_ - - -def resolve_state_params(params: Dict, state: SupportsControllerState) -> Dict: - """ - Returns the `params` attribute of the input, with `cycle properties` resolved. 
- - Examples: - >>> from autora.state.history import History - >>> params = {"experimentalist": {"source": "%models[-1]%"}} - >>> s = History(models=["the first model", "the second model"]) - >>> resolve_state_params(params, s) - {'experimentalist': {'source': 'the second model'}} - - """ - state_dependent_properties = _get_state_dependent_properties(state) - resolved_params = _resolve_properties(params, state_dependent_properties) - return resolved_params diff --git a/src/autora/state/protocol.py b/src/autora/state/protocol.py deleted file mode 100644 index e1a16be7..00000000 --- a/src/autora/state/protocol.py +++ /dev/null @@ -1,158 +0,0 @@ -from enum import Enum -from typing import ( - Any, - Dict, - Generic, - Mapping, - Optional, - Protocol, - Sequence, - Set, - TypeVar, - Union, - runtime_checkable, -) - -from numpy.typing import ArrayLike -from sklearn.base import BaseEstimator - -from autora.variable import VariableCollection - -State = TypeVar("State") - - -class ResultKind(str, Enum): - """ - Kinds of results which can be held in the Result object. 
- - Examples: - >>> ResultKind.CONDITION is ResultKind.CONDITION - True - - >>> ResultKind.CONDITION is ResultKind.VARIABLES - False - - >>> ResultKind.CONDITION == "CONDITION" - True - - >>> ResultKind.CONDITION == "VARIABLES" - False - - >>> ResultKind.CONDITION in {ResultKind.CONDITION, ResultKind.PARAMS} - True - - >>> ResultKind.VARIABLES in {ResultKind.CONDITION, ResultKind.PARAMS} - False - """ - - CONDITION = "CONDITION" - OBSERVATION = "OBSERVATION" - MODEL = "MODEL" - PARAMS = "PARAMS" - VARIABLES = "VARIABLES" - - def __repr__(self): - cls_name = self.__class__.__name__ - return f"{cls_name}.{self.name}" - - -class SupportsDataKind(Protocol): - """Object with attributes for `data` and `kind`""" - - data: Optional[Any] - kind: Optional[ResultKind] - - -class SupportsStateParamsField(Protocol): - """Support a state with a params property.""" - - params: Dict - - -class SupportsStateParamsProperty(Protocol): - """Support a state with a params property.""" - - @property - def params(self) -> Dict: - ... - - -SupportsStateParams = Union[SupportsStateParamsField, SupportsStateParamsProperty] - - -class SupportsControllerStateFields(Protocol): - """Support representing snapshots of a controller state as mutable fields.""" - - variables: VariableCollection - params: Dict - conditions: Sequence[ArrayLike] - observations: Sequence[ArrayLike] - models: Sequence[BaseEstimator] - - def update(self: State, **kwargs) -> State: - ... - - -class SupportsControllerStateProperties(Protocol): - """Support representing snapshots of a controller state as immutable properties.""" - - def update(self: State, **kwargs) -> State: - ... - - @property - def variables(self) -> VariableCollection: - ... - - @property - def params(self) -> Dict: - ... - - @property - def conditions(self) -> Sequence[ArrayLike]: - ... - - @property - def observations(self) -> Sequence[ArrayLike]: - ... - - @property - def models(self) -> Sequence[BaseEstimator]: - ... 
- - -SupportsControllerState = Union[ - SupportsControllerStateFields, SupportsControllerStateProperties -] - - -class SupportsControllerStateHistory(SupportsControllerStateProperties, Protocol): - """Represents controller state as a linear sequence of entries.""" - - def __init__(self, history: Sequence[SupportsDataKind]): - ... - - def filter_by(self: State, kind: Optional[Set[Union[str, ResultKind]]]) -> State: - ... - - @property - def history(self) -> Sequence[SupportsDataKind]: - ... - - -class Executor(Protocol, Generic[State]): - """A Callable which, given some state, and some parameters, returns an updated state.""" - - def __call__(self, __state: State, params: Dict) -> State: - ... - - -ExecutorCollection = Mapping[str, Executor] - - -@runtime_checkable -class SupportsLoadDump(Protocol): - def dump(self, data, file) -> None: - ... - - def load(self, file) -> Any: - ... diff --git a/src/autora/state/snapshot.py b/src/autora/state/snapshot.py deleted file mode 100644 index 21be8171..00000000 --- a/src/autora/state/snapshot.py +++ /dev/null @@ -1,201 +0,0 @@ -""" Classes for storing and passing a cycle's state as an immutable snapshot. 
""" -from dataclasses import dataclass, field -from typing import Dict, List - -from numpy.typing import ArrayLike -from sklearn.base import BaseEstimator - -from autora.state.delta import Delta -from autora.state.protocol import SupportsControllerStateFields -from autora.variable import VariableCollection - - -@dataclass(frozen=True) -class Snapshot(SupportsControllerStateFields): - """An object passed between and updated by processing steps in the Controller.""" - - # Single values - variables: VariableCollection = field(default_factory=VariableCollection) - params: Dict = field(default_factory=dict) - - # Sequences - conditions: List[ArrayLike] = field(default_factory=list) - observations: List[ArrayLike] = field(default_factory=list) - models: List[BaseEstimator] = field(default_factory=list) - - def update( - self, - variables=None, - params=None, - conditions=None, - observations=None, - models=None, - ): - """ - Create a new object with updated values. - - Examples: - The initial object is empty: - >>> s0 = Snapshot() - >>> s0 # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(variables=VariableCollection(...), params={}, conditions=[], - observations=[], models=[]) - - We can update the params using the `.update` method: - >>> s0.update(params={'first': 'params'}) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(..., params={'first': 'params'}, ...) - - ... but the original object is unchanged: - >>> s0 # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(..., params={}, ...) - - For params, only one object is returned from the respective property: - >>> s0.update(params={'first': 'params'}).update(params={'second': 'params'}).params - {'second': 'params'} - - ... and the same applies to variables: - >>> from autora.variable import VariableCollection, IV - >>> (s0.update(variables=VariableCollection([IV("1st IV")])) - ... 
.update(variables=VariableCollection([IV("2nd IV")]))).variables - VariableCollection(independent_variables=[IV(name='2nd IV',...)], ...) - - When we update the conditions, observations or models, the respective list is extended: - >>> s3 = s0.update(models=["1st model"]) - >>> s3 - Snapshot(..., models=['1st model']) - - ... so we can see the history of all the models, for instance. - >>> s3.update(models=["2nd model"]) - Snapshot(..., models=['1st model', '2nd model']) - - The same applies to observations: - >>> s4 = s0.update(observations=["1st observation"]) - >>> s4 - Snapshot(..., observations=['1st observation'], ...) - - >>> s4.update(observations=["2nd observation"]) # doctest: +ELLIPSIS - Snapshot(..., observations=['1st observation', '2nd observation'], ...) - - - The same applies to conditions: - >>> s5 = s0.update(conditions=["1st condition"]) - >>> s5 - Snapshot(..., conditions=['1st condition'], ...) - - >>> s5.update(conditions=["2nd condition"]) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(..., conditions=['1st condition', '2nd condition'], ...) - - You can also update with multiple conditions, observations and models: - >>> s0.update(conditions=['c1', 'c2']) - Snapshot(..., conditions=['c1', 'c2'], ...) - - >>> s0.update(models=['m1', 'm2'], variables={'m': 1}) - Snapshot(variables={'m': 1}, ..., models=['m1', 'm2']) - - >>> s0.update(models=['m1'], observations=['o1'], variables={'m': 1}) - Snapshot(variables={'m': 1}, ..., observations=['o1'], models=['m1']) - - - Inputs to models, observations and conditions must be Lists - which can be cast to lists: - >>> s0.update(models='m1') # doctest: +ELLIPSIS - Traceback (most recent call last): - ... - AssertionError: 'm1' must be a list, e.g. `['m1']`?) - - """ - - def _coalesce_lists(old, new): - assert isinstance( - old, List - ), f"{repr(old)} must be a list, e.g. `[{repr(old)}]`?)" - if new is not None: - assert isinstance( - new, List - ), f"{repr(new)} must be a list, e.g. 
`[{repr(new)}]`?)" - return old + list(new) - else: - return old - - variables_ = variables or self.variables - params_ = params or self.params - conditions_ = _coalesce_lists(self.conditions, conditions) - observations_ = _coalesce_lists(self.observations, observations) - models_ = _coalesce_lists(self.models, models) - return Snapshot(variables_, params_, conditions_, observations_, models_) - - def __add__(self, other: Delta): - """ - Add a delta to the object. - - Examples: - The initial object is empty: - >>> s0 = Snapshot() - >>> s0 # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(variables=VariableCollection(...), params={}, conditions=[], - observations=[], models=[]) - - We can update the params using the `+` operator: - >>> from autora.state.delta import Delta - >>> s0 + Delta(params={'first': 'params'}) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(..., params={'first': 'params'}, ...) - - ... but the original object is unchanged: - >>> s0 # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(..., params={}, ...) - - For params, only one object is returned from the respective property: - >>> (s0 + Delta(params={'first': 'params'}) + Delta(params={'second':'params'})).params - {'second': 'params'} - - ... and the same applies to variables: - >>> from autora.variable import VariableCollection, IV - >>> (s0 + Delta(variables=VariableCollection([IV("1st IV")])) + - ... Delta(variables=VariableCollection([IV("2nd IV")]))).variables - VariableCollection(independent_variables=[IV(name='2nd IV',...)], ...) - - When we update the conditions, observations or models, the respective list is extended: - >>> s3 = s0 + Delta(models=["1st model"]) - >>> s3 - Snapshot(..., models=['1st model']) - - ... so we can see the history of all the models, for instance. 
- >>> s3 + Delta(models=["2nd model"]) - Snapshot(..., models=['1st model', '2nd model']) - - The same applies to observations: - >>> s4 = s0 + Delta(observations=["1st observation"]) - >>> s4 - Snapshot(..., observations=['1st observation'], ...) - - >>> s4 + Delta(observations=["2nd observation"]) # doctest: +ELLIPSIS - Snapshot(..., observations=['1st observation', '2nd observation'], ...) - - - The same applies to conditions: - >>> s5 = s0 + Delta(conditions=["1st condition"]) - >>> s5 - Snapshot(..., conditions=['1st condition'], ...) - - >>> s5 + Delta(conditions=["2nd condition"]) # doctest: +ELLIPSIS +NORMALIZE_WHITESPACE - Snapshot(..., conditions=['1st condition', '2nd condition'], ...) - - You can also update with multiple conditions, observations and models: - >>> s0 + Delta(conditions=['c1', 'c2']) - Snapshot(..., conditions=['c1', 'c2'], ...) - - >>> s0 + Delta(models=['m1', 'm2'], variables={'m': 1}) - Snapshot(variables={'m': 1}, ..., models=['m1', 'm2']) - - >>> s0 + Delta(models=['m1'], observations=['o1'], variables={'m': 1}) - Snapshot(variables={'m': 1}, ..., observations=['o1'], models=['m1']) - - - Inputs to models, observations and conditions must be Lists - which can be cast to lists: - >>> s0 + Delta(models='m1') # doctest: +ELLIPSIS - Traceback (most recent call last): - ... - AssertionError: 'm1' must be a list, e.g. `['m1']`?) - """ - return self.update(**other) diff --git a/src/autora/state/wrapper.py b/src/autora/state/wrapper.py deleted file mode 100644 index 74ecbade..00000000 --- a/src/autora/state/wrapper.py +++ /dev/null @@ -1,89 +0,0 @@ -"""Utilities to wrap common theorist, experimentalist and experiment runners as `f(State)` -so that $n$ processes $f_i$ on states $S$ can be represented as -$$f_n(...(f_1(f_0(S))))$$ - -These are special cases of the [autora.state.delta.wrap_to_use_state][] function. 
-""" -from __future__ import annotations - -from typing import Callable, Iterable, TypeVar - -import numpy as np -import pandas as pd -from sklearn.base import BaseEstimator - -from autora.experimentalist.pipeline import Pipeline -from autora.state.delta import Delta, State, wrap_to_use_state -from autora.variable import VariableCollection - -S = TypeVar("S") -X = TypeVar("X") -Y = TypeVar("Y") -XY = TypeVar("XY") -Executor = Callable[[State], State] - - -def theorist_from_estimator(estimator: BaseEstimator) -> Executor: - """ - Convert a scikit-learn compatible estimator into a function on a `State` object. - - Supports passing additional `**kwargs` which are used to update the estimator's params - before fitting. - """ - - @wrap_to_use_state - def theorist( - experiment_data: pd.DataFrame, variables: VariableCollection, **kwargs - ): - ivs = [v.name for v in variables.independent_variables] - dvs = [v.name for v in variables.dependent_variables] - X, y = experiment_data[ivs], experiment_data[dvs] - new_model = estimator.set_params(**kwargs).fit(X, y) - return Delta(model=new_model) - - return theorist - - -def experiment_runner_from_x_to_y_function(f: Callable[[X], Y]) -> Executor: - """Wrapper for experiment_runner of the form $f(x) \rarrow y$, where `f` returns just the $y$ - values""" - - @wrap_to_use_state - def experiment_runner(conditions: pd.DataFrame, **kwargs): - x = conditions - y = f(x, **kwargs) - experiment_data = pd.DataFrame.merge(x, y, left_index=True, right_index=True) - return Delta(experiment_data=experiment_data) - - return experiment_runner - - -def experiment_runner_from_x_to_xy_function(f: Callable[[X], XY]) -> Executor: - """Wrapper for experiment_runner of the form $f(x) \rarrow (x,y)$, where `f` - returns both $x$ and $y$ values in a complete dataframe.""" - - @wrap_to_use_state - def experiment_runner(conditions: pd.DataFrame, **kwargs): - x = conditions - experiment_data = f(x, **kwargs) - return Delta(experiment_data=experiment_data) 
- - return experiment_runner - - -def experimentalist_from_pipeline(pipeline: Pipeline) -> Executor: - """Wrapper for experimentalists of the form $f() \rarrow x$, where `f` - returns both $x$ and $y$ values in a complete dataframe.""" - - @wrap_to_use_state - def experimentalist(params): - conditions = pipeline(**params) - if isinstance(conditions, (pd.DataFrame, np.ndarray, np.recarray)): - conditions_ = conditions - elif isinstance(conditions, Iterable): - conditions_ = np.array(list(conditions)) - else: - raise NotImplementedError("type `%s` is not supported" % (type(conditions))) - return Delta(conditions=conditions_) - - return experimentalist diff --git a/tests/test_experimentalist_pipeline.py b/tests/test_experimentalist_pipeline.py index a02bfa85..08daf529 100644 --- a/tests/test_experimentalist_pipeline.py +++ b/tests/test_experimentalist_pipeline.py @@ -279,7 +279,6 @@ def test_params_parser_one_level(): def test_params_parser_recurse_one(): - params = { "filter_pipeline__step1__n_samples": 100, } @@ -309,7 +308,6 @@ def test_params_parser_recurse_one_n_levels_alternative_divider(): def test_params_parser_recurse(): - params = { "pool__ivs": "%%independent_variables%%", "filter_pipeline__step1__n_samples": 100, diff --git a/tests/test_experimentalist_random.py b/tests/test_experimentalist_random.py deleted file mode 100644 index a81ad483..00000000 --- a/tests/test_experimentalist_random.py +++ /dev/null @@ -1,153 +0,0 @@ -from functools import partial - -import numpy as np -import pytest - -from autora.experimentalist.pipeline import make_pipeline -from autora.experimentalist.pooler.grid import grid_pool -from autora.experimentalist.pooler.random_pooler import random_pool -from autora.experimentalist.sampler.random_sampler import random_sample -from autora.variable import DV, IV, ValueType, VariableCollection - - -def weber_filter(values): - return filter(lambda s: s[0] <= s[1], values) - - -def test_random_pooler_experimentalist(metadata): - """ - Tests 
the implementation of a random pooler. - """ - num_samples = 10 - - conditions = random_pool(metadata.independent_variables, num_samples=num_samples) - - conditions = np.array(list(conditions)) - - assert conditions.shape[0] == num_samples - assert conditions.shape[1] == len(metadata.independent_variables) - for condition in conditions: - for idx, value in enumerate(condition): - assert value in metadata.independent_variables[idx].allowed_values - - -def test_random_sampler_experimentalist(metadata): - """ - Tests the implementation of the experimentalist pipeline with an exhaustive pool of discrete - values, Weber filter, random selector. Tests two different implementations of the pool function - as a callable and passing in as interator/generator. - - """ - - n_trials = 25 # Number of trails for sampler to select - - # ---Implementation 1 - Pool using Callable via partial function---- - # Set up pipeline functions with partial - pooler_callable = partial(grid_pool, ivs=metadata.independent_variables) - sampler = partial(random_sample, num_samples=n_trials) - pipeline_random_samp = make_pipeline( - [pooler_callable, weber_filter, sampler], - ) - - results = pipeline_random_samp.run() - - # ***Checks*** - # Gridsearch pool is working as expected - _, pool = pipeline_random_samp.steps[0] - pool_len = len(list(pool())) - pool_len_expected = np.prod( - [len(s.allowed_values) for s in metadata.independent_variables] - ) - assert pool_len == pool_len_expected - - # Is sampling the number of trials we expect - assert len(results) == n_trials - - # Filter is selecting where IV1 >= IV2 - assert all([s[0] <= s[1] for s in results]) - - # Is sampling randomly. Runs 10 times and checks if consecutive runs are equal. - # Assert will fail if all 9 pairs return equal. 
- l_results = [pipeline_random_samp.run() for s in range(10)] - assert not np.all( - [ - np.array_equal(l_results[i], l_results[i + 1]) - for i, s in enumerate(l_results) - if i < len(l_results) - 1 - ] - ) - - -def test_random_experimentalist_generator(metadata): - n_trials = 25 # Number of trails for sampler to select - - pooler_generator = grid_pool(metadata.independent_variables) - sampler = partial(random_sample, num_samples=n_trials) - pipeline_random_samp_poolgen = make_pipeline( - [pooler_generator, weber_filter, sampler] - ) - - results_poolgen = list(pipeline_random_samp_poolgen.run()) - - # Is sampling the number of trials we expect - assert len(results_poolgen) == n_trials - - # Filter is selecting where IV1 >= IV2 - assert all([s[0] <= s[1] for s in results_poolgen]) - - # This will fail - # The Generator is exhausted after the first run and the pool is not regenerated when pipeline - # is run again. The pool should be set up as a callable if the pipeline is to be rerun. - results_poolgen2 = pipeline_random_samp_poolgen.run() - assert len(results_poolgen2) == 0 - - -@pytest.fixture -def metadata(): - # Specify independent variables - iv1 = IV( - name="S1", - allowed_values=np.linspace(0, 5, 5), - units="intensity", - variable_label="Stimulus 1 Intensity", - ) - - iv2 = IV( - name="S2", - allowed_values=np.linspace(0, 5, 5), - units="intensity", - variable_label="Stimulus 2 Intensity", - ) - - iv3 = IV( - name="S3", - allowed_values=[0, 1], - units="binary", - variable_label="Stimulus 3 Binary", - ) - - # Specify dependent variable with type - # The experimentalist pipeline doesn't actually use DVs, they are just specified here for - # example. 
- dv1 = DV( - name="difference_detected", - value_range=(0, 1), - units="probability", - variable_label="P(difference detected)", - type=ValueType.SIGMOID, - ) - - dv2 = DV( - name="difference_detected_sample", - value_range=(0, 1), - units="response", - variable_label="difference detected", - type=ValueType.PROBABILITY_SAMPLE, - ) - # Variable collection with ivs and dvs - metadata = VariableCollection( - independent_variables=[iv1, iv2, iv3], - dependent_variables=[dv1, dv2], - ) - - return metadata
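Note on the removed test_random_experimentalist_generator: its comments say that a pool passed as a generator is exhausted after the first run, and that the pool should be a callable if the pipeline is to be rerun. The following is a minimal, self-contained sketch of that behaviour. It does not use the autora API: grid_pool_sketch, random_sample_sketch and run_pipeline are hypothetical stand-ins written for this illustration, and the seed argument is an assumption added only to make the reruns comparable.

```python
import random
from functools import partial
from itertools import product


def grid_pool_sketch(values_per_variable):
    """Hypothetical stand-in for a grid pooler: lazily yield every combination."""
    yield from product(*values_per_variable)


def weber_filter(values):
    """Keep only the conditions where the first value is <= the second."""
    return filter(lambda s: s[0] <= s[1], values)


def random_sample_sketch(values, num_samples, seed=None):
    """Hypothetical stand-in for a random sampler: draw without replacement."""
    pool = list(values)
    rng = random.Random(seed)
    return rng.sample(pool, min(num_samples, len(pool)))


def run_pipeline(pool, *steps):
    """Hypothetical runner: call `pool` first if it is a callable, then apply the steps."""
    data = pool() if callable(pool) else pool
    for step in steps:
        data = step(data)
    return data


allowed = [[0, 1, 2], [0, 1, 2]]
sampler = partial(random_sample_sketch, num_samples=4, seed=0)

# A generator object can be consumed only once, so the second run sees an empty pool.
pool_generator = grid_pool_sketch(allowed)
first_run = run_pipeline(pool_generator, weber_filter, sampler)
second_run = run_pipeline(pool_generator, weber_filter, sampler)

# A callable builds a fresh generator on every run, so the pipeline can be rerun.
pool_callable = partial(grid_pool_sketch, allowed)
rerun_a = run_pipeline(pool_callable, weber_filter, sampler)
rerun_b = run_pipeline(pool_callable, weber_filter, sampler)
```

In this sketch second_run is empty because the generator was already consumed by the first run, while both reruns built from the partial return a full sample. That matches the reason the removed test asserted a length of 0 on its second run.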