Batch fANOVA importance analysis #827

Merged Aug 1, 2024 (54 commits)

Commits
108fa32
batch tools subpackage
jchen6727 Mar 17, 2024
5446c17
comm update
jchen6727 Mar 18, 2024
f56bac6
new search methods
jchen6727 Mar 19, 2024
b2935c1
fixed import issues
jchen6727 Mar 19, 2024
1bc1eab
updates comm, search
jchen6727 Mar 20, 2024
3d5d0a8
search.py
jchen6727 Mar 21, 2024
cad0574
wrapper tools search
jchen6727 Mar 27, 2024
14502fa
update to batchtools
jchen6727 Mar 30, 2024
d7d3ead
codebase moved to netpyne batchtools
jchen6727 Apr 4, 2024
ccaa1d2
update to comm.py, runners.py
jchen6727 Apr 8, 2024
0b34c0f
update to search (kwargs)
jchen6727 Apr 16, 2024
fbc323e
config update (should delete two keys @ end?)
jchen6727 Apr 18, 2024
9622aa2
replace pubtk with batchtk for compatibility with pip
jchen6727 Apr 25, 2024
c5ffb3d
update the submit (zsh->sh)
jchen6727 Apr 25, 2024
62fbc30
update batchtools, submits
jchen6727 Apr 28, 2024
6273501
fixed socket functionality for submits.py
jchen6727 Apr 30, 2024
0626add
batchtools documentation
jchen6727 May 7, 2024
67108a7
minor updates- search
jchen6727 May 8, 2024
553689b
fixed bug in conn, updated test/examples/* to use dynamic pathing
jchen6727 May 9, 2024
5eb20f8
update CHANGES.md
jchen6727 May 11, 2024
98bad60
Updated documentation `batchtools.rst`
jchen6727 May 14, 2024
2908abe
Merge branch 'development' into batch
jchen6727 May 14, 2024
76b6bd6
update `user_documentation.rst`
jchen6727 May 14, 2024
627e7bf
sort init.py
jchen6727 May 14, 2024
d7929dc
updated documentation with new batchtools (beta)
jchen6727 May 14, 2024
8021936
updating documentation (user_documentation.rst) re: new `batchtools` …
jchen6727 May 14, 2024
6f55c51
updating user documentation
jchen6727 May 14, 2024
05e0656
update per deployment on HPC
jchen6727 May 14, 2024
8cd66ee
Updated logic, bug fix
jchen6727 May 16, 2024
3aae153
Merge branch 'development' into development
jchen6727 May 16, 2024
a0846bb
quick fix, adding cfg.progressBar logic, fixed another issue with the…
jchen6727 May 16, 2024
c1b657a
Merge branch 'batch' into batch
jchen6727 May 16, 2024
2ed6d40
Merge remote-tracking branch 'downstate/batch' into batch
jchen6727 May 16, 2024
03ee809
I love git
Jsprouse0 May 31, 2024
d06481b
updated makedirs
jchen6727 Jun 3, 2024
c618bb4
Merge branch 'development' of https://github.com/jchen6727/netpyne in…
jchen6727 Jun 3, 2024
8c81925
updated mkdir to makedirs (bug fix) -- note the exist_ok change for l…
Jsprouse0 Jun 3, 2024
1131d84
fixed typo
jchen6727 Jul 2, 2024
2ca43d0
fixed generating rhythmic spiking pattern with 'uniform' option
vvbragin Jul 5, 2024
5c1737f
updated link to installation instructions
vvbragin Jul 5, 2024
19098e2
moved set_map from batchtk, additional support for n-dim lists
jchen6727 Jul 7, 2024
b68c2cb
fixed misleading console output when cfg.recordStims is On
vvbragin Jul 11, 2024
56b9721
various examples (using rosenbrock)
jchen6727 Jul 15, 2024
a124b0b
updated .rst documentation
jchen6727 Jul 16, 2024
aa8fc35
Merge remote-tracking branch 'origin/development' into development
jchen6727 Jul 16, 2024
56aa212
updates to batchtools to allow 'cfg' initialization with supplied dic…
jchen6727 Jul 16, 2024
e71b2f5
Merge remote-tracking branch 'jchen6727/development' into batch
jchen6727 Jul 16, 2024
1b6b042
fixed save.py
jchen6727 Jul 17, 2024
648395f
updated documentation to batch
jchen6727 Jul 17, 2024
0d9088a
updates to batchtools
jchen6727 Jul 23, 2024
d661c44
fANOVA importance evaluator (implement in batchtk?)
jchen6727 Jul 30, 2024
fd35529
fanova example
jchen6727 Aug 1, 2024
c4a9c3f
updated user documentation
jchen6727 Aug 1, 2024
f9b7f9c
Merge branch 'batch' into batch
jchen6727 Aug 1, 2024
19 changes: 19 additions & 0 deletions doc/source/user_documentation.rst
@@ -2963,3 +2963,22 @@ The ``out_json`` output contains a dictionary which includes the ``loss`` metric

In a multi-objective optimization, the relevant ``PYR_loss``, ``BC_loss``, and ``OLM_loss`` components are additionally included (see ``mo_optuna_search.py``)

8. Parameter Importance Evaluation Using fANOVA
-----------------------------------------------
A new feature in the batchtools beta release is the ability to evaluate parameter importance using a functional ANOVA (fANOVA) inspired algorithm via the ``Optuna`` and ``scikit-learn`` libraries.
(See `the original Hutter paper <http://proceedings.mlr.press/v32/hutter14.pdf>`_ and its `citation <https://automl.github.io/fanova/cite.html>`_.)

Currently, only the importance of individual (unpaired) parameters with respect to a single metric is supported, through the ``Analyzer`` object in ``netpyne.batchtools.analysis``. An example of its usage is available
`here <https://github.com/suny-downstate-medical-center/netpyne/tree/batch/netpyne/batchtools/examples/rosenbrock/fanova_rosenbrock>`_.

Running the example currently requires first generating an output ``grid.csv`` using ``batch.py``, then loading that ``grid.csv`` into the ``Analyzer`` object. Calling ``run_analysis`` then generates, for each parameter, a single score indicating its estimated ``importance``: that is, the parameter's estimated effect on the total variance of the model within the given bounds.

.. code-block:: python

    # from analysis.py
    from netpyne.batchtools.analysis import Analyzer

    analyzer = Analyzer(params = ['x.0', 'x.1', 'x.2', 'x.3'], metrics = ['fx']) # specify the parameter space and metrics of the batch function
    analyzer.load_file('grid.csv') # load the grid file generated by the batch run
    results = analyzer.run_analysis() # run fANOVA analysis and store the importance values in a results dictionary
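
To make "effect on the total variance" more concrete, the following is a minimal sketch of the underlying idea only. It is not the random-forest estimator that the ``Analyzer`` uses through Optuna: it computes, directly on a full grid, the fraction of the total variance explained by each parameter's main effect. The ``rosenbrock`` function is the standard textbook form and is an assumption about what the example's ``rosenbrock.py`` evaluates; ``main_effect_fractions`` is a hypothetical helper name.

```python
from itertools import product
from statistics import fmean, pvariance


def rosenbrock(x):
    """Standard N-dimensional Rosenbrock function (assumed form of the example objective)."""
    return sum(100.0 * (x[i + 1] - x[i] ** 2) ** 2 + (1.0 - x[i]) ** 2
               for i in range(len(x) - 1))


def main_effect_fractions(levels, n_params, func):
    """Fraction of total variance explained by each parameter's main effect on a full grid."""
    points = list(product(levels, repeat=n_params))
    values = [func(p) for p in points]
    total_var = pvariance(values)
    fractions = {}
    for i in range(n_params):
        # Mean of f at each level of parameter i, averaged over all other parameters
        marginal = [fmean(v for p, v in zip(points, values) if p[i] == level)
                    for level in levels]
        # Variance of those marginal means, relative to the total variance
        fractions['x.{}'.format(i)] = pvariance(marginal) / total_var
    return fractions


levels = [-1.0, 0.0, 1.0, 2.0, 3.0]  # same values as numpy.linspace(-1, 3, 5)
fractions = main_effect_fractions(levels, 4, rosenbrock)
for name, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print('{}: {:.3f}'.format(name, frac))
```

On a balanced full grid these fractions cannot sum to more than 1; whatever remains is variance due to interactions between parameters, which the single-parameter analysis does not attribute to any one of them.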

13 changes: 10 additions & 3 deletions netpyne/batchtools/__init__.py
@@ -1,19 +1,26 @@
from netpyne.batchtools.runners import NetpyneRunner
from batchtk.runtk import dispatchers

from netpyne.batchtools import submits
from batchtk import runtk
from netpyne.batchtools.analysis import Analyzer

specs = NetpyneRunner()

from netpyne.batchtools.comm import Comm

comm = Comm()

dispatchers = dispatchers
submits = submits
runtk = runtk


comm = Comm()

"""
def analyze_from_file(filename):
    analyzer = Fanova()
    analyzer.load_file(filename)
    analyzer.run_analysis(
"""

#from ray import tune as space.comm
#list and lb ub
55 changes: 55 additions & 0 deletions netpyne/batchtools/analysis.py
@@ -0,0 +1,55 @@
import pandas
from collections import namedtuple
import numpy

from optuna.importance._fanova._fanova import _Fanova


class Fanova(object):
    def __init__(self, n_trees: int = 64, max_depth: int = 64, seed: int | None = None) -> None:
        self._evaluator = _Fanova(
            n_trees=n_trees,
            max_depth=max_depth,
            min_samples_split=2,
            min_samples_leaf=1,
            seed=seed,
        )

    def evaluate(self, X: pandas.DataFrame, y: pandas.DataFrame) -> dict:
        assert X.shape[0] == y.shape[0] # all rows must be present
        assert y.shape[1] == 1 # only evaluation for single metric supported

        evaluator = self._evaluator
        #mins, maxs = X.min().values, X.max().values #in case bound matching is necessary.
        search_spaces = numpy.array([X.min().values, X.max().values]).T # bounds
        column_to_encoded_columns = [numpy.atleast_1d(i) for i in range(X.shape[1])] # encoding (no 1 hot/categorical)
        evaluator.fit(X.values, y.values.ravel(), search_spaces, column_to_encoded_columns)
        importances = numpy.array(
            [evaluator.get_importance(i)[0] for i in range(X.shape[1])]
        )
        return {col: imp for col, imp in zip(X.columns, importances)}


class Analyzer(object):
    def __init__(self,
                 params: list, # list of parameters
                 metrics: list, # list of metrics
                 evaluator = Fanova()) -> None:
        self.params = params
        self.metrics = metrics
        self.data = None
        self.evaluator = evaluator

    def load_file(self,
                  filename: str # filename (.csv) containing the completed batchtools trials
                  ) -> None:
        data = pandas.read_csv(filename)
        param_space = data[["config/{}".format(param) for param in self.params]]
        param_space = param_space.rename(columns={'config/{}'.format(param): param for param in self.params})
        results = data[self.metrics]
        self.data = namedtuple('data', ['param_space', 'results'])(param_space, results)

    def run_analysis(self) -> dict:
        return self.evaluator.evaluate(self.data.param_space, self.data.results)
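
`load_file` reads each parameter from a column named `config/<param>` and each metric from a column named exactly as passed in `metrics`. The sketch below mirrors that column selection with the standard library only, to show the CSV layout the loader expects. `split_trials` is a hypothetical helper, not part of `netpyne.batchtools`, and the two data rows are a tiny hand-written table (the values are the 2-D Rosenbrock function at those points), not output from a real batch run.

```python
import csv
import io


def split_trials(csv_text, params, metrics):
    """Split trial rows into parameter and metric tables, mimicking Analyzer.load_file."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Parameters are read from 'config/<name>' columns and stored under the bare name
    param_space = [{p: float(row['config/{}'.format(p)]) for p in params} for row in rows]
    # Metrics are read from columns named exactly as given
    results = [{m: float(row[m]) for m in metrics} for row in rows]
    return param_space, results


# Hand-written table, only to show the expected column layout
example = "config/x.0,config/x.1,fx\n0.0,1.0,101.0\n1.0,1.0,0.0\n"
param_space, results = split_trials(example, ['x.0', 'x.1'], ['fx'])
print(param_space)
print(results)
```

A file that lacks the `config/` prefix on its parameter columns would raise a `KeyError` in this sketch, and the pandas column selection in `load_file` fails in the same situation.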


19 changes: 19 additions & 0 deletions netpyne/batchtools/docs/batchtools.rst
@@ -414,3 +414,22 @@ The ``out_json`` output contains a dictionary which includes the ``loss`` metric

In a multi-objective optimization, the relevant ``PYR_loss``, ``BC_loss``, and ``OLM_loss`` components are additionally included (see ``mo_optuna_search.py``)

8. Parameter Importance Evaluation Using fANOVA
-----------------------------------------------
A new feature in the batchtools beta release is the ability to evaluate parameter importance using a functional ANOVA (fANOVA) inspired algorithm via the ``Optuna`` and ``scikit-learn`` libraries.
(See `the original Hutter paper <http://proceedings.mlr.press/v32/hutter14.pdf>`_ and its `citation <https://automl.github.io/fanova/cite.html>`_.)

Currently, only the importance of individual (unpaired) parameters with respect to a single metric is supported, through the ``Analyzer`` object in ``netpyne.batchtools.analysis``. An example of its usage is available
`here <https://github.com/suny-downstate-medical-center/netpyne/tree/batch/netpyne/batchtools/examples/rosenbrock/fanova_rosenbrock>`_.

Running the example currently requires first generating an output ``grid.csv`` using ``batch.py``, then loading that ``grid.csv`` into the ``Analyzer`` object. Calling ``run_analysis`` then generates, for each parameter, a single score indicating its estimated ``importance``: that is, the parameter's estimated effect on the total variance of the model within the given bounds.

.. code-block:: python

    # from analysis.py
    from netpyne.batchtools.analysis import Analyzer

    analyzer = Analyzer(params = ['x.0', 'x.1', 'x.2', 'x.3'], metrics = ['fx']) # specify the parameter space and metrics of the batch function
    analyzer.load_file('grid.csv') # load the grid file generated by the batch run
    results = analyzer.run_analysis() # run fANOVA analysis and store the importance values in a results dictionary

3 changes: 3 additions & 0 deletions netpyne/batchtools/examples/CA3/cfg.py
@@ -17,6 +17,9 @@
cfg.saveJson = True
cfg.printRunTime = 0.1
cfg.recordLFP = None # don't save this
cfg.simLabel = 'ca3'
cfg.saveFolder = '.'


cfg.analysis['plotRaster'] = {'saveFig': True} # raster ok
cfg.analysis['plotTraces'] = { } # don't save this
@@ -0,0 +1,6 @@
from netpyne.batchtools.analysis import Analyzer

analyzer = Analyzer(params = ['x.0', 'x.1', 'x.2', 'x.3'], metrics = ['fx'])
analyzer.load_file('optuna.csv')
results = analyzer.run_analysis()
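
The example stops once `results` is computed. Since `run_analysis` returns a plain dictionary mapping each parameter name to its score, one possible follow-up is to rank the parameters and express each score as a share of the summed scores. This is a sketch under that assumption: `rank_importances` is a hypothetical helper, and the input values are placeholders, not results from this example.

```python
def rank_importances(results):
    """Return (parameter, raw score, share of summed scores), highest score first."""
    total = sum(results.values())
    ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
    return [(name, score, score / total if total else 0.0) for name, score in ranked]


# Placeholder scores for illustration only; these are NOT results from the example run.
placeholder = {'x.0': 0.2, 'x.1': 0.5, 'x.2': 0.25, 'x.3': 0.05}
ranked = rank_importances(placeholder)
for name, score, share in ranked:
    print('{:>4}  score={:.3f}  share={:.1%}'.format(name, score, share))
```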

@@ -1,23 +1,24 @@
 from netpyne.batchtools.search import search
+import numpy

-params = {'x.0': [0, 3],
-          'x.1': [0, 3],
-          'x.2': [0, 3],
-          'x.3': [0, 3],
+params = {'x.0': numpy.linspace(-1, 3, 5),
+          'x.1': numpy.linspace(-1, 3, 5),
+          'x.2': numpy.linspace(-1, 3, 5),
+          'x.3': numpy.linspace(-1, 3, 5),
           }

 # use shell_config if running directly on the machine
 shell_config = {'command': 'python rosenbrock.py',}

 search(job_type = 'sh', # or sh
        comm_type = 'socket',
-       label = 'optuna',
+       label = 'grid',
        params = params,
-       output_path = '../optuna_batch',
+       output_path = '../grid_batch',
        checkpoint_path = '../ray',
        run_config = {'command': 'python rosenbrock.py'},
-       num_samples = 9,
+       num_samples = 1,
        metric = 'fx',
        mode = 'min',
-       algorithm = 'optuna',
+       algorithm = 'variant_generator',
        max_concurrent = 3)
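
For a sense of the size of this search, the sketch below enumerates the parameter combinations under the assumption that `variant_generator` expands each parameter's list of values into a full Cartesian grid and that `num_samples = 1` runs that grid once. Only the standard library is used; the `levels` list reproduces the values of `numpy.linspace(-1, 3, 5)`.

```python
from itertools import product

# Same five values per parameter as numpy.linspace(-1, 3, 5) in the script above
levels = [-1.0, 0.0, 1.0, 2.0, 3.0]
params = {'x.{}'.format(i): levels for i in range(4)}

# Full Cartesian product of all parameter values: one dict per trial
trials = [dict(zip(params, combo)) for combo in product(*params.values())]
print(len(trials))  # 5 ** 4 combinations
print(trials[0])
```

Under that assumption the run evaluates 625 trials, at most three at a time given `max_concurrent = 3`.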