
Set shape basis method #276

Open · wants to merge 42 commits into `development`

Conversation

@BalzaniEdoardo (Collaborator) commented Dec 5, 2024

Add a set_input_shape method that initializes the basis.

The method can accept a list of:

  • Integers: assumes the input is flat if the integer n = 1, or 2D if n > 1. In the first case the basis accepts inputs of shape (n_samples,); in the second, (n_samples, n).
  • Tuples of integers: the tuple corresponds to the input shape with the first (sample) axis trimmed. If the tuple is (n, m, ...), the basis expects inputs of shape (n_samples, n, m, ...).
  • Arrays: the actual inputs that will later be passed to compute_features.
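As a sketch of how these three specifications map to an expected input shape (the helper below is hypothetical, written for illustration only; it is not part of the NeMoS API):

```python
import numpy as np

def expected_input_shape(spec):
    """Hypothetical helper: map a shape specification (int, tuple, or
    array) to the expected input shape, mirroring the rules above.
    The sample axis is represented symbolically as "n_samples"."""
    if isinstance(spec, int):
        # n = 1 -> flat input (n_samples,); n > 1 -> 2D input (n_samples, n)
        return ("n_samples",) if spec == 1 else ("n_samples", spec)
    if isinstance(spec, tuple):
        # a tuple is the input shape with the first (sample) axis trimmed
        return ("n_samples", *spec)
    # an array: reuse its trailing axes, whatever the sample count was
    return ("n_samples", *np.asarray(spec).shape[1:])

print(expected_input_shape(1))                     # ('n_samples',)
print(expected_input_shape(3))                     # ('n_samples', 3)
print(expected_input_shape((2, 4)))                # ('n_samples', 2, 4)
print(expected_input_shape(np.ones((100, 2, 4))))  # ('n_samples', 2, 4)
```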

It prepares the basis and precomputes internal quantities needed by later functionality, such as knowing how to split the feature axis (split_by_features). In the next PR, TransformerBasis will use the pre-computed shapes to split the concatenated inputs before processing.

Additionally, it computes and stores the number of output features, information a user may want to have after building a complex composite basis.

This PR follows #275

EDIT:

PR Summary

In this PR, I refined the class structure to better separate basis attributes and methods, delegating their validation logic to mixin classes wherever possible.

New Classes

  • CompositeBasisMixin:
    This mixin is inherited by additive and multiplicative bases. It implements methods for traversing the composite basis tree, such as __sklearn_clone__ and setup_basis.

  • AtomicBasisMixin:
    Designed for non-composite (atomic) bases, this mixin stores the n_basis_funcs parameter and implements selected methods like __sklearn_clone__, which have uniform implementations across all atomic bases.

New Abstract Methods

  • set_input_shape:
    This method stores the expected input shape, a state attribute required for compatibility with transformers. Parameters set by this method are carried over during cloning, such as in cross-validation. Concrete implementations are provided in AtomicBasisMixin and CompositeBasisMixin.

  • _set_input_independent_states:
    Responsible for setting all state variables that depend on class parameters (provided at initialization and retrievable with get_params).

  • setup_basis:
    Computes all state variables, both input-dependent (e.g., input shape) and input-independent (e.g., kernels for convolutional bases). Concrete implementations are found in Eval, Conv, and Composite basis mixins.

Clone Method for Bases

  • __sklearn_clone__:
    Implements cloning logic to retain input-dependent states (such as input shape), which would otherwise be discarded by sklearn.base.clone. This is implemented in both the CompositeBasisMixin and AtomicBasisMixin.
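A minimal sketch of this cloning pattern (a toy class, not the NeMoS implementation; since scikit-learn 1.3, `sklearn.base.clone` dispatches to `__sklearn_clone__` when an estimator defines it):

```python
class TinyBasis:
    """Toy estimator illustrating a clone that keeps input-dependent state."""

    def __init__(self, n_basis_funcs):
        self.n_basis_funcs = n_basis_funcs  # init parameter
        self._input_shape_ = None           # input-dependent state

    def get_params(self):
        return {"n_basis_funcs": self.n_basis_funcs}

    def set_input_shape(self, shape):
        self._input_shape_ = tuple(shape)
        return self

    def __sklearn_clone__(self):
        # re-create from init params (what plain cloning would do) ...
        new = self.__class__(**self.get_params())
        # ... then carry over state that plain cloning would discard
        new._input_shape_ = self._input_shape_
        return new

b = TinyBasis(5).set_input_shape((2, 3))
b2 = b.__sklearn_clone__()
print(b2.n_basis_funcs, b2._input_shape_)  # 5 (2, 3)
```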

Modified/Moved Attributes and Methods

  • Kernel-related logic:
    The kernel_ attribute, along with the _check_has_kernel method and set_kernel, has been moved to the Conv mixin.

  • Input shape validation:
    The _check_input_shape_consistency method has been relocated to AtomicBasisMixin and CompositeBasisMixin.

  • Composite basis setters:
    Setters for basis1 and basis2 in composite bases now support cross-validation scenarios.

Inheritance of New Mixins

  • CompositeBasisMixin:
    Inherited by AdditiveBasis and MultiplicativeBasis.

  • AtomicBasisMixin:
    Inherited by the following classes:

    • SplineBasis (the superclass for all splines)
    • RaisedCosineLinear (the superclass for all raised cosines)
    • OrthExponentialBasis (the superclass for orthogonal exponential bases)
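The division of labor between the two mixins can be sketched with toy classes (hypothetical and heavily simplified; real NeMoS bases add validation, kernels, and feature computation): the atomic basis sets its own state, while the composite basis traverses the tree and delegates to its children.

```python
import numpy as np

class AtomicSketch:
    """Toy atomic basis: setup = input-independent state + input shape."""

    def __init__(self, n_basis_funcs):
        self.n_basis_funcs = n_basis_funcs
        self._input_shape_ = None

    def _set_input_independent_states(self):
        pass  # e.g. a Conv basis would compute its kernel here

    def set_input_shape(self, x):
        self._input_shape_ = np.asarray(x).shape[1:]  # trim the sample axis
        return self

    def setup_basis(self, x):
        self._set_input_independent_states()
        return self.set_input_shape(x)

class AdditiveSketch:
    """Toy composite basis: traverses the tree, delegating to each child."""

    def __init__(self, basis1, basis2):
        self.basis1, self.basis2 = basis1, basis2

    def setup_basis(self, x1, x2):
        self.basis1.setup_basis(x1)
        self.basis2.setup_basis(x2)
        return self

add = AdditiveSketch(AtomicSketch(5), AtomicSketch(6))
add.setup_basis(np.ones((100, 2)), np.ones((100,)))
print(add.basis1._input_shape_, add.basis2._input_shape_)  # (2,) ()
```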

@codecov-commenter commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 95.36680% with 12 lines in your changes missing coverage. Please review.

Project coverage is 96.02%. Comparing base (3691356) to head (2ade4bd).
Report is 1 commit behind head on development.

Files with missing lines                Patch %   Lines
src/nemos/basis/_transformer_basis.py   78.12%    7 Missing ⚠️
src/nemos/basis/basis.py                92.30%    4 Missing ⚠️
src/nemos/basis/_basis_mixin.py         99.11%    1 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff               @@
##           development     #276      +/-   ##
===============================================
- Coverage        96.13%   96.02%   -0.11%     
===============================================
  Files               34       34              
  Lines             2507     2668     +161     
===============================================
+ Hits              2410     2562     +152     
- Misses              97      106       +9     


@BalzaniEdoardo BalzaniEdoardo mentioned this pull request Dec 15, 2024
@BalzaniEdoardo (Collaborator, Author) commented Dec 16, 2024

In the next PR I'll improve the API of the transformer and move the TransformerBasis tests into a dedicated script.

If you want an overview of how to work with the TransformerBasis before digging into the code, check out this guide:

https://nemos--276.org.readthedocs.build/en/276/how_to_guide/plot_05_transformer_basis.html

@sjvenditto (Collaborator) left a comment

Looking good! I just have a couple of suggested changes to clarify, in the documentation and error messages, where set_input_shape will have an impact. Also, some code in a Warning admonition in plot_06_sklearn_pipeline_cv_demo needs to be updated to the new syntax.

Resolved (outdated) review threads on src/nemos/basis/_basis.py and src/nemos/basis/_basis_mixin.py.
@billbrod (Member) left a comment

Haven't had a chance to look at the tests, but here's my first pass. Some tweaking in the tutorials, and also:

  • I think we can probably move away from using numbers in the names of the tutorials. The only reason to do so is for the automatic sorting to work, but we're no longer making use of automatic sorting, right?
  • I am confused about what lives in the Basis superclass versus what lives in AtomicBasisMixin. For example, why does anything to do with n_basis_input or n_basis_funcs live in Basis?

name: python3
---

# Converting NeMoS Bases To scikit-learn Transformers

Title shows up really long in the readme; maybe shorten it to "Using bases as scikit-learn transformers"?

`scikit-learn` is a powerful machine learning library that provides advanced tools for creating data analysis pipelines, from input transformations to model fitting and cross-validation.

All of `scikit-learn`'s machinery relies on strict assumptions about input structure. In particular, all `scikit-learn`
objects require inputs as arrays of at most two dimensions, where the first dimension represents the time (or samples)

Suggested change:
- objects require inputs as arrays of at most two dimensions, where the first dimension represents the time (or samples)
+ objects require inputs to be arrays of at most two dimensions, where the first dimension represents the time (or samples)

While this may feel rigid, it enables transformations to be seamlessly chained together, greatly simplifying the
process of building stable, complex pipelines.

On the other hand, `NeMoS` takes a different approach to feature construction. `NeMoS`' bases are composable constructors that allow for more flexibility in the required input structure.

Suggested change:
- On the other hand, `NeMoS` takes a different approach to feature construction. `NeMoS`' bases are composable constructors that allow for more flexibility in the required input structure.
+ `NeMoS` takes a different approach to feature construction: `NeMoS`' bases are composable constructors that allow for more flexibility in input structure.

Comment on lines +28 to +29
Depending on the basis type, it can accept one or more input arrays or `pynapple` time series data, each of which can take any shape as long as the time (or sample) axis is the first of each array;
`NeMoS` design favours object composability: one can combine any two or more bases to compute complex features, with a user-friendly interface that can accept a separate array/time series for each input type (e.g., an array with the spike counts, an array for the animal's position, etc.).

Suggested change:
- Depending on the basis type, it can accept one or more input arrays or `pynapple` time series data, each of which can take any shape as long as the time (or sample) axis is the first of each array;
- `NeMoS` design favours object composability: one can combine any two or more bases to compute complex features, with a user-friendly interface that can accept a separate array/time series for each input type (e.g., an array with the spike counts, an array for the animal's position, etc.).
+ They can accept arrays or `pynapple` time series data, which can take any shape as long as the time (or sample) axis is the first of each array.
+ Furthermore, `NeMoS` design favours object composability: one can combine bases into [`CompositeBasis`](composing_basis_function) objects to compute complex features, with a user-friendly interface that can accept a separate array/time series for each input type (e.g., an array with the spike counts, an array for the animal's position, etc.).

"Depending on basis type" is referring to composite vs. atomic bases, right? This makes it sound like, e.g., raised cosine accepts one input whereas M-splines accept two. In which case, I think we want the discussion of "multiple inputs" to be more clearly linked to composite bases.

## From Basis to TransformerBasis


With NeMoS, you can easily create a basis accepting two inputs. Let's assume that we want to process neural activity stored in a 2-dimensional spike count array of shape `(n_samples, n_neurons)` and a second array containing the speed of an animal, with shape `(n_samples,)`.

Suggested change:
- With NeMoS, you can easily create a basis accepting two inputs. Let's assume that we want to process neural activity stored in a 2-dimensional spike count array of shape `(n_samples, n_neurons)` and a second array containing the speed of an animal, with shape `(n_samples,)`.
+ With NeMoS, you can easily create a basis which accepts two inputs. Let's assume that we want to process neural activity stored in a 2-dimensional spike count array of shape `(n_samples, n_neurons)` and a second array containing the speed of an animal, with shape `(n_samples,)`.

from ._basis import Basis


def set_input_shape_state(method):

why not make this more general and accept a list of attributes to copy, which default to ["_n_basis_input_", "_input_shape_"]?
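One possible shape for that generalization (a sketch only; the decorator factory and the Demo class below are hypothetical, with the default attribute list taken from the comment above):

```python
from functools import wraps

def preserve_state(attrs=("_n_basis_input_", "_input_shape_")):
    """Hypothetical decorator factory: after calling the wrapped method,
    copy the listed attributes from self onto the returned object."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            out = method(self, *args, **kwargs)
            for attr in attrs:
                if hasattr(self, attr):
                    setattr(out, attr, getattr(self, attr))
            return out
        return wrapper
    return decorator

class Demo:
    """Toy class to exercise the decorator."""

    def __init__(self):
        self._input_shape_ = None

    def set_input_shape(self, shape):
        self._input_shape_ = tuple(shape)
        return self

    @preserve_state()
    def make_fresh(self):
        return Demo()  # new instance; the decorator restores the state

d = Demo().set_input_shape((2, 3))
print(d.make_fresh()._input_shape_)  # (2, 3)
```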

self.set_input_shape(*xi)
return self

def _set_input_independent_states(self) -> "EvalBasisMixin":

currently this only appears to call set_kernel in the convolutional mixin and does nothing anywhere else -- do we really need it?


oh, is it so that the composite basis can just call this method instead of checking whether kernel needs to be set?

# pass down the input shapes
self.set_input_shape(*shapes)
elif any(set_bases):
warnings.warn(

this isn't actually necessary, the following runs fine. should it?

import numpy as np
import nemos as nmo

b = nmo.basis.BSplineEval(5)
b.setup_basis(np.ones((100, 5)))
b2 = nmo.basis.BSplineEval(5)
c = b + b2
c.compute_features(np.ones((100, 5)), np.ones((100, 2)))

Comment on lines +584 to +590
@property
def basis1(self):
return self._basis1

@basis1.setter
def basis1(self, bas: Basis):
self._basis1 = bas

why are these properties if we do nothing special in the getter or setter? why not just use self.basis1 instead of self._basis1?

f"is {self.n_basis_funcs}."
)

def set_kernel(self):

will this ever get reached? since _check_window_size gets called during init?

4 participants