
Set shape basis method #276

Open · wants to merge 42 commits into `development`

Conversation

@BalzaniEdoardo (Collaborator) commented Dec 5, 2024

Add a set_input_shape method that initializes the basis.

The method can accept a list of:

  • Integers: assumes the input is flat if the integer n = 1, or 2D if n > 1. In the first case the basis accepts inputs of shape (n_samples,); in the second, (n_samples, n).
  • Tuples of integers: the tuple corresponds to the input shape with the first (sample) axis trimmed. If the tuple is (n, m, ...), the basis expects inputs of shape (n_samples, n, m, ...).
  • Arrays: the actual inputs that will later be passed to compute_features.
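As a sketch of how these three specifications map to an expected input shape (the helper below is hypothetical, written for illustration only; it is not part of the NeMoS API):

```python
import numpy as np

def expected_input_shape(spec):
    """Hypothetical helper: map a shape specification (int, tuple, or
    array) to the expected input shape, mirroring the rules above.
    The sample axis is represented symbolically as "n_samples"."""
    if isinstance(spec, int):
        # n = 1 -> flat input (n_samples,); n > 1 -> 2D input (n_samples, n)
        return ("n_samples",) if spec == 1 else ("n_samples", spec)
    if isinstance(spec, tuple):
        # a tuple is the input shape with the first (sample) axis trimmed
        return ("n_samples", *spec)
    # an array: reuse its trailing axes, whatever the sample count was
    return ("n_samples", *np.asarray(spec).shape[1:])

print(expected_input_shape(1))                     # ('n_samples',)
print(expected_input_shape(3))                     # ('n_samples', 3)
print(expected_input_shape((2, 4)))                # ('n_samples', 2, 4)
print(expected_input_shape(np.ones((100, 2, 4))))  # ('n_samples', 2, 4)
```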

It prepares the basis and precomputes internal quantities needed by later functionality, such as knowing how to split the feature axis (split_by_features). In the next PR, TransformerBasis will use the pre-computed shapes to split the concatenated inputs before processing.

Additionally, it computes and stores the number of output features, information a user may want to have after building a complex composite basis.

This PR follows #275

EDIT:

PR Summary

In this PR, I refined the class structure to better separate basis attributes and methods, delegating their validation logic to mixin classes wherever possible.

New Classes

  • CompositeBasisMixin:
    This mixin is inherited by additive and multiplicative bases. It implements methods for traversing the composite basis tree, such as __sklearn_clone__ and setup_basis.

  • AtomicBasisMixin:
    Designed for non-composite (atomic) bases, this mixin stores the n_basis_funcs parameter and implements selected methods like __sklearn_clone__, which have uniform implementations across all atomic bases.

New Abstract Methods

  • set_input_shape:
    This method stores the expected input shape, a state attribute required for compatibility with transformers. Parameters set by this method are carried over during cloning, such as in cross-validation. Concrete implementations are provided in AtomicBasisMixin and CompositeBasisMixin.

  • _set_input_independent_states:
    Responsible for setting all state variables that depend on class parameters (provided at initialization and retrievable with get_params).

  • setup_basis:
    Computes all state variables, both input-dependent (e.g., input shape) and input-independent (e.g., kernels for convolutional bases). Concrete implementations are found in Eval, Conv, and Composite basis mixins.

Clone Method for Bases

  • __sklearn_clone__:
    Implements cloning logic to retain input-dependent states (such as input shape), which would otherwise be discarded by sklearn.base.clone. This is implemented in both the CompositeBasisMixin and AtomicBasisMixin.
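A minimal sketch of this cloning pattern (a toy class, not the NeMoS implementation; since scikit-learn 1.3, `sklearn.base.clone` dispatches to `__sklearn_clone__` when an estimator defines it):

```python
class TinyBasis:
    """Toy estimator illustrating a clone that keeps input-dependent state."""

    def __init__(self, n_basis_funcs):
        self.n_basis_funcs = n_basis_funcs  # init parameter
        self._input_shape_ = None           # input-dependent state

    def get_params(self):
        return {"n_basis_funcs": self.n_basis_funcs}

    def set_input_shape(self, shape):
        self._input_shape_ = tuple(shape)
        return self

    def __sklearn_clone__(self):
        # re-create from init params (what plain cloning would do) ...
        new = self.__class__(**self.get_params())
        # ... then carry over state that plain cloning would discard
        new._input_shape_ = self._input_shape_
        return new

b = TinyBasis(5).set_input_shape((2, 3))
b2 = b.__sklearn_clone__()
print(b2.n_basis_funcs, b2._input_shape_)  # 5 (2, 3)
```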

Modified/Moved Attributes and Methods

  • Kernel-related logic:
    The kernel_ attribute, along with the _check_has_kernel method and set_kernel, has been moved to the Conv mixin.

  • Input shape validation:
    The _check_input_shape_consistency method has been relocated to AtomicBasisMixin and CompositeBasisMixin.

  • Composite basis setters:
    Setters for basis1 and basis2 in composite bases now support cross-validation scenarios.

Inheritance of New Mixins

  • CompositeBasisMixin:
    Inherited by AdditiveBasis and MultiplicativeBasis.

  • AtomicBasisMixin:
    Inherited by the following classes:

    • SplineBasis (the superclass for all splines)
    • RaisedCosineLinear (the superclass for all raised cosines)
    • OrthExponentialBasis (the superclass for orthogonal exponential bases)
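The division of labor between the two mixins can be sketched with toy classes (hypothetical and heavily simplified; real NeMoS bases add validation, kernels, and feature computation): the atomic basis sets its own state, while the composite basis traverses the tree and delegates to its children.

```python
import numpy as np

class AtomicSketch:
    """Toy atomic basis: setup = input-independent state + input shape."""

    def __init__(self, n_basis_funcs):
        self.n_basis_funcs = n_basis_funcs
        self._input_shape_ = None

    def _set_input_independent_states(self):
        pass  # e.g. a Conv basis would compute its kernel here

    def set_input_shape(self, x):
        self._input_shape_ = np.asarray(x).shape[1:]  # trim the sample axis
        return self

    def setup_basis(self, x):
        self._set_input_independent_states()
        return self.set_input_shape(x)

class AdditiveSketch:
    """Toy composite basis: traverses the tree, delegating to each child."""

    def __init__(self, basis1, basis2):
        self.basis1, self.basis2 = basis1, basis2

    def setup_basis(self, x1, x2):
        self.basis1.setup_basis(x1)
        self.basis2.setup_basis(x2)
        return self

add = AdditiveSketch(AtomicSketch(5), AtomicSketch(6))
add.setup_basis(np.ones((100, 2)), np.ones((100,)))
print(add.basis1._input_shape_, add.basis2._input_shape_)  # (2,) ()
```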

@codecov-commenter commented Dec 5, 2024

Codecov Report

Attention: Patch coverage is 95.36680% with 12 lines in your changes missing coverage. Please review.

Project coverage is 96.02%. Comparing base (3691356) to head (2ade4bd).
Report is 1 commit behind head on development.

Files with missing lines                Patch %   Lines
src/nemos/basis/_transformer_basis.py   78.12%    7 Missing ⚠️
src/nemos/basis/basis.py                92.30%    4 Missing ⚠️
src/nemos/basis/_basis_mixin.py         99.11%    1 Missing ⚠️
Additional details and impacted files
@@               Coverage Diff               @@
##           development     #276      +/-   ##
===============================================
- Coverage        96.13%   96.02%   -0.11%     
===============================================
  Files               34       34              
  Lines             2507     2668     +161     
===============================================
+ Hits              2410     2562     +152     
- Misses              97      106       +9     


@BalzaniEdoardo BalzaniEdoardo mentioned this pull request Dec 15, 2024
@BalzaniEdoardo (Collaborator, Author) commented Dec 16, 2024

In the next PR I'll improve the API of the transformer and move the TransformerBasis tests into a dedicated script.

If you want an overview of how to work with the TransformerBasis before digging into the code, check out this guide:

https://nemos--276.org.readthedocs.build/en/276/how_to_guide/plot_05_transformer_basis.html

@sjvenditto (Collaborator) left a comment

Looking good! I just have a couple of suggested changes to clarify, in the documentation and error messages, where set_input_shape will have an impact. Also, some code in a Warning admonition in plot_06_sklearn_pipeline_cv_demo needs to be updated to the new syntax.

Resolved (outdated) review threads on src/nemos/basis/_basis.py and src/nemos/basis/_basis_mixin.py.
@billbrod (Member) left a comment

Haven't had a chance to look at the tests, but here's my first pass. Some tweaking in the tutorials, and also:

  • I think we can probably move away from using numbers in the names of the tutorials. The only reason to do so is for the automatic sorting to work, but we're no longer making use of automatic sorting, right?
  • I am confused about what lives in the Basis superclass versus what lives in AtomicBasisMixin. For example, why does anything to do with n_basis_input or n_basis_funcs live in Basis?

name: python3
---

# Converting NeMoS Bases To scikit-learn Transformers

Title shows up really long in the readme; maybe shorten it to "Using bases as scikit-learn transformers"?

`scikit-learn` is a powerful machine learning library that provides advanced tools for creating data analysis pipelines, from input transformations to model fitting and cross-validation.

All of `scikit-learn`'s machinery relies on strict assumptions about input structure. In particular, all `scikit-learn`
objects require inputs as arrays of at most two dimensions, where the first dimension represents the time (or samples)

Suggested change:
- objects require inputs as arrays of at most two dimensions, where the first dimension represents the time (or samples)
+ objects require inputs to be arrays of at most two dimensions, where the first dimension represents the time (or samples)

While this may feel rigid, it enables transformations to be seamlessly chained together, greatly simplifying the
process of building stable, complex pipelines.

On the other hand, `NeMoS` takes a different approach to feature construction. `NeMoS`' bases are composable constructors that allow for more flexibility in the required input structure.

Suggested change:
- On the other hand, `NeMoS` takes a different approach to feature construction. `NeMoS`' bases are composable constructors that allow for more flexibility in the required input structure.
+ `NeMoS` takes a different approach to feature construction: `NeMoS`' bases are composable constructors that allow for more flexibility in input structure.

Comment on lines +28 to +29
Depending on the basis type, it can accept one or more input arrays or `pynapple` time series data, each of which can take any shape as long as the time (or sample) axis is the first of each array;
`NeMoS` design favours object composability: one can combine any two or more bases to compute complex features, with a user-friendly interface that can accept a separate array/time series for each input type (e.g., an array with the spike counts, an array for the animal's position, etc.).

Suggested change:
- Depending on the basis type, it can accept one or more input arrays or `pynapple` time series data, each of which can take any shape as long as the time (or sample) axis is the first of each array;
- `NeMoS` design favours object composability: one can combine any two or more bases to compute complex features, with a user-friendly interface that can accept a separate array/time series for each input type (e.g., an array with the spike counts, an array for the animal's position, etc.).
+ They can accept arrays or `pynapple` time series data, which can take any shape as long as the time (or sample) axis is the first of each array.
+ Furthermore, `NeMoS` design favours object composability: one can combine bases into [`CompositeBasis`](composing_basis_function) objects to compute complex features, with a user-friendly interface that can accept a separate array/time series for each input type (e.g., an array with the spike counts, an array for the animal's position, etc.).

"Depending on basis type" is referring to composite vs. atomic bases, right? This makes it sound like, e.g., raised cosine accepts one input whereas M-splines accept two. In which case, I think we want the discussion of "multiple inputs" to be more clearly linked to composite bases.

## From Basis to TransformerBasis


With NeMoS, you can easily create a basis accepting two inputs. Let's assume that we want to process neural activity stored in a 2-dimensional spike count array of shape `(n_samples, n_neurons)` and a second array containing the speed of an animal, with shape `(n_samples,)`.

Suggested change:
- With NeMoS, you can easily create a basis accepting two inputs. Let's assume that we want to process neural activity stored in a 2-dimensional spike count array of shape `(n_samples, n_neurons)` and a second array containing the speed of an animal, with shape `(n_samples,)`.
+ With NeMoS, you can easily create a basis which accepts two inputs. Let's assume that we want to process neural activity stored in a 2-dimensional spike count array of shape `(n_samples, n_neurons)` and a second array containing the speed of an animal, with shape `(n_samples,)`.

from ._basis import Basis


def set_input_shape_state(method):

why not make this more general and accept a list of attributes to copy, which default to ["_n_basis_input_", "_input_shape_"]?
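One possible shape for that generalization (a sketch only; the decorator factory and the Demo class below are hypothetical, with the default attribute list taken from the comment above):

```python
from functools import wraps

def preserve_state(attrs=("_n_basis_input_", "_input_shape_")):
    """Hypothetical decorator factory: after calling the wrapped method,
    copy the listed attributes from self onto the returned object."""
    def decorator(method):
        @wraps(method)
        def wrapper(self, *args, **kwargs):
            out = method(self, *args, **kwargs)
            for attr in attrs:
                if hasattr(self, attr):
                    setattr(out, attr, getattr(self, attr))
            return out
        return wrapper
    return decorator

class Demo:
    """Toy class to exercise the decorator."""

    def __init__(self):
        self._input_shape_ = None

    def set_input_shape(self, shape):
        self._input_shape_ = tuple(shape)
        return self

    @preserve_state()
    def make_fresh(self):
        return Demo()  # new instance; the decorator restores the state

d = Demo().set_input_shape((2, 3))
print(d.make_fresh()._input_shape_)  # (2, 3)
```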

self.set_input_shape(*xi)
return self

def _set_input_independent_states(self) -> "EvalBasisMixin":

currently this only appears to call set_kernel in the convolutional mixin and does nothing anywhere else -- do we really need it?


oh, is it so that the composite basis can just call this method instead of checking whether kernel needs to be set?

# pass down the input shapes
self.set_input_shape(*shapes)
elif any(set_bases):
warnings.warn(

this isn't actually necessary, the following runs fine. should it?

import numpy as np
import nemos as nmo

b = nmo.basis.BSplineEval(5)
b.setup_basis(np.ones((100, 5)))
b2 = nmo.basis.BSplineEval(5)
c = b + b2
c.compute_features(np.ones((100, 5)), np.ones((100, 2)))

Comment on lines +584 to +590
@property
def basis1(self):
return self._basis1

@basis1.setter
def basis1(self, bas: Basis):
self._basis1 = bas

why are these properties if we do nothing special in the getter or setter? why not just use self.basis1 instead of self._basis1?

f"is {self.n_basis_funcs}."
)

def set_kernel(self):

will this ever get reached? since _check_window_size gets called during init?

4 participants