Update README.md
agosztolai authored Dec 1, 2023
1 parent 91eba6e commit 392223f
Showing 1 changed file with 28 additions and 28 deletions.
See full documentation [here](https://agosztolai.github.io/MARBLE/).

## Installation

The code is tested for CPU and GPU (CUDA) machines running Linux or OSX. Although smaller examples run fast on CPU, for larger datasets, it is highly recommended that you use a GPU machine.

We recommend you install the code in a fresh Anaconda virtual environment, as follows.

git clone https://github.com/agosztolai/MARBLE
```

- Then, create a new anaconda environment using the provided environment file that matches your system.
- For Linux machines with CUDA:

`conda env create -f environment.yml`
- For Intel Mac:
`conda env create -f environment_osx_intel.yml`

- For recent M1/M2/M3 Mac:
- Install cmake `brew install cmake` or use the installer on the [cmake website](https://cmake.org/download/)
- Create the environment

`conda env create -f environment_osx_arm.yml`
- Activate the environment `conda activate MARBLE`
- Install PyTorch Geometric
`pip install -r requirements_osx_arm.txt`

- For Windows computers:
we recommend using WSL2, which lets you run a (virtual) Linux machine inside your Windows computer and simplifies installation. If you have an NVIDIA GPU, WSL2 will let you take advantage of it (an older version of WSL will not).
- Follow the [instructions to install WSL2](https://learn.microsoft.com/en-us/windows/wsl/install)
- Open "Ubuntu" and install a compiler `sudo apt update && sudo apt install gcc g++`
- Proceed with conda install and environment creation as described for Linux machines.
- If you do not want to use WSL, this is possible, albeit more complicated. You need a working compiler (e.g. Visual Studio or [MSYS2](https://www.msys2.org/)). Once it is installed, along with conda, you can create the Python environment using `conda env create -f environment_windows_native.yml`.
- All the required dependencies are now installed. Finally, activate the environment and install the package by running, inside the main folder,

```
conda activate MARBLE
```

```
import MARBLE
data = MARBLE.construct_dataset(pos, features=x)
```

The main attributes are `data.pos` (manifold positions, concatenated), `data.x` (manifold signals, concatenated) and `data.y` (identifiers that tell you which manifold each point belongs to). Read more about [other useful data attributes](#construct).
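To make these concatenated attributes concrete, here is a minimal sketch with plain NumPy arrays standing in for the data object (the two-manifold setup and the shapes are hypothetical):

```python
import numpy as np

# Hypothetical setup: two manifolds sampled with 100 and 150 points in 3-D.
pos1, pos2 = np.random.rand(100, 3), np.random.rand(150, 3)
x1, x2 = np.random.rand(100, 3), np.random.rand(150, 3)

# The data object concatenates everything row-wise across manifolds:
pos = np.vstack([pos1, pos2])                      # like data.pos, shape (250, 3)
x = np.vstack([x1, x2])                            # like data.x, shape (250, 3)
y = np.concatenate([np.zeros(100), np.ones(150)])  # like data.y, one label per point

# A boolean mask on y recovers the points of a single manifold.
print(pos[y == 0].shape)  # (100, 3)
```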

Now, you can initialise and train a MARBLE model. Read more about [training parameters](#training).

```
import MARBLE
model = MARBLE.net(data)
model.fit(data)
```

By default, MARBLE operates in geometry-aware mode. You can enable the geometry-agnostic mode by changing the initialisation step to

```
model = MARBLE.net(data, params = {'inner_product_features': True})
```

Read more about the geometry-aware and geometry-agnostic modes [here](#innerproduct).

After you have trained your model, you can evaluate it on your dataset, or on another dataset, to obtain an embedding of all manifold points in a joint latent space (3-dimensional by default) based on their local vector field features.

```
data = model.transform(data)  # adds an attribute `data.emb`
```

To recover the embeddings of individual vector fields, use `data.emb[data.y==0]`.
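As an illustrative sketch (with NumPy arrays standing in for `data.emb` and `data.y`), the joint embedding can be split back into one array per vector field:

```python
import numpy as np

# Stand-ins for data.emb (250 points embedded in 3-D) and data.y (two manifolds).
emb = np.random.rand(250, 3)
y = np.concatenate([np.zeros(100), np.ones(150)])

# One embedding array per vector field, keyed by manifold label.
emb_per_field = {int(label): emb[y == label] for label in np.unique(y)}
print(emb_per_field[0].shape)  # (100, 3)
```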
```
from MARBLE import plotting

plotting.fields(data)  # visualise the original vector fields over manifolds
plotting.embedding(data, data.y.numpy())  # visualise embedding
```

There are loads of parameters to adjust these plots, so look at the respective functions.

## Examples

If you just want to play around with dynamical systems, why not try our (experim
<a name="conditions"></a>
### More on different conditions

Comparing dynamics in a data-driven way is equivalent to comparing the corresponding vector fields based on their respective sample sets. The dynamics to be compared might correspond to different experimental conditions (stimulation conditions, genetic perturbations, etc.) or different dynamical systems (different tasks, different brain regions).

Suppose we have the data pairs `pos1, pos2` and `x1, x2`. Then we may provide them as lists to ensure that our pipeline handles them independently (on different manifolds) but embeds them jointly in the same space.

```
pos_list, x_list = [pos1, pos2], [x1, x2]
```

It is sometimes useful to treat two vector fields as lying on independent manifolds (providing them as a list) even when we want to *discover* the contrary. However, when we know that two vector fields lie on the same manifold, it can be advantageous to stack their corresponding samples, as this will enforce geometric relationships between them through the proximity graph.
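The two options can be sketched as follows (a minimal illustration with random NumPy arrays standing in for real data):

```python
import numpy as np

pos1, pos2 = np.random.rand(100, 3), np.random.rand(100, 3)
x1, x2 = np.random.rand(100, 3), np.random.rand(100, 3)

# Independent manifolds: pass lists, so each gets its own proximity graph.
pos_list, x_list = [pos1, pos2], [x1, x2]

# Same manifold: stack the samples so a single shared graph enforces
# geometric relationships between the two vector fields.
pos_stacked, x_stacked = np.vstack([pos1, pos2]), np.vstack([x1, x2])
print(pos_stacked.shape)  # (200, 3)
```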

<a name="construct"></a>
### More on constructing data object

Our pipeline is built around a PyTorch Geometric data object, which we can obtain by running the following constructor.

```
import MARBLE
data = MARBLE.construct_dataset(pos, features=x, stop_crit=0.03, graph_type='cknn', k=15, local_gauge=False)
```

This command will do several things.

1. Subsample the point cloud using farthest point sampling to achieve even sampling density. Using `stop_crit=0.03` means the average distance between the subsampled points will equal 3% of the manifold diameter.
2. Fit a nearest-neighbour graph to each point cloud, here with the `graph_type=cknn` method using `k=15` nearest neighbours. We implemented other graph algorithms, but cknn typically works well. Note that `k` should be large enough to approximate the tangent space but small enough not to connect (geodesically) distant points of the manifold. The more data you have, the higher `k` you can use.
3. Perform operations in local (manifold) gauges or global coordinates. Note that `local_gauge=False` should be used whenever the manifold has negligible curvature on the scale of the local feature. Setting `local_gauge=True` means that the code performs tangent space alignments before computing gradients. However, this will increase the cost of the computations $m^2$-fold, where $m$ is the manifold dimension, because points will be treated as vector spaces. See the example of a [simple vector field over a curved surface](https://github.com/agosztolai/MARBLE/blob/main/examples/ex_vector_field_curved_surface.py) for illustration.
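Step 1 can be illustrated with a greedy farthest-point sampling sketch. This is only an illustration of the idea, not MARBLE's implementation, and the bounding-box diameter stands in for the manifold diameter:

```python
import numpy as np

def farthest_point_sampling(pos, stop_crit=0.03):
    """Greedily select points until the largest distance from any point
    to the selected set drops below stop_crit * diameter (illustrative)."""
    diameter = np.linalg.norm(pos.max(axis=0) - pos.min(axis=0))
    selected = [0]
    dist = np.linalg.norm(pos - pos[0], axis=1)  # distance to selected set
    while dist.max() > stop_crit * diameter:
        idx = int(dist.argmax())
        selected.append(idx)
        # Each point keeps its distance to the nearest selected point.
        dist = np.minimum(dist, np.linalg.norm(pos - pos[idx], axis=1))
    return np.array(selected)

pts = np.random.rand(2000, 2)
idx = farthest_point_sampling(pts, stop_crit=0.1)  # a sparse, evenly spaced subset
```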


The final data object contains the following attributes (among others):

```
data.pos: positions `pos` concatenated across manifolds
data.x: vectors `x` concatenated across manifolds
data.y: labels for each point denoting which manifold it belongs to
data.edge_index: edge list of the proximity graph (each manifold gets its own graph, disconnected from the others)
data.gauges: local coordinate bases when `local_gauge=True`
```

### How to pick good parameters

Choosing good parameters for the description of the manifold, in particular `stop_crit` and `k`, can be essential for the success of your analysis. The illustration below shows three different scenarios to give you intuition.

1. (left) **'Optimal' scenario.** Here, the sample spacing along trajectories and between trajectories is comparable, and `k` is chosen such that the proximity graph connects to neighbours but no further. At the same time, `k` is large enough to provide enough neighbours for gradient approximation. Notice the trade-off here.
2. (middle) **Suboptimal scenario 1.** Here, the sample spacing is much smaller along the trajectory than between trajectories. This is frequently encountered when there are few trials relative to the dimension of the manifold and the size of the basin of attraction. Fitting a proximity graph to this dataset will lead either to a poorly connected manifold or to too many neighbours pointing to consecutive points on the trajectory, resulting in poor gradient approximation. Also, too dense a discretisation will mean that second-order features will not pick up on the curvature of the trajectories. **Fix:** increase `stop_crit` and/or subsample your trajectories before using `construct_dataset()`.
3. (right) **Suboptimal scenario 2.** Here, there are too few sample points relative to the curvature of the trajectories. As a result, the gradient approximation will be inaccurate. **Fix:** decrease `stop_crit` or collect more data.

<img src="doc/assets/illustration_for_github.png" width="800"/>
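A quick heuristic check for scenario 2 (our own suggestion, not a MARBLE function) is to compare the mean spacing along a trajectory with the spacing between trajectories, here on two synthetic parallel trajectories:

```python
import numpy as np

# Two synthetic parallel trajectories, densely sampled along their length.
t = np.linspace(0, 1, 50)
traj1 = np.stack([t, np.sin(t)], axis=1)
traj2 = traj1 + np.array([0.0, 0.5])  # a second trajectory offset by 0.5

along = np.linalg.norm(np.diff(traj1, axis=0), axis=1).mean()  # spacing along
between = np.linalg.norm(traj1 - traj2, axis=1).mean()         # spacing between

# If 'between' is much larger than 'along' (scenario 2), increase stop_crit
# or subsample the trajectories before construct_dataset().
print(between / along)  # here much greater than 1
```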
```
params = {'epochs': 50,  # optimisation epochs
          ...
         }
```

Then, proceed by constructing a network object

```
model = MARBLE.net(data, params=params)
```
Finally, launch training. The code will continuously save checkpoints during training with timestamps.

```
model.fit(data, outdir='./outputs')
```

If you have previously trained a network or have interrupted training, you can load the network directly as
```
model = MARBLE.net(data, loadpath=loadpath)
```

where `loadpath` can be either a path to a model with a specific timestamp or a directory, in which case the latest model is loaded automatically. By running `model.fit()`, training will resume from the last checkpoint.
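For example, a small helper could pick the most recent checkpoint from the output directory. This is a hypothetical utility, not part of MARBLE, and the `.pth` extension is an assumption:

```python
import glob
import os

def latest_checkpoint(outdir='./outputs', pattern='*.pth'):
    """Return the most recently modified checkpoint in outdir, or None.
    (Illustrative helper; the file pattern is an assumption.)"""
    files = glob.glob(os.path.join(outdir, pattern))
    return max(files, key=os.path.getmtime) if files else None

# model = MARBLE.net(data, loadpath=latest_checkpoint())
```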

<a name="innerproduct"></a>
### Geometry-aware and geometry-agnostic modes
If this still does not work, check if there are very small or very large vector

## Stay in touch

If all hope is lost, or if you want to chat about your use case, get in touch or raise an issue! We are happy to help and are looking to further develop this package to make it as useful as possible.


## References
