Add "how to" section on how to load matlab data #2018

h-mayorquin · 2023-09-19T13:40:24Z

A couple of users have asked us how to load data from MATLAB. I think this is a good case for writing a how-to, so we can help new users without having to repeat ourselves.

I am hesitant to go all the way and write a specific extractor for MATLAB files, as @alejoe91 suggested, because:

1. MATLAB files come in two formats: HDF5 and the classic one. scipy can open the classic format but not the HDF5 one, and the two are not read through the same API. There are some projects working in that direction (shout-out to https://pypi.org/project/pymatreader/, where I recently made a really small contribution), but there are still a lot of inconsistencies.
2. I don't have MATLAB, and I don't want to build tests that rely on how users stored their data there. I suspect some users would hit errors within MATLAB, and that sounds annoying to debug and support.
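
To make the first point concrete, here is a minimal sketch of telling the two formats apart from the file header alone. It assumes the commonly documented layout: classic files start with the text `MATLAB 5.0 MAT-file`, while HDF5-based (v7.3) files start with `MATLAB 7.3 MAT-file` in a 512-byte user block that is followed by the HDF5 signature. The helper name `guess_mat_format` and the fake files are made up for illustration and are not part of any library.

```python
import tempfile
from pathlib import Path

# 8-byte signature that opens an HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def guess_mat_format(file_path):
    """Guess the MAT-file flavour from its first bytes (hypothetical helper)."""
    with open(file_path, "rb") as f:
        head = f.read(520)
    # Assumption: v7.3 files keep a 512-byte text header before the HDF5 data.
    if head.startswith(b"MATLAB 7.3 MAT-file") or head[512:520] == HDF5_SIGNATURE:
        return "hdf5"
    if head.startswith(b"MATLAB 5.0 MAT-file"):
        return "classic"
    return "unknown"


# Fake headers only, enough to exercise the check without needing MATLAB.
with tempfile.TemporaryDirectory() as tmp:
    classic_file = Path(tmp) / "classic.mat"
    classic_file.write_bytes(b"MATLAB 5.0 MAT-file, Platform: fake".ljust(128))
    hdf5_file = Path(tmp) / "v73.mat"
    hdf5_file.write_bytes(b"MATLAB 7.3 MAT-file, Platform: fake".ljust(512) + HDF5_SIGNATURE)
    other_file = Path(tmp) / "data.bin"
    other_file.write_bytes(bytes(16))

    detected = {p.name: guess_mat_format(p) for p in (classic_file, hdf5_file, other_file)}

print(detected)
```

A reader built on top of this would still need one code path per format, which is the inconsistency described above.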

Because of these two aspects, I feel that it is just easier to give users general instructions on how to export their data from MATLAB to a general format. In this case, binary should work well, and we should then be able to process the data lazily using the BinaryRecordingExtractor. This tutorial does that. If we eventually decide to go all the way and build an extractor for MATLAB, then we can deprecate this section; meanwhile, I think it is useful to offer this support to users.
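
As a rough illustration of why a flat binary file lends itself to lazy processing, the sketch below writes a toy recording and then reads back only a slice of frames by seeking to the right byte offset. It is not how BinaryRecordingExtractor is implemented; it only assumes float32 samples stored sample after sample with channels interleaved, and the helper names `write_binary` and `read_frames` are invented for this example.

```python
import tempfile
from array import array
from pathlib import Path


def write_binary(file_path, frames):
    """Write frames (one list of channel values per sample) as float32, sample after sample."""
    flat = array("f", [value for frame in frames for value in frame])
    with open(file_path, "wb") as f:
        flat.tofile(f)


def read_frames(file_path, start_frame, end_frame, num_channels):
    """Read only frames [start_frame, end_frame) instead of loading the whole file."""
    chunk = array("f")
    with open(file_path, "rb") as f:
        # Each frame occupies num_channels * itemsize bytes, so we can jump straight to it.
        f.seek(start_frame * num_channels * chunk.itemsize)
        chunk.fromfile(f, (end_frame - start_frame) * num_channels)
    return [list(chunk[i:i + num_channels]) for i in range(0, len(chunk), num_channels)]


# Toy recording: 5 samples, 2 channels.
frames = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0], [8.0, 9.0]]
with tempfile.TemporaryDirectory() as tmp:
    file_path = Path(tmp) / "your_data_as_a_binary.bin"
    write_binary(file_path, frames)
    middle = read_frames(file_path, start_frame=1, end_frame=3, num_channels=2)

print(middle)
```

Because the offset of any frame can be computed from the sampling layout alone, nothing else in the file has to be touched, which is what makes this format convenient for large recordings.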

h-mayorquin · 2023-09-19T13:52:21Z

Here is the link to the relevant section:

https://spikeinterface--2018.org.readthedocs.build/en/2018/how_to/load_matalb_data.html

zm711

I think it would be useful to make sure people can transition. Matlab to python can feel like a pretty big jump, so giving some tips is great. Maybe link the numpy matlab to python doc at the bottom too?
https://numpy.org/doc/stable/user/numpy-for-matlab-users.html

zm711 · 2023-09-19T13:52:57Z

doc/how_to/load_matalb_data.rst

+   file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
+


Suggested change

-   file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
+   file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
+   # or for Windows
+   # file_path = Path(r"c:\path\to\your\data\your_data_as_a_binary.bin")

I tend to recommend just warning Windows users ahead of time so that they don't come to us with a bunch of path issues.

Good point.
What do you think of the "path/to/your/data/" visual trick to let them know that it is a path? In another package we have similar tutorials:

https://neuroconv.readthedocs.io/en/main/conversion_examples_gallery/recording/spikeglx.html

And we opted for something more concise there (just a variable in caps). I would like to hear if you have any ideas on how to make this clearer.

I think the path variable is super tricky, because some users will actually type path/to/variable, thinking that it is a hard-coded path. Add to that the Windows confusion with the opposite slashes. But as far as strategies go, I still think it is the best one, despite being a bit longer and less concise.

The only issue with the concise variable style is that other users might not realize what's behind the variable. That's why I prefer the path/to/variable style, although it is not perfect. Linux users tend to be more comfortable with path notation. But Windows has a right-click "copy path" option that copies the path as x\y\z instead, so even though they could write Path("c:/my/data"), they almost never do. And since backslashes start escape sequences in Python strings, if we don't warn them to use a raw string they will get path errors due to the escaping and get frustrated.

In summary, I would leave it as you have it, but add the comment for Windows users, so that when they copy the path from their computer they know that they likely need the r prefix.

Thanks for your input.
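
For reference, the escaping problem discussed in this thread can be reproduced with a short snippet. The path below is hypothetical and chosen so that the backslashes fall in front of a "t" and an "n"; `PureWindowsPath` is used so that the example behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Hypothetical path copied from a Windows file explorer.
raw_text = r"c:\temp\new_data.bin"    # raw string: backslashes are kept literally
plain_text = "c:\temp\new_data.bin"   # plain string: backslash-t becomes a tab, backslash-n a newline

raw_path = PureWindowsPath(raw_text)
print(raw_path.parts)       # the three expected components
print(raw_path.as_posix())  # same path written with forward slashes

# The plain string silently lost its separators.
print("\t" in plain_text, "\n" in plain_text)
print(len(raw_text), len(plain_text))
```

Forward slashes would also work on Windows, but since copied paths come with backslashes, the raw-string comment is the cheaper fix for the reader.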

zm711 · 2023-09-19T13:56:33Z

doc/how_to/load_matalb_data.rst

+   recording = BinaryRecordingExtractor(file_path, sampling_frequency=sampling_frequency,
+                                        num_channels=num_channels, dtype=dtype, gain_to_uV=1, offset_to_uV=0)


I wonder if we need to warn them about gain_to_uV and offset_to_uV. Maybe just a comment saying that these are used to convert to the actual voltages in case their data came from a reader that returned "proprietary units" that haven't been converted yet?

@alejoe91 @zm711
My own preference here would be to hide this complexity from MATLAB users, who I presume are less likely to be experienced. I want to write docs on how to load binary data that will include details such as gains and the time_axis=1 point that @alejoe91 mentions below.

I am thinking that, in Daniele Procida's terms, this falls squarely into the "how-to/goal-oriented" type of documentation. For that type, I don't want to burden the user with extra details that are not 100 % related to the task at hand. This is also how I like my how-to guides to be. I don't want asides.

In fact, I was thinking of not having these arguments at all (that is, leaving them at the default values of None), but I was concerned that some pre-processing or spike-sorting methods might not work without them. Is that correct, @alejoe91? Would there be any downsides to having a recording without gains or offsets?

If there are no downsides, I would rather omit these and then mention at the end that there is a more complete how-to on read_binary somewhere else that they can go to once it is available.

What do you think about this?

Well, the use case could be that the user has their data in int16. In that case, gains and offsets are needed to correctly convert to uV. I think it's an important concept to spend a couple of words on, and it is related to the task!

I like the idea of having a super goal-oriented tutorial to get the job done. But in this case those two values are (at least in my opinion) important for the task at hand. As @alejoe91 said, the data may have been stored as int16 and need to be converted into a voltage. I think that at a minimum we need a comment saying that not all data are in the correct format for all sorters, so gain_to_uV and offset_to_uV give you this fine control, plus a link to the relevant documentation that explains how these options work. In this case I tend toward educating (or maybe you'd argue over-educating) the user rather than overwhelming the user.

gain_to_uV and offset_to_uV are implementation details of data extraction. My fear is that most users will not be familiar with them and that they will already have the data in the right units (as they are probably using it in MATLAB for analysis). Introducing these concepts to new users, as I once was, is likely to raise more questions, confuse and derail. I don't think we can get away with just a few comments.

We can meet in the middle. I added a specific section at the end dealing with integer-typed traces. This separates the information streams, does not get in the way of new users, and gives us more space to introduce the necessary context for using gains and offsets. Could you check it, @alejoe91 and @zm711?

I'm fine with that section. It fits with the design I like (hive off the extra info for those that need it, but give an off-ramp for those who don't). Thanks @h-mayorquin :)

Great. Do check the video linked above if you haven't before. I think it is very good, by the way.

It is now bookmarked. I'll check it out this afternoon :)
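
For readers of this thread who have not met gains and offsets before, the idea can be sketched in a few lines. The sketch assumes the usual linear convention, traces_uV = raw * gain_to_uV + offset_to_uV; the gain of 0.5 uV per count is a made-up value (the real one depends on the acquisition system), and `to_microvolts` is a hypothetical helper, not a SpikeInterface function.

```python
from array import array


def to_microvolts(raw_samples, gain_to_uV, offset_to_uV):
    """Apply the linear scaling traces_uV = raw * gain_to_uV + offset_to_uV."""
    return [sample * gain_to_uV + offset_to_uV for sample in raw_samples]


# int16 counts, as an acquisition system might store them.
raw = array("h", [-200, 0, 200, 32767])

# Hypothetical calibration: 0.5 uV per count, no offset.
scaled = to_microvolts(raw, gain_to_uV=0.5, offset_to_uV=0.0)
print(scaled)

# With gain_to_uV=1 and offset_to_uV=0 (the values used in the how-to),
# the data are taken to be in uV already and come out numerically unchanged.
unchanged = to_microvolts(raw, gain_to_uV=1, offset_to_uV=0)
print(unchanged)
```

This is why the two arguments can stay at 1 and 0 for data that MATLAB already holds in microvolts, while integer-typed traces need the real calibration values.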

alejoe91 · 2023-09-19T14:19:52Z

doc/how_to/load_matalb_data.rst

+Common Pitfalls & Tips
+----------------------
+
+1. **Data Shape**: Always ensure that your MATLAB data matrix's first dimension corresponds to samples/time and the second to channels.


You can mention that if the data is in the other shape, one can use time_axis=1: https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/binaryrecordingextractor.py#L66

I added this.
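
To illustrate what the two layouts mean for a flat binary file, here is a small sketch. It assumes that the file holds a row-major (C-order) array whose shape is (num_samples, num_channels) for time_axis=0 and (num_channels, num_samples) for time_axis=1; `flat_index` is a hypothetical helper written for this comment only.

```python
def flat_index(sample, channel, num_samples, num_channels, time_axis=0):
    """Position of (sample, channel) in a flat, row-major buffer."""
    if time_axis == 0:  # buffer laid out as (num_samples, num_channels)
        return sample * num_channels + channel
    if time_axis == 1:  # buffer laid out as (num_channels, num_samples)
        return channel * num_samples + sample
    raise ValueError("time_axis must be 0 or 1")


num_samples, num_channels = 3, 2

# Each value encodes its own coordinates as 10 * sample + channel.
samples_first = [10 * s + c for s in range(num_samples) for c in range(num_channels)]
channels_first = [10 * s + c for c in range(num_channels) for s in range(num_samples)]

# Sample 2 on channel 1 is found in both files once the right time_axis is used.
a = samples_first[flat_index(2, 1, num_samples, num_channels, time_axis=0)]
b = channels_first[flat_index(2, 1, num_samples, num_channels, time_axis=1)]
print(a, b)

# Reading with the wrong time_axis returns a value from a different sample or channel.
wrong = samples_first[flat_index(1, 0, num_samples, num_channels, time_axis=1)]
print(wrong)
```

Nothing fails when the wrong axis is used and the traces are simply scrambled, so the pitfall is worth stating explicitly in the how-to.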

Thinking about this, it makes me feel that the order argument in get_traces is kind of redundant:

https://github.com/catalystneuro/spikeinterface/blob/1ead6a33e658bf5a0365d21506a90dd9bd32e67c/src/spikeinterface/core/baserecording.py#L260-L261

What do you think?

Co-authored-by: Zach McKenzie <[email protected]>

zm711 · 2023-09-20T12:25:39Z

@h-mayorquin

Sorry just one more comment and then looks good to me.

h-mayorquin · 2023-09-20T12:49:10Z

@zm711

> Sorry just one more comment and then looks good to me.

Thanks for all your feedback. No worries if you want to do more suggestions. I think your feedback has improved the PR a lot.

samuelgarcia · 2023-09-20T14:37:40Z

I did not have time to read this yet. Sorry.
Let's keep in mind that the how-to pages are local "jupytext" files from examples/how_to/ that are converted to rst, not rst pushed directly.

In short, slow notebooks are run locally in examples/how_to/ and exported to doc/how_to/ manually, and fast notebooks are generated by the sphinx-gallery mechanism in examples/modules_gallery.

Co-authored-by: Zach McKenzie <[email protected]>

h-mayorquin · 2023-09-20T14:48:16Z

@samuelgarcia
I did not do this with a notebook; I just wrote the rst as it is. Are you suggesting that I move it somewhere else?

zm711 · 2023-09-20T14:58:58Z

I'll be honest this is my confusion in general. how_to to me means general guides to various things. So for example how to convert matlab to python. It doesn't require a notebook, but as an end user I would go to the how to folder to see how to do this. Whereas the current organization is that how to is just for big scripts to run locally as notebooks and convert. I would tend to just add. I don't have a solution, but I can say as a reader of documentation, doing it @h-mayorquin 's way of adding this to the how to makes sense for me being able to find it even though it clashes with the developer work flow.

alejoe91 · 2023-09-20T15:02:22Z

> I'll be honest this is my confusion in general. how_to to me means general guides to various things. So for example how to convert matlab to python. It doesn't require a notebook, but as an end user I would go to the how to folder to see how to do this. Whereas the current organization is that how to is just for big scripts to run locally as notebooks and convert. I would tend to just add. I don't have a solution, but I can say as a reader of documentation, doing it @h-mayorquin 's way of adding this to the how to makes sense for me being able to find it even though it clashes with the developer work flow.

I agree. It's not a rule that every how-to should be a Python script first. In this case, for example, there is no point whatsoever in making a Python script.

h-mayorquin · 2023-09-20T15:07:23Z

> I'll be honest this is my confusion in general. how_to to me means general guides to various things. So for example how to convert matlab to python. It doesn't require a notebook, but as an end user I would go to the how to folder to see how to do this. Whereas the current organization is that how to is just for big scripts to run locally as notebooks and convert. I would tend to just add. I don't have a solution, but I can say as a reader of documentation, doing it @h-mayorquin 's way of adding this to the how to makes sense for me being able to find it even though it clashes with the developer work flow.

I was not aware that the "how to" was meant to be that. Sam just explained that to me in a call.

zm711 · 2023-09-21T11:01:43Z

Well, wherever this ends up in the toctree it looks good to me!

alejoe91 · 2023-09-21T11:37:17Z

Same for me! @samuelgarcia wanna take a final look?

Co-authored-by: Alessio Buccino <[email protected]>

add tutorial to load matlab data

fac9823

h-mayorquin added the documentation label Sep 19, 2023

h-mayorquin requested review from alejoe91 and zm711 September 19, 2023 13:40

h-mayorquin self-assigned this Sep 19, 2023

zm711 reviewed Sep 19, 2023

alejoe91 reviewed Sep 19, 2023

doc/how_to/load_matalb_data.rst Outdated

alejoe91 mentioned this pull request Sep 19, 2023

Add read_binary and read_zarr functions to extractord and docs API #2019

Merged

alejoe91 reviewed Sep 19, 2023

doc/how_to/load_matalb_data.rst Outdated

alejoe91 reviewed Sep 19, 2023

doc/how_to/index.rst Outdated

zm711 mentioned this pull request Sep 19, 2023

how to use S.I. with custom matlab data in MountainSort 5. #1975

Closed

h-mayorquin added 5 commits September 20, 2023 10:24

suggestions

a395c3c

add an assertion

6130e5b

my final version

0842509

final review

1ead6a3

typo

e31978c

zm711 reviewed Sep 20, 2023

doc/how_to/load_matlab_data.rst Outdated

h-mayorquin and others added 2 commits September 20, 2023 12:57

Update doc/how_to/load_matlab_data.rst

5aba5e0

Co-authored-by: Zach McKenzie <[email protected]>

Merge branch 'main' into add_matlab_documentation

be8d491

zm711 reviewed Sep 20, 2023

doc/how_to/load_matlab_data.rst Outdated

correction

b231e2d

zm711 reviewed Sep 20, 2023

doc/how_to/load_matlab_data.rst Outdated

alejoe91 reviewed Sep 20, 2023

doc/how_to/load_matlab_data.rst Outdated

Update doc/how_to/load_matlab_data.rst

fb76815

Co-authored-by: Zach McKenzie <[email protected]>

h-mayorquin and others added 2 commits September 21, 2023 14:01

Update doc/how_to/load_matlab_data.rst

9ba6fc6

Co-authored-by: Alessio Buccino <[email protected]>

Merge branch 'main' into add_matlab_documentation

add9f98

samuelgarcia approved these changes Sep 27, 2023

samuelgarcia merged commit ceaebfa into SpikeInterface:main Sep 27, 2023
8 checks passed

samuelgarcia deleted the add_matlab_documentation branch September 27, 2023 09:50
