Add "how to" section on how to load matlab data #2018

Merged

h-mayorquin
Collaborator

We have had a couple of users ask us how to load data from MATLAB. I think this is a good case for writing a how-to, so we can help new users without having to repeat ourselves.

I am hesitant to go all the way and build a specific extractor for MATLAB, as @alejoe91 suggested, because:

  1. MATLAB files may come in two formats: HDF5 and the classic one. The HDF5 format can't be opened by scipy but the old one can, and the two don't share the same API. There are some projects in that direction (shoutout to https://pypi.org/project/pymatreader/ where I had a really small contribution recently) but there are still a lot of inconsistencies.
  2. I don't have MATLAB and I don't want to build tests that rely on how users stored their data there. I expect some users would hit errors within MATLAB, and that sounds annoying to debug and support.

Because of point two, I feel it is easier to give users general instructions on how to transform their data in MATLAB to a general format. In this case, binary should work well, and we should then be able to process the data lazily using the BinaryRecordingExtractor. This tutorial does that. If we eventually decide to go all the way and build an extractor for MATLAB, we can deprecate this section; meanwhile, I think it is useful to offer support to users.
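
To illustrate what "process the data lazily" means for a flat binary file, here is a minimal NumPy-only sketch. It is not the tutorial's code and does not use SpikeInterface: the array sizes, the `float32` dtype, and the temporary file standing in for a MATLAB export are all assumptions made for the example.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical recording dimensions and dtype (assumptions for this sketch).
num_samples, num_channels = 1000, 4
dtype = "float32"

rng = np.random.default_rng(0)
original = rng.standard_normal((num_samples, num_channels)).astype(dtype)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for the binary file that would be exported from MATLAB.
    file_path = Path(tmp_dir) / "your_data_as_a_binary.bin"
    original.tofile(file_path)

    # np.memmap maps the file instead of loading it all into memory.
    traces = np.memmap(file_path, dtype=dtype, mode="r",
                       shape=(num_samples, num_channels))

    # Only the requested slice is read from disk; np.array makes a copy.
    chunk = np.array(traces[100:200, :])

    # Drop the memmap before the temporary directory is removed.
    del traces

print(chunk.shape)
```

A raw binary file carries no metadata, so the shape and dtype have to be supplied by the reader. That is why the extractor call discussed later in this thread takes `sampling_frequency`, `num_channels`, and `dtype` explicitly.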

@h-mayorquin h-mayorquin added the documentation Improvements or additions to documentation label Sep 19, 2023
@h-mayorquin h-mayorquin self-assigned this Sep 19, 2023
@h-mayorquin
Collaborator Author

Here is the link to the relevant section:

https://spikeinterface--2018.org.readthedocs.build/en/2018/how_to/load_matalb_data.html

Collaborator

@zm711 zm711 left a comment

I think it would be useful to make sure people can transition. Matlab to python can feel like a pretty big jump, so giving some tips is great. Maybe link the numpy matlab to python doc at the bottom too?
https://numpy.org/doc/stable/user/numpy-for-matlab-users.html

doc/how_to/load_matalb_data.rst (outdated, resolved)
doc/how_to/load_matalb_data.rst (outdated, resolved)
Comment on lines 43 to 44
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")

Collaborator

Suggested change
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
file_path = Path("/The/Path/To/Your/Data/your_data_as_a_binary.bin")
# or for Windows
# file_path = Path(r"c:\path\to\your\data\your_data_as_a_binary.bin")

I tend to recommend warning Windows users ahead of time so that they don't come back with a bunch of path issues.

Collaborator Author

Good point.
What do you think of the "path/to/your/data/" visual trick to let them know that it is a path? In another package we have similar tutorials:

https://neuroconv.readthedocs.io/en/main/conversion_examples_gallery/recording/spikeglx.html

And we opted for something more concise there (just a variable in caps). I would like to hear if you have any ideas on how to make this clearer.

Collaborator

I think the path to variable is super tricky. Because some users will actually put path/to/variable thinking that it is a hard coded path. And then adding in windows confusion with the opposite slashes. But as far as strategies I still think it is best despite being a bit longer and less concise.

Only issue with the concise variable style is that for other users they might not realize what's behind the variable. So that's why I prefer the path/to/variable style although it is not perfect. For Linux users I think they tend to be more comfortable with the path notation. But Windows has a right+click copy path option that will copy with x\y\z instead, so even though they could put in a Path(c:/my/data) they almost never do. And then since Windows has backslashes with escapes if we don't warn them about using the raw string instead they will get pathway errors due to the escaping and get frustrated.

In summary, I would leave it as you have it, but add the comment for Windows users so that when they copy the path from their computer they know they likely need the r prefix.
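
The escaping problem described above can be reproduced in a few lines. This is only an illustrative sketch: the path `c:\new_data\trial.bin` is made up so that it contains `\n` and `\t`, and `PureWindowsPath` is used so the snippet behaves the same on any operating system.

```python
from pathlib import PureWindowsPath

# Without the r prefix, "\n" becomes a newline and "\t" becomes a tab.
broken = "c:\new_data\trial.bin"

# With the r prefix, backslashes are kept literally.
raw = r"c:\new_data\trial.bin"

print(repr(broken))
print(repr(raw))

# PureWindowsPath parses Windows-style paths regardless of the host OS.
path = PureWindowsPath(raw)

# Forward slashes are also accepted and give an equal path.
forward = PureWindowsPath("c:/new_data/trial.bin")

print(path.name, path == forward)
```

Both the raw string and the forward-slash form work; the raw string is the one that survives a direct copy-paste from the Windows file explorer.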

Collaborator Author

Thanks for your input.

doc/how_to/load_matalb_data.rst (outdated, resolved)
Comment on lines 54 to 55
recording = BinaryRecordingExtractor(file_path, sampling_frequency=sampling_frequency,
num_channels=num_channels, dtype=dtype, gain_to_uV=1, offset_to_uV=0)
Collaborator

I wonder if we need to warn them about gain_to_uV and offset_to_uV. Maybe just a comment saying that these are used to convert to the actual voltages in case their data came from a reader that returned "proprietary units" that haven't been converted yet?

Collaborator Author

@alejoe91 @zm711
My own preference here would be to hide this complexity from MATLAB users, who I presume are less likely to be experienced. I want to write docs on how to load binary data that will include details such as gains and the time_axis=1 point that @alejoe91 mentions below.

Within Daniele Procida's documentation framework, I think this falls squarely into the "how-to/goal-oriented" type of documentation. For that type, I don't want to burden the user with extra details that are not 100% related to the task at hand. This is also how I like my how-to guides to be. I don't want asides.

In fact, I was thinking of not having these arguments at all (that is, leaving them at their default values of None), but I was concerned that some pre-processing or spike-sorting methods might not work without them. Is that correct, @alejoe91? Would there be any downsides to having a recording without gains or offsets?

If there are no downsides, I would rather omit these and then mention at the end that there is a more complete how-to for read_binary somewhere else that they can go to once it is available.

What do you think about this?

Member

Well, the use case could be that the user has their data in int16. In that case, gains and offsets are needed to correctly convert to uV. I think it's an important concept to spend a couple of words on, and it is related to the task!
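
As a rough sketch of the int16 case, the conversion is a linear scaling: `traces_uV = raw * gain_to_uV + offset_to_uV`. The sample values and the gain of 0.195 below are hypothetical; the real gain and offset depend on the acquisition system that produced the data.

```python
import numpy as np

# Hypothetical raw int16 samples, shaped (samples, channels).
raw_traces = np.array([[-100, 0], [50, 200]], dtype="int16")

# Hypothetical scaling values; real ones come from the acquisition system.
gain_to_uV = 0.195
offset_to_uV = 0.0

# Cast to float before scaling so the result is not truncated to integers.
traces_uV = raw_traces.astype("float32") * gain_to_uV + offset_to_uV

print(traces_uV)
```

With `gain_to_uV=1` and `offset_to_uV=0`, as in the snippet under review, the values pass through unchanged, which matches data that is already stored in microvolts.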

Collaborator

I like the idea of having a super goal-oriented tutorial to get the job done. But in this case those two values are (at least in my opinion) important for the task at hand. As @alejoe91 said, there is the case where the data was stored as int16 and needs to be converted into a voltage. I think at a minimum we need a comment saying that not all data are in the correct format for all sorters, so gain_to_uV and offset_to_uV will give you this fine control, and then a link to relevant documentation that explains how these options work. Although in this case I tend toward educating (or maybe you'd argue over-educating) the user rather than overwhelming the user.

Collaborator Author

@h-mayorquin h-mayorquin Sep 20, 2023

gain_to_uV and offset_to_uV are implementation details of data extraction. My fear is that most users will not be familiar with them and that they will already have the data in the right units (as they are probably using it in MATLAB for analysis). Introducing these concepts to new users -as I was once- is likely to raise more questions, confuse and derail. I don't think we can get away with just a few comments.

We can meet in the middle. I added a specific section at the end dealing with integer-typed traces. This separates the information streams, does not get in the way of new users, and gives us more space to introduce the necessary context for using gains and offsets. Could you check it, @alejoe91 and @zm711?

Collaborator

I'm fine with that section. It fits with the design I like (hive off the extra info for those that need it, but give an off-ramp for those who don't). Thanks @h-mayorquin :)

Collaborator Author

Great. Do check the video linked above if you haven't already. I think it is very good, by the way.

Collaborator

It is now bookmarked. I'll check it out this afternoon :)

Common Pitfalls & Tips
----------------------

1. **Data Shape**: Always ensure that your MATLAB data matrix's first dimension corresponds to samples/time and the second to channels.
Member

You can mention that if the data is in the other shape, one can use time_axis=1: https://github.com/SpikeInterface/spikeinterface/blob/main/src/spikeinterface/core/binaryrecordingextractor.py#L66
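
To make the two layouts concrete, here is a NumPy-only sketch of a buffer stored channel by channel, i.e. with time on axis 1. It does not call SpikeInterface; the sizes and values are made up, and the transpose only illustrates the layout difference that `time_axis=1` refers to, not how the extractor handles it internally.

```python
import numpy as np

# Hypothetical sizes for the example.
num_channels, num_samples = 3, 5

# Channel-major data: each row holds all the samples of one channel.
channels_by_samples = np.arange(
    num_channels * num_samples, dtype="int16"
).reshape(num_channels, num_samples)

# The flat bytes as they would sit in a binary file.
buffer = channels_by_samples.tobytes()

# Reading the bytes back requires knowing that time is on axis 1.
as_stored = np.frombuffer(buffer, dtype="int16").reshape(num_channels, num_samples)

# Transposing gives the (samples, channels) view without copying the data.
samples_by_channels = as_stored.T

print(samples_by_channels.shape)
```

Reading the same bytes as `(num_samples, num_channels)` would raise no error but would interleave the channels incorrectly, which is why the shape convention needs to be stated explicitly.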

Collaborator Author

I added this.

Thinking about this, it makes me feel that the order argument in the get_traces is kind of redundant:

https://github.com/catalystneuro/spikeinterface/blob/1ead6a33e658bf5a0365d21506a90dd9bd32e67c/src/spikeinterface/core/baserecording.py#L260-L261

What do you think?

doc/how_to/index.rst (outdated, resolved)
@zm711
Collaborator

zm711 commented Sep 20, 2023

@h-mayorquin

Sorry just one more comment and then looks good to me.

@h-mayorquin
Collaborator Author

@zm711

> Sorry just one more comment and then looks good to me.

Thanks for all your feedback. No worries if you want to do more suggestions. I think your feedback has improved the PR a lot.

@samuelgarcia
Member

I did not have time to read this yet. Sorry.
Let's keep in mind that how-to pages are local "jupytext" files from examples/how_to/ converted to rst, not rst pushed directly.

In short, slow notebooks are run locally in examples/how_to/ and exported to doc/how_to/ manually,
and fast notebooks are generated by the sphinx-gallery mechanism in examples/modules_gallery.

@h-mayorquin
Collaborator Author

@samuelgarcia
I did not do this with a notebook; I just wrote the rst as it is. Are you suggesting that I move it somewhere else?

@zm711
Collaborator

zm711 commented Sep 20, 2023

I'll be honest this is my confusion in general. how_to to me means general guides to various things. So for example how to convert matlab to python. It doesn't require a notebook, but as an end user I would go to the how to folder to see how to do this. Whereas the current organization is that how to is just for big scripts to run locally as notebooks and convert. I would tend to just add. I don't have a solution, but I can say as a reader of documentation, doing it @h-mayorquin 's way of adding this to the how to makes sense for me being able to find it even though it clashes with the developer work flow.

@alejoe91
Member

> I'll be honest this is my confusion in general. how_to to me means general guides to various things. So for example how to convert matlab to python. It doesn't require a notebook, but as an end user I would go to the how to folder to see how to do this. Whereas the current organization is that how to is just for big scripts to run locally as notebooks and convert. I would tend to just add. I don't have a solution, but I can say as a reader of documentation, doing it @h-mayorquin 's way of adding this to the how to makes sense for me being able to find it even though it clashes with the developer work flow.

I agree. It's not a rule that all how-tos should be a Python script first. In this case, there is no point whatsoever in making a Python script.

@h-mayorquin
Collaborator Author

> I'll be honest this is my confusion in general. how_to to me means general guides to various things. So for example how to convert matlab to python. It doesn't require a notebook, but as an end user I would go to the how to folder to see how to do this. Whereas the current organization is that how to is just for big scripts to run locally as notebooks and convert. I would tend to just add. I don't have a solution, but I can say as a reader of documentation, doing it @h-mayorquin 's way of adding this to the how to makes sense for me being able to find it even though it clashes with the developer work flow.

I was not aware that the "how to" was meant to be that. Sam just explained that to me in a call.

@zm711
Collaborator

zm711 commented Sep 21, 2023

Well, wherever this ends up in the toctree it looks good to me!

@alejoe91
Member

Same for me! @samuelgarcia wanna take a final look?

@samuelgarcia samuelgarcia merged commit ceaebfa into SpikeInterface:main Sep 27, 2023
8 checks passed
@samuelgarcia samuelgarcia deleted the add_matlab_documentation branch September 27, 2023 09:50