BiocamIO: support sparse event based recordings #1446

mahlzahn · 2024-03-27T14:53:29Z

This PR includes some code extracted from the 3Brain documentation and added with kind permission by 3Brain.

I wrote:

I am contributing to the inclusion of reading recordings from 3brain
with python-neo, probeinterface and spikeinterface. May I ask you the
permission to

include the example code in python-neo under the BSD 3-Clause License
https://github.com/NeuralEnsemble/python-neo

upload the latest file documentation to
https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/biocam

and got the answer:

Regarding your request for permission, yes, please, you can upload your files.

Code extracted from 3Brain documentation https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/biocam/documentation_brw_4.x_bxr_3.x_bcmp_1.x_in_brainwave_5.x_v1.1.3.pdf with kind permission by Inna Forsiuk <[email protected]>

zm711

Thanks @mahlzahn. Just a few questions/comments before Alessio does the final read through.

zm711 · 2024-04-04T21:11:35Z

neo/rawio/biocamrawio.py

+
+
+def readHDF5t_brw4_sparse(rf, t0, t1, nch):
+    useSyntheticNoise = True


I'm not quite sure what the benefit of this is? If it is always true then why do you need the if statement. You're not actually checking anything...

the use_synthetic_noise (default False) should be exposed as an argument

3Brain's code example is meant as a simple demo script, not to be used as an API, which is why a lot is done explicitely like this. I'm not sure how far up the chain this argument would need to go, I haven't personnaly used a reader that had such options before.

From a neo perspective especially for something like this we would want the user to have a say (although like Alessio said a default of False should be used because Neo is not really meant to create data our goal is really just to supply the underlying data).

zm711 · 2024-04-04T21:12:35Z

neo/rawio/biocamrawio.py

+    data = np.zeros((nch, numFrames), dtype=np.int16)
+    # fill the data collection with Gaussian noise if requested
+    if useSyntheticNoise:
+        generateSyntheticNoise(rf, data, wellID, startFrame, numFrames, stdDev=noiseStdDev)


I don't know what others think, but I'm personally not a huge fan of mutating the variable in place (even though I do it sometimes myself). I think it is harder to bug.

I agree, especially with a name as generic as data floating around in the namespace.
From my own fork (for reference) :

if synth: arr = GenerateSyntheticNoise_arr(rf, wellID, t0, t1, numFrames, nch) else: arr = np.zeros((numFrames, nch))

I agree that I like something like this better (although see note for zeros below as well).

zm711 · 2024-04-04T21:32:14Z

neo/rawio/biocamrawio.py

+    # obtain the noise info at the start position
+    noiseChIdxs = rf[wellID]["NoiseChIdxs"][noiseStartPosition:noiseEndPosition]
+    noiseMean = rf[wellID]["NoiseMean"][noiseStartPosition:noiseEndPosition]
+    if stdDev is None:


I don't see this being exposed? If the user will never be allowed to do this why do we need this check?

Also curious about this, I can't find it in the original 3Brain docs.

zm711 · 2024-04-04T21:32:31Z

neo/rawio/biocamrawio.py

+            pos += (toExclusive - fromInclusive) * 2
+
+
+def generateSyntheticNoise(rf, data, wellID, startFrame, numFrames, stdDev=None):


This seems like a private function?

For the sake of readability on my own, I declared both this and the main parsing function at the base level of the file then wrote a bare minimum function like the others before :

def readHDF5t_brw4_event_based(rf, t0, t1, nch): for key in rf: if key[:5] == "Well_": return DecodeEventBasedRawData_arr(rf, key, t0, t1, nch)

zm711 · 2024-04-04T21:33:39Z

neo/rawio/biocamrawio.py

+    return data.T
+
+
+def decodeEventBasedRawData(rf, data, wellID, startFrame, numFrames):


Also private right? Or is this one public? I would argue that for public functions we should aim for PEP8 compliance (snake_case and not camelCase).

zm711 · 2024-04-04T21:36:35Z

neo/rawio/biocamrawio.py

+                wellID = key
                num_channels = len(rf[key]["StoredChIdxs"])
-                if len(rf[key]["Raw"]) % num_channels:
-                    raise RuntimeError(f"Length of raw data array is not multiple of channel number in {key}")
-                num_frames = len(rf[key]["Raw"]) // num_channels
+                if "Raw" in rf[key]:
+                    if len(rf[key]["Raw"]) % num_channels:
+                        raise RuntimeError(f"Length of raw data array is not multiple of channel number in {key}")
+                    if num_frames != len(rf[key]["Raw"]) // num_channels:
+                        raise RuntimeError(f"Estimated number of frames from TOC does not match length of raw data array in {key}")
                break
-        try:
-            num_channels_x = num_channels_y = int(np.sqrt(num_channels))
-        except NameError:
+        if not wellID:


In most of this code you just use key why do you need to set wellID=key? To be honest if wellID is the better name I would prefer just to say for wellID in rf and then just use wellID wherever key was appearing.

zm711 · 2024-04-04T21:38:40Z

neo/rawio/biocamrawio.py

+    # calculate the median mean and standard deviation of all channels to be used for
+    # invalid channels
+    dataMean = np.median(meanCollection)
+    dataStdDev = np.median(stdDevCollection)


This is a bit hard to parse (for the future-- the median of the mean) why not median_data_mean so that in the future we don't do the double take? I'm assuming that's just how the code you're porting was written. So this is less important. Just made me do a quick doubletake.

alejoe91

Thansk @mahlzahn

See comments!

alejoe91 · 2024-04-05T14:34:03Z

neo/rawio/biocamrawio.py

+        if "Raw" in rf[wellID]:
+            read_function = readHDF5t_brw4
+        elif "EventsBasedSparseRaw" in rf[wellID]:
+            read_function = readHDF5t_brw4_sparse


We should probably add a warning here to warn that this is not good practice to fill in the gaps with random synthetic noise.

Couldn't non-filled data interfere with spike detection algorithms based on peak detection ? Since 3Brain data is stored unsigned, filling the gaps with zeros means they will be read as -2048 µV instead of being at the baseline.

I think the best solution would be to fill with a value that when scaled returns 0 volts/ microvolts. So I think this is a good point that we should keep in mind.

I agree that default should return something scaled to 0 volts. But I suggest to change it from the current value of 2048 to something like the mean of the known reads. And in particular, value 2048 fails for channels used as trackers (usually channel 1-1 for calibration events) because this one is only zeroes and sometimes 4095 when there is a an event.

Yeah I don't know the channels well since I don't use this. This is the issue of trying to decide to fill gaps. It sounds reasonable that this needs to be addressed on a stream basis.

I'm not exactly sure how the 3Brain compression algorithm handles the calibration channel or if it's treated differently, but in the current configuration and with default parameters channel 1-1 reads as something like [2048],[2048],[2048],[2048],[4095],[4095],[4095],[2048],[2048],[2048],[2048], and once signed as [ 0],[ 0],[ 0],[ 0],[2047],[2047],[2047],[ 0],[ 0],[ 0],[ 0],. I don't know if any pipelines retain that channel downstream for spike sorting, but even if they do it's scaled to the rest of the channels.

Seeing as the 12 bits signed conversion effectively applies a flat -2048 to everything, I think having something recording-independent might be more reliable.

alejoe91 · 2024-04-05T14:35:02Z

neo/rawio/biocamrawio.py

+def decodeEventBasedRawData(rf, data, wellID, startFrame, numFrames):
+    # Source: Documentation by 3Brain
+    # https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/biocam/documentation_brw_4.x_bxr_3.x_bcmp_1.x_in_brainwave_5.x_v1.1.3.pdf
+    # collect the TOCs


@mahlzahn

Although this is in the 3Brain docs, we think that the coould really use a more pythonic re-write! Would you have time to give it a go?

@b-grimaud any interest in helping re-write this portion in a pythonic way?

Are you referring to the the function as a whole ?

I'm pretty convinced there's a more readable and efficient way to parse bytes than one big while loop but the way the data is structured makes it tricky.

I gave it a try last week but couldn't get anything functional out of it yet, I'll try again this week.

alejoe91 · 2024-04-05T14:37:23Z

neo/rawio/biocamrawio.py

+
+
+def readHDF5t_brw4_sparse(rf, t0, t1, nch):
+    useSyntheticNoise = True


the use_synthetic_noise (default False) should be exposed as an argument

zm711 · 2024-05-29T09:13:57Z

@mahlzahn, just wanted to ping you to see if you have the bandwidth to work on our comments?

b-grimaud · 2024-06-10T17:32:22Z

Since I can't directly contribute to the PR, I forked it and make some updates here. Opening a separate PR might be a bit messy ?

The main improvement is exposing a use_synthetic_noise argument, as well as true_zeroes. The latter, when False, makes sure that zeroes are actually read as zeroes once the recording is converted from unsigned to signed.

Regarding some of the other comments, I'm still not sure what if stdDev is None: (here) is meant for.

Also, if we indexed the data as [j - startFrame, chIdx] instead of [chIdx][j - startFrame] and intialized the array as np.zeros((numFrames, nch)), I don't think we would need to transpose when returning the filled array ? Not sure how much overhead it actually adds, the bit-by-bit parsing is slow enough as is.

zm711 · 2024-07-02T18:57:48Z

I'm honestly neutral on keeping with this PR or opening a new one. If @mahlzahn doesn't have the bandwidth then a new PR that will be worked on is an okay number. Do you want to comment on if you'll have time to work on this soon @mahlzahn or whether @b-grimaud should take over?

mahlzahn · 2024-07-02T19:12:48Z

Hi all, sorry for my silence. I indeed didn’t have much time recently. I now added @b-grimaud as collaborator to my python-neo repository. Maybe you want to push your work to that branch?

zm711 · 2024-07-03T12:03:08Z

Thanks all, whatever works the best for you two! Life gets busy we totally understand! We will leave this on the top of the review pile for now and if you both get too busy then we can retag for future review.

b-grimaud · 2024-07-05T15:16:11Z

Thanks for the update ! It looks like my initial commit was retroactively included in the thread already.

hornauerp · 2024-10-10T10:29:14Z

I just stumbled across this PR, as I wanted to run Kilosort on data from collaborators recorded on 6 well 3Brain plates. I have never worked with their chips, so I am not familiar with the file structure or the relevant neo io functions.

Not sure if anyone is still working on this, but using the code from this PR the read_biocam function works just fine and displays the correct number of channels and duration, but when I try to actually retrieve the traces, I get an out of bounds error: IndexError: index 1040 is out of bounds for axis 0 with size 1024.

Also, in general, I was a bit confused why reading the file only gives one recording, although all six wells were recorded? After a quick look at the code it seems to iterate over all wellIDs but then only stores information for one. Am I missing something or should the wellID be passed as an argument?

b-grimaud · 2024-10-10T10:37:16Z

@hornauerp I think that so far everything was built with single-well MEAs in mind. Since they added 6-wells (and now 96-wells, I believe) MEAs, 3Brain changed the inner structure of their file format so that recordings on standard chips are compatible. That's why the read function is simply iterating over wellID.

I don't know if there is a standard for multi-MEA recordings. It would probably help to deposit an exemple of 6-well data on GIN so that the read function can be properly tested.

zm711 · 2024-10-10T11:48:50Z

Absolutely agree with @b-grimaud here. the only way we can adapt is with test data. So if you'd like to get involved you could make a PR with the fix and provide data or you could share a small test file (we need < 10MB for bandwidth reasons) that we can put into our CI.

That being said our main reviewer for biocam is out of the office at the moment, so our work on biocam will be a bit slower unless you or @b-grimaud wants to give it a stab at update the IO/RawIO.

zm711 · 2024-12-13T15:30:48Z

I've switched this to 0.15.0 for now. If no one has time for these fixes, I can move the milestone to future. Just let us know :)

b-grimaud · 2024-12-17T16:50:37Z

@zm711 I think it depends on the scope of the PR. I'll need to check but as far as I know compressed data from single well MEAs can be read with the code as is. I think some people had comments on it but I'm not quite sure what needs to be changed.

As far as the multiwell issue goes, this would affect both compressed and raw data (unless BrainWave doesn't allow raw recordings for multiwell plates) and would probably fit in another PR.

add reading of sparse event based recordings

cc6ae8d

Code extracted from 3Brain documentation https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/src/master/biocam/documentation_brw_4.x_bxr_3.x_bcmp_1.x_in_brainwave_5.x_v1.1.3.pdf with kind permission by Inna Forsiuk <[email protected]>

mahlzahn changed the title ~~Add reading of sparse event based recordings from 3Brain (Brainwave5) in brw 4.x files~~ BiocamIO: support sparse event based recordings Mar 27, 2024

mahlzahn mentioned this pull request Mar 27, 2024

Request : Add compatibility with next gen BRW files #1156

Open

zm711 reviewed Apr 4, 2024

View reviewed changes

alejoe91 added this to the 0.13.1 milestone Apr 5, 2024

alejoe91 requested changes Apr 5, 2024

View reviewed changes

zm711 modified the milestones: 0.13.1, 0.14.0 May 3, 2024

zm711 linked an issue May 17, 2024 that may be closed by this pull request

Request : Add compatibility with next gen BRW files #1156

Open

Expose use_synthetic_noise as an argument

1d84c9f

Fix warning

97cb693

zm711 modified the milestones: 0.14.0, 0.15.0 Dec 13, 2024



		def readHDF5t_brw4_sparse(rf, t0, t1, nch):
		useSyntheticNoise = True

		pos += (toExclusive - fromInclusive) * 2


		def generateSyntheticNoise(rf, data, wellID, startFrame, numFrames, stdDev=None):

		return data.T


		def decodeEventBasedRawData(rf, data, wellID, startFrame, numFrames):

BiocamIO: support sparse event based recordings #1446

Are you sure you want to change the base?

BiocamIO: support sparse event based recordings #1446

Conversation

mahlzahn commented Mar 27, 2024

zm711 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

alejoe91 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

zm711 commented May 29, 2024

b-grimaud commented Jun 10, 2024

zm711 commented Jul 2, 2024

mahlzahn commented Jul 2, 2024

zm711 commented Jul 3, 2024

b-grimaud commented Jul 5, 2024

hornauerp commented Oct 10, 2024

b-grimaud commented Oct 10, 2024

zm711 commented Oct 10, 2024

zm711 commented Dec 13, 2024

b-grimaud commented Dec 17, 2024