relative paths #3148

Open
rat-h opened this issue Jul 4, 2024 · 22 comments
@rat-h

rat-h commented Jul 4, 2024

We currently use spikeinterface in a pipe, when for each new recording the script

  • creates a new directory,
  • does preprocessing saved into (recording directory)/preproc,
  • runs spikesorter of choice,
  • saves sorting into directory (recording directory)/sorting-saved and
  • runs wave extractor(s) which saves results into (recording directory)/waves.

I can then use the waves folder to see the results.

However, we save all data and processing in a central storage. After copying the results there and downloading them onto another computer, nothing works. The culprits are absolute paths in JSON files.

Would it be possible to use relative paths that will be agnostic of the absolute location of the result folders?
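(For context, the difference can be sketched with plain pathlib, independent of spikeinterface: a path stored relative to the folder that holds the JSON file can be re-resolved after the folder moves, while an absolute path cannot.)

```python
from pathlib import PurePosixPath

# Absolute path as it would be baked into a JSON file on the original machine.
original_base = PurePosixPath("/home/alice/results")
absolute = original_base / "preproc" / "traces.raw"

# Stored relative to the result folder instead, the entry survives a move:
relative = absolute.relative_to(original_base)
new_base = PurePosixPath("/mnt/storage/results")  # folder after copying
resolved = new_base / relative

print(relative)  # preproc/traces.raw
print(resolved)  # /mnt/storage/results/preproc/traces.raw
```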

@h-mayorquin
Collaborator

Yes, we have that functionality. But how are you saving your data? With recording.save()?

@zm711
Collaborator

zm711 commented Jul 5, 2024

I think the waveform extractor has a parameter that must be explicitly set (see here). If you haven't been setting it to true (its default is false), then it has been using absolute paths.

@rat-h
Author

rat-h commented Jul 8, 2024

Yes, we have that functionality. But, how are you saving your data? recording.save()?

Yes, but only the preprocessed data, after filtering and referencing.

I think the waveform extractor has a parameter that must be explicit set see here.

That is awesome! Do we have something like that for sorting.save()? It seems there are a few files which have absolute paths.

$ find sorting-saved  -name '*.json' -exec grep $HOME {} + 
sorting-saved/provenance.json:        "file_path": "/home/spikeinterface/20230508T195033Z-continuous-hdsort-local-T0.45-20240602-155052-20230508T195033Z-HDsortLx36x21x4-3031626-final-final-84/sorting-workingdir/sorter_output/hdsort_output/hdsort_output_results.mat",
sorting-saved/provenance.json:                    "folder_path": "/home/spikeinterface/20230508T195033Z-continuous-hdsort-local-T0.45-20240602-155052-20230508T195033Z-HDsortLx36x21x4-3031626-final-final-84/preprocessed"
sorting-saved/si_folder.json:                    "folder_path": "/home/spikeinterface/20230508T195033Z-continuous-hdsort-local-T0.45-20240602-155052-20230508T195033Z-HDsortLx36x21x4-3031626-final-final-84/preprocessed"

@zm711
Collaborator

zm711 commented Jul 8, 2024

I don't think we have that for sorting.save yet. The plan was to introduce relative options in more places, but I think it is still on the to-do list for some of the other core functionality unless @h-mayorquin has something else in mind.

@h-mayorquin
Collaborator

We do have sorting.save() (it is just hidden by the hook pattern of defining things on base).

I think the provenance should be relative to the parent folder by default. Let me check that.
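The intended behaviour can be sketched with a hypothetical helper (the name `make_relative` and the fallback rule are assumptions here, not spikeinterface's actual implementation): paths that live under the save folder are rewritten relative to it, while paths outside it have to stay absolute, since `Path.relative_to` only works for sub-paths.

```python
from pathlib import PurePosixPath

def make_relative(path: str, base_folder: str) -> str:
    """Rewrite `path` relative to `base_folder` when it is a sub-path;
    otherwise keep it absolute (hypothetical sketch, not SI's code)."""
    p, base = PurePosixPath(path), PurePosixPath(base_folder)
    try:
        return str(p.relative_to(base))
    except ValueError:  # path lies outside base_folder
        return str(p)

# A provenance entry under the save folder becomes portable...
print(make_relative("/data/run1/preprocessed", "/data/run1"))   # preprocessed
# ...while an unrelated location has to stay absolute.
print(make_relative("/home/raw/continuous.dat", "/data/run1"))  # /home/raw/continuous.dat
```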

@h-mayorquin
Collaborator

OK, the provenance is not relative for some reason; I will fix that. But it is very strange that one of your paths in si_folder.json is not relative.

Can you share a tree in your folder and more about your preprocessing?

@rat-h
Author

rat-h commented Jul 8, 2024

Can you share a tree in your folder and more about your preprocessing?

$ tree sorting-saved
sorting-saved
├── numpysorting_info.json
├── properties
│   ├── template_frames_cut_before.npy
│   └── template.npy
├── provenance.json
├── si_folder.json
└── spikes.npy

It is quite a long script that automates the whole process. I think there is a good deal of confusing stuff in it. So, in a nutshell:

recording = si.BinaryRecordingExtractor(
            last['recording']['binfile'],last['recording']['sampling rate'],
            'int16', num_channels=last['recording']['number of channels'])
prob = read_probeinterface(last['recording']['probe']).probes[0]
recording.set_probe(prob,in_place=True)
preproc = [recording]

# -- preprocessing
def resolvepreproc(cmd: str, rec, config: dict | None = None):
    if   cmd == 'centering':
        return si.center(rec)\
            if config is None else\
               si.center(rec,**config)
    elif cmd == 'highpass or band filtering':
        return si.filter(rec)\
            if config is None else\
               si.filter(rec,**config)
    elif cmd == 'referencing':
        return si.common_reference(rec)\
            if config is None else\
               si.common_reference(rec,**config)
    elif cmd == 'whitening':
        return si.whiten(rec)
    elif cmd == 'zscore':
        return si.zscore(rec)\
            if config is None else\
               si.zscore(rec,**config)
    else:
        logger.error(f'Unknown preprocessing option {cmd}')
        raise RuntimeError(f'Unknown preprocessing option {cmd}')
for ppm in last['preprocessing']['methods']:
        logger.info(f"PREPROC: {ppm}")
        config = last['preprocessing'][ppm] if ppm in last['preprocessing'] else None
        preproc.append( resolvepreproc(ppm,preproc[-1],config) )
preproc[-1].annotate(is_filtered=True)
#>> Saves preprocessed recording
preproc_saved = preproc[-1].save(
        folder=last['running directory']+"/preprocessed", 
        chunk_duration='1m',**job_kwargs)

# -- Sorting
srdir = last['running directory']+"/sorting-workingdir"
sorting = si.run_sorter(
                sorter_name=last['sorter']['name'],
                recording=preproc_saved, 
                output_folder=srdir,
                **last['sorter'][last['sorter']['name']] )
#>> Saves sorting
sorting_saved = sorting.save(folder=last['running directory']+"/sorting-saved",
                 chunk_duration='1m',**job_kwargs)
os.system(f'rm -fR {srdir}')

# -- Waveforms (optional can be processed on the client side)
we = si.extract_waveforms(
            preproc_saved, sorting_saved, last['running directory']+"/waveforms",
            use_relative_path=opts.relpaths,  # added this morning :)
            **last['waveexctractor']
        )

Just in case ...

>>> import spikeinterface.full as si
>>> si.__version__
'0.100.8'

@h-mayorquin
Collaborator

h-mayorquin commented Jul 8, 2024

#3165 should leave no absolute paths. If your paths are real and are arguments that you passed to the __init__ of an interface, they should be written relative to the folder where the saved data goes, so the latter can be moved around.

If you can give it a try that would be great.

@rat-h
Author

rat-h commented Jul 8, 2024

@h-mayorquin, the sorting was saved, but the waveform extractor failed:

cat run.stderr
Traceback (most recent call last):
  File "/home/spikeinterface/./runthepipe2.py", line 609, in <module>
    sorting_saved = si.load_extractor(last['running directory']+"/sorting-saved")
  File "/home/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/base.py", line 1166, in load_extractor
    return BaseExtractor.load(file_or_folder_or_dict, base_folder=base_folder)
  File "/home/.local/apps/spikes/lib/python3.10/site-packages/spikeinterface/core/base.py", line 788, in load
    raise ValueError("spikeinterface.Base.load() file_path must be an existing folder or file")
ValueError: spikeinterface.Base.load() file_path must be an existing folder or file

@h-mayorquin
Collaborator

That needs a better error message (#3170), but it is basically claiming that the folder does not exist. Can you double-check that?

@rat-h
Author

rat-h commented Jul 9, 2024

$ tree HDsort-17-continuous 
HDsort-17-continuous
├── jgui_state.json
├── preprocessed
│   ├── binary.json
│   ├── probe.json
│   ├── properties
│   │   ├── contact_vector.npy
│   │   ├── group.npy
│   │   └── location.npy
│   ├── provenance.json
│   ├── si_folder.json
│   └── traces_cached_seg0.raw
├── run.log
├── run.stderr
├── run.stdout
├── sorting-saved
│   ├── numpysorting_info.json
│   ├── properties
│   │   ├── template_frames_cut_before.npy
│   │   └── template.npy
│   ├── provenance.json
│   ├── si_folder.json
│   └── spikes.npy
└── spikeinterface_sorter_log.json

4 directories, 19 files

It does exist.

@h-mayorquin
Collaborator

I was asking more whether the file_path that you are passing to load_extractor in your traceback is indeed correct. The function only ends up in that error when the file_path you pass is neither a file nor a directory:

def load(file_path: Union[str, Path], base_folder: Optional[Union[Path, str, bool]] = None) -> "BaseExtractor":
    """
    Load extractor from file path (.json or .pkl)
    Used both after:
    * dump(...) a json or pickle file
    * save(...) a folder which contains data + json (or pickle) + metadata.
    """
    file_path = Path(file_path)
    if base_folder is True:
        base_folder = file_path.parent

    if file_path.is_file():
        # standard case based on a file (json or pickle)
        if str(file_path).endswith(".json"):
            with open(file_path, "r") as f:
                d = json.load(f)
        elif str(file_path).endswith(".pkl") or str(file_path).endswith(".pickle"):
            with open(file_path, "rb") as f:
                d = pickle.load(f)
        else:
            raise ValueError(f"Impossible to load {file_path}")
        if "warning" in d:
            print("The extractor was not serializable to file")
            return None
        extractor = BaseExtractor.from_dict(d, base_folder=base_folder)
        return extractor
    elif file_path.is_dir():
        # case of a folder after calling extractor.save(...)
        folder = file_path
        file = None
        if folder.suffix == ".zarr":
            from .zarrextractors import read_zarr
            extractor = read_zarr(folder)
        else:
            # spikeinterface<=0.94.0: a folder came with 'cached.json'
            for dump_ext in ("json", "pkl", "pickle"):
                f = folder / f"cached.{dump_ext}"
                if f.is_file():
                    file = f
            # spikeinterface>=0.95.0
            f = folder / "si_folder.json"
            if f.is_file():
                file = f
            if file is None:
                raise ValueError(f"This folder is not a cached folder {file_path}")
            extractor = BaseExtractor.load(file, base_folder=folder)
        return extractor
    else:
        raise ValueError("spikeinterface.Base.load() file_path must be an existing folder or file")
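The dispatch at the end is what produces the error in the traceback above: anything that is neither an existing file nor an existing directory falls through to the final ValueError. A stripped-down sketch of that check (plain Python, not the SI function itself) also shows a common trap: a correct-looking relative path still fails if the current working directory is not what you expect.

```python
from pathlib import Path
import tempfile

def classify(file_path):
    # Mirrors the branching above (sketch only, not the SI function):
    # existing file -> existing directory -> ValueError.
    file_path = Path(file_path)
    if file_path.is_file():
        return "file"
    elif file_path.is_dir():
        return "dir"
    raise ValueError("file_path must be an existing folder or file")

with tempfile.TemporaryDirectory() as tmp:
    print(classify(tmp))  # dir
    # A path resolved against the wrong working directory also fails
    # this check, with the same error message.
    try:
        classify(Path(tmp) / "sorting-saved")
    except ValueError as err:
        print(err)  # file_path must be an existing folder or file
```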

@rat-h
Author

rat-h commented Jul 9, 2024

To be sure that the problem is in my script, I used a simple example:

#! /usr/bin/env python3
import os, sys, logging, shutil
import json
import psutil
from numpy import *
import spikeinterface.full as si
from probeinterface import read_probeinterface

ncpus = os.cpu_count() - 1
job_kwargs = {
        "n_jobs"      : ncpus,
        # "total_memory": f"{int(psutil.virtual_memory()[1]*usemem/ntasts)//1024//1024//1024:d}G",
        #DB>>
        "total_memory": f"{int(psutil.virtual_memory()[1]*0.75/ncpus)//1024//1024//1024:d}G",
        #<<DB
        "progress_bar": True
    }

# recording = si.BinaryRecordingExtractor("continuous.dat",30000.0,'int16', num_channels=128)
# prob = read_probeinterface("probes/A4x32-Poly2-5mm-23s-200-177-after-mapping.json").probes[0]
# recording.set_probe(prob,in_place=True)
# recording = recording.remove_channels([17])
# preproc = [recording]
# preproc.append( si.filter(preproc[-1], btype="bandpass",band=[72,5470]) )
# preproc.append( si.common_reference(preproc[-1], reference="local", operator="median", groups=None, ref_channel_ids=[],local_radius=[151,282]) )
# preproc_saved = preproc[-1].save(folder="test-simple-pipeline/preprocessed", chunk_duration='1m',**job_kwargs)

# srdir = "test-simple-pipeline/sorting-workingdir"
# sorting = si.run_sorter(
        # sorter_name='tridesclous2',
        # recording=preproc_saved, 
        # output_folder=srdir,
# )

# sorting_saved = sorting.save(folder="test-simple-pipeline/sorting-saved", chunk_duration='1m',**job_kwargs)
# os.system(f'rm -fR {srdir}')

#DB>>
preproc_saved = si.load_extractor("test-simple-pipeline/preprocessed")
sorting_saved = si.load_extractor("test-simple-pipeline/sorting-saved")
#<<DB

we = si.extract_waveforms(
    preproc_saved, sorting_saved, "test-simple-pipeline/waveforms",
    use_relative_path=True,
    mode="folder",
    precompute_template= ["average"],
    ms_before = 1.5,
    ms_after  = 2.5,
    max_spikes_per_unit = 500,
    method = "radius", 
    radius_um = 40,
    num_spikes_for_sparsity = 50,
    sparse = True,
    return_scaled = True,
    **job_kwargs
)

and there are still absolute paths...

 find . -name '*.json' -exec grep '/home/' {} + 
./preprocessed/provenance.json:                                    "/home/spikeinterface/continuous.dat"
./sorting-saved/provenance.json:        "folder_path": "/home/spikeinterface/test-simple-pipeline/sorting-workingdir/sorter_output/sorting"
./sorting-saved/provenance.json:                    "folder_path": "/home/spikeinterface/test-simple-pipeline/preprocessed"
./waveforms/sorting/provenance.json:        "folder_path": "/home/spikeinterface/test-simple-pipeline/sorting-saved"
./waveforms/sorting/provenance.json:                    "folder_path": "/home/spikeinterface/test-simple-pipeline/preprocessed"

@h-mayorquin
Collaborator

Thanks, just to confirm, you are using this PR #3165, right?

Could you just copy the provenance.json files that you are getting? I will try to reproduce this.

@rat-h
Author

rat-h commented Jul 9, 2024

Thanks, just to confirm, you are using this PR #3165, right?

I did it like this:

pip install git+https://github.com/SpikeInterface/spikeinterface.git

Did I miss PR #3165? If so, how do I install a specific PR?

@h-mayorquin
Collaborator

Yes, that one has not been merged yet. You can either wait till it is merged or try:

pip install git+https://github.com/h-mayorquin/spikeinterface.git@provenance_to_relative

@rat-h
Author

rat-h commented Jul 11, 2024

Even after the update, there are still absolute paths:

$ find test-simple-pipeline -name '*.json' -exec grep "/home/" {} + 
test-simple-pipeline/preprocessed/provenance.json:                                    "/home/spikeinterface/continuous.dat"
test-simple-pipeline/sorting-saved/provenance.json:        "folder_path": "/home/spikeinterface/test-simple-pipeline/sorting-workingdir/sorter_output/sorting"
test-simple-pipeline/sorting-saved/provenance.json:                    "folder_path": "/home/spikeinterface/test-simple-pipeline/preprocessed"
test-simple-pipeline/waveforms/sorting/provenance.json:        "folder_path": "/home/spikeinterface/test-simple-pipeline/sorting-saved"
test-simple-pipeline/waveforms/sorting/provenance.json:                    "folder_path": "/home/spikeinterface/test-simple-pipeline/preprocessed"
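As a stop-gap while a fix lands, leftover absolute entries like these can be rewritten after the fact. The sketch below is a best-effort workaround, not an official spikeinterface tool; it assumes every offending path starts with one known base folder, and simply strips that prefix from every string value in a JSON file.

```python
import json
from pathlib import Path

def relativize_json(json_path, old_base):
    """Best-effort workaround (not an official spikeinterface tool):
    rewrite every string value in the JSON file that starts with
    `old_base` so that it becomes relative to that base folder."""
    prefix = old_base.rstrip("/") + "/"

    def fix(value):
        if isinstance(value, str) and value.startswith(prefix):
            return value[len(prefix):]
        if isinstance(value, dict):
            return {key: fix(val) for key, val in value.items()}
        if isinstance(value, list):
            return [fix(item) for item in value]
        return value

    json_path = Path(json_path)
    data = json.loads(json_path.read_text())
    json_path.write_text(json.dumps(fix(data), indent=4))

# Hypothetical usage against the layout shown above:
# for p in Path("test-simple-pipeline").rglob("*.json"):
#     relativize_json(p, "/home/spikeinterface/test-simple-pipeline")
```

Note that the loader must then resolve these entries against the folder holding the JSON file (e.g. via base_folder), or the now-relative paths will be interpreted against the working directory instead.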

@h-mayorquin
Collaborator

Thanks for checking. I will see if I can reproduce the issue with your script.

@h-mayorquin
Collaborator

I really don't know what is going on. Can you run this on your computer?

This is a script that reproduces your pipeline as closely as possible. With it, if I check the provenance for both the sorting and the recording, I get no absolute paths even if absolute paths are what I pass.

from spikeinterface.core import generate_ground_truth_recording, write_binary_recording, BinaryRecordingExtractor, load_extractor
from spikeinterface.preprocessing import filter, common_reference
from spikeinterface.sorters import run_sorter
from pathlib import Path


num_channels = 32
sampling_frequency = 30_000.0
recording, sorting = generate_ground_truth_recording(num_channels=num_channels, sampling_frequency=sampling_frequency, durations=[10.0])
the_original_probe = recording.get_probe()
an_absolute_file_path = Path("./my_recording.dat").resolve()
print(f"{an_absolute_file_path=}")
file_paths=[an_absolute_file_path]
write_binary_recording(recording=recording, file_paths=file_paths)

dtype = recording.get_dtype()

binary_recording = BinaryRecordingExtractor(file_paths=file_paths, sampling_frequency=sampling_frequency, dtype=dtype, num_channels=num_channels)
binary_recording.set_probe(probe=the_original_probe, in_place=True)

recording_without_channels = binary_recording.remove_channels(remove_channel_ids=[17])
filtered_recording = filter(recording=recording_without_channels, btype="bandpass", band=[72, 5470])
re_referenced_recording = common_reference(recording=filtered_recording, reference="local", operator="median", groups=None, ref_channel_ids=[], local_radius=[151, 282])
path_to_save_recording = Path("./recording_test")
preprocessed_recording_saved = re_referenced_recording.save(folder=path_to_save_recording, overwrite=True)

sorting = run_sorter(
        sorter_name='tridesclous2',
        recording=preprocessed_recording_saved, 
        remove_existing_folder=True,
)

path_to_save_sorting = Path("./sorting_test")
sorting.save(folder=path_to_save_sorting, overwrite=True)

This will be useful because either I am doing something wrong when reproducing your pipeline, or your environment is not using the latest version. Everything here except the output of the sorter should be relative.

@rat-h
Author

rat-h commented Jul 22, 2024

Sorry, this message got buried in lots of SI messages.

This script works on my side without any problems. However, when I check paths, I can still see absolute paths in the created folders, although only inside the tridesclous2_output folder:

$ find . -name '*.json' -exec grep '/home' {} +
./tridesclous2_output/spikeinterface_recording.json:        "folder_path": "/home/spikeinterface/test-relpath/recording_test"

HOWEVER, if I change the sorter to kilosort4

...
sorting = run_sorter(
        sorter_name='kilosort4',
        recording=preprocessed_recording_saved, 
        remove_existing_folder=True,
)
...

Absolute paths appear even in the sorting_test directory:

$ find . -name '*.json' -exec grep '/home' {} + 
./kilosort4_output/spikeinterface_recording.json:        "folder_path": "/home/spikeinterface/test-relpath/recording_test"
./sorting_test/provenance.json:        "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./sorting_test/si_folder.json:        "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",

Finally, the same problem appears when I run kilosort4 in a container:

sorting = run_sorter(
        sorter_name='kilosort4',
        recording=preprocessed_recording_saved, 
        remove_existing_folder=True,
        singularity_image='../images/kilosort4-base:latest.sif'
)
$ find . -name '*.json' -exec grep '/home' {} + 
./kilosort4_output/spikeinterface_recording.json:        "folder_path": "/home/spikeinterface/test-relpath/recording_test"
./kilosort4_output/in_container_sorting/provenance.json:        "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./kilosort4_output/in_container_sorting/si_folder.json:        "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./sorting_test/provenance.json:        "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./sorting_test/si_folder.json:        "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",

@zm711
Collaborator

zm711 commented Jul 22, 2024

I wonder if the sorters are causing this. Kilosort writes its own phy path as part of running, so I wonder if we are just taking in that value. It would make sense for TDC2 as well, since that could write its own path. We use the save_sorting function directly from KS4, so that might be part of the problem. KS switched to using absolute paths for phy between KS2 and KS2.5, so since most of the problem seems to be phy_folder, my working hypothesis is that we are just propagating the phy path that KS4 generates itself.

@h-mayorquin
Collaborator

Thanks, I will take a look.
