relative paths #3148
Comments
Yes, we have that functionality. But how are you saving your data?
I think the waveform extractor has a parameter that must be explicitly set, see here. If you haven't been setting this to true (its default is false), then it has been using absolute paths.
Yes, but only preprocessed data - after filtering and referencing
That is awesome! Do we have something like that for
$ find sorting-saved -name '*.json' -exec grep $HOME {} +
sorting-saved/provenance.json: "file_path": "/home/spikeinterface/20230508T195033Z-continuous-hdsort-local-T0.45-20240602-155052-20230508T195033Z-HDsortLx36x21x4-3031626-final-final-84/sorting-workingdir/sorter_output/hdsort_output/hdsort_output_results.mat",
sorting-saved/provenance.json: "folder_path": "/home/spikeinterface/20230508T195033Z-continuous-hdsort-local-T0.45-20240602-155052-20230508T195033Z-HDsortLx36x21x4-3031626-final-final-84/preprocessed"
sorting-saved/si_folder.json: "folder_path": "/home/spikeinterface/20230508T195033Z-continuous-hdsort-local-T0.45-20240602-155052-20230508T195033Z-HDsortLx36x21x4-3031626-final-final-84/preprocessed"
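If older results need to be made portable before a fix lands, absolute paths like the ones above can be rewritten in place. This is a stdlib-only sketch of a workaround, not a SpikeInterface utility; the key names in `PATH_KEYS` are taken from the provenance output above and may need adjusting for other files:

```python
import json
import os
from pathlib import Path

# Assumed key names, taken from the provenance.json output above.
PATH_KEYS = {"file_path", "folder_path"}

def relativize_json_paths(json_file):
    """Rewrite absolute paths stored under PATH_KEYS so they become
    relative to the JSON file's parent folder."""
    json_file = Path(json_file)
    base = json_file.parent

    def fix(node):
        if isinstance(node, dict):
            return {
                k: os.path.relpath(v, base)
                if k in PATH_KEYS and isinstance(v, str) and os.path.isabs(v)
                else fix(v)
                for k, v in node.items()
            }
        if isinstance(node, list):
            return [fix(v) for v in node]
        return node

    data = json.loads(json_file.read_text())
    json_file.write_text(json.dumps(fix(data), indent=4))
```

Note that `os.path.relpath` does not require the target to exist, so a sibling folder comes out as a `../preprocessed`-style path.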
I don't think we have that for
We do have I think the provenance should be relative to the parent folder by default. Let me check that.
OK, the provenance is not relative to for some reason; I will fix that. But it is very strange that one of your paths in
Can you share a tree of your folder and more about your preprocessing?
$ tree sorting-saved
sorting-saved
├── numpysorting_info.json
├── properties
│ ├── template_frames_cut_before.npy
│ └── template.npy
├── provenance.json
├── si_folder.json
└── spikes.npy
It is quite a long script which automates the whole process, and I think there is a good deal of confusing stuff in it. So, in a nutshell:
recording = si.BinaryRecordingExtractor(
last['recording']['binfile'],last['recording']['sampling rate'],
'int16', num_channels=last['recording']['number of channels'])
prob = read_probeinterface(last['recording']['probe']).probes[0]
recording.set_probe(prob,in_place=True)
preproc = [recording]
# -- preprocessing
def resolvepreproc(cmd: str, rec, config: dict | None = None):
    config = config or {}
    if cmd == 'centering':
        return si.center(rec, **config)
    elif cmd == 'highpass or band filtering':
        return si.filter(rec, **config)
    elif cmd == 'referencing':
        return si.common_reference(rec, **config)
    elif cmd == 'whitening':
        return si.whiten(rec)
    elif cmd == 'zscore':
        return si.zscore(rec, **config)
    else:
        logger.error(f'Unknown preprocessing option: {cmd}')
        raise RuntimeError(f'Unknown preprocessing option: {cmd}')
for ppm in last['preprocessing']['methods']:
logger.info(f"PREPROC: {ppm}")
config = last['preprocessing'][ppm] if ppm in last['preprocessing'] else None
preproc.append( resolvepreproc(ppm,preproc[-1],config) )
preproc[-1].annotate(is_filtered=True)
#>> Saves preprocessed recording
preproc_saved = preproc[-1].save(
folder=last['running directory']+"/preprocessed",
chunk_duration='1m',**job_kwargs)
# -- Sorting
srdir = last['running directory']+"/sorting-workingdir"
sorting = si.run_sorter(
sorter_name=last['sorter']['name'],
recording=preproc_saved,
output_folder=srdir,
**last['sorter'][last['sorter']['name']] )
#>> Saves sorting
sorting_saved = sorting.save(folder=last['running directory']+"/sorting-saved",
chunk_duration='1m',**job_kwargs)
os.system(f'rm -fR {srdir}')
# -- Waveforms (optional can be processed on the client side)
we = si.extract_waveforms(
preproc_saved, sorting_saved, last['running directory']+"/waveforms",
use_relative_path=opts.relpaths, # added this this morning :)
**last['waveexctractor']
)
Just in case ...
>>> import spikeinterface.full as si
>>> si.__version__
'0.100.8'
With #3165 the provenance should no longer contain absolute paths. If your paths are real and they are an argument that you passed to the
If you can give it a try, that would be great.
@h-mayorquin sorting was saved, but the WaveformExtractor failed
That needs a better error message (#3170), but it is basically claiming that the folder does not exist. Can you double-check that?
$ tree HDsort-17-continuous
HDsort-17-continuous
├── jgui_state.json
├── preprocessed
│ ├── binary.json
│ ├── probe.json
│ ├── properties
│ │ ├── contact_vector.npy
│ │ ├── group.npy
│ │ └── location.npy
│ ├── provenance.json
│ ├── si_folder.json
│ └── traces_cached_seg0.raw
├── run.log
├── run.stderr
├── run.stdout
├── sorting-saved
│ ├── numpysorting_info.json
│ ├── properties
│ │ ├── template_frames_cut_before.npy
│ │ └── template.npy
│ ├── provenance.json
│ ├── si_folder.json
│ └── spikes.npy
└── spikeinterface_sorter_log.json
4 directories, 19 files
It does exist.
I was asking more whether the file_path that you are passing to load_extractor in your traceback is indeed correct. The function only ends up in that error when the file_path you pass is not a file or a directory: spikeinterface/src/spikeinterface/core/base.py, lines 742 to 801 at bd9cd1f
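The dispatch being described can be sketched roughly as follows. This is not SpikeInterface's actual code, just the shape of the check: an existing file or folder is accepted, anything else raises.

```python
from pathlib import Path

def load_extractor_sketch(file_path):
    """Rough sketch (not SpikeInterface's real implementation) of the
    path check in load_extractor: accept an existing file or folder,
    raise for anything else."""
    p = Path(file_path)
    if p.is_file():
        return f"load from file {p.name}"
    if p.is_dir():
        return f"load from folder {p.name}"
    raise ValueError(f"{p} is not a file or a folder")
```

So a relative path that resolves against the wrong working directory would hit the same error as a genuinely missing folder.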
To be sure whether the problem is in my script, I used a simple example:
#! /usr/bin/env python3
import os, sys, logging, shutil
import json
import psutil
from numpy import *
import spikeinterface.full as si
from probeinterface import read_probeinterface
ncpus = os.cpu_count() - 1
job_kwargs = {
"n_jobs" : ncpus,
# "total_memory": f"{int(psutil.virtual_memory()[1]*usemem/ntasks)//1024//1024//1024:d}G",
#DB>>
"total_memory": f"{int(psutil.virtual_memory()[1]*0.75/ncpus)//1024//1024//1024:d}G",
#<<DB
"progress_bar": True
}
# recording = si.BinaryRecordingExtractor("continuous.dat",30000.0,'int16', num_channels=128)
# prob = read_probeinterface("probes/A4x32-Poly2-5mm-23s-200-177-after-mapping.json").probes[0]
# recording.set_probe(prob,in_place=True)
# recording = recording.remove_channels([17])
# preproc = [recording]
# preproc.append( si.filter(preproc[-1], btype="bandpass",band=[72,5470]) )
# preproc.append( si.common_reference(preproc[-1], reference="local", operator="median", groups=None, ref_channel_ids=[],local_radius=[151,282]) )
# preproc_saved = preproc[-1].save(folder="test-simple-pipeline/preprocessed", chunk_duration='1m',**job_kwargs)
# srdir = "test-simple-pipeline/sorting-workingdir"
# sorting = si.run_sorter(
# sorter_name='tridesclous2',
# recording=preproc_saved,
# output_folder=srdir,
# )
# sorting_saved = sorting.save(folder="test-simple-pipeline/sorting-saved", chunk_duration='1m',**job_kwargs)
# os.system(f'rm -fR {srdir}')
#DB>>
preproc_saved = si.load_extractor("test-simple-pipeline/preprocessed")
sorting_saved = si.load_extractor("test-simple-pipeline/sorting-saved")
#<<DB
we = si.extract_waveforms(
preproc_saved, sorting_saved, "test-simple-pipeline/waveforms",
use_relative_path=True,
mode="folder",
precompute_template= ["average"],
ms_before = 1.5,
ms_after = 2.5,
max_spikes_per_unit = 500,
method = "radius",
radius_um = 40,
num_spikes_for_sparsity = 50,
sparse = True,
return_scaled = True,
**job_kwargs
)
and there are still absolute paths:
$ find . -name '*.json' -exec grep '/home/' {} +
./preprocessed/provenance.json: "/home/spikeinterface/continuous.dat"
./sorting-saved/provenance.json: "folder_path": "/home/spikeinterface/test-simple-pipeline/sorting-workingdir/sorter_output/sorting"
./sorting-saved/provenance.json: "folder_path": "/home/spikeinterface/test-simple-pipeline/preprocessed"
./waveforms/sorting/provenance.json: "folder_path": "/home/spikeinterface/test-simple-pipeline/sorting-saved"
./waveforms/sorting/provenance.json: "folder_path": "/home/spikeinterface/test-simple-pipeline/preprocessed"
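For quick auditing, the `find ... -exec grep` one-liners used throughout this thread can be replaced by a small pure-Python scanner (my own helper, not part of SpikeInterface) that reports every absolute-looking string value in the JSON files under a results tree:

```python
import json
from pathlib import Path

def find_absolute_paths(root):
    """Scan all JSON files under root and report (location, value) pairs
    for string values that look like absolute paths."""
    hits = []

    def walk(node, where):
        if isinstance(node, dict):
            for k, v in node.items():
                walk(v, f"{where}.{k}")
        elif isinstance(node, list):
            for i, v in enumerate(node):
                walk(v, f"{where}[{i}]")
        elif isinstance(node, str) and Path(node).is_absolute():
            hits.append((where, node))

    for jf in sorted(Path(root).rglob("*.json")):
        walk(json.loads(jf.read_text()), str(jf))
    return hits
```

An empty result means the folder should be relocatable, at least as far as its JSON metadata is concerned.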
Thanks. Just to confirm, you are using this PR, #3165, right? Could you copy the provenance.json files that you are getting? I will try to reproduce this.
Yes, that one has not been merged. You can either wait till it is merged or try:
pip install git+https://github.com/h-mayorquin/spikeinterface.git@provenance_to_relative
Even after the update, there are absolute paths there.
Thanks for checking. I will check with your script if I can reproduce the issue.
I really don't know what is going on. Can you run this on your computer? It is a script that reproduces your pipeline as much as possible. With it, if I check the provenance for both the sorting and the recording, I get no absolute paths even if that's what I pass.
from spikeinterface.core import generate_ground_truth_recording, write_binary_recording, BinaryRecordingExtractor, load_extractor
from spikeinterface.preprocessing import filter, common_reference
from spikeinterface.sorters import run_sorter
from pathlib import Path
num_channels = 32
sampling_frequency = 30_000.0
recording, sorting = generate_ground_truth_recording(num_channels=num_channels, sampling_frequency=sampling_frequency, durations=[10.0])
the_original_probe = recording.get_probe()
an_absolute_file_path = Path("./my_recording.dat").resolve()
print(f"{an_absolute_file_path=}")
file_paths=[an_absolute_file_path]
write_binary_recording(recording=recording, file_paths=file_paths)
dtype = recording.get_dtype()
binary_recording = BinaryRecordingExtractor(file_paths=file_paths, sampling_frequency=sampling_frequency, dtype=dtype, num_channels=num_channels)
binary_recording.set_probe(probe=the_original_probe, in_place=True)
recording_without_channels = binary_recording.remove_channels(remove_channel_ids=[17])
filtered_recording = filter(recording=recording_without_channels, btype="bandpass", band=[72, 5470])
re_referenced_recording = common_reference(recording=filtered_recording, reference="local", operator="median", groups=None, ref_channel_ids=[], local_radius=[151, 282])
path_to_save_recording = Path("./recording_test")
preprocessed_recording_saved = re_referenced_recording.save(folder=path_to_save_recording, overwrite=True)
sorting = run_sorter(
sorter_name='tridesclous2',
recording=preprocessed_recording_saved,
remove_existing_folder=True,
)
path_to_save_sorting = Path("./sorting_test")
sorting.save(folder=path_to_save_sorting, overwrite=True)
This will be useful because either I am doing something wrong when reproducing your pipeline or your environment is not using the latest version. Everything here, except the output of the sorter, should be relative.
Sorry, this message sank in lots of SI messages. This script works on my side without any problems.
$ find . -name '*.json' -exec grep '/home' {} +
./tridesclous2_output/spikeinterface_recording.json: "folder_path": "/home/spikeinterface/test-relpath/recording_test"
HOWEVER, if I change the sorter to ...
sorting = run_sorter(
sorter_name='kilosort4',
recording=preprocessed_recording_saved,
remove_existing_folder=True,
)
...
Absolute paths appear even in
$ find . -name '*.json' -exec grep '/home' {} +
./kilosort4_output/spikeinterface_recording.json: "folder_path": "/home/spikeinterface/test-relpath/recording_test"
./sorting_test/provenance.json: "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./sorting_test/si_folder.json: "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
Finally, the same problem appears when I run
sorting = run_sorter(
sorter_name='kilosort4',
recording=preprocessed_recording_saved,
remove_existing_folder=True,
singularity_image='../images/kilosort4-base:latest.sif'
)
$ find . -name '*.json' -exec grep '/home' {} +
./kilosort4_output/spikeinterface_recording.json: "folder_path": "/home/spikeinterface/test-relpath/recording_test"
./kilosort4_output/in_container_sorting/provenance.json: "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./kilosort4_output/in_container_sorting/si_folder.json: "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./sorting_test/provenance.json: "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
./sorting_test/si_folder.json: "phy_folder": "/home/spikeinterface/test-relpath/kilosort4_output/sorter_output",
I wonder if the sorters are causing this. Kilosort writes its own phy path as part of running, so I wonder if we are just taking in that value. It would make sense for TDC2 as well, since that could write its own path. We use the
Thanks, I will take a look.
We currently use spikeinterface in a pipeline, where for each new recording the script creates
(recording directory)/preproc,
(recording directory)/sorting-saved,
and (recording directory)/waves.
I can then use the waves folder to see the results. However, we save all data and processing in a central storage. After copying the results there and downloading them onto another computer, nothing works. The culprits are absolute paths in the JSON files.
Would it be possible to use relative paths that are agnostic of the absolute location of the result folders?
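The portability being requested falls out naturally if every stored path is interpreted relative to the JSON file that contains it. A stdlib sketch of that resolve-at-load idea (the key name folder_path matches the provenance files shown in this thread; the function itself is illustrative, not SpikeInterface API):

```python
import json
from pathlib import Path

def resolve_folder_path(json_file):
    """Resolve the folder_path stored in a provenance-style JSON.
    Relative paths are interpreted against the JSON file's own parent
    folder, so the whole results tree can be moved or copied freely."""
    json_file = Path(json_file)
    folder = Path(json.loads(json_file.read_text())["folder_path"])
    if not folder.is_absolute():
        folder = (json_file.parent / folder).resolve()
    return folder
```

With paths stored as, say, "../preprocessed", the same provenance file resolves correctly before and after the results folder is copied to another machine.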