path to local files in subdir #20
Comments
Can you provide me with some additional details on what you're using for your search directory and file pattern? In trigger.py the piece of code that finds the files does a walk of the subfolders, but maybe there's an incompatibility with what you've told it to look for?
Thank you for your fast reply. When I copy the same data into one folder. I.e. here the path would be:
Can you try the first case (mseed files within the subfolders) with the
The error is:
Ok, seems that it's complaining that the first item in the list is a directory and it can't read it. In the .cfg file, let's try adding
Adding this, it just goes like:
And I take it that putting these in the top directory does find the data correctly? I suppose we should also verify that flist does actually contain the filenames of all the data.
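One way to run that check outside of REDPy is sketched below. This is not the code in trigger.py; it is a minimal stand-in that assumes the search amounts to a recursive walk plus a shell-style pattern match. The function names `find_files` and `check_flist` and the `filepattern` argument are made up for illustration.

```python
import fnmatch
import os


def find_files(searchdir, filepattern="*"):
    """Return sorted paths of files under searchdir whose names match filepattern."""
    flist = []
    for root, _dirs, files in os.walk(searchdir):
        # os.walk reports file names separately from directory names,
        # so only entries from `files` are considered here.
        for name in fnmatch.filter(files, filepattern):
            flist.append(os.path.join(root, name))
    return sorted(flist)


def check_flist(flist):
    """Return the entries of flist that are not regular files (e.g. directories)."""
    return [path for path in flist if not os.path.isfile(path)]
```

If `check_flist(find_files(searchdir, pattern))` comes back empty and the list has the expected length, the walk itself is probably fine and the problem lies in the pattern or the path.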
It finds the files if I remove the inverted commas.
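A likely explanation, assuming the .cfg file is read with Python's standard `configparser` (not confirmed in this thread): that parser does not strip quotes, so they become part of the value and the pattern no longer matches any file name. The section name `Settings`, the option name `filepattern`, and the path below are placeholders for illustration.

```python
import configparser
import fnmatch

# Hypothetical config text; section and option names are assumptions.
cfg_text = """
[Settings]
searchdir=/data/top/
filepattern='*.mseed'
"""

parser = configparser.ConfigParser()
parser.read_string(cfg_text)

# The quotes are kept as literal characters in the value.
raw = parser.get("Settings", "filepattern")
matches_raw = fnmatch.fnmatch("station.mseed", raw)

# Stripping the quotes restores the intended pattern.
cleaned = raw.strip("'\"")
matches_cleaned = fnmatch.fnmatch("station.mseed", cleaned)
```

Here `matches_raw` is false because the pattern would require file names that start and end with a quote character, while `matches_cleaned` is true.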
Ah, yes, without the commas. When you moved your files to the top directory, did you move all of them? I'm wondering if there are a lot of files it's trying to read through. I'll readily admit that the way REDPy parses through files on disk is not very efficient. A path we might consider going down instead is setting up a portable FDSN. It's got a bunch of setup associated with it, but once it's going it'll probably be the fastest way to query your data, and might be useful outside of REDPy as well. If you'd like to try this, send me an email ([email protected]) and I'll forward you some notes on installing and setting it up from one of my colleagues.
Ok. I see. Yes, there are a lot of files in the original folders.
Ok, let me know how that goes. I don't usually work with lots of data in files on disk, and tend to favor waveservers and webservices. I've had folks who have their files in directories sorted by date use shell scripts to change the filepath based on what time they are processing, to reduce the number of files that REDPy needs to search through. I have some other ideas on better ways to handle it but haven't had a chance to test/implement them.
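The date-based workaround could also be done in Python rather than a shell script. The sketch below assumes a hypothetical `base/YYYY/MM/DD/` layout, which is not taken from this thread; adjust the format strings to whatever the archive actually uses. `searchdir_for` is an illustrative name, not a REDPy function.

```python
import os
from datetime import datetime


def searchdir_for(base, when):
    """Build the search directory for one day, assuming a base/YYYY/MM/DD/ layout."""
    # The trailing empty component makes os.path.join end the path with the
    # separator for the current operating system.
    return os.path.join(
        base,
        when.strftime("%Y"),
        when.strftime("%m"),
        when.strftime("%d"),
        "",
    )


day_dir = searchdir_for("/data/archive", datetime(2020, 1, 5))
```

The resulting path would then be written into the search directory setting before each run, so only that day's files are scanned.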
Just putting a quick update here that I've been picking at this issue while "cleaning up" the code. In the branch "cleanup" there is new code that creates a file index of all the files in the data search directory that helps it know which of those files to read once, rather than redoing the query every time step. I've also added options to load a few days of that data into memory for faster access. I've tested it with both large mseed volumes (~1 GB each per channel, containing several months of data each) and ~35k individual sac files from that same time span. It probably isn't as optimized as using a local waveserver, but it's orders of magnitude more efficient now. I'll probably close this issue when I pull 'cleanup' into the 'master' branch. I'd love it if you could test the new code on your dataset and let me know how it works, and what I can improve to align with your use case. Were you able to get pyrocko to work?
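For readers wondering what such a file index might look like, here is a rough sketch of the general idea. It is not the code from the "cleanup" branch. `build_index`, `files_for_window`, and the `read_span` callable are all hypothetical; `read_span(path)` is assumed to return the `(start, end)` times covered by a file, for example by reading only its headers.

```python
import os


def build_index(searchdir, read_span):
    """Scan searchdir once and return a sorted list of (start, end, path) tuples."""
    index = []
    for root, _dirs, files in os.walk(searchdir):
        for name in files:
            path = os.path.join(root, name)
            start, end = read_span(path)  # caller-supplied, hypothetical
            index.append((start, end, path))
    index.sort()
    return index


def files_for_window(index, t0, t1):
    """Return paths of indexed files whose time span overlaps the window [t0, t1]."""
    return [path for start, end, path in index if start <= t1 and end >= t0]
```

The directory is walked and each file is inspected only when the index is built; each later time step becomes a lookup over the in-memory list instead of a fresh search on disk.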
When providing the searchdir path to the top directory of my files, it does not find the files (which are three levels of subfolders down) but just provides a list of folder names in 'flist' that cannot be read in. I had to manually copy and rearrange my files in order to have them just one level below the top directory.
The description just says: 'If using local files, define the path to the top directory where they exist, ending in / or \ as appropriate for your operating system. If there are files in any subdirectories within this directory, they will be found.'