Parallel processing #111
Thank you for that request and for transpors! In the meantime, did you compile your project in release mode?
Sure, I compile and use my app with a release build. If you could look into that critical code, that would be nice 👍
It turns out it's a bit trickier than expected. Maybe there could be some gain in optimizing how the datetimes are parsed from […]. I used a 200 MB file (https://transitfeeds.com/p/ov/814) on a 6th-generation i7, and the parsing takes about 50 seconds.
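As a rough illustration of what "optimizing how to parse the times" could mean, here is a minimal sketch. It assumes GTFS stop times arrive as `HH:MM:SS` strings (the hour may exceed 23 for trips running past midnight, and `H:MM:SS` is also allowed) and converts them straight to seconds since midnight, with no allocation and no general-purpose date/time library. `parse_gtfs_time` and `parse_digits` are hypothetical names, not the crate's actual code.

```rust
/// Parses a GTFS time ("HH:MM:SS" or "H:MM:SS") into seconds since midnight.
/// Hours may exceed 23 for trips that run past midnight.
/// Returns None on malformed input. Deliberately lenient about digit counts.
fn parse_gtfs_time(s: &str) -> Option<u32> {
    let mut parts = s.split(':');
    let h = parse_digits(parts.next()?)?;
    let m = parse_digits(parts.next()?)?;
    let sec = parse_digits(parts.next()?)?;
    // Reject extra components and out-of-range minutes/seconds.
    if parts.next().is_some() || m > 59 || sec > 59 {
        return None;
    }
    Some(h * 3600 + m * 60 + sec)
}

/// Parses a run of 1 to 3 ASCII digits without allocating.
/// The length cap also keeps the arithmetic far away from u32 overflow.
fn parse_digits(s: &str) -> Option<u32> {
    if s.is_empty() || s.len() > 3 {
        return None;
    }
    s.bytes().try_fold(0u32, |acc, b| {
        if b.is_ascii_digit() {
            Some(acc * 10 + u32::from(b - b'0'))
        } else {
            None
        }
    })
}
```

For example, `parse_gtfs_time("24:15:30")` yields `Some(87330)`. Storing a plain integer instead of a full date-time value is the design choice that makes this cheap; whether it fits the crate's data model is an open question.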
What exactly is tricky? I was hoping the hidden gem was in parallelizing the loops. With today's machines, parsing a ~20 MB file (my case) should be a piece of cake. Imagine doing that 15 years ago. I use this file.
The problem is that the CSV library reads each record on its own, so rayon can't work directly on it. Storing the data in a […]. I agree that such low parsing speeds are frustrating. The big problem is […] I have some leads to improve it a bit more. I am still a bit surprised that you need more than 10 seconds to parse your file in release mode.
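To illustrate why a streaming reader blocks parallelism, here is a minimal sketch of one possible workaround, assumed here rather than taken from the thread: read the records sequentially into memory first, then split the parsing across threads. It uses the standard library's `std::thread::scope` as a stand-in for rayon, and `Vec<String>` records stand in for the `csv` crate's record type. `parse_record` and `parse_in_parallel` are hypothetical.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical per-record parser standing in for the real GTFS parsing:
/// here it only parses a single integer field.
fn parse_record(line: &str) -> Option<u64> {
    line.trim().parse().ok()
}

/// Parses records that were already read (sequentially) into memory,
/// spreading the work over scoped threads and preserving input order.
fn parse_in_parallel(records: &[String]) -> Vec<Option<u64>> {
    if records.is_empty() {
        return Vec::new();
    }
    let threads = thread::available_parallelism()
        .map(NonZeroUsize::get)
        .unwrap_or(1);
    // Ceiling division: every record lands in exactly one non-empty chunk.
    let chunk_size = (records.len() + threads - 1) / threads;

    thread::scope(|s| {
        // Scoped threads may borrow `records` because they are joined
        // before `thread::scope` returns.
        let handles: Vec<_> = records
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || chunk.iter().map(|r| parse_record(r)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order keeps the output aligned with the input.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

The trade-off is memory: every raw record must be held at once before parsing starts. With rayon, the manual chunking would collapse into a `par_iter()` over the buffered records.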
The edit looks good. If you noticed an improvement, that's good news, and I wouldn't wait to release a new version ;)
The improvements are nice, but they aren't free. The ideas currently in the PR have the following side effects:
Another idea discussed in the comments of PR #112 would also yield nice performance improvements, but there the choices are either:
All in all, I think it makes sense to study the effects a bit more before committing to these ideas. In the meantime, you could speed up your own development by temporarily depending on PR #112's branch: in your […]
What comes to mind is to make parallel processing an optional "feature", which would solve almost everything from your previous comment. Anyway, the branch is a cool compromise. Well done.
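The "feature" suggestion could look roughly like the sketch below. It assumes a hypothetical Cargo feature named `parallel`: the default build keeps a plain sequential loop, and the parallel path is compiled only when users opt in (for example with `--features parallel`). Both variants share one signature, so callers don't change. The parallel branch uses standard-library scoped threads as a stand-in for rayon. `parse_one`, `parse_all` and the feature name are illustrative assumptions.

```rust
/// Hypothetical per-record work: parse one integer field.
fn parse_one(line: &str) -> Option<i64> {
    line.trim().parse().ok()
}

/// Default build: sequential loop, no extra dependency or threads.
#[cfg(not(feature = "parallel"))]
fn parse_all(lines: &[String]) -> Vec<Option<i64>> {
    lines.iter().map(|l| parse_one(l)).collect()
}

/// Compiled only with the hypothetical `parallel` feature enabled:
/// the left half runs on a scoped thread while the right half runs
/// on the current thread (a stand-in for a rayon-based version).
#[cfg(feature = "parallel")]
fn parse_all(lines: &[String]) -> Vec<Option<i64>> {
    let (left, right) = lines.split_at(lines.len() / 2);
    std::thread::scope(|s| {
        let handle = s.spawn(|| left.iter().map(|l| parse_one(l)).collect::<Vec<_>>());
        let mut tail: Vec<Option<i64>> = right.iter().map(|l| parse_one(l)).collect();
        // Left half first, then right half, to preserve input order.
        let mut out = handle.join().expect("worker thread panicked");
        out.append(&mut tail);
        out
    })
}
```

In a real crate, the feature would also be declared in the manifest and would typically make the rayon dependency optional, so users who don't want extra threads pay nothing.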
Hello, your issue led us on quite a journey. Your data seems to be well structured, so you can skip trimming the fields and get quite a performance boost:

```diff
- let gtfs = Gtfs::new(&file_path);
+ let gtfs = gtfs_structures::GtfsReader::default()
+     .trim_fields(false)
+     .read(&file_path);
```
Hi,
first of all, thanks for this crate. I use it in my project transpors. One thing I noticed is that loading/parsing GTFS takes a lot of time (a ~20 MB file takes about 30 seconds on my 8th-gen i7). That's understandable, but I noticed that all the work runs in one thread. I don't know what the core of this crate looks like, but I assume it's a lot of `for` loops. These can be parallelized in literally one line thanks to rayon. I did the same thing in my application, and walking through the data now takes only ~20% of the time it used to.
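The "one line" refers to rayon's iterator API. Once a loop is written as an iterator chain with no shared mutable state, rayon lets you replace `.iter()` with `.par_iter()`. The sketch below shows only the standard-library part, a `for` loop and its equivalent iterator chain. The rayon variant appears only in a comment, because rayon is an external crate, and it assumes `use rayon::prelude::*;`. `Trip` and the function names are hypothetical.

```rust
/// Hypothetical record type standing in for a parsed GTFS trip.
struct Trip {
    stop_count: usize,
}

/// Loop version: accumulates into a mutable variable, one element at a time.
/// This shape cannot be handed to rayon as-is.
fn total_stops_loop(trips: &[Trip]) -> usize {
    let mut total = 0;
    for trip in trips {
        total += trip.stop_count;
    }
    total
}

/// Iterator version: same result, with no shared mutable state.
/// With rayon (`use rayon::prelude::*;`), changing `.iter()` to
/// `.par_iter()` is the one-line change that spreads the work over threads.
fn total_stops_iter(trips: &[Trip]) -> usize {
    trips.iter().map(|t| t.stop_count).sum()
}
```

Rewriting the loop into this form is the real work. The switch to `par_iter()` only pays off when each element carries enough work to outweigh the threading overhead.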