Commit

Update README.md
CodyCBakerPhD authored Aug 23, 2024
1 parent 05a1d0c commit 21e4770
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions README.md
@@ -26,7 +26,7 @@ A few summary facts as of 2024:
- A single line of a raw S3 log file can be between 400-1000+ bytes.
- Some of the busiest daily logs on the archive can have around 5,014,386 lines.
- There are more than 6 TB of log files collected in total.
-- This parser reduces that total to around 20 GB of final essential information.
+- This parser reduces that total to less than 25 GB of final essential information on NWB assets (Zarr size TBD).



@@ -133,7 +133,7 @@ In the summer of 2024, this process took less than 5 hours to bin all 170 GB of

### Mapping

-The next step, which is also the step to re-run and release regularly, is to iterate through all current versions of all Dandisets, mapping the binned logs to their corresponding file paths as seen on the archive.
+To map:

```bash
map_binned_s3_logs_to_dandisets \
@@ -151,7 +151,7 @@ map_binned_s3_logs_to_dandisets \
--object_type blobs
```

-In the summer of 2024, this `blobs` process took less than 12 hours to run with one worker (could easily be parallelized in the future) without any activate caches. The caches that accumulate over time help speed up the process over repeated calls; a fresh run with caches only took less than ?? hours.
+In the summer of 2024, this `blobs` process took less than 8 hours to run with one worker (and could easily be parallelized in the future) without any caches. The caches that accumulate over time help speed up the process over repeated calls - a run with caches took less than ?? hours.

`zarr` is likely to take longer, but the general process is the same.
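
As a rough sanity check on the summary facts touched by the first hunk, the figures can be multiplied out. This is only a back-of-the-envelope sketch in Python, not part of the parser. It assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats "more than 6 TB" and "less than 25 GB" as exactly 6 TB and 25 GB. All variable and function names are hypothetical.

```python
# Back-of-the-envelope check of the README's summary facts (figures as of 2024).
LINES_IN_BUSIEST_DAILY_LOG = 5_014_386
BYTES_PER_LINE_RANGE = (400, 1000)  # README: "between 400-1000+ bytes"

TOTAL_RAW_BYTES = 6e12      # assumption: "more than 6 TB" taken as 6 * 10**12 bytes
TOTAL_REDUCED_BYTES = 25e9  # assumption: "less than 25 GB" taken as 25 * 10**9 bytes


def busiest_log_size_gb(lines: int, bytes_per_line: int) -> float:
    """Approximate size, in decimal gigabytes, of a log with `lines` lines."""
    return lines * bytes_per_line / 1e9


low_gb = busiest_log_size_gb(LINES_IN_BUSIEST_DAILY_LOG, BYTES_PER_LINE_RANGE[0])
high_gb = busiest_log_size_gb(LINES_IN_BUSIEST_DAILY_LOG, BYTES_PER_LINE_RANGE[1])
reduction_factor = TOTAL_RAW_BYTES / TOTAL_REDUCED_BYTES

print(f"Busiest daily log: roughly {low_gb:.1f} to {high_gb:.1f} GB")
print(f"Reduction factor: at least about {reduction_factor:.0f}x")
```

Under these assumptions, a single busy day of raw logs is on the order of a few gigabytes, and the parsed output is at least two orders of magnitude smaller than the raw total.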

