TL;DR: use parquet instead of CSV. At a minimum, compress bcerror CSVs with gzip before adding them to the repo.
Parquet files are much smaller on disk and faster to parse than CSVs. Not high priority, but this would be useful to incorporate into remora pipelines where we just want per-base stats.
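For the gzip fallback, readr compresses and decompresses transparently based on the file extension, so a minimal sketch (reusing the file.csv name from the example below) could be:
library(readr)
# write_csv() compresses automatically when the path ends in .gz
write_csv(read_csv("file.csv"), "file.csv.gz")
# read_csv() decompresses .gz files transparently on read
read_csv("file.csv.gz")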
The parquet conversion might be as simple as:
library(readr)
library(nanoparquet)
# convert an existing CSV to parquet
write_parquet(read_csv("file.csv"), "file.parquet")
# then inspect file sizes to compare
file.info("file.csv")$size
file.info("file.parquet")$size
# reload the file in subsequent analyses
read_parquet("file.parquet")
Could also combine multiple CSVs into one parquet with a column for the sample name; a sketch follows.
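A minimal sketch of that combining step, assuming the per-sample CSVs share a schema (the file names here are hypothetical). readr >= 2.0 can read several files in one call and record the source of each row via the id argument:
library(readr)
library(nanoparquet)
# hypothetical per-sample bcerror CSVs with a shared schema
csvs <- c("sampleA.csv", "sampleB.csv")
# the "sample" column records each row's source file path;
# derive a cleaner sample name from it if needed
combined <- read_csv(csvs, id = "sample")
write_parquet(combined, "combined.parquet")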