Figure out how to convert bcerror files to parquet #11

jayhesselberth · 2024-06-27T14:03:19Z

TL;DR: use parquet instead of CSV. At a minimum, compress bcerror CSVs with gzip before adding them to the repo.

Parquet files are typically much smaller on disk and faster to parse than CSV. Not high priority, but this would be useful to incorporate into the remora pipelines, where we just want per-base stats.

Might be as simple as:

library(readr)
library(nanoparquet)

# convert a CSV to parquet
write_parquet(read_csv("file.csv"), "file.parquet")

# then inspect to compare file sizes (in bytes)
file.info("file.csv")$size
file.info("file.parquet")$size

# reload file in subsequent analyses
read_parquet("file.parquet")

Could also combine multiple CSVs together into one parquet with a column for sample name.
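
A rough sketch of that combined approach, assuming every bcerror CSV has the same columns and that the file name (minus its extension) is an acceptable sample name. The function name `combine_bcerror` and the paths in the usage comment are hypothetical; `read_csv()` with a vector of paths and the `id` argument needs readr >= 2.0.

```r
library(readr)
library(nanoparquet)

# Hypothetical helper: stack several bcerror CSVs into one parquet file,
# tagging each row with the sample it came from.
combine_bcerror <- function(csv_files, out_path) {
  # read_csv() accepts a vector of paths; `id` stores each row's source path
  combined <- read_csv(csv_files, id = "sample", show_col_types = FALSE)

  # Assumption: sample name = file name without directory or .csv/.csv.gz
  combined$sample <- sub("\\.csv(\\.gz)?$", "", basename(combined$sample))

  write_parquet(combined, out_path)
  invisible(combined)
}

# Example call (paths are placeholders):
# combine_bcerror(list.files("bcerror", "\\.csv$", full.names = TRUE),
#                 "bcerror.parquet")
```

Downstream analyses could then filter on the `sample` column after a single `read_parquet()` call rather than looping over files.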
