pl-abs

pl-abs is a blazingly fast and correct ChRIS ds-type plugin that calculates the absolute value of each number in each data file of an input directory, writing outputs to an output directory.

Usage

Run pl-abs with a directory containing .txt and .csv input files, and a separate directory for outputs:

apptainer exec docker://fnndsc/pl-abs:latest abs --input-files .txt,.csv incoming/ outgoing/

To write outputs in place (to the same directory), use --output-suffix to avoid clobbering the input files:

apptainer exec docker://fnndsc/pl-abs:latest abs --input-files .txt --output-suffix abs.txt data/ data/

On ChRIS, it can be useful to copy unmodified files to the output directory as well, using --copy:

apptainer exec docker://fnndsc/pl-abs:latest abs --copy --output-suffix abs.txt data/ data/
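A minimal Python sketch of the directory handling described in this section, under stated assumptions: input files are matched by name suffix (as with --input-files), outputs mirror the input's relative paths, and with copy=True the non-matching files are copied unchanged (as with --copy). process_directory and strip_negative_signs are hypothetical names, not pl-abs's API, and output naming under --output-suffix is not modeled because this README does not spell it out.

```python
import shutil
from pathlib import Path

NUMBER_START = frozenset("1234567890.")


def strip_negative_signs(text: str) -> str:
    """Drop each '-' that directly precedes a digit or '.'.

    Assumption: a '-' right after an exponent marker (e.g. "2.3E-4") is kept.
    """
    out = []
    for i, ch in enumerate(text):
        starts_number = ch == "-" and i + 1 < len(text) and text[i + 1] in NUMBER_START
        in_exponent = i >= 2 and text[i - 1] in "eE" and text[i - 2] in NUMBER_START
        if starts_number and not in_exponent:
            continue  # remove the sign, keep the digits as written
        out.append(ch)
    return "".join(out)


def process_directory(incoming: Path, outgoing: Path,
                      suffixes: tuple[str, ...], copy: bool = False) -> None:
    """Hypothetical model of a ds-plugin run over incoming/ -> outgoing/."""
    # sorted() materializes the file list before anything is written.
    for src in sorted(incoming.rglob("*")):
        if not src.is_file():
            continue
        dst = outgoing / src.relative_to(incoming)  # mirror the relative path
        dst.parent.mkdir(parents=True, exist_ok=True)
        if src.name.endswith(suffixes):
            dst.write_text(strip_negative_signs(src.read_text()))
        elif copy and src.resolve() != dst.resolve():
            shutil.copy2(src, dst)  # pass other files through unchanged
```

The resolve() check skips copying a file onto itself when incoming/ and outgoing/ are the same directory, which is the in-place case shown above.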

Input Examples

Let incoming/ be a directory of input files containing numerical data, e.g.:

-3
-4
-5.6
7.896
2.3E-4
-5.5E6

The value separator can be anything, and non-numerical data is ignored. For example, a CSV:

food,price_2019,price_2020,change,tasty
apple,1.1,1.2,0.1,false
cereal,2.0,1.5,-.5,false
peanut butter,3.0,5.0,2.0,true

See examples/incoming for examples.

Correctness?

pl-abs does not deserialize numbers. To describe its functionality more accurately, pl-abs removes any negative sign found in front of anything it thinks is the start of a number, specifically any character in the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 0, .}.
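The rule above can be sketched in Python, the language of the repository's abs.py (this is not the Rust source). It is a minimal sketch under one assumption this README does not state: a '-' directly after an exponent marker, as in 2.3E-4, belongs to the exponent and is kept, since removing it would change the magnitude. strip_negative_signs and NUMBER_START are hypothetical names.

```python
NUMBER_START = frozenset("1234567890.")


def strip_negative_signs(text: str) -> str:
    """Remove each '-' that directly precedes a character in NUMBER_START.

    Assumption: a '-' right after an exponent marker (e.g. "2.3E-4")
    belongs to the exponent and is kept.
    """
    out = []
    for i, ch in enumerate(text):
        starts_number = ch == "-" and i + 1 < len(text) and text[i + 1] in NUMBER_START
        in_exponent = i >= 2 and text[i - 1] in "eE" and text[i - 2] in NUMBER_START
        if starts_number and not in_exponent:
            continue  # drop the negative sign, leave the digits untouched
        out.append(ch)
    return "".join(out)


print(strip_negative_signs("cereal,2.0,1.5,-.5,false"))  # cereal,2.0,1.5,.5,false
```

Applied to the example inputs above, -3 becomes 3, -.5 becomes .5, -5.5E6 becomes 5.5E6, 2.3E-4 is unchanged, and the CSV header and words pass through untouched.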

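This implementation means pl-abs guarantees numerical stability: since no number is ever parsed, every digit of the input is written back exactly as it was given. Typical programmatic implementations of the "absolute value" function instead parse text into floating point numbers and format the result again, which can cause a loss of floating point precision.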
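To illustrate the claim, the following sketch compares a parse-then-abs round trip with text-level sign removal on a single token. The precision loss comes from converting text to a 64-bit float and back, not from abs() itself. abs_via_float and abs_via_text are hypothetical helper names.

```python
def abs_via_float(token: str) -> str:
    """Typical approach: parse to a float, take abs, format the result again."""
    return repr(abs(float(token)))


def abs_via_text(token: str) -> str:
    """Text approach: drop a leading '-' and keep every digit as written."""
    return token[1:] if token.startswith("-") else token


token = "-3.14159265358979323846264338327950288"
print(abs_via_float(token))  # 3.141592653589793
print(abs_via_text(token))   # 3.14159265358979323846264338327950288
```

Both functions agree on short inputs such as 7.896; they diverge once the input carries more significant digits than a double can hold (roughly 15 to 17).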

Some readings on floating point math and numerical stability:

Benchmarks

I know, Rust developers are annoying. pl-abs is a very simple program; I wrote it in Rust so that I could personally explore the inefficiencies of Python for data processing.

For realistic workloads*, pl-abs (written in Rust, running on a single thread) is ~5 times faster than the other programs tested, with or without multiprocessing, including an equivalent Python implementation (abs.py) and vertstats_math from CIVET.

*Performance, of course, is going to vary with how large your data files are and how many there are. If your input files are 5 lines long, the startup cost of Python can make it 20 times slower. On the other hand, for a single input file 10,000,000 lines long:

pl-abs in Rust takes 0.7 seconds (max RSS=2456KB)
abs.py in Python takes 9.8 seconds (max RSS=15680KB)
vertstats_math from CIVET takes 2.2 seconds (max RSS=199936KB)
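For the single 10,000,000-line file, the ratios relative to pl-abs work out as follows (wall time first, then max RSS); they are derived only from the three measurements above:

```latex
\frac{9.8\,\mathrm{s}}{0.7\,\mathrm{s}} = 14, \qquad
\frac{2.2\,\mathrm{s}}{0.7\,\mathrm{s}} \approx 3.1, \qquad
\frac{15680\,\mathrm{KB}}{2456\,\mathrm{KB}} \approx 6.4, \qquad
\frac{199936\,\mathrm{KB}}{2456\,\mathrm{KB}} \approx 81.4
```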

Setup Benchmarks

cargo build --release

pip install chris-plugin==0.2.0a1 numpy

python stress_test/create_data.py

Run Benchmarks

hyperfine -c 'rm -rf /tmp/outgoing' \
    'target/release/abs stress_test/incoming /tmp/outgoing' \
    'env NUM_THREADS=1 python abs.py stress_test/incoming /tmp/outgoing' \
    'env NUM_THREADS=4 python abs.py stress_test/incoming /tmp/outgoing' \
    'find stress_test/incoming -type f -name "*.txt" | parallel -j1 "mkdir -p /tmp/outgoing/{//}; vertstats_math -old_style_file -abs {} /tmp/outgoing/{}"' \
    'find stress_test/incoming -type f -name "*.txt" | parallel -j4 "mkdir -p /tmp/outgoing/{//}; vertstats_math -old_style_file -abs {} /tmp/outgoing/{}"'

TODO: benchmark vs. NumPy, Codon, PyPy, Julia, ...
