upgrade labels on sourmash plot with --labels-from or some such #2452

Closed
ctb opened this issue Jan 31, 2023 · 2 comments · Fixed by #2598

Comments

ctb commented Jan 31, 2023

Labels on sourmash plot are complicated and ugly ;(.

Random thought: we could add a `--labels-to` option to `sourmash compare` that writes out a simple label spreadsheet, and then support a `--labels-from` option in `sourmash plot`.

peterjc commented Apr 24, 2023

Good idea.

As a workaround, I've been editing the `*.npy.labels.txt` file, which gets created alongside `*.npy` and is parsed by `sourmash plot` with `--labels`. This can be as simple as a few in-place `sed` edits to do things like remove `.fasta` extensions or a common filename prefix.
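A Python equivalent of those in-place edits might look like the sketch below. It assumes the labels file holds one label per line; the `genomes/` prefix and the function names `clean_labels` and `rewrite_labels_file` are hypothetical.

```python
def clean_labels(lines, prefix="", suffix=".fasta"):
    """Strip a shared filename prefix and a file extension from each label."""
    cleaned = []
    for line in lines:
        label = line.rstrip("\n")
        # drop the common prefix, if this label has it
        if prefix and label.startswith(prefix):
            label = label[len(prefix):]
        # drop the extension, if this label has it
        if suffix and label.endswith(suffix):
            label = label[: -len(suffix)]
        cleaned.append(label)
    return cleaned


def rewrite_labels_file(path, prefix="", suffix=".fasta"):
    """Rewrite a one-label-per-line file in place (assumed file layout)."""
    with open(path) as fp:
        lines = fp.readlines()
    cleaned = clean_labels(lines, prefix, suffix)
    with open(path, "w") as fp:
        for label in cleaned:
            fp.write(label + "\n")


if __name__ == "__main__":
    demo = ["genomes/sample_A.fasta\n", "genomes/sample_B.fasta\n"]
    print(clean_labels(demo, prefix="genomes/"))  # ['sample_A', 'sample_B']
```

Because the file is rewritten line for line, the label order (and so its match to the matrix rows) is preserved.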

ctb commented Apr 29, 2023

@peterjc check out #2598 - curious what you think. Simple and flexible, or overly complicated? Let me know :)

ctb closed this as completed in #2598 Feb 6, 2024
ctb added a commit that referenced this issue Feb 6, 2024
…tter customization (#2598)

Adds `sourmash compare --labels-to` and `sourmash plot --labels-from` to
support better label customization.

Fixes #2452
Fixes #2915

## `sourmash compare --labels-to`

This command will generate a 'labels-to' file. Running:
```
sourmash compare tests/test-data/demo/*.sig -o compare-demo \
    --labels-to compare-demo-labels.csv
```
will produce a file that looks like this:

file `compare-demo-labels.csv`:
```csv
order,md5,label,name,filename,signature_file
1,60f7e23c24a8d94791cc7a8680c493f9,SRR2060939_1.fastq.gz,,SRR2060939_1.fastq.gz,../tests/test-data/demo/SRR2060939_1.sig
2,4e94e60265e04f0763142e20b52c0da1,SRR2060939_2.fastq.gz,,SRR2060939_2.fastq.gz,../tests/test-data/demo/SRR2060939_2.sig
3,f71e78178af9e45e6f1d87a0c53c465c,SRR2241509_1.fastq.gz,,SRR2241509_1.fastq.gz,../tests/test-data/demo/SRR2241509_1.sig
4,6d6e87e1154e95b279e5e7db414bc37b,SRR2255622_1.fastq.gz,,SRR2255622_1.fastq.gz,../tests/test-data/demo/SRR2255622_1.sig
5,0107d767a345eff67ecdaed2ee5cd7ba,SRR453566_1.fastq.gz,,SRR453566_1.fastq.gz,../tests/test-data/demo/SRR453566_1.sig
6,f0c834bc306651d2b9321fb21d3e8d8f,SRR453569_1.fastq.gz,,SRR453569_1.fastq.gz,../tests/test-data/demo/SRR453569_1.sig
7,b59473c94ff2889eca5d7165936e64b3,SRR453570_1.fastq.gz,,SRR453570_1.fastq.gz,../tests/test-data/demo/SRR453570_1.sig
```

The `label` column in this file can be edited to suit the user's needs;
the index column is `order`, and all other columns can be ignored,
deleted, or updated without consequence.
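As an illustration, the sketch below rewrites the `label` column with Python's standard `csv` module, stripping the `.fastq.gz` suffix from each label, and writes the result to a new file while leaving `order` and the other columns untouched. The function names are hypothetical; this is not code from sourmash itself.

```python
import csv


def strip_fastq_suffix(label):
    """Example rename rule: 'SRR2060939_1.fastq.gz' -> 'SRR2060939_1'."""
    suffix = ".fastq.gz"
    return label[: -len(suffix)] if label.endswith(suffix) else label


def relabel(rows, rename):
    """Return copies of the CSV rows with 'label' replaced by rename(label)."""
    out = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        new_row["label"] = rename(row["label"])
        out.append(new_row)
    return out


def relabel_csv(in_path, out_path, rename=strip_fastq_suffix):
    """Read a labels CSV, rewrite its 'label' column, and save it elsewhere."""
    with open(in_path, newline="") as fp:
        reader = csv.DictReader(fp)
        fieldnames = reader.fieldnames
        rows = list(reader)
    with open(out_path, "w", newline="") as fp:
        writer = csv.DictWriter(fp, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(relabel(rows, rename))
```

For example, `relabel_csv("compare-demo-labels.csv", "compare-demo-new-labels.csv")` would produce an edited copy suitable for `--labels-from`.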

## `sourmash plot --labels-from`

This command will load labels from a file, such as an edited copy of the CSV above. Running:
```
sourmash plot --labels-from compare-demo-new-labels.csv compare-demo
```
uses the `label` column from the CSV as labels, in the order specified
by the `order` column (interpreted as integers and sorted from lowest to
highest). All other columns are ignored.
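The ordering rule above can be sketched as follows. This is an illustrative reimplementation, not sourmash's own parser, and `read_labels` is a hypothetical name. Sorting on `int(order)` rather than on the raw string matters once there are ten or more entries, since the string `"10"` sorts before `"2"`.

```python
import csv
import io


def read_labels(fp):
    """Return labels from a labels CSV, ordered by the integer 'order' column."""
    reader = csv.DictReader(fp)
    # keep only the two columns that matter; everything else is ignored
    rows = [(int(row["order"]), row["label"]) for row in reader]
    rows.sort(key=lambda pair: pair[0])
    return [label for _, label in rows]


if __name__ == "__main__":
    example = io.StringIO(
        "order,label,name\n"
        "10,third,\n"
        "2,second,\n"
        "1,first,\n"
    )
    print(read_labels(example))  # ['first', 'second', 'third']
```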

## Example in a Jupyter Notebook

Some example code for updating the labels is available here:


https://github.com/sourmash-bio/sourmash/blob/compare_labels/doc/plotting-compare.ipynb

## TODO

- [x] add test for `args.labeltext and args.labels_from` check
- [x] check the notebook update

## Future:

- [ ] Consider switching to `LinearIndex` in the signature loading code,
as that would let us maintain the location in the code without the
current machinations. Also worth thinking about enabling lazy loading,
which some future `Index`-code based modification might support.
- [ ] consider if and how to validate --labels-from CSV file...
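One possible shape for that validation, purely as a hypothetical sketch: check that the `order` and `label` columns exist, that every `order` value is a unique integer, and that the row count matches the number of signatures in the comparison matrix. `validate_labels` and its `n_expected` parameter are assumptions, not part of sourmash.

```python
import csv
import io

REQUIRED_COLUMNS = ("order", "label")


def validate_labels(fp, n_expected):
    """Return a list of problems found in a labels CSV (empty if none)."""
    reader = csv.DictReader(fp)
    fieldnames = reader.fieldnames or []  # None for an empty file
    problems = [f"missing column: {col}" for col in REQUIRED_COLUMNS
                if col not in fieldnames]
    if problems:
        return problems

    seen = set()
    n_rows = 0
    # line numbers assume one record per line; line 1 is the header
    for lineno, row in enumerate(reader, start=2):
        n_rows += 1
        try:
            order = int(row["order"])
        except (TypeError, ValueError):  # TypeError: short row, value is None
            problems.append(f"line {lineno}: order {row['order']!r} is not an integer")
            continue
        if order in seen:
            problems.append(f"line {lineno}: duplicate order {order}")
        seen.add(order)

    if n_rows != n_expected:
        problems.append(f"expected {n_expected} labels, found {n_rows}")
    return problems
```

Returning a list of messages instead of raising on the first error would let the command report every problem in the file at once.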

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>