MRG: misc updates (#1)
* add a few fields

* add cargo fmt

* upd

* fix warnings

* add dependabot config

* add an example doc

* update README

* upd

* upd 2

* more README
ctb authored Sep 1, 2024
1 parent ca336b1 commit 3fe4dec
Showing 7 changed files with 5,214 additions and 26 deletions.
20 changes: 20 additions & 0 deletions .github/dependabot.yml
@@ -0,0 +1,20 @@
version: 2
updates:
  - package-ecosystem: pip
    directory: "/"
    schedule:
      interval: weekly
    open-pull-requests-limit: 10
  - package-ecosystem: cargo
    directory: "/"
    schedule:
      interval: weekly
    allow:
      - dependency-type: "direct"
    open-pull-requests-limit: 10
    ignore:
      - dependency-name: "zip"
  - package-ecosystem: "github-actions"
    directory: "/"
    schedule:
      interval: weekly
61 changes: 61 additions & 0 deletions .github/workflows/lint.yml
@@ -0,0 +1,61 @@
# This file is autogenerated by maturin v1.4.0
# To update, run
#
# maturin generate-ci github
#

name: "lint"
on:
  pull_request:
  push:
    branches: [latest]
jobs:
  tests_on_linux:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: cache conda
        uses: actions/cache@v4
        env:
          CACHE_NUMBER: 1
        with:
          path: ~/conda_pkgs_dir
          key:
            ${{ runner.os }}-conda-${{ env.CACHE_NUMBER }}-${{ hashFiles('environment.yml') }}

      - name: cache rust
        uses: Swatinem/rust-cache@v2

      - name: setup conda
        uses: conda-incubator/setup-miniconda@v3
        with:
          auto-update-conda: true
          python-version: 3.12
          channels: conda-forge,bioconda
          miniforge-variant: Mambaforge
          miniforge-version: latest
          use-mamba: true
          mamba-version: "*"
          activate-environment: sourmash_dev
          auto-activate-base: false
          # use-only-tar-bz2: true

      - run: conda info
      - run: conda list
      - run: conda config --show

      - run: mamba search rust

      - name: install dependencies
        shell: bash -l {0}
        run: mamba install rust==1.75.0

      - name: install dependencies 2
        shell: bash -l {0}
        run: mamba install compilers maturin pytest pandas

      - name: Run cargo fmt
        run: cargo fmt --all -- --check --verbose
59 changes: 58 additions & 1 deletion README.md
@@ -1,2 +1,59 @@
# oxli
k-mers and the like

Author: C. Titus Brown (@ctb), [email protected]

oxli is a simple Rust library + Python interface for counting k-mers
in genomic sequencing data.

## Installation

You can try building it yourself:
```
mamba env create -f environment.yml -n oxli
make wheel
```
and then install the resulting wheel.

We are working on packaging via conda-forge.

## Documentation

Please see [the API documentation](doc/api.md).

## Is there anything I should know about oxli?

Two things -

First, oxli is channeling
[khmer](https://khmer.readthedocs.io/en/latest/), a package written by
@ctb and many others. You shouldn't be too surprised to see useful
functionality from khmer making an appearance in oxli.

Second, it's written on top of the
[sourmash](https://sourmash.readthedocs.io/)
[rust library](https://sourmash.readthedocs.io/), and the underlying
code for dealing with sequence data is pretty well tested.

## What's the history here?

The history is a bit convoluted:

* the khmer package was useful for inspecting large collections of
k-mers, but was hard to maintain and evolve.

* in ~2016 @ctb's lab more or less switched over to developing
sourmash, which was initially built on a similar tech stack to khmer
(Python & C++).

* at some point, @luizirber rewrote the sourmash C++ code into Rust.

* this forced @ctb to learn Rust to maintain sourmash.

* @ctb then decided he liked Rust an awful lot, and missed some of the
khmer functionality.

* voila, oxli was born.

---

(Sep 2024)
33 changes: 33 additions & 0 deletions doc/api.md
@@ -0,0 +1,33 @@
# A simple example of the API

Import necessary modules:

```python
>>> import screed
>>> import oxli

```

Create a KmerCountTable with a k-mer size of 31:

```python
>>> counts = oxli.KmerCountTable(31)

```

Open a FASTA file and consume k-mers from all the sequences within:

```python
>>> for record in screed.open('example.fa'):
...     counts.consume(record.sequence)
349900

```

Get the count of `CGGAGGAAGCAAGAACAAAATATTTTTTCAT` in the data:

```python
>>> counts.get('CGGAGGAAGCAAGAACAAAATATTTTTTCAT')
1

```
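
For intuition about what the calls above are doing, here is a minimal pure-Python sketch of a k-mer count table. It is not oxli's implementation: `ToyKmerCountTable` is a hypothetical stand-in, it stores raw k-mer strings in a `Counter`, and it ignores anything the real library may do (hashing, reverse complements, input validation). Having `consume` return the number of k-mers it added is an assumption based on the integer the doctest above prints.

```python
from collections import Counter


class ToyKmerCountTable:
    """Illustrative stand-in for a k-mer count table; NOT oxli's implementation."""

    def __init__(self, ksize):
        self.ksize = ksize
        self.counts = Counter()

    def consume(self, sequence):
        """Count every overlapping k-mer in `sequence`; return how many were added."""
        # A sequence of length L contains L - k + 1 overlapping k-mers (0 if L < k).
        n_kmers = max(len(sequence) - self.ksize + 1, 0)
        for start in range(n_kmers):
            self.counts[sequence[start:start + self.ksize]] += 1
        return n_kmers

    def get(self, kmer):
        """Return the stored count for `kmer` (0 if it was never seen)."""
        if len(kmer) != self.ksize:
            raise ValueError("k-mer length does not match the table's ksize")
        return self.counts[kmer]


table = ToyKmerCountTable(4)
n_added = table.consume("ACGTACGT")  # 8 - 4 + 1 = 5 k-mers
print(n_added, table.get("ACGT"), table.get("CGTA"), table.get("TTTT"))
```

With a k-mer size of 4, `ACGTACGT` yields `ACGT`, `CGTA`, `GTAC`, `TACG`, `ACGT`, so `ACGT` is counted twice and an unseen k-mer such as `TTTT` reports zero.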