Hidden Markov Model profile toolkit.
Written based on the HMMER User's Guide, p. 107.
With this package you can read and write HMM profile files. It is easy to use and easy to read; the best documentation is well-written code itself, so don't be afraid to read the source.
The `read_all` function returns a generator to optimise memory usage, since it is common for a single file to contain many profiles.
```python
from hmm_profile import reader

with open('/your/hmm/profile/file.hmm') as f:
    model_generator = reader.read_all(f)  # IMPORTANT: returns a generator
    profiles = list(model_generator)
```
If your files each contain only a single model, you can use the `read_single` method instead. It returns a `models.HMM` ready to use.
```python
from hmm_profile import reader

with open('/your/hmm/profile/file.hmm') as f:
    model = reader.read_single(f)
```
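To write multiple profiles into a single file, pass a list of models to `save_many_to_file`: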
```python
from hmm_profile import writer

profiles = [...]
path = '/your/hmm/profile/file.hmm'

writer.save_many_to_file(hmms=profiles, output=path)
```
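To write a single profile: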
```python
from hmm_profile import writer

model = ...
path = '/your/hmm/profile/file.hmm'

writer.save_to_file(hmm=model, output=path)
```
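If you need the serialised content without writing to disk, `get_lines` yields the profile line by line: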
```python
from hmm_profile import writer

model = ...
lines = writer.get_lines(model)  # IMPORTANT: returns a generator
content = ''.join(lines)
```
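Because `get_lines` returns a generator, you can also stream the output instead of building the whole string in memory. A minimal sketch, assuming each yielded line already ends with a newline (as the `''.join` above suggests):

```python
from hmm_profile import writer

model = ...
with open('/your/hmm/profile/file.hmm', 'w') as f:
    for line in writer.get_lines(model):  # consumed lazily, one line at a time
        f.write(line)
```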
If you have a file that cannot be read, or that comes out wrong after saving, please create an issue and attach the file. Bug reports without files (or good examples, if you can't provide the full file) will be ignored.
The badge above shows whether all HMM profiles from Pfam work. Tests run every day.
Test flow:
- Download all HMM profiles from Pfam.
- Load the profiles sequentially.
- Write each model to a file.
- Load the saved model from the file.
- Check that both loaded profiles are equal.
This test uses the latest version of hmm_profile from PyPI.
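The round trip can be reproduced with the public API shown above; a minimal sketch (the repository's actual test code may differ), writing to an in-memory file as the benchmark does:

```python
from io import StringIO
from hmm_profile import reader, writer

# Read a profile, write it to an in-memory file, read it back,
# and check that both loaded profiles are equal.
with open('/your/hmm/profile/file.hmm') as f:
    original = reader.read_single(f)

buffer = StringIO(''.join(writer.get_lines(original)))
reloaded = reader.read_single(buffer)

assert original == reloaded
```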
The full DB test also runs before each release, but the badge above shows only the periodic test results.
The whole package is written in pure Python, without C extensions.
You can treat the full DB test as a benchmark.
The benchmark depends mainly on single-core CPU performance, secondarily on storage, and only marginally on RAM. Storage is used only for reading; the files are then written to an in-memory file (`StringIO`).
Remember: results may vary when the CPU is under load. Also, the HMM profiles in the database may be modified in the future, and profiles may be added to or removed from the DB.
`D` means the test was run in a Docker container with the slim version of Python.
Processor | Storage | Time [s] | Profiles | Date | Version | Python |
---|---|---|---|---|---|---|
Intel Core i7-4702MQ | Crucial MX500 500 GB | 322 | 17928 | 2020.02.22 | 0.0.9 | 3.6 |
Intel Core i7-4702MQ | Crucial MX500 500 GB | 342 | 17928 | 2020.02.22 | 0.0.9 | 3.7 |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 217 | 19631 | 2022.06.28 | 0.0.13 | 3.9 |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 216 | 19632 | 2022.07.10 | 0.0.13 | 3.10 |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 265 | 19632 | 2022.07.10 | 0.0.13 | 3.11.0b3 |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 354 | 19632 | 2022.07.10 | 0.0.13 | 3.7 D |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 330 | 19632 | 2022.07.10 | 0.0.13 | 3.8 D |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 336 | 19631 | 2022.07.09 | 0.0.13 | 3.9 D |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 253 | 19632 | 2022.07.10 | 0.0.13 | 3.10 D |
AMD Ryzen 5 3600 | AData XPG SX8200 1TB | 218 | 19632 | 2022.07.10 | 0.0.13 | 3.11.0b3 D |
To run the benchmark:

```bash
pip install .
export HMM_PROFILE_RUN_INTEGRITY_TESTS=TRUE
python setup.py test --addopts -s
```
If you want to share results, run the test at least 3 times (report the last run) and close as many other processes as possible. Important: do not run the tests inside an IDE's built-in terminal; it does much more work handling the output, and the benchmark results will be affected.
As you can see, Python 3.6 is a little faster, probably due to the different implementation of the backported dataclasses, but I'm not sure.
- Change the version in `setup.py` to `x.y.z.dev0` (or leave it as-is for a minor version bump) and make sure the changelog is up to date. (`Nothing changed yet.` is not OK; CI will fail.)
- Tag the head of the master branch with `x.y.z`, without `.dev0`.

Important: the release is ALWAYS made from the master branch, so keep master untouched when you want to release.