Push of first non container version
AlexanderGress committed Feb 1, 2024
1 parent 3203a65 commit c46c16d
Showing 119 changed files with 1,933,954 additions and 0 deletions.
150 changes: 150 additions & 0 deletions .gitignore
@@ -0,0 +1,150 @@
config.txt
structman_config.txt
errorlog.txt
example/Output/*

*.pdb
*~
*.gz
!structman/scripts/*.gz

.vscode/

# ---> Python
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/
13 changes: 13 additions & 0 deletions MANIFEST.in
@@ -0,0 +1,13 @@
include structman/lib/rinerator/reduce
include structman/lib/rinerator/probe
include structman/lib/rinerator/reduce_wwPDB_het_dict.txt
include structman/scripts/pdb-rsync.sh
include structman/scripts/struct_man_db_uniprot.sql
include structman/scripts/database_structure.sql
include structman/scripts/sqlite_database_structure.sql
include structman/scripts/structguy_project_template.conf
include structman/resources/Components-smiles-stereo-oe.smi
include structman/resources/inchi_base.tsv
include structman/resources/pdbba.fasta.gz
include structman/resources/human_info_map.tab
include changelog.txt
171 changes: 171 additions & 0 deletions changelog.txt
@@ -0,0 +1,171 @@
Version 1.4.0
- Major update preparing the public version installable outside of a container
- Implemented an installation script to fully set up structman in a conda environment
- New outputs that support the direct usage of structguy after using structman
- Added support for sqlite3 as a local database option, hence removed the lite version
- Integrated MicroMiner into the pipeline (usage requires configuring an executable)
- Switched to ray 2.9.1
- Many smaller bug fixes

Version 1.3.0
- Added an interface hashing system to the database to retrace different interfaces from the same protein
- Updated the automated modelling pipeline, improved the template selection to focus on mutated sites
- New functionality: proteins can now be given tags that annotate a gene identifier (for example: "gene:P53"). Multiple protein isoforms with the same gene tag will then lead to an isoform analysis.
- switched to ray 2.2.0
- Added support for Ensembl transcript identifiers

Version 1.2.2
- added the feature to parse HGVS protein mutation identifiers (for example: p.Arg12His)


Version 1.2.1
- updated Uniprot ID mapping service to REST API

Version 1.2.0
- added database support for interfaces, aggregated interfaces, protein-protein interactions, and position-position interactions
- switched to ray 1.13.0

Version 1.1.1
- added aggregated contact matrix calculation
- added aggregated interface calculation

Version 1.1.0
- database now performs version checks
- switched to ray 1.11.0
- added model database support (specifically alphafold DB)

Version 1.0.3
- database create command now first checks whether an instance already exists
- if the database is not properly configured and structman is still called in db mode, it now switches automatically to lite mode

Version 1.0.2
- fasta input parsing now checks for irregular characters
- improved some bug logging
- added a possibility to run the alignment in serial mode without ray for debugging purposes
- switched to ray 1.10.0

Version 1.0.1
- removed some commented code
- slight changes of file structure to prevent circular imports
- multiple small bug fixes

Version 1.0.0
- first main version
- switched to ray 1.9.0
- Multiple smaller bug fixes

Version 0.9.2
- changed how imports are organized
- switched to ray 1.8.0

Version 0.9.1
- Various smaller bug fixes
- Multiple model structures (not NMR) are now fused internally into one model
- obsolete uniprot entries are now supported

Version 0.9.0
- Moved Modeller to version 10.1
- Fixed multiple modelling bugs
- Implemented aggregated indel analysis
- Indel analysis results are now a compressed row in the database
- Added automated script that creates/updates the mapping/sequence database
- mapping/sequence database got a new structure
- default mode now doesn't model indel structures. Can be activated by config or flag
- Updated disclaimers

Version 0.8.1
- The size of the database row for position data and alignment is now bigger
- Fixed a bug that prevented disordered region annotation from happening
- Upgraded disorder prediction method from IUpred2A to IUpred3
- Unpacking of position data in the output creation is now parallelized

Version 0.8.0
- Bigger data fields are now stored as binaries in the database

Version 0.7.3
- The update script now additionally loads the .../data/status directory
- The RINdb update function checks and writes (into meta.txt in the RINdb main directory) the date of the last update
- The RINdb compares the date of the last update with the dates of structure modifications in the /data/status directory of the PDB
- Added a function to split sequences of structures with unknown portions; to align them individually; and to fuse the alignments afterwards
- Configured custom output format of MMseqs2
- Use sequence length of the structure to better estimate the cost for an alignment

Version 0.7.2
- Added a nested parallelization for the classification. It triggers only when many cores are available and the classification task is large enough.
- updated the docker update script

Version 0.7.1
- Fixed a bug that kept the program using a local instance of the PDB
- Improved the protein splitting scheduling in the alignment parallelization, now works better with few proteins that are mapped to many structures
- Improved the performance of the packaging in the parallelized classification

Version 0.7.0
- Fixed some bugs regarding the parsing of asymmetric unit structures when they are given as input
- Parallelization of filter_structures is now chunked
- Improved chunking of alignment tasks
- Ray reinit for each major core loop to prevent memory leaks
- added custom object serialization functions to utils
- Fixed bugs regarding the parsing of PDB-type inputs
- added '--mem_limit [any%]' flag to limit maximal memory usage for testing purposes
- extensive overhaul of the parallelization of the annotation and the classification

Version 0.6.1
- Added output util 'RAS' that Retrieves all Annotated Structures (and their protein interaction partners) from a specific session.

Version 0.6.0:
- Added output util `suggest` command that ranks structural annotation templates
- Added associated database table `Suggestion` for rankings
- Add pandas requirement to structman conda environment file

Version 0.5.1
- increased radius for intra chain interface score sphere

Version 0.5.0:
- Location (formerly surface/core) is now divided into surface/buried/core and into total/mainchain/sidechain
- Classification is based on sidechain location (which also means a new class 'buried')
- Changed location and surface values had to be stored in the database -> version change to 0.5.x
- Added all types of locations and surface values to classification and feature table outputs
- Implemented an interface milieu analysis (inspired by core/rim interface distinction, but goes into more detail)

Version 0.4.1:
- Unspecified fasta inputs now add all positions as query

Version 0.4.0:
* Major Changes
- Add input checksums to see if input file has changed, triggering new session if so
- Adds new Checksum column to Session table in DB
- Separate output.py into multiple modules
* Enhancements
- Allow C++ style // line and trailing comments in smlf format
- Search user home folder for config file when no others found
- Added new structman.utils module (less structman specific than structman.lib.sdsc.utils)
* Misc Changes/Fixes
- Fix Mutated Sequence off-by-1 indexing error
- Move several functions from structman.lib.sdsc.utils to structman.utils
- Check that database_source_path exists before creating DB to avoid missing tables and ensuing failure of structman database create
- Delete duplicated sql script from /lib (resides in /scripts)
- Prefer os.path.join to avoid errors with double slashes in path
- Use package paths set in settings.py for ray_init()
- Use surface_threshold from config in database.getClassWT, database.getClass, database.diffSurfs
- Rename class Output_generator to OutputGenerator to conform to pep8
- Fix input/output issues when using relative paths
- Import fixes
- Autopep8 fixes
- Updated Readme

Version 0.3.2:
- Modelling now builds up a database of models that can be used to save computation costs for later runs
- Positions filtered by sanity checks now do not delete the whole protein
- Indel analysis can now be used with stored information from the database
- Added '--only_wt' option to run the pipeline without the generation of mutated proteins
- Multiple smaller bug fixes
- Improved remote classification

Version 0.3.1:
- Improved the store handling of remote classification
- Implemented a modelling output framework. Currently produces one model per input protein and one model for each given mutation.
- Multiple smaller bug fixes

Version 0.3.0:
- Start of changelog
- Acts as first master version of structman as pip package
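
The Version 1.2.2 entry above mentions parsing HGVS protein mutation identifiers such as p.Arg12His. As a rough illustration of what such parsing can involve, and not StructMan's actual implementation, the following sketch handles only simple substitutions. The function name `parse_hgvs_substitution` and the lookup table are hypothetical names introduced here.

```python
import re

# Three-letter to one-letter amino acid codes (standard 20 residues).
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches only simple substitutions like "p.Arg12His";
# deletions, insertions, and frameshifts are deliberately out of scope.
HGVS_SUBSTITUTION = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_hgvs_substitution(identifier):
    """Return (wildtype, position, mutant) using one-letter residue codes.

    Hypothetical helper for illustration; raises ValueError on anything
    that is not a simple three-letter substitution.
    """
    match = HGVS_SUBSTITUTION.match(identifier)
    if match is None:
        raise ValueError(f"not a simple HGVS substitution: {identifier!r}")
    ref, pos, alt = match.groups()
    try:
        return THREE_TO_ONE[ref], int(pos), THREE_TO_ONE[alt]
    except KeyError as exc:
        raise ValueError(f"unknown amino acid code in {identifier!r}") from exc


result = parse_hgvs_substitution("p.Arg12His")
```

Here `result` is the tuple `("R", 12, "H")`. Rejecting everything outside the narrow pattern keeps the sketch honest about what it does not cover.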
24 changes: 24 additions & 0 deletions config_template.txt
@@ -0,0 +1,24 @@
[user]
db_user_name=
db_password=
db_name=
mmseqs_tmp_folder=
mail=

db_address=
mapping_db=

verbosity=1

mmseqs2_path=mmseqs

pdb_path=
path_to_model_db=
rin_db_path=
mmseqs2_db_path=
mmseqs2_model_db_path=

microminer_exe=
microminer_db=

iupred_path=
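
The template above uses INI syntax with a single `[user]` section, so it can be read with Python's standard `configparser`. The following is a minimal sketch under that assumption, not StructMan's own config loader. The function name `load_user_settings` is hypothetical, and only a few of the template's keys are reproduced in the sample text.

```python
import configparser

# A shortened copy of the template; empty values mean "not configured yet".
TEMPLATE = """\
[user]
db_user_name=
db_password=
db_name=
verbosity=1
mmseqs2_path=mmseqs
pdb_path=
"""


def load_user_settings(text):
    """Parse the [user] section and map empty values to None.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["user"]
    return {key: (value if value else None) for key, value in section.items()}


settings = load_user_settings(TEMPLATE)
# Values are strings; convert numeric settings explicitly.
verbosity = int(settings["verbosity"] or 0)
```

Mapping empty strings to `None` makes it easy to tell a deliberately blank entry from a filled one before any database or tool path is used.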