Moving ORF as an AbstractGenomicInterval{T} #34

Merged
merged 40 commits into from
Jul 1, 2024
16edd60
First steps towards ORFs as GenomicIntervals
camilogarciabotero May 18, 2024
d9cc396
Some io fixes, and tests
camilogarciabotero May 18, 2024
87b49c3
Some attempts to correct tests
camilogarciabotero May 21, 2024
9132e11
refactor: Remove unused code in runtests.jl
camilogarciabotero May 23, 2024
8e138ba
Relax some tests
camilogarciabotero May 23, 2024
b412355
Move get_variable_name outside loop
camilogarciabotero May 25, 2024
0aaf4e2
Major refactor of the ORF type and also add some groundwork for RBS
camilogarciabotero Jun 3, 2024
14fff61
Update deps
camilogarciabotero Jun 3, 2024
4dc47a6
Improve getindex with some types
camilogarciabotero Jun 3, 2024
61ac5c7
Up translate and other utils are now more stable
camilogarciabotero Jun 3, 2024
27ca3f8
Update the min_len into minlen
camilogarciabotero Jun 3, 2024
4336542
Remove unused function
camilogarciabotero Jun 3, 2024
0516508
Update deps
camilogarciabotero Jun 3, 2024
f17a1f5
Refactor finders with correct score calculation
camilogarciabotero Jun 5, 2024
41e509b
chore: Update Julia version to 1.10.4
camilogarciabotero Jun 5, 2024
140b672
Update README
camilogarciabotero Jun 8, 2024
8cbd961
Manifest update
camilogarciabotero Jun 8, 2024
56d3f1c
Create Features struct and include it in the ORF as the NamedTuple field
camilogarciabotero Jun 8, 2024
fd839c9
Refactor code to improve precompile file size and loading speed
camilogarciabotero Jun 8, 2024
d0cf754
Update io methods
camilogarciabotero Jun 8, 2024
2ac4ce9
Remove getorfs.jl
camilogarciabotero Jun 8, 2024
ba9e75e
Update finder methods with the Feature struct
camilogarciabotero Jun 8, 2024
80a0594
Update lors from BMC taking only one argument
camilogarciabotero Jun 8, 2024
44c791c
Update naivecolletor.jl to new BMC losr
camilogarciabotero Jun 8, 2024
7663ab6
Update iscoding method, so that it takes an ORF directly
camilogarciabotero Jun 9, 2024
b01af90
Update score method to handle directly the score field
camilogarciabotero Jun 9, 2024
784c1d5
Update finder methods kwargs to only use the scheme kwargs
camilogarciabotero Jun 9, 2024
01c3e3e
After fixing the lors from BMC package the lordr is also updated, req…
camilogarciabotero Jun 9, 2024
24b05b5
Update to BMC v0.10.1
camilogarciabotero Jun 9, 2024
2047656
Update some docstring in lordr
camilogarciabotero Jun 9, 2024
f21c4d9
Update deps
camilogarciabotero Jul 1, 2024
91cb0bb
Update getindex of ORFs
camilogarciabotero Jul 1, 2024
5235e69
reduce ORF type fields: get rid of seq
camilogarciabotero Jul 1, 2024
4607f96
Add new internal methods for calling var names and symbol
camilogarciabotero Jul 1, 2024
6419219
Update docs simple coding rule
camilogarciabotero Jul 1, 2024
91e2450
Up deps and extras
camilogarciabotero Jul 1, 2024
f7ae859
Add YAML extra
camilogarciabotero Jul 1, 2024
0ffa402
Update gitignore and delete Manifest
camilogarciabotero Jul 1, 2024
929f482
Update actions cache and cov
camilogarciabotero Jul 1, 2024
67e046d
refactor: Simplify NaiveCollector createorfs function
camilogarciabotero Jul 1, 2024
4 changes: 2 additions & 2 deletions .github/workflows/CI.yml
@@ -35,11 +35,11 @@ jobs:
  with:
  version: ${{ matrix.version }}
  arch: ${{ matrix.arch }}
- - uses: julia-actions/cache@v1
+ - uses: julia-actions/cache@v2
  - uses: julia-actions/julia-buildpkg@v1
  - uses: julia-actions/julia-runtest@v1
  - uses: julia-actions/julia-processcoverage@v1
- - uses: codecov/codecov-action@v3
+ - uses: codecov/codecov-action@v4
  with:
  files: lcov.info
  docs:
3 changes: 2 additions & 1 deletion .gitignore
@@ -5,4 +5,5 @@
  .travis.yml
  /docs/src/*_cache
  /docs/src/*_files
- /test/Manifest.toml
+ /test/Manifest.toml
+ Manifest.toml
131 changes: 0 additions & 131 deletions Manifest.toml

This file was deleted.

11 changes: 8 additions & 3 deletions Project.toml
@@ -7,20 +7,25 @@ version = "0.4.0"
  BioMarkovChains = "f861b655-cb5f-42ce-b66a-341b542d4f2c"
  BioSequences = "7e6ae17a-c86d-528c-b3b9-7f778a29fe59"
  FASTX = "c2308a5c-f048-11e8-3e8a-31650f418d12"
+ GenomicFeatures = "899a7d2d-5c61-547b-bef9-6698a8d05446"
  IterTools = "c8e1da08-722c-5040-9ed9-7db0dc04731e"
  PrecompileTools = "aea7be01-6a6a-4083-8856-8a6e6704d82a"

  [compat]
- BioMarkovChains = "0.9"
+ BioMarkovChains = "0.10"
  BioSequences = "3"
  FASTX = "2"
+ GenomicFeatures = "3"
  IterTools = "1.4"
  PrecompileTools = "1"
- julia = "1"
+ julia = "1.6"

  [extras]
  Aqua = "4c88cf16-eb10-579e-8560-4a9242c79595"
+ LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
+ StyledStrings = "f489334b-da3d-4c2e-b8f0-e476e12c162b"
  Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"
+ YAML = "ddb6d928-2868-570f-bddf-ab3f9cf99eb6"

  [targets]
- test = ["Test", "Aqua"]
+ test = ["Test", "Aqua", "LinearAlgebra", "StyledStrings", "YAML"]
56 changes: 38 additions & 18 deletions README.md
@@ -50,27 +50,27 @@ seq = dna"AACCAGGGCAATATCAGTACCGCGGGCAATGCAACCCTGACTGCCGGCGGTAACCTGAACAGCACTGGCA
Now let us find the ORFs

```julia
- findorfs(seq, NaiveFinder())
-
- 12-element Vector{ORF}:
- ORF(29:40, '+', 2, 0.0)
- ORF(137:145, '+', 2, 0.0)
- ORF(164:184, '+', 2, 0.0)
- ORF(173:184, '+', 2, 0.0)
- ORF(236:241, '+', 2, 0.0)
- ORF(248:268, '+', 2, 0.0)
- ORF(362:373, '+', 2, 0.0)
- ORF(470:496, '+', 2, 0.0)
- ORF(551:574, '+', 2, 0.0)
- ORF(569:574, '+', 2, 0.0)
- ORF(581:601, '+', 2, 0.0)
- ORF(695:706, '+', 2, 0.0)
+ orfs = findorfs(seq, finder=NaiveFinder) # use finder=NaiveCollector as an alternative
+
+ 12-element Vector{ORF{4, NaiveFinder}}:
+ ORF{NaiveFinder}(29:40, '+', 2)
+ ORF{NaiveFinder}(137:145, '+', 2)
+ ORF{NaiveFinder}(164:184, '+', 2)
+ ORF{NaiveFinder}(173:184, '+', 2)
+ ORF{NaiveFinder}(236:241, '+', 2)
+ ORF{NaiveFinder}(248:268, '+', 2)
+ ORF{NaiveFinder}(362:373, '+', 2)
+ ORF{NaiveFinder}(470:496, '+', 2)
+ ORF{NaiveFinder}(551:574, '+', 2)
+ ORF{NaiveFinder}(569:574, '+', 2)
+ ORF{NaiveFinder}(581:601, '+', 2)
+ ORF{NaiveFinder}(695:706, '+', 2)
```
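
To make the `range, strand, frame` triples in the output above concrete, here is a minimal sketch of a naive forward-strand ORF scan written with Base Julia only. It is not GeneFinder's implementation: `naive_orfs` and its `minlen` keyword are hypothetical names, the sequence is a plain `String`, and the frame is assumed to be the start position mapped to 1, 2, or 3 via `mod1`, which agrees with the listing above (for example `29:40` gives frame 2).

```julia
# Hypothetical helper, not part of GeneFinder: scan the forward strand of a
# plain String for ATG ... stop stretches that stay in one reading frame.
function naive_orfs(seq::AbstractString; minlen::Int = 6)
    # ATG, then as few in-frame codons as possible, then the first stop codon
    # (TAG, TAA, or TGA).
    pattern = r"ATG(?:[ACGT]{3})*?T(?:AG|AA|GA)"
    orfs = NamedTuple{(:range, :strand, :frame), Tuple{UnitRange{Int}, Char, Int}}[]
    # overlap = true lets a later ATG inside an earlier ORF start its own match.
    for m in eachmatch(pattern, seq; overlap = true)
        start = m.offset
        stop = start + ncodeunits(m.match) - 1
        length(start:stop) >= minlen || continue
        # Assumption: frame is the start position folded into 1:3.
        push!(orfs, (range = start:stop, strand = '+', frame = mod1(start, 3)))
    end
    return orfs
end

toy = "CCATGAAATAGGATGTGA"
for orf in naive_orfs(toy)
    println(orf.range, " ", orf.strand, " ", orf.frame, " ", toy[orf.range])
end
```

A real finder would also scan the reverse complement and handle ambiguous bases; this sketch leaves both out to keep the idea visible.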

- Two other methods were implemented in `getorfs` to get the ORFs in DNA or amino acid sequences, respectively. They use the `findorfs` function to first get the ORFs and then get the corresponding array of `BioSequence` objects.
+ The `ORF` structure also contains the sequence of the ORF, the frame, and several features. If you want to get the sequences of the ORFs, you can broadcast the `sequence` function to safely extract the sequences of all ORFs.

```julia
- getorfs(seq, DNAAlphabet{4}(), NaiveFinder())
+ sequence.(orfs)

12-element Vector{LongSubSeq{DNAAlphabet{4}}}:
ATGCAACCCTGA
@@ -87,6 +87,26 @@ getorfs(seq, DNAAlphabet{4}(), NaiveFinder())
ATGCAACCCTGA
```

+ Similarly, you can extract the amino acid sequences of the ORFs using the `translate` function.
+
+ ```julia
+ translate.(orfs)
+
+ 12-element Vector{LongAA}:
+ MQP*
+ MR*
+ MRRMAR*
+ MAR*
+ M*
+ MCPTAV*
+ MQP*
+ MHWLVLSI*
+ MSPHKAM*
+ M*
+ MCPTAA*
+ MQP*
+ ```
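
As a rough illustration of what translating an ORF involves, the sketch below maps codons to amino acids using the standard genetic code, with Base Julia only. It does not reproduce the `translate` method used above: `translate_orf`, `BASES`, `AMINO_ACIDS`, and `CODON_TABLE` are hypothetical names, and the input is assumed to be an uppercase `String` over `ACGT`.

```julia
# Standard genetic code, written in the usual T, C, A, G ordering of the
# first, second, and third codon positions (64 entries).
const BASES = "TCAG"
const AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Hypothetical lookup table: codon String => one-letter amino acid Char.
const CODON_TABLE = Dict(
    string(a, b, c) => AMINO_ACIDS[16 * (i - 1) + 4 * (j - 1) + k]
    for (i, a) in enumerate(BASES), (j, b) in enumerate(BASES), (k, c) in enumerate(BASES)
)

# Translate codon by codon; a trailing partial codon is ignored.
translate_orf(orf::AbstractString) =
    join(CODON_TABLE[orf[i:i+2]] for i in 1:3:length(orf)-2)

println(translate_orf("ATGCAACCCTGA"))
```

The first ORF sequence in the README listing, `ATGCAACCCTGA`, splits into `ATG CAA CCC TGA`, which corresponds to the `MQP*` entry shown above.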

## Writing ORF information into bioinformatic formats

This package now facilitates the creation of `FASTA`, `BED`, and `GFF` files, specifically extracting Open Reading Frame (ORF) information from `BioSequence` instances, particularly those of type `NucleicSeqOrView{A} where A`, and then writing the information into the desired format.
@@ -118,7 +138,7 @@ Once a `BioSequence` object has been instantiated, the `write_orfs_fna` function
outfile = "LFLS01000089.fna"

open(outfile, "w") do io
- write_orfs_fna(seq, io, NaiveFinder())
+ write_orfs_fna(seq, io, finder=NaiveFinder) # use finder=NaiveCollector as an alternative
end
```
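
For readers who want to see the shape of the output such a writer produces, here is a small Base-only sketch that writes one FASTA record per ORF range to any `IO`. It is not the code behind `write_orfs_fna`: `write_fasta_sketch` is a hypothetical function, the sequence is a plain `String`, and the header layout (`>ORF<n> location=<start>-<stop>`) is an assumption chosen for illustration.

```julia
# Hypothetical writer: one FASTA record per ORF range.
function write_fasta_sketch(io::IO, seq::AbstractString, ranges)
    for (n, r) in enumerate(ranges)
        # Header line; the field layout here is assumed, not GeneFinder's.
        println(io, ">ORF", n, " location=", first(r), "-", last(r))
        # Sequence line: the slice of the input covered by the range.
        println(io, seq[r])
    end
    return nothing
end

buffer = IOBuffer()
write_fasta_sketch(buffer, "CCATGAAATAGGATGTGA", [3:11, 13:18])
print(String(take!(buffer)))
```

Taking an `IO` rather than a file name mirrors the `open(outfile, "w") do io ... end` pattern above, so the same function can write to a file, a buffer, or standard output.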
