diff --git a/NEWS.md b/NEWS.md
index 309d1c17..214ee3bc 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,25 @@
 ## MIToS.jl Release Notes
 
+### Changes from v2.19.0 to v2.20.0
+
+* *[Breaking change]* The PDB module has deprecated `residues` and `@residues` in favor of
+  the `select_residues` function that uses keyword arguments.
+  So, `residues(pdb, "1", "A", "ATOM", All)` or `@residues pdb "1" "A" "ATOM" All` should be
+  replaced by `select_residues(pdb, model="1", chain="A", group="ATOM")`.
+
+* *[Breaking change]* The PDB module has deprecated `atoms` and `@atoms` in favor of
+  the `select_atoms` function that uses keyword arguments.
+  So, `atoms(pdb, "1", "A", "ATOM", All, "CA")` or `@atoms pdb "1" "A" "ATOM" All "CA"` should be
+  replaced by `select_atoms(pdb, model="1", chain="A", group="ATOM", atom="CA")`.
+
+* *[Breaking change]* The PDB module has deprecated the methods of the `isresidue` and
+  `residuesdict` functions that rely on positional arguments in favor of the keyword arguments.
+  So, `isresidue(pdb, "1", "A", "ATOM", "10")` should be replaced by
+  `isresidue(pdb, model="1", chain="A", group="ATOM", residue="10")`. Similarly,
+  `residuesdict(pdb, "1", "A", "ATOM", All)` should be replaced by
+  `residuesdict(pdb, model="1", chain="A", group="ATOM")`.
+
+
 ### Changes from v2.18.0 to v2.19.0
 
 * *[Breaking change]* The `shuffle` and `shuffle!` functions are deprecated in favor of the
diff --git a/Project.toml b/Project.toml
index 0451b6a2..50cd9bff 100644
--- a/Project.toml
+++ b/Project.toml
@@ -1,6 +1,6 @@
 name = "MIToS"
 uuid = "51bafb47-8a16-5ded-8b04-24ef4eede0b5"
-version = "2.19.0"
+version = "2.20.0"
 
 [deps]
 ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"
diff --git a/docs/src/PDB.md b/docs/src/PDB.md
index afa0d0a3..d2b9c9fe 100644
--- a/docs/src/PDB.md
+++ b/docs/src/PDB.md
@@ -81,42 +81,35 @@ CA_1ivo[1] # First residue. It has only the α carbon.
 MIToS parse PDB files to vector of residues, instead of using a hierarchical structure
 like other packages. This approach makes the search and selection of residues or atoms a
 little different.
-To make it easy, this module exports a number of functions and macros to select particular
-residues or atoms. Given the fact that residue numbers from different chains, models, etc.
-can collide, **it's mandatory to indicate the `model`, `chain`, `group`, `residue` number
-and `atom` name in a explicit way** to these functions or macros. If you want to select all
-the residues in one of the categories, you are able to use the type `All`. You can also use
-regular expressions or functions to make the selections.
+To make it easy, this module exports the `select_residues` and `select_atoms` functions.
+Given the fact that residue numbers from different chains, models, etc. can collide, we
+can indicate the `model`, `chain`, `group`, `residue` number and `atom` name using the
+keyword arguments of those functions. If you want to select all the residues in one of the
+categories, you are able to use the type `All` (this is the default value of such arguments).
+You can also use regular expressions or functions to make the selections.
 
 ```@example pdb_select
 using MIToS.PDB
 pdbfile = downloadpdb("1IVO", format=PDBFile)
 residues_1ivo = read_file(pdbfile, PDBFile)
-# Select residue number 9 from model 1 and chain B
-residues(residues_1ivo, "1", "B", All, "9")
+# Select residue number 9 from model 1 and chain B (it looks in both ATOM and HETATM groups)
+select_residues(residues_1ivo, model="1", chain="B", residue="9")
 ```
 
 ### Getting a `Dict` of `PDBResidue`s
 
 If you prefer a `Dict` of `PDBResidue`, indexed by their residue numbers, you can use the
-`residuedict` function or the `@residuedict` macro.
+`residuesdict` function.
 
 ```@example pdb_select
 # Dict of residues from the model 1, chain A and from the ATOM group
-chain_a = residuesdict(residues_1ivo, "1", "A", "ATOM", All)
+chain_a = residuesdict(residues_1ivo, model="1", chain="A", group="ATOM")
 chain_a["9"]
-```
-
-You can do the same with the macro `@residuesdict` to get a more readable code
-
-```@example pdb_select
-chain_a = @residuesdict residues_1ivo model "1" chain "A" group "ATOM" residue All
-chain_a["9"]
-```
+```
 
 ### Select particular residues
 
-Use the `residues` function to collect specific residues. It's possible to use a single
+Use the `select_residues` function to collect specific residues. It's possible to use a single
 **residue number** (i.e. `"2"`) or even a **function** which should return true for the
 selected residue numbers. Also **regular expressions** can be used to select residues.
 Use `All` to select all the residues.
@@ -130,7 +123,7 @@ residue_list = map(string, 2:5)
 ```
 
 ```@example pdb_select
-first_res = residues(residues_1ivo, "1", "A", "ATOM", resnum -> resnum in residue_list)
+first_res = select_residues(residues_1ivo, model="1", chain="A", group="ATOM", residue=resnum -> resnum in residue_list)
 
 for res in first_res
     println(res.id.name, " ", res.id.number)
@@ -142,7 +135,7 @@ A more complex example using an anonymous function:
 
 ```@example pdb_select
 # Select all the residues of the model 1, chain A of the ATOM group with residue number less than 5
-first_res = residues(residues_1ivo, "1", "A", "ATOM", x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 )
+first_res = select_residues(residues_1ivo, model="1", chain="A", group="ATOM", residue=x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 )
 # The anonymous function takes the residue number (string) and use a regular expression
 # to extract the number (without insertion code).
 # It converts the number to `Int` to test if the it is `<= 5`.
@@ -152,35 +145,17 @@ for res in first_res
 end
 ```
 
-Use the `@residues` macro for a cleaner syntax.
-
-```@example pdb_select
-# You can use All, regular expressions or functions also for model, chain and group:
-
-# i.e. Takes the residue 10 from chains A and B
-
-for res in @residues residues_1ivo model "1" chain ch -> ch in ["A","B"] group "ATOM" residue "10"
-    println(res.id.chain, " ", res.id.name, " ", res.id.number)
-end
-```
-
 ### Select particular atoms
 
-The `atoms` function or macro allow to select a particular set of atoms.
+The `select_atoms` function allows selecting a particular set of atoms.
 
 ```@example pdb_select
 # Select all the atoms with name starting with "C" using a regular expression
 # from all the residues of the model 1, chain A of the ATOM group
 
-carbons = @atoms residues_1ivo model "1" chain "A" group "ATOM" residue All atom r"C.+"
+carbons = select_atoms(residues_1ivo, model="1", chain="A", group="ATOM", residue=All, atom=r"C.+")
 
 carbons[1]
-```
-
-You can also use the `atoms` function instead of the `@atoms` macro:
-
-```@example pdb_select
-atoms(residues_1ivo, "1", "A", "ATOM", All, r"C.+")[1]
 ```
 
 ## Protein contact map
@@ -202,7 +177,7 @@ pdbfile = downloadpdb("1IVO", format=PDBFile)
 
 residues_1ivo = read_file(pdbfile, PDBFile)
 
-pdb = @residues residues_1ivo model "1" chain "A" group "ATOM" residue All
+pdb = select_residues(residues_1ivo, model="1", chain="A", group="ATOM")
 
 dmap = distance(pdb, criteria="All") # Minimum distance between residues using all their atoms
 ```
@@ -256,8 +231,8 @@ pdbfile = downloadpdb("2HHB")
 
 res_2hhb = read_file(pdbfile, PDBML)
 
-chain_A = pdb = @residues res_2hhb model "1" chain "A" group "ATOM" residue All
-chain_C = pdb = @residues res_2hhb model "1" chain "C" group "ATOM" residue All
+chain_A = select_residues(res_2hhb, model="1", chain="A", group="ATOM", residue=All)
+chain_C = select_residues(res_2hhb, model="1", chain="C", group="ATOM", residue=All)
 
 using Plots
 gr()
diff --git a/docs/src/cookbook/02_Linking_structural_and_evolutionary_information.jl b/docs/src/cookbook/02_Linking_structural_and_evolutionary_information.jl
index 8c178c1e..a410b657 100644
--- a/docs/src/cookbook/02_Linking_structural_and_evolutionary_information.jl
+++ b/docs/src/cookbook/02_Linking_structural_and_evolutionary_information.jl
@@ -106,7 +106,7 @@ Hx = mapcolfreq!(entropy,
 # functions from the MIToS `PDB` module:
 
 using MIToS.PDB
-res_dict = residuesdict(read_file(pdb_file, PDBFile, occupancyfilter=true), "1", "A") # model 1 chain A
+res_dict = residuesdict(read_file(pdb_file, PDBFile, occupancyfilter=true), model="1", chain="A")
 
 # Then, we can iterate the mapping dictionary to link the MSA and PDB based
 # values:
diff --git a/scripts/Distances.jl b/scripts/Distances.jl
index 05c5142b..e21bb6d3 100755
--- a/scripts/Distances.jl
+++ b/scripts/Distances.jl
@@ -86,7 +86,7 @@ set_parallel(Args["parallel"])
         model_arg = string(args["model"]) == "All" ? All : string(args["model"])
         chain_arg = string(args["chain"]) == "All" ? All : string(args["chain"])
         group_arg = string(args["group"]) == "All" ? All : string(args["group"])
-        res = residues(res, model_arg, chain_arg, group_arg, All)
+        res = select_residues(res, model=model_arg, chain=chain_arg, group=group_arg, residue=All)
         N = length(res)
         inter = !Bool(args["inter"])
         for i in 1:(N-1)
diff --git a/src/PDB/PDB.jl b/src/PDB/PDB.jl
index 58a0ed59..2404cbbf 100644
--- a/src/PDB/PDB.jl
+++ b/src/PDB/PDB.jl
@@ -41,10 +41,12 @@ export # PDBResidues
     contact,
     isresidue,
     isatom,
+    select_residues,
     residues,
     @residues,
     residuesdict,
     @residuesdict,
+    select_atoms,
     atoms,
     @atoms,
     findheavy,
diff --git a/test/PDB/Contacts.jl b/test/PDB/Contacts.jl
index 6d75dbaa..593d901f 100644
--- a/test/PDB/Contacts.jl
+++ b/test/PDB/Contacts.jl
@@ -236,7 +236,8 @@
 
         code = "2VQC"
         pdb = read_file(txt(code), PDBFile)
-        residues = @residues pdb model "1" chain "A" group "ATOM" residue x -> x in ["62","64","65"]
+        residues = select_residues(pdb, model="1", chain="A", group="ATOM",
+            residue = x -> x in ["62","64","65"])
         @test contact(residues, 6.05) == (
             [1 1 0
              1 1 1
diff --git a/test/PDB/Kabsch.jl b/test/PDB/Kabsch.jl
index 9ec176d9..f3bcf7a4 100644
--- a/test/PDB/Kabsch.jl
+++ b/test/PDB/Kabsch.jl
@@ -107,10 +107,10 @@
 
         hemoglobin = read_file(joinpath(DATA, "2hhb.pdb.gz"),PDBFile,group="ATOM",model="1")
 
-        α1 = @residues hemoglobin model "1" chain "A" group "ATOM" residue All
-        α2 = @residues hemoglobin model "1" chain "C" group "ATOM" residue All
-        β1 = @residues hemoglobin model "1" chain "B" group "ATOM" residue All
-        β2 = @residues hemoglobin model "1" chain "D" group "ATOM" residue All
+        α1 = select_residues(hemoglobin, model="1", chain="A", group="ATOM")
+        α2 = select_residues(hemoglobin, model="1", chain="C", group="ATOM")
+        β1 = select_residues(hemoglobin, model="1", chain="B", group="ATOM")
+        β2 = select_residues(hemoglobin, model="1", chain="D", group="ATOM")
 
         a1, a2, rα = superimpose(α1, α2)
 
diff --git a/test/PDB/PDB.jl b/test/PDB/PDB.jl
index 70b09cc2..b92f666d 100644
--- a/test/PDB/PDB.jl
+++ b/test/PDB/PDB.jl
@@ -59,44 +59,22 @@
 
         for residue_list in [pdb, pdbml]
             @test findall(res -> res.id.number == "15A", residue_list) == [1]
-            @test findall(res -> isresidue(res,All,All,All,"15A"), residue_list) == [1]
+            @test findall(res -> isresidue(res, residue="15A"), residue_list) == [1]
             @test findall(res -> res.id.number == "15B", residue_list) == [2]
-            @test findall(res -> isresidue(res,All,All,All,"15B"), residue_list) == [2]
+            @test findall(res -> isresidue(res, residue="15B"), residue_list) == [2]
         end
 
-        @testset "@residues" begin
-
-            @test (@residues pdb model All chain All group All residue "141") ==
-                filter(res -> isresidue(res, All, All, All,"141"), pdb)
-
-            # Testing the macro in let block:
-            mo = "1"
-            ch = "A"
-            gr = All
-            re = "141"
-            @test (@residues pdb model mo chain ch group gr residue re) ==
-                (@residues pdb model All chain All group gr residue "141")
-        end
-
-        @testset "Occupancy != 1.0 and @atom" begin
+        @testset "Occupancy != 1.0" begin
 
             @test sum(map(a -> (a.atom == "HH22" ? a.occupancy : 0.0),
                 filter(r -> r.id.number == "141", pdbml)[1].atoms)) == 1.0
-            @test sum([a.occupancy for a in @atoms pdbml model "1" chain "A" group All residue "141" atom "HH22" ] ) == 1.0
-
-            # Testing the macro in let block:
-            mo = "1"
-            ch = "A"
-            gr = All
-            re = "141"
-            at = "HH22"
-            @test sum([a.occupancy for a in @atoms pdbml model mo chain ch group gr residue re atom at ] ) == 1.0
+            @test sum([a.occupancy for a in select_atoms(pdbml, model="1", chain="A", group=All, residue="141", atom="HH22")]) == 1.0
         end
 
         @testset "Best occupancy" begin
 
-            atoms_141 = @atoms pdbml model "1" chain "A" group All residue "141" atom "HH22"
-            resid_141 = @residues pdbml model "1" chain "A" group All residue "141"
+            atoms_141 = select_atoms(pdbml, model="1", chain="A", group=All, residue="141", atom="HH22")
+            resid_141 = select_residues(pdbml, model="1", chain="A", group=All, residue="141")
 
             @test bestoccupancy(atoms_141)[1].occupancy == 0.75
             @test bestoccupancy(reverse(atoms_141))[1].occupancy == 0.75
@@ -110,10 +88,10 @@
             @test_throws AssertionError selectbestoccupancy(resid_141[1], collect(1:100))
         end
 
-        @testset "@atom with All" begin
+        @testset "select_atoms with All" begin
 
             # ATOM 2 CA ALA A 15A 22.554 11.619 6.400 1.00 6.14 C
-            @test atoms(pdb, "1", "A", "ATOM", All, r"C.+")[1].atom == "CA"
+            @test select_atoms(pdb, model="1", chain="A", group="ATOM", residue=All, atom=r"C.+")[1].atom == "CA"
         end
     end
 
@@ -124,14 +102,14 @@
         pdb = read_file(txt(code), PDBFile, occupancyfilter=true)
         pdbml = read_file(xml(code), PDBML, occupancyfilter=true)
 
-        res_pdb = @residues pdbml model "1" chain "A" group All residue "141"
-        res_pdbml = @residues pdbml model "1" chain "A" group All residue "141"
+        res_pdb = select_residues(pdb, model="1", chain="A", group=All, residue="141")
+        res_pdbml = select_residues(pdbml, model="1", chain="A", group=All, residue="141")
 
-        atm_pdbml = @atoms pdbml model "1" chain "A" group All residue "141" atom "HH22"
+        atm_pdbml = select_atoms(pdbml, model="1", chain="A", group=All, residue="141", atom="HH22")
         @test length( atm_pdbml ) == 1
         @test atm_pdbml[1].occupancy == 0.75
 
-        @test length( @atoms pdb model "1" chain "A" group All residue "141" atom "HH22" ) == 1
+        @test length(select_atoms(pdb, model="1", chain="A", group=All, residue="141", atom="HH22")) == 1
 
         @test length(res_pdb[1]) == 24
         @test length(res_pdbml[1]) == 24
@@ -151,11 +129,11 @@
         pdb = read_file(txt(code), PDBFile)
         pdbml = read_file(xml(code), PDBML)
 
-        @test length( @residues pdb model "1" chain "A" group All residue "22" ) == 2
-        @test length( @residues pdbml model "1" chain "A" group All residue "22" ) == 2
+        @test length(select_residues(pdb, model="1", chain="A", group=All, residue="22")) == 2
+        @test length(select_residues(pdbml, model="1", chain="A", group=All, residue="22")) == 2
 
-        @test [r.id.name for r in @residues pdb model "1" chain "A" group All residue "22"] == ["SER", "PRO"]
-        @test [r.id.name for r in @residues pdbml model "1" chain "A" group All residue "22"] == ["SER", "PRO"]
+        @test [r.id.name for r in select_residues(pdb, model="1", chain="A", group=All, residue="22")] == ["SER", "PRO"]
+        @test [r.id.name for r in select_residues(pdbml, model="1", chain="A", group=All, residue="22")] == ["SER", "PRO"]
     end
 
     @testset "1AS5: NMR" begin
@@ -164,10 +142,10 @@
         pdb = read_file(txt(code), PDBFile)
         pdbml = read_file(xml(code), PDBML)
 
-        @test length( @residues pdbml model "1" chain "A" group All residue All ) == 25
-        @test length( @residues pdbml model "14" chain "A" group All residue All ) == 25
+        @test length( select_residues(pdbml, model="1", chain="A") ) == 25
+        @test length( select_residues(pdbml, model="14", chain="A") ) == 25
 
-        @test length( @residues pdbml model All chain "A" group All residue All ) == 25*14
+        @test length( select_residues(pdbml, model=All, chain="A") ) == 25*14
     end
 
     @testset "1DPO: Inserted residues lack insertion letters" begin
@@ -183,14 +161,14 @@
         # But 'A':'H' chains for PDBML (label_asym_id)
         @test unique([r.id.chain for r in pdbml]) == [ string(chain) for chain in 'A':'H' ]
 
-        @test [r.id.name for r in @residues pdb model "1" chain "A" group All residue r"^184[A-Z]?$"] == ["GLY", "PHE"]
-        @test [r.id.name for r in @residues pdbml model "1" chain "A" group All residue r"^184[A-Z]?$"] == ["GLY", "PHE"]
+        @test [r.id.name for r in select_residues(pdb, model="1", chain="A", residue=r"^184[A-Z]?$")] == ["GLY", "PHE"]
+        @test [r.id.name for r in select_residues(pdbml, model="1", chain="A", residue=r"^184[A-Z]?$")] == ["GLY", "PHE"]
 
-        @test [r.id.name for r in @residues pdb model "1" chain "A" group All residue r"^188[A-Z]?$"] == ["GLY", "LYS"]
-        @test [r.id.name for r in @residues pdbml model "1" chain "A" group All residue r"^188[A-Z]*"] == ["GLY", "LYS"]
+        @test [r.id.name for r in select_residues(pdb, model="1", chain="A", residue=r"^188[A-Z]?$")] == ["GLY", "LYS"]
+        @test [r.id.name for r in select_residues(pdbml, model="1", chain="A", residue=r"^188[A-Z]*")] == ["GLY", "LYS"]
 
-        @test [r.id.name for r in @residues pdb model "1" chain "A" group All residue r"^221[A-Z]?$"] == ["ALA", "LEU"]
-        @test [r.id.name for r in @residues pdbml model "1" chain "A" group All residue r"^221[A-Z]?$"] == ["ALA", "LEU"]
+        @test [r.id.name for r in select_residues(pdb, model="1", chain="A", residue=r"^221[A-Z]?$")] == ["ALA", "LEU"]
+        @test [r.id.name for r in select_residues(pdbml, model="1", chain="A", residue=r"^221[A-Z]?$")] == ["ALA", "LEU"]
     end
 
     @testset "1IGY: Insertions" begin
@@ -202,15 +180,13 @@
         pdb = read_file(txt(code), PDBFile)
         pdbml = read_file(xml(code), PDBML)
 
-        @test [r.id.name for r in @residues pdb model "1" chain "B" group All residue r"^82[A-Z]?$" ] ==
-            ["LEU", "SER", "SER", "LEU"]
-        @test [r.id.name for r in @residues pdbml model "1" chain "B" group All residue r"^82[A-Z]?$" ] ==
-            ["LEU", "SER", "SER", "LEU"]
+        @test [r.id.name for r in select_residues(pdb, model="1", chain="B", group=All, residue=r"^82[A-Z]?$")] == ["LEU", "SER", "SER", "LEU"]
+        @test [r.id.name for r in select_residues(pdbml, model="1", chain="B", group=All, residue=r"^82[A-Z]?$")] == ["LEU", "SER", "SER", "LEU"]
 
-        @test sum([r.id.group for r in @residues pdb model "1" chain "D" group All residue All] .== "HETATM") ==
-            length(@residues pdb model "1" chain "D" group "HETATM" residue All)
-        @test sum([r.id.group for r in @residues pdb model "1" chain "D" group All residue All] .== "ATOM") ==
-            length(@residues pdb model "1" chain "D" group "ATOM" residue All)
+        @test sum([r.id.group for r in select_residues(pdb, model="1", chain="D", group=All, residue=All)] .== "HETATM") ==
+            length(select_residues(pdb, model="1", chain="D", group="HETATM", residue=All))
+        @test sum([r.id.group for r in select_residues(pdb, model="1", chain="D", group=All, residue=All)] .== "ATOM") ==
+            length(select_residues(pdb, model="1", chain="D", group="ATOM", residue=All))
     end
 
     @testset "1HAG" begin
@@ -224,10 +200,10 @@
         @test unique([res.id.chain for res in pdbml]) == ["A", "B", "C", "D", "E"]
 
         # The chain E of PDB is the chain A of PDBML
-        @test [r.id.number for r in @residues pdb model "1" chain "E" group All residue r"^1[A-Z]?$"] ==
-            [string(1, code) for code in vcat(collect('H':-1:'A'), "")]
-        @test [r.id.number for r in @residues pdbml model "1" chain "A" group All residue r"^1[A-Z]?$"] ==
-            [string(1, code) for code in vcat(collect('H':-1:'A'), "")]
+        @test [r.id.number for r in select_residues(pdb, model="1", chain="E", group=All, residue=r"^1[A-Z]?$")] ==
+            [string(1, code) for code in vcat(collect('H':-1:'A'), "")]
+        @test [r.id.number for r in select_residues(pdbml, model="1", chain="A", group=All, residue=r"^1[A-Z]?$")] ==
+            [string(1, code) for code in vcat(collect('H':-1:'A'), "")]
     end
 
     @testset "1NSA" begin
@@ -255,9 +231,8 @@
         code = "1IAO"
         pdb = read_file(txt(code), PDBFile)
         pdbml = read_file(xml(code), PDBML)
-
-        pdb_B = @residues pdb model "1" chain "B" group All residue All
-        pdbml_B = @residues pdbml model "1" chain "B" group All residue All
+        pdb_B = select_residues(pdb, model="1", chain="B")
+        pdbml_B = select_residues(pdbml, model="1", chain="B")
 
         for B in [pdb_B, pdbml_B]
             @test B[findall(r -> r.id.number == "2S", B)[1] + 1].id.number == "323P"
diff --git a/test/Pfam/Pfam.jl b/test/Pfam/Pfam.jl
index 0c36b6b0..4df45366 100644
--- a/test/Pfam/Pfam.jl
+++ b/test/Pfam/Pfam.jl
@@ -26,7 +26,7 @@ end
         pdb_file = joinpath(DATA, "2VQC.xml")
         msa = read_file(msa_file, Stockholm, generatemapping=true, useidcoordinates=true)
         cmap = msacolumn2pdbresidue(msa, "F112_SSV1/3-112", "2VQC", "A", "PF09645", sifts_file)
-        res = residuesdict(read_file(pdb_file, PDBML), "1", "A", "ATOM", All)
+        res = residuesdict(read_file(pdb_file, PDBML), model="1", chain="A", group="ATOM")
 
         # -45 20 pdb
         #.....QTLNSYKMAEIMYKILEK msa seq