select_atoms, select_residues, isresidue, residuesdict
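
The keyword-based selection this commit moves to can be sketched in a few lines. This is an illustration only, not the MIToS implementation: `AllSelector`, `ALL`, `ResidueID`, `matches` and `pick` are hypothetical names, standing in for `All`, the residue identifiers and `select_residues`. It assumes, as the updated docs state, that each keyword accepts a string, a regular expression or a function, and defaults to selecting everything.

```julia
# Hypothetical stand-ins; these are NOT the MIToS types or its implementation.
struct AllSelector end
const ALL = AllSelector()

struct ResidueID
    model::String
    chain::String
    group::String
    number::String
end

# One matching rule per kind of selector.
matches(::AllSelector, value::AbstractString) = true
matches(sel::AbstractString, value::AbstractString) = sel == value
matches(sel::Regex, value::AbstractString) = occursin(sel, value)
matches(sel::Function, value::AbstractString) = sel(value)

# Every keyword defaults to ALL, so callers name only the fields they restrict.
function pick(ids; model=ALL, chain=ALL, group=ALL, residue=ALL)
    filter(ids) do id
        matches(model, id.model) && matches(chain, id.chain) &&
            matches(group, id.group) && matches(residue, id.number)
    end
end

ids = [
    ResidueID("1", "A", "ATOM", "9"),
    ResidueID("1", "B", "ATOM", "9"),
    ResidueID("1", "B", "HETATM", "9"),
    ResidueID("1", "B", "ATOM", "10"),
]

# Omitting `group` searches both the ATOM and HETATM groups.
both_groups = pick(ids, model="1", chain="B", residue="9")

# Naming `group` narrows the selection.
atom_only = pick(ids, model="1", chain="B", group="ATOM", residue="9")

# Functions and regular expressions work for any field.
by_function = pick(ids, chain=(ch -> ch in ["A", "B"]), residue=r"^9$")
```

With positional arguments every field had to be spelled out, including `All` placeholders. With keywords an omitted field simply means "no restriction", which is why the migrated calls in this diff can drop `residue=All`.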

diegozea · Jun 25, 2024 · 151548c
1 parent fda198d
commit 151548c
Showing 10 changed files with 88 additions and 115 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -1,5 +1,25 @@
 ## MIToS.jl Release Notes
 
+### Changes from v2.19.0 to v2.20.0
+
+* *[Breaking change]* The PDB module has deprecated `residues` and `@residues` in favor of
+  the `select_residues` function that uses keyword arguments. 
+  So, `residues(pdb, "1", "A", "ATOM", All)` or `@residues pdb "1" "A" "ATOM" All` should be
+  replaced by `select_residues(pdb, model="1", chain="A", group="ATOM")`.
+
+* *[Breaking change]* The PDB module has deprecated `atoms` and `@atoms` in favor of
+  the `select_atoms` function that uses keyword arguments. 
+  So, `atoms(pdb, "1", "A", "ATOM", All, "CA")` or `@atoms pdb "1" "A" "ATOM" All "CA"` should be
+  replaced by `select_atoms(pdb, model="1", chain="A", group="ATOM", atom="CA")`.
+
+* *[Breaking change]* The PDB module has deprecated the methods of the `isresidue` and 
+  `residuesdict` functions that rely on positional arguments in favor of keyword arguments.
+  So, `isresidue(pdb, "1", "A", "ATOM", "10")` should be replaced by 
+  `isresidue(pdb, model="1", chain="A", group="ATOM", residue="10")`. Similarly,
+  `residuesdict(pdb, "1", "A", "ATOM", All)` should be replaced by 
+  `residuesdict(pdb, model="1", chain="A", group="ATOM")`.
+
+
 ### Changes from v2.18.0 to v2.19.0
 
 * *[Breaking change]* The `shuffle` and `shuffle!` functions are deprecated in favor of the 

diff --git a/Project.toml b/Project.toml
@@ -1,6 +1,6 @@
 name = "MIToS"
 uuid = "51bafb47-8a16-5ded-8b04-24ef4eede0b5"
-version = "2.19.0"
+version = "2.20.0"
 
 [deps]
 ArgParse = "c7e460c6-2fb9-53a9-8c5b-16f535851c63"

diff --git a/docs/src/PDB.md b/docs/src/PDB.md
@@ -81,42 +81,35 @@ CA_1ivo[1] # First residue. It has only the α carbon.
 MIToS parse PDB files to vector of residues, instead of using a hierarchical structure
 like other packages. This approach makes the search and selection of residues or atoms a
 little different.
-To make it easy, this module exports a number of functions and macros to select particular
-residues or atoms. Given the fact that residue numbers from different chains, models, etc.
-can collide, **it's mandatory to indicate the `model`, `chain`, `group`, `residue` number
-and `atom` name in a explicit way** to these functions or macros. If you want to select all
-the residues in one of the categories, you are able to use the type `All`. You can also use
-regular expressions or functions to make the selections.
+To make it easy, this module exports the `select_residues` and `select_atoms` functions.
+Because residue numbers from different chains, models, etc. can collide, you can
+indicate the `model`, `chain`, `group`, `residue` number and `atom` name using the
+keyword arguments of those functions. To select everything in one of those categories,
+use the type `All` (the default value of these arguments). You can also use regular
+expressions or functions to make the selections.
 
 ```@example pdb_select
 using MIToS.PDB
 pdbfile = downloadpdb("1IVO", format=PDBFile)
 residues_1ivo = read_file(pdbfile, PDBFile)
-# Select residue number 9 from model 1 and chain B
-residues(residues_1ivo, "1", "B", All, "9")
+# Select residue number 9 from model 1 and chain B (it looks in both ATOM and HETATM groups)
+select_residues(residues_1ivo, model="1", chain="B", residue="9")
 ```
 
 ### Getting a `Dict` of `PDBResidue`s
 
 If you prefer a `Dict` of `PDBResidue`, indexed by their residue numbers, you can use the
-`residuedict` function or the `@residuedict` macro.  
+`residuesdict` function.  
 
 ```@example pdb_select
 # Dict of residues from the model 1, chain A and from the ATOM group
-chain_a = residuesdict(residues_1ivo, "1", "A", "ATOM", All)
+chain_a = residuesdict(residues_1ivo, model="1", chain="A", group="ATOM")
 chain_a["9"]
-```  
-
-You can do the same with the macro `@residuesdict` to get a more readable code  
-
-```@example pdb_select
-chain_a = @residuesdict residues_1ivo model "1" chain "A" group "ATOM" residue All
-chain_a["9"]
-```  
+```
 
 ### Select particular residues  
 
-Use the `residues` function to collect specific residues. It's possible to use a single
+Use the `select_residues` function to collect specific residues. It's possible to use a single
 **residue number** (i.e. `"2"`) or even a **function** which should return true for the
 selected residue numbers. Also **regular expressions** can be used to select residues.
 Use `All` to select all the residues.  
@@ -130,7 +123,7 @@ residue_list = map(string, 2:5)
 ```
 
 ```@example pdb_select
-first_res = residues(residues_1ivo, "1", "A", "ATOM", resnum -> resnum in residue_list)
+first_res = select_residues(residues_1ivo, model="1", chain="A", group="ATOM", residue=resnum -> resnum in residue_list)
 
 for res in first_res
     println(res.id.name, " ", res.id.number)
@@ -142,7 +135,7 @@ A more complex example using an anonymous function:
 ```@example pdb_select
 # Select all the residues of the model 1, chain A of the ATOM group with residue number less than 5
 
-first_res = residues(residues_1ivo, "1", "A", "ATOM", x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 )
+first_res = select_residues(residues_1ivo, model="1", chain="A", group="ATOM", residue=x -> parse(Int, match(r"^(\d+)", x)[1]) <= 5 )
 # The anonymous function takes the residue number (string) and use a regular expression
 # to extract the number (without insertion code).
 # It converts the number to `Int` to test if the it is `<= 5`.
@@ -152,35 +145,17 @@ for res in first_res
 end
 ```
 
-Use the `@residues` macro for a cleaner syntax.  
-
-```@example pdb_select
-# You can use All, regular expressions or functions also for model, chain and group:
-
-# i.e. Takes the residue 10 from chains A and B
-
-for res in @residues residues_1ivo model "1" chain ch -> ch in ["A","B"] group "ATOM" residue "10"
-    println(res.id.chain, " ", res.id.name, " ", res.id.number)
-end
-```
-
 ### Select particular atoms
 
-The `atoms` function or macro allow to select a particular set of atoms.
+The `select_atoms` function allows you to select a particular set of atoms.
 
 ```@example pdb_select
 # Select all the atoms with name starting with "C" using a regular expression
 # from all the residues of the model 1, chain A of the ATOM group
 
-carbons = @atoms residues_1ivo model "1" chain "A" group "ATOM" residue All atom r"C.+"
+carbons = select_atoms(residues_1ivo, model="1", chain="A", group="ATOM", residue=All, atom=r"C.+")
 
 carbons[1]
-```  
-
-You can also use the `atoms` function instead of the `@atoms` macro:  
-
-```@example pdb_select
-atoms(residues_1ivo, "1", "A", "ATOM", All, r"C.+")[1]
 ```
 
 ## Protein contact map
@@ -202,7 +177,7 @@ pdbfile = downloadpdb("1IVO", format=PDBFile)
 
 residues_1ivo = read_file(pdbfile, PDBFile)
 
-pdb = @residues residues_1ivo model "1" chain "A" group "ATOM" residue All
+pdb = select_residues(residues_1ivo, model="1", chain="A", group="ATOM")
 
 dmap = distance(pdb, criteria="All") # Minimum distance between residues using all their atoms
 ```
@@ -256,8 +231,8 @@ pdbfile = downloadpdb("2HHB")
 
 res_2hhb = read_file(pdbfile, PDBML)
 
-chain_A = pdb = @residues res_2hhb model "1" chain "A" group "ATOM" residue All
-chain_C = pdb = @residues res_2hhb model "1" chain "C" group "ATOM" residue All
+chain_A = select_residues(res_2hhb, model="1", chain="A", group="ATOM", residue=All)
+chain_C = select_residues(res_2hhb, model="1", chain="C", group="ATOM", residue=All)
 
 using Plots
 gr()

diff --git a/docs/src/cookbook/02_Linking_structural_and_evolutionary_information.jl b/docs/src/cookbook/02_Linking_structural_and_evolutionary_information.jl
@@ -106,7 +106,7 @@ Hx = mapcolfreq!(entropy,
 # functions from the MIToS `PDB` module:
 
 using MIToS.PDB
-res_dict = residuesdict(read_file(pdb_file, PDBFile, occupancyfilter=true), "1", "A") # model 1 chain A
+res_dict = residuesdict(read_file(pdb_file, PDBFile, occupancyfilter=true), model="1", chain="A")
 
 # Then, we can iterate the mapping dictionary to link the MSA and PDB based
 # values:

diff --git a/scripts/Distances.jl b/scripts/Distances.jl
@@ -86,7 +86,7 @@ set_parallel(Args["parallel"])
         model_arg = string(args["model"]) == "All" ? All : string(args["model"])
         chain_arg = string(args["chain"]) == "All" ? All : string(args["chain"])
         group_arg = string(args["group"]) == "All" ? All : string(args["group"])
-        res = residues(res, model_arg, chain_arg, group_arg, All)
+        res = select_residues(res, model=model_arg, chain=chain_arg, group=group_arg, residue=All)
         N = length(res)
         inter = !Bool(args["inter"])
         for i in 1:(N-1)

diff --git a/src/PDB/PDB.jl b/src/PDB/PDB.jl
@@ -41,10 +41,12 @@ export  # PDBResidues
         contact,
         isresidue,
         isatom,
+        select_residues,
         residues,
         @residues,
         residuesdict,
         @residuesdict,
+        select_atoms,
         atoms,
         @atoms,
         findheavy,

diff --git a/test/PDB/Contacts.jl b/test/PDB/Contacts.jl
@@ -236,7 +236,8 @@
 
         code = "2VQC"
         pdb = read_file(txt(code), PDBFile)
-        residues = @residues pdb model "1" chain "A" group "ATOM" residue x -> x in ["62","64","65"]
+        residues = select_residues(pdb, model="1", chain="A", group="ATOM", 
+            residue = x -> x in ["62","64","65"])
 
         @test contact(residues, 6.05) == ( [1 1 0
                                             1 1 1

diff --git a/test/PDB/Kabsch.jl b/test/PDB/Kabsch.jl
@@ -107,10 +107,10 @@
 
         hemoglobin = read_file(joinpath(DATA, "2hhb.pdb.gz"),PDBFile,group="ATOM",model="1")
 
-        α1 = @residues hemoglobin model "1" chain "A" group "ATOM" residue All
-        α2 = @residues hemoglobin model "1" chain "C" group "ATOM" residue All
-        β1 = @residues hemoglobin model "1" chain "B" group "ATOM" residue All
-        β2 = @residues hemoglobin model "1" chain "D" group "ATOM" residue All
+        α1 = select_residues(hemoglobin, model="1", chain="A", group="ATOM")
+        α2 = select_residues(hemoglobin, model="1", chain="C", group="ATOM")
+        β1 = select_residues(hemoglobin, model="1", chain="B", group="ATOM")
+        β2 = select_residues(hemoglobin, model="1", chain="D", group="ATOM")
 
         a1, a2, rα = superimpose(α1, α2)