tweak doc

CVUA-RRW · Jun 14, 2024 · 6df91be
1 parent f09fec3

Showing 5 changed files with 48 additions and 50 deletions.
diff --git a/docs/index.md b/docs/index.md
@@ -42,7 +42,7 @@ docker pull gregdenay/taxidtools
 
 With the [NCBI's taxdump files](https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/new_taxdump/) installed locally:
 
-```python
+``` py
 >>> import taxidTools as txd
 >>> tax = txd.read_taxdump('nodes.dmp', 'rankedlineage.dmp', 'merged.dmp')
 >>> tax.getName('9606')

diff --git a/docs/recipes/verify_blast.md b/docs/recipes/verify_blast.md
@@ -7,9 +7,9 @@ are in agreement with the expected composition of the sample to calculate to per
 of a method for example.
 
 First things first, let's load the taxdump files into a Taxonomy object:
-```python
+``` py
 import taxidTools
-tax = read_taxdump("nodes.dmp", "rankedlineage.dmp", "merged.dmp")
+tax = taxidTools.read_taxdump("nodes.dmp", "rankedlineage.dmp", "merged.dmp")
 ```
 
 ## Getting a taxid for each sequence
@@ -26,7 +26,7 @@ Gallus gallus   species 0.9
 
 In order to work with these nodes later we want to create a list of Nodes from this output:
 
-```python
+``` py
 # This allows you to run the code in your interpreter
 # in practice you should parse the sintax output into a list of names
 names = ["Bos", "Gallus gallus"]
@@ -42,7 +42,7 @@ one of our sequences. BLAST can typically output taxids directly, otherwise get
 names like above. Let's say we parsed our BLAST file into a list of lists of taxids. Each element of
 the outer list is a list of hits for a single sequence:
 
-```python
+``` py
 res = [
     [9913, 9913, 72004],
     [9031, 9031]
@@ -52,9 +52,9 @@ res = [
 Ideally we would like to have a single assignment for each sequence. We can do this by assigning the last common ancestor
 of all the hits for this sequence, or by using a less stringent approach, like a majority agreement:
 
-```python
+``` py
 # Here we could also choose to use tax.lca() instead
-nodes = [tax.consensus(ids, 0.51) fir ids in res]
+nodes = [tax.consensus(ids, 0.51) for ids in res]
 ```
 
 We now have a single Node object for each sequence, neatly organized in a list!
@@ -65,7 +65,7 @@ In order to verify that our results are correct, we want to compare
 this list to a list of expected taxids, for example Bos taurus (cattle) and 
 Gallus gallus (chicken), both at the species level:
 
-```python
+``` py
 expected = [9913, 9031] 
 ```
 
@@ -79,9 +79,8 @@ expected components. The smallest distance indicates the correponding expected c
 One has to keep in mind that different branches of the taxonomy can have a wildly different number of nodes,
 so it can greatly simplify things to first normalize the taxonomy for such an approach:
 
-```python
-norm = tax.copy()
-norm.filterRanks(inplace=False)
+``` py
+norm = tax.filterRanks(inplace=False)
 
 distances = []
 for n in nodes:
@@ -95,14 +94,14 @@ index_corr = [d.index(min(d)) for d in distances]
 Now that we have a list which links each consensus to the index of its closest match in the list of 
 expected species, it is straightforward to determine the agreement rank between result and expectation:
 
-```python
-rank = []
+``` py
+ranks = []
 for i in range(len(nodes)):
-    rank.append(
+    ranks.append(
         tax.lca(
-            nodes[i].taxid,
-            expected[index_corr[i]]
-        )
+            [nodes[i].taxid,
+            expected[index_corr[i]]]
+        ).rank
     )
 ```
 
@@ -112,8 +111,8 @@ Let's say we want to determine these values at the genus resolution. The advanta
 the taxonomy earlier is that we don't need to care about the precise order of ranks in each branch,
 we can simply check whether the agreement rank is either 'genus' or 'species':
 
-```python
-[1 if r in ['genus', 'species'] else 0 for r in ranks]
+``` py
+[True if r in ['genus', 'species'] else False for r in ranks]
 ```
 
 ### Unnormalized taxonomy
@@ -122,27 +121,27 @@ Of course it is possible to follow a similar approach without normalizing the ta
 significantly more complicated. For example, checking whether the *Bos taurus* (9913) consensus (here genus) is
 under the genus level involves determining the corresponding expected node as before, with the unnormalized taxonomy.
 
-```python
+``` py
 distances = [tax.distance(9913, e) for e in expected]
-# Getting the index of the minimum distance
 index_corr = distances.index(min(distances))
 agreement = expected[index_corr]
 ```
 
 Now instead of simply checking the rank of the agreement, we will rather determine the ancestor
 node of the expected species at the required resolution:
 
-```python
-lin = txd.getLineage(agreement) 
-target = lin.filter('genus')[0]
+``` py
+lin = tax.getAncestry(agreement)
+lin.filter(['genus'])
+target = lin[0]
 ```
 
 Now the last common ancestor of our result and the corresponding expected species is either
 an ancestor of `target`, in which case the result did not reach the expected resolution,
 or its descendant or the target itself, in which case the required resolution is attained:
 
-```python
-not tax.isAncestor(target, tax.lca(agreement, 9913))
+``` py
+not tax.isAncestorOf(target.taxid, tax.lca([agreement, 9913]))
 ```
 
 Note that in the last expression above we added `not` in order to have the results in the same form 

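Review note on `docs/recipes/verify_blast.md`: the recipe relies on `tax.lca()` and `tax.consensus()` without showing what they compute. The following is only a minimal sketch of the idea on a toy parent map, assuming a simplified tree in which intermediate ranks are collapsed. `PARENT`, `lineage`, `lca` and `consensus` are hypothetical names written for illustration and are not the taxidTools implementation.

```python
from collections import Counter

# Hypothetical toy taxonomy: child taxid -> parent taxid (the root is its own parent).
# Intermediate ranks are collapsed, so this is NOT the real NCBI tree.
PARENT = {
    1: 1,         # root
    9895: 1,      # Bovidae (family)
    9903: 9895,   # Bos (genus)
    9913: 9903,   # Bos taurus
    72004: 9903,  # second Bos hit used in the recipe
    9031: 1,      # Gallus gallus, attached directly to the root for brevity
}

def lineage(taxid):
    """Return the path from taxid up to the root, taxid first."""
    path = [taxid]
    while PARENT[path[-1]] != path[-1]:
        path.append(PARENT[path[-1]])
    return path

def lca(taxids):
    """Last common ancestor: the deepest node shared by every lineage."""
    common = set(lineage(taxids[0]))
    for t in taxids[1:]:
        common &= set(lineage(t))
    # Walking up from the first taxid, the first shared node is the deepest one.
    return next(node for node in lineage(taxids[0]) if node in common)

def consensus(taxids, min_freq):
    """Move all hits up one level at a time until one node reaches min_freq."""
    current = list(taxids)
    while True:
        node, count = Counter(current).most_common(1)[0]
        if count / len(current) >= min_freq:
            return node
        # Threshold not met: consider the parents of the current nodes instead.
        current = [PARENT[t] for t in current]

hits = [9913, 9913, 72004]
majority = consensus(hits, 0.51)  # two of three hits agree on 9913
strict = consensus(hits, 0.9)     # no single hit reaches 90%, so the genus is used
```

With a threshold of 0.51 the majority hit is kept, while a stricter threshold falls back to the shared genus, which is the trade-off the recipe describes between `consensus()` and `lca()`.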
diff --git a/docs/usage/advanced.md b/docs/usage/advanced.md
@@ -22,7 +22,7 @@ for example.
 Should you want to keep a copy of the original Taxonomy (and the Nodes), you should
 make a copy:
 
-```python
+``` py
 >>> import copy
 >>> backup = tax.copy()
 ```
@@ -33,7 +33,7 @@ Alternatively you can save the Taxonomy in JSON format for a later use (see next
 
 Determining a consensus node from a bunch of taxids can be done as easily as:
 
-```python
+``` py
 >>> tax.lca(['9606', '10090']).name  # Mice and men
 'Euarchontoglires'
 ```
@@ -43,7 +43,7 @@ frequencies of a bunch of taxids. You can set a minimal frequency threshold (bet
 As soon as a single node meets this threshold, it will be returned as a consensus. If this threshold is
 not met with the given input, then the parents of the input will be considered, and so on.
 
-```python
+``` py
 >>> tax_list = ['9606']*6 + ['314146']*3 + ['4641']*8  # Mice and men and bananas
 >>> tax.consensus(tax_list, 0.51).name
 'Euarchontoglires'
@@ -55,7 +55,7 @@ not met with the given input, then the parents of the input will be considered,
 
 Distance between two nodes is straightforward to calculate:
 
-```python
+``` py
 >>> tax.distance('9606', '10090')
 18
 ```
@@ -69,7 +69,7 @@ If you don't care about part of the Taxonomy
 you can extract a subtree and/or filter the Taxonomy to keep only specific 
 ranks.
 
-```python
+``` py
 >>> tax.prune('40674') # mammals class
 >>> tax.filterRanks(['species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom'])
 >>> tax.getAncestry('9606')
@@ -85,7 +85,7 @@ to calculate internode distances or comparing Lineages. When requesting a rank
 which nodes are missing, these nodes will be replaced by a DummyNode.
 This special kind of node acts as a placeholder for non-existing nodes.
 
-```python
+``` py
 >>> tax.filterRanks(['species', 'subgenus', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom'])
 >>> tax.getAncestry('9606')
 Lineage([Node(9606), DummyNode(AAeFFWcs), Node(9605), Node(9604), Node(9443), Node(40674), 
@@ -94,7 +94,7 @@ Node(7711), Node(33208), Node(1)])
 
 Note that the above methods **mutate** the nodes:
 
-```python
+``` py
 >>> tax.getParent('9606')
 DummyNode(AAeFFWcs)
 >>> tax.getRank('AAeFFWcs')
@@ -104,7 +104,7 @@ DummyNode(AAeFFWcs)
 The formatted Linnaean taxonomy ranks can be retrieved from the utility function `linne()`
 for use in diverse methods:
 
-```python
+``` py
 >>> taxidTools.linne()
 ['species', 'genus', 'family', 'order', 'class', 'phylum', 'kingdom']
 >>> tax.filterRanks(taxidTools.linne())
@@ -118,7 +118,7 @@ As you probably already noticed, parsing the Taxonomy definition can
 take a couple of minutes. If you plan on regularly using a subset of the Taxonomy, 
 it can be beneficial to save a filtered version to a JSON file and to reload it later.
 
-```python
+``` py
 >>> tax.write("my_filtered_taxonomy.json")
 >>> new_tax = taxidTools.read_json("my_filtered_taxonomy.json")
 ```
@@ -128,7 +128,7 @@ it can be beneficial to save a filtered version to a JSON file and to reload it
 Creating a Taxonomy object can also be done without the Taxdump files.
 You can either manually create Nodes and build a Taxonomy from them:
 
-```python
+``` py
 >>> root = taxidTools.Node(taxid = 1, name = 'root', rank = 'root')
 >>> node1 = taxidTools.Node(taxid = 2, name = 'node1', rank = 'rank1', parent = root)
 >>> tax = taxidTools.Taxonomy.from_list([root, node1])
@@ -145,7 +145,7 @@ to create a parsing function to:
 Here is some boilerplate code for such a function, assuming that each node
 is defined on a single line:
 
-```python
+``` py
 def custom_parser(file):
     # Create two empty dict that will store the node
     # information and parent information respectively

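Review note on `docs/usage/advanced.md`: the page says that filtering ranks helps when calculating internode distances. Here is a small sketch of why, assuming a toy tree in which one branch has an extra subfamily level. `NODES`, `lineage`, `distance` and `filter_ranks` are hypothetical helpers, not the taxidTools API, and unlike the library this sketch does not insert placeholder nodes for missing ranks.

```python
# Hypothetical toy taxonomy: taxid -> (parent taxid, rank); the root is its own parent.
NODES = {
    "root": ("root", "root"),
    "fam": ("root", "family"),
    "subfam": ("fam", "subfamily"),  # extra level present in one branch only
    "gen_a": ("subfam", "genus"),
    "sp_a": ("gen_a", "species"),
    "gen_b": ("fam", "genus"),
    "sp_b": ("gen_b", "species"),
}

def lineage(nodes, taxid):
    """Return the path from taxid up to the root, taxid first."""
    path = [taxid]
    while nodes[path[-1]][0] != path[-1]:
        path.append(nodes[path[-1]][0])
    return path

def distance(nodes, a, b):
    """Number of edges between a and b through their last common ancestor."""
    path_a, path_b = lineage(nodes, a), lineage(nodes, b)
    shared = set(path_b)
    for steps_a, node in enumerate(path_a):
        if node in shared:
            return steps_a + path_b.index(node)
    raise ValueError("nodes do not share a root")

def filter_ranks(nodes, ranks):
    """Return a copy keeping only the listed ranks (plus the root), re-parenting children."""
    kept = {t for t, (_, rank) in nodes.items() if rank in ranks or rank == "root"}
    filtered = {}
    for t in kept:
        parent = nodes[t][0]
        # Skip removed ancestors until a kept one is found (the root is always kept).
        while parent not in kept:
            parent = nodes[parent][0]
        filtered[t] = (parent, nodes[t][1])
    return filtered

raw = distance(NODES, "sp_a", "sp_b")
norm = filter_ranks(NODES, ["species", "genus", "family"])
normalized = distance(norm, "sp_a", "sp_b")
```

Before filtering, the two species are 5 edges apart because of the extra subfamily; after filtering, both sit exactly two levels below the family and the distance is 4, so distances become comparable across branches.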
diff --git a/docs/usage/quickstart.md b/docs/usage/quickstart.md
@@ -14,13 +14,13 @@ using the Taxdump files is the easiest solution.
 
 Start by importing taxidTools:
 
-```python
+``` py
 >>> import taxidTools
 ```
 
 Then load the taxdump files that you saved and unpacked locally:
 
-```python
+``` py
 >>> tax = taxidTools.read_taxdump(
         "path/to/nodes.dmp", 
         "path/to/rankedlineage.dmp",
@@ -53,7 +53,7 @@ is refered to as root node and represents the top of the taxonomy.
 
 All these properties can be easily accessed, using the taxid number:
 
-```python
+``` py
 >>> tax.getName('9606')
 'Homo sapiens'
 >>> tax.getRank('9606')
@@ -67,7 +67,7 @@ Node(9605)
 It is also possible to retrieve the taxid number for a name. However, be careful:
 this can lead to unexpected results if the names are not unique!
 
-```python
+``` py
 >>> tax.getTaxid('Homo sapiens')
 '9606'
 >>> tax.addNode(taxidTools.Node(taxid = 0, name = 'Homo sapiens'))
@@ -81,7 +81,7 @@ Actually the Taxonomy object is just a dictionnary of Nodes.
 You can access a Node object directly by passing its taxid as a key
 to a Taxonomy object and retrieve the Node properties:
 
-```python
+``` py
 >>> hs = tax.get('9606')
 >>> hs.name
 'Homo sapiens'
@@ -100,7 +100,7 @@ Node(9605)
 It is possible to directly test the relationships between two nodes.
 Note that a Node is neither an ancestor nor a descendant of itself.
 
-```python
+``` py
 >>> tax.isDescendantOf('9606', '9605')
 True
 >>> tax.isAncestorOf('9606', '9605')
@@ -113,7 +113,7 @@ It is also possible to retrieve the whole ancestry of a given node.
 Ancestries are stored in list-like Lineage objects, whose Node indices follow
 the taxonomy order.
 
-```python
+``` py
 >>> lin = tax.getAncestry('9606')
 >>> lin[0]
 Node(9606)
@@ -123,7 +123,7 @@ Node(9606)
 
 It is possible to filter a Lineage for specific ranks:
 
-```python
+``` py
 >>> lin.filter(['genus', 'family'])
 >>> lin
 Lineage([Node(9605), Node(9604)])
@@ -132,7 +132,7 @@ Lineage([Node(9605), Node(9604)])
 This mutates the Lineage object. If you want to keep the object intact,
 you should use list comprehensions to filter specific nodes:
 
-```python
+``` py
 >>> lin = tax.getAncestry('9606')
 >>> [node for node in lin if node.rank in ['genus', 'family']]
 [Node(9605), Node(9604)]

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -46,11 +46,10 @@ theme:
     - navigation.instant
     - navigation.tracking
     - navigation.tabs
-    - navigation.tabs.sticky
-    - navigation.sections
     - navigation.top
-    - toc.integrate
-    - content.code.annotate
+    - navigation.path
+    - toc.follow
+    - content.code.copy
 
 plugins:
   - search