5.7.1: bug fixed on reported quartet score for multiind
smirarab committed Dec 18, 2019
1 parent 38fde4b commit 658ff3f
Showing 6 changed files with 53 additions and 41 deletions.
Binary file renamed Astral.5.7.0.zip → Astral.5.7.1.zip
Binary file not shown.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,6 @@
- version 5.7.1:
- Bug fix: `-t 0`, `-t 2`, `-t 4`, and `-t 8` reported wrong quartet score for multi-ind datasets. Fixed to give a warning instead of wrong score.

- version 5.7.0:
- **Important Bug Fix**: The normalized quartet score was incorrect for multi-individual gene trees with polytomies. Absolute quartet score was correct but normalizing factor was not

18 changes: 9 additions & 9 deletions README.md
@@ -77,52 +77,52 @@ INSTALLATION:
-----------
* There is no installation required to run ASTRAL.
* Download using one of two approaches:
* You simply need to download the [zip file](https://github.com/smirarab/ASTRAL/raw/master/Astral.5.7.0.zip) and extract the contents to a folder of your choice.
* You simply need to download the [zip file](https://github.com/smirarab/ASTRAL/raw/master/Astral.5.7.1.zip) and extract the contents to a folder of your choice.
* Alternatively, you can clone the [github repository](https://github.com/smirarab/ASTRAL/). You then run `make.sh` to build the project or simply uncompress the zip file that is included with the repository.
* ASTRAL is a java-based application, and should run in any environment (Windows, Linux, Mac, etc.) as long as java is installed.
Java 1.5 or later is required. We have tested ASTRAL only on Linux and MAC.
* To test your installation, go to the place where you put the uncompressed ASTRAL, and run:

``` bash
java -jar astral.5.7.0.jar -i test_data/song_primates.424.gene.tre
java -jar astral.5.7.1.jar -i test_data/song_primates.424.gene.tre
```

This should quickly finish. There are also other sample input files under `test_data/` that can be used.

* ASTRAL can be run from any directory (e.g., `/path/to/astral/`). Then, you just need to run:

``` bash
java -jar /path/to/astral/astral.5.7.0.jar
java -jar /path/to/astral/astral.5.7.1.jar
```

* Also, you can move `astral.5.7.0.jar` to any location you like and run it from there, but note that you need to move the `lib` directory with it as well.
* Also, you can move `astral.5.7.1.jar` to any location you like and run it from there, but note that you need to move the `lib` directory with it as well.


EXECUTION:
-----------
ASTRAL currently has no GUI. You need to run it through the command-line. In a terminal, go to the location where you have downloaded the software, and issue the following command:

```
java -jar astral.5.7.0.jar
java -jar astral.5.7.1.jar
```

This will give you a list of options available in ASTRAL.

To find the species tree given a set of gene trees in a file called `in.tree`, use:

```
java -jar astral.5.7.0.jar -i in.tree
java -jar astral.5.7.1.jar -i in.tree
```

The results will be outputted to the standard output. To save the results in a file use the `-o` option (**Strongly recommended**):

```
java -jar astral.5.7.0.jar -i in.tree -o out.tre
java -jar astral.5.7.1.jar -i in.tree -o out.tre
```
To save the logs (**also recommended**), run:

```
java -jar astral.5.7.0.jar -i in.tree -o out.tre 2>out.log
java -jar astral.5.7.1.jar -i in.tree -o out.tre 2>out.log
```

###### Input:
@@ -155,7 +155,7 @@ Please refer to the [tutorial](astral-tutorial.md) for all other features, inclu
For big datasets (say more than 5000 taxa), increasing the memory available to Java can result in speedups. Note that you should give Java only as much free memory as you have available on your machine. So, for example, if you have 3GB of free memory, you can invoke ASTRAL using the following command to make all the 3GB available to Java:

```
java -Xmx3000M -jar astral.5.7.0.jar -i in.tree
java -Xmx3000M -jar astral.5.7.1.jar -i in.tree
```

Acknowledgment
48 changes: 24 additions & 24 deletions astral-tutorial.md
@@ -50,7 +50,7 @@ ASTRAL currently has no GUI. You need to run it through command-line.
To see the help, issue the following command:

```
java -jar astral.5.7.0.jar
java -jar astral.5.7.1.jar
```

This will print the list of options available in ASTRAL. If no errors are printed, your ASTRAL installation is fine and you can proceed to the next sections.
@@ -60,13 +60,13 @@ This will print the list of options available in ASTRAL. If no errors are printe
We will next run ASTRAL on an input dataset. From the ASTRAL directory, run:

```
java -jar astral.5.7.0.jar -i test_data/song_mammals.424.gene.tre
java -jar astral.5.7.1.jar -i test_data/song_mammals.424.gene.tre
```

The results will be outputted to the standard output. To save the results in an output file use the `-o` option:

```
java -jar astral.5.7.0.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre
java -jar astral.5.7.1.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre
```

Here, the main input is just a file that contains all the input gene trees in Newick format. The input gene trees are treated as unrooted, whether or not they have a root. Note that the **output of ASTRAL should also be treated as an unrooted tree**.
@@ -79,7 +79,7 @@ The input gene trees can have polytomies (unresolved branches) since [version 4.
We will now run ASTRAL on a larger dataset. Run:

```
java -jar astral.5.7.0.jar -i test_data/100-simulated-boot
java -jar astral.5.7.1.jar -i test_data/100-simulated-boot
```

The input file here is a simulated dataset with 100 sequences and 100 replicates of bootstrapped gene trees for 25 loci (thus 2,500 input trees). Note that ASTRAL finishes on this dataset in a matter of seconds.
@@ -88,7 +88,7 @@ A larger real dataset from the [1kp](http://www.pnas.org/content/early/2014/10/2
424 genes from 103 species. Run:

```
java -jar astral.5.7.0.jar -i test_data/1KP-genetrees.tre -o test_data/1kp.tre 2> test_data/1kp.log
java -jar astral.5.7.1.jar -i test_data/1KP-genetrees.tre -o test_data/1kp.tre 2> test_data/1kp.log
```

This takes about a minute to run on a laptop. On this dataset, notice in the ASTRAL log information that it originally starts with 11043 clusters in its search space, and using heuristics implemented in ASTRAL-II, it increases the search space slightly to 11085 clusters. For more challenging datasets (i.e., more discordance or fewer genes) this number might increase a lot.
@@ -107,7 +107,7 @@ nw_ed 1KP-genetrees.tre 'i & b<=10' o > 1KP-genetrees-BS10.tre
To create a file `1KP-genetrees-BS10.tre` that includes the 1KP dataset with branches of 10% support or lower contracted. If you don't have newick utilities, don't worry. The contracted file is part of the ASTRAL distribution.

```
java -jar astral.5.7.0.jar -i test_data/1KP-genetrees-BS10.tre -o test_data/1kp-BS10.tre 2> test_data/1kp-bs10.log
java -jar astral.5.7.1.jar -i test_data/1KP-genetrees-BS10.tre -o test_data/1kp-BS10.tre 2> test_data/1kp-bs10.log
```

Compare the species tree generated here with that generated with the fully resolved gene trees. You can confirm that the tree topology has not changed in this case, but the branch lengths and the branch support have all changed (and both tend to increase). By comparing the log files you can also see that after contracting low support branches, the normalized quartet score increases to 0.92321 (from 0.89467 with no contraction). This is expected, as low support branches tend to increase, not decrease, discordance.
@@ -166,7 +166,7 @@ ASTRAL outputs lots of useful information to your screen ([stderr](https://en.wi
by directing your stderr to a file. Capturing the log is highly recommended. Here is how you would capture stderr:

```
java -jar astral.5.7.0.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre 2> song_mammals.log
java -jar astral.5.7.1.jar -i test_data/song_mammals.424.gene.tre -o test_data/song_mammals.tre 2> song_mammals.log
```

Here is some of the important information captured in the log:
@@ -187,7 +187,7 @@ You can use the `-q` option in ASTRAL to score an existing species tree to produ
To score a tree using ASTRAL, run:

```
java -jar astral.5.7.0.jar -q test_data/simulated_14taxon.default.tre -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_scored.tre 2> test_data/simulated_scored.log
java -jar astral.5.7.1.jar -q test_data/simulated_14taxon.default.tre -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_scored.tre 2> test_data/simulated_scored.log
```

This will score the species tree given in `test_data/simulated_14taxon.default.tre` with respect to the gene trees given in `test_data/simulated_14taxon.gene.tre`. It will output the following in the log:
@@ -237,16 +237,16 @@ Here is a description of various information that can be turned on by using `-t`
Run:

```
java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 2 -o test_data/1kp-scored-t2.tre
java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 2 -o test_data/1kp-scored-t2.tre
```
```
java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 4 -o test_data/1kp-scored-t4.tre
java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 4 -o test_data/1kp-scored-t4.tre
```
```
java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 8 -o test_data/1kp-scored-t8.tre
java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 8 -o test_data/1kp-scored-t8.tre
```
```
java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 10 -o test_data/1kp-scored-t8.tre
java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -t 10 -o test_data/1kp-scored-t8.tre
```
Read all the values given for a couple of branches and try to make sense of them.

@@ -258,11 +258,11 @@ Our calculations of the local posterior probabilities and branch lengths use a Y
Run the following two commands and compare the lengths of the longest branches:

```
java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 2 -o test_data/1kp-scored-c2.tre
java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 2 -o test_data/1kp-scored-c2.tre
```

```
java -jar astral.5.7.0.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 0.001 -o test_data/1kp-scored-cs.tre
java -jar astral.5.7.1.jar -q test_data/1kp.tre -i test_data/1KP-genetrees.tre -c 0.001 -o test_data/1kp-scored-cs.tre
```

Note that setting lambda to 0 results in reporting ML estimates of the branch lengths instead of MAP. However, for branches with no discordance, we cannot compute a branch length. For these, we currently arbitrarily set ML to 10 coalescent units (we might change this in future versions).
@@ -284,7 +284,7 @@ To start multi-locus bootstrapping using ASTRAL, you need to provide the locatio
* Now run:

```
java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre -b bs-files -o song_mammals.bootstrapped.astral.tre
java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre -b bs-files -o song_mammals.bootstrapped.astral.tre
```

This will run 100 replicates of bootstrapping in addition to one run of ASTRAL on the main trees.
@@ -318,7 +318,7 @@ By default, ASTRAL performs 100 bootstrap replicates, but the `-r` option can be
For example,

```
java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre -b bs-files -r 150 -o song_mammals.bootstrapped.150.astral.tre
java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre -b bs-files -r 150 -o song_mammals.bootstrapped.150.astral.tre
```

will do 150 replicates. Note that your input gene tree bootstrap files need to have enough bootstrap replicates for the number of replicates requested using `-r`. For example, if you have `-r 150`, each file listed in `bs-files` should contain at least 150 bootstrap replicates.
@@ -329,7 +329,7 @@ will do 150 replicates. Note that your input gene tree bootstrap files need to h
ASTRAL performs site-only resampling by default (see [Seo, 2008](http://www.ncbi.nlm.nih.gov/pubmed/18281270)). ASTRAL can also perform gene+site resampling, which can be activated with the `-g` option:

```
java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre -b bs-files -g -r 100 -o song_mammals.bootstrapped.gs.astral.tre
java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre -b bs-files -g -r 100 -o song_mammals.bootstrapped.gs.astral.tre
```

Note that when you perform gene/site resampling, you need more gene tree replicates than the number of multi-locus bootstrapping replicates you requested using `-r`. For example, if you have `-g -r 100`, you might need 150 replicates for some genes (and less than 100 replicates for other genes). This is because when genes are resampled, some genes will be sampled more often than others by chance.
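
The uneven demand described above can be illustrated with a small simulation. This is only a sketch and not ASTRAL code: the class `GeneResamplingDemand` and its methods are hypothetical, and it assumes that every time a gene is drawn in a replicate, one bootstrap tree of that gene is consumed.

```java
import java.util.Random;

// Hypothetical illustration, not part of ASTRAL.
// Assumption: each draw of a gene consumes one of that gene's bootstrap trees.
final class GeneResamplingDemand {

    /** Counts how often each gene is drawn when every replicate resamples
     *  `genes` genes with replacement. */
    static int[] drawsPerGene(int genes, int replicates, long seed) {
        Random rng = new Random(seed);
        int[] draws = new int[genes];
        for (int r = 0; r < replicates; r++) {
            for (int slot = 0; slot < genes; slot++) {
                draws[rng.nextInt(genes)]++;
            }
        }
        return draws;
    }

    /** Largest value in the array. */
    static int max(int[] values) {
        int best = Integer.MIN_VALUE;
        for (int v : values) best = Math.max(best, v);
        return best;
    }

    /** Smallest value in the array. */
    static int min(int[] values) {
        int best = Integer.MAX_VALUE;
        for (int v : values) best = Math.min(best, v);
        return best;
    }

    public static void main(String[] args) {
        // 424 genes and 100 replicates, matching the `-g -r 100` example.
        int[] draws = drawsPerGene(424, 100, 42L);
        System.out.println("fewest draws for any gene: " + min(draws));
        System.out.println("most draws for any gene:   " + max(draws));
    }
}
```

The average number of draws per gene is exactly the number of replicates, so the most-drawn gene always needs at least that many bootstrap trees, and usually more.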
@@ -338,7 +338,7 @@ Note that when you perform gene/site resampling, you need more gene tree replica
ASTRAL can also perform gene-only bootstrapping using the `--gene-only` option. This form of bootstrapping requires only one input file, which is given using `-i`. Thus, for this, you don't need to use `-b`. The following performs bootstrapping by resampling genes in the input file:

```
java -jar ../astral.5.7.0.jar -i song_mammals.424.gene.tre --gene-only -o song_mammals.bootstrapped.go.astral.tre
java -jar ../astral.5.7.1.jar -i song_mammals.424.gene.tre --gene-only -o song_mammals.bootstrapped.go.astral.tre
```


@@ -354,13 +354,13 @@ ASTRAL has an exact and a heuristic version. The heuristic version solves the op
Since the mammalian dataset we have used so far has 37 taxa, the exact version cannot run on it. However, we have created a subset of this dataset that has all 9 primates, tree shrew, rat, rabbit, horse, and the sloth (a total of 14 taxa). We can run the exact version of ASTRAL on this reduced dataset. Run:

```
java -jar astral.5.7.0.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.exact.tre -x
java -jar astral.5.7.1.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.exact.tre -x
```

Using the `-x` option results in running the exact version of the ASTRAL algorithm. This run should finish in about 30 seconds. Now, we will run ASTRAL on the same input using the default heuristic algorithm:

```
java -jar astral.5.7.0.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.default.tre
java -jar astral.5.7.1.jar -i test_data/song_primates.424.gene.tre -o test_data/song_primates.424.default.tre
```
This time, ASTRAL finished in under a second. So, is there a difference between the output of the exact and the heuristic version? Open up the two trees in your tree viewer tool and compare them. You will notice they are identical. You could also compare the scores outputted by ASTRAL and notice that they are identical.

@@ -376,13 +376,13 @@ We tried hard to find a subset of genes in the biological primates dataset where
Run

```
java -jar astral.5.7.0.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.default.tre
java -jar astral.5.7.1.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.default.tre
```

and then

```
java -jar astral.5.7.0.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.exact.tre -x
java -jar astral.5.7.1.jar -i test_data/simulated_14taxon.gene.tre -o test_data/simulated_14taxon.exact.tre -x
```

Now you see that the tree outputted by the exact version has a slightly higher score (4812=48.07% versus 4803=47.98%), and a slightly different topology compared to the heuristic version. Thus, in extreme cases (i.e., lots of ILS and/or gene tree estimation error and few available gene trees compared to the number of taxa), one could observe differences between the exact and heuristic versions. Note that how many genes should be considered few depends on the number of taxa you have, and also how much missing data there is.
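
The two percentages quoted above can be checked by hand. The sketch below is not ASTRAL code; `QuartetScoreCheck` is a hypothetical helper. It assumes the normalizing factor is the number of four-taxon subsets of the 14 taxa times the number of gene trees, and that the input has 10 fully resolved gene trees containing all taxa. The gene tree count is inferred from the reported percentages, not stated in the text.

```java
import java.util.Locale;

// Hypothetical check, not part of ASTRAL.
// Assumption: 10 complete, fully resolved gene trees on 14 taxa.
final class QuartetScoreCheck {

    /** Number of ways to choose 4 taxa out of n. */
    static long choose4(long n) {
        return n * (n - 1) * (n - 2) * (n - 3) / 24;
    }

    /** Quartet score divided by the maximum possible score. */
    static double normalized(long score, long taxa, long geneTrees) {
        return (double) score / (choose4(taxa) * geneTrees);
    }

    public static void main(String[] args) {
        // Exact version versus default heuristic version, as reported above.
        System.out.println(String.format(Locale.ROOT, "exact:     %.2f%%", 100 * normalized(4812, 14, 10)));
        System.out.println(String.format(Locale.ROOT, "heuristic: %.2f%%", 100 * normalized(4803, 14, 10)));
    }
}
```

Under these assumptions the denominator is 1001 × 10 = 10010, which reproduces both the 48.07% and the 47.98% figures.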
@@ -401,7 +401,7 @@ impact on running time.
To expand the search space, you can run:

```
java -jar astral.5.7.0.jar -i test_data/simulated_primates_5X.10.gene.tre -o test_data/simulated_primates_5X.10.species.tre -e test_data/simulated_primates_5X.10.bootstrap.gene.tre
java -jar astral.5.7.1.jar -i test_data/simulated_primates_5X.10.gene.tre -o test_data/simulated_primates_5X.10.species.tre -e test_data/simulated_primates_5X.10.bootstrap.gene.tre
```
Here, the `-e` option is used to input a set of extra trees that ASTRAL uses to expand its search space. The file provided simply has 200 bootstrap replicates for each of these 10 simulated genes.
A similar option `-f` can be used when input trees have species labels instead of gene labels (only consequential for multi-individual datasets).
@@ -415,7 +415,7 @@ Miscellaneous
For big datasets (say more than 500 taxa) increasing the memory available to java might be necessary. Note that you should never give java more memory than what you have available on your machine. So, for example, if you have 4GB of free memory, you can invoke ASTRAL using the following command to make 3GB available to java:

```
java -Xmx3000M -jar astral.5.7.0.jar -i in.tree
java -Xmx3000M -jar astral.5.7.1.jar -i in.tree
```
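
To confirm that the `-Xmx` setting took effect, the heap ceiling can be read from inside the JVM with the standard `Runtime.maxMemory()` call. The class `HeapCheck` below is a hypothetical stand-alone helper, not something shipped with ASTRAL.

```java
// Hypothetical helper, not part of ASTRAL: prints the heap ceiling of the running JVM.
final class HeapCheck {

    /** Converts a byte count to whole mebibytes (integer division). */
    static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the most memory the JVM will attempt to use for the heap.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.err.println("Max heap available to this JVM: " + toMegabytes(maxHeap) + " MB");
    }
}
```

On Java 11 or later this can be run directly as `java -Xmx3000M HeapCheck.java`. The reported value may be somewhat lower than the requested 3000 MB, depending on the JVM and garbage collector.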

### Other options
2 changes: 1 addition & 1 deletion main/phylonet/coalescent/CommandLine.java
@@ -40,7 +40,7 @@
import com.martiansoftware.jsap.stringparsers.FileStringParser;

public class CommandLine{
protected static String _versinon = "5.7.0";
protected static String _versinon = "5.7.1";

protected static SimpleJSAP jsap;

23 changes: 16 additions & 7 deletions main/phylonet/coalescent/WQInference.java
@@ -211,7 +211,7 @@ public double scoreSpeciesTreeWithGTLabels(Tree st, boolean initialize) {

Stack<STITreeCluster> stack = new Stack<STITreeCluster>();
long sum = 0l;

boolean poly = false;
for (TNode node: st.postTraverse()) {
if (node.isLeaf()) {
String nodeName = node.getName(); //GlobalMaps.TaxonNameMap.getSpeciesName(node.getName());
@@ -248,26 +248,35 @@ public double scoreSpeciesTreeWithGTLabels(Tree st, boolean initialize) {
}
System.err.println(" (polytomy)");*/
if (this.getBranchAnnotation() % 2 == 0) {
poly = true;
continue;
}
}

for (int i = 0; i < childbslist.size(); i++) {
for (int j = i+1; j < childbslist.size(); j++) {
for (int k = j+1; k < childbslist.size(); k++) {
Tripartition trip = new Tripartition(childbslist.get(i), childbslist.get(j), childbslist.get(k));
Long s = weightCalculator.getWeight(trip, null);
sum += s;
sum += weightCalculator.getWeight(
new Tripartition(childbslist.get(i), childbslist.get(j), childbslist.get(k)),
null);
}
}
}
}
}


System.err.println("Final quartet score is: " + sum/4l);
System.err.println("Final normalized quartet score is: "+ (sum/4l+0.)/this.maxpossible);
//System.out.println(st.toNewickWD());
if (poly) {
System.err.println("Final quartet score is: won't report because of the existense of polytomies and to save time. "
+ "To get the score run with -t 1 and you can score the tree below using -q. ");
System.err.println("Final normalized quartet score is: won't report because of the existense of polytomies and to save time. "
+ "To get the score run with -t 1 and you can score the tree below using -q. ");
} else {

System.err.println("Final quartet score is: " + sum/4l);
System.err.println("Final normalized quartet score is: "+ (sum/4l+0.)/this.maxpossible);
//System.out.println(st.toNewickWD());
}

if (this.getBranchAnnotation() == 0){
for (TNode n: st.postTraverse()) {
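
The control flow introduced in `WQInference.java` can be summarized in isolation. The sketch below is a simplified stand-in, not the committed code: `PolytomySkipSketch` and its methods are hypothetical, and it assumes a polytomy means a node with more than two children. The real condition is in a part of the file this diff does not show.

```java
// Simplified stand-in for the reporting logic in the hunk above; not the committed code.
// Assumption: a polytomy is a node with more than two children.
final class PolytomySkipSketch {

    /** True when the scoring loop would skip this node, which happens for
     *  polytomies when the branch annotation level is even. */
    static boolean skipsPolytomy(int childCount, int branchAnnotation) {
        return childCount > 2 && branchAnnotation % 2 == 0;
    }

    /** Mirrors the new branch: no score is printed once any polytomy was skipped. */
    static String report(long sum, long maxPossible, boolean skippedPolytomy) {
        if (skippedPolytomy) {
            return "Final quartet score is: not reported (a polytomy was skipped)";
        }
        return "Final quartet score is: " + sum / 4L + "\n"
             + "Final normalized quartet score is: " + (sum / 4L + 0.) / maxPossible;
    }

    public static void main(String[] args) {
        System.err.println(report(40L, 100L, skipsPolytomy(3, 2)));
        System.err.println(report(40L, 100L, skipsPolytomy(3, 1)));
    }
}
```

The point of the flag is that a skipped node contributes nothing to `sum`, so any score printed afterwards would be too low. Withholding the score avoids reporting a wrong number.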
