Asteroid on non-randomly missing data #5
Hi Léo-Paul,
That's a very good question. Although we don't specifically address this
case in our paper, it is actually a good example of a case where Asteroid
should work quite well compared with tools that suffer from systematic
biases, such as Astrid.
I would expect that as missing data gets more systematic, tools such as
Astrid get even worse, but Asteroid should not be affected much: the lack
of data (= less information) will always be a problem for any tool, but
there should not be any systematic bias due to missing data with Asteroid.
Have you tried running it on such a dataset? I am always happy to know if
our approaches work (or not :)) well on empirical datasets.
I hope this helps,
Benoit
On Tue, 13 June 2023 at 23:36, Léo-Paul Dagallier ***@***.***>
wrote:
Hi Benoit,
Thanks for Asteroid, it looks like a very promising tool!
This is not an issue with the program, but more of a question.
From what I understand of the paper, Asteroid performs well with a high
proportion of data that is missing because of a stochastic process of data
deletion (in the case of simulated datasets) or data absence (in the case
of empirical datasets).
Do you have any idea how Asteroid performs when data is
non-randomly missing?
For example, when a dataset combines a few species represented by
a lot of genes (e.g. a phylogenomic dataset) with a lot of species
represented by a few genes (e.g. Sanger sequencing/barcode data) (see e.g.
https://doi.org/10.1093/molbev/msad109).
Did you try to simulate missing data in a non-random manner?
I'm curious to know whether Asteroid would perform similarly well with
high levels of non-random missing data.
Thanks,
Léo-Paul
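To make the two scenarios discussed in this thread concrete, here is a minimal Python sketch of how one might build species-by-gene presence masks for random versus non-random missing data. It is not taken from Asteroid or its paper: the function names, matrix sizes, and the "rich species plus barcode genes" layout are all assumptions chosen for illustration. Such masks could then be used to prune taxa from simulated gene trees before running any species-tree tool.

```python
import random

def random_missing(n_species, n_genes, p_missing, rng):
    """Hypothetical helper: drop each (species, gene) entry independently
    with probability p_missing (stochastic missing data)."""
    return [[rng.random() >= p_missing for _ in range(n_genes)]
            for _ in range(n_species)]

def structured_missing(n_species, n_genes, n_rich, n_barcode_genes, rng):
    """Hypothetical helper: the first n_rich species keep every gene
    (phylogenomic-like sampling); all other species keep only a small,
    shared set of 'barcode' genes (Sanger-like sampling)."""
    barcode = set(rng.sample(range(n_genes), n_barcode_genes))
    matrix = []
    for s in range(n_species):
        if s < n_rich:
            matrix.append([True] * n_genes)
        else:
            matrix.append([g in barcode for g in range(n_genes)])
    return matrix

def missing_fraction(matrix):
    """Fraction of (species, gene) entries that are absent."""
    total = sum(len(row) for row in matrix)
    present = sum(sum(row) for row in matrix)
    return 1 - present / total

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Non-random pattern: 5 gene-rich species, 45 species with 4 barcode genes.
structured = structured_missing(50, 200, n_rich=5, n_barcode_genes=4, rng=rng)

# Random pattern matched to the same overall proportion of missing data,
# so the two masks differ only in how the gaps are distributed.
p = missing_fraction(structured)
rand = random_missing(50, 200, p, rng)

print(f"structured missing fraction: {p:.3f}")
print(f"random missing fraction:     {missing_fraction(rand):.3f}")
```

Matching the overall missing fraction between the two masks is a design choice: it separates the effect of the amount of missing data from the effect of its distribution, which is the distinction raised in the question.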