@dongyoungy Where is the current algorithm (the path)? I have some ideas, but the problem is not clear enough to me yet. Is the slowness due to I/O time, or to the algorithm itself? I want to study the current algorithm first.
The logic to create the different types of samples (i.e., uniform, stratified, universe) is implemented in the CreateSampleQuery class (it should at least give you a starting point; you may need to look at other classes from there), which is located at /core/src/main/java/edu/umich/verdict/query/CreateSampleQuery.java
Possible improvements include 1) removing the step that generates a temp table for counting the # of groups; and/or 2) revising the sample generation query itself for better performance.
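As a rough illustration of improvement 1), the group counts could be computed in an inline subquery and joined back to the base table instead of being materialized in a temp table. This is only a sketch and not what CreateSampleQuery currently does: the class name `StratifiedSampleSql`, the method `build`, and the aliases `t`, `s`, `g`, and `group_size` are all hypothetical, and it assumes the target engine provides a `rand()` function returning a uniform value in [0, 1).

```java
/** Hypothetical helper; not part of the existing code base. */
final class StratifiedSampleSql {

    /**
     * Builds one SELECT that keeps each row with probability
     * rowsPerGroup / (size of the row's group), capped implicitly at 1.
     * The group sizes come from an inline subquery, so no temp table is created.
     * Assumption: the engine has rand() returning a uniform value in [0, 1).
     * Note: rows whose group column is NULL are dropped by the equality join.
     */
    static String build(String table, String groupCol, int rowsPerGroup) {
        return "SELECT t.* FROM " + table + " t"
            + " JOIN (SELECT " + groupCol + " AS g, COUNT(*) AS group_size"
            + " FROM " + table + " GROUP BY " + groupCol + ") s"
            + " ON t." + groupCol + " = s.g"
            // Multiply by 1.0 so the ratio is not truncated by integer division.
            + " WHERE rand() < " + rowsPerGroup + " * 1.0 / s.group_size";
    }
}
```

For example, `StratifiedSampleSql.build("orders", "city", 100)` yields a single statement. The engine still has to read the table twice (once for the counts, once for the sampling), so whether this is actually faster than the temp table would need to be measured.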
To implement 1), I guess we could:
1. Store the group sizes in memory (will the size of the table be a problem, or will the data transfer be slow?)
2. Use nested queries (I wonder whether the 'group by' used in the 'count' can still be used in the later stratifying step)
Has either of these two approaches already been tried and found not to work, so that I should skip it? Are there any other thoughts about what to try?
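To make the first option concrete, here is a minimal in-memory sketch of the idea, assuming the rows are available as a `List<String[]>` (in practice they would come from the database). The class `InMemoryStratifiedSampler` and its methods are hypothetical names used only for illustration. Note that only one counter per distinct group is held in memory, not the table itself, so the memory cost depends on the number of groups rather than the number of rows.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration; not part of the existing code base. */
final class InMemoryStratifiedSampler {

    /** Pass 1: count the rows per group key (one counter per distinct group). */
    static Map<String, Long> countGroups(List<String[]> rows, int groupCol) {
        Map<String, Long> sizes = new HashMap<>();
        for (String[] row : rows) {
            sizes.merge(row[groupCol], 1L, Long::sum);
        }
        return sizes;
    }

    /**
     * Pass 2: keep each row with probability min(1, rowsPerGroup / groupSize),
     * so small groups are kept whole and large groups are thinned out.
     */
    static List<String[]> sample(List<String[]> rows, int groupCol,
                                 int rowsPerGroup, long seed) {
        Map<String, Long> sizes = countGroups(rows, groupCol);
        Random rng = new Random(seed);
        List<String[]> out = new ArrayList<>();
        for (String[] row : rows) {
            double p = Math.min(1.0, (double) rowsPerGroup / sizes.get(row[groupCol]));
            if (rng.nextDouble() < p) {
                out.add(row);
            }
        }
        return out;
    }
}
```

This still makes two passes over the data, but it skips the temp table. Each large group ends up with about `rowsPerGroup` rows in expectation rather than exactly that many, which may or may not be acceptable for the sample's intended use.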
Currently, generating a stratified sample requires multiple passes and takes significantly longer than generating other types of samples.
It would be desirable to streamline the current stratified sample generation procedure so that samples are generated faster.