Time Complexity of the Algorithm? #250

r-luo · 2018-02-22T21:03:40Z

Hi, I'm trying out macrobase on a dataset with ~15 million rows and ~30 columns. I selected three columns to try out first but it's taking forever to run. So I'm wondering if there's any information about time complexity for the algorithm.

fabuzaid21 · 2018-02-23T16:14:22Z

Hi @madcarrot, thanks for using MacroBase! Could you give us some information on:

the version (commit hash) of MacroBase you're running on
the interface you're using (the UI, the SQL shell, or the CLI runner)
the minimum support and minimum ratio metric you're using in your query (you may not know this if you're using the UI)
the number of distinct elements in those 3 columns that you selected initially

MacroBase uses a variant of the APriori algorithm in its query engine. The time complexity is linear in the number of rows; while it can be combinatorial in the number of columns, we do a lot of pruning by ignoring low-frequency combinations of column values during execution of the algorithm.

If you only selected three columns to run on, then I'm surprised that it's taking so long to run; that's why I'm wondering if maybe it's simply an old version of the code. Let us know, and we'll figure out what's going on.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Time Complexity of the Algorithm? #250

Time Complexity of the Algorithm? #250

r-luo commented Feb 22, 2018

fabuzaid21 commented Feb 23, 2018 •

edited

Loading

Time Complexity of the Algorithm? #250

Time Complexity of the Algorithm? #250

Comments

r-luo commented Feb 22, 2018

fabuzaid21 commented Feb 23, 2018 • edited Loading

fabuzaid21 commented Feb 23, 2018 •

edited

Loading