Time Complexity of the Algorithm? #250

Open
r-luo opened this issue Feb 22, 2018 · 1 comment

Comments


r-luo commented Feb 22, 2018

Hi, I'm trying out MacroBase on a dataset with ~15 million rows and ~30 columns. I selected three columns to start with, but it's taking forever to run. Is there any information about the time complexity of the algorithm?

Contributor

fabuzaid21 commented Feb 23, 2018

Hi @madcarrot, thanks for using MacroBase! Could you give us some information on:

  • the version (commit hash) of MacroBase you're running on
  • the interface you're using (the UI, the SQL shell, or the CLI runner)
  • the minimum support and minimum ratio metric you're using in your query (you may not know this if you're using the UI)
  • the number of distinct elements in those 3 columns that you selected initially
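
If it helps with the last item, one rough way to count the distinct values per column is sketched below. This is a standalone helper rather than anything shipped with MacroBase, and it assumes the data lives in a CSV file with a header row; `distinct_counts`, the file path, and the column names are all hypothetical.

```python
import csv
from collections import defaultdict


def distinct_counts(csv_path, columns):
    """Count distinct values in each of the given columns of a CSV file.

    Hypothetical helper: assumes the file has a header row that contains
    every name in `columns`.
    """
    seen = defaultdict(set)
    with open(csv_path, newline="") as f:
        # Stream the file row by row so the whole dataset is never loaded.
        for row in csv.DictReader(f):
            for col in columns:
                seen[col].add(row[col])
    return {col: len(seen[col]) for col in columns}
```

For example, `distinct_counts("data.csv", ["col_a", "col_b", "col_c"])` returns a dict mapping each column name to its number of distinct values. Memory use grows with the number of distinct values, not with the number of rows.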

MacroBase's query engine uses a variant of the Apriori algorithm. Its running time is linear in the number of rows. It can be combinatorial in the number of columns, but we prune aggressively during execution by ignoring low-frequency combinations of column values.
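
To illustrate the pruning idea, here is a minimal sketch of a generic Apriori-style search over (column, value) combinations. It is not MacroBase's actual implementation: it applies only a minimum support threshold (no ratio metric), and the names `frequent_combinations`, `min_support`, and `max_order` are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_combinations(rows, min_support, max_order=3):
    """Return {frozenset of (column, value) pairs: row count} for every
    combination found in at least a `min_support` fraction of the rows.

    Generic Apriori-style sketch; `rows` is a list of dicts with
    hashable values.
    """
    threshold = min_support * len(rows)

    # Order 1: a single pass over the rows counts each (column, value) pair.
    counts = Counter()
    for row in rows:
        for item in row.items():
            counts[frozenset([item])] += 1
    current = {combo: c for combo, c in counts.items() if c >= threshold}
    singles = set(current)
    frequent = dict(current)

    order = 1
    while current and order < max_order:
        order += 1
        counts = Counter()
        # Each order costs one more pass over the rows (linear in row count).
        for row in rows:
            # Drop low-frequency values up front; they cannot be part of
            # any frequent combination.
            items = [it for it in row.items() if frozenset([it]) in singles]
            for combo in combinations(items, order):
                # Apriori pruning: count a combination only if all of its
                # (order - 1)-sized subsets were frequent.
                if all(frozenset(sub) in current
                       for sub in combinations(combo, order - 1)):
                    counts[frozenset(combo)] += 1
        current = {combo: c for combo, c in counts.items() if c >= threshold}
        frequent.update(current)
    return frequent
```

In this sketch each order makes one pass over the rows, so the cost per order is linear in the row count. The number of candidate combinations per row depends on how many values survive the support threshold, which is why the number of distinct elements and the minimum support matter so much.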

If you only selected three columns to run on, then I'm surprised that it's taking so long to run; that's why I'm wondering if maybe it's simply an old version of the code. Let us know, and we'll figure out what's going on.


2 participants