Skip to content
This repository has been archived by the owner on Nov 22, 2018. It is now read-only.

Query planner should use an index if available for any conditional #184

Open
twitchyliquid64 opened this issue Jul 12, 2017 · 9 comments
Open

Comments

@twitchyliquid64
Copy link
Collaborator

./ql 'create table t(thing int, t time);'
./ql 'CREATE INDEX xt_thing ON t(t);'
./ql 'explain SELECT * FROM t WHERE thing = 3 AND t = now();'

┌Iterate all rows of table "t"
└Output field names ["thing" "t"]
┌Filter on thing == 3 && t == now()
│Possibly useful indices
│CREATE INDEX xt_thing ON t(thing);
└Output field names ["thing" "t"]

The query planner should see that there is an index on the column t and use that instead of iterating all rows of table t.

The current query planner will only use a index if it is non-composite, and where it is the first conditional in the WHERE clause.

@cznic
Copy link
Owner

cznic commented Jul 12, 2017

The current query planner will only use a index if it is non-composite, and where it is the first conditional in the WHERE clause.

IIRC, that's correct. It's definitely feasible to use multiple indices, but it's not trivial. I've never even tried to start exploring that path.

I'll provide full support to anyone willing to implement this. It would improve QL a lot.

@twitchyliquid64
Copy link
Collaborator Author

Crossing indexes is definitely non-trivial.

What I was hoping to capture in this issue was that, in worst case, at least one index should be used if possible. In this case, t has an index but it is not used. It is only used if the binOp with t is the first binop in the WHERE clause. That seems like a bug - query use should occur regardless of the location of the op in the clause.

If I get familar enough with the planner, I'll look at rewriting some sections - there are a good number of quirks in it, and its roughly the same layout as an interpreter, whose design i am somewhat familiar with.

@cznic
Copy link
Owner

cznic commented Jul 12, 2017

That seems like a bug - query use should occur regardless of the location of the op in the clause.

Definitely possible.

If I get familar enough with the planner, I'll look at rewriting some sections - there are a good number of quirks in it, and its roughly the same layout as an interpreter, whose design i am somewhat familiar with.

Sounds like a reasonable approach.

@twitchyliquid64 twitchyliquid64 self-assigned this Jul 13, 2017
@twitchyliquid64
Copy link
Collaborator Author

Current state:

  • Index planning magic starts in whereRset.planExpr(), for a binary operation it handed off to planBinOp().
  • If an indexable condition is detected, the expression node is pushed to the current query plan through the plan.filter() method, if a optimization is possible with the given node, a new plan is returned.
  • Indexes are only used if:
    • The LHS is id() and the RHS is a literal. (logic in planBinOp())
    • The WHERE conditional is a IS [NOT] NULL on an indexed column (filterIsNull())
    • The WHERE conditional is unary on a boolean, indexed column.
    • The query is composed of binary AND logic at the root, and can be re-written into pieces that meet the previous criteria. (basically unreadable code at ql.go:393)

The TL;DR of this logic is that index will only be used in a small subset of applicable cases.

Proposed Changes:

type searchBound struct {
  map[string][]columnCondition //maps column name to a set of conditions
}

type columnCondition struct {
  minValue expr
  maxValue expr
}

type whereConditional []searchBound

The proposed implementation will scan the logic in the WHERE clause, generating a distinct list of []searchBound's. A search bound can be though as a set of conditionals which are logically ORed with the other searchBound's, and are non overlapping. As search bounds are non-overlapping they can be considered to come from different sources (considering them different sources will eliminate the need for query rewriting).

Note that all WHERE clauses should be able to be simplified into this binOP OR binOP OR binOP two-layered form at runtime, so getting rid of the expr form simplifies determining bounds expressions.

whereConditional objects and columnCondition objects can be symbolically simplified prior to runtime in two cases:

  • Overlapping bounds (2 columnCondition objects folded into one)
  • Boolean algebra rules mean that the filters can be re-written into a form where less []searchBounds are needed for a given query, and/or indexes are available to support the query.

For an index to be used for a given []searchBound an index must be present which indexes all columns in the searchBound.

At runtime, the []columnCondition can be evaluated into a single min+max for each column.

Query plans can be cached in a LRU cache if the intensiveness of this algorithm is of concern. All DBMS's do this.

Thoughts?

@twitchyliquid64
Copy link
Collaborator Author

Additionally, this algorithm would go a long way to getting things ready for composite indexes in #183

@cznic
Copy link
Owner

cznic commented Jul 13, 2017

Thoughts?

SGTM.

FTR: The interval package was intended for improving the planner but I never got so far as to actually use it. Maybe you will find it usefull.

@twitchyliquid64
Copy link
Collaborator Author

SGTM.

Are you a Googler by chance? All the googlers say that. :P

I've started a new branch 'planner2' to work on this, gimme a few weeks.

@cznic
Copy link
Owner

cznic commented Jul 15, 2017

Are you a Googler by chance? All the googlers say that. :P

They do and I learned it from them. But I'm not a Googler.

I've started a new branch 'planner2' to work on this, gimme a few weeks.

Any time you need to make it happen is the right time ;-)

@twitchyliquid64
Copy link
Collaborator Author

They do and I learned it from them. But I'm not a Googler.

You've got me very curious, but I won't probe :)

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants