This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

New index driver based on https://github.com/RoaringBitmap/roaring #645

Open
kuba-- opened this issue Mar 26, 2019 · 6 comments
Labels: proposal (proposal for new additions or changes), triage/product-input-needed

Comments

kuba-- (Contributor) commented Mar 26, 2019

Pilosa uses https://github.com/RoaringBitmap/roaring to implement its bitmap index, but in our case Pilosa brings a lot of overhead (we've already removed the server part).
Moreover, the many syscalls in the Pilosa implementation have caused portability problems, e.g. with mounted volumes in Docker.
Last but not least, Pilosa comes with a deep directory hierarchy,
/index/field/view/fragment/storage,cache, all of which has to be opened, closed, and synced.
Maybe we can go down to a lower level and implement our own bitmaps using https://github.com/RoaringBitmap/roaring.
We don't use many Pilosa features (they are mainly server-oriented).
If we call roaring directly, we could get better performance, control parallel index creation, and make all operations (And, Or, ...) parallel as well, which Pilosa doesn't give us (e.g. roaring.ParAnd(nworkers, bmp1, bmp2)).
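To make the idea concrete, here is a minimal sketch of a bitmap index with a parallel AND. To keep it self-contained it uses a plain, uncompressed bitset instead of roaring; `Bitmap`, `parAnd`, and the two-column example are hypothetical names for illustration only. With the real library, the intersection step would be the `roaring.ParAnd(nworkers, bmp1, bmp2)` call named in the proposal.

```go
package main

import (
	"fmt"
	"math/bits"
	"sync"
)

// Bitmap is a plain, uncompressed bitset of row IDs. It is a hypothetical
// stand-in for roaring.Bitmap, which additionally compresses its chunks.
type Bitmap struct {
	words []uint64
}

// Add sets the bit for row id, growing the word slice as needed.
func (b *Bitmap) Add(id uint32) {
	w := int(id / 64)
	for len(b.words) <= w {
		b.words = append(b.words, 0)
	}
	b.words[w] |= uint64(1) << (id % 64)
}

// ToArray returns the set row IDs in ascending order.
func (b *Bitmap) ToArray() []uint32 {
	var out []uint32
	for i, w := range b.words {
		for w != 0 {
			t := bits.TrailingZeros64(w)
			out = append(out, uint32(i*64+t))
			w &= w - 1 // clear the lowest set bit
		}
	}
	return out
}

// parAnd intersects a and b with up to nworkers goroutines, each handling a
// contiguous range of words. Workers write disjoint indices, so no locking
// is needed beyond the WaitGroup.
func parAnd(nworkers int, a, b *Bitmap) *Bitmap {
	n := len(a.words)
	if len(b.words) < n {
		n = len(b.words) // bits past the shorter bitmap are zero in the result
	}
	out := &Bitmap{words: make([]uint64, n)}
	if nworkers < 1 {
		nworkers = 1
	}
	chunk := (n + nworkers - 1) / nworkers
	var wg sync.WaitGroup
	for lo := 0; lo < n; lo += chunk {
		hi := lo + chunk
		if hi > n {
			hi = n
		}
		wg.Add(1)
		go func(lo, hi int) {
			defer wg.Done()
			for i := lo; i < hi; i++ {
				out.words[i] = a.words[i] & b.words[i]
			}
		}(lo, hi)
	}
	wg.Wait()
	return out
}

func main() {
	// Hypothetical index over two columns: value -> bitmap of row IDs.
	rows := []struct{ lang, repo string }{
		{"Go", "a"}, {"Python", "a"}, {"Go", "b"}, {"Go", "a"},
	}
	byLang := map[string]*Bitmap{}
	byRepo := map[string]*Bitmap{}
	for id, r := range rows {
		if byLang[r.lang] == nil {
			byLang[r.lang] = &Bitmap{}
		}
		if byRepo[r.repo] == nil {
			byRepo[r.repo] = &Bitmap{}
		}
		byLang[r.lang].Add(uint32(id))
		byRepo[r.repo].Add(uint32(id))
	}
	// lang = 'Go' AND repo = 'a'
	res := parAnd(4, byLang["Go"], byRepo["a"])
	fmt.Println(res.ToArray()) // [0 3]
}
```

Splitting by word range works because AND only combines words at the same position. Roaring additionally stores each chunk in a different container type (array, bitmap, or run) depending on density, which this sketch does not attempt.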

kuba-- added the proposal label Mar 26, 2019
kuba-- changed the title from "Index driver based on https://github.com/RoaringBitmap/roaring" to "New index driver based on https://github.com/RoaringBitmap/roaring" Mar 26, 2019
ajnavarro (Contributor) commented:

@smola WDYT? I'm totally in favor of simplifying our bitmap index implementation.

smola (Collaborator) commented Apr 3, 2019

@ajnavarro I'm all for it, but let's settle the priority of index improvements first. Do we have a size estimate for this task (e.g. 1, 2, 4, or 8 weeks)?

kuba-- (Contributor, Author) commented Apr 3, 2019

I would say 2 weeks. But since I always multiply by 1.4 (my error factor), I would say 2.8 ;)

erizocosmico (Contributor) commented:

Do we still want to do this given how little gitbase indexes have been used?

ajnavarro (Contributor) commented:

We can leave it open for anyone who is interested in contributing.

kuba-- (Contributor, Author) commented Oct 9, 2019

Personally, I would rephrase the issue: originally the idea was to get rid of all the server leftovers in Pilosa and use the underlying bitmap implementation directly.
But over time, we haven't benefited from the fast merging that bitmaps give us as much as we could have. Moreover, the index mappings consume a lot of space (and use BoltDB B-trees under the hood).
To recap, it could be even better to replace the bitmap indexes with B-trees (as TiDB did).


4 participants