
core/filtermaps: two dimensional log filter data structure #30370

Open
wants to merge 23 commits into base: master
Conversation

zsfelfoldi
Contributor

This PR implements a new log filter data structure that is intended to replace core/bloombits.
It can also be considered as a pilot project for my EIP-7745 proposal:
https://github.com/zsfelfoldi/EIPs/blob/new-log-filter/EIPS/eip-7745.md
Note that this PR implements the filter structure proposed in the EIP but does not touch consensus: it implements the filter maps but not the tree hash structure, and instead of adding pointers to headers and receipts it stores block-to-log-value pointers separately.
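As a rough illustration of the two-dimensional idea: every address and topic of every log becomes a "log value" with a global index; each map covers a fixed range of those indices, and a value is marked by placing a column entry into a row selected by hashing the value. The sketch below is simplified for illustration only; `valuesPerMap` and the exact row/column derivation here are assumptions, not the EIP-7745 rules.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const (
	mapHeight    = 4096  // rows per map, as in this PR
	valuesPerMap = 65536 // log value indices covered by one map (assumed here)
)

// rowIndex selects a row by hashing the log value itself
// (simplified; the real derivation also mixes in the map index).
func rowIndex(logValue []byte) uint32 {
	h := sha256.Sum256(logValue)
	return binary.LittleEndian.Uint32(h[:4]) % mapHeight
}

func main() {
	value := []byte("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2") // an address used as a log value
	valueIndex := uint64(123456789)                               // hypothetical global log value index

	mapIndex := valueIndex / valuesPerMap  // which map covers this index
	column := uint32(valueIndex % valuesPerMap) // position within that map
	fmt.Printf("map %d, row %d, column %d\n", mapIndex, rowIndex(value), column)
}
```

A search then only has to scan the one row per map that a given address or topic hashes to, instead of checking a bloom filter per block.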
Regardless of whether and when EIP-7745 gets accepted, this PR provides immediate value to Geth users interested in logs, as it should drastically speed up log search compared to bloombits, which is by now practically useless because of the overpopulated bloom filters. The EIP itself is mostly interesting for light client friendliness.

@zsfelfoldi zsfelfoldi changed the title core/filtermaps: two dimensional log filter (WIP) core/filtermaps: two dimensional log filter data structure (WIP) Aug 29, 2024
@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 2 times, most recently from 9a05680 to 9ad34e5 Compare September 15, 2024 23:43
@zsfelfoldi zsfelfoldi changed the title core/filtermaps: two dimensional log filter data structure (WIP) core/filtermaps: two dimensional log filter data structure Oct 6, 2024
@MariusVanDerWijden
Member

Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc)

@zsfelfoldi
Contributor Author

Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc)

I measured indexing and unindexing time for the entire chain history, and also saved the log line from the point where the index size was 2,350,000 blocks, which is the currently proposed default setting:

INFO [10-10|12:15:26.044] Reverse log indexing in progress         maps=51857 history=2,350,607 processed=2,350,000 remaining=18,583,940 elapsed=1h59m54.391s
INFO [10-10|20:18:56.474] Reverse log indexing finished            maps=240,264 history=20,936,958 processed=20,933,940 elapsed=10h3m24.820s
INFO [10-10|21:33:21.991] Log unindexing finished                  maps=1       history=1          removed=20,937,327 elapsed=4m1.752s

Database size growth is hard to measure exactly because of compaction (or the lack of it); after a full unindexing followed by a full re-indexing, my db grew by 57 GB, but it would probably be bigger when done on a freshly synced database. A starting point for estimation: each map consists of 4096 rows that are 64 bytes long on average, stored under consecutive keys, so the per-entry db overhead should be low. The entire history log index should therefore be about 58.6 GB plus db overhead, while the recommended 2,350,000-block (one-year) history should be about 12.7 GB plus db overhead. Also note that this PR removes the old bloombits db, which is about 5-6 GB.

The log search performance depends on what we are searching for. I chose a more difficult but fairly common scenario in which some of the search values appear very frequently while the overall pattern occurs only 40 times in the entire chain history: a WETH Transfer event, with a filter pattern of one address and 3 topics.

var options = {
   fromBlock: 19924000,
   toBlock: 20924000,
   // WETH contract address
   address: ["0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"],
   // Transfer(address,address,uint256) signature, then sender and recipient topics
   topics: [["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"],["0x000000000000000000000000b05c9b5a0ce5d2e12fbd678d7fe34bec7d14414e"],["0x000000000000000000000000f3de3c0d654fda23dad170f0f320a92172509127"]],
};
var filter = web3.eth.filter(options);
filter.get(function(error, log) {
   console.log(JSON.stringify(log));
});

I ran the test over 1M blocks, 10M blocks, and the entire chain history, both with and without the index:

Recent 1M blocks:

INFO [10-09|01:38:34.923] Performed indexed log search             begin=19,924,000 end=20,924,000 "true matches"=38 "false positives"=0 elapsed=251.910ms
INFO [10-10|09:56:09.803] Performed unindexed log search           begin=19,924,000 end=20,924,000 matches=38 elapsed=1m8.051s

Recent 10M blocks:

INFO [10-09|01:38:55.740] Performed indexed log search             begin=10,924,000 end=20,924,000 "true matches"=38 "false positives"=0 elapsed=977.532ms
INFO [10-10|10:09:02.963] Performed unindexed log search           begin=10,924,000 end=20,924,000 matches=38 elapsed=4m57.828s

Entire history:

INFO [10-10|21:28:40.395] Performed indexed log search             begin=0 end=20,937,304 "true matches"=40 "false positives"=0 elapsed=1.901s
INFO [10-10|21:45:03.971] Performed unindexed log search           begin=0 end=20,937,354 matches=40 elapsed=6m16.716s
