
core/filtermaps: two dimensional log filter data structure #30370

Open
wants to merge 23 commits into base: master
Conversation

zsfelfoldi
Contributor

This PR implements a new log filter data structure that is intended to replace core/bloombits.
It can also be considered as a pilot project for my EIP-7745 proposal:
https://github.com/zsfelfoldi/EIPs/blob/new-log-filter/EIPS/eip-7745.md
Note that this PR implements the filter structure proposed in the EIP but does not touch consensus: it implements the filter maps but not the tree hash structure, and instead of adding pointers to headers and receipts it stores block-to-log-value pointers separately.
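As a rough illustration of the two-dimensional idea: every address and topic of every log becomes a "log value" with a global index; each map covers a fixed range of those indices, and a value is marked by placing a column entry into a row selected by hashing the value. The sketch below is simplified for illustration only; `valuesPerMap` and the exact row/column derivation here are assumptions, not the EIP-7745 rules.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const (
	mapHeight    = 4096  // rows per map, as in this PR
	valuesPerMap = 65536 // log value indices covered by one map (assumed here)
)

// rowIndex selects a row by hashing the log value itself
// (simplified; the real derivation also mixes in the map index).
func rowIndex(logValue []byte) uint32 {
	h := sha256.Sum256(logValue)
	return binary.LittleEndian.Uint32(h[:4]) % mapHeight
}

func main() {
	value := []byte("0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2") // an address used as a log value
	valueIndex := uint64(123456789)                               // hypothetical global log value index

	mapIndex := valueIndex / valuesPerMap  // which map covers this index
	column := uint32(valueIndex % valuesPerMap) // position within that map
	fmt.Printf("map %d, row %d, column %d\n", mapIndex, rowIndex(value), column)
}
```

A search then only has to scan the one row per map that a given address or topic hashes to, instead of checking a bloom filter per block.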
Regardless of whether and when EIP-7745 gets accepted, this PR provides immediate value to Geth users interested in logs, as it should drastically speed up log search compared to bloombits, which is by now practically useless because of the overpopulated bloom filters. The EIP itself is mostly interesting for light client friendliness.

@zsfelfoldi zsfelfoldi changed the title core/filtermaps: two dimensional log filter (WIP) core/filtermaps: two dimensional log filter data structure (WIP) Aug 29, 2024
@zsfelfoldi zsfelfoldi force-pushed the log-filter branch 2 times, most recently from 9a05680 to 9ad34e5 Compare September 15, 2024 23:43
@zsfelfoldi zsfelfoldi changed the title core/filtermaps: two dimensional log filter data structure (WIP) core/filtermaps: two dimensional log filter data structure Oct 6, 2024
@MariusVanDerWijden
Member

Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc)

@zsfelfoldi
Contributor Author

Do you have some numbers about the performance of the filtermaps? (size, lookup speed, generation speed, etc)

I measured indexing and unindexing time for the entire chain history, and also saved the log line from the point where the index size was 2,350,000 blocks, which is the currently proposed default setting:

INFO [10-10|12:15:26.044] Reverse log indexing in progress         maps=51857 history=2,350,607 processed=2,350,000 remaining=18,583,940 elapsed=1h59m54.391s
INFO [10-10|20:18:56.474] Reverse log indexing finished            maps=240,264 history=20,936,958 processed=20,933,940 elapsed=10h3m24.820s
INFO [10-10|21:33:21.991] Log unindexing finished                  maps=1       history=1          removed=20,937,327 elapsed=4m1.752s

Database size growth is hard to measure exactly because of compaction (or the lack of it); after a full unindexing followed by a full re-indexing, my db grew by 57 GB, but it would probably be bigger when done on a freshly synced database. A starting point for estimation: each map consists of 4096 rows that are 64 bytes long on average, stored under consecutive keys, so the per-entry db overhead should be low. The entire history log index should therefore be about 58.6 GB plus db overhead, while the recommended 2,350,000-block (one-year) history should be about 12.7 GB plus db overhead. Also note that this PR removes the old bloombits db, which is about 5-6 GB.

The log search performance depends on what we are searching for. I chose a more difficult but fairly common scenario in which some of the search values appear very frequently while the overall pattern occurs only 40 times in the entire chain history: a WETH Transfer event, with a filter pattern of one address and 3 topics.

var options = {
   fromBlock: 19924000,
   toBlock: 20924000,
   // WETH contract address
   address: ["0xC02aaA39b223FE8D0A0e5C4F27eAD9083C756Cc2"],
   // Transfer(address,address,uint256) signature, then sender and recipient topics
   topics: [["0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"],["0x000000000000000000000000b05c9b5a0ce5d2e12fbd678d7fe34bec7d14414e"],["0x000000000000000000000000f3de3c0d654fda23dad170f0f320a92172509127"]],
};
var filter = web3.eth.filter(options);
filter.get(function(error, log) {
   console.log(JSON.stringify(log));
});

I ran the test over 1M blocks, 10M blocks, and the entire chain history, both with and without the index:

Recent 1M blocks:

INFO [10-09|01:38:34.923] Performed indexed log search             begin=19,924,000 end=20,924,000 "true matches"=38 "false positives"=0 elapsed=251.910ms
INFO [10-10|09:56:09.803] Performed unindexed log search           begin=19,924,000 end=20,924,000 matches=38 elapsed=1m8.051s

Recent 10M blocks:

INFO [10-09|01:38:55.740] Performed indexed log search             begin=10,924,000 end=20,924,000 "true matches"=38 "false positives"=0 elapsed=977.532ms
INFO [10-10|10:09:02.963] Performed unindexed log search           begin=10,924,000 end=20,924,000 matches=38 elapsed=4m57.828s

Entire history:

INFO [10-10|21:28:40.395] Performed indexed log search             begin=0 end=20,937,304 "true matches"=40 "false positives"=0 elapsed=1.901s
INFO [10-10|21:45:03.971] Performed unindexed log search           begin=0 end=20,937,354 matches=40 elapsed=6m16.716s
