
Add a best effort heap-wide object property slot cache #1289

Open
wants to merge 4 commits into base: master
Conversation

svaarala (Owner) commented Jan 14, 2017

Add a heap-wide property slot cache:

  • Lookup key is a combination of duk_hobject * and a key string hash.
  • Lookup value is a possible property slot (integer).

Property lookup changes:

  • If the object has a hash part, use the hash part directly without consulting the slot cache.
  • Otherwise, look up the slot cache using object/key.
  • Validate the returned slot index: check that it's within the object's property table, and check that the key matches. If so, use the slot as is.
  • Otherwise do the usual linear scan to find the key. When found, overwrite the property slot cache entry with the slot index (a rough sketch of this flow follows below).
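Roughly, the intended lookup flow could be sketched like this (standalone illustration only; hstring, hobject, slotcache, and lookup_slot are simplified hypothetical stand-ins, not the actual Duktape internals):

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-ins for duk_hstring / duk_hobject; the real structures
 * differ, this only illustrates the guess -> validate -> scan -> refresh cycle. */
typedef struct { uint32_t hash; } hstring;
typedef struct { hstring **keys; size_t nkeys; /* values/attrs omitted */ } hobject;

#define SLOTCACHE_SIZE 256u                 /* fixed size, power of two */
static uint8_t slotcache[SLOTCACHE_SIZE];   /* best effort slot guesses */

static size_t slotcache_index(const hobject *obj, const hstring *key) {
	/* Combine the object pointer and the key string hash into a cache index. */
	uint32_t h = (uint32_t) (uintptr_t) obj ^ key->hash;
	return (size_t) (h & (SLOTCACHE_SIZE - 1u));
}

/* Returns 1 and writes the slot index on a hit, 0 if the key is not present. */
static int lookup_slot(const hobject *obj, const hstring *key, size_t *out_slot) {
	size_t ci = slotcache_index(obj, key);
	size_t guess = slotcache[ci];
	size_t i;

	/* Validate the cached guess: it must be inside the object's property table
	 * and the key stored there must match (interned keys, so pointer compare
	 * suffices); otherwise it's just a stale best guess. */
	if (guess < obj->nkeys && obj->keys[guess] == key) {
		*out_slot = guess;
		return 1;
	}

	/* Fall back to the usual linear scan; refresh the cache entry on a hit. */
	for (i = 0; i < obj->nkeys; i++) {
		if (obj->keys[i] == key) {
			slotcache[ci] = (uint8_t) i;  /* fits in 8 bits if <= 256 props */
			*out_slot = i;
			return 1;
		}
	}
	return 0;
}
```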

The upsides of this approach are that:

  • Unlike a hash table, it emphasizes object/key pairs that are actually looked up; no upfront work is done as with hash tables.
  • There is no GC impact in ensuring the entries are valid (compare to the relatively tricky generation-based validity handling in the property cache prototype). The slot index is always just a best guess, and is always validated.
  • The slot cache entry is just an integer. If the hash table size limit is 256 properties, the integer needs only 8 bits. Compared to the property cache approach (where each entry is 16+ bytes) this allows a much larger lookup table for the same memory cost.

There are downsides too:

  • If an application has a hot path accessing a few object/key pairs repeatedly, and they happen to map to the same slot cache entry, performance will suffer because linear scans will happen for both objects. This is not very likely, but still possible, and there's no easy defense against it except maybe switching the lookup index computation from time to time.
  • If an object is very large, and most of its properties are accessed continuously, it takes an initial linear scan for each property to populate the cache (and the entries may later be overwritten), which is expensive compared to a dedicated hash table structure. So, a property/slot cache is not a general substitute for a hash table.

See also: #1284 (comment).

Tasks:

  • Rebase after hash algorithm merge
  • Add the slot cache logic
  • Add minimal stats when debug enabled, log in mark-and-sweep for example
  • Skip the slotcache lookup/overwrite for tiny objects (say <= 4 properties) because scanning is just as fast, and skipping reduces slot cache traffic
  • If the minimum slotcache limit is e.g. 4 and the hash part limit is also relatively low (say 8), the slotcache will have minimal impact in the default configuration and could perhaps be disabled. It would still be useful for low memory targets where hash parts can be globally disabled; there it may be possible to use e.g. a 256-byte slotcache instead (it's fixed size and thus easy to manage)
  • Configuration changes, zero size disabled slotcache
  • Low memory config could include a small cache
  • Internal documentation
  • Releases entry

svaarala (Owner, Author) commented:

The current object property part layout is good for linearly scanning keys (which are kept in a packed array) but when a slot cache or hash part lookup succeeds the layout is not ideal because multiple cache lines get fetched. For example, for a successful slot cache lookup:

  • The slot cache lookup fetches a cache line.
  • The key, value, and attributes cause separate cache lines to be fetched.

Depending on cache pressure this may or may not have a significant impact. For hot paths the lines usually stay in the cache, so this is not a big issue for the most part, but it's still not ideal.

For desktop targets it might be better to place the key, value, and attributes into a single record structure and make the property table an array of those. This is not the most memory efficient approach, but it would be the most cache friendly when a lookup is done using a hash table, the slot cache, etc., instead of a linear key scan. For low memory targets the packed structure with no overhead should still be available, because the padding losses add up and are significant when working with very low amounts of RAM.
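Roughly, the two layouts compared (simplified stand-in types only, not the actual duk_hobject property part or duk_tval):

```c
#include <stdint.h>
#include <stddef.h>

typedef struct { uint32_t hash; } hstring;   /* stand-in for duk_hstring */
typedef struct { double d; } tval;           /* stand-in for duk_tval */

/* Current packed layout: parallel arrays with minimal padding, but a lookup by
 * slot index touches the key array, value array, and attribute array, i.e.
 * typically three separate cache lines. */
typedef struct {
	hstring **keys;    /* keys[slot] */
	tval     *vals;    /* vals[slot] */
	uint8_t  *attrs;   /* attrs[slot] */
} props_packed;

/* Record layout: one array of records, so a successful hash / slot cache
 * lookup touches a single cache line, at the cost of per-entry padding. */
typedef struct {
	hstring *key;
	tval     val;
	uint8_t  attr;     /* padded up, e.g. to 24 bytes on a 64-bit target */
} prop_record;

typedef struct {
	prop_record *entries;   /* entries[slot] */
	size_t       nentries;
} props_record;
```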

Added a bullet point to #1196 for this question.


svaarala commented Jan 17, 2017

Some thoughts on whether a slot cache could replace hash tables in general.

The main reason that wouldn't work well with the pull request as it is now is that only properties that actually get accessed get cached. So if a large object has N properties (say 1000) and they all get accessed in sequence, each of them will involve one linear scan followed by caching. The linear scans will take roughly 1000 * 500 = 500k key comparisons in total, 500 per lookup on average.

That could be avoided as follows: when doing a linear scan, insert all keys scanned into the slot cache. So, for example, if one were to scan and find the desired key at index 591, the keys at indices 0-590 would also be inserted into the slot cache. This would eliminate some of the scans (but would still perform poorly if the properties were accessed in increasing property slot index order). The downside is that a lot of entries also get overwritten; to work well, the slot cache must be large enough to make this a non-issue in practice.
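Building on the hypothetical sketch in the PR description (same stand-in names, not the actual internals), the fallback scan could populate the cache for every key it passes over:

```c
/* Variant of the earlier lookup_slot() fallback: during the linear scan,
 * insert a slot cache entry for every key passed over, so later lookups of
 * those keys can skip their own scans.  Cached-guess fast path omitted. */
static int lookup_slot_populating(const hobject *obj, const hstring *key,
                                  size_t *out_slot) {
	size_t i;
	for (i = 0; i < obj->nkeys; i++) {
		slotcache[slotcache_index(obj, obj->keys[i])] = (uint8_t) i;
		if (obj->keys[i] == key) {
			*out_slot = i;
			return 1;
		}
	}
	return 0;
}
```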

Related approach: all new keys written would always get inserted into the slot cache, and when an object is resized, the compacted key list would be inserted into the slot cache during the resize.

This would work well if the slot cache is larger than the effective working set of properties being manipulated so that, on average, slot cache entries wouldn't get continually overwritten, which would again lead to too many linear scans. The slot cache could be relatively large in practice because the memory saved by avoiding hash tables could be allocated towards the slot cache instead; the slot cache might need to have a dynamic size to work well.

Even a few linear scans would be a problem for a huge object (say 1M properties) that is constructed once and then looked up a large number of times; a linear scan would cost on average 500k key comparisons for such an object.

So, it might be possible to avoid an explicit hash table for medium size objects (say < 1000 properties) by just using a much larger slot cache which could be paid for by the memory freed by not using hash tables for many objects, and maybe using a dynamic slot cache. But for extremely large objects this would still not work very well; objects with > 1000 properties are not very common but still occur from time to time in e.g. some temporary tracking data structures in algorithms.


svaarala commented Jan 17, 2017

The reason why it'd be nice to eliminate per-object hash tables is to remove hash table management from the object model. A dynamically sized property/slot cache would avoid upfront cost and react to actual property read/write patterns. So far I don't know how to achieve that without any per-object state, because a shared structure always experiences some entry overwrites which cause linear key scans, and that breaks down as a viable model for very large (think 1M properties) objects. This is a shame because the hash table for 1M properties would be around 8MB in size, and a slot cache of that size could hold quite a lot of property slot indices.

svaarala force-pushed the global-object-hash-index branch from 254d18c to b460231 on January 20, 2017 01:51
svaarala force-pushed the global-object-hash-index branch 2 times, most recently from 9960349 to 2f76846 on January 31, 2017 00:48
svaarala force-pushed the global-object-hash-index branch from 2f76846 to 6bcc886 on June 21, 2017 21:54