
Rework object hash part algorithm #1284

Merged
merged 7 commits into master from rework-object-hash on Jan 15, 2017

Conversation

@svaarala (Owner) commented on Jan 13, 2017

Instead of a prime and a MOD, use a bitmask and 2^N sized hash part.
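
As a rough illustration (hypothetical names, not the actual Duktape code), the point of the 2^N sizing is that probe indices can be computed with a bitwise AND instead of a modulus against a prime-sized table:

#include <stdint.h>

#define HASH_SIZE 16U                /* must be a power of two (2^N) */
#define HASH_MASK (HASH_SIZE - 1U)   /* 0x0f */

static uint32_t probe_index(uint32_t key_hash, uint32_t probe_count) {
    /* Old approach (sketch): (key_hash + probe_count * STEP) % PRIME_SIZE. */
    /* New approach: cheap masking, no division/modulus needed, which is
     * also why the hashprime utility becomes unnecessary.
     */
    return (key_hash + probe_count) & HASH_MASK;
}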

Tasks:

  • Change hash sizing to 2^N, use bitwise mask
  • Move hash parameters to config options
  • Remove hashprime utility, as it is no longer needed
  • Config option changes
  • Default parameters for hash size limit and hash sizing (current hash limit 4 in pull is probably too low)
  • Low memory parameters for hash size limit etc (very low memory targets don't have a hash part so these don't apply)
  • Releases entry

Follow-ups:

  • Reconsider step handling: earlier large step, now +1 (more cache friendly but more clustering)
  • Maybe a good place to add small hash tables (8-bit or 16-bit) which would allow a much smaller load factor at the same memory cost

@svaarala (Owner, Author)

Just considering property lookups, a hash table always pays off, even for very small objects:

test-prop-read-1024.js              : duk.O2.prophash  3.15 duk.O2.master  3.53
test-prop-read-16.js                : duk.O2.prophash  3.14 duk.O2.master  4.08
test-prop-read-256.js               : duk.O2.prophash  3.12 duk.O2.master  4.09
test-prop-read-32.js                : duk.O2.prophash  3.13 duk.O2.master  3.56
test-prop-read-4.js                 : duk.O2.prophash  3.17 duk.O2.master  3.32
test-prop-read-48.js                : duk.O2.prophash  3.12 duk.O2.master  3.53
test-prop-read-64.js                : duk.O2.prophash  3.12 duk.O2.master  3.50
test-prop-read-8.js                 : duk.O2.prophash  3.15 duk.O2.master  3.60

However, this only holds when the same properties are read repeatedly; it ignores the cost of creating and maintaining the hash table over resizes, which matters for a lot of practical code.

I'll run some more performance tests, but a limit of 4 (= create a hash table if an object has 4 or more properties) seems too low. A good default is probably somewhere between 6 and 12; I'll run more tests to see what works best. I'll also make the limit configurable via config options so it can be tweaked more easily than it can now.
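
For illustration, the configurable limit check would look roughly like this (the config define name here is hypothetical, not necessarily what the pull ends up using):

#if !defined(DUK_USE_HOBJECT_HASH_PROP_LIMIT)
#define DUK_USE_HOBJECT_HASH_PROP_LIMIT 8  /* hypothetical default: hash part at >= 8 properties */
#endif

static int want_hash_part(unsigned int num_props) {
    /* Create a hash part only when the object has at least the configured
     * number of properties.
     */
    return num_props >= DUK_USE_HOBJECT_HASH_PROP_LIMIT;
}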

@svaarala (Owner, Author)

Here's a concrete example of code behaving exactly the opposite of the property read tests:

test-object-literal.js              : duk.O2.proplimit4  3.12 duk.O2.proplimit6  3.14 duk.O2.proplimit8  3.10 duk.O2.master  2.86

The test case creates an object literal with 20 properties, with the value immediately thrown away. The object hash table limit is 32 in master, so master avoids the overhead of creating a hash table. With limit values 4, 6, and 8 the result is naturally slower because there's the overhead of creating a (never used) hash table.

Real application code is somewhere between these two extremes: hash tables have a cost to set up, but also benefit accesses if there are more than just a few over time.

@svaarala (Owner, Author) commented on Jan 13, 2017

@fatcerberus I think I asked about this before but I don't remember what came of it -- do you think it would be possible to arrange some sort of headless Minisphere build which could "run through the motions" for some example game? I can run such a thing with a display available (I think that was a blocker before) but it'd probably be best if it didn't spend most of its time drawing stuff.

The reason I ask is that there are currently no useful application benchmarks in the set of automated tests, so I'm trying to figure out what application benchmarks to use to improve that part of commit test coverage (and hopefully lead to good merge decisions :-). Some current tests and ideas are:

  • SunSpider: it's obsolete, and probably not a very accurate application benchmark for modern out-of-browser JavaScript.
  • Google's V8 benchmark is useful, but again doesn't necessarily emphasize actual application behavior very well.
  • Kraken benchmark is useful.
  • Running a large Emscripten-compiled C program would be an interesting benchmark, but it's most likely quite one-sided in what features it stress tests.
  • The TypeScript compiler would also be an interesting case.

Anyway, Minisphere would maybe be a useful test target and would also provide you useful information about builds and their impact on Minisphere.

@fatcerberus (Contributor)

I don't think it's possible to run minisphere headlessly because its first action on startup is to create an Allegro display, which in turn needs to initialize OpenGL. When I was first implementing my Node.js-compatible require() system, I experimented with making the graphics, audio, etc. components be lazy-loaded native modules (like in Node), but scrapped the idea because I figured almost all games will end up having to require them anyway. Instead I ended up designing the core set of bindings to be as low-level as possible and built a set of easy-to-use JS modules on top of that.

For TypeScript in particular: Cell, minisphere's SCons-inspired compiler, also uses Duktape. So that would actually be a bit easier to automate than minisphere itself since it's just a matter of providing a Cellscript and then running cell from the command line.

Anyway, I can look into mocking something up where minisphere runs the Spectacles battle engine as a "smoke test" of sorts for Duktape. Currently battles require player input, but it shouldn't be too difficult to set an AI to control the player characters. I designed my AI framework to be quite flexible in that regard :)

The Specs battle engine should provide pretty decent coverage since it does a lot of different things: damage/healing calculations, calling into C (for the Sphere API), tons of stuff with first-class functions (i.e. the "from" query module), etc.

@svaarala (Owner, Author)

Right, I now have some physical hosts to run automated tests on, so it's no longer a problem if an OpenGL context gets created. But for the test result to make sense, ideally a significant share of the execution time (say 30-50% at least) would be spent in script execution. Sort of a "warp mode".

Also if there's a concept of a "frame time", measuring the frame time over some automated run might give a useful indication. This might be workable even if OpenGL output is enabled.

@fatcerberus (Contributor)

For transpilation, a useful performance test would be to have a Cellscript that looks like this:

const minify    = require('minify');
const transpile = require('transpile');

describe("SpecsMark 2017",
{
	version: 1,
	author: "Fat Cerberus",
	resolution: '320x200',
	main: 'scripts/main.js',
	// etc.
});

var scripts = transpile('tmp/transpiled/', files('src/*.js', true));
minify('@/scripts/', scripts);

install('@/images/', files('images/*.png', true));
// etc.

transpile() would run all scripts through an ES7 -> ES5 transformation using Babel, and minify() would run the output through the Babili minifier.

@fatcerberus (Contributor)

Regarding frame time: minisphere has system.now() which returns the number of frames processed (including skipped) since the game started running. Is that what you mean?

@svaarala (Owner, Author)

I mean more that, when an individual frame is processed (if that concept applies to Minisphere - it usually does for game engines :-), how much time each frame takes, on average or cumulatively. If there's a high resolution time source available, cumulative frame processing time would be a useful measure, provided it can be computed so that graphics operations are excluded.

@fatcerberus (Contributor)

Ah, I see. I actually removed all the "wall clock" timing in minisphere 4.3 in favor of a "frame perfect" API (all durations in the API are specified in frames), since wall-clock timing is more vulnerable to game lag. The engine times its frames internally (so it knows how long it can sleep between frames), but that information is not exposed to game code.

@fatcerberus (Contributor)

By the way, maybe we should open a separate issue to discuss this so we don't spam the object hash pull too much?

@svaarala (Owner, Author)

Sounds good, opened #1288.

@svaarala (Owner, Author) commented on Jan 14, 2017

Google benchmark, maximum score for 5 runs:

  • 1.3.1: 229
  • 1.5.0: 234
  • 2.0.0: 272
  • master: 293
  • hash limit 2: 304
  • hash limit 4: 305
  • hash limit 6: 310
  • hash limit 8: 309
  • hash limit 10: 306
  • hash limit 12: 310
  • hash limit 14: 309
  • hash limit 16: 307
  • hash limit 32: 306

The hash limit doesn't affect the score very strongly (6-12 maybe scores slightly higher, but I can't really be sure). It's quite likely the test doesn't use a lot of large objects, so it doesn't really shed much light on choosing a good hash limit. What's interesting is that regardless of the hash limit, this branch gets better scores than master. I can't think of any other reason than code layout effects (and the code being smaller in general).

@fatcerberus (Contributor)

Intuitively, I'd expect the typical pattern for real-world code to be that small objects with only data properties are likely to be thrown away quickly after reading one or two values from them (compound return values, e.g.), with larger objects more likely to be long-lived and accessed repeatedly--especially if those objects contain any function properties.

@svaarala (Owner, Author) commented on Jan 14, 2017

I was also thinking about the typical objects that occur; some basic categories I could come up with:

  • Small temporary objects whose values are read ~once: argument object literals, compound return values like you said.
  • Small permanent objects, like an object instance for a logger, socket, or similar. They may contain anywhere between 1 and 10 properties, with some of them accessed in roughly every operation. Small inheritance parents are also like this.
  • Large temporary objects, for example a "visited keys" structure for some tree walk. Written and read a lot but thrown away quickly.
  • Large permanent objects like big constant tables (for example, some string-to-number conversion), Math object, prototype objects in general.

There is a lot more relevant nuance beyond these of course. For example, some objects are read heavy, others are write heavy, etc.

I added a note to #1196 that it would be nice if the hash structure was spawned only if the object is actually operated on a lot. There are ways to do that e.g. using a probabilistic check so that no actual count tracking or similar would be needed.
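
As a rough sketch of the probabilistic approach (illustration only, not Duktape code): whenever a property lookup falls back to a linear key scan, the hash part could be created with a small fixed probability, so objects that are actually looked up a lot eventually get a hash part without any per-object counter:

#include <stdlib.h>

static int should_spawn_hash_part(void) {
    /* rand() is used here only for illustration; a real implementation
     * would use whatever cheap PRNG the engine already has.  With a 1/16
     * probability per linear scan, an object scanned ~16 times is likely
     * to get a hash part, while rarely used objects almost never do.
     */
    return (rand() & 0x0f) == 0;
}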

Other useful places where the hash table could be spawned are e.g. when an object is frozen, or when an object is set as a prototype of another object.

So there's a lot of scope to make better hash table decisions. I'll try to stick to the hash algorithm and parameters here :-)

Also, I added a task item for hash tables whose entries are smaller than the full 32 bits. Right now a hash table contains 32-bit entries, but that's pretty wasteful for an object of, say, 200 properties, because the entries could be 8-bit integers instead. So for desktop environments where footprint is not critical, supporting 8-bit, 16-bit, and 32-bit hash tables (or maybe just 8+32 or 16+32) would allow a much smaller load factor (and fewer collisions) for the same memory cost. But I'll probably work on that in a separate pull.
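
For illustration, picking the entry width could look roughly like this (hypothetical helper, not part of this pull; a few values per width are assumed to be reserved for unused/deleted markers):

#include <stddef.h>
#include <stdint.h>

static size_t hash_entry_width(uint32_t num_entries) {
    if (num_entries < 0xfeUL) {
        return sizeof(uint8_t);    /* 8-bit entries for small objects */
    } else if (num_entries < 0xfffeUL) {
        return sizeof(uint16_t);   /* 16-bit entries */
    } else {
        return sizeof(uint32_t);   /* 32-bit entries, as currently */
    }
}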

@svaarala (Owner, Author)

Some work related to this pull is the prototype property cache pull: if that works well, a hash part becomes less critical and could be reserved for genuinely large objects, for which the cache doesn't work well because the cost of a miss (a full key scan) is high.

Another property cache related idea I have is to use a best-effort property slot cache which is sloppier but in some ways easier/cheaper to manage (sketched in code after the list below):

  • Maintain a heap-wide table of property slot indices: duk_uint16_t slotcache[4096] for example.
  • When doing a property entry lookup for an object with no hash part, compute a lookup index as (object_pointer ^ string_hash) % 4096.
  • The value provides a potential property slot index. Validate that index against the current object by comparing the lookup key and the key at that property slot (validating the index against the current property table size of course).
  • If the lookup is valid, we can safely use the property slot because we've validated the object and the key before using the slot.
  • If the lookup is not valid, i.e. the slot doesn't exist or contains a different key, continue normally. This can happen for a variety of reasons, e.g. a collision in the index space, property deletion, or the object property table having been resized or compacted.
  • When the actual lookup has been done, overwrite the slot cache entry so that a repeated lookup will now be valid.
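
A minimal sketch of the slot cache idea in C (the types and names are simplified placeholders, not the actual Duktape structures):

#include <stddef.h>
#include <stdint.h>

#define SLOTCACHE_SIZE 4096U

typedef struct {
    const char *key;   /* interned key, so pointer comparison suffices */
    /* value omitted for brevity */
} prop_entry;

typedef struct {
    prop_entry *entries;
    uint32_t num_entries;
} object;

static uint16_t slotcache[SLOTCACHE_SIZE];  /* heap-wide, tentative slot indices */

static uint32_t cache_index(const object *obj, uint32_t key_hash) {
    /* Equivalent to % 4096 because the cache size is a power of two. */
    return ((uint32_t) (uintptr_t) obj ^ key_hash) & (SLOTCACHE_SIZE - 1U);
}

/* Returns the property slot index, or -1 if the key is not found. */
static int lookup_prop_slot(object *obj, const char *key, uint32_t key_hash) {
    uint32_t ci = cache_index(obj, key_hash);
    uint32_t slot = slotcache[ci];
    uint32_t i;

    /* Validate the tentative slot: it must be in range and hold the key. */
    if (slot < obj->num_entries && obj->entries[slot].key == key) {
        return (int) slot;
    }

    /* Stale or missing cache entry: fall back to the normal linear scan. */
    for (i = 0; i < obj->num_entries; i++) {
        if (obj->entries[i].key == key) {
            slotcache[ci] = (uint16_t) i;  /* repeated lookup will now hit */
            return (int) i;
        }
    }
    return -1;
}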

This should work relatively well for a few reasons:

  • It emphasizes caching of actually looked up object/key pairs. No upfront work is done for maintaining hash tables in advance.
  • There's no GC impact, i.e. no need to explicitly or implicitly invalidate entries due to object changes, because the slot index is just tentative anyway.
  • The slot cache entry is just an integer (16 bits should be enough, and even 8 bits might be enough if objects with >= 256 entries get a hash part), which is much denser than the prototype property cache entry, which has 4 separate fields.

A property/slot cache is still not a replacement for a hash part for large objects: if an object is large, and most of its properties are accessed over and over again, it takes a lot of linear scans to populate the cache as compared to O(1) lookups from the hash. Maintaining the hash table is cheaper than re-populating the cache at least for very large objects.

I'll prototype this in a separate branch. It may be a valid alternative to the (more complicated) property cache pull because the property cache entry is larger, and requires careful invalidation which can be tricky to get right.

@svaarala force-pushed the rework-object-hash branch 2 times, most recently from 95c2f69 to 75e397f on January 14, 2017 23:24
Make the hash algorithm simpler by using a bit mask rather than a modulus for
probing the hash.

Make the hash part load factor lower than before to reduce clustering.  Low
memory environments disable hash part support anyway, so this doesn't impact
them.
@svaarala merged commit 299efb0 into master on Jan 15, 2017
@svaarala deleted the rework-object-hash branch on January 15, 2017 01:21