Rework object hash part algorithm #1284
Conversation
Force-pushed from 77ae636 to 7279e06.
Just considering property lookups, a hash table always pays off, even for very small objects:
However, this only holds when the same properties are read repeatedly, which ignores the cost of creating and maintaining the hash table over resizes (and that matters for a lot of practical code). I'll run some more performance tests, but a limit of 4 (= create a hash table if the object has 4 or more properties) seems too low. A good default is probably somewhere between 6 and 12; I'll run more tests to see what works best. I'll also make the limit configurable via config options so it's easier to tweak than it is now.
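For reference, a minimal sketch of the kind of read-heavy microbenchmark being discussed; the object shape, property names, and iteration count are arbitrary illustrations, not the actual test:

```js
// Read-heavy microbenchmark sketch: the same few properties of a small object
// are read over and over, which is the case where a hash part always pays off.
// Object shape and iteration count are arbitrary.
function propertyReadTest() {
    var obj = { a: 1, b: 2, c: 3, d: 4 };   // small object, 4 properties
    var sum = 0;
    for (var i = 0; i < 1e7; i++) {
        sum += obj.a + obj.b + obj.c + obj.d;
    }
    return sum;
}
propertyReadTest();
```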
Here's a concrete example of code behaving in exactly the opposite way to the property read tests:
The test case creates an object literal with 20 properties and immediately throws the value away. The object hash table limit is 32 in master, so master avoids the overhead of creating a hash table. With limit values 4, 6, and 8 the result is naturally slower because of the overhead of creating a hash table that is never used. Real application code falls somewhere between these two extremes: hash tables have a setup cost, but they also benefit accesses when there are more than just a few over time.
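The original snippet isn't shown above, but based on the description it is roughly the following (property names, values, and the loop count are placeholders):

```js
// Rough reconstruction of the test case described above: an object literal with
// 20 properties whose value is immediately thrown away, so any hash part built
// for it is pure overhead.  Names, values, and the loop count are placeholders.
function objectLiteralTest() {
    for (var i = 0; i < 1e6; i++) {
        var o = {
            p01: 1,  p02: 2,  p03: 3,  p04: 4,  p05: 5,
            p06: 6,  p07: 7,  p08: 8,  p09: 9,  p10: 10,
            p11: 11, p12: 12, p13: 13, p14: 14, p15: 15,
            p16: 16, p17: 17, p18: 18, p19: 19, p20: 20
        };  // value never used after this point
    }
}
objectLiteralTest();
```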
@fatcerberus I think I asked about this before but I don't remember what it came to -- but do you think it would be possible to arrange some sort of headless Minisphere build which could "run through the motions" for some example game? I can run such a thing with a display available (I think this was a blocker before) but it'd probably be best if it didn't spend most of the time drawing stuff. The reason I ask is that there are currently no useful application benchmarks in the set of automated tests, so I'm trying to figure out what application benchmarks to use to improve that part of commit test coverage (and hopefully lead to good merge decisions :-). Some current tests and ideas are:
Anyway, Minisphere might be a useful test target, and it would also give you useful information about builds and their impact on Minisphere.
I don't think it's possible to run minisphere headlessly because its first action on startup is to create an Allegro display, which in turn needs to initialize OpenGL. When I was first implementing my Node.js-compatible …

For TypeScript in particular: Cell, minisphere's SCons-inspired compiler, also uses Duktape. That would actually be a bit easier to automate than minisphere itself, since it's just a matter of providing a Cellscript and then running cell from the command line.

Anyway, I can look into mocking something up where minisphere runs the Spectacles battle engine as a "smoke test" of sorts for Duktape. Currently battles require player input, but it shouldn't be too difficult to set an AI to control the player characters; I designed my AI framework to be quite flexible in that regard :)

The Specs battle engine should give pretty decent coverage since it does a lot of different things: damage/healing calculations, calling into C (for the Sphere API), tons of stuff with first-class functions (i.e. the "from" query module), etc.
Right, I now have some physical hosts to run automated tests on, so it's no longer a problem if an OpenGL context gets created. But for the test result to make sense, ideally a significant share of the execution time (say 30-50% at least) would be spent in script execution, a sort of "warp mode". Also, if there's a concept of a "frame time", measuring it over some automated run might give a useful indication. This might be workable even if OpenGL output is enabled.
For transpilation, a useful performance test would be a Cellscript that looks like this:

```js
const minify = require('minify');
const transpile = require('transpile');

describe("SpecsMark 2017",
{
    version: 1,
    author: "Fat Cerberus",
    resolution: '320x200',
    main: 'scripts/main.js',
    // etc.
});

var scripts = transpile('tmp/transpiled/', files('src/*.js', true));
minify('@/scripts/', scripts);
install('@/images/', files('images/*.png', true));
// etc.
```
Regarding frame time: minisphere has …
I mean more that, when an individual frame is processed (if that concept applies to Minisphere; it usually does for game engines :-), how much time each frame takes, on average or cumulatively. If there's a high resolution time source available, cumulative frame processing time would be a useful measure, provided it can be computed so that graphics operations are excluded.
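A rough sketch of what this could look like, assuming a script-visible timer such as `Date.now()` (a higher-resolution source would be better) and using hypothetical `updateFrame()` / `renderFrame()` stand-ins for the engine's per-frame work:

```js
// Hypothetical stand-ins for the engine's per-frame work.
function updateFrame() { /* game logic / script work we want to measure */ }
function renderFrame() { /* graphics work, deliberately not measured */ }

var cumulativeUpdateMs = 0;
var frames = 0;

function runOneFrame() {
    var t0 = Date.now();               // a higher-resolution timer would be better
    updateFrame();
    cumulativeUpdateMs += Date.now() - t0;
    frames += 1;
    renderFrame();                     // excluded from the timed region
}

for (var i = 0; i < 1000; i++) {       // stand-in for an automated run
    runOneFrame();
}
// Average script-side time per frame over the run:
// print(cumulativeUpdateMs / frames);
```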
Ah, I see. I actually removed all the "wall clock" timing in minisphere 4.3 in favor of a "frame perfect" API (all durations in the API are specified in frames), since wall-clock timing is more vulnerable to game lag. The engine times its frames internally (so it knows how long it can sleep between frames), but that information is not exposed to game code.
By the way, maybe we should open a separate issue to discuss this so we don't spam the object hash pull too much? |
Sounds good, opened #1288. |
Google benchmark, maximum score for 5 runs:
The hash limit doesn't affect the score very strongly (limits in the 6-12 range maybe score slightly higher, but I can't really be sure). It's quite likely the test doesn't use a lot of large objects, so it doesn't really shed much light on choosing a good hash limit. What's interesting, though, is that regardless of the hash limit this branch gets better scores than master. I can't think of any reason for that other than code layout effects (and the code being smaller in general).
Intuitively, I'd expect the typical pattern for real-world code to be that small objects with only data properties are likely to be thrown away quickly after one or two values are read from them (compound return values, for example), while larger objects are more likely to be long-lived and accessed repeatedly, especially if they contain any function properties.
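To illustrate the two patterns (the names here are made up):

```js
// A small compound return value: read once or twice, then thrown away.
function divmod(a, b) {
    return { quotient: Math.floor(a / b), remainder: a % b };
}
var r = divmod(17, 5).remainder;   // the temporary object is discarded immediately

// A larger, long-lived object with function properties, accessed repeatedly
// over its lifetime.
var counter = {
    count: 0,
    increment: function () { this.count += 1; },
    reset: function () { this.count = 0; }
};
counter.increment();
counter.increment();
```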
I was also thinking about the typical kinds of objects that occur in practice, and some basic categories I could think of are:
There is a lot more relevant nuance beyond these, of course. For example, some objects are read heavy while others are write heavy, etc. I added a note to #1196 that it would be nice if the hash structure were spawned only when the object is actually operated on a lot. There are ways to do that, e.g. using a probabilistic check so that no actual count tracking or similar would be needed. Other useful places where the hash table could be spawned are e.g. when an object is frozen, or when an object is set as the prototype of another object. So there's a lot of scope for making better hash table decisions; I'll try to stick to the hash algorithm and parameters here :-)

I also added a task item for hash tables whose entries are smaller than the full 32 bits. Right now a hash table contains 32-bit entries, which is pretty wasteful for an object of, say, 200 properties, because the entries could be 8-bit integers instead. So for desktop environments where footprint is not critical, supporting 8-bit, 16-bit, and 32-bit hash tables (or maybe just 8+32 or 16+32) would allow a much lower load factor (and fewer collisions) for the same memory cost. But I'll probably work on that in a separate pull.
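As a script-level illustration of the entry-width idea (the real hash part lives in Duktape's C internals; `allocHashPart`, the sizes, and the load factor here are all made up for the sketch):

```js
// Script-level illustration only: hash entries just index into the property
// slot array, so the entry width can follow the property count.  A real
// implementation would also reserve a couple of marker values (unused/deleted),
// which slightly reduces the usable range per width.
function allocHashPart(propertyCount, loadFactorInverse) {
    var size = 1;
    while (size < propertyCount * loadFactorInverse) {
        size *= 2;                    // keep 2^N size so index masking works
    }
    if (propertyCount <= 0xff) {
        return new Uint8Array(size);  // 8-bit entries, enough for ~200 properties
    } else if (propertyCount <= 0xffff) {
        return new Uint16Array(size); // 16-bit entries
    } else {
        return new Uint32Array(size); // 32-bit entries for very large objects
    }
}

var hash = allocHashPart(200, 2);     // 200 properties, load factor <= 0.5
// -> Uint8Array(512): 512 bytes, versus 2048 bytes with 32-bit entries
```

With 8-bit entries the same memory budget allows a table several times larger, which is where the lower load factor and fewer collisions come from.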
Some work related to this pull is the prototype property cache pull: if that works out well, a hash part becomes less critical and could be reserved for genuinely large objects, where the cache doesn't work well because the cost of a miss and a full key scan is high. Another property-cache-related idea I have is to use a best-effort property slot cache, which is sloppier but in some ways easier and cheaper to manage (a rough sketch follows below):
This should work relatively well for a few reasons:
A property/slot cache is still not a replacement for a hash part for large objects: if an object is large and most of its properties are accessed over and over again, it takes a lot of linear scans to populate the cache, compared to O(1) lookups from the hash. Maintaining the hash table is cheaper than re-populating the cache, at least for very large objects. I'll prototype this in a separate branch. It may be a valid alternative to the (more complicated) property cache pull, because a property cache entry is larger and requires careful invalidation, which can be tricky to get right.
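One possible reading of the best-effort slot cache idea, sketched in script form under the assumption that the cache is simply a hash-indexed array of slot guesses that are validated on use (the real cache would live in the C internals and would also mix the object pointer into the cache index; all names and sizes here are illustrative):

```js
// A small fixed-size cache of slot guesses, validated on use, so no explicit
// invalidation is ever needed: a stale guess simply falls back to the scan.
var SLOT_CACHE_SIZE = 256;                      // arbitrary power of two
var slotCache = new Array(SLOT_CACHE_SIZE);     // slot guesses, initially empty

function keyHash(key) {
    // Trivial string hash, for illustration only.
    var h = 0;
    for (var i = 0; i < key.length; i++) {
        h = ((h * 31) + key.charCodeAt(i)) & 0x7fffffff;
    }
    return h;
}

// props: the object's property table as an array of { key, value } entries.
function lookupSlot(props, key) {
    var idx = keyHash(key) & (SLOT_CACHE_SIZE - 1);
    var guess = slotCache[idx];
    if (guess !== undefined && guess < props.length && props[guess].key === key) {
        return guess;                           // validated cache hit
    }
    for (var i = 0; i < props.length; i++) {    // cache miss: linear key scan
        if (props[i].key === key) {
            slotCache[idx] = i;                 // remember the slot for next time
            return i;
        }
    }
    return -1;                                  // key not present
}

var props = [ { key: 'x', value: 1 }, { key: 'y', value: 2 } ];
lookupSlot(props, 'y');   // miss: scans, then caches slot 1
lookupSlot(props, 'y');   // validated hit
```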
Force-pushed from 95c2f69 to 75e397f.
Make the hash algorithm simpler by using a bit mask rather than a modulus for probing the hash. Make the hash part load factor lower than before to reduce clustering. Low memory environments disable hash part support anyway, so this doesn't impact them.
Force-pushed from 75e397f to 51f7b3b.
Instead of a prime size and a modulus, use a bit mask and a 2^N sized hash part.
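As an illustrative sketch of the difference (the table size and probe step below are arbitrary):

```js
// Illustrative sketch only; table size and probe step are arbitrary.
var HASH_SIZE = 64;                // 2^N entries
var HASH_MASK = HASH_SIZE - 1;     // 0x3f

// Old approach: prime-sized hash part, modulus on every probe.
function probeModulus(hash, i, primeSize) {
    return (hash + i) % primeSize;
}

// New approach: 2^N sized hash part, bit mask on every probe.  Any odd probe
// step is coprime with 2^N, so the sequence visits every entry of the table.
function probeMask(hash, i, step) {
    return (hash + i * step) & HASH_MASK;
}

probeMask(0x12345678, 0, 1);       // initial probe position
probeMask(0x12345678, 1, 1);       // next position after a collision
```

The mask form avoids an integer division on every probe, which is the main point of the simplification.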
Tasks:
Follow-ups: