Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simple optimization? #4

Open
artelk opened this issue Sep 5, 2023 · 7 comments
Open

Simple optimization? #4

artelk opened this issue Sep 5, 2023 · 7 comments

Comments

@artelk
Copy link

artelk commented Sep 5, 2023

typedef struct {
    const BYTE* levelUp;
    const BYTE* nextTry;
} selectNextHop_t;

(apart from obvious using offets instead of pointers) you could cache the next (MINMATCH+1) byte of the sequence

typedef struct {
    const int32 levelUp: 28;
    const int32 nextTry: 28;
    int8 nextByte: 8;
} selectNextHop_t;

while promoting to the next level the nextByte should be updated to keep the next-next symbol (symbol at MINMATCH+LEVEL offset).
While searching you will start from comparing with the nextByte from the structure (and that value will be different in most cases).
And I believe in theory that can signifinactly reduce the amount of memory jumps into the buffer and improve performance 😃

@Cyan4973
Copy link
Owner

Cyan4973 commented Sep 5, 2023

This is a sound idea. It seems this would reduce the amount of memory accesses, so it could improve speed.

The bit twiddling hacks to store the necessary information into a packed structure tends to be a bit concerning, as there are ways this could end badly, both for portability and for performance. But I guess this is secondary topic that could be addressed in a second step.

@artelk
Copy link
Author

artelk commented Oct 6, 2023

Dear Yann,
Could you please explain one thing.
Capture
How the link between the node A and the node C is implemented?
Reading the code didn't help me to find the answer.. 😃
From what I understand the nextTry & MASK is the index of the next (previous in time) node at the lowest level (it would give you the link from A to B, not to C).
How you were able to avoid adding a distinct explicit field like indexOfTheNextNodeOnTheSameLevel which value can be different from nextTry & MASK for the nodes on the levels above the lowest one? (upd: on the lowest level it can also be different to skip the nodes which were pushed to the upper levels)

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 6, 2023

It's been a while since I looked at this code.

If I understand correctly, this picture comes from the MMC homepage,
and explains how a new chain, of guaranteed minimum match length N+1, is created.

Considering the previous steps,
I don't think that the link from A to C existed beforehand.
Instead, it is discovered, by scrubbing the previous chain (mml=n)
Once a position of length N+1 is discovered, it's extracted from the mml=n chain, and becomes a new chained member to the chain mml=n+1. This is the operation that the red arrows symbolizes.

@artelk
Copy link
Author

artelk commented Oct 6, 2023

Yes, the picture is from that page 😄
I mean I thought the discovered link A->C is probably somehow stored for future use and so next time you will search for the same N+1 string you will be able to jump from A to C not visiting the B and other intermediate nodes on the lower level.

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 6, 2023

Once it's discovered, it's indeed saved, as a "good" link.
But it first needs to be discovered.
This is done by scrubbing the list of "unclassified" candidates.

@artelk
Copy link
Author

artelk commented Oct 6, 2023

My question is how this saving is done. I believe some field of the node A should keep that good link.
But the nodes only have const BYTE* levelUp and const BYTE* nextTry.

The value A.nextTry % ChainTableSize is an index of B, not C.
Maybe the nodes themselves are somehow moved inside the chain table? I mean the node C is copied to the position of the node B in the table and before that the node B is moved to some other place and so on.

@Cyan4973
Copy link
Owner

Cyan4973 commented Oct 6, 2023

The new chain starts with levelUp, which essentially states that, from now on, all candidates in the chain are guaranteed to be matches of at least N+1 bytes. Then, all remaining positions within the underlying N chain are scrubbed, and any match of length N+1 is brought into the N+1 chain, using the nextTry link. Doing so, such position leaves its previous chain, which was only guaranteeing N bytes (making it shorter). But note that, on reaching the N+1 chain, they are now considered "unsorted". Meaning the next time we reach this part of the chain, they will be tested for N+2 promotion. The promotion process ends on reaching the previous Gateway to N+1 (if it existed) or on reaching the end of the N chain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants