Consider storing only the difference of the attributes #102

Open
e200 opened this issue May 1, 2023 · 5 comments

Comments

@e200

e200 commented May 1, 2023

Could we consider storing only the diffs between the previous and the current attribute values, instead of a full copy of the data, to reduce storage usage?

With this approach, reverting would mean looping over all past revisions and applying their differences to rebuild the content of the revision you want. In exchange, it would save a lot of storage for attributes that contain large texts with small changes.
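A minimal Python sketch of the proposal, assuming attribute sets are plain dictionaries and that attributes are never removed: each revision stores only the fields that changed, and reverting replays those diffs in order from the initial state. The names `diff_attributes` and `revert_to` are hypothetical and not part of this package.

```python
from typing import Any

Attributes = dict[str, Any]

# Sentinel so that a newly added attribute whose value is None still counts as a change.
_MISSING = object()


def diff_attributes(old: Attributes, new: Attributes) -> Attributes:
    """Return only the attributes whose value differs from `old` (assumes none are removed)."""
    return {key: value for key, value in new.items() if old.get(key, _MISSING) != value}


def revert_to(initial: Attributes, diffs: list[Attributes], revision: int) -> Attributes:
    """Rebuild `revision` (1 = initial) by replaying diffs[0] .. diffs[revision - 2] in order."""
    state = dict(initial)
    for diff in diffs[: revision - 1]:
        state.update(diff)
    return state


# Usage: a "body" that changes a little, then a title that changes later.
v1 = {"title": "Post", "body": "long text"}
v2 = {"title": "Post", "body": "long text, edited"}
v3 = {"title": "Post (updated)", "body": "long text, edited"}
diffs = [diff_attributes(v1, v2), diff_attributes(v2, v3)]
print(diffs)  # [{'body': 'long text, edited'}, {'title': 'Post (updated)'}]
print(revert_to(v1, diffs, 2) == v2)  # True
```

Note that a changed `body` is still stored whole here; the big-text savings would additionally need a per-field text delta.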

@nonoesp
Collaborator

nonoesp commented May 1, 2023

Hi @e200, are you proposing to compute diffs (deltas) on the fields that have been edited, or to store only the edited fields?

I don't think this is something we'll add support for at this time as this package is minimally maintained.

@e200
Author

e200 commented May 2, 2023

Hi @nonoesp. It would store only each field's diff; as a consequence, only the fields that changed would be stored.
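To illustrate what a diff of a single field could look like, here is a sketch that uses Python's standard `difflib.SequenceMatcher` to store only the changed lines of a large text attribute. The hunk format `(start, end, replacement_lines)` and the helpers `text_delta` and `apply_delta` are hypothetical choices for this example, not how the package stores data.

```python
import difflib

# A hunk replaces old_lines[start:end] with `replacement` (hypothetical format).
Hunk = tuple[int, int, list[str]]


def text_delta(old: str, new: str) -> list[Hunk]:
    """Store only the line ranges that differ between `old` and `new`."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    return [
        (i1, i2, new_lines[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_delta(old: str, delta: list[Hunk]) -> str:
    """Rebuild the new text from the old text and its delta."""
    lines = old.splitlines(keepends=True)
    # Hunks come in ascending order; applying them back to front keeps indices valid.
    for start, end, replacement in reversed(delta):
        lines[start:end] = replacement
    return "".join(lines)


old = "line one\nline two\nline three\n"
new = "line one\nline 2\nline three\n"
delta = text_delta(old, new)
print(delta)  # [(1, 2, ['line 2\n'])]
print(apply_delta(old, delta) == new)  # True
```

Each delta only works against the exact text it was computed from, which is why the revisions have to be applied in sequence.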

@wize-wiz

wize-wiz commented May 8, 2024

@e200 You would need to come up with an algorithm that starts from a point where the model's full original (or current) state is known.

From that point on, each difference saved for each field has to be walked back, in order, to reach a past state; only then is the full context of that particular model complete. Take a text of 300 words with 50 revisions stored only as differences (textual comparison): getting the full context of a past revision would take a lot of code and precision. To get back to revision 20, I need to go through 30 revisions. For model attributes the process would be the same.
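The walk-back described above can be sketched as follows, assuming the full current state is kept and each revision r stores a hypothetical reverse delta: the values its changed fields had before revision r. Reaching revision 20 from revision 50 then takes 50 - 20 = 30 steps. `walk_back` is an illustrative name only.

```python
from typing import Any

State = dict[str, Any]


def walk_back(current: State, reverse_deltas: dict[int, State],
              current_revision: int, target: int) -> tuple[State, int]:
    """Undo revisions current_revision, current_revision - 1, ..., target + 1.

    reverse_deltas[r] holds the values the changed fields had *before* revision r.
    Returns the reconstructed state and the number of deltas applied.
    """
    state = dict(current)
    steps = 0
    for revision in range(current_revision, target, -1):
        state.update(reverse_deltas[revision])
        steps += 1
    return state, steps


# Hypothetical history of a text field: revision r holds the words w1 .. wr.
# Full copies are kept here only so the result can be checked.
history = {r: {"body": " ".join(f"w{i}" for i in range(1, r + 1))} for r in range(1, 51)}
reverse_deltas = {r: {"body": history[r - 1]["body"]} for r in range(2, 51)}

state, steps = walk_back(history[50], reverse_deltas, 50, 20)
print(steps)                 # 30
print(state == history[20])  # True
```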

Then we need to talk about the benefit ;)

@Birleyici

Hi @wize-wiz,

Thank you for sharing your thoughts! However, I believe there is a misunderstanding regarding the diff approach, and I'd like to clarify its functionality.

The diff method only stores the changed fields, so when a specific version is retrieved, only the changes in that version are applied. Therefore, it is not necessary to process all previous versions in sequence. Here's an example to explain this more clearly:

Example:

Initial Data (Version 1):
{
  "name": "Shoes",
  "price": 100,
  "stock": 50
}

Version 2: (Only price changed.)
{
  "price": 120
}

Version 3: (Both name and price changed.)
{
  "name": "Running Shoes",
  "price": 150
}

When Version 3 is retrieved:

  • Only the fields stored in Version 3 (name and price) are applied.
  • There is no need to process the earlier versions, as Version 3 contains the complete summary of its changes.

Advantages of the Diff Approach:

  1. The diff method stores only the changed fields, which significantly reduces database storage requirements.
  2. Since each version contains its changes as a summary, it eliminates the need to reprocess previous versions.
  3. Performance efficiency is enhanced, as only the requested version’s data is processed.

Conclusion:
When implemented correctly, the diff method is highly efficient in terms of storage and performance. It avoids the need to sequentially process all previous versions. Instead, it provides a change-based approach where only the requested version's data is applied.

I hope this explanation clarifies the situation. Let me know if you'd like to discuss this further! 😊

@wize-wiz

@Birleyici

I understood it the first time. The topic we're talking about is called state management and this particular part is mostly known as "Change-Only Saved State". This is not a new thing and is quite a big subject in the field of state management.

The issue I raised in my comment about saving only "changes" (deltas) instead of the entire model state is the complexity of reconstructing state, plus the potential for incomplete or corrupted data.

Here's a short version:

  • To reconstruct the full state of the model at any given version, you must replay all changes from the initial version up to the desired version.

  • If a single intermediate version's change is missing or corrupted, the reconstruction process fails or produces incorrect results.

  • Reconstructing a specific version becomes increasingly slow as the number of changes (versions) grows. Every query or operation that depends on the state requires traversing and applying all prior deltas. This can lead to scalability issues in large datasets or systems with frequent changes.

  • Any error in a saved change (e.g., an incorrect delta or a skipped update) propagates to all subsequent versions, potentially invalidating the entire version history.

  • Tracking and debugging issues becomes harder since you don't have a clear snapshot of the full state at each version, making it more difficult to verify the correctness of the model at a particular version.
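A sketch of the replay and the failure mode from the list above, assuming versions are numbered from 1, version 1 holds the full initial state, and later versions hold only changed fields. `reconstruct` and `MissingVersionError` are hypothetical names; note that the loop touches every delta up to the target, so the cost grows with the version number.

```python
from typing import Any

State = dict[str, Any]


class MissingVersionError(Exception):
    """Raised when the chain of deltas has a gap, so replay cannot be trusted."""


def reconstruct(deltas: dict[int, State], target: int) -> State:
    """Replay versions 1..target in order; version 1 must hold the full initial state."""
    state: State = {}
    for version in range(1, target + 1):
        if version not in deltas:
            raise MissingVersionError(
                f"version {version} is missing; cannot rebuild version {target}"
            )
        state.update(deltas[version])
    return state


deltas = {
    1: {"name": "Shoes", "price": 100, "stock": 50},
    2: {"price": 120},
    3: {"name": "Running Shoes", "price": 150},
}
print(reconstruct(deltas, 3))  # {'name': 'Running Shoes', 'price': 150, 'stock': 50}

del deltas[2]  # simulate a lost intermediate version
try:
    reconstruct(deltas, 3)
except MissingVersionError as err:
    print(err)  # version 2 is missing; cannot rebuild version 3
```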

If you meant that each saved state always holds the changes from version 1 up to the present ("the now"), I don't see how that would work.

Let's take your example and add a 4th version. A Product model starts at version 1 (its initial state); from then on we capture only the changes, never the entire model.

// version 1
{
  "name": "Shoes",
  "price": 100,
  "stock": 50
}
// version 2
{
  "price": 120
}
// version 3
{
  "name": "Running Shoes",
  "price": 150
}
// version 4
{
  "price": 90,
  "stock": 10
}

Our snapshot of the model's current state, at version 4, is:

// version 4 (snapshot)
{
  "name": "Running Shoes",
  "price": 90,
  "stock": 10
}

If we want to reconstruct our model's state from version 1 to version 4, we need to apply every version in order. If we instead jump straight to version 4, skipping the versions in between, the model ends up in an incorrect state: the price and stock are right, but the name is wrong.

{
  "name": "Shoes",
  "price": 90,
  "stock": 10
}
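The example can be checked with a short Python sketch using the four versions from this comment: replaying every version yields the version 4 snapshot, while applying only version 1 and version 4 (the hypothetical `jump` helper) leaves the stale name. `replay` and `jump` are illustrative names, not package functions.

```python
from typing import Any

State = dict[str, Any]

# Version 1 is the full initial state; versions 2-4 hold only the changed fields.
versions: list[State] = [
    {"name": "Shoes", "price": 100, "stock": 50},
    {"price": 120},
    {"name": "Running Shoes", "price": 150},
    {"price": 90, "stock": 10},
]


def replay(versions: list[State], upto: int) -> State:
    """Apply versions 1..upto (1-based) in order."""
    state: State = {}
    for delta in versions[:upto]:
        state.update(delta)
    return state


def jump(versions: list[State], upto: int) -> State:
    """Apply only version 1 and the requested version, skipping everything in between."""
    return {**versions[0], **versions[upto - 1]}


print(replay(versions, 4))  # {'name': 'Running Shoes', 'price': 90, 'stock': 10}
print(jump(versions, 4))    # {'name': 'Shoes', 'price': 90, 'stock': 10}
```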

Secondly, I've written quite a lot in just a few minutes. If it were as simple as your comment suggests, you would have written a prototype by now ;) If you have a repository where you have started working on an implementation of "Change-Only Saved State", I'm all ears, because I'm on the brink of writing such a thing for a Rust-based program and might do the same in PHP for Laravel. But you do need to get up to date with all the details of this particular topic.

Cheers

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants