Consider storing only the difference of the attributes #102

e200 · 2023-05-01T09:03:31Z

Could we consider storing only the diff between the previous and the current attribute values, instead of a full copy of the data, to reduce storage usage?

This way, to revert, you would need to loop over all past revisions and compile their differences to get the content of the revision you want, but it would save a lot of storage for attributes that contain large texts with small changes.

nonoesp · 2023-05-01T11:43:43Z

Hi, @e200 are you proposing to do diffing and deltas on the fields that have been edited or only storing edited fields?

I don't think this is something we'll add support for at this time as this package is minimally maintained.

e200 · 2023-05-02T08:00:21Z

Hi @nonoesp. In this case it would store only each field's diff and, as a consequence, only the fields that changed would be stored.
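
As a rough illustration of what storing only the changed fields could look like, here is a minimal Python sketch. The `diff_attributes` helper is hypothetical and not part of this package; it assumes flat key/value attributes and does not track removed keys.

```python
def diff_attributes(previous: dict, current: dict) -> dict:
    """Return only the fields whose value differs from the previous revision.

    Hypothetical helper for illustration: assumes flat attributes and
    ignores keys that were removed between revisions.
    """
    return {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }


previous = {"name": "Shoes", "price": 100, "stock": 50}
current = {"name": "Shoes", "price": 120, "stock": 50}

# Only the changed field ends up in the stored revision.
delta = diff_attributes(previous, current)
print(delta)  # {'price': 120}
```

Diffing inside a single large text field (rather than per field) would need a text diff algorithm on top of this, which the sketch leaves out.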

wize-wiz · 2024-05-08T16:30:12Z

@e200 You need to come up with an algorithm that determines the point at which the model was in its original (or current) state.

From that point on, each difference saved for each field has to be applied recursively to get back to its original past state; only then is the full context of that particular model complete. A text with 300 words and 50 different revisions (textual comparison) made only out of differences would take a whole lot of code and precision to get the full context of a revision in the past. So to get back to revision number 20, I need to go through 30 revisions. For model attributes this would be the same process.
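
The walk back described above can be sketched roughly as follows. This assumes a storage layout nobody in this thread has confirmed: the current state is kept in full, and each revision stores the values its changed fields had before the change (a reverse delta). The `rollback` name and the sample data are illustrative only.

```python
def rollback(current: dict, reverse_deltas: list, steps: int) -> dict:
    """Undo the newest `steps` revisions, one reverse delta at a time.

    `reverse_deltas` is ordered oldest to newest. Each entry holds the values
    the changed fields had BEFORE that revision (an assumption of this sketch).
    """
    if not 0 <= steps <= len(reverse_deltas):
        raise ValueError("cannot roll back that many revisions")
    state = dict(current)
    # Walk from the newest revision towards the past; going from
    # revision 50 back to revision 20 would mean steps=30.
    for delta in reversed(reverse_deltas[len(reverse_deltas) - steps:]):
        state.update(delta)
    return state


current = {"title": "Final", "body": "third"}
reverse_deltas = [
    {"body": "first"},                     # undoes revision 2
    {"title": "Draft", "body": "second"},  # undoes revision 3
]

print(rollback(current, reverse_deltas, 1))  # {'title': 'Draft', 'body': 'second'}
print(rollback(current, reverse_deltas, 2))  # {'title': 'Draft', 'body': 'first'}
```

The cost grows with the distance between the current revision and the requested one, which is the point being made about revision 20.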

Then we need to talk about the benefit ;)

Birleyici · 2024-11-25T06:18:06Z

Hi @wize-wiz,

Thank you for sharing your thoughts! However, I believe there is a misunderstanding regarding the diff approach, and I'd like to clarify its functionality.

The diff method only stores the changed fields, so when a specific version is retrieved, only the changes in that version are applied. Therefore, it is not necessary to process all previous versions in sequence. Here's an example to explain this more clearly:

Example:

Initial Data (Version 1):
{
"name": "Shoes",
"price": 100,
"stock": 50
}

Version 2: (Only price changed.)
{
"price": 120
}

Version 3: (Both name and price changed.)
{
"name": "Running Shoes",
"price": 150
}

When Version 3 is retrieved:

Only the fields stored in Version 3 (name and price) are applied.
There is no need to process the earlier versions, as Version 3 contains the complete summary of its changes.

Advantages of the Diff Approach:

The diff method stores only the changed fields, which significantly reduces database storage requirements.
Since each version contains its changes as a summary, it eliminates the need to reprocess previous versions.
Performance efficiency is enhanced, as only the requested version’s data is processed.

Conclusion:
When implemented correctly, the diff method is highly efficient in terms of storage and performance. It avoids the need to sequentially process all previous versions. Instead, it provides a change-based approach where only the requested version's data is applied.

I hope this explanation clarifies the situation. Let me know if you'd like to discuss this further! 😊

wize-wiz · 2024-11-25T22:14:40Z

@Birleyici

I understood it the first time. The topic we're talking about is called state management and this particular part is mostly known as "Change-Only Saved State". This is not a new thing and is quite a big subject in the field of state management.

The issue I discussed in my comment, when saving only "changes" (deltas) instead of the entire model state, is state reconstruction complexity and potential for incomplete or corrupted data.

Here's a short version:

To reconstruct the full state of the model at any given version, you must replay all changes from the initial version up to the desired version.
If a single intermediate version's change is missing or corrupted, the reconstruction process fails or produces incorrect results.
Reconstructing a specific version becomes increasingly slow as the number of changes (versions) grows. Every query or operation that depends on the state requires traversing and applying all prior deltas. This can lead to scalability issues in large datasets or systems with frequent changes.
Any error in a saved change (e.g., an incorrect delta or a skipped update) propagates to all subsequent versions, potentially invalidating the entire version history.
Tracking and debugging issues becomes harder since you don't have a clear snapshot of the full state at each version, making it more difficult to verify the correctness of the model at a particular version.
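
The first two points can be made concrete with a small forward-replay sketch in Python. It assumes version 1 is a full snapshot and that deltas are keyed by consecutive version numbers starting at 2; `reconstruct` is a hypothetical helper, not an API of this package.

```python
def reconstruct(initial: dict, deltas: dict, target: int) -> dict:
    """Rebuild the state at `target` by replaying every delta from version 2 on.

    `deltas` maps version number -> changed fields. A gap in the chain raises
    instead of silently producing a wrong state.
    """
    state = dict(initial)
    for version in range(2, target + 1):
        if version not in deltas:
            raise KeyError(f"delta for version {version} is missing")
        state.update(deltas[version])
    return state


initial = {"status": "draft", "views": 0}
deltas = {2: {"views": 10}, 3: {"status": "published"}, 4: {"views": 25}}

print(reconstruct(initial, deltas, 4))  # {'status': 'published', 'views': 25}

# Losing a single intermediate delta breaks every later version.
del deltas[3]
try:
    reconstruct(initial, deltas, 4)
except KeyError as error:
    print("reconstruction failed:", error)
```

The loop runs once per stored version, so reconstruction time grows linearly with the length of the history.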

If you meant that a state is saved always from version 1 to the changes made "in the now", I don't see how this will work.

Let's take your example: a Product model starts at version 1 (the initial state). We only capture the changes, never the entire model, and I'll add a 4th version.

// version 1
{
  "name": "Shoes",
  "price": 100,
  "stock": 50
}

// version 2
{
   "price": 120
}

// version 3
{
   "name": "Running Shoes",
   "price": 150
}

// version 4
{
   "price": 90,
   "stock": 10
}

Our snapshot of the current state of the model is version 4:

// version 4 (snapshot)
{
  "name": "Running Shoes",
  "price": 90,
  "stock": 10
}

If we want to reconstruct the model's state from version 1 to version 4, we need to apply all versions in order. Let's instead reconstruct the model by jumping straight to version 4, without applying the versions in between: the model ends up in an incorrect state. The price is right, the stock is right, but the name is wrong.

{
  "name": "Shoes",
  "price": 90,
  "stock": 10
}
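
The same comparison can be reproduced in a few lines of Python. This is only a sketch using the four versions from the example; the `apply_delta` helper is a made-up name and simply overlays changed fields on a state.

```python
from functools import reduce

versions = [
    {"name": "Shoes", "price": 100, "stock": 50},  # version 1 (full state)
    {"price": 120},                                # version 2
    {"name": "Running Shoes", "price": 150},       # version 3
    {"price": 90, "stock": 10},                    # version 4
]


def apply_delta(state: dict, delta: dict) -> dict:
    """Overlay the changed fields on a copy of the state."""
    return {**state, **delta}


# Replaying versions 2, 3 and 4 in order on top of version 1.
replayed = reduce(apply_delta, versions[1:], versions[0])
print(replayed)  # {'name': 'Running Shoes', 'price': 90, 'stock': 10}

# Jumping straight from version 1 to version 4 loses the rename.
skipped = apply_delta(versions[0], versions[3])
print(skipped)  # {'name': 'Shoes', 'price': 90, 'stock': 10}
```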

Secondly, I've written quite a lot of stuff in just a few minutes. If it was as simple as you clarified in your comment, you would've written a prototype by now ;) If you have any ideas for a repository where you started to work on an implementation to incorporate "Change-Only Saved State", I'm all ears, because I'm on the brink of writing such a thing for a Rust based program and might do it in PHP for Laravel just as well. But you do need to get up-to-date with all the details about this particular topic.

Cheers
