Consider integrating with objectdiff #8

Open
robertzk opened this issue Apr 18, 2015 · 2 comments

Comments

@robertzk

objectdiff (https://github.com/robertzk/objectdiff)

Wonder if there's anything we can collaborate on?

@edwindj
Owner

edwindj commented Apr 24, 2015

Dear Robert,

Sorry for my late response: I was on leave.
I'd be happy to cooperate/integrate: do you have suggestions on what to integrate?

Best,

Edwin

On 2015-04-18 22:43 GMT+02:00, Robert Krzyzanowski wrote:

objectdiff https://github.com/robertzk/objectdiff

@robertzk
Author

Thanks for the response, Edwin!

objectdiff provides a function, also called objectdiff, that computes a closure containing the "diff" between two arbitrary R objects. For example, given:

iris2 <- iris
iris2$new_column <- 1
patch <- objectdiff(iris, iris2)

Then patch stores only new_column instead of duplicating the full data set. This is particularly useful for wide data sets with hundreds or thousands of columns.

# Proof that the patch is smaller
> object.size(patch)
1896 bytes
> object.size(iris)
7088 bytes
> object.size(iris2)
8384 bytes
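Below is a minimal sketch of the idea, restricted to data.frames with the same number of rows. It is not objectdiff's actual implementation or API. The helper make_column_patch is hypothetical. It captures only the columns that were added or changed, and it returns a closure that rebuilds the new data.frame from the old one.

```r
# Hypothetical helper for illustration; NOT objectdiff's actual API.
make_column_patch <- function(old, new) {
  stopifnot(is.data.frame(old), is.data.frame(new), nrow(old) == nrow(new))

  # Names of columns that are new, or whose contents differ from `old`.
  changed <- Filter(function(col) {
    !(col %in% names(old)) || !identical(old[[col]], new[[col]])
  }, names(new))

  delta <- new[changed]     # only the added/changed columns are kept
  col_order <- names(new)   # final column order; drops columns absent from `new`

  # Remove the full inputs so the returned closure does not keep them alive.
  rm(old, new)

  function(df) {
    if (length(delta) > 0) df[names(delta)] <- delta
    df[col_order]
  }
}

iris2 <- iris
iris2$new_column <- 1
patch <- make_column_patch(iris, iris2)

names(environment(patch)$delta)        # only "new_column" is stored
isTRUE(all.equal(patch(iris), iris2))  # applying the patch rebuilds iris2
```

The closure's environment holds only `changed`, `delta` and `col_order`. This is why such a patch can be much smaller than a second full copy of the data.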

If you apply several modifications to a data.frame, you only need to keep a copy of the initial data set plus the succession of patches. Applying the patches in order reconstructs the final data.frame. This has two advantages: (1) you know what changed at each step, and (2) it uses much less memory than storing every intermediate copy.
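The replay step can be sketched with base R's Reduce, assuming each patch is a function that maps one data.frame to the next. The constructors set_column and drop_column are hypothetical stand-ins for objectdiff's patch closures, not part of its API.

```r
# Hypothetical patch constructors; each returns a function df -> df.
set_column <- function(name, value) {
  function(df) {
    df[[name]] <- value
    df
  }
}
drop_column <- function(name) {
  function(df) {
    df[[name]] <- NULL
    df
  }
}

initial <- iris   # the single stored copy of the data
patches <- list(
  set_column("new_column", 1),
  set_column("Sepal.Ratio", iris$Sepal.Length / iris$Sepal.Width),
  drop_column("new_column")
)

# Apply every patch in order, starting from the initial data.frame.
final <- Reduce(function(df, p) p(df), patches, initial)

# accumulate = TRUE returns every intermediate state (initial first),
# which makes it easy to inspect what each step changed.
history <- Reduce(function(df, p) p(df), patches, initial,
                  accumulate = TRUE, simplify = FALSE)
```

Only `initial` and the list of small closures need to be stored. The intermediate states are recomputed on demand.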

Going further, objectdiff provides tracked_environment, which records every change to an R environment as a patch obtained from objectdiff. My question, then, is whether we could plot the changes to, say, a data frame by mapping the patches obtained from objectdiff to the plottable diffs produced by daff.

Do you think this would be an interesting project? I could probably dedicate a weekend to it.
