(question) cusz changes input data after compression #70

Open
maltempi opened this issue Oct 21, 2022 · 5 comments

Comments

@maltempi

Hey everybody,

I've been using the cusz APIs, and after compression I noticed that the input data is no longer the same; it appears to have been modified. This can be reproduced with the cusz API example.

As far as I understand from the cusz wiki, the nondestructive=true configuration should prevent this behavior, but toggling the flag made no difference. I also looked through the source code and could not find any implementation of it.

Are my assumptions correct? Should I assume that cusz currently modifies the input data when compressing, or am I doing something wrong?
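One way to confirm that a compressor call modifies its input, without keeping a second full copy around, is to hash the buffer before and after the call. The following is a minimal host-side sketch under stated assumptions: `destructive_compress_stub` and `input_preserved` are hypothetical names introduced here for illustration and are not part of the cusz API, and the stub simply zeroes the buffer to mimic the reported behavior. If the buffer lives in GPU memory, it would first have to be copied back to the host (or hashed on the device) for this check to apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// FNV-1a 64-bit hash over the raw bytes of a buffer.
inline std::uint64_t fnv1a(const void* data, std::size_t nbytes) {
    assert(data != nullptr || nbytes == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < nbytes; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hypothetical stand-in for a compressor that overwrites its input.
// NOT the cusz API; it only mimics the behavior described in this issue.
inline void destructive_compress_stub(std::vector<float>& input) {
    for (float& v : input) v = 0.0f;
}

// Returns true if `compress` left `input` byte-for-byte unchanged
// (up to the negligible chance of a 64-bit hash collision).
template <typename Compress>
bool input_preserved(std::vector<float>& input, Compress compress) {
    const std::size_t nbytes = input.size() * sizeof(float);
    const std::uint64_t before = fnv1a(input.data(), nbytes);
    compress(input);
    return fnv1a(input.data(), nbytes) == before;
}
```

Calling `input_preserved(data, destructive_compress_stub)` on non-zero data returns `false`, which is the symptom described above; swapping in the real compressor call is left to the reader.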

Thanks very much!

@jtian0
Collaborator

jtian0 commented Oct 21, 2022

Hi @maltempi,

Sorry about your experience. Yes, you are correct: the API exposes the option, but the functionality behind it is not implemented.

In the meantime, can you duplicate the input data before running the compressor so that your experiments are not affected? I will fix this issue soon. The first fix will be to duplicate the input data internally, which I should be able to deliver quickly. Later, I will rewrite the memory management (and maybe the file format).

In addition, the current memory footprint does not scale well. If you run into this, it may show up as a "segmentation fault" (it won't report "out-of-memory" directly); please note it here as well.
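The duplicate-before-compressing workaround can be sketched as follows. This is a host-side illustration only: `compress_stub` is a hypothetical stand-in that scribbles over its input (it is not the cusz API), and `compress_preserving_input` is a helper name made up for this example. For a buffer in GPU memory, the analogous step would be allocating a second device buffer with `cudaMalloc` and copying with `cudaMemcpy` using `cudaMemcpyDeviceToDevice` before handing the copy to the compressor.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a compressor call that may overwrite its input.
// NOT the cusz API; it destroys the data to mimic the reported behavior.
inline std::vector<unsigned char> compress_stub(float* data, std::size_t n) {
    assert(data != nullptr || n == 0);
    std::vector<unsigned char> archive(n, 0);  // placeholder "compressed" bytes
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = 0.0f;                        // input is clobbered here
    }
    return archive;
}

// Workaround: give the compressor a scratch copy so `original` is never touched.
// The copy costs an extra 1x the input size for the duration of the call.
inline std::vector<unsigned char> compress_preserving_input(
    const std::vector<float>& original) {
    std::vector<float> scratch(original);
    return compress_stub(scratch.data(), scratch.size());
}
```

Taking the input by `const` reference makes the guarantee visible in the signature: whatever the compressor does, it only ever sees the scratch copy.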

Thank you.

@maltempi
Author

maltempi commented Oct 24, 2022

Hey @jtian0, thank you very much for your prompt response! OK, I just wanted to confirm that I was heading in the right direction. I'll make a copy of the input before compressing it.

> In addition, the current memory footprint does not scale well. If you experience such an issue---can be a "segmentation fault" (it won't show "out-of-memory" directly), please also mark it here.

Thanks for the heads-up! I'll keep an eye on it.

@jtian0
Collaborator

jtian0 commented Nov 8, 2022

Hi @maltempi,

I made a temporary fix so that the input data is no longer destroyed, and I pushed it to an unstable branch. See this and this.

Internally, the compressor allocates an array for outliers by default (1x the input data size), which results in an extra memory footprint. The fix is therefore for demonstration only. Proper memory management requires a more thorough rework rather than a patch.
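To get a rough feel for what the extra outlier array means in practice, here is a back-of-the-envelope sketch. It is an assumption-laden estimate, not a description of cusz internals: it counts only the caller's input buffer plus one outlier array of the same size (the 1x figure mentioned above) and ignores every other internal allocation. `FootprintEstimate`, `estimate_footprint`, and `fits` are hypothetical names introduced for this example. On a GPU, the available-bytes figure could come from `cudaMemGetInfo`.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

struct FootprintEstimate {
    std::size_t input_bytes;    // the caller's uncompressed data
    std::size_t outlier_bytes;  // outlier array, ASSUMED to be 1x the input size
    std::size_t total() const { return input_bytes + outlier_bytes; }
};

// Lower-bound estimate for `num_elements` values of type T.
// Other internal buffers are deliberately not modeled.
template <typename T>
FootprintEstimate estimate_footprint(std::size_t num_elements) {
    // Guard against overflow when doubling the byte count in total().
    assert(num_elements <=
           std::numeric_limits<std::size_t>::max() / (2 * sizeof(T)));
    const std::size_t input_bytes = num_elements * sizeof(T);
    return FootprintEstimate{input_bytes, input_bytes};
}

// Pre-flight check: does the estimated footprint fit in the available memory?
inline bool fits(const FootprintEstimate& e, std::size_t available_bytes) {
    return e.total() <= available_bytes;
}
```

Because the estimate is a lower bound, a `false` result from `fits` is a strong hint that the run will fail, while `true` does not guarantee success.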

Conclusion

  • What's fixed: the input data is no longer destroyed
  • Known issue after the fix: increased memory footprint by default
  • Next fix: rewriting the memory management

@hyviquel

Hi @jtian0,

Is there any plan to fix this issue properly, that is, by rewriting the memory management?
We would like to use cusz, but this problem is a blocker for us, and the temporary solution is not sufficient for our needs.

Thanks!

@jtian0
Collaborator

jtian0 commented Dec 17, 2022

Hi @hyviquel,

Thank you for still having confidence in cusz. I apologize for the severe delay in development; travel and then coursework have kept me occupied through November and December until now (the final week of the semester).

And yes, it is planned, because it is also the blocker for any further plans for cusz. I think mid-December to mid-January could be a good window for me to address the backlog of issues, especially the memory footprint issue. I am reopening this issue and will keep it open until it is sufficiently resolved.

@jtian0 jtian0 reopened this Dec 17, 2022
This was referenced Jan 6, 2023

3 participants