internal not working for 8-GB device #19

Closed
jtian0 opened this issue Nov 6, 2020 · 2 comments

jtian0 (Collaborator) commented Nov 6, 2020

Testing shows it works on a 16-GB V100: #18 (comment)

For a large dataset such as Parihaka (4.8 GB), an 8-GB device cannot generate correct unzipped data. The peak memory usage should be around 1.5x the input datum size (in this case, 7.5 GB), because the workflow can be stated as

  1. the datum (1x size) is loaded to the GPU, generating quant. code of (at most) 0.5x size
  2. the outlier CSR is built, with a size that depends on the compression ratio; only afterwards is the 1x datum freed
    In one case, 4.8 GB generates an 800-MB CSR; together this may exceed device memory capacity (a back-of-envelope check is sketched after this list).
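As a rough illustration (not part of cusz), the sketch below estimates that peak footprint from the input size and an assumed CSR size, and compares it against the free device memory reported by cudaMemGetInfo. The 0.5x factor assumes f32 input with 2-byte quant codes; the CSR size is a caller-supplied guess.

```cpp
// Minimal sketch, not cusz code: back-of-envelope device-memory check.
#include <cuda_runtime.h>
#include <cstdio>

bool fits_on_device(size_t input_bytes, size_t csr_bytes_estimate) {
  size_t free_bytes = 0, total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);

  // 1x input and 0.5x quant code live simultaneously; the CSR is built
  // before the input buffer is freed, so its estimate is added on top.
  size_t peak = input_bytes + input_bytes / 2 + csr_bytes_estimate;

  printf("peak estimate: %.2f GB, free: %.2f GB\n",
         peak / 1e9, free_bytes / 1e9);
  return peak <= free_bytes;
}
```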
> ./bin/cusz -f32 -m r2r -e 1.0e-4.0 -i ~/Parihaka_PSTM_far_stack.f32 -D parihaka -z          
[info] datum:           /path/to/Parihaka_PSTM_far_stack.f32 (4850339584 bytes) of type f32
[dbg]  original len:    1212584896 (padding: 34823)
[dbg]  Time loading data:       3.19802s
[info] quant.cap:       1024    input eb:       0.0001
[dbg]  Time inspecting data range:      0.0232662s
[info] eb change:       (input eb) x 12342.2 (rng) = 1.23422 (relative-to-range)
[dbg]  2-byte quant type, 4-byte internal Huff type

[info] Commencing compression...
[info] nnz.outlier:     16607   (0.00136955%)
[dbg]  Optimal Huffman deflating chunksize      131072
[info] entropy:         3.85809
[dbg]  Huffman enc:     #chunk=9252, chunksze=131072 => 212256108 4-byte words/6792051551 bits
[dbg]  Time writing Huff. binary:       0.501431s
[info] Compression finished, saved Huffman encoded quant.code.
[dbg]  Time tar'ing     1.01489s
[info] Written to:      /path/to/Parihaka_PSTM_far_stack.f32.sz

> ./bin/cusz -i ~/Parihaka_PSTM_far_stack.f32.sz -x --origin ~/Parihaka_PSTM_far_stack.f32 --skip write.x 
[info] Commencing decompression...
[info] Huffman decoding into quant.code.
[info] Extracted outlier from CSR format.
[info] Decompression finished.

[info] Huffman metadata of chunking and reverse codebook size (in bytes): 150336
[info] Huffman coded output size: 849024432
[info] To compare with the original datum

[info] Verification start ---------------------
| min.val             -6893.359375
| max.val             5448.8828125
| val.rng             12342.2421875
| max.err.abs.val     6893.359375
| max.err.abs.idx     706941819
| max.err.vs.rng      0.55851759107283360795
| max.pw.rel.err      1
| PSNR                32.837037623060211899
| NRMSE               0.022811199295531395248
| correl.coeff        -NAN
| comp.ratio.w/o.gzip 5.709997
[info] Verification end -----------------------

[info] Decompressed file is written to /path/to/Parihaka_PSTM_far_stack.f32.szx.
[info] Please use compressed data (*.sz) to calculate final comp ratio (w/ gzip).
[info] Skipped writing unzipped to filesystem.

misc. todo:

  1. the "written to filesystem" info message is not correct
  2. verification is too slow
  3. trailing slash in the printed output file path
@jtian0 jtian0 self-assigned this Nov 6, 2020
@jtian0 jtian0 changed the title not working for 8-GB device [internal] not working for 8-GB device Nov 6, 2020
@jtian0 jtian0 changed the title [internal] not working for 8-GB device internal not working for 8-GB device Aug 11, 2021
jtian0 (Collaborator, Author) commented Feb 28, 2022

The problem was previously solved in an ad hoc manner: releasing memory online to keep the program running. However, deallocation is too expensive, so better solutions are needed, e.g., a data partitioner: only a (small) portion of the data is compressed at a time. A sketch of that idea follows.
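A minimal sketch of the partitioning idea, not the cusz implementation: compress_chunk is a hypothetical placeholder for the per-chunk pipeline. The point is only that device buffers sized for one chunk are reused across iterations, so peak device memory is bounded by the chunk size rather than by the full datum.

```cpp
// Minimal sketch, not cusz code: chunked (partitioned) compression driver.
#include <cuda_runtime.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk compressor; stands in for the real pipeline,
// which would append each chunk's output to the archive.
static void compress_chunk(const float* d_in, size_t len) { /* ... */ }

void compress_partitioned(const std::vector<float>& host_data, size_t chunk_len) {
  float* d_in = nullptr;
  cudaMalloc(&d_in, chunk_len * sizeof(float));  // allocated once, reused

  for (size_t off = 0; off < host_data.size(); off += chunk_len) {
    size_t len = std::min(chunk_len, host_data.size() - off);
    cudaMemcpy(d_in, host_data.data() + off, len * sizeof(float),
               cudaMemcpyHostToDevice);
    compress_chunk(d_in, len);
  }
  cudaFree(d_in);
}
```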

jtian0 (Collaborator, Author) commented Oct 21, 2022

Merged into #70.

@jtian0 jtian0 closed this as completed Oct 21, 2022