buffer in a tmpfile #1

Open
jbenet opened this issue Apr 3, 2015 · 12 comments
Comments

@jbenet
Owner

jbenet commented Apr 3, 2015

hashpipe needs to buffer all the data as the hash isn't decidable until the last bit. we could buffer on disk with a tmpfile instead of memory.

@jbenet
Owner Author

jbenet commented Apr 4, 2015

18:39 <Luke> jbenet: if I use hashpipe, it has to read the whole input into memory, right?
18:40 <jbenet> Luke: yeah, for now. https://github.com/jbenet/hashpipe/issues/1
18:40 <Luke> jbenet: yeah but either way that has to live somewhere. swapping might also take care of it
18:41 <jbenet> Luke: yeah. ideally it would be an option with a specified filename, in case you need to control it.

@jbenet
Owner Author

jbenet commented Apr 4, 2015

<Tv`> jbenet: you could use an unlinked tempfile

thanks @tv42!
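
A rough sketch of the unlinked-tempfile idea, in Python rather than hashpipe's own code. The function name `verify_then_emit`, the `buffer_dir` parameter, and the use of a plain SHA-256 hex digest are all assumptions made for illustration; hashpipe's actual hash format and flags may differ. On POSIX systems `tempfile.TemporaryFile` creates a file with no visible directory entry, so the buffer is reclaimed when the file is closed or the process dies.

```python
import hashlib
import io
import shutil
import tempfile


def verify_then_emit(src, dst, expected_hex, buffer_dir=None, chunk_size=64 * 1024):
    """Copy src to dst only if the SHA-256 of all of src equals expected_hex.

    Hypothetical helper: data is buffered on disk, not in memory, and
    nothing is written to dst until the whole input has been verified.
    """
    digest = hashlib.sha256()
    # buffer_dir lets the caller choose where the buffer lives (assumed option).
    with tempfile.TemporaryFile(dir=buffer_dir) as buf:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            buf.write(chunk)
        if digest.hexdigest() != expected_hex:
            return False  # mismatch: dst has received no bytes
        buf.seek(0)
        shutil.copyfileobj(buf, dst, chunk_size)
    return True
```

Memory use stays bounded by `chunk_size` regardless of input size; the cost is one extra write and read of the data on disk.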

@jmscott

jmscott commented Apr 4, 2015

buffering is a tricky issue and truly breaks the advantages of a pipeline. in practice a solid script would always check the exit code of hashpipe, regardless of the output. perhaps an option to hashpipe?

@jbenet
Owner Author

jbenet commented Apr 4, 2015

It should be an option at the very least

@jmscott

jmscott commented Apr 4, 2015

buffering the blob still does not remove the need to check the exit code in a pipeline of any consequence, so why buffer? for example, hypothetically the write of the buffered, verified data on the pipeline could fail with a partial write to the output. if hashpipe fails to digest properly, simply abort with a burp to stderr and a non-zero exit code and let the caller worry about how to harden the environment. in other words, the exit code indicates correctness as much as the output.

@jbenet
Owner Author

jbenet commented Apr 4, 2015

> if hashpipe fails to digest properly, simply abort with a burp to stderr and a non-zero exit code

yep, this is what it already does.

the buffering is to avoid nuking the machine's memory. swap may work fine, but it's still not ideal. (e.g. if i know i'm about to get a 100MB exec i may want to buffer it in a file of my choosing)

@jmscott

jmscott commented Apr 5, 2015

so hashpipe buffers (to temp file) the whole blob before writing to output?
if so then why? the exit code would indicate correctness, not the output.
a caller will have to check the exit code anyway in a strict environment,
so it's not clear to me what advantage buffering gives.

-j

@jbenet
Owner Author

jbenet commented Apr 5, 2015

it does not currently buffer to a temp file. i'm saying there should be an option to.

> the buffering is to avoid nuking the machine's memory.

suppose the file being input is 1TB.

@kpcyrd

kpcyrd commented Oct 19, 2015

@jmscott assuming ./a | hashpipe $hash | ./b, ./b has no access to hashpipe's exit code, but will start processing as soon as it gets data on stdin.

@jbenet you could chunk the file, calculate a hash for each chunk and then hash the list of hashes. On start, it reads and verifies the list of hashes with argv[1] and is then able to verify and write each chunk to stdout.
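
The chunked scheme described above could look roughly like the following Python sketch. Everything here is an assumption for illustration: the function names, SHA-256 as the hash, hex digests joined into one string as the "list of hashes", and a `src.read(n)` that returns exactly `n` bytes unless it hits end of input (true for `io.BytesIO` and blocking buffered streams). It is not how hashpipe works today.

```python
import hashlib
import io


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def build_hash_list(src, chunk_size):
    """Return (per-chunk hashes, root hash over the list of hashes)."""
    hashes = []
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        hashes.append(sha256_hex(chunk))
    root = sha256_hex("".join(hashes).encode("ascii"))
    return hashes, root


def verify_chunks(hashes, root, src, dst, chunk_size):
    """Check the hash list against root, then emit each chunk once it verifies."""
    if sha256_hex("".join(hashes).encode("ascii")) != root:
        raise ValueError("hash list does not match root hash")
    for index, expected in enumerate(hashes):
        chunk = src.read(chunk_size)
        # A truncated or altered stream fails here, at the first bad chunk.
        if sha256_hex(chunk) != expected:
            raise ValueError(f"chunk {index} failed verification")
        dst.write(chunk)
    if src.read(1):
        raise ValueError("unexpected data after the last chunk")
```

This bounds memory to one chunk, but verified chunks reach `dst` before the end of the stream has been seen, so a stream cut off partway still leaves the consumer holding a verified prefix.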

@jbenet
Owner Author

jbenet commented Oct 19, 2015

  • no, ./b should NOT receive any data until the whole hash is verified. this is a well-known attack vector: attackers can cut the download mid-stream and leave things in inconsistent states.

@kpcyrd

kpcyrd commented Oct 19, 2015

@jbenet I think you misread my comment. ./b starts processing data as soon as it gets any, which must not happen until the hash is verified. This is the reason why it can't just write to stdout and expect ./b to wait for an exit code.

@kpcyrd

kpcyrd commented Oct 19, 2015

Never mind, noticed you've been referring to the 2nd part.
