-
Notifications
You must be signed in to change notification settings - Fork 442
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add in-memory I/O using hFILE fixed buffers #590
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR.
Rob has a far greater knowledge of the hfile overloading code than myself so I'll let him add more comments on that side of thing, specifically the double buffering questions.
hfile.c
Outdated
FILE *file = fopen(filename, mode); | ||
if (!file) goto error; | ||
|
||
fseek(file, 0, SEEK_END); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You need to check the return value of fseek, incase someone has given a non-seekable stream to this function. (Eg /dev/stdin)
hfile.c
Outdated
char* buffer = malloc(len); | ||
if(buffer == NULL) | ||
{ | ||
errno = ENOMEM; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manually setting errno isn't recommended. Malloc will do this itself, as would the fread below. Indeed we're obscuring the fread return as it may have other more meaningful errno values.
hfile.c
Outdated
@@ -821,6 +862,41 @@ static int init_add_plugin(void *obj, int (*init)(struct hFILE_plugin *), | |||
return 0; | |||
} | |||
|
|||
hFILE *hopenv_mem(const char *filename, const char *mode, va_list args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if the existing hopen_mem, which is strictly URL based rather than generic memory based, should be reimplemented in terms of the generic hopenv_mem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a good idea, though in hopenv_mem
there isn't (and I don't think it would be a good idea to) a way to specify a different buffer size from the length of the buffer that is filled - this is needed in hopen_mem
. I think it would be a good to create a function create_hfile_mem
, which would do the functionality in common with both functions (which I have implemented below)
@jkbonfield thanks for the comments, the double buffering issue is actually to do with the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please review.
hfile.c
Outdated
@@ -615,6 +615,39 @@ static hFILE *hopen_fd(const char *filename, const char *mode) | |||
return NULL; | |||
} | |||
|
|||
static hFILE *hpreload_fd(const char *filename, const char *mode) | |||
{ | |||
if(!strchr(mode, 'r')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should check mode
value. If NULL
, strchr
will seg fault.
hfile.c
Outdated
static hFILE *create_hfile_mem(char* buffer, const char* mode, size_t buf_filled, size_t buf_size) | ||
{ | ||
hFILE_mem *fp = (hFILE_mem *) hfile_init_fixed(sizeof(hFILE_mem), mode, buffer, buf_filled, buf_size); | ||
if (fp == NULL) { free(buffer); return NULL; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should let the caller function free buffer
.
hfile.c
Outdated
@@ -735,6 +777,8 @@ static hFILE *hopen_mem(const char *url, const char *mode) | |||
hts_decode_percent(buffer, &length, data); | |||
} | |||
|
|||
return create_hfile_mem(buffer, mode, length, size); | |||
|
|||
hFILE_mem *fp = (hFILE_mem *) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is unreachable.
hfile.c
Outdated
hFILE_mem *fp = (hFILE_mem *) | ||
hfile_init_fixed(sizeof (hFILE_mem), mode, buffer, length, size); | ||
if (fp == NULL) { free(buffer); return NULL; } | ||
return create_hfile_mem(buffer, mode, length, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You still need to free buffer
. I would check the value returned by create_hfile_mem
and, if NULL
, free(buffer)
.
Had another look at this as part of trying to solve the double buffering issue for the hfile_ref structure. I've changed this so the buffer doesn't get freed on |
G131 asks @jmarshall do you have time and the inclination to take a peek? |
@dkj: Thanks for the notification, which I have only just seen; will take a peek. |
The discussion at the start of hfile.c regarding mobile vs immobile windows is now incorrect, as in-memory streams can be "mobile" again. (I'm rather confused on the terminology though.) It probably needs to be explained in a different way, such as a fixed-size memory buffer, with a comment about variable-sized mobile in-memory buffers also being supported. I'll look at updating this as I have a few other tweaks to do I think. |
hfile.c
Outdated
} | ||
|
||
hFILE *fp = hpreload_fd(fname, mode); | ||
if (fp == NULL && errno == ENOENT) errno = EPROTONOSUPPORT; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unsure why this line exists. hpreload_fd can return NULL/ENOENT for valid reasons, and ENOENT is suitable then I think.
hfile.c
Outdated
void hfile_destroy(hFILE *fp) | ||
{ | ||
int save = errno; | ||
if (fp) free(fp->buffer); | ||
if (fp && fp->backend != &mem_backend) free(fp->buffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This extra check means data: hopens now leak memory. Why are we not freeing here?
Changing this back makes test/hfile pass valgrind leak checks provided we also remove the now redundant free(internal_buf)
calls.
I think it is probably fine to accept that obtaining the internal buffer doesn't equate to now also owning this buffer. If we wish to have the concept of taking ownership (and hence freeing responsibilities) we'd need an additional flag somewhere.
I'm not convinced this works. I created a noddy test case so I could strace it and see what happens:
This simply rereads the file each time. It hasn't preloaded at all. The cause is due to hopen and unknown_scheme. It runs We could change this code in hopen, or make find_scheme_handler always return a scheme instead of returning NULL, so that the unknown scheme is always used (which boils own to straight hopen_fd mostly), but I'm not sure it's the right approach. I assume the idea of adding a preload argument comes from #417. If this is what we're trying to do, then then hacking the unknown scheme handler to support preload feels wrong. I think hopen() itself should check for preload and if found then proceed with a read loop to preload the file. (If this fails then the entire open fails.) This would make preload uri agnostic too. I'll have a play and see if this works. |
Hmm, adding this to va_args doesn't help either really. Normally with va_args we either have a scan along it until we hit a known terminator (eg NULL):
Or we have a format string that controls what the arguments are (think printf("%d %s", 10, "foo")), or we have some specific logic that knows precisely how many arguments are parsed. Eg in this bit of test/hfile.c
The problem here is adding preload into this just breaks it. We don't know what follows, if anything, and we can't stop processing our va list. The only thing we can do is check if the first argument is preload and then stop, but that then forbids preloading of anything else that wants to use va_list for its own purposes. In short, we make preload only viable for some things, which is perhaps what the intention was, but if so that's not at all clear. If we're going down that road, eg file only, then frankly we'd be better off just defining it as "preload:filename" and adding a scheme handler for it instead of using the varargs interface. Actually we could even have preload:uri so that "preload:ftp://foo/bar.bam.bai" worked. This layering works, whereas adding it to varags doesn't. |
@ThomasHickman I have a branch in my home dir at Sanger (~jkb/work/samtools_master/htslib) named mem_io_jkb. It's a work in progress, but attempts to fix some of the above. The preload now works and the buffer is owned by hfile still unless explicitly stolen, so we don't leak memory in the common use case. I'm unsure though of whether this is better than the original implementation in #417 as I haven't had time to go over that code. There were ongoing discussions between @jmarshall and @pkrusche which fizzled out, but it does appear it mostly did the right thing already bar the preload bit. What was happening with that and this PR? Any comments @jmarshall? You said you'd take a look at this, and obviously also looked at the other PR. Which would you say is the best starting point to work from? |
@jkbonfield, thanks for the corrections, I'll add them into my branch. This PR is my attempt to implement suggestions by @jmarshall on the original pull request: #417 (comment) - I think this PR is a better place to go from than the original implementation, as this was written when the varargs version of hopen wasn't implemented and therefore uses a very inelegant method of setting the buffer (as detailed in the conversation around that pull request) |
The buffer used in preload is now freed. If we want to take ownership of it and avoid it being freed, the new hfile_mem_steal_buffer function can be called. Also changed the prototype of hfile_mem_get_buffer to return the buffer directly instead of via a pointer to a pointer as it's simpler and permits inline usage. Changed preload to work on more than straight hFILE_fd, although in the current incarnation it still won't work on anything that attempts to use the varags hopen interface for anything else.
Replaces the vargargs method of specifying that the url should be preloaded. hopen_preload has also been renamed to hpreload to make way for a hopen_preload function that acts as an interface for hfile_add_scheme_handler
Looks good. I tested it works on preload:ftp://blah too, as we'd expected. Running more tests but hopefully merge soon. |
@@ -908,21 +1019,25 @@ static const struct hFILE_scheme_handler *find_scheme_handler(const char *s) | |||
|
|||
hFILE *hopen(const char *fname, const char *mode, ...) | |||
{ | |||
hFILE *fp = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Edit: ignore that - confused by diff here not showing entire function.
Squashed, resolved conflicts and merged as 8003166 |
This PR implements what was proposed in #417, by implementing writing for fixed buffers and implementing a wrapper.
To create a in-memory hFILE:
you can also access the internal buffer using:
This PR also implements preloading the enter contents of a file into a fixed buffer: