Feature request: wrappers for malloc/calloc/realloc/free #345

Open
DrTimothyAldenDavis opened this issue Aug 19, 2021 · 6 comments

Comments

@DrTimothyAldenDavis

I'd like to try out c-blosc2 in SuiteSparse:GraphBLAS (see https://github.com/DrTimothyAldenDavis/GraphBLAS ). The package is used for C=A*B in MATLAB, and for the sparse matrix engine inside RedisGraph. It's also under consideration as the core sparse matrix library for Julia.

However, all of these applications have their own malloc/calloc/realloc/free memory managers. Julia has jl_malloc, MATLAB has mxMalloc, etc. There are TBB malloc functions that some applications use. I allow the application to tell me which malloc/calloc/realloc/free to use (via function pointers), and then I use that for all of my memory allocations. c-blosc2 calls the bare malloc/calloc/realloc/free, and there's no easy way to change this from the user application.

LZ4 has a mechanism like this (except it's a compile-time selection), so I could use LZ4 inside GraphBLAS if I compile it myself, and make LZ4_malloc do the same thing as the example below. But currently I can't use that strategy in c-blosc2, since c-blosc2 has no way for me to tell it which malloc/calloc/realloc/free to use.

If c-blosc2 could have something like the following, then I could set blosc2_malloc_function, etc, pointers to point to my malloc/etc functions. There are some other nuances ... the MATLAB mxMalloc is not thread-safe and must be placed in a critical section, so I have a bool that tells me whether the critical section is needed or not. Below is a simple version that ignores this issue. See https://github.com/DrTimothyAldenDavis/GraphBLAS/blob/9d36ba4fd0bec6edfae25077d5de6b4cfb7f4762/Source/GB_Global.c#L670 for details on how I handle that case.

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

//------------------------------------------------------------------------------
// global pointers somewhere, which default to the ANSI C11 functions:
void * (* blosc2_malloc_function  ) (size_t)         = malloc ;
void * (* blosc2_calloc_function  ) (size_t, size_t) = calloc ;
void * (* blosc2_realloc_function ) (void *, size_t) = realloc ;
void   (* blosc2_free_function    ) (void *)         = free ;

//------------------------------------------------------------------------------
// use these inside c-blosc2, not bare calls to malloc/calloc/realloc/free:

void *blosc2_malloc (size_t size)
{
    // ANSI C11 malloc, or provided malloc-compatible function
    return (blosc2_malloc_function (size)) ;
}

void *blosc2_calloc (size_t n, size_t size)
{
    // ANSI C11 calloc, or provided calloc-compatible function
    return (blosc2_calloc_function (n, size)) ;
}

void *blosc2_realloc (void *p, size_t size)
{
    // ANSI C11 realloc, or provided realloc-compatible function
    return (blosc2_realloc_function (p, size)) ;
}

void blosc2_free (void *p)
{
    // ANSI C11 free, or user-provided free-compatible function
    blosc2_free_function (p) ;
}

//------------------------------------------------------------------------------
// user application example:
void *my_malloc (size_t size)
{
    void *p = malloc (size) ;
    printf ("Hi, I am my_malloc: size %zu p %p\n" , size, p)  ;
    return (p) ;
}
void *my_calloc (size_t n, size_t size)
{
    void *p = calloc (n, size) ;
    printf ("Hi, I am my_calloc: n: %zu size %zu p %p\n" , n, size, p)  ;
    return (p) ;
}
void *my_realloc (void *p, size_t size)
{
    printf ("Hi, I am my_realloc\n") ;
    return (realloc (p, size)) ;
}
void my_free (void *p)
{
    printf ("Hi, I am my_free: %p\n", p) ;
    free (p) ;
}

int main (void)
{

    // initialize the pointers
    blosc2_malloc_function = my_malloc ;
    blosc2_calloc_function = my_calloc ;
    blosc2_realloc_function = my_realloc ;
    blosc2_free_function = my_free ;

    int *p = blosc2_malloc (10 * sizeof (int)) ;
    if (p == NULL) return (1) ;
    for (int k = 0 ; k < 10 ; k++)
    {
        p [k] = k ;
    }
    for (int k = 0 ; k < 10 ; k++)
    {
        printf ("%d ", p [k]) ;
    }
    printf ("\n") ;
    blosc2_free (p) ;
    return (0) ;
}
@DrTimothyAldenDavis
Author

Actually, a better way to handle the odd case of a malloc that is not thread-safe (the MATLAB mxMalloc) would be to add a critical section inside "my_malloc". Then blosc2_malloc, as above, would be used just as shown, with no critical section. So the above strawman would work fine in all use cases I can think of.
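A minimal sketch of that idea, assuming POSIX threads are available. The names `unsafe_malloc`, `unsafe_free`, `my_locked_malloc`, and `my_locked_free` are hypothetical; `unsafe_malloc` merely stands in for an allocator such as mxMalloc that must not be called concurrently.

```c
#include <pthread.h>
#include <stdlib.h>

// Hypothetical stand-ins for an allocator that is not thread-safe.
// Here they simply forward to the C library so the sketch is runnable.
static void *unsafe_malloc (size_t size) { return (malloc (size)) ; }
static void  unsafe_free   (void *p)     { free (p) ; }

// One lock guards every call into the unsafe allocator.
static pthread_mutex_t alloc_lock = PTHREAD_MUTEX_INITIALIZER ;

void *my_locked_malloc (size_t size)
{
    pthread_mutex_lock (&alloc_lock) ;
    void *p = unsafe_malloc (size) ;
    pthread_mutex_unlock (&alloc_lock) ;
    return (p) ;
}

void my_locked_free (void *p)
{
    pthread_mutex_lock (&alloc_lock) ;
    unsafe_free (p) ;
    pthread_mutex_unlock (&alloc_lock) ;
}
```

The application would then assign `my_locked_malloc` and `my_locked_free` to the proposed `blosc2_malloc_function` and `blosc2_free_function` pointers, so the library itself needs no knowledge of the lock.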

@DrTimothyAldenDavis
Author

For how this is done in LZ4, see https://github.com/lz4/lz4/releases/tag/v1.9.3 for a discussion, and https://github.com/lz4/lz4/blob/e78eeec86e696f318a0ad1e71d6ad50555d1c0a9/lib/lz4.c#L188 for how it's done in that library.
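As a rough sketch of how a compile-time hook could forward to run-time pointers: as far as I can tell from the linked lz4.c, building LZ4 with `LZ4_USER_MEMORY_FUNCTIONS` defined makes it expect `LZ4_malloc`, `LZ4_calloc`, and `LZ4_free` to be supplied at link time. That reading is an assumption, and the thread does not confirm that this alone is sufficient. The pointer names follow the proposal in this issue.

```c
#include <stddef.h>
#include <stdlib.h>

// Run-time selectable pointers, named as in the proposal in this issue.
void * (* blosc2_malloc_function ) (size_t)         = malloc ;
void * (* blosc2_calloc_function ) (size_t, size_t) = calloc ;
void   (* blosc2_free_function   ) (void *)         = free ;

// Assumption: lz4.c is compiled with LZ4_USER_MEMORY_FUNCTIONS defined,
// so it calls these three symbols instead of malloc/calloc/free.
// Each one forwards to whatever pointer is installed at run time.
void *LZ4_malloc (size_t s)
{
    return (blosc2_malloc_function (s)) ;
}

void *LZ4_calloc (size_t n, size_t s)
{
    return (blosc2_calloc_function (n, s)) ;
}

void LZ4_free (void *p)
{
    blosc2_free_function (p) ;
}
```

The selection of the hook is still made when LZ4 is compiled, but the allocator behind it can be swapped by the application before any compression call.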

@FrancescAlted
Member

Yeah, that has been on our TODO list for a long time and we would be glad to consider a pull request on this.

@DrTimothyAldenDavis
Author

Great!

It will take me a while before I can tackle an update to c-blosc2 as a pull request, but I'll keep you posted before I give it a try. The tricky part is not c-blosc2 itself, but all the other packages it relies on. LZ4 has its own method for redirecting malloc/calloc/free, but this is a compile-time control, not a run-time one (the latter is needed). So even LZ4 would need a patch, in my opinion. Unlike LZ4, the zlib package calls malloc/calloc/free directly, with no wrapper at all. So it would need patches to (say) replace malloc with blosc2_malloc, free with blosc2_free, etc. The zstd library has ZSTD_malloc and ZSTD_customMalloc (the latter looks promising), but I also see bare calls to malloc in some places, like in zstd/dictBuilder.

These fixes could be done but they would have the downside of requiring a patch to a set of 3rd-party libraries, which can be difficult to maintain. So I understand why this feature doesn't appear in c-blosc2 yet.

@ericonr

ericonr commented Jun 29, 2024

@DrTimothyAldenDavis why do you need every malloc call to use the external wrapper? Afaiu only the memory that's allocated internally, passed to the framework/library in use (Matlab, Julia, etc), and is from that point on managed by it, needs to be allocated using the framework's version of malloc; all dynamic internal buffers can use whatever they want. Of course, keeping track of which allocations need which free function isn't simple, but it could be simpler than changing every call in a code base.

@DrTimothyAldenDavis
Author

It's far too complicated to keep track. I will often create internal temporary sparse matrices (each with about 3 to 5 malloced spaces) and other data structures. Then I will move those memory spaces into a final sparse matrix object passed back to the user. Having different memory managers in a single complex application like my packages just isn't possible.

3 participants