Pack and Unpack

Travis Koehring edited this page Aug 24, 2021 · 15 revisions

Overview

This document presents the pack and unpack APIs provided by Yaksa and illustrates them with simple examples from Creating New Datatypes. These APIs have non-blocking semantics: each call returns a request object that can be used to test the progress of the operation or to wait for its completion (see the request APIs below). The reason for the non-blocking semantics is that some target architectures, such as GPUs, allow data to be transferred asynchronously (without the intervention of the CPU). The APIs also support partial pack and unpack; examples of when this feature is useful are given later in this document.

API

yaksa_ipack()

int yaksa_ipack(const void      * inbuf,
                uintptr_t         incount,
                yaksa_type_t      type,
                uintptr_t         inoffset,
                void            * outbuf,
                uintptr_t         max_pack_bytes,
                uintptr_t       * actual_pack_bytes,
                yaksa_info_t      info,
                yaksa_op_t        op,
                yaksa_request_t * request)
  • Pack the data represented by (incount, type) tuple into a contiguous buffer
  • Parameters
    • [in] inbuf: Input buffer from which data is being packed
    • [in] incount: Number of elements of the datatype representing the layout
    • [in] type: Datatype representing the layout
    • [in] inoffset: Number of bytes to skip from the layout represented by the (incount, type) tuple
    • [out] outbuf: Output buffer in which data is being packed
    • [in] max_pack_bytes: Maximum number of bytes that can be packed in the output buffer
    • [out] actual_pack_bytes: Actual number of bytes that were packed in the output buffer
    • [in] info: Info object for the data being packed
    • [in] op: Reduction operation to apply (for example, YAKSA_OP__REPLACE for a plain copy)
    • [out] request: Request handle associated with the operation (YAKSA_REQUEST__NULL if the request already completed)
  • Return values
    • On success, YAKSA_SUCCESS is returned.
    • On error, a non-zero error code is returned.

yaksa_iunpack()

int yaksa_iunpack(const void      * inbuf,
                  uintptr_t         insize,
                  void            * outbuf,
                  uintptr_t         outcount,
                  yaksa_type_t      type,
                  uintptr_t         outoffset,
                  uintptr_t       * actual_unpack_bytes,
                  yaksa_info_t      info,
                  yaksa_op_t        op,
                  yaksa_request_t * request)
  • Unpack data from a contiguous buffer into a buffer represented by the (outcount, type) tuple
  • Parameters
    • [in] inbuf: Input buffer from which data is being unpacked
    • [in] insize: Number of bytes in the input buffer
    • [out] outbuf: Output buffer into which data is being unpacked
    • [in] outcount: Number of elements of the datatype representing the layout
    • [in] type: Datatype representing the layout
    • [in] outoffset: Number of bytes to skip from the layout represented by the (outcount, type) tuple
    • [out] actual_unpack_bytes: Actual number of bytes that were unpacked into the output buffer
    • [in] info: Info hint to apply
    • [in] op: Reduction operation to apply (for example, YAKSA_OP__REPLACE for a plain copy)
    • [out] request: Request handle associated with the operation (YAKSA_REQUEST__NULL if the request already completed)
  • Return values
    • On success, YAKSA_SUCCESS is returned.
    • On error, a non-zero error code is returned.

yaksa_request_wait()

int yaksa_request_wait(yaksa_request_t request)
  • Wait until a request has completed
  • Parameters
    • [in] request: The request object to wait on
  • Return values
    • On success, YAKSA_SUCCESS is returned.
    • On error, a non-zero error code is returned.

yaksa_request_test()

int yaksa_request_test(yaksa_request_t   request,
                       int             * completed)
  • Test to see if a request has completed
  • Parameters
    • [in] request: The request object that needs to be tested
    • [out] completed: Flag to tell the caller whether the request object has completed
  • Return values
    • On success, YAKSA_SUCCESS is returned.
    • On error, a non-zero error code is returned.

Examples

The examples in this section build on the examples presented in Datatype Creation. They show what the data in the pack buffer looks like for different layouts, and how to do partial pack and unpack. Partial pack and unpack is useful when, for example, the original data has to be sent to another process over the network and the communication library cannot transfer all of it at once. In that case, smaller portions of the data have to be transferred one after the other.

Contig Layout

Packing data using a contiguous layout has the effect of making an exact copy of the original data into the pack buffer, up to the number of bytes defined by the layout. Similarly, unpacking data using a contiguous layout has the effect of making an exact copy of the data in the pack buffer into the target buffer. The following code shows how to pack and unpack using the contiguous layout and the input matrix used in previous examples.

#include <assert.h>
#include <yaksa.h>

int main()
{
    int rc;
    int input_matrix[64]; /* initialized with data from previous example */
    int pack_buf[64];
    int unpack_buf[64];
    yaksa_type_t contig;

    yaksa_init(YAKSA_INIT_ATTR__DEFAULT);

    /* For layout creation see corresponding example */

    yaksa_request_t request;
    uintptr_t actual_pack_bytes;

    /* start packing */
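    /* inoffset = 0, info = NULL (no hints), op = YAKSA_OP__REPLACE (plain copy) */
    rc = yaksa_ipack(input_matrix, 1, contig, 0, pack_buf, 256,
                     &actual_pack_bytes, NULL, YAKSA_OP__REPLACE, &request);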
    assert(rc == YAKSA_SUCCESS);

    /* wait for packing to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    /* start unpacking */
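    /* outcount is 1, matching the incount used for packing */
    uintptr_t actual_unpack_bytes;
    rc = yaksa_iunpack(pack_buf, 256, unpack_buf, 1, contig, 0,
                       &actual_unpack_bytes, NULL, YAKSA_OP__REPLACE,
                       &request);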
    assert(rc == YAKSA_SUCCESS);

    /* wait for unpacking to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    yaksa_type_free(contig);

    yaksa_finalize();
    return 0;
}

In the previous code the yaksa_ipack/iunpack functions start the packing/unpacking. These functions have non-blocking semantics and return a request object that can be used to check when the packing/unpacking has completed. The reason for having non-blocking semantics is that some target architectures, such as GPUs, support asynchronous memory copies. The previous code produces the following data in the pack/unpack buffers.

pack_buf = unpack_buf =
 0  1  2  3  4  5  6  7
 8  9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63

Vector Layout

The following code shows how to pack and unpack using the vector layout and the input matrix used in previous examples.

#include <assert.h>
#include <yaksa.h>

int main()
{
    int rc;
    int input_matrix[64]; /* initialized with data from previous example */
    int pack_buf[64];
    int unpack_buf[64];
    yaksa_type_t vector;

    yaksa_init(YAKSA_INIT_ATTR__DEFAULT);

    /* For layout creation see corresponding example */

    yaksa_request_t request;
    uintptr_t actual_pack_bytes;

    /* start packing.
     * note that we can request more bytes in max_pack_bytes and will
     * get the correct number of packed bytes in actual_pack_bytes */
    rc = yaksa_ipack(input_matrix, 1, vector, 0, pack_buf, 256,
                     &actual_pack_bytes, NULL, YAKSA_OP__REPLACE, &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for packing to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    /* start unpacking */
    uintptr_t actual_unpack_bytes;
    rc = yaksa_iunpack(pack_buf, 32, unpack_buf, 1, vector, 0,
                       &actual_unpack_bytes, NULL, YAKSA_OP__REPLACE,
                       &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for unpacking to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    yaksa_type_free(vector);

    yaksa_finalize();
    return 0;
}

The previous code produces the following data in the pack/unpack buffers.

pack_buf=
 0  8 16 24 32 40 48 56
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

unpack_buf=
 0  0  0  0  0  0  0  0
 8  0  0  0  0  0  0  0
16  0  0  0  0  0  0  0
24  0  0  0  0  0  0  0
32  0  0  0  0  0  0  0
40  0  0  0  0  0  0  0
48  0  0  0  0  0  0  0
56  0  0  0  0  0  0  0
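
To make the result above easier to follow, here is a plain C model of what packing and unpacking with this layout amounts to. It does not call Yaksa. It assumes that the vector layout describes 8 blocks of 1 int with a stride of 8 ints (column 0 of the 8x8 matrix), which is inferred from the output shown above rather than taken from the layout definition. The names model_vector_pack and model_vector_unpack are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Gather `count` blocks of `blocklen` ints, spaced `stride` ints apart,
 * into a contiguous buffer. Returns the number of bytes written. */
size_t model_vector_pack(const int *inbuf, size_t count, size_t blocklen,
                         size_t stride, int *outbuf)
{
    assert(blocklen <= stride);     /* blocks must not overlap */
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < blocklen; j++)
            outbuf[n++] = inbuf[i * stride + j];
    return n * sizeof(int);
}

/* Scatter a contiguous buffer back into the strided positions.
 * Elements outside the layout are left untouched. Returns the number
 * of bytes consumed from inbuf. */
size_t model_vector_unpack(const int *inbuf, size_t count, size_t blocklen,
                           size_t stride, int *outbuf)
{
    assert(blocklen <= stride);
    size_t n = 0;
    for (size_t i = 0; i < count; i++)
        for (size_t j = 0; j < blocklen; j++)
            outbuf[i * stride + j] = inbuf[n++];
    return n * sizeof(int);
}
```

Under these assumptions, model_vector_pack(input_matrix, 8, 1, 8, pack_buf) writes 8 ints (32 bytes when int is 4 bytes wide), which is why the unpack call above passes 32 as insize. The unpack step only touches the first column, so the rest of a zero-initialized unpack buffer stays zero.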

Indexed Block Layout

The following code shows how to pack and unpack using the indexed block layout and the input matrix used in previous examples.

#include <assert.h>
#include <yaksa.h>

int main()
{
    int rc;
    int input_matrix[64]; /* initialized with data from previous example */
    int pack_buf[64];
    int unpack_buf[64];
    yaksa_type_t indx_block;

    yaksa_init(YAKSA_INIT_ATTR__DEFAULT);

    /* For layout creation see corresponding example */

    yaksa_request_t request;
    uintptr_t actual_pack_bytes;

    /* start packing */
    rc = yaksa_ipack(input_matrix, 1, indx_block, 0, pack_buf, 256,
                     &actual_pack_bytes, NULL, YAKSA_OP__REPLACE, &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for packing to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    /* start unpacking */
    uintptr_t actual_unpack_bytes;
    rc = yaksa_iunpack(pack_buf, 128, unpack_buf, 1, indx_block, 0,
                       &actual_unpack_bytes, NULL, YAKSA_OP__REPLACE,
                       &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for unpacking to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    yaksa_type_free(indx_block);

    yaksa_finalize();
    return 0;
}

The previous code produces the following data in the pack/unpack buffers.

pack_buf=
 4  5  6  7 12 13 14 15
20 21 22 23 28 29 30 31
32 33 34 35 40 41 42 43
48 49 50 51 56 57 58 59
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

unpack_buf=
 0  0  0  0  4  5  6  7
 0  0  0  0 12 13 14 15
 0  0  0  0 20 21 22 23
 0  0  0  0 28 29 30 31
32 33 34 35  0  0  0  0
40 41 42 43  0  0  0  0
48 49 50 51  0  0  0  0
56 57 58 59  0  0  0  0

Indexed Layout

The following code shows how to pack and unpack using the indexed layout and the input matrix used in previous examples.

#include <assert.h>
#include <yaksa.h>

int main()
{
    int rc;
    int input_matrix[64]; /* initialized with data from previous example */
    int pack_buf[64];
    int unpack_buf[64];
    yaksa_type_t indexed;

    yaksa_init(YAKSA_INIT_ATTR__DEFAULT);

    /* For layout creation see corresponding example */

    yaksa_request_t request;
    uintptr_t actual_pack_bytes;

    /* start packing */
    rc = yaksa_ipack(input_matrix, 1, indexed, 0, pack_buf, 256,
                     &actual_pack_bytes, NULL, YAKSA_OP__REPLACE, &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for packing to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    /* start unpacking */
    uintptr_t actual_unpack_bytes;
    rc = yaksa_iunpack(pack_buf, 84, unpack_buf, 1, indexed, 0,
                       &actual_unpack_bytes, NULL, YAKSA_OP__REPLACE,
                       &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for unpacking to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    yaksa_type_free(indexed);

    yaksa_finalize();
    return 0;
}

The previous code produces the following data in the pack/unpack buffers.

pack_buf=
 9 18 19 26 27 36 37 38
39 44 45 46 47 52 53 54
55 60 61 62 63  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0

unpack_buf=
 0  0  0  0  0  0  0  0
 0  9  0  0  0  0  0  0
 0  0 18 19  0  0  0  0
 0  0 26 27  0  0  0  0
 0  0  0  0 36 37 38 39
 0  0  0  0 44 45 46 47
 0  0  0  0 52 53 54 55
 0  0  0  0 60 61 62 63
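
The value 84 passed as insize in the unpack call is the packed size of the layout in bytes. As a quick sanity check, the sketch below sums the block lengths of the layout. The lengths used in the note that follows (1, 2, 2, 4, 4, 4, 4 ints) are an assumption read off the pack buffer shown above, not taken from the layout definition, and packed_size_bytes is a hypothetical helper that does not call Yaksa.

```c
#include <assert.h>
#include <stddef.h>

/* Packed size of a layout made of `nblocks` blocks, where block i holds
 * blocklens[i] elements of `elem_size` bytes each. */
size_t packed_size_bytes(const size_t *blocklens, size_t nblocks,
                         size_t elem_size)
{
    assert(blocklens != NULL || nblocks == 0);
    size_t elems = 0;
    for (size_t i = 0; i < nblocks; i++)
        elems += blocklens[i];
    return elems * elem_size;
}
```

With the assumed block lengths the layout holds 21 ints, so the packed size is 21 * sizeof(int), which is 84 bytes when int is 4 bytes wide.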

Resized Layout

The following code shows how to pack and unpack using the transposed layout and the input matrix used in previous examples.

#include <assert.h>
#include <yaksa.h>

int main()
{
    int rc;
    int input_matrix[64]; /* initialized with data from previous example */
    int pack_buf[64];
    int unpack_buf[64];
    yaksa_type_t vector;
    yaksa_type_t vector_resized;
    yaksa_type_t transpose;

    yaksa_init(YAKSA_INIT_ATTR__DEFAULT);

    /* For layout creation see corresponding example */

    yaksa_request_t request;
    uintptr_t actual_pack_bytes;

    /* start packing */
    rc = yaksa_ipack(input_matrix, 1, transpose, 0, pack_buf, 256,
                     &actual_pack_bytes, NULL, YAKSA_OP__REPLACE, &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for packing to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    /* start unpacking */
    uintptr_t actual_unpack_bytes;
    rc = yaksa_iunpack(pack_buf, 256, unpack_buf, 1, transpose, 0,
                       &actual_unpack_bytes, NULL, YAKSA_OP__REPLACE,
                       &request);
    assert(rc == YAKSA_SUCCESS);

    /* wait for unpacking to complete */
    rc = yaksa_request_wait(request);
    assert(rc == YAKSA_SUCCESS);

    yaksa_type_free(vector);
    yaksa_type_free(vector_resized);
    yaksa_type_free(transpose);

    yaksa_finalize();
    return 0;
}

The previous code produces the following data in the pack/unpack buffers.

pack_buf=
 0  8 16 24 32 40 48 56
 1  9 17 25 33 41 49 57
 2 10 18 26 34 42 50 58
 3 11 19 27 35 43 51 59
 4 12 20 28 36 44 52 60
 5 13 21 29 37 45 53 61
 6 14 22 30 38 46 54 62
 7 15 23 31 39 47 55 63

unpack_buf=
 0  1  2  3  4  5  6  7
 8  9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63

The unpack buffer is identical to the original input matrix as a double transposition returns the original matrix.
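
The round trip can be checked with a small plain C model that does not call Yaksa. It assumes that the transpose layout walks the 8x8 matrix column by column, which is inferred from the pack buffer shown above. The names model_transpose_pack, model_transpose_unpack and MODEL_N are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_N 8   /* matrix is MODEL_N x MODEL_N ints, row-major */

/* Packing: read the matrix column by column into a contiguous stream. */
void model_transpose_pack(const int *inbuf, int *outbuf)
{
    assert(inbuf != outbuf);        /* the model does not work in place */
    size_t n = 0;
    for (size_t col = 0; col < MODEL_N; col++)
        for (size_t row = 0; row < MODEL_N; row++)
            outbuf[n++] = inbuf[row * MODEL_N + col];
}

/* Unpacking: write the contiguous stream back column by column. */
void model_transpose_unpack(const int *inbuf, int *outbuf)
{
    assert(inbuf != outbuf);
    size_t n = 0;
    for (size_t col = 0; col < MODEL_N; col++)
        for (size_t row = 0; row < MODEL_N; row++)
            outbuf[row * MODEL_N + col] = inbuf[n++];
}
```

Packing places element (row, col) at stream position col * 8 + row, and unpacking writes stream position col * 8 + row back to (row, col), so every element returns to where it started.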

Partial Pack and Unpack

As mentioned earlier, there are cases in which pack/unpack has to be performed to/from a buffer that can hold only part of the original data. This is the case, for example, when transferring a very large buffer over the network. In such cases it is more efficient to break the packing/unpacking into smaller chunks and overlap the packing/unpacking of the current chunk with the transfer of the previous/next one. This overlap between memory copy and network transfer can be achieved using the partial packing/unpacking feature offered by Yaksa, as shown in the following example.

#include <assert.h>
#include <yaksa.h>

int main()
{
    int rc;
    int input_matrix[64]; /* initialized with data from previous example */
    int pack_buf[64];
    int unpack_buf[64];
    yaksa_type_t indx_block;

    yaksa_init(YAKSA_INIT_ATTR__DEFAULT);

    /* For layout creation see corresponding example */

    yaksa_request_t request;
    uintptr_t actual_pack_bytes;

    uintptr_t chunk = 64;
    for (uintptr_t pos = 0; pos < 128; pos += chunk) {
        /* start partial packing */
        rc = yaksa_ipack(input_matrix, 1, indx_block, pos, pack_buf,
                         chunk, &actual_pack_bytes, NULL,
                         YAKSA_OP__REPLACE, &request);
        assert(rc == YAKSA_SUCCESS);

        /* wait for packing to complete */
        rc = yaksa_request_wait(request);
        assert(rc == YAKSA_SUCCESS);

        /* start partial unpacking */
        uintptr_t actual_unpack_bytes;
        rc = yaksa_iunpack(pack_buf, chunk, unpack_buf, 1,
                           indx_block, pos, &actual_unpack_bytes, NULL,
                           YAKSA_OP__REPLACE, &request);
        assert(rc == YAKSA_SUCCESS);

        /* wait for unpacking to complete */
        rc = yaksa_request_wait(request);
        assert(rc == YAKSA_SUCCESS);
    }

    yaksa_type_free(indx_block);

    yaksa_finalize();
    return 0;
}

The previous code is similar to the indexed block example. The difference is that only half of the data (64 of the 128 packed bytes) is packed and unpacked at each iteration of the for loop.
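
How the offset and size arguments interact in a partial pack and unpack can be illustrated with a plain C model that does not call Yaksa. It assumes that the indexed block layout consists of 8 blocks of 4 ints at displacements 4, 12, 20, 28, 32, 40, 48 and 56 (in ints), which is inferred from the output of the indexed block example. It also assumes that all offsets and sizes are multiples of sizeof(int). The names model_partial_pack and model_partial_unpack are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Skip `inoffset` bytes of the packed stream, then pack at most
 * `max_pack_bytes` bytes into outbuf. The layout is `nblocks` blocks of
 * `blocklen` ints, block i starting at displs[i] (in ints). */
void model_partial_pack(const int *inbuf, const size_t *displs,
                        size_t nblocks, size_t blocklen, size_t inoffset,
                        int *outbuf, size_t max_pack_bytes,
                        size_t *actual_pack_bytes)
{
    assert(inoffset % sizeof(int) == 0 && max_pack_bytes % sizeof(int) == 0);
    size_t total = nblocks * blocklen;          /* ints in the full stream */
    size_t limit = max_pack_bytes / sizeof(int);
    size_t n = 0;
    for (size_t k = inoffset / sizeof(int); k < total && n < limit; k++)
        outbuf[n++] = inbuf[displs[k / blocklen] + k % blocklen];
    *actual_pack_bytes = n * sizeof(int);
}

/* Skip `outoffset` bytes of the layout, then unpack at most `insize`
 * bytes from inbuf into the positions described by the layout. */
void model_partial_unpack(const int *inbuf, size_t insize, int *outbuf,
                          const size_t *displs, size_t nblocks,
                          size_t blocklen, size_t outoffset,
                          size_t *actual_unpack_bytes)
{
    assert(outoffset % sizeof(int) == 0 && insize % sizeof(int) == 0);
    size_t total = nblocks * blocklen;
    size_t avail = insize / sizeof(int);
    size_t n = 0;
    for (size_t k = outoffset / sizeof(int); k < total && n < avail; k++)
        outbuf[displs[k / blocklen] + k % blocklen] = inbuf[n++];
    *actual_unpack_bytes = n * sizeof(int);
}
```

Calling the two functions in a loop with pos = 0 and pos = 64 and a chunk of 64 bytes mirrors the example above: the first iteration moves the four blocks on the right half of the matrix, the second the four blocks on the left half. A call whose offset is already at the end of the layout reports 0 bytes.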