-
Notifications
You must be signed in to change notification settings - Fork 24
Pack and Unpack
This document presents the pack and unpack APIs provided by Yaksa and showcases them using some simple example from Creating New Datatypes. These APIs have non-blocking semantics and return a request object that can be used to test on progress of these operations of the wait for their completion (see apposite APIs that follow). The reason for having non-blocking semantics is that some target architectures, such as GPUs, allow data to be transferred asynchronously (without the intervention of the CPU). Besides this, the APIs also allow for partial pack and unpack. Some example of when this feature is useful is given in the rest of the document.
int yaksa_ipack(const void * inbuf,
uintptr_t incount,
yaksa_type_t type,
uintptr_t inoffset,
void * outbuf,
uintptr_t max_pack_bytes,
uintptr_t * actual_pack_bytes,
yaksa_request_t * request)
- Pack the data represented by (incount, type) tuple into a contiguous buffer
- Parameters
- [in]
inbuf
: input buffer from which data is being packed - [in]
incount
: number of elements of the datatype representing the layout - [in]
type
: datatype representing the layout - [in]
inoffset
: number of bytes to skip from the layout represented by the (incount, type) tuple - [out]
outbuf
: output buffer in which data is being packed - [in]
max_pack_bytes
: maximum number of bytes that can be packed in the output buffer - [out]
actual_pack_bytes
: actual number of bytes that were packed in the output buffer - [out]
request
: request handle associated with the operation (YAKSA_REQUEST__NULL
if the request already completed)
- [in]
- Return values
- On success,
YAKSA_SUCCESS
is returned. - On error, a non-zero error code is returned.
- On success,
int yaksa_iunpack(const void * inbuf,
uintptr_t insize,
void * outbuf,
uintptr_t outcount,
yaksa_type_t type,
uintptr_t outoffset,
yaksa_request_t * request)
- Unpack data from a contiguous buffer into a buffer represented by the (incount, type) touple
- Parameters
- [in]
inbuf
: input buffer from which data is being unpacked - [in]
insize
: number of bytes in the input buffer - [out]
outbuf
: output buffer into which data is being unpacked - [out]
outcount
: number of elements of the data representing the layout - [in]
type
: datatype representing the layout - [out]
outoffset
: number of bytes to skip from the layout represented by the (incount, type) tuple - [out]
request
: request handle associated with the operation (YAKSA_REQUEST__NULL
if the request already completed)
- [in]
- Return values
- On success,
YAKSA_SUCCESS
is returned. - On error, a non-zero error code is returned.
- On success,
int yaksa_request_wait(yaksa_request_t request)
- Wait till a request has completed
- Parameters
- [in]
request
: the request object that needs to be waited up on
- [in]
- Return values
- On success,
YAKSA_SUCCESS
is returned. - On error, a non-zero error code is returned.
- On success,
int yaksa_request_test(yaksa_request_t request,
int * completed)
- Test to see if a request has completed
- Parameters
- [in]
request
: the request object that needs to be tested - [out]
completed
: flag to tell the caller whether the request object has completed
- [in]
- Return values
- On success,
YAKSA_SUCCESS
is returned. - On error, a non-zero error code is returned.
- On success,
The examples in this section build on the examples presented in Datatype Creation. They show how the produced data in the pack buffer looks like when considering different layouts. Moreover, they also show how to do partial pack and unpack. This feature is useful when the original data has to be sent to another process over the network, for example, and the communication library cannot transfer all the data at once. In this case smaller portions of the data have to be transferred one after the other in sequence.
Packing data using a contiguous layout has the effect of making an exact copy of the original data into the pack buffer, up to the number of bytes defined by the layout. Similarly, unpacking data using a contiguous layout has the effect of making an exact copy of the data in the pack buffer into the target buffer. The following code shows how to pack and unpack using the contiguous layout and the input matrix used in previous examples.
#include <yaksa.h>
int main()
{
int rc;
int input_matrix[64]; /* initialized with data from previous example */
int pack_buf[64];
int unpack_buf[64];
yaksa_type_t contig;
yaksa_init(); /* before any yaksa API is called the library
must be initialized */
/* For layout creation see corresponding example */
yaksa_request_t request;
uintptr_t actual_pack_bytes;
/* start packing */
rc = yaksa_ipack(input_matrix, 1, contig, pack_buf, 256,
&actual_pack_bytes, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for packing to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
/* start unpacking */
rc = yaksa_iunpack(pack_buf, 256, unpack_buf, 256, contig, 0,
&request);
assert(rc == YAKSA_SUCCESS);
/* wait for unpacking to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
yaksa_free(contig);
yaksa_finalize();
return 0;
}
In the previous code the yaksa_ipack/iunpack
functions start the packing/unpacking. These functions have non-blocking semantics and return a request object that can be used to check when the packing/unpacking has completed. The reason for having non-blocking semantics is that some target architectures, such as GPUs, support asynchronous memory copies. The previous code produces the following data in the pack/unpack buffers.
pack_buf = unpack_buf =
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
The following code shows how to pack and unpack using the vector layout and the input matrix used in previous examples.
#include <yaksa.h>
int main()
{
int rc;
int input_matrix[64]; /* initialized with data from previous example */
int pack_buf[64];
int unpack_buf[64];
yaksa_type_t vector;
yaksa_init(); /* before any yaksa API is called the library
must be initialized */
/* For layout creation see corresponding example */
yaksa_request_t request;
uintptr_t actual_pack_bytes;
/* start packing.
* note that we can request more bytes in max_pack_bytes and will
* get the correct number of packed bytes in actual_pack_bytes */
rc = yaksa_ipack(input_matrix, 1, vector, pack_buf, 256,
&actual_pack_bytes, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for packing to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
/* start unpacking */
rc = yaksa_iunpack(pack_buf, 32, unpack_buf, 1, vector, 0,
&request);
assert(rc == YAKSA_SUCCESS);
/* wait for unpacking to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
yaksa_free(vector);
yaksa_finalize();
return 0;
}
The previous code produces the following data in the pack/unpack buffers.
pack_buf=
0 8 16 24 32 40 48 56
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
unpack_buf=
0 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0
16 0 0 0 0 0 0 0
24 0 0 0 0 0 0 0
32 0 0 0 0 0 0 0
40 0 0 0 0 0 0 0
48 0 0 0 0 0 0 0
56 0 0 0 0 0 0 0
The following code shows how to pack and unpack using the indexed block layout and the input matrix used in previous examples.
#include <yaksa.h>
int main()
{
int rc;
int input_matrix[64]; /* initialized with data from previous example */
int pack_buf[64];
int unpack_buf[64];
yaksa_type_t indx_block;
yaksa_init(); /* before any yaksa API is called the library
must be initialized */
/* For layout creation see corresponding example */
yaksa_request_t request;
uintptr_t actual_pack_bytes;
/* start packing */
rc = yaksa_ipack(input_matrix, 1, indx_block, pack_buf, 256,
&actual_pack_bytes, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for packing to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
/* start unpacking */
rc = yaksa_iunpack(pack_buf, 128, unpack_buf, 1, indx_block, 0,
&request);
assert(rc == YAKSA_SUCCESS);
/* wait for unpacking to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
yaksa_free(indx_block);
yaksa_finalize();
return 0;
}
The previous code produces the following data in the pack/unpack buffers.
pack_buf=
4 5 6 7 12 13 14 15
20 21 22 23 28 29 30 31
32 33 34 35 40 41 42 43
48 49 50 51 56 57 58 59
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
unpack_buf=
0 0 0 0 4 5 6 7
0 0 0 0 12 13 14 15
0 0 0 0 20 21 22 23
0 0 0 0 28 29 30 31
32 33 34 35 0 0 0 0
40 41 42 43 0 0 0 0
48 49 50 51 0 0 0 0
56 57 58 59 0 0 0 0
The following code shows how to pack and unpack using the indexed layout and the input matrix used in previous examples.
#include <yaksa.h>
int main()
{
int rc;
int input_matrix[64]; /* initialized with data from previous example */
int pack_buf[64];
int unpack_buf[64];
yaksa_type_t indexed;
yaksa_init(); /* before any yaksa API is called the library
must be initialized */
/* For layout creation see corresponding example */
yaksa_request_t request;
uintptr_t actual_pack_bytes;
/* start packing */
rc = yaksa_ipack(input_matrix, 1, indexed, pack_buf, 256,
&actual_pack_bytes, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for packing to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
/* start unpacking */
rc = yaksa_iunpack(pack_buf, 84, unpack_buf, 1, indexed, 0,
&request);
assert(rc == YAKSA_SUCCESS);
/* wait for unpacking to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
yaksa_free(indexed);
yaksa_finalize();
return 0;
}
The previous code produces the following data in the pack/unpack buffers.
pack_buf=
9 18 19 26 27 36 37 38
39 44 45 46 47 52 53 54
55 60 61 62 63 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
unpack_buf=
0 0 0 0 0 0 0 0
0 9 0 0 0 0 0 0
0 0 18 19 0 0 0 0
0 0 26 27 0 0 0 0
0 0 0 0 36 37 38 39
0 0 0 0 44 45 46 47
0 0 0 0 52 53 54 55
0 0 0 0 60 61 62 63
The following code shows how to pack and unpack using the transposed layout and the input matrix used in previous examples.
#include <yaksa.h>
int main()
{
int rc;
int input_matrix[64]; /* initialized with data from previous example */
int pack_buf[64];
int unpack_buf[64];
yaksa_type_t vector;
yaksa_type_t vector_resized;
yaksa_type_t transpose;
yaksa_init(); /* before any yaksa API is called the library
must be initialized */
/* For layout creation see corresponding example */
yaksa_request_t request;
uintptr_t actual_pack_bytes;
/* start packing */
rc = yaksa_ipack(input_matrix, 1, transpose, pack_buf, 256,
&actual_pack_bytes, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for packing to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
/* start unpacking */
rc = yaksa_iunpack(pack_buf, 256, unpack_buf, 1, transpose, 0,
&request);
assert(rc == YAKSA_SUCCESS);
/* wait for unpacking to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
yaksa_free(vector);
yaksa_free(vector_resized);
yaksa_free(transpose);
yaksa_finalize();
return 0;
}
The previous code produces the following data in the pack/unpack buffers.
pack_buf=
0 8 16 24 32 40 48 56
1 9 17 25 33 41 49 57
2 10 18 26 34 42 50 58
3 11 19 27 35 43 51 59
4 12 20 28 36 44 52 60
5 13 21 29 37 45 53 61
6 14 22 30 38 46 54 62
7 15 23 31 39 47 55 63
unpack_buf=
0 1 2 3 4 5 6 7
8 9 10 11 12 13 14 15
16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31
32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55
56 57 58 59 60 61 62 63
The unpack buffer is identical to the original input matrix as a double transposition returns the original matrix.
As already said, there are cases in which pack/unpack might need to be performed to/from a buffer that can only contain a part of the original data. This is the case, for example, when transferring a very large buffer over the network. In such case it is more efficient to break up the packing/unpacking into smaller chunks and overlap the packing/unpacking of the current chunk with the transfer of the previous/next. This overlap between memory copy and network transfer can be achieved precisely using the partial packing/unpacking feature offered by Yaksa, as shown in the following example.
#include <yaksa.h>
int main()
{
int rc;
int input_matrix[64]; /* initialized with data from previous example */
int pack_buf[64];
int unpack_buf[64];
yaksa_type_t indx_block;
yaksa_init(); /* before any yaksa API is called the library
must be initialized */
/* For layout creation see corresponding example */
yaksa_request_t request;
uintptr_t actual_pack_bytes;
uintptr_t chunk = 64;
for (uintptr_t pos = 0; pos < 128; pos += chunk) {
/* start partial packing */
rc = yaksa_ipack(input_matrix, 1, indx_block, pos, pack_buf,
chunk, &actual_pack_bytes, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for packing to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
/* start partial unpacking */
rc = yaksa_iunpack(pack_buf, chunk, unpack_buf, 1,
indx_block, pos, &request);
assert(rc == YAKSA_SUCCESS);
/* wait for unpacking to complete */
rc = yaksa_request_wait(request);
assert(rc == YAKSA_SUCCESS);
}
yaksa_free(indx_block);
yaksa_finalize();
return 0;
}
The previous code is similar to the indexed block example. The difference is that only half of the data in the input buffer is packed/unpacked at every iteration of the for loop.