From 60c048228b5bfd13a99759bd213f23cb279cf9cd Mon Sep 17 00:00:00 2001
From: Devin Matthews
Date: Wed, 17 Jul 2024 14:57:22 -0500
Subject: [PATCH] WIP on plugin documentation.

---
 docs/PluginHowTo.md | 661 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 661 insertions(+)
 create mode 100644 docs/PluginHowTo.md

# Contents

* **[Introduction](PluginHowTo.md#introduction)**
    * **[Example Plugin](PluginHowTo.md#example-plugin)**
    * **[Creating a New Plugin](PluginHowTo.md#creating-a-new-plugin)**
    * **[Building a Plugin](PluginHowTo.md#building-a-plugin)**
* **[Kernels](PluginHowTo.md#kernels)**
    * **[Accessing Kernels](PluginHowTo.md#accessing-kernels)**
    * **[Reference Kernels](PluginHowTo.md#reference-kernels)**
    * **[Optimized Kernels](PluginHowTo.md#optimized-kernels)**
    * **[Mapping Kernels to Subconfigurations](PluginHowTo.md#mapping-kernels-to-subconfigurations)**
* **[Custom Operations](PluginHowTo.md#custom-operations)**
    * **[Example: `bli_gemmt_ex`](PluginHowTo.md#example-bli_gemmt_ex)**
    * **[The Control Tree](PluginHowTo.md#the-control-tree)**
    * **[Modifying the Control Tree](PluginHowTo.md#modifying-the-control-tree)**
* **[API Reference](PluginHowTo.md#api-reference)**
    * **[Registration](PluginHowTo.md#registration)**
    * **[Helper Functions](PluginHowTo.md#helper-functions)**
    * **[Context Initialization](PluginHowTo.md#context-initialization)**
    * **[Context Query](PluginHowTo.md#context-query)**
    * **[Control tree modification](PluginHowTo.md#control-tree-modification)**

# Introduction

A BLIS plugin is a piece of user-defined code that provides additional linear algebra functionality while leveraging BLIS's internal framework for high performance.
Through a plugin, users can:

* Provide customized or optimized [kernels](PluginHowTo.md#kernels), and access internal BLIS kernels.
* Define new, custom linear algebra [operations](PluginHowTo.md#custom-operations) which extend the level-3 BLAS (for example, `GEMM`).

Plugins are defined completely outside of BLIS (that is, the BLIS source code is not required). However, an installed copy of BLIS 2.0 or later (assumed to be installed to `$PREFIX`) is required in order to configure or build a plugin. Building a plugin produces a shared and/or static library which can be distributed or linked into your code. The template and example files generated by BLIS are all written in C99, but C++ is also supported.

## Example Plugin

A new plugin is created by running `$PREFIX/share/blis/configure-plugin <name>`, where `<name>` is the name you wish to give to the plugin; it must be a valid C99 identifier. By default, this generates a fully functioning example plugin containing the following files:

<pre>
├─ <a href="PluginHowTo.md#makefile">Makefile</a> **
├─ <a href="PluginHowTo.md#configmk">config.mk</a> **
├─ <a href="PluginHowTo.md#config_registry">config_registry</a> **
├─ <a href="PluginHowTo.md#bli_plugin_nameh">bli_plugin_&lt;name&gt;.h</a>
├─ <a href="PluginHowTo.md#bli_plugin_registerc">bli_plugin_register.c</a>
├─ config
│  ├─ &lt;config&gt;
│  ├─ ...
│  └─ &lt;config&gt;
│     ├─ <a href="PluginHowTo.md#configconfigbli_kernel_defs_configh">bli_kernel_defs_&lt;config&gt;.h</a>
│     ├─ <a href="PluginHowTo.md#configconfigbli_plugin_init_configc">bli_plugin_init_&lt;config&gt;.c</a>
│     └─ <a href="PluginHowTo.md#configconfigmake_defsmk">make_defs.mk</a>
├─ ref_kernels
│  ├─ <a href="PluginHowTo.md#ref_kernelsbli_plugin_init_refc">bli_plugin_init_ref.c</a>
│  ├─ <a href="PluginHowTo.md#ref_kernelsmy_kernel_1_refc-and-my_kernel_2_refc">my_kernel_1_ref.c</a> *
│  └─ <a href="PluginHowTo.md#ref_kernelsmy_kernel_1_refc-and-my_kernel_2_refc">my_kernel_2_ref.c</a> *
├─ kernels
│  ├─ &lt;kernel set&gt;
│  ├─ ...
│  └─ zen3
│     └─ <a href="PluginHowTo.md#kernelszen3my_kernel_1_zen3c">my_kernel_1_zen3.c</a> *
└─ obj **
   └─ <a href="PluginHowTo.md#objconfig">&lt;config&gt;</a> **
</pre>

Files marked with `*` (and some portions of other files) are examples only and can be omitted by passing the `--disable-examples` flag to `configure-plugin`. Files and directories marked with `**` are only required when you are ready to build the plugin, and can be omitted with `--disable-build`. The remaining files and directories constitute the plugin "template". If you later want to generate only the build files, the template files (which presumably already exist) can be skipped with `--disable-templates`.

#### `Makefile`

The Makefile for your plugin is automatically generated by `configure-plugin` and should not be modified. The default target (`make`) and `make clean` are supported; `make` builds your plugin based on the flags given during configuration.

#### `config.mk`

This file is also generated by `configure-plugin` and should not need to be modified.

#### `config_registry`

This file provides the mapping from kernel sets to subconfigurations and configuration families. See [Mapping Kernels to Subconfigurations](PluginHowTo.md#mapping-kernels-to-subconfigurations) for more details.

#### `bli_plugin_<name>.h`

This file is the main header for the plugin. It should be `#include`d in order to use the functionality provided by the plugin. ***Note:*** the name and contents of this header are only a suggestion. Feel free to structure your plugin however you like!

The example file contains several sections:

* Macros defining the arguments to be passed to the registration functions. The example uses externally provided arrays to store the generated kernel, blocksize, and preference IDs. Many alternative strategies are possible, e.g. passing a struct, passing individual pointers/references to IDs, or using global variables and passing no arguments (defining these macros to be empty). You can also pass in any other arguments you might need during registration.
Macros are preferred for defining the parameters because the parameter list is used in several different files and in generated code.

* Enumerations providing convenient names by which kernel/blocksize/preference IDs can be obtained. In the example, these are offsets into the arrays passed to `bli_plugin_register_<name>`, so calling code could look up the kernel ID for kernel #2 as `kerids[MY_KERNEL_2]`. This section is entirely optional if you prefer a different way of accessing kernel IDs.

* Prototypes for kernels. A prototype (and preferably a typedef) is recommended for each kernel you write so that kernel calls are type-safe. Note that both kernels are assumed to have reference implementations (one for each enabled subconfiguration, with prototypes generated automatically by the `INSERT_GENTCONF` macro), while a special "optimized" version of kernel #1 is available for double-precision operations on Zen 3 hardware. The latter prototype is given only as an example: your plugin code does not need to know whether or not an optimized kernel is available, and only needs to look up kernels by ID. The file `config/zen3/bli_plugin_init_zen3.c` initializes the context with this optimized kernel so that it is selected automatically when running on Zen 3.

* Prototypes for the plugin registration function (`bli_plugin_register_<name>`) and the configuration-specific initialization functions. The former can be named and structured however you like, but we recommend keeping the latter (configuration-specific) functions as-is.

#### `bli_plugin_register.c`

This file implements the function `bli_plugin_register_<name>` and illustrates how to register new kernels, along with their associated blocksizes and kernel preferences.
Each registration function generates a new, unique ID which must be saved and communicated to the rest of the plugin (for example, via global variables or via arguments passed to `bli_plugin_register_<name>`) so that it can be used later. This function also calls `bli_plugin_register_<name>_<config>` for each architecture which was enabled at configure time (see [`bli_plugin_init_<config>.c`](PluginHowTo.md#configconfigbli_plugin_init_configc)).

Any code using the plugin should call this function (which you can rename if you like) before making use of any plugin functionality.

#### `config/<config>/bli_kernel_defs_<config>.h`

This file provides macros specific to one subconfiguration, such as the register blocksizes for the BLIS `GEMM` microkernel. You can add any macros or other definitions here that you want to be available to all code compiled for the corresponding subconfiguration. Note that configuration families (e.g. `x86_64`) supersede individual subconfigurations.

#### `config/<config>/bli_plugin_init_<config>.c`

This file initializes the "context" with any kernels, blocksizes, or kernel preferences that are optimized for the corresponding subconfiguration. It also calls the reference initialization function in [`ref_kernels/bli_plugin_init_ref.c`](PluginHowTo.md#ref_kernelsbli_plugin_init_refc) for the matching configuration. A full example is given for the `zen3` subconfiguration. If no optimized kernels have been written for a particular subconfiguration, then no modifications are necessary. See [Mapping Kernels to Subconfigurations](PluginHowTo.md#mapping-kernels-to-subconfigurations) for more information about how optimized kernels and subconfigurations are related.

#### `config/<config>/make_defs.mk`

This file contains additional build variables and compiler- or architecture-specific flags for each subconfiguration. Typically, these files should not be modified, so that the plugin achieves the best performance and remains compatible with BLIS.

#### `ref_kernels/bli_plugin_init_ref.c`

This file handles initialization of the context with [reference](PluginHowTo.md#reference-kernels) kernels. It is compiled once for each enabled subconfiguration, resulting in functions named `bli_plugin_init_<name>_<config>_ref`. Whenever you add a new reference kernel, blocksize, or kernel preference, you must also add code here to initialize it.

#### `ref_kernels/my_kernel_1_ref.c` and `my_kernel_2_ref.c`

These are example reference kernels. Note that the kernels are instantiated for the four standard datatypes (single and double precision, in both the real and complex domains), indicated by the letters `sdcz`. Your kernels can use the same macros to instantiate different types (or combinations of types), or you can use a different mechanism such as C++ templates.

#### `kernels/zen3/my_kernel_1_zen3.c`

This is an example optimized kernel. Optimized kernels are typically written with a specific data type or combination of data types in mind. In this example, only a double-precision real version is implemented, specifically for the Zen 3 architecture.

#### `obj/<config>`

This folder will contain the built object files and the static and/or shared library for the plugin. Only one sub-folder is created, corresponding to the configuration for which BLIS was built.

## Creating a New Plugin

To create a "blank" plugin without any build files or example code, execute `$PREFIX/share/blis/configure-plugin --init <name>` in the directory where you want the plugin to exist. At this point, you can start adding your own:

* Kernels ([see below](PluginHowTo.md#kernels) for more details):
    1. Create a reference kernel. The file must be in the `ref_kernels` directory in order to be compiled correctly. Your kernel can have any name and interface, but it should ideally be implemented for all supported data types and be architecture-agnostic.
    2. Register your kernel in the `bli_plugin_register.c` file.
    3. Initialize the context with pointer(s) to your reference kernel in the `ref_kernels/bli_plugin_init_ref.c` file.
    4. [Optional] Implement optimized versions in the appropriate `kernels/<kernel set>` directories, and initialize them in `config/<config>/bli_plugin_init_<config>.c`.
* Blocksizes:
    1. Register the blocksizes in `bli_plugin_register.c`.
    2. Provide default values in `ref_kernels/bli_plugin_init_ref.c`. Every data type should be given a default value.
    3. [Optional] Provide values for configuration-specific optimized implementations in `config/<config>/bli_plugin_init_<config>.c`.
* Kernel preferences:
    1. Register the kernel preferences in `bli_plugin_register.c`.
    2. Provide default values in `ref_kernels/bli_plugin_init_ref.c`. Every data type should be given a default value.
    3. [Optional] Provide values for configuration-specific optimized implementations in `config/<config>/bli_plugin_init_<config>.c`.

You will also need to provide a way to get the registered kernel/blocksize/preference IDs back to your code. For example, you can fill in the `plugin_<name>_params` and `plugin_<name>_params_only` macros in `bli_plugin_<name>.h`, or save the IDs to global variables.

## Building a Plugin

Before building your plugin on a particular system, you must reconfigure it for building by running `$PREFIX/share/blis/configure-plugin --build [<name>]` in the plugin directory. Note that you do not need to provide the plugin name if it can be inferred from the name of `bli_plugin_<name>.h`. Several flags can be used to control how your plugin will be built:

| Flag | Explanation |
|------|-------------|
| -p PATH,<br>--path=PATH | Look for the plugin source in PATH instead of the current directory. This option is used to build the plugin out-of-tree. |
| -e SYMBOLS,<br>--export-shared[=SYMBOLS] | Specify the subset of library symbols that are exported within a shared library. Valid values for SYMBOLS are: 'public' (the default) and 'all'. By default, only functions and variables that belong to public APIs are exported in shared libraries. However, the user may instead export all symbols in BLIS, even those that were intended for internal use only. Note that the public APIs encompass all functions that almost any user would ever want to call, including the BLAS/CBLAS compatibility APIs as well as the basic and expert interfaces to the typed and object APIs that are unique to BLIS. Also note that changing this option to 'all' will have no effect in some environments, such as when compiling with clang on Windows. |
| --enable-rpath,<br>--disable-rpath | Enable (disabled by default) setting an install_name for dynamic libraries on macOS which starts with @rpath rather than the absolute install path. |
| --disable-shared,<br>--enable-shared | Disable (enabled by default) building the plugin as a shared library. If the shared library build is disabled, the static library build must remain enabled. |
| --disable-static,<br>--enable-static | Disable (enabled by default) building the plugin as a static library. If the static library build is disabled, the shared library build must remain enabled. |
| -d DEBUG,<br>--enable-debug[=DEBUG] | Enable debugging symbols in the library. If the argument DEBUG is given as 'opt', then optimization flags are kept in the framework; otherwise optimization is turned off. |
| --enable-verbose-make,<br>--disable-verbose-make | Enable (disabled by default) verbose compilation output during make. |
| -f,<br>--force | Overwrite any files in the current directory which are normally copied by configure-plugin, for example 'Makefile' and 'config_registry'. |
| --enable-asan,<br>--disable-asan | Enable (disabled by default) compiling and linking BLIS framework code with the AddressSanitizer (ASan) library. Optimized kernels are NOT compiled with ASan support due to limitations of register assignment in inline assembly. WARNING: ENABLING THIS OPTION WILL NEGATIVELY IMPACT PERFORMANCE. Please use only for informational/debugging purposes. |
| --enable-arg-max-hack,<br>--disable-arg-max-hack | Enable (disabled by default) build system logic that allows archiving/linking the static/shared library even if the command plus its command-line arguments exceed the operating system limit (ARG_MAX). |

After configuring, you can build using `make`. **Your plugin is always built for the same subconfiguration or configuration family as BLIS.** This means that build configuration should ideally be done on the target system, unless you are using an installation of BLIS which is configured as a "fat build" for a full configuration family, such as `x86_64`. The final shared and/or static library is placed in the `obj/<config>` directory, where `<config>` is the configuration that BLIS and your plugin are built for.

# Kernels

Kernels are the high-performance pieces of code at the heart of BLIS. A kernel usually performs one simple computational operation on one or more input matrices, vectors, or scalars. For example, one of the workhorse kernels in BLIS is the `GEMM` microkernel, which computes the product of small `MR*k` and `k*NR` matrices, where `MR` and `NR` are architecture-dependent constants. You can write kernels which replace or extend existing BLIS kernels, or which implement any other operation in your code that needs a high-performance, architecture-specific solution.

The BLIS plugin architecture supports two types of user-supplied kernels: reference kernels and optimized kernels. A reference kernel is coded once (typically in standard C or C++) and compiled separately for every architecture which might be encountered; at runtime, BLIS selects the appropriate version of the kernel for the current hardware. Reference kernels typically do not achieve the highest performance, but are useful for less performance-sensitive operations such as data movement (which is limited by memory bandwidth rather than by floating-point throughput).
For performance-critical operations, you can additionally provide optimized kernels. These kernels are specific to one hardware architecture or to a family of related architectures, are often datatype-specific, and often employ non-portable compiler intrinsics or inline assembly. If you provide an optimized kernel for the hardware architecture detected at runtime, BLIS will automatically select it in preference to the reference kernel.

In addition to kernels, BLIS plugins can provide blocksizes (for example, the `MR` and `NR` parameters above) as well as kernel preferences (essentially, the logical true/false equivalent of blocksizes), which control or define the behavior of kernels. These too are looked up based on the hardware actually encountered at runtime, and come in reference (that is, default) and optimized flavors. Internal BLIS kernels endeavor to operate correctly for any kind of input, although they work most efficiently for inputs which conform to the corresponding blocksizes and preferences. Your kernels, however, are not required to support arbitrary inputs or parameters. You only have to provide the functionality that you know you will need!

## Accessing Kernels

Kernels, blocksizes, and kernel preferences are accessed through the "context", which reflects the kernel set available for the hardware on which BLIS is running. First, kernels and their parameters must be registered. Registration creates a slot in the context to hold pointers, blocksizes, or other data, and returns a unique ID. Next, each slot must be filled with user-supplied data (pointers to reference kernels, default blocksizes, etc.) using these IDs. If optimized kernels or parameters are available, they then overwrite the reference data. All of these steps happen during plugin registration, which must be completed before any computations are performed with the plugin (BLIS itself may still be used before then).
Finally, at any point after plugin registration, the current context can be obtained and queried using the unique IDs:

```C++
const cntx_t* cntx = bli_gks_query_cntx();

my_fun_ptr kernel = ( my_fun_ptr )bli_cntx_get_ukr_dt( BLIS_DOUBLE, MY_KERNEL_ID, cntx );

kernel(...);
```

The process for registering and initializing kernels is detailed below.

## Reference Kernels

A reference kernel must first be registered. This should happen in `bli_plugin_register_<name>`, defined in `bli_plugin_register.c` (although you can change the function and file names):

```C++
err_t err;
siz_t id;

err = bli_gks_register_ukr( &id );
if ( err != BLIS_SUCCESS )
{
    // handle the error
}
```

Note that registration does not need to know anything about the actual kernel yet. Next, pointers to the reference kernels must be supplied in the file `ref_kernels/bli_plugin_init_ref.c`. Again, you can change the filename, but the file must reside in `ref_kernels`. Changing the function name or signature is not recommended, since it must match `bli_plugin_register.c` and is generated automatically for each subconfiguration:

```C++
func_t ptrs;
gen_func_init( &ptrs, PASTECH(my_kernel,BLIS_CNAME_INFIX,BLIS_REF_SUFFIX) );
bli_cntx_set_ukr( MY_KERNEL_ID, &ptrs, cntx );
```

The `func_t` struct contains one function pointer for each data type. In this example, the helper macro `gen_func_init` automatically generates the correct symbol name for each type and for the current subconfiguration (since this file is compiled once for each enabled subconfiguration). It is strongly recommended to use the provided macros and naming convention for reference kernels. However, you are free to fill the entries of the `func_t` struct by any method you like, *as long as they point to the reference function of the correct type and for the correct subconfiguration*. The kernel is now fully initialized and can be used safely on any hardware which BLIS was configured for.
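
The register / initialize / override / query sequence described in this section can be modeled in a few lines of self-contained C. This is a hypothetical toy, not BLIS code. `toy_func_t`, `toy_cntx_t`, and the `toy_*` functions are invented stand-ins that only play the roles of `func_t`, `cntx_t`, and the registration and context calls, and the two kernel bodies are placeholders.

```c
#include <stddef.h>

/* Hypothetical toy model of the workflow described above; none of these
 * names are BLIS APIs. One slot per kernel ID, one pointer per datatype. */
typedef enum { TOY_S, TOY_D, TOY_C, TOY_Z, TOY_NUM_DT } toy_dt_t;
typedef void (*toy_void_fp)( void );                           /* role of void_fp */
typedef struct { toy_void_fp ptr[ TOY_NUM_DT ]; } toy_func_t;  /* role of func_t  */

#define TOY_MAX_UKRS 8
typedef struct { toy_func_t ukrs[ TOY_MAX_UKRS ]; } toy_cntx_t; /* role of cntx_t */

static size_t toy_num_ukrs = 0;

/* Step 1: registration reserves a slot and hands back its unique ID. */
int toy_register_ukr( size_t* id )
{
    if ( toy_num_ukrs == TOY_MAX_UKRS ) return -1; /* no free slots */
    *id = toy_num_ukrs++;
    return 0;
}

/* Steps 2 and 3: fill a slot, first with reference pointers, then overwrite
 * selected datatypes with optimized ones. */
void toy_cntx_set_ukr_dt( toy_void_fp fp, toy_dt_t dt, size_t id, toy_cntx_t* cntx )
{
    cntx->ukrs[ id ].ptr[ dt ] = fp;
}

/* Step 4: query by datatype and ID; the caller casts back to the real type. */
toy_void_fp toy_cntx_get_ukr_dt( toy_dt_t dt, size_t id, const toy_cntx_t* cntx )
{
    return cntx->ukrs[ id ].ptr[ dt ];
}

/* A made-up kernel type and two implementations of it. */
typedef double (*my_kernel_ft)( double x );
double my_kernel_ref( double x ) { return 2.0 * x; } /* portable "reference" */
double my_kernel_opt( double x ) { return x + x; }   /* "optimized", double only */

/* Mimics bli_plugin_register_<name>: register, initialize, then override. */
int toy_plugin_register( toy_cntx_t* cntx, size_t* my_kernel_id, int have_opt )
{
    if ( toy_register_ukr( my_kernel_id ) != 0 ) return -1;

    /* In this toy, one function stands in for all four typed reference kernels. */
    for ( int dt = 0; dt < TOY_NUM_DT; ++dt )
        toy_cntx_set_ukr_dt( ( toy_void_fp )my_kernel_ref, ( toy_dt_t )dt,
                             *my_kernel_id, cntx );

    /* Stand-in for "running on hardware that has an optimized kernel". */
    if ( have_opt )
        toy_cntx_set_ukr_dt( ( toy_void_fp )my_kernel_opt, TOY_D, *my_kernel_id, cntx );

    return 0;
}
```

The toy shares one design point with the mechanism described above. Callers keep only an integer ID and look up the implementation at call time, so the same calling code picks up an optimized kernel whenever one was written into the slot.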

## Optimized Kernels

If an optimized kernel implementation is available (as a function in a file in some `kernels/<kernel set>` folder), it should be initialized in the appropriate `config/<config>/bli_plugin_init_<config>.c` file. For example:

```C++
bli_cntx_set_ukrs
(
  cntx,

  MY_KERNEL_ID, BLIS_DOUBLE, bli_dmy_kernel_zen3,

  BLIS_VA_END
);
```

Here, it is not necessary to provide an optimized implementation for every data type. The automatically generated template code and build system handle building the correct files and calling the initialization functions for the subconfigurations which are enabled in the BLIS installation you are using. So, you can simply provide optimized implementations for whichever hardware is important to you, and they will be picked up and used when possible.

## Mapping Kernels to Subconfigurations

It may seem strange that optimized kernel implementations are written in the `kernels` folder but initialized in the `config` folder. In fact, the sub-folders of these two directories are not even the same! This is because in BLIS, multiple *subconfigurations* (roughly corresponding to specific hardware architectures), as well as *configuration families* (for example, all `x86_64` architectures), can use kernels from one or more of the folders in `kernels`, which are called *kernel sets*. The mapping from kernel sets to configurations is defined by the `config_registry` file. Essentially, this means that when adding an optimized kernel, you should initialize it in each configuration which maps to the kernel set where you defined it. Conversely, if you define a kernel in a kernel set which is not mapped to by any enabled configuration, then that kernel will not be compiled, and linking will fail if it is referenced.

By default, this file contains the mapping known by BLIS at the time of plugin creation.
Thus, it may be a good idea to periodically reconfigure your plugin in order to pick up new `config` or `kernels` sub-folders and new entries in `config_registry`. Instead, or in addition, you can define your own mappings in `config_registry` to reflect how your particular kernels should be used. *Note that this mapping only affects kernels in your plugin, and does not affect reference kernels.* See [here](ConfigurationHowTo.md) for more information on subconfigurations, configuration families, and the mapping of kernel sets.

# Custom Operations

BLIS is written as a framework, meaning that user-written code can be inserted in order to achieve new functionality. For example, consider the mathematical operation $C := \alpha A D A^T + \beta C$, where $D$ is a diagonal matrix. If $D$ were the identity matrix, this would be the standard level-3 BLAS operation `SYRK`, so we call this BLAS-like operation `SYRKD`. While it is technically not necessary to use the plugin infrastructure to implement `SYRKD` with BLIS, extending BLAS operations typically requires new kernels, which are conveniently managed as a plugin. However, the code discussed in this section does not need to reside in the plugin directory (although it can be placed in the top-level plugin directory); it only needs access to the kernel, blocksize, and kernel preference IDs registered by the plugin.

Because $A D A^T = A (A D)^T = A B = (A D) A^T = B^T A^T$, where $B = (A D)^T = D A^T$, `SYRKD` is actually even more closely related to the operation `GEMMT`, which implements $\operatorname{tri}(C) := \operatorname{tri}(\alpha A B + \beta C)$, where the function $\operatorname{tri}$ operates only on the upper or lower triangle of a matrix. Essentially, `SYRKD` is a `GEMM` whose result we know to be symmetric even though $A \ne B^T$, so only one triangle needs to be computed.
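
As a point of reference, the sketch below computes `SYRKD` exactly as the `GEMMT` formulation above suggests. It updates only the lower triangle of $C$ and forms the entries of $B = D A^T$ on the fly. It is a plain C99 loop nest written for illustration, not BLIS code. The function name `syrkd_lower_ref`, the row-major storage, and the choice of the lower triangle are all assumptions made here.

```c
#include <stddef.h>

/* Hypothetical reference SYRKD (not part of BLIS), written as a GEMMT:
 *   tril(C) := tril( alpha * A * B + beta * C ),  with B = D * A^T,
 * where A is m x k (row-major, leading dimension lda), D = diag(d) is k x k,
 * and C is m x m (row-major, leading dimension ldc).
 * Only the lower triangle of C (j <= i) is read or written. */
void syrkd_lower_ref( size_t m, size_t k, double alpha,
                      const double* a, size_t lda,
                      const double* d,
                      double beta, double* c, size_t ldc )
{
    for ( size_t i = 0; i < m; ++i )
    {
        for ( size_t j = 0; j <= i; ++j )
        {
            /* (A*B)(i,j) = sum_p A(i,p) * B(p,j), with B(p,j) = d[p] * A(j,p). */
            double ab = 0.0;
            for ( size_t p = 0; p < k; ++p )
                ab += a[ i*lda + p ] * ( d[ p ] * a[ j*lda + p ] );

            /* Following the BLAS convention, C is not read when beta == 0. */
            c[ i*ldc + j ] = ( beta == 0.0 ) ? alpha * ab
                                             : alpha * ab + beta * c[ i*ldc + j ];
        }
    }
}
```

A high-performance version would instead block, pack, and dispatch to microkernels. This loop nest only pins down the expected result, which makes it useful, for example, for testing a faster implementation.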

## Example: `bli_gemmt_ex`

TODO

## The Control Tree

TODO

## Modifying the Control Tree

TODO

# API Reference

## Registration

```C++
err_t bli_gks_register_ukr( siz_t* ukr_id );
```

Register a new microkernel, which may have a different implementation for each supported data type.
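
<table>
<tr><td><b>Parameters:</b></td><td><ul><li><code>ukr_id</code> – A pointer to a value which will be set to the unique ID of the new kernel.</li></ul></td></tr>
<tr><td><b>Returns:</b></td><td>An error code which is <code>BLIS_SUCCESS</code> on success.</td></tr>
</table>
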

```C++
err_t bli_gks_register_ukr2( siz_t* ukr_id );
```

Register a new microkernel, which may have a different implementation for each *pair* of supported data types.
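
<table>
<tr><td><b>Parameters:</b></td><td><ul><li><code>ukr_id</code> – A pointer to a value which will be set to the unique ID of the new kernel.</li></ul></td></tr>
<tr><td><b>Returns:</b></td><td>An error code which is <code>BLIS_SUCCESS</code> on success.</td></tr>
</table>
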
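
The only difference between `bli_gks_register_ukr` and `bli_gks_register_ukr2` is the shape of the table behind each ID: one entry per data type versus one entry per pair of data types. The hypothetical C sketch below models a pair-indexed table (4 × 4 = 16 slots, in the spirit of `func2_t`) holding one mixed-precision kernel. The type and function names are invented for illustration, and treating the first index as the input type and the second as the output type is a convention chosen here, not one imposed by BLIS.

```c
#include <stddef.h>

/* Hypothetical datatype indices (stand-ins for s, d, c, z); not BLIS names. */
typedef enum { DT_S, DT_D, DT_C, DT_Z, DT_COUNT } toy_dt_t;

/* Generic function pointer type, similar in role to void_fp. */
typedef void (*toy_void_fp)( void );

/* One slot per (dt1, dt2) pair: 16 slots in total. */
typedef struct { toy_void_fp ptr[ DT_COUNT ][ DT_COUNT ]; } toy_func2_t;

void toy_func2_set_dt( toy_void_fp fp, toy_dt_t dt1, toy_dt_t dt2, toy_func2_t* f )
{
    f->ptr[ dt1 ][ dt2 ] = fp;
}

toy_void_fp toy_func2_get_dt( toy_dt_t dt1, toy_dt_t dt2, const toy_func2_t* f )
{
    return f->ptr[ dt1 ][ dt2 ];
}

/* Example mixed-precision kernel: y (double) += x (float).
 * By the convention chosen here, dt1 is the input type and dt2 the output type. */
typedef void (*add_sd_ft)( size_t n, const float* x, double* y );

void add_sd( size_t n, const float* x, double* y )
{
    for ( size_t i = 0; i < n; ++i )
        y[ i ] += ( double )x[ i ];
}
```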

```C++
err_t bli_gks_register_blksz( siz_t* bs_id );
```

Register a new blocksize, which may have a different integral value for each supported data type.
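
<table>
<tr><td><b>Parameters:</b></td><td><ul><li><code>bs_id</code> – A pointer to a value which will be set to the unique ID of the new blocksize.</li></ul></td></tr>
<tr><td><b>Returns:</b></td><td>An error code which is <code>BLIS_SUCCESS</code> on success.</td></tr>
</table>
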
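
Each registered blocksize holds a value per data type, and the helper functions listed below also distinguish a default (`def`) value from a maximum (`max`) value. The hypothetical sketch below (invented names, not BLIS code) shows one way a loop could use such a pair. The final block may grow up to the maximum to absorb a small remainder rather than leaving a thin leftover block. To our understanding this mirrors how BLIS's own partitioning code treats maximum blocksizes, but your plugin is free to interpret the two values however it needs.

```c
#include <stddef.h>

/* Hypothetical datatype indices; not BLIS names. */
typedef enum { BS_S, BS_D, BS_C, BS_Z, BS_NDT } toy_bs_dt_t;

/* Hypothetical per-datatype blocksize with default and maximum values.
 * Assumes def > 0 and max >= def for every datatype. */
typedef struct
{
    size_t def[ BS_NDT ];
    size_t max[ BS_NDT ];
} toy_blksz_t;

/* Size of the block starting at offset i of a dimension of length dim:
 * take everything that is left if it fits within b_max, else take b_def. */
size_t toy_next_block( size_t i, size_t dim, size_t b_def, size_t b_max )
{
    size_t left = dim - i;
    return ( left <= b_max ) ? left : b_def;
}

/* Record the block sizes used to cover dim (at most cap of them) and
 * return how many were recorded. */
size_t toy_partition( size_t dim, const toy_blksz_t* bs, toy_bs_dt_t dt,
                      size_t* sizes, size_t cap )
{
    size_t n = 0;
    for ( size_t i = 0; i < dim && n < cap; )
    {
        size_t b = toy_next_block( i, dim, bs->def[ dt ], bs->max[ dt ] );
        sizes[ n++ ] = b;
        i += b;
    }
    return n;
}
```

With a default of 128 and a maximum of 160, a length of 280 is covered by blocks of 128 and 152 instead of 128, 128, and 24.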

```C++
err_t bli_gks_register_ukr_pref( siz_t* ukr_pref_id );
```

Register a new microkernel preference, which may have a different logical value for each supported data type.
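
<table>
<tr><td><b>Parameters:</b></td><td><ul><li><code>ukr_pref_id</code> – A pointer to a value which will be set to the unique ID of the new preference.</li></ul></td></tr>
<tr><td><b>Returns:</b></td><td>An error code which is <code>BLIS_SUCCESS</code> on success.</td></tr>
</table>
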
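
A kernel preference is a per-datatype boolean, so using one usually amounts to a branch at the call site. In the hypothetical sketch below (invented names, not BLIS code), a preference records whether a copy kernel should traverse a matrix row by row or column by column. Both paths produce the same result, so the preference only affects performance. In a plugin, the flag would instead be read from the context, for example with `bli_cntx_get_ukr_prefs_dt`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical datatype indices; not BLIS names. */
typedef enum { PF_S, PF_D, PF_C, PF_Z, PF_NDT } toy_pf_dt_t;

/* Hypothetical per-datatype boolean, similar in role to mbool_t. */
typedef struct { bool v[ PF_NDT ]; } toy_mbool_t;

bool toy_mbool_get_dt( toy_pf_dt_t dt, const toy_mbool_t* mb )
{
    return mb->v[ dt ];
}

/* Copy the m x n matrix A into B; both use general (row, column) strides.
 * The preference only picks the loop order; the result is identical. */
void toy_copy( bool prefer_rows, size_t m, size_t n,
               const double* a, size_t rsa, size_t csa,
               double* b, size_t rsb, size_t csb )
{
    if ( prefer_rows )
    {
        for ( size_t i = 0; i < m; ++i )
            for ( size_t j = 0; j < n; ++j )
                b[ i*rsb + j*csb ] = a[ i*rsa + j*csa ];
    }
    else
    {
        for ( size_t j = 0; j < n; ++j )
            for ( size_t i = 0; i < m; ++i )
                b[ i*rsb + j*csb ] = a[ i*rsa + j*csa ];
    }
}
```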

## Helper Functions

```C++
void_fp bli_func_get_dt( num_t         dt,
                         const func_t* func );
```

TODO

```C++
void bli_func_set_dt( void_fp fp,
                      num_t   dt,
                      func_t* func );
```

```C++
void bli_func_copy_dt( num_t dt_src, const func_t* func_src,
                       num_t dt_dst, func_t*       func_dst );
```

```C++
func_t* bli_func_create( void_fp ptr_s,
                         void_fp ptr_d,
                         void_fp ptr_c,
                         void_fp ptr_z );
```

```C++
void bli_func_init( func_t* f,
                    void_fp ptr_s,
                    void_fp ptr_d,
                    void_fp ptr_c,
                    void_fp ptr_z );
```

```C++
void bli_func_init_null( func_t* f );
```

```C++
void bli_func_free( func_t* f );
```

```C++
void_fp bli_func2_get_dt( num_t          dt1,
                          num_t          dt2,
                          const func2_t* func );
```

```C++
void bli_func2_set_dt( void_fp  fp,
                       num_t    dt1,
                       num_t    dt2,
                       func2_t* func );
```

```C++
func2_t* bli_func2_create( void_fp ptr_ss, void_fp ptr_sd, void_fp ptr_sc, void_fp ptr_sz,
                           void_fp ptr_ds, void_fp ptr_dd, void_fp ptr_dc, void_fp ptr_dz,
                           void_fp ptr_cs, void_fp ptr_cd, void_fp ptr_cc, void_fp ptr_cz,
                           void_fp ptr_zs, void_fp ptr_zd, void_fp ptr_zc, void_fp ptr_zz );
```

```C++
void bli_func2_init( func2_t* f,
                     void_fp ptr_ss, void_fp ptr_sd, void_fp ptr_sc, void_fp ptr_sz,
                     void_fp ptr_ds, void_fp ptr_dd, void_fp ptr_dc, void_fp ptr_dz,
                     void_fp ptr_cs, void_fp ptr_cd, void_fp ptr_cc, void_fp ptr_cz,
                     void_fp ptr_zs, void_fp ptr_zd, void_fp ptr_zc, void_fp ptr_zz );
```

```C++
void bli_func2_init_null( func2_t* f );
```

```C++
void bli_func2_free( func2_t* f );
```

```C++
dim_t bli_blksz_get_def( num_t          dt,
                         const blksz_t* b );
```

```C++
dim_t bli_blksz_get_max( num_t          dt,
                         const blksz_t* b );
```

```C++
void bli_blksz_set_def( dim_t    val,
                        num_t    dt,
                        blksz_t* b );
```

```C++
void bli_blksz_set_max( dim_t    val,
                        num_t    dt,
                        blksz_t* b );
```

```C++
void bli_blksz_copy( const blksz_t* b_src,
                           blksz_t* b_dst );
```

```C++
void bli_blksz_copy_if_nonneg( const blksz_t* b_src,
                                     blksz_t* b_dst );
```

```C++
void bli_blksz_copy_def_dt( num_t dt_src, const blksz_t* b_src,
                            num_t dt_dst,       blksz_t* b_dst );
```

```C++
void bli_blksz_copy_max_dt( num_t dt_src, const blksz_t* b_src,
                            num_t dt_dst,       blksz_t* b_dst );
```

```C++
void bli_blksz_copy_dt( num_t dt_src, const blksz_t* b_src,
                        num_t dt_dst,       blksz_t* b_dst );
```

```C++
blksz_t* bli_blksz_create( dim_t b_s,  dim_t b_d,  dim_t b_c,  dim_t b_z,
                           dim_t be_s, dim_t be_d, dim_t be_c, dim_t be_z );
```

```C++
blksz_t* bli_blksz_create_ed( dim_t b_s, dim_t be_s,
                              dim_t b_d, dim_t be_d,
                              dim_t b_c, dim_t be_c,
                              dim_t b_z, dim_t be_z );
```

```C++
void bli_blksz_init( blksz_t* b,
                     dim_t b_s,  dim_t b_d,  dim_t b_c,  dim_t b_z,
                     dim_t be_s, dim_t be_d, dim_t be_c, dim_t be_z );
```

```C++
void bli_blksz_init_ed( blksz_t* b,
                        dim_t b_s, dim_t be_s,
                        dim_t b_d, dim_t be_d,
                        dim_t b_c, dim_t be_c,
                        dim_t b_z, dim_t be_z );
```

```C++
void bli_blksz_init_easy( blksz_t* b,
                          dim_t b_s, dim_t b_d, dim_t b_c, dim_t b_z );
```

```C++
void bli_blksz_free( blksz_t* b );
```

```C++
bool bli_mbool_get_dt( num_t dt, const mbool_t* mb );
```

```C++
void bli_mbool_set_dt( bool val, num_t dt, mbool_t* mb );
```

```C++
mbool_t* bli_mbool_create( bool b_s,
                           bool b_d,
                           bool b_c,
                           bool b_z );
```

```C++
void bli_mbool_init( mbool_t* b,
                     bool     b_s,
                     bool     b_d,
                     bool     b_c,
                     bool     b_z );
```

```C++
void bli_mbool_free( mbool_t* b );
```

```C++
#define PASTECH(...)
```

```C++
#define PASTEMAC(...)
```

```C++
#define gen_func_init( func_p, opname )
```

```C++
#define gen_func_init_ro( func_p, opname )
```

```C++
#define gen_func_init_co( func_p, opname )
```

## Context Initialization

```C++
err_t bli_cntx_set_ukr( siz_t ukr_id, const func_t* func, cntx_t* cntx );
```

```C++
void bli_cntx_set_ukr_dt( void_fp fp, num_t dt, siz_t ukr_id, const func_t* func, cntx_t* cntx );
```

```C++
err_t bli_cntx_set_ukr2( siz_t ukr_id, const func2_t* func, cntx_t* cntx );
```

```C++
void bli_cntx_set_ukr2_dt( void_fp fp, num_t dt1, num_t dt2, siz_t ukr_id, const func_t* func, cntx_t* cntx );
```

```C++
err_t bli_cntx_set_blksz( siz_t bs_id, const blksz_t* blksz, siz_t mult_id, cntx_t* cntx );
```

```C++
void bli_cntx_set_blksz_def_dt( num_t dt, siz_t bs_id, dim_t bs, cntx_t* cntx );
```

```C++
void bli_cntx_set_blksz_max_dt( num_t dt, siz_t bs_id, dim_t bs, cntx_t* cntx );
```

```C++
err_t bli_cntx_set_ukr_pref( siz_t ukr_pref_id, const mbool_t* prefs, cntx_t* cntx );
```

```C++
err_t bli_cntx_set_ukr_pref_dt( bool pref, num_t dt, siz_t ukr_pref_id, cntx_t* cntx );
```

```C++
void bli_cntx_set_ukrs( cntx_t* cntx,
                        siz_t ukr0_id, num_t dt0, void_fp ukr0_fp,
                        siz_t ukr1_id, num_t dt1, void_fp ukr1_fp,
                        siz_t ukr2_id, num_t dt2, void_fp ukr2_fp,
                        ...,
                        BLIS_VA_END );
```

```C++
void bli_cntx_set_ukr2s( cntx_t* cntx,
                         siz_t ukr0_id, num_t dt1_0, num_t dt2_0, void_fp ukr0_fp,
                         siz_t ukr1_id, num_t dt1_1, num_t dt2_1, void_fp ukr1_fp,
                         siz_t ukr2_id, num_t dt1_2, num_t dt2_2, void_fp ukr2_fp,
                         ...,
                         BLIS_VA_END );
```

```C++
void bli_cntx_set_blkszs( cntx_t* cntx,
                          siz_t bs0_id, const blksz_t* blksz0, siz_t bm0_id,
                          siz_t bs1_id, const blksz_t* blksz1, siz_t bm1_id,
                          siz_t bs2_id, const blksz_t* blksz2, siz_t bm2_id,
                          ...,
                          BLIS_VA_END );
```

```C++
void bli_cntx_set_ukr_prefs( cntx_t* cntx,
                             siz_t ukr_pref0_id, num_t dt0, bool ukr_pref0,
                             siz_t ukr_pref1_id, num_t dt1, bool ukr_pref1,
                             siz_t ukr_pref2_id, num_t dt2, bool ukr_pref2,
                             ...,
                             BLIS_VA_END );
```

## Context Query

```C++
const cntx_t* bli_gks_query_cntx();
```

```C++
const cntx_t* bli_gks_lookup_id( arch_t id );
```

```C++
const func_t* bli_cntx_get_ukrs( siz_t ukr_id, const cntx_t* cntx );
```

```C++
void_fp bli_cntx_get_ukr_dt( num_t dt, siz_t ukr_id, const cntx_t* cntx );
```

```C++
const func2_t* bli_cntx_get_ukr2s( siz_t ukr_id, const cntx_t* cntx );
```

```C++
void_fp bli_cntx_get_ukr2_dt( num_t dt1, num_t dt2, siz_t ukr_id, const cntx_t* cntx );
```

```C++
const blksz_t* bli_cntx_get_blksz( siz_t bs_id, const cntx_t* cntx );
```

```C++
dim_t bli_cntx_get_blksz_def_dt( num_t dt, siz_t bs_id, const cntx_t* cntx );
```

```C++
dim_t bli_cntx_get_blksz_max_dt( num_t dt, siz_t bs_id, const cntx_t* cntx );
```

```C++
siz_t bli_cntx_get_bmult_id( siz_t bs_id, const cntx_t* cntx );
```

```C++
const blksz_t* bli_cntx_get_bmult( siz_t bs_id, const cntx_t* cntx );
```

```C++
dim_t bli_cntx_get_bmult_dt( num_t dt, siz_t bs_id, const cntx_t* cntx );
```

```C++
const mbool_t* bli_cntx_get_ukr_prefs( siz_t ukr_pref_id, const cntx_t* cntx );
```

```C++
bool bli_cntx_get_ukr_prefs_dt( num_t dt, siz_t ukr_pref_id, const cntx_t* cntx );
```

## Control tree modification

TODO