Graphics API Replay (GAPIR)

GAPIR is a stack-based virtual machine that can execute programs formed from a very small instruction set.

Evaluation of existing VMs

Before building a new virtual machine from scratch, we evaluated our needs and compared them against a number of existing lightweight, open-source VMs (Lua, Parrot, Neko, etc).

We opted for building a custom VM because:

  • Our required instruction set was significantly smaller than those provided by other VMs. We have no need for functions or any type of control flow, and by reducing the instruction set to only what we absolutely require, we’ve avoided unnecessary complexity in testing and in generating the command stream.
  • We have no need for standard libraries (math functions, io functions, etc). Some VMs come bundled with them, and they can be tricky to separate.
  • We desired a very custom memory system that would have been difficult to fit into other VMs.
  • Some of the VMs of interest had licences that were incompatible with our needs.
  • Our speed requirements are very high. We do profiling based on the VM playback, so we need as little overhead as possible per draw call.

Memory pools

GAPIR has 3 distinct types of memory pools.

Volatile memory

Volatile memory is pre-allocated memory that is free to be modified by any opcode during execution. It can be used for temporary or semi-persistent storage.

Constant memory

Along with a sequence of opcodes, a replay request contains a block of constant data. This may be read from at any point in the execution of the replay, but is immutable for the entire replay.

Absolute pointers

Memory that’s not allocated by the replay system may still need to be read from or written to in order to perform a replay. Pointers returned by glGetString or glMapBufferRange are examples of such memory.

Data types

The AGI virtual-machine supports the following primitive data types:

Type             Description
Bool             true / false value
Int8             8-bit signed integer
Int16            16-bit signed integer
Int32            32-bit signed integer
Int64            64-bit signed integer
Uint8            8-bit unsigned integer
Uint16           16-bit unsigned integer
Uint32           32-bit unsigned integer
Uint64           64-bit unsigned integer
Float            32-bit floating point number
Double           64-bit floating point number
AbsolutePointer  Pointer to an absolute address
ConstantPointer  Pointer within the constant pool
VolatilePointer  Pointer within the volatile pool

Stack

The VM uses a standard LIFO stack where each element is a type-value pair. Every stored element has the same size, that of the largest storable type, and all elements are aligned.

Each operation, except for CLONE, consumes the operands from the current stack and pushes the result back to the stack.

Opcodes

Each opcode is 32 bits long where the first 6 bits are the instruction code and the rest of the bits contain the instruction data. This leaves room for additional instructions to be added in the future.

Notation: <field_name:field_size_in_bits>
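
As a rough illustration of this layout, the sketch below packs and unpacks a 32-bit opcode word. It assumes that "the first 6 bits" are the six most significant bits of the word; the function names are hypothetical and are not taken from the GAPIR sources.

```python
CODE_BITS = 6
DATA_BITS = 32 - CODE_BITS  # 26 bits of instruction data


def encode(code, data):
    """Pack an instruction code and its data into one 32-bit word."""
    if not 0 <= code < (1 << CODE_BITS):
        raise ValueError("instruction code does not fit in 6 bits")
    if not 0 <= data < (1 << DATA_BITS):
        raise ValueError("instruction data does not fit in 26 bits")
    # Assumption: the code occupies the most significant bits.
    return (code << DATA_BITS) | data


def decode(word):
    """Split a 32-bit word into (code, data)."""
    return word >> DATA_BITS, word & ((1 << DATA_BITS) - 1)
```

A 6-bit code field can distinguish up to 64 instructions.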

CALL(push-return, api, function) [-{arg-count} (any type) / +{push-return} (any type)]

<code:6> <padding:1> <push-return:1> <padding:4> <api:4> <function id:16>

Calls the specified function in the given API. If push-return is 1, the return value is pushed onto the stack; otherwise the return value is discarded.

The arguments are popped from the stack and type-checked against the parameters of the called function.

The arguments have to be pushed onto the stack in order (the last argument is on the top of the stack).

Function IDs in range 0xff00-0xffff are reserved.
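
The CALL fields can be pulled out of a word with shifts and masks. This is a minimal sketch that assumes the fields are laid out from the most significant bit downwards in the order given above; `decode_call` and `is_reserved_function` are hypothetical helper names.

```python
def decode_call(word):
    """Return the CALL fields of a 32-bit word, most significant field first."""
    return {
        "code": (word >> 26) & 0x3F,        # <code:6>
        "push_return": (word >> 24) & 0x1,  # <push-return:1>, after 1 padding bit
        "api": (word >> 16) & 0xF,          # <api:4>, after 4 padding bits
        "function_id": word & 0xFFFF,       # <function id:16>
    }


def is_reserved_function(function_id):
    """True for function IDs in the reserved range 0xff00-0xffff."""
    return 0xFF00 <= function_id <= 0xFFFF
```

For example, a word with push-return set, API 2 and function ID 0x0123 decodes to `push_return == 1`, `api == 2`, `function_id == 0x123`.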

PUSH_I(type, data) [+1 (type)]

<code:6> <type:6> <data:20>

Pushes data to the top of the stack.

If the data type is an integer or a pointer type, then the data is copied into the least-significant-bits of the target word, sign-extending if the type is signed.

If the data type is a float or double, then the value is written to the sign and exponent bits of the floating point number, and the fractional bits are set to 0.
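
The two rules above can be sketched as follows. This is an illustration only: it assumes that for floating point types the low 9 bits (Float) or low 12 bits (Double) of the immediate hold the sign and exponent, and the function names are hypothetical.

```python
import struct


def push_i_integer(data, signed):
    """Value of a 20-bit immediate as an integer, sign-extended when signed."""
    data &= 0xFFFFF
    if signed and data & 0x80000:  # bit 19 is the sign bit of the immediate
        data -= 1 << 20
    return data


def push_i_float(data):
    """Place the immediate in the sign and exponent bits of a 32-bit float."""
    bits = (data & 0x1FF) << 23  # 1 sign bit + 8 exponent bits, fraction = 0
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def push_i_double(data):
    """Place the immediate in the sign and exponent bits of a 64-bit double."""
    bits = (data & 0xFFF) << 52  # 1 sign bit + 11 exponent bits, fraction = 0
    return struct.unpack("<d", struct.pack("<Q", bits))[0]
```

Under these assumptions an immediate of 0x7F (sign 0, biased exponent 127) yields the float 1.0, since a zero fraction with exponent 127 encodes exactly 1.0 in IEEE 754 single precision.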

LOAD_C(type, address) [+1 (type)]

<code:6> <type:6> <constant-address:20>

Pushes data loaded from constant-address to the top of the stack.

LOAD_V(type, address) [+1 (type)]

<code:6> <type:6> <volatile-address:20>

Pushes data loaded from volatile-address to the top of the stack.
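
A typed load from either pool might look like the sketch below. It assumes the address is a byte offset from the start of the pool and that values are stored little-endian; neither detail is stated in this document, and `FORMATS` and `load` are hypothetical names.

```python
import struct

# struct format codes for a few of the primitive types in the Data types table.
FORMATS = {"Uint8": "<B", "Int32": "<i", "Uint32": "<I", "Float": "<f", "Double": "<d"}


def load(pool, type_name, address):
    """Read one value of the given type from a pool and return a type-value pair."""
    (value,) = struct.unpack_from(FORMATS[type_name], pool, address)
    return (type_name, value)
```

The same helper serves LOAD_C and LOAD_V in this sketch; only the pool passed in differs (an immutable `bytes` object for constant memory, a `bytearray` for volatile memory).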

LOAD(type) [-1 (pointer) / +1 (type)]

<code:6> <type:6> <padding:20>

Pops a memory address from the top of the stack and pushes the data at that address to the top of the stack.

POP(count) [-{count} (any type)]

<code:6> <count:26>

Pops and discards count values from the top of the stack.

STORE_V(volatile-address) [-1 (any type)]

<code:6> <volatile-address:26>

Pops the top value from the stack and saves it to volatile-address. All pointer values, regardless of the pointer type on the stack, will be stored as an absolute pointer address.

STORE() [-2 (pointer, any type)]

<code:6> <padding:26>

Pops the target address and then the value from the top of the stack, and then stores the value to the target address. All pointer values, regardless of the pointer type on the stack, will be stored as an absolute pointer address.
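
The conversion to an absolute address that STORE_V and STORE perform on pointer values can be pictured as below. This sketch assumes that constant and volatile pointers hold byte offsets from the start of their pool; the function name and the base-address parameters are hypothetical.

```python
def to_absolute(pointer_type, value, constant_base, volatile_base):
    """Convert any of the three pointer types to an absolute address."""
    if pointer_type == "AbsolutePointer":
        return value
    if pointer_type == "ConstantPointer":
        return constant_base + value  # assumption: value is an offset in the pool
    if pointer_type == "VolatilePointer":
        return volatile_base + value  # assumption: value is an offset in the pool
    raise TypeError("not a pointer type: " + pointer_type)
```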

RESOURCE(resource-id) [-1 (pointer)]

<code:6> <resource-id:26>

Pops the address from the top of the stack and then loads the resource resource-id to that address.

POST() [-2 (uint32_t, pointer)]

<code:6> <padding:26>

Pops size and then a pointer from the top of the stack and posts size bytes of data from the address to the server.

COPY(count) [-2 (pointer, pointer)]

<code:6> <count:26>

Pops the target address then the source address from the top of the stack, and then copies count bytes from source to target.

CLONE(n) [+1 (any type)]

<code:6> <n:26>

Copies the n-th element from the top of the stack to the new top of the stack.
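
A minimal sketch of CLONE on a stack of type-value pairs, modelled here as a Python list whose last item is the top. The document does not say whether n is zero-based; this sketch assumes n = 0 refers to the current top.

```python
def clone(stack, n):
    """Push a copy of the n-th element from the top (assuming n = 0 is the top)."""
    if not 0 <= n < len(stack):
        raise IndexError("CLONE index is outside the stack")
    stack.append(stack[-1 - n])
```

Unlike the other operations, nothing is consumed: the stack grows by one element.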

STRCPY(max-count) [-2 (pointer, pointer)]

<code:6> <max-count:26>

Pops the target address then the source address from the top of the stack, and then copies at most max-count minus one bytes from source to target. If the max-count is greater than the source string length, then the target will be padded with 0s. The destination buffer will always be 0-terminated.
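
The copy, padding and termination rules can be sketched against a flat `bytearray` standing in for memory. The helper name is hypothetical, and the sketch assumes the source and target ranges do not overlap.

```python
def strcpy(memory, target, source, max_count):
    """Copy a 0-terminated string inside `memory`, writing exactly max_count bytes."""
    if max_count == 0:
        return
    copied = 0
    # Copy at most max_count - 1 bytes, stopping at the source terminator.
    while copied < max_count - 1 and memory[source + copied] != 0:
        memory[target + copied] = memory[source + copied]
        copied += 1
    # Pad the rest of the destination with 0s, which also writes the terminator.
    for i in range(copied, max_count):
        memory[target + i] = 0
```

With the source string "hello", a max-count of 4 writes "hel" followed by one 0, while a max-count of 8 writes "hello" followed by three 0s.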

EXTEND(value) [no change]

<code:6> <value:26>

Extends the value at the top of the stack with the given data, in-place.

If the data type of the top of the stack is an integer or a pointer type, then the value on the stack is left-shifted by 26 bits and is bitwise-OR’ed with the specified value.

If the data type is a float or double, then the fractional part of the floating point value on the stack is left-shifted by 26 bits and is bitwise-OR’ed with the specified value. Bits shifted beyond the fractional part of the floating point number are discarded.
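
For integers, PUSH_I followed by EXTEND is enough to build values wider than a single immediate. The sketch below covers only the integer and pointer rule; the `width` parameter (truncating to the size of the target type) and both function names are assumptions made for illustration.

```python
def extend_integer(current, value, width=64):
    """Shift the stack value left by 26 bits and OR in a 26-bit payload."""
    mask = (1 << width) - 1
    return ((current << 26) | (value & 0x3FFFFFF)) & mask


def split_uint32(target):
    """Split a 32-bit unsigned value into a PUSH_I payload and an EXTEND payload."""
    return target >> 26, target & 0x3FFFFFF
```

For instance, `split_uint32(0xDEADBEEF)` gives the upper 6 bits for PUSH_I and the lower 26 bits for EXTEND, and feeding both to `extend_integer` reproduces the original value.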

ADD(count) [-{count} (any type) / +1 (any type)]

<code:6> <count:26>

Pops and sums count values from the top of the stack, and then pushes the result to the top of the stack.

All summed value types must be equal.
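
A sketch of ADD on a stack of type-value pairs (a Python list whose last item is the top). Overflow behaviour for fixed-width types is not described here and is not modelled; the function name is hypothetical.

```python
def add(stack, count):
    """Pop `count` (type, value) pairs, sum them, and push one result."""
    if count < 1 or count > len(stack):
        raise ValueError("ADD needs between 1 and len(stack) operands")
    operands = [stack.pop() for _ in range(count)]
    types = {type_name for type_name, _ in operands}
    if len(types) != 1:
        raise TypeError("all summed values must have the same type")
    stack.append((types.pop(), sum(value for _, value in operands)))
```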

LABEL(value) [no change]

<code:6> <value:26>

Sets the current debug label to value. The label value is displayed in debug messages or in the case of a crash.

JUMPLABEL(value) [no change]

<code:6> <value:26>

Adds a jump label that records the index of the instruction currently being executed, so that a later jump instruction can jump to this instruction and resume execution from there.

JUMPNZ(value) [no change]

<code:6> <value:26>

Jumps to the instruction specified by the jump label and resumes execution from there if the value on the top of the stack is not zero. Otherwise it is a no-op.

JUMPZ(value) [no change]

<code:6> <value:26>

Jumps to the instruction specified by the jump label and resumes execution from there if the value on the top of the stack is zero. Otherwise it is a no-op.
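
The interplay between JUMPLABEL and the conditional jumps can be sketched with a toy interpreter loop. Several things here are assumptions: the stack holds plain integers rather than type-value pairs, the jump instructions inspect the top of the stack without popping it (matching the "[no change]" annotation), and `PUSH` is a simplified stand-in for PUSH_I. Labels are recorded when a JUMPLABEL is executed, so only labels that have already been reached can be jumped to.

```python
def run(program, stack):
    """Execute (opcode, value) pairs; return how many instructions ran."""
    labels = {}  # jump label id -> instruction index, filled in by JUMPLABEL
    pc = 0
    executed = 0
    while pc < len(program):
        op, value = program[pc]
        executed += 1
        if op == "JUMPLABEL":
            labels[value] = pc
        elif op == "PUSH":
            stack.append(value)
        elif op == "ADD":
            operands = [stack.pop() for _ in range(value)]
            stack.append(sum(operands))
        elif op in ("JUMPZ", "JUMPNZ"):
            is_zero = stack[-1] == 0  # peeks; see the assumption above
            if is_zero == (op == "JUMPZ"):
                pc = labels[value]  # resume at the JUMPLABEL instruction itself
                continue
        pc += 1
    return executed


# Counts 3 down to 0 by repeatedly adding -1 until the top of the stack is zero.
countdown = [("PUSH", 3), ("JUMPLABEL", 7), ("PUSH", -1), ("ADD", 2), ("JUMPNZ", 7)]
```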

NOTIFICATION() [-2 (uint32_t, pointer)]

<code:6> <padding:26>

Pops size and then a pointer from the top of the stack and streams back size bytes of data from the address to the server via the notification message.

WAIT(fence-id) [no change]

<code:6> <fence-id:26>

Streams back the fence-id to the server. Replay pauses until the server streams back the same ID.

Resources

GAPIR is designed to be run on desktop and Android devices. When replaying on Android, the communication between GAPIS and GAPIR is usually performed over USB 2, which has a peak throughput of around 60 megabytes per second. It’s not uncommon for capture files to be hundreds of megabytes in size, and in rare cases an order of magnitude greater than that.

It is typical for many replay requests to be made for the same capture file - for example clicking around the draw calls in the client will usually result in a replay request per click. The bulk of the data in replay requests of the same capture file is identical - the large assets are typically static textures and mesh data.

To avoid repeated transmission of these large assets over USB, GAPIR has a memory cache for storing resource data.

A list of resources used in the replay is included as part of the replay request payload header. This list consists of the identifiers and sizes of all the resources used by the replay stream. Upon receiving the header, GAPIR can check which of the resources it already has in its cache, and request the resource data for those that are missing.
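
The cache check described above amounts to a set difference. The sketch below models the header as a list of (resource id, size) pairs and the cache as a dictionary keyed by resource id; both representations and the function names are assumptions made for illustration. The second helper estimates the best-case transfer time using the approximate 60 megabytes per second USB 2 peak quoted earlier.

```python
def missing_resources(header, cache):
    """Return the (resource_id, size) entries that are not cached yet."""
    return [(rid, size) for rid, size in header if rid not in cache]


def transfer_seconds(resources, bytes_per_second=60_000_000):
    """Best-case time to send the given resources at the quoted USB 2 peak rate."""
    return sum(size for _, size in resources) / bytes_per_second
```

With a 120 MB texture already cached and a 60 MB mesh missing, only the mesh is requested, taking about 1 second at peak throughput instead of 3 seconds for both.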