Rework hlsl-vector-type into two specs #361

Open
wants to merge 10 commits into
base: main
323 changes: 323 additions & 0 deletions proposals/0026-hlsl-long-vector-type.md
@@ -0,0 +1,323 @@
<!-- {% raw %} -->

# HLSL Long Vectors

* Proposal: [0026-HLSL-Vectors](0026-hlsl-vector-type.md)
* Author(s): [Anupama Chandrasekhar](https://github.com/anupamachandra), [Greg Roth](https://github.com/pow2clk)
* Sponsor: [Greg Roth](https://github.com/pow2clk)
* Status: **Under Consideration**

## Introduction

HLSL has previously supported vectors of as many as four elements (int3, float4, etc.).
These are useful in a traditional graphics context for representation and manipulation of
geometry and color information.
The evolution of HLSL into a more general-purpose language targeting graphics and compute
would benefit greatly from longer vectors that fully represent these operations, rather than
breaking them down into smaller constituent vectors.
This feature adds the ability to load, store, and perform elementwise operations on HLSL
vectors longer than four elements.

## Motivation

The adoption of machine learning techniques expressed as vector-matrix operations
requires larger vector sizes to be representable in HLSL.
To take advantage of specialized hardware that can accelerate longer vector operations,
these vectors need to be preserved in the exchange format as well.

## Proposed solution

Enable vectors of length between 4 and 128 inclusive in HLSL using existing template-based vector declarations.

if the range is inclusive, then that 4 should be 5.

Member Author

Done.

Such vectors will hereafter be referred to as "long vectors".
These will be supported for all elementwise intrinsics that take variable-length vector parameters.
For certain operations, these vectors will be represented as native vectors using
[Dxil vectors](NNNN-dxil-vectors.md) and equivalent SPIR-V representations.

## Detailed design

### HLSL vectors

Currently HLSL allows declaring vectors using a templated representation:

```hlsl
vector<T, N> name;
```

`T` is any [scalar](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-scalar) type.
`N` is the number of components and must be an integer between 1 and 4 inclusive.
See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
This proposal adds support for long vectors of length greater than 4 by
allowing `N` to be greater than 4 where previously such a declaration would produce an error.

Collaborator

How does this design accommodate N being supplied from a specialization constant supplied at runtime, rather than a literal value that is known at HLSL compile time?

Member Author

Runtime specialization constants are not currently supported in HLSL. That's a question we'll need to answer when it is. For now, any similar mechanism produces an error: https://godbolt.org/z/5Kszf9Tr3 This includes the existing vk:: namespace specialization/push constant support.

error: non-type template argument of type 'uint' is not an integral constant expression



Somewhat related to what @tex3d asked, but more broad: the spec needs to say how N can be written.

  • Can it be N where earlier there is a declaration const uint N = 15;
  • Can it be N where N is the parameter to a function, and the vector type declaration is inside the function.

I guess it has to be statically computed. (I'm not sure what the HLSL term for it is.)

Member Author

Given that this introduces no changes to the existing vector declaration restrictions, I didn't think it necessary to call out that it needs to be a constant expression. Looking at the docs though, I see that it isn't mentioned there, so I'll update the docs and call it out here.

The default behavior of HLSL vectors is preserved for backward compatibility, meaning, skipping the last parameter `N`
defaults to 4-component vectors and the use of `vector name;` declares a 4-component float vector, etc. More examples
[here](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector).
Declarations of long vectors require the use of the template declaration.
Unlike vector sizes between 1 and 4, no shorthand declarations that concatenate
the element type and number of elements (e.g. float2, double4) are allowed for long vectors.
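
As a hedged illustration of the declaration rules above, the following sketch assumes a compiler that implements this proposal with HLSL 2021 templates enabled. The names `MakeOnes`, `Scale`, and `Example` are hypothetical and are not part of the proposal.

```hlsl
// Sketch only: assumes a compiler implementing this proposal (HLSL 2021).
// All function and variable names here are hypothetical.

// Long vectors must be spelled with the template form.
vector<float, 8> MakeOnes() {
  vector<float, 8> Ones = 1.0; // scalar assigned to every element
  return Ones;
}

// N is a template parameter, so it is a compile-time constant
// at each instantiation.
template <uint N>
vector<float, N> Scale(vector<float, N> V, float S) {
  return V * S;
}

void Example() {
  vector<float, 8> A = MakeOnes();
  vector<float, 8> B = Scale<8>(A, 2.0);
  vector<float, 4> C = 0.0; // short vectors keep the template form...
  float4 D = C;             // ...and the shorthand form
  // float8 E;              // expected error: no shorthand for long vectors
}
```

The template parameter case is shown because it satisfies the constant-expression requirement without relying on any runtime value.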

#### Allowed Usage

The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.

Contributor

Is this line needed for the long vector spec? It's probably implicitly assumed?

Member Author

Which part do you take to be implicitly assumed? I don't think support across all shader stages should be assumed. The control/wave uniformity requirements touch more on the followups and could probably be removed. I'm inclined to leave the implementations encouraging best practices language as some platforms might end up scalarizing vector operations at some point or another which won't lead to the best performance.


Long vectors can be:

* Elements of arrays, structs, StructuredBuffers, and ByteAddressBuffers.
* Parameters and return types of non-entry functions.
* Stored in groupshared memory.
* Static global variables.

Long vectors are not permitted in:

* Resource types other than ByteAddressBuffer or StructuredBuffer.
* Any part of the shader's signature including entry function parameters and return types or
user-defined struct parameters.
* Cbuffers or tbuffers.
* Ray tracing `Parameter`, `Attributes`, or `Payload` parameter structures.
* A work graph record.
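
The permitted contexts listed above could look roughly like the following compute shader. This is a sketch under the assumption that the proposal is implemented as written; `Layer`, `Bias`, `Shared`, `Out`, and `AddBias` are hypothetical names.

```hlsl
// Sketch only: hypothetical names, assuming a compiler implementing this proposal.
struct Layer {
  vector<float, 16> Weights; // struct member
  vector<float, 16> Rows[4]; // array element
};

static vector<float, 16> Bias = 0.5;         // static global variable
groupshared vector<float, 16> Shared[64];    // groupshared memory
RWStructuredBuffer< vector<float, 16> > Out; // StructuredBuffer element

// Non-entry function: long vector parameter and return type.
vector<float, 16> AddBias(vector<float, 16> V) {
  return V + Bias;
}

[numthreads(64, 1, 1)]
void main(uint GI : SV_GroupIndex) { // the entry signature contains no long vectors
  vector<float, 16> V = (float)GI;
  Shared[GI] = AddBias(V);
  GroupMemoryBarrierWithGroupSync();
  Out[GI] = Shared[(GI + 1) % 64];
}
```

The entry function takes only a scalar system value, which keeps long vectors out of the shader signature.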

#### Constructing vectors

HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
Vectors can be initialized and assigned from various casting operations including scalars and arrays.
Long vectors will maintain equivalent casting abilities.

Contributor

It would be nice to have a way to initialize all elements of the vector with one value.

Member Author

The current way of doing that is assigning a scalar to a vector. This mentions scalars. Did you have something else in mind?


Examples:

```hlsl
vector<uint, 5> InitList = {1, 2, 3, 4, 5};
```

What happens if not all elements are listed explicitly. Are the rest zero-initialized?

Member Author

The existing behavior where an error is produced for insufficient elements is unchanged, though poorly documented. I don't mind adding a note here to that effect.

```hlsl
vector<uint, 6> Construct = vector<uint, 6>(6, 7, 8, 9, 0, 0);
uint4 initval = {0, 0, 0, 0};
vector<uint, 8> VecVec = {uint2(coord.xy), vecB};
vector<uint, 6> Assigned = vecB;
float arr[5];
vector<float, 5> CastArr = (vector<float, 5>)arr;
vector<float, 6> ArrScal = {arr, 7.9};
vector<float, 10> ArrArr = {arr, arr};
vector<float, 15> Scal = 4.2;
```

#### Vectors in Raw Buffers

N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
with a vector type of the required size as the template parameter and byte offset parameters.

Collaborator

Should we discuss alignment requirements at all here?

Member Author

Do you have any suggestions on how that discussion might look?


You could say that a long vector <T,N> has the same alignment requirements as an array of N elements of type T.
Except that may be different from vector<float,4> which might have 16byte alignment. With scalar alignment they would be the same. (Sorry, I don't remember if modern HLSL assumes scalar alignment for vectors.)

Member Author

Tex and I will discuss this and I'll update the spec accordingly.

Collaborator

Alignment for vectors in HLSL is purely based on the alignment of the scalar element type. Constant buffer layout ("legacy") is a special case where a vector that would cross a 16-byte aligned boundary is started at the next 16-byte aligned location instead.

Array elements have the same natural alignment based on scalar element type as vectors. However, they also have a special rule for constant buffer layout ("legacy") where each array element must start at the beginning of a 16-byte aligned boundary. There are also special rules for signature inputs/outputs for arrays, but these can't be described by a concept of alignment.

The "legacy" constant layout rules cannot be described by the concept of type alignments.

Changing the natural alignment rules for a vector depending on whether it has > 4 elements seems weird.


```hlsl
RWByteAddressBuffer myBuffer;

vector<T, N> val = myBuffer.Load< vector<T, N> >(StartOffsetInBytes);
myBuffer.Store< vector<T, N> >(StartOffsetInBytes + 100, val);

```

StructuredBuffers with N-element vectors are declared using the template syntax
with a long vector type as the template parameter.
N-element vectors are loaded and stored from StructuredBuffers using the load and store methods
with the element index parameters.

```hlsl
RWStructuredBuffer< vector<T, N> > myBuffer;
```

Is the space between > > necessary, as per C++03 style parsing? Might be good to call out.
I hope no space is required.

Collaborator

HLSL 2021 templates are C++98/03-era, so the space is required.

Member Author

I'm willing to add comments regarding vectors specifically even if there is no change in previous behavior, but as this is just the facts of HLSL templates (and pseudo-templates if you prefer), I feel this is outside the scope of this document.

```hlsl
vector<T, N> val = myBuffer.Load(elementIndex);
myBuffer.Store(elementIndex, val);
```
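
For a concrete picture of the raw and structured buffer access described in this section, here is a minimal sketch with `T = float` and `N = 8`. The buffer names and the tightly packed 32-byte stride are assumptions made for illustration, and the structured buffer write uses the existing subscript operator rather than any method introduced by this proposal.

```hlsl
// Sketch only: hypothetical buffer names, assuming a compiler implementing this proposal.
ByteAddressBuffer In;
RWByteAddressBuffer OutRaw;
RWStructuredBuffer< vector<float, 8> > OutStructured;

[numthreads(32, 1, 1)]
void main(uint Tid : SV_DispatchThreadID) {
  // Assumption: vectors are tightly packed, 8 floats of 4 bytes each.
  const uint Stride = 8 * 4;

  // Templated load and store with a byte offset.
  vector<float, 8> V = In.Load< vector<float, 8> >(Tid * Stride);
  OutRaw.Store< vector<float, 8> >(Tid * Stride, V * 2.0);

  // Structured buffer access by element index.
  OutStructured[Tid] = V;
}
```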

#### Accessing elements of long vectors

Long vectors support the existing vector subscript operators `[]` to access the scalar element values.
They do not support any swizzle operations.
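
A short sketch of subscript access with both static and dynamic indices follows. It assumes the proposal is implemented as written; `SumAll` and `Pick` are hypothetical helper names.

```hlsl
// Sketch only: hypothetical helpers, assuming a compiler implementing this proposal.

// Static indices: the loop bound is a compile-time constant.
float SumAll(vector<float, 8> V) {
  float Total = 0.0;
  for (uint I = 0; I < 8; ++I)
    Total += V[I];
  return Total;
}

// Dynamic index: the caller is responsible for keeping Index below 8.
float Pick(vector<float, 8> V, uint Index) {
  return V[Index];
}
```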

#### Operations on long vectors

Support all HLSL intrinsics that perform [elementwise calculations](NNNN-dxil-vectors.md#elementwise-intrinsics)

Collaborator

While defining the HLSL language, I do not understand the need to link to the DXIL proposal as if it defines what's meant by "elementwise calculations" here.

Member Author

I don't consider this a language spec. I structured it as a shader model spec.

There's no "as if" about it. I defined what elementwise calculations are in the dxil vectors spec. I don't really care where that definition ends up, but it's useful to have because there isn't anywhere else. Since I wrote this spec to depend on that one, it made sense to put it there. As it stands, neither spec technically depends on the other because long vectors can be implemented with scalarization and native vectors can be used without changing the allowed vector length. The lack of any hard interdependence inclines me further toward leaving this as having a DXIL element as well as language elements even if it's just discussion of validation changes.

that take parameters that could be long vectors and whose function doesn't limit them to shorter vectors.
These are operations that perform the same operation on an element regardless of its position in the vector
except that the position indicates which element(s) of other vector parameters might be used in that calculation.

Refer to the HLSL spec for an exhaustive list of [Operators](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-operators) and [Intrinsics](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-intrinsic-functions).

#### Allowed elementwise vector intrinsics

* Trigonometry : acos, asin, atan, atan2, cos, cosh, degrees, radians, sin, sinh, tan, tanh
* Math: abs, ceil, clamp, exp, exp2, floor, fma, fmod, frac, frexp, ldexp, lerp, log, log10, log2, mad, max, min, pow, rcp, round, rsqrt, sign, smoothstep, sqrt, step, trunc
* Float Ops: f16tof32, f32tof16, isfinite, isinf, isnan, modf, saturate
* Bitwise Ops: reversebits, countbits, firstbithigh, firstbitlow
* Logic Ops: and, or, select
* Reductions: all, any, dot
* Quad Ops: ddx, ddx_coarse, ddx_fine, ddy, ddy_coarse, ddy_fine, fwidth, QuadReadLaneAt, QuadReadLaneAcrossX, QuadReadLaneAcrossY, QuadReadLaneAcrossDiagonal
* Wave Ops: WaveActiveBitAnd, WaveActiveBitOr, WaveActiveBitXor, WaveActiveProduct, WaveActiveSum, WaveActiveMin, WaveActiveMax, WaveMultiPrefixBitAnd, WaveMultiPrefixBitOr, WaveMultiPrefixBitXor, WaveMultiPrefixProduct, WaveMultiPrefixSum, WavePrefixSum, WavePrefixProduct, WaveReadLaneAt, WaveReadLaneFirst
* Wave Reductions: WaveActiveAllEqual, WaveMatch
* Type Conversions: asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16
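
To illustrate how a few of the listed intrinsics might be applied to long vectors, the sketch below combines `mad`, `max`, and `dot` on 16-element vectors. It assumes a compiler implementing this proposal; `BiasedRelu` and `Score` are hypothetical names.

```hlsl
// Sketch only: hypothetical helpers, assuming a compiler implementing this proposal.

// Elementwise multiply-add followed by an elementwise max against zero.
vector<float, 16> BiasedRelu(vector<float, 16> X,
                             vector<float, 16> W,
                             vector<float, 16> B) {
  vector<float, 16> Zero = 0.0;
  return max(mad(X, W, B), Zero);
}

// Reduction: a single scalar computed from all 16 element pairs.
float Score(vector<float, 16> X, vector<float, 16> W) {
  return dot(X, W);
}
```

Each element of the `BiasedRelu` result depends only on the elements at the same position in `X`, `W`, and `B`, which matches the description of elementwise operations given earlier.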

#### Disallowed vector intrinsics

* Only applicable to shorter vectors: AddUint64, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
* Only useful for disallowed variables: EvaluateAttributeAtSample, EvaluateAttributeCentroid, EvaluateAttributeSnapped, GetAttributeAtVertex

### Interchange Format Additions

Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.

This section should be informative.

A good SPIR-V representation could be as arrays of elements. So it's still a single SSA value.

Member Author

The SPIR-V representation is very much up in the air right now. I prefer to leave this vague. This is mentioned in the issues below.

Representation of native vectors in DXIL depends on [dxil vectors](NNNN-dxil-vectors.md).

### Debug Support

First class debug support for HLSL vectors. Emit `llvm.dbg.declare` and `llvm.dbg.value` intrinsics that can be used by tools for better debugging experience.
These should enable tracking vectors through their scalarized and native vector usages.

### Diagnostic Changes

Error messages should be produced for use of long vectors in unsupported interfaces:

* Typed buffer element types.
* Parameters to the entry function.
* Return types from the entry function.
* Cbuffer blocks.
* Cbuffer global variables.
* Tbuffers.
* Work graph records.
* Mesh/amplification payload entry parameter structures.
* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in
`TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics.

Errors should also be produced when long vectors are passed to intrinsics that take
variable-length vector parameters but do not permit long vectors, as listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
Attempting to use any swizzle member-style accessors on long vectors should produce an error.
Declaring vectors of length longer than 128 should produce an error.

### Validation Changes

Validation should produce errors when a long vector is found in:

Collaborator

I think DXIL validation should be left to the DXIL spec, no?

Member Author

I did not find it useful to divide the specs along interchange format and language lines. The DXIL spec would have included elements of native vectors and also of long vectors while the language spec would have contained only part of the long vectors description. As such, I created two shader model features that can, but don't have to include language changes. This one includes language and DXIL changes. "The DXIL spec" only has DXIL changes, but its characteristic feature is that it documents native DXIL vectors.


* The shader signature.
* A cbuffer/tbuffer.
* Work graph records.
* `Payload`, `Parameter`, and `Attributes` parameter user-defined structs used in
`TraceRay()`, `CallShader()`, and `ReportHit()` ray tracing intrinsics.
* Metadata

Note that disallowing long vectors in entry function signatures includes any user-defined structs
used in mesh and ray tracing shaders.

Use of long vectors in unsupported intrinsics should produce validation errors.

### Device Capability

Devices that support Shader Model 6.9 will be required to fully support this feature.

Member

In the interchange format section above we have:

> Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.
> Representation of native vectors in DXIL depends on dxil vectors.

This seems to imply that long vectors can be supported in existing shader models. I think it's only native DXIL vectors feature that actually requires SM 6.9?

Member Author

That wasn't what I intended to imply with that statement, rather that implementations could choose either approach as we intend to do at least temporarily. However, it is true that this is possible. One of the major remaining questions is if we should.

Member

Before I resolve this conversation: do we have something written down that is tracking this remaining question?

Member Author

I've created #371


## Testing

### Compilation Testing

#### Correct output testing

Collaborator

Here's where I think we describe testing for each supported backend IR. Shouldn't we be giving SPIR-V some love here as well?

Shouldn't there be a section before this that outlines various valid scenarios for AST testing?

For a given implementation, perhaps there would be an additional infra spec to outline tests for initial codegen and various important phases through the compiler as well, right? Perhaps the AST test scenarios belong there too?

Member Author

> Here's where I think we describe testing for each supported backend IR. Shouldn't we be giving SPIR-V some love here as well?

Definitely. I haven't done that research just yet. I've added a slightly hand-wavey allusion to the SPIR-V equivalent.

> Shouldn't there be a section before this that outlines various valid scenarios for AST testing?

Have we done this in the past? I'm not sure what scenarios you have in mind. I'd like to see an example to better understand.

> For a given implementation, perhaps there would be an additional infra spec to outline tests for initial codegen and various important phases through the compiler as well, right? Perhaps the AST test scenarios belong there too?

I think infra specs are useful to discuss forthcoming implementations. Given the state of this implementation, I don't think writing one would be as productive as carefully documenting what has been done in code and commit comments. I think that would be more likely to be preserved and found by future generations of coders.


Verify that long vectors can be declared in all appropriate contexts:

* Local variables.
* Static global variables.
* Non-entry parameters.
* Non-entry return types.
* StructuredBuffer elements.
* Templated Load/Store methods on ByteAddressBuffers.
* As members of arrays and structs in any of the above contexts.

Verify that long vectors can be correctly initialized in all the forms listed in [Constructing vectors](#constructing-vectors).

Verify that long vectors in supported intrinsics produce appropriate outputs.
Supported intrinsic functions listed in [Allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
may produce intrinsic calls with native vector parameters where available
or scalarized parameters with individual scalar calls to the corresponding interchange format intrinsics.

Verify that long vector elements can be accessed using the subscript operation with static or dynamic indices.

Verify that long vectors of different sizes will reference different overloads of user and built-in functions.
Verify that template instantiation using long vectors correctly creates variants for the right sizes.

Verification of correct interchange format output depends on the implementation and representation.
Native vector DXIL intrinsics might be checked for as described in [Dxil vectors](NNNN-dxil-vectors.md)
if native DXIL vector output is supported.
SPIR-V equivalent output should be checked as well.
Scalarized representations are also possible depending on the compilation implementation.

#### Invalid usage testing

Verify that long vectors produce compilation errors when:

* Declared in interfaces listed in [Diagnostic changes](#diagnostic-changes).
* Passed as parameters to any intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
* Used with any swizzle operation (e.g. `lvec.x`, `lvec.rg`, `lvec.wzyx`).
* Declared with a size over the maximum in any of the allowed contexts listed in [Allowed usage](#allowed-usage).
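
A negative test for some of these cases might be shaped like the sketch below. Each commented-out line is expected to produce a compilation error under this proposal. The function names are hypothetical and no specific diagnostic text is implied.

```hlsl
// Sketch only: hypothetical test shape, assuming a compiler implementing this proposal.
// Uncommenting any marked line is expected to produce a compilation error.
vector<float, 8> Passthrough(vector<float, 8> V) {
  // float2 A = V.xy;                    // swizzle on a long vector
  // vector<float, 8> U = normalize(V);  // intrinsic limited to shorter vectors
  return V;
}

// Long vector in the shader signature:
// float4 PSMain(vector<float, 8> In : TEXCOORD0) : SV_Target { return 0; }
```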

### Validation Testing

Collaborator

I think DXIL validation belongs in the DXIL spec only.


Verify that long vectors produce validation errors in:

* Each element of the shader signature.
* A cbuffer block struct.
* Work graph record structs.
* The mesh/amplification entry `Payload` parameter struct.
* Each of the `Payload`, `Parameter`, `Attributes` parameter structs used in
`TraceRay()`, `CallShader()`, and `ReportHit()`,
and `anyhit`, `closesthit`, `miss`, and `callable` entry functions.
* Any DXIL intrinsic that corresponds to the HLSL intrinsic functions listed in [Disallowed vector intrinsics](#disallowed-vector-intrinsics).
* Any metadata type.

### Execution Testing

Collaborator

I think this section belongs in the DXIL spec exclusively, even though in practice most of our execution tests in DXC start with HLSL when testing DXIL backends.

Member Author

I think execution testing may be the strongest argument for keeping it here. Long and native vectors aren't really interdependent and although the tests would likely share a lot of code, they should be independently testable in execution testing.


Correct behavior for all of the intrinsics listed in [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
will be verified with execution tests that perform the operations on long vectors and confirm correct results
for the given test values.
Where possible, these tests will be variations on existing tests for these intrinsics.

## Alternatives considered

The original proposal introduced an opaque type to HLSL that could represent longer vectors.
This would have been used only for native vector operations.
This would have limited the scope of the feature to small neural network evaluation and would also have contained the testing scope somewhat.

Representing vectors used in neural networks as LLVM vectors also allows leveraging existing optimizations.
This direction also aligns with the long term roadmap of HLSL to enable generic vectors.
Since the new data type would have required extensive testing as well,
the testing burden saved may not have been substantial.
Since these vectors are to be added eventually anyway, the testing serves multiple purposes.
It makes sense to not introduce a new datatype but use HLSL vectors,
even if the initial implementation only exposes partial functionality.

The restrictions outlined in [Allowed Usage](#allowed-usage) were chosen because the restricted uses weren't
needed for the targeted applications, not because they are inherently impossible.
They were omitted because of unclear utility and to simplify the design.
There's nothing about those use cases that is inherently incompatible with long vectors
and future work might consider relaxing those restrictions.

Swizzle operations were not supported because they are limited to the first four elements.
The accessors (xyzw or rgba) are named according to the expected content of
those vectors in a graphics context.
Since that interpretation does not apply to longer vectors, it could be confusing.
The subscript access is flexible and generic and makes other accessors redundant.

## Open Issues

* Q: Is there a limit on the Number of Components in a vector?
* A: 128. It's big enough for some known uses.
There aren't concrete reasons to restrict the vector length.
Having a limit facilitates testing and sets expectations for both hardware and software developers.

* Q: Usage restrictions
* A: Long vectors may not form part of the shader signature.
There are many restrictions on signature elements including bit fields that determine if they are fully written.
By definition, these involve more interfaces that would require additional changes and testing.
* Q: Does this have implications for existing HLSL source code compatibility?
* A: Existing HLSL code that makes no use of long vectors will have no semantic changes.
* Q: Should this change the default N = 4 for vectors?
* A: No. While the default size of 4 is less intuitive in a world of larger vectors, existing code depends on this default, so it remains unchanged.
* Q: How will SPIR-V be supported?
* A: TBD
* Q: Should swizzle accessors be allowed for long vectors?
* A: No. It doesn't make sense since they can't be used to access all elements
and there's no way to create enough swizzle members to accommodate the longest allowed vector.
* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors?
* A: After some consideration, we opted not to include explicit Load/Store operations for this function.
There are at least a couple ways this could be resolved, and the preferred solution is outside the scope.

Member

Does this mean that long vectors can be marked as groupshared and it just works? Wondering if this Q&A belongs in the DXIL spec?

Member Author

groupshared is allowed. I don't list it as being disallowed, but I don't mind calling it out explicitly.

Member

Maybe I'm missing the context or the significance of not including explicit Load/Store operations here, or just plain misunderstanding this Q/A altogether. The "preferred solution is outside the scope" seems to suggest that there's not going to be a way to use long vectors in groupshared.

Member Author

It's not a feature that is directly connected with long nor native vectors. The core problem is that groupshared memory is limited and explicitly allocating it to one purpose is expensive. Instead, many users want to reuse groupshared memory for a few different purposes depending on stage and time.

There's some disagreement on how we can best solve this. We might provide what is essentially a groupshared rawbuffer with mechanisms to perform loads and stores on it similarly to how we do on global rawbuffers. That was the alternative approach previously described. There are more clever compiler things (tm) that could be done or other ways of enabling the user in this way.

Since it is not a problem directly connected with this, since there are a few different approaches we might take that would take a long time to decide on and implement, and since there are other solutions to the problem using existing mechanisms, we deemed it out of scope for this project.

Member

Ok. I think that this Q/A is a bit confusing as written now. If I understand correctly, it's saying that we're not going to tackle solving the problem of people wanting buffer-like access to groupshared memory. So in the same way that we don't support explicit Load/Store operations for other data types, we're not going to add new ones for this. Marking a long vector as groupshared is still expected to work.

Member Author

What you've said is accurate, so I don't know why you think it's confusing. I'm happy to take another run at it, but I think it's succinct and sufficient. It doesn't go into detail because I don't think we need to. It's a feature for another day. I could create an issue for it that could contain all the details if that would help.

Member

I was only able to reach the understanding by this conversation.

Member Author

I don't think it's useful to go into detail about something that someone wanted to tack onto this spec, but isn't related to it. Perhaps the best approach is to just remove it.


<!-- {% endraw %} -->