Rework hlsl-vector-type into two specs #361

Open
wants to merge 9 commits into
base: main

Member

@pow2clk pow2clk commented Jan 3, 2025

This splits the spec into two. dxil-vectors concerns the addition of vectors to DXIL only. hlsl-long-vector-type relates to the addition of long vectors in the HLSL language and also for select DXIL intrinsics.

Throughout, this adds details concerning testing and support. It makes a few alterations to the originally proposed behavior, particularly concerning the loading and storing of long vectors, whether from/to raw buffers or groupshared variables. The latter intrinsics were dropped entirely in favor of existing assignment operations being lowered to appropriate operations.
Long vectors are allowed in structs and non-entry function signatures and disallowed in shader signatures, cbuffers/tbuffers, and as elements of non-raw buffers.

Note that the use of 6.9 is a placeholder for the release vehicle for this feature.

Since we'll be creating a separate DXIL spec to document native vectors in DXIL, this spec will be a little more constrained to deal with HLSL long vectors. This commit is to isolate the meaningful content changes that come later.
Make md lint happy, eliminate Load/StoreN, revise some wording

## Introduction

HLSL has always supported vectors of as many as four elements of different element types (int3, float4, etc.).
Contributor

"Always" doesn't sufficiently indicate that this is past behavior, and "as many as" just reads awkwardly to me. I also replaced the repeated word "element" with "scalar".

Suggested change
HLSL has always supported vectors of as many as four elements of different element types (int3, float4, etc.).
HLSL has traditionally supported vectors with up to four elements of different scalar types (int3, float4, etc.).

* FMin
* FMax
* Tertiary
* Fma
Member

Just to point out, DXIL has both fma and fmad. The valid overload set for fma is fp64, whereas the valid overload set for fmad is all floating point types. I assume that fma in this list is intended to refer to both, but it would be good to be explicit.

Member Author

I did not intend it to refer to both as of yet. This is a very preliminary list of intrinsics that was handed to me. It does now occur to me that the source might have been under the impression that it encompassed fmad even though I took it to be strictly the double fma.

Member

Makes sense. I was surprised at the tiny scope of opcodes listed here, but I assumed that the primary use case is not for doubles, so listing only fma without fmad confused me.

Collaborator

@tex3d tex3d Jan 9, 2025

Fma in this list should only refer to the special double fma DXIL operation with specific precision requirements.

There is also the DXIL operation FMad, which supports some flexibility in precision requirements as long as they are the same wherever it is used on the same hardware.

I believe this list is about to be significantly expanded, and I hope to see Mad on this list as well.
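
For readers less familiar with the fused versus unfused distinction being discussed, here is a minimal host-side C++ sketch. It is an analogy only: the helper names are hypothetical, and it assumes that the "specific precision requirements" of the double Fma correspond to a single rounding, as with `std::fma`. The thread does not spell that out, and a flexible multiply-add such as FMad could legitimately behave like either helper.

```cpp
#include <cmath>

// Fused: the exact product a*b is added to c and the result is rounded once.
inline double fused_multiply_add(double a, double b, double c) {
    return std::fma(a, b, c);
}

// Unfused: the product is rounded to double first, then the sum is rounded again.
// The volatile temporary keeps the compiler from contracting this into an fma.
inline double unfused_multiply_add(double a, double b, double c) {
    volatile double product = a * b;
    return product + c;
}
```

With `a = 1 + 2^-27`, `b = 1 - 2^-27`, and `c = -1`, the exact product is `1 - 2^-54`. The fused helper returns `-2^-54`, while the unfused helper rounds the product to `1.0` first and returns `0.0`.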

Member Author

This list is getting removed as we've determined that scalarized versus native elementwise intrinsics are an implementation detail.

Member

As long as there's sufficient test content that all vectorized intrinsics are covered, I'm fine without an exhaustive list in the spec.

Member Author

@pow2clk pow2clk Jan 14, 2025

The exhaustive list is now the "allowed elementwise" list that was above this removed section.

@damyanp damyanp added this to the Shader Model 6.9 milestone Jan 8, 2025
Comment on lines +274 to +276
* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors.
* A: After some consideration, we opted not to include explicit Load/Store operations for this function.
There are at least a couple ways this could be resolved, and the preferred solution is outside the scope.
Member

Does this mean that long vectors can be marked as groupshared and it just works? Wondering if this Q&A belongs in the DXIL spec?

Member Author

groupshared is allowed. I don't list it as being disallowed, but I don't mind calling it out explicitly.

Member

Maybe I'm missing the context or the significance of not including explicit Load/Store operations here, or just plain misunderstanding this Q/A altogether. The "preferred solution is outside the scope" seems to suggest that there's not going to be a way to use long vectors in groupshared.

Member Author

It's not a feature that is directly connected with either long or native vectors. The core problem is that groupshared memory is limited and explicitly allocating it to one purpose is expensive. Instead, many users want to reuse groupshared memory for a few different purposes depending on stage and time.

There's some disagreement on how we can best solve this. We might provide what is essentially a groupshared rawbuffer with mechanisms to perform loads and stores on it similarly to how we do on global rawbuffers. That was the alternative approach previously described. There are more clever compiler things (tm) that could be done or other ways of enabling the user in this way.

Since it is not a problem directly connected with this, since there are a few different approaches we might take that would take a long time to decide on and implement, and since there are other solutions to the problem using existing mechanisms, we deemed it out of scope for this project.

Member

Ok. I think that this Q/A is a bit confusing as written now. If I understand correctly, it's saying that we're not going to tackle solving the problem of people wanting buffer-like access to groupshared memory. So in the same way that we don't support explicit Load/Store operations for other data types, we're not going to add new ones for this. Marking a long vector as groupshared is still expected to work.

Member Author

What you've said is accurate, so I don't know why you think it's confusing. I'm happy to take another run at it, but I think it's succinct and sufficient. It doesn't go into detail because I don't think we need to. It's a feature for another day. I could create an issue for it that could contain all the details if that would help.

Member

I was only able to reach that understanding through this conversation.

Member Author

@pow2clk pow2clk left a comment

Update shortly.

#### Disallowed vector intrinsics

* Only applicable to shorter vectors: AddUint64, asdouble, asfloat, asfloat16, asint, asint16, asuint, asuint16, D3DCOLORtoUBYTE4, cross, distance, dst, faceforward, length, normalize, reflect, refract, NonUniformResourceIndex
Collaborator

What is the reasoning for disallowing the conversion functions?

Member Author

Sorry I didn't address this. Some of these comments didn't show up in my review of the files. I think you're referring to the `as` functions? There is some variability there. Some of them map to a simple bitcast. Those are fine. Some of them take multiple parameters representing low and high bits that don't map as neatly.

Collaborator

@tex3d tex3d left a comment

Some general high-level points:

What gates this feature in HLSL? Is it part of a future HLSL language version? Just Shader Model 6.9 (which would seem odd considering this is basically a language change)? I don't see why we can't map long vectors to legacy scalarized shader models, with some requiring a bit more work, such as native vector load/store DXIL operations, vector reductions (dot) to expansions, and so on, but these are still mappable.

If not by language version, how do we enforce the limitations? Globally by shader model? Typically, shader model limitations are applied according to what's used, not what's present, so do we plan to do the same here (and how?), or use the shader model to change the language accepted everywhere, which is odd?

This doesn't address targeting SPIR-V from HLSL at all.

I still don't see how we will be able to support specialization constant sized vectors in HLSL or any equivalent feature in DXIL using this approach.

Many of the DXIL details I believe belong only in the DXIL spec. Some testing details in the DXIL spec I think belong only in the HLSL spec.

The limits we define should not be the temporary ones for implementation purposes, which could be discussed in some other section or doc perhaps. Limits on intrinsics with native vector DXIL overloads and limiting use in cbuffer/tbuffer could fall under this category.

There is no mention of vector and matrix swizzle lengths deliberately not being extended to allow long vector results.

I have some concern about 64-bit component types and in particular double in long vectors, since there are a number of special cases for these. They might need additional testing scenarios, or to be added to temporary exclusions for preview.

`N` is the number of components and must be an integer between 1 and 4 inclusive.
See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
This proposal adds support for long vectors of length greater than 4 by
allowing `N` to be greater than 4 where previously such a declaration would produce an error.
Collaborator

How does this design accommodate N being supplied from a specialization constant supplied at runtime, rather than a literal value that is known at HLSL compile time?

Member Author

Runtime specialization constants are not currently supported in HLSL. That's a question we'll need to answer when they are. For now, any similar mechanism produces an error: https://godbolt.org/z/5Kszf9Tr3. This includes the existing vk:: namespace specialization/push constant support.

error: non-type template argument of type 'uint' is not an integral constant expression
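
The error quoted above is the usual diagnostic for a non-type template argument that is not a compile-time constant. As a rough C++ analogy (the `Vec` and `LongVec8` names are hypothetical stand-ins, not HLSL or DXC types), a vector whose length is a template parameter accepts any constant length, including lengths above 4, but cannot take a value that is only known at run time:

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side stand-in for a vector<T, N> type; not a real HLSL type.
template <typename T, std::size_t N>
struct Vec {
    static_assert(N >= 1, "a vector needs at least one component");
    std::array<T, N> elems{};  // value-initialized, so every component starts at zero
    static constexpr std::size_t length() { return N; }
};

// N is a constant expression, so a length above 4 is just another instantiation.
constexpr std::size_t kLongLength = 8;
using LongVec8 = Vec<float, kLongLength>;

// A length that is only known at run time cannot be a template argument:
//   std::size_t n = read_length();  // hypothetical runtime value
//   Vec<float, n> v;                // error: not a constant expression
```

Supporting a length supplied through something like a specialization constant would therefore need a mechanism beyond plain template arguments, which is the open question raised in this thread.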

* Vectors with length greater than 4 are not permitted inside a `struct`.
* Vectors with length greater than 4 are not permitted as shader input/output parameters.
* Resource types other than ByteAddressBuffer or StructuredBuffer.
* Any element of the shader's signature including entry function parameters and return types.
Collaborator

Does this include or exclude things like payload (DXR or mesh), attribute, or node record structures?

Testing scenarios below make it clear, but it should be clear here, and this should also include corresponding intrinsics that accept UDT values leading to these shader parameter types elsewhere.

We should also differentiate temporary limitations for the first implementation from limitations that have a good reason to be more permanent in the language.

Member Author

> Does this include or exclude things like payload (DXR or mesh), attribute, or node record structures?
>
> Testing scenarios below make it clear, but it should be clear here, and this should also include corresponding intrinsics that accept UDT values leading to these shader parameter types elsewhere.

I've added a bit more detail here and gone into still more detail about the ray tracing interfaces that are disallowed in the diagnostics section. I think that's the appropriate place for it, whereas here we can be more vague.

> We should also differentiate temporary limitations for the first implementation from limitations that have a good reason to be more permanent in the language.

I thought we determined that we wouldn't draw such distinctions in this spec. It's my intention to document it as it will ultimately be, removing any references to temporary approaches. From context, I suspect you may think of this as a "temporary" approach? I don't think we are relaxing these restrictions as part of this shader model release. Mention of other possibilities might make sense in the "alternatives considered" section as potential future work, but otherwise, I'd prefer to leave it unmentioned.

For loading and storing N-dimensional vectors from ByteAddressBuffers we use the `LoadN` and `StoreN` methods, extending
the existing Load/Store, Load2/Store2, Load3/Store3 and Load4/Store4 methods.
N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
with a vector type of the required size as the template parameter and byte offset parameters.
Collaborator

Should we discuss alignment requirements at all here?

Member Author

Do you have any suggestions on how that discussion might look?

You could say that a long vector <T,N> has the same alignment requirements as an array of N elements of type T.
However, that may differ from vector<float,4>, which might have 16-byte alignment. With scalar alignment they would be the same. (Sorry, I don't remember if modern HLSL assumes scalar alignment for vectors.)
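
To make the alignment question concrete, here is a small host-side C++ model. It is only a sketch under stated assumptions: `ByteBufferModel`, `Float8`, and `Float4Aligned16` are hypothetical names, and the code models a templated load/store at a byte offset rather than calling any HLSL API. It contrasts a long vector laid out like an array of its elements (scalar alignment) with a four-element vector forced to 16-byte alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side model of a raw byte-addressed buffer; not the HLSL API.
class ByteBufferModel {
public:
    explicit ByteBufferModel(std::size_t size_in_bytes) : bytes_(size_in_bytes, 0) {}

    // Templated load: copy sizeof(T) bytes starting at byte_offset.
    template <typename T>
    T Load(std::size_t byte_offset) const {
        assert(byte_offset + sizeof(T) <= bytes_.size());
        T value;
        std::memcpy(&value, bytes_.data() + byte_offset, sizeof(T));
        return value;
    }

    // Templated store: copy sizeof(T) bytes of value to byte_offset.
    template <typename T>
    void Store(std::size_t byte_offset, const T& value) {
        assert(byte_offset + sizeof(T) <= bytes_.size());
        std::memcpy(bytes_.data() + byte_offset, &value, sizeof(T));
    }

private:
    std::vector<std::uint8_t> bytes_;
};

// Stand-in for an 8-element float vector laid out and aligned like float[8].
struct Float8 { float e[8]; };
static_assert(alignof(Float8) == alignof(float), "array-like (scalar) alignment");
static_assert(sizeof(Float8) == 8 * sizeof(float), "no padding between elements");

// For contrast: a 4-element vector forced to 16-byte alignment.
struct alignas(16) Float4Aligned16 { float x, y, z, w; };
static_assert(alignof(Float4Aligned16) == 16, "vector-style alignment");
```

Under the array-like rule, an offset such as 4 is a valid place to store a `Float8`, whereas a 16-byte rule would require offsets that are multiples of 16. Which rule the spec should state is exactly what this thread leaves open.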

Comment on lines 167 to 169
A compiler targeting shader model 6.9 should be able to represent vectors in the supported memory spaces
in their native form and generate native calls for supported intrinsics
and scalarized versions for unsupported intrinsics.
Collaborator

I think this is part of the compiler testing which might belong in the HLSL spec where we describe how we verify DXIL and SPIR-V target IR output from the HLSL constructs. This is an area that crosses the spec boundaries, so it could go either way, but probably should live in one or the other and not both.

Member Author

I don't want to keep playing the same note, but this is another comment that is born of our different perspectives on how these specs are divided up.

Comment on lines 171 to 174
The DXIL 6.9 validator should allow native vectors in the supported memory and intrinsic uses.
It should produce errors for uses in signatures, cbuffers, and type buffers and any uses in unsupported intrinsics.
Any representation of a single element vector should produce a validation error.
These shouldn't be directly producible with a compatible compiler and will require custom DXIL generation.
Collaborator

This should probably be a sub-section for validator testing. Basically, all validator testing is IR-based, since the compiler generally shouldn't be able to generate incorrect IR from valid HLSL. Valid cases are tested as part of testing the expected IR for each backend, which could be on the HLSL side. I feel like point-form test scenarios are easier to review and break down further. Note that "type buffers" (typed buffers) is probably just looped in with "unsupported intrinsics".

More detail will probably be needed before implementation, but not before merging the proposal.

Member Author

I agree this should be its own subsection and fleshed out.

Comment on lines 176 to 177
Full runtime execution should be tested by using the native vector intrinsics on different types of memory
and confirming that the calculations produce the correct results in all cases for an assortment of vector sizes.
Collaborator

Memory access operations should be separate from intrinsics that perform calculations (ALU ops). This wording seems to imply that you need to test ALU ops on different types of memory.

Member Author

I was thinking of groupshared when I wrote this. I expect this is an anomaly of atomic operations, but they will lower into DXIL intrinsics or LLVM ops depending on whether the parameter is a groupshared variable or a resource element. Black box testing would indicate that we should ignore that we don't expect any such divisions because DXC doesn't provide them, and when it comes to runtimes, some of it is a black box for us.

Do you think we shouldn't test groupshared as a separate state at least?

* A: *HLK tests*
### Compilation Testing

#### Correct output testing
Collaborator

Here's where I think we describe testing for each supported backend IR. Shouldn't we be giving SPIR-V some love here as well?

Shouldn't there be a section before this that outlines various valid scenarios for AST testing?

For a given implementation, perhaps there would be an additional infra spec to outline tests for initial codegen and various important phases through the compiler as well, right? Perhaps the AST test scenarios belong there too?

Member Author

> Here's where I think we describe testing for each supported backend IR. Shouldn't we be giving SPIR-V some love here as well?

Definitely. I haven't done that research just yet. I've added a slightly hand-wavy allusion to the SPIR-V equivalent.

> Shouldn't there be a section before this that outlines various valid scenarios for AST testing?

Have we done this in the past? I'm not sure what scenarios you have in mind. I'd like to see an example to better understand.

> For a given implementation, perhaps there would be an additional infra spec to outline tests for initial codegen and various important phases through the compiler as well, right? Perhaps the AST test scenarios belong there too?

I think infra specs are useful to discuss forthcoming implementations. Given the state of this implementation, I don't think writing one would be as productive as carefully documenting what has been done in code and commit comments. I think that would be more likely to be preserved and found by future generations of coders.

@llvm-beanz
Collaborator

> Some general high-level points:
>
> What gates this feature in HLSL? Is it part of a future HLSL language version? Just Shader Model 6.9 (which would seem odd considering this is basically a language change)? I don't see why we can't map long vectors to legacy scalarized shader models, with some requiring a bit more work, such as native vector load/store DXIL operations, vector reductions (dot) to expansions, and so on, but these are still mappable.
>
> If not by language version, how do we enforce the limitations? Globally by shader model? Typically, shader model limitations are applied according to what's used, not what's present, so do we plan to do the same here (and how?), or use the shader model to change the language accepted everywhere, which is odd?

This is definitely a discussion we should explore more. From a purity perspective my gut is that long vectors is a language feature for HLSL 202x. The downside to that is that anyone using SM 6.9 features for long vectors will need to update their codebase to HLSL 202x. This might cause us to reconsider the scoping of 202x to support calling it "done" sooner than originally planned.

If we aren't okay with this feature requiring updating to 202x, it probably isn't the worst thing to make it available in older language modes, but we may want to consider making it an opt-in feature. I haven't thought about this extensively, but this is definitely something we should discuss.

> I still don't see how we will be able to support specialization constant sized vectors in HLSL or any equivalent feature in DXIL using this approach.

I don't think we should be thinking about specialization constants as part of this feature. HLSL does not have support for specialization constants. There is a Vulkan extension, but the Vulkan extension is not core to the language and it is not really the direction that the language-based solution would take.

I realize that this feedback may seem counter to other feedback I've provided about needing a comprehensive plan for a feature. The distinction I see here is that designing specialization constant support for HLSL and fixing the existing language features that currently don't support specialization constants and should (like a bunch of the attributes, and operations where we require immediate values), is a much larger feature and will require significant investment and planning. Since we don't have a plan for how to support specialization constants in HLSL as a proper language feature, we shouldn't use that to roadblock or derail the design of this feature.

If we make a mistake here and we need to adjust or redesign things when we add specialization constants in the future, we can cross that road when we get to it. We are all human; making design mistakes is something that is going to happen, and we can't let the possibility of a mistake prevent us from making forward progress.

The biggest changes are removing most references to scalarized implementation of certain intrinsics. This has the effect of removing any hard dependencies between the specs. This further strengthens my opinion that the specs should be divided along feature lines rather than the DXIL/language barrier.

A lot of rewording and specifics added where vague statements were before.
Member Author

@pow2clk pow2clk left a comment

Thanks all! I've responded inline and a commit is forthcoming.

that corresponds to the [allowed elementwise vector intrinsics](#allowed-elementwise-vector-intrinsics)
and are not listed in [native vector intrinsics](#native-vector-intrinsics).

### Execution Testing
Member Author

I think execution testing may be the strongest argument for keeping it here. Long and native vectors aren't really interdependent and although the tests would likely share a lot of code, they should be independently testable in execution testing.

proposals/NNNN-dxil-vectors.md (outdated)
* FMin
* FMax
* Tertiary
* Fma
Member Author

This list is getting removed as we've determined that scalarized versus native elementwise intrinsics are an implementation detail.

proposals/0026-hlsl-long-vector-type.md (outdated)

Use of long vectors in a shader should be indicated in DXIL with the corresponding
shader model version and shader feature flag.
Devices that support Shader Model 6.9 will be required to fully support this feature.
Member

In the interchange format section above we have:

> Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.
> Representation of native vectors in DXIL depends on dxil vectors.

This seems to imply that long vectors can be supported in existing shader models. I think it's only the native DXIL vectors feature that actually requires SM 6.9?

Member Author

That wasn't what I intended to imply with that statement, rather that implementations could choose either approach as we intend to do at least temporarily. However, it is true that this is possible. One of the major remaining questions is if we should.

Member

Before I resolve this conversation: do we have something written down that is tracking this remaining question?

Member Author

I've created #371

Comment on lines +274 to +276
* Q: How should scalar groupshared arrays be loaded/stored into/out of long vectors?
* A: After some consideration, we opted not to include explicit Load/Store operations for this function.
There are at least a couple ways this could be resolved, and the preferred solution is outside the scope.
Member

Ok. I think that this Q/A is a bit confusing as written now. If I understand correctly, it's saying that we're not going to tackle solving the problem of people wanting buffer-like access to groupshared memory. So in the same way that we don't support explicit Load/Store operations for other data types, we're not going to add new ones for this. Marking a long vector as groupshared is still expected to work.
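
A minimal sketch of what "marking a long vector as groupshared is still expected to work" could look like, assuming long vector support and using plain assignment rather than any explicit Load/Store operation. All names, the register binding, and the vector size are hypothetical.

```hlsl
// Hypothetical names; assumes a compiler and shader model with long vector support.
groupshared vector<float, 8> gsValues;

RWByteAddressBuffer OutBuf : register(u0);

// A single thread per group keeps the example free of write races.
[numthreads(1, 1, 1)]
void main(uint3 tid : SV_DispatchThreadID)
{
    vector<float, 8> local = {0.0f, 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f};

    // Plain assignment stores the whole vector to groupshared memory.
    gsValues = local;
    GroupMemoryBarrierWithGroupSync();

    // Plain assignment reads it back; no dedicated groupshared Load/Store method is involved.
    vector<float, 8> readBack = gsValues;
    OutBuf.Store< vector<float, 8> >(0, readBack);
}
```

How a scalar groupshared array would be reinterpreted as a long vector is deliberately not shown, since the quoted answer leaves that out of scope.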

Move asXXX intrinsics to the approved list.

Finish reworking validation errors and testing in long vectors spec.

Simplify some listing of allowed locations given that some of them fall under entry function parameters by nature. I left work graphs as explicit since their parameters are not directly user-defined structs, but templates.
#### Allowed Usage

The new vectors will be supported in all shader stages including Node shaders. There are no control flow or wave
uniformity requirements, but implementations may specify best practices in certain uses for optimal performance.
Contributor

Is this line needed for the long vector spec? It's probably implicitly assumed?

Member Author

Which part do you take to be implicitly assumed? I don't think support across all shader stages should be assumed. The control/wave uniformity requirements touch more on the followups and could probably be removed. I'm inclined to leave the implementations encouraging best practices language as some platforms might end up scalarizing vector operations at some point or another which won't lead to the best performance.

Anupama had some feedback on the description of where long vectors can be used. This attempts to add language that is more useful.

Removed the mask param and made the load function independent per discussions with Tex and others.
dneto0 left a comment

Thanks for the spec proposal.


## Proposed solution

Enable vectors of length between 4 and 128 inclusive in HLSL using existing template-based vector declarations.

If the range is inclusive, then that 4 should be 5.

See the vector definition [documentation](https://learn.microsoft.com/en-us/windows/win32/direct3dhlsl/dx-graphics-hlsl-vector) for more details.
This proposal adds support for long vectors of length greater than 4 by
allowing `N` to be greater than 4 where previously such a declaration would produce an error.


Somewhat related to what @tex3d asked, but more broad: the spec needs to say how N can be written.

  • Can it be N where earlier there is a declaration `const uint N = 15;`?
  • Can it be N where N is the parameter to a function, and the vector type declaration is inside the function?

I guess it has to be statically computed. (I'm not sure what the HLSL term for it is.)
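
The cases being distinguished can be sketched as below. This is only an illustration of the question, assuming that the size must be a compile-time constant as it is for existing vector sizes; whether the spec accepts each form is what the comment asks it to state. All function and constant names are hypothetical.

```hlsl
// Hypothetical names throughout; a sketch of the cases in question, not confirmed behavior.

// Case 1: N names a compile-time constant declared earlier.
static const uint N = 15;

// Case 2: the size is a template parameter of the enclosing function
// (assumes HLSL 2021 templates).
template <uint M>
float SumElements(vector<float, M> v)
{
    float sum = 0.0f;
    for (uint i = 0; i < M; ++i)
        sum += v[i];
    return sum;
}

float SumFifteen(vector<float, N> v)
{
    return SumElements<N>(v);
}

// Case 3: a runtime function parameter used as the size. This is presumably
// rejected, because the value is not known at compile time:
//
//   float Bad(uint n) { vector<float, n> v; return v[0]; }
```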

* Parameters and return types of non-entry functions.
* Stored in groupshared memory.
* Static global variables.


I assume the type of a local (function scoped) variable can have long vector type. That's not mentioned. The examples below show cases like these.

Examples:

```hlsl
vector<uint, 5> InitList = {1, 2, 3, 4, 5};
```

What happens if not all elements are listed explicitly? Are the rest zero-initialized?

For loading and storing N-dimensional vectors from ByteAddressBuffers we use the `LoadN` and `StoreN` methods, extending
the existing Load/Store, Load2/Store2, Load3/Store3 and Load4/Store4 methods.
N-element vectors are loaded and stored from ByteAddressBuffers using the templated load and store methods
with a vector type of the required size as the template parameter and byte offset parameters.

You could say that a long `vector<T, N>` has the same alignment requirements as an array of N elements of type T.
Except that may be different from `vector<float, 4>`, which might have 16-byte alignment. With scalar alignment they would be the same. (Sorry, I don't remember if modern HLSL assumes scalar alignment for vectors.)

with the element index parameters.

```hlsl
RWStructuredBuffer< vector<T, N> > myBuffer;
```

Is the space between > > necessary, as per C++03 style parsing? Might be good to call out.
I hope no space is required.

Collaborator

HLSL 2021 templates are C++98/03-era, so the space is required.
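
Both spellings can be sketched as follows, assuming the C++98/03-style rule described above applies. The buffer names, register bindings, and the `Float8` typedef are hypothetical.

```hlsl
// Spelled with the space between the closing angle brackets, as in the quoted spec example.
RWStructuredBuffer< vector<float, 8> > BufA : register(u0);

// A typedef (hypothetical name) avoids adjacent closing brackets entirely.
typedef vector<float, 8> Float8;
RWStructuredBuffer<Float8> BufB : register(u1);
```

The typedef form sidesteps the question of whether `>>` is tokenized as a shift operator.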


### Interchange Format Additions

Long vectors can be represented in DXIL, SPIR-V or other interchange formats as scalarized elements or native vectors.

This section should be informative.

A good SPIR-V representation could be as arrays of elements. So it's still a single SSA value.


HLSL vectors can be constructed through initializer lists, constructor syntax initialization, or by assignment.
Vectors can be initialized and assigned from various casting operations including scalars and arrays.
Long vectors will maintain equivalent casting abilities.
Contributor

It would be nice to have a way to initialize all elements of the vector with one value.
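
If the quoted statement about scalar casts holds for long vectors, a single-value initialization might look like the sketch below. This assumes long vectors keep the scalar-to-vector conversion that vectors of up to four elements already have; the function name is hypothetical.

```hlsl
// Hypothetical helper; assumes long vectors keep the existing
// scalar-to-vector conversion behavior.
vector<float, 8> MakeOnes()
{
    // An explicit cast from a scalar replicates it into every element.
    vector<float, 8> ones = (vector<float, 8>)1.0f;
    return ones;
}
```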


Dynamic accesses to vectors were previously converted to array accesses.
Native vectors can be accessed using `extractelement`, `insertelement`, or `getelementptr` operations.
Previously, usage of `extractelement` and `insertelement` in DXIL didn't allow dynamic index parameters.
Contributor

Just confirming that this spec does allow dynamic index parameters in EIE and IEI (`extractelement` and `insertelement`)?


The scalarized variants of these DXIL intrinsics will remain unchanged and can be used in conjunction
with the vector variants.
This means that the same language-level vector could be used in scalarized operations and native vector operations
Contributor

In what situations are we likely to see scalarization as opposed to vector results?

Contributor

Also, wanted to confirm that DXIL vectorization is not limited to HLSL long vectors but even applies to the original ones.

Labels
Design Meeting Agenda item for the design meeting

10 participants