-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Swizzling with .wzxy() function call syntax not compatible with HLSL #59
Comments
Oh I see, you're trying to use HLML as a solution for writing both C++ and HLSL code. I never thought of that. Do you think it would be possible to generate swizzle formats that weren't templated? The only potential problem I have with going and implementing this is that I'd be auto-generating templated code, which would sort of go against one of the reasons I wrote this library to begin with. What do you think? |
I'm sure it could be done without templates! I don't have good ideas yet though :D I don't quite understand the hlslpp approach but what I've seen looks like a perfect match for code generation, at least for C++ version. I'll look into it when I have a bit of time and get back to you!
Yeah, I'm finding it super-useful! Here's some use cases:
|
I had a look at this over my lunch break and I think you're right and I think it might just be possible to do this without templates. I'll take a look. =) |
I've managed to come up with something in a testbed that works via macros. It's quite rough around the edges and it looks a little ugly but it does work. =) I'll see if it can be refined and put into a release. I think the most difficult part about this will be getting Clang, GCC, and MSVC to accept one solution (Clang and GCC are throwing up a lot of warnings about declaring anonymous structs within an anonymous union, for instance, but I might able to get past those with some good old inline I work full time and I only get to work on this in my spare time after work in the evenings so it might take a hot minute to get this out, but I think it can be done and I'm excited to see this now. =) |
Oh wow, nice, can't wait to see it & try it out! Yeah, I bet it's not trivial to get it compiling everywhere. I hope macros don't clash with something in a large codebase. Thanks for doing this! I'm in a similar situation where I'm using it for a personal project, which I don't get lots of time to spend on due to full time work & family so I totally get it :) Hey, just remembered, would you consider adding an optional support for "namespace hlml {}" or similar? I know in README you mention that's exactly what you want to avoid :D but it's useful if it's optional (I had a clash with another library defining their float2 but thankfully I don't have to include both from the same files). |
So I've attached a basic (and VERY incomplete) swizzle test project that contains 2 different ways of implementing this: one by using macros, and one by using templates. Both compile and run. If you download and run this test project then you can toggle the There are issues with this though, and they're all quite large IMO. Apologies if any of what I'm about to go through is stuff you're already really familiar with. I just want to give as thorough a breakdown as I can of the landscape of the problem domain. Wall of text incoming; apologies in advance! Issue 1: All of the swizzle solutions seem to run on undefined behaviour?From what I can tell: Every swizzle solution I've read tonight and last night (that gives the ergonomic effect you're looking for) all run on undefined behaviour to my eyes, which is pretty scary. All the swizzle solutions that I've seen basically work like so (using template<class VecType, class ScalarType, int X, int Y>
struct vec2_swizzle_t {
ScalarType v[2];
VecType operator=( const VecType& vec ) {
return VecType(
v[X] = vec.x,
v[Y] = vec.y
);
}
operator VecType() {
return VecType( v[X], v[Y] );
}
}; (Also I'm using the templated solution just for the sake of example here). Where vec2_swizzle_t<float2, float, 1, 0> yx; This looks fine until you get to the following scenario: float3 vec = { 10.0f, 20.0f, 30.0f };
float2 vec1 = vec.xz; The I could be wrong, but that's certainly what it looks like to me. Issue 2: Writing to swizzles with duplicate componentsThis is explicitly stated as banned in the HLSL spec. and it's not hard to see why. Basically, the following is illegal: float2 vec = { 10.0f, 20.0f };
vec.xx = float2( 5.0f, 9.0f ); // duplicate component - no! The same goes for all other operators that do assignment. I think I know of a way around this without bloating the size of vector types. Let me get back to you on that. Issue 3: Swizzles that are larger than the vector they're taking data from means every vector type becomes 16 bytes bigAnd the last issue that I found, to top it all off... So it's totally possible to do the following: float2 vec = { 10.0f, 20.0f };
float4 vec1 = vec.xxxx; But to add that swizzle (and others where the swizzle is larger than the size of the vector) would mean populating the vector's union with a type that is bigger than the vector is intended to be. Meaning the size of a Apologies for the wall of text! I hate to be the bearer of bad news like this. Let me know what you think. If you have any ideas around this I'd love to hear them. 😅 |
Hi! Thank you so much for doing this - finally had a moment to check it out! Both macro and template seem nice/similar, can't say I have a preference. Yeah I see the issues! I'll try to have a go: Issue 1 - potentially undefined behaviourCould one do something like this:
and then used as
so the type size matches all other types in the union? should be fine? unless I'm missing something :) Issue 2: Writing to swizzles with duplicate componentsI've tried a read-only version of macro, for ex:
and then use that for duplicate components:
seems to work! Issue 3: Swizzles that are larger than the vector they're taking data from means every vector type becomes 16 bytes bigYeah, making every type 16 byte long would be bad - that's the only reason I gave up on hlslpp (they have this constraint although for a different reason - see redorav/hlslpp#58) :) It's not just cache locality & memory size but it breaks C++<->HLSL interop because the memory packing becomes different. I wonder if a variation on proposed "3_TO_2" in the other direction (3_TO_4)? They'd be read-only by definition since they always have duplicates. Might work? Also possibly useful would be a float1-like type (supported in HLSL) to allow for float1(1).xxxx? Anyhow, hope this makes sense and apologies if it doesn't - it's rather late and my brain is fried :) |
Hey, thanks for the thoughts. There are some great ideas here! =D Another wall of text incoming I'm afraid, but it's mostly good news: TLDRYou've basically solved all of the problems that I found, now it's just a question of whether we want to represent this via macros or templates because I think the macro implementation is more difficult to read in its current form (updated sample project attached below). I'll wait to hear your thoughts before I go adding this to the library for the next release. Addressing the undefined behaviour problemOh yeah, that would do it. I feel a little stupid I didn't see that before, now. 😅 Duplicate swizzle componentsYes! Well then, that solves that problem! I can make the auto-generator do checks as to whether I want a writeable vs. non-writeable swizzle struct based on the permutation, so I don't think that will be an issue. Returning swizzles larger than the vector they're coming fromSo you actually solved that problem too! I'm showing all of the examples in templates but I've got the macro equivalents written in the attached, updated sample project below: template<class ReturnType, class ScalarType, int X, int Y, int Z>
struct swizzle_2_to_3_t {
ScalarType v[2];
inline operator ReturnType () {
return ReturnType ( v[X], v[Y], v[Z] );
}
};
// then use in the union like so:
// were just limiting which indices we use - simple!
swizzle_2_to_3_t<float3, float, 0, 0, 0> xxx;
swizzle_2_to_3_t<float3, float, 0, 0, 1> xxy;
swizzle_2_to_3_t<float3, float, 0, 1, 0> xyx;
swizzle_2_to_3_t<float3, float, 0, 1, 1> xyy;
swizzle_2_to_3_t<float3, float, 1, 0, 0> yxx;
swizzle_2_to_3_t<float3, float, 1, 0, 1> yxy;
swizzle_2_to_3_t<float3, float, 1, 1, 0> yyx;
swizzle_2_to_3_t<float3, float, 1, 1, 1> yyy;
// same goes for 2-4 and 3-4... I've attached the updated version of the sample project showing all this off if you want to take a look (I've not put much effort into correctly making sure that each type is writeable or non-writeable for the example project because I can do checks in the auto-generator to determine which one I want based on the swizzle permutation anyway): There is a problem with the macro implementation in that I have to write a lot more boilerplate to avoid circular-dependency issues ( To be honest I think the templated implementation is actually nicer to read through (as things currently stand). I know the point of this library is to get rid of horrible, unreadable templated code but I think I prefer it to the macro'd alternative for swizzling (given what the macro implementation has become), but if we decide that the macro version is actually better for whatever reason then I'll roll with that. If you can find any ways to tidy up the macro implementation, I'd greatly appreciate that! Another thought I had was that if we can get the macro implementation neat enough it's possible I could do swizzles in the C99 API too, which would be a big win! ConclusionThe good news though is that works with no undefined behaviour and all vector types retain their size so I'd say the problem of actually generating correct swizzles is solved! It's a just question of how we want to represent that in code now (macros vs templates). I'll wait to hear your thoughts and then I'll get to work on adding this to the library. 😀 Bear in mind that I'm also going to want to generate tests for all of the possible permutations, so this will take a little while to get in, but it will get done and I'll work as quick as I can to get the release out for it. But anyway hopefully this ends up being useful for you in your project; I certainly hope it will! |
Wait, what am I thinking? I'll just make the auto-generator generate the code that the macros generating in that example. Duh! Ok, I'll get to work. =) |
Awesome, looks like the solution is crystallizing :D
yeah I looked at the both and you're right, macro gets harder to read for sure, templates look better!
Ha! Yes :))) Will that avoid the circular-dependency issues? I wonder what happens in .hlsl if you do |
Ok, so I've currently got HLML generating two versions of the updated swizzles (both are attached below). One is swizzles as templates and the other is swizzles as just straight-up generated code. I think now that there aren't any macros to read through the non-templated version looks better, personally. 😉 They both take roughly the same amount of time to compile in the tests and I didn't notice any glaring differences in runtime performance between the two versions either (though that testing was only done by looking at how long the generated tests took to run for each, so let's call that a superficial microbenchmark). Let me know what you think! hlml_new_swizzles_NO_templates.zip This was done in a semi-rush to get this out to you for you so you could take a look so I'll want a little longer to get some more tests in (writing to the writable swizzles, for instance), but otherwise I'm thinking this is ready for release. 😄 |
Hi! I tried out both on my small test case and it seems to work good. I like the templated version more because there's a lot fewer files and it's more readable and if there's no downside on compile times then why not :) Once I have a bit more time I'll expand code coverage to other shaders, I'm sure a lot of stuff will show up that I can report! At the moment I have a local hack for #58 since it seems to happen everywhere in my code - let me know if I can help anyhow :) |
I really appreciate all the testing and exploration you're doing. Thank you very much for that. =) I'm working on #58 right now. Also, I noticed a comment you made earlier in this thread about an optional namespace. I'm going to open up the Discussions section of this repo so I can put some thoughts down about how and why I've made certain decisions about the library (namespaces being one of those things). |
You're most welcome, I find the library really useful - thank you for creating it :) Oh supercool, please do! I've actually tried just doing namespace around the #include but it breaks things with external includes but haven't spent any more time on it. |
Got the first draft of this submitted just now. I still need to add the write tests for the swizzles. I'll do that tomorrow. Compile times have increased a bit when including every single swizzle format (like the tests are). GCC and MSVC are the worst offenders, but Clang still went up as well. I'll look into that before submitting but if you are able to look into that yourself I'd love to know what's going on there. I'm not comfortable shipping this with compile times being what they are. It's not as noticeable when only using one or two of swizzles, so if it really is just down to the fact that the compiler is bringing in a shit ton of symbols then I suppose there's nothing I can do about that? |
Oh uh how bad are the compile times? I can test before and after b04c6db and let you know (it's just going to take a bit of time - probably next weekend). Could the whole swizzle thing be made optional / under #ifdefs so people who don't need it don't have to pay for longer compile times? :) |
Any investigating you are able to do would be greatly appreciated. I appreciate you're doing this in your free time. =) The compile times on Clang wasn't affected too badly, but MSVC and GCC compile times just tanked. I think the |
I'll try it out as soon as I can! I know very little about optimizing compile times though :D |
Hey there, I've made the discussion about namespaces here with my thoughts put down. If you want to give your thoughts/arguments/counter-arguments then please do. =) |
Hi! Both non-template and template versions work identically and my project compiles and links in 22-23 seconds on both with only the main file that uses hlml changed and clean rebuild is ~65s with both - can't measure any difference; I guess I'm not using enough of HLML to stress the compile times and the rest of the project is too large. I vote for the template version, mainly because it has fewer files so source control management is easier, but non-template one is just fine too! :) |
Ok, thanks for profiling this. Are these new compile times you are seeing an increase from before? |
Ok, so it's difficult to compare to before swizzling changes because swizzling allowed me to compile a lot of math code to work on .cpp however I compared to before any hlml integration and it's a fairly noticeable hit: whole project rebuild goes from ~50s to ~65s. I'm ok with that because I need it but... I totally understand that it might be a deal breaker for some people. I don't know why it's that bad :( Perhaps make swizzling optional? I don't have any good ideas. |
I spent some time removing all swizzle uses and got back to around 55-58s for the compile (vs ~65 with swizzles and 50s with no HLML) but this did remove some of the code files too (there were too many swizzle uses in some places, just couldn't refactor everything). Here's a few ideas: a.) make swizzle for .rgba (on top of .xyzw) optional - cuts swizzling code in half (in fact, one could make the whole .rgba support optional... :) ) Also I just noticed this library: https://github.com/gwiazdorrr/CxxSwizzle and they seem to have slightly different swizzle implementation, use example: https://github.com/gwiazdorrr/CxxSwizzle/blob/master/include/swizzle/detail/vector_storage.hpp#L94 |
Hi there, apologies it's taken me so long to respond, but my personal life got in the way for a long time. I'm back now though. =) I've been looking at this a bit deeper and it looks like the only compiler that actually has a problem with this is GCC. Clang and MSVC are fine. When I initially added the new swizzle code I had to add a compiler flag to the GCC build scripts to allow for bigger .obj file outputs because I was going over the default GCC limit. I think that is what's causing GCC to have such big compile times. Simply removing the RGBA swizzle formats didn't make a difference on GCC, but I still have the compiler flag on. Perhaps this is also to do with my build setup for the tests. I'm using unity builds so the final output intermediate binaries are going to be big. I wonder if moving over to smaller source files makes it easier for the compiler. I imagine I'd be trading compile time for link time though. I'll investigate and get back to you. Once again, apologies for the delay, but I'm back now and actively working on this again. |
Hi there! Sorry to hear that and hope all is good now :) Oh interesting - the compile flag, who would've guessed! Maybe they use a fallback path for large .obj files that is simply slow by design? All the best and thanks for looking into this, much appreciated! |
Hi there, I've been looking into this for a while now and it doesn't matter what I try to get these GCC build times down, it would appear that GCC is just slow. Compile times on it are bad no matter what I do. I'm going to make a new release very soon with the new swizzles and I'll be including both XYZW and RGBA for all vector types (including I'll start thinking of ways to elegantly let people only include bits of the API that they need so that if they're using HLML they can reduce the compile times on their own projects if they need to. Sorry this issue has been active for so long, but man, what a journey! Expect the new swizzles to come out in v2.2.0. I'll close this out when that gets released. 😄 |
Fixed in v2.2.0. |
Awesome, thank you for the update and the great project - I haven't been doing much (any) coding outside of work but I was just thinking about getting back to it so it's perfect timing to try out the 2.2.0 - I'll let you know if I spot anything useful :) |
You are most welcome! I'm happy my software is actually helping someone make something useful! Makes me very happy! And yes! Please do give any and all feedback you have. I can't promise I'll definitely act on it, but I will definitely listen to and take on board any all feedback/criticisms that people give me and definitely not just dismiss it straight away. ❤️ |
One of the main issues trying to compile a couple of generic HLSL functions with HLML was swizzling. https://github.com/redorav/hlslpp has a templated solution to this and it looks like a perfect thing for autogeneration: https://github.com/redorav/hlslpp/tree/master/include/swizzle :)
The text was updated successfully, but these errors were encountered: