Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NV12/YUV->RGB colour accuracy and CUDA #3799

Open
gtebbutt opened this issue Jun 6, 2024 · 5 comments
Open

NV12/YUV->RGB colour accuracy and CUDA #3799

gtebbutt opened this issue Jun 6, 2024 · 5 comments

Comments

@gtebbutt
Copy link

gtebbutt commented Jun 6, 2024

I've noticed some odd colour space conversion issues when using the yuv_to_rgb function in the otherwise very helpful NVDEC tutorial - it seems to be subtly but visibly shifting colours and/or clipping the dynamic range, but I'm not certain why. Originally thought there might be issues between BT.601/BT.709/BT.2020 content, but trying other python functions using those matricies didn't seem to help; it could definitely be my error somewhere, but I wasn't able to get correct colour output on anything that'd been through the implicit NV12->YUV444 conversion step.

Since there's been some discussion on moving the colour space conversion to CUDA anyway, I wanted to flag this implementation in case it's helpful. We ended up seeing a significant speed increase using that rather than applying conversions in tensor format, with all colours coming back exactly as expected.

cc dmlc/decord#283 (comment)

@vadimkantorov
Copy link

vadimkantorov commented Jun 17, 2024

The idea of adding tested/fast colorspace conversions was not supported at the time in torchvision: pytorch/vision#4029

But maybe then torchaudio could host such functions

@gtebbutt
Copy link
Author

Interesting - colour space conversion is mentioned on the torchaudio to-do list at #3139, so there does seem to be some intention to add it, at least. There's also the existing NV12->YUV444 conversion built into StreamReader, so perhaps priorities have changed a bit now that the decoding is being pulled out into the torio namespace?

@vadimkantorov
Copy link

Maybe even then some of these could be upstreamed later into torch core to be available across the board... Regarding torio, really hoping some torch-prefixed name can be invented before it's too late and we are stuck with both torch-prefixed-domain-library-zoo and tor-prefixed-domain-library-zoo :)

@NilanEkanayake
Copy link

NilanEkanayake commented Jul 9, 2024

Here's a fixed version of the function:

def yuv_to_rgb(frames):
    frames = frames.to(torch.float)
    y = frames[..., 0, :, :]
    u = frames[..., 1, :, :]
    v = frames[..., 2, :, :]

    r = 1.164*(y - 16) + 1.596*(v - 128)
    g = 1.164*(y - 16) - 0.813*(v - 128) - 0.392*(u - 128)
    b = 1.164*(y - 16) + 2.017*(u - 128)

    rgb = torch.stack([r, g, b], -1)
    rgb = rgb.clamp(0, 255).to(torch.uint8)
    return rgb

@NilanEkanayake
Copy link

NilanEkanayake commented Jul 9, 2024

It seems that the decoded YUV is range-limited, so when re-converted back to RGB without taking that into account, it gets washed out (and the function above is likely not perfect, as a solid black video without clamping goes from -5 to 182, clamped to 0 to 182).
Surprisingly, found the working conversion code here: https://stackoverflow.com/questions/38530587/full-range-ycbcr-to-rgb-and-back-in-opencv-not-reversible

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants