PySurfaceDownloader (minor) issue: too fast! #71

Open
Yves33 opened this issue Jul 23, 2024 · 3 comments
@Yves33
Contributor

Yves33 commented Jul 23, 2024

Hi again!
On my machine (Core i7 12700 / GeForce RTX 3060), I find that running the following two calls back to back, even when both report success,

PyDecoder.DecodeSingleSurface(nv12_surface)
PySurfaceDownloader.Run(nv12_surface, nv12_cpu_buffer)

may result in an altered NV12 buffer (the top of the image is cropped; see the image below, reconstituted from the buffer and resized).

[Image: PIL_07]

The problem:

  • only appears with high-resolution videos (here 5760 x 2880)
  • does not appear when the same code is run in a Jupyter notebook (or at least appears less frequently...)
  • does not appear if pycuda is used to copy to the CPU (requires the GpuMem branch)
  • does not appear if the frame is converted to RGB on the GPU and then downloaded
  • can be worked around by introducing a 2 ms delay between the two operations (time.sleep(0.002))
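The delay workaround from the last bullet could be wrapped roughly as follows. This is only a sketch: `decode` and `download` are hypothetical zero-argument callables standing in for wrappers around `PyDecoder.DecodeSingleSurface` and `PySurfaceDownloader.Run`, and the assumption that both return a truthy value on success is mine, not taken from the library.

```python
import time
from typing import Callable


def decode_then_download(
    decode: Callable[[], bool],
    download: Callable[[], bool],
    delay_s: float = 0.002,
) -> bool:
    """Run decode, wait briefly, then run download.

    `decode` and `download` are hypothetical wrappers around the real
    decoder / downloader calls; both are assumed to return a truthy
    value on success.
    """
    if not decode():
        return False
    # Workaround only: give the decoded frame time to land before the copy.
    # A fixed sleep hides a race, it does not remove it.
    time.sleep(delay_s)
    return bool(download())
```

A fixed delay like this is fragile (the time needed presumably depends on resolution and GPU load), so it is best treated as a diagnostic aid rather than a fix.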

minimal code to reproduce:
https://github.com/Yves33/Vali_luma_chroma_shift/blob/main/Vali_nv12_download.py

It seems to me that the download starts before the decoded frame is ready.
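The suspected race can be illustrated with a pure-Python analogy. This is not VALI or CUDA code; every name below is hypothetical. A background thread plays the role of an asynchronous copy that fills a buffer row by row, and the reader either waits for a completion event or reads immediately.

```python
import threading
import time


def start_async_copy(dst: list, done: threading.Event,
                     row_delay_s: float = 0.0002) -> threading.Thread:
    """Fill `dst` row by row in a background thread, then signal `done`."""
    def worker() -> None:
        for i in range(len(dst)):
            time.sleep(row_delay_s)  # pretend each row takes time to copy
            dst[i] = 1               # 1 marks a row that has been written
        done.set()

    thread = threading.Thread(target=worker)
    thread.start()
    return thread


def download(dst: list, done: threading.Event, wait: bool) -> list:
    """Snapshot `dst`; with wait=True, block until the copy has finished."""
    if wait:
        done.wait()
    return list(dst)


if __name__ == "__main__":
    rows = [0] * 20
    finished = threading.Event()
    copier = start_async_copy(rows, finished)
    early = download(rows, finished, wait=False)  # may contain unwritten rows
    full = download(rows, finished, wait=True)    # all rows written
    copier.join()
    print(sum(early), sum(full))
```

Reading without waiting may return a partially filled buffer, which matches the shape of the symptom described above; whether the real cause is a missing synchronization of this kind is only a hypothesis at this point.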

@RomanArzumanyan
Owner

Hi @Yves33

Thank you for the detailed analysis.
I suspect the actual source of the problem is here:

static void CopyToSurface(AVFrame& src, Surface& dst) {
  CUDA_MEMCPY2D m = {0};
  m.srcMemoryType = CU_MEMORYTYPE_DEVICE;
  m.dstMemoryType = CU_MEMORYTYPE_DEVICE;
  for (auto i = 0U; src.data[i]; i++) {
    m.srcDevice = (CUdeviceptr)src.data[i];
    m.srcPitch = src.linesize[i];
    m.dstDevice = dst.PixelPtr(i);
    m.dstPitch = dst.Pitch(i);
    m.WidthInBytes = dst.Width(i) * dst.ElemSize();
    m.Height = dst.Height(i);
    CudaCtxPush push_ctx(GetContextByDptr(m.dstDevice));
    ThrowOnCudaError(cuMemcpy2D(&m), __LINE__);
  }
}

After I switched to the default CUDA stream and started pushing the context created by FFmpeg, the reproduction rate fell, but some decoder unit tests still fail from time to time.

I'll continue investigating this. Maybe making FFmpeg use the same CUDA context as VALI will solve it. In any case, as soon as I find something I'll get back to you.

@RomanArzumanyan
Owner

Hi @Yves33

Please check out the latest version, 3.2.10. It looks like it solves the issue; at least 2 previously unstable HW decoder unit tests are now passing.

@Yves33
Contributor Author

Yves33 commented Jul 23, 2024

Version 3.2.10 solves the issue (at least with my test videos...).
