Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GpuMem() #44

Open
Yves33 opened this issue Jun 5, 2024 · 15 comments
Open

GpuMem() #44

Yves33 opened this issue Jun 5, 2024 · 15 comments

Comments

@Yves33
Copy link
Contributor

Yves33 commented Jun 5, 2024

Hi,
Trying to port the opengl example from VPF to VALI, it seems that GpuMem() is not accessible from python anymore.
Is it on purpose?

Y33

@RomanArzumanyan
Copy link
Owner

Hi @Yves33

You can still get raw CUDA device pointer, though it's now available via Surface class. You no longer need to mess with PlanePtr.

Please follow VALI documentation on this topic:
https://romanarzumanyan.github.io/VALI/PyNvCodec.html#PyNvCodec._PyNvCodec.Surface.PlanePtr

@Yves33
Copy link
Contributor Author

Yves33 commented Jun 5, 2024

may be stupid, but among these fields, which ones gives me access to raw cuda ptr/buffer?
CudaBuffer has GpuMem(), but there is no api to get CudaBuffer associated to Surface or PlanePtr

class PyNvCodec._PyNvCodec.Surface¶
    Clone(self: PyNvCodec._PyNvCodec.Surface) → PyNvCodec._PyNvCodec.Surface
    Empty(self: PyNvCodec._PyNvCodec.Surface) → bool
    Format(self: PyNvCodec._PyNvCodec.Surface) → PyNvCodec._PyNvCodec.PixelFormat
    Height(self: PyNvCodec._PyNvCodec.Surface, plane: int = 0) → int
    HostSize(self: PyNvCodec._PyNvCodec.Surface) → int
    Make(format: PyNvCodec._PyNvCodec.PixelFormat, width: int, height: int, gpu_id: int) -> PyNvCodec._PyNvCodec.Surface
    Make(format: PyNvCodec._PyNvCodec.PixelFormat, width: int, height: int, context: int) -> PyNvCodec._PyNvCodec.Surface
    NumPlanes(self: PyNvCodec._PyNvCodec.Surface) → int
    OwnMemory(self: PyNvCodec._PyNvCodec.Surface) → bool
    Pitch(self: PyNvCodec._PyNvCodec.Surface, plane: int = 0) → int
    PlanePtr(self: PyNvCodec._PyNvCodec.Surface, plane: int = 0) → PyNvCodec._PyNvCodec.SurfacePlane
    Width(self: PyNvCodec._PyNvCodec.Surface, plane: int = 0) → int
    static from_dlpack(capsule: capsule, format: PyNvCodec._PyNvCodec.PixelFormat = <PixelFormat.RGB: 2>) → PyNvCodec._PyNvCodec.Surface


class PyNvCodec._PyNvCodec.SurfacePlane
    ElemSize(self: PyNvCodec._PyNvCodec.SurfacePlane) → int
    Height(self: PyNvCodec._PyNvCodec.SurfacePlane) → int
    HostFrameSize(self: PyNvCodec._PyNvCodec.SurfacePlane) → int
    Pitch(self: PyNvCodec._PyNvCodec.SurfacePlane) → int
    Width(self: PyNvCodec._PyNvCodec.SurfacePlane) → int

@RomanArzumanyan
Copy link
Owner

RomanArzumanyan commented Jun 5, 2024

@Yves33

I beg your pardon, that's my bad. Surface.PlanePtr indeed returns a reference to SurfacePlane, not the raw memory pointer.

Latest VALI relies on DLPack to share the memory with other frameworks. The main reason for that is the horror of different pixel formats and their memory layout. It's still possible to extract the raw pointer from DLPack capsule however.

That was the point I was trying to explain in my initial reply ))

P. S.
The reason behind this decision is to keep "raw" 2D memory allocation represented by SurfacePlane separate from meta-data kept within Surface. It's an easy job to add GpuMem method back to SurfacePlane API but imagine a code snippet as below:

# 1920x1080 decoded picture form video file
surface_nv12 = decode_surface("/path/to/fullhd/file.mp4")

# Same picture converted to planar RGB
surface_rgb = to_rgb(surface_nv12)

# And the same picture once again, this time converted to YUV420
surface_yuv420 = to_yuv420(surface_nv12)

# Will output 1620
print(surface_nv12.PlanePtr().Height())

# Will output 3240
print(surface_nv12.PlanePtr().Height())

# Will output 1080
print(surface_yuv420.PlanePtr().Height())

It is completely counter-intuitive unless you are very familiar with various pixel formats memory layout and there's no way to verify parameter as simple as picture height without introducing additional API to Surface class to query information about pixel format context.

P. P. S.
BTW, DLPack and CAI (CUDA Array Interface) specs serve exactly this purpose - they store a raw pointer + metadata required to handle it.

@Yves33
Copy link
Contributor Author

Yves33 commented Jun 6, 2024

@RomanArzumanyan
I'll try to get access to raw pointer through dlpack structures, but...

I definitivelly advocate for getting back GpuMem() in SurfacePlane (As you stated, reintroducing is trivial, but I don't thing it's worth maintaining my own fork for that)

  1. In present version of VALI, SurfacePlane.__repr__() dumps the values of Cuda ctx and Cuda Ptr. The programmer (I) expects to have access to these fields in Python API.
    (don't confuse: I'm not asking to remove this information from __repr__(), but to get access to SurfacePlane.GpuMem() and SurfacePlane.Context() through python API)
    nb: using GpuMem= int(src_plane.repr().split(':')[-1].strip()) works, but seems awfull hack to me!

  2. accessing GpuMem() enables direct implementation of Cuda <-> GL Texture tranfer according to the official NVIDIA guidelines ("https://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf")

  3. DLPack is obviously the solution for interoperability with scientific/deep learning frameworks, but seems less suitable and widespread when it comes to graphics programming (couldn't find any example of updating texture/PBO through DLPack objects).
    I'm totally new to DLPack, but from what I understand, PyCapsule is an opaque type and accessing the raw pointer from python requires working with ctypes api, which is much more complex and unreadable than pycuda/cuda python memory copy. Most frameworks use DLPack to transfer pointers between C libs (which is the original goal of DLPack). and don't directly use __dlpack__ from python
    https://docs.python.org/3/c-api/capsule.html
    "This subtype of PyObject represents an opaque value, useful for C extension modules who need to pass an opaque value (as a void* pointer) through Python code to other C code"

  4. I don't see the point with PlanePtr().Height().

  • We are talking about VideoPlanes, not full images (I understand that surface always returns the same Height/Width, whatever the format, but VideoPlanes are not images).
  • VALI API (your code snippet), in its current state already returns different values for Height() and is already "totally counter-intuitive". This has nothing to do with accessing GpuMem() (for clarity, one could optionnally rename SurfacePlane.Width to RowWidth or BytesWidth, and SurfacePlane.Height to NumRows - but not necessary IMHO)
  • Programmer playing with converters (to_yuv, to_rgb), will easily keep track of pixel format. Surface has Format() api which returns the pixel format, and no additional API is required (?).
  • accessing the SurfacePlane through dlpack will not convert the pixel format anyway, and programmer still has to perform the conversion.
  1. Other Frameworks (PyAV, gpac) give access to raw video frame pointer/raw video data for each video plane (plane[x].buffer_ptr for pyav, packet.data for gpac)

  2. It maintains the compatibility with VPF

  3. It enables access to video frames though pycuda and cuda python (an maybe some other python bindings for cuda)

  4. Having both a high level api (dlpack) and a low level api (cuda pointers) is more an advantage than a problem!

@RomanArzumanyan
Copy link
Owner

Hi @Yves33

accessing GpuMem() enables direct implementation of Cuda <-> GL Texture tranfer according to the official NVIDIA guidelines ("https://www.nvidia.com/content/GTC/documents/1055_GTC09.pdf")

Yeah, it was OK 15 years ago. Technology has gone a long way since then. IMO it's no longer a good design if OOP developer has to mess with pixel format and bit depth and CUDA memcpy only to export their picture into OpenGL Surface. Chunks of GPU memory are already there, the whole affair is a weird ritual !

Compare the old school GpuMem() approach

# Export to PyTorch tensor.
surf_plane = rgb24_planar.PlanePtr()
img_tensor = pnvc.makefromDevicePtrUint8(
    surf_plane.GpuMem(),
    surf_plane.Width(),
    surf_plane.Height(),
    surf_plane.Pitch(),
    surf_plane.ElemSize())

# This step is essential because rgb24_planar.PlanePtr() returns a simple
# 2D CUDA pitched memory allocation. Here we convert it the way
# pytorch expects it's tensor data to be arranged.
img_tensor.resize_(3, target_h, target_w)

With DLPack:

img_tensor = torch.from_dlpack(rgb24_planar)

I don't want to resurrect the pixel format and bit depth mess. Probably, a layer of CUDA <> OpenGL glue hidden within PyNvCodec C++ land would solve the issue but it has nothing to do with SurfacePlane.GpuMem.

Even better approach would be to implemet from_dlpack method somewhere which takes DLPack capsule and returns OpenGL surface.

It maintains the compatibility with VPF

That's not a goal of VALI project. There's no active VPF development going on for quite some time. Sooner or later, the paths will diverge.

VALI exists on a shoestring budget, so adding a whole new facet which OpenGL (and graphics in general is) doesn't justify the effort in current circumstances. I have zero experience in it and won't be able to develop and maintain this part of codebase.

If there's a strong demand from community backed up by active participation in development, testing and maintenance - that's a different story but so far 99% of user requests are related to everything Video and ML. Within current development resources budget I'll keep maintaining Video and ML requests.

@Yves33
Copy link
Contributor Author

Yves33 commented Jun 6, 2024

Hi,

not totally convinced, but hey! you're the boss, and surely much more competent than I am regarding video/gpu (and I'm free to re_introduce GpuMem() )
Thanks a lot anyway for taking time to answer, and for the nice software!

@RomanArzumanyan
Copy link
Owner

RomanArzumanyan commented Jun 6, 2024

Hi @Yves33

I'm not here to insist on my decisions disregarding the common sense. So let's do what:
If you make a PR with implementation and meaningful unit test for the SurfacePlane.GpuMem method which doesn't operate magic numbers describing the dependency between Surface and SurfacePlane dimensions - I'm going for it.

@Yves33
Copy link
Contributor Author

Yves33 commented Jun 7, 2024

Hi @RomanArzumanyan

I'll try to go with your PR suggestion, although I still don't understand the point with Surface and SurfacePLane dimensions (which makes it difficult to propose some relevant unittest...)
in present version

  • SurfacePlane.Width() returns Width in Bytes (documentation states Pixels, but this is misleading. at least for RGB).
  • SurfacePlane.Height() return Height in Rows/Lines (which, as stated in documentation, may differ from actual Surface.Height())

In the meanwhile, found how to get pointer from Surface.PlanePtr(). Posting this simple snippet

import ctypes
## wget https://github.com/dmlc/dlpack/blob/main/apps/numpy_dlpack/dlpack/dlpack.py
from dlpack import _c_str_dltensor, DLManagedTensor

ctypes.pythonapi.PyCapsule_GetPointer.restype = ctypes.c_void_p
ctypes.pythonapi.PyCapsule_GetPointer.argtypes = [ctypes.py_object, ctypes.c_char_p]

def GpuMem(plane:SurfacePlane):
    '''
    retrieves the pointer to SurfacePlane GPU data
    briefly adapted from adapted from https://github.com/dlpack/apps/numpy_dlpack/dlpack/to_numpy.py
    stripped down version returning only pointer to GPU memory, to be used with pycuda.driver.Memcpy2D().set_src_device(...)
    '''
    pycapsule=plane.__dlpack__()
    dl_managed_tensor = ctypes.pythonapi.PyCapsule_GetPointer(pycapsule, _c_str_dltensor)
    dl_managed_tensor_ptr = ctypes.cast(dl_managed_tensor, ctypes.POINTER(DLManagedTensor))
    return dl_managed_tensor_ptr.contents.dl_tensor.data

@RomanArzumanyan
Copy link
Owner

RomanArzumanyan commented Jun 7, 2024

Hi @Yves33

SurfacePlane.Width returns Width in Bytes.

I don't think so, here's the code of constructor:

SurfacePlane::SurfacePlane(uint32_t width, uint32_t height, uint32_t elem_size,
DLDataTypeCode type_code, CUcontext context,
bool pitched) {
m_own_mem = true;
m_width = width;
m_height = height;
m_elem_size = elem_size;
m_dlpack_ctx.m_type_code = type_code;
Allocate(context, pitched);
}

And of the getter:

/* Get width in pixels;
* Will return 0 if SurfacePlane is blank.
*/
inline uint32_t Width() const noexcept { return m_width; };

If you think you found a bug, please provide a MVP code snippet.

documentation states Pixels, but this is misleading. at least for RGB

E. g. your RGB Surface has width of 1920 pixels. It's first and only SurfacePlane will have a width of 1920 x 3 = 5760 pixels because SurfacePlane knows nothing about RGB pixel format. It just stores 1920 x 3 x 1080 picture elements, 1 byte each.

Also please note that for PyNvCodec.PixelFormat.RGB element size is 1 byte, so size in pixels will be equal to size in bytes.

@Yves33
Copy link
Contributor Author

Yves33 commented Jun 8, 2024

Hi,

Did not express myself well. What I wanted to say is that, as you stated, SurfacePlane.Width and Height refer to the Buffer shape (depending on pixel format), and not to the size of the image. (I may be imprecise. It seems to me that it may be the case from the beginning...)

Made a (very) simple PR to expose GpuMem() in SurfacePlane (and ONLY in surfacePlane). I can invest some more time on unittests, but need some more explanations of what you expect, as the PR does not modify anything to actual VALI behavior.

@RomanArzumanyan
Copy link
Owner

Hi @Yves33

I can invest some more time on unittests, but need some more explanations of what you expect

Sure thing. When you add GpuMem method, it gives you a raw mem address. So basically you need to check if returned address is correct.

Usually it's done by Device To Host memcpy and compare against ethalon. Or you can copy the same information to another data class which is known to be working fine and comparing memory obtained by SurfacePlane.GpuMem() with that data class content.

You can follow this example:

def test_nv12_rgb(self):

The only difference is you don't need to do the color conversion, just upload your data to SurfacePlane and then check if GpuMem returns you a valid pointer by memcmp of your choice.

@Yves33
Copy link
Contributor Author

Yves33 commented Jun 10, 2024

Thanks,,

That's a lot clearer to me. Just have to go for it, now.

Y

@Yves33
Copy link
Contributor Author

Yves33 commented Jul 13, 2024

Not really comfortable with Git. I think I deleted my previous pull request while trying to adapt to lastest API...
Anyway, made a new pull request. Only unittest file is modified compared with previous one.

@rcode6
Copy link

rcode6 commented Jul 15, 2024

Just wanted to throw another vote in for having the raw mem pointer exposed. Not all the libraries I use support dlpack, and there are times I also manipulate the memory with pycuda or cuda-python.

@Yves33
Copy link
Contributor Author

Yves33 commented Sep 18, 2024

Updated the PR to match with latest API

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants