Many 16-SP graphics cards carry only 256MB of video memory, which barely meets the minimum requirement for CUDA decoding. With CoreAVC's CUDA support enabled, selecting the overlay mixer renderer on such cards can even cause frame flips to stall because of insufficient memory.
Video playback involves more than decoding: there is also pre- and post-processing such as color-space conversion and resizing, and in CoreAVC these stages may be performed by the SPs.
http://cbaoth.dk/~cbaoth/nvcuvid.pdf
CUDA Video Decoder
The CUDA Video Decoder API gives developers access to the VP2 video processor on NVIDIA GPUs. This API supports the following video stream formats: MPEG(1/2) and H.264. This API enables developers to decode video streams on the GPU and process the decoded uncompressed surfaces within CUDA programs. The decoded surfaces can be transferred back to system memory using CUDA’s fast asynchronous read-backs, or the application can use CUDA’s 3D interoperability features to render the surfaces using a 3D API (OpenGL or DirectX).
Processing and Displaying Frames
The application's main loop retrieves images from the FrameQueue (copyDecodedFrameToTexture() in videoDecode.cpp) and renders the texture to the screen. The DirectX device is set up to block on the monitor's vsync, throttling rendering to 60Hz for a typical flat-screen display. To handle frame-rate conversion of 3:2 pulldown content, the frame is also rendered multiple times, according to the repeat information passed from the parser.
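The repeat-based frame-rate conversion can be sketched on the CPU. This is an assumption about how the parser's repeat information is used (modeled on MPEG-2's repeat_first_field flag), not the sample's actual code: on a 60Hz progressive display, each frame is shown for 2 vsyncs, plus one extra vsync when the stream flags a repeated first field, so 24fps film with a 3:2 cadence fills exactly 60 refreshes per second.

```cpp
#include <numeric>
#include <vector>

// Sketch (not the sample's exact code): vsyncs a frame occupies on a 60Hz
// progressive display, given the stream's repeat-first-field flag.
int display_count(int repeat_first_field) {
    return 2 + repeat_first_field;
}

// 24fps film carried with 3:2 pulldown alternates the repeat flag 1,0,1,0,...
// yielding the 3,2,3,2 vsync cadence (2.5 vsyncs per frame on average).
std::vector<int> pulldown_schedule(int num_frames) {
    std::vector<int> vsyncs;
    for (int i = 0; i < num_frames; ++i)
        vsyncs.push_back(display_count(i % 2 == 0 ? 1 : 0));
    return vsyncs;
}
```

For example, 4 film frames (4/24 s) occupy 3+2+3+2 = 10 vsyncs, which is exactly 10/60 s at the display's refresh rate.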
copyDecodedFrameToTexture() is the method where the CUDA decoder API is used to map a decoded frame (identified by its picture index) into CUDA device memory. Post-processing is done by passing the frame through cudaPostProcessFrame(), which returns a pointer to the decoded NV12 frame. A CUDA kernel then converts the NV12 surface to an RGBA surface, and the final RGBA surface is copied directly into a DirectX texture and drawn to the screen.
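The NV12-to-RGBA step can be illustrated with a CPU reference of the same conversion math. This is a sketch assuming BT.601 limited-range coefficients; the sample's actual CUDA kernel may use different coefficients and works on device memory. NV12 stores a full-resolution luma plane followed by an interleaved, half-resolution chroma (UV) plane, so each 2x2 block of pixels shares one UV pair.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Clamp a float into the valid 8-bit range after rounding.
static uint8_t clamp_u8(float v) {
    return static_cast<uint8_t>(std::min(255.0f, std::max(0.0f, std::round(v))));
}

// CPU reference for NV12 -> RGBA conversion (BT.601 limited-range assumed).
// y_plane: width*height bytes; uv_plane: interleaved U,V at half resolution.
std::vector<uint8_t> nv12_to_rgba(const uint8_t* y_plane, const uint8_t* uv_plane,
                                  int width, int height) {
    std::vector<uint8_t> rgba(static_cast<size_t>(width) * height * 4);
    for (int row = 0; row < height; ++row) {
        for (int col = 0; col < width; ++col) {
            float y = 1.164f * (y_plane[row * width + col] - 16);
            int uv = (row / 2) * width + (col / 2) * 2;  // one UV pair per 2x2 block
            float u = uv_plane[uv] - 128.0f;
            float v = uv_plane[uv + 1] - 128.0f;
            size_t o = (static_cast<size_t>(row) * width + col) * 4;
            rgba[o + 0] = clamp_u8(y + 1.596f * v);               // R
            rgba[o + 1] = clamp_u8(y - 0.391f * u - 0.813f * v);  // G
            rgba[o + 2] = clamp_u8(y + 2.018f * u);               // B
            rgba[o + 3] = 255;                                    // A (opaque)
        }
    }
    return rgba;
}
```

In the real pipeline this per-pixel arithmetic runs as a CUDA kernel, one thread per output pixel, writing into a surface that DirectX can texture from.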