Interest in or need for full H/W-accelerated video presentation?

I’d like to get some feedback on how well the presentation of high-res / high-framerate videos works with the current moviestim versions. If the video decoding and presentation steps take too long, playback can be expected to stutter. If this is happening in the real world, how much interest is there in having a fully H/W-accelerated pipeline for video presentation?

The current moviestim routines either don’t use H/W-accelerated decoding at all, or may use it via VLC. However, as far as I can read the code, in all versions the decoded video frame is downloaded to CPU memory and then uploaded again to the GPU for presentation. This takes time, and with larger video frames or on high-framerate screens, the delay might just be too much. I have a solution for this using gstreamer, but it’s a bit of a hack and only works on nvidia cards (as only the nvcodec plugin can export OpenGL textures) and only on Windows, and, naturally, it also needs a local gstreamer installation.
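To put rough numbers on that round trip, here is a back-of-the-envelope sketch. It assumes uncompressed 8-bit RGBA frames (an assumption; the real pixel format depends on the backend) and only counts bytes, not the actual transfer time, which varies by system.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size in bytes of one uncompressed frame (RGBA, 8 bits per channel, by default)."""
    return width * height * bytes_per_pixel


def frame_budget_ms(refresh_hz):
    """Time available per screen refresh, in milliseconds."""
    return 1000.0 / refresh_hz


def copy_rate_gb_per_s(width, height, fps, bytes_per_pixel=4):
    """Data moved per second in ONE direction (GPU->CPU or CPU->GPU), in GB/s (1e9 bytes)."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / 1e9


# Illustrative case: 4K video at 60 frames per second
size_4k = frame_bytes(3840, 2160)
rate_4k60 = copy_rate_gb_per_s(3840, 2160, 60)
budget_60 = frame_budget_ms(60)
print(size_4k, "bytes per frame;", round(rate_4k60, 2), "GB/s each way;",
      round(budget_60, 2), "ms per refresh")
```

Under these assumptions a 4K frame is about 33 MB, so at 60 fps roughly 2 GB/s has to travel in each direction, all within a per-frame budget of under 17 ms that also has to cover decoding and drawing.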

Even though I expect this to be a niche problem, if there was interest in it, I could try to pretty up the code to make it fit for the public, say, in the form of another moviestim.

@mdc is currently working on a whole new system for playing movies; I’m not up on the details but he may currently be implementing some of what you’re talking about, so it’s definitely worth syncing up to avoid duplicating work.

In general, we’re always open to community contributions via GitHub :slight_smile:

Yes, indeed, and to expand on Todd’s response, the work that @mdc has been doing recently is intended to:

  • make it easier for us to try additional engines (and gstreamer is one clear candidate we’d talked about) for movie rendering. Matthew’s new system adds a single movies.MovieStim class where you specify the backend you want to use, rather than us having a growing plethora of MovieStim classes
  • add support for a better-performing engine (ffpyplayer) that still uses ffmpeg for the decoding, so it isn’t hardware accelerated, but uses cython for faster integration. Matthew has been seeing pretty high performance off that and we’ve already verified that the packaging step is simple

That effort is currently sitting in a draft pull request:

Regarding gstreamer, my experience there is mixed. It has the advantages you point to but is famously hard for users to get installed and running. So it scares me that this will be hard to package if we can get it working at all. But I’m keen to hear what you’ve achieved and how (e.g. what version of gstreamer and which python wrapper are you using?) because in theory I agree this should be the best performer.

It might be worth us actually grabbing a zoom meet with you to chat about this

thanks for your input
Jon

I’m happy to see that there’s interest in this topic. Quite a lot of points to respond to.

my experience with gstreamer

My installed version is currently 1.20.1.1, compiled from their latest source a few weeks ago. Unlike what @jon wrote, I haven’t had serious issues either installing or using gstreamer, but my experience is limited to video playback on Windows. Around the time I started using it (2019), Linux seemed to be their main target, with less focus on Windows; since then, however, their support for Windows has improved.

That said, their support for Python on Windows isn’t great; I’ve been told it’s possible to compile it via cygwin, but I didn’t want to bother with that. For movie playback on PsychoPy, I wrote a C library that interfaces with gstreamer development libraries, handles initialisation, and exports a few simple functions, such as fetching the next frame. For these few routines, I wrote bindings in ctypes, because I want to keep it simple as long as I’m just experimenting with it. Cython would certainly be a more performant way to do it. There’s plenty of work to do to get this code into a publishable state, and I think it wouldn’t be that difficult to omit the C library completely - the number of routines that need bindings isn’t that great. Linux has python bindings available so things should be even simpler. Mac, I don’t know.
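For anyone unfamiliar with the ctypes approach, the pattern looks roughly like the sketch below. The wrapper library itself isn't shown in this thread, so the platform C runtime and its `strlen` function are used as a stand-in here; a real binding would declare its own exported functions (for example a frame-fetching routine) in the same way.

```python
import ctypes
import ctypes.util
import sys


def load_c_runtime():
    """Load the platform C runtime, used here only as a stand-in for a custom wrapper library."""
    if sys.platform == "win32":
        return ctypes.CDLL("msvcrt")
    # find_library may return None on minimal systems; CDLL(None) then exposes
    # the symbols already loaded into the current process, which include libc.
    return ctypes.CDLL(ctypes.util.find_library("c"))


lib = load_c_runtime()

# Declare the C signature explicitly so ctypes converts arguments and results correctly:
#   size_t strlen(const char *s);
lib.strlen.argtypes = [ctypes.c_char_p]
lib.strlen.restype = ctypes.c_size_t

n = lib.strlen(b"gstreamer")
print(n)
```

The point of the pattern is that nothing is compiled on the Python side: the shared library is loaded at runtime and each function only needs its argument and return types declared.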

what to expect from gstreamer

I haven’t done a lot of research about platforms other than Windows, so I cannot say how many platforms gstreamer supports or how it compares to ffmpeg, but I believe it’s worth supporting if only for the fact that it’s one of the largest and most well-known systems of its kind. What I can say based on my experience in Windows, and what also should work on Linux (and other platforms?) is this:

  • It can be used the same way as FFmpeg. With automatic modules like playbin, it selects the appropriate codecs needed to play back a movie. Frames can be fetched from the pipeline (I’m using appsink for this purpose).
  • Even better, the automatic codec selection process can be influenced by adjusting codec priorities. I’ve only started to look into this, but if it works well, it could pick h/w accelerated codecs automatically when available, and fall back to CPU-based ones when not. As far as I’m aware, this cannot be done with ffmpeg(?).
  • It’s also possible to manually pick a codec, if one knows that it’s available and matches the media stream. I believe this is the way to go when one wants h/w acceleration in ffmpeg - first inspect the media to detect the format, pick the appropriate codec (using your own knowledge/logic) and supply it as a parameter to ffmpeg. This has been possible to do in PsychoPy for ages (check out my post from 2019).
  • Gstreamer can even do a fully GPU-based pipeline (that is, after the file has been loaded and demuxed). PsychoPy uses OpenGL, and gstreamer can supply frames as OpenGL textures, which can be drawn to the window, with essentially the same drawing code that exists in moviestim modules. For now, the only pipeline that can do this is based on NVidia’s nvcodec, which is capable of outputting GLMemory textures in the same GL context that’s used by PsychoPy. There’s another similar pipeline based on d3d11 but it doesn’t yet support OpenGL output. I’ve seen code for DirectX → OpenGL on-GPU texture zero-copy sharing, but last I checked it wasn’t available in gstreamer.
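On the codec priorities mentioned above: GStreamer calls these "ranks", and recent versions (1.18 and later, as far as I know) let you override them through the `GST_PLUGIN_FEATURE_RANK` environment variable, without touching any code that builds the pipeline. A minimal sketch, assuming the NVIDIA `nvh264dec` and the software `avdec_h264` decoders are the two candidates on the system:

```python
import os

ALLOWED_RANK_NAMES = {"NONE", "MARGINAL", "SECONDARY", "PRIMARY", "MAX"}


def feature_rank_env(ranks):
    """Build a GST_PLUGIN_FEATURE_RANK value from a {element_name: rank} mapping.

    Each rank is either one of the names in ALLOWED_RANK_NAMES or a
    non-negative integer.
    """
    parts = []
    for name, rank in ranks.items():
        if isinstance(rank, int):
            if rank < 0:
                raise ValueError(f"rank for {name} must be non-negative")
        elif rank not in ALLOWED_RANK_NAMES:
            raise ValueError(f"unknown rank name for {name}: {rank!r}")
        parts.append(f"{name}:{rank}")
    return ",".join(parts)


# Prefer the hardware decoder, keep the software one available as a fallback.
value = feature_rank_env({"nvh264dec": "MAX", "avdec_h264": "SECONDARY"})

# Assumption: this has to be in the environment before GStreamer is initialised.
os.environ["GST_PLUGIN_FEATURE_RANK"] = value
print(value)
```

With ranks set this way, automatic elements such as playbin should pick the higher-ranked decoder when it is present and can handle the stream, and fall back to the next one otherwise. I haven't verified how robust that fallback is in practice.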

Certainly, h/w acceleration is also available in FFmpeg, and perhaps it’s also capable of automatically picking an accelerated path if available. In my experience, however, the only consistent way to ensure that the accelerated codec is picked was to manually specify the video codec for FFmpeg. This isn’t that similar in gstreamer, but gst can use codec priorities, as I mentioned above.
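The manual route for FFmpeg can be sketched with the command-line tools: ask ffprobe for the codec of the first video stream, then pass a matching hardware decoder to ffmpeg as an input option. The decoder names below are the NVIDIA (cuvid) ones and the mapping is illustrative only; other vendors use different decoders, and the functions just build the argument lists without running anything.

```python
# Illustrative mapping from ffprobe codec names to NVIDIA (cuvid) decoders.
CUVID_DECODERS = {
    "h264": "h264_cuvid",
    "hevc": "hevc_cuvid",
    "vp9": "vp9_cuvid",
    "mpeg2video": "mpeg2_cuvid",
}


def probe_codec_cmd(path):
    """ffprobe arguments that print the codec name of the first video stream."""
    return [
        "ffprobe", "-v", "error",
        "-select_streams", "v:0",
        "-show_entries", "stream=codec_name",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def decode_cmd(path, codec_name):
    """ffmpeg arguments that decode `path` and discard the result (a decode-only run)."""
    cmd = ["ffmpeg", "-hide_banner"]
    decoder = CUVID_DECODERS.get(codec_name)
    if decoder is not None:
        # The decoder is an input option, so it must come before -i.
        cmd += ["-c:v", decoder]
    cmd += ["-i", path, "-f", "null", "-"]
    return cmd


print(" ".join(probe_codec_cmd("movie.mp4")))
print(" ".join(decode_cmd("movie.mp4", "h264")))
```

If the codec has no entry in the mapping, the command is built without `-c:v` and ffmpeg chooses its default decoder, which mirrors the "own knowledge/logic" step described above.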

However, I did some testing on high-res (4K) video some time ago, and the CPU-based decoding in PsychoPy simply couldn’t reliably do 60 Hz without often missing some frame deadlines. Not on the systems I’ve tested, anyhow. The same should apply to high-framerate media (imagine an experiment that needs a 140Hz+ display and wants to use movies). Even with the h/w acceleration-enabling hack for ffmpeg, PsychoPy still copies the frame from the GPU to CPU memory and back to the GPU, which adds too much overhead.

With the fully-h/w pipeline and 4K video playback, frame fetching takes very little time (1-2 ms on i7-8700k + rtx 3090 + win 10 x64), which isn’t surprising as gstreamer decoding runs in a separate process and on the GPU, and only a texture ID changes hands. Drawing that texture with Pyglet takes more time (code modified from moviestim), with occasional peaks around 7-8 ms. I wonder if that’s because those are calls from Python. With a shared GL context, however, gstreamer can in theory directly draw onto the window, which could speed things up (haven’t tested this). Even if this may not be needed by the majority of users, wouldn’t it be great if PsychoPy had the ability to do this as well? I mean, in addition to supporting gstreamer and having a clean and stable movie playback function.
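To relate timings like these to a display's refresh rate, a simple check is to compare each frame's total time against the per-refresh budget. The sketch below is deliberately simplified: it treats fetch and draw as one serial duration per frame and ignores any overlap with decoding, and the sample durations are made up for illustration.

```python
import time


def time_call_ms(func, *args):
    """Return how long func(*args) takes, in milliseconds."""
    t0 = time.perf_counter()
    func(*args)
    return (time.perf_counter() - t0) * 1000.0


def missed_deadlines(durations_ms, refresh_hz):
    """Indices of frames whose duration exceeds one refresh period."""
    budget_ms = 1000.0 / refresh_hz
    return [i for i, d in enumerate(durations_ms) if d > budget_ms]


# Made-up per-frame durations (fetch + draw), in milliseconds
durations = [3.0, 7.5, 2.5, 8.0]

late_at_60 = missed_deadlines(durations, 60)    # budget is about 16.7 ms
late_at_140 = missed_deadlines(durations, 140)  # budget is about 7.1 ms
print(late_at_60, late_at_140)
```

So peaks of 7-8 ms are harmless at 60 Hz but would already exceed the budget of a 140 Hz display, which is why the drawing step matters for the high-framerate case.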

new moviestim code

I’ve spent a little time peeking into the new moviestim code in @mdc’s repo. I guess I’m not the only one who found having several moviestim modules confusing, so it’s good to see that things are going to be cleaned up. My thoughts (to be taken with a pinch of salt as I didn’t look very carefully):

  1. I think it wouldn’t be extremely difficult to code a player module for gstreamer, such as the one for ffpyplayer.
  2. However, the current Frame class design and the OpenGL rendering code may need to be modified to accommodate on-GPU textures (OpenGL texture ID) as an alternative to keeping all the frame data, if you think that that’s worth supporting.
  3. I didn’t notice any special code for enabling h/w acceleration in ffmpeg either - I wonder if that’s being left to ffmpeg to decide, and if so, how well does it work these days?
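As an illustration of point 2, a frame container could carry either CPU pixel data or a GPU texture ID, with the renderer checking which one it received. This is a purely hypothetical sketch; the class and field names are invented here and are not taken from the draft pull request.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FrameSketch:
    """Hypothetical frame container (NOT the Frame class from the new moviestim code)."""
    size: Tuple[int, int]              # (width, height) in pixels
    pts: float                         # presentation timestamp in seconds
    pixels: Optional[bytes] = None     # CPU-side frame data, if decoded to memory
    texture_id: Optional[int] = None   # OpenGL texture name, if the frame stays on the GPU

    def __post_init__(self):
        # Exactly one of the two payloads must be provided.
        if (self.pixels is None) == (self.texture_id is None):
            raise ValueError("provide exactly one of pixels or texture_id")

    @property
    def on_gpu(self):
        """True when the frame is already an OpenGL texture and needs no upload."""
        return self.texture_id is not None


gpu_frame = FrameSketch(size=(3840, 2160), pts=0.0, texture_id=7)
cpu_frame = FrameSketch(size=(2, 1), pts=0.0, pixels=bytes(8))
print(gpu_frame.on_gpu, cpu_frame.on_gpu)
```

A renderer could then skip the texture upload whenever `on_gpu` is true and bind the supplied texture directly, while keeping the existing upload path for CPU frames.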

Yes, the fully GPU-based approach would be the main benefit that would come from GStreamer, if it can be made to work. That stuff can be hard to code because OpenGL is a state machine and doesn’t like being run in more than one thread - you have to be careful about which thread is making changes and when, or you get memory errors and random crashes, but probably you knew that.

But do check out the ffpyplayer option before diving in. Although that does indeed copy the frame to the GPU, @mdc has found it to be much higher performance than the previous options. I don’t know whether he did anything to request hardware decoding but my guess is that it uses whatever ffmpeg does by default. Maybe that’s something that has improved lately?

Regarding whether it’s now easy to install, there are now binaries to download but they aren’t code-signed, on macOS at least, so installing those requires the user to know about overriding Gatekeeper.

Then I just tried to install the official gst-python wrapper to see if the libs were working with that, but it doesn’t even provide a setup.py let alone a wheel. Of course, this isn’t needed if we just call the lib with our own ctypes/cython code, but it still leaves me worried

My knowledge of OpenGL is very superficial. I vaguely remember reading that sharing certain objects between OpenGL contexts is possible; perhaps this is why/how it works. In any case, I guess that the gstreamer devs knew how to make it happen because it does work. I guess this means more testing is needed to make sure it doesn’t only work on my computers. In practice, Gstreamer only fills in the textures, and hands over the texture ID. Drawing is handled by my code, and would be handled by PsychoPy.

I’ve already tried ffmpeg with h/w acceleration, so the main difference, I assume, would come from faster bindings with cython. I’ll have a look.

I’d say the main requirement for any user wanting to display a movie is that the frames are decoded in time, no matter whether h/w acceleration is used or not. After a certain complexity (resolution, framerate), however, that acceleration is more than likely to be necessary. For those cases, it’s good to be able to require h/w processing. I don’t know if ffmpeg can do that easily.

I’m not familiar with Mac.

Also on Mac? When I asked on their forums a while ago how to make the bindings work on Windows, I was told that "some people could do it with cygwin". I’m not sure it was their fault that it didn’t compile; as I recall, it was another library (perhaps gobject-introspection) that was causing the trouble. Nevertheless, I felt that maybe it wasn’t something they had time to focus on and that it would be better not to rely on it at all. The bindings, not gstreamer.

Overall, I think probably only a limited number of users actually needs full-GPU processing, but those who need it might be willing to invest in the right system (OS, GPU, time) to make it work. So, for PsychoPy, supporting ffpyplayer in the first place seems to be a good choice; it would be OK if gstreamer support is a bit shaky, as long as it can be made to work with a little effort.


Great, yes. GST will be great to add if someone has the time to work it through (and if I’m able to package it into the standalone installers). In the meantime ffpyplayer is easier and should work for most use-cases I believe

I’ll keep on playing around with gstreamer in my free time, see if I can make it work reliably enough.

I have another, related question regarding external bindings. My understanding is that all methods (cffi, cython, etc.), with the exception of ctypes, need a compiler to be available in order to be built. PP so far has no part that requires compiling (as far as I’m aware). My question is how much it would complicate things for PP to include anything that does, e.g. would it be acceptable to include something in cython, or even in C?

For most users it won’t be an issue - we would provide the built versions in the Standalone.
It makes it harder to have a pip install that never fails. On Win/MacOS we can provide wheels and probably need 1 for each major Python version (maybe less). On linux, it’s harder to provide a single compiled wheel for all versions, although the manylinux project has aimed to help in this domain and works for many libs

Anyone installing to a Python version/OS for which we don’t build a wheel will need a compiler installed on their machine

ctypes-based options are definitely easier to package