Boosting performance with C extensions in PsychoPy

Hello,

I encountered some performance issues when working on stereo support, resulting in frames being dropped periodically (2-5 ms off @ 60Hz on my PC). I attempted several optimizations to eliminate these issues, with little success. The performance bottlenecks seem to originate from OpenGL calls, likely related to context/FBO binding and immediate mode calls. A solution I’m considering is to move rendering code to C extensions (like *.mex files in PTB/MATLAB) to boost the performance of these time-critical operations.

Some general questions: Can modules written in C (they can be built with setuptools without Cython) make it into core PsychoPy? Has anyone done this in the past, for what reason, and did it solve your performance problems?

Technically yes but, so far, we’re not doing any compiling. It will make my life harder because it means releases have to be compiled on each platform/architecture whereas right now the pip installer gets wrapped up once and works everywhere.

What problems might this solve? You’ll get significant boosts on things where there are lots of Python calls being made, but so far I’ve always got round this by optimising Python/OpenGL code (e.g. ElementArrayStim). Generally a single line of Python code will have very little overhead.

Technically yes but, so far, we’re not doing any
compiling. It will make my life harder because it means releases have to
be compiled on each platform/architecture whereas right now the pip
installer gets wrapped up once and works everywhere.

Python extensions are considerably easier to compile across platforms nowadays. Microsoft supplies a compiler specifically for building Python extensions, and setuptools picks it up. However, I did need additional libraries like GLEW to access OpenGL 2.0+ functions. I see your point: moving to compilers will complicate things and might not be necessary given some tests I did …

I did experiment with extensions by overriding methods in the Window class that make OpenGL calls. It actually worked! But the performance gain was negligible, so I’m assuming the frame drops might be driver related. In addition, PTB fails sync tests on this machine (Win 10, i7 XPS workstation GTX 650, dual 1080p) when throwing up a stereo screen (mode 5) with no drawing in the main loop. On the other hand, in 1000 flips PsychoPy misses a dozen or so frames by +/- 4 ms but otherwise runs at a solid 60Hz when drawing text to off-screen buffers and blitting to a window spanning multiple screens (current stereo branch). I’m still trying to figure out what causes the difference, but C extensions in PsychoPy’s core might be overkill.

What problems might this solve? You’ll get significant boosts on
things where there are lots of python calls being made but so far I’ve
always got round this by optimising Python/OpenGL code (e.g.
ElementArrayStim). Generally a single line of python code will have very
little overhead.

The stereo rendering pipeline I’m working on makes several context binding calls, which are expensive; there is no way around this. However, my extensions didn’t speed up the process much, given we are already hitting a performance ceiling. Any dropped frames seem to be a driver issue or something else interrupting consistent timing.

At this point I must agree: PsychoPy can work well with C extensions doing rendering, but it does an excellent job without them.

Missing by a few ms is not likely to be a “missed frame”; more likely something was occurring around the frame time that delayed the return from swap_buffers (within flip()). If you “drop” a frame then the frame time will roughly double.
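To make the distinction concrete, here is a minimal sketch that labels a list of measured frame intervals. The function name `classify_intervals` and the 10% tolerance are hypothetical choices for illustration, not anything from PsychoPy; the only rule taken from the post above is that a genuinely dropped frame shows up as an interval of roughly two (or more) refresh periods.

```python
def classify_intervals(intervals_ms, refresh_hz=60.0, tolerance=0.1):
    """Label each frame interval as 'ok', 'jitter', or 'dropped'.

    tolerance is a hypothetical cut-off, expressed as a fraction of one
    refresh period, beyond which an interval counts as timing jitter.
    """
    period = 1000.0 / refresh_hz
    labels = []
    for dt in intervals_ms:
        # A dropped frame spans roughly two or more refresh periods.
        if round(dt / period) >= 2:
            labels.append("dropped")
        # A few ms early or late is a delayed timestamp, not a lost frame.
        elif abs(dt - period) > tolerance * period:
            labels.append("jitter")
        else:
            labels.append("ok")
    return labels


# Illustrative (made-up) intervals in ms at 60Hz: one late/early pair
# and one interval of about two periods.
intervals = [16.7, 16.6, 20.7, 12.7, 16.7, 33.4]
labels = classify_intervals(intervals)
print(labels)
```

With these made-up numbers, the 20.7 ms and 12.7 ms intervals come out as jitter and only the 33.4 ms interval counts as a dropped frame.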

This is exactly the sort of case where C-compiling shouldn’t really help. Although the context binding calls are potentially slow, that isn’t because of Python; the contents of those calls are already being run in C themselves. The compiling benefit comes from routines that make many Python calls (e.g. long for-loops) even where the contents of each call are relatively fast. It’s the overhead of the call itself that is avoided by C-compiling.
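A rough way to see the call-overhead point without compiling anything: compare a loop that makes one Python-level function call per item against a single call whose loop runs inside the interpreter's C code. The function names here are made up for the illustration, and the timings will differ from machine to machine, so treat the printed numbers as indicative only.

```python
import timeit


def add_one(x):
    # Trivial body: almost all of the cost is the Python call itself.
    return x + 1


def many_python_calls(n):
    # n separate Python-level calls, each cheap on its own.
    total = 0
    for i in range(n):
        total += add_one(i)
    return total


def one_c_level_call(n):
    # Same result, but the iteration happens inside the built-in sum().
    return sum(range(1, n + 1))


n = 100_000
assert many_python_calls(n) == one_c_level_call(n)

t_loop = timeit.timeit(lambda: many_python_calls(n), number=5)
t_single = timeit.timeit(lambda: one_c_level_call(n), number=5)
print(f"many Python calls: {t_loop:.4f} s, single C-level call: {t_single:.4f} s")
```

A single slow OpenGL call such as a context bind is the opposite situation: one call whose cost is inside the driver, so moving the caller to C leaves that cost unchanged.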

Do you know any possible reasons for this? They seem to occur even when I run python at high-priority. Not much running in the background on this machine.

No, not really. System file indexing? Dropbox checking? Anti-virus or email.
What OS?

Windows 10 and Ubuntu 15.04 show the same pattern (GTX 650 Ti); both are fresh installs with no additional services running. I’ll post graphs later today showing these timing issues. They are periodic (every few ms) and present when using both the vanilla Window class and the stereo.Window class, with a much higher frequency in the latter. My code makes more immediate mode function calls, so I figured the bottleneck was there and tried to fix it with C extensions, which prompted this thread.

EDIT: here is a plot of flip timings running on Ubuntu. I have Win10 data too, but I don’t have a copy on me right now. “Standard” is the PsychoPy master branch from a week ago; the others are from the most recent “stereo” branch in my repo. Testing order was randomized, the screen was flipped 1000 times as fast as possible, and a single text stimulus was drawn every frame to the left and right buffers where applicable. The display is fixed at 60Hz and there was a 120 frame warm-up period.

You can see there is a weird periodic pattern in the timing error. Oddly, the frame after a ‘delay’ is always displayed early by nearly the same amount the previous frame was late. Any ideas on what may cause this pattern of results?

I don’t know what’s causing this and I suspect it’s machine-dependent (I’ve certainly got machines where it doesn’t occur). Potentially, for instance, a particular graphics card may do some housekeeping every few frame flips when some memory gets full, and this takes an extra few ms.

The recovery in the frame time in the following frame is exactly what you’d expect if the frame is not dropped but there was a delay reading the clock. The physical flips are occurring at a very regular interval, so if you mistime that flip on one occasion you’ll correct for that on the subsequent occasion.
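The late-then-early pattern falls out of simple arithmetic, as this small simulation sketch shows. It assumes perfectly regular physical flips and a timestamp that is read 4 ms late on one frame; `simulate_intervals` and its `late_frames` argument are hypothetical names for the illustration, and the 4 ms figure is just an example value.

```python
def simulate_intervals(n_flips, refresh_hz=60.0, late_frames=None):
    """Return measured frame intervals (ms) when the physical flips are
    perfectly regular but some timestamps are read late.

    late_frames maps a frame index to the delay (ms) in reading the clock.
    """
    period = 1000.0 / refresh_hz
    late_frames = late_frames or {}
    stamps = []
    for i in range(n_flips):
        true_time = i * period  # the flip itself happens on schedule
        stamps.append(true_time + late_frames.get(i, 0.0))
    # Measured interval = difference between consecutive timestamps.
    return [b - a for a, b in zip(stamps, stamps[1:])]


period = 1000.0 / 60.0
intervals = simulate_intervals(6, late_frames={3: 4.0})
print([round(dt, 2) for dt in intervals])
```

The interval ending at the late timestamp is one period plus 4 ms and the next is one period minus 4 ms, so the two together still span exactly two refresh periods and no frame is lost.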

That’s reassuring. I have another machine where the stereo implementation hits 85Hz without missing a frame over a considerable span of time. I was just wondering whether my implementation was at fault and whether anyone had insight. Marked as solved, thanks!