
What would it take to replace pyo?

I’m just curious, because I’m not sure I would be able to find the time to do this in the near future and would have a lot to learn, but was hoping someone could quickly summarize what a sound code component needs to do?

Correct me if I’m wrong, but it seems like pygame for sound is becoming obsolete, and pyo brings nothing but headaches. If we were to make an in-house backend (maybe built with PyAudio?) to replace pyo or be an additional option, what are the requirements and priorities?


## Options beyond pyo
An alternative backend using PySoundCard has already been added, and it should have fewer “issues” than pyo because it’s much simpler. You can start using it already (set your backend preference to ‘PySoundCard’) but it isn’t heavily tested.

I’ve also since heard of another lib called SoundDevice, written by @matthias.geier, who was also involved in PySoundCard earlier on. I have a feeling this might be better than PySoundCard, but they’re very similar (sharing quite a bit of code). To use it you could take the existing psychopy code for PySoundCard and tweak it.

Both of the above libs are really easy to install with pip, and I think they’re both compatible with Python 3. Both are based on CFFI interfaces to portaudio.

## Improving performance

One thing that I haven’t explored yet is setting up the sound stream/callback at the beginning of the script and simply sending it zeros; then, when we call play(), we start sending it the proper sound values. (Currently we populate the sound with the real values and start the sound by starting the stream, but that apparently carries overhead.)
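To make the idea concrete, here’s a rough sketch of a source object that feeds zeros until play() is called and then streams the sound block-by-block. The class and method names are made up for illustration, and the callback signature is simplified from sounddevice’s (which also passes time and status):

```python
import numpy as np

class ZeroPaddedSource:
    """Feeds silence until play() is called, then streams the sound block-by-block."""

    def __init__(self, sound):
        self._sound = sound      # (nSamples, nChannels) float array
        self._pos = 0
        self._playing = False

    def play(self):
        self._pos = 0
        self._playing = True

    def callback(self, outdata, frames):
        # outdata has shape (frames, channels), as in a sounddevice callback
        if not self._playing:
            outdata.fill(0)              # stream keeps running; we just send zeros
            return
        chunk = self._sound[self._pos:self._pos + frames]
        outdata[:len(chunk)] = chunk
        outdata[len(chunk):].fill(0)     # pad the tail of the final block
        self._pos += frames
        if self._pos >= len(self._sound):
            self._playing = False        # back to silence; the stream stays open
```

The stream would be opened once at the start of the script with this callback attached, so play() only flips a flag rather than paying the cost of starting a stream.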

Ultimately, for performance, I wonder if we need to switch away from portaudio-based solutions and write platform-specific options. Portaudio is convenient, but ultimately it’s a wrapper around platform-specific libs like CoreAudio and WASAPI. But trying the PySoundCard or SoundDevice options with this different way to start() would be an easier fix than that, if it reduces latencies enough.

@daniel.e.shub might be interested in being involved in this endeavour too


Also, the sound module has grown huge and I think this is a place that would genuinely benefit from refactoring into a folder with several files, one for each backend. I’m sure @richard would enjoy that :wink:


Did I just hear someone say “dangerous, breaking, and potentially backward-incompatible changes”?? Sign me up already! No, but seriously, other things currently have higher priority for me :wink: And I don’t like sound cards, you know… (I’m all for real DAQ boards instead!) But as always, I’m happy to help as time permits! :slight_smile:


It would be nice if sound was easy, but I am not sure it will ever be. I think the first steps would be to understand use cases, required features, and then how to test that it works.

I would guess most people use sound with pre-made audio and video files. A small group might create numpy-type arrays with their sounds. Then there might be the oddball like me who would like to generate the sounds in real, or near real, time. While most people will not be too worried about latency, a sizable group are going to be concerned about “onset” latency (the time between asking for the sound to start and the sound starting). A portion of the people who want to make their sounds in real time are going to care about ongoing latency (how much sound needs to be buffered).

Catering to the group that want sound with their video is pretty easy. Handling the low latency real time crowd, who might want to use a DAQ instead of a sound card, is going to be harder.


Thank you guys for the info, not sure which to pick as an answer (or if I should pick an answer).

OK, I’ve made some progress here. I’ve now refactored our sound module into a folder with separate files for each backend to make things easier to navigate.

Then I’ve added a backend for sounddevice. Various plus points to using sounddevice:

  • pure python and py3/py2 compatible
  • easy install with pip
  • actively developed (and the developer @matthias.geier has joined psychopy discourse :slight_smile: )
  • @piotr had found good performance on it and my recent addition seems to agree, although I’m using a different method and a less high-power sound card to test

I’ve written this to use a different method for starting the sound. Our existing backends aim to preload the sound onto the stream and then do “start” on the stream when we press play, but that might be what’s causing bad timing: the time to start the stream is sometimes very slow.

@daniel.e.shub told me long ago and @matthias.geier said the same recently that we should instead open and start the stream immediately but pass it zeros in the callback. When the user says play() we start passing in the sound block-by-block using the callback. That’s what I’ve done here (and could be done with some of the other backends if we cared to try).

So this new backend will create a stream for sounds according to their format (sample rate, stereo etc.) and pass zeros until we need to play. So the stream doesn’t get changed, just the data, and that depends purely on sample size. I’m finding much better latencies than we had with the old method (using pyo): around 5 ms on a basic iMac machine.

  • If we play from a file then we read that file in gradually, but only a few samples at once
  • If we play a tone then we generate the values on each call to the callback (not precalculating the length of the tone, so infinite sounds are valid)
  • If we have multiple sounds with the same format then they can share a stream (they get added together by numpy during the callback)
  • We could also use this to make movies stream the audio rather than reading it and converting it to a massive numpy array as we currently do (this will need some work on the moviepy backend). Apparently big movies can cause memory errors at the moment.
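For the tone case, generating values on each callback just means carrying the phase across blocks so the joins are seamless and the tone can run forever. A minimal sketch (the class and names are illustrative, not the actual backend code):

```python
import numpy as np

class ToneSource:
    """Generates a sine tone one block at a time; no total duration needed."""

    def __init__(self, freq=440.0, sample_rate=44100):
        self.freq = freq
        self.sample_rate = sample_rate
        self._phase = 0.0  # carried across callbacks so blocks join seamlessly

    def next_block(self, frames):
        t = np.arange(frames) / self.sample_rate
        block = np.sin(2 * np.pi * self.freq * t + self._phase)
        # advance (and wrap) the phase ready for the next callback
        self._phase = (self._phase
                       + 2 * np.pi * self.freq * frames / self.sample_rate) % (2 * np.pi)
        return block
```

Sounds sharing a stream can then simply be summed sample-wise with numpy before writing to outdata, which is all the mixing amounts to.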

Not yet implemented:

  • loops - I need to tell it how to handle reaching the end of the sound and starting again
  • sound from array (right now it’s just tones or files)

Not yet tested (like, at all):

  • linux or win32 at all (but I expect them to be OK)
  • what happens when multiple streams are being opened. Is there a limit to how many concurrent sound formats can exist?
  • is garbage collection working (do we get memory leaks if a sound is created repeatedly)? This has bitten us in the past if the reference counting doesn’t fall to zero
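The garbage-collection worry can be checked cheaply with weakref: create sounds repeatedly, drop the strong references, and see whether any weak references still resolve. SoundStub here is a stand-in, not PsychoPy’s actual Sound class:

```python
import gc
import weakref

class SoundStub:
    """Stand-in for a sound object; a real test would create psychopy sounds."""
    def __init__(self):
        self.data = bytearray(1024)

def leaks_after_creation(n=100):
    refs = []
    for _ in range(n):
        snd = SoundStub()
        refs.append(weakref.ref(snd))
        del snd                      # drop our only strong reference
    gc.collect()                     # also collect any reference cycles
    # any weakref still resolving means something kept the sound alive
    return sum(1 for r in refs if r() is not None)
```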

Oh, and we can certainly add the option to select a particular audio device, as Dan requested, but I thought I would test the other aspects first.


It’s nice to see that you actually are able to use smaller latencies with this method!

I saw in your implementation that you are reading from a sound file within the callback. This might seem to work well if the OS buffers nicely, but at some point reading from the file might take more time than is available in the time frame of the callback. I think you should implement some kind of buffering to avoid underflows.

Speaking of which, I think it is really important to always check the status argument of the callback function which will tell you if an xrun occurred. This should never happen during an actual experiment, because it might result in an audible click!
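Checking for xruns might look like this. In a real sounddevice callback, status arrives as the fourth argument (a CallbackFlags object whose truthiness signals a problem); the helper below relies only on truthiness so the sketch runs standalone:

```python
import logging

def check_status(status, log=logging.getLogger("audio")):
    """Warn on xruns reported by the stream callback.

    In a sounddevice callback, `status` is the fourth argument; if it is
    truthy, something went wrong (e.g. an output underflow, which the
    listener may hear as a click).
    """
    if status:
        log.warning("stream callback reported a problem: %s", status)
        return True
    return False
```

The logged warning then shows up in the experiment log, where it should never appear during a real run.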

You mentioned “streams”; are you talking about PortAudio streams?
If yes, you should note that multiple streams are not officially supported by PortAudio.
They might work for some host APIs on some systems, but if you want your code to be platform-independent, you should limit yourself to a single stream.


Yes, fair enough. Buffering into memory from the file can be separate from the hardware sound buffering, I guess. In the past I’ve typically loaded an entire sound file into memory before playing and I’m keen not to do that (some people are now playing the sound tracks of long movie files, so we need some streaming). I’ll arrange for an intermediate step of filling a larger-but-not-enormous memory buffer for the sound.
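That intermediate buffer could be as simple as a deque of pre-read blocks: a feeder (on the main thread or a helper thread) keeps it topped up from disk, and the callback only ever pops, never touching the file. A sketch with illustrative names:

```python
from collections import deque

class BlockBuffer:
    """Decouples slow disk reads from the audio callback.

    A feeder calls fill() with blocks read from the sound file;
    the callback calls get_block() and must never block on I/O.
    """

    def __init__(self, max_blocks=64):
        self._blocks = deque()
        self.max_blocks = max_blocks

    def fill(self, read_block):
        # read_block() returns the next chunk from disk, or None at EOF
        while len(self._blocks) < self.max_blocks:
            block = read_block()
            if block is None:
                break
            self._blocks.append(block)

    def get_block(self):
        # called from the audio callback; returns None on underflow
        return self._blocks.popleft() if self._blocks else None
```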

Yes, will do that in the full implementation. So far I’ve just been listening for underruns (and checking oscilloscope traces) and on my mac it’s been fine but I will check for them in code too and log warnings.

Ah, that’s a shame! :frowning:. I was wondering what the limits are but was definitely hoping for limit>1! The issue is that users don’t know to code all sounds to one frequency/channel spec. I guess we’ll just have to initialize the stream according to the first sound that played (with an option for users to force a particular format in advance). The annoying thing is that we then have to check/recode subsequent sounds manually (a 96khz sound played after a 44.1khz sound). I was prematurely excited thinking we could do away with that.

Thanks for your help though. I still think this is a good advance for us.

Hey Jon et al., I’ve been having reliability issues with pyo (random seg faults when recording multiple files using the mic), and so was looking into the sounddevice as a replacement and was happy to see it implemented as part of Psychopy 1.85. To get it to work I had to manually set
sound.audioDriver = 'portaudio'
That works for me (Mac OS 10.11.6 with the 64 bit Anaconda install).

Are there plans to implement the sounddevice backend for the microphone? Looks like it still uses pyo (and that’s what was giving me the problems in the first place).

Matthias: Is the only way to have recordings start and stop on command to use an InputStream with a callback? Or is there another way to terminate the recording before the time is up?



I don’t have time to work on the sounddevice replacement for mic I’m afraid. Would love to see it happen though.

Well, generally there is the option of using a so-called “blocking” stream (by not providing a callback function), but I don’t think this has any advantages over using a callback function. One of the disadvantages is that it is not available for all host APIs (which IMHO is already a deal-breaker).

When using a callback function, there are still two ways to stop some sound from being played:

  1. set some state in the callback so that it (the callback) knows it should stop playing
  2. stop the whole stream

I think the first option (as @jon has implemented it) makes more sense.
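The first option can be sketched as a flag that the main thread sets and the callback consults; the stream itself keeps running and just outputs silence. The names here are illustrative, and the callback is reduced to the bits that matter:

```python
import threading
import numpy as np

class PlaybackState:
    """Option 1: the callback checks a flag instead of the stream being stopped."""

    def __init__(self):
        self._stop = threading.Event()

    def stop(self):
        self._stop.set()  # main thread asks; the callback obeys on its next run

    def callback_fill(self, outdata, get_block):
        # outdata: (frames, channels) array, as in a sounddevice callback
        if self._stop.is_set():
            outdata.fill(0)   # keep the stream alive, just output silence
        else:
            outdata[:] = get_block(len(outdata))
```

Because the stream never stops, the next sound can start with only the latency of one callback period rather than the cost of starting a stream.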

I would use a single stream for everything, both playing and recording, and keep that same stream active all the time.

BTW, I mentioned in Low sound latency with sounddevice module & wdm-ks driver that it would be interesting to try to implement the audio callback in C for more stable, predictable and responsive performance.
In the meantime, I’ve started to do exactly that. It’s still very much a work in progress, but it should already support most of the use cases needed for PsychoPy. I would love to hear what you think about it!

Hello everyone,

I just realized that I have about 250 ms lag between sending out a sound using pyo and when it actually arrives in the participant’s ears. This lag is deducted from the participants’ auditory N100 response in the EEG. I’m using Standalone PsychoPy 1.84.2, pyo 0.8.0, on Windows 10.

From this thread I understand that PySoundCard should already be available. As per Jon’s comment I tried:

from psychopy import prefs, sound  
prefs.general['audioLib'] = ['PySoundCard']

but the output remains:

pyo version 0.8.0 (uses single precision)

I also think I understand that SoundDevice is available in PsychoPy 1.85, which has not officially been released yet. If I wanted to try the pre-release version nonetheless, how would I download and install it?

Thank you all for your help!

Not sure why the lag is as bad as that. Normally the lag is around 30ms for pyo. Are you trying to load and play the sound immediately?

Two issues for using pysoundcard:
you need to specify it before importing sound and you need to specify it with no capitals:

from psychopy import prefs
prefs.general['audioLib'] = ['pysoundcard']
from psychopy import sound

For trying 1.85.0 you can download it now (from the releases page) and give it a go, but a few issues have already been found in the sound libs and those need fixing. I.e. it was already released but not announced, because I don’t want everyone flocking to it while it’s in a buggy state.

Thanks for your input @matthias.geier

Looking forward to hearing how the python-rtmixer project gets on, especially if it means that performance is better in terms of cycles to the callback. To reiterate the key thing that our users will care about: they want to know that when they issue play() the sound starts coming out of the speakers as fast as possible. Hopefully rtmixer will keep these as high priority :slight_smile:

A running-always approach is good. The problem with single-stream approach is knowing in advance what the user wants and whether all their sounds will be a matching sound format. What I’ve implemented using sounddevice so far checks the number of channels and sample rate and:

  • If a stream exists with a matching spec then this is used (with mixing of the sounds)
  • If no stream has been created then one is created/stored and used
  • If a stream already exists with a different rate/channel spec then
    • on Mac we just create a new one and use multiple streams (seems to work fine)
    • on Windows we issue an error and stop
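That matching logic is essentially a dict keyed by (sample rate, channels). A sketch, where the open_stream callable and the Windows check are illustrative rather than the actual backend code:

```python
import sys

class StreamRegistry:
    """Reuses streams whose format (sample rate, channels) matches a sound's."""

    def __init__(self):
        self._streams = {}

    def get_stream(self, sample_rate, channels, open_stream):
        key = (sample_rate, channels)
        if key in self._streams:           # matching spec: share it and mix
            return self._streams[key]
        if self._streams and sys.platform == "win32":
            # a second, different format on Windows: multiple streams unreliable
            raise RuntimeError(
                "cannot open a second sound format (%s Hz, %d ch) on Windows"
                % key)
        stream = open_stream(sample_rate, channels)  # e.g. an OutputStream
        self._streams[key] = stream
        return stream
```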

Some transforms at runtime are easy enough to handle. If the user starts with a stereo stream but provides a mono sound, we can just paste that to both channels and all is fine, but the opposite might be a problem, and changing sample rate at runtime might suffer in both speed and quality.
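The easy direction (a mono sound into a multi-channel stream) really is just duplication with numpy; anything harder is deliberately left to the caller in this sketch:

```python
import numpy as np

def to_channels(block, out_channels):
    """Adapt a (frames, channels) block to a stream's channel count.

    Mono -> N channels: duplicate (cheap and lossless).
    Downmixing or resampling is harder and is not attempted here.
    """
    frames, in_channels = block.shape
    if in_channels == out_channels:
        return block
    if in_channels == 1:
        return np.repeat(block, out_channels, axis=1)  # paste to every channel
    raise ValueError("no automatic conversion from %d to %d channels"
                     % (in_channels, out_channels))
```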

I wonder what solutions python-rtmixer will have

Any improvement in performance is for now purely hypothetical. I currently don’t have the means to measure latency. Nevertheless, low latency and predictable performance are both high-priority goals.

The main idea of the rtmixer module is that its callback function is implemented in C; it never touches Python’s GIL and is not affected by the garbage collector. Therefore it should theoretically be possible to reduce the block size further than is possible with a callback function written in Python. Again, I don’t have measurements to back that up.

On the Python side, there is still some code that has to be executed in the interpreter, and some memory has to be allocated for a so-called “action struct”. But once that “action struct” is placed in the “action queue”, the Python interpreter is not involved anymore.

When starting a stream in rtmixer, the sample rate, the number of channels and everything else has to be known and cannot change during the runtime of the stream.
However, the given number of channels is an upper bound, and on each individual “play” or “record” action, the affected channels can be specified independently. For example, there is currently a function with this signature:

Mixer.play_buffer(self, buffer, channels, start=0, allow_belated=True) -> action

The channels argument allows you to specify a list of arbitrary channel numbers on which the channels of buffer will be played back. The same option is available for recording.
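The routing that the channels argument describes can be pictured in plain numpy, independent of rtmixer itself. This is only an illustration of the idea (indices are 0-based here for simplicity; rtmixer’s own convention may differ, so check its docs):

```python
import numpy as np

def route_buffer(buffer, channels, total_channels):
    """Place each column of `buffer` on the given output channels.

    Mimics the idea behind a `channels` argument: the buffer's columns
    are scattered onto arbitrary channels of a wider output block.
    """
    frames, n = buffer.shape
    assert len(channels) == n, "one output channel per buffer column"
    out = np.zeros((frames, total_channels))
    for col, ch in enumerate(channels):
        out[:, ch] = buffer[:, col]
    return out
```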

The default setting start=0 means that the sound is played back as soon as possible, but if reduced jitter is desired, one can use a given start time in the future (using Mixer.time as reference), and the sound will be played at exactly that time. If the chosen time is not far enough in the future, the option allow_belated decides if the sound will be played back even if the time has actually already passed.

I think it’s not a problem to force the user to decide a priori on a maximum number of channels that should be used, but I’m not sure about different sample rates.
However, I consider any sample rate conversions out of scope for rtmixer (as stated here). This is something that could be done separately on your side.

Duplicating a mono signal to multiple channels is currently not supported by the channel argument, but I can probably add that feature. I think this would make sense. Currently, the audio data would have to be duplicated manually.

BTW, you should be careful when opening a mono stream, since some host APIs duplicate that stream on their own to create a stereo signal, others (I think JACK and ASIO) don’t. For that reason, it’s probably best to choose stereo as a default.

The rtmixer module is quite new and not really tested at all, but on the other hand, since it is that new, the API can still be shaped to fit better for the use in PsychoPy (while still staying a more general-purpose library).

If rtmixer isn’t enough for your needs, you could still use it as an example for how to implement your own very specialized C callback function for PsychoPy. After all, I initially intended it merely as a code example …