Voice Key RT issues

Hi Jon,

The recent PsychoPy update 2023.2.1 added a plug-in called psychopy-whisper for voice transcription using OpenAI’s Whisper tool (you can find it under Tools > plug-ins/package manager). This plug-in is promising, but isn’t working quite right for me yet.

I haven’t found much documentation for the plug-in because it is quite new, so I am not sure what the recommended set-up and expected behaviours are.

My Current Test Set-up

  • PsychoPy Version: 2023.2.1
  • Install psychopy-whisper through plug-ins/package manager
  • Update typing-extensions with pip install typing-extensions --upgrade (there is a terminal in the plugin/package manager window that you can use for this step)
  • In the microphone component, the following options are set:
    • transcribe audio
      • select Whisper as transcription backend
    • save speaking start/stop times
  • Testing Details:
    • Test environment:
      • open area with some background noise (not your ideal testing environment)
      • USB microphone
      • audio-jack microphone
    • Test words:
      • short words (e.g., dog, cat, snake)
      • long words (e.g., Supercalifragilisticexpialidocious)
    • Test speech style:
      • speaking as quickly as possible
      • delaying speech until near the end of the trial
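
To confirm that the install and upgrade steps above took effect, something like the following could be run from the PsychoPy Coder view. It is only a minimal sketch: installed_version is a hypothetical helper name, the package names are the ones from the steps above, and it assumes the Python bundled with PsychoPy provides importlib.metadata (Python 3.8 or newer).

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is not found.

    Hypothetical helper for illustration; not part of PsychoPy.
    """
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Package names taken from the set-up steps above.
for name in ("psychopy", "psychopy-whisper", "typing-extensions"):
    print(name, installed_version(name) or "not installed")
```

If psychopy-whisper shows as "not installed", the script is probably running in a different Python environment from the one the plug-in manager installed into.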

Observed Test Behaviour

  • Transcription: Transcription with psychopy-whisper appears to be working; the software does a reasonable job of identifying the spoken words.
  • Save Speaking Start Time: The column mic.speechStart is created, but the value is almost always 0.0, which does not seem correct. This could be because my microphone was picking up background noise.
  • Save Speaking Stop Time: The column mic.speechEnd is created and a time is recorded. I am not sure whether these times are relative to the start of the routine, the start of the mic component, or mic.speechStart; this will be important to figure out.

I am not confident that speechStart and speechEnd are recording times accurately. This could be an issue with the plug-in, but could also be due to the microphones I was using, or the background noise of my environment.

Example trials

mic.speechStart       mic.speechEnd
0.0                   0.38
0.0                   1.5
0.0                   2.12
0.0                   4.5
1.5800000000000005    3.96
2.2400000000000007    4.86
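
For checking a whole data file rather than eyeballing individual rows, a rough sketch like this one may help. It counts how many trials have a speechStart of exactly 0.0 and computes end minus start for each trial. The column names are the ones in my output; summarise_speech_times is a hypothetical helper, the values are the example trials listed above, and treating them as seconds is an assumption.

```python
import csv
import io

# Example trials copied from the values listed above (units assumed to be seconds).
EXAMPLE_CSV = """mic.speechStart,mic.speechEnd
0.0,0.38
0.0,1.5
0.0,2.12
0.0,4.5
1.5800000000000005,3.96
2.2400000000000007,4.86
"""


def summarise_speech_times(csv_file,
                           start_col="mic.speechStart",
                           end_col="mic.speechEnd"):
    """Return one dict per trial with start, end, duration and a suspect flag.

    Hypothetical helper for illustration; rows with an empty start or end
    cell are skipped.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        if not row.get(start_col) or not row.get(end_col):
            continue
        start = float(row[start_col])
        end = float(row[end_col])
        rows.append({
            "start": start,
            "end": end,
            "duration": round(end - start, 3),
            # An onset of exactly 0.0 suggests speech was detected immediately,
            # e.g. because background noise triggered the detector.
            "suspect_onset": start == 0.0,
        })
    return rows


summary = summarise_speech_times(io.StringIO(EXAMPLE_CSV))
n_suspect = sum(r["suspect_onset"] for r in summary)
print(f"{n_suspect} of {len(summary)} trials have a speechStart of exactly 0.0")
for r in summary:
    print(r)
```

To run it on a real data file, open the CSV with open(path, newline="") and pass the file handle in place of the io.StringIO object.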

Hope this helps. Let me know if you make any further progress.

-shabkr