Hi Jon,
The recent PsychoPy update 2023.2.1 added a plugin called psychopy-whisper for voice transcription using OpenAI’s Whisper tool (you can find it under Tools > plug-ins/package manager). This plug-in is promising, but it isn’t working quite right for me yet.
I haven’t found much documentation for the plug-in because it is quite new, so I am not sure what the recommended set-up and expected behaviours are.
My Current Test Set-up
- PsychoPy Version: 2023.2.1
- Install psychopy-whisper through plug-ins/package manager
- Update typing-extensions with `pip install typing-extensions --upgrade` (there is a terminal in the plugin/package manager window that you can use for this step)
  - This was the solution to some errors I was getting during my first test attempts
- See also: psychopy-whisper setup discussion
- In the microphone component, the following options are set:
  - transcribe audio
  - select Whisper as transcription backend
  - save speaking start/stop times
- Testing Details:
  - Test environment:
    - open area with some background noise (not your ideal testing environment)
    - USB microphone
    - audio-jack microphone
  - Test words:
    - short words (e.g., dog, cat, snake)
    - long words (e.g., Supercalifragilisticexpialidocious)
  - Test speech style:
    - speaking as quickly as possible
    - delaying speech until near the end of the trial
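If you want to confirm that the typing-extensions upgrade from the set-up steps actually took effect, a small check like the one below might help. It is only a sketch: it assumes you run it inside PsychoPy's own Python environment (e.g. from Coder view), and that the distribution names match what pip reports. The helper name `installed_version` is just something I made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names are assumed to match the names used with pip.
for name in ("typing-extensions", "psychopy-whisper"):
    print(name, "->", installed_version(name) or "not installed")
```

If either line prints "not installed", the package probably went into a different Python environment than the one PsychoPy is using.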
Observed Test Behaviour
- Transcription: Transcription with psychopy-whisper appears to be working; the software does a reasonable job of identifying the spoken words.
- Save Speaking Start Time: The column `mic.speechStart` is created, but the value is almost always 0.0, which does not seem correct. This could be because my microphone was picking up background noise.
- Save Speaking Stop Time: The column `mic.speechEnd` is created and a time is recorded. I am not sure whether these times are relative to the routine, the mic component, or `mic.speechStart`, which will be important to figure out.
I am not confident that `speechStart` and `speechEnd` are recording times accurately. This could be an issue with the plug-in, but could also be due to the microphones I was using, or the background noise of my environment.
Example trials
mic.speechStart times | mic.speechEnd times |
---|---|
0.0 | 0.38 |
0.0 | 1.5 |
0.0 | 2.12 |
0.0 | 4.5 |
1.5800000000000005 | 3.96 |
2.2400000000000007 | 4.86 |
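For anyone wanting to run the same sanity check on their own data, here is a rough sketch of how I would summarise these values. The numbers are copied from the table above; the units are assumed to be seconds, and the end-minus-start difference is only meaningful if both columns share the same reference clock, which I have not confirmed. The function name `summarise` is hypothetical.

```python
# Values copied from the example trials table (units assumed to be seconds).
trials = [
    (0.0, 0.38),
    (0.0, 1.5),
    (0.0, 2.12),
    (0.0, 4.5),
    (1.5800000000000005, 3.96),
    (2.2400000000000007, 4.86),
]


def summarise(trials):
    """Count trials with a 0.0 onset and compute end - start for each trial."""
    zero_onsets = sum(1 for start, _ in trials if start == 0.0)
    # Rounded to 2 decimals to hide floating-point noise in the recorded times.
    durations = [round(end - start, 2) for start, end in trials]
    return zero_onsets, durations


zero_onsets, durations = summarise(trials)
print(f"{zero_onsets} of {len(trials)} trials have speechStart == 0.0")
print("end - start per trial:", durations)
```

A high share of 0.0 onsets, as in my data, would suggest the onset detection is triggering immediately (possibly on background noise) rather than on actual speech.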
Hope this helps you, let me know if you end up making further progress.
-shabkr