Reaction Times impacted by improper use of time.sleep(). Can they be salvaged?

Summary: I used time.sleep(3) to sustain a visual stimulus while keyboard RTs were supposedly being recorded. However, they weren’t: there is a “dead zone” in the data around that sleep time, and some very, very long RTs are reported post-sleep. I am hopeful that these erroneously long RTs can be “repaired” with a simple subtraction of 3 seconds, but I don’t know enough about how these times were logged to know whether that’s a reasonable assumption.
This is on PsychoPy version 2022.1.4 on macOS Monterey version 12.4.

This issue is similar to this post from 5 years ago.

Full disclosure: this code is very rough, as this experiment was a student’s first learning experience. However, data from 70+ participants have been collected, so any amount that can reasonably be salvaged would be marvelous.

The project is an audio-tactile dual-task paradigm, meaning that on each trial, the participant simultaneously A) listens to an audio file, and B) feels a small motor vibrate in their hand, and responds as quickly as possible on the keyboard. There is also a fixation cross on screen that alternates between white and green. Critically, the fixation cross is white for the duration of the audio file plus 3 seconds, then turns green. This 3-second “delay” was implemented using time.sleep(3), and I believe this is the cause of my very strange RT distribution, which I have attached below.

(The left panel (ST) is from a simply-coded “practice” where the participant ONLY performs the touch task, and the fixation cross is not manipulated during the trial. Note the reasonable distribution of the RTs. The right panel (DT) is from the rougher-coded dual-task experiment, where the code is managing the audio, vibration, and fixation cross. Note the bimodal distribution of the RTs. There was no experimental manipulation that would create this distribution; I am 100% sure it is a coding mishap, and 95% sure it’s due to the fixation cross delay.)

As far as I’ve sussed out, I believe that any RTs occurring after the audio file has ended are being sucked into limbo during time.sleep(3) and spat out afterward with inaccurate values. As such, my question is this: for those abnormally long RTs, could they be “repaired” by subtracting 3 seconds? Or is this post-sleep processing too variable and unreliable to repair that upper cloud of data?

Here’s the full code for running the routine. The hypothesized offending portion is the text_fixRepeat block (the one containing time.sleep(3.0)), but I’ve included it all since it’s admittedly very janky and that might be relevant.

# -------Run Routine "DT"-------
    while continueRoutine:
        # get current time
        t = DTClock.getTime()
        tThisFlip = win.getFutureFlipTime(clock=DTClock)
        tThisFlipGlobal = win.getFutureFlipTime(clock=None)
        frameN = frameN + 1  # number of completed frames (so 0 is the first frame)
        # update/draw components on each frame
            
        # *text_fix* updates
        if text_fix.status == NOT_STARTED and tThisFlip >= 0.0-frameTolerance:
            # keep track of start time/frame for later
            text_fix.frameNStart = frameN  # exact frame index
            text_fix.tStart = t  # local t and not account for scr refresh
            text_fix.tStartRefresh = tThisFlipGlobal  # on global time
            win.timeOnFlip(text_fix, 'tStartRefresh')  # time at next scr refresh
            text_fix.setAutoDraw(True)
            win.flip()
        
        # start/stop sound_LP
        if sound_LP.status == NOT_STARTED and tThisFlip >= 0.0-frameTolerance:
            # keep track of start time/frame for later
            sound_LP.frameNStart = frameN  # exact frame index
            sound_LP.tStart = t  # local t and not account for scr refresh
            sound_LP.tStartRefresh = tThisFlipGlobal  # on global time
            sound_LP.play(when=win)  # sync with win flip
            win.flip()
            
            time.sleep(start_interval)
            motoron()
            time.sleep(duration)
            motoroff()
            
        
        # *key_resp_DT* updates
        waitOnFlip = False
        if key_resp_DT.status == NOT_STARTED and tThisFlip >= 0.0-frameTolerance:
            # keep track of start time/frame for later
            key_resp_DT.frameNStart = frameN  # exact frame index
            key_resp_DT.tStart = t  # local t and not account for scr refresh
            key_resp_DT.tStartRefresh = tThisFlipGlobal  # on global time
            win.timeOnFlip(key_resp_DT, 'tStartRefresh')  # time at next scr refresh
            key_resp_DT.status = STARTED
            # keyboard checking is just starting
            waitOnFlip = True
            win.callOnFlip(key_resp_DT.clock.reset)  # t=0 on next screen flip
            win.callOnFlip(key_resp_DT.clearEvents, eventType='keyboard')  # clear events on next screen flip
        if key_resp_DT.status == STARTED and not waitOnFlip:
            theseKeys = key_resp_DT.getKeys(keyList=["j", "k", "l"], waitRelease=False)
            _key_resp_DT_allKeys.extend(theseKeys)
            if len(_key_resp_DT_allKeys):
                key_resp_DT.keys = _key_resp_DT_allKeys[-1].name  # just the last key pressed
                key_resp_DT.rt = _key_resp_DT_allKeys[-1].rt
                # was this correct?
                if (key_resp_DT.keys == str(correct_key)) or (key_resp_DT.keys == correct_key):
                    key_resp_DT.corr = 1
                else:
                    key_resp_DT.corr = 0
        
        # *text_fixRepeat* updates (hypothesized offending portion)
        if text_fixRepeat.status == NOT_STARTED and t>= (sound_LP.getDuration() + sound_LP.tStart):
            # keep track of start time/frame for later
            text_fixRepeat.frameNStart = frameN  # exact frame index
            text_fixRepeat.tStart = t  # local t and not account for scr refresh
            text_fixRepeat.tStartRefresh = tThisFlipGlobal  # on global time
            win.timeOnFlip(text_fixRepeat, 'tStartRefresh')  # time at next scr refresh
            time.sleep(3.0) #this is the pause after the sentence before the green _fixRepeat appears
            text_fixRepeat.setAutoDraw(True)
        
        # *key_resp_ADV* updates, advancing via spacebar to next trial
        waitOnFlip = False
        if key_resp_ADV.status == NOT_STARTED and tThisFlip >= 0.0-frameTolerance:
            # keep track of start time/frame for later
            key_resp_ADV.frameNStart = frameN  # exact frame index
            key_resp_ADV.tStart = t  # local t and not account for scr refresh
            key_resp_ADV.tStartRefresh = tThisFlipGlobal  # on global time
            win.timeOnFlip(key_resp_ADV, 'tStartRefresh')  # time at next scr refresh
            key_resp_ADV.status = STARTED
            # keyboard checking is just starting
            waitOnFlip = True
            win.callOnFlip(key_resp_ADV.clock.reset)  # t=0 on next screen flip
            win.callOnFlip(key_resp_ADV.clearEvents, eventType='keyboard')  # clear events on next screen flip
        if key_resp_ADV.status == STARTED and not waitOnFlip:
            theseKeys = key_resp_ADV.getKeys(keyList=["space"], waitRelease=False)
            _key_resp_ADV_allKeys.extend(theseKeys)
            if len(_key_resp_ADV_allKeys):
                key_resp_ADV.keys = _key_resp_ADV_allKeys[-1].name  # just the last key pressed
                key_resp_ADV.rt = _key_resp_ADV_allKeys[-1].rt
                # a response ends the routine
                time.sleep(0.5) #pauses a quick bit after the space press
                text_fixPause.setAutoDraw(True)
                win.flip()
                time.sleep(1.5) #pauses on the white fix before the sentence starts
                continueRoutine = False
        
        
        # check for quit (typically the Esc key)
        if endExpNow or defaultKeyboard.getKeys(keyList=["escape"]):
            core.quit()
        
        # check if all components have finished
        if not continueRoutine:  # a component has requested a forced-end of Routine
            break
        continueRoutine = False  # will revert to True if at least one component still running
        for thisComponent in DTComponents:
            if hasattr(thisComponent, "status") and thisComponent.status != FINISHED:
                continueRoutine = True
                break  # at least one component has not yet finished
        
        # refresh the screen
        if continueRoutine:  # don't flip if this routine is over or we'll get a blank screen
            win.flip()
  • time.sleep() should never be used in PsychoPy scripts. Standard PsychoPy practice is to maintain an active drawing and event loop (i.e. drawing to the screen and checking for responses) on every screen refresh, regardless of whether the visual stimuli are apparently static. And time.sleep() is a Python-generic function, which isn’t really suitable for real-time experiments. If you ever do need such functionality, PsychoPy provides the core.wait() function, which tries to stop the rest of the OS interfering with the requested time period. But regardless, neither should be used in a Builder-generated script like this. The entire script is predicated on that drawing/event loop running on every screen refresh (typically at 60 Hz), and you’ve just put that loop into suspended animation for 3 s, breaking its inherent logic.
  • You also seem to be calling win.flip() manually. Builder scripts are predicated on this being called just once per screen refresh cycle. This again is breaking the inherent control flow of the code. If you want to use pauses and have control of the window flipping, you really need to write your script from scratch, using your own logic, rather than trying to shoe-horn this approach into a script generated by Builder - it can break things in all sorts of ways.
  • Builder .psyexp files provide graphical code components that allow for inserting code that runs at appropriate times (e.g. at the start of a routine, on every screen refresh, at the end of a routine, and so on). One should avoid ever editing the generated .py file (which I suspect is what has happened here), precisely to avoid this sort of situation.
  • If you check the code, you’ll see that the keyboard component checking code is based on the value of t. t is a value that is supposed to be updated on every screen refresh, but you’ve stopped that happening by inserting long pauses before this check even runs. So t will be out by 3 s or more at times, because its 60 Hz update cycle has been broken.
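For what it’s worth, the non-blocking pattern that the Builder loop expects can be sketched in plain Python (runnable without PsychoPy; the function name, durations, and flag here are all invented for illustration). Instead of sleeping for 3 s, you check elapsed time on every pass through the loop, so drawing and keyboard polling never stop:

```python
import time

def run_frame_loop(delay=0.2, frame_time=0.02):
    """Non-blocking delay sketch: rather than time.sleep(delay), keep
    looping every 'frame' and set a flag once the delay has elapsed.
    In a PsychoPy script this loop would also call win.flip() and
    kbd.getKeys() on every iteration, so nothing is ever suspended."""
    clock_start = time.monotonic()
    green_fix_shown = False
    frames = 0
    while time.monotonic() - clock_start < delay * 2:
        t = time.monotonic() - clock_start
        if not green_fix_shown and t >= delay:
            green_fix_shown = True  # e.g. text_fixRepeat.setAutoDraw(True)
        # ... draw stimuli and poll the keyboard here, every frame ...
        frames += 1
        time.sleep(frame_time)  # stand-in for win.flip()'s vsync wait
    return green_fix_shown, frames

shown, frames = run_frame_loop()
```

In a real Builder code component this elapsed-time check would go in the “Each Frame” tab, with the routine clock supplying t and win.flip() providing the per-frame wait.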

Is it possible that this could be corrected? Maybe, but that would require reading your entire script and trying to tease out the intended logic and how it has been disrupted.

Hopefully someone here might volunteer to do that, but it won’t be trivial. You should post the entire .py file, and ideally any associated .psyexp file that it might have been based on.

Lessons for the future:

  • Never use win.flip(), time.sleep(), or core.wait() in any Builder-generated script unless you really know what you are doing (and realistically, not even then).
  • ALWAYS test a script end-to-end before running an experiment, i.e. use it to gather pilot data and then, crucially, analyse that data to check that it is performing properly. Development of the analysis pipeline should happen in concert with the task development, not after the data has been collected.

Howdy Michael,

In this comment I have attached the entire .py file in case anyone likes problem-solving and being a detective. I can appreciate the nontrivial nature of that task, so unless/until that happens I will exclude the excrescently long RTs.

Thank you for the response, truly. I will continue to read up on the things you’ve mentioned here. Duly noted that the Builder → Coder approach was too extensively applied here, especially without sufficient background knowledge.

I will note that we did conduct and analyze a pilot; there simply weren’t enough data points to notice the pattern until this latest interim analysis. Data collection has been paused until this is sorted, and any future projects would be subjected to much more extensive piloting, with a stronger focus on inspecting the data exploratorily - it was in fact the analysis pipeline itself that hid the issue for so long. C’est la vie. Lesson learned.

Thank you again for the input and thanks in advance if anyone else ever hops in this thread.
VTDT_ppf.py (38.4 KB)

I took a quick look. I think there are two issues at play:

  1. When the response window opens (this is true of both ST and DT trials)
  2. Whether the response occurs before or after the end of sound playback.

However, whether the data are recoverable depends on some parameters that I don’t have enough information about.

The way the code is structured, the “time 0” for the keyboard RTs is going to be delayed by the length of start_interval and duration because of these lines (a similar wait period is applied to text_fix in ST trials, also before the keyboard response is initialized):

            if text_fix.status == NOT_STARTED and tThisFlip >= 0.0-frameTolerance:
                # keep track of start time/frame for later
                text_fix.frameNStart = frameN  # exact frame index
                text_fix.tStart = t  # local t and not account for scr refresh
                text_fix.tStartRefresh = tThisFlipGlobal  # on global time
                win.timeOnFlip(text_fix, 'tStartRefresh')  # time at next scr refresh
                text_fix.setAutoDraw(True)
                win.flip()
            
                time.sleep(start_interval)
                motoron()
                time.sleep(duration)
                motoroff()

This all happens before the key_resp object is initialized at all, so the time the RT is being measured from is not actually the start of the trial, but rather after this period of time. So that might be a problem for the accuracy of the RTs across the board.

The second issue is more complicated. There’s a notable feature in the data you are showing: that cluster of responses between 0 and 1.5 seconds. Making sense of it depends on the values of start_interval, duration, and the length of the audio file in sound_LP, which I don’t know.

Here’s the part I’m concerned about: The 3 second delay seems like it’s being tripped by the code you highlighted, which depends on this condition:
t>= (sound_LP.getDuration() + sound_LP.tStart)

However, sound_LP.tStart is set before the first two time.sleep calls I mentioned, the ones with start_interval and duration, and I don’t know how those values are related to the length of the audio file in sound_LP. That leaves us with a few different scenarios.

If start_interval + duration >= sound_LP.getDuration() on every trial, then every response should be delayed by 3 seconds, unless it occurs on the exact frame that the audio ends, in which case the RT would be effectively 0. There aren’t any of the latter, but this is one way to get a 3 second delay.

If start_interval + duration <= sound_LP.getDuration() on every trial, the data don’t make sense. If the difference is around 1.5 seconds, then that could explain the responses between 0 and 1.5 seconds in the DT trials: those are responses that occurred during that window. However, if the 3-second delay kicks in after that, I would expect there to be a 3-second gap, not a bunch of responses picking up at 3 seconds; essentially, I would expect the upper cluster to start at 4.5 seconds.

However, if the value of sound_LP.getDuration() - (start_interval + duration) varies from trial to trial, then it could explain the pattern we see: on trials where there is a gap between the two, some responses fall in that gap (the 0-1.5 s RTs), and when there isn’t, there’s a 3-second delay on everything. But that would also mean that the RT errors are completely variable from trial to trial, based on the trial parameters themselves.
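To make those scenarios concrete, here is a toy model of the hypothesis (all numbers invented; this is not the actual logging code): responses that land in the gap before the audio ends are logged normally, while later ones are, by hypothesis, inflated by the 3 s sleep, which is enough to produce a bimodal distribution when the gap varies across trials:

```python
# Toy model of the hypothesized RT corruption (illustrative only).
SLEEP = 3.0  # the time.sleep(3.0) in the routine

def recorded_rt(true_rt, gap_after_motor):
    """gap_after_motor: time from motor-off (the RT time-zero) until
    the audio file ends. By hypothesis, responses inside that gap are
    logged accurately; later ones get the sleep added on."""
    if true_rt < gap_after_motor:
        return true_rt
    return true_rt + SLEEP

# A trial with a long audio tail logs the fast RT accurately ...
fast = recorded_rt(0.8, gap_after_motor=1.5)
# ... while a trial whose audio ends quickly shifts that same response.
slow = recorded_rt(0.8, gap_after_motor=0.2)
```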

That said, one thing I honestly do not know is whether the key buffer even records things that happen during time.sleep(). In other words, the RTs greater than 3 seconds could mean there was a window where participant responses were not recorded at all, and those are just responses to the appearance of the fixation. I bring this up because, again, there’s that block of responses between 0 and 1.5 s, and if those are responses before the fixation appears but while the sound is still playing, then I don’t know what those mean.

What a thorough response, thank you so much for taking a look at this.

It seems I’ve left out some very helpful information. I think these clarifications will help immensely.

so the time the RT is being measured from is not actually the start of the trial, but rather after this period of time.

That was intentional. The collected RTs are meant to be the time between the motor turning off and the keyboard being pressed, so that is why the response window opens immediately following the cessation of the motor. As far as I can tell, the response window timing is accurate.

However, if the value of sound_LP.getDuration() - (start_interval + duration) varies from trial to trial, then it could explain the pattern we see: on trials where there is a gap between the two, some responses fall in that gap (the 0-1.5 s RTs), and when there isn’t, there’s a 3-second delay on everything. But that would also mean that the RT errors are completely variable from trial to trial, based on the trial parameters themselves.

This is the case. The start_interval + duration (the “motor” task) will always be complete before the audio file ends, but the lengths of both of these tasks differ across trials.

OK, then in terms of recovering data here’s what I think the situation is.

First, you need to figure out what happens with time.sleep and keyboard responses. This is something you can test in a ‘toy’ experiment: a single trial with a keyboard response, where you start the keyboard response, then insert a time.sleep, and see what happens when you press the key during the sleep period. You should try a couple of permutations, like pressing and releasing before the sleep period ends, pressing but releasing after the sleep period ends, and pressing twice. I can predict one of two things:

  1. It won’t record keypresses that occur during the sleep period at all
  2. It will record them but add the sleep period to the recorded RT.
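Pending that real PsychoPy test, here is a pure-Python sketch (a background thread standing in for a hardware keyboard buffer) of how an event buffer can behave across a sleep: the timestamp is taken when the event occurs, but retrieval only happens afterward. Whether PsychoPy’s keyboard backend preserves the true press time like this, stamps events at retrieval (outcome 2), or drops them entirely (outcome 1) is exactly what the toy experiment would establish:

```python
import threading
import time

# A buffer that records an event's timestamp at the moment it happens,
# even though the main thread is asleep: the event keeps its true time
# but can only be *retrieved* once the sleep ends.
buffer = []
t0 = time.monotonic()

def fake_keypress(delay):
    time.sleep(delay)
    buffer.append(time.monotonic() - t0)  # timestamped at "press" time

presser = threading.Thread(target=fake_keypress, args=(0.1,))
presser.start()
time.sleep(0.5)  # the main loop is blocked, like time.sleep(3)
presser.join()
retrieved_at = time.monotonic() - t0  # when we finally get to read it
rt = buffer[0]
```

Here rt stays near 0.1 s even though it could only be read about 0.5 s later.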

If (1), then you’re basically stuck, because you don’t know whether those RTs represent their actual response or a second response they made because their actual response was ignored. If (2), then you have RTs you might be able to use, but there’s still a major problem.

In short, you won’t be able to distinguish RTs in DT trials that were genuinely longer than 3 seconds (of which there are a handful even in the ST trials) from those that have been delayed by 3 seconds. That’s likely not a huge proportion of your data, but because you truly have no way to distinguish the two, it means that even applying a 3-second correction to all RTs above 3 seconds won’t give you truly accurate data, because there will be a certain number of those actual slow responses mixed in.

So I’m sorry to say that there’s almost certainly no way to recover most of the data from the DT trials in this experiment. The RTs below 3 seconds in the DT trials are probably accurate but only represent a subset of trials where the sound duration is long. You can probably make some sense of those trials, if that sub-group is still interpretable. RTs above 3 seconds can’t be distinguished between delayed and genuinely slow, and even though genuinely slow responses were likely a minority, you have no way to distinguish them.
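As a concrete illustration of why the blanket correction fails (numbers invented), subtracting 3 s from everything above a 3-second threshold recovers the delayed-but-fast responses but silently corrupts the genuinely slow ones, and nothing in the logged value distinguishes the two cases:

```python
# Why "subtract 3 s from anything over 3 s" miscorrects real slow RTs.
SLEEP = 3.0

def naive_correction(logged_rt, threshold=3.0):
    """Hypothetical repair rule: assume any logged RT above the
    threshold was inflated by the sleep and subtract it back out."""
    return logged_rt - SLEEP if logged_rt > threshold else logged_rt

delayed_fast = 0.8 + SLEEP   # true RT 0.8 s, inflated by the sleep
genuinely_slow = 3.4         # a real, slow response

corrected_a = naive_correction(delayed_fast)    # recovers 0.8 s (good)
corrected_b = naive_correction(genuinely_slow)  # becomes 0.4 s (wrong)
```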

Sorry to give you such a pessimistic outcome. I think we’ve all had a learning experience like this at one point or another; it’s the kind of mistake you never make twice.