Timing issue with PsychoJS

I have run a visual masked priming experiment on Pavlovia over the last couple of days. The script also includes a way to store the actual presentation time of each routine (i.e., forward mask and prime word), as discussed in this thread (a sketch of the logging code follows the list below). I basically need to be sure that:

  1. the forward mask (########) lasts 500 ms, and
  2. the prime word lasts 33 ms.
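For context, the duration logging looks roughly like this (a minimal PsychoJS sketch; the clock, component, and column names are my own, hypothetical choices, not code from the Builder output):

```javascript
// Minimal sketch of the per-routine duration logging (PsychoJS).
// 'routineClock', 'primeStim' and the column name are hypothetical.
const routineClock = new util.Clock();

function primeRoutineBegin() {
  routineClock.reset();            // t = 0 at (roughly) the routine's first flip
  primeStim.setAutoDraw(true);     // start drawing the prime
  return Scheduler.Event.NEXT;
}

function primeRoutineEnd() {
  // Elapsed time since reset() = measured duration of the routine
  psychoJS.experiment.addData('primeDuration', routineClock.getTime());
  primeStim.setAutoDraw(false);
  return Scheduler.Event.NEXT;
}
```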

When coding the script, I figured it would be better to set the routine durations in seconds (rather than in frames), since subjects may be using monitors with different refresh rates.

When I tested the experiment on my computer (Mac, using Safari/Chrome/Firefox), the times were more or less exact (with oscillations of a couple of ms). However, I noticed that for half of my subjects (70 in total) the prime duration varied greatly. The prime duration values recorded in the data file are the following (from the R console):

[1] 0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007 0.008 0.009 0.010 0.011 0.012 0.013 0.014 0.015 0.016 0.017
[19] 0.018 0.019 0.020 0.021 0.022 0.023 0.024 0.025 0.026 0.027 0.028 0.029 0.030 0.031 0.032 0.033 0.034 0.035
[37] 0.036 0.037 0.038 0.039 0.040 0.041 0.042 0.043 0.044 0.045 0.046 0.047 0.048 0.049 0.050 0.051 0.052 0.053
[55] 0.054 0.055 0.056 0.058 0.059 0.060 0.061 0.062 0.063 0.064 0.067 0.068 0.070 0.072 0.079 0.080 0.083 0.097
[73] 0.102 0.131 0.133 0.142 0.145

So there were trials whose prime ended up not showing up at all…! This is not ideal, especially for a time-dependent design such as masked priming. This could have been due to tasks being run in parallel while taking the experiment, which is something we can't really control (can we?). I was then wondering if there is a feasible solution to at least limit the damage (that is, reduce the range of variation), if not to avoid the issue altogether.

Does anyone have any suggestion for how to deal with this?

I am aware that the issue may not be easily solved, because it substantially depends on the subjects' computers and on operations being run in parallel with the experiment, but I was wondering if there is a way to do some damage control. I can think of a couple of solutions:

a) Maybe increasing the target routine duration would make it more stable throughout the experiment. In my experiment I used 33 ms; another possibility could be to set it to 50 ms.

b) I set the routine duration in seconds because I wanted to make sure it would be held constant independently of the refresh rate of the subjects' monitors. Do you think that setting the duration in number of frames would make it more stable? I'd be okay with this as long as there is a reliable way to measure the refresh rate of the monitor on which the experiment is being run (a rough browser-side sketch follows).
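For what it's worth, a rough refresh-rate estimate can be obtained in any browser by timing requestAnimationFrame callbacks. This is only a sketch of the general idea (plain browser JavaScript), not PsychoJS's own detection function:

```javascript
// Rough sketch: estimate the refresh rate from requestAnimationFrame timestamps.
function estimateRefreshRate(nFrames = 60) {
  return new Promise((resolve) => {
    const timestamps = [];
    function onFrame(now) {          // 'now' is a DOMHighResTimeStamp (ms)
      timestamps.push(now);
      if (timestamps.length <= nFrames) {
        requestAnimationFrame(onFrame);
      } else {
        const elapsed = timestamps[nFrames] - timestamps[0];
        resolve(1000 * nFrames / elapsed);  // mean interval -> Hz
      }
    }
    requestAnimationFrame(onFrame);
  });
}

// Usage: estimateRefreshRate().then((hz) => console.log(`~${hz.toFixed(1)} Hz`));
```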

Thanks!

Hi @rob-linguistics13, does the above show the time slipping on every trial, with a 33ms presentation duration ranging between 0 and 145 ms?

We have run timing tests, and can confirm that visual stimulus durations on a Mac (using Safari, Chrome, and Firefox) are accurate to within 2 ms, with a 5 ms standard deviation.

Was this experiment built using custom code? Also, which version of PsychoPy are you using?

No, I just reported the entire range of durations. I am attaching a density plot of the prime durations recorded from 140 subjects. (In the plot, the green line is the median, the red line is the mean.)

[Rplot: density plot of the recorded prime durations]

The distribution is centred at about 30 ms, but the fluctuations are consistent throughout the whole experiment. I am trying to figure out the source of the problem (and possible solutions/workarounds). It does not seem to correlate with, or depend on, response times or the number of trials presented.

The experiment was first created in PsychoPy (v3.1.1, as far as I can remember…it was about a couple of months ago), but I then worked directly on the JS code (e.g., I created clocks for each routine so that reliable routine durations would be printed to the data files). I can provide the script if it would help.

Just a follow-up on this issue.

Yesterday I ran the same experiment, but with the routine durations set in number of flips instead of seconds. I am aware that the refresh-rate detection function is still under development and that some of the results might be skewed by the different monitors the subjects may be using, but I still wanted to share what I found.

First, the distribution of the routine durations set in flips (in this case the prime durations, which were set to 2 flips ≈ 34 ms; a sketch of the frame-based logic is below) was very similar to the distribution of the routine durations set in seconds.
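For reference, the frame-based version controlled the prime roughly like this (a sketch of the each-frame logic in the style of PsychoJS-generated code; the names are my own):

```javascript
// Sketch: frame-counted stimulus presentation in a routine's each-frame callback.
let frameN = -1;

function primeRoutineEachFrame() {
  frameN = frameN + 1;             // incremented once per screen flip

  if (frameN === 0) {
    primeStim.setAutoDraw(true);   // drawn on flips 0 and 1...
  }
  if (frameN >= 2) {
    primeStim.setAutoDraw(false);  // ...so 2 flips ~= 34 ms on a 60 Hz monitor
    return Scheduler.Event.NEXT;   // end the routine
  }
  return Scheduler.Event.FLIP_REPEAT;  // otherwise, run again on the next flip
}
```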

Second, the fluctuations seem to depend on the combination of OS and browser. For yesterday's experiment, I implemented code for storing information about the OS and browser in the data file (a sketch of that code appears after the table). The table below shows the descriptive statistics of the prime durations (in ms) across the different OSes and browsers.

  OS           browser  subj  mean    sd   min   max
1 Linux x86_64 Chrome      2  32.9  13.6    14   153
2 Linux x86_64 Firefox     1  36.1  7.92    27    64
3 MacIntel     Chrome     12  27.6  3.63     8    73
4 MacIntel     Firefox     1  29.4  1.44    25    35
5 MacIntel     Safari      1  24.8  1.96    21    32
6 Win32        Chrome    100  31.0  17.9     3  2812 // !!!
7 Win32        Firefox    21  26.3  6.56     7    90

It appears that the combination of Windows and Chrome makes routine durations fluctuate more than any other combination. Unfortunately, the same OS/browser info was not stored for the experiment with durations set in seconds, but I still hope this piece of information may help the developers better understand how to improve duration reliability in PsychoJS. Datasets and scripts are available upon request.
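In case it is useful, the OS/browser information in the table was collected with something along these lines (a sketch; the user-agent sniffing and the column names are my own choices):

```javascript
// Sketch: store platform and browser info in the PsychoJS data file.
const userAgent = window.navigator.userAgent;
let browser = 'unknown';
if (/Firefox\//.test(userAgent)) {
  browser = 'Firefox';
} else if (/Chrome\//.test(userAgent)) {
  browser = 'Chrome';
} else if (/Safari\//.test(userAgent)) {
  browser = 'Safari';  // Chrome's UA also contains 'Safari/', so test it last
}

// navigator.platform yields values like 'Win32', 'MacIntel', 'Linux x86_64'
psychoJS.experiment.addData('OS', window.navigator.platform);
psychoJS.experiment.addData('browser', browser);
```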


Essentially, your timing measures match up roughly with what we're seeing, and with what I would broadly expect: in the browser, timing will just never be quite as good as on the local device. It will be browser/platform dependent.

Why is a browser worse, and why might it be browser dependent?

In Python, when we flip the OpenGL frame buffers, we directly make the operating system call requesting that the graphics card make the change. In JavaScript, we effectively call the browser requesting the flip, and the browser then calls the operating system. That additional middle-man is necessarily going to add a very small delay, and possibly a much larger one, depending on how it was coded in the application. It could also be that, say, Firefox did a better job of their code on one operating system and Chrome did a better job on another, so you could also see browser/OS interactions.
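Schematically, the browser side looks something like this (not the actual PsychoJS source, just an illustration of the extra indirection):

```javascript
// Illustration: in the browser we can only *request* a frame; the browser
// decides when to honour it and makes the OS/graphics call on our behalf.
function drawStimuli() { /* hypothetical stub for the per-frame drawing work */ }

function render(now) {
  drawStimuli();
  requestAnimationFrame(render);  // ask the browser for the next flip; the
  // hand-off from browser to OS is where extra, browser-dependent delay creeps in
}
requestAnimationFrame(render);
```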

How long was my stimulus actually present?

Note that some of the variability you're measuring above is an artefact of the measurement. I'm assuming you're measuring this based on reported flip times, which have some variability too. So your plot confounds the error in reporting with the error in physical presentation. Luckily, those are generally fairly easy to tell apart: the error in physical presentation must come in essentially perfect units of frame periods. Just as you can't deliberately present your stimulus for 1/3 of a frame, the browser can't do so by mistake either! So the physical stimulus was actually presented for 0, 16.7, 33.3 or 50 ms (assuming you haven't used a variable-refresh-rate monitor like G-Sync or FreeSync). Any scatter you see around those peaks is certainly caused by variance in the measurement.
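One way to see this in your own data: snap each measured duration to the nearest whole number of frames, and treat the residual as measurement noise (a sketch, assuming a fixed 60 Hz monitor):

```javascript
// Sketch: decompose a measured duration into whole frames + measurement noise,
// assuming a fixed 60 Hz monitor (frame period ~16.67 ms).
const framePeriodMs = 1000 / 60;

function analyseDuration(measuredMs) {
  const nFrames = Math.round(measuredMs / framePeriodMs);   // physical presentation
  const residualMs = measuredMs - nFrames * framePeriodMs;  // reporting error
  return { nFrames, residualMs };
}

// e.g. a '38 ms' measurement was physically 2 frames (~33.3 ms) plus ~4.7 ms
// of measurement noise:
console.log(analyseDuration(38));  // { nFrames: 2, residualMs: 4.66... }
```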

Actually, to complicate things a bit further, you can also get what appears to be a shorter duration if the frame buffer is 'synchronized' but not 'blocking'.

Ultimately, to know whether your stimulus lasted the duration you expected, especially in a browser, you would need to use a photodiode and measurement hardware. I expect that when you do, the precision will improve a little (the scatter around the peaks will disappear and some of the flanking peaks will be reduced), but I do also expect you to see some dropped/gained frames when running in a browser.

We haven’t yet seen any online experiment system manage zero frame drops across all browsers.


Thank you for this, @jon, your explanation helped a lot. A few comments follow.

The reported measurements were calculated by calling clock.getTime() (and its PsychoJS counterpart) to measure the actual duration in seconds of each routine, as described here. The same method was used to measure stimulus durations regardless of whether they were set in seconds or in flips. So I am not sure the variability is an actual artifact of the measurement itself, unless I have misread the way getTime() works.

Can you expand on this? Is that an option in PsychoPy/PsychoJS?

I am no expert on any of this, so I wonder whether it can be set up for online experiments.

Thanks!

The issue is about when getTime() is called. To be accurate, it has to be called as close as possible to the window flip, but if your computer does something else (on another computing process) between the flip and the getTime() call, then this will show up as variability. As I say, the screen itself is incredibly consistent in its frame durations, so any variance on the order of a few ms is coming from this source, not from the physical stimulus.
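In PsychoJS, one way to minimise that gap is to take the timestamp in a flip callback rather than at an arbitrary point in the routine code, e.g. via Window.callOnFlip (a sketch; the variable names are mine):

```javascript
// Sketch: timestamp on the flip itself, rather than whenever the code happens to run.
const routineClock = new util.Clock();
let onsetTime, offsetTime;

// The callback runs right after the frame buffer flips, minimising the gap
// between the flip and the getTime() call.
psychoJS.window.callOnFlip(() => { onsetTime = routineClock.getTime(); });
// ...and later, on the flip that removes the stimulus:
psychoJS.window.callOnFlip(() => { offsetTime = routineClock.getTime(); });

// offsetTime - onsetTime is then a (still imperfect) duration estimate:
// psychoJS.experiment.addData('primeDuration', offsetTime - onsetTime);
```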

It is an option in PsychoPy, yes. Try running the Coder view demos under timing > with waitBlanking set to True or False.

In PsychoJS we try to do the same but we’re limited (again) by the fact that the actual flip call is made by the browser.

To reiterate: no. We’ve tested all the major packages and found that none are capable of it yet, although on some browser combinations it’s very good.

It makes sense, but that would still not explain the great variability in the stimulus durations. I guess this is a drawback that, as of now, we have to grapple with. To make sure that only trials with genuinely excessive durations were removed, we set a threshold such that durations could vary from the target duration (e.g., 2 flips / ~34 ms) by up to 8 ms (about half of the frame period at 60 Hz).
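Concretely, the exclusion rule was equivalent to the following (a sketch; in practice we applied it to the data files in R):

```javascript
// Sketch of the trial-exclusion rule: keep a trial only if its measured prime
// duration is within 8 ms (~half a frame period at 60 Hz) of the 2-flip target.
const targetMs = 2 * (1000 / 60);  // ~33.3 ms
const toleranceMs = 8;

function keepTrial(measuredPrimeMs) {
  return Math.abs(measuredPrimeMs - targetMs) <= toleranceMs;
}

// keepTrial(33) -> true; keepTrial(50) -> false (a whole gained frame)
```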

So, all in all, I feel like there’s not much left to do to ensure stable stimulus duration, at least for now.

Thank you for the information!
I was wondering how much 'a few ms' of delay would be? In my experiments I am using one-frame (~17 ms) stimulus displays, and the distribution of the timing values recovered from the log files (autoDraw = true to autoDraw = false) ranges from 5 to 25 ms, with no peak around 17 ms. Would a good rule of thumb be that it's likely measurement error as long as the logged values are not clustered around multiples of the frame period?
Thank you!

@ade_mh You may want to follow this post I made and carefully consider what the log file timestamp actually records.

See: Response Times in Log Files, Key Press vs. Release, and EXP log messages
