Question about staircase calculation (am I using the analysis function correctly?)

Hi all,
I’m using a staircase procedure which I’ve modified for auditory presentation of sound files. Participants hear sounds from a continuum of sounds ranging from /ba/ to /da/ - on each trial they choose whether the sound they heard was a /ba/ or a /da/. I want to find the perceptual boundary between these sounds.

The staircase runs perfectly well and I seem to get accurate boundary measurements, but I’m wondering about my analysis. I’m using the built-in PsychoPy analysis functions. I considered the set-up described above to be a “Yes/No” task (since the participant only makes a categorization of one stimulus on each trial - and I interpreted this to be equivalent to the participant saying “yes that was /ba/” or “no, that was not /ba/” for each stimulus) and so I set the “expectedMin” to 0 as suggested in the documentation for Yes/No tasks. I just wanted other people’s opinions as to whether I have interpreted this correctly and whether the task described above would instead be equivalent to a 2AFC task, in which case “expectedMin” should be set to 0.5.

For reference, here’s the analysis function being used in my code.

from psychopy import data

# Set expectedMin to 0 for Yes/No (or PSE); set to 0.5 for 2AFC
fit = data.FitLogistic(finalLevels, finalResponses, expectedMin=0.0, sems=binCountf)

A Yes/No task is generally a judgment of whether a stimulus was present or absent on a given trial. i.e. either 0 or 1 stimulus is presented on each trial, and you must indicate whether it was present or not. The stimulus is always of the same sort (e.g. /ba/), which is either presented, or it is absent.

In your case, 1 stimulus is presented on every trial, but it can be of two sorts (sound /ba/ or sound /da/), and you must indicate which. As the name implies, this is a Two Alternative Forced Choice. It is not a yes/no, present/absent task, because a stimulus is always present. It is two-alternative, because there are two candidate stimuli (rather than the presence or absence of one stimulus). It is forced-choice, as a response must be made, and no option is available other than the two candidates.

Then think about what happens in the pathological cases. In the yes/no task, a person can respond no on every trial. Their hit rate will be zero, but the false alarm rate will also be zero, placing them on the line of no discrimination in ROC space.

In the 2AFC task, the equivalent is that the person always responds, say, /ba/. This means that they will be correct 0.5 of the time (as they must get all of the /ba/ trials correct) and incorrect 0.5 of the time (saying /ba/ on all other trials, when the sound was /da/). This also means that they show no discrimination.

Hopefully that shows how the expected minimum must differ across the two types of task? i.e. in the yes/no task, the hit rate and the percentage correct can range from 0 to 100%. In the 2AFC, there is a bedrock of 50% which you can’t really drop below in the long run if you are truly not discriminating. In the long run, to get below 50%, you would actually need to be able to discriminate between the stimuli, but then give the anti-answer.
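To make the two pathological cases concrete, here is a minimal sketch of the scoring. The trial lists and variable names are made up for illustration (they are not from the original code), and it assumes an even split of trial types.

```python
import numpy as np

n_trials = 1000  # hypothetical number of trials

# Two-alternative scoring: half the trials are /ba/, half are /da/
stimuli = np.array(["ba", "da"] * (n_trials // 2))
always_ba = np.full(n_trials, "ba")           # observer who always answers /ba/
prop_correct = np.mean(always_ba == stimuli)  # correct on every /ba/ trial only

# Yes/no scoring: the stimulus is present on half the trials
present = np.array([True, False] * (n_trials // 2))
said_yes = np.zeros(n_trials, dtype=bool)     # observer who always answers "no"
hit_rate = np.mean(said_yes[present])         # "yes" on stimulus-present trials
false_alarm_rate = np.mean(said_yes[~present])

print(prop_correct, hit_rate, false_alarm_rate)
```

Under these assumptions the always-/ba/ observer ends up at 50% correct, while the always-"no" observer has a hit rate and false alarm rate of zero.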

PS there is some debate about whether many of us use 2AFC when we shouldn’t. i.e. strictly I think we are supposed to reserve this term for tasks in which the subject must report in which interval a given stimulus appeared (i.e. was it the first or second stimulus in a trial), but most people use it in the case where a single stimulus is presented and we report which it was. Will leave that to the purists to debate.

Hi Michael,
Thanks very much for such a clear and thorough answer (you’re always so helpful). I agree with your assessment, but there are a couple of issues that leave me still uncertain. As you mentioned, some purists (and this is what I was getting from the textbooks I was consulting) would argue that you need to present two stimuli (or intervals) for this to be a 2AFC, which is why I thought yes/no was more accurate. Another issue is the “correctness” of the answers in my task. There is no such thing, in this audio set up, as a “correct” or “incorrect” /ba/ categorization. The sounds being presented are not “really” one or the other category (unlike with flashing a light in which case there really is a matter of fact about whether the light was there or not). Furthermore, people’s perceptions vary enormously. So there is no meaningful “false alarm” or “false positive” rate in this task. So, a person could genuinely hear every single sound as /ba/ and it’s not that they are hitting the buttons randomly or just guessing - it’s just that for these sounds, every token sounds like /ba/ to them. So a person COULD meaningfully score 0 on this.

To give an example, I had a similar setup once with a continuum from /ba/ to /va/ and I found that if my participants were bilingual with Chinese (common in the city I was working) most of them would hear absolutely everything as /ba/ (because their other language didn’t have the /v/ category, so its representation was somewhat “weak”). They needed to hear a sound that would seem like a ridiculously exaggerated /vvvvvvvva/ to a monolingual English speaker before they’d consider it to be a token of /va/. So for the continuum I used, with that set of participants, many of them scored 100% /ba/ (equivalent to 0% /va/), and that was meaningful - it correctly captured the fact that for them the boundary between their /ba/ and /va/ categories was outside of the options available on the presented continuum of sounds. To me this is like PSE.

Sorry for the long response, but thanks to your explanation I’m just now understanding what the expectedMin was all about and I’m trying to make sure I’m doing it correctly.

Thanks a million!
Mark

Hi Mark,

No, they can’t score 0. In your task, if a person always responds /ba/ (regardless of whether they are guessing or always actually hear /ba/), they will score 50%. In a task with two alternatives, 50% rather than 0% is your minimum performance. You don’t get to score performance on each of the alternatives (i.e. 100% /ba/ and 0% /da/): the score is for discrimination between the alternatives.

Your task isn’t a detection task (“did I or did I not just present a /ba/?”). Yes, strictly you have a discrimination task (“of the single stimulus just presented, was it /ba/ or /da/?”) rather than a true 2AFC (“of the two stimuli (/ba/ and /da/) just presented, was /ba/ first or second?”). But the way of calculating your psychophysical function (e.g. using percent correct rather than hit rate) is similar to that used in the 2AFC rather than the detection task because, well, you have two alternatives. As you note, hits and false alarms aren’t really relevant in a discrimination task context.

I would say that the distinction between 2AFC and the discrimination task is about underlying signal detection theory (e.g. you have more information in a 2AFC task as there are two stimuli presented per trial, vs the one in a discrimination task). So a 2AFC task is (theoretically) more powerful than a discrimination task. But that doesn’t mean that you can’t construct the psychophysical function in the same way. i.e. the response (a choice between two alternatives) is fundamentally different than in the yes/no task (signalling presence or absence of a single option).

But in a standard discrimination task (using a 2AFC-type response), this would be 50% correct across the entire continuum, indicating that no discrimination is occurring within the continuum tested. PSE doesn’t apply here, as the PSE threshold is by definition outside of the tested continuum. And the PSE task is quite different: it means active manipulation of the stimulus until it matches a reference.

So I would say, accept that this is a discrimination task but also accept that you can analyse it as if it was 2AFC.

Also accept that there isn’t anything particularly special about your stimuli that makes this anything other than straightforward. e.g. imagine a task where a person must judge if a visual stimulus makes a slight displacement. We can present this as a yes/no task: either the stimulus moved or it didn’t, and we vary the amount of movement to get a psychometric function for hit rate. Alternatively (equivalent to your task), we can make the stimulus jump either left or right by varying amounts, and ask the person to report the direction. This is also a discrimination task but can be analysed as 2AFC. Note that there is a point on the amplitude continuum where the stimulus doesn’t move at all, but the subject must by definition report either left or right movement. This is just like you saying your sound isn’t “really” in either category. This point is actually really useful, as it gives us an assessment of whether the person is biased to report in either direction.

Lastly, you said that if a person always actually hears /ba/, that tells you something. Yes it does: it tells you that they have no discrimination over the range of tested stimuli. There is no difference between that and guessing from a scoring point of view. Similarly, a person who doesn’t understand your task instructions at all, and just constantly reports /da/, would also score 50%. There isn’t really any difference here: in no case is any of the subjects discriminating the stimuli. They may be guessing, they may be only able to ever perceive one stimulus, they may be completely uncooperative, but the technique still gives you the same answer: they aren’t discriminating between your stimuli over the range tested.

Try out your analysis and see what comes of it. Treating it as if it was 2AFC will do all that you need. Trying to shoehorn it into a yes/no analysis will lead to all sorts of difficulties.

Thanks again Michael, you Rock!

Sorry to belabour the point - I really appreciate your clear explanations. Looking at your example where you compare the audio set-up to moving the light left or right and then not moving it at all and the participants still have to guess left or right and so would get 50% (so no discrimination). I don’t think that applies here. In your thought experiment there is a physical limit to the difference between the movements left or right (no movement) and at that point the person will get 50/50. So, yes, at the physical minimum they must be guessing and we would expect 50/50. However in this set-up they are really still doing a comparison (the dot’s starting point at the centre of the screen vs. its end position - was there a difference between those two points? i.e. did it move?). With my set-up there is no comparing two sounds (they hear just one sound on each trial) and there is no case where we could have a lack of physical difference as the lower limit of the set of stimuli.

In a detection task with reporting the presence/absence of a flash of light, it’s possible that the person’s eyesight is so poor that the person never actually detects the presence of the light at all and so correctly and meaningfully reports 0 detections - it tells you that the set of stimuli did not include this person’s threshold, however, if you used a set of stimuli with more intense flashes, perhaps you could find the person’s threshold. So it is meaningful to have 0 as the lower limit here. I’m still seeing my task as like that. In my task the stimulus starts at one end of the continuum and the sounds become more and more like the other end of the continuum - however it is still possible (and happens for some participants) that they never actually categorize the sound as belonging to the other end of the continuum, because the stimulus simply never reaches their perceptual threshold between the two categories (but if I used a continuum with a wider set of sounds it’s possible that I could find their perceptual threshold between the two categories).

You’ve helped me out with other things in the past and I know that you really know your stuff, so I’m not doubting you’re right - it’s just that I like being able to explain my choices and I’m still seeing this as more logical with expectedMin as 0.

On a more practical note - how much of a difference would there really be in the estimated boundaries with expectedMin set at 0 vs 0.5? I’ve used the 0 setting in several tests where I wanted to find the perceptual boundary between two sounds. When I played the “boundary sound” found by the staircase to the participants they categorized it at chance (50% /ba/ 50% /da/) which suggests that the algorithm is correctly picking the sound at their perceptual boundary between these two categories (luckily, in these tests, I just needed ambiguous sounds, not the actual boundary point, so if the boundaries were off by a bit it doesn’t really matter, but I’d still like to know).

Thanks again for all your help!

Hi again,
Sorry for the flurry of emails on this topic (I’m prepping an experiment that’s just about to start so am concerned about getting this right before starting). Sorry also for posting on a topic and then attempting to answer it myself.

I just did an empirical test to compare setting the expectedMin to 0 vs 0.5 on a staircase finding the boundary between /da/ and /na/ on a 10 000-step continuum (yes, I know it’s a continuum with a lot of steps). Setting expectedMin to 0 gave a threshold result that fit perfectly well with my intuition about where the perceptual boundary was on this continuum (about step 6100). I tested it multiple times and consistently got a result within a few points of this step (6043 to 6154) and when I scan the continuum that is indeed where the sounds start to transition from seeming like /da/ to me and start to seem like /na/. I then set the expectedMin to 0.5 and ran the same staircase multiple times. It always told me the perceptual boundary between /da/ and /na/ was step 0 or infinity on the continuum (neither of which is a meaningful result). I may still be misunderstanding something, but the 0 setting for expectedMin gives answers that predict my perceptual boundary very well, whereas the 0.5 setting gives uninterpretable results.

I am the first to admit that I could be missing something and making a stupid error; if so, I’d really appreciate it if someone could set me straight (I’d hate to have this blow up in my face in 6 months’ time).

Michael, thanks a million for your patient help - I hope you don’t take my back and forth as being argumentative; you’ve helped me (and many others) so much on this forum and it’s truly appreciated. If I’m still missing something, please let me know (better to find out now than when I go to publish).

Thanks!
Mark

If I’ve followed correctly, setting expectedMin to 0 seems appropriate to me - as long as your values in finalResponses are the proportion of /da/ responses for that blend level. When your continuum is at its minimum participants should always be responding /ba/ (i.e. proportion /da/ = 0), at its maximum participants should always be responding /da/ (i.e. proportion /da/ = 1). The PSE is where they are responding /ba/ and /da/ equally often (i.e. proportion /da/ = 0.5).
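As a rough illustration of what the lower asymptote does, here is a sketch using a generic logistic with a floor. The function name `logistic_with_floor`, its parameters, and the stimulus levels are assumptions made for this example; it is not PsychoPy's own fitting code.

```python
import numpy as np

def logistic_with_floor(x, midpoint, slope, floor):
    """Generic logistic rising from `floor` to 1 (illustrative only)."""
    return floor + (1.0 - floor) / (1.0 + np.exp(-slope * (x - midpoint)))

levels = np.linspace(0, 10, 11)  # hypothetical stimulus levels

# Floor of 0: the curve spans 0..1 and passes through 0.5 at the midpoint
yn_curve = logistic_with_floor(levels, midpoint=5.0, slope=2.0, floor=0.0)

# Floor of 0.5: the curve never drops below 0.5
afc_curve = logistic_with_floor(levels, midpoint=5.0, slope=2.0, floor=0.5)

print(yn_curve.min(), afc_curve.min())
```

If the responses are coded as proportion /da/ and run from near 0 to near 1, a curve with a 0.5 floor has no way to follow the lower half of the data, which may be why a fit with that setting can give uninterpretable thresholds.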

A good check would be to do something like the below to plot your raw and fitted data, which should be in alignment:

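import numpy as np
import matplotlib.pyplot as plt

# Raw data: proportion of responses at each stimulus level
plt.scatter(finalLevels, finalResponses)
# Evaluate the fitted function on a fine grid spanning the tested levels
fine_x = np.linspace(min(finalLevels), max(finalLevels), 100)
plt.plot(fine_x, fit.eval(fine_x))
plt.show()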

Hi,
Thanks! Yes, they’re proportions of /da/ responses. I also coded it using the other way around (proportion of /na/ responses) and it produces the same threshold estimation, which is what should happen. Thanks for the plotting suggestion, I’ll implement it to make sure everything’s working as needed.
Thanks!
Mark
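The observation that both codings give the same threshold can be checked with a small sketch. It assumes a plain logistic with a floor of 0 and made-up midpoint, slope, and levels (none of these values come from the thread): recoding the responses as 1 minus the proportion only flips the sign of the slope and leaves the 0.5 crossing where it was.

```python
import numpy as np

def logistic(x, midpoint, slope):
    """Plain logistic running between 0 and 1 (illustrative only)."""
    return 1.0 / (1.0 + np.exp(-slope * (x - midpoint)))

levels = np.linspace(0, 10, 101)                      # hypothetical continuum
prop_na = logistic(levels, midpoint=6.0, slope=1.5)   # hypothetical /na/ proportions
prop_da = 1.0 - prop_na                               # same data, opposite coding

# The recoded data follow a logistic with the same midpoint and opposite slope
same_curve = np.allclose(prop_da, logistic(levels, midpoint=6.0, slope=-1.5))

# Level closest to the 0.5 crossing under each coding
cross_na = levels[np.argmin(np.abs(prop_na - 0.5))]
cross_da = levels[np.argmin(np.abs(prop_da - 0.5))]

print(same_curve, cross_na, cross_da)
```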