I am running a paired-associate experiment and want to account for typos: a subject’s response should still count as correct even if it is one letter off. For example, if 80% of the characters are correct, the answer would be counted as correct. How should I do this? I understand that I would have to get the number of characters in each word, have the program compare each correct answer with the subject’s answer, and then check that a certain proportion of the characters in the subject’s response match those in the correct answer. But I do not know how to implement this, or whether I have misunderstood the steps.
How do I tell the program to list an answer as correct even if only some of the correct letters were entered?
Here are some background pointers:
The StackOverflow approach above, which uses the difflib module, has the advantage of being built in to Python and hence easy to use in PsychoPy. I don’t know exactly what algorithm it uses: you’ll need to look that up, so you know what it is doing.
To use it, insert a code component. In its “Begin Experiment” tab, put:
from difflib import SequenceMatcher
Then in, say, the “End Routine” tab, where you have a typed response variable that you want to compare to your stored answer, you can do this:
similarity = SequenceMatcher(a=the_typed_response, b=the_correct_answer).ratio()
if similarity > 0.6:
    pass  # do something
else:
    pass  # do something else
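To make that a bit more concrete, here is a minimal sketch of how the comparison could be wrapped in a helper. The function name is_close_enough, the 0.8 default threshold (taken from the 80% figure in the question), and the lower-casing and whitespace stripping are all my own assumptions rather than anything PsychoPy provides, so adjust them to suit your experiment.

```python
from difflib import SequenceMatcher


def is_close_enough(typed, correct, threshold=0.8):
    """Return True if the typed response is similar enough to the correct answer.

    Hypothetical helper: the name, the default threshold and the
    normalisation steps are illustrative assumptions.
    """
    # Ignore case and surrounding whitespace before comparing.
    typed = typed.strip().lower()
    correct = correct.strip().lower()
    similarity = SequenceMatcher(a=typed, b=correct).ratio()
    return similarity >= threshold


print(is_close_enough("elephnat", "elephant"))  # True (ratio is 0.875)
print(is_close_enough("giraffe", "elephant"))   # False
print(is_close_enough("cot", "cat"))            # False (ratio is about 0.667)
```

Note the last case: with very short words, a single wrong letter can already push the ratio below 0.8, so you may want a lower threshold, or a length-dependent one, if your word list contains short items.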
Here is the first bit of the help documentation for that function so you have some idea how it calculates the similarity:
SequenceMatcher is a flexible class for comparing pairs of sequences of
any type, so long as the sequence elements are hashable. The basic
algorithm predates, and is a little fancier than, an algorithm
published in the late 1980’s by Ratcliff and Obershelp under the
hyperbolic name “gestalt pattern matching”. The basic idea is to find
the longest contiguous matching subsequence that contains no “junk”
elements (R-O doesn’t address junk). The same idea is then applied
recursively to the pieces of the sequences to the left and to the right
of the matching subsequence. This does not yield minimal edit
sequences, but does tend to yield matches that “look right” to people.
SequenceMatcher tries to compute a “human-friendly diff” between two
sequences. Unlike e.g. UNIX™ diff, the fundamental notion is the
longest contiguous & junk-free matching subsequence. That’s what
catches peoples’ eyes. The Windows™ windiff has another interesting
notion, pairing up elements that appear uniquely in each sequence.
That, and the method here, appear to yield more intuitive difference
reports than does diff. This method appears to be the least vulnerable
to synching up on blocks of “junk lines”, though (like blank lines in
ordinary text files, or maybe “P” lines in HTML files). That may be
because this is the only method of the 3 that has a concept of “junk”.
Example, comparing two strings, and considering blanks to be “junk”:
>>> s = SequenceMatcher(lambda x: x == " ",
...                     "private Thread currentThread;",
...                     "private volatile Thread currentThread;")
.ratio() returns a float in [0, 1], measuring the “similarity” of the
sequences. As a rule of thumb, a .ratio() value over 0.6 means the
sequences are close matches:
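For a sense of where that number comes from, here is a small sketch reusing the two strings from the help text. As I understand the difflib documentation, ratio() is computed as 2.0*M/T, where M is the number of matched elements and T is the total length of both sequences. The variable names below are my own.

```python
from difflib import SequenceMatcher

a = "private Thread currentThread;"
b = "private volatile Thread currentThread;"

# Treat blanks as "junk", as in the help text example.
s = SequenceMatcher(lambda x: x == " ", a, b)

# M: total size of all matching blocks (the last block is a zero-size sentinel).
matched = sum(block.size for block in s.get_matching_blocks())
# T: total number of elements in both sequences.
total = len(a) + len(b)
manual_ratio = 2.0 * matched / total

print(matched, total)             # 29 67
print(round(manual_ratio, 3))     # 0.866
print(round(s.ratio(), 3))        # 0.866
```

So the ratio is not literally "percentage of characters correct": it also depends on the combined length of the two strings, which is worth keeping in mind when choosing a cut-off.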
Thank you very much, it works like a charm.