Missing diacritical marks


It looks like you are doing some sort of self-paced presentation of text, masking some characters? I’ll assume that you have some variable called, say, my_text that contains the current text string to display.

So insert a Text component set to the default language style i.e. LTR rather than RTL or Arabic (because your custom code will be doing the text processing and we don’t want the text component to interfere with that).

Then insert a Code component. In its “Begin experiment” tab, put something like this:

from bidi.algorithm import get_display
from arabic_reshaper import ArabicReshaper

# configuration which will retain diacritics but also shift
# them by 1 position to work with the bidi algorithm:
arabic_configuration = {'delete_harakat': False,
                        'shift_harakat_position': True }

reshaper = ArabicReshaper(configuration = arabic_configuration)

Then in, say, the “Begin routine” tab, put something like this:

# ensure characters link to each other properly, retaining but shifting
# diacritics:
processed_text = reshaper.reshape(my_text)
# apply the bi-directional algorithm to reverse text for displaying RTL:
processed_text = get_display(processed_text)
# update the stimulus:
your_text_stimulus.text = processed_text

Please do try the Sahel font again with this solution - the issues you saw may have been due to the reshaping and bidirectional processing not being applied properly. Even if the code above is working, the output will not look correct unless the right font is used.

[Note to future readers: a pull request to address this behaviour has been submitted, so in an upcoming PsychoPy release, diacritic marks in Arabic text should automatically be retained:]

Hi Michael,

Thank you so much for the code. I added it to the experiment, but I don’t exactly know what went wrong and caused the sentence not to show when I run the experiment.

trial_1.psyexp (11.6 KB)

I can’t run that without the accompanying conditions file (Stims.xlsx).

But even without seeing that, my question is where do you define the variable my_text? As in the post above:

If you don’t have a variable called that, the code simply can’t work. You can replace it of course with your own relevant variable name. I see your conditions file has a variable called S1. Does that contain the text you want to display?

If so, simply replace the variable name my_text in the code I gave above with your actual variable name, S1.

NB you are also trying to update the text component via the graphical interface, by putting $'S1' in the text field and setting it to update every repeat.

That won’t work for several reasons. Firstly you are putting quotes around S1, so that will print the literal letters 'S1', and not the contents of the variable called S1. Secondly, by setting it to update every repeat, and putting it after the code component, you will be undoing whatever the code component does. Don’t try both approaches at the same time: we are updating the text field in code, not via the component settings.

So just put some junk text in that text field (like 12345) and set it to be “constant” rather than to “set every repeat”. That way you won’t be fighting against the custom code, and if 12345 gets shown, you’ll know that the code isn’t working somehow.


Hi Michael,

I would like to express my gratitude to @Michael for providing the code needed to display the diacritics on Arabic characters.

I tried different fonts until I was lucky enough to find a suitable one (at least on my Mac). However, I still have one issue that I was not able to solve: the Arabic sentence on the screen starts from the bottom line. Would you please help me solve this?

I am providing the conditions file and the psyexp file.
loopTemplate.xlsx (17.8 KB)
Trial1.psyexp (11.4 KB)

Thank you in advance.

Ala’a

This problem with multi-line text is a limitation of the Text component. This relies on a third-party way of displaying the text (a package called pyglet) that we don’t control.

However, PsychoPy now has a more advanced stimulus for displaying text, called the Textbox. That has the location of characters under PsychoPy’s own control, and should make it possible to address this issue.

I’ve only ever made changes to the code for the TextStim, not TextBox, but I think some of my Arabic-related changes have been mirrored there by others, like @TParsons

I’m not sure how much of that has been implemented though. Try using the TextBox stimulus and if there are still errors using that stimulus, please provide some screen shots, as well as some sample text that can be used to replicate it.

Thank you Michael.
I tried the TextBox stimulus. The diacritic positions changed, i.e. the diacritic of each character shifted to the character on its right, as shown in the screenshots. Also, the sentence still starts from the bottom line.


Using Text, the diacritics did not shift position. However, the sentence still starts from the bottom line.

Thanks for testing this out. I probably know what would be needed to solve the diacritic position issue, at least. I also think I know what would be required to solve the issue of starting from the bottom line, but I’m not confident that I’d be able to implement that myself; I would need to discuss it with others.

Would you mind providing a long sample of text with diacritics (i.e. long enough that it would take up a couple of lines)? Just paste it into a post here so we can copy it.

Thanks Michael.

Using Text, I decreased the spaces between words in the condition file and it worked perfectly with the low number of words in a sentence.

With a longer text, such as the sentence below, it started from the bottom.
.كُرْسِيُّ الْمُعَلِمِ الْمُتْعَبِ جَدِيدٌ وَ مُرِيحٌ جِدّاً وِفْقَ رَأْيِي الشَخْصِّيِّ وَ الصَّرِيحِ

This is a 12-word sentence.

Thank you.

Hi Michael,

.تَوْصِيَّةُ العَالِمَيْنِ الْمُخَضْرَمضيْنِ الْمُبْدِعَيْنِ قَيِّمَةٌ وَ ذَاتْ مَنْفَعَةٍ كَبْيرَةٍ لِلْبَاحِثِ الْمُبْتَدِئِ وِفْقَ رَأْيِي الشَخْصِّيِّ وَ المُتَوَاضِعِ

This is a longer sentence, hope it will help. @Michael

Thanks. I’ve made some progress with getting TextBox2 to (mostly) deal properly with diacritics - it just needs to configure an Arabic Reshaper object in the same way as the TextStim does. That code is still sitting on my computer though, as it only solves part of the problem. i.e. I’ve not been able to get anywhere with solving the issue of multiline right-to-left text starting from the bottom of the page. That bit of the code is too complex for me to try to change.

Would need some input from @TParsons on that front.

Todd - in essence, we are currently hacking right-to-left text (in both TextStim and TextBox2) by simply reversing the logical string (via the bidi algorithm for the fancy stuff) and feeding it to a stimulus that is still fundamentally displaying a string of characters left-to-right. i.e. the characters are always displayed from an origin at the top-left of the stimulus, with each glyph marching one step further to the right. So things (appear to) work for a single line of RTL text, simply by the trick of reversing the string. But this must break for a multi-line piece of text, as the first character of the original unreversed string will always end up as the last character in the stimulus (at the bottom-right). We need it to start at the top-right of the stimulus.

So in essence, instead of a fixed origin (top-left) and fixed directional increment (one step rightward), we need both of those parameters to be variables that alter depending on the direction of the text. i.e.:

  • Left-to-right text: origin = top-left of bounding box, glyph increment = 1 step rightward.
  • Right-to-left text: origin = top-right of bounding box, glyph increment = 1 step leftward.
  • Regardless of text direction, at the end of each line, the vertical drawing coordinate is incremented one line lower down the screen, and the horizontal coordinate is reset to the starting value.

(I’m assuming here left- or right-aligned text format: ignoring the possibility of centre-aligned text, simply because I don’t understand how that works - I guess the origin in that case needs to be calculated on a per-line basis, depending on the length of text in that line?)

If we adopted this approach, we would then no longer need to reverse the string for displaying RTL text. That is, the logical order (i.e. the order in which characters are typed) would be preserved for both LTR and RTL text, only the order of drawing would vary between them. This is in contrast to the current situation, where the order of drawing is fixed, and so we need to reverse the logical order of the RTL string.

If it helps, imagine a RTL language which is simply English but read right-to-left. So an English phrase would be:

The cat sat.

To display this to be read from RTL, we currently just reverse the string, so we can still draw it progressively from left-to-right:

.tas tac ehT

This makes sense when read from right to left. But if the text stimulus gets too narrow, then the line would break like this:

.tas tac
     ehT

This is the issue being reported by @Alaa. The logical order of the text is now broken: read from top-right to bottom-right, it comes out as cat sat. The. But if, instead of reversing the string, we preserved its logical order and drew it progressing leftward from an origin at the top-right, it would look like the following (with the line break falling naturally):

tac ehT
   .tas

Does this idea of varying the origin and the glyph-increment direction at a low, drawing level within TextBox2, make sense?
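
To make the difference concrete, here is a minimal plain-Python sketch of the two orderings, using the "English read right-to-left" stand-in from above. It is not PsychoPy code: wrap_words, reverse_then_wrap and wrap_then_mirror are hypothetical helper names, the "stimulus" is just a fixed-width grid of characters, and mirroring each line stands in for drawing its glyphs leftward from the right-hand edge.

```python
def wrap_words(words, width):
    """Greedy word wrap: return a list of lines, each a list of words."""
    lines, current = [], []
    for word in words:
        candidate = " ".join(current + [word])
        if current and len(candidate) > width:
            # The word does not fit on this line, so start a new one.
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    return lines


def reverse_then_wrap(text, width):
    """Current hack: reverse the whole string, then wrap it as if it were LTR."""
    reversed_text = text[::-1]
    lines = wrap_words(reversed_text.split(" "), width)
    # Right-align so both approaches can be compared like for like.
    return [" ".join(line).rjust(width) for line in lines]


def wrap_then_mirror(text, width):
    """Proposed order: wrap the logical string first, then mirror each line."""
    lines = wrap_words(text.split(" "), width)
    return [" ".join(line)[::-1].rjust(width) for line in lines]


if __name__ == "__main__":
    for render in (reverse_then_wrap, wrap_then_mirror):
        print(render.__name__)
        for line in render("The cat sat.", 8):
            print("|" + line + "|")
```

With a width of 8 characters, the first function reproduces the broken layout reported in this thread (the start of the sentence lands on the bottom line) and the second gives the intended one. As far as I understand it, this is also the order the Unicode bidirectional algorithm expects: lines are broken first, and reordering is then applied within each line.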

I’m ignoring that there are some situations (e.g. in formal Japanese) where the characters flow from an origin at top-right but down the page, rather than right-to-left.

e.g.

s  T
a  h
t  e
.
   c
   a
   t

We could/should ignore that for the time-being, but it does show that for complete generalisation, the glyph-increment direction variable would ideally be free to vary between a step downwards as well as a step leftwards or rightwards. There are apparently some languages that read up the page, but they are probably rare enough to not worry about…
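
As a rough illustration of what "origin plus glyph-increment direction as variables" could look like, here is a small sketch in plain Python. Everything in it is hypothetical (the Flow class, make_flows, place_glyphs and render are names invented for this example, not anything in TextBox2), and it works on a character grid with already-broken lines rather than on real glyph metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    origin: tuple      # (column, row) of the first glyph of the first line
    glyph_step: tuple  # (d_column, d_row) applied after each glyph
    line_step: tuple   # (d_column, d_row) applied at each line break


def make_flows(columns):
    """Three example text flows for a grid that is `columns` wide."""
    return {
        "ltr": Flow(origin=(0, 0), glyph_step=(1, 0), line_step=(0, 1)),
        "rtl": Flow(origin=(columns - 1, 0), glyph_step=(-1, 0), line_step=(0, 1)),
        # Vertical text: glyphs step downward, lines step leftward.
        "ttb_rl": Flow(origin=(columns - 1, 0), glyph_step=(0, 1), line_step=(-1, 0)),
    }


def place_glyphs(lines, flow):
    """Map (column, row) -> character for lines given in logical order."""
    placed = {}
    for line_index, line in enumerate(lines):
        # Each new line restarts at the origin, offset by the line step.
        col = flow.origin[0] + line_index * flow.line_step[0]
        row = flow.origin[1] + line_index * flow.line_step[1]
        for char in line:
            placed[(col, row)] = char
            col += flow.glyph_step[0]
            row += flow.glyph_step[1]
    return placed


def render(placed, columns, rows):
    """Turn the placement map back into printable rows of text."""
    return ["".join(placed.get((c, r), " ") for c in range(columns))
            for r in range(rows)]


if __name__ == "__main__":
    lines = ["The cat", "sat."]  # logical order, never reversed
    print("\n".join(render(place_glyphs(lines, make_flows(7)["rtl"]), 7, 2)))
    print("\n".join(render(place_glyphs(lines, make_flows(2)["ttb_rl"]), 2, 7)))
```

The same logical lines produce the right-to-left layout and the vertical layout shown above purely by changing the origin and the two step vectors, which is the point of the suggestion: the string itself is never reversed.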

Hi Michael,

I just found this
Word wrapping of multi-line Arabic text

I think it is the same issue, but no diacritics above/under the characters, am I right? @Michael

Yes, since then I’ve added the handling of diacritics to TextStim. I’ve made a similar change to allow proper diacritic support to TextBox2, but that has not been submitted to the repository yet, as ideally we would fix this at the same time as the text-flow problem.

Thank you. @Michael
Do I need to have the latest version of PsychoPy for the changes on TextBox2? I now have 2022.1.4.

Would the following code by @wakecarter by any chance help with Arabic sentences? I am designing an SPR task using the moving-window paradigm, with a comprehension question after every sentence.

Another issue: my aim is to have all words initially covered by dashes until the participant presses the space bar and the first word appears. I tried the code used in the thread below, but only one word is displayed on the screen. Is it possible to add the dashes using code instead of having to add them in the conditions file (e.g. S1)? I have a large number of sentences, so code for the dashes would save me a lot of time.

I am not sure what could be adapted from both codes that might help with the dashes issue.
Here is the psyexp and condition files.
Trial3.psyexp (16.1 KB)
loopTemplate1.xlsx (18.1 KB)

No, like I said, those changes are still just on my computer: I haven’t pushed them to the PsychoPy code repository on GitHub, because ideally it would happen together with other required changes to handle the multi-line issues. But if no progress happens there, I might submit those changes alone, if only to get diacritics working on single lines.

That code is for use online. If you want to run your experiment online, then I think most of these text display issues go away - web browsers are much better at handling international text, without the need for PsychoPy to do much at all. So are you intending to run the experiment locally, or online?

Hi Michael,

If I may ask, how much time do you expect it will need for the whole issue to be handled, please?

I intend to run the experiment locally.

I’m just a volunteer. The diacritic issue is one that I can fix in TextBox2, but its handling of multiline text is beyond my ability to address within a reasonable timeframe: that would need to be done by one of the professional developers.

Thank you so much @Michael. You’ve provided great help and advice.

I hope the multiline issue will be handled soon.

Thank you.
Ala’a

Hi @Michael

With the help of a programmer, I found a solution for the multi-line issue. He changed Wrap Width under Layout in TextStim from None to 60. The sentence was then displayed on one line from right to left, as shown in the screenshot.

Hi @Michael

I am following your code to create a masked self-paced reading task, from these links:
Masked self-paced reading (google.com)

and

How do I get Moving window presentation in self paced reading task? - Coding - PsychoPy

but I am stuck on the following errors:

Alert 4205:Python Syntax Error in ‘Begin Routine’ tab. See ’ for index, word in enumerate(textlist)
’ on line number 5 of the ‘Begin Routine’ tab.
For further info see 4205: Probable syntax error detected in your Python code — PsychoPy v2022.2.2
Alert 4205:Python Syntax Error in ‘Each Frame’ tab. See 'if’space’in keypresses:
’ on line number 5 of the ‘Each Frame’ tab.
For further info see 4205: Probable syntax error detected in your Python code — PsychoPy v2022.2.2

I am not sure what to change in the code to run the experiment.

The following code is in the “Begin Routine” tab:
SentenceList = Sentence.Split(" ")
wordNumber = (-1)
def replaceWithdashes(textlist, currentWordNumber):
dashSentence = ‘’
for index, word in enumerate(textlist)
if index != currentWordNumber:
dashSentence = dashSentence + ‘–’ * len(word) + ‘–’
else:
dashSentence = dashSentence + word
return dashSentence

textbox.text = replaceWithdash(sentenceList, wordNumber)

And this is in the “Each Frame” tab:
keypresses = event.getKeys()

if len(keypresses) > 0:

if’space’in keypresses:

wordNumber = wordNumber + 1
if wordNumber< len(sentenceList):
textbox.text = replaceWithdash(sentenceList, wordNumber)
else:
continueRoutine = FalseM

elif’esc’in keypresses:

core.quit()

I am trying to have each sentence displayed fully masked at the beginning (so the participant can see how many words the sentence contains, with a space between words, and how many characters each word has). When the participant presses the spacebar, the first word appears and the rest of the words stay masked. On each subsequent press, the preceding word disappears and the following one appears.

Thank you in advance.
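
For anyone landing here later: several things in the code as posted would stop it running, quite apart from the Arabic handling. The for line is missing its trailing colon (that is the line 5 the first alert points at); Python's string method is split, not Split; the names are inconsistent (SentenceList vs sentenceList, replaceWithdashes vs replaceWithdash), and Python names are case-sensitive; the curly quotes and long dashes need to be plain ' and - characters; and FalseM should be False. The indentation has also been lost in the post, so it is impossible to tell whether it was correct in the original. In addition, I believe event.getKeys() reports the Escape key as 'escape' rather than 'esc'. Below is a minimal, self-contained sketch of just the masking function, written so it can be tested outside PsychoPy. The helper visible_length is my own addition (an assumption, not part of the linked examples): it skips combining marks so that Arabic diacritics do not each add an extra dash.

```python
import unicodedata


def visible_length(word):
    """Count characters that take up their own slot, skipping combining
    marks such as Arabic harakat, which sit above or below a base letter."""
    return sum(1 for char in word if not unicodedata.combining(char))


def replace_with_dashes(word_list, current_word_number):
    """Return the sentence with every word masked by dashes except the one
    at current_word_number (use -1 to mask all words)."""
    masked_words = []
    for index, word in enumerate(word_list):
        if index == current_word_number:
            masked_words.append(word)
        else:
            masked_words.append("-" * visible_length(word))
    # Joining with single spaces keeps the word boundaries visible.
    return " ".join(masked_words)


if __name__ == "__main__":
    sentence = "The cat sat."  # stand-in for the conditions-file variable
    sentence_list = sentence.split(" ")
    for word_number in range(-1, len(sentence_list)):
        print(replace_with_dashes(sentence_list, word_number))
```

In the Builder tabs, the first two lines of the demo and the initial call would go in "Begin Routine", and the key handling in "Each Frame" would call replace_with_dashes again after incrementing the word number. For Arabic, the result would presumably still need to pass through reshaper.reshape() and get_display() as described earlier in this thread before being assigned to the stimulus; I have not tested how the masked string behaves with that step.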