Missing diacritical marks

This problem with multi-line text is a limitation of the Text component. This relies on a third-party way of displaying the text (a package called pyglet) that we don’t control.

However, PsychoPy now has a more advanced stimulus for displaying text, called the Textbox. That has the location of characters under PsychoPy’s own control, and should make it possible to address this issue.

I’ve only ever made changes to the code for the TextStim, not TextBox, but I think some of my Arabic-related changes have been mirrored there by others, like @TParsons.

I’m not sure how much of that has been implemented though. Try using the TextBox stimulus and if there are still errors using that stimulus, please provide some screen shots, as well as some sample text that can be used to replicate it.

Thank you Michael.
I tried the Textbox stimulus. The diacritic positions changed: the diacritic of each character shifted onto the character to its right, as shown in the screenshots. Also, the sentence still starts from the bottom line.


Using Text, the diacritics did not shift position. However, the sentence still starts from the bottom line.

Thanks for testing this out. I probably know what would be needed to solve the diacritic position issue at least. I also think I know what would be required to solve the issue of starting from the bottom line, but I’m not confident that I’d be able to implement that myself - I would need to discuss it with others.

Would you mind providing a long sample of text with diacritics (i.e. long enough that it would take up a couple of lines)? Just paste it into a post here so we can copy it.

Thanks Michael.

Using Text, I decreased the spaces between words in the condition file and it worked perfectly with the low number of words in a sentence.

With a longer text, such as the sentence below, it started from the bottom.
.كُرْسِيُّ الْمُعَلِمِ الْمُتْعَبِ جَدِيدٌ وَ مُرِيحٌ جِدّاً وِفْقَ رَأْيِي الشَخْصِّيِّ وَ الصَّرِيحِ

This is a 12-word sentence.

Thank you.

Hi Michael,

.تَوْصِيَّةُ العَالِمَيْنِ الْمُخَضْرَمضيْنِ الْمُبْدِعَيْنِ قَيِّمَةٌ وَ ذَاتْ مَنْفَعَةٍ كَبْيرَةٍ لِلْبَاحِثِ الْمُبْتَدِئِ وِفْقَ رَأْيِي الشَخْصِّيِّ وَ المُتَوَاضِعِ

This is a longer sentence, hope it will help. @Michael

Thanks. I’ve made some progress with getting TextBox2 to (mostly) deal properly with diacritics - it just needs to configure an Arabic Reshaper object in the same way as the TextStim does. That code is still sitting on my computer though, as it only solves part of the problem. i.e. I’ve not been able to get anywhere with solving the issue of multiline right-to-left text starting from the bottom of the page. That bit of the code is too complex for me to try to change.
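For anyone following along, this is roughly what "configuring an Arabic Reshaper object" could look like. It is only a sketch: the two configuration keys are real options of the arabic_reshaper package, but the values shown are my recollection of what TextStim uses and should be checked against the PsychoPy source. The names ARABIC_RESHAPER_CONFIG and to_visual_order are made up for illustration.

```python
# Assumed settings (to be verified against TextStim's own code):
# keep the diacritics (harakat) rather than stripping them, and shift
# their position so they stay attached to the right letter after the
# bidi reordering step.
ARABIC_RESHAPER_CONFIG = {
    "delete_harakat": False,
    "shift_harakat_position": True,
}


def to_visual_order(logical_text):
    """Reshape Arabic letters, then reorder them for left-to-right drawing.

    The third-party packages are imported lazily so that the
    configuration above can be inspected without them installed.
    """
    import arabic_reshaper                  # pip install arabic-reshaper
    from bidi.algorithm import get_display  # pip install python-bidi

    reshaper = arabic_reshaper.ArabicReshaper(
        configuration=ARABIC_RESHAPER_CONFIG)
    return get_display(reshaper.reshape(logical_text))
```

This only addresses diacritics on a single line. The string is still reordered as a whole, so it does nothing for the multi-line problem.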

Would need some input from @TParsons on that front.

Todd - in essence, we are currently hacking right-to-left text (in both TextStim and TextBox2) by simply reversing the logical string (via the bidi algorithm for the fancy stuff) and feeding it to a stimulus that is still fundamentally displaying a string of characters left-to-right. i.e. the characters are always displayed from an origin at the top-left of the stimulus, with each glyph marching one step further to the right. So things (appear to) work for a single line of RTL text, simply by the trick of reversing the string. But this must break for a multi-line piece of text, as the first character of the original unreversed string will always end up as the last character in the stimulus (at the bottom-right). We need it to start at the top-right of the stimulus.

So in essence, instead of a fixed origin (top-left) and fixed directional increment (one step rightward), we need both of those parameters to be variables that alter depending on the direction of the text. i.e.:

  • Left-to-right text: origin = top-left of bounding box, glyph increment = 1 step rightward.
  • Right-to-left text: origin = top-right of bounding box, glyph increment = 1 step leftward.
  • Regardless of text direction, at the end of each line, the vertical drawing coordinate is incremented one line lower down the screen, and the horizontal coordinate is reset to the starting value.
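The three rules above can be sketched in a few lines of plain Python. This is a toy model, not TextBox2 code: it assumes every glyph is one cell wide, lines break only at spaces, and no word is wider than the box. The function names layout_glyphs and render are hypothetical.

```python
def layout_glyphs(text, box_width, direction="ltr"):
    """Return (char, x, row) for each character, in logical order.

    x is the cell index counted from the left edge of the box and
    row counts downward from 0 at the top.
    """
    if direction == "ltr":
        origin_x, step = 0, 1               # top-left, one step rightward
    elif direction == "rtl":
        origin_x, step = box_width - 1, -1  # top-right, one step leftward
    else:
        raise ValueError("direction must be 'ltr' or 'rtl'")

    placed = []
    col, row = 0, 0  # col = glyphs already drawn on the current line
    for word in text.split(" "):
        if col > 0:
            if col + 1 + len(word) > box_width:
                # End of line: move one line down, reset the horizontal start.
                col, row = 0, row + 1
            else:
                placed.append((" ", origin_x + col * step, row))
                col += 1
        for ch in word:
            placed.append((ch, origin_x + col * step, row))
            col += 1
    return placed


def render(placed, box_width):
    """Draw the placed glyphs onto a character grid, for inspection only."""
    n_rows = max(row for _, _, row in placed) + 1
    grid = [[" "] * box_width for _ in range(n_rows)]
    for ch, x, row in placed:
        grid[row][x] = ch
    return ["".join(line) for line in grid]


print(render(layout_glyphs("The cat sat.", 7, "ltr"), 7))
# ['The cat', 'sat.   ']
print(render(layout_glyphs("The cat sat.", 7, "rtl"), 7))
# ['tac ehT', '   .tas']
```

Note that the input string is never reversed: only the origin and the sign of the step differ between the two directions.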

(I’m assuming here left- or right-aligned text format: ignoring the possibility of centre-aligned text, simply because I don’t understand how that works - I guess the origin in that case needs to be calculated on a per-line basis, depending on the length of text in that line?)
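A guess at how the per-line origin could be computed once alignment is taken into account, assuming the width of each line is known before it is drawn. The function name line_origin_x and its arguments are hypothetical, and the units are arbitrary.

```python
def line_origin_x(line_width, box_width, direction="ltr", align="left"):
    """Return the x coordinate at which drawing of one line starts.

    For LTR text this is the left edge of the line; for RTL text it is
    the right edge, since glyphs are then laid out leftward from there.
    """
    slack = box_width - line_width  # unused horizontal space on this line
    if align == "left":
        left_edge = 0
    elif align == "right":
        left_edge = slack
    elif align == "center":
        left_edge = slack / 2
    else:
        raise ValueError("align must be 'left', 'right' or 'center'")
    return left_edge if direction == "ltr" else left_edge + line_width


print(line_origin_x(4, 10, "ltr", "center"))  # 3.0
print(line_origin_x(4, 10, "rtl", "right"))   # 10
```

With left or right alignment the origin that matches the text direction is the same for every line; only centring (or aligning against the text direction) makes it depend on the length of each line.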

If we adopted this approach, we would then no longer need to reverse the string for displaying RTL text. That is, the logical order (i.e. the order in which characters are typed) would be preserved for both LTR and RTL text, only the order of drawing would vary between them. This is in contrast to the current situation, where the order of drawing is fixed, and so we need to reverse the logical order of the RTL string.

If it helps, imagine a RTL language which is simply English but read right-to-left. So an English phrase would be:

The cat sat.

To display this to be read from RTL, we currently just reverse the string, so we can still draw it progressively from left-to-right:

.tas tac ehT

This makes sense when read from right to left. But if the text stimulus gets too narrow, then the line would break like this:

.tas tac
     ehT

This is the issue being reported by @Alaa. The logical order of the text is now broken: read in order (from top-right to bottom-right), it would come out as "cat sat. The". But if, instead of reversing the string, we preserved its logical order and drew it progressing leftward from an origin at the top-right, it would look like the following (with the line break changing naturally):

tac ehT
   .tas
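The difference between the two line breaks can be reproduced with Python's standard textwrap module. This only models the pretend "English read right-to-left" example: plain string reversal stands in for the bidi algorithm, and wrapping first and then mirroring each line is a stand-in for the proposed drawing change, not how TextBox2 would implement it. All function names here are hypothetical.

```python
import textwrap


def wrap_reversed_string(text, width):
    """Current hack: reverse the whole string, then wrap it left-to-right."""
    return [line.rjust(width) for line in textwrap.wrap(text[::-1], width)]


def wrap_then_mirror_lines(text, width):
    """Wrap in logical order first, then mirror each line separately."""
    return [line[::-1].rjust(width) for line in textwrap.wrap(text, width)]


def reading_order(lines):
    """Read each line right-to-left, taking lines from top to bottom."""
    return " ".join(line.strip()[::-1] for line in lines)


hack = wrap_reversed_string("The cat sat.", 8)
print(hack)                  # ['.tas tac', '     ehT']
print(reading_order(hack))   # cat sat. The

fixed = wrap_then_mirror_lines("The cat sat.", 8)
print(fixed)                 # [' tac ehT', '    .tas']
print(reading_order(fixed))  # The cat sat.
```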

Does this idea of varying the origin and the glyph-increment direction at a low, drawing level within TextBox2, make sense?

I’m ignoring that there are some situations (e.g. in formal Japanese) where the characters flow from an origin at top-right but down the page, rather than right-to-left.

e.g.

s  T
a  h
t  e
.
   c
   a
   t

We could/should ignore that for the time-being, but it does show that for complete generalisation, the glyph-increment direction variable would ideally be free to vary between a step downwards as well as a step leftwards or rightwards. There are apparently some languages that read up the page, but they are probably rare enough to not worry about…

Hi Michael,

I just found this
Word wrapping of multi-line Arabic text

I think it is the same issue, but no diacritics above/under the characters, am I right? @Michael

Yes, since then I’ve added the handling of diacritics to TextStim. I’ve made a similar change to allow proper diacritic support to TextBox2, but that has not been submitted to the repository yet, as ideally we would fix this at the same time as the text-flow problem.

Thank you. @Michael
Do I need to have the latest version of PsychoPy for the changes on TextBox2? I now have 2022.1.4.

Would the following path by @wakecarter by any chance help with Arabic sentences? I am designing an SPR task using the moving-window paradigm, with a comprehension question after every sentence.

Another issue: my aim is to have all words initially covered by dashes until the participant presses the space bar and the first word appears. I tried the code used in the thread below, but only one word displays on the screen. Is it possible to add the dashes using code, instead of having to add them in the condition file (e.g. S1)? I have a large number of sentences, so code for the dashes would save me a lot of time.
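One possible way to generate the dashes in code rather than in the condition file, sketched in plain Python. It assumes words are separated by spaces and that diacritics are stored as Unicode combining marks (as they are in the Arabic sentences above), so that they do not produce extra dashes. The function names mask_word and moving_window are hypothetical.

```python
import unicodedata


def mask_word(word, dash="-"):
    """Return one dash per base character, ignoring combining diacritics."""
    return "".join(dash for ch in word if not unicodedata.combining(ch))


def moving_window(sentence, shown_index=None, dash="-"):
    """Mask every word except the one at shown_index (None masks them all)."""
    words = sentence.split()
    return " ".join(
        word if i == shown_index else mask_word(word, dash)
        for i, word in enumerate(words)
    )


print(moving_window("The cat sat."))     # --- --- ----
print(moving_window("The cat sat.", 1))  # --- cat ----
```

In a Builder code component, the text of the stimulus could then be set to something like moving_window(sentence, wordIndex), where sentence comes from the condition file and wordIndex is a counter increased on each space bar press (both names are placeholders). The dashes will only line up with the words they replace if the font is monospaced.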

I am not sure what could be adapted from both codes that might help with the dashes issue.
Here is the psyexp and condition files.
Trial3.psyexp (16.1 KB)
loopTemplate1.xlsx (18.1 KB)

No, like I said, those changes are still just on my computer: I haven’t pushed them to the PsychoPy code repository on GitHub, because ideally it would happen together with other required changes to handle the multi-line issues. But if no progress happens there, I might submit those changes alone, if only to get diacritics working on single lines.

That code is for use online. If you want to run your experiment online, then I think most of these text display issues go away - web browsers are much better at handling international text, without the need for PsychoPy to do much at all. So are you intending to run the experiment locally, or online?

Hi Michael,

If I may ask, how much time do you expect it will need for the whole issue to be handled, please?

I intend to run the experiment locally.

I’m just a volunteer. The diacritic issue is one that I can fix with TextBox2, but fixing its handling of multiline text is beyond my ability to address within a reasonable timeframe - that would need to be done by one of the professional developers.

Thank you so much @Michael. You’ve provided great help and advice.

I hope the multiline issue will be handled soon.

Thank you.
Ala’a

Hi @Michael

With the help of a programmer, I found a solution for the multi-line issue. He changed Wrap Width under Layout in TextStim from None to 60. The sentence was then displayed on one line, from right to left, as shown in the screenshot.
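For anyone coding the stimulus directly rather than through Builder, the equivalent setting might look like the sketch below. wrapWidth and languageStyle are existing TextStim parameters; the value 60 is simply the one reported above, and what it means depends on the units of the window or stimulus. The helper names are hypothetical, and psychopy is imported lazily so the settings can be inspected without opening a window.

```python
def arabic_textstim_kwargs(text, wrap_width=60):
    """Collect the TextStim settings described in the post above."""
    return {
        "text": text,
        "wrapWidth": wrap_width,     # wide enough that the sentence never wraps
        "languageStyle": "Arabic",   # right-to-left text with Arabic reshaping
    }


def make_arabic_stim(win, text, wrap_width=60):
    """Create the stimulus on an existing psychopy.visual.Window."""
    from psychopy import visual
    return visual.TextStim(win, **arabic_textstim_kwargs(text, wrap_width))
```

This works by keeping the whole sentence on a single line, so it sidesteps the multi-line ordering problem rather than fixing it. A sentence too long to fit on the screen at the chosen text size would still need another approach.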