I guess this is all about font rendering: look what happens if you change your text to "YyXIXyM". The descenders of the top stimulus come down almost as low as your horizontal line. The coordinates of the text stimulus relate to an invisible bounding box rather than to the limits of the individually rendered glyphs, because that box has to cater for the dimensions of all possible characters in the font. (Note that the position of the text stimulus does not change when you include characters with descenders.)
This sort of stuff is really outside of PsychoPy’s control (you would see exactly the same thing in a word processor: text is aligned on a baseline rather than on the boundaries of individual glyphs, or else every text document would have ragged vertical spacing from line to line). This is also why we recommend that people draw geometric fixation points, which can be controlled precisely, rather than use a quick-and-dirty text stimulus containing a + character, for example: one has no way of knowing in advance how a given font will draw a given character relative to the baseline. One font may draw a plus sign higher above the baseline than another does, even though the bounding box of the text stimulus stays constant.
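As a rough illustration of the geometric approach, here is a minimal sketch that computes the end points of a fixation cross centred exactly on a chosen coordinate. The function name `cross_segments` and its parameters are made up for this example and are not part of PsychoPy; the point is only that every coordinate is set by you, not by a font.

```python
def cross_segments(arm_length, centre=(0.0, 0.0)):
    """Return the (horizontal, vertical) line segments of a cross.

    Each segment is a [start, end] pair of (x, y) points, placed
    symmetrically about `centre`. Hypothetical helper for illustration.
    """
    cx, cy = centre
    horizontal = [(cx - arm_length, cy), (cx + arm_length, cy)]
    vertical = [(cx, cy - arm_length), (cx, cy + arm_length)]
    return horizontal, vertical


# Example: a cross with 10-unit arms centred on the origin.
horizontal, vertical = cross_segments(10, centre=(0, 0))
print(horizontal)
print(vertical)
```

Each segment could then be handed to a line stimulus, for instance as the `start` and `end` of a PsychoPy `visual.Line`, in whatever units the window uses.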
If you need precise, pixel-level control of text, it is probably best to use text that has been rendered into bitmap images in advance.
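The advantage of a pre-rendered bitmap is that the extent of the drawn pixels can be measured exactly and the image trimmed to it. The sketch below assumes the bitmap is a plain nested list of pixel values with 0 as background; in practice it would come from an image library. `ink_bounds` and `crop_to_ink` are hypothetical helper names used only for this example.

```python
def ink_bounds(bitmap, background=0):
    """Return (top, left, bottom, right), inclusive, of non-background pixels.

    Returns None if the bitmap contains no ink at all.
    """
    rows = [r for r, row in enumerate(bitmap)
            if any(p != background for p in row)]
    if not rows:
        return None
    cols = [c for c in range(len(bitmap[0]))
            if any(row[c] != background for row in bitmap)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_to_ink(bitmap, background=0):
    """Trim the bitmap so its edges coincide with the drawn pixels."""
    bounds = ink_bounds(bitmap, background)
    if bounds is None:
        return []
    top, left, bottom, right = bounds
    return [row[left:right + 1] for row in bitmap[top:bottom + 1]]


# Toy "+" glyph that sits low in its 5x5 box, as a font might draw it.
glyph = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
print(ink_bounds(glyph))
print(crop_to_ink(glyph))
```

Once cropped, the centre of the image is the centre of the visible glyph, so positioning the image stimulus positions the ink itself.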