Questions on output data file

Hi there. I have finished collecting data for a self-paced reading task. Since I am only interested in particular regions of words, I used a script written in Python to extract the RTs for these words, as well as the means, standard deviations, etc. (so that I can exclude outliers based on these). The problem is that if I calculate the means/SDs manually in Excel, I get different values. As you can see in the screenshot, the highlighted row is what I got from the Excel calculations, and the row labelled Mean is what I got from Python. What is the reason for that? And which criterion should I adopt? Looking forward to your reply. Thanks in advance.

Different calculations are being applied. We need to see how both are being done. This may very well be more a question of statistics than anything else: e.g. are the means being taken from all observations (grand means), or by taking the mean of a set of sub-group means (e.g. within-subject means)? We really can’t know unless you provide the details.
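To illustrate the grand-mean versus mean-of-means distinction, here is a minimal sketch using made-up reading times. The participant labels and RT values are hypothetical and are not taken from the poster's data; the point is only that the two approaches diverge when participants contribute different numbers of observations.

```python
from statistics import mean

# Hypothetical RTs (ms) for two participants with unequal trial counts.
rts_by_subject = {
    "s1": [300, 320, 340],
    "s2": [500],
}

# Grand mean: pool every observation, then average once.
all_rts = [rt for rts in rts_by_subject.values() for rt in rts]
grand_mean = mean(all_rts)

# Mean of means: average within each participant, then average those means.
subject_means = [mean(rts) for rts in rts_by_subject.values()]
mean_of_means = mean(subject_means)

print(grand_mean)     # 365
print(mean_of_means)  # 410
```

Both numbers are legitimate "means" of the same data, so a mismatch of this kind does not indicate a bug in either tool.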

Thank you very much for your reply.
So basically the sentences and the RTs for the words are randomised in the raw output data. That is why I used a script that gives me the RTs for specific words within a specific region, in the order I want. Also, for the sake of identifying outliers, I edited the code so that it also gives me the value of the formula (the threshold for identifying outliers), as you can see in the first three screenshots.

But when I calculated the means and standard deviations manually in the Excel file the code gave me and compared these values with the ones given initially by the code, I found they are different (as you can see in the picture I posted when I created the topic). I have been told this is because Python is more precise and accurate than Excel when calculating means and standard deviations. Is that the case? Which approach would you recommend for this purpose?

Thanks in advance.

Kind regards,

Shawna

No, the calculation of a mean is simple and will not vary across standard software like this, except perhaps beyond a very large number of decimal places. The calculation of standard deviation requires a decision as to the degrees of freedom used. As long as the same value is used, again Excel and Python would give the same answer, to quite some number of decimal places.
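As a rough sketch of the degrees-of-freedom point, the standard library offers both conventions: `statistics.stdev` divides by n - 1 (the sample SD) and `statistics.pstdev` divides by n (the population SD). In Excel the corresponding functions are `STDEV.S` and `STDEV.P`. Note also that NumPy's `std` defaults to dividing by n (`ddof=0`), while pandas' `std` defaults to n - 1 (`ddof=1`). Which of these the original script uses is not shown in the thread, so this is only something to check, not a diagnosis.

```python
from statistics import pstdev, stdev

rts = [1, 2, 3, 4, 5]

# Sample SD: sum of squared deviations divided by n - 1 (here 10 / 4).
sample_sd = stdev(rts)

# Population SD: sum of squared deviations divided by n (here 10 / 5).
population_sd = pstdev(rts)

print(round(sample_sd, 4))      # 1.5811
print(round(population_sd, 4))  # 1.4142
```

If one tool reports the first value and the other reports the second for the same column, the difference comes from the divisor and not from precision.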

So the difference will lie in what is being calculated. i.e. you can assume that the mean and SD functions in Excel and Python are equivalent. But they will give different answers if they are applied to different values. That will be the source of your issue.

I can’t really tell what is going on in the Python code, and you don’t show or explain how the Excel values are calculated. But you will be best placed to check that: go through each process and check that the functions are being applied to exactly the same values (and certainly, the same number of values). i.e. the mean of [1, 2, 3, 4, 5] will be the same in Excel and Python. But the results will be different if you compare the mean of [1, 2, 3] in one system to the mean of [1, 2, 3, 4, 5] in the other. This will be where the problem is.
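One way to run that check is to put the values each tool actually averaged side by side and compare them. The sketch below uses invented RTs, and the names `python_values`, `excel_values` and `compare_inputs` are hypothetical; in practice the two lists would be copied from the script's input and from the Excel cell range.

```python
from collections import Counter
from statistics import mean

# Hypothetical inputs: the RTs one tool averaged versus the other.
python_values = [410, 395, 520, 1480, 405]
excel_values = [410, 395, 520, 405]


def compare_inputs(a, b):
    """Return the values found only in a and the values found only in b."""
    only_in_a = list((Counter(a) - Counter(b)).elements())
    only_in_b = list((Counter(b) - Counter(a)).elements())
    return only_in_a, only_in_b


only_python, only_excel = compare_inputs(python_values, excel_values)

print(len(python_values), len(excel_values))     # 5 4
print(only_python, only_excel)                   # [1480] []
print(mean(python_values), mean(excel_values))   # 642 432.5
```

Here a single extra observation in one list is enough to move the mean from 432.5 to 642, which is the kind of mismatch worth looking for before comparing the software.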