Questions on output data file

Shawna_Wang · June 20, 2019, 6:29pm

Hi there. I have finished collecting data for a self-paced reading task. Since I am only interested in particular areas of words, I used code written in Python to give me the RTs for these words as well as the means, standard deviations, etc. (so that I can exclude outliers based on these). But if I calculate the means/SDs manually in Excel, I get different values. As you can see in the screenshot, the highlighted row is what I got using Excel calculations, and the row labelled Mean is what I got from Python. What is the reason for that? And which criterion should I adopt? Looking forward to your reply. Thanks in advance.

Michael · June 20, 2019, 11:14pm

Different calculations are being applied. We need to see how both are being done. This may very well be a question more of statistics than anything else. For example, are the means being taken from all observations (grand means), or by taking the mean of a set of sub-group means (e.g. within-subject means)? We really can’t know unless you provide the details.
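
To illustrate the grand-mean versus mean-of-sub-group-means distinction, here is a minimal sketch. The participant labels and reading times are made up for illustration and are not taken from the screenshots; the point is only that the two approaches diverge whenever the sub-groups contain different numbers of observations.

```python
from statistics import mean

# Hypothetical reading times (ms) per participant, with unequal trial counts.
rts_by_subject = {
    "p1": [300, 320, 340],
    "p2": [500, 520],
}

# Grand mean: pool every observation, then average once.
all_rts = [rt for rts in rts_by_subject.values() for rt in rts]
grand_mean = mean(all_rts)

# Mean of within-subject means: average each participant first,
# then average those per-participant means.
subject_means = [mean(rts) for rts in rts_by_subject.values()]
mean_of_means = mean(subject_means)

print(grand_mean, mean_of_means)
```

With equal numbers of trials per participant the two values coincide, so a mismatch like this tends to show up only after some trials have been dropped or filtered.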

Shawna_Wang · June 20, 2019, 11:47pm

Thank you very much for your reply.
So basically the sentences and the RTs for the words are randomised in the raw output data. That is why I used code that gives me the RTs for specific words within a specific area in the order I want. Also, for the sake of identifying outliers, I edited the code so that it gives me the value of the formula (the threshold for identifying outliers), as you can see in the first three screenshots.

But when I calculated the means and standard deviations manually in the Excel file the code gave me and compared these values with the ones given initially by the code, I found they are different (as you can see in the picture from when I created the topic). It is said this is because Python is more precise and accurate than Excel in calculating means and standard deviations. Is that the case? Which approach would you recommend for this purpose?

Thanks in advance.

Kind regards,

Shawna

Michael · June 21, 2019, 12:06am

No, the calculation of a mean is simple and will not vary across standard software like this, except at some very long number of decimal places. The calculation of standard deviation requires a decision as to the degrees of freedom used. As long as the same value is used, again Excel and Python would give the same answer, to quite some number of decimal places.
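
As a sketch of the degrees-of-freedom point, using only Python's standard `statistics` module: `stdev` divides by n - 1 (the sample SD, matching Excel's `STDEV.S`), while `pstdev` divides by n (the population SD, matching Excel's `STDEV.P`). The values are arbitrary. Note also that `numpy.std` defaults to `ddof=0` (divide by n) whereas pandas' `.std()` defaults to `ddof=1`, so it is worth checking which one a script actually calls.

```python
from statistics import pstdev, stdev

values = [1, 2, 3, 4, 5]

# Sample SD: sum of squared deviations divided by (n - 1).
sample_sd = stdev(values)

# Population SD: sum of squared deviations divided by n.
population_sd = pstdev(values)

# Same data, two different answers, purely from the choice of denominator.
print(round(sample_sd, 4), round(population_sd, 4))
```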

So the difference will lie in what is being calculated. i.e. you can assume that the mean and SD functions in Excel and Python are equivalent. But they will give different answers if they are applied to different values. That will be the source of your issue.

I can’t really tell what is going on in the Python code, and you don’t show or explain how the Excel values are calculated. But you will be best placed to check that: go through each process and check that the functions are being applied to exactly the same values (and certainly, the same number of values). i.e. the mean of [1, 2, 3, 4, 5] will be the same in Excel and Python. But the results will be different if you compare the mean of [1, 2, 3] in one system to the mean of [1, 2, 3, 4, 5] in the other. This will be where the problem is.
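
One way to carry out that check is sketched below. The two lists stand in for the values each system actually fed into its mean/SD; they are hypothetical, and in practice you would paste or load the real columns. `summarize` is a helper name invented for this example.

```python
from collections import Counter
from statistics import mean, stdev


def summarize(values):
    """Return the count, mean, and sample SD of a list of numbers."""
    return {"n": len(values), "mean": mean(values), "sd": stdev(values)}


# Hypothetical RTs (ms): what the Python script used vs. what the
# Excel formula range covered (one row is missing here).
python_values = [310, 325, 298, 402, 350]
excel_values = [310, 325, 298, 350]

py_summary = summarize(python_values)
xl_summary = summarize(excel_values)

# First check: are both systems working on the same number of values?
same_n = py_summary["n"] == xl_summary["n"]

# Second check: which values appear in one input but not the other?
# Counter subtraction keeps duplicates, unlike a plain set difference.
only_in_python = sorted((Counter(python_values) - Counter(excel_values)).elements())
only_in_excel = sorted((Counter(excel_values) - Counter(python_values)).elements())

print(same_n, only_in_python, only_in_excel)
```

If the counts differ, the discrepancy in means and SDs comes from the inputs, not from the software.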
