Reading lines from multiple data files and using that data to do calculations

Amelia_Shelton · March 30, 2019, 11:43am

I am trying to sort some Stroop data from 20 participants in 20 different .csv files. I need to calculate means and standard deviations for their reaction times, as well as percentage correct, comparing the congruent and incongruent conditions. All of this needs to be put in a table. I am having some trouble with reading the files and saving the correct elements. For example, I ran the code below and it seemed fine; however, my reaction time variable is not right at all. If someone could help with any of the elements I have described, I would be so grateful!

the code:

import glob
import os
import numpy as np

os.chdir('/Users/ameliashelton/Documents/Year 3 /Programming/stroop/data')
path = '/Users/ameliashelton/Documents/Year 3 /Programming/stroop/data'

rts = []

for file in sorted(os.listdir(path)):
    print (file)
    f = open (file, 'r')
    f.readlines()
    for line in f.readlines():
        trialnum, rt =  line.split(',')
        rt = float(rt)
        rts.append(rt)
        
        
rts = np.array(rts)

rt_mean = rts.mean()
rt_std = rts.std()
    
rt_ntrials = len(rts)

#print in correct format 
print("RT Mean : {:.3f}seconds".format(rt_mean))
print("RT Std : {:.3f}seconds".format(rt_std))
print("Num RTs : {}".format(rt_ntrials))

the data file:

the output:

jonathan.kominsky · April 1, 2019, 3:47pm

There’s a bunch of stuff here.

Don’t call readlines twice. The first call reads the whole file and leaves the file position at the end, so the second call returns an empty list and the for loop never runs. I think you’re trying to skip the header line, but you’re actually skipping the whole file! Instead, store the result of a single readlines call in a list and iterate over that, starting from the second element.
You’re almost certainly going to get a bunch of ValueErrors once you fix that. ‘split’ just breaks the line into a list of elements, so trying to unpack 7 columns into ‘trialnum, rt’ isn’t going to work. Instead, take the rt from the second-to-last index of each line’s list.
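
Both points are easy to check on a small in-memory stand-in for one data file. This is only a sketch: the header names and values below are made up for illustration, and the only things taken from the thread are that each row has 7 columns and that the rt is the second-to-last one.

```python
import io

# Stand-in for one data file; the header and values are invented.
fake_file = io.StringIO(
    "trial,word,colour,congruent,response,rt,correct\n"
    "1,red,red,1,r,0.512,1\n"
    "2,blue,red,0,b,0.734,0\n"
)

first = fake_file.readlines()   # reads every line and leaves the position at the end
second = fake_file.readlines()  # nothing left to read, so this is an empty list

fields = first[1].strip().split(',')  # one data row broken into its 7 columns
rt = float(fields[-2])                # the second-to-last column holds the rt

print(len(first), len(second), len(fields), rt)
```

Unpacking `fields` into just two names would raise a ValueError, because there are seven values and only two targets.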

The fixed code will look something like this

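for file in sorted(os.listdir(path)):
    print(file)
    # 'with' closes the file once all the lines have been read
    with open(os.path.join(path, file), 'r') as f:
        lines = f.readlines()
    for i in range(1, len(lines)):  # starting with 1 skips the header line.
        columns = lines[i].strip().split(',')  # break the line into its columns
        rt = float(columns[-2])  # rt is the second-to-last column
        rts.append(rt)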

etc.
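
For the congruent versus incongruent comparison asked about in the original post, one possible approach is to group the trials by condition before computing anything. The sketch below is built on assumptions: the column names 'congruent', 'rt' and 'correct' are hypothetical (the actual data file is not shown in this thread), and `summarise` is an illustrative helper, not part of any library. It uses csv.DictReader, which treats the first line as the header, so no manual skipping is needed.

```python
import csv
import io
import statistics

def summarise(csv_text):
    """Return {condition: (mean_rt, sd_rt, percent_correct)} for one file's text."""
    by_condition = {}
    # DictReader uses the header line as dictionary keys for every row.
    for row in csv.DictReader(io.StringIO(csv_text)):
        # 'congruent', 'rt' and 'correct' are assumed column names.
        trial = (float(row['rt']), int(row['correct']))
        by_condition.setdefault(row['congruent'], []).append(trial)

    summary = {}
    for condition, trials in by_condition.items():
        rts = [rt for rt, _ in trials]
        n_correct = sum(correct for _, correct in trials)
        summary[condition] = (
            statistics.mean(rts),
            statistics.stdev(rts),             # sample SD (divides by n - 1)
            100.0 * n_correct / len(trials),   # percentage of correct trials
        )
    return summary

# Made-up example data, two trials per condition.
example = (
    "trial,congruent,rt,correct\n"
    "1,1,0.50,1\n"
    "2,1,0.60,1\n"
    "3,0,0.70,1\n"
    "4,0,0.90,0\n"
)

for condition, (mean_rt, sd_rt, pct) in summarise(example).items():
    print("congruent={}: mean {:.3f} s, SD {:.3f} s, {:.0f}% correct".format(
        condition, mean_rt, sd_rt, pct))
```

To use this on real files, the text of each file could be passed in with `summarise(open(os.path.join(path, file)).read())`. Note that `statistics.stdev` gives the sample standard deviation, whereas NumPy's `std()` as called in the original code defaults to the population standard deviation, so the two will differ slightly.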
