Reading lines from multiple data files and using that data to do calculations

I am trying to sort some Stroop data from 20 participants in 20 different .csv files. I need to calculate means and standard deviations for their reaction times, as well as percentage correct, comparing the congruent and incongruent conditions. All of this needs to be put in a table. I am having some trouble with reading the files and saving the correct elements. For example, I ran the code below and it seemed fine, but my reaction time variable is not right at all. If someone could help with any of the elements I have described, I would be so grateful!

the code:

import glob
import os
import numpy as np

os.chdir('/Users/ameliashelton/Documents/Year 3 /Programming/stroop/data')
path = '/Users/ameliashelton/Documents/Year 3 /Programming/stroop/data'

rts = []

for file in sorted(os.listdir(path)):
    print (file)
    f = open (file, 'r')
    for line in f.readlines():
        trialnum, rt =  line.split(',')
        rt = float(rt)
rts = np.array(rts)

rt_mean = rts.mean()
rt_std = rts.std()
rt_ntrials = len(rts)

#print in correct format 
print("RT Mean : {:.3f}seconds".format(rt_mean))
print("RT Std : {:.3f}seconds".format(rt_std))
print("Num RTs : {}".format(rt_ntrials))    

the data file:


the output:

There’s a bunch of stuff here.

  1. Don’t call readlines twice. The first call reads to the end of the file and leaves the read position there, so a second call returns an empty list and the for loop has nothing to pick up. I think you’re trying to skip the header line, but you’re actually skipping the whole file! What you might want to do instead is read the file into a list of lines once, and then iterate through that list.

  2. You’re almost certainly going to get a bunch of ValueErrors once you fix that. ‘split’ just breaks the line into a list of elements, so trying to unpack 7 columns into ‘trialnum, rt’ isn’t going to work. Instead, you need to pick out the rt as the second-to-last index in each line’s list.
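To see both points in isolation, here is a small sketch that uses an in-memory string instead of one of your files. The seven column names are made up for illustration (I don't know your real header); the only thing taken from your data is that the rt sits in the second-to-last column.

```python
import io

# Stand-in for one open data file. The seven column names are invented for
# illustration; only "rt is second-to-last" comes from the discussion above.
f = io.StringIO(
    "trial,word,colour,condition,response,rt,correct\n"
    "1,red,red,congruent,r,0.512,1\n"
)

first = f.readlines()   # reads every line and leaves the position at end-of-file
second = f.readlines()  # nothing left to read, so this is an empty list

fields = first[1].strip().split(',')  # a list with one string per column

try:
    trialnum, rt = fields  # seven values cannot be unpacked into two names
except ValueError as err:
    unpack_error = str(err)

rt = float(fields[-2])  # index -2 picks the second-to-last column
```

Here ‘first’ holds the header plus one data line, ‘second’ is empty, and the unpacking attempt raises a ValueError before the last line pulls the rt out by index.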

The fixed code will look something like this:

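for file in sorted(os.listdir(path)):
    if not file.endswith('.csv'):  # skip anything that isn't a data file, e.g. .DS_Store
        continue
    print(file)
    with open(os.path.join(path, file), 'r') as f:
        lines = f.readlines()  # call readlines() once and keep the result
    for i in range(1, len(lines)):  # starting with 1 skips the header line
        fields = lines[i].strip().split(',')  # one string per column
        rt = float(fields[-2])  # rt is the second-to-last column
        rts.append(rt)  # collect the value so rts isn't left empty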
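
For the congruent vs. incongruent part of the question, something along these lines might work as a starting point. It is only a sketch: the function name ‘summarise_stroop’ and the column names ‘condition’, ‘rt’ and ‘correct’ are assumptions (I can't see your header), and it assumes ‘correct’ is coded as 0/1. It pools trials across all participants; if you need per-participant values you would group by file as well.

```python
import csv
import glob
import os
import statistics


def summarise_stroop(data_dir, cond_col="condition", rt_col="rt", correct_col="correct"):
    """Pool every CSV in data_dir and summarise RT and accuracy per condition.

    The default column names are hypothetical; pass the names from your header.
    """
    rts = {}      # condition -> list of reaction times
    correct = {}  # condition -> list of 0/1 accuracy flags

    for filename in sorted(glob.glob(os.path.join(data_dir, "*.csv"))):
        with open(filename, newline="") as f:
            # DictReader uses the header line as keys, so no manual skipping
            for row in csv.DictReader(f):
                cond = row[cond_col]
                rts.setdefault(cond, []).append(float(row[rt_col]))
                correct.setdefault(cond, []).append(int(row[correct_col]))

    summary = {}
    for cond in rts:
        summary[cond] = {
            "mean_rt": statistics.mean(rts[cond]),
            # sample SD (n - 1); note that numpy's .std() defaults to n
            "sd_rt": statistics.stdev(rts[cond]) if len(rts[cond]) > 1 else 0.0,
            "pct_correct": 100 * sum(correct[cond]) / len(correct[cond]),
            "n": len(rts[cond]),
        }
    return summary
```

Calling ‘summarise_stroop(path)’ returns one dictionary per condition, which you can loop over and print with the same ‘{:.3f}’ formatting you already use to build the table.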