Description of the problem:
Is there a way to remove columns from the ‘.csv’ results files while they are generated on Pavlovia? Specifically, I am looking into getting rid of the columns ending in ‘.thisRepN’, ‘.thisTrialN’, ‘.thisN’, ‘.thisIndex’ and ‘.ran’. Example of columns below:
You really don’t want to do this. What if a reviewer asks you about learning effects, either within or across blocks, and you can’t answer that, because you threw away all the trial numbers?
Removing columns from consideration is something to do at the analysis stage, not when running the experiment: you never know what you might need. There are also all sorts of columns related to timing performance, which you might not need for analysis but which could be invaluable if questions arise about system performance. Dropping columns is trivial if you are scripting your analysis (which hopefully you are). Having lots of (currently) unneeded columns is only a problem if you're analysing data manually, which should be avoided for all sorts of reasons.
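To illustrate how trivial this is when scripting: a sketch in Python/pandas, using a made-up DataFrame in place of a real results file (the column names mirror the loop-bookkeeping suffixes asked about above):

```python
import pandas as pd

# Made-up results data standing in for a real Pavlovia .csv
df = pd.DataFrame({
    "participant": [1, 1],
    "trials.thisRepN": [0, 0],
    "trials.thisN": [0, 1],
    "key_resp.rt": [0.41, 0.52],
})

# Drop every column ending in one of the loop-bookkeeping suffixes
suffixes = (".thisRepN", ".thisTrialN", ".thisN", ".thisIndex", ".ran")
df = df[[c for c in df.columns if not c.endswith(suffixes)]]

print(list(df.columns))  # ['participant', 'key_resp.rt']
```

So the full data can stay on disk untouched, and the unwanted columns simply never enter the analysis.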
You can filter to the necessary columns using the Python code below:
import pandas as pd

df = pd.read_csv("filename.csv", encoding="utf-8")  # read the file
df2 = df.loc[46:85, ["Column1", "Column2"]]  # keep only the necessary columns between certain rows (e.g. 46:85)
# defining df2 keeps the original DataFrame intact
df2.to_csv("1_filename.csv", index=False)  # save to a new file so the original is preserved; do not forget to change the file name
Personally I use R (and in particular the tidyverse packages) to do this sort of thing. I would never manually bind files together.
I would loop over all of the .csv files and read each in with the readr::read_csv() function. Do this from the start so you can trial your analysis before the data collection is complete. Each time you run the script, it will automatically include the latest data.
The dplyr::bind_rows() function will join the data frame from each csv to an overall data frame, doing so by the names of the columns rather than fixed positions. If one csv has more or fewer columns than the others, that gets handled smartly as well.
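For anyone preferring Python, a rough pandas equivalent of that read-and-bind loop might look like this (a sketch; the folder path and file pattern are assumptions, and pd.concat likewise aligns by column name, filling any missing columns with NaN):

```python
import glob
import pandas as pd

def combine_results(folder):
    """Read every .csv in `folder` and stack them by column name."""
    files = sorted(glob.glob(f"{folder}/*.csv"))
    frames = [pd.read_csv(f) for f in files]
    # concat aligns on column names rather than positions; files with
    # extra or missing columns are handled, with gaps filled as NaN
    return pd.concat(frames, ignore_index=True)
```

Re-running the script after each session then automatically picks up whatever files have arrived so far, just as with the R workflow.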
The dplyr and tidyr packages will do everything that pivot tables can do and a huge amount more, and will put your data in the right shape for whatever statistical models you want to run, or to directly visualise with ggplot2. All this with human-readable code that is reproducible and able to be reviewed or critiqued.
And the RStudio ecosystem also provides for doing all of this within a document-centric approach, so that your code can be within the actual manuscript you use for producing a journal submission. Here is a real example:
I once used the same sort of Excel-based pathway you describe, but would never go back to it - there are just far too many advantages to scripting the data processing, analysis, and visualisation.
I just tried the RBDMerge add-in for Excel, but I found that it ignored the headers and lined the data up by column position: when I tested with a blank Column B, I still ended up with misaligned columns, just as I would with Terminal.
So the database is for Pavlovia, not for (desktop) PsychoPy. I haven't needed to use Pavlovia yet, but I will look out for this. Cheers!
You might be interested in the code I am currently working on to summarise data files.
I started writing it in February 2020 and then got unexpectedly distracted by the need to run experiments online.
It still has a way to go, and I might add an option to save long data, but usually what I want is the proportion correct (mean key_resp.corr) and the median reaction time for correct responses only (median key_resp.rt) split by up to three repeated measures.
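As a rough illustration of that kind of summary (this is a hand-rolled pandas sketch, not the tool itself; the key_resp.corr and key_resp.rt column names come from PsychoPy's defaults, and `condition` is a hypothetical repeated measure):

```python
import pandas as pd

# Hypothetical trial-level data with one repeated measure
trials = pd.DataFrame({
    "condition": ["A", "A", "B", "B"],
    "key_resp.corr": [1, 0, 1, 1],
    "key_resp.rt": [0.40, 0.90, 0.55, 0.65],
})

# Proportion correct per condition (mean of the 0/1 correctness column)
accuracy = trials.groupby("condition")["key_resp.corr"].mean()

# Median reaction time per condition, for correct responses only
correct = trials[trials["key_resp.corr"] == 1]
median_rt = correct.groupby("condition")["key_resp.rt"].median()
```

Extending the groupby to two or three factors is just a matter of passing a list of column names.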
I’d be happy to have a few people test it out to see how well it works for their data.