Ways to get rid of columns from Pavlovia '.csv' results?

vuandre1 · June 21, 2020, 7:24pm

URL of experiment: Pavlovia

Description of the problem:
Is there a way to remove columns from the ‘.csv’ results files while they are generated on Pavlovia? Specifically, I am looking into getting rid of the columns ending in ‘.thisRepN’, ‘.thisTrialN’, ‘.thisN’, ‘.thisIndex’ and ‘.ran’. Example of columns below:

It’s not much of a problem but the other members of my lab would like to know if they can be removed. So far, there doesn’t seem to be anything in PsychoPy Builder that can remove them beforehand, nor does there seem to be anything in the JavaScript code that would hint at any method of removal (or even creation). The software platform is PsychoJS and the platform version is 2020.1.

Any help is appreciated.

SaraCal · September 9, 2020, 2:07pm

Hi,

did you find a solution to this “issue”? I’d be interested in doing the same.

Many thanks!

Michael · September 9, 2020, 9:14pm

You really don’t want to do this. What if a reviewer asks you about learning effects, either within or across blocks, and you can’t answer that, because you threw away all the trial numbers?

Removing columns from consideration is something to do at the analysis stage, not when running the experiment. You never know what you might need. There are also all sorts of columns related to timing performance, which you might not need for analysis but could be invaluable if there are questions about the system performance. Dropping columns is trivial if you are scripting your analysis (which hopefully you are). Having lots of (currently) unneeded columns is only problematic if you’re analysing data manually, which should be avoided for all sorts of reasons.
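
For readers who script in Python, here is a minimal sketch of dropping those columns at the analysis stage with pandas. The suffix list comes from the original question; the function name `drop_loop_columns` and the small example frame are made up for illustration.

```python
import pandas as pd

# Suffixes of the loop bookkeeping columns named in the original question.
LOOP_SUFFIXES = (".thisRepN", ".thisTrialN", ".thisN", ".thisIndex", ".ran")


def drop_loop_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df without columns whose names end in a loop suffix."""
    keep = [col for col in df.columns if not str(col).endswith(LOOP_SUFFIXES)]
    return df[keep].copy()


# Tiny made-up frame standing in for one results file (hypothetical data).
example = pd.DataFrame({
    "participant": ["p01", "p01"],
    "trials.thisN": [0, 1],
    "trials.ran": [1, 1],
    "key_resp.rt": [0.52, 0.61],
})
cleaned = drop_loop_columns(example)
print(list(cleaned.columns))  # ['participant', 'key_resp.rt']
```

The raw file on disk is left alone, so the trial numbers are still there if a reviewer asks for them later.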

JasperRobinson · April 2, 2022, 9:35am

Hi @Michael — Would you mind sharing details on how you go about scripting your data wrangling?

At the moment, my workflow is:

1. Concatenate all of the individual files into a single .csv file (Terminal on a Mac).
2. Either in BBEdit or Excel/Numbers, line up columns and remove clutter.
3. Create a Pivot Table in Excel/Numbers to create a version that I can run analyses on.

The problem that I have is that the columns never quite line up and the concatenated file is too big to easily see where columns zig-zag.

As far as I know there is no modifier (e.g., -l type of thing) in the concatenate command to line up columns (comma-separated text) by their headers.

TIA, Jasper

wakecarter · April 2, 2022, 3:26pm

One of the advantages of using database saving is that the columns get lined up automatically.

I have some slides of my method here.

I don’t need to use RBDMerge any more and I create my pivot tables in SPSS (even though I now use jamovi for analysis).

B.D.5 · April 2, 2022, 9:06pm

Hi,

You can keep only the columns you need with the Python code below:

import pandas as pd

df = pd.read_csv("filename.csv", encoding="utf-8")  # read the file
# keep only the needed columns for a range of rows (.loc is label-based, so rows 46 to 85 inclusive)
df2 = df.loc[46:85, ["Column1", "Column2"]]
# assigning to df2 leaves the original df untouched
df2.to_csv("1_filename.csv", index=False)  # save under a new name so the original file is not overwritten

Bests.

Michael · April 3, 2022, 10:00pm

Personally I use R (and in particular the tidyverse packages) to do this sort of thing. I would never manually bind files together.

I would loop over all of the .csv files and read each in with the readr::read_csv() function. Do this from the start, so that you can be trialling your analysis before the data collection is complete. Each time you run the script, it will just automatically include the latest data.

The dplyr::bind_rows() function will join the data frame from each csv to an overall data frame, doing so by the names of the columns rather than fixed positions. If one csv has more or fewer columns than the others, that gets handled smartly as well.
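
The same idea can be sketched in Python with pandas, for anyone not using R. This is an assumed equivalent rather than the workflow described above: `pd.concat` also matches columns by name, and fills columns missing from a file with NaN. The function name `combine_results`, the `source_file` column and the two demo files are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def combine_results(data_dir: str) -> pd.DataFrame:
    """Read every .csv in data_dir and stack them, matching columns by name."""
    frames = []
    for path in sorted(Path(data_dir).glob("*.csv")):
        frame = pd.read_csv(path)
        frame["source_file"] = path.name  # remember which file each row came from
        frames.append(frame)
    if not frames:
        raise FileNotFoundError(f"no .csv files found in {data_dir}")
    # concat aligns on column names; columns absent from a file become NaN.
    return pd.concat(frames, ignore_index=True, sort=False)


# Demo with two made-up files whose columns differ in order and number.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "p01.csv").write_text(
        "participant,key_resp.rt,key_resp.corr\np01,0.52,1\n"
    )
    Path(tmp, "p02.csv").write_text("key_resp.corr,participant\n0,p02\n")
    combined = combine_results(tmp)

print(combined)
```

In the demo, the second file has no key_resp.rt column and lists its columns in a different order, yet each value still ends up under the right header.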

The dplyr and tidyr packages will do everything that pivot tables can do and a huge amount more, and will put your data in the right shape for whatever statistical models you want to run, or to directly visualise with ggplot2. All this with human-readable code that is reproducible and able to be reviewed or critiqued.

And the RStudio ecosystem also provides for doing all of this within a document-centric approach, so that your code can be within the actual manuscript you use for producing a journal submission. Here is a real example:

github.com

nzbri/wtar-trajectory/blob/main/WTAR-curve-modelling.Rmd

---
title: "Wechsler Test of Adult Reading in Parkinson’s: a stable yet imperfect measure of premorbid cognitive function"
author:
- name: Kyla-Louise Horne, PhD
  affiliation: '1'
  corresponding: yes
  address: 66 Stewart St, Christchurch 8011, New Zealand
  email: kyla.horne@nzbri.org
- name: Reza Shoorangiz, PhD
  affiliation: '1'
- name: Daniel J. Myall, PhD
  affiliation: '1'
- name: Toni L. Pitcher, PhD
  affiliation: '1,2'
- name: Tim J. Anderson, FRACP, MD
  affiliation: '1,2,3'
- name: John C. Dalrymple-Alford, PhD
  affiliation: '1,2,4'
- name: Michael R. MacAskill, PhD
  affiliation: '1,2'


I once used the same sort of Excel-based pathway you describe, but would never go back to it - there are just far too many advantages to scripting the data processing, analysis, and visualisation.

JasperRobinson · April 4, 2022, 3:50pm

Thanks for the tips.

I just tried the RBDMerge add-in for Excel but I found that it ignored the headers and lined up by column—so when I tested with a blank Column B, I still ended up with misaligned columns, just as I would with Terminal.

So the database is for Pavlovia, not for (desktop) PsychoPy. I haven’t needed to use Pavlovia yet, but I will look out for this. Cheers!

wakecarter · April 7, 2022, 8:50pm

You might be interested in the code I am currently working on to summarise data files.

I started writing it in February 2020 and then got unexpectedly distracted by the need to run experiments online.

It still has a way to go, and I might add an option to save long data, but usually what I want is the proportion correct (mean key_resp.corr) and the median reaction time for correct responses only (median key_resp.rt) split by up to three repeated measures.

I’d be happy to have a few people test it out to see how well it works for their data.
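
As a rough illustration of the kind of summary described above (not the code referred to in this post), the following pandas sketch computes proportion correct and the median reaction time for correct responses only, split by participant and a list of factors. The column names `key_resp.corr` and `key_resp.rt` come from the post; `participant`, the `condition` factor, the function name `summarise` and the example data are assumptions.

```python
import pandas as pd


def summarise(df: pd.DataFrame, factors: list) -> pd.DataFrame:
    """Proportion correct and median RT of correct trials, per participant and factor level."""
    keys = ["participant", *factors]
    # Mean of a 0/1 accuracy column is the proportion correct.
    prop_correct = df.groupby(keys)["key_resp.corr"].mean().rename("prop_correct")
    # Median RT is computed on correct responses only.
    correct_only = df[df["key_resp.corr"] == 1]
    median_rt = (
        correct_only.groupby(keys)["key_resp.rt"].median().rename("median_rt_correct")
    )
    # Left join keeps cells with no correct trials (their median RT is NaN).
    return prop_correct.to_frame().join(median_rt).reset_index()


# Made-up trial-level data with one hypothetical repeated-measures factor.
trials = pd.DataFrame({
    "participant": ["p01"] * 4,
    "condition": ["easy", "easy", "hard", "hard"],
    "key_resp.corr": [1, 1, 0, 1],
    "key_resp.rt": [0.50, 0.60, 0.90, 0.70],
})
summary = summarise(trials, ["condition"])
print(summary)
```

Passing up to three column names in `factors` gives the split by several repeated measures; the result is in long format, one row per participant and cell.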
