Generating own logfile

Hey,
I am new to PsychoPy and have just finished creating my first experiment with the Builder.
Now I have a question regarding the output file.
Right now I am getting a .log file, a .csv and a .psydat file, but I don’t know how to control or change what goes into these files. For my purpose I only need certain information in my logfile, e.g. I don’t need the PsychoPy version in every .csv file.
So I either want to delete columns of the files or I want to generate my own output file, but I have no idea how to do it.
The format (.csv, .log,…) doesn’t really matter.

I would be very thankful if somebody has a code snippet or any tips for me.

Thank you so much!

Regarding the csv files: If you don’t need the information, you could write a Python script using the pandas library to remove the columns you don’t need after the experiment. You could also do the same thing in R, if you are an R user. What program do you use for data analysis?
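For example, a minimal pandas sketch (the column names here are just placeholders; swap in the ones from your own file):

```python
import pandas as pd

# toy stand-in for one row of a PsychoPy .csv (column names are made up)
df = pd.DataFrame({
    'participant': ['01'],
    'rt': [0.532],
    'psychopyVersion': ['2021.1.2'],
})

# drop the columns you don't need; errors='ignore' skips names not present
trimmed = df.drop(columns=['psychopyVersion', 'frameRate'], errors='ignore')
print(trimmed.columns.tolist())  # ['participant', 'rt']
```

In your own script you would read the real file with pd.read_csv() first and write the result back out with to_csv().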

For the logging file: You can change the logging level in the experiment settings to a higher level to save less logging information. However, keep in mind that you might be grateful for this extra information at some later point, for reasons you cannot think of now. Since you use the csv file for your analysis, what harm is there in keeping the log file the way it is?

The psydat format serves a similar purpose as the log file if I’m correct - to help you figure out what code you ran at what point in the experiment. If you just want to reduce your data for the analysis, this is not the file you’re interested in.

Thanks a lot for your help Lukas!

We use R for data analysis, so the format of the file doesn’t really matter, but the output file should be easy to understand for the students.

You are right, I don’t need to change anything in the log file since I am using the csv file for the analysis.
I tried using the pandas library to remove the columns, but it didn’t really work. Maybe I changed it in the wrong place.
Is it right to change it in the .py file, or where do I have to change the code, given that I have been working in the Builder view the whole time?

Here is a screenshot of part of my csv file to make clear what exactly I am trying to achieve:

I only want the first columns and some that are at the end of my table. If you have another tip for me I would be very grateful.

But thank you so much already!!

Open the Coder, paste this into a new file, and save the file to your “data” folder. Then run it after you have collected your data.
It won’t overwrite your data, so there is no danger of losing anything.

import os
import glob
from pathlib import Path
import pandas as pd

# set the current working directory to the parent directory of this file
script_parent = Path(__file__).parent
os.chdir(script_parent)

# iterate over all csv files in this folder
for fn in glob.glob("*.csv"):
    print('Old file: ' + fn)
    df_old = pd.read_csv(fn)

    fn_new = os.path.splitext(fn)[0] + '_new.csv'  # append a suffix to the old file name to create a new file name
    print('New file: ' + fn_new)
    df_new = df_old[['word1', 'word2']]  # put here the column names that you want to keep
    df_new.to_csv(fn_new, index=False)

I would push back pretty strongly on the premise here, because one of the best lessons to teach your students (not just in analysing experimental data like this, but in any data science role they might end up in the future) is don’t edit the raw data.

The more positive converse of that is that all data manipulations should be done in code, so that they are documented and allow for reproducibility. This also allows them to be reversed or changed.

If you are using R, then the first steps in a data analysis pipeline might look like this:

library(dplyr)  # provides the %>% pipe

dat = readr::read_csv('your_data_file.csv') %>%
      # select only variables of interest:
      dplyr::select(-psychopy_version, some_var:some_other_var) %>%
      # drop subject who was colour-blind:
      dplyr::filter(subject_id != '014')

Anyone reading this can see that only some variables are of interest, but can also see easily how and where to reinstate others if required. They can also see that a principled decision was made to drop one subject.

Students shouldn’t be mollycoddled to work only with simple datasets. They need to learn that the first step in any real data analysis is to tidy it as needed, and just as importantly, to document those steps in their code. e.g. rather than deleting the raw data of the invalid subject at source, we explicitly drop those records and note why.

If students learn that raw data files can be edited, what is to stop them dropping cases or observations at that step, or transforming variables? As the .csv files don’t document those changes, they have become invisible, and permanent.

And this extends to variables that don’t immediately seem useful, like the PsychoPy version column. If it was indeed constant during your experiment, then it won’t play any role in your analysis, so it can simply be dropped at the analysis stage with a single function call. But it might be of use to someone else to retain it in the raw data. Let’s say you release your edited data files publicly. Someone else might find their results differ from yours. For example, PsychoPy Builder recently shifted from using the event module to record reaction times to the Keyboard class. The former would often have a 16.7 ms granularity, while the latter is sub-millisecond. Seeing that your data was recorded with a newer version of PsychoPy would help explain the discrepancy. And do you really know that the version was constant during the study? What if a helpful student or technician upgraded PsychoPy during the experiment, leading to inconsistencies in the data? This becomes impossible to check if it has been removed from the files to be analysed.

We often see people saying that they aren’t interested in all of the various trial and block numbers that PsychoPy records, as they just want to calculate within-subject totals or means. That’s all fine, until a reviewer asks if there was a learning effect going on within blocks… if those columns have been dropped from the data files, it becomes impossible to answer the question.

And if students learn that columns can be dropped from raw data files, they might also think it’s OK to transform remaining columns, to save time in the future. If someone else then analyses those files, the transformation might be applied twice. And if cases are dropped, there is no way of knowing how many, or why. All of these issues disappear if one applies a rigorous rule that all such data manipulation should happen within code, so that it is documented and can be reversed if required.

Sorry, this was a bit long, but I’ve found time and time again that it is a useful principle to use and to teach. But also, it’s fundamentally easier to have a single select() function that drops any unneeded columns, rather than go through the hassle of processing your source csv files, and potentially having to re-do that process if your needs change.

Hi Michael, thank you so much for your detailed answer!!! I really appreciate it.
You are absolutely right about the point of not editing the raw data.
I am only doing this work for another person and she told me it is very important to only have some parts of the data, but I will tell her that it is better to keep everything and that the students can drop columns in R.

Thanks a lot for taking the time and providing such a detailed explanation!!


Thank you LukasPsy, I will try that. But I am not quite sure how to save the file to my “data” folder.
Do I just add it at the end, or how do I do it?

Sorry I am just getting started with PsychoPy. Thanks a lot for taking the time to help me!

I very much agree with @Michael, which is why my code doesn’t overwrite your raw data files. It just creates new files with the suffix ‘_new’ and saves them alongside your raw data. You can then give the files with the suffix ‘_new’ to the person who is only interested in certain columns.

When you run your experiment, it saves to a folder called “data” (if you didn’t change the default settings). In this folder, you create a new ‘.py’ file into which you copy the code snippet I posted above. When your data is collected, you open this file and run it. For every ‘.csv’ file, the script will create a new ‘.csv’ file with only the columns that you specify in the line that says:

df_new = df_old[['word1', 'word2']] # put here the column names that you want to keep
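If you would rather keep the first few columns and the last few by position instead of typing every name, a variant of that line (just a sketch, with made-up column names) could be:

```python
import pandas as pd

# toy stand-in for a PsychoPy .csv with many columns in the middle
df_old = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                      columns=['a', 'b', 'mid1', 'mid2', 'y', 'z'])

# keep the first two columns and the last two, whatever their names
df_new = df_old.iloc[:, [0, 1, -2, -1]]
print(df_new.columns.tolist())  # ['a', 'b', 'y', 'z']
```

Selecting by position is convenient if every output file has the same column order, but it is more fragile than naming the columns, so double-check the result against one of your files.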