ioHub hdf5 file being overwritten - UnicodeDecodeError

Hello,

I am running into an issue with the iohub module that causes an existing .hdf5 file to be overwritten every time I start the experiment script. My guess is that the error has to do with how iohub searches for an existing .hdf5 file and tries to open it to append new data. It probably fails to open the file because of a decode error, falls back to creating a new file, and the old data is lost.

The error message is a bit cryptic: it does not include a stack trace, and the code otherwise runs fine. As a result, I’m not sure what in the code is causing it. I’m running the standalone PsychoPy package (v2022.2.4) on Windows.

Here is the message:

pygame 2.1.0 (SDL 2.0.16, Python 3.8.10)
Hello from the pygame community. https://www.pygame.org/contribute.html
1.5123     WARNING     Monitor specification not found. Creating a temporary one...
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 26: invalid start byte
displayAPI: init_net() start use_tcp_only=0
displayAPI: Connecting
CreateFile failed with 123
file size = 638614
total bytes = 638614
displayAPI: Disconnect
ioHub Server Process Completed With Code:  0
################ Experiment ended with exit code 0 [pid:13548] #################

Does anyone know what might be causing this “UnicodeDecodeError: ‘utf-8’…” issue?

That kind of error usually means that it’s trying to read a file containing a symbol that isn’t valid UTF-8, often right at the start of the file. This happens sometimes if, for example, you modify a CSV file in Excel. It adds some weird invisible symbols to the start of the file that the UTF-8 codec can’t handle, and you get this kind of thing.

I don’t work with hdf5 files usually so I don’t know how you can get that kind of thing in one of those files but it’s probably along the same lines. Has the file been opened in another program before PsychoPy was run?

Also, is the file actually being overwritten or is it just not changing the file at all? If this is happening with files that PsychoPy itself is creating then I really don’t know what’s causing it.
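
To check whether a particular file is the source of the decode error, one option is to read it as raw bytes and ask Python where UTF-8 decoding fails. This is only a minimal sketch: the helper names `first_invalid_utf8` and `check_file` are made up for illustration, and it only helps if the error comes from a file's contents rather than from something else, such as a path or device string.

```python
from pathlib import Path


def first_invalid_utf8(data: bytes):
    """Return (position, byte) of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the index of the first byte the codec could not decode
        return err.start, data[err.start]
    return None


def check_file(path):
    """Read a file as raw bytes and report where UTF-8 decoding would fail."""
    return first_invalid_utf8(Path(path).read_bytes())
```

Calling `check_file` on a conditions file returns `None` if the whole file decodes cleanly, or the offset and value of the offending byte otherwise, which can be compared with the position reported in the error message.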

Hi Jonathan,

Thanks for the feedback.

One of my experiment generation scripts creates a .csv file which contains all the condition variables for a given block of trials. I feed the filepath for that .csv file into the PsychoPy TrialHandler object. Here is a snippet of the relevant code from my script:

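import os
from datetime import datetime

from psychopy.data import TrialHandler, importConditions
from psychopy.iohub import launchHubServer

# Create PsychoPy TrialHandler for iterating through a miniblock of trials
exp_conditions = importConditions(trial_handler_fp)
trials = TrialHandler(exp_conditions, 1, extraInfo=subject_info)

# TODO: understand how to launch the ioHub condition table
# Timestamped directory so that each run gets its own data store
now = datetime.now()
data_dir_iohub = os.path.join(data_directory, "iohub files", str(now).replace(":", "-"))
io_hub = launchHubServer(
                          experiment_code="ENTER STRING HERE",
                          ...
                          datastore_name=os.path.join(data_dir_iohub, "search exp"),
                          window=win,
                          **device_config)
# Register the trial handler's condition variables with the .hdf5 data store
io_hub.createTrialHandlerRecordTable(trials)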

I suspect the issue arises with the last line of code above which initializes the trial handler record table for the .hdf5 file.

To your point, I have definitely opened the .csv files in Excel to check that everything looks good, but I did not modify anything. So maybe just opening the file in Excel causes it to add the invisible symbols to the start of the file.

The file is being overwritten. I can tell because if I run through a couple of trials, close the experiment, and check the file size (e.g., 40 kB), then restart the experiment, run a single trial, and check again, the file is smaller (e.g., 20 kB). Also, the metadata in the hdf5 file shows that only the most recent trial is saved.

My workaround at the moment is to create a directory named with the time the script was run and store the hdf5 file in it. Each run therefore gets a new data store directory, which prevents previous files from being overwritten. This works OK, but it is not optimal: to analyze the data I have to iterate through all the directories via glob.glob and run a post-processing script on each hdf5 file separately. It gets the job done, but is not very Pythonic.
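
For reference, the collection step described above might look roughly like this. It is a sketch under assumptions: `find_hdf5_files` is a hypothetical helper name, and the per-file post-processing is left out because it is not shown in this thread.

```python
import glob
import os


def find_hdf5_files(root_dir):
    """Return every .hdf5 file at or below root_dir, sorted by path."""
    # "**" with recursive=True matches any depth of subdirectories,
    # so one call covers all of the timestamped run folders.
    pattern = os.path.join(root_dir, "**", "*.hdf5")
    return sorted(glob.glob(pattern, recursive=True))
```

Looping over `find_hdf5_files(os.path.join(data_directory, "iohub files"))` then visits each run's file once, in path order, which is also chronological order when the folder names begin with the timestamp.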

I see. So if the goal was to have one hdf5 file for ALL of your participants, it seems like iohub doesn’t want to do that in the first place (from the documentation I can find).

However it should at least be possible to have it save separate files in the same folder if you attach the timestamp to the filename rather than the folder path, right? Something like this:

# Put the timestamp in the file name instead of the folder name
data_filename_iohub = "search_exp_" + str(now).replace(":", "-")
io_hub = launchHubServer(
                          experiment_code="ENTER STRING HERE",
                          ...
                          datastore_name=os.path.join(data_directory, "iohub files", data_filename_iohub),
                          window=win,
                          **device_config)
io_hub.createTrialHandlerRecordTable(trials)

Unless I’m misunderstanding how datastore_name works, that should put all of the data in a single folder, but have separate files for each run. Not a perfect solution, but probably not as annoying as what you’re encountering now.
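
One detail worth noting: `str(datetime.now())` also contains a space and a fractional-seconds part with a dot, which may or may not play well with file naming. A possible variant builds the stamp with `strftime` instead. This is a sketch, not a confirmed ioHub recipe: `timestamped_datastore_name` is a hypothetical helper, and it assumes, as the snippet above does, that `datastore_name` accepts a path without a file extension.

```python
import os
from datetime import datetime


def timestamped_datastore_name(base_dir, prefix="search_exp", now=None):
    """Build a per-run data store path such as <base_dir>/search_exp_2022-11-03_14-05-09."""
    now = now or datetime.now()
    # Only digits, hyphens and underscores: no colons, spaces or dots
    stamp = now.strftime("%Y-%m-%d_%H-%M-%S")
    return os.path.join(base_dir, f"{prefix}_{stamp}")
```

The result could be passed as `datastore_name` in place of the hand-built string. Creating `base_dir` beforehand with `os.makedirs(base_dir, exist_ok=True)` is a cheap precaution in case the folder is not created automatically.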