URGENT HELP! Various dataAccessUtil functions cannot extract data from .hdf5 file

I’ve decided to port my code from Python 2.7 to 3.5, and I’ve run into the following issue when trying to extract eye-tracking and behavioural data from an .hdf5 file.

Just to highlight, all my code previously worked fine on Python 2.7, as well as when run on the standalone 1.85.4 version.

My data extraction script now runs without any error message, but the code below returns an empty list for both the behavioural exp_data and blink_event_data. Note that the dataAccessUtil object at least seems to be created correctly, as printing it gives

<psychopy.iohub.datastore.util.ExperimentDataAccessUtility at 0x25c463b44c0>

The test variable is a literal test to see what else is working. Here, it printed out all variables within the header correctly, based on the .xls file that was initially used to run the trial_handler during the actual experiment.

dataAccessUtil=ExperimentDataAccessUtility('C:\\Users\\angjw\\NUS PVT\\hdf5', filenameHDF[0], experimentCode=None,sessionCodes=[])   

test = dataAccessUtil.getConditionVariableNames()

exp_data = dataAccessUtil.getConditionVariables()

blink_type = constants.EventConstants.BLINK_START
blink_event_data=dataAccessUtil.getEventAttributeValues(blink_type, retrieve_attributes2,
                conditionVariablesFilter=None, startConditions={'time':('>=','@TRIAL_START@')},

Will anyone be able to help me out right here? @sol Maybe?

Hi @angjw.aaron,

There are issues in some of the ExperimentDataAccessUtility methods in the current release of psychopy caused by python 2 to 3 differences that were not fixed properly until now. I’ve been working on some updates to psychopy.iohub.datastore.util in this branch, including what should be a fix to the issue you are running into. This branch is not currently usable for production work though, so I do not suggest using it directly.

If you can wait until this fix is officially released (in early 2022), that would be easiest, but I understand if that is not ideal.

If you are running psychopy in your own python environment, let me know and we should be able to apply the fixes I made to psychopy.iohub.datastore.util to your copy of the current release branch source, so you can use it until the official release.

If possible, could you PM me an example hdf5 file and your full analysis script so I can make sure the fixes I’ve already made are enough?

Thank you

Hi @sol ,

Thank you so much! I’m currently using the standalone v2021.2.3 on Windows. I do my coding in a fresh conda environment in Spyder, and import the psychopy modules I need by placing the following line at the start of the script, which gives me access to them without messing with any paths:

sys.path.append("C:\\Program Files\\PsychoPy\\Lib\\site-packages")

So I could either install a dev copy (if there is one) or simply paste the newer (in-work) iohub folder over the existing one, assuming there are no dependency changes. If anything goes wrong, I can just reinstall the current stable version to undo any damage.

Attached is a minimum working example of a new analysis script that is currently at work. For now, all I need is just for the Exp Data and Blink Event Data to work. The codes after dataAccessUtil.close() are not that important.
PVT analysis_simplified.py (4.0 KB)

The example data file can be downloaded from this link, which is available for 30 days from this post.

Looks like I have some more work to do. Here is what the branch I mentioned above outputs, which still seems to be missing the conditions table rows as well as the blink start events that are clearly in the hdf5 file.

Behavioural Variable Names: ['EXPERIMENT_ID', 'SESSION_ID', 'TRIAL_END', 'TRIAL_START', 'RESP_TIME', 'TARG_TIME', 'ITI', 'trial_id', 'session_id']
Behavioural Variable Table: /data_collection/condition_variables/EXP_CV_1 (Table(56,)) 'Condition Variable Values for Experiment ID 1'
Exp Data: []
Blink Type Constant: 57
Blink Event Data: []

I’ll look at this today / tomorrow and get back to you. Thanks for posting the issue, example script and hdf5 file.

@angjw.aaron, what version of Python did you use to create the example data file you sent? It seems like the issue with the current code might be with reading that file: when I test with a file created using recent versions, even your analysis script runs as expected. So I’m wondering if it was created using Python 2.7 and maybe cannot be read by recent / Python 3 versions of pytables or something.

Thank you

Hi again @sol ,

Yes, indeed, the data file was obtained from v1.85.4 as the PC that is running the study is a little old and had issues running the newer psychopy versions, thus I had to stick to py2.7 for the duration of this current study.

I suspect it has something to do with a unicode issue: when I tried to extract the data using the h5py module, some of the string data came with a b' in front (e.g. b' 90.3566). The values with the b' prefix were variables that were manually appended during each trial using the self.hub.addRowToConditionVariableTable(trial.values()) method, while the rest were values added by the eyetracker itself.

Is there going to be an issue with backward compatibility here?

Seems like there is right now, but I should be able to figure it out / work around it. I did have to add logic to handle some str / bytes issues because of python 2 to python 3 changes and how pytables handles them. Maybe that fixed the issue for pytables hdf5 files created using python 3 but not python 2…

The current code is still able to read the conditions table column labels, but is failing when getting rows for some reason, so I’m hoping it won’t be a big issue to fix once I have time to dig into it a bit.

thanks again

The issue is that the condition variables column called ‘session_id’ was being saved as a string in the script you were using, instead of an int. This probably came from a bug in one of the original demos or something. There is also a SESSION_ID column in the table that was, and still is, being saved as an int. Updating the ExperimentDataAccessUtility to use SESSION_ID instead of session_id fixes the issue in your file and still works with current files.
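
A rough pure-Python illustration of why the string column matched nothing while the int column works. This is only a sketch of the type mismatch, not the actual pytables query the utility runs; the Row tuple and its values are made up for the example.

```python
from collections import namedtuple

# Hypothetical stand-in for rows of the condition variables table; the real
# rows are read from the hdf5 file by pytables.
Row = namedtuple("Row", ["SESSION_ID", "session_id", "trial_id"])

rows = [
    Row(SESSION_ID=1, session_id=b"1", trial_id=b"1"),
    Row(SESSION_ID=1, session_id=b"1", trial_id=b"2"),
]

session_ids = [1]  # session ids are integers

# Filtering on the bytes column matches nothing: b"1" == 1 is False in Python 3.
by_lowercase = [r for r in rows if r.session_id in session_ids]

# Filtering on the integer column matches both rows.
by_uppercase = [r for r in rows if r.SESSION_ID in session_ids]

print(len(by_lowercase), len(by_uppercase))  # prints "0 2"
```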

So with the fix in place, your analysis script now outputs (truncated):

Behavioural Variable Names: ['EXPERIMENT_ID', 'SESSION_ID', 'TRIAL_END', 'TRIAL_START', 'RESP_TIME', 'TARG_TIME', 'ITI', 'trial_id', 'session_id']
Behavioural Variable Table: /data_collection/condition_variables/EXP_CV_1 (Table(56,)) 'Condition Variable Values for Experiment ID 1'
Exp Data: [ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=85.35438040015288, TRIAL_START=74.77161050005816, RESP_TIME=b'75.3549271', TARG_TIME=b'76.5554078', ITI=b'0', trial_id=b'1', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=95.3867683999706, TRIAL_START=85.37107470002957, RESP_TIME=b'93.5035456', TARG_TIME=b'92.8703266', ITI=b'0', trial_id=b'2', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=105.41895349998958, TRIAL_START=95.4033731999807, RESP_TIME=b'98.3197487001', TARG_TIME=b'97.7031273001', ITI=b'0', trial_id=b'3', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=115.45128779998049, TRIAL_START=105.43570050015114, RESP_TIME=b'111.585031', TARG_TIME=b'111.1184816', ITI=b'0', trial_id=b'4', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=125.48298420011997, TRIAL_START=115.46793909999542, RESP_TIME=b'120.417547', TARG_TIME=b'119.8341947', ITI=b'0', trial_id=b'5', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=135.5159052000381, TRIAL_START=125.49971579993144, RESP_TIME=b'131.582425', TARG_TIME=b'130.8824773', ITI=b'0', trial_id=b'6', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=145.54821670008823, TRIAL_START=135.53254869999364, RESP_TIME=b'140.5154831', TARG_TIME=b'140.0487702', ITI=b'0', trial_id=b'7', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=155.58059570007026, TRIAL_START=145.5648811000865, RESP_TIME=b'153.4474091', TARG_TIME=b'152.9808158', ITI=b'0', trial_id=b'8', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=165.61281049996614, TRIAL_START=155.59723680000752, RESP_TIME=b'162.9297465', TARG_TIME=b'162.2797768', ITI=b'0', trial_id=b'9', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=175.64453710010275, TRIAL_START=165.62945979996584, 
RESP_TIME=b'171.3790141', TARG_TIME=b'170.8123153', ITI=b'0', trial_id=b'10', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=185.67740079993382, TRIAL_START=175.6611904001329, RESP_TIME=b'181.4439146', TARG_TIME=b'180.9106791', ITI=b'0', trial_id=b'11', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=195.709766499931, TRIAL_START=185.69408629997633, RESP_TIME=b'188.9270987', TARG_TIME=b'188.2271179', ITI=b'0', trial_id=b'12', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=205.74369539995678, TRIAL_START=195.72641130001284, RESP_TIME=b'198.9760108', TARG_TIME=b'198.4927401', ITI=b'0', trial_id=b'13', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=215.77610230003484, TRIAL_START=205.76035460014828, RESP_TIME=b'213.4429331', TARG_TIME=b'212.9263373', ITI=b'0', trial_id=b'14', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=225.8060513001401, TRIAL_START=215.79271790012717, RESP_TIME=b'221.175453', TARG_TIME=b'220.642172', ITI=b'0', trial_id=b'15', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=235.8389131000731, TRIAL_START=225.82270120014437, RESP_TIME=b'230.9554912', TARG_TIME=b'230.4388423', ITI=b'0', trial_id=b'16', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=245.87292570015416, TRIAL_START=235.8555948000867, RESP_TIME=b'238.755299', TARG_TIME=b'238.05537', ITI=b'0', trial_id=b'17', session_id=b'1'), ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=255.9052190000657, TRIAL_START=245.88958590012044, RESP_TIME=b'253.3555604', TARG_TIME=b'252.8388596', ITI=b'0', trial_id=b'18', session_id=b'1'), 
Blink Type Constant: 57
Blink Event Data: [EventAttributeResults(eye=array([22, 22, 22, 22, 22], dtype=uint8), status=array([0, 0, 0, 0, 0], dtype=uint8), logged_time=array([76.49246, 78.35361, 80.59253, 82.5226 , 84.4107 ], dtype=float32), query_string='( experiment_id == 1 ) & ( session_id == 1 ) & ( type == 57 ) & ( ( time >= 74.77161050005816 ) )  & ( ( time <= 85.35438040015288 ) ) ', condition_set=ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=85.35438040015288, TRIAL_START=74.77161050005816, RESP_TIME=b'75.3549271', TARG_TIME=b'76.5554078', ITI=b'0', trial_id=b'1', session_id=b'1')), EventAttributeResults(eye=array([22, 22, 22, 22, 22], dtype=uint8), status=array([0, 0, 0, 0, 0], dtype=uint8), logged_time=array([86.8635 , 89.10044, 91.30144, 93.18738, 95.06159], dtype=float32), query_string='( experiment_id == 1 ) & ( session_id == 1 ) & ( type == 57 ) & ( ( time >= 85.37107470002957 ) )  & ( ( time <= 95.3867683999706 ) ) ', condition_set=ConditionSetInstance(EXPERIMENT_ID=1, SESSION_ID=1, TRIAL_END=95.3867683999706, TRIAL_START=85.37107470002957, RESP_TIME=b'93.5035456', TARG_TIME=b'92.8703266', ITI=b'0', trial_id=b'2', session_id=b'1')), 

Thanks again for reporting the issue.

Hi @sol ,

Thanks! Can I check with you what went wrong here? I replaced every single session_id with the caps SESSION_ID in the util.py script but I’m getting this:

getConditionVariables was accessing the MetaData rather than the behavioural data. Under MetaData, there isn’t a SESSION_ID variable (refer to the SessionMetaDataInstance output on the right).

As for the ExperimentDataAccessUtility function itself, there wasn’t any session_id variable within the function itself for me to change.

Update: Okay, I’ve managed to fix it. Only three occurrences of session_id needed to be updated to SESSION_ID:

filter = dict(SESSION_ID=(' in ', session_ids))

and two occurrences of cv.session_id later in the script, which need to be changed to cv.SESSION_ID.

The script now manages to read the data! The only remaining issue is that some of the values are read as bytes:

TypeError: unsupported operand type(s) for -: 'bytes' and 'bytes'

I guess I can circumvent this on my side for now by manually recoding the variables. I noticed that your output above still has a few variables in bytes format (i.e. with a b' prefix). Will that also be addressed in subsequent patches?
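
A minimal sketch of that manual recoding on the analysis side, in case it is useful to others. It assumes the rows behave like namedtuples, as the ConditionSetInstance output above suggests; the namedtuple defined here is only a stand-in with a subset of the fields, and to_number is a hypothetical helper, not part of iohub.

```python
from collections import namedtuple

# Stand-in for the ConditionSetInstance rows returned by
# getConditionVariables(); values copied from the first row of the output.
ConditionSetInstance = namedtuple(
    "ConditionSetInstance",
    ["TRIAL_START", "TRIAL_END", "RESP_TIME", "TARG_TIME", "trial_id"],
)


def to_number(value):
    """Decode bytes, then convert numeric text to int or float.

    Values that are already numbers, or text that is not numeric,
    are returned unchanged (apart from the bytes -> str decoding).
    """
    if isinstance(value, bytes):
        value = value.decode("utf-8")
    if isinstance(value, str):
        for cast in (int, float):
            try:
                return cast(value)
            except ValueError:
                continue
    return value


row = ConditionSetInstance(
    TRIAL_START=74.77161050005816,
    TRIAL_END=85.35438040015288,
    RESP_TIME=b"75.3549271",
    TARG_TIME=b"76.5554078",
    trial_id=b"1",
)

# Rebuild the row with every field passed through to_number.
clean = ConditionSetInstance(*(to_number(v) for v in row))

# Subtraction now works because both fields are floats.
diff = clean.RESP_TIME - clean.TARG_TIME
print(clean.trial_id, diff)
```

With the fields converted, the subtraction that raised the TypeError above runs on floats instead of bytes.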