Duplicate timestamps in .hdf5 file output from Pupil Core

Dear All,

I’m using a Pupil Labs Pupil Core eye tracker. Having solved one of the problems with data not being recorded (a threshold was being applied incorrectly), I now find that the .hdf5 output file contains lots of duplicate timestamps for consecutive samples, i.e. there might be four different measurements with the same timestamp. I used Pupil Labs Service 3.5.1 to configure the “glasses” to record at 60Hz, and the time increment between samples looks plausible for that rate (see the example below).

timestamp pupil_size pupil_x pupil_y
90.0416797 24.65824 0.46113947 0.6341769
90.0416797 24.659914 0.46109676 0.6342506
90.0416797 26.743896 0.27378133 0.6250556
90.0416797 26.742664 0.2737806 0.62503195
90.0566602 24.782402 0.46105444 0.6332933
90.0566602 24.783678 0.46104982 0.6333272
90.0566602 27.196869 0.27445492 0.6252303
90.0566602 27.198088 0.27445367 0.6252611
90.0769303 24.869741 0.4606103 0.63383394
90.0769303 27.065561 0.2749664 0.62616295
90.0769303 24.869802 0.46061468 0.63383234
90.0769303 27.066133 0.27496833 0.6261733
90.0990981 27.110886 0.27536234 0.6255919
90.0990981 24.889042 0.4603511 0.6338006
90.0990981 27.112803 0.27537686 0.62561965

Here’s the code I’ve used to convert the .hdf5 to .csv (with inspiration from Becca - thank you!):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
Created on Sun Aug 11 16:22:57 2024
@author: JCWB
"""

import os
from glob import glob

import h5py
import pandas as pd

infolder = '/Users/gnx20mmu/PROJECTS/PUPILLOMETRY/Data/'

# (event group, event table) pairs to export from each ioHub .hdf5 file
tables_to_export = [('eyetracker', 'MonocularEyeSampleEvent'),
                    ('experiment', 'MessageEvent')]

for hdf5_file in glob(os.path.join(infolder, '*.hdf5')):
    participant_ID = os.path.basename(hdf5_file).split('_')[0]
    print(participant_ID)

    with h5py.File(hdf5_file, 'r') as f:
        events = f['data_collection']['events']

        for group, measure in tables_to_export:
            # show which event tables this group contains, e.g. to check
            # whether a BinocularEyeSampleEvent table is also present
            print('Tables under', group, ':', list(events[group]))
            print('Extracting events of type: ', measure)

            table = events[group][measure]
            if len(table) > 0:
                # reading the whole table returns a NumPy structured array;
                # pandas makes one column per named field
                pd_data = pd.DataFrame(table[:])
                out_csv = os.path.join(infolder, participant_ID + '_' + measure + '.csv')
                pd_data.to_csv(out_csv, index=False)
            else:
                print('No data for type', measure, ' moving on')

Any ideas?

Cheers, Jon

I have a hypothesis as to what’s going on, but I’m not sure why or how to fix it.

  1. It’s giving you separate samples for the left and right eyes but not reporting which is which.
  2. Although you told it to run at 60Hz, it’s actually running at 120Hz.

This is based on the fact that the four samples for each timestamp clearly come in two pairs: the samples within a pair are very similar to each other, but not necessarily to the other pair. For example, in the first set of four, two lines have a pupil size of 24.65x while the other two have 26.74x, and the x and y coordinates are similar within each pair in the same way. That’s the kind of difference you often see when you get samples from each eye independently.

If you go to the next set of four, you see that the first two lines are similar to the first two lines of the first set of four, while the next two are similar to the bottom two lines of the first set of four. This would also fit with updating left and right eye separately.
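The pairing can be made explicit by assigning each row to one of two streams and counting rows per timestamp and stream. In this excerpt the streams separate cleanly on pupil_x (about 0.46 vs. about 0.27), so the sketch below splits at 0.37; that threshold is an ad hoc assumption that only makes sense for these rows, and an explicit eye identifier in the file, if there is one, would be more reliable.

```python
from collections import Counter

# (timestamp, pupil_size, pupil_x) for the rows in the excerpt
rows = [
    (90.0416797, 24.65824, 0.46113947), (90.0416797, 24.659914, 0.46109676),
    (90.0416797, 26.743896, 0.27378133), (90.0416797, 26.742664, 0.2737806),
    (90.0566602, 24.782402, 0.46105444), (90.0566602, 24.783678, 0.46104982),
    (90.0566602, 27.196869, 0.27445492), (90.0566602, 27.198088, 0.27445367),
    (90.0769303, 24.869741, 0.4606103), (90.0769303, 27.065561, 0.2749664),
    (90.0769303, 24.869802, 0.46061468), (90.0769303, 27.066133, 0.27496833),
    (90.0990981, 27.110886, 0.27536234), (90.0990981, 24.889042, 0.4603511),
    (90.0990981, 27.112803, 0.27537686),
]

X_SPLIT = 0.37  # hypothetical threshold, chosen by eye for this excerpt only

def stream_of(pupil_x):
    """Label a row 'A' or 'B' by which pupil_x cluster it falls in."""
    return "A" if pupil_x > X_SPLIT else "B"

# Rows per (timestamp, stream): the hypothesis predicts 2 for each
counts = Counter((ts, stream_of(x)) for ts, _, x in rows)
for key in sorted(counts):
    print(key, counts[key])

# Pupil-size range per stream: non-overlapping ranges suggest two sources
size_range = {}
for stream in ("A", "B"):
    vals = [size for _, size, x in rows if stream_of(x) == stream]
    size_range[stream] = (min(vals), max(vals))
    print(stream, "pupil_size range:", size_range[stream])
```

Every complete timestamp gives two rows per stream, and the two streams’ pupil sizes do not overlap, which is what you would expect from two eyes reported separately.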

So my step 1 would be to switch from “MonocularEyeSampleEvent” to “BinocularEyeSampleEvent” and see what happens.

However, that would only explain getting two samples per timestamp. The fact that you get four makes me think it’s somehow recording at 120Hz while reporting 60Hz, and no, I don’t know why or how. The closeness within each pair makes it feel like they really are two samples of the same eye taken close together in time, but beyond that I have no idea.
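The arithmetic behind the 120Hz idea can be checked against the excerpt. Distinct timestamps are about 19 ms apart (roughly 52 per second); if each one carries two samples per eye, each eye is being sampled at roughly 100Hz, about twice the requested rate and nearer 120 than 60. The helper name below is hypothetical and it uses only the four timestamps shown, so treat the numbers as rough.

```python
def implied_rate_hz(rows_per_timestamp, n_eyes, mean_interval_s):
    """Samples per eye per second, assuming each timestamp bundles
    `rows_per_timestamp` rows shared evenly across `n_eyes` eyes."""
    return rows_per_timestamp / n_eyes / mean_interval_s

# Mean spacing of the distinct timestamps in the excerpt (3 gaps, ~19.1 ms)
mean_interval_s = (90.0990981 - 90.0416797) / 3

two_per_eye = implied_rate_hz(4, 2, mean_interval_s)  # the "120Hz" hypothesis
one_per_eye = implied_rate_hz(2, 2, mean_interval_s)  # one sample per eye
print(f"two samples per eye per timestamp: ~{two_per_eye:.0f} Hz")
print(f"one sample per eye per timestamp:  ~{one_per_eye:.0f} Hz")
```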

I hope that at least gives you something to start looking into, but I’m not familiar with Pupil Core and rarely use HDF5 files, so take it with a grain of salt.

Really good call. There is another field in the hdf5 file that reports which eye each sample refers to (coded 22 for right, 21 for left). I’ll try to share an example, which should simplify things…
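With that eye field exported to the CSV, the rows can be split per eye before looking at timestamps. This is a minimal sketch: the column name `eye` is an assumption (check the header of your MonocularEyeSampleEvent CSV), and only the codes mentioned here (21 left, 22 right) are mapped.

```python
from collections import defaultdict

# Eye codes as reported in this thread: 21 = left, 22 = right
EYE_LABELS = {21: "left", 22: "right"}

def split_by_eye(rows, eye_field="eye"):
    """Group sample rows (dicts, e.g. from csv.DictReader) by eye.

    `eye_field` is an assumed column name. Codes other than 21/22 are
    kept under an 'unknown(...)' key rather than dropped.
    """
    by_eye = defaultdict(list)
    for row in rows:
        raw = row[eye_field]
        # CSV values arrive as strings such as "22" or "22.0"
        code = int(float(raw))
        by_eye[EYE_LABELS.get(code, f"unknown({raw})")].append(row)
    return dict(by_eye)
```

Fed the rows from `csv.DictReader`, this should give one list per eye; if each list still shows two rows per timestamp, the eye field alone does not explain the duplication.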

Either way, it’s probably not outputting as expected: if you select pupillometry only, you should really only get one eye and no gaze data, etc. There might be an “easy” fix, as only certain rows have an associated pupil size in mm, so I could potentially filter on that to restrict the samples.
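The “easy” fix could look like the sketch below: keep only the rows whose mm pupil-size column holds a real value. The column name is passed in because it isn’t stated which exported field holds the mm value, and treating empty, NaN and 0 as “missing” is an assumption about how the export marks absent values.

```python
import math

def has_mm_pupil(row, field):
    """True if row[field] holds a usable pupil size.

    Assumption: rows without a mm value store it as empty, NaN or 0.
    """
    raw = row.get(field)
    if raw in (None, ""):
        return False
    value = float(raw)
    return not math.isnan(value) and value > 0

def keep_mm_rows(rows, field):
    """Keep only the rows that carry a pupil size in mm."""
    return [row for row in rows if has_mm_pupil(row, field)]
```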

Thanks for the suggestions.

Cheers,

Jon