Trying to extract data from dictionaries in Excel cells

I’m trying to decipher what appears to be a dictionary in a data file for a colleague.

I’ve managed to read the Excel file into a dictionary using:

import pandas as pd

wordList = pd.read_excel('DM_ALLJune2020short.xlsx', sheet_name='DM_ALLJune2020')

rows=len(wordList)

Most of the Excel file is simple, but one column has entries like this:

{"2019-01-01":{"drinks":[{"type":"Champagne","name":"Flute","size":"150ml","vol":"150","css":"champagne-flute-150","abv":"12","qty":"1"}]},"2019-01-02":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Schooner","size":"425ml","vol":"425","css":"beer-schooner-425","abv":"3.5","qty":"4"}]}}

and

{"2019-01-16":{"spend":"23"},"2019-01-15":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Schooner","size":"425ml","vol":"425","css":"beer-schooner-425","abv":"3.5","qty":"1"}]}}

In principle what I’d like to do is loop over the wordList variable and extract the date and other data so I can save it into a more usable format.

However, as soon as I try alcometer=wordList[trials.thisN][‘Alcoholbreakdown’] or alcometer=wordList[trials.thisN+1][‘Alcoholbreakdown’] (in case dictionaries start at index 1)

it gives an error

Traceback (most recent call last):
  File "C:\Program Files\PsychoPy3\lib\site-packages\pandas\core\indexes\base.py", line 3078, in get_loc
    return self._engine.get_loc(key)
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\p0071480\Documents\data\E2020\alcometer_lastrun.py", line 227, in <module>
    alcometer=wordList[trials.thisN+1]['Alcoholbreakdown']
  File "C:\Program Files\PsychoPy3\lib\site-packages\pandas\core\frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "C:\Program Files\PsychoPy3\lib\site-packages\pandas\core\frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "C:\Program Files\PsychoPy3\lib\site-packages\pandas\core\generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "C:\Program Files\PsychoPy3\lib\site-packages\pandas\core\internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "C:\Program Files\PsychoPy3\lib\site-packages\pandas\core\indexes\base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 1
##### Experiment ended. #####

I think the issue is that with a Pandas dataframe, you have to use loc (label based) or iloc (integer based) methods to index a row. E.g.

alcometer=wordList.iloc[trials.thisN][‘Alcoholbreakdown’] 

Hi David,

I’m marking .iloc as the solution because it got me miles further forward. I’ve managed to decipher around 33000 rows of 35000. The other 2000 have malformed dictionaries which I’m trying to fix. Here is my current code. Do you (or anyone else) have any suggestions? I did try checking (2*.count(’{’)+.count(’"))%4 as the expected number of quotes but it didn’t work.

thisLine=wordList.iloc[trials.thisN]
        
for Idx in thisLine.keys():
    if Idx != 'Alcoholbreakdown':
        thisExp.addData(Idx,wordList.iloc[trials.thisN][Idx])

Errors=0

if pd.isna(wordList.iloc[trials.thisN]['Alcoholbreakdown']):
    alcometer=[]
    thisExp.addData('Error','-1')
else:
    alcometerall=wordList.iloc[trials.thisN]['Alcoholbreakdown']
    if alcometerall[-1]==':':
        alcometerall+='""'
        Errors+=1
    elif alcometerall[-1]==',':
        alcometerall=alcometerall[:-1]
        Errors+=1
    while alcometerall.count('"')%2!=0:
        alcometerall+='"'
        Errors+=1
    while alcometerall.count('[')>alcometerall.count(']'):
        alcometerall+=']'
        Errors+=1
    while alcometerall.count('{')>alcometerall.count('}'):
        alcometerall+='}'
        Errors+=1
    if Errors > 0:
        Catch+=1
    thisExp.addData('Error',Errors)
    try:
        alcometer=ast.literal_eval(alcometerall)
        keynum=0
        #Is there any data at all?
        if isinstance(alcometer,dict):
        #The top level key is date.
            for Idx in alcometer.keys():
                thisExp.addData('Date_'+str(keynum),Idx)
                thisData=alcometer[Idx]
        #If second level is list
                if isinstance(thisData, list):
                    for Ilx in range(len(thisData)):
                        thisDataList=thisData[Ilx]
                        if isinstance(thisDataList, dict):
                            for Jdx in thisDataList.keys():
                                thisExp.addData(Jdx+'_'+str(keynum),thisDataList[Jdx])
                                thisDataDetails=thisDataList[Jdx]
        #If third level is list
                        elif isinstance(thisDataList, list):
                            for Jlx in range(len(thisDataList)):
                                thisDataListDetails=thisDataList[Jlx]
                                if isinstance(thisDataListDetails, dict):
                                    for Kdx in thisDataListDetails.keys():
                                        thisExp.addData(Jdx+'_'+str(keynum)+'_'+Kdx,thisDataListDetails[Kdx])
        #If second level is dict
                elif isinstance(thisData, dict):
                    for Jdx in thisData.keys():
                        thisExp.addData(Jdx+'_'+str(keynum),thisData[Jdx])
                        thisDataDetails=thisData[Jdx]
        #If third level is list
                        if isinstance(thisDataDetails, list):
                            for Jlx in range(len(thisDataDetails)):
                                thisDataDetailsList=thisDataDetails[Jlx]
                                if isinstance(thisDataDetailsList, dict):
                                    for Kdx in thisDataDetailsList.keys():
                                        thisExp.addData(Jdx+'_'+str(keynum)+'_'+Kdx,thisDataDetailsList[Kdx])
        #If third level is dict
                        elif isinstance(thisDataDetails, dict):
                            for Jdx in thisDataDetails.keys():
                                thisExp.addData(Jdx+'_'+str(keynum),thisDataDetails[Jdx])
                keynum=keynum+1
    except SyntaxError:
        Uncaught+=1
        alcometer=wordList.iloc[trials.thisN]['Alcoholbreakdown'].replace(',','|')
        thisExp.addData('Alcometer',alcometer)
    except ValueError:
        Uncaught+=1
        alcometer=wordList.iloc[trials.thisN]['Alcoholbreakdown'].replace(',','|')
        thisExp.addData('Alcometer',alcometer)

The code too 3 hours to run first time and then closer to 90 minutes in a form similar to above).

Here’s an example of an unfixed error

{"2020-01-20":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Bottle","size":"330ml","vol":"330","css":"beer-bottle-330","abv":"4","qty":"3"},{"type":"Beer\/Lager\/Ale","name":"Half pint","size":"284ml","vol":"284","css":"beer-half-pint-284","abv":"4","qty":"1"}]},"2020-01-21":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Half pint","size":"284ml","vol":"284","css":"beer-half-pint-284","abv":"4","qty":"3"}]},"2020-01-22":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Half pint","size":"284ml","vol":"284","css":"beer-half-pint-284","abv":"4","qty":"1"},{"type":"Spirits","name":"Single","size":"25ml","vol":"25","css":"spirits-single-25","abv":"37.5","qty":"1"}]},"2020-01-23":{"drinks":[{"type":"Wine","name":"Sherry glass","size":"100ml","vol":"100","css":"wine-glass-100","abv":"12","qty":"2"}]},"2020-01-24":{"drinks":[{"type":"Spirits","name":"Single","size":"25ml","vol":"25","css":"spirits-single-25","abv":"37.5","qty":"2"}]},"2020-01-25":{"drinks":[{"type":"Spirits","name":"Single","size":"25ml","vol":"25","css"

Can I use alcometerall+=’}’ etc.?

Best wishes,

Wakefield

If the dictionary is a string, then yes thats good, however adding curly braces in the example above will not work, you will have to complete the key, value pairing first and then close the dict e.g.,

 "css": None}]}}

You will have to keep track of the depth somehow, making sure the square and curly brackets are added in the correct order.

Thanks. I haven’t done a full run yet, but the following code looks like it fixed the first 10 errors.

Errors=0

if pd.isna(wordList.iloc[trials.thisN]['Alcoholbreakdown']):
    alcometer=[]
    thisExp.addData('Error','-1')
else:
    alcometerall=wordList.iloc[trials.thisN]['Alcoholbreakdown']
    if alcometerall[-1]==':':
        alcometerall+=' None}'
        Errors+=1
    elif alcometerall[-1]==',':
        alcometerall=alcometerall[:-1]
        alcometerall+='}]'
        Errors+=1
    while alcometerall.count('"')%2!=0:
        alcometerall+='"'
        Errors+=1
    if alcometerall[-1]=='"':
        if alcometerall.rfind(',')>alcometerall.rfind(':'):
            alcometerall+=': None}'
            Errors+=1
        else:
            alcometerall+='}'
            Errors+=1
    while alcometerall.count('[')>alcometerall.count(']'):
        alcometerall+=']'
        Errors+=1
    while alcometerall.count('{')>alcometerall.count('}'):
        alcometerall+='}'
        Errors+=1
    if Errors > 0:
        Catch+=1
    thisExp.addData('Error',Errors)
    try:
        alcometer=ast.literal_eval(alcometerall)

My new code reduced 2000 errors to 60. Thanks for your help

A couple of examples of unfixed error are:

{"2020-06-07":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"2"},{"type":"Spirits","name":"Large single","size":"35ml","vol":"35","css":"spirits-large-single-35","abv":"37.5","qty":"5"}]},"2020-06-05":{"drinks":[{"type":"Spirits","name":"Large single","size":"35ml","vol":"35","css":"spirits-large-single-35","abv":"37.5","qty":"2","removed":"1"}]},"2020-06-04":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"1","removed":"1"},{"type":"Spirits","name":"Large single","size":"35ml","vol":"35","css":"spirits-large-single-35","abv":"37.5","qty":"3","removed":"2"}]},"2020-06-02":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"1","removed":"1"}]},"2020-06-03":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"2","removed":"1"}]},

which my code tried to fix as:

{"2020-06-07":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"2"},{"type":"Spirits","name":"Large single","size":"35ml","vol":"35","css":"spirits-large-single-35","abv":"37.5","qty":"5"}]},"2020-06-05":{"drinks":[{"type":"Spirits","name":"Large single","size":"35ml","vol":"35","css":"spirits-large-single-35","abv":"37.5","qty":"2","removed":"1"}]},"2020-06-04":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"1","removed":"1"},{"type":"Spirits","name":"Large single","size":"35ml","vol":"35","css":"spirits-large-single-35","abv":"37.5","qty":"3","removed":"2"}]},"2020-06-02":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"1","removed":"1"}]},"2020-06-03":{"drinks":[{"type":"Beer\/Lager\/Ale","name":"Large can","size":"500ml","vol":"500","css":"beer-can-500","abv":"8","qty":"2","removed":"1"}]}}]

and

{"2020-02-23":{"drinks":[]},"2020-02-22":{"drinks":[]},"2020-02-21":{"drinks":[{"type":"Wine","name":"Glass","size":"250ml","vol":"250","css":"wine-glass-250","abv":"12","qty":"4","removed":"3"},{"type":"Wine","name":"Glass","size":"175ml","vol":"175","css":"wine-glass-175","abv":"10","qty":"1"},{"type":"Spirits","name":"Single","size":"25ml","vol":"25","css":"spirits-single-25","abv":"25","qty":"1"}]},"2020-02-20":{"drinks":[]},"2020-02-19":{"drinks":[{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},{"type":"Spirits","name":"Single","size":"30ml","vol":"30","css":"spirits-large-single-35","abv":"40","qty":"1"}]},"2020-02-18":{"drinks":[{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},

which it tried to fix as

{"2020-02-23":{"drinks":[]},"2020-02-22":{"drinks":[]},"2020-02-21":{"drinks":[{"type":"Wine","name":"Glass","size":"250ml","vol":"250","css":"wine-glass-250","abv":"12","qty":"4","removed":"3"},{"type":"Wine","name":"Glass","size":"175ml","vol":"175","css":"wine-glass-175","abv":"10","qty":"1"},{"type":"Spirits","name":"Single","size":"25ml","vol":"25","css":"spirits-single-25","abv":"25","qty":"1"}]},"2020-02-20":{"drinks":[]},"2020-02-19":{"drinks":[{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},{"type":"Spirits","name":"Single","size":"30ml","vol":"30","css":"spirits-large-single-35","abv":"40","qty":"1"}]},"2020-02-18":{"drinks":[{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"},{"type":"Spirits","name":"Double","size":"50ml","vol":"50","css":"spirits-double-50","abv":"40","qty":"1"}}]}