convert list of lists to dictionary - python

How can I create a list of dictionaries from these lists?
temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
The keys of each dictionary are the first elements of the lists.
The expected output is:
output= [{'header1': '4', 'media_range': '1,200', 'speed': '300'}, {'header1': '8', 'media_range': '2,400', 'speed': '600'}, ...]
Ideally the code should handle any number of lists (3 in this case).

IIUC
>>> temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
>>>
>>> keys = [l[0] for l in temp]
>>> values = [l[1:] for l in temp]
>>> dicts = [dict(zip(keys, sub)) for sub in zip(*values)]
>>>
>>> dicts
[{'header1': '4', 'media_range': '1,200', 'speed': '300'},
{'header1': '8', 'media_range': '2,400', 'speed': '600'},
{'header1': '16', 'media_range': '4,800', 'speed': '1,200'},
{'header1': '32', 'media_range': '4,800', 'speed': '2,000'},
{'header1': '64', 'media_range': '6,200', 'speed': '2,000'},
{'header1': '128', 'media_range': '38,400', 'speed': '2,000'},
{'header1': '256', 'media_range': '76,800', 'speed': '2,000'},
{'header1': '512', 'media_range': '153,600', 'speed': '2,000'},
{'header1': '243,6', 'media_range': '160,000', 'speed': '2,000'}]

Slightly shorter solution with zip and unpacking:
temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
header, *data = zip(*temp)
result = [dict(zip(header, i)) for i in data]
Output:
[{'header1': '4', 'media_range': '1,200', 'speed': '300'}, {'header1': '8', 'media_range': '2,400', 'speed': '600'}, {'header1': '16', 'media_range': '4,800', 'speed': '1,200'}, {'header1': '32', 'media_range': '4,800', 'speed': '2,000'}, {'header1': '64', 'media_range': '6,200', 'speed': '2,000'}, {'header1': '128', 'media_range': '38,400', 'speed': '2,000'}, {'header1': '256', 'media_range': '76,800', 'speed': '2,000'}, {'header1': '512', 'media_range': '153,600', 'speed': '2,000'}, {'header1': '243,6', 'media_range': '160,000', 'speed': '2,000'}]
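To see why the zip(*temp) step works: it transposes the column lists into rows, so the first row holds the headers and every later row holds one record. The sketch below wraps this in a hypothetical helper, columns_to_records (the name does not come from the answers above). As an assumption that goes beyond the question, it pads ragged columns with None via itertools.zip_longest instead of silently truncating to the shortest list, which is what plain zip would do.

```python
from itertools import zip_longest

def columns_to_records(columns, fill=None):
    """Turn [[header, v1, v2, ...], ...] into a list of dicts, one per row."""
    # Transpose: the first tuple holds the headers, the rest are data rows.
    header, *rows = zip_longest(*columns, fillvalue=fill)
    return [dict(zip(header, row)) for row in rows]

# Shortened sample in the shape of the question's data.
sample = [
    ['header1', '4', '8', '16'],
    ['media_range', '1,200', '2,400'],   # one value short on purpose
    ['speed', '300', '600', '1,200'],
]
records = columns_to_records(sample)
print(records[0])   # {'header1': '4', 'media_range': '1,200', 'speed': '300'}
print(records[-1])  # {'header1': '16', 'media_range': None, 'speed': '1,200'}
```

Because the helper unpacks with zip, it works for any number of input lists.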

You could use zip(). This requires knowing the number of lists in advance, but it produces the expected output.
output = []
for header1, media_range, speed in zip(temp[0], temp[1], temp[2]):
    # skip the first tuple, which holds the header names
    if header1 != "header1":
        output.append({temp[0][0]: header1, temp[1][0]: media_range, temp[2][0]: speed})

Related

how to compare each cell of dataframe with list of dictionary in python?

I am trying to compare the column values of each row of a dataframe with a predefined list of dictionaries and filter on the result. I tried a row-wise pandas comparison against the list of dictionaries, but it did not work and I got a type error. I think I may need to convert the dataframe into dictionaries, compare those with the list of dictionaries, and then convert back to a dataframe with the new column added, but this still does not give my desired output. Can anyone suggest a workaround? How can this be done easily in Python?
working minimal example
import pandas as pd
indf=pd.DataFrame.from_dict(indf_dict)
indf_lst=indf.to_dict(orient='records')
matches=[]
for each in rules_list:
    for row in indf_lst:
        if row in each:
            matches.append(row)
I tried a pandas approach to check the column values of every row against rules_list, but that attempt was not successful. I then converted the indf dataframe to a list of dictionaries and compared the dictionaries, but I get the following type error:
TypeError Traceback (most recent call last)
Input In [11], in <cell line: 12>()
12 for each in rules_list:
13 for row in indf_lst:
---> 14 if row in each:
15 matches.append(row)
TypeError: unhashable type: 'dict'
objective
I need to compare the columns of every row with the list of dictionaries rules_list and add a new column that shows whether a match was found. How can this be done in Python?
updated desired output
Here is my desired output, where I want to add two new columns when the column values match an entry in the list of dictionaries rules_list that I defined.
output={'code0':{0:('5'),1:'nan',2:('98'),3:('98'),4:'nan',5:('15'),6:('40'),7:('52'),8:('52'),9:('40'),10:('52'),11:('52'),12:('58')},'code1':{0:('Agr','Serv'),1:('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'),2:('ATT','NC'),3:('ATT','VA','NC'),4:('VA','HC','NIH','ATT','COL','UCL'),5:('Agr'),6:'nan',7:('NC'),8:('NC'),9:('VA'),10:('NC'),11:('NC'),12:('CE')},'code2':{0:'nan',1:'nan',2:('103','104','105','106','31'),3:('104','105'),4:'nan',5:('5'),6:'nan',7:('109'),8:('109'),9:('11'),10:('109'),11:('109'),12:('109')},'code3':{0:('90'),1:'nan',2:('810'),3:('810'),4:'nan',5:('58'),6:('518'),7:('610','620','682','642','621','611'),8:('620','682','642','611'),9:('113','174','131','115'),10:('612','790','110'),11:('612','110'),12:('423','114')},'code4':{0:('1'),1:'nan',2:('computerscience'),3:('computerscience'),4:'nan',5:('fishing'),6:'nan',7:('biology'),8:('biology'),9:'nan',10:('biology'),11:('biology'),12:'nan'},'code5':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:'nan',7:'nan',8:'nan',9:('11','19','31'),10:('12','16','18','19'),11:('12','18','19'),12:('31')},'code6':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:('594'),7:('712','479','297','639','452','172'),8:('712','479','297'),9:('164','157','388','158'),10:('285','295','236','239','269','284','237'),11:('285','295','237'),12:('372','238')},'isHit':{0:False,1:True,2:True,3:True,4:True,5:False,6:True,7:True,8:True,9:True,10:True,11:True,12:True},'rules_desc':{0:'None',1:'rules1',2:'rules2',3:'rules2',4:'rules1',5:'None',6:'rules12',7:'rules21',8:'rules21',9:'rules4',10:'rules3',11:'rules3',12:'rules5'}}
outdf=pd.DataFrame.from_dict(output)
How can I achieve this sort of mapping from each cell of the dataframe to the list of dictionaries? Should I handle this in pandas, or convert everything into lists and then compare? Any thoughts? Anything close to the desired output above would be fine.
The code below should do what you are asking for, but I have not yet tested whether it really does what it should. I have tried to name the variables descriptively to make it easier to see what the code does and how it works.
In the first step the code transforms the list of rule dictionaries into a list of (code, code value) tuples for each rule, which makes the final hit-checking loop easier to write, understand, maintain and debug.
In the second step the code converts the data dictionary using pandas, as in the code from the question.
There is probably also a pandas way to do the transformation in the first step; if you know one, I would be glad to hear about it.
There may even be a way to accomplish the entire task with pandas in two or three lines of code. With the variable naming and the loops provided here, it should be easier for a reader to come up with such code and perhaps post a better answer.
from pprint import pprint
import pandas as pd
from collections import defaultdict
# ----------------------------------------------------------------------
rules_list=rules_dict=[{'code1':('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'),'rules_desc':'rules1'},{'code0':('40'),'code3':('518'),'code6':('594'),'rules_desc':'rules12'},{'code0':('98'),'code1':('ATT','NC'),'code2':('103','104','105','106','31'),'code3':('810'),'code4':('computerscience'),'rules_desc':'rules2'},{'code0':('98'),'code1':('ATT','VA','NC'),'code2':('104','105','106','31'),'code4':('computerscience'),'rules_desc':'rules2'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('610','620','682','642','621','611'),'code4':('biology'),'code6':('712','479','297','639','452','172'),'rules_desc':'rules2'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('396','340','394','393','240'),'code4':('biology'),'code5':('12','18'),'rules_desc':'rules2'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('612','790','110'),'code4':('biology'),'code5':('12','16','18','19'),'code6':('285','295','236','239','269','284','237'),'rules_desc':'rules3'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('730','320','350','379','812','374'),'code4':('biology'),'code5':('12','18','19'),'rules_desc':'rules3'},{'code0':('40'),'code1':('VA'),'code2':('11'),'code3':('113','174','131','115'),'code5':('11','19','31'),'code6':('164','157','388','158'),'rules_desc':'rules4'},{'code0':('58'),'code1':('CE'),'code2':('109'),'code3':('423','114'),'code5':('31'),'code6':('372','238'),'rules_desc':'rules5'}]
# codeNname : 'code1', 'code2', 'code3', ..., 'code6'
# ruleNname : 'rules1', 'rules12', 'rules2', ..., 'rules5'
# ruleDescrKey : 'rules_desc'
# dictRulesSpec : dictionary { codeNname:value {1,N} ... , rulesDct_ruleKey:ruleNname }
# dictCodes : dictionary { codeNname:value, codeNname:value, ... }
# Rules : List [ dictRulesSpec, dictRulesSpec, ... ]
# dictRules : { ruleNname:[codeNname, codeNnameValue], ... }
Rules = rules_list
ruleDescrKey = 'rules_desc'
dictRules = defaultdict(list)
for dictRulesSpec in Rules:
    ruleNname = dictRulesSpec.pop(ruleDescrKey)
    # dictRulesSpec without ruleDescrKey item has only Codes as keys, so:
    dictCodes = dictRulesSpec
    for codeNname, codeNnameValue in dictCodes.items():
        dictRules[ruleNname].append( (codeNname, codeNnameValue) )
print(f'{Rules=}')
print(f'{dictRules=}')
print(' ------------- ')
# ----------------------------------------------------------------------
indf_dict={'code0':{0:('5'),1:'nan',2:('98'),3:('98'),4:'',5:('15'),6:('40'),7:('52'),8:('52'),9:('40'),10:('52'),11:('52'),12:('58')},'code1':{0:('Agr','Serv'),1:('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'),2:('ATT','NC'),3:('ATT','VA','NC'),4:('VA','HC','NIH','ATT','COL','UCL'),5:('Agr'),6:'nan',7:('NC'),8:('NC'),9:('VA'),10:('NC'),11:('NC'),12:('CE')},'code2':{0:'nan',1:'nan',2:('103','104','105','106','31'),3:('104','105'),4:'nan',5:('5'),6:'nan',7:('109'),8:('109'),9:('11'),10:('109'),11:('109'),12:('109')},'code3':{0:('90'),1:'nan',2:('810'),3:('810'),4:'nan',5:('58'),6:('518'),7:('610','620','682','642','621','611'),8:('620','682','642','611'),9:('113','174','131','115'),10:('612','790','110'),11:('612','110'),12:('423','114')},'code4':{0:('1'),1:'nan',2:('computerscience'),3:('computerscience'),4:'nan',5:('fishing'),6:'nan',7:('biology'),8:('biology'),9:'nan',10:('biology'),11:('biology'),12:'nan'},'code5':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:'nan',7:'nan',8:'nan',9:('11','19','31'),10:('12','16','18','19'),11:('12','18','19'),12:'31'},'code6':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:'594',7:('712','479','297','639','452','172'),8:('712','479','297'),9:('164','157','388','158'),10:('285','295','236','239','269','284','237'),11:('285','295','237'),12:('372','238')}}
dictDataRowsByCodeNname = indf_dict
df_dictDataRowsByCodeNname = pd.DataFrame.from_dict(dictDataRowsByCodeNname)
print(f'{dictDataRowsByCodeNname=}')
listDataRowsByRow = df_dictDataRowsByCodeNname.to_dict(orient='records')
print(f'{listDataRowsByRow=}')
print(' ------------- ')
isHit_Column = []
rules_desc_Column = []
# The loop below tests for only one hit within the rule ...
for dctDataRow in listDataRowsByRow:
    isHit = False
    for ruleNname, listTuplesCodeNnameValue in dictRules.items():
        if isHit:
            break
        for codeNname, codeNnameValue in listTuplesCodeNnameValue:
            if isHit:
                break
            else:
                if dctDataRow[codeNname] == codeNnameValue:
                    isHit = True
                    bckpRuleNname = ruleNname
                    break
    rules_desc_Column.append( bckpRuleNname if isHit else None)
    isHit_Column.append(isHit)
print(f'{rules_desc_Column = }')
print(f'{isHit_Column = }')
print('================================')
df_dictDataRowsByCodeNname['isHit'] = isHit_Column
df_dictDataRowsByCodeNname['rules_desc'] = rules_desc_Column
print(df_dictDataRowsByCodeNname)
print('================================')
isHit_Column = []
rules_desc_Column = []
# The loop below tests for all hits within the rule and
# lists all rules that apply in case of hits:
for dctDataRow in listDataRowsByRow:
    lstRulesWithHits = []
    for ruleNname, listTuplesCodeNnameValue in dictRules.items():
        ruleItemsWithHits = 0
        for codeNname, codeNnameValue in listTuplesCodeNnameValue:
            if dctDataRow[codeNname] == codeNnameValue:
                ruleItemsWithHits += 1
        if ruleItemsWithHits == len(listTuplesCodeNnameValue):
            lstRulesWithHits.append(ruleNname)
    isHit = bool(lstRulesWithHits)
    rules_desc_Column.append( lstRulesWithHits if isHit else None)
    isHit_Column.append(isHit)
df_dictDataRowsByCodeNname['isHit'] = isHit_Column
df_dictDataRowsByCodeNname['rules_desc'] = rules_desc_Column
print(df_dictDataRowsByCodeNname)
print('================================')
which gives:
Rules=[{'code1': ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL')}, {'code0': '40', 'code3': '518', 'code6': '594'}, {'code0': '98', 'code1': ('ATT', 'NC'), 'code2': ('103', '104', '105', '106', '31'), 'code3': '810', 'code4': 'computerscience'}, {'code0': '98', 'code1': ('ATT', 'VA', 'NC'), 'code2': ('104', '105', '106', '31'), 'code4': 'computerscience'}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('610', '620', '682', '642', '621', '611'), 'code4': 'biology', 'code6': ('712', '479', '297', '639', '452', '172')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('396', '340', '394', '393', '240'), 'code4': 'biology', 'code5': ('12', '18')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('612', '790', '110'), 'code4': 'biology', 'code5': ('12', '16', '18', '19'), 'code6': ('285', '295', '236', '239', '269', '284', '237')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('730', '320', '350', '379', '812', '374'), 'code4': 'biology', 'code5': ('12', '18', '19')}, {'code0': '40', 'code1': 'VA', 'code2': '11', 'code3': ('113', '174', '131', '115'), 'code5': ('11', '19', '31'), 'code6': ('164', '157', '388', '158')}, {'code0': '58', 'code1': 'CE', 'code2': '109', 'code3': ('423', '114'), 'code5': '31', 'code6': ('372', '238')}]
dictRules=defaultdict(<class 'list'>, {'rules1': [('code1', ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL'))], 'rules12': [('code0', '40'), ('code3', '518'), ('code6', '594')], 'rules2': [('code0', '98'), ('code1', ('ATT', 'NC')), ('code2', ('103', '104', '105', '106', '31')), ('code3', '810'), ('code4', 'computerscience'), ('code0', '98'), ('code1', ('ATT', 'VA', 'NC')), ('code2', ('104', '105', '106', '31')), ('code4', 'computerscience'), ('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('610', '620', '682', '642', '621', '611')), ('code4', 'biology'), ('code6', ('712', '479', '297', '639', '452', '172')), ('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('396', '340', '394', '393', '240')), ('code4', 'biology'), ('code5', ('12', '18'))], 'rules3': [('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('612', '790', '110')), ('code4', 'biology'), ('code5', ('12', '16', '18', '19')), ('code6', ('285', '295', '236', '239', '269', '284', '237')), ('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('730', '320', '350', '379', '812', '374')), ('code4', 'biology'), ('code5', ('12', '18', '19'))], 'rules4': [('code0', '40'), ('code1', 'VA'), ('code2', '11'), ('code3', ('113', '174', '131', '115')), ('code5', ('11', '19', '31')), ('code6', ('164', '157', '388', '158'))], 'rules5': [('code0', '58'), ('code1', 'CE'), ('code2', '109'), ('code3', ('423', '114')), ('code5', '31'), ('code6', ('372', '238'))]})
-------------
dictDataRowsByCodeNname={'code0': {0: '5', 1: 'nan', 2: '98', 3: '98', 4: '', 5: '15', 6: '40', 7: '52', 8: '52', 9: '40', 10: '52', 11: '52', 12: '58'}, 'code1': {0: ('Agr', 'Serv'), 1: ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL'), 2: ('ATT', 'NC'), 3: ('ATT', 'VA', 'NC'), 4: ('VA', 'HC', 'NIH', 'ATT', 'COL', 'UCL'), 5: 'Agr', 6: 'nan', 7: 'NC', 8: 'NC', 9: 'VA', 10: 'NC', 11: 'NC', 12: 'CE'}, 'code2': {0: 'nan', 1: 'nan', 2: ('103', '104', '105', '106', '31'), 3: ('104', '105'), 4: 'nan', 5: '5', 6: 'nan', 7: '109', 8: '109', 9: '11', 10: '109', 11: '109', 12: '109'}, 'code3': {0: '90', 1: 'nan', 2: '810', 3: '810', 4: 'nan', 5: '58', 6: '518', 7: ('610', '620', '682', '642', '621', '611'), 8: ('620', '682', '642', '611'), 9: ('113', '174', '131', '115'), 10: ('612', '790', '110'), 11: ('612', '110'), 12: ('423', '114')}, 'code4': {0: '1', 1: 'nan', 2: 'computerscience', 3: 'computerscience', 4: 'nan', 5: 'fishing', 6: 'nan', 7: 'biology', 8: 'biology', 9: 'nan', 10: 'biology', 11: 'biology', 12: 'nan'}, 'code5': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'nan', 4: 'nan', 5: 'nan', 6: 'nan', 7: 'nan', 8: 'nan', 9: ('11', '19', '31'), 10: ('12', '16', '18', '19'), 11: ('12', '18', '19'), 12: '31'}, 'code6': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'nan', 4: 'nan', 5: 'nan', 6: '594', 7: ('712', '479', '297', '639', '452', '172'), 8: ('712', '479', '297'), 9: ('164', '157', '388', '158'), 10: ('285', '295', '236', '239', '269', '284', '237'), 11: ('285', '295', '237'), 12: ('372', '238')}}
listDataRowsByRow=[{'code0': '5', 'code1': ('Agr', 'Serv'), 'code2': 'nan', 'code3': '90', 'code4': '1', 'code5': 'nan', 'code6': 'nan'}, {'code0': 'nan', 'code1': ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL'), 'code2': 'nan', 'code3': 'nan', 'code4': 'nan', 'code5': 'nan', 'code6': 'nan'}, {'code0': '98', 'code1': ('ATT', 'NC'), 'code2': ('103', '104', '105', '106', '31'), 'code3': '810', 'code4': 'computerscience', 'code5': 'nan', 'code6': 'nan'}, {'code0': '98', 'code1': ('ATT', 'VA', 'NC'), 'code2': ('104', '105'), 'code3': '810', 'code4': 'computerscience', 'code5': 'nan', 'code6': 'nan'}, {'code0': '', 'code1': ('VA', 'HC', 'NIH', 'ATT', 'COL', 'UCL'), 'code2': 'nan', 'code3': 'nan', 'code4': 'nan', 'code5': 'nan', 'code6': 'nan'}, {'code0': '15', 'code1': 'Agr', 'code2': '5', 'code3': '58', 'code4': 'fishing', 'code5': 'nan', 'code6': 'nan'}, {'code0': '40', 'code1': 'nan', 'code2': 'nan', 'code3': '518', 'code4': 'nan', 'code5': 'nan', 'code6': '594'}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('610', '620', '682', '642', '621', '611'), 'code4': 'biology', 'code5': 'nan', 'code6': ('712', '479', '297', '639', '452', '172')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('620', '682', '642', '611'), 'code4': 'biology', 'code5': 'nan', 'code6': ('712', '479', '297')}, {'code0': '40', 'code1': 'VA', 'code2': '11', 'code3': ('113', '174', '131', '115'), 'code4': 'nan', 'code5': ('11', '19', '31'), 'code6': ('164', '157', '388', '158')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('612', '790', '110'), 'code4': 'biology', 'code5': ('12', '16', '18', '19'), 'code6': ('285', '295', '236', '239', '269', '284', '237')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('612', '110'), 'code4': 'biology', 'code5': ('12', '18', '19'), 'code6': ('285', '295', '237')}, {'code0': '58', 'code1': 'CE', 'code2': '109', 'code3': ('423', '114'), 'code4': 'nan', 'code5': '31', 'code6': ('372', '238')}]
-------------
rules_desc_Column = [None, 'rules12', 'rules3', 'rules3', None, None, 'rules2', 'rules3', 'rules3', 'rules2', 'rules3', 'rules3', 'rules3']
isHit_Column = [False, True, True, True, False, False, True, True, True, True, True, True, True]
================================
code0 code1 ... isHit rules_desc
0 5 (Agr, Serv) ... False None
1 nan (VA, HC, NIH, SAP, AUS, HOL, ATT, COL, UCL) ... True rules12
2 98 (ATT, NC) ... True rules3
3 98 (ATT, VA, NC) ... True rules3
4 (VA, HC, NIH, ATT, COL, UCL) ... False None
5 15 Agr ... False None
6 40 nan ... True rules2
7 52 NC ... True rules3
8 52 NC ... True rules3
9 40 VA ... True rules2
10 52 NC ... True rules3
11 52 NC ... True rules3
12 58 CE ... True rules3
[13 rows x 9 columns]
================================
code0 code1 ... isHit rules_desc
0 5 (Agr, Serv) ... False None
1 nan (VA, HC, NIH, SAP, AUS, HOL, ATT, COL, UCL) ... True [rules1]
2 98 (ATT, NC) ... False None
3 98 (ATT, VA, NC) ... False None
4 (VA, HC, NIH, ATT, COL, UCL) ... False None
5 15 Agr ... False None
6 40 nan ... True [rules12]
7 52 NC ... False None
8 52 NC ... False None
9 40 VA ... True [rules4]
10 52 NC ... False None
11 52 NC ... False None
12 58 CE ... True [rules5]
[13 rows x 9 columns]
================================
P.S. The first of the two final loops stops at the first hit: it records only the first rule that contains a matching rule item and does NOT collect a list of all rules that apply.
The second final loop tests all rule items and collects in a list every rule whose items all match.
Perhaps this will get you started. The only tricky thing here is the all function. What I'm saying here is, "for every key and value in this particular rule, if the value is found in the list of values for the corresponding key in our data row, and that's true for EVERY part of this rule, then it is a winner".
When you have nested data like this, pandas is not the right tool. You could probably make it work, but this is way easier.
A key point here is that you need to search the VALUES in your data dictionary. Right? You have {0:'5',2:'98'...}, but we don't care about 0 and 2. We only care about the strings.
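# Iterating over indf_dict itself would only yield the column names,
# so each rule is checked against the column values of the whole dictionary.
for rno, rule in enumerate(rules_list):
    print("New rule", rno)
    match = all(val in indf_dict[key].values() for key, val in rule.items() if key in indf_dict)
    if match:
        print("Rule", rno, "matches")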
Output:
New rule 0
Rule 0 matches
New rule 1
Rule 1 matches
New rule 2
Rule 2 matches
New rule 3
New rule 4
Rule 4 matches
New rule 5
New rule 6
Rule 6 matches
New rule 7
New rule 8
Rule 8 matches
New rule 9
Rule 9 matches
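To report which rules each individual row satisfies, rather than whether a rule can be satisfied somewhere in a column, a per-row check might look like the sketch below. The helper matching_rules and the small sample data are hypothetical. It assumes a rule matches only when every code in the rule equals the row's value exactly, which may be stricter than the desired output in the question.

```python
def matching_rules(row, rules, desc_key='rules_desc'):
    """Return the descriptions of all rules whose codes all equal the row's values."""
    hits = []
    for rule in rules:
        # Everything except the description is a (code, value) condition.
        conditions = {k: v for k, v in rule.items() if k != desc_key}
        if all(row.get(code) == value for code, value in conditions.items()):
            hits.append(rule[desc_key])
    return hits

# Hypothetical, shortened rules and rows in the shape used in the question.
rules = [
    {'code0': '40', 'code3': '518', 'rules_desc': 'rules12'},
    {'code1': ('ATT', 'NC'), 'rules_desc': 'rules2'},
]
rows = [
    {'code0': '40', 'code1': 'nan', 'code3': '518'},
    {'code0': '98', 'code1': ('ATT', 'NC'), 'code3': '810'},
    {'code0': '5', 'code1': ('Agr', 'Serv'), 'code3': '90'},
]
for row in rows:
    hits = matching_rules(row, rules)
    print(bool(hits), hits)
```

The rows could come from df.to_dict(orient='records'), and the two printed values correspond to the isHit and rules_desc columns. Unlike a dictionary keyed by rule name, this keeps rules that share the same description separate.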

How do I create a function that will return a value in a dictionary for each row within a data sheet using Python?

I need to create a new column in my table for a state region that populates a region for every row of data (each having a state). How do I write a function to call upon a dictionary for each row item?
I have about 30,000 row items, and I believe a loop would take too long. I am certain there is some way to do this with dictionaries. I've tried using different methods to call this but cannot seem to get it to populate the correct data.
states = {
'AK': 'Alaska',
'AL': 'Alabama',
'AR': 'Arkansas',
'AZ': 'Arizona',
'CA': 'California',
'CO': 'Colorado',
'CT': 'Connecticut',
'DC': 'District of Columbia',
'DE': 'Delaware',
'FL': 'Florida',
'GA': 'Georgia',
'HI': 'Hawaii',
'IA': 'Iowa',
'ID': 'Idaho',
'IL': 'Illinois',
'IN': 'Indiana',
'KS': 'Kansas',
'KY': 'Kentucky',
'LA': 'Louisiana',
'MA': 'Massachusetts',
'MD': 'Maryland',
'ME': 'Maine',
'MI': 'Michigan',
'MN': 'Minnesota',
'MO': 'Missouri',
'MS': 'Mississippi',
'MT': 'Montana',
'NC': 'North Carolina',
'ND': 'North Dakota',
'NE': 'Nebraska',
'NH': 'New Hampshire',
'NJ': 'New Jersey',
'NM': 'New Mexico',
'NV': 'Nevada',
'NY': 'New York',
'OH': 'Ohio',
'OK': 'Oklahoma',
'OR': 'Oregon',
'PA': 'Pennsylvania',
'RI': 'Rhode Island',
'SC': 'South Carolina',
'SD': 'South Dakota',
'TN': 'Tennessee',
'TX': 'Texas',
'UT': 'Utah',
'VA': 'Virginia',
'VT': 'Vermont',
'WA': 'Washington',
'WI': 'Wisconsin',
'WV': 'West Virginia',
'WY': 'Wyoming'
}
state_abbrev = {v: k for k, v in states.items()}
state_code = {
'AK': '10','AL': '4', 'AR': '9', 'AR': '6', 'CA': '9', 'CO': '8', 'CT': '1', 'DC': '3', 'DE': '3', 'FL': '4',
'GA': '4', 'HI': '9', 'IA': '7', 'ID': '10', 'IL': '5', 'IN': '5', 'KS': '7', 'KY': '4', 'LA': '6',
'MA': '1', 'MD': '3', 'ME': '1', 'MI': '5', 'MN': '5','MO': '7', 'MS': '4', 'MT': '8', 'NC': '4',
'ND': '8', 'NE': '7', 'NH': '1', 'NJ': '2', 'NM': '6','NV': '9', 'NY': '2', 'OH': '5', 'OK': '6',
'OR': '10', 'PA': '3', 'PR': '2', 'RI': '1', 'SC': '4', 'SD': '8', 'TN': '4', 'TX': '6', 'UT': '8',
'VA': '3', 'VI': '2', 'VT': '1', 'WA': '10', 'WI': '5', 'WV': '3', 'WY': '8', 'PI': '9'
}
state_region = {v: k for k, v in state_code.items()}
def get_region():
    return [state_region[i] for i in fulldf['state']]
fulldf["Region"] = get_region()
fulldf.tail()
This raises KeyError: 'MA'. I expected it to add a new column named "Region" that holds the region for each "state" listed.
KeyError Traceback (most recent call last)
<ipython-input-338-6afc1e48556a> in <module>
33 return [state_region[i] for i in fulldf['state']]
34
---> 35 fulldf["Region"] = get_region()
36 fulldf.tail()
37
<ipython-input-338-6afc1e48556a> in get_region()
31
32 def get_region():
---> 33 return [state_region[i] for i in fulldf['state']]
34
35 fulldf["Region"] = get_region()
<ipython-input-338-6afc1e48556a> in <listcomp>(.0)
31
32 def get_region():
---> 33 return [state_region[i] for i in fulldf['state']]
34
35 fulldf["Region"] = get_region()
KeyError: 'MA'
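Your get_region function is flawed: state_region inverts state_code, so its keys are region numbers rather than state abbreviations, and looking up 'MA' fails. Look the state up in state_code directly:
def get_region():
    return [state_code[i] for i in fulldf['state']]
Python comprehensions are fast enough for this function to be fine on a dataframe of 30k rows.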
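As an alternative that avoids the explicit Python-level loop, pandas can do the lookup with Series.map, which accepts a dictionary. This is only a sketch on a small hypothetical frame, since the real fulldf is not shown in the question. States missing from the dictionary come out as NaN instead of raising KeyError.

```python
import pandas as pd

# Hypothetical subset of the state -> region-code mapping from the question.
state_code = {'MA': '1', 'NY': '2', 'TX': '6', 'WA': '10'}

# Stand-in for the question's fulldf; 'ZZ' is deliberately not in the mapping.
fulldf = pd.DataFrame({'state': ['MA', 'TX', 'WA', 'ZZ']})

# Look up each state directly in state_code (not in the inverted dict).
fulldf['Region'] = fulldf['state'].map(state_code)
print(fulldf)
```

Rows left as NaN are a quick way to spot abbreviations that are missing from, or mistyped in, the mapping.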

Print nested dictionary in python and export all on a csv file

I have a dictionary like this:
{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}}
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}}
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}
It is structured in this way:
{'url': {'lang1': 'locs', 'lang2': 'locs', ...}}
{'url2': {'lang6': 'locs', 'lang5': 'locs', ...}}
where lang stands for the language and locs stands for the lines of code in that language.
What I want to do is print this dictionary in a readable way, so I can check the results before the export.
After that I want to export the dictionary to a CSV file for further processing. The problem is that the languages are not sorted. This is what I mean:
{'https://github.com/Project4': {'HTML': '29', 'Java': '229', 'Markdown': '101', 'Maven POM': '88', 'XML': '62'}}
{'https://github.com/Project5': {'Batchfile': '85', 'Gradle': '84', 'INI': '22', 'Java': '2422', 'Markdown': '25', 'Prolog': '25', 'Shell': '173', 'XML': '3243', 'YAML': '43'}}
Any idea?
You could use pandas:
import pandas as pd
t = [{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}},
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}},
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}]
# collect every language once; pass a list, since a set has no defined column order
columns = list({lang for x in t for l in x.values() for lang in l})
index = [p for x in t for p in x.keys()]
rows = [l for x in t for l in x.values() ]
df = pd.DataFrame(rows, columns=columns, index=index).fillna('N/A')
df.to_csv('projects.csv')
Which gives:
>>> df
Gradle INI Markdown ... Batchfile Java QMake
https://github.com/project1 110 25 393 ... 91 1879 52
https://github.com/project2 123 25 121 ... 91 1305 52
https://github.com/project3 360 27 27 ... 91 805 156
[3 rows x 9 columns]
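If the goal is a CSV with the language columns in a fixed, sorted order, the standard-library csv module is enough. The following is a sketch, not the approach of the answer above. It assumes the per-project dictionaries are collected in a list t (shortened here), uses a hypothetical project column name for the URL, and fills languages that a project lacks with N/A.

```python
import csv
import io

# Shortened sample in the shape of the question's data.
t = [
    {'https://github.com/project1': {'Java': '1879', 'XML': '943', 'Text': '202'}},
    {'https://github.com/project2': {'Java': '1305', 'Gradle': '123'}},
]

# Merge the single-key dicts into one {url: {lang: locs}} mapping.
projects = {url: langs for entry in t for url, langs in entry.items()}

# The sorted union of all languages gives a stable column order.
languages = sorted({lang for langs in projects.values() for lang in langs})

# Written to memory here; use open('projects.csv', 'w', newline='') for a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['project'] + languages, restval='N/A')
writer.writeheader()
for url, langs in projects.items():
    writer.writerow({'project': url, **langs})

csv_text = buffer.getvalue()
print(csv_text)
```

Printing csv_text before writing the file also covers the wish to inspect the result before the export.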

sorting by dictionary value in array python

Okay, so I've been working on processing some annotated text output. What I have so far is a dictionary with the annotation as key and its relations as an array of elements:
'Adenotonsillectomy': ['0', '18', '1869', '1716'],
'OSAS': ['57', '61'],
'apnea': ['41', '46'],
'can': ['94', '97', '1796', '1746'],
'deleterious': ['103', '114'],
'effects': ['122', '129', '1806', '1752'],
'for': ['19', '22'],
'gain': ['82', '86', '1776', '1734'],
'have': ['98', '102', ['1776 1786 1796 1806 1816'], '1702'],
'health': ['115', '121'],
'lead': ['67', '71', ['1869 1879 1889'], '1695'],
'leading': ['135', '142', ['1842 1852'], '1709'],
'may': ['63', '66', '1879', '1722'],
'obesity': ['146', '153'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'syndrome': ['47', '55'],
'to': ['143', '145', '1852', '1770'],
'weight': ['75', '81'],
'when': ['130', '134', '1842', '1758'],
'which': ['88', '93', '1786', '1740']}
What I want to do is sort this by the first element in the array and reorder the dict as:
'Adenotonsillectomy': ['0', '18', '1869', '1716']
'for': ['19', '22'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'apnea': ['41', '46'],
etc...
Right now I've tried to sort by value using sorted():
sorted(dependency_dict.items(), key=lambda x: x[1][0])
However the output I'm getting is still incorrect:
[('Adenotonsillectomy', ['0', '18', '1869', '1716']),
('deleterious', ['103', '114']),
('health', ['115', '121']),
('effects', ['122', '129', '1806', '1752']),
('when', ['130', '134', '1842', '1758']),
('leading', ['135', '142', ['1842 1852'], '1709']),
('to', ['143', '145', '1852', '1770']),
('obesity', ['146', '153']),
('for', ['19', '22']),
('obstructive', ['23', '34']),
('sleep', ['35', '40']),
('apnea', ['41', '46']),
('syndrome', ['47', '55']),
('OSAS', ['57', '61']),
('may', ['63', '66', '1879', '1722']),
('lead', ['67', '71', ['1869 1879 1889'], '1695']),
('weight', ['75', '81']),
('gain', ['82', '86', '1776', '1734']),
('which', ['88', '93', '1786', '1740']),
('can', ['94', '97', '1796', '1746']),
('have', ['98', '102', ['1776 1786 1796 1806 1816'], '1702'])]
I'm not sure what's going wrong. Any help is appreciated.
The entries are sorted in alphabetical (lexicographic) order because the values are strings, so '103' sorts before '19'. If you want to sort them by integer value, convert the value to int first:
sorted(dependency_dict.items(), key=lambda x: int(x[1][0]))
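Since Python 3.7 a regular dict keeps insertion order, so the sorted pairs can be turned back into a dictionary in the desired order. A short sketch on a subset of the data from the question:

```python
dependency_dict = {
    'deleterious': ['103', '114'],
    'for': ['19', '22'],
    'Adenotonsillectomy': ['0', '18', '1869', '1716'],
    'apnea': ['41', '46'],
}

# As strings, '103' < '19' because comparison goes character by character.
string_order = sorted(['103', '19', '0', '41'])
numeric_order = sorted(['103', '19', '0', '41'], key=int)
print(string_order)   # ['0', '103', '19', '41']
print(numeric_order)  # ['0', '19', '41', '103']

# Rebuild the dict in numeric order of each list's first element.
ordered = dict(sorted(dependency_dict.items(), key=lambda kv: int(kv[1][0])))
print(list(ordered))  # ['Adenotonsillectomy', 'for', 'apnea', 'deleterious']
```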

Merging nested dictionaries by keys preserving different values

I have two dictionaries whose values are lists of single-item dictionaries, with the same keys but different values:
d1 = {
'distilled ': [{'water': '45'}, {'vodka': '9'}, {'vinegar': '7'}, {'beer': '6'}, {'alcohol': '5'}, {'whiskey': '5'}],
'planted': [{'tree': '30'}, {'seed': '28'}, {'flower': '20'}, {'plant': '7'}, {'bomb': '4'}, {'garden': '2'}]
}
and
d2 = {
'distilled ': [{'water': '14'}, {'vinegar': '9'}, {'wine': '8'}, {'alcohol': '8'}, {'liquid': '7'}, {'whiskey': '6'}, {'beer': '5'}],
'planted ': [{'flower': '28'}, {'tree': '18'}, {'seed': '9'}, {'vegetable': '4'}, {'bush': '3'}, {'grass': '3'}, {'garden': '3'}]
}
I want to merge them in a way that preserves the values and merges only the keys in the nested dictionaries. So that the outcome would look like:
{
'distilled ': [('water', '45', '14'), ('vodka', '9'), ('vinegar', '7', '9'), ('beer', '6', '5'), ('alcohol', '5'), ('whiskey', '5'), ('wine', '8')],
'planted': [('tree', '30', '18'), ('seed', '28', '9'), ('flower', '20', '7'), ('plant', '7'), ('bomb', '4'), ('garden', '2', '3')]
}
I tried merging the two using:
d_merged = { k: [ d1[k], d2_to_compare[k] ] for k in d1 }
but in the outcome only the values of the first dictionary are presented, obviously. Do you have any ideas on how to fix this? Thank you very much in advance.
I am not sure which way to take from here. Would really appreciate any suggestions! Thanks a lot.
A dict that holds only one key-value pair is not a good idea, but anyway, it can be worked out like this:
d1 = {
'distilled': [{'water': '45'}, {'vodka': '9'}, {'vinegar': '7'}, {'beer': '6'}, {'alcohol': '5'}, {'whiskey': '5'}],
'planted': [{'tree': '30'}, {'seed': '28'}, {'flower': '20'}, {'plant': '7'}, {'bomb': '4'}, {'garden': '2'}]
}
d2 = {
'distilled': [{'water': '14'}, {'vinegar': '9'}, {'wine': '8'}, {'alcohol': '8'}, {'liquid': '7'}, {'whiskey': '6'}, {'beer': '5'}],
'planted': [{'flower': '28'}, {'tree': '18'}, {'seed': '9'}, {'vegetable': '4'}, {'bush': '3'}, {'grass': '3'}, {'garden': '3'}]
}
d3 = {}
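for k, v in d1.items():
    # flatten each list of single-item dicts into one dict
    k1 = dict([list(d.items())[0] for d in d1[k]])
    k2 = dict([list(d.items())[0] for d in d2[k]])
    ret = []
    # union of the words from both sides; a missing value shows up as None
    for d in (set(k1.keys()) | set(k2.keys())):
        ret.append((d, k1.get(d), k2.get(d)))
    d3[k] = ret
print(d3)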
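To get closer to the output requested in the question (first-seen order kept, and no placeholder when a word is missing from one side), the merge could be sketched as follows. The helper name merge_counts is hypothetical. The sketch assumes both dictionaries use exactly the same outer keys, which is not true of the question's data as posted: note the trailing spaces in 'distilled ' and 'planted '.

```python
def merge_counts(d1, d2):
    """Merge {key: [{word: count}, ...]} dicts into {key: [(word, count, ...), ...]}."""
    merged = {}
    for key in d1:
        combined = {}  # word -> list of counts, in first-seen order
        for source in (d1[key], d2.get(key, [])):
            for single in source:
                for word, count in single.items():
                    combined.setdefault(word, []).append(count)
        merged[key] = [(word, *counts) for word, counts in combined.items()]
    return merged

# Shortened sample with matching outer keys.
d1 = {'distilled': [{'water': '45'}, {'vodka': '9'}, {'vinegar': '7'}]}
d2 = {'distilled': [{'water': '14'}, {'vinegar': '9'}, {'wine': '8'}]}
result = merge_counts(d1, d2)
print(result)
# {'distilled': [('water', '45', '14'), ('vodka', '9'), ('vinegar', '7', '9'), ('wine', '8')]}
```

One trade-off of dropping the placeholder: a two-element tuple no longer tells you which dictionary the single count came from.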
