Search tuples between two date strings - python

I want to list the tuples whose date (stored as a string) falls between two dates. My data looks like this:
[(1, 'ch-01-07-1', '2021-07-01', '262', 'okinama', 'OR15G9431', 'Dhenkanal', 'FULAPADA', '67', '450', '34', '395151.0', 'Not Yet'),
(3, 'ch-01-07-3', '2021-07-02', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '67', '450', '34', '395151.0', 'Not Yet'),
(4, 'ch-01-07-4', '2021-07-01', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'DIGHI', '67', '450', '34', '299743.0', 'Not Yet'),
(5, 'ch-01-07-5', '2021-07-03', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'CUTTACK', '67', '450', '34', '384163.0', 'Not Yet'),
(6, 'ch-01-07-6', '2021-07-04', '262', 'okinama', 'OR 21 7911', 'Dhenkanal', 'BARSINGHA (BARAMBA)', '67', '450', '34', '356425.0', 'Not Yet'),
(7, 'ch-18-07-1', '2021-07-12', '256', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'DERA', '52', '63', '21', '340672.0', 'Not Yet'),
(8, 'ch-18-07-2', '2021-07-11', '457', 'ultra tech', 'OR15G9431', 'Dhenkanal', 'DHENKANAL TOWN AREA (M.PAT, COLLEGE BYEPASS)', '45', '5677', '66', '88082.0', 'Not Yet'),
(9, 'ch-18-07-3', '2021-07-15', '545', 'okinama', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '67', '66', '55', '395514.0', 'Not Yet'),
(10, 'ch-18-07-4', '2021-07-09', '545', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '67', '66', '55', '395514.0', 'Not Yet'),
(12, 'ch-01-07-2', '2021-07-08', '123', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'DHUBALAPALA (TELKOI)', '23', '23', '12', '287534.0', 'Not Yet'),
(17, 'ch-2021-07-1', '2021-07-12', '565', 'ultra tech', 'OR 21 7911', 'Dhenkanal', 'DHENKANAL TOWN AREA (UPTO MAHAVEER BAZAR) ', '32', '33', '22', '61289.0', 'Not Yet'),
(19, 'ch-2021-07-2022', '2021-07-18', '741', 'okinama', 'OR 21 7911', 'Dhenkanal', 'FULAPADA', '21', '22', '22', '123961.0', 'Not Yet'),
(20, 'ch-2021-07-2023', '2021-07-19', '693', 'ultra tech', 'od062598', 'Dhenkanal', 'DUDURKOTE', '78', '78', '78', '352014.0', 'Not Yet'),
(21, 'ch-2021-07-2024', '2021-07-20', '123', 'okinama', 'OR 21 7911', 'Dhenkanal', 'CUTTACK', '10', '100', '100', '57210.0', 'Not Yet')]
For example, if I search for dates between "2021-07-03" and "2021-07-15" (inclusive), I expect rows 5, 6, 7, 8, 9, 10, 12 and 17 to be listed in my console. If I additionally require the fifth column (index 4) to equal "ultra tech", I expect rows 7, 8, 10, 12 and 17.

You can convert each date to an integer this way:
def to_int(date_string):
    # "2021-07-03" -> 20210703
    year, month, day = map(int, date_string.split('-'))
    return 10000 * year + 100 * month + day
start_date = to_int("2021-07-03")
end_date = to_int("2021-07-15")
Then for the query (data is the list of tuples from the question):
find_value = 'ultra tech'
search_result = []  # rows whose date is in range
adv_search = []     # rows in range that also contain find_value
for t in data:
    find_date = to_int(t[2])
    if end_date >= find_date >= start_date:
        search_result.append(t)
        if find_value in t:
            adv_search.append(t)

from datetime import datetime, date
d1 = date(2021, 7, 3)
d2 = date(2021, 7, 15)
# A is the list of tuples from the question
for b in A:
    if b[4] == "ultra tech":
        c = datetime.strptime(b[2], "%Y-%m-%d").date()
        # <= keeps rows dated exactly on d1 or d2, as the question expects
        if d1 <= c <= d2:
            print(b)
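Since the dates are zero-padded YYYY-MM-DD strings, they also sort correctly when compared as plain strings, so no conversion is strictly needed. A minimal sketch, assuming a shortened copy of the data (only the first five columns) and a hypothetical helper name rows_between:

```python
# Shortened sample rows: (id, challan, date, number, brand)
rows = [
    (5, 'ch-01-07-5', '2021-07-03', '262', 'okinama'),
    (7, 'ch-18-07-1', '2021-07-12', '256', 'ultra tech'),
    (9, 'ch-18-07-3', '2021-07-15', '545', 'okinama'),
    (19, 'ch-2021-07-2022', '2021-07-18', '741', 'okinama'),
]

def rows_between(rows, start, end, date_index=2):
    """Return the rows whose ISO date string lies in [start, end]."""
    return [r for r in rows if start <= r[date_index] <= end]

in_range = rows_between(rows, '2021-07-03', '2021-07-15')
ultra = [r for r in in_range if r[4] == 'ultra tech']

print([r[0] for r in in_range])  # ids of rows in the date range
print([r[0] for r in ultra])     # ids that are also 'ultra tech'
```

This only works while every date keeps the same zero-padded format; if that cannot be guaranteed, parsing with datetime as in the answer above is the safer choice.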

Related

how to compare each cell of dataframe with list of dictionary in python?

I am trying to compare the column values of each row of a dataframe with a predefined list of dictionaries and filter on the result. I tried using pandas to compare the column values row by row against the list of dictionaries, but it did not work and I got a type error. I think I may need to convert the dataframe into a dictionary, compare it with the list of dictionaries, and then convert it back to a dataframe with a new column added, but this still does not give my desired output. Can anyone suggest a possible workaround? How can this be done easily in Python?
working minimal example
import pandas as pd
indf=pd.DataFrame.from_dict(indf_dict)
indf_lst=indf.to_dict(orient='records')
matches=[]
for each in rules_list:
    for row in indf_lst:
        if row in each:
            matches.append(row)
I tried a pandas approach to check the column values of every row against rules_list, but the attempt was not successful. I then converted the indf dataframe to a list of dictionaries and compared the dictionaries, but I get a type error as follows:
TypeError Traceback (most recent call last)
Input In [11], in <cell line: 12>()
12 for each in rules_list:
13 for row in indf_lst:
---> 14 if row in each:
15 matches.append(row)
TypeError: unhashable type: 'dict'
objective
I need to compare the columns of every row with the list of dictionaries rules_list, and add a new column that shows whether a match was found. How can this be done in Python?
updated desired output
Here is my desired output, where I want to add two new columns (isHit and rules_desc) when the column values match an entry in the rules_list that I defined.
output={'code0':{0:('5'),1:'nan',2:('98'),3:('98'),4:'nan',5:('15'),6:('40'),7:('52'),8:('52'),9:('40'),10:('52'),11:('52'),12:('58')},'code1':{0:('Agr','Serv'),1:('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'),2:('ATT','NC'),3:('ATT','VA','NC'),4:('VA','HC','NIH','ATT','COL','UCL'),5:('Agr'),6:'nan',7:('NC'),8:('NC'),9:('VA'),10:('NC'),11:('NC'),12:('CE')},'code2':{0:'nan',1:'nan',2:('103','104','105','106','31'),3:('104','105'),4:'nan',5:('5'),6:'nan',7:('109'),8:('109'),9:('11'),10:('109'),11:('109'),12:('109')},'code3':{0:('90'),1:'nan',2:('810'),3:('810'),4:'nan',5:('58'),6:('518'),7:('610','620','682','642','621','611'),8:('620','682','642','611'),9:('113','174','131','115'),10:('612','790','110'),11:('612','110'),12:('423','114')},'code4':{0:('1'),1:'nan',2:('computerscience'),3:('computerscience'),4:'nan',5:('fishing'),6:'nan',7:('biology'),8:('biology'),9:'nan',10:('biology'),11:('biology'),12:'nan'},'code5':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:'nan',7:'nan',8:'nan',9:('11','19','31'),10:('12','16','18','19'),11:('12','18','19'),12:('31')},'code6':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:('594'),7:('712','479','297','639','452','172'),8:('712','479','297'),9:('164','157','388','158'),10:('285','295','236','239','269','284','237'),11:('285','295','237'),12:('372','238')},'isHit':{0:False,1:True,2:True,3:True,4:True,5:False,6:True,7:True,8:True,9:True,10:True,11:True,12:True},'rules_desc':{0:'None',1:'rules1',2:'rules2',3:'rules2',4:'rules1',5:'None',6:'rules12',7:'rules21',8:'rules21',9:'rules4',10:'rules3',11:'rules3',12:'rules5'}}
outdf=pd.DataFrame.from_dict(output)
How can I achieve this sort of mapping from each cell of the dataframe to the list of dictionaries? Should I handle this in pandas, or convert everything into lists and then compare? Any thoughts? Anything close to the desired output above would be fine.
The code below should do what you are asking for, although I have not verified it thoroughly. I have tried to name the variables descriptively to make it easier to understand what the code does and how it works.
In the first step, the code transforms the list of rule dictionaries into a list of (code name, code value) tuples for each rule. This makes the final loop that checks for a hit easier to write, understand, maintain and debug.
In the second step, the code converts the data dictionary with pandas, as in the code from the question.
There is probably also a pandas way of transforming the list of dictionaries in the first step; if you know how to do that with pandas, I would be glad to hear about it.
Maybe the entire task can be accomplished with pandas in two or three lines of code. With the variable naming and the loops provided here, it should be easier for a reader to come up with that code and perhaps post another, better answer.
from pprint import pprint
import pandas as pd
from collections import defaultdict
# ----------------------------------------------------------------------
rules_list=rules_dict=[{'code1':('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'),'rules_desc':'rules1'},{'code0':('40'),'code3':('518'),'code6':('594'),'rules_desc':'rules12'},{'code0':('98'),'code1':('ATT','NC'),'code2':('103','104','105','106','31'),'code3':('810'),'code4':('computerscience'),'rules_desc':'rules2'},{'code0':('98'),'code1':('ATT','VA','NC'),'code2':('104','105','106','31'),'code4':('computerscience'),'rules_desc':'rules2'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('610','620','682','642','621','611'),'code4':('biology'),'code6':('712','479','297','639','452','172'),'rules_desc':'rules2'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('396','340','394','393','240'),'code4':('biology'),'code5':('12','18'),'rules_desc':'rules2'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('612','790','110'),'code4':('biology'),'code5':('12','16','18','19'),'code6':('285','295','236','239','269','284','237'),'rules_desc':'rules3'},{'code0':('52'),'code1':('NC'),'code2':('109'),'code3':('730','320','350','379','812','374'),'code4':('biology'),'code5':('12','18','19'),'rules_desc':'rules3'},{'code0':('40'),'code1':('VA'),'code2':('11'),'code3':('113','174','131','115'),'code5':('11','19','31'),'code6':('164','157','388','158'),'rules_desc':'rules4'},{'code0':('58'),'code1':('CE'),'code2':('109'),'code3':('423','114'),'code5':('31'),'code6':('372','238'),'rules_desc':'rules5'}]
# codeNname : 'code1', 'code2', 'code3', ..., 'code6'
# ruleNname : 'rules1', 'rules12', 'rules2', ..., 'rules5'
# ruleDescrKey : 'rules_desc'
# dictRulesSpec : dictionary { codeNname:value {1,N} ... , rulesDct_ruleKey:ruleNname }
# dictCodes : dictionary { codeNname:value, codeNname:value, ... }
# Rules : List [ dictRulesSpec, dictRulesSpec, ... ]
# dictRules : { ruleNname:[codeNname, codeNnameValue], ... }
Rules = rules_list
ruleDescrKey = 'rules_desc'
dictRules = defaultdict(list)
for dictRulesSpec in Rules:
    ruleNname = dictRulesSpec.pop(ruleDescrKey)
    # dictRulesSpec without ruleDescrKey item has only Codes as keys, so:
    dictCodes = dictRulesSpec
    for codeNname, codeNnameValue in dictCodes.items():
        dictRules[ruleNname].append( (codeNname, codeNnameValue) )
print(f'{Rules=}')
print(f'{dictRules=}')
print(' ------------- ')
# ----------------------------------------------------------------------
indf_dict={'code0':{0:('5'),1:'nan',2:('98'),3:('98'),4:'',5:('15'),6:('40'),7:('52'),8:('52'),9:('40'),10:('52'),11:('52'),12:('58')},'code1':{0:('Agr','Serv'),1:('VA','HC','NIH','SAP','AUS','HOL','ATT','COL','UCL'),2:('ATT','NC'),3:('ATT','VA','NC'),4:('VA','HC','NIH','ATT','COL','UCL'),5:('Agr'),6:'nan',7:('NC'),8:('NC'),9:('VA'),10:('NC'),11:('NC'),12:('CE')},'code2':{0:'nan',1:'nan',2:('103','104','105','106','31'),3:('104','105'),4:'nan',5:('5'),6:'nan',7:('109'),8:('109'),9:('11'),10:('109'),11:('109'),12:('109')},'code3':{0:('90'),1:'nan',2:('810'),3:('810'),4:'nan',5:('58'),6:('518'),7:('610','620','682','642','621','611'),8:('620','682','642','611'),9:('113','174','131','115'),10:('612','790','110'),11:('612','110'),12:('423','114')},'code4':{0:('1'),1:'nan',2:('computerscience'),3:('computerscience'),4:'nan',5:('fishing'),6:'nan',7:('biology'),8:('biology'),9:'nan',10:('biology'),11:('biology'),12:'nan'},'code5':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:'nan',7:'nan',8:'nan',9:('11','19','31'),10:('12','16','18','19'),11:('12','18','19'),12:'31'},'code6':{0:'nan',1:'nan',2:'nan',3:'nan',4:'nan',5:'nan',6:'594',7:('712','479','297','639','452','172'),8:('712','479','297'),9:('164','157','388','158'),10:('285','295','236','239','269','284','237'),11:('285','295','237'),12:('372','238')}}
dictDataRowsByCodeNname = indf_dict
df_dictDataRowsByCodeNname = pd.DataFrame.from_dict(dictDataRowsByCodeNname)
print(f'{dictDataRowsByCodeNname=}')
listDataRowsByRow = df_dictDataRowsByCodeNname.to_dict(orient='records')
print(f'{listDataRowsByRow=}')
print(' ------------- ')
isHit_Column = []
rules_desc_Column = []
# The loop below tests for only one hit within the rule ...
for dctDataRow in listDataRowsByRow:
    isHit = False
    for ruleNname, listTuplesCodeNnameValue in dictRules.items():
        if isHit:
            break
        for codeNname, codeNnameValue in listTuplesCodeNnameValue:
            if isHit:
                break
            else:
                if dctDataRow[codeNname] == codeNnameValue:
                    isHit = True
                    bckpRuleNname = ruleNname
                    break
    rules_desc_Column.append( bckpRuleNname if isHit else None)
    isHit_Column.append(isHit)
print(f'{rules_desc_Column = }')
print(f'{isHit_Column = }')
print('================================')
df_dictDataRowsByCodeNname['isHit'] = isHit_Column
df_dictDataRowsByCodeNname['rules_desc'] = rules_desc_Column
print(df_dictDataRowsByCodeNname)
print('================================')
isHit_Column = []
rules_desc_Column = []
# The loop below tests for all hits within the rule and
# lists all rules that apply in case of hits:
for dctDataRow in listDataRowsByRow:
    lstRulesWithHits = []
    for ruleNname, listTuplesCodeNnameValue in dictRules.items():
        ruleItemsWithHits = 0
        for codeNname, codeNnameValue in listTuplesCodeNnameValue:
            if dctDataRow[codeNname] == codeNnameValue:
                ruleItemsWithHits += 1
        if ruleItemsWithHits == len(listTuplesCodeNnameValue):
            lstRulesWithHits.append(ruleNname)
    isHit = bool(lstRulesWithHits)
    rules_desc_Column.append( lstRulesWithHits if isHit else None)
    isHit_Column.append(isHit)
df_dictDataRowsByCodeNname['isHit'] = isHit_Column
df_dictDataRowsByCodeNname['rules_desc'] = rules_desc_Column
print(df_dictDataRowsByCodeNname)
print('================================')
which gives:
Rules=[{'code1': ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL')}, {'code0': '40', 'code3': '518', 'code6': '594'}, {'code0': '98', 'code1': ('ATT', 'NC'), 'code2': ('103', '104', '105', '106', '31'), 'code3': '810', 'code4': 'computerscience'}, {'code0': '98', 'code1': ('ATT', 'VA', 'NC'), 'code2': ('104', '105', '106', '31'), 'code4': 'computerscience'}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('610', '620', '682', '642', '621', '611'), 'code4': 'biology', 'code6': ('712', '479', '297', '639', '452', '172')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('396', '340', '394', '393', '240'), 'code4': 'biology', 'code5': ('12', '18')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('612', '790', '110'), 'code4': 'biology', 'code5': ('12', '16', '18', '19'), 'code6': ('285', '295', '236', '239', '269', '284', '237')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('730', '320', '350', '379', '812', '374'), 'code4': 'biology', 'code5': ('12', '18', '19')}, {'code0': '40', 'code1': 'VA', 'code2': '11', 'code3': ('113', '174', '131', '115'), 'code5': ('11', '19', '31'), 'code6': ('164', '157', '388', '158')}, {'code0': '58', 'code1': 'CE', 'code2': '109', 'code3': ('423', '114'), 'code5': '31', 'code6': ('372', '238')}]
dictRules=defaultdict(<class 'list'>, {'rules1': [('code1', ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL'))], 'rules12': [('code0', '40'), ('code3', '518'), ('code6', '594')], 'rules2': [('code0', '98'), ('code1', ('ATT', 'NC')), ('code2', ('103', '104', '105', '106', '31')), ('code3', '810'), ('code4', 'computerscience'), ('code0', '98'), ('code1', ('ATT', 'VA', 'NC')), ('code2', ('104', '105', '106', '31')), ('code4', 'computerscience'), ('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('610', '620', '682', '642', '621', '611')), ('code4', 'biology'), ('code6', ('712', '479', '297', '639', '452', '172')), ('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('396', '340', '394', '393', '240')), ('code4', 'biology'), ('code5', ('12', '18'))], 'rules3': [('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('612', '790', '110')), ('code4', 'biology'), ('code5', ('12', '16', '18', '19')), ('code6', ('285', '295', '236', '239', '269', '284', '237')), ('code0', '52'), ('code1', 'NC'), ('code2', '109'), ('code3', ('730', '320', '350', '379', '812', '374')), ('code4', 'biology'), ('code5', ('12', '18', '19'))], 'rules4': [('code0', '40'), ('code1', 'VA'), ('code2', '11'), ('code3', ('113', '174', '131', '115')), ('code5', ('11', '19', '31')), ('code6', ('164', '157', '388', '158'))], 'rules5': [('code0', '58'), ('code1', 'CE'), ('code2', '109'), ('code3', ('423', '114')), ('code5', '31'), ('code6', ('372', '238'))]})
-------------
dictDataRowsByCodeNname={'code0': {0: '5', 1: 'nan', 2: '98', 3: '98', 4: '', 5: '15', 6: '40', 7: '52', 8: '52', 9: '40', 10: '52', 11: '52', 12: '58'}, 'code1': {0: ('Agr', 'Serv'), 1: ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL'), 2: ('ATT', 'NC'), 3: ('ATT', 'VA', 'NC'), 4: ('VA', 'HC', 'NIH', 'ATT', 'COL', 'UCL'), 5: 'Agr', 6: 'nan', 7: 'NC', 8: 'NC', 9: 'VA', 10: 'NC', 11: 'NC', 12: 'CE'}, 'code2': {0: 'nan', 1: 'nan', 2: ('103', '104', '105', '106', '31'), 3: ('104', '105'), 4: 'nan', 5: '5', 6: 'nan', 7: '109', 8: '109', 9: '11', 10: '109', 11: '109', 12: '109'}, 'code3': {0: '90', 1: 'nan', 2: '810', 3: '810', 4: 'nan', 5: '58', 6: '518', 7: ('610', '620', '682', '642', '621', '611'), 8: ('620', '682', '642', '611'), 9: ('113', '174', '131', '115'), 10: ('612', '790', '110'), 11: ('612', '110'), 12: ('423', '114')}, 'code4': {0: '1', 1: 'nan', 2: 'computerscience', 3: 'computerscience', 4: 'nan', 5: 'fishing', 6: 'nan', 7: 'biology', 8: 'biology', 9: 'nan', 10: 'biology', 11: 'biology', 12: 'nan'}, 'code5': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'nan', 4: 'nan', 5: 'nan', 6: 'nan', 7: 'nan', 8: 'nan', 9: ('11', '19', '31'), 10: ('12', '16', '18', '19'), 11: ('12', '18', '19'), 12: '31'}, 'code6': {0: 'nan', 1: 'nan', 2: 'nan', 3: 'nan', 4: 'nan', 5: 'nan', 6: '594', 7: ('712', '479', '297', '639', '452', '172'), 8: ('712', '479', '297'), 9: ('164', '157', '388', '158'), 10: ('285', '295', '236', '239', '269', '284', '237'), 11: ('285', '295', '237'), 12: ('372', '238')}}
listDataRowsByRow=[{'code0': '5', 'code1': ('Agr', 'Serv'), 'code2': 'nan', 'code3': '90', 'code4': '1', 'code5': 'nan', 'code6': 'nan'}, {'code0': 'nan', 'code1': ('VA', 'HC', 'NIH', 'SAP', 'AUS', 'HOL', 'ATT', 'COL', 'UCL'), 'code2': 'nan', 'code3': 'nan', 'code4': 'nan', 'code5': 'nan', 'code6': 'nan'}, {'code0': '98', 'code1': ('ATT', 'NC'), 'code2': ('103', '104', '105', '106', '31'), 'code3': '810', 'code4': 'computerscience', 'code5': 'nan', 'code6': 'nan'}, {'code0': '98', 'code1': ('ATT', 'VA', 'NC'), 'code2': ('104', '105'), 'code3': '810', 'code4': 'computerscience', 'code5': 'nan', 'code6': 'nan'}, {'code0': '', 'code1': ('VA', 'HC', 'NIH', 'ATT', 'COL', 'UCL'), 'code2': 'nan', 'code3': 'nan', 'code4': 'nan', 'code5': 'nan', 'code6': 'nan'}, {'code0': '15', 'code1': 'Agr', 'code2': '5', 'code3': '58', 'code4': 'fishing', 'code5': 'nan', 'code6': 'nan'}, {'code0': '40', 'code1': 'nan', 'code2': 'nan', 'code3': '518', 'code4': 'nan', 'code5': 'nan', 'code6': '594'}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('610', '620', '682', '642', '621', '611'), 'code4': 'biology', 'code5': 'nan', 'code6': ('712', '479', '297', '639', '452', '172')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('620', '682', '642', '611'), 'code4': 'biology', 'code5': 'nan', 'code6': ('712', '479', '297')}, {'code0': '40', 'code1': 'VA', 'code2': '11', 'code3': ('113', '174', '131', '115'), 'code4': 'nan', 'code5': ('11', '19', '31'), 'code6': ('164', '157', '388', '158')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('612', '790', '110'), 'code4': 'biology', 'code5': ('12', '16', '18', '19'), 'code6': ('285', '295', '236', '239', '269', '284', '237')}, {'code0': '52', 'code1': 'NC', 'code2': '109', 'code3': ('612', '110'), 'code4': 'biology', 'code5': ('12', '18', '19'), 'code6': ('285', '295', '237')}, {'code0': '58', 'code1': 'CE', 'code2': '109', 'code3': ('423', '114'), 'code4': 'nan', 'code5': '31', 'code6': ('372', '238')}]
-------------
rules_desc_Column = [None, 'rules12', 'rules3', 'rules3', None, None, 'rules2', 'rules3', 'rules3', 'rules2', 'rules3', 'rules3', 'rules3']
isHit_Column = [False, True, True, True, False, False, True, True, True, True, True, True, True]
================================
code0 code1 ... isHit rules_desc
0 5 (Agr, Serv) ... False None
1 nan (VA, HC, NIH, SAP, AUS, HOL, ATT, COL, UCL) ... True rules12
2 98 (ATT, NC) ... True rules3
3 98 (ATT, VA, NC) ... True rules3
4 (VA, HC, NIH, ATT, COL, UCL) ... False None
5 15 Agr ... False None
6 40 nan ... True rules2
7 52 NC ... True rules3
8 52 NC ... True rules3
9 40 VA ... True rules2
10 52 NC ... True rules3
11 52 NC ... True rules3
12 58 CE ... True rules3
[13 rows x 9 columns]
================================
code0 code1 ... isHit rules_desc
0 5 (Agr, Serv) ... False None
1 nan (VA, HC, NIH, SAP, AUS, HOL, ATT, COL, UCL) ... True [rules1]
2 98 (ATT, NC) ... False None
3 98 (ATT, VA, NC) ... False None
4 (VA, HC, NIH, ATT, COL, UCL) ... False None
5 15 Agr ... False None
6 40 nan ... True [rules12]
7 52 NC ... False None
8 52 NC ... False None
9 40 VA ... True [rules4]
10 52 NC ... False None
11 52 NC ... False None
12 58 CE ... True [rules5]
[13 rows x 9 columns]
================================
P.S. The first of the two final loops in the code above does NOT accumulate hits into a list of all rules that apply. In other words, the search stops at the first rule item that gives a hit.
The second final loop tests all rule items and collects, in a list, every rule whose items all give hits.
Perhaps this will get you started. The only tricky thing here is the all function. What I'm saying here is, "for every key and value in this particular rule, if the value is found in the list of values for the corresponding key in our data row, and that's true for EVERY part of this rule, then it is a winner".
When you have nested data like this, pandas is not the right tool. You could probably make it work, but this is way easier.
A key point here is that you need to search the VALUES in your data dictionary. Right? You have {0:'5',2:'98'...}, but we don't care about 0 and 2. We only care about the strings.
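# indf_dict is wrapped in a list here: iterating over the dict itself
# would yield only its keys ('code0', 'code1', ...), not the column data.
for row in [indf_dict]:
    for rno, rule in enumerate(rules_list):
        print("New rule", rno)
        match = all(val in row[key].values() for key, val in rule.items() if key in row)
        if match:
            print("Rule", rno, "matches")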
Output:
New rule 0
Rule 0 matches
New rule 1
Rule 1 matches
New rule 2
Rule 2 matches
New rule 3
New rule 4
Rule 4 matches
New rule 5
New rule 6
Rule 6 matches
New rule 7
New rule 8
Rule 8 matches
New rule 9
Rule 9 matches
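The same all() idea can also be applied one data row at a time, which is closer to the isHit / rules_desc columns the question asks for. The following is only a sketch under assumptions: the rules and rows are tiny made-up stand-ins for rules_list and the dataframe records, a cell has to equal the rule value exactly, and first_matching_rule is a hypothetical helper name.

```python
# Made-up miniature rules; 'rules_desc' holds the label, the rest are conditions.
rules = [
    {'code0': '40', 'code3': '518', 'rules_desc': 'rules12'},
    {'code1': 'NC', 'code4': 'biology', 'rules_desc': 'rules21'},
]

# Made-up rows in the shape produced by DataFrame.to_dict(orient='records').
rows = [
    {'code0': '40', 'code1': 'nan', 'code3': '518', 'code4': 'nan'},
    {'code0': '52', 'code1': 'NC', 'code3': '610', 'code4': 'biology'},
    {'code0': '5', 'code1': 'Agr', 'code3': '90', 'code4': '1'},
]

def first_matching_rule(row, rules, desc_key='rules_desc'):
    """Return the label of the first rule whose every condition equals the row's cell."""
    for rule in rules:
        if all(row.get(key) == val for key, val in rule.items() if key != desc_key):
            return rule[desc_key]
    return None

labels = [first_matching_rule(row, rules) for row in rows]
is_hit = [label is not None for label in labels]
print(labels)
print(is_hit)
```

The two resulting lists could then be assigned as new dataframe columns, in the same way the earlier answer assigns isHit_Column and rules_desc_Column.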

convert list of lists to dictionary

How can I create a list of dictionaries from these lists?
temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
The key for each dictionary entry is the first element of the corresponding list.
The expected output is:
output= [{'header1': '4', 'media_range': '1,200', 'speed': '300'}, {'header1': '8', 'media_range': '2,400', 'speed': '600'}, ...]
Ideally, the code should handle any number of lists (3 in this case).
If I understand correctly:
>>> temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
>>>
>>> keys = [l[0] for l in temp]
>>> values = [l[1:] for l in temp]
>>> dicts = [dict(zip(keys, sub)) for sub in zip(*values)]
>>>
>>> dicts
[{'header1': '4', 'media_range': '1,200', 'speed': '300'},
{'header1': '8', 'media_range': '2,400', 'speed': '600'},
{'header1': '16', 'media_range': '4,800', 'speed': '1,200'},
{'header1': '32', 'media_range': '4,800', 'speed': '2,000'},
{'header1': '64', 'media_range': '6,200', 'speed': '2,000'},
{'header1': '128', 'media_range': '38,400', 'speed': '2,000'},
{'header1': '256', 'media_range': '76,800', 'speed': '2,000'},
{'header1': '512', 'media_range': '153,600', 'speed': '2,000'},
{'header1': '243,6', 'media_range': '160,000', 'speed': '2,000'}]
Slightly shorter solution with zip and unpacking:
temp = [['header1', '4', '8', '16', '32', '64', '128', '256', '512', '243,6'], ['media_range', '1,200', '2,400', '4,800', '4,800', '6,200', '38,400', '76,800', '153,600', '160,000'], ['speed', '300', '600', '1,200', '2,000', '2,000', '2,000', '2,000', '2,000', '2,000']]
header, *data = zip(*temp)
result = [dict(zip(header, i)) for i in data]
Output:
[{'header1': '4', 'media_range': '1,200', 'speed': '300'}, {'header1': '8', 'media_range': '2,400', 'speed': '600'}, {'header1': '16', 'media_range': '4,800', 'speed': '1,200'}, {'header1': '32', 'media_range': '4,800', 'speed': '2,000'}, {'header1': '64', 'media_range': '6,200', 'speed': '2,000'}, {'header1': '128', 'media_range': '38,400', 'speed': '2,000'}, {'header1': '256', 'media_range': '76,800', 'speed': '2,000'}, {'header1': '512', 'media_range': '153,600', 'speed': '2,000'}, {'header1': '243,6', 'media_range': '160,000', 'speed': '2,000'}]
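To see why the zip-and-unpacking version works for any number of lists, it helps to look at the intermediate step: zip(*temp) transposes the lists, so the first tuple holds all the headers and every following tuple is one record. A small sketch with made-up two-list data:

```python
temp = [['name', 'a', 'b'], ['size', '1', '2']]

# Transpose: one tuple per position across all inner lists.
columns = list(zip(*temp))
print(columns)

# The first tuple is the header row; the rest are the data rows.
header, *data = columns
result = [dict(zip(header, row)) for row in data]
print(result)
```

Note that zip stops at the shortest inner list, so this assumes all lists have the same length.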
You could use zip(). This requires you to know in advance how many lists there are, but it produces the expected output.
output = []
for header1, media_range, speed in zip(temp[0], temp[1], temp[2]):
    # skip the first tuple, which holds the header names themselves
    if header1 != "header1":
        output.append({temp[0][0]: header1, temp[1][0]: media_range, temp[2][0]: speed})

Convert a matrix from string to integer

I'm trying to change a matrix of numbers from string to integer but it just doesn't work.
for element in list:
    for i in element:
        i = int(i)
What am I doing wrong?
Edit:
This is the whole code:
import numpy as np
t_list = []
t_list = np.array(t_list)
list_rains_per_months = [['63', '65', '50', '77', '66', '69'],
['65', '65', '67', '50', '54', '58'],
['77', '73', '80', '83', '89', '100'],
['90', '85', '90', '90', '84', '90'],
['129', '113', '120', '135', '117', '130'],
['99', '116', '114', '111', '119', '100'],
['105', '98', '112', '113', '102', '100'],
['131', '120', '111', '141', '130', '126'],
['85', '101', '88', '89', '94', '91'],
['122', '103', '119', '98', '101', '107'],
['121', '101', '104', '121', '115', '104'],
['67', '44', '58', '61', '64', '58']]
for element in t_list:
    for i in element:
        i = int(i)
I apologize for any mistakes, I'm new to python
What you're doing wrong is that you're not changing the list or any of its elements: the name 'i' inside the loop starts out referring to each element of the list, then you make it refer to something else, but that doesn't affect your list. (Also, avoid using 'list' as an identifier: it's the name of a built-in type, so shadowing it is asking for trouble.)
One way to do it is with list comprehensions. Assuming your matrix is a list of (inner) lists, for example:
a_list = [["3", "56", "78"], ["2", "39", "60"], ["87", "9", "71"]]
then two nested list comprehensions should do the trick:
a_list = [[int(i) for i in inner_list] for inner_list in a_list]
This builds a new list by going over your initial list, applying the change you want to each element, and binding the result to another (or the same) name.
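The difference between rebinding the loop variable and assigning into the list can be checked with a short sketch (the sample matrix and the names unchanged and item are only for illustration):

```python
matrix = [["3", "56"], ["2", "39"]]

# Rebinding the loop variable: 'item' now names an int,
# but the matrix still holds the original strings.
for row in matrix:
    for item in row:
        item = int(item)
unchanged = [row[:] for row in matrix]  # snapshot of the matrix after the loop

# Assigning through an index: the element inside the row is replaced.
for row in matrix:
    for j, item in enumerate(row):
        row[j] = int(item)

print(unchanged)
print(matrix)
```

The first print still shows strings, while the second shows integers, because only the second loop stores anything back into the rows.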
In NumPy you can do it this way:
import numpy as np
list_rains_per_months = [['63', '65', '50', '77', '66', '69'],
['65', '65', '67', '50', '54', '58'],
['77', '73', '80', '83', '89', '100'],
['90', '85', '90', '90', '84', '90'],
['129', '113', '120', '135', '117', '130'],
['99', '116', '114', '111', '119', '100'],
['105', '98', '112', '113', '102', '100'],
['131', '120', '111', '141', '130', '126'],
['85', '101', '88', '89', '94', '91'],
['122', '103', '119', '98', '101', '107'],
['121', '101', '104', '121', '115', '104'],
['67', '44', '58', '61', '64', '58']]
list_rains_per_months = np.array(list_rains_per_months)
myfunc = np.vectorize(lambda x: int(x))
list_rains_per_months = myfunc(list_rains_per_months)
print(list_rains_per_months)
Output
[[ 63 65 50 77 66 69]
[ 65 65 67 50 54 58]
[ 77 73 80 83 89 100]
[ 90 85 90 90 84 90]
[129 113 120 135 117 130]
[ 99 116 114 111 119 100]
[105 98 112 113 102 100]
[131 120 111 141 130 126]
[ 85 101 88 89 94 91]
[122 103 119 98 101 107]
[121 101 104 121 115 104]
[ 67 44 58 61 64 58]]
You could use enumerate object in loops:
# named 'matrix' rather than 'list' to avoid shadowing the built-in type
matrix = [["12", "10", "0"],
          ["0", "33", "60"]]
for h, i in enumerate(matrix):
    for j, k in enumerate(i):
        matrix[h][j] = int(k)
print(matrix)
Could also just map each row's values to int:
for row in list_rains_per_months:
    row[:] = map(int, row)
Note that I assign to row[:], i.e., into the row and thus into the matrix. If I assigned to row instead, I'd have the same problem as you with your i: I'd only assign to the variable, not into the row/matrix.

Print nested dictionary in python and export all on a csv file

I have a dictionary like this:
{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}}
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}}
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}
It is structured this way:
{'url': {'lang1': 'locs', 'lang2': 'locs', ...}}
{'url2': {'lang6': 'locs', 'lang5': 'locs', ...}}
where lang stands for a language and locs stands for the lines of code written in that language.
What I want to do is print this dictionary in a readable way, so I can check the results before the export.
After that, I want to export the dictionary to a CSV file for further processing. The problem is that the languages are not sorted and differ from project to project. This is what I mean:
{'https://github.com/Project4': {'HTML': '29', 'Java': '229', 'Markdown': '101', 'Maven POM': '88', 'XML': '62'}}
{'https://github.com/Project5': {'Batchfile': '85', 'Gradle': '84', 'INI': '22', 'Java': '2422', 'Markdown': '25', 'Prolog': '25', 'Shell': '173', 'XML': '3243', 'YAML': '43'}}
Any idea?
You could use pandas:
import pandas as pd
t = [{'https://github.com/project1': {'Batchfile': '91', 'Gradle': '110', 'INI': '25', 'Java': '1879', 'Markdown': '393', 'QMake': '52', 'Shell': '161', 'Text': '202', 'XML': '943'}},
{'https://github.com/project2': {'Batchfile': '91', 'Gradle': '123', 'INI': '25', 'Java': '1305', 'Markdown': '121', 'QMake': '52', 'Shell': '161', 'XML': '234'}},
{'https://github.com/project3': {'Batchfile': '91', 'Gradle': '360', 'INI': '27', 'Java': '805', 'Markdown': '27', 'QMake': '156', 'Shell': '161', 'XML': '380'}}]
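# a list, because recent pandas versions reject a set for columns; use sorted(...) instead for a stable column order
columns = list({lang for x in t for l in x.values() for lang in l})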
index = [p for x in t for p in x.keys()]
rows = [l for x in t for l in x.values() ]
df = pd.DataFrame(rows, columns=columns, index=index).fillna('N/A')
df.to_csv('projects.csv')
Which gives:
>>> df
Gradle INI Markdown ... Batchfile Java QMake
https://github.com/project1 110 25 393 ... 91 1879 52
https://github.com/project2 123 25 121 ... 91 1305 52
https://github.com/project3 360 27 27 ... 91 805 156
[3 rows x 9 columns]
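For comparison, the same two steps (readable printing, then export with aligned columns) can be sketched with the standard library only. This is an illustration under assumptions: the project data is shortened, 'url' is a made-up name for the first column, and an in-memory buffer stands in for a real file such as projects.csv.

```python
import csv
import io
from pprint import pprint

# Shortened sample in the same shape as the question's data.
projects = [
    {'https://github.com/project1': {'Java': '1879', 'Text': '202', 'XML': '943'}},
    {'https://github.com/project2': {'Java': '1305', 'XML': '234'}},
]

# Readable view before exporting.
pprint(projects)

# One flat row per project; 'url' is a hypothetical column name.
rows = [{'url': url, **langs} for entry in projects for url, langs in entry.items()]

# Sorted union of all languages gives every project the same columns.
languages = sorted({lang for row in rows for lang in row if lang != 'url'})

# io.StringIO stands in for open('projects.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['url'] + languages, restval='N/A')
writer.writeheader()
writer.writerows(rows)  # languages missing from a project are written as 'N/A'
csv_text = buffer.getvalue()
print(csv_text)
```

Sorting the union of language names is what keeps the columns aligned even when a project lacks some languages.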

sorting by dictionary value in array python

Okay, so I've been working on processing some annotated text output. What I have so far is a dictionary with the annotation as the key and its relations as a list of elements:
{'Adenotonsillectomy': ['0', '18', '1869', '1716'],
'OSAS': ['57', '61'],
'apnea': ['41', '46'],
'can': ['94', '97', '1796', '1746'],
'deleterious': ['103', '114'],
'effects': ['122', '129', '1806', '1752'],
'for': ['19', '22'],
'gain': ['82', '86', '1776', '1734'],
'have': ['98', '102', ['1776 1786 1796 1806 1816'], '1702'],
'health': ['115', '121'],
'lead': ['67', '71', ['1869 1879 1889'], '1695'],
'leading': ['135', '142', ['1842 1852'], '1709'],
'may': ['63', '66', '1879', '1722'],
'obesity': ['146', '153'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'syndrome': ['47', '55'],
'to': ['143', '145', '1852', '1770'],
'weight': ['75', '81'],
'when': ['130', '134', '1842', '1758'],
'which': ['88', '93', '1786', '1740']}
What I want to do is sort this by the first element in the array and reorder the dict as:
'Adenotonsillectomy': ['0', '18', '1869', '1716']
'for': ['19', '22'],
'obstructive': ['23', '34'],
'sleep': ['35', '40'],
'apnea': ['41', '46'],
etc...
right now I've tried to use operator to sort by value:
sorted(dependency_dict.items(), key=lambda x: x[1][0])
However the output I'm getting is still incorrect:
[('Adenotonsillectomy', ['0', '18', '1869', '1716']),
('deleterious', ['103', '114']),
('health', ['115', '121']),
('effects', ['122', '129', '1806', '1752']),
('when', ['130', '134', '1842', '1758']),
('leading', ['135', '142', ['1842 1852'], '1709']),
('to', ['143', '145', '1852', '1770']),
('obesity', ['146', '153']),
('for', ['19', '22']),
('obstructive', ['23', '34']),
('sleep', ['35', '40']),
('apnea', ['41', '46']),
('syndrome', ['47', '55']),
('OSAS', ['57', '61']),
('may', ['63', '66', '1879', '1722']),
('lead', ['67', '71', ['1869 1879 1889'], '1695']),
('weight', ['75', '81']),
('gain', ['82', '86', '1776', '1734']),
('which', ['88', '93', '1786', '1740']),
('can', ['94', '97', '1796', '1746']),
('have', ['98', '102', ['1776 1786 1796 1806 1816'], '1702'])]
I'm not sure what's going wrong. Any help is appreciated.
The entries are sorted in alphabetical (lexicographic) order because the first elements are strings, so '103' sorts before '19'. If you want to sort them by integer value, convert the value to int first:
sorted(dependency_dict.items(), key=lambda x: int(x[1][0]))
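The effect of the int conversion, and one way to get a reordered dict back rather than a list of pairs, can be sketched on a three-entry excerpt of the data. This relies on dicts preserving insertion order, which is guaranteed from Python 3.7 on; the names as_text, as_int and ordered are only for illustration.

```python
dependency_dict = {
    'deleterious': ['103', '114'],
    'for': ['19', '22'],
    'Adenotonsillectomy': ['0', '18', '1869', '1716'],
}

# String keys compare character by character, so '103' comes before '19'.
as_text = sorted(dependency_dict, key=lambda k: dependency_dict[k][0])

# Integer keys compare numerically: 0, 19, 103.
as_int = sorted(dependency_dict, key=lambda k: int(dependency_dict[k][0]))

# Rebuild a dict in the numeric order.
ordered = dict(sorted(dependency_dict.items(), key=lambda kv: int(kv[1][0])))

print(as_text)
print(as_int)
print(list(ordered))
```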
