Recursive dictionary from list of hierarchical codes

I have a list of around 100 values like this:
list = ['40201020', '45102020', '25203020', '20106020', '25301020', '40402030', '20202010']
I need a dictionary that:
a) lists the parent of each value. The parent has one digit fewer (dropped from the right):
child = '40201020'
parent = '4020102'
This format would be ideal:
dict['4020102parent'] = '40201020'
b) lists all parents of parents, down to one remaining digit. So parent '4020102' gets this parent:
dict['4020102parent'] = '402010'
and
dict['402010parent'] = '40201'
etc.
c) lists all last descendants of each parent. By last descendant, I mean the 8-digit codes of the original list. So the digit '4' would have these codes:
dict['4children'] = ['40201020', '45102020', '40402030']
or:
dict['40children'] = ['40201020', '40402030']

Will your list always contain strings, and is a dictionary a requirement? If you will always be working with strings and just want a way to find parents and children, I would suggest using Python's string handling capabilities. You could define functions parent and children like this:
def parent(list_item):
    # drop the last character to get the parent code
    return list_item[:-1]
def children(my_list, parent_str):
    # collect every item that starts with the given prefix
    children_found = []
    for item in my_list:
        if item.startswith(parent_str):
            children_found.append(item)
    return children_found
Then calling parent('40201020') produces '4020102', and calling children(my_list, '40') produces ['40201020', '40402030']. You can call parent repeatedly to get a string with one fewer character each time.
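As a minimal sketch of the repeated parent call described above (the helper name ancestors is hypothetical and not part of the original answer), the whole chain of parents could be collected like this:

```python
def ancestors(code):
    """Return every parent of code, from the nearest one down to the single leading digit."""
    chain = []
    while len(code) > 1:
        code = code[:-1]  # same step as parent(): drop the last character
        chain.append(code)
    return chain

print(ancestors('40201020'))
# ['4020102', '402010', '40201', '4020', '402', '40', '4']
```

A loop is used here instead of actual recursion; for 8-digit codes either would work.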

I am still not sure why you need a parent dict when you could just store the recursive results in a list and use the str.startswith() method.
Still, I have stored the parent dicts in dict_data, so you can use that:
list1 = ['40201020', '45102020', '25203020', '20106020', '25301020', '40402030', '20202010']
dict_data=[]
track=[]
list_2={}
def main_function(lst_1):
    for i in lst_1:
        def recursive(lst):
            parent = {}
            if not lst:
                return 0
            else:
                # record one {parent-key: child} pair per level
                parent[lst[:-1] + 'parent'] = lst
                track.append(lst)
                dict_data.append(parent)
                return recursive(lst[:-1])
        recursive(i)
main_function(list1)
# map every code and prefix seen above to the original codes that start with it
for recursive_unit in set(track):
    for items in list1:
        if items.startswith(recursive_unit):
            if recursive_unit not in list_2:
                list_2[recursive_unit]=[items]
            else:
                list_2[recursive_unit].append(items)
print(list_2)
output:
{'25203': ['25203020'], '25': ['25203020', '25301020'],'4': ['40201020', '45102020', '40402030'],'4510': ['45102020'], '2520302': ['25203020'], '40402030': ['40402030'], '2010602': ['20106020'], '45102020': ['45102020'], '45': ['45102020'], '253010': ['25301020'], '4020': ['40201020'], '252': ['25203020'], '20202010': ['20202010'], '20106': ['20106020'], '201060': ['20106020'],'202020': ['20202010'], '2530102': ['25301020'], '402': ['40201020'], '2010': ['20106020'], '4510202': ['45102020'], '2530': ['25301020'], '451020': ['45102020'], '2020201': ['20202010'], '404020': ['40402030'], '25203020': ['25203020'], '2': ['25203020', '20106020', '25301020', '20202010'], '20202': ['20202010'], '253': ['25301020'], '40402': ['40402030'], '451': ['45102020'], '40201020': ['40201020'], '252030': ['25203020'], '2520': ['25203020'], '40': ['40201020', '40402030'], '4040': ['40402030'], '402010': ['40201020'], '4020102': ['40201020'], '25301020': ['25301020'], '20106020': ['20106020'], '201': ['20106020'], '20': ['20106020', '20202010'], '202': ['20202010'], '40201': ['40201020'], '45102': ['45102020'], '2020': ['20202010'], '25301': ['25301020'], '4040203': ['40402030'], '404': ['40402030']}

As stated in my comment, a dictionary where each key maps to its children seems like a more reasonable idea.
To achieve this, we can loop through each element in your list (which I renamed to l so as not to shadow the built-in list() function) and append the value to the lists of all its parents in a dictionary, d.
The code for the method described above would look something like:
d = {}
for i in l:
    # every proper prefix of i (lengths 1 to len(i) - 1) is one of its parents
    for e in range(1, len(i)):
        d.setdefault(i[:e], []).append(i)
which will then allow you to do things like:
>>> d['4']
['40201020', '45102020', '40402030']
>>> d['40']
['40201020', '40402030']
>>> d['25']
['25203020', '25301020']
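The question also asked for a parent lookup alongside the children lists. A rough sketch that builds both in one pass follows; the names codes, parent_of and descendants are illustrative assumptions, and collections.defaultdict is swapped in for setdefault:

```python
from collections import defaultdict

codes = ['40201020', '45102020', '25203020', '20106020',
         '25301020', '40402030', '20202010']

parent_of = {}                   # code or prefix -> prefix one digit shorter
descendants = defaultdict(list)  # prefix -> original 8-digit codes below it

for code in codes:
    # walk from the longest proper prefix down to the single leading digit
    for end in range(len(code) - 1, 0, -1):
        prefix = code[:end]
        parent_of[code[:end + 1]] = prefix
        descendants[prefix].append(code)

print(parent_of['40201020'])  # 4020102
print(parent_of['4020102'])   # 402010
print(descendants['40'])      # ['40201020', '40402030']
```

Keeping two separate dictionaries avoids the string-suffixed keys ('4020102parent', '4children') from the question, which would otherwise have to be assembled and parsed by hand.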

# do not use "list" as a variable name (it's a Python builtin)
mylist = ['2', '20', '201', '2010', '20106', '201060', '2010602']
# sorting ensures every prefix is processed before the longer codes that extend it
mylist_sorted = sorted(mylist)
parents = {}
children = {}
for element in mylist_sorted:
    parents[element] = []
    children[element] = []
    # check every proper prefix of the element
    for subelement in [element[0:x] for x in range(1,len(element))]:
        if subelement in parents:
            parents[element].append(subelement)
            children[subelement].append(element)
print(parents)
print(children)
Output:
{'201060': ['2', '20', '201', '2010', '20106'],
'20106': ['2', '20', '201', '2010'],
'2010': ['2', '20', '201'],
'201': ['2', '20'],
'2': [],
'2010602': ['2', '20', '201', '2010', '20106', '201060'],
'20': ['2']}
{'201060': ['2010602'],
'20106': ['201060', '2010602'],
'2010': ['20106', '201060', '2010602'],
'201': ['2010', '20106', '201060', '2010602'],
'2': ['20', '201', '2010', '20106', '201060', '2010602'],
'2010602': [],
'20': ['201', '2010', '20106', '201060', '2010602']}

Related

I have problem with my loop list. cant get the expected result

Here's my code:
UserList = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
ListStrUser = []
for ListStrUser in UserList:
    ListStrUser = GetNum(UserList)
def GetNum(anyList):
    for i in range(1,len(anyList)):
        anyList[i] = re.sub (r'\D',"", str(anyList[i]))
    return anyList
print(ListStrUser)
########
Expected result:
[['person1', '25','70','170'],[ 'person2','21','54','164']]

You were not far off, Asif. I cannot add much to Ethan's answer, which is why I'm confused that it was downvoted. If you want a function that handles all the work without needing another for loop, the function below does just that:
import re
UserList = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
def get_num(any_list):
    # includes the for loop to iterate through the list of lists
    list_str_user = []
    for inner_list in any_list:
        # keep the first field (the name) unchanged
        temp_list = [inner_list[0]]
        for i in range(1,len(inner_list)):
            temp_list.append(re.sub(r'\D', '', str(inner_list[i])))
        list_str_user.append(temp_list)
    return list_str_user
print(get_num(UserList))
Output:
[['person1', '25', '70', '170'], ['person2', '21', '54', '164']]
So there is no need for the for loop outside the function.

import re
def GetNum(anyList):
    for i in range(1, len(anyList)):
        # strip every non-digit character
        anyList[i] = re.sub(r'\D',"",str(anyList[i]))
    return anyList
userList = [['person1','25yo','70kg','170cm'],['person2','21yo','54kg','164cm']]
# GetNum modifies each inner list in place, so userList itself is updated
for ListStrUser in userList:
    ListStrUser = GetNum(ListStrUser)
print("Output : ", userList)
output: [['person1', '25', '70', '170'], ['person2', '21', '54', '164']]

From @Guy's comment:
UserList = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
import re
def GetNum(anyList):
    for i in range(1,len(anyList)):
        anyList[i] = re.sub (r'\D',"", str(anyList[i]))
    return anyList
ListStrUser = []
for ListStr in UserList:
    ListStrUser.append(GetNum(ListStr))
print(ListStrUser)
gives
[['person1', '25', '70', '170'], ['person2', '21', '54', '164']]

Try the following code:
import re
user_list = [['person1', '25yo','70kg','170cm'],[ 'person2','21yo','54kg','164cm']]
list_str_user = []
def get_num(any_list):
    # start from the name and append the cleaned fields to a new list
    updated_list = [any_list[0]]
    for i in range(1,len(any_list)):
        updated_list.append(re.sub(r'\D',"", str(any_list[i])))
    return updated_list
for user in user_list:
    list_str_user.append(get_num(user))
print(list_str_user)
Notice that I also updated the names of your variables and functions, and the spacing between functions, to comply with PEP 8. Keep this in mind when writing Python code.
Also, functions must be defined before you call them; otherwise Python won't find them.
I also build a new updated_list in get_num instead of modifying the argument; avoiding mutation of function parameters is rarely a bad idea.
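For comparison, the same non-mutating idea can be written as a nested list comprehension. This is only a sketch; the function name strip_units is made up for illustration:

```python
import re

def strip_units(rows):
    """Return new rows with non-digit characters removed from every field but the first."""
    return [[row[0]] + [re.sub(r'\D', '', str(field)) for field in row[1:]]
            for row in rows]

user_list = [['person1', '25yo', '70kg', '170cm'],
             ['person2', '21yo', '54kg', '164cm']]
cleaned = strip_units(user_list)
print(cleaned)       # [['person1', '25', '70', '170'], ['person2', '21', '54', '164']]
print(user_list[0])  # ['person1', '25yo', '70kg', '170cm'] (input left untouched)
```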

Nested dictionary representing catalogue

A text file has information about after-class activities (name, price per month, days and times) that looks like this:
Swimming,20,Monday,15,Monday,17,Wednesday,18,Friday,15
Football,20,Tuesday,18,Wednesday,17,Wednesday,18,Thursday,19
Ballet,40,Monday,18,Tuesday,18,Wednesday,16,Thursday,16,Friday,17
To represent the course catalogue, I've created a nested dictionary in a format like this:
{'Swimming': {'Price': '20', 'Dates': {'Monday': ['15', '17'], 'Wednesday': ['18'], 'Friday': ['15']}}, 'Football': {'Price': '20', 'Dates': {'Tuesday': ['18'], 'Wednesday': ['17', '18'], 'Thursday': ['19']}}, 'Ballet': {'Price': '40', 'Dates': {'Monday': ['18'], 'Tuesday': ['18'], 'Wednesday': ['16'], 'Thursday': ['16'], 'Friday': ['17']}}}
And the code looks like this:
with open("fil.txt", "r") as f:
    catalogue = {}
    while True:
        content = f.readline().strip()
        if not content: break
        content = content.split(',')
        # content[0] is the activity name, content[1] the price
        catalogue[content[0]] = {}
        catalogue[content[0]]['Price'] = content[1]
        catalogue[content[0]]['Dates'] = {}
        # the remaining fields alternate between day and hour
        for i in range(2,len(content),2):
            if content[i] in catalogue[content[0]]['Dates']:
                catalogue[content[0]]['Dates'][content[i]].append(content[i+1])
            else:
                catalogue[content[0]]['Dates'][content[i]] = [content[i+1]]
My question is: is there a simpler way to build such a dictionary? Or should a different data structure have been used to represent the catalogue?

This is one way, using a nested dictionary structure via collections.defaultdict.
from collections import defaultdict
# missing keys are created on first access, so no explicit initialisation is needed
catalogue = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
with open("fil.txt", "r") as f:
    while True:
        content = f.readline().strip()
        if not content: break
        content = content.split(',')
        catalogue[content[0]]['Price'] = content[1]
        for i in range(2,len(content),2):
            catalogue[content[0]]['Dates'][content[i]].append(content[i+1])

I would simply write a class.
from collections import defaultdict
class SportClass:
    def __init__(self, name, price, *times):
        self.name = name
        self.price = float(price)
        self.days = defaultdict(list)
        # times alternates between day and hour
        for day, hour in zip(times[::2], times[1::2]):
            self.days[day].append(int(hour))
with open('fil.txt') as fp:
    classes = [SportClass(*line.strip().split(',')) for line in fp if line.strip()]
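If the plain nested-dict shape from the question is what you want, a shorter version is possible with dict.setdefault and zip. The sketch below is one possible way, not the original author's code; the function name parse_catalogue is hypothetical, and io.StringIO stands in for the file so the example runs on its own:

```python
import io

def parse_catalogue(lines):
    """Build {name: {'Price': ..., 'Dates': {day: [hours]}}} from comma-separated lines."""
    catalogue = {}
    for line in lines:
        fields = line.strip().split(',')
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        name, price, *slots = fields
        dates = {}
        # slots alternates between day and hour
        for day, hour in zip(slots[::2], slots[1::2]):
            dates.setdefault(day, []).append(hour)
        catalogue[name] = {'Price': price, 'Dates': dates}
    return catalogue

sample = io.StringIO("Swimming,20,Monday,15,Monday,17,Wednesday,18,Friday,15\n"
                     "Football,20,Tuesday,18,Wednesday,17,Wednesday,18,Thursday,19\n")
catalogue = parse_catalogue(sample)
print(catalogue['Swimming']['Dates'])
# {'Monday': ['15', '17'], 'Wednesday': ['18'], 'Friday': ['15']}
```

Any iterable of lines works as input, so an open file object can be passed in place of the StringIO.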

Nested dict in python, searching based on inner key get inner value and parent key

I have the following dict:
defaultdict(<type 'dict'>,
{'11': {('extreme_fajita', 'jalapeno_poppers'): '4',('test12', 'test14'): '5'},
'10': {('jalapeno_poppers', 'test', ): '2', ('test2',): '3', ('test14',): '5'}
})
I want to search by an inner key, e.g. ('test2',), and get both the value from the inner dictionary and the parent (outer) key.
For example, searching for ('test2',) should return ['10', '3'], or the whole path, like ['10', ('test2',), '3'].

I'm going to assume your defaultdict looks like:
defaultdict = {'11': {('extreme_fajita', 'jalapeno_poppers'): '4',('test12', 'test14'): '5'}, '10': {('jalapeno_poppers', 'test2', ): '2', ('test2',): '3', ('test14',): '5'} }
If that's the case, then you can use (Python 2 syntax):
searchValue = 'test2'
found = []
for masterkey,mastervalue in defaultdict.iteritems():
    for childkey,childvalue in mastervalue.iteritems():
        # childkey is a tuple, so this tests membership inside the tuple
        if searchValue in childkey:
            found.append(childvalue)
print found
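The loop above collects only the inner values. A Python 3 sketch that also returns the outer key, as the question asked, might look like the following; the function name find_inner_key is hypothetical, and it matches the whole tuple key exactly rather than testing membership inside it:

```python
def find_inner_key(data, inner_key):
    """Return [outer_key, inner_key, value] for every inner dict that contains inner_key."""
    return [[outer_key, inner_key, inner[inner_key]]
            for outer_key, inner in data.items()
            if inner_key in inner]

data = {
    '11': {('extreme_fajita', 'jalapeno_poppers'): '4', ('test12', 'test14'): '5'},
    '10': {('jalapeno_poppers', 'test'): '2', ('test2',): '3', ('test14',): '5'},
}
print(find_inner_key(data, ('test2',)))  # [['10', ('test2',), '3']]
```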

Dictionaries are not ordered, so you will not necessarily get the values in the order '2', '3'. Instead, you can get all values from the dictionary in which 'test2' is found. I have the following code for this:
def getKeys(d1, path="", lastDict=list()):
    for k in d1.keys():
        if type(k) is tuple:
            if 'test2' in k:
                print "test2 found at::", path + "->" , k
                print "Value of test2::", d1[k]
                print "Values in parent::", [kl for kl in lastDict[len(lastDict)-1].values()]
        elif type(d1[k]) is dict:
            lastDict.append(d1[k])
            if path == "":
                path = k
            else:
                path = path + "->" + k
            getKeys(d1[k], path)
d = {'11': {('extreme_fajita', 'jalapeno_poppers'): '4',('test12', 'test14'): '5'}, '10': {('jalapeno_poppers', 'test', ): '2', ('test2',): '3', ('test14',): '5'}}
getKeys(d)
Output:
test2 found at:: 11->10-> ('test2',)
Value of test2:: 3
Values in parent:: ['2', '5', '3']

looping once over html as a string

I am trying to read data from a table in HTML. I read it periodically, and the table length always changes, so I don't know its length in advance. However, the table always has the same format, so I try to recognize a pattern and read the data based on its position.
The HTML is of the form:
<head>
<title>Some webside</title>
</head>
<body
<tr><td> There are some information coming here</td></tr>
<tbody><table>
<tr><td>First</td><td>London</td><td>24</td><td>3</td><td>19:00</td><td align="center"></td></tr>
<tr bgcolor="#cccccc"><td>Second</td><td>NewYork</td><td>24</td><td>4</td><td>20:13</td><td align="center"></td></tr>
<tr><td>Some surprise</td><td>Swindon</td><td>25</td><td>5</td><td>20:29</td><td align="center"></td></tr>
<tr bgcolor="#cccccc"><td>Third</td><td>Swindon</td><td>24</td><td>6</td><td>20:45</td><td align="center"></td></tr>
</tbody></table>
<tr><td> There are some information coming here</td></tr>
</body>
I convert the HTML to a string and go over it to read the data, but I want to read it only once. My code is:
def ReadTable(m):
    refList = []
    firstId = 1
    nextId = 2
    k = 1
    helper = 1
    while firstId != nextId:
        row = []
        helper = m.find('<td><a href="d?k=', helper) + 17
        end_helper = m.find('">', helper)
        rowId = m[helper : end_helper]
        if k == 1: # to check if looped again
            firstId = rowId
        else:
            nextId = rowId
        row.append(rowId)
        helper = end_helper + 2
        end_helper = m.find('</a></td><td>', helper)
        rowPlace = m[helper : end_helper]
        row.append(rowPlace)
        helper = m.find('</a></td><td>', end_helper) + 13
        end_helper = m.find('</td><td>', helper)
        rowCity = m[helper : end_helper]
        row.append(rowCity)
        helper = end_helper + 9
        end_helper = m.find('</td><td>', helper)
        rowDay = m[helper : end_helper]
        row.append(rowDay)
        helper = end_helper + 9
        end_helper = m.find('</td><td>', helper)
        rowNumber = m[helper : end_helper]
        row.append(rowNumber)
        helper = end_helper + 9
        end_helper = m.find('</td>', helper)
        rowTime = m[helper : end_helper]
        row.append(rowTime)
        refList.append(row)
        k +=1
    return refList
if __name__ == '__main__':
    filePath = '/home/m/workspace/Tests/mainP.html'
    fileRead = open(filePath)
    myString = fileRead.read()
    print myString
    refList = ReadTable(myString)
    print 'Final List = %s' % refList
I expect the outcome to be a list with 4 lists inside, like this:
Final List = [['101', 'First', 'London', '24', '3', '19:00'], ['102', 'Second', 'NewYork', '24', '4', '20:13'], ['201', 'Some surprise', 'Swindon', '25', '5', '20:29'], ['202', 'Third', 'Swindon', '24', '6', '20:45']]
I expected that after the first pass the string would be read again, the firstId would be found again, and my while loop would terminate. Instead I get an infinite loop, and my list starts to look like this:
Final List = [['101', 'First', 'London', '24', '3', '19:00'], ['102', 'Second', 'NewYork', '24', '4', '20:13'], ['201', 'Some surprise', 'Swindon', '25', '5', '20:29'], ['202', 'Third', 'Swindon', '24', '6', '20:45'], ['me webside</title>\n</head>\n<body \n<tr><td> There are some information coming here</td></tr>\n<tbody><table>\n<tr><td><a href="d?k=101', 'First', 'London', '24', '3', '19:00'], ['102', 'Second', 'NewYork', '24', '4', '20:13']...
I don't understand why my helper starts to behave this way, and I can't figure out how a program like this should be written. Can you suggest a good, effective way to write it, or a way to fix my loop?

I would suggest you invest some time in looking at lxml. It lets you look at all of the tables in an HTML file and work with the sub-elements that make up each table (such as rows and cells).
lxml is not hard to work with, and its lxml.html module lets you parse a string of HTML with
html.fromstring(somestring)
Further, a lot of lxml questions have been asked and answered here on SO, so it is not too hard to find good examples to work from.

You aren't checking the return value of find, which is -1 when it doesn't find a match. Adding 17 to -1 gives 16, so the search restarts near the beginning of the string.
http://docs.python.org/2/library/string.html#string.find
Return -1 on failure
I updated this section of the code, and it now returns what you expect. The first and last lines below match what you have above, so you can find where the replacement goes.
        row = []
        helper = m.find('<td><a href="d?k=', helper)
        if helper == -1:
            break
        helper += 17
        end_helper = m.find('">', helper)
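The same check can be applied to every find call in the loop. As a rough, self-contained sketch of the pattern in Python 3 (the function extract_between and the sample string are made up for illustration and are not the asker's real page):

```python
def extract_between(text, start_marker, end_marker):
    """Collect every substring found between start_marker and end_marker."""
    results = []
    position = 0
    while True:
        start = text.find(start_marker, position)
        if start == -1:   # no further match: stop instead of restarting from the top
            break
        start += len(start_marker)
        end = text.find(end_marker, start)
        if end == -1:     # opening marker without a closing one
            break
        results.append(text[start:end])
        position = end + len(end_marker)
    return results

html = '<td><a href="d?k=101">First</a></td><td><a href="d?k=102">Second</a></td>'
print(extract_between(html, '<a href="d?k=', '">'))  # ['101', '102']
```

Using len(start_marker) in place of hard-coded offsets such as 17 or 13 keeps the offsets correct if the markers ever change.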

issue in list of dict

class MyOwnClass:
    # list that contains the queries
    queries = []
    # a template dict
    template_query = {}
    template_query['name'] = 'mat'
    template_query['age'] = '12'
obj = MyOwnClass()
query = obj.template_query
query['name'] = 'sam'
query['age'] = '23'
obj.queries.append(query)
query2 = obj.template_query
query2['name'] = 'dj'
query2['age'] = '19'
obj.queries.append(query2)
print obj.queries
It gives me
[{'age': '19', 'name': 'dj'}, {'age': '19', 'name': 'dj'}]
while I expect to have
[{'age': '23' , 'name': 'sam'}, {'age': '19', 'name': 'dj'}]
I wanted to use a template for this list because I'm going to use it very often, and some default values do not need to be changed.
Why does template_query itself change when I do this? I'm new to Python and I'm getting pretty confused.

This is because you are pointing to the same dictionary each time and overwriting its keys.
# query = obj.template_query  (not needed)
query = {}
query['name'] = 'sam'
query['age'] = '23'
obj.queries.append(query)
query2 = {}  # obj.template_query is not needed here either
query2['name'] = 'dj'
query2['age'] = '19'
obj.queries.append(query2)
This should demonstrate your problem:
>>> q = {'a':1}
>>> lst = []
>>> lst.append(q)
>>> q['a']=2
>>> lst
[{'a': 2}]
>>> lst.append(q)
>>> lst
[{'a': 2}, {'a': 2}]
You could implement your class differently:
class MyOwnClass:
    # a template dict
    @property
    def template_query(self):
        return {'name':'default','age':-1}
This will make obj.template_query return a new dict each time.

This is because query and query2 both refer to the same object: obj.template_query, in this case.
It is better to make a template factory:
def template_query(**kwargs):
    template = {'name': 'some default value',
                'age': 'another default value',
                'car': 'generic car name'}
    # keyword arguments override the defaults
    template.update(**kwargs)
    return template
That creates a new dictionary every time it's called. So you can do:
>>> my_query = template_query(name="sam")
>>> my_query
{'name': 'sam', 'age': 'another default value', 'car': 'generic car name'}

You're assigning the same dict to query2; no copy is made. Instead, you might want to create the needed dict with a method template_query() that constructs a new dict each time:
class MyOwnClass:
    # a template dict
    def template_query(self):
        d = {}
        d['name'] = 'mat'
        d['age'] = '12'
        d['car'] = 'ferrari'
        return d
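If you would rather keep a single shared template and only override some fields, copying it on every use avoids the aliasing as well. The following is a minimal Python 3 sketch; the class QueryBuilder, the attribute TEMPLATE and the method new_query are hypothetical names chosen for illustration:

```python
import copy

class QueryBuilder:
    # shared defaults; never handed out directly
    TEMPLATE = {'name': 'mat', 'age': '12'}

    def __init__(self):
        self.queries = []  # per-instance list, not shared between objects

    def new_query(self, **overrides):
        query = copy.deepcopy(self.TEMPLATE)  # independent copy of the defaults
        query.update(overrides)
        self.queries.append(query)
        return query

builder = QueryBuilder()
builder.new_query(name='sam', age='23')
builder.new_query(name='dj', age='19')
print(builder.queries)
# [{'name': 'sam', 'age': '23'}, {'name': 'dj', 'age': '19'}]
print(QueryBuilder.TEMPLATE)  # {'name': 'mat', 'age': '12'}
```

copy.deepcopy is used so that nested values in the template, if any are added later, are copied too; for a flat dict like this one, dict(self.TEMPLATE) would behave the same.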
