Reaching a leaf using substring in a dictionary with multiple structures - python

I have a JSON file that combines records from every previous version of our storage format. An example looks like this:
myList = {1: {'name': 'John', 'age': '27', 'class': '2', 'drop': True},
          2: {'name': 'Marie', 'other_info': {'age': '22', 'class': '3', 'dropped': True}},
          3: {'name': 'James', 'other_info': {'age': '23', 'class': '1', 'is_dropped': False}},
          4: {'name': 'Lucy', 'some_info': {'age': '20', 'class': '4', 'other_branch': {'is_dropped': True, 'how_drop': 'Foo'}}}}
I want to reach the values whose key or sub-key contains 'drop'. I don't know all the dictionary structures; there can be 20 or more. All I know is that the relevant key always contains the substring 'drop'. A few other keys may also contain 'drop', but not many. If a record has multiple matches, I can manually choose which one to take.
I tried flattening, but after flattening every record had a different key name.
There is other information I'd like to reach, and most of those attributes have the same problem.
I want to get the values True, True, False, True from the drop, dropped, and is_dropped keys.
How can I reach these nodes?

You can use recursion to solve this:
def get_drop(dct):
    for key, val in dct.items():
        if isinstance(key, str) and 'drop' in key and isinstance(val, bool):
            yield val
        elif isinstance(val, dict):
            yield from get_drop(val)
print(list(get_drop(myList)))
[True, True, False, True]

Create a recursive function that searches the keys and builds up the path incrementally. Leaving out safety checks, you can do something like this:
def find(input_dict, base='', search_key='drop'):
    found_paths = []
    for each_key, value in input_dict.items():
        # str() is needed because the top-level keys in the example are ints
        path = base + '.' + str(each_key) if base else str(each_key)
        # match on substring, so 'dropped' and 'is_dropped' are found too
        if isinstance(each_key, str) and search_key in each_key:
            found_paths.append(path)
        if isinstance(value, dict):
            found_paths += find(value, base=path, search_key=search_key)
    return found_paths
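The two answers above can be combined so that each match carries both its path and its value, which makes it easier to pick one manually when a record has several 'drop' keys. This is a minimal sketch, not taken from either answer; the names find_matches and records are made up for illustration, and the sample data is a shortened version of the question's.

```python
def find_matches(node, needle="drop", path=()):
    """Yield (path, value) for every string key that contains `needle`."""
    for key, value in node.items():
        current = path + (key,)
        if isinstance(key, str) and needle in key:
            yield current, value
        # Keep descending so nested structures of any depth are covered.
        if isinstance(value, dict):
            yield from find_matches(value, needle, current)


# Shortened sample in the shape of the question's data (hypothetical values).
records = {
    1: {"name": "John", "drop": True},
    2: {"name": "Marie", "other_info": {"dropped": True}},
    4: {"name": "Lucy",
        "some_info": {"other_branch": {"is_dropped": True, "how_drop": "Foo"}}},
}

matches = list(find_matches(records))
for path, value in matches:
    print(path, value)
```

Record 4 produces two matches (is_dropped and how_drop), so filtering on isinstance(value, bool) afterwards would narrow the result to the flags only.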

Related

Passing nested list as an argument to a method

The code below works fine:
class p:
    def __init__(self):
        self.log = {
            'name': '',
            'id': '',
            'age': '',
            'grade': ''
        }
    def parse(self, line):
        self.log['id'] = line[0]
        self.log['name'] = line[1]
        self.log['age'] = line[2]
        self.log['grade'] = line[3].replace('\n', "")
        return self.log
obj = p()
with open(r"C:\Users\sksar\Desktop\Azure DE\Datasets\dark.csv", 'r') as fp:
    line = fp.read()
data = [i.split(',') for i in line.split('\n')]
for i in data:
    a = obj.parse(i)
    print(a)
Input:
1,jonas,23,A
2,martha,23,B
Output:
{'name': 'jonas', 'id': '1', 'age': '23', 'grade': 'A'}
{'name': 'martha', 'id': '2', 'age': '23', 'grade': 'B'}
My question: when I make the method call (a=obj.parse(i)) outside the loop, the inputs are overwritten and I only get {'name': 'martha', 'id': '2', 'age': '23', 'grade': 'B'}; the previous records are missing.
How can I call parse without iterating over the input data in a loop and feeding each row to the method? In short, how do I get the desired output without a for loop?
I don't see why you are trying to avoid an explicit loop. Even if you don't see it in your code, if something is being iterated there will be a loop somewhere, and in that case "explicit is better than implicit".
In any case, check this:
with open(r"C:\Users\sksar\Desktop\Azure DE\Datasets\dark.csv",'r') as fp:
    [print(obj.parse(x.split(','))) for x in fp.readlines()]
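The overwriting described in the question most likely comes from parse returning the same self.log dict on every call, so any collected results all point at one object. A minimal sketch of an alternative is to build a fresh dict per row. The FIELDS list and the in-memory sample are assumptions for illustration; the real code would open the CSV file instead.

```python
import csv
import io

# Stand-in for the CSV file in the question (assumed contents).
sample = io.StringIO("1,jonas,23,A\n2,martha,23,B\n")

# Column order assumed from the sample input shown in the question.
FIELDS = ["id", "name", "age", "grade"]


def parse(row):
    """Return a new dict for each row instead of mutating shared state."""
    return dict(zip(FIELDS, row))


# csv.reader splits on commas and strips the line endings.
records = [parse(row) for row in csv.reader(sample)]
print(records)
```

Because every call returns its own dict, the records can be collected in a list and used after the loop without the last row replacing the earlier ones.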

Finding the index of the most frequent dict in a Python array

I have an array with some dictionaries in it.
The following code gets the result I want, but it has to process the returned value a second time, which seems wasteful.
Is there a better way?
data = [{'name': 'A'},
        {'name': 'A'},
        None,
        None,
        {'name': 'B'},
        {'name': 'B'},
        {'name': 'B'}]
process = list(map(lambda x: x.get('name') if isinstance(x, dict) else None, data))
result = max(process, key=process.count)
for _ in data:
    if isinstance(_, dict) and _['name'] == result:
        array_index = _
        break
print(data.index(array_index))
{'name':'B'} appears the most times.
At which index of the data array does {'name':'B'} first appear?
For the example above, I want to get 4.
But the code above has to go through the data in a for loop a second time, which I think is very bad.
I used GitHub Copilot to see how it would solve this problem:
def get_index_of_most_frequent_dict_value(data):
    """
    Return the index of the most frequent value in the data array
    """
    # Create a dictionary to store the frequency of each value
    frequency = {}
    for item in data:
        if item is None:
            continue
        if item['name'] in frequency:
            frequency[item['name']] += 1
        else:
            frequency[item['name']] = 1
    # Find the most frequent value
    most_frequent_value = None
    most_frequent_value_count = 0
    for key, value in frequency.items():
        if value > most_frequent_value_count:
            most_frequent_value = key
            most_frequent_value_count = value
    # Find the position of the most frequent value
    for i in range(len(data)):
        if data[i] is None:
            continue
        if data[i]['name'] == most_frequent_value:
            return i
Output:
4
Comparisons of time:
My solution (a): 5.499999999998562e-06 seconds
Your solution (b): 7.400000000004625e-06 seconds
a < b? Yes
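Here is what you can do:
import ast
data = [{'name': 'A'},
        {'name': 'A'},
        None,
        None,
        {'name': 'B'},
        {'name': 'B'},
        {'name': 'B'}]
# Count each distinct entry, keyed by its string form because dicts are not hashable
x = {str(y): data.count(y) for y in data}
# Pick the key with the highest count (a plain max(x) would compare the strings alphabetically)
j = ast.literal_eval(max(x, key=x.get))
print(j)
# Index of the first occurrence, which is what the question asks for
print(data.index(j))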
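Another option is collections.Counter from the standard library, which does the counting in one pass and leaves a single short search for the first matching index. This is a minimal sketch that assumes None entries should be ignored and that every dict has a 'name' key; the variable names are illustrative only.

```python
from collections import Counter

data = [{'name': 'A'}, {'name': 'A'}, None, None,
        {'name': 'B'}, {'name': 'B'}, {'name': 'B'}]

# Count the names, skipping the None placeholders.
names = [item['name'] for item in data if isinstance(item, dict)]
most_common_name, count = Counter(names).most_common(1)[0]

# next() stops at the first match, so the list is not scanned further than needed.
first_index = next(i for i, item in enumerate(data)
                   if isinstance(item, dict) and item['name'] == most_common_name)
print(first_index)  # 4
```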

How can I use list comprehension to separate values in a dictionary?

name=[]
age=[]
address=[]
...
for line in pg:
    for key,value in line.items():
        if key == 'name':
            name.append(value)
        elif key == 'age':
            age.append(value)
        elif key == 'address':
            address.append(value)
.
.
.
Is it possible to use a list comprehension for the code above? I need to separate many values in the dict, and I will use the lists to write to a text file.
Source Data:
a = [{'name': 'paul', 'age': '26.', 'address': 'AU', 'gender': 'male'},
{'name': 'mei', 'age': '26.', 'address': 'NY', 'gender': 'female'},
{'name': 'smith', 'age': '16.', 'address': 'NY', 'gender': 'male'},
{'name': 'raj', 'age': '13.', 'address': 'IND', 'gender': 'male'}]
I don't think list comprehension will be a wise choice because you have multiple lists.
Instead of making multiple lists and appending to them the value if the key matches you can use defaultdict to simplify your code.
from collections import defaultdict
result = defaultdict(list)
for line in pg:
    for key, value in line.items():
        result[key].append(value)
You can get the name list by using result.get('name')
['paul', 'mei', 'smith', 'raj']
This probably won't work the way you want: you're trying to fill three different lists, so you would need three different comprehensions. If your dict is large, this would roughly triple your execution time.
Something straightforward, such as
name = [value for key,value in line.items() if key == "name"]
seems to be what you'd want ... three times.
You can proceed as follows:
pg=[{"name":"name1","age":"age1","address":"address1"},{"name":"name2","age":"age2","address":"address2"}]
name=[v for line in pg for k,v in line.items() if k=="name"]
age=[v for line in pg for k,v in line.items() if k=="age"]
address=[v for line in pg for k,v in line.items() if k=="address"]
Following on from Vishal's answer, please don't use defaultdict here. A defaultdict silently creates an entry for any missing key you look up, so a mistyped key never raises KeyError. If you want to catch KeyErrors, use setdefault on a plain dict instead.
result = dict()
for line in pg:
    for key, value in line.items():
        result.setdefault(key, []).append(value)
Output
{
'name': ['paul', 'mei', 'smith', 'raj'],
'age': ['26.', '26.', '16.', '13.'],
...
}
However, note that if the dicts in pg don't all have the same keys, you will lose the correspondence between items: position i in one list may no longer belong to the same record as position i in another.
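One way to keep the lists aligned when some records lack a key is to build each column with dict.get, which fills the gap with None. This is a sketch under the assumption that the set of wanted keys is known up front; the rows below are made up for illustration, and the second one deliberately has no 'age'.

```python
rows = [
    {'name': 'paul', 'age': '26.', 'address': 'AU'},
    {'name': 'mei', 'address': 'NY'},  # hypothetical record with no 'age'
    {'name': 'smith', 'age': '16.', 'address': 'NY'},
]

keys = ['name', 'age', 'address']

# One list per key; every list has exactly len(rows) entries.
columns = {key: [row.get(key) for row in rows] for key in keys}
print(columns['age'])  # ['26.', None, '16.']
```

Every column then has the same length as rows, so index i refers to the same record in each list.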
Here is a really simple solution if you want to use pandas:
import pandas as pd
df = pd.DataFrame(a)
name = df['name'].tolist()
age = df['age'].tolist()
address = df['address'].tolist()
print(name)
print(age)
print(address)
Output:
['paul', 'mei', 'smith', 'raj']
['26.', '26.', '16.', '13.']
['AU', 'NY', 'NY', 'IND']
Additionally, if your end result is a text file, you can skip the list creation and write the DataFrame (or parts thereof) directly to a CSV with something as simple as:
df.to_csv('/path/to/output.csv')

How to sort data in the dictionary of list of dictionary in python?

Please help me. I have dataset like this:
my_dict = { 'project_1' : [{'commit_number':'14','name':'john'},
{'commit_number':'10','name':'steve'}],
'project_2' : [{'commit_number':'12','name':'jack'},
{'commit_number':'15','name':'anna'},
{'commit_number':'11','name':'andy'}]
}
I need to sort the dataset based on the commit number in descending order and make it into a new list by ignoring the name of the project using python. The list expected will be like this:
ordered_list_of_dict = [{'commit_number':'15','name':'anna'},
{'commit_number':'14','name':'john'},
{'commit_number':'12','name':'jack'},
{'commit_number':'11','name':'andy'},
{'commit_number':'10','name':'steve'}]
Thank you so much for helping me.
1. Extract my_dict's values as a list of lists*
2. Join each sub-list together (flatten dict_values) to form a flat list
3. Sort each element by commit_number
*A list of lists on Python 2. On Python 3, a dict_values object is returned.
from itertools import chain
res = sorted(chain.from_iterable(my_dict.values()),
key=lambda x: x['commit_number'],
reverse=True)
[{'commit_number': '15', 'name': 'anna'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '12', 'name': 'jack'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '10', 'name': 'steve'}]
On python2, you'd use dict.itervalues instead of dict.values to the same effect.
Coldspeed's answer is great as usual but as an alternative, you can use the following:
ordered_list_of_dict = sorted([x for y in my_dict.values() for x in y], key=lambda x: x['commit_number'], reverse=True)
which, when printed, gives:
print(ordered_list_of_dict)
# [{'commit_number': '15', 'name': 'anna'}, {'commit_number': '14', 'name': 'john'}, {'commit_number': '12', 'name': 'jack'}, {'commit_number': '11', 'name': 'andy'}, {'commit_number': '10', 'name': 'steve'}]
Note that in the list-comprehension you have the standard construct for flattening a list of lists:
[x for sublist in big_list for x in sublist]
I'll provide the less-pythonic and more reader-friendly answer.
First, iterate through key-value pairs in my_dict, and add each element of value to an empty list. This way you avoid having to flatten out a list of lists:
commits = []
for key, val in my_dict.items():
    for commit in val:
        commits.append(commit)
which gives this:
In [121]: commits
Out[121]:
[{'commit_number': '12', 'name': 'jack'},
{'commit_number': '15', 'name': 'anna'},
{'commit_number': '11', 'name': 'andy'},
{'commit_number': '14', 'name': 'john'},
{'commit_number': '10', 'name': 'steve'}]
Then sort it in descending order:
sorted(commits, reverse = True)
On Python 2 this sorts by 'commit_number' even if you don't specify it, because that key comes alphabetically before 'name'. On Python 3 it raises a TypeError, because dicts cannot be compared with each other, so there you have to specify the key. Specifying it is also better defensive coding, and to the best of my knowledge this is the fastest and cleanest way:
from operator import itemgetter
sorted(commits, key = itemgetter('commit_number'), reverse = True)
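All of the answers above compare commit_number as a string, which works for the sample data only because every value has two digits. A short sketch of the difference once the number of digits varies follows; the data is invented for illustration, and converting with int assumes every commit_number is a valid integer string.

```python
from itertools import chain

# Hypothetical data with commit numbers of different lengths.
my_dict = {
    'project_1': [{'commit_number': '9', 'name': 'john'},
                  {'commit_number': '10', 'name': 'steve'}],
    'project_2': [{'commit_number': '100', 'name': 'anna'}],
}

commits = list(chain.from_iterable(my_dict.values()))

# String comparison is character by character, so '9' ranks above '100'.
as_text = sorted(commits, key=lambda c: c['commit_number'], reverse=True)
# Converting to int gives the numeric order.
as_number = sorted(commits, key=lambda c: int(c['commit_number']), reverse=True)

print([c['commit_number'] for c in as_text])    # ['9', '100', '10']
print([c['commit_number'] for c in as_number])  # ['100', '10', '9']
```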

What is the Pythonic way to iterate over a dict of dicts and lists?

I have a dict which contains some lists and some dicts, as illustrated below.
What is the most pythonic way to iterate over the dict and print out the name and address pairs for each top level dict key?
Thanks
{
'Resent-Bcc': [],
'Delivered-To': [],
'From': {'Name': 'Steve Watson', 'Address': 'steve.watson#example.org'},
'Cc': [],
'Resent-Cc': [],
'Bcc': [ {'Name': 'Daryl Hurstbridge', 'Address': 'daryl.hurstbridge#example.org'},
{'Name': 'Sally Hervorth', 'Address': 'sally.hervorth#example.org'},
{'Name': 'Mike Merry', 'Address': 'mike.merry#example.org'},
{'Name': 'Jenny Callisto', 'Address': 'jenny.callisto#example.org'}
],
'To': {'Name': 'Darius Jedburgh', 'Address': 'darius.jedburgh#example.org'}
}
Use the iteritems() method on the dict (on Python 3, iteritems() no longer exists and items() returns a lightweight view instead). It's clear and easy to understand, which seems Pythonic to me. On Python 2, iteritems() also creates fewer temporary items than items(), as Preet Kukreti mentioned in the comments. First, fix your data. Right now, some of the values in the top-level dict are lists, and some are dicts:
# list
'Delivered-To': [],
# dict
'From': {'Name': 'Steve Watson', 'Address': 'steve.watson#example.org'},
This means you have to check the type of the value and act accordingly (and you might forget to check!). Make your data consistent:
# list
'Delivered-To': [],
# also list
'From': [{'Name': 'Steve Watson', 'Address': 'steve.watson#example.org'}],
This will prevent weird type-related bugs in the future. Since Python is dynamically typed, it's very easy to introduce type bugs and not notice until your code is in production and crashes. Try to make your code as type-safe as possible!
Then you can use something like this:
for k, v in d.iteritems():
    for row in v:
        if "Name" in row and "Address" in row:
            print row["Name"], ":", row["Address"]
One way is to change the lone dicts into a list containing the dict. Then all the entries can be treated the same
>>> D = {
... 'Resent-Bcc': [],
... 'Delivered-To': [],
... 'From': {'Name': 'Steve Watson', 'Address': 'steve.watson#example.org'},
... 'Cc': [],
... 'Resent-Cc': [],
... 'Bcc': [ {'Name': 'Daryl Hurstbridge', 'Address': 'daryl.hurstbridge#example.org'},
... {'Name': 'Sally Hervorth', 'Address': 'sally.hervorth#example.org'},
... {'Name': 'Mike Merry', 'Address': 'mike.merry#example.org'},
... {'Name': 'Jenny Callisto', 'Address': 'jenny.callisto#example.org'}
... ],
... 'To': {'Name': 'Darius Jedburgh', 'Address': 'darius.jedburgh#example.org'}
... }
>>> L = [v if type(v) is list else [v] for v in D.values()]
>>> [(d["Name"], d["Address"]) for item in L for d in item ]
[('Steve Watson', 'steve.watson#example.org'), ('Daryl Hurstbridge', 'daryl.hurstbridge#example.org'), ('Sally Hervorth', 'sally.hervorth#example.org'), ('Mike Merry', 'mike.merry#example.org'), ('Jenny Callisto', 'jenny.callisto#example.org'), ('Darius Jedburgh', 'darius.jedburgh#example.org')]
Or the one-liner version:
[(d["Name"], d["Address"]) for item in (v if type(v) is list else [v] for v in D.values()) for d in item]
It's probably best to keep your data simple, by turning each naked dict into a list of one element holding the original dict. Otherwise, you're asking for code that is harder to test.
I tend to lean away from isinstance(foo, dict) and instead use things like:
if hasattr(d, 'iteritems'): print list(d.iteritems())
...It strikes me as more duck-typed this way; it opens the door to using one of the many dict-replacements - things that act like a dict, but nominally aren't a dict.
for key in header:
    if header[key] and type(header[key])==type([]):
        for item in header[key]:
            print (item)
    elif type(header[key])==type({}):
        print(header[key])
# this option is not the easiest to read, so I classify it as less "pythonic"
l = [header[key] for key in header if header[key] and type(header[key])==type({})] + [header[key][i] for key in header if header[key] and type(header[key])==type([]) for i in range(len(header[key]))]
for item in l:
    print(item)
If you're looking for the contents of a specific header, you could modify the if statements accordingly. Both of these examples print the dictionaries, but they could easily be adapted to print specific values.
for i in dict:
    if 'Name' in dict[i]:
        print (dict[i]['Name'],dict[i]['Address'])
This will not work for Bcc, where the entries are in a list (right now it only prints the From and To names and addresses). Do you need it to print the Bcc addresses too?
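For reference, here is a sketch of one way to cover both the single-dict fields and the list-valued fields such as Bcc, written for Python 3. The function name contacts is made up for this example, the data is a shortened copy of the question's, and the code assumes every entry has both a 'Name' and an 'Address' key.

```python
header = {
    'Cc': [],
    'From': {'Name': 'Steve Watson', 'Address': 'steve.watson#example.org'},
    'Bcc': [{'Name': 'Mike Merry', 'Address': 'mike.merry#example.org'},
            {'Name': 'Jenny Callisto', 'Address': 'jenny.callisto#example.org'}],
}


def contacts(header):
    """Yield (field, name, address) whether a field holds one dict or a list of dicts."""
    for field, value in header.items():
        # Wrap a lone dict in a list so both shapes go through the same loop.
        entries = [value] if isinstance(value, dict) else value
        for entry in entries:
            yield field, entry['Name'], entry['Address']


pairs = list(contacts(header))
for field, name, address in pairs:
    print(f'{field}: {name} <{address}>')
```

Empty lists such as Cc simply produce no output, so no separate emptiness check is needed.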
