From list to nested dictionary - python

I have these lists:
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']
The expected output is a nested dictionary:
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}

An easy way would be to use a pandas DataFrame:
import pandas as pd
df = pd.DataFrame([man_res, man1_res, man2_res], index=data, columns=key)
print(df)
df.to_dict(orient='index')
           name      id      sal
man   Alexandra   RST01  $34,000
man1     Santio  RST009  $45,000
man2  Rumbalski   RST50  $78,000
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}
Or you could merge them manually using dict and zip:
d = dict(zip(
    data,
    (dict(zip(key, res)) for res in (man_res, man1_res, man2_res))
))
d
{'man': {'name': 'Alexandra', 'id': 'RST01', 'sal': '$34,000'},
'man1': {'name': 'Santio', 'id': 'RST009', 'sal': '$45,000'},
'man2': {'name': 'Rumbalski', 'id': 'RST50', 'sal': '$78,000'}}

# Save the rows in a 2D list
all_man_res = []
all_man_res.append(man_res)
all_man_res.append(man1_res)
all_man_res.append(man2_res)
print(all_man_res)
# Build the nested dict: one inner dict per entry in data
output = {}
for i in range(len(data)):
    person = data[i]
    details = {}
    for j in range(len(key)):
        field = key[j]
        details[field] = all_man_res[i][j]
    output[person] = details
output

The pandas DataFrame answer provided by NoThInG makes the most intuitive sense. If you want to use only the built-in Python tools, you can do:
info_list = [dict(zip(key, man)) for man in (man_res, man1_res, man2_res)]
output = dict(zip(data, info_list))
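For comparison, here is a minimal sketch of the same built-in approach written as a single dict comprehension. The input lists are the ones from the question; the names `rows` and `nested` are introduced here only for illustration.

```python
data = ['man', 'man1', 'man2']
key = ['name', 'id', 'sal']
man_res = ['Alexandra', 'RST01', '$34,000']
man1_res = ['Santio', 'RST009', '$45,000']
man2_res = ['Rumbalski', 'RST50', '$78,000']

# rows is a hypothetical helper name: one list of values per entry in data
rows = [man_res, man1_res, man2_res]

# Pair each outer key with a dict built from (field name, value) pairs
nested = {person: dict(zip(key, row)) for person, row in zip(data, rows)}
print(nested['man1'])
```

Note that zip stops at the shortest input, so a row with fewer values than `key` would silently lose fields rather than raise an error.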

Related

How to get length of dict keys after specific element?

There is a dict
example_dict =
{'spend': '3.91',
'impressions': '791',
'clicks': '19',
'campaign_id': '1111',
'date_start': '2017-11-01',
'date_stop': '2019-11-27',
'age': '18-24',
'gender': 'male'}
I have to check whether there are any additional keys after the date_stop key and, if so, get how many there are and their names.
So far I made a list of keys
list_keys = list(example_dict.keys())
list_keys =
['spend',
'impressions',
'clicks',
'campaign_id',
'date_start',
'date_stop',
'age',
'gender']
Checking that the 'date_stop' element is present is simple:
if 'date_stop' in list_keys:
    # what next
But I am not sure how to proceed. I'd appreciate any help.
I guess it should be implemented in a different way; you should be using a dict. But if you really want to do it this way, you could use OrderedDict from collections:
from collections import OrderedDict
my_dict = {
'spend': '3.91',
'impressions': '791',
'clicks': '19',
'campaign_id': '1111',
'date_start': '2017-11-01',
'date_stop': '2019-11-27',
'age': '18-24',
'gender': 'male'
}
# Note: the keys are sorted alphabetically here, so "after" refers to alphabetical order
sorted_ordered_dict = OrderedDict(sorted(my_dict.items(), key=lambda t: t[0]))
if 'date_stop' in sorted_ordered_dict.keys():
    keys = list(sorted_ordered_dict.keys())
    index = keys.index('date_stop')
    after_list = keys[index:]  # the slice starts at 'date_stop' itself
    print('len: ', len(after_list))
    print('list: ', after_list)
Use the code below:
new_dict = {}
list_keys = list(example_dict.keys())
k = ""
for i in list_keys:
    if 'date_stop' == i:
        k = "done"
    if k == "done":
        new_dict[i] = len(i)
Output:
{'date_stop': 9, 'age': 3, 'gender': 6}
I hope I understood your question.
If you just want the names and the number of keys, use this:
new_dict = []
list_keys = list(example_dict.keys())
k = ""
for i in list_keys:
    if 'date_stop' == i:
        k = "done"
    if k == "done":
        new_dict.append(i)
print(new_dict)
print(len(new_dict))
Output:
['date_stop', 'age', 'gender']
3
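Both answers above include 'date_stop' itself in the result. If only the keys that come strictly after it are wanted, a minimal sketch could look like the following. It assumes Python 3.7 or later, where a plain dict preserves insertion order; the helper name `keys_after` is hypothetical.

```python
example_dict = {
    'spend': '3.91',
    'impressions': '791',
    'clicks': '19',
    'campaign_id': '1111',
    'date_start': '2017-11-01',
    'date_stop': '2019-11-27',
    'age': '18-24',
    'gender': 'male',
}

def keys_after(d, marker):
    """Return the keys that follow marker in d's insertion order."""
    if marker not in d:
        return []
    keys = list(d)
    # Skip the marker itself by starting one position past it
    return keys[keys.index(marker) + 1:]

extra = keys_after(example_dict, 'date_stop')
print(len(extra), extra)  # 2 ['age', 'gender']
```

An empty list covers both cases where there is nothing to report: the marker is the last key, or it is missing altogether.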

Remove duplicates in python dictionary

I have a list of dictionaries in Python and I would like to override the old value with the duplicate value. Please let me know how I can do this.
{'message': [{'name': 'raghav', 'id': 10}, {'name': 'raghav', 'id': 11}]}
Output should be:
{'message': [ {'name': 'raghav', 'id': 11}]}
I don't know what you mean by "override old value with duplicate value". If you mean just picking the second dict from the list, you could:
print({k: [v[1]] for (k, v) in data.items()})
If the idea is to update the "name" with a newer value of "id" as you move along the list, then maybe:
def merge_records(data):
    records = data['message']
    users = {}
    for record in records:
        name = record['name']
        id_ = record['id']
        users[name] = id_
    new_records = []
    for name, id_ in users.items():
        new_records.append({'name': name, 'id': id_})
    return {'message': new_records}
But, if you have any control over how the data is represented, you might reconsider. You probably want a different data structure.
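As one possible reading of "a different data structure", the records could be kept in a dict keyed by name, so that a later record replaces an earlier one automatically. This is only a sketch under that assumption; `latest_by_name` and `deduplicated` are illustrative names.

```python
data = {'message': [{'name': 'raghav', 'id': 10}, {'name': 'raghav', 'id': 11}]}

# Later records overwrite earlier ones that share the same name
latest_by_name = {record['name']: record for record in data['message']}

# Convert back to the original list-of-dicts shape if it is still needed
deduplicated = {'message': list(latest_by_name.values())}
print(deduplicated)  # {'message': [{'name': 'raghav', 'id': 11}]}
```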
Here you go:
d = {'message': [{'name': 'raghav', 'id': 10}, {'name': 'raghav', 'id': 11}]}
# Loop over the outer dictionary; merging the inner dicts keeps the last value per key
for key, value in d.items():
    d[key] = [dict([t for k in value for t in k.items()])]
print(d)
Edit:
As per your requirement:
d = {'message': [ {'name': 'raghav', 'id': 11}, {'name': 'krish', 'id': 20}, {'name': 'anu', 'id': 30}]}
for key, value in d.items():
    print([dict((k1, v1)) for k1, v1 in dict([tuple(i.items()) for i in value for val in i.items()]).items()])

compare two different length lists of dictionaries in python

I want to compare the two lists of dictionaries below. The Name key is common to the dictionaries in both lists.
If a Name matches in both lists, I want to do some other processing with the data.
PerfData = [
{'Name': 'abc', 'Type': 'Ex1', 'Access': 'N1', 'perfStatus':'Latest Perf', 'Comments': '07/12/2017 S/W Version'},
{'Name': 'xyz', 'Type': 'Ex1', 'Access': 'N2', 'perfStatus':'Latest Perf', 'Comments': '11/12/2017 S/W Version upgrade failed'},
{'Name': 'efg', 'Type': 'Cust1', 'Access': 'A1', 'perfStatus':'Old Perf', 'Comments': '11/10/2017 S/W Version upgrade failed, test data is active'}
]
beatData = [
{'Name': 'efg', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.632'},
{'Name': 'abc', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.896'}
]
Thanks
Rajeev
l = [{'name': 'abc'}, {'name': 'xyz'}]
k = [{'name': 'a'}, {'name': 'abc'}]
[i['name'] for i in l for f in k if i['name'] == f['name']]
I hope the logic above works for you.
The answer provided didn't assign the result to any variable. If you want to print it, adding the following would work:
result = [i['name'] for i in l for f in k if i['name'] == f['name']]
print(result)
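The nested comprehension above compares every pair of rows. A minimal sketch of an alternative is to index one list by Name first and then look each row up. The records below are trimmed copies of the question's data, and `beat_by_name` and `matches` are names introduced here for illustration.

```python
perf_data = [
    {'Name': 'abc', 'Type': 'Ex1', 'perfStatus': 'Latest Perf'},
    {'Name': 'xyz', 'Type': 'Ex1', 'perfStatus': 'Latest Perf'},
    {'Name': 'efg', 'Type': 'Cust1', 'perfStatus': 'Old Perf'},
]
beat_data = [
    {'Name': 'efg', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.632'},
    {'Name': 'abc', 'Status': 'Latest', 'rcvd-timestamp': '1516756202.896'},
]

# Index one list by Name so each lookup is a dict access instead of an inner loop
beat_by_name = {row['Name']: row for row in beat_data}

# Keep (perf row, beat row) pairs for every Name present in both lists
matches = [
    (perf, beat_by_name[perf['Name']])
    for perf in perf_data
    if perf['Name'] in beat_by_name
]

for perf, beat in matches:
    # Placeholder for the "other stuff" to do with matched rows
    print(perf['Name'], perf['perfStatus'], beat['Status'])
```

This assumes each Name appears at most once in `beat_data`; with duplicates, the last row for a Name would win.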

parse multilevel json to string with condition

I have this nested JSON item that I just want to flatten out to a comma-separated string (i.e. parkinson:5, billy mays:4) so I can store it in a database if needed for future analysis. I wrote the function below but am wondering if there's a more elegant way using a list comprehension (or something else). I found this post but I'm not sure how to adapt it for my needs (Python - parse JSON values by multilevel keys).
Data looks like this:
{'persons':
[{'name': 'parkinson', 'sentiment': '5'},
{'name': 'knott david', 'sentiment': 'none'},
{'name': 'billy mays', 'sentiment': '4'}],
'organizations':
[{'name': 'piper jaffray companies', 'sentiment': 'none'},
{'name': 'marketbeat.com', 'sentiment': 'none'},
{'name': 'zacks investment research', 'sentiment': 'none'}],
'locations': []
}
Here's my code:
def parse_entities(data):
    results = ''
    for category in data.keys():
        # for c_id, category in enumerate(data.keys()):
        entity_data = data[category]
        for e_id, entity in enumerate(entity_data):
            if not entity_data[e_id]['sentiment'] == 'none':
                results = results + (data[category][e_id]['name'] + ":" +
                                     data[category][e_id]['sentiment'] + ",")
    return results
Firstly, the most important thing to make your code shorter and nicer to look at is to use your own variables. Be aware that entity_data = data[category] and entity = entity_data[e_id]. So you can write entity['name'] instead of data[category][e_id]['name'].
Secondly, if you want something like
for category in data.keys():
    entity_data = data[category]
you can make it shorter and easier to read by changing it to
for category, entity_data in data.items():
But you don't even need that here, you can just use the data.values() iterator to get the values. When combining these improvements your code looks like this:
def parse_entities(data):
    results = ''
    for entity_data in data.values():
        for entity in entity_data:
            if entity['sentiment'] != 'none':
                results += entity['name'] + ":" + entity['sentiment'] + ","
    return results
(I have also changed results = results + ... to results += ... and if not entity['sentiment'] == 'none' to if entity['sentiment'] != 'none', because it is shorter and doesn't lower the readability)
Once you have this, it is much easier to make it even shorter and more elegant by using a list comprehension (joining with "," also avoids the trailing comma):
def parse_entities(data):
    return ",".join([entity['name'] + ":" + entity['sentiment']
                     for entity_data in data.values()
                     for entity in entity_data
                     if not entity['sentiment'] == 'none'])
Maybe something like this will work?
def parse_entities(data):
    results = []
    for category in data.keys():
        results += list(map(lambda x: '{0}:{1}'.format(x['name'], x['sentiment']),
                            filter(lambda i: i['sentiment'] != 'none', data[category])))
    return ','.join(results)
if __name__ == '__main__':
    print(parse_entities(data))
With the output looking like this
parkinson:5,billy mays:4
This might be a way to do it, even though using a 'proper library' (depending on your actual use case) makes more sense.
data = {
'persons':
[{'name': 'parkinson', 'sentiment': '5'},
{'name': 'knott david', 'sentiment': 'none'},
{'name': 'billy mays', 'sentiment': '4'}],
'organizations':
[{'name': 'piper jaffray companies', 'sentiment': 'none'},
{'name': 'marketbeat.com', 'sentiment': 'none'},
{'name': 'zacks investment research', 'sentiment': 'none'}],
'locations': []
}
import itertools
# eq. = itertools.chain.from_iterable(data.values())
dicts = itertools.chain(*data.values())
pairs = [":".join([d['name'], d['sentiment']])
         for d in dicts if d['sentiment'] != 'none']
result = ",".join(pairs)
print(result)
# parkinson:5,billy mays:4
# short, but less readable version
result = ",".join([":".join([d['name'], d['sentiment']])
                   for d in itertools.chain(*data.values())
                   if d['sentiment'] != 'none'])
This is a problem where we need to perform 3 separate tasks:
1. Filter out unqualified rows of data
2. Flatten the dict of lists into a simple list
3. Transform each dictionary object into a simple tuple, ready for formatting
Here is the code:
def parse_entities(data):
    new_data = [
        (row['name'], row['sentiment'])   # 3. Transform
        for rows in data.values()         # 2. Flatten
        for row in rows                   # 2. Flatten
        if row['sentiment'] != 'none'     # 1. Filter
    ]
    # e.g, new_data = [('parkinson', '5'), ('billy mays', '4')]
    return ','.join('{}:{}'.format(*row) for row in new_data)
#
# test code
#
data = {
'locations': [],
'organizations': [
{'name': 'piper jaffray companies', 'sentiment': 'none'},
{'name': 'marketbeat.com', 'sentiment': 'none'},
{'name': 'zacks investment research', 'sentiment': 'none'}
],
'persons': [
{'name': 'parkinson', 'sentiment': '5'},
{'name': 'knott david', 'sentiment': 'none'},
{'name': 'billy mays', 'sentiment': '4'}
],
}
print(parse_entities(data))
Output:
parkinson:5,billy mays:4
Here's a generator expression that does it:
data = {'persons': [
{'name': 'parkinson', 'sentiment': '5'},
{'name': 'knott david', 'sentiment': 'none'},
{'name': 'billy mays', 'sentiment': '4'}],
'organizations': [
{'name': 'piper jaffray companies', 'sentiment': 'none'},
{'name': 'marketbeat.com', 'sentiment': '99'},
{'name': 'zacks investment research', 'sentiment': 'none'}],
'locations': []
}
# Compare strings with !=; "is not" tests identity and is unreliable for string values
results = ','.join(entity['name'] + ':' + entity['sentiment']
                   for category, entity_data in data.items()
                   for entity in entity_data if entity['sentiment'] != 'none')
print(results) # -> parkinson:5,billy mays:4,marketbeat.com:99
Note: I changed the sample data slightly to make sure it handled data in more than one category the same as your code.

Merging similar dictionaries in a list together

New to python here. I've been pulling my hair for hours and still can't figure this out.
I have a list of dictionaries:
[ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
{'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239','Type': 'phylum'},
{'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
.
.
.
.
{'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'} ]
I want to merge the dictionaries in the list based on their Type, Name, and Taxonomy ID
[ {'FX0XST001.MID5': '195', 'FX0XST001.MID13': '4929', 'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'}
.
.
.
.
{'FX0XST001.MID6': '125', 'FX0XST001.MID25': '70', 'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}]
I have the data structure setup like this because I need to write the data to CSV using csv.DictWriter later.
Would anyone kindly point me to the right direction?
You can use the groupby function for this:
http://docs.python.org/library/itertools.html#itertools.groupby
from itertools import groupby
keyfunc = lambda row: (row['Type'], row['Taxonomy ID'], row['Name'])
result = []
data = sorted(data, key=keyfunc)
for k, g in groupby(data, keyfunc):
    # Option 1: merge the matching rows into one item so you end up with what you wanted
    item = {}
    for row in g:
        item.update(row)
    result.append(item)
    # Option 2 (use instead of option 1, since g can only be iterated once):
    # add the matched rows as subitems of a parent dictionary, which might come
    # in handy if you need to work with just the parts that are different
    # item = {'Type': k[0], 'Taxonomy ID': k[1], 'Name': k[2], 'matches': []}
    # for row in g:
    #     del row['Type']
    #     del row['Taxonomy ID']
    #     del row['Name']
    #     item['matches'].append(row)
    # result.append(item)
Make some test data:
list_of_dicts = [
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "hair":"brown", "eyes":"green"},
{"Taxonomy ID":1, "Name":"Bob", "Type":"M", "height":"6'2''", "weight":200},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "hair":"black", "eyes":"hazel"},
{"Taxonomy ID":2, "Name":"Alice", "Type":"F", "height":"5'7''", "weight":145}
]
I think this (below) is a neat trick using reduce that improves upon the other groupby solution.
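import itertools
from functools import reduce  # reduce is not a builtin in Python 3
def key_func(elem):
    return (elem["Taxonomy ID"], elem["Name"], elem["Type"])
# groupby only groups adjacent items, so list_of_dicts must already be ordered by key
output_list_of_dicts = [reduce((lambda x, y: x.update(y) or x), list(val)) for key, val in itertools.groupby(list_of_dicts, key_func)]
Then print the output:
for elem in output_list_of_dicts:
    print(elem)
This prints (the key order depends on the Python version):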
{'eyes': 'green', 'Name': 'Bob', 'weight': 200, 'Taxonomy ID': 1, 'hair': 'brown', 'height': "6'2''", 'Type': 'M'}
{'eyes': 'hazel', 'Name': 'Alice', 'weight': 145, 'Taxonomy ID': 2, 'hair': 'black', 'height': "5'7''", 'Type': 'F'}
FYI, Python Pandas is far better for this sort of aggregation, especially when dealing with file I/O to .csv or .h5 files, than the itertools stuff.
Perhaps the easiest thing to do would be to create a new dictionary, indexed by a (Type, Name, Taxonomy ID) tuple, and iterate over your list of dictionaries, storing values by (Type, Name, Taxonomy ID). Use a defaultdict to make this easier. For example:
from collections import defaultdict
grouped = defaultdict(lambda : {})
# iterate over items and store:
for entry in list_of_dictionaries:
    grouped[(entry["Type"], entry["Name"], entry["Taxonomy ID"])].update(entry)
# now you have everything stored the way you want in values, and you don't
# need the dict anymore
grouped_entries = grouped.values()
This is a bit hackish, especially because you end up overwriting "Type", "Name", and "Taxonomy ID" every time you use update, but since your dict keys are variable, that might be the best you can do. This will get you at least close to what you need.
Even better would be to do this on your initial import and skip intermediate steps (unless you actually need to transform the data beforehand). Plus, if you could get at the only varying field, you could change the update to just: grouped[(type, name, taxonomy_id)][key] = value where key and value are something like: 'FX0XST001.MID5', '195'
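The last suggestion, writing only the varying field into the grouped dict, could be sketched as follows. The rows are a subset of the question's data; `SHARED`, `group_key`, and `merged_rows` are names introduced here for illustration only.

```python
from collections import defaultdict

# Hypothetical constant naming the fields that identify a group
SHARED = ('Type', 'Name', 'Taxonomy ID')

rows = [
    {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
    {'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
    {'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
]

grouped = defaultdict(dict)
for row in rows:
    group_key = tuple(row[field] for field in SHARED)
    merged = grouped[group_key]
    if not merged:
        # First row of this group: store the shared fields once
        merged.update(zip(SHARED, group_key))
    for field, value in row.items():
        if field not in SHARED:
            # Only the varying sample column is written for each row
            merged[field] = value

merged_rows = list(grouped.values())
print(merged_rows)
```

Compared with calling update on the whole row, this avoids rewriting the three shared fields for every input row, at the cost of a slightly longer loop.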
from itertools import groupby
data = [ {'FX0XST001.MID5': '195', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type':'phylum'},
{'FX0XST001.MID13': '4929', 'Name': 'Firmicutes', 'Taxonomy ID': '1239','Type': 'phylum'},
{'FX0XST001.MID6': '826', 'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
{'FX0XST001.MID6': '125', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID25': '70', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
{'FX0XST001.MID40': '40', 'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'}]
kk = ('Name', 'Taxonomy ID', 'Type')
def key(item):
    return tuple(item[k] for k in kk)
result = []
# groupby only groups adjacent items, so sort by the same key first
data = sorted(data, key=key)
for k, g in groupby(data, key):
    result.append(dict((i, j) for d in g for i, j in d.items()))
print(result)
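Since the question mentions writing the merged rows with csv.DictWriter, here is a hedged sketch of that last step. Merged rows have different sample columns, so the field names are collected from all rows and missing cells are filled through `restval`. The `merged` list is a shortened stand-in for the real result, and `io.StringIO` stands in for a file opened with `newline=''`.

```python
import csv
import io

# Shortened stand-in for the merged result produced by any of the answers
merged = [
    {'FX0XST001.MID5': '195', 'FX0XST001.MID13': '4929',
     'Name': 'Firmicutes', 'Taxonomy ID': '1239', 'Type': 'phylum'},
    {'FX0XST001.MID6': '125', 'FX0XST001.MID25': '70',
     'Name': 'Acidobacteria', 'Taxonomy ID': '57723', 'Type': 'phylum'},
]

fixed = ['Type', 'Name', 'Taxonomy ID']

# Collect every sample column that appears in any row, in first-seen order
sample_columns = []
for row in merged:
    for field in row:
        if field not in fixed and field not in sample_columns:
            sample_columns.append(field)

buffer = io.StringIO()
# restval fills the cells for sample columns that a given row does not have
writer = csv.DictWriter(buffer, fieldnames=fixed + sample_columns, restval='')
writer.writeheader()
writer.writerows(merged)

csv_text = buffer.getvalue()
print(csv_text)
```

Without the combined `fieldnames` list, DictWriter would raise a ValueError for any row that contains a key it was not told about.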
