Convert tuples to list of dictionary - python

I'm having trouble converting two tuples into a list of dictionaries. Here is the structure:
train_detail = (Counter({2: 50, 0: 62, 1: 38}),
{2: 0.3333333333333333, 0: 0.41333333333333333, 1: 0.25333333333333335})
test_detail = (Counter({2: 6, 0: 49, 1: 4}),
{2: 0.1016949152542373, 0: 0.8305084745762712, 1: 0.06779661016949153})
Now I want to turn these two into a structure like the following:
[
{
"label": "0",
"trainPercent": 0.41333333333333333,
"trainNumber": 62,
"testPercent": 0.8305084745762712,
"testNumber": 49,
},
{
"label": "1",
"trainPercent": 0.25333333333333335,
"trainNumber": 38,
"testPercent": 0.06779661016949153,
"testNumber": 4,
},
{
"label": "2",
"trainPercent": 0.3333333333333333,
"trainNumber": 50,
"testPercent": 0.1016949152542373,
"testNumber": 6,
},
]
What's an effective way of doing that with minimal looping? Thank you. Note that Counter is a subclass of dict, so it inherits every method of a regular dict.

from pprint import pprint
from collections import Counter
train_detail = (Counter({2: 50, 0: 62, 1: 38}),
{2: 0.3333333333333333, 0: 0.41333333333333333, 1: 0.25333333333333335})
test_detail = (Counter({2: 6, 0: 49, 1: 4}),
{2: 0.1016949152542373, 0: 0.8305084745762712, 1: 0.06779661016949153})
out = []
for t in train_detail[0]:
    out.append({
        'label': str(t),
        'trainNumber': train_detail[0][t],
        'trainPercent': train_detail[1][t],
        'testPercent': test_detail[1][t],
        'testNumber': test_detail[0][t]
    })
# pretty print to screen:
pprint(out)
Prints:
[{'label': '2',
'testNumber': 6,
'testPercent': 0.1016949152542373,
'trainNumber': 50,
'trainPercent': 0.3333333333333333},
{'label': '0',
'testNumber': 49,
'testPercent': 0.8305084745762712,
'trainNumber': 62,
'trainPercent': 0.41333333333333333},
{'label': '1',
'testNumber': 4,
'testPercent': 0.06779661016949153,
'trainNumber': 38,
'trainPercent': 0.25333333333333335}]

Use a lambda function. A lambda can be used to unpack the tuples.

Thanks to @Andrej Kesely and @Björn Marschollek.
To combine their answers:
labels_detail_list = [{'label': str(i),
                       'trainNumber': train_detail[0][i],
                       'trainPercent': train_detail[1][i],
                       'testNumber': test_detail[0][i],
                       'testPercent': test_detail[1][i]
                       } for i in train_detail[0]]
# sorted() returns a new list, so keep the result
labels_detail_list = sorted(labels_detail_list, key=lambda x: int(x['label']))
This should be a more concise version.
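The answers above iterate over the training labels only. As a minimal sketch, assuming a label might appear in just one of the two splits, the labels can be taken from the union of both Counters. The helper name merge_details is hypothetical, and falling back to 0.0 for a missing percentage is an assumption, not something the question specifies.

```python
from collections import Counter

train_detail = (Counter({2: 50, 0: 62, 1: 38}),
                {2: 0.3333333333333333, 0: 0.41333333333333333, 1: 0.25333333333333335})
test_detail = (Counter({2: 6, 0: 49, 1: 4}),
               {2: 0.1016949152542373, 0: 0.8305084745762712, 1: 0.06779661016949153})


def merge_details(train, test):
    """Build one dict per label, sorted by label (hypothetical helper)."""
    train_counts, train_pct = train
    test_counts, test_pct = test
    # Union of labels, so a label present in only one split is not dropped.
    labels = sorted(set(train_counts) | set(test_counts))
    return [
        {
            "label": str(label),
            "trainPercent": train_pct.get(label, 0.0),  # assumed default
            "trainNumber": train_counts[label],  # a Counter returns 0 for a missing key
            "testPercent": test_pct.get(label, 0.0),  # assumed default
            "testNumber": test_counts[label],
        }
        for label in labels
    ]


labels_detail_list = merge_details(train_detail, test_detail)
print(labels_detail_list)
```

This still loops once over the labels. Note that the count lookups rely on the first tuple element being a Counter; with a plain dict, a missing label would raise KeyError.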

Related

Python - Iterate over a dict with list as values and create new dict based on list values

I need to create a new dict for each value in the list for each key
report = {'app_clicks': [56, 65, 41, 40, 64],
'billed_charge_local_micro': [219941307,
247274715,
181271175,
164359644,
223745830],
'billed_engagements': [85831, 89976, 60566, 55304, 88839],
'card_engagements': None,
'carousel_swipes': None,
'clicks': [322, 351, 225, 197, 337],
'engagements': [363, 397, 258, 233, 383],
'follows': None,
'impressions': [86236, 90763, 60596, 55689, 88916],
'likes': [4, 2, 3, 1, 8],
'media_engagements': [41, 45, 33, 36, 46],
'media_views': [33533, 35665, 23611, 21957, 35792],
'poll_card_vote': None,
'qualified_impressions': None,
'replies': [0, 1, 0, 0, 0],
'retweets': None,
'tweets_send': None,
'unfollows': None,
'url_clicks': [56, 65, 41, 40, 64],
'video_15s_views': [27859, 29801, 19852, 18974, 30373],
'video_3s100pct_views': [16441, 17699, 11112, 10337, 16993],
'video_6s_views': [17332, 18785, 12126, 11517, 18663],
'video_content_starts': [81824, 85312, 58449, 53392, 84893],
'video_cta_clicks': None,
'video_total_views': [33533, 35665, 23611, 21957, 35792],
'video_views_100': [27861, 29774, 19840, 18982, 30386],
'video_views_25': [72137, 75452, 51049, 46138, 74744],
'video_views_50': [48377, 50961, 34242, 31603, 51323],
'video_views_75': [35233, 37444, 24959, 23430, 38004]}
And basically I need to create a new list with dicts for each value on the list.
So it would be something like this
new_list_with_dicts = [
{ "billed_charge_local_micro" : 219941307,
"clicks": 322,
"impressions" : 86236,
},
{ "billed_charge_local_micro" : 247274715,
"clicks": 351,
"impressions" : 90763,
},
{ "billed_charge_local_micro" : 181271175,
"clicks": 225,
"impressions" : 60596,
},
]
This is one of the ways I tried:
i = 0
list_data = []
dict = {}
for key,value in report.items():
    if isinstance(value, list):
        while i < len(value):
            dict[key] = value[i]
            pprint.pprint(dict)
            i = i + 1
        i = 0
But it is creating the dict with repeated values, apparently for the last item of each array
I hope I could explain clearly
I'm stuck in this so any help would be much appreciated.
Thanks in advance
I believe this solves the problem. It seems you were almost there but got some of the syntax wrong:
list_data = [{} for i in range(len(list(report.values())[0]))]
for key,value in report.items():
    if isinstance(value, list):
        for i in range(len(value)):
            list_data[i][key] = value[i]
print(list_data)
and this should return the desired output assuming your dictionary is stored in report
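As an alternative sketch of the same idea, zip can do the transposition. This assumes every list in report has the same length (zip stops at the shortest one) and uses a shortened report for illustration; the names columns and rows are hypothetical.

```python
report = {
    "clicks": [322, 351, 225],
    "impressions": [86236, 90763, 60596],
    "follows": None,  # non-list values are skipped
}

# Keep only the keys whose value is a list.
columns = {key: value for key, value in report.items() if isinstance(value, list)}

# zip(*columns.values()) yields one tuple per position across all lists;
# pairing each tuple with the keys gives one dict per position.
rows = [dict(zip(columns, row)) for row in zip(*columns.values())]
print(rows)
```

Each element of rows holds the values found at one index of every list, which is the shape the question asks for.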

How to sort nested python dictionary using item's sub-value

How to sort nested python dictionary items by their sub-values and save that dictionary items in descending order
Describing dictionary:
Before sorted
my_dict = {
"Bob": {"Buy": 25, "Sell": 33, "Quantity": 100},
"Moli": {"Buy": 75, "Sell": 53, "Quantity": 300},
"Annie": {"Buy": 74, "Sell": 83, "Quantity": 96},
"Anna": {"Buy": 55, "Sell": 83, "Quantity": 154},
}
I want to sort dictionary items in descending order by their sub-values (i.e., "Quantity") and the output should be like this:
After sorted
my_dict={
"Moli": {"Buy": 75, "Sell": 53, "Quantity": 300},
"Anna": {"Buy": 55, "Sell": 83, "Quantity": 154},
"Bob": {"Buy": 25, "Sell": 33, "Quantity": 100},
"Annie": {"Buy": 74, "Sell": 83, "Quantity": 96},
}
How can I do it without using any functions like:
def sort_by_quantity(dic):
    items = sorted(dic.items(), key=lambda x: x[1]['Quantity'])  # list of sorted (key, value) pairs
    return dict((x, y) for x, y in items)  # convert the pairs back to a dict
or
sdct = dictt.copy()
def func(d):
    return sdct[d]['Quantity']
dct = sorted(dictt, key=func, reverse=True)
newDict = {i: dictt[i] for i in dct}
print(newDict)
etc...
The way we do sort in Pandas dataframe like:
df = df1.sort_values('Quantity', ascending=False)
I will be very grateful if someone tells me how to do this :) :) :)
Try this:
my_dict = {
"Bob": {"Buy": 25, "Sell": 33, "Quantity": 100},
"Moli": {"Buy": 75, "Sell": 53, "Quantity": 300},
"Annie": {"Buy": 74, "Sell": 83, "Quantity": 96},
"Anna": {"Buy": 55, "Sell": 83, "Quantity": 154}
}
sorted_dict = dict( sorted(my_dict.items(), key=lambda v: v[1]['Quantity']) )
print(sorted_dict)
# for reverse
sorted_dict_reversed = dict( sorted(my_dict.items(), key=lambda v: v[1]['Quantity'], reverse=True) )
print(sorted_dict_reversed)
If you want to achieve this behavior in Python 3.5 or earlier, or in some implementations of Python 3.6, you will have to use collections.OrderedDict. In Python 3.7+, or in some implementations of Python 3.6 (such as the standard CPython 3.6), the normal dict preserves insertion order just as collections.OrderedDict does. From here on, I will call these versions 3.5-* and 3.7+* to indicate that 3.6 may be included in either, depending on your Python implementation.
In any case, the mentioned structures keep items in insertion order, so to change the order you will need to create a new object.
Dict comprehension (only valid for dict in 3.7+*):
my_new_dict = {k: v for k, v in sorted(my_dict.items(), key=lambda x: x[1]['Quantity'])}
Constructor (valid for both dict in 3.7+* and collections.OrderedDict backwards compatible):
my_new_dict = dict(sorted(my_dict.items(), key=lambda x: x[1]['Quantity']))
from collections import OrderedDict
my_new_dict = OrderedDict(sorted(my_dict.items(), key=lambda x: x[1]['Quantity']))
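To mirror the pandas call from the question, here is a minimal sketch that wraps the sorted() approach in a small helper. The name sort_by_subvalue and its ascending parameter are hypothetical, chosen to resemble sort_values; the sketch assumes Python 3.7+ so that a plain dict keeps insertion order.

```python
my_dict = {
    "Bob": {"Buy": 25, "Sell": 33, "Quantity": 100},
    "Moli": {"Buy": 75, "Sell": 53, "Quantity": 300},
    "Annie": {"Buy": 74, "Sell": 83, "Quantity": 96},
    "Anna": {"Buy": 55, "Sell": 83, "Quantity": 154},
}


def sort_by_subvalue(mapping, field, ascending=True):
    """Return a new dict ordered by mapping[key][field] (hypothetical helper)."""
    ordered = sorted(mapping.items(), key=lambda kv: kv[1][field], reverse=not ascending)
    return dict(ordered)


by_quantity = sort_by_subvalue(my_dict, "Quantity", ascending=False)
print(list(by_quantity))  # keys in descending order of Quantity
print(list(my_dict))      # the original dict keeps its own order
```

Because a new dict is built, my_dict itself is left untouched; rebind the name if you want to replace it.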

How to flatten disorganised dictionaries into list?

I attempted to flatten a disorganized dictionary (that in turn was taken from a json file) to ease extracting info. Below is an example of how the dictionary is structured and my attempt at flattening it:
data = {'horse':{'speed':{"walk": 40, "run":50}}, 'dog':{'run':30}, 'human':{'gait':{'normal':{'run': 25, 'walk': 30}}}}
flat_dict = []
for items in list(data.items()):
    flat_list = []
    flat_list.append(items[0])
    try:
        for item in list(items[1].items())[0]:
            if type(item) is not dict:
                flat_list.append(item)
            else:
                flat_list.append(list(item.keys())[0])
                flat_list.append(list(item.values())[0])
    except:
        flat_list.append(items[0])
    flat_dict.append(flat_list)
print(flat_dict)
However the above code does not flatten the entire dictionary and some information is lost, here's the output of the above code:
[['horse', 'speed', 'walk', 40], ['dog', 'run', 30], ['human', 'gait', 'normal', {'run': 25, 'walk': 30}]]
What I wanted was:
[['horse', 'speed', 'walk', 40, 'run', 50], ['dog', 'run', 30], ['human', 'gait', 'normal', 'run', 25, 'walk', 30]]
What do I do?
You can use a recursive generator together with a list comprehension:
def gen(d):
    if isinstance(d, dict):
        for k, v in d.items():
            yield k
            yield from gen(v)
    else:
        yield d
[[k, *gen(v)] for k, v in data.items()]
output:
[['horse', 'speed', 'walk', 40, 'run', 50],
['dog', 'run', 30],
['human', 'gait', 'normal', 'run', 25, 'walk', 30]]
Since you don't know the structure inside the dict, you cannot handle each case with simple loops; you need recursion. I'd suggest a utility function that recursively flattens any structure, then use it to build lists of [key, *flatten(value)]:
def flatten(values) -> list:
    if isinstance(values, list):
        return [v for value in values for v in flatten(value)]
    if isinstance(values, dict):
        return [*values.keys(), *flatten(list(values.values()))]
    return [values]
def flatten_dict(values: dict) -> list:
    return [[key, *flatten(value)] for key, value in values.items()]
if __name__ == '__main__':
    # ['foo']
    print(flatten('foo'))
    # ['foo', 'bar', 'uio', 1, 2, 3, 'k1', 'k2', 'v1', 'kk1', '9', 5, 9, 8, 7]
    print(flatten(['foo', ['bar', 'uio', [1, 2, 3]], {'k1': 'v1', 'k2': {'kk1': ['9', 5, 9, 8, 7, ]}}]))
    data = {'horse': {'speed': {"walk": 40, "run": 50}}, 'dog': {'run': 30},
            'human': {'gait': {'normal': {'run': 25, 'walk': 30}}}}
    # [['horse', 'speed', 'walk', 'run', 40, 50], ['dog', 'run', 30], ['human', 'gait', 'normal', 'run', 'walk', 25, 30]]
    print(flatten_dict(data))
Answered as asked:
data = {
'horse': {
'speed': {
"walk": 40, "run": 50}},
'dog': {
'run': 30},
'human': {
'gait': {
'normal': {
'run': 25, 'walk': 30}}}}
def my_flatten(ddict, mylist):
    for k, v in ddict.items():
        if isinstance(v, dict):
            mylist.append(k)
            my_flatten(v, mylist)
        else:
            mylist.extend([k, v])
    return mylist
flist = [my_flatten(v, [k]) for k, v in data.items()]
print(flist)
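Since the data comes from a JSON file, nested lists may also show up. The following is a sketch of the generator approach extended to walk lists as well; the function name walk is hypothetical, and treating list elements as further nodes to flatten is an assumption about what you would want.

```python
def walk(node):
    """Yield keys and leaf values depth-first; lists are walked element by element."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key
            yield from walk(value)
    elif isinstance(node, list):
        for element in node:
            yield from walk(element)
    else:
        yield node


data = {'horse': {'speed': {"walk": 40, "run": 50}}, 'dog': {'run': 30},
        'human': {'gait': {'normal': {'run': 25, 'walk': 30}}}}

flat = [[key, *walk(value)] for key, value in data.items()]
print(flat)
```

Each key is yielded immediately before its value, so the result keeps the key/value interleaving shown in the desired output.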

Extracting values from list of dicts to be used in calculation

My input data is a list of dicts (matches), where each dict has 2 possible places for a record to show up as well as a correlating factor between the two and their respective data sources:
[
{ 'r1': record_1, 'r2': record_2, 'corr': 85, 'r1_source': source_1, 'r2_source': source_2 },
{ 'r1': record_1, 'r2': record_3, 'corr': 90, 'r1_source': source_1, 'r2_source': source_3 },
{ 'r1': record_2, 'r2': record_3, 'corr': 77, 'r1_source': source_2, 'r2_source': source_3 },
...
]
Each record is represented by a list which comes from a finite list of unique records.
The structure of my desired output data is a list of dicts where each unique record has itself, its source, and its average correlating factor:
[
{ 'record': record_1, 'source': source_1, 'avg': (85 + 90) / 2 },
{ 'record': record_2, 'source': source_2, 'avg': (85 + 77) / 2 },
{ 'record': record_3, 'source': source_3, 'avg': (90 + 77) / 2 },
]
My current solution:
def average_record_from_match_value(matches):
    averaged_recs = []
    for match in matches:
        # Q1
        if [rec for rec in averaged_recs if rec['record'] == match['r1']] == []:
            a_recs = []
            # Q2
            a_recs.extend([m['corr'] for m in matches if m['r1'] == match['r1']])
            a_recs.extend([m['corr'] for m in matches if m['r2'] == match['r1']])
            # Q3
            r1_value = sum(a_recs) / len(a_recs)
            averaged_recs.append({ 'record': match['r1'],
                                   'source': match['r1_source'],
                                   'match_value': r1_value,
                                   'record_value': r1_value})
        if [rec for rec in averaged_recs if rec['record'] == match['r2']] == []:
            b_recs = []
            b_recs.extend([m['corr'] for m in matches if m['r1'] == match['r2']])
            b_recs.extend([m['corr'] for m in matches if m['r2'] == match['r2']])
            r2_value = sum(b_recs) / len(b_recs)
            averaged_recs.append({ 'record': match['r2'],
                                   'source': match['r2_source'],
                                   'match_value': r2_value,
                                   'record_value': r2_value})
    return averaged_recs
This works, but I'm sure it can be improved. My questions as labeled by the comments above are:
Q1 - Is there a better way to enforce uniqueness here? I have a gut feeling that I don't need to be traversing my averaged_recs list for every match.
Q2 - Can I corral all of these records without looping over them twice like this?
Q3 - Can/should this average calculation be combined with the previous list extension?
Thanks for your help!
It's a little hard to do with a list comprehension, but I managed to write it with fewer lines and hopefully less clutter, using a temporary dict to group the correlation values by record:
lst = [
    { 'r1': 'record_1', 'r2': 'record_2', 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2' },
    { 'r1': 'record_1', 'r2': 'record_3', 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3' },
    { 'r1': 'record_2', 'r2': 'record_3', 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3' },
]
tmp_dict = {}
for d in lst:
    if d['r1'] not in tmp_dict.keys():
        tmp_dict[d['r1']] = {}
        tmp_dict[d['r1']]['corr'] = list()
        tmp_dict[d['r1']]['source'] = d['r1_source']
    if d['r2'] not in tmp_dict.keys():
        tmp_dict[d['r2']] = {}
        tmp_dict[d['r2']]['corr'] = list()
        tmp_dict[d['r2']]['source'] = d['r2_source']
    tmp_dict[d['r1']]['corr'].append(d['corr'])
    tmp_dict[d['r2']]['corr'].append(d['corr'])
# print() with parentheses works in both Python 2 and Python 3
print([{ 'record': k, 'source': tmp_dict[k]['source'], 'avg': sum(tmp_dict[k]['corr'])/float(len(tmp_dict[k]['corr'])) } for k in tmp_dict.keys()])
My idea:
Loop over the list to build one dict keyed by every r1 and r2. When a record appears as r1, insert the match at the head of its list; when it appears as r2, append the match to the tail.
Then loop over this dict to build the output you expect.
from collections import defaultdict
test = [
{ 'r1': 'record_1', 'r2': 'record_2', 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2' },
{ 'r1': 'record_1', 'r2': 'record_3', 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3' },
{ 'r1': 'record_2', 'r2': 'record_3', 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3' },
]
temp = defaultdict(list)
for item in test:
    temp[item['r1']].insert(0, item)
    temp[item['r2']].append(item)
result = []
for key, value in temp.items():
    new_item = {}
    new_item['avg'] = sum(list(map(lambda item: item['corr'], value)))*1.0/len(value)
    new_item['record'] = key
    new_item['source'] = value[0]['r1_source'] if key == value[0]['r1'] else value[0]['r2_source']
    result.append(new_item)
print(result)
Output:
[{'avg': 87.5, 'record': 'record_1', 'source': 'source_1'}, {'avg': 81.0, 'record': 'record_2', 'source': 'source_2'}, {'avg': 83.5, 'record': 'record_3', 'source': 'source_3'}]
[Finished in 0.175s]
Update 1:
If r1 and r2 are lists, we can convert them to tuples so they can be used as dict keys, then convert them back when building the output.
The code will look like this:
from collections import defaultdict
record1 = [1, 2, 3]
record2 = [4, 5, 6]
record3 = [7, 8, 9]
test = [
{ 'r1': record1, 'r2': record2, 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2' },
{ 'r1': record1, 'r2': record3, 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3' },
{ 'r1': record2, 'r2': record3, 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3' },
]
temp = defaultdict(list)
for item in test:
    temp[tuple(item['r1'])].insert(0, item)
    temp[tuple(item['r2'])].append(item)
result = []
for key, value in temp.items():
    new_item = {}
    new_item['avg'] = sum(list(map(lambda item: item['corr'], value)))*1.0/len(value)
    new_item['record'] = list(key)
    new_item['source'] = value[0]['r1_source'] if list(key) == value[0]['r1'] else value[0]['r2_source']
    result.append(new_item)
print(result)
Output:
[{'avg': 87.5, 'record': [1, 2, 3], 'source': 'source_1'}, {'avg': 81.0, 'record': [4, 5, 6], 'source': 'source_2'}, {'avg': 83.5, 'record': [7, 8, 9], 'source': 'source_3'}]
[Finished in 0.178s]
Q1 - A dictionary inherently has unique keys, so I don't believe you need to recheck it this way. You're also iterating through averaged_recs, which is empty.
Q2 - You could use or in the if statement
[m['corr'] for m in matches if m['r1'] == match['r1'] or m['r2'] == match['r1']]
Q3 - I don't really think you need another way to do it
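Putting the three points together, here is a minimal single-pass sketch that uses a dict for uniqueness and keeps a running total and count per record. The function name average_corr is hypothetical. Converting list records to tuples for the dict key assumes the records are flat lists of hashable values, and taking the source from the first match a record appears in assumes each record always has the same source.

```python
matches = [
    {'r1': 'record_1', 'r2': 'record_2', 'corr': 85, 'r1_source': 'source_1', 'r2_source': 'source_2'},
    {'r1': 'record_1', 'r2': 'record_3', 'corr': 90, 'r1_source': 'source_1', 'r2_source': 'source_3'},
    {'r1': 'record_2', 'r2': 'record_3', 'corr': 77, 'r1_source': 'source_2', 'r2_source': 'source_3'},
]


def average_corr(matches):
    """One pass over matches, accumulating a total and a count per record."""
    stats = {}  # hashable key -> [record, source, total, count]
    for match in matches:
        for side in ('r1', 'r2'):
            record = match[side]
            # Lists are unhashable, so use a tuple as the dict key (assumption:
            # records are flat lists or already-hashable values).
            key = tuple(record) if isinstance(record, list) else record
            entry = stats.setdefault(key, [record, match[side + '_source'], 0, 0])
            entry[2] += match['corr']
            entry[3] += 1
    return [
        {'record': record, 'source': source, 'avg': total / count}
        for record, source, total, count in stats.values()
    ]


result = average_corr(matches)
print(result)
```

The dict lookup replaces the scan of averaged_recs (Q1), both sides of a match are handled in the same loop (Q2), and the average is computed once at the end from the accumulated totals (Q3).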

pprint dictionary on multiple lines

I'm trying to get a pretty print of a dictionary, but I'm having no luck:
>>> import pprint
>>> a = {'first': 123, 'second': 456, 'third': {1:1, 2:2}}
>>> pprint.pprint(a)
{'first': 123, 'second': 456, 'third': {1: 1, 2: 2}}
I wanted the output to be on multiple lines, something like this:
{'first': 123,
 'second': 456,
 'third': {1: 1,
           2: 2}
}
Can pprint do this? If not, then which module does it? I'm using Python 2.7.3.
Use width=1 or width=-1:
In [33]: pprint.pprint(a, width=1)
{'first': 123,
 'second': 456,
 'third': {1: 1,
           2: 2}}
You could convert the dict to json through json.dumps(d, indent=4)
import json
print(json.dumps(item, indent=4))
{
    "second": 456,
    "third": {
        "1": 1,
        "2": 2
    },
    "first": 123
}
If you are trying to pretty print the environment variables, use:
pprint.pprint(dict(os.environ), width=1)
Two things to add on top of Ryan Chou's already very helpful answer:
pass the sort_keys argument for an easier visual grok on your dict, esp. if you're working with pre-3.6 Python (in which dictionaries are unordered)
print(json.dumps(item, indent=4, sort_keys=True))
"""
{
    "first": 123,
    "second": 456,
    "third": {
        "1": 1,
        "2": 2
    }
}
"""
dumps() will only work if the dictionary keys are primitives (strings, int, etc.)
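A small sketch of what that key restriction looks like in practice: int keys are accepted but come out as JSON strings, while a tuple key makes dumps() raise TypeError. The stringify_keys helper is a hypothetical workaround, not part of the json module, and converting keys with str() is only one possible choice.

```python
import json

a = {'first': 123, 'third': {1: 1, 2: 2}}
# int keys are accepted, but they are written out as JSON strings.
text = json.dumps(a, indent=4, sort_keys=True)
print(text)

# A tuple is not a supported key type, so dumps() raises TypeError.
bad = {(1, 2): 'pair'}
try:
    json.dumps(bad)
    tuple_key_ok = True
except TypeError:
    tuple_key_ok = False


def stringify_keys(obj):
    """Recursively convert dict keys to str (hypothetical workaround)."""
    if isinstance(obj, dict):
        return {str(k): stringify_keys(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [stringify_keys(v) for v in obj]
    return obj


fixed = json.dumps(stringify_keys(bad), indent=4)
print(fixed)
```

Keep in mind that the string keys do not turn back into ints or tuples when the JSON is loaded again.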
This is a Copy-Pasta for testing purposes and to help with a usage example.
from pprint import pprint # I usually only need this module from the package.
a = {'first': 123, 'second': 456, 'third': {1:1, 2:2}, 'zfourth': [{3:9, 7:8}, 'distribution'], 1:2344, 2:359832, 3:49738428, 4:'fourth', 5:{'dictionary':'of things', 'new':['l','i','s','t']}}
pprint(dict(a), indent=4, width=1)
# Wrap your variable in dict() function
# Optional: indent=4. for readability
# Required: width=1 for wrapping each item to its own row.
# Note: Default pprint is to sort the dictionary
# Note: This also auto-wraps any string that has spaces in it. See 'of things' below.
# Documentation: https://docs.python.org/3/library/pprint.html
# Examples: https://pymotw.com/2/pprint/
# Blog: https://realpython.com/python-pretty-print/
Provides the following result:
{   1: 2344,
    2: 359832,
    3: 49738428,
    4: 'fourth',
    5: {   'dictionary': 'of '
                         'things',
           'new': [   'l',
                      'i',
                      's',
                      't']},
    'first': 123,
    'second': 456,
    'third': {   1: 1,
                 2: 2},
    'zfourth': [   {   3: 9,
                       7: 8},
                   'distribution']}
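If the default key sorting is not what you want, here is a short sketch assuming Python 3.8 or later, where pprint accepts sort_dicts=False. It uses pformat, which returns the text instead of printing it; the variable names are only for illustration.

```python
from pprint import pformat

a = {'second': 456, 'first': 123, 'third': {1: 1, 2: 2}}

# width=1 forces every item onto its own line; pformat returns the text
# instead of printing it, which makes it easy to log or inspect.
sorted_text = pformat(a, width=1)

# sort_dicts=False (Python 3.8+) keeps insertion order instead of sorting keys.
unsorted_text = pformat(a, width=1, sort_dicts=False)

print(sorted_text)
print(unsorted_text)
```

The first string starts with 'first' because keys are sorted by default; the second keeps 'second' in front, as inserted.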
