Let's say we have the following data
all_values = (('a', 0, 0.1), ('b', 1, 0.5), ('c', 2, 1.0))
from which we want to produce a list of dictionaries like so:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
What's the most elegant way to do this in Python?
The best solution I've been able to come up with is
>>> import itertools
>>> zipped = list(zip(itertools.repeat(('name', 'location', 'value')), all_values))
>>> zipped
[(('name', 'location', 'value'), ('a', 0, 0.1)),
(('name', 'location', 'value'), ('b', 1, 0.5)),
(('name', 'location', 'value'), ('c', 2, 1.0))]
>>> dicts = [dict(zip(*e)) for e in zipped]
>>> dicts
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
It seems like a more elegant way to do this exists, probably using more of the tools in itertools.
How about:
In [8]: [{'location':l, 'name':n, 'value':v} for (n, l, v) in all_values]
Out[8]:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
or, if you prefer a more general solution:
In [12]: keys = ('name', 'location', 'value')
In [13]: [dict(zip(keys, values)) for values in all_values]
Out[13]:
[{'location': 0, 'name': 'a', 'value': 0.1},
{'location': 1, 'name': 'b', 'value': 0.5},
{'location': 2, 'name': 'c', 'value': 1.0}]
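If you need this in more than one place, the general version can be wrapped in a small helper. This is only a sketch: the name rows_to_dicts is made up for illustration, and it assumes every tuple has the same length as keys (zip silently truncates to the shorter input otherwise).

```python
def rows_to_dicts(keys, rows):
    """Pair each row's values with the shared keys (hypothetical helper)."""
    return [dict(zip(keys, row)) for row in rows]


all_values = (('a', 0, 0.1), ('b', 1, 0.5), ('c', 2, 1.0))
keys = ('name', 'location', 'value')

dicts = rows_to_dicts(keys, all_values)
print(dicts)
```

Passing keys in as an argument keeps the field names in one place, so a fourth field only has to be added to the keys tuple.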
Related
I have a piece of code which generates a list of nested dictionaries like below:
[{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
I want to convert the dictionary into a dataframe like below
How can I do it using Python?
I have tried the below piece of code
merge_values = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
test = pd.DataFrame()
i = 0
for d in merge_values:
    final_value = d['final_value']
    comb = d['cb']
    i = i + 1
    for t in comb:
        # one single-row frame per dict in the tuple
        dct = {k: [v] for k, v in t.items()}
        x = pd.DataFrame(dct)
        x['merge_id'] = i
        x['Final_Value'] = final_value
        test = pd.concat([test, x])
The problem with this piece of code is it adds the rows one below another. I need the elements of the tuple next to each other.
You will need to clean your data by creating a new dict with the structure that you want, like this:
import pandas as pd
dirty_data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
def clean_data(dirty_data: list) -> dict:
    # Build one list per output column; cb[0] is the base record, cb[1] the merged one.
    names = []
    ids = []
    nums = []
    m_ids = []
    m_nums = []
    finals = []
    for cb in dirty_data:
        names.append(cb["cb"][0]["Name"])
        ids.append(cb["cb"][0]["ID"])
        nums.append(cb["cb"][0]["num"])
        m_ids.append(cb["cb"][1]["ID"])
        m_nums.append(cb["cb"][1]["num"])
        finals.append(cb["final_value"])
    return {"Name": names, "ID": ids, "num": nums, "M_ID": m_ids, "M_num": m_nums, "Final": finals}
df = pd.DataFrame(clean_data(dirty_data))
df
You could try to read the data into a dataframe as is and then restructure it until you get the desired result, but in this case, it doesn't seem practical.
Instead, I'd flatten the input into a list of lists to pass to pd.DataFrame. Here is a relatively concise way to do that with your sample data:
from operator import itemgetter
import pandas as pd
data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 2, 'num': 68}),
'final_value': 118},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 4, 'num': 67}),
'final_value': 117},
{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
{'Name': 'A', 'ID': 6, 'num': 67}),
'final_value': 117}]
keys = ['Name', 'ID', 'num', 'M_Name', 'M_ID', 'M_num', 'final_value']
# generates ['A', 1, 50, 'A', 2, 68, 118] etc.
flattened = ([value for item in row['cb']
for value in itemgetter(*keys[:3])(item)]
+ [row['final_value']]
for row in data)
df = pd.DataFrame(flattened)
df.columns = keys
# get rid of superfluous M_Name column
df.drop('M_Name', axis=1, inplace=True)
itemgetter(*keys[:3])(item) is the same as [item[k] for k in keys[:3]]. On flattening lists of lists with list (or generator) comprehensions, see How do I make a flat list out of a list of lists?.
Result:
  Name  ID  num  M_ID  M_num  final_value
0    A   1   50     2     68          118
1    A   1   50     4     67          117
2    A   1   50     6     67          117
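If it helps to see the flattening step on its own, here is a pandas-free sketch that builds one flat dict per row. The helper name flatten_row and the 'M_' prefix handling are my own choices for illustration; it assumes each 'cb' tuple holds exactly two dicts, as in the sample data. A list of such dicts can be passed to pd.DataFrame directly.

```python
data = [{'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
                {'Name': 'A', 'ID': 2, 'num': 68}),
         'final_value': 118},
        {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
                {'Name': 'A', 'ID': 4, 'num': 67}),
         'final_value': 117},
        {'cb': ({'Name': 'A', 'ID': 1, 'num': 50},
                {'Name': 'A', 'ID': 6, 'num': 67}),
         'final_value': 117}]


def flatten_row(row, fields=('Name', 'ID', 'num')):
    """Turn one nested record into a flat dict (hypothetical helper)."""
    first, second = row['cb']  # assumes exactly two dicts per tuple
    flat = {k: first[k] for k in fields}
    # Prefix the second dict's fields with 'M_' and skip the duplicate name.
    flat.update({'M_' + k: second[k] for k in fields if k != 'Name'})
    flat['final_value'] = row['final_value']
    return flat


rows = [flatten_row(row) for row in data]
print(rows[0])
```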
Suppose I have an array of objects.
arr = [
{'grade': 'A', 'name': 'James'},
{'grade': 'B', 'name': 'Tom'},
{'grade': 'A', 'name': 'Zelda'}
]
I want this result
{
'A': [
{'grade': 'A', 'name': 'James'},
{'grade': 'A', 'name': 'Zelda'}
],
'B': [ {'grade': 'B', 'name': 'Tom'} ]
}
Use a dict and setdefault:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
arr2 = {}
for d in arr:
    t = arr2.setdefault(d['grade'], [])
    t.append(d)
>>> arr2
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}],
'B': [{'grade': 'B', 'name': 'Tom'}]}
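To make the setdefault behaviour quoted above concrete, here is a tiny sketch with made-up names (groups, first, second). It shows that the first call inserts the default and returns it, while a later call for the same key returns the list that is already stored.

```python
groups = {}

first = groups.setdefault('A', [])   # key missing: inserts [] and returns it
first.append('James')

second = groups.setdefault('A', [])  # key present: returns the stored list
second.append('Zelda')

# Both names refer to the same list object held by the dict.
print(first is second)
print(groups)

# With no default given, a missing key is inserted with the value None.
missing = groups.setdefault('B')
print(missing)
```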
Using dict.setdefault we can do this:
import json
gradeList = [
{"grade": 'A', "name": 'James'},
{"grade": 'B', "name": 'Tom'},
{"grade": 'A', "name": 'Zelda'}
]
gradeDict = {}
for d in gradeList:
    gradeDict.setdefault(d["grade"], []).append(d)
print(json.dumps(gradeDict, indent=4))
Output:
{
"A": [
{
"grade": "A",
"name": "James"
},
{
"grade": "A",
"name": "Zelda"
}
],
"B": [
{
"grade": "B",
"name": "Tom"
}
]
}
You can use itertools.groupby
>>> import itertools
>>> keyfunc = lambda item: item['grade']
>>> {k: list(v) for k, v in itertools.groupby(sorted(arr, key=keyfunc), keyfunc)}
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
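The sorted() call matters here, because groupby only merges consecutive items with the same key. A small sketch of the difference (the variable names are mine, and itemgetter stands in for the lambda):

```python
from itertools import groupby
from operator import itemgetter

arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'},
]
by_grade = itemgetter('grade')

# Without sorting, the two 'A' items are not adjacent, so 'A' shows up twice.
unsorted_runs = [(k, [d['name'] for d in g]) for k, g in groupby(arr, by_grade)]
print(unsorted_runs)

# Sorting by the same key first makes equal grades adjacent.
grouped = {k: list(g) for k, g in groupby(sorted(arr, key=by_grade), by_grade)}
print(grouped)
```

If the unsorted runs were fed into a dict comprehension, the second 'A' run would overwrite the first one.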
I would use a pd.DataFrame and do it like this:
import pandas as pd
df = pd.DataFrame(arr)
for index, group in df.groupby('grade'):
    print(group)
Instead of print(group), you can write each group to whatever structure you need; I assume it does not have to be a dict like the one you described.
I would do a simple loop like this:
arr = [{'grade': 'A', 'name': 'James'}, {'grade': 'B', 'name': 'Tom'}, {'grade': 'A', 'name': 'Zelda'}]
grouped_grades = {}
for item in arr:
    if item['grade'] not in grouped_grades:
        grouped_grades[item['grade']] = []
    grouped_grades[item['grade']].append(item)
print(grouped_grades)
Output:
{'A': [{'grade': 'A', 'name': 'James'}, {'grade': 'A', 'name': 'Zelda'}], 'B': [{'grade': 'B', 'name': 'Tom'}]}
I think the easiest way is to use defaultdict. If you need an ordinary dict afterwards, you can convert the result by passing it to the constructor, as in dict(output).
from collections import defaultdict
output = defaultdict(list)
for item in arr:
    output[item['grade']].append(item)
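The same idea can be packaged as a reusable function. This is a sketch rather than a standard-library API: the name group_by and its key_func parameter are invented for illustration.

```python
from collections import defaultdict


def group_by(items, key_func):
    """Group items into lists keyed by key_func(item) (hypothetical helper)."""
    groups = defaultdict(list)
    for item in items:
        groups[key_func(item)].append(item)
    # Return a plain dict so that looking up a missing key raises KeyError
    # instead of silently creating an empty list.
    return dict(groups)


arr = [
    {'grade': 'A', 'name': 'James'},
    {'grade': 'B', 'name': 'Tom'},
    {'grade': 'A', 'name': 'Zelda'},
]
result = group_by(arr, lambda item: item['grade'])
print(result)
```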
What I have:
a=[{'name':'a','vals':1,'required':'yes'},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
What I tried:
[[i]+[j] for i in b for j in a if i['name']==j['name']]
What I got:
[[{'name': 'a', 'type': 'car'}, {'name': 'a', 'vals': 1}], [{'name': 'b', 'type': 'bike'}, {'name': 'b', 'vals': 2}]]
What I want:
[{'name': 'a', 'type': 'car','vals': 1},{'name': 'b', 'type': 'bike','vals': 2}]
Note:
I need to merge the dicts into one dict.
It should merge only those that have a common 'name' in both a and b.
I want a Python one-liner answer.
For Python 3.5 and later (where the {**x, **y} unpacking syntax is available), you can do this:
a=[{'name':'a','vals':1},{'name':'b','vals':2},{'name':'d','vals':3}]
b=[{'name':'a','type':'car'},{'name':'b','type':'bike'},{'name':'c','type':'van'}]
print([{**i,**j} for i in b for j in a if i['name']==j['name']])
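The nested comprehension compares every dict in b with every dict in a. For longer lists, a lookup table keyed by 'name' avoids that. A sketch, assuming names are unique within a (a later duplicate would overwrite an earlier one); the name a_by_name is mine:

```python
a = [{'name': 'a', 'vals': 1}, {'name': 'b', 'vals': 2}, {'name': 'd', 'vals': 3}]
b = [{'name': 'a', 'type': 'car'}, {'name': 'b', 'type': 'bike'}, {'name': 'c', 'type': 'van'}]

# Index a by name once, then merge each dict in b with its match, if any.
a_by_name = {d['name']: d for d in a}
merged = [{**i, **a_by_name[i['name']]} for i in b if i['name'] in a_by_name]
print(merged)
```

This is two lines rather than one, but each list is only walked once.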
I have a list of dictionaries that is in order in some places and out of order in others.
E.g.:
data = [{"text":'a', "value":1},
{"text":'b', "value":1},
{"text":'j', "value":2},
{"text":'k', "value":50},
{"text":'b', "value":50},
{"text":'y', "value":52},
{"text":'x', "value":2},
{"text":'k', "value":3},
{"text":'m', "value":3}]
I want to sort them as:
o = [{"text":'a', "value":1},
{"text":'b', "value":1},
{"text":'j', "value":2},
{"text":'x', "value":2},
{"text":'k', "value":3},
{"text":'m', "value":3},
{"text":'k', "value":50},
{"text":'b', "value":50},
{"text":'y', "value":52}]
where the sort key is some combination of the item's index and the 2nd value. I was thinking of sorting with:
key=[(2nd value)<<len(closest power of 2 to len(index)) + index]
I can sort the list of dicts by the 2nd value with:
data.sort(key= lambda x:x['value'])
How do I also add the index of the dictionary?
And is there a better sorting key I could use?
It appears that you're looking for the text field as a secondary sort key. The easiest way is to simply use a tuple for your keys, in priority order:
sorted(data, key=lambda x: (x['value'], x['text']) )
Does that yield what you need? Output:
[{'text': 'a', 'value': 1}, {'text': 'b', 'value': 1}, {'text': 'j', 'value': 2}, {'text': 'x', 'value': 2}, {'text': 'k', 'value': 3}, {'text': 'm', 'value': 3}, {'text': 'b', 'value': 50}, {'text': 'k', 'value': 50}, {'text': 'y', 'value': 52}]
The values (k, 50) and (b, 50) are now in the other order; I'm hopeful that I read your mind correctly.
UPDATE per OP clarification
I checked the docs. Python's sort method is stable, so you don't need the second sort key at all: in case of a tie, sort will maintain the original ordering:
>>> data.sort(key= lambda x:x['value'])
>>> data
[{'text': 'a', 'value': 1}, {'text': 'b', 'value': 1}, {'text': 'j', 'value': 2}, {'text': 'x', 'value': 2}, {'text': 'k', 'value': 3}, {'text': 'm', 'value': 3}, {'text': 'k', 'value': 50}, {'text': 'b', 'value': 50}, {'text': 'y', 'value': 52}]
... and this is what you requested.
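As a quick cross-check of the stability claim, the sketch below sorts the sample data by value only and compares the result with a sort that uses the original index as an explicit tie-breaker. The variable names are mine.

```python
data = [{"text": 'a', "value": 1},
        {"text": 'b', "value": 1},
        {"text": 'j', "value": 2},
        {"text": 'k', "value": 50},
        {"text": 'b', "value": 50},
        {"text": 'y', "value": 52},
        {"text": 'x', "value": 2},
        {"text": 'k', "value": 3},
        {"text": 'm', "value": 3}]

# Stable sort: ties on 'value' keep their original relative order.
by_value = sorted(data, key=lambda x: x['value'])

# Same sort with the original position as an explicit secondary key.
by_value_then_index = [d for _, d in sorted(enumerate(data),
                                            key=lambda pair: (pair[1]['value'], pair[0]))]

print(by_value == by_value_then_index)
print([d['text'] for d in by_value])
```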
Use enumerate to get the index and use that to sort
>>> res = [d for i,d in sorted(enumerate(data), key=lambda i_d: (i_d[1]['value'], i_d[0]))]
>>> pprint(res)
[{'text': 'a', 'value': 1},
{'text': 'b', 'value': 1},
{'text': 'j', 'value': 2},
{'text': 'x', 'value': 2},
{'text': 'k', 'value': 3},
{'text': 'm', 'value': 3},
{'text': 'k', 'value': 50},
{'text': 'b', 'value': 50},
{'text': 'y', 'value': 52}]
To sort it in-place, you can try using itertools.count
>>> from itertools import count
>>> cnt=count()
>>> data.sort(key=lambda d: (d['value'], next(cnt)))
>>> pprint(data)
[{'text': 'a', 'value': 1},
{'text': 'b', 'value': 1},
{'text': 'j', 'value': 2},
{'text': 'x', 'value': 2},
{'text': 'k', 'value': 3},
{'text': 'm', 'value': 3},
{'text': 'k', 'value': 50},
{'text': 'b', 'value': 50},
{'text': 'y', 'value': 52}]
Have you tried this:
sorted(data, key=lambda x: x['value'])
I hate to ask this but I can't figure it out and it's getting to me.
I have to make a function that takes a given dictionary d1 and sort of compares it to another dictionary d2 then adds the compared value to d2.
d1 is already in the format needed, so I don't have to worry about it.
d2 however, is a nested dictionary. It looks like this:
{'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
'123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
'234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}}
So d1 is in the format of the Responses key, and that's what I need from d2 to compare it to d1.
So to do that I isolate responses:
for key, i in d2.items():
    temp = i['Responses']
Now I need to run temp through a function with d1 that will output an integer. Then match that integer with the top-level key it came from and update a new k/v entry associated with it. But I don't know how to do this.
I've managed to update each top-level key with that compared value, but it only uses the first compared value for all the top-level keys. I can't figure out how to match the integer found to its key. This is what I have so far that works the best:
for i in d2:
    score = grade_student(d1,temp) #integer
    placement = {'Score': score}
    d2[i].update(placement)
You could just iterate over sub dictionaries in d2 and update them once you've called grade_student:
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
Here's a complete example:
import pprint
d1 = {}
d2 = {
'345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
'123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
'234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}}
}
# Dummy
def grade_student(x, y):
    return 1
for v in d2.values():
    v['Score'] = grade_student(d1, v['Responses'])
pprint.pprint(d2)
Output:
{'123': {'ID': '123',
'Name': 'foo',
'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'a'},
'Score': 1},
'234': {'ID': '234',
'Name': 'bar',
'Responses': {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'},
'Score': 1},
'345': {'ID': '345',
'Name': 'xyzzy',
'Responses': {'Q1': 'a', 'Q2': 'a', 'Q3': 'c', 'Q4': 'b'},
'Score': 1}}
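With the dummy grader every score is 1, so it is hard to see that each student really gets their own value. Here is the same loop with a stand-in grade_student that counts matching answers. Both that implementation and the answer key in d1 are invented for illustration; the real function and key may differ.

```python
def grade_student(answer_key, responses):
    """Hypothetical grader: number of responses that match the answer key."""
    return sum(1 for question, correct in answer_key.items()
               if responses.get(question) == correct)


d1 = {'Q1': 'a', 'Q2': 'b', 'Q3': 'c', 'Q4': 'b'}  # made-up answer key
d2 = {
    '345': {'Name': 'xyzzy', 'ID': '345', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'a'}},
    '123': {'Name': 'foo', 'ID': '123', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'a', 'Q2': 'b'}},
    '234': {'Name': 'bar', 'ID': '234', 'Responses': {'Q3': 'c', 'Q1': 'a', 'Q4': 'b', 'Q2': 'b'}},
}

# Grade each student's own responses inside the loop, so no stale value is reused.
for student in d2.values():
    student['Score'] = grade_student(d1, student['Responses'])

scores = {student_id: student['Score'] for student_id, student in d2.items()}
print(scores)
```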
You don't have to iterate them. Use the built-in update() method. Here is an example
>>> A = {'cat':10, 'dog':5, 'rat':50}
>>> B = {'cat':5, 'dog':10, 'pig':20}
>>> A.update(B)  # merges B into A in place; on key collisions, B's values win
>>> A
{'rat': 50, 'pig': 20, 'dog': 10, 'cat': 5}
>>> B
{'pig': 20, 'dog': 10, 'cat': 5}
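Keep in mind that update() changes A itself and returns None. If the original dict has to stay intact, building a new dict gives the same merged result. A sketch using the same sample values (the name merged is mine; the {**A, **B} syntax needs Python 3.5+):

```python
A = {'cat': 10, 'dog': 5, 'rat': 50}
B = {'cat': 5, 'dog': 10, 'pig': 20}

# Non-mutating merge: A and B are left untouched, B wins on collisions.
merged = {**A, **B}
a_unchanged = (A == {'cat': 10, 'dog': 5, 'rat': 50})

# In-place merge: A is modified and update() returns None.
returned = A.update(B)

print(merged)
print(a_unchanged, returned, A == merged)
```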