Converting complex dictionary to pandas dataframe - python

I am looking for Pythonic ways of extracting part of the dictionary below and turning it into a pandas DataFrame as shown. I'd appreciate your help with that!
{'data': [{'x': {'name': 'Gamma', 'unit': 'cps', 'values': [10, 20, 30]},
           'y': {'name': 'Depth', 'unit': 'm', 'values': [34.3, 34.5, 34.7]}}]}
   Depth  Gamma
1   34.3     10
2   34.5     20
3   34.7     30

Sure, basically, you need to iterate over the values of each dict in the 'data' list; each of those values is itself a dict of column information:
In [1]: data = {'data': [{'x': {'name': 'Gamma', 'unit': 'cps', 'values': [10, 20, 30]},
...: 'y': {'name': 'Depth', 'unit': 'm', 'values': [34.3, 34.5, 34.7]}}]}
In [2]: import pandas as pd
In [3]: pd.DataFrame({
...: col["name"]: col["values"]
...: for d in data['data']
...: for col in d.values()
...: })
Out[3]:
Gamma Depth
0 10 34.3
1 20 34.5
2 30 34.7
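If you also want to carry the units into the column labels, here is a variant sketch (not something the question asked for; the label format is my own choice), giving columns named "Gamma (cps)" and "Depth (m)":
pd.DataFrame({
    f"{col['name']} ({col['unit']})": col["values"]
    for d in data["data"]
    for col in d.values()
})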

Delete some rows in dataframe based on condition in another column

I have a dataframe as follows:
name  value
aa        0
aa        0
aa        1
aa        0
aa        0
bb        0
bb        0
bb        1
bb        0
bb        0
bb        0
For each 'name', I want to delete all rows that come after the first 1 in the 'value' column, keeping everything up to and including that 1:
name  value
aa        0
aa        0
aa        1
bb        0
bb        0
bb        1
What is the best way to do this? I thought about the pd.groupby method with some conditions inside, but I cannot figure out how to make it work.
Not the most beautiful of ways to do it but this should work.
df = df.loc[df['value'].groupby(df['name']).cumsum().groupby(df['name']).cumsum() <=1]
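The inner cumsum turns 'value' into a running total per name, so it is nonzero from the first 1 onward; the outer cumsum accumulates those running totals, so it exceeds 1 on every row after the first 1, and keeping totals <= 1 retains everything up to and including that first 1 within each name (this relies on 'value' containing only 0s and 1s). An equivalent sketch using an explicit boolean mask (my own variant, not from the original answer):
# Flag rows that come after the first 1 within each name, then drop them.
after_first_one = (
    df['value'].eq(1)
    .groupby(df['name'])
    .transform(lambda s: s.cummax().shift(fill_value=False))
)
df_trimmed = df[~after_first_one]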
Here's my approach to solving this.
# Imports.
import pandas as pd

# Creating a DataFrame.
df = pd.DataFrame([{'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 1},
                   {'name': 'aa', 'value': 0},
                   {'name': 'aa', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 1},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0},
                   {'name': 'bb', 'value': 0}])

# Filtering the DataFrame: for each name, keep rows up to and including the
# first 1 (x['value'].idxmax() returns the index of the first maximum value).
df_filtered = (df.groupby('name')
                 .apply(lambda x: x[x.index <= x['value'].idxmax()])
                 .reset_index(drop=True))

dictionary from df with columns within a key

I have a df such as follows:
data = [['a', 10, 1], ['b', 15,12], ['c', 14,12]]
df = pd.DataFrame(data, columns = ['Name', 'x', 'y'])
Name x y
0 a 10 1
1 b 15 12
2 c 14 12
Now I want to pass it to a dict where x and y are nested inside a key called 'total',
so the final dict would look like this:
{
    'Name': 'a',
    'total': {
        'x': 308,
        'y': 229
    },
}
I know I can use df.to_dict('records') to get this dict:
{
    'Name': 'a',
    'x': 308,
    'y': 229
}
Any tips?
You could try
my_dict = [{'Name': row['Name'], 'total': {'x': row['x'], 'y': row['y']}} for row in df.to_dict('records')]
Result:
[{'Name': 'a', 'total': {'x': 10, 'y': 1}}, {'Name': 'b', 'total': {'x': 15, 'y': 12}}, {'Name': 'c', 'total': {'x': 14, 'y': 12}}]
Or, if you wish to convert all columns except 'Name' into 'total', and provided that there are no repetitions in 'Name':
df.set_index('Name', inplace=True)
result = [{'Name': name, 'total': total} for name, total in df.to_dict('index').items()]
With the same result as before.
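Another sketch, applied to the original df (i.e. before the set_index call above), builds the nesting row by row with apply; this is my own variant rather than part of the original answer:
# Build the nested records one row at a time.
nested = df.apply(
    lambda row: {'Name': row['Name'], 'total': {'x': row['x'], 'y': row['y']}},
    axis=1,
).tolist()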

Making new columns of keys with values stored as a list of values from a list of dicts?

I have a data frame (10 million rows) which looks like the following. For better understanding, I have simplified it.
user_id event_params
10 [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}]
11 [{'key': 'y', 'value': '5'}, {'key': 'z', 'value': '9'}]
12 [{'key': 'a', 'value': '5'}]
I want to make new columns that are all the unique keys from the dataframe, with values stored in the respective keys. Output should like below:
user_id x y z a
10 1 3 4 NA
11 NA 5 9 NA
12 NA NA NA 5
Just create a new dataframe and append new rows via the append function. You can find more alternatives here.
import pandas as pd

df = pd.DataFrame()
data = [
    [12, [{'key': 'x', 'value': '1'}, {'key': 'y', 'value': '3'}, {'key': 'z', 'value': '4'}]],
    [13, [{'key': 'a', 'value': '5'}]]
]

for user_id, event_params in data:
    # Flatten each list of {'key': ..., 'value': ...} dicts into one record.
    record = {e['key']: e['value'] for e in event_params}
    record['user_id'] = user_id
    df = df.append(record, ignore_index=True)

df
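Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on newer versions a sketch of the same idea is to collect plain dict records first and build the frame once (assuming the same data list as above):
import pandas as pd

records = []
for user_id, event_params in data:
    # Flatten each list of {'key': ..., 'value': ...} dicts into one record.
    record = {e['key']: e['value'] for e in event_params}
    record['user_id'] = user_id
    records.append(record)

df = pd.DataFrame(records)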

Python dict group and sum multiple values [duplicate]

I have a set of data in a list-of-dicts format like below:
data = [
    {'name': 'A', 'tea': 5, 'coffee': 6},
    {'name': 'A', 'tea': 2, 'coffee': 3},
    {'name': 'B', 'tea': 7, 'coffee': 1},
    {'name': 'B', 'tea': 9, 'coffee': 4},
]
I'm trying to group by 'name' and sum 'tea' and 'coffee' separately.
The final grouped data must be in this format:
grouped_data = [
    {'name': 'A', 'tea': 7, 'coffee': 9},
    {'name': 'B', 'tea': 16, 'coffee': 5},
]
I tried some steps:
from collections import Counter

c = Counter()
for v in data:
    c[v['name']] += v['tea']

my_data = [{'name': name, 'tea': tea} for name, tea in c.items()]
for e in my_data:
    print(e)
The above step returned the following output:
{'name': 'A', 'tea': 7}
{'name': 'B', 'tea': 16}
I can only sum the 'tea' key this way; I'm not able to also get the sum for 'coffee'. Can you please help me get the grouped_data format above?
Using pandas:
df = pd.DataFrame(data)
df
coffee name tea
0 6 A 5
1 3 A 2
2 1 B 7
3 4 B 9
g = df.groupby('name', as_index=False).sum()
g
name coffee tea
0 A 9 7
1 B 5 16
And, the final step, to_dict:
d = g.to_dict('records')
d
[{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
You can try this:
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
import itertools
final_data = [
    (a, list(b))
    for a, b in itertools.groupby([i.items() for i in data],
                                  key=lambda x: dict(x)["name"])
]
new_final_data = [
    {i[0][0]: sum(c[-1] for c in i if isinstance(c[-1], int))
     if i[0][0] != "name" else i[0][-1]
     for i in zip(*b)}
    for a, b in final_data
]
Output:
[{'tea': 7, 'coffee': 9, 'name': 'A'}, {'tea': 16, 'coffee': 5, 'name': 'B'}]
Using pandas, this is pretty easy to do:
import pandas as pd
data = [
{'name': 'A', 'tea':5, 'coffee':6},
{'name': 'A', 'tea':2, 'coffee':3},
{'name': 'B', 'tea':7, 'coffee':1},
{'name': 'B', 'tea':9, 'coffee':4},
]
df = pd.DataFrame(data)
df.groupby(['name']).sum()
coffee tea
name
A 9 7
B 5 16
Here's one way to get it into your dict format:
gb = df.groupby(['name']).sum()

grouped_data = []
for idx in gb.index:
    d = {'name': idx}
    # Merge the group name with that group's summed columns.
    d = {**d, **{col: gb.loc[idx, col] for col in gb}}
    grouped_data.append(d)

grouped_data
Out[15]: [{'coffee': 9, 'name': 'A', 'tea': 7}, {'coffee': 5, 'name': 'B', 'tea': 16}]
But COLDSPEED got the native pandas solution with the as_index=False config...
import pandas as pd

df = pd.DataFrame(data)
# reset_index() keeps 'name' as a column so it appears in the output records.
df2 = df.groupby('name').sum().reset_index()
df2.to_dict('records')
Here is a method I created; you can input the key you want to group by:
def group_sum(key, list_of_dicts):
    d = {}
    for dct in list_of_dicts:
        if dct[key] not in d:
            d[dct[key]] = {}
        for k, v in dct.items():
            if k != key:
                if k not in d[dct[key]]:
                    d[dct[key]][k] = v
                else:
                    d[dct[key]][k] += v
    final_list = []
    for k, v in d.items():
        temp_d = {key: k}
        for k2, v2 in v.items():
            temp_d[k2] = v2
        final_list.append(temp_d)
    return final_list

data = [
    {'name': 'A', 'tea': 5, 'coffee': 6},
    {'name': 'A', 'tea': 2, 'coffee': 3},
    {'name': 'B', 'tea': 7, 'coffee': 1},
    {'name': 'B', 'tea': 9, 'coffee': 4},
]

grouped_data = group_sum("name", data)
print(grouped_data)
result:
[{'coffee': 5, 'name': 'B', 'tea': 16}, {'coffee': 9, 'name': 'A', 'tea': 7}]
I guess this would be slower than pandas when summing thousands of dicts, though I haven't measured it. It also doesn't maintain insertion order unless you use OrderedDict or Python 3.6+.

Python sum values of list of dictionaries if two other key value pairs match

I have a list of dictionaries of the following form:
lst = [{"Name":'Nick','Hour':0,'Value':2.75},
{"Name":'Sam','Hour':1,'Value':7.0},
{"Name":'Nick','Hour':0,'Value':2.21},
{'Name':'Val',"Hour":1,'Value':10.1},
{'Name':'Nick','Hour':1,'Value':2.1},
{'Name':'Val',"Hour":1,'Value':11},]
I want to be able to sum all values for a given name in a particular hour, e.g. if Name == 'Nick' and Hour == 0, I want the sum of all values meeting that condition: 2.75 + 2.21, according to the list above.
I have already tried the following but it doesn't help me out with both conditions.
import collections

finalList = collections.defaultdict(float)
for info in lst:
    finalList[info['Name']] += info['Value']

finalList = [{'Name': c, 'Value': finalList[c]} for c in finalList]
This sums up all the values for a particular Name, not checking if the Hour was the same. How can I incorporate that condition into my code as well?
My expected output:
finalList = [{'Name': 'Nick', 'Hour': 0, 'Value': 4.96},
             {'Name': 'Sam', 'Hour': 1, 'Value': 7.0},
             {'Name': 'Val', 'Hour': 1, 'Value': 21.1},
             {'Name': 'Nick', 'Hour': 1, 'Value': 2.1}, ...]
Consider using the pandas module - it's very convenient for such data sets:
import pandas as pd
In [109]: lst
Out[109]:
[{'Hour': 0, 'Name': 'Nick', 'Value': 2.75},
{'Hour': 1, 'Name': 'Sam', 'Value': 7.0},
{'Hour': 0, 'Name': 'Nick', 'Value': 2.21},
{'Hour': 1, 'Name': 'Val', 'Value': 10.1},
{'Hour': 1, 'Name': 'Nick', 'Value': 2.1}]
In [110]: df = pd.DataFrame(lst)
In [111]: df
Out[111]:
Hour Name Value
0 0 Nick 2.75
1 1 Sam 7.00
2 0 Nick 2.21
3 1 Val 10.10
4 1 Nick 2.10
In [123]: df.groupby(['Name','Hour']).sum().reset_index()
Out[123]:
Name Hour Value
0 Nick 0 4.96
1 Nick 1 2.10
2 Sam 1 7.00
3 Val 1 10.10
export it to CSV:
df.groupby(['Name','Hour']).sum().reset_index().to_csv('/path/to/file.csv', index=False)
result:
Name,Hour,Value
Nick,0,4.96
Nick,1,2.1
Sam,1,7.0
Val,1,10.1
if you want to have it as a dictionary:
In [125]: df.groupby(['Name','Hour']).sum().reset_index().to_dict('records')
Out[125]:
[{'Hour': 0, 'Name': 'Nick', 'Value': 4.96},
{'Hour': 1, 'Name': 'Nick', 'Value': 2.1},
{'Hour': 1, 'Name': 'Sam', 'Value': 7.0},
{'Hour': 1, 'Name': 'Val', 'Value': 10.1}]
you can do many fancy things using pandas:
In [112]: df.loc[(df.Name == 'Nick') & (df.Hour == 0), 'Value'].sum()
Out[112]: 4.96
In [121]: df.groupby('Name')['Value'].agg(['sum','mean'])
Out[121]:
sum mean
Name
Nick 7.06 2.353333
Sam 7.00 7.000000
Val 10.10 10.100000
[{'Name': name, 'Hour': hour,
  'Value': sum(d['Value'] for d in lst if d['Name'] == name and d['Hour'] == hour)}
 for hour in hours for name in names]
if you don't already have all names and hours in lists (or sets) you can get them like so:
names = {d['Name'] for d in lst}
hours = {d['Hour'] for d in lst}
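Note that the comprehension above builds an entry for every (name, hour) combination, including pairs that never occur in lst (those end up with a Value of 0). A sketch that only keeps the pairs actually present (my own variant):
# Collect only the (Name, Hour) pairs that appear in the data, then sum each.
pairs = {(d['Name'], d['Hour']) for d in lst}
result = [
    {'Name': name, 'Hour': hour,
     'Value': sum(d['Value'] for d in lst
                  if d['Name'] == name and d['Hour'] == hour)}
    for name, hour in pairs
]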
You can use any (hashable) object as a key for a python dictionary, so just use a tuple containing Name and Hour as the key:
from collections import defaultdict

d = defaultdict(float)
for item in lst:
    d[(item['Name'], item['Hour'])] += item['Value']
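If you then want the totals back in the same list-of-dicts shape as the expected output, a small follow-up sketch (my own addition):
# Unpack the (Name, Hour) tuple keys back into the requested record format.
finalList = [
    {'Name': name, 'Hour': hour, 'Value': value}
    for (name, hour), value in d.items()
]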
