I have 2 dataframes:
df1 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.73, 0.58], 'col2': [0.51, 0.93], 'Type': ['mean', 'mean'] })
df2 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.44, 0.49], 'col2': [0.50, 0.24], 'Type': ['std', 'std'] })
print(df1)
print(df2)
I need to convert them to a nested dictionary like
mydict = {0: {'col1': {'mean': 0.73, 'std': 0.44}, 'col2': {'mean': 0.51, 'std': 0.5}},
          1: {'col1': {'mean': 0.58, 'std': 0.49}, 'col2': {'mean': 0.93, 'std': 0.24}}}
where the 'ID' values are the outer keys, the column names are the nested keys, the 'Type' values are the innermost keys, and the column values are the values.
Use concat with DataFrame.pivot to build a MultiIndex DataFrame and then convert it to a nested dict:
df = pd.concat([df1, df2]).pivot(index='Type', columns='ID')
d = {level: df.xs(level, axis=1, level=1).to_dict() for level in df.columns.levels[1]}
print(d)
{'0': {'col1': {'mean': 0.73, 'std': 0.44},
       'col2': {'mean': 0.51, 'std': 0.5}},
 '1': {'col1': {'mean': 0.58, 'std': 0.49},
       'col2': {'mean': 0.93, 'std': 0.24}}}
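For context, the intermediate wide frame produced by the concat/pivot step looks like this (a reproducible sketch built from the question's data):
import pandas as pd

df1 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.73, 0.58], 'col2': [0.51, 0.93], 'Type': ['mean', 'mean']})
df2 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.44, 0.49], 'col2': [0.50, 0.24], 'Type': ['std', 'std']})

df = pd.concat([df1, df2]).pivot(index='Type', columns='ID')
print(df)
#       col1        col2
# ID       0     1     0     1
# Type
# mean  0.73  0.58  0.51  0.93
# std   0.44  0.49  0.50  0.24
Each df.xs(level, axis=1, level=1) call then slices out one ID across col1/col2, and .to_dict() turns that slice into the inner {'col1': {'mean': ..., 'std': ...}, ...} mapping.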
Another option is to melt both frames, merge them on ID/variable, pack each mean/std pair into a dict, and pivot:
(df1.drop(columns='Type').melt('ID', value_name='mean')
    .merge(df2.drop(columns='Type').melt('ID', value_name='std'))
    .assign(c=lambda x: x[['mean', 'std']].to_dict('records'))
    .pivot(index='variable', columns='ID', values='c').to_dict())
{'0': {'col1': {'mean': 0.73, 'std': 0.44},
       'col2': {'mean': 0.51, 'std': 0.5}},
 '1': {'col1': {'mean': 0.58, 'std': 0.49},
       'col2': {'mean': 0.93, 'std': 0.24}}}
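The assign step works because to_dict('records') returns one {'mean': ..., 'std': ...} dict per row, so each cell carried through the pivot is already the innermost mapping.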
I have checked the advice here: Nested dictionary to multiindex dataframe where dictionary keys are column labels
However, I couldn't get it to work for my problem.
I would like to turn a dictionary into a MultiIndexed DataFrame, where 'a', 'b', 'c' are the names of the index levels, their values (12.0, 0.8, 1.8, bla1, bla2, bla3, bla4) form the MultiIndex, and the values from the lists are assigned to those index entries, as in the desired result below.
My dictionary:
dictionary = {
    "{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
    "{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
The result DataFrame I would like to get:
The code below doesn't create a MultiIndex; it just stacks every value under the previous one, each in its own row:
df_a = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
df_b = pd.DataFrame(df_a[0].values.tolist(), index=df_a.index)
Use ast.literal_eval to convert each string key into a dictionary and build the MultiIndex from there:
import pandas as pd
from ast import literal_eval
dictionary = {
    "{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
    "{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)
Output
                 0     1    2
a    b   c                   
12.0 0.8  bla1  200   0.0  0.0
          bla2   37  44.0  0.6
     1.8  bla3  100   2.0  1.0
          bla4  400   3.0  1.0
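Note that the 'c' values in the string keys carry a stray leading space (' bla1'). If that is unwanted, one possible tweak (an assumption, not part of the original answer) is to strip string values while parsing:
# strip stray whitespace from string values while building the index
cleaned = [{k: v.strip() if isinstance(v, str) else v
            for k, v in literal_eval(s).items()} for s in dictionary]
index = pd.MultiIndex.from_frame(pd.DataFrame(cleaned))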
I have this list of dictionaries:
[{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
I would like to map and reduce (or group) to have a result like this:
[
    {
        'topic_id': 1,
        'count': 2,
        'variance': 3.0,
        'global_average': 6.5
    },
    {
        'topic_id': 2,
        'count': 1,
        'variance': 5.0,
        'global_average': 5.0
    }
]
Something that calculates the variance (max average - min average) and sums the counts of items too.
What I have already tried:
Earlier I just tried to sum the counts by changing the structure of the dictionary, making topic_id the key and the count the value; my result was:
result = sorted(dict(functools.reduce(operator.add, map(collections.Counter, data))).items(), reverse=True)
This was just the first try.
You could achieve this with some comprehensions, a map, and the mean function from the built-in statistics module.
from statistics import mean
data = [
    {
        'topic_id': 1,
        'average': 5.0,
        'count': 1
    }, {
        'topic_id': 1,
        'average': 8.0,
        'count': 1
    }, {
        'topic_id': 2,
        'average': 5.0,
        'count': 1
    }
]
# a set of unique topic_id's
keys = set(i['topic_id'] for i in data)
# a list of list of averages for each topic_id
averages = [[i['average'] for i in data if i['topic_id'] == j] for j in keys]
# a map of tuples of (counts, variances, averages) for each topic_id
stats = map(lambda x: (len(x), max(x) - min(x), mean(x)), averages)
# finally reconstruct it back into a list
result = [
    {
        'topic_id': key,
        'count': count,
        'variance': variance,
        'global_average': average
    } for key, (count, variance, average) in zip(keys, stats)
]
print(result)
Returns
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5}, {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
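One caveat: the final zip(keys, stats) only lines up because averages was built by iterating the same keys set. A variant that avoids relying on set iteration order (a sketch, not from the original answer) groups into a dict first:
from statistics import mean

# group averages per topic_id, preserving insertion order
groups = {}
for row in data:
    groups.setdefault(row['topic_id'], []).append(row['average'])

result = [{'topic_id': k, 'count': len(v), 'variance': max(v) - min(v),
           'global_average': mean(v)} for k, v in groups.items()]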
Here is an attempt using itertools.groupby to group the data based on the topic_id:
import itertools
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# groupby
grouper = itertools.groupby(data, key=lambda x: x['topic_id'])
# holder for output
output = []
# iterate over grouper to calculate things
for key, group in grouper:
    # variables for calculations
    count = 0
    maxi = float('-inf')
    mini = float('inf')
    total = 0
    # one pass over each dictionary
    for g in group:
        avg = g['average']
        maxi = avg if avg > maxi else maxi
        mini = avg if avg < mini else mini
        total += avg
        count += 1
    # write to output
    output.append({'topic_id': key,
                   'count': count,
                   'variance': maxi - mini,
                   'global_average': total / count})
Giving this output:
[{'topic_id': 1, 'count': 2, 'variance': 3.0, 'global_average': 6.5},
 {'topic_id': 2, 'count': 1, 'variance': 0.0, 'global_average': 5.0}]
Note that the 'variance' for the second group is 0.0 here instead of 5.0; this is different from your expected output, but I would guess this is what you want?
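One more caveat: itertools.groupby only groups consecutive items, so if the input were not already ordered by topic_id you would need to sort first, e.g.:
# sort so each topic_id forms one contiguous run before grouping
data_sorted = sorted(data, key=lambda x: x['topic_id'])
grouper = itertools.groupby(data_sorted, key=lambda x: x['topic_id'])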
If you are willing to use pandas, this seems like an appropriate use case:
import pandas as pd
data = [{'topic_id': 1, 'average': 5.0, 'count': 1}, {'topic_id': 1, 'average': 8.0, 'count': 1}, {'topic_id': 2, 'average': 5.0, 'count': 1}]
# move to dataframe
df = pd.DataFrame(data)
# groupby and get all desired metrics
grouped = df.groupby('topic_id')['average'].describe()
grouped['variance'] = grouped['max'] - grouped['min']
# rename columns and remove unneeded ones
grouped = grouped.reset_index().loc[:, ['topic_id', 'count', 'mean', 'variance']].rename({'mean':'global_average'}, axis=1)
# back to list of dicts
output = grouped.to_dict('records')
output is:
[{'topic_id': 1, 'count': 2.0, 'global_average': 6.5, 'variance': 3.0},
{'topic_id': 2, 'count': 1.0, 'global_average': 5.0, 'variance': 0.0}]
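Note that describe() reports count as a float; if integer counts matter, a small post-processing step before the to_dict call (assuming that's the desired output) is:
# cast the float count from describe() back to int
grouped['count'] = grouped['count'].astype(int)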
You can also try to use the agg functionality of a pandas DataFrame, like this:
import pandas as pd
f = pd.DataFrame(data).set_index('topic_id')

def var(x):
    return x.max() - x.min()

out = f.groupby(level=0).agg(count=('count', 'sum'),
                             global_average=('average', 'mean'),
                             variance=('average', var))
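For the sample data this should give roughly the following, and reset_index().to_dict('records') brings it back to the question's list-of-dicts shape:
print(out)
#           count  global_average  variance
# topic_id
# 1             2             6.5       3.0
# 2             1             5.0       0.0
print(out.reset_index().to_dict('records'))
# [{'topic_id': 1, 'count': 2, 'global_average': 6.5, 'variance': 3.0},
#  {'topic_id': 2, 'count': 1, 'global_average': 5.0, 'variance': 0.0}]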
I have a data frame with three columns. I would like to create a dictionary after applying a groupby on the first and second columns. I can do this with for loops, but is there a pandas way of doing it?
DataFrame:
Col X  Col Y  Sum
A      a      3
A      b      2
A      c      1
B      p      5
B      q      6
B      r      7
After grouping by Col X and Col Y with df.groupby(['Col X', 'Col Y']).sum():
             Sum
Col X Col Y     
A     a        3
      b        2
      c        1
B     p        5
      q        6
      r        7
The dictionary I want to create:
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
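For a self-contained run of the answers below, the sample frame can be built like this (a sketch assuming the values shown above):
import pandas as pd

df = pd.DataFrame({'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
                   'Col Y': ['a', 'b', 'c', 'p', 'q', 'r'],
                   'Sum': [3, 2, 1, 5, 6, 7]})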
Use a dictionary comprehension while iterating over a groupby object:
{name: dict(zip(g['Col Y'], g['Sum'])) for name, g in df.groupby('Col X')}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
If you insisted on using to_dict somewhere, you could do something like this:
s = df.set_index(['Col X', 'Col Y']).Sum
{k: s.xs(k).to_dict() for k in s.index.levels[0]}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Keep in mind, that the to_dict method is just using some comprehension under the hood. If you have a special use case that requires something more than what the orient options provide for... there is no shame in constructing your own comprehension.
You can iterate over the MultiIndex series:
>>> s = df.set_index(['ColX', 'ColY'])['Sum']
>>> {k: v.reset_index(level=0, drop=True).to_dict() for k, v in s.groupby(level=0)}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
# A to_dict() solution
d = df.groupby(['Col X', 'Col Y']).sum().reset_index().pivot(columns='Col X', values='Sum').to_dict()
Out[70]:
{'A': {0: 3.0, 1: 2.0, 2: 1.0, 3: nan, 4: nan, 5: nan},
'B': {0: nan, 1: nan, 2: nan, 3: 5.0, 4: 6.0, 5: 7.0}}
# if you need to get rid of the NaNs:
{k1:{k2:v2 for k2,v2 in v1.items() if pd.notnull(v2)} for k1,v1 in d.items()}
Out[73]: {'A': {0: 3.0, 1: 2.0, 2: 1.0}, 'B': {3: 5.0, 4: 6.0, 5: 7.0}}
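If you would rather have the inner dicts keyed by 'Col Y' than by row position, a variant (a sketch, not in the original answer) passes index='Col Y' to pivot and reuses the same NaN filter:
d = df.pivot(index='Col Y', columns='Col X', values='Sum').to_dict()
{k1: {k2: v2 for k2, v2 in v1.items() if pd.notnull(v2)} for k1, v1 in d.items()}
# {'A': {'a': 3.0, 'b': 2.0, 'c': 1.0}, 'B': {'p': 5.0, 'q': 6.0, 'r': 7.0}}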
I can sum items in a list of dicts per key like so:
import collections
import functools

dict(
    functools.reduce(
        lambda x, y: x.update(y) or x,
        dict1,
        collections.Counter())
)
But given that
dict1 = [{'ledecky': 1, 'king': 2, 'vollmer': 3},
{'ledecky': 1, 'vollmer': 2, 'king': 3},
{'schmitt': 1, 'ledecky': 2, 'vollmer': 3}]
how could I sum their values according to medal value, given that:
medal_value = {1: 10.0, 2: 5.0, 3: 3.0}
Such that the final dict would yield:
{'ledecky': 25.0, 'king': 8.0, 'vollmer': 11.0, 'schmitt': 10.0}
The dictionary get() method works really well in this example: we either give the newly created dictionary a default value of 0, or add to its current value the weighted value, using the value from dict1 as the lookup key into medal_value.
def calculate_points(results, medal_value):
    d = {}
    for item in results:
        for key, value in item.items():
            d[key] = d.get(key, 0) + medal_value[value]
    return d
Sample output:
dict1 = [{'ledecky': 1, 'king': 2, 'vollmer': 3},
         {'ledecky': 1, 'vollmer': 2, 'king': 3},
         {'schmitt': 1, 'ledecky': 2, 'vollmer': 3}]
medal_value = {1: 10.0, 2: 5.0, 3: 3.0}
print(calculate_points(dict1, medal_value))
{'ledecky': 25.0, 'king': 8.0, 'schmitt': 10.0, 'vollmer': 11.0}
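An equivalent one-liner with collections.Counter (a sketch; it works here because all medal values are positive, as Counter's + operator drops non-positive entries):
from collections import Counter

points = sum((Counter({k: medal_value[v] for k, v in d.items()})
              for d in dict1), Counter())
print(dict(points))
# {'ledecky': 25.0, 'king': 8.0, 'vollmer': 11.0, 'schmitt': 10.0}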
Just define a lookup function to transform the original dict to a medal values dict:
def lookup(d):
    return {k: medal_value[v] for k, v in d.items()}
And apply this function to your update part of the expression:
dict(
    functools.reduce(
        lambda x, y: x.update(lookup(y)) or x,
        dict1,
        collections.Counter())
)
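For the sample dict1 and medal_value above, this yields the expected {'ledecky': 25.0, 'king': 8.0, 'vollmer': 11.0, 'schmitt': 10.0}, since Counter.update with a plain dict adds values rather than replacing them.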