How to convert python dataframe to nested dictionary based on column values?


I have two DataFrames:
import pandas as pd
df1 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.73, 0.58], 'col2': [0.51, 0.93], 'Type': ['mean', 'mean'] })
df2 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.44, 0.49], 'col2': [0.50, 0.24], 'Type': ['std', 'std'] })
print(df1)
print(df2)
I need to convert them to a nested dictionary like this:
mydict = {0: {'col1': {'mean': 0.73, 'std': 0.44}, 'col2': {'mean': 0.51, 'std': 0.5}},
1: {'col1': {'mean': 0.58, 'std': 0.49}, 'col2': {'mean': 0.93, 'std': 0.24}}}
where the 'ID' values are the top-level keys, the column names are the second-level keys, the 'Type' values are the innermost keys, and the column values are the values.

Use concat with DataFrame.pivot to build a DataFrame with MultiIndex columns, then convert each ID level to a nested dict (recent pandas versions require keyword arguments for pivot):
df = pd.concat([df1, df2]).pivot(index='Type', columns='ID')
d = {level: df.xs(level, axis=1, level=1).to_dict() for level in df.columns.levels[1]}
print(d)
{'0': {'col1': {'mean': 0.73, 'std': 0.44},
'col2': {'mean': 0.51, 'std': 0.5}},
'1': {'col1': {'mean': 0.58, 'std': 0.49},
'col2': {'mean': 0.93, 'std': 0.24}}}

Alternatively, melt each DataFrame, merge the results, and pivot:
(df1.drop(columns='Type').melt('ID', value_name='mean')
.merge(df2.drop(columns='Type').melt('ID', value_name='std'))
.assign(c=lambda x: x[['mean', 'std']].to_dict('records'))
.pivot(index='variable', columns='ID', values='c').to_dict())
{'0': {'col1': {'mean': 0.73, 'std': 0.44},
'col2': {'mean': 0.51, 'std': 0.5}},
'1': {'col1': {'mean': 0.58, 'std': 0.49},
'col2': {'mean': 0.93, 'std': 0.24}}}
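Both answers above return string keys ('0', '1'), while the question asks for integer keys. A minimal sketch of a groupby-based variant that also converts the keys, assuming every ID can be parsed as an integer (the names stacked and nested are illustrative only):

```python
import pandas as pd

df1 = pd.DataFrame({'ID': ['0', '1'], 'col1': [0.73, 0.58],
                    'col2': [0.51, 0.93], 'Type': ['mean', 'mean']})
df2 = pd.DataFrame({'ID': ['0', '1'], 'col1': [0.44, 0.49],
                    'col2': [0.50, 0.24], 'Type': ['std', 'std']})

# Stack both frames so each ID has one row per Type.
stacked = pd.concat([df1, df2], ignore_index=True)

# For each ID, index the rows by Type; to_dict() then yields
# {column: {Type: value}}. int() assumes the IDs are numeric strings.
nested = {
    int(id_): group.set_index('Type')[['col1', 'col2']].to_dict()
    for id_, group in stacked.groupby('ID')
}
print(nested)
```

If the IDs are not numeric, drop the int() call and keep the keys as strings.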

Related

Fill a python dictionary with values from a pandas dataFrame

This is my dictionary, called "reviews":
reviews= {1: {'like', 'the', 'acting'},
2: {'hate', 'plot', 'story'}}
And this is my "lexicon" dataFrame:
import pandas as pd
lexicon = {'word': ['like', 'movie', 'hate'],
'neg': [0.0005, 0.0014, 0.0029],
'pos': [0.0025, 0.0019, 0.0002]
}
lexicon = pd.DataFrame(lexicon, columns = ['word', 'neg','pos'])
print (lexicon)
I need to fill my "reviews" dictionary with the neg and pos values from the "lexicon" dataFrame.
If there is no value in the lexicon, then I want to put 0.5
To finally get this outcome:
reviews= {1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
2: {'plot': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'story': [0.5, 0.5]}}

You can use df.reindex here.
df_ = lexicon.set_index("word").agg(list, axis=1)
out = {k: df_.reindex(v, fill_value=[0.5, 0.5]).to_dict() for k, v in reviews.items()}
# {1: {'the': [0.5, 0.5], 'like': [0.0005, 0.0025], 'acting': [0.5, 0.5]},
# 2: {'story': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'plot': [0.5, 0.5]}}

Create a dictionary from lexicon, then use a nested dictionary comprehension with dict.get, which supplies the default value when a word has no match:
d = lexicon.set_index('word').agg(list, axis=1).to_dict()
print (d)
{'like': [0.0005, 0.0025], 'movie': [0.0014, 0.0019], 'hate': [0.0029, 0.0002]}
out = {k: {x: d.get(x, [0.5,0.5]) for x in v} for k, v in reviews.items()}
print (out)
{1: {'like': [0.0005, 0.0025], 'the': [0.5, 0.5], 'acting': [0.5, 0.5]},
2: {'story': [0.5, 0.5], 'hate': [0.0029, 0.0002], 'plot': [0.5, 0.5]}}
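For reference, here is a self-contained sketch of the same dict.get idea that builds the lookup with itertuples instead of agg. The names scores, DEFAULT and filled are illustrative and not from the original answers. It copies the default list for each missing word so that the entries do not share one list object:

```python
import pandas as pd

reviews = {1: {'like', 'the', 'acting'},
           2: {'hate', 'plot', 'story'}}
lexicon = pd.DataFrame({'word': ['like', 'movie', 'hate'],
                        'neg': [0.0005, 0.0014, 0.0029],
                        'pos': [0.0025, 0.0019, 0.0002]})

# Map each word to its [neg, pos] pair.
scores = {row.word: [row.neg, row.pos]
          for row in lexicon.itertuples(index=False)}

DEFAULT = [0.5, 0.5]

# Look up every word of every review; list(DEFAULT) creates a fresh
# copy so that later mutation of one entry does not affect the others.
filled = {
    review_id: {word: scores.get(word, list(DEFAULT)) for word in words}
    for review_id, words in reviews.items()
}
print(filled)
```

Because the reviews are sets, the order of the words inside each inner dictionary is not guaranteed.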

Convert Pandas Series to Dictionary Without Index

I need to convert a pandas Series to a dictionary without the index (like pandas.DataFrame.to_dict('records')). My code is below:
grouped_df = df.groupby(index_column)
for key, val in tqdm(grouped_df):
    json_dict[key] = val.apply(lambda x: x.to_dict(), axis=1).to_dict()
Currently, I get output like so:
{
"15717":{
"col1":1.61,
"col2":1.53,
"col3":1.0
},
"15718":{
"col1":10.97,
"col2":5.79,
"col3":2.0
},
"15719":{
"col1":15.38,
"col2":12.81,
"col3":1.0
}
}
but I need output like:
[
{
"col1":1.61,
"col2":1.53,
"col3":1.0
},
{
"col1":10.97,
"col2":5.79,
"col3":2.0
},
{
"col1":15.38,
"col2":12.81,
"col3":1.0
}
]
Thanks for your help!
Edit: Here is the original dataframe:
col1 col2 col3
2751 5.46 1.0 1.11
2752 16.47 0.0 6.54
2753 26.51 0.0 18.25
2754 31.04 1.0 28.95
2755 36.45 0.0 32.91

Two ways of doing that:
[v for _, v in df.to_dict(orient="index").items()]
Another one:
df.to_dict(orient="records")
The output, either way, is:
[{'col1': 1.61, 'col2': 1.53, 'col3': 1.0},
{'col1': 10.97, 'col2': 5.79, 'col3': 2.0},
{'col1': 15.38, 'col2': 12.81, 'col3': 1.0}]

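You can try transposing first and collecting the values (the 'r' shorthand for 'records' is no longer accepted by recent pandas versions, and 'records' on the transposed frame would return one dict per column rather than per row):
list(df.T.to_dict().values())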
Output:
[{'col1': 1.61, 'col2': 1.53, 'col3': 1.0},
{'col1': 10.97, 'col2': 5.79, 'col3': 2.0},
{'col1': 15.38, 'col2': 12.81, 'col3': 1.0}]
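Applied to the grouped loop from the question, the records orientation might be used as follows. This is only a sketch: the grouping column name 'group' and its values are made up for illustration, since the question does not show index_column.

```python
import pandas as pd

# Hypothetical data: 'group' stands in for the question's index_column.
df = pd.DataFrame({
    'group': ['a', 'a', 'b'],
    'col1': [1.61, 10.97, 15.38],
    'col2': [1.53, 5.79, 12.81],
    'col3': [1.0, 2.0, 1.0],
})

# One list of row dictionaries per group, with no row index in the output.
json_dict = {
    key: group.drop(columns='group').to_dict(orient='records')
    for key, group in df.groupby('group')
}
print(json_dict)
```

Dropping the grouping column before calling to_dict keeps it from being repeated inside every record.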

How to do math manipulations on python dictionaries?

I have a dictionary as
ex_dict_tot={'recency': 12, 'frequency': 12, 'money': 12}
another count dictionary as
ex_dict_count= {'recency': {'current': 4, 'savings': 2, 'fixed': 6},
'frequency': {'freq': 10, 'infreq': 2},
'money': {'med': 2, 'high': 8, 'low': 1, 'md': 1}}
I would like to calculate the proportion of each value within its key, for example:
In key - recency,
current=4/12,
savings=2/12,
fixed=6/12
Similarly - in key - frequency,
freq=10/12
infreq=2/12
And the required output would be,
{'recency': {'current': 0.3, 'savings': 0.16, 'fixed': 0.5},
'frequency': {'freq': 0.83, 'infreq': 0.16},
'money': {'med': 0.16, 'high': 0.6, 'low': 0.08, 'md': 0.08}}
Could you please write your suggestions/inputs on it?

You can do this with dict comprehension.
out = {key:{k:v/ex_dict_tot[key] for k,v in val.items()} for key,val in ex_dict_count.items()}
out
{'recency': {'current': 0.3333333333333333, 'savings': 0.16666666666666666, 'fixed': 0.5},
'frequency': {'freq': 0.8333333333333334, 'infreq': 0.16666666666666666},
'money': {'med': 0.16666666666666666, 'high': 0.6666666666666666, 'low': 0.08333333333333333, 'md': 0.08333333333333333}}
Use round to round the values to two decimal places.
out = {key:{k:round(v/ex_dict_tot[key],2) for k,v in val.items()} for key,val in ex_dict_count.items()}
out
{'recency': {'current': 0.33, 'savings': 0.17, 'fixed': 0.5},
'frequency': {'freq': 0.83, 'infreq': 0.17},
'money': {'med': 0.17, 'high': 0.67, 'low': 0.08, 'md': 0.08}}
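In this data, each total in ex_dict_tot equals the sum of the counts under the same key, so the totals could also be derived rather than supplied. A small sketch under that assumption (the function name proportions is illustrative):

```python
ex_dict_count = {'recency': {'current': 4, 'savings': 2, 'fixed': 6},
                 'frequency': {'freq': 10, 'infreq': 2},
                 'money': {'med': 2, 'high': 8, 'low': 1, 'md': 1}}

def proportions(counts_by_key, ndigits=2):
    """Divide each count by the sum of the counts under the same key."""
    out = {}
    for key, counts in counts_by_key.items():
        total = sum(counts.values())  # assumes total > 0
        out[key] = {k: round(v / total, ndigits) for k, v in counts.items()}
    return out

result = proportions(ex_dict_count)
print(result)
```

If the totals can differ from the sums (for example, when some categories are not listed), keep the explicit ex_dict_tot lookup from the answer above instead.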

Is there a pythonic way to replace nan values from dictionary?

I want to replace nan values from my dictionary.
For example, sometimes my dictionary looks like:
{'mean': nan, 'std': nan, 'median': nan, 'sum': 0, 'average_per_day': 0.0, 'freq': 0}
Now I'm doing it like this:
for k, v in stats_record.items():
    if math.isnan(v):
        stats_record[k] = 0
Is there a more pythonic way to replace nan values from dictionary?

Dict-comprehension can be handy here.
import numpy as np
e = {'mean': np.nan, 'std': np.nan, 'median': np.nan, 'sum': 0, 'average_per_day': 0.0, 'freq': 0}
e = {k:v if not np.isnan(v) else 0 for k,v in e.items() }
print(e)
Output:
{'average_per_day': 0.0, 'sum': 0, 'freq': 0, 'median': 0, 'std': 0, 'mean': 0}
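Note that np.isnan and math.isnan both raise a TypeError on non-numeric values such as strings. If the dictionary may contain such values, a guarded variant could look like the sketch below. The helper name replace_nan and the 'label' key are hypothetical, added only to show the non-numeric case:

```python
import math

def replace_nan(record, default=0):
    """Return a copy of record with float NaN values replaced by default."""
    return {
        k: default if isinstance(v, float) and math.isnan(v) else v
        for k, v in record.items()
    }

stats_record = {'mean': float('nan'), 'std': float('nan'),
                'median': float('nan'), 'sum': 0,
                'average_per_day': 0.0, 'freq': 0,
                'label': 'daily'}  # hypothetical non-numeric entry

cleaned = replace_nan(stats_record)
print(cleaned)
```

The isinstance check runs first, so math.isnan is only called on floats; numpy.float64 values also pass the check because that type subclasses float.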

Converting dataframe into sub-list or dictionaries

I have the data in tabular format (rows and columns) which I read into a dataframe (Data1) :
Name D Score
0 Angelica D1 3.5
1 Angelica D2 2.0
2 Bill D1 2.0
3 Chan D3 1.0
......
I am able to convert it into a list using:
Data2 = Data1.values.tolist()
and get the below output:
[
['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
]
What I want is, the output to be like this:
{
'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D8': 1.0, 'D3': 3.0, 'D4': 5.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}
}
How can I achieve this in Python?

You can use a dictionary comprehension after grouping the df by the Name column:
>>> df = pd.DataFrame([{'Name': 'Angela', 'Score': 3.5, 'D': 'D1'}, {'Name': 'Angela', 'Score': 2.0, 'D': 'D2'}, {'Name': 'Bill', 'Score': 2.0, 'D': 'D1'}, {'Name': 'Chan', 'Score': 1.0, 'D': 'D3'}])
>>> df
D Name Score
0 D1 Angela 3.5
1 D2 Angela 2.0
2 D1 Bill 2.0
3 D3 Chan 1.0
>>> data2 = {name: {df.loc[v, 'D']: df.loc[v, 'Score'] for v in val} for name, val in df.groupby('Name').groups.items()}  # .ix was removed from pandas; use .loc
>>> data2
{'Chan': {'D3': 1.0}, 'Angela': {'D1': 3.5, 'D2': 2.0}, 'Bill': {'D1': 2.0}}

You can zip up the values from each group after grouping by Name:
In [4]: l = [
...: ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
...: ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
...: ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
...: ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
...: ]
...: columns=["Name", "D", "Score"]
...: df = pd.DataFrame(l, columns=columns)
...:
In [5]: data = {name: dict(zip(v["D"], v["Score"])) for name, v in df.groupby("Name")}
In [6]: data
Out[6]:
{'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D3': 3.0, 'D4': 5.0, 'D8': 1.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}}

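Without pandas, you can build the result directly from the list Data2 with a defaultdict:
from collections import defaultdict
result = defaultdict(dict)
for item in Data2:
    # item is [name, key, score]; dict([item[1:]]) yields {key: score}
    result[item[0]].update(dict([item[1:]]))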
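A self-contained version of the defaultdict approach, using the list from the question and unpacking each row for readability (a sketch; the variable names name, d and score are illustrative):

```python
from collections import defaultdict

Data2 = [
    ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
    ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
    ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
    ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0],
]

result = defaultdict(dict)
for name, d, score in Data2:
    # The inner dict for a name is created on first access.
    result[name][d] = score

result = dict(result)  # convert back to a plain dict for output
print(result)
```

If the same (name, key) pair appears more than once, the last score wins.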
