Convert Pandas Series to Dictionary Without Index - python

I need to convert a Pandas Series to a dictionary without the index (like pandas.DataFrame.to_dict('records')). My code is below:
grouped_df = df.groupby(index_column)
for key, val in tqdm(grouped_df):
    json_dict[key] = val.apply(lambda x: x.to_dict(), axis=1).to_dict()
Currently, I get output like so:
{
"15717":{
"col1":1.61,
"col2":1.53,
"col3":1.0
},
"15718":{
"col1":10.97,
"col2":5.79,
"col3":2.0
},
"15719":{
"col1":15.38,
"col2":12.81,
"col3":1.0
}
}
But I need output like:
[
{
"col1":1.61,
"col2":1.53,
"col3":1.0
},
{
"col1":10.97,
"col2":5.79,
"col3":2.0
},
{
"col1":15.38,
"col2":12.81,
"col3":1.0
}
]
Thanks for your help!
Edit: Here is the original dataframe:
col1 col2 col3
2751 5.46 1.0 1.11
2752 16.47 0.0 6.54
2753 26.51 0.0 18.25
2754 31.04 1.0 28.95
2755 36.45 0.0 32.91

Two ways of doing that:
[v for _, v in df.to_dict(orient="index").items()]
Another one:
df.to_dict(orient="records")
The output, either way, is:
[{'col1': 1.61, 'col2': 1.53, 'col3': 1.0},
{'col1': 10.97, 'col2': 5.79, 'col3': 2.0},
{'col1': 15.38, 'col2': 12.81, 'col3': 1.0}]
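Applied to the grouped loop from the question, a minimal sketch might look like this. The sample frame and the "group" column are made up for illustration and stand in for index_column; tqdm is left out to keep the example self-contained:

```python
import pandas as pd

# Hypothetical sample data; "group" stands in for the question's index_column.
df = pd.DataFrame(
    {
        "group": ["a", "a", "b"],
        "col1": [1.61, 10.97, 15.38],
        "col2": [1.53, 5.79, 12.81],
        "col3": [1.0, 2.0, 1.0],
    }
)

json_dict = {}
for key, val in df.groupby("group"):
    # orient="records" ignores the row index and yields one dict per row.
    json_dict[key] = val.drop(columns="group").to_dict(orient="records")

print(json_dict)
```

Each group then maps to a plain list of row dicts, with no row labels as keys.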

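You can try (transposing makes the default to_dict() key on the row labels, which you can then discard):
list(df.T.to_dict().values())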
Output:
[{'col1': 1.61, 'col2': 1.53, 'col3': 1.0},
{'col1': 10.97, 'col2': 5.79, 'col3': 2.0},
{'col1': 15.38, 'col2': 12.81, 'col3': 1.0}]

Related

How to convert python dataframe to nested dictionary based on column values?

I have 2 dataframes
df1 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.73, 0.58], 'col2': [0.51, 0.93], 'Type': ['mean', 'mean'] })
df2 = pd.DataFrame(data={'ID': ['0','1'], 'col1': [0.44, 0.49], 'col2': [0.50, 0.24], 'Type': ['std', 'std'] })
print(df1)
print(df2)
I need to convert to nested dictionary like
mydict = {0: {'col1': {'mean': 0.73, 'std': 0.44}, 'col2': {'mean': 0.51, 'std': 0.5}},
1: {'col1': {'mean': 0.58, 'std': 0.49}, 'col2': {'mean': 0.93, 'std': 0.24}}}
where 'ID' is the key, the column names are the nested keys, the 'Type' values are the keys one level further down, and the column values are the values.
Use concat with DataFrame.pivot for MultiIndex DataFrame and then convert to nested dict:
df = pd.concat([df1, df2]).pivot(index='Type', columns='ID')
d = {level: df.xs(level, axis=1, level=1).to_dict() for level in df.columns.levels[1]}
print (d)
{'0': {'col1': {'mean': 0.73, 'std': 0.44},
'col2': {'mean': 0.51, 'std': 0.5}},
'1': {'col1': {'mean': 0.58, 'std': 0.49},
'col2': {'mean': 0.93, 'std': 0.24}}}
Another option, using melt and merge:
(df1.drop(columns='Type').melt('ID', value_name='mean')
    .merge(df2.drop(columns='Type').melt('ID', value_name='std'))
    .assign(c=lambda x: x[['mean', 'std']].to_dict('records'))
    .pivot(index='variable', columns='ID', values='c').to_dict())
{'0': {'col1': {'mean': 0.73, 'std': 0.44},
'col2': {'mean': 0.51, 'std': 0.5}},
'1': {'col1': {'mean': 0.58, 'std': 0.49},
'col2': {'mean': 0.93, 'std': 0.24}}}
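If the pivot-based versions feel opaque, the same nesting can be built with a plain loop. This is only a sketch under the question's layout (one 'Type' per frame, value columns col1 and col2); the names nested and value_cols are introduced here for illustration:

```python
import pandas as pd

df1 = pd.DataFrame(data={'ID': ['0', '1'], 'col1': [0.73, 0.58],
                         'col2': [0.51, 0.93], 'Type': ['mean', 'mean']})
df2 = pd.DataFrame(data={'ID': ['0', '1'], 'col1': [0.44, 0.49],
                         'col2': [0.50, 0.24], 'Type': ['std', 'std']})

value_cols = ['col1', 'col2']  # assumed: every column except ID and Type
nested = {}
for frame in (df1, df2):
    for row in frame.to_dict('records'):
        per_id = nested.setdefault(row['ID'], {})
        for col in value_cols:
            # ID -> column name -> Type -> value
            per_id.setdefault(col, {})[row['Type']] = row[col]

print(nested)
```

The keys come out as the strings '0' and '1' because the ID column holds strings; convert with int(row['ID']) if integer keys are wanted.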

Print Row and Column Header if column/row is not NaN Pandas

It's a weird question - but can y'all think of a good way to just print the rows or a list of the rows and their corresponding column headers if the dataframe cell is not NaN?
Imagine a dataframe like this:
col1 col2 col3 col4
1 1 NaN 2 NaN
2 NaN NaN 1 2
3 2 NaN NaN 1
Result should look something like this:
1 [col1: 1, col3: 2]
2 [col3: 1, col4: 2]
3 [col1: 2, col4: 1]
Thanks in advance!
You can transpose the dataframe, and for each row, drop NaNs and convert to dict:
out = df.T.apply(lambda x: dict(x.dropna().astype(int)))
Output:
>>> out
1 {'col1': 1, 'col3': 2}
2 {'col3': 1, 'col4': 2}
3 {'col1': 2, 'col4': 1}
dtype: object
Let us try stack
df.stack().reset_index(level=0).groupby('level_0')[0].agg(dict)
Out[184]:
level_0
1 {'col1': 1.0, 'col3': 2.0}
2 {'col3': 1.0, 'col4': 2.0}
3 {'col1': 2.0, 'col4': 1.0}
Name: 0, dtype: object
Combine agg(dict) and a list comprehension:
d = [{k:v for k, v in x.items() if v == v } for x in df.agg(dict,1)]
[{'col1': 1.0, 'col3': 2.0},
{'col3': 1.0, 'col4': 2.0},
{'col1': 2.0, 'col4': 1.0}]
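To get text close to the layout asked for in the question, one possible sketch walks the rows and formats the non-NaN cells. The frame is rebuilt from the question's example, and the cast to int assumes the values are whole numbers, as they are there:

```python
import pandas as pd

nan = float("nan")
df = pd.DataFrame(
    {
        "col1": [1, nan, 2],
        "col2": [nan, nan, nan],
        "col3": [2, 1, nan],
        "col4": [nan, 2, 1],
    },
    index=[1, 2, 3],
)

lines = []
for idx, row in df.iterrows():
    present = row.dropna()  # keep only the cells that hold a value
    pairs = ", ".join(f"{col}: {int(val)}" for col, val in present.items())
    lines.append(f"{idx} [{pairs}]")

print("\n".join(lines))
```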

Turn pandas columns into JSON string

I have a pandas dataframe with columns col1, col2 and col3 and respective values. I would need to transform column names and values into a JSON string.
For instance, if the dataset is
data= pd.DataFrame({'col1': ['bravo', 'charlie','price'], 'col2': [1, 2, 3],'col3':['alpha','beta','gamma']})
I need to obtain an output like this
newdata= pd.DataFrame({'index': [0,1,2], 'payload': ['{"col1":"bravo", "col2":"1", "col3":"alpha"}', '{"col1":"charlie", "col2":"2", "col3":"beta"}', '{"col1":"price", "col2":"3", "col3":"gamma"}']})
I didn't find any function or iterative tool to perform this.
Thank you in advance!
You can use:
df = data.agg(lambda s: dict(zip(s.index, s)), axis=1).rename('payload').to_frame()
Result:
# print(df)
payload
0 {'col1': 'bravo', 'col2': 1, 'col3': 'alpha'}
1 {'col1': 'charlie', 'col2': 2, 'col3': 'beta'}
2 {'col1': 'price', 'col2': 3, 'col3': 'gamma'}
Here you go:
import pandas as pd
data= pd.DataFrame({'col1': ['bravo', 'charlie','price'], 'col2': [1, 2, 3],'col3':['alpha','beta','gamma']})
new_data = pd.DataFrame({
'payload': data.to_dict(orient='records')
})
print(new_data)
payload
0 {'col1': 'bravo', 'col2': 1, 'col3': 'alpha'}
1 {'col1': 'charlie', 'col2': 2, 'col3': 'beta'}
2 {'col1': 'price', 'col2': 3, 'col3': 'gamma'}
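Note that the payload column in the results above holds Python dicts rather than JSON strings. If actual JSON text per row is needed, one possible sketch uses DataFrame.to_json with orient="records" and lines=True, which writes one JSON object per line. The numeric col2 stays a JSON number here (1, not "1"); cast the column to str first if quoted values are required:

```python
import json

import pandas as pd

data = pd.DataFrame({'col1': ['bravo', 'charlie', 'price'],
                     'col2': [1, 2, 3],
                     'col3': ['alpha', 'beta', 'gamma']})

# One JSON document per row, split back into a list of strings.
payload = data.to_json(orient="records", lines=True).splitlines()

newdata = pd.DataFrame({"index": list(data.index), "payload": payload})
print(newdata)

# Each payload entry parses back to the original row.
print(json.loads(newdata.loc[0, "payload"]))
```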
If my understanding is correct, you want the index and the data records as a dict.
So:
dict(index=list(data.index), payload=data.to_dict(orient='records'))
For your example data:
>>> import pprint
>>> pprint.pprint(dict(index=list(data.index), payload=data.to_dict(orient='records')))
{'index': [0, 1, 2],
'payload': [{'col1': 'bravo', 'col2': 1, 'col3': 'alpha'},
{'col1': 'charlie', 'col2': 2, 'col3': 'beta'},
{'col1': 'price', 'col2': 3, 'col3': 'gamma'}]}
This is one approach using .to_dict('index').
Ex:
import pandas as pd
data= pd.DataFrame({'col1': ['bravo', 'charlie','price'], 'col2': [1, 2, 3],'col3':['alpha','beta','gamma']})
newdata = data.to_dict('index')
print({'index': list(newdata.keys()), 'payload': list(newdata.values())})
#OR -->newdata= pd.DataFrame({'index': list(newdata.keys()), 'payload': list(newdata.values())})
Output:
{'index': [0, 1, 2],
'payload': [{'col1': 'bravo', 'col2': 1, 'col3': 'alpha'},
{'col1': 'charlie', 'col2': 2, 'col3': 'beta'},
{'col1': 'price', 'col2': 3, 'col3': 'gamma'}]}
Use to_dict: newdata = data.T.to_dict()
>>> print(list(newdata.values()))
[
{'col1': 'bravo', 'col2': 1, 'col3': 'alpha'},
{'col1': 'charlie', 'col2': 2, 'col3': 'beta'},
{'col1': 'price', 'col2': 3, 'col3': 'gamma'}
]

Convert Pandas DataFrame to dictionary

I have a simple DataFrame:
Name Format
0 cntry int
1 dweight str
2 pspwght str
3 pweight str
4 nwspol str
I want a dictionary like this:
{
"cntry":"int",
"dweight":"str",
"pspwght":"str",
"pweight":"str",
"nwspol":"str"
}
Where dict["cntry"] would return int or dict["dweight"] would return str.
How could I do this?
How about this:
import pandas as pd
df = pd.DataFrame({'col_1': ['A', 'B', 'C', 'D'], 'col_2': [1, 1, 2, 3], 'col_3': ['Bla', 'Foo', 'Sup', 'Asdf']})
res_dict = dict(zip(df['col_1'], df['col_3']))
Contents of res_dict:
{'A': 'Bla', 'B': 'Foo', 'C': 'Sup', 'D': 'Asdf'}
You're looking for DataFrame.to_dict()
From the documentation:
>>> df = pd.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df
col1 col2
row1 1 0.50
row2 2 0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can always invert an internal dictionary if it's not mapped how you'd like it to be:
inv_dict = {v: k for k, v in original_dict['Name'].items()}
I think what you want is:
df.set_index('Name').to_dict()['Format']
Since you want to use the values in the Name column as the keys to your dict.
Note that you might want to do:
df.set_index('Name').astype(str).to_dict()['Format']
if you want the values of the dictionary to be strings.
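For reference, here is a small sketch of both ideas applied to the frame from the question (the frame is rebuilt by hand; by_zip and by_index are names chosen for this example):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["cntry", "dweight", "pspwght", "pweight", "nwspol"],
    "Format": ["int", "str", "str", "str", "str"],
})

# Pair the two columns up directly ...
by_zip = dict(zip(df["Name"], df["Format"]))

# ... or use Name as the index and convert the Format column.
by_index = df.set_index("Name")["Format"].to_dict()

print(by_zip)
print(by_zip["cntry"])
```

Both give the same mapping. If Name contains duplicates, later rows silently overwrite earlier ones in either version.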

python aggregate list to dictionary

I have a file that looks like this -
Col1 Col2 Key Value
101 a f1 abc
101 a f2 def
102 a f2 xyz
102 a f3 fgh
103 b f1 rst
and I need output file that looks like:
{"Col1":101, "Col2":"a", "kvpairs":{"f1":"abc","f2":"def"}}
{"Col1":102, "Col2":"a", "kvpairs":{"f2":"xyz","f3":"fgh"}}
{"Col1":103, "Col2":"b", "kvpairs":{"f1":"rst"}}
I can loop through the file, clubbing the key-value pairs for the grouping fields Col1 and Col2 into a list and dropping it into a dict, but was hoping there was a more pythonic way of doing it. There are questions answered using pandas aggregation, but I can't find a neat (and efficient) way of building that nested map. Also, the source file is gonna be large, like 80m records crunching down to 8m in the resulting file.
I can see those eyes lighting up :)
Using itertools.groupby():
from itertools import groupby
# groupby only merges adjacent rows, so lines must already be sorted by the first two fields
for (c1, c2), items in groupby(lines, key=lambda x: x[:2]):
    d = {"Col1": c1, "Col2": c2, "kvpairs": dict(x[2:] for x in items)}
    print(d)
Produces:
{'Col1': '101', 'Col2': 'a', 'kvpairs': {'f1': 'abc', 'f2': 'def'}}
{'Col1': '102', 'Col2': 'a', 'kvpairs': {'f2': 'xyz', 'f3': 'fgh'}}
{'Col1': '103', 'Col2': 'b', 'kvpairs': {'f1': 'rst'}}
It looks like you're parsing some of the values to literals -- the int you can do with int(c1), but I'm not sure how you want to deal with turning "a" into a.
(Assuming you have a list of iterables, maybe from the csv module:)
lines = [
['101','a','f1','abc'],
['101','a','f2','def'],
['102','a','f2','xyz'],
['102','a','f3','fgh'],
['103','b','f1','rst']
]
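For a file of the size mentioned in the question, the same idea can be streamed so that only one group is held in memory at a time. The following is a sketch, not a tuned implementation: it assumes a whitespace-delimited file with a header row that is already sorted by Col1 and Col2, and it uses io.StringIO as a stand-in for the real file handle. The helper name grouped_records is made up for this example.

```python
import io
import json
from itertools import groupby

# Stand-in for open("input.txt"); assumed sorted by Col1, Col2.
source = io.StringIO(
    "Col1 Col2 Key Value\n"
    "101 a f1 abc\n"
    "101 a f2 def\n"
    "102 a f2 xyz\n"
    "102 a f3 fgh\n"
    "103 b f1 rst\n"
)


def grouped_records(handle):
    """Yield one dict per (Col1, Col2) group, reading the handle lazily."""
    next(handle)  # skip the header row
    rows = (line.split() for line in handle if line.strip())
    for (c1, c2), items in groupby(rows, key=lambda r: (r[0], r[1])):
        yield {"Col1": int(c1), "Col2": c2,
               "kvpairs": {k: v for _, _, k, v in items}}


# One JSON document per output line, as in the requested output file.
output = [json.dumps(rec) for rec in grouped_records(source)]
print("\n".join(output))
```

In a real run the json.dumps results would be written straight to the output file instead of being collected in a list.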
data = []
for col1, col2, key, value in lines:
    # look for an existing dict with col1 and col2
    for d in data:
        if d['col1'] == col1 and d['col2'] == col2:
            d['kvpairs'][key] = value
            break
    # no existing dict was found
    else:
        data.append({'col1': col1, 'col2': col2, 'kvpairs': {key: value}})
for d in data:
    print(d)
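The inner loop above rescans every existing group for each input row, which gets slow with millions of groups. A possible variation keeps the groups in a dict keyed by the (col1, col2) pair, so each row costs one lookup. This is a sketch; the name groups is introduced here, and it still holds all groups in memory at once:

```python
lines = [
    ['101', 'a', 'f1', 'abc'],
    ['101', 'a', 'f2', 'def'],
    ['102', 'a', 'f2', 'xyz'],
    ['102', 'a', 'f3', 'fgh'],
    ['103', 'b', 'f1', 'rst'],
]

groups = {}
for col1, col2, key, value in lines:
    # Create the inner dict on first sight of the pair, then add the entry.
    groups.setdefault((col1, col2), {})[key] = value

data = [
    {'col1': col1, 'col2': col2, 'kvpairs': kvpairs}
    for (col1, col2), kvpairs in groups.items()
]

for d in data:
    print(d)
```

Unlike itertools.groupby, this does not need the input to be sorted.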
groupby + agg + to_dict
df.groupby(["Col1", "Col2"])[["Key", "Value"]].agg(list).transform(lambda x: dict(zip(*x)),1).reset_index(name='kvpairs').to_dict('records')
[{'Col1': 101, 'Col2': 'a', 'kvpairs': {'f1': 'abc', 'f2': 'def'}},
{'Col1': 102, 'Col2': 'a', 'kvpairs': {'f2': 'xyz', 'f3': 'fgh'}},
{'Col1': 103, 'Col2': 'b', 'kvpairs': {'f1': 'rst'}}]
Assuming of course, df is
import io
import pandas as pd
z = io.StringIO("""Col1 Col2 Key Value
101 a f1 abc
101 a f2 def
102 a f2 xyz
102 a f3 fgh
103 b f1 rst""")
df = pd.read_table(z, sep=r'\s+')
Explanation
First you aggregate using list
df.groupby(["Col1", "Col2"])[["Key", "Value"]].agg(list)
Key Value
Col1 Col2
101 a [f1, f2] [abc, def]
102 a [f2, f3] [xyz, fgh]
103 b [f1] [rst]
Then transform this output to dictionaries and rename the axis altogether
.transform(lambda x: dict(zip(*x)),1).reset_index(name='kvpairs')
Col1 Col2 kvpairs
0 101 a {'f1': 'abc', 'f2': 'def'}
1 102 a {'f2': 'xyz', 'f3': 'fgh'}
2 103 b {'f1': 'rst'}
Finally, use to_dict('records') to get your list of dictionaries
.to_dict('records')
[{'Col1': 101, 'Col2': 'a', 'kvpairs': {'f1': 'abc', 'f2': 'def'}},
{'Col1': 102, 'Col2': 'a', 'kvpairs': {'f2': 'xyz', 'f3': 'fgh'}},
{'Col1': 103, 'Col2': 'b', 'kvpairs': {'f1': 'rst'}}]
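If the chained agg/transform version is hard to follow, a more explicit sketch iterates over the groups directly. It is likely slower on very large frames, and it assumes the same df as above; the variable name records is introduced for this example:

```python
import io

import pandas as pd

z = io.StringIO("""Col1 Col2 Key Value
101 a f1 abc
101 a f2 def
102 a f2 xyz
102 a f3 fgh
103 b f1 rst""")
df = pd.read_csv(z, sep=r"\s+")

records = [
    # int() turns the NumPy integer group key into a plain Python int.
    {"Col1": int(c1), "Col2": c2, "kvpairs": dict(zip(g["Key"], g["Value"]))}
    for (c1, c2), g in df.groupby(["Col1", "Col2"])
]

print(records)
```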
