Turn pandas columns into JSON string - python

I have a pandas DataFrame with columns col1, col2 and col3 and their respective values. I need to transform the column names and values into a JSON string.
For instance, if the dataset is
data = pd.DataFrame({'col1': ['bravo', 'charlie', 'price'], 'col2': [1, 2, 3], 'col3': ['alpha', 'beta', 'gamma']})
I need to obtain an output like this:
newdata = pd.DataFrame({'index': [0, 1, 2], 'payload': ['{"col1":"bravo", "col2":"1", "col3":"alpha"}', '{"col1":"charlie", "col2":"2", "col3":"beta"}', '{"col1":"price", "col2":"3", "col3":"gamma"}']})
I didn't find any function or iterative tool to perform this.
Thank you in advance!

You can use:
df = data.agg(lambda s: dict(zip(s.index, s)), axis=1).rename('payload').to_frame()
Result:
# print(df)
                                          payload
0   {'col1': 'bravo', 'col2': 1, 'col3': 'alpha'}
1  {'col1': 'charlie', 'col2': 2, 'col3': 'beta'}
2   {'col1': 'price', 'col2': 3, 'col3': 'gamma'}
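Note that this gives Python dicts, not JSON strings. If you need actual JSON strings in the payload column, as in the question's expected output, you can serialize each row; a minimal sketch of that idea (apply data.astype(str) first if the numbers should be quoted as well, e.g. "col2": "1"):
import pandas as pd

data = pd.DataFrame({'col1': ['bravo', 'charlie', 'price'],
                     'col2': [1, 2, 3],
                     'col3': ['alpha', 'beta', 'gamma']})

# one JSON string per row, e.g. payload[0] == '{"col1":"bravo","col2":1,"col3":"alpha"}'
payload = data.apply(lambda row: row.to_json(), axis=1)
newdata = payload.rename('payload').reset_index()   # columns: 'index', 'payload'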

Here you go:
import pandas as pd
data = pd.DataFrame({'col1': ['bravo', 'charlie', 'price'], 'col2': [1, 2, 3], 'col3': ['alpha', 'beta', 'gamma']})
new_data = pd.DataFrame({
    'payload': data.to_dict(orient='records')
})
print(new_data)
                                          payload
0   {'col1': 'bravo', 'col2': 1, 'col3': 'alpha'}
1  {'col1': 'charlie', 'col2': 2, 'col3': 'beta'}
2   {'col1': 'price', 'col2': 3, 'col3': 'gamma'}
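The question's expected newdata also has an explicit index column; a small variation of the above keeps it:
new_data = pd.DataFrame({'index': data.index,
                         'payload': data.to_dict(orient='records')})
# columns 'index' and 'payload', one dict per row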

If my understanding is correct, you want the index and the data records as a dict.
So:
dict(index=list(data.index), payload=data.to_dict(orient='records'))
For your example data:
>>> import pprint
>>> pprint.pprint(dict(index=list(data.index), payload=data.to_dict(orient='records')))
{'index': [0, 1, 2],
 'payload': [{'col1': 'bravo', 'col2': 1, 'col3': 'alpha'},
             {'col1': 'charlie', 'col2': 2, 'col3': 'beta'},
             {'col1': 'price', 'col2': 3, 'col3': 'gamma'}]}

This is one approach using .to_dict('index').
Ex:
import pandas as pd
data = pd.DataFrame({'col1': ['bravo', 'charlie', 'price'], 'col2': [1, 2, 3], 'col3': ['alpha', 'beta', 'gamma']})
newdata = data.to_dict('index')
print({'index': list(newdata.keys()), 'payload': list(newdata.values())})
# OR: newdata = pd.DataFrame({'index': list(newdata.keys()), 'payload': list(newdata.values())})
Output:
{'index': [0, 1, 2],
 'payload': [{'col1': 'bravo', 'col2': 1, 'col3': 'alpha'},
             {'col1': 'charlie', 'col2': 2, 'col3': 'beta'},
             {'col1': 'price', 'col2': 3, 'col3': 'gamma'}]}

Use to_dict: newdata = data.T.to_dict()
In Python 3, .values() returns a view, so wrap it in list() to get a list:
>>> print(list(newdata.values()))
[{'col1': 'bravo', 'col2': 1, 'col3': 'alpha'},
 {'col1': 'charlie', 'col2': 2, 'col3': 'beta'},
 {'col1': 'price', 'col2': 3, 'col3': 'gamma'}]

Related

How to show data with different number of columns Pandas?

I load a CSV document with a varying number of columns, so I get this error:
Expected 12 fields in line 29, saw 13
To avoid this error I use the hack names=range(24):
df = pd.read_csv(filename, header=None, quoting=csv.QUOTE_NONE, dtype='object', sep=data_file_delimiter, engine='python', encoding="utf-8", names=range(24))
The problem is that I need to know the real number of columns in order to group this data further into the dict data:
data = {}
for row in df.rows:
    line = line.strip()
    row = line.split(' ')
    if len(row) not in data:
        data[len(row)] = []
    data[len(row)].append(row)
You can get the number of columns with len(df.columns). If you only want to convert a pandas DataFrame to a dictionary, there are already many built-in orientations, as shown below; a sketch for grouping rows by their real field count follows these examples.
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['row1', 'row2'])
df
      col1  col2
row1     1  0.50
row2     2  0.75

df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}

# You can specify the return orientation.
df.to_dict('series')
{'col1': row1    1
         row2    2
Name: col1, dtype: int64,
 'col2': row1    0.50
         row2    0.75
Name: col2, dtype: float64}

df.to_dict('split')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]]}

df.to_dict('records')
[{'col1': 1, 'col2': 0.5}, {'col1': 2, 'col2': 0.75}]

df.to_dict('index')
{'row1': {'col1': 1, 'col2': 0.5}, 'row2': {'col1': 2, 'col2': 0.75}}

df.to_dict('tight')
{'index': ['row1', 'row2'], 'columns': ['col1', 'col2'],
 'data': [[1, 0.5], [2, 0.75]], 'index_names': [None], 'column_names': [None]}

# You can also specify the mapping type.
from collections import OrderedDict, defaultdict
df.to_dict(into=OrderedDict)
OrderedDict([('col1', OrderedDict([('row1', 1), ('row2', 2)])),
             ('col2', OrderedDict([('row1', 0.5), ('row2', 0.75)]))])
Taken from the pandas DataFrame.to_dict documentation.
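Coming back to the actual question of grouping rows by their real number of fields: a rough sketch, assuming the ragged CSV was read with the names=range(24) hack above, so that short rows are padded with NaN.
counts = df.notna().sum(axis=1)   # real number of fields in each row
# group rows by that count, dropping the all-NaN padding columns in each group
data = {n: g.dropna(axis=1, how='all').values.tolist()
        for n, g in df.groupby(counts)}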

How to aggregate columns in pandas

I have a data frame with a number of columns that I want to group under two main groups, A and B, while preserving the old column names as dictionary keys, as follows:
index  userid  col1  col2  col3  col4  col5  col6  col7
    0       1     6     3  Nora   100    11    22    44
The desired data frame is as follows:
index  userid  A                                                B
    0       1  {"col1":6, "col2":3, "col3":"Nora", "col4":100}  {"col5":11, "col6":22, "col7":44}
To match your desired dataframe exactly:
>>> import pandas as pd
# recreating your data
>>> df = pd.DataFrame.from_dict({'index': [0], 'userid': [1], 'col1': [6], 'col2': [3], 'col3': ['Nora'], 'col4': [100], 'col5': [11], 'col6': [22], 'col7': [44]})
# copy of unchanged columns
>>> df_new = df[['index', 'userid']].copy()
# grouping columns together
>>> df_new['A'] = df[['col1', 'col2', 'col3', 'col4']].copy().to_dict(orient='records')
>>> df_new['B'] = df[['col5', 'col6', 'col7']].copy().to_dict(orient='records')
>>> df_new
   index  userid                                                    A                                     B
0      0       1  {'col1': 6, 'col2': 3, 'col3': 'Nora', 'col4': 100}  {'col5': 11, 'col6': 22, 'col7': 44}
You can try something like this:
d = {'col1': 'A',
     'col2': 'A',
     'col3': 'A',
     'col4': 'A',
     'col5': 'B',
     'col6': 'B',
     'col7': 'B'}

df.groupby(d, axis=1).apply(pd.DataFrame.to_dict, orient='series').to_frame().T
Output:
                                                   A                                           B
0  {'col1': [6], 'col2': [3], 'col3': ['Nora'], '...  {'col5': [11], 'col6': [22], 'col7': [44]}
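The values show up as lists because orient='series' keeps each cell as a length-1 Series. If you want one plain dict per row instead, here is a sketch that reuses the same column-to-group mapping d and avoids the axis=1 groupby (assuming df is the question's frame with the index and userid columns):
import pandas as pd
from collections import defaultdict

cols_by_group = defaultdict(list)
for col, group in d.items():
    cols_by_group[group].append(col)          # {'A': ['col1', ...], 'B': ['col5', ...]}

# one list of per-row dicts per group, aligned with df's index
grouped = pd.DataFrame({name: df[cols].to_dict('records')
                        for name, cols in cols_by_group.items()},
                       index=df.index)
out = pd.concat([df[['index', 'userid']], grouped], axis=1)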
Working with the original dataframe.
import pandas as pd
df1 = pd.DataFrame({'index': [0], 'userid': [1],
                    'col1': [6], 'col2': [3], 'col3': ['Nora'], 'col4': [100],
                    'col5': [11], 'col6': [22], 'col7': [44]})
df1['A'] = df1[['col1', 'col2', 'col3', 'col4']].to_dict(orient='records')
df1['B'] = df1[['col5', 'col6', 'col7']].to_dict(orient='records')
df1.drop(df1.columns[range(2, 9)], axis=1, inplace=True)
print(df1)

Convert csv into list of dictionaries in python

I have a CSV file where the first row contains headers and the remaining rows contain the data in columns.
I am using Python to parse this data into a list of dictionaries.
Normally I would use this code:
import csv

def csv_to_list_of_dictionaries(file):
    with open(file) as f:
        a = []
        for row in csv.DictReader(f, skipinitialspace=True):
            a.append({k: v for k, v in row.items()})
    return a
but because the data in one column is stored as a dictionary, this code doesn't work (it splits the key:value pairs inside that dictionary).
The data in my CSV file looks like this:
col1,col2,col3,col4
1,{'a':'b', 'c':'d'},'bla',sometimestamp
The dictionary created from this is: {col1:1, col2:{'a':'b', col3: 'c':'d'}, col4: 'bla'}
What I want as a result is: {col1:1, col2:{'a':'b', 'c':'d'}, col3: 'bla', col4: sometimestamp}
Don't use the csv module; use a regular expression to extract the fields from each row, then make dictionaries from the extracted rows.
Example file:
col1,col2,col3,col4
1,{'a':'b', 'c':'d'},'bla',sometimestamp
2,{'a':'b', 'c':'d'},'bla',sometimestamp
3,{'a':'b', 'c':'d'},'bla',sometimestamp
4,{'a':'b', 'c':'d'},'bla',sometimestamp
5,{'a':'b', 'c':'d'},'bla',sometimestamp
6,{'a':'b', 'c':'d'},'bla',sometimestamp
import re

pattern = r'^([^,]*),({.*}),([^,]*),([^,]*)$'
regex = re.compile(pattern, flags=re.M)

def csv_to_list_of_dictionaries(file):
    with open(file) as f:
        columns = next(f).strip().split(',')
        stuff = regex.findall(f.read())
        a = [dict(zip(columns, values)) for values in stuff]
    return a
stuff = csv_to_list_of_dictionaries(f)
In [20]: stuff
Out[20]:
[{'col1': '1',
  'col2': "{'a':'b', 'c':'d'}",
  'col3': "'bla'",
  'col4': 'sometimestamp'},
 {'col1': '2',
  'col2': "{'a':'b', 'c':'d'}",
  'col3': "'bla'",
  'col4': 'sometimestamp'},
 {'col1': '3',
  'col2': "{'a':'b', 'c':'d'}",
  'col3': "'bla'",
  'col4': 'sometimestamp'},
 {'col1': '4',
  'col2': "{'a':'b', 'c':'d'}",
  'col3': "'bla'",
  'col4': 'sometimestamp'},
 {'col1': '5',
  'col2': "{'a':'b', 'c':'d'}",
  'col3': "'bla'",
  'col4': 'sometimestamp'},
 {'col1': '6',
  'col2': "{'a':'b', 'c':'d'}",
  'col3': "'bla'",
  'col4': 'sometimestamp'}]
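Note that every field comes back as a string, including the dict in col2 and the quoted 'bla' in col3. If you want real Python objects, as in the desired result, ast.literal_eval can parse the dict strings; a small follow-up sketch:
import ast

for row in stuff:
    row['col2'] = ast.literal_eval(row['col2'])   # "{'a':'b', 'c':'d'}" -> {'a': 'b', 'c': 'd'}
    row['col3'] = row['col3'].strip("'")          # "'bla'" -> 'bla'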

Convert Pandas DataFrame to dictionary

I have a simple DataFrame:
      Name Format
0    cntry    int
1  dweight    str
2  pspwght    str
3  pweight    str
4   nwspol    str
I want a dictionary like this:
{
    "cntry": "int",
    "dweight": "str",
    "pspwght": "str",
    "pweight": "str",
    "nwspol": "str"
}
Where dict["cntry"] would return int or dict["dweight"] would return str.
How could I do this?
How about this:
import pandas as pd
df = pd.DataFrame({'col_1': ['A', 'B', 'C', 'D'], 'col_2': [1, 1, 2, 3], 'col_3': ['Bla', 'Foo', 'Sup', 'Asdf']})
res_dict = dict(zip(df['col_1'], df['col_3']))
Contents of res_dict:
{'A': 'Bla', 'B': 'Foo', 'C': 'Sup', 'D': 'Asdf'}
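Applied to the question's frame (assuming its columns are literally named Name and Format), the same pattern would be:
name_format = dict(zip(df['Name'], df['Format']))
# {'cntry': 'int', 'dweight': 'str', 'pspwght': 'str', 'pweight': 'str', 'nwspol': 'str'}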
You're looking for DataFrame.to_dict()
From the documentation:
>>> df = pd.DataFrame({'col1': [1, 2],
... 'col2': [0.5, 0.75]},
... index=['row1', 'row2'])
>>> df
      col1  col2
row1     1  0.50
row2     2  0.75
>>> df.to_dict()
{'col1': {'row1': 1, 'row2': 2}, 'col2': {'row1': 0.5, 'row2': 0.75}}
You can always invert an internal dictionary if it's not mapped how you'd like it to be:
inv_dict = {v: k for k, v in original_dict['Name'].items()}
I think what you want is:
df.set_index('Name').to_dict()['Format']
Since you want to use the values in the Name column as the keys to your dict.
Note that you might want to do:
df.set_index('Name').astype(str).to_dict()['Format']
if you want the values of the dictionary to be strings.
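For the frame in the question this yields exactly the desired mapping:
df.set_index('Name').to_dict()['Format']
# {'cntry': 'int', 'dweight': 'str', 'pspwght': 'str', 'pweight': 'str', 'nwspol': 'str'}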

Speed improvement for DataFrame of dict?

I have a dict of symbol: DataFrame. Each DataFrame is a time series with an arbitrary number of columns. I want to transform this data structure into a single time-series DataFrame (indexed by date) where each column contains the values of one symbol as a dict.
The following code does what I want, but it is slow when performed on a dict with hundreds of symbols and DataFrames of 10k rows / 10 columns. I'm looking for ways to improve its speed.
import pandas as pd
dates = pd.bdate_range('2010-01-01', '2049-12-31')[:100]
data = {
    'A': pd.DataFrame(data={'col1': range(100), 'col2': range(200, 300)}, index=dates),
    'B': pd.DataFrame(data={'col1': range(100), 'col2': range(300, 400)}, index=dates),
    'C': pd.DataFrame(data={'col1': range(100), 'col2': range(400, 500)}, index=dates)
}

def convert(data, name):
    data = pd.concat([
        pd.DataFrame(data={symbol: [dict(zip(df.columns, v)) for v in df.values]},
                     index=df.index)
        for symbol, df in data.items()
    ], axis=1, join='outer')
    data['type'] = name
    data.index.name = 'date'
    return data

result = convert(data, name='system')
result.tail(3)
                                    A                          B                          C    type
date
2010-05-18  {'col1': 97, 'col2': 297}  {'col1': 97, 'col2': 397}  {'col1': 97, 'col2': 497}  system
2010-05-19  {'col1': 98, 'col2': 298}  {'col1': 98, 'col2': 398}  {'col1': 98, 'col2': 498}  system
2010-05-20  {'col1': 99, 'col2': 299}  {'col1': 99, 'col2': 399}  {'col1': 99, 'col2': 499}  system
Any help is greatly appreciated! Thank you.
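For anyone profiling this: one more compact way to build the same per-row dicts, offered only as a sketch (it is not guaranteed to be faster, so measure it on data of a realistic size), is to let to_dict('records') produce the dicts for each symbol:
import pandas as pd

def convert_records(data, name):
    # one column per symbol; to_dict('records') yields one dict per row,
    # and the DataFrame constructor outer-joins the per-symbol indexes
    out = pd.DataFrame({symbol: pd.Series(df.to_dict('records'), index=df.index)
                        for symbol, df in data.items()})
    out['type'] = name
    out.index.name = 'date'
    return out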
