How to create a nested JSON with key names in Python?

I have the following data in a pandas dataframe. I want to group the data based on month, then type.
month hour Type count
0 4 0 Bike 8
1 4 0 Pedelec 16
2 4 1 Bike 9
3 4 1 Pedelec 4
4 4 2 Bike 18
... ... ... ... ...
412 12 21 Pedelec 15
413 12 22 Bike 7
414 12 22 Pedelec 10
415 12 23 Bike 2
416 12 23 Pedelec 15
I want to convert this to a nested json with field names. The code I use to create a dictionary is this:
jsonfile = (barchart.groupby(['month', 'Type'])[['hour', 'count']]
            .apply(lambda x: x.to_dict('records'))
            .reset_index(name='data')
            .groupby('month')[['Type', 'data']]
            .apply(lambda x: x.set_index('Type')['data'].to_dict())
            .reset_index(name='data')
            .groupby('month')['data'].apply(list).to_dict())
The output I get is in this format:
[{'month': 4,
'values': [{'Bike': [{'hour': 0, 'count': 8},
{'hour': 1, 'count': 9},
{'hour': 2, 'count': 18},
{'hour': 3, 'count': 2},
{'hour': 4, 'count': 2},
...
{'hour': 23, 'count': 14}],
'Pedelec': [{'hour': 0, 'count': 16},
{'hour': 1, 'count': 4},
{'hour': 2, 'count': 12},
...
{'hour': 23, 'count': 27}]}]},
Expected output:
[{'month': 4,
  'values': [{'Type': 'Bike', 'values': [{'hour': 0, 'count': 8},
                                         {'hour': 1, 'count': 9},

I used the following to create my desired format:
jsonfile = (barchart.groupby(['month', 'Type'])[['hour', 'count']]
            .apply(lambda x: x.to_dict('records'))
            .reset_index(name='data')
            .groupby('month')[['Type', 'data']]
            .apply(lambda x: x.set_index('Type')['data'].to_dict())
            .reset_index(name='data')
            .groupby('month')['data'].apply(list).to_dict())
json_arr = []
for month, values in jsonfile.items():
    arr = []
    for value in values:
        for types, val in value.items():
            arr.append({"type": types, "values": val})
    json_arr.append({"month": month, "values": arr})
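For reference, here is a minimal end-to-end sketch of the approach on a small sample of the frame above. It assumes a recent pandas, so it uses `to_dict('records')` (the `'r'` abbreviation is deprecated) and the list form `[['Type', 'data']]` for column selection on a groupby, and it uses lowercase `"type"`/`"values"` keys as in the loop above:

```python
import pandas as pd

# Small sample of the barchart frame from the question
barchart = pd.DataFrame({
    'month': [4, 4, 4, 4],
    'hour':  [0, 0, 1, 1],
    'Type':  ['Bike', 'Pedelec', 'Bike', 'Pedelec'],
    'count': [8, 16, 9, 4],
})

# month -> {Type -> [{hour, count}, ...]}
jsonfile = (barchart.groupby(['month', 'Type'])[['hour', 'count']]
                    .apply(lambda x: x.to_dict('records'))
                    .reset_index(name='data')
                    .groupby('month')[['Type', 'data']]
                    .apply(lambda x: x.set_index('Type')['data'].to_dict())
                    .to_dict())

# Flatten each Type dict into explicit {"type": ..., "values": ...} records
json_arr = []
for month, type_map in jsonfile.items():
    arr = [{'type': t, 'values': v} for t, v in type_map.items()]
    json_arr.append({'month': month, 'values': arr})
```

`json.dumps(json_arr)` then serializes the structure directly.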

Related

Pandas: how to reorganize data in dictionaries

I have a dataframe object structured flat like this (actually I have more than 6 variables)
VAR1 VAR2 VAR3 VAR4 VAR5 VAR6
0 1 2 3 4 5 6
1 2 4 6 8 10 12
2 3 6 9 12 15 18
3 4 8 12 16 20 24
4 5 10 15 20 25 30
5 6 12 18 24 30 36
However, in order to get compatibility with other applications I would like to have a structure like this
NEW1 NEW2 NEW3
0 {"id":{"VAR1": 1}} {"AA":{"VAR2": 2, "VAR3": 3}, "CC":{"BB":{"VAR4": 4, "VAR5": 5}}} {"TS": 6}
1 {"id":{"VAR1": 2}} {"AA":{"VAR2": 4, "VAR3": 6}, "CC":{"BB":{"VAR4": 8, "VAR5":10}}} {"TS":12}
2 {"id":{"VAR1": 3}} {"AA":{"VAR2": 6, "VAR3": 9}, "CC":{"BB":{"VAR4":12, "VAR5":15}}} {"TS":18}
3 {"id":{"VAR1": 4}} {"AA":{"VAR2": 8, "VAR3":12}, "CC":{"BB":{"VAR4":16, "VAR5":20}}} {"TS":24}
4 {"id":{"VAR1": 5}} {"AA":{"VAR2":10, "VAR3":15}, "CC":{"BB":{"VAR4":20, "VAR5":25}}} {"TS":30}
5 {"id":{"VAR1": 6}} {"AA":{"VAR2":12, "VAR3":18}, "CC":{"BB":{"VAR4":24, "VAR5":30}}} {"TS":36}
Is there any easy way to achieve this result?
I have tried to use df.to_dict("index"), but it groups all the variables together, while I need to split the dict into "subdictionaries" and associate them with the labels "AA", "BB", "id", "TS".
Thank you for the tips and suggestions.
Create dictionaries defining the relationships of variables to other labels:
replace = {'VAR6': 'TS'}
graph = {
    'VAR1': 'id', 'VAR2': 'AA', 'VAR3': 'AA',
    'VAR4': 'BB', 'VAR5': 'BB',
    'BB': 'CC',
    'id': 'NEW1', 'AA': 'NEW2', 'CC': 'NEW2', 'TS': 'NEW3'}

def traverse(k, d):
    # Follow parent links until a key has no entry;
    # returns the path in leaf-to-root order
    path = []
    while k in d:
        k = d[k]
        path.append(k)
    return path

dat = {}
for i, rec in zip(df.index, df.to_dict('records')):
    for k, v in rec.items():
        k = replace.get(k, k)
        path = traverse(k, graph)
        # The last path element is the top-level column (NEW1/NEW2/NEW3)
        cur = dat.setdefault(path.pop(), {}).setdefault(i, {})
        while path:
            cur = cur.setdefault(path.pop(), {})
        cur[k] = v

pd.DataFrame(dat)
NEW1 NEW2 NEW3
0 {'id': {'VAR1': 1}} {'AA': {'VAR2': 2, 'VAR3': 3}, 'CC': {'BB': {'VAR4': 4, 'VAR5': 5}}} {'TS': 6}
1 {'id': {'VAR1': 2}} {'AA': {'VAR2': 4, 'VAR3': 6}, 'CC': {'BB': {'VAR4': 8, 'VAR5': 10}}} {'TS': 12}
2 {'id': {'VAR1': 3}} {'AA': {'VAR2': 6, 'VAR3': 9}, 'CC': {'BB': {'VAR4': 12, 'VAR5': 15}}} {'TS': 18}
3 {'id': {'VAR1': 4}} {'AA': {'VAR2': 8, 'VAR3': 12}, 'CC': {'BB': {'VAR4': 16, 'VAR5': 20}}} {'TS': 24}
4 {'id': {'VAR1': 5}} {'AA': {'VAR2': 10, 'VAR3': 15}, 'CC': {'BB': {'VAR4': 20, 'VAR5': 25}}} {'TS': 30}
5 {'id': {'VAR1': 6}} {'AA': {'VAR2': 12, 'VAR3': 18}, 'CC': {'BB': {'VAR4': 24, 'VAR5': 30}}} {'TS': 36}
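To see what `traverse` is doing: it follows parent links in `graph` until a key has no parent, returning the path in leaf-to-root order, which the main loop then pops from the root down. A self-contained demonstration (repeating the definitions from the answer):

```python
graph = {
    'VAR1': 'id', 'VAR2': 'AA', 'VAR3': 'AA',
    'VAR4': 'BB', 'VAR5': 'BB',
    'BB': 'CC',
    'id': 'NEW1', 'AA': 'NEW2', 'CC': 'NEW2', 'TS': 'NEW3'}

def traverse(k, d):
    # Follow parent links until the key is no longer in the mapping
    path = []
    while k in d:
        k = d[k]
        path.append(k)
    return path

print(traverse('VAR4', graph))  # ['BB', 'CC', 'NEW2']: leaf-to-root
print(traverse('TS', graph))    # ['NEW3']: already directly under a column
```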

Sum a Pandas DataFrame column under the ranges of another DataFrame

I have two DataFrames DF1 and DF2, and I want to aggregate the values of one column in DF1 under the date ranges of a column in DF2. Here is my reproducible example:
DF1 ranges from 6/14/2013 to 7/13/2013, and is sorted descending in time. Its columns to be aggregated are a and b. Notice, there can be multiple records for the same date.
list1 = [{'a': 5, 'date': '7/13/2013', 'b': 13},
{'a': 4, 'date': '7/12/2013', 'b': 14},
{'a': 7, 'date': '7/12/2013', 'b': 12},
{'a': 2, 'date': '7/10/2013', 'b': 18},
{'a': 9, 'date': '7/7/2013', 'b': 17},
{'a': 6, 'date': '7/5/2013', 'b': 20},
{'a': 8, 'date': '6/30/2013', 'b': 12},
{'a': 5, 'date': '6/29/2013', 'b': 13},
{'a': 3, 'date': '6/25/2013', 'b': 13},
{'a': 4, 'date': '6/23/2013', 'b': 10},
{'a': 1, 'date': '6/22/2013', 'b': 16},
{'a': 6, 'date': '6/20/2013', 'b': 19},
{'a': 7, 'date': '6/18/2013', 'b': 12},
{'a': 9, 'date': '6/16/2013', 'b': 15}]
DF1 = pd.DataFrame(list1)
DF2 contains the weekly date separators, for which the DF1 columns a and b should be aggregated.
list2 = [{'datesep': '6/22/2013', 'c': 32},
{'datesep': '6/29/2013', 'c': 23},
{'datesep': '7/6/2013', 'c': 44},
{'datesep': '7/13/2013', 'c': 18},
{'datesep': '7/20/2013', 'c': 51}]
DF2 = pd.DataFrame(list2)
What I want to do is keep DF2.c as is, and aggregate DF1.a and DF1.b so that the values get summed at the DF2.datesep separator just at or above their DF1.date. That is, the values of DF1.a and DF1.b from 6/16/2013 to 6/22/2013 (both inclusive) should be aggregated at the closest next date separator, which is the DF2.datesep=6/22/2013 row; 7/7/2013 to 7/13/2013 (both inclusive) at the closest next separator, the DF2.datesep=7/13/2013 row; and so on. The result should therefore look like this (column order doesn't matter):
c date a_sum b_sum
0 32 6/22/2013 23 62
1 23 6/29/2013 12 36
2 44 7/6/2013 14 32
3 18 7/13/2013 27 74
4 51 7/20/2013 - -
I did this with a loop on list1 and list2, but is there a Pandas/Numpy solution that utilizes DF1 and DF2? Thank you!
First you need to convert the date strings to actual dates. Then you can use a lambda to calculate a_sum and b_sum for each row. Finally, concatenate the sums onto DF2:
DF1.date = pd.to_datetime(DF1.date)
DF2['end'] = pd.to_datetime(DF2.datesep)
DF2['start'] = DF2.end.shift(1).fillna(pd.to_datetime('1970-01-01'))
sums = DF2.apply(lambda x: DF1.loc[DF1.date.gt(x.start) & DF1.date.le(x.end)][['a','b']].sum(), axis=1)
sums.columns=['a_sum','b_sum']
pd.concat([DF2[['c', 'datesep']], sums], axis=1)
c datesep a_sum b_sum
0 32 6/22/2013 23 62
1 23 6/29/2013 12 36
2 44 7/6/2013 14 32
3 18 7/13/2013 27 74
4 51 7/20/2013 0 0
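An alternative sketch that avoids the row-wise apply: `pd.merge_asof` with `direction='forward'` tags each DF1 row with the first separator at or after its date (both sides must be sorted on the join keys), after which a plain groupby-sum does the aggregation. This reuses list1/list2 from the question:

```python
import pandas as pd

list1 = [{'a': 5, 'date': '7/13/2013', 'b': 13},
         {'a': 4, 'date': '7/12/2013', 'b': 14},
         {'a': 7, 'date': '7/12/2013', 'b': 12},
         {'a': 2, 'date': '7/10/2013', 'b': 18},
         {'a': 9, 'date': '7/7/2013', 'b': 17},
         {'a': 6, 'date': '7/5/2013', 'b': 20},
         {'a': 8, 'date': '6/30/2013', 'b': 12},
         {'a': 5, 'date': '6/29/2013', 'b': 13},
         {'a': 3, 'date': '6/25/2013', 'b': 13},
         {'a': 4, 'date': '6/23/2013', 'b': 10},
         {'a': 1, 'date': '6/22/2013', 'b': 16},
         {'a': 6, 'date': '6/20/2013', 'b': 19},
         {'a': 7, 'date': '6/18/2013', 'b': 12},
         {'a': 9, 'date': '6/16/2013', 'b': 15}]
DF1 = pd.DataFrame(list1)
list2 = [{'datesep': '6/22/2013', 'c': 32},
         {'datesep': '6/29/2013', 'c': 23},
         {'datesep': '7/6/2013', 'c': 44},
         {'datesep': '7/13/2013', 'c': 18},
         {'datesep': '7/20/2013', 'c': 51}]
DF2 = pd.DataFrame(list2)

DF1['date'] = pd.to_datetime(DF1['date'])
DF2['end'] = pd.to_datetime(DF2['datesep'])

# Tag each DF1 row with the first separator at or after its date
tagged = pd.merge_asof(DF1.sort_values('date'), DF2.sort_values('end'),
                       left_on='date', right_on='end', direction='forward')

# Sum a and b per separator, then join back onto DF2 (empty bins become 0)
sums = (tagged.groupby('datesep')[['a', 'b']].sum()
              .rename(columns={'a': 'a_sum', 'b': 'b_sum'}))
result = DF2[['c', 'datesep']].join(sums, on='datesep').fillna(0)
```

Note the lambda-based answer above and this version differ only in how rows are matched to separators; both produce the same sums.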

Converting to Pandas MultiIndex

I have a dataframe of the form:
SpeciesName 0
0 A [[Year: 1, Quantity: 2],[Year: 3, Quantity: 4...]]
1 B [[Year: 1, Quantity: 7],[Year: 2, Quantity: 15...]]
2 C [[Year: 2, Quantity: 9],[Year: 4, Quantity: 13...]]
I'm attempting to try and create a MultiIndex that uses the SpeciesName and the year as the index:
SpeciesName Year
A 1 Data
2 Data
B 1 Data
2 Data
I have not been able to get pandas.MultiIndex(..) to work and my attempts at iterating through the dataset and manually creating a new object have not been very fruitful. Any insights would be greatly appreciated!
I'm going to assume your data is a list of dictionaries... because if I don't, what you've written makes no sense unless the entries are strings, and I don't want to parse strings.
df = pd.DataFrame([
['A', [dict(Year=1, Quantity=2), dict(Year=3, Quantity=4)]],
['B', [dict(Year=1, Quantity=7), dict(Year=2, Quantity=15)]],
['C', [dict(Year=2, Quantity=9), dict(Year=4, Quantity=13)]]
], columns=['SpeciesName', 0])
df
SpeciesName 0
0 A [{'Year': 1, 'Quantity': 2}, {'Year': 3, 'Quantity': 4}]
1 B [{'Year': 1, 'Quantity': 7}, {'Year': 2, 'Quantity': 15}]
2 C [{'Year': 2, 'Quantity': 9}, {'Year': 4, 'Quantity': 13}]
Then the solution is obvious
pd.DataFrame.from_records(
    *zip(*(
        [d, s]
        for s, l in zip(df['SpeciesName'], df[0].values.tolist())
        for d in l
    ))
).set_index('Year', append=True)
Quantity
Year
A 1 2
3 4
B 1 7
2 15
C 2 9
4 13
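On pandas 0.25 or newer, a sketch of the same reshape using `DataFrame.explode` instead of the zip gymnastics, assuming the same list-of-dicts data:

```python
import pandas as pd

df = pd.DataFrame([
    ['A', [dict(Year=1, Quantity=2), dict(Year=3, Quantity=4)]],
    ['B', [dict(Year=1, Quantity=7), dict(Year=2, Quantity=15)]],
    ['C', [dict(Year=2, Quantity=9), dict(Year=4, Quantity=13)]],
], columns=['SpeciesName', 0])

# One row per inner dict, keeping SpeciesName alongside each dict
exploded = df.explode(0)

# Expand the dicts into Year/Quantity columns, indexed by SpeciesName,
# then append Year to get the (SpeciesName, Year) MultiIndex
expanded = pd.DataFrame(exploded[0].tolist(),
                        index=pd.Index(exploded['SpeciesName']))
result = expanded.set_index('Year', append=True)
```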

Convert dataframe to dictionary in Python

I have a csv file that I converted into dataframe using Pandas. Here's the dataframe:
Customer ProductID Count
John 1 50
John 2 45
Mary 1 75
Mary 2 10
Mary 5 15
I need an output in the form of a dictionary that looks like this:
{ProductID:1, Count:{John:50, Mary:75}},
{ProductID:2, Count:{John:45, Mary:10}},
{ProductID:5, Count:{John:0, Mary:15}}
I read the following answers:
python pandas dataframe to dictionary
and
Convert dataframe to dictionary
This is the code I have:
df = pd.read_csv('customer.csv')
dict1 = df.set_index('Customer').T.to_dict('dict')
dict2 = df.to_dict(orient='records')
and this is my current output:
dict1 = {'John': {'Count': 45, 'ProductID': 2}, 'Mary': {'Count': 15, 'ProductID': 5}}
dict2 = [{'Count': 50, 'Customer': 'John', 'ProductID': 1},
{'Count': 45, 'Customer': 'John', 'ProductID': 2},
{'Count': 75, 'Customer': 'Mary', 'ProductID': 1},
{'Count': 10, 'Customer': 'Mary', 'ProductID': 2},
{'Count': 15, 'Customer': 'Mary', 'ProductID': 5}]
IIUC you can use:
d = (df.groupby('ProductID')
       .apply(lambda x: dict(zip(x.Customer, x.Count)))
       .reset_index(name='Count')
       .to_dict(orient='records'))
print(d)
[{'ProductID': 1, 'Count': {'John': 50, 'Mary': 75}},
{'ProductID': 2, 'Count': {'John': 45, 'Mary': 10}},
{'ProductID': 5, 'Count': {'Mary': 15}}]
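If the expected output really needs every customer listed under every product (e.g. John: 0 for ProductID 5, as in the question's desired output), one sketch is to pivot to a ProductID-by-Customer grid with `unstack(fill_value=0)` before building the records:

```python
import pandas as pd

df = pd.DataFrame({
    'Customer': ['John', 'John', 'Mary', 'Mary', 'Mary'],
    'ProductID': [1, 2, 1, 2, 5],
    'Count': [50, 45, 75, 10, 15],
})

# Pivot to a ProductID x Customer grid; fill_value=0 covers absent pairs
grid = df.set_index(['ProductID', 'Customer'])['Count'].unstack(fill_value=0)

# One record per product, with a full customer -> count dict each
d = [{'ProductID': pid, 'Count': row.to_dict()} for pid, row in grid.iterrows()]
```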

Reorganizing the data in a dataframe

I have data in the following format:
data =
[
{'data1': [{'sub_data1': 0}, {'sub_data2': 4}, {'sub_data3': 1}, {'sub_data4': -5}]},
{'data2': [{'sub_data1': 1}, {'sub_data2': 1}, {'sub_data3': 1}, {'sub_data4': 12}]},
{'data3': [{'sub_data1': 3}, {'sub_data2': 0}, {'sub_data3': 1}, {'sub_data4': 7}]},
]
How should I reorganize it so that, when I save it to hdf via
a = pd.DataFrame(data, columns=['data1', 'data2', 'data3'])
a.to_hdf('my_data.hdf', key='data')
I get a dataframe in the following format:
data1 data2 data3
_________________________________________
sub_data1 0 1 1
sub_data2 4 1 0
sub_data3 1 1 1
sub_data4 -5 12 7
Update 1: after following the advice given below, saving it to an hdf file and reading it back, I got this, which is not what I want:
data1 data2 data3
0 {u'sub_data1': 22} {u'sub_data1': 33} {u'sub_data1': 44}
1 {u'sub_data2': 0} {u'sub_data2': 11} {u'sub_data2': 44}
2 {u'sub_data3': 12} {u'sub_data3': 16} {u'sub_data3': 19}
3 {u'sub_data4': 0} {u'sub_data4': 0} {u'sub_data4': 0}
Well, if you convert your data into a dictionary of dictionaries, you can then create the DataFrame very easily:
In [25]: data2 = {k: {m: n for i in v for m, n in i.iteritems()} for x in data for k, v in x.iteritems()}
In [26]: data2
Out[26]:
{'data1': {'sub_data1': 0, 'sub_data2': 4, 'sub_data3': 1, 'sub_data4': -5},
'data2': {'sub_data1': 1, 'sub_data2': 1, 'sub_data3': 1, 'sub_data4': 12},
'data3': {'sub_data1': 3, 'sub_data2': 0, 'sub_data3': 1, 'sub_data4': 7}}
In [27]: pd.DataFrame(data2)
Out[27]:
data1 data2 data3
sub_data1 0 1 3
sub_data2 4 1 0
sub_data3 1 1 1
sub_data4 -5 12 7
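Note that `iteritems` is Python 2 only (the session above is a Python 2 one); a sketch of the same comprehension on Python 3 simply uses `items()`:

```python
import pandas as pd

data = [
    {'data1': [{'sub_data1': 0}, {'sub_data2': 4}, {'sub_data3': 1}, {'sub_data4': -5}]},
    {'data2': [{'sub_data1': 1}, {'sub_data2': 1}, {'sub_data3': 1}, {'sub_data4': 12}]},
    {'data3': [{'sub_data1': 3}, {'sub_data2': 0}, {'sub_data3': 1}, {'sub_data4': 7}]},
]

# dict.iteritems() was removed in Python 3; items() is the equivalent
data2 = {k: {m: n for i in v for m, n in i.items()}
         for x in data for k, v in x.items()}
df = pd.DataFrame(data2)
```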
