I have data in the following format:
data =
[
{'data1': [{'sub_data1': 0}, {'sub_data2': 4}, {'sub_data3': 1}, {'sub_data4': -5}]},
{'data2': [{'sub_data1': 1}, {'sub_data2': 1}, {'sub_data3': 1}, {'sub_data4': 12}]},
{'data3': [{'sub_data1': 3}, {'sub_data2': 0}, {'sub_data3': 1}, {'sub_data4': 7}]},
]
How should I reorganize it so that when I save it to HDF with
a = pd.DataFrame(data, columns=map(lambda x: x.name, ['data1', 'data2', 'data3']))
a.to_hdf('my_data.hdf')
I get a dataframe in the following format:
           data1  data2  data3
sub_data1      0      1      3
sub_data2      4      1      0
sub_data3      1      1      1
sub_data4     -5     12      7
Update 1: after following the advice given below, saving the result to an HDF file, and reading it back, I got this, which is not what I want:
data1 data2 data3
0 {u'sub_data1': 22} {u'sub_data1': 33} {u'sub_data1': 44}
1 {u'sub_data2': 0} {u'sub_data2': 11} {u'sub_data2': 44}
2 {u'sub_data3': 12} {u'sub_data3': 16} {u'sub_data3': 19}
3 {u'sub_data4': 0} {u'sub_data4': 0} {u'sub_data4': 0}
If you convert your data into a dictionary of dictionaries, you can create the DataFrame directly:
In [25]: data2 = {k: {m: n for i in v for m, n in i.items()} for x in data for k, v in x.items()}
In [26]: data2
Out[26]:
{'data1': {'sub_data1': 0, 'sub_data2': 4, 'sub_data3': 1, 'sub_data4': -5},
'data2': {'sub_data1': 1, 'sub_data2': 1, 'sub_data3': 1, 'sub_data4': 12},
'data3': {'sub_data1': 3, 'sub_data2': 0, 'sub_data3': 1, 'sub_data4': 7}}
In [27]: pd.DataFrame(data2)
Out[27]:
           data1  data2  data3
sub_data1      0      1      3
sub_data2      4      1      0
sub_data3      1      1      1
sub_data4     -5     12      7
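For readers who find the nested comprehension hard to follow, here is the same idea spelled out as a loop. This is only a sketch: the helper name flatten is made up for illustration, and it assumes every inner dict holds exactly one row label, as in the question.

```python
import pandas as pd

data = [
    {'data1': [{'sub_data1': 0}, {'sub_data2': 4}, {'sub_data3': 1}, {'sub_data4': -5}]},
    {'data2': [{'sub_data1': 1}, {'sub_data2': 1}, {'sub_data3': 1}, {'sub_data4': 12}]},
    {'data3': [{'sub_data1': 3}, {'sub_data2': 0}, {'sub_data3': 1}, {'sub_data4': 7}]},
]


def flatten(records):
    """Turn [{'col': [{'row': value}, ...]}, ...] into {'col': {'row': value}}.

    `flatten` is a hypothetical helper name, not part of pandas.
    """
    nested = {}
    for record in records:
        for column, cells in record.items():
            merged = {}
            for cell in cells:
                merged.update(cell)  # each cell is a one-item dict
            nested[column] = merged
    return nested


# Outer keys become columns, inner keys become the index.
frame = pd.DataFrame(flatten(data))
print(frame)
```

To save the result you would then call something like frame.to_hdf('my_data.hdf', key='data'); the key name here is an assumption, and writing HDF additionally needs PyTables installed.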
How can I transform a list of dictionaries into a table?
Here is the list:
[{'wow': 1,
'item': 1,
'money': 1},
{'best': 1,
'sock': 1,
'saved': 1,
'found': 1},
{'cry': 1,
'shock': 1,
'sound': 1}]
Desired output:
words  n
wow    1
item   1
...    ...
I have tried
pd.DataFrame(x , columns=['Words', 'n'])
However, the output that I receive is just an index with empty columns.
Any help?
You can use pandas melt:
x = [{'wow': 1,
'item': 1,
'money': 1},
{'best': 1,
'sock': 1,
'saved': 1,
'found': 1},
{'cry': 1,
'shock': 1,
'sound': 1}]
df = pd.DataFrame(x)
df = df.melt().dropna().reset_index(drop=True)
df.columns = ['words', 'n']
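Output: one row per word, with columns words and n (n comes back as float, 1.0, because the intermediate frame contains NaN).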
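You can assign to df.columns. This only renames existing columns, so it works when the DataFrame already has exactly two columns.
An example you can refer to: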
df = pd.DataFrame(someInput)
headers = ["Words", "n"]
df.columns = headers
Chaining the items of these dictionaries to build the DataFrame may meet your requirements:
>>> x = [{'wow': 1,
... 'item': 1,
... 'money': 1},
... {'best': 1,
... 'sock': 1,
... 'saved': 1,
... 'found': 1},
... {'cry': 1,
... 'shock': 1,
... 'sound': 1}]
>>> from itertools import chain
>>> pd.DataFrame(chain.from_iterable(d.items() for d in x), columns=['words', 'n'])
words n
0 wow 1
1 item 1
2 money 1
3 best 1
4 sock 1
5 saved 1
6 found 1
7 cry 1
8 shock 1
9 sound 1
I hope this code helps:
tbl = [{'wow': 1, 'item': 1, 'money': 1}, {'best': 1, 'sock': 1, 'saved': 1, 'found': 1}, {'cry': 1, 'shock': 1, 'sound': 1}]
dicts = {}
for d in tbl:
    dicts.update(d)  # merge all dicts; a repeated word keeps only its last value
df = pd.DataFrame.from_dict(dict(words=list(dicts.keys()), n=list(dicts.values())))
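One thing to watch with the dict.update approach is that a word appearing in more than one dictionary is silently collapsed. If that matters, a plain list comprehension keeps every (word, n) pair, and a groupby can add them up afterwards. A rough sketch, with words_df and totals as illustrative names:

```python
import pandas as pd

x = [{'wow': 1, 'item': 1, 'money': 1},
     {'best': 1, 'sock': 1, 'saved': 1, 'found': 1},
     {'cry': 1, 'shock': 1, 'sound': 1}]

# One (word, n) tuple per dictionary entry, in the original order.
rows = [(word, n) for d in x for word, n in d.items()]
words_df = pd.DataFrame(rows, columns=['words', 'n'])

# Optional: sum the counts if the same word can occur in several dicts.
totals = words_df.groupby('words', sort=False)['n'].sum().reset_index()
print(words_df)
```

Because no intermediate wide frame with NaN is created, n stays an integer column here.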
I have the following data in a pandas dataframe. I want to group the data based on month, then type.
month hour Type count
0 4 0 Bike 8
1 4 0 Pedelec 16
2 4 1 Bike 9
3 4 1 Pedelec 4
4 4 2 Bike 18
... ... ... ... ...
412 12 21 Pedelec 15
413 12 22 Bike 7
414 12 22 Pedelec 10
415 12 23 Bike 2
416 12 23 Pedelec 15
I want to convert this to nested JSON with field names. The code I use to create the dictionary is this:
jsonfile = (
    barchart.groupby(['month', 'Type'])[['hour', 'count']]
    .apply(lambda x: x.to_dict('records'))
    .reset_index(name='data')
    .groupby('month')[['Type', 'data']]
    .apply(lambda x: x.set_index('Type')['data'].to_dict())
    .reset_index(name='data')
    .groupby('month')['data']
    .apply(list)
    .to_dict()
)
The output I get is in this format:
[{'month': 4,
'values': [{'Bike': [{'hour': 0, 'count': 8},
{'hour': 1, 'count': 9},
{'hour': 2, 'count': 18},
{'hour': 3, 'count': 2},
{'hour': 4, 'count': 2},
...
{'hour': 23, 'count': 14}],
'Pedelec': [{'hour': 0, 'count': 16},
{'hour': 1, 'count': 4},
{'hour': 2, 'count': 12},
...
{'hour': 23, 'count': 27}]}]},
Expected output:
[{'month': 4,
'values': [{'Type': 'Bike': [{'hour': 0, 'count': 8},
{'hour': 1, 'count': 9},
I used the following to create my desired format:
jsonfile = (
    barchart.groupby(['month', 'Type'])[['hour', 'count']]
    .apply(lambda x: x.to_dict('records'))
    .reset_index(name='data')
    .groupby('month')[['Type', 'data']]
    .apply(lambda x: x.set_index('Type')['data'].to_dict())
    .reset_index(name='data')
    .groupby('month')['data']
    .apply(list)
    .to_dict()
)
json_arr = []
for month, values in jsonfile.items():
    arr = []
    for value in values:
        for types, val in value.items():
            # one entry per vehicle type, holding its hourly records
            arr.append({"type": types, "values": val})
    json_arr.append({"month": month, "values": arr})
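If the chained groupby is hard to read, the same nested structure can probably be built with two explicit loops over the groups. This is a sketch only: the small barchart frame below is a stand-in built from a few rows shown in the question, not the full data.

```python
import pandas as pd

# Stand-in for the question's `barchart` frame (a handful of its rows).
barchart = pd.DataFrame({
    'month': [4, 4, 4, 4, 12, 12],
    'hour':  [0, 0, 1, 1, 23, 23],
    'Type':  ['Bike', 'Pedelec', 'Bike', 'Pedelec', 'Bike', 'Pedelec'],
    'count': [8, 16, 9, 4, 2, 15],
})

json_arr = []
for month, month_df in barchart.groupby('month'):
    types = []
    for type_name, type_df in month_df.groupby('Type'):
        types.append({
            'type': type_name,
            # one {'hour': ..., 'count': ...} dict per row of this type
            'values': type_df[['hour', 'count']].to_dict('records'),
        })
    json_arr.append({'month': int(month), 'values': types})

print(json_arr[0])
```

The int(month) cast is there so the month key is a plain Python int rather than a NumPy integer, which tends to serialize more smoothly.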
I am doing an exercise in which the current requirement is to "Find the top 10 major project themes (using column 'mjtheme_namecode')".
My first thought was to use groupby, then count and sort the groups.
However, the values in this column are lists of dicts, e.g.
[{'code': '1', 'name': 'Economic management'},
{'code': '6', 'name': 'Social protection and risk management'}]
and I can't (apparently) group these, at least not with groupby. I get an error:
TypeError: unhashable type: 'list'
Is there a trick? I'm guessing something along the lines of this question.
(I can group by another column that has string values and matches 1:1 with this column, but the exercise is specific.)
df.head()
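There are two steps to solve your problem (using pandas >= 0.25, which added DataFrame.explode): flatten the list of dicts, then turn the dicts into columns.
Step 1
df = df.explode('mjtheme_namecode').reset_index(drop=True)  # reset the index so the join in step 2 aligns row by row
Step 2
df = df.join(pd.DataFrame(df['mjtheme_namecode'].values.tolist()))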
Added: if the dicts are nested over several levels, you can try json_normalize (pd.json_normalize in pandas 1.0 and later; older versions import it from pandas.io.json):
df = df.join(pd.json_normalize(df['mjtheme_namecode'].values.tolist()))
The only caveat is that explode repeats the values of all other columns for each exploded row (in case that is an issue).
Using sample data:
x = [
[1,2,[{'a':1, 'b':3},{'a':2, 'b':4}]],
[1,3,[{'a':5, 'b':6},{'a':7, 'b':8}]]
]
df = pd.DataFrame(x, columns=['col1','col2','col3'])
Out[1]:
   col1  col2                                  col3
0     1     2  [{'a': 1, 'b': 3}, {'a': 2, 'b': 4}]
1     1     3  [{'a': 5, 'b': 6}, {'a': 7, 'b': 8}]
## Step 1
df = df.explode('col3').reset_index(drop=True)  # unique index, otherwise the join below misaligns
Out[2]:
   col1  col2              col3
0     1     2  {'a': 1, 'b': 3}
1     1     2  {'a': 2, 'b': 4}
2     1     3  {'a': 5, 'b': 6}
3     1     3  {'a': 7, 'b': 8}
## Step 2
df = df.join(pd.DataFrame(df['col3'].values.tolist()))
Out[3]:
   col1  col2              col3  a  b
0     1     2  {'a': 1, 'b': 3}  1  3
1     1     2  {'a': 2, 'b': 4}  2  4
2     1     3  {'a': 5, 'b': 6}  5  6
3     1     3  {'a': 7, 'b': 8}  7  8
## Now you can group with the new variables
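To tie this back to the original exercise, here is a rough end-to-end sketch that counts themes after flattening. The three-row frame and its project column are invented for illustration; only the mjtheme_namecode column name and the two theme dicts come from the question.

```python
import pandas as pd

# Hypothetical sample in the shape described in the question.
df = pd.DataFrame({
    'project': ['p1', 'p2', 'p3'],
    'mjtheme_namecode': [
        [{'code': '1', 'name': 'Economic management'},
         {'code': '6', 'name': 'Social protection and risk management'}],
        [{'code': '6', 'name': 'Social protection and risk management'}],
        [{'code': '1', 'name': 'Economic management'},
         {'code': '6', 'name': 'Social protection and risk management'}],
    ],
})

# Step 1: one row per dict; reset the index so rows line up in the join.
exploded = df.explode('mjtheme_namecode').reset_index(drop=True)

# Step 2: dict keys ('code', 'name') become columns.
themes = pd.DataFrame(exploded['mjtheme_namecode'].tolist())
flat = exploded.drop(columns='mjtheme_namecode').join(themes)

# Group on the now-hashable name column and keep the ten most frequent.
top_themes = flat.groupby('name').size().sort_values(ascending=False).head(10)
print(top_themes)
```

Grouping on code instead of name should work the same way if the names turn out to be inconsistent in the real data.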
I have a dataframe of the form:
SpeciesName 0
0 A [[Year: 1, Quantity: 2],[Year: 3, Quantity: 4...]]
1 B [[Year: 1, Quantity: 7],[Year: 2, Quantity: 15...]]
2 C [[Year: 2, Quantity: 9],[Year: 4, Quantity: 13...]]
I'm trying to create a MultiIndex that uses SpeciesName and Year as the index:
SpeciesName  Year
A            1     Data
             2     Data
B            1     Data
             2     Data
I have not been able to get pandas.MultiIndex(..) to work and my attempts at iterating through the dataset and manually creating a new object have not been very fruitful. Any insights would be greatly appreciated!
I'm going to assume your data is a list of dictionaries, because otherwise what you've written makes no sense unless they are strings, and I don't want to parse strings.
df = pd.DataFrame([
['A', [dict(Year=1, Quantity=2), dict(Year=3, Quantity=4)]],
['B', [dict(Year=1, Quantity=7), dict(Year=2, Quantity=15)]],
['C', [dict(Year=2, Quantity=9), dict(Year=4, Quantity=13)]]
], columns=['SpeciesName', 0])
df
SpeciesName 0
0 A [{'Year': 1, 'Quantity': 2}, {'Year': 3, 'Quantity': 4}]
1 B [{'Year': 1, 'Quantity': 7}, {'Year': 2, 'Quantity': 15}]
2 C [{'Year': 2, 'Quantity': 9}, {'Year': 4, 'Quantity': 13}]
Then the solution is obvious
pd.DataFrame.from_records(
    *zip(*(
        [d, s]
        for s, l in zip(
            df['SpeciesName'], df[0].values.tolist())
        for d in l
    ))
).set_index('Year', append=True)
        Quantity
  Year
A 1            2
  3            4
B 1            7
  2           15
C 2            9
  4           13
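If the zip-of-zip construction is too terse, a more explicit variant might look like the sketch below. It makes the same assumption as the answer (column 0 holds lists of dicts) and names both index levels; rows and result are illustrative names.

```python
import pandas as pd

df = pd.DataFrame([
    ['A', [dict(Year=1, Quantity=2), dict(Year=3, Quantity=4)]],
    ['B', [dict(Year=1, Quantity=7), dict(Year=2, Quantity=15)]],
    ['C', [dict(Year=2, Quantity=9), dict(Year=4, Quantity=13)]],
], columns=['SpeciesName', 0])

# One flat record per (species, yearly entry) pair.
rows = [
    {'SpeciesName': species, **record}
    for species, records in zip(df['SpeciesName'], df[0])
    for record in records
]

# Both columns move into a two-level MultiIndex.
result = pd.DataFrame(rows).set_index(['SpeciesName', 'Year'])
print(result)
```

With this layout, result.loc['B'] returns all years for species B, and result.loc[('B', 2)] picks a single row.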
I have a dataframe containing id and list of dicts:
df = pd.DataFrame({
'list_of_dicts': [[{'a': 1, 'b': 2}, {'a': 11, 'b': 22}],
[{'a': 3, 'b': 4}, {'a': 33, 'b': 44}]],
'id': [100, 200]
})
and I want to normalize it like this:
    id   a   b
0  100   1   2
0  100  11  22
1  200   3   4
1  200  33  44
This gets most of the way:
pd.concat([
pd.DataFrame.from_dict(item)
for item in df.list_of_dicts
])
but is missing the id column.
I'm most interested in readability.
How about something like this:
d = {
'list_of_dicts': [[{'a': 1, 'b': 2}, {'a': 11, 'b': 22}],
[{'a': 3, 'b': 4}, {'a': 33, 'b': 44}]],
'id': [100, 200]
}
df = pd.DataFrame([pd.Series(x) for ld in d['list_of_dicts'] for x in ld])
ids = [[x] * len(l) for l, x in zip(d['list_of_dicts'], d['id'])]  # repeat each id once per dict
df['id'] = pd.Series([x for l in ids for x in l])
EDIT - Here's a simpler version
t = [[('id', i)] + list(l.items()) for i, ll in zip(d['id'], d['list_of_dicts']) for l in ll]
df = pd.DataFrame([dict(x) for x in t])
And, if you really want the id column first on Python versions before 3.7, where plain dicts do not guarantee insertion order, you can change dict to OrderedDict from the collections module.
This is what I call an incomprehension
pd.DataFrame(
    *list(map(list, zip(
        *[(d, i) for i, l in zip(df.id, df.list_of_dicts) for d in l]
    )))
).rename_axis('id').reset_index()
    id   a   b
0  100   1   2
1  100  11  22
2  200   3   4
3  200  33  44
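Since the question asks for readability, another option is to stay close to the pd.concat attempt and let concat carry the id along as keys. This is a sketch, not a benchmark; the level name 'row' and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'list_of_dicts': [[{'a': 1, 'b': 2}, {'a': 11, 'b': 22}],
                      [{'a': 3, 'b': 4}, {'a': 33, 'b': 44}]],
    'id': [100, 200],
})

# Build one small frame per id; the dict keys become the outer index level.
per_id = {
    id_: pd.DataFrame(dicts)
    for id_, dicts in zip(df['id'], df['list_of_dicts'])
}

normalized = (
    pd.concat(per_id, names=['id', 'row'])  # MultiIndex: (id, position within the list)
    .reset_index(level='id')                # move id into a regular column, placed first
    .reset_index(drop=True)                 # discard the per-list position
)
print(normalized)
```

One limitation: ids must be unique for the dict to keep every list, so duplicated ids would need the list-of-frames form of concat with keys instead.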