I have a data frame that looks like the following:
Date        Top
            A    B
2018-09-30  1.2  2.3
2018-10-01  1.5  1.7
2018-10-02  2.3  2.8
2018-10-03  7.7  7.5
2018-10-04  1.1  0.9
2018-10-05  2.1  6.5
So the columns are a MultiIndex: the top level has only two columns, 'Date' and 'Top', and 'Top' has two level-1 columns, 'A' and 'B'.
I am trying to convert the data frame into a Python dictionary.
When I use
df_dict = df.to_dict(orient = 'index')
I get this output:
{0: {('Top', 'A'): 1.2, ('Top', 'B'): 2.3, ('date', ''): '2018-09-30'},
1: {('Top', 'A'): 1.5, ('Top', 'B'): 1.7, ('date', ''): '2018-10-01'},
2: {('Top', 'A'): 2.3, ('Top', 'B'): 2.8, ('date', ''): '2018-10-02'},
3: {('Top', 'A'): 7.7, ('Top', 'B'): 7.5, ('date', ''): '2018-10-03'},
4: {('Top', 'A'): 1.1, ('Top', 'B'): 0.9, ('date', ''): '2018-10-04'},
5: {('Top', 'A'): 2.1, ('Top', 'B'): 6.5, ('date', ''): '2018-10-05'}}
Now I can access df_dict with the following expression, which gives me 1.2:
df_dict[0]['Top', 'A']
But I would like to get the output with this expression instead:
df_dict[0]['Top']
Output: A:1.2, B:2.3
This does not work at the moment, since 'Top' on its own is not a key inside the dictionary for row 0. I want this so that I can easily access all 'Top' values for a date.
Thanks for all the help
You can use a dict comprehension that filters on the first level, 'Top':
df_dict = df.to_dict(orient = 'index')
out = {k2: v for (k1, k2), v in df_dict[0].items() if k1 == 'Top'}
print (out)
{'A': 1.2, 'B': 2.3}
A simpler approach is to use pandas to select by index value and the first level of the MultiIndex, and then create the dict:
print (df.loc[0, 'Top'])
A 1.2
B 2.3
Name: 0, dtype: object
out = df.loc[0, 'Top'].to_dict()
print (out)
{'A': 1.2, 'B': 2.3}
EDIT:
print (df)
              A    B
2018-09-30  1.2  2.3
2018-10-01  1.5  1.7
2018-10-02  2.3  2.8
2018-10-03  7.7  7.5
2018-10-04  1.1  0.9
2018-10-05  2.1  6.5
df.index.name = 'date'
df = df.reset_index()
#set a MultiIndex on all columns to avoid empty-string keys
df.columns = [['d','Top', 'Top'], df.columns]
#create a dictionary for each first level of the MultiIndex,
#using that first level as the outer key of the dict
out = {x:df[x].to_dict(orient = 'index') for x in df.columns.levels[0]}
print (out)
{'Top': {0: {'A': 1.2, 'B': 2.3}, 1: {'A': 1.5, 'B': 1.7}, 2: {'A': 2.3, 'B': 2.8},
3: {'A': 7.7, 'B': 7.5}, 4: {'A': 1.1, 'B': 0.9}, 5: {'A': 2.1, 'B': 6.5}},
'd': {0: {'date': '2018-09-30'}, 1: {'date': '2018-10-01'},
2: {'date': '2018-10-02'}, 3: {'date': '2018-10-03'},
4: {'date': '2018-10-04'}, 5: {'date': '2018-10-05'}}}
print (out['Top'][0])
{'A': 1.2, 'B': 2.3}
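If the goal is to look up the 'Top' values by date rather than by row number, one option is to keep the dates as the row index before converting. This is only a minimal sketch, assuming the sample data from the question and that the dates are unique; the name by_date is made up for illustration.

```python
import pandas as pd

# Sample data from the question, with the dates kept as the row index
df = pd.DataFrame(
    {'A': [1.2, 1.5, 2.3], 'B': [2.3, 1.7, 2.8]},
    index=['2018-09-30', '2018-10-01', '2018-10-02'],
)
# Put both columns under a first level named 'Top'
df.columns = pd.MultiIndex.from_product([['Top'], df.columns])

# Selecting the first level drops it, leaving plain 'A'/'B' columns,
# so orient='index' yields {date: {'A': ..., 'B': ...}}
by_date = df['Top'].to_dict(orient='index')
print(by_date['2018-09-30'])
```

With this layout, by_date[some_date] returns the whole 'Top' group for that date in one lookup.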
I have this dataframe:
id value
0 10.2
1 5.7
2 7.4
Here id is the index. I want this output:
{'0': 10.2, '1': 5.7, '2': 7.4}
How can I do this in Python?
Use to_dict on the column:
>>> df['value'].to_dict()
{0: 10.2, 1: 5.7, 2: 7.4}
If you need the keys as strings:
>>> df.set_index(df.index.astype(str))['value'].to_dict()
{'0': 10.2, '1': 5.7, '2': 7.4}
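As an alternative sketch, the string keys can also be produced with a plain dict comprehension over the Series, which avoids rebuilding the index. The data frame below is reconstructed from the question and is an assumption about how it was created.

```python
import pandas as pd

# Frame reconstructed from the question: 'id' is the index
df = pd.DataFrame(
    {'value': [10.2, 5.7, 7.4]},
    index=pd.Index([0, 1, 2], name='id'),
)

# Series.items() yields (index label, value) pairs
out = {str(k): v for k, v in df['value'].items()}
print(out)
```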
I have checked the advice here: Nested dictionary to multiindex dataframe where dictionary keys are column labels
However, I couldn't get it to work for my problem.
I would like to convert a dictionary into a multi-indexed DataFrame, where 'a', 'b', 'c' are the names of the index levels, their values (12, 0.8, 1.8, bla1, bla2, bla3, bla4) are the index labels, and the values from the lists are assigned to those labels, as in the picture of the table below.
My dictionary:
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
The result DataFrame I would like to get:
The code below does not create the MultiIndex; it just places every value one under another in successive rows:
df_a = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
df_b = pd.DataFrame(df_a[0].values.tolist(), index=df_a.index)
Use ast.literal_eval to convert each string into a dictionary and build the index from there:
import pandas as pd
from ast import literal_eval
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)
Output
                  0     1    2
a    b   c
12.0 0.8  bla1  200   0.0  0.0
          bla2   37  44.0  0.6
     1.8  bla3  100   2.0  1.0
          bla4  400   3.0  1.0
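Once the index is built this way, rows can be selected by level values. The following is a small sketch that rebuilds a shortened version of the frame (three of the four entries) and shows two lookups; the names parsed, row_value, and subset are illustrative only.

```python
import pandas as pd
from ast import literal_eval

dictionary = {
    "{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
    "{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
}

# Parse each string key into a real dict, then use the dicts as index levels
parsed = [literal_eval(k) for k in dictionary]
index = pd.MultiIndex.from_frame(pd.DataFrame(parsed))
res = pd.DataFrame(list(dictionary.values()), index=index)

# Full key plus a column label selects a single cell
row_value = res.loc[(12.0, 0.8, ' bla2'), 0]

# xs() selects every row where level 'b' equals 0.8
subset = res.xs(0.8, level='b')
print(row_value)
print(subset)
```

Note that the 'c' labels keep the leading space from the original string keys, so lookups have to include it.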
This question is similar to this one, but I want to take it a step further. Is it possible to extend the solution to work with more levels? Multilevel dataframes' .to_dict() method has some promising options, but most of them will return entries that are indexed by tuples (i.e. (A, 0, 0): 274.0) rather than nesting them in dictionaries.
For an example of what I'm looking to accomplish, consider this multiindex dataframe:
data = {
    0: {
        ('A', 0, 0): 274.0,
        ('A', 0, 1): 19.0,
        ('A', 1, 0): 67.0,
        ('A', 1, 1): 12.0,
        ('B', 0, 0): 83.0,
        ('B', 0, 1): 45.0
    },
    1: {
        ('A', 0, 0): 254.0,
        ('A', 0, 1): 11.0,
        ('A', 1, 0): 58.0,
        ('A', 1, 1): 11.0,
        ('B', 0, 0): 76.0,
        ('B', 0, 1): 56.0
    }
}
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']
df
# output:
            A                       B
            0           1           0
            0     1     0     1     0     1
entry1  274.0  19.0  67.0  12.0  83.0  45.0
entry2  254.0  11.0  58.0  11.0  76.0  56.0
You can imagine that we have many records here, not just two, and that the index names could be longer strings. How could you turn this into nested dictionaries (or directly to JSON) that look like this:
[
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
'B': {0: {0: 83.0, 1: 45.0}}},
'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
'B': {0: {0: 76.0, 1: 56.0}}}}
]
I'm thinking some amount of recursion could potentially be helpful, maybe something like this, but have so far been unsuccessful.
So, you really need to do two things here:
1. df.to_dict()
2. Convert the result to a nested dictionary.
df.to_dict(orient='index') gives you a dictionary with the index as keys; it looks like this:
>>> df.to_dict(orient='index')
{'entry1': {('A', 0, 0): 274.0,
('A', 0, 1): 19.0,
('A', 1, 0): 67.0,
('A', 1, 1): 12.0,
('B', 0, 0): 83.0,
('B', 0, 1): 45.0},
'entry2': {('A', 0, 0): 254.0,
('A', 0, 1): 11.0,
('A', 1, 0): 58.0,
('A', 1, 1): 11.0,
('B', 0, 0): 76.0,
('B', 0, 1): 56.0}}
Now you need to nest this. Here's a trick from Martijn Pieters to do that:
def nest(d: dict) -> dict:
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:  # traverse all keys but the last
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result
Putting this all together:
def df_to_nested_dict(df: pd.DataFrame) -> dict:
    d = df.to_dict(orient='index')
    return {k: nest(v) for k, v in d.items()}
Output:
>>> df_to_nested_dict(df)
{'entry1': {'A': {0: {0: 274.0, 1: 19.0}, 1: {0: 67.0, 1: 12.0}},
'B': {0: {0: 83.0, 1: 45.0}}},
'entry2': {'A': {0: {0: 254.0, 1: 11.0}, 1: {0: 58.0, 1: 11.0}},
'B': {0: {0: 76.0, 1: 56.0}}}}
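Since the question also mentions going directly to JSON, here is a hedged sketch of that last step using the standard json module on a shortened version of the data. One thing to be aware of: JSON object keys are always strings, so the integer level labels come back as '0' and '1' after a round trip.

```python
import json
import pandas as pd

def nest(d):
    """Turn a dict keyed by tuples into nested dicts, one level per tuple item."""
    result = {}
    for key, value in d.items():
        target = result
        for k in key[:-1]:
            target = target.setdefault(k, {})
        target[key[-1]] = value
    return result

# Shortened version of the data from the question
data = {
    0: {('A', 0, 0): 274.0, ('A', 0, 1): 19.0, ('B', 0, 0): 83.0},
    1: {('A', 0, 0): 254.0, ('A', 0, 1): 11.0, ('B', 0, 0): 76.0},
}
df = pd.DataFrame(data).T
df.index = ['entry1', 'entry2']

nested = {k: nest(v) for k, v in df.to_dict(orient='index').items()}

# json.dumps converts the integer keys to strings
as_json = json.dumps(nested)
round_trip = json.loads(as_json)
print(round_trip['entry1']['A']['0']['1'])
```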
I took the idea from the previous answer and slightly modified it.
1) I took the function nested_dict from Stack Overflow to create the dictionary:
from collections import defaultdict
def nested_dict(n, type):
    if n == 1:
        return defaultdict(type)
    else:
        return defaultdict(lambda: nested_dict(n-1, type))
2) I wrote the following function:
def df_to_nested_dict(df, type):
    # Get the number of index levels (the row index is expected to be a MultiIndex)
    temp = df.index.names
    lvl = len(temp)
    # Create the target dictionary
    new_nested_dict = nested_dict(lvl, type)
    # Convert the dataframe to a dictionary keyed by index tuples
    temp_dict = df.to_dict(orient='index')
    for x, y in temp_dict.items():
        dict_keys = ''
        # Process the individual items from the key
        for item in x:
            # %r instead of %d so that non-integer keys (e.g. strings) also work
            dkey = '[%r]' % item
            dict_keys = dict_keys + dkey
        # Create a string and execute it
        dict_update = 'new_nested_dict%s = y' % dict_keys
        exec(dict_update)
    return new_nested_dict
It is the same idea, but it is done slightly differently.
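The same nested defaultdict can probably be filled without building and executing strings, by walking down one level per key item. The following is only a sketch of that variation, not the answer's own code; the function name df_to_nested_dict_no_exec, the parameter name factory, and the sample frame are assumptions made for illustration.

```python
from collections import defaultdict
import pandas as pd

def nested_dict(n, factory):
    """Create a defaultdict that nests n levels deep."""
    if n == 1:
        return defaultdict(factory)
    return defaultdict(lambda: nested_dict(n - 1, factory))

def df_to_nested_dict_no_exec(df, factory=dict):
    """Nest the rows of df by the levels of its row index."""
    result = nested_dict(df.index.nlevels, factory)
    for key, row in df.to_dict(orient='index').items():
        # A single-level index gives scalar keys; wrap them in a tuple
        if not isinstance(key, tuple):
            key = (key,)
        target = result
        for part in key[:-1]:
            target = target[part]  # the defaultdict creates missing levels
        target[key[-1]] = row
    return result

# Hypothetical sample frame with a two-level row index
idx = pd.MultiIndex.from_tuples(
    [('A', 'a'), ('A', 'b'), ('B', 'p')], names=['Col X', 'Col Y']
)
df = pd.DataFrame({'Sum': [3, 2, 5]}, index=idx)

out = df_to_nested_dict_no_exec(df)
print(out['A']['b'])
```

Walking the levels directly avoids exec and works for any hashable key type.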
I have a data frame with three columns. I would like to create a dictionary after applying groupby on the first and second columns. I can do this with for loops, but is there a pandas way of doing it?
DataFrame:
Col X  Col Y  Sum
A      a      3
A      b      2
A      c      1
B      p      5
B      q      6
B      r      7
After grouping by Col X and Col Y: df.groupby(['Col X','Col Y']).sum()
             Sum
Col X Col Y
A     a        3
      b        2
      c        1
B     p        5
      q        6
      r        7
The dictionary I want to create:
{A:{'a':3,'b':2,'c':1}, B:{'p':5,'q':6,'r':7}}
Use a dictionary comprehension while iterating via a groupby object
{name: dict(zip(g['Col Y'], g['Sum'])) for name, g in df.groupby('Col X')}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
If you insist on using to_dict somewhere, you could do something like this:
s = df.set_index(['Col X', 'Col Y']).Sum
{k: s.xs(k).to_dict() for k in s.index.levels[0]}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
Keep in mind that the to_dict method is just using some comprehension under the hood. If you have a special use case that requires more than the orient options provide, there is no shame in constructing your own comprehension.
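As one example of rolling your own, the grouped result from the question can be walked pair by pair. This is a minimal sketch that rebuilds the sample data; the names grouped and total are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'b', 'c', 'p', 'q', 'r'],
    'Sum': [3, 2, 1, 5, 6, 7],
})

# A Series indexed by (Col X, Col Y) pairs
grouped = df.groupby(['Col X', 'Col Y'])['Sum'].sum()

out = {}
for (x, y), total in grouped.items():
    # int() turns the NumPy integer into a plain Python int
    out.setdefault(x, {})[y] = int(total)
print(out)
```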
You can iterate over the MultiIndex series:
>>> s = df.set_index(['Col X', 'Col Y'])['Sum']
>>> {k: v.reset_index(level=0, drop=True).to_dict() for k, v in s.groupby(level=0)}
{'A': {'a': 3, 'b': 2, 'c': 1}, 'B': {'p': 5, 'q': 6, 'r': 7}}
#A to_dict() solution
d = df.groupby(['Col X','Col Y']).sum().reset_index().pivot(columns='Col X',values='Sum').to_dict()
Out[70]:
{'A': {0: 3.0, 1: 2.0, 2: 1.0, 3: nan, 4: nan, 5: nan},
'B': {0: nan, 1: nan, 2: nan, 3: 5.0, 4: 6.0, 5: 7.0}}
#if you need to get rid of the nans:
{k1:{k2:v2 for k2,v2 in v1.items() if pd.notnull(v2)} for k1,v1 in d.items()}
Out[73]: {'A': {0: 3.0, 1: 2.0, 2: 1.0}, 'B': {3: 5.0, 4: 6.0, 5: 7.0}}
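If the inner keys should be the Col Y labels rather than row numbers, a possible variation is to pivot with Col Y as the index and drop the NaNs per column. This is a sketch under the assumption that each (Col X, Col Y) pair occurs only once, as in the sample data; the values come out as floats because the pivoted frame contains NaN.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'Col X': ['A', 'A', 'A', 'B', 'B', 'B'],
    'Col Y': ['a', 'b', 'c', 'p', 'q', 'r'],
    'Sum': [3, 2, 1, 5, 6, 7],
})

# Rows are Col Y labels, columns are Col X labels; missing pairs become NaN
wide = df.pivot(index='Col Y', columns='Col X', values='Sum')

# DataFrame.items() yields (column label, column Series) pairs
out = {name: col.dropna().to_dict() for name, col in wide.items()}
print(out)
```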
I have a CSV file in the following format:
      a    b    c    d
1  12.0  3.5  4.3  5.9
2  13.0  5.7  2.8  5.2
3  14.0  6.4  9.7  2.3
4  15.0  6.8  4.7  3.4
4 15.0 6.8 4.7 3.4
I want to export rows into a python array of arrays. Here is the pseudocode:
a = read csv
b[][] = a float 2d array that is 1x4
import rows into b
the output of b should be:
[[12.0,3.5,4.3,5.9],[13.0,5.7,2.8,5.2],[14.0,6.4,9.7,2.3],[15.0,6.8,4.7,3.4]]
How would I do this? Please let me know if you need any other clarification. Thank you.
Problem:
Not all rows are the same size. Some rows have 10 elements and others may have 7, 8, or 9.
This is what I have:
import csv

def main():
    a = []
    # newline='' is what the csv module expects for files opened in text mode
    with open('test.csv', newline='') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in spamreader:
            a.append(row)
    print(a)

main()
Output:
[['13,4.2,2.4,5,6.4'], ['14,3.2,3.4,5.6,7.2'], ['15,8.5,3.7,8.5,0.75'], ['16,5.4,8.3,3.5,5.4']]
How do I make the arrays turn from string into floats?
Use csv.DictReader, which skips empty lines, to get a list of dictionaries:
In [131]: import csv
     ...: with open('a.csv') as f:
     ...:     lst=list(csv.DictReader(f))
In [132]: lst
Out[132]:
[{'a': '12.0', 'b': '3.5', 'c': '4.3', 'd': '5.9'},
{'a': '13.0', 'b': '5.7', 'c': '2.8', 'd': '5.2'},
{'a': '14.0', 'b': '6.4', 'c': '9.7', 'd': '2.3'},
{'a': '15.0', 'b': '6.8', 'c': '4.7', 'd': '3.4'}]
In [134]: [{k:float(d[k]) for k in d} for d in lst] #convert values to floats
Out[134]:
[{'a': 12.0, 'b': 3.5, 'c': 4.3, 'd': 5.9},
{'a': 13.0, 'b': 5.7, 'c': 2.8, 'd': 5.2},
{'a': 14.0, 'b': 6.4, 'c': 9.7, 'd': 2.3},
{'a': 15.0, 'b': 6.8, 'c': 4.7, 'd': 3.4}]
EDIT:
To get a list of lists:
In [143]: with open('a.csv') as f:
     ...:     cr=csv.reader(f)
     ...:     skip=next(cr) #skip the first row of keys "a,b,c,d"
     ...:     print([list(map(float, l)) for l in cr])
     ...:
[[12.0, 3.5, 4.3, 5.9], [13.0, 5.7, 2.8, 5.2], [14.0, 6.4, 9.7, 2.3], [15.0, 6.8, 4.7, 3.4]]
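Because the question mentions that rows can have different lengths, here is a small sketch showing that the same per-row conversion handles ragged rows. It assumes comma-separated data with no header row, and uses an in-memory string (the made-up variable text) in place of a real file so that it runs on its own.

```python
import csv
import io

# Made-up ragged sample: rows of different lengths plus one blank line
text = "13,4.2,2.4,5,6.4\n14,3.2,3.4\n\n15,8.5,3.7,8.5\n"

reader = csv.reader(io.StringIO(text))
# csv.reader yields an empty list for a blank line, so "if row" skips it
rows = [[float(x) for x in row] for row in reader if row]
print(rows)
```

Each inner list keeps its own length, so the result is a list of lists rather than a rectangular array.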