How to convert dataframe into dictionary python

I have a dataframe that looks like
Total_Time_words Words
0 1.50 your
1 2.15 intention
2 2.75 is
3 3.40 dangerous
4 3.85 for
When I use this code:
new.set_index('Words').T.to_dict('records')
I get this output below:
[{'your': 1.5,
'intention': 2.15,
'is': 2.75,
'dangerous': 3.4,
'for': 3.85,
'my': 4.0,
'world': 4.3}]
But this is my expected output below:
[
{
1.50:"your"
},
{
2.15:"intention"
}
]

You can use a list comprehension with zip, as below:
new_dict = [{k: v} for k, v in zip(new["Total_Time_words"], new["Words"])]
print(new_dict)
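For reference, here is a minimal self-contained sketch of the same idea. It assumes the frame is called new, as in the question, and rebuilds only the first three rows shown above; the name result is purely illustrative.

```python
import pandas as pd

# Sample data mirroring the first rows of the question's frame (an assumption;
# the real frame has more rows).
new = pd.DataFrame({
    "Total_Time_words": [1.50, 2.15, 2.75],
    "Words": ["your", "intention", "is"],
})

# Pair each time with its word and wrap every pair in its own one-key dict.
result = [{t: w} for t, w in zip(new["Total_Time_words"], new["Words"])]
print(result)
```

Note that the keys here are floats, so looking an entry up later requires the exact same float value.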

Related

How to convert Pandas Dataframe to list of dict for each row

Is there any possible way to convert pandas Dataframe to dict with list for each row?
Open High Low Close
2021-12-15 12:30:00 1.9000 1.91 1.86 1.8850
2021-12-15 13:30:00 1.8881 1.95 1.88 1.9400
2021-12-15 14:30:00 1.9350 1.95 1.86 1.8956
The output I want
{x:2021-12-15 12:30:00, y:[1.9000,1.91,1.86,1.8850]}
{x:2021-12-15 13:30:00, y:[1.8881,1.95,1.88,1.9400]}
{x:2021-12-15 14:30:00, y:[1.9350,1.95,1.86,1.8956]}
You can use:
dictt=list(zip(df.index,df[['Open','High','Low','Close']].values.tolist()))
final =[{'x':i[0], 'y':i[1]} for i in dictt]
or without loop:
df['y']=df[['Open','High','Low','Close']].values.tolist()
final = df.reset_index().rename(columns={'index':'x'})[['x','y']].to_dict('records')
Output:
[
{
"x":"2021-12-15 12:30:00",
"y":[
1.9,
1.91,
1.86,
1.885
]
},
{
"x":"2021-12-15 13:30:00",
"y":[
1.8881,
1.95,
1.88,
1.94
]
},
{
"x":"2021-12-15 14:30:00",
"y":[
1.935,
1.95,
1.86,
1.8956
]
}
]
If you want to convert a dataframe to a list of dicts, you can start from to_dict with orient='index', which returns a dict keyed by the index labels ... So in your case, if:
df = pd.DataFrame({'o':[1,2,3],'l':[4,5,6],'x':[7,8,9]},index=['t1','t2','t3'])
then you can do:
[{'x':k,'y':list(v.values())} for k,v in df.to_dict(orient='index').items()]
or also:
df2 = pd.DataFrame(df.apply(lambda x:list(x[df.columns]), axis=1))
list(df2.reset_index().rename(columns={'index':'x',0:'y'}).to_dict(orient='index').values())
Either results in:
[{'x': 't1', 'y': [1, 4, 7]},
{'x': 't2', 'y': [2, 5, 8]},
{'x': 't3', 'y': [3, 6, 9]}]
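Another option, sketched here under the assumption that the index holds plain strings (the question's index may well be datetimes), is to_dict with orient='split', which already separates the index labels from the row values:

```python
import pandas as pd

# Two rows from the question; the string index is an assumption for illustration.
df = pd.DataFrame(
    {
        "Open": [1.9000, 1.8881],
        "High": [1.91, 1.95],
        "Low": [1.86, 1.88],
        "Close": [1.8850, 1.9400],
    },
    index=["2021-12-15 12:30:00", "2021-12-15 13:30:00"],
)

# orient="split" returns a dict with the keys "index", "columns" and "data".
split = df.to_dict(orient="split")

# Pair each index label with its row of values.
final = [{"x": x, "y": y} for x, y in zip(split["index"], split["data"])]
print(final)
```

This avoids adding a helper column to the frame, at the cost of one extra intermediate dict.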

dataframe to dict in python

I have this dataframe:
id value
0 10.2
1 5.7
2 7.4
With id being the index. I want to have such output:
{'0': 10.2, '1': 5.7, '2': 7.4}
How to do this in python?
Use to_dict on the column:
>>> df['value'].to_dict()
{0: 10.2, 1: 5.7, 2: 7.4}
If you need the keys as strings:
>>> df.set_index(df.index.astype(str))['value'].to_dict()
{'0': 10.2, '1': 5.7, '2': 7.4}
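As a self-contained sketch of both variants, assuming id is an integer index as the question describes, the string keys can also be produced with a dict comprehension over Series.items():

```python
import pandas as pd

# Frame from the question, with "id" as the index.
df = pd.DataFrame(
    {"value": [10.2, 5.7, 7.4]},
    index=pd.Index([0, 1, 2], name="id"),
)

# Keys keep the index type (integers here).
int_keys = df["value"].to_dict()

# Convert each index label to a string while building the dict.
str_keys = {str(k): v for k, v in df["value"].items()}

print(int_keys)
print(str_keys)
```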

most pythonic way to transform dataframe into nested custom dict with tuple-style keys out of two columns

I have the following dataframe:
import pandas as pd
df = pd.DataFrame({
"color": ["blue", "blue", "blue", "blue", "green", "green", "green", "green"],
"object": ["hat", "hat", "coat", "coat", "hat", "hat", "coat", "coat"],
"group": [1, 2, 1, 2, 1, 2, 1, 2],
"value": [1.2 , 3.5, 5.4, 7.1, 6.4, 1.8, 3.5, 5.6]
})
I want to create a nested dict whose first-level key combines the columns "color" and "object" into a string, e.g. "(blue, hat)". (Note: this is intentionally not a syntactically valid tuple. It should be in string format!) The group should be the key in the second level and the value column should be the value in the second level. I.e. my desired output is:
{
"(blue, hat)": {
1: 1.2,
2: 3.5
},
"(blue, coat)": {
1: 5.4,
2: 7.1
},
"(green, hat)": {
1: 6.4,
2: 1.8
},
"(green, coat)": {
1: 3.5,
2: 5.6
}
}
My approach would be to loop through the unique values of color, object and group, but that seems cumbersome to me. Is there a more pythonic approach?
Use a dictionary comprehension with DataFrame.groupby. If you need the tuples represented as strings, use:
d = {str(k): v.set_index('group')['value'].to_dict()
for k, v in df.groupby(['color','object'])}
print (d)
{
"('blue', 'coat')": {
1: 5.4,
2: 7.1
},
"('blue', 'hat')": {
1: 1.2,
2: 3.5
},
"('green', 'coat')": {
1: 3.5,
2: 5.6
},
"('green', 'hat')": {
1: 6.4,
2: 1.8
}
}
Or, if you need a format like a tuple but without the quotes, use f-strings:
d = {f'({k[0]}, {k[1]})': v.set_index('group')['value'].to_dict()
for k, v in df.groupby(['color','object'])}
Alternative with join:
d = {f'({", ".join(k)})': v.set_index('group')['value'].to_dict()
for k, v in df.groupby(['color','object'])}
print (d)
{
'(blue, coat)': {
1: 5.4,
2: 7.1
},
'(blue, hat)': {
1: 1.2,
2: 3.5
},
'(green, coat)': {
1: 3.5,
2: 5.6
},
'(green, hat)': {
1: 6.4,
2: 1.8
}
}
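groupby sorts the group keys by default, which is why the outputs above list coat before hat. If the order of first appearance matters, as in the desired output, a sketch along these lines passes sort=False (the unpacked names color and obj are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "color": ["blue", "blue", "blue", "blue", "green", "green", "green", "green"],
    "object": ["hat", "hat", "coat", "coat", "hat", "hat", "coat", "coat"],
    "group": [1, 2, 1, 2, 1, 2, 1, 2],
    "value": [1.2, 3.5, 5.4, 7.1, 6.4, 1.8, 3.5, 5.6],
})

# sort=False keeps the groups in order of first appearance
# instead of sorting the (color, object) keys.
d = {
    f"({color}, {obj})": g.set_index("group")["value"].to_dict()
    for (color, obj), g in df.groupby(["color", "object"], sort=False)
}
print(list(d))
```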

Add nested dictionaries on matching keys

I have a nested dictionary, such as:
{'A1': {'T1': [1, 3.0, 3, 4.0], 'T2': [2, 2.0]}, 'A2': {'T1': [1, 0.0, 3, 5.0], 'T2': [2, 3.0]}}
What I want to do is sum each sub dictionary, to obtain this:
A1 A2 A1 A2
T1+T1 T2+T2 (ignore the first entry of the list)
[3.0, 5.0, 9.0] <<<< output
1 2 3
res 3.0 + 0.0 = 3.0 and 2.0 + 3.0 = 5.0 and 5.0 + 4.0 = 9.0
How can I do this? I've tried a for loop, but I've created a big mess.
One way is to use collections.Counter in a list comprehension, and sum the resulting Counter objects:
from collections import Counter
d = {'A1': {'T1': 3.0, 'T2': 2.0}, 'A2': {'T1': 0.0, 'T2': 3.0}}
# a list (not a generator), so that l can be summed more than once
l = [Counter(i) for i in d.values()]
sum(l, Counter())
# Counter({'T2': 5.0, 'T1': 3.0})
For sum to work here, I've passed an empty Counter() as the start argument, so that sum adds Counter objects together instead of starting from the integer 0.
To get only the values, you can do:
sum(l, Counter()).values()
# dict_values([3.0, 5.0])
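One caveat with this approach: adding Counter objects with + keeps only keys whose total is positive, so a key that sums to 0.0 disappears from the result. If zero totals matter, a sketch like the following accumulates with Counter.update instead. The dictionary d_zero is made-up data for illustration.

```python
from collections import Counter

# Hypothetical data in which T1 sums to zero.
d_zero = {'A1': {'T1': 0.0, 'T2': 2.0}, 'A2': {'T1': 0.0, 'T2': 3.0}}

# Adding with + (which sum() does) drops keys whose total is not positive.
added = sum((Counter(inner) for inner in d_zero.values()), Counter())

# update() adds the counts in place and keeps zero totals.
updated = Counter()
for inner in d_zero.values():
    updated.update(inner)

print(dict(added))
print(dict(updated))
```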
You could also use a list comprehension with zip:
d = {'A1': {'T1': 3.0, 'T2': 2.0}, 'A2': {'T1': 0.0, 'T2': 3.0}}
[sum(e) for e in zip(*(e.values() for e in d.values()))]
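Output:
[3.0, 5.0]
This relies on dictionaries preserving insertion order, so it will work if your Python version is >= 3.6 and every inner dictionary lists its keys in the same order.
Also, you can use two for loops:
r = {}
for dv in d.values():
    for k, v in dv.items():
        r.setdefault(k, []).append(v)
result = [sum(v) for v in r.values()]
print(result)
Output: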
[3.0, 5.0]
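A variant of the loop approach, sketched here with collections.defaultdict, accumulates the totals by key directly, so it does not depend on the inner dictionaries sharing the same key order:

```python
from collections import defaultdict

d = {'A1': {'T1': 3.0, 'T2': 2.0}, 'A2': {'T1': 0.0, 'T2': 3.0}}

# Missing keys start at 0.0, so each value can be added without a check.
totals = defaultdict(float)
for inner in d.values():
    for key, value in inner.items():
        totals[key] += value

result_by_key = dict(totals)
print(result_by_key)
```

Keeping the keys in the result also makes it clear which total belongs to which sub-key.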
After your edit, with d being the nested dictionary of lists from the question, you could use:
from itertools import zip_longest
sum_t1, sum_t2 = list(list(map(sum, zip(*t))) for t in zip(*[e.values() for e in d.values()]))
[i for t in zip_longest(sum_t1[1:], sum_t2[1:]) for i in t if i is not None]
output:
[3.0, 5.0, 6, 9.0]

Pandas Data frame to desired python dictionary

I have a data frame which looks like the following:
Date Top
A B
2018-09-30 1.2 2.3
2018-10-01 1.5 1.7
2018-10-02 2.3 2.8
2018-10-03 7.7 7.5
2018-10-04 1.1 0.9
2018-10-05 2.1 6.5
So I have a MultiIndex in the columns: only two top-level columns, 'Date' and 'Top', and 'Top' has two level-1 columns, 'A' and 'B'.
I am trying to convert it into a Python dictionary.
When I use
df_dict = df.to_dict(orient = 'index')
I get this output:
{0: {('Top', 'A'): 1.2, ('Top', 'B'): 2.3, ('date', ''): '2018-09-30'},
1: {('Top', 'A'): 1.5, ('Top', 'B'): 1.7, ('date', ''): '2018-10-01'},
2: {('Top', 'A'): 2.3, ('Top', 'B'): 2.8, ('date', ''): '2018-10-02'},
3: {('Top', 'A'): 7.7, ('Top', 'B'): 7.5, ('date', ''): '2018-10-03'},
4: {('Top', 'A'): 1.1, ('Top', 'B'): 0.9, ('date', ''): '2018-10-04'},
5: {('Top', 'A'): 2.1, ('Top', 'B'): 6.5, ('date', ''): '2018-10-05'}}
Now I can access df_dict with the following script, which gives me an output of 1.2:
df_dict[0]['Top', 'A']
But I am looking for this output with the following script:
df_dict[0]['Top']
Output: A:1.2, B:2.3
This does not work at the moment, since 'Top' on its own is not a key inside the row's dictionary. I want to be able to access all 'Top' values for a date easily.
Thanks for all the help
You can use a dict comprehension that filters by the first level, 'Top':
df_dict = df.to_dict(orient = 'index')
out = {k2: v for (k1, k2), v in df_dict[0].items() if k1 == 'Top'}
print (out)
{'A': 1.2, 'B': 2.3}
Simpler is to use pandas to select by index value and the first level of the MultiIndex, and then create the dict:
print (df.loc[0, 'Top'])
A 1.2
B 2.3
Name: 0, dtype: object
out = df.loc[0, 'Top'].to_dict()
print (out)
{'A': 1.2, 'B': 2.3}
EDIT:
print (df)
A B
2018-09-30 1.2 2.3
2018-10-01 1.5 1.7
2018-10-02 2.3 2.8
2018-10-03 7.7 7.5
2018-10-04 1.1 0.9
2018-10-05 2.1 6.5
df.index.name = 'date'
df = df.reset_index()
#set a MultiIndex level for every column to avoid empty-string keys
df.columns = [['d','Top', 'Top'], df.columns]
#for each first-level value of the MultiIndex, create a dictionary
#and nest it under that value in the outer dict
out = {x:df[x].to_dict(orient = 'index') for x in df.columns.levels[0]}
print (out)
{'Top': {0: {'A': 1.2, 'B': 2.3}, 1: {'A': 1.5, 'B': 1.7}, 2: {'A': 2.3, 'B': 2.8},
3: {'A': 7.7, 'B': 7.5}, 4: {'A': 1.1, 'B': 0.9}, 5: {'A': 2.1, 'B': 6.5}},
'd': {0: {'date': '2018-09-30'}, 1: {'date': '2018-10-01'},
2: {'date': '2018-10-02'}, 3: {'date': '2018-10-03'},
4: {'date': '2018-10-04'}, 5: {'date': '2018-10-05'}}}
print (out['Top'][0])
{'A': 1.2, 'B': 2.3}
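If the goal is to look up all 'Top' values per row or per date, a sketch along these lines selects the 'Top' block first and converts only that. It assumes the column layout from the question, with ('Date', '') and ('Top', 'A'/'B') as column tuples, and uses just the first two rows; the names top_by_row and top_by_date are illustrative.

```python
import pandas as pd

# Two rows from the question, with MultiIndex columns.
df = pd.DataFrame(
    [["2018-09-30", 1.2, 2.3], ["2018-10-01", 1.5, 1.7]],
    columns=pd.MultiIndex.from_tuples([("Date", ""), ("Top", "A"), ("Top", "B")]),
)

# Selecting the first level "Top" leaves a frame with the columns A and B.
top_by_row = df["Top"].to_dict(orient="index")

# Key the same per-row dicts by date instead of by row number.
top_by_date = dict(zip(df[("Date", "")], df["Top"].to_dict(orient="records")))

print(top_by_row[0])
print(top_by_date["2018-09-30"])
```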
