DataFrame to dict in Python

I have this dataframe:
id value
0 10.2
1 5.7
2 7.4
Here id is the index. I want output like this:
{'0': 10.2, '1': 5.7, '2': 7.4}
How can I do this in Python?

Use to_dict on the column:
>>> df['value'].to_dict()
{0: 10.2, 1: 5.7, 2: 7.4}
If you need the keys as strings:
>>> df.set_index(df.index.astype(str))['value'].to_dict()
{'0': 10.2, '1': 5.7, '2': 7.4}
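For reference, the two variants above can be reproduced end to end as a minimal sketch. The frame is rebuilt by hand from the question, and the dict comprehension is an alternative way (not from the original answer) to get string keys without touching the index:

```python
import pandas as pd

# Rebuild the frame from the question; "id" is the index.
df = pd.DataFrame(
    {"value": [10.2, 5.7, 7.4]},
    index=pd.Index([0, 1, 2], name="id"),
)

# Integer keys, taken straight from the index.
int_keys = df["value"].to_dict()

# String keys: convert each index label while building the dict.
str_keys = {str(k): v for k, v in df["value"].items()}

print(int_keys)  # {0: 10.2, 1: 5.7, 2: 7.4}
print(str_keys)  # {'0': 10.2, '1': 5.7, '2': 7.4}
```

The comprehension leaves df itself unchanged, which may be preferable if the integer index is still needed afterwards.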

Related

How to make multiindex dataframe from a nested dictionary keys and lists of values?

I have checked the advice here: Nested dictionary to multiindex dataframe where dictionary keys are column labels
However, I couldn't get it to work for my problem.
I would like to turn a dictionary into a MultiIndexed DataFrame, where 'a', 'b', 'c' are the names of the index levels, their values (12, 0.8, 1.8, bla1, bla2, bla3, bla4) are the index labels, and the values from the lists are assigned to those labels, as in the table pictured below.
My dictionary:
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
The result DataFrame I would like to get:
The code below does not build a MultiIndex; it just stacks every value in its own row:
df_a = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
df_b = pd.DataFrame(df_a[0].values.tolist(), index=df_a.index)
Use ast.literal_eval to convert each string into a dictionary and build the index from there:
import pandas as pd
from ast import literal_eval
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)
Output
                0     1    2
a    b   c
12.0 0.8  bla1  200   0.0  0.0
          bla2   37  44.0  0.6
     1.8  bla3  100   2.0  1.0
          bla4  400   3.0  1.0
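Once the index is built this way, rows can be looked up by their (a, b, c) labels. The following is a small sketch on two of the entries; stripping the leading space from the 'c' labels is an optional cleanup assumed here, not part of the answer above:

```python
from ast import literal_eval

import pandas as pd

dictionary = {
    "{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
    "{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
}

# Each key is the string form of a dict, so literal_eval turns it back into one.
parsed = [literal_eval(k) for k in dictionary]

# Optional cleanup (an assumption): drop the stray leading space in 'c'.
for p in parsed:
    p["c"] = p["c"].strip()

index = pd.MultiIndex.from_frame(pd.DataFrame(parsed))
res = pd.DataFrame(list(dictionary.values()), index=index)

# Select one row by its full (a, b, c) label.
row = res.loc[(12.0, 0.8, "bla1")]
print(row.tolist())
```

Without the strip step the lookup key would have to be ' bla1', including the space.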

How to convert dataframe into dictionary python

I have a dataframe that looks like
Total_Time_words Words
0 1.50 your
1 2.15 intention
2 2.75 is
3 3.40 dangerous
4 3.85 for
when I use this code:
new.set_index('Words').T.to_dict('records')
I get this output below:
[{'your': 1.5,
'intention': 2.15,
'is': 2.75,
'dangerous': 3.4,
'for': 3.85,
'my': 4.0,
'world': 4.3}]
But this is my expected output below:
[
{
1.50:"your"
},
{
2.15:"intention"
}
]
You can use a list comprehension with zip, as below:
new_dict = [{k:v} for k,v in zip(new["Total_Time_words"], new["Words"])]
print(new_dict)
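As a self-contained sketch of that comprehension, with the frame rebuilt by hand from the rows shown in the question (the variable name records is only illustrative):

```python
import pandas as pd

new = pd.DataFrame({
    "Total_Time_words": [1.50, 2.15, 2.75, 3.40, 3.85],
    "Words": ["your", "intention", "is", "dangerous", "for"],
})

# One single-entry dict per row, mapping the time to the word.
# tolist() hands back plain Python floats and strings.
records = [
    {t: w}
    for t, w in zip(new["Total_Time_words"].tolist(), new["Words"].tolist())
]

print(records[:2])  # [{1.5: 'your'}, {2.15: 'intention'}]
```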

Pandas Data frame to desired python dictionary

I have a data frame which looks like following
Date        Top
            A    B
2018-09-30  1.2  2.3
2018-10-01  1.5  1.7
2018-10-02  2.3  2.8
2018-10-03  7.7  7.5
2018-10-04  1.1  0.9
2018-10-05  2.1  6.5
So the columns are a MultiIndex with two top-level columns, 'Date' and 'Top', and 'Top' has two level-1 columns, 'A' and 'B'.
I am trying to convert the frame into a Python dictionary.
When I use
df_dict = df.to_dict(orient = 'index')
I get this output:
{0: {('Top', 'A'): 1.2, ('Top', 'B'): 2.3, ('date', ''): '2018-09-30'},
1: {('Top', 'A'): 1.5, ('Top', 'B'): 1.7, ('date', ''): '2018-10-01'},
2: {('Top', 'A'): 2.3, ('Top', 'B'): 2.8, ('date', ''): '2018-10-02'},
3: {('Top', 'A'): 7.7, ('Top', 'B'): 7.5, ('date', ''): '2018-10-03'},
4: {('Top', 'A'): 1.1, ('Top', 'B'): 0.9, ('date', ''): '2018-10-04'},
5: {('Top', 'A'): 2.1, ('Top', 'B'): 6.5, ('date', ''): '2018-10-05'}}
Now I can access df_dict with the following expression, which gives 1.2:
df_dict[0]['Top', 'A']
But I am looking for output like this:
df_dict[0]['Top']
Output: A: 1.2, B: 2.3
That does not work, since 'Top' is not a key inside the row's dictionary. I want to be able to access all 'Top' values easily for a given date.
Thanks for all the help
You can use a dict comprehension that filters on the first level, 'Top':
df_dict = df.to_dict(orient = 'index')
out = {k2: v for (k1, k2), v in df_dict[0].items() if k1 == 'Top'}
print (out)
{'A': 1.2, 'B': 2.3}
It is simpler to let pandas select by index value and first level of the MultiIndex, and then create the dict:
print (df.loc[0, 'Top'])
A 1.2
B 2.3
Name: 0, dtype: object
out = df.loc[0, 'Top'].to_dict()
print (out)
{'A': 1.2, 'B': 2.3}
EDIT:
print (df)
A B
2018-09-30 1.2 2.3
2018-10-01 1.5 1.7
2018-10-02 2.3 2.8
2018-10-03 7.7 7.5
2018-10-04 1.1 0.9
2018-10-05 2.1 6.5
df.index.name = 'date'
df = df.reset_index()
# give every column a two-level label so no key is an empty string
df.columns = [['d','Top', 'Top'], df.columns]
# build one dictionary per first-level label of the MultiIndex,
# keyed by that label in the outer dict
out = {x:df[x].to_dict(orient = 'index') for x in df.columns.levels[0]}
print (out)
{'Top': {0: {'A': 1.2, 'B': 2.3}, 1: {'A': 1.5, 'B': 1.7}, 2: {'A': 2.3, 'B': 2.8},
3: {'A': 7.7, 'B': 7.5}, 4: {'A': 1.1, 'B': 0.9}, 5: {'A': 2.1, 'B': 6.5}},
'd': {0: {'date': '2018-09-30'}, 1: {'date': '2018-10-01'},
2: {'date': '2018-10-02'}, 3: {'date': '2018-10-03'},
4: {'date': '2018-10-04'}, 5: {'date': '2018-10-05'}}}
print (out['Top'][0])
{'A': 1.2, 'B': 2.3}
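If the goal is to look up all 'Top' values for a given date, one possible variation is to key the outer dictionary by date instead. This is only a sketch: it assumes the dates are the row index and uses a two-row stand-in for the frame.

```python
import pandas as pd

# Two-row stand-in for the frame in the question, with dates as the index.
df = pd.DataFrame(
    [[1.2, 2.3], [1.5, 1.7]],
    index=["2018-09-30", "2018-10-01"],
    columns=pd.MultiIndex.from_product([["Top"], ["A", "B"]]),
)

# For every row, group the values by the first column level.
nested = {
    date: {top: df.loc[date, top].to_dict() for top in df.columns.levels[0]}
    for date in df.index
}

print(nested["2018-09-30"]["Top"])  # {'A': 1.2, 'B': 2.3}
```

With more first-level columns than 'Top', each one would appear as its own inner key for every date.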

Progressive value collection within a group in pandas

I have some data similar to:
#Simulate some data
d = {
"id": [1,1,1,1,1,2,2,2,2],
"action_order": [1,2,3,4,5,1,2,3,4],
"n_actions": [5,5,5,5,5,4,4,4,4],
"seed": ['1','2','3','4','5','10','11','12','13'],
"time_spent": [0.3,0.4,0.5,0.6,0.7,10.1,11.1,12.1,13.1]
}
data = pd.DataFrame(d)
I need a function that for each row will return the values from two columns (seed and time_spent) in that row AND ALL PREVIOUS ROWS within the group as a dictionary. I have attempted to use the apply function as follows but the results are not quite what I need.
data \
    .groupby("id")[["seed", "time_spent"]] \
    .apply(lambda x: dict(zip(x["seed"], x["time_spent"]))) \
    .tolist()
data \
    .groupby("id")[["seed", "time_spent", "action_order"]] \
    .apply(lambda x: dict(zip(list(x["seed"]), list(x["time_spent"]))))
The new DataFrame should look like this:
id new_col
0 1 {u'1': 0.3}
1 1 {u'1': 0.3, u'2': 0.4}
2 1 {u'1': 0.3, u'3': 0.5, u'2': 0.4}
...
You can keep a running dict and just return a copy of the most recent version on each apply iteration, per group:
def wrapper(g):
    cumdict = {}
    return g.apply(update_cumdict, args=(cumdict,), axis=1)
def update_cumdict(row, cd):
    cd[row.seed] = row.time_spent
    return cd.copy()
data["new_col"] = data.groupby("id").apply(wrapper).reset_index()[0]
data.new_col
0 {'1': 0.3}
1 {'1': 0.3, '2': 0.4}
2 {'1': 0.3, '2': 0.4, '3': 0.5}
3 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6}
4 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6, '5': ...
5 {'10': 10.1}
6 {'10': 10.1, '11': 11.1}
7 {'10': 10.1, '11': 11.1, '12': 12.1}
8 {'10': 10.1, '11': 11.1, '12': 12.1, '13': 13.1}
Name: new_col, dtype: object
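The same running-dict idea can also be written as a plain loop, without groupby or apply. The sketch below uses a shortened copy of the data and assumes the rows are already sorted by action_order within each id; the names running and new_col are only illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "id": [1, 1, 1, 2, 2],
    "seed": ["1", "2", "3", "10", "11"],
    "time_spent": [0.3, 0.4, 0.5, 10.1, 11.1],
})

running = {}   # id -> dict accumulated so far for that group
new_col = []
for gid, seed, spent in zip(
    data["id"].tolist(), data["seed"].tolist(), data["time_spent"].tolist()
):
    group_dict = running.setdefault(gid, {})
    group_dict[seed] = spent
    # Store a snapshot so later rows do not mutate earlier results.
    new_col.append(dict(group_dict))

data["new_col"] = new_col
print(data["new_col"].tolist())
```

Like the answer above, this copies the dict for every row, so memory use grows quadratically with group size.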
How about this.
In [15]: data.groupby(['id']).apply(lambda d: pd.Series(np.arange(len(d))).apply(lambda x: d[['seed', 'time_spent']].iloc[:x+1].to_dict()))
Out[15]:
id
1 0 {'seed': {0: '1'}, 'time_spent': {0: 0.3}}
1 {'seed': {0: '1', 1: '2'}, 'time_spent': {0: 0...
2 {'seed': {0: '1', 1: '2', 2: '3'}, 'time_spent...
3 {'seed': {0: '1', 1: '2', 2: '3', 3: '4'}, 'ti...
4 {'seed': {0: '1', 1: '2', 2: '3', 3: '4', 4: '...
2 0 {'seed': {5: '10'}, 'time_spent': {5: 10.1}}
1 {'seed': {5: '10', 6: '11'}, 'time_spent': {5:...
2 {'seed': {5: '10', 6: '11', 7: '12'}, 'time_sp...
3 {'seed': {5: '10', 6: '11', 7: '12', 8: '13'},...
dtype: object
Additionally, you can change the orient parameter of .to_dict() to get a different output dict style; see https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html
Or maybe this is what you want:
In [18]: data.groupby(['id']).apply(lambda d: pd.Series(np.arange(len(d))).apply(lambda x: dict(zip(d['seed'].iloc[:x+1], d['time_spent'].iloc[:x+1]))))
Out[18]:
id
1 0 {'1': 0.3}
1 {'1': 0.3, '2': 0.4}
2 {'1': 0.3, '2': 0.4, '3': 0.5}
3 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6}
4 {'1': 0.3, '2': 0.4, '3': 0.5, '4': 0.6, '5': ...
2 0 {'10': 10.1}
1 {'10': 10.1, '11': 11.1}
2 {'10': 10.1, '11': 11.1, '12': 12.1}
3 {'10': 10.1, '11': 11.1, '12': 12.1, '13': 13.1}
dtype: object

csv row import into python array

I have csv file in the following format
a b c d
1 12.0 3.5 4.3 5.9
2 13.0 5.7 2.8 5.2
3 14.0 6.4 9.7 2.3
4 15.0 6.8 4.7 3.4
I want to export rows into a python array of arrays. Here is the pseudocode:
a = read csv
b[][] = a float 2d array that is 1x4
import rows into b
the output of b should be:
[[12.0,3.5,4.3,5.9],[13.0,5.7,2.8,5.2],[14.0,6.4,9.7,2.3],[15.0,6.8,4.7,3.4]]
How would I do this? Please let me know if you need any other clarification. Thank you.
Problem:
Not all rows are the same size. Some rows have 10 elements and others may have 7, 8, or 9.
This is what I have:
import csv
def main():
    a = range(4)
    x = 0
    with open('test.csv', 'rb') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in spamreader:
            a[x] = row
            x += 1
    print a
Output:
[['13,4.2,2.4,5,6.4'], ['14,3.2,3.4,5.6,7.2'], ['15,8.5,3.7,8.5,0.75'], ['16,5.4,8.3,3.5,5.4']]
How do I turn the arrays from strings into floats?
Use csv.DictReader from the csv module to skip empty lines and get a list of dictionaries:
In [131]: import csv
     ...: with open('a.csv') as f:
     ...:     lst = list(csv.DictReader(f))
In [132]: lst
Out[132]:
[{'a': '12.0', 'b': '3.5', 'c': '4.3', 'd': '5.9'},
{'a': '13.0', 'b': '5.7', 'c': '2.8', 'd': '5.2'},
{'a': '14.0', 'b': '6.4', 'c': '9.7', 'd': '2.3'},
{'a': '15.0', 'b': '6.8', 'c': '4.7', 'd': '3.4'}]
In [134]: [{k:float(d[k]) for k in d} for d in lst] #convert values to floats
Out[134]:
[{'a': 12.0, 'b': 3.5, 'c': 4.3, 'd': 5.9},
{'a': 13.0, 'b': 5.7, 'c': 2.8, 'd': 5.2},
{'a': 14.0, 'b': 6.4, 'c': 9.7, 'd': 2.3},
{'a': 15.0, 'b': 6.8, 'c': 4.7, 'd': 3.4}]
EDIT:
To get a list of lists:
In [143]: with open('a.csv') as f:
     ...:     cr = csv.reader(f)
     ...:     next(cr)  # skip the header row "a,b,c,d"
     ...:     print([[float(x) for x in l] for l in cr])
     ...:
[[12.0, 3.5, 4.3, 5.9], [13.0, 5.7, 2.8, 5.2], [14.0, 6.4, 9.7, 2.3], [15.0, 6.8, 4.7, 3.4]]
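Since the question mentions rows of different lengths, here is a minimal Python 3 sketch of the csv.reader approach on ragged input. The io.StringIO object and its contents are made up to stand in for the file on disk:

```python
import csv
import io

# Stand-in for the CSV file; the rows deliberately differ in length
# and one line is blank.
sample = io.StringIO(
    "a,b,c,d\n"
    "12.0,3.5,4.3,5.9\n"
    "13.0,5.7,2.8\n"
    "\n"
    "14.0,6.4,9.7,2.3,1.1\n"
)

reader = csv.reader(sample)
next(reader)  # skip the header row

# csv.reader yields [] for a blank line, so "if row" filters those out.
rows = [[float(cell) for cell in row] for row in reader if row]

print(rows)
# [[12.0, 3.5, 4.3, 5.9], [13.0, 5.7, 2.8], [14.0, 6.4, 9.7, 2.3, 1.1]]
```

A list of lists handles ragged rows naturally; a rectangular NumPy array would not.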
