Converting dataframe into sub-list or dictionaries - python

I have the data in tabular format (rows and columns) which I read into a dataframe (Data1) :
       Name   D  Score
0  Angelica  D1    3.5
1  Angelica  D2    2.0
2      Bill  D1    2.0
3      Chan  D3    1.0
......
I am able to convert it into a list using:
Data2 = Data1.values.tolist()
and get the below output:
[
['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
]
What I want is, the output to be like this:
{
'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D8': 1.0, 'D3': 3.0, 'D4': 5.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}
}
How can I achieve this in Python?

You can use a dictionary comprehension after grouping the df by the Name column:
>>> df = pd.DataFrame([{'Name': 'Angela', 'Score': 3.5, 'D': 'D1'}, {'Name': 'Angela', 'Score': 2.0, 'D': 'D2'}, {'Name': 'Bill', 'Score': 2.0, 'D': 'D1'}, {'Name': 'Chan', 'Score': 1.0, 'D': 'D3'}])
>>> df
D Name Score
0 D1 Angela 3.5
1 D2 Angela 2.0
2 D1 Bill 2.0
3 D3 Chan 1.0
>>> data2 = {name: {df.loc[v].D: df.loc[v].Score for v in val} for name, val in df.groupby('Name').groups.items()}
>>> data2
{'Chan': {'D3': 1.0}, 'Angela': {'D1': 3.5, 'D2': 2.0}, 'Bill': {'D1': 2.0}}

You can zip up the values from each group after grouping by Name:
In [4]: l = [
...: ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
...: ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
...: ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
...: ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
...: ]
...: columns=["Name", "D", "Score"]
...: df = pd.DataFrame(l, columns=columns)
...:
In [5]: data = {name: dict(zip(v["D"], v["Score"])) for name, v in df.groupby("Name")}
In [6]: data
Out[6]:
{'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D3': 3.0, 'D4': 5.0, 'D8': 1.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}}

from collections import defaultdict
result = defaultdict(dict)
for item in Data2:
    result[item[0]].update(dict([item[1:]]))
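
The loop above assumes Data2 is the list of [Name, D, Score] rows produced by Data1.values.tolist(). A minimal self-contained sketch of the same idea, using a few rows copied from the question as sample data and unpacking each row instead of slicing it:

```python
from collections import defaultdict

# Sample rows in the [Name, D, Score] layout from the question.
Data2 = [
    ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
    ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
    ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
]

result = defaultdict(dict)
for name, d, score in Data2:
    # defaultdict creates the inner dict the first time a name is seen.
    result[name][d] = score

result = dict(result)  # convert to a plain dict for printing
print(result)
```

Because it works on the plain list, this version does not need pandas at all once the rows are available.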

Related

Use slider to plot dataframe columns as scatter plots

I'm trying to use a matplotlib slider to iterate through a list of dataframes and plot the columns as scatter plots.
As the slider progresses, I'd like it to plot the first column of the first dataframe as the x-axis, then, keeping the first column as the x-axis, progress through by individually plotting the remaining columns as the y-axes.
After completing the first dataframe, I'd like it to move on to the next one in the list of dataframes and follow the same general operation.
I am having trouble creating my update function to properly reach into the nested dataframes.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
data1 = {'col1': ['a1', 'b1', 'c1', 'd1', 'e1'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data2 = {'col1': ['a2', 'b2', 'c2', 'd2', 'e2'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data3 = {'col1': ['a3', 'b3', 'c3', 'd3', 'e3'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data4 = {'col1': ['a4', 'b4', 'c4', 'd4', 'e4'],
'col2': [0.0 ,0.5, 1.0, 1.5, 2.0],
'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
dataframes = [df1, df2, df3, df4]
fig, ax = plt.subplots(figsize=(9, 6))
fig.subplots_adjust(bottom=0.25)
scat = ax.scatter(dataframes[0]['col1'], dataframes[0]['col2'])
ax_pos = fig.add_axes([0.25, 0.1, 0.65, 0.03])
slider = Slider(
ax=ax_pos,
label='Slider',
valmin=1,
valmax=3,
valinit=1,
valstep=1
)
def update(val):
    # I have no idea what to do here
    columns = df1.columns
    columns = columns[1:]
    for x in columns:
        slider.val = x
        col = slider.val
        for idx, df in enumerate(dataframes):
            ax.scatter(dataframes[idx]['col1'], dataframes[idx][col])
    fig.canvas.draw_idle()
slider.on_changed(update)
plt.show()
Running the code as-is generates the following plot, which is something, but the first adjustment of the slider plots everything (and all on the same graph):
I think my issue stems from a very nascent understanding of the relationship between the original scatter function (scat) and its counterpart inside the "update" function.
How can I create an update function that allows me to iterate through a list of nested dataframes?
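
One way to think about the update function is as an index mapping: each slider position selects exactly one (dataframe, y-column) pair, and the callback redraws only that pair. The sketch below covers just that mapping, not the plotting. It assumes the slider runs from 0 to n_frames * len(y_columns) - 1 (the slider in the question uses valmin=1 and valmax=3, so its range would have to change), and the helper name position_to_pair is hypothetical:

```python
def position_to_pair(position, y_columns):
    """Map a flat slider position to (dataframe index, y-column name).

    Positions advance through every y-column of the first dataframe,
    then move on to the next dataframe.
    """
    frame_idx, col_idx = divmod(int(position), len(y_columns))
    return frame_idx, y_columns[col_idx]


y_columns = ['col2', 'col3', 'col4']   # every column except the x-axis 'col1'
n_frames = 4                           # df1 .. df4 in the question

pairs = [position_to_pair(p, y_columns)
         for p in range(n_frames * len(y_columns))]
print(pairs[:4])
```

Inside update(val), the resulting pair could then be used to clear the axes with ax.clear() and draw a single scatter of dataframes[frame_idx]['col1'] against dataframes[frame_idx][column] before calling fig.canvas.draw_idle().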

How to make multiindex dataframe from a nested dictionary keys and lists of values?

I have checked the advice here: Nested dictionary to multiindex dataframe where dictionary keys are column labels
However, I couldn't get it to work for my problem.
I would like to turn a dictionary into a MultiIndexed DataFrame, where 'a', 'b' and 'c' are the names of the index levels, their values (12, 0.8, 1.8, bla1, bla2, bla3, bla4) are the index labels, and the values from the lists are assigned to those labels, as in the table pictured below.
My dictionary:
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
The result DataFrame I would like to get:
This code does not create a MultiIndex and puts every value on its own row, one under another:
df_a = pd.DataFrame.from_dict(dictionary, orient="index").stack().to_frame()
df_b = pd.DataFrame(df_a[0].values.tolist(), index=df_a.index)
Use ast.literal_eval to convert each string into a dictionary and build the index from there:
import pandas as pd
from ast import literal_eval
dictionary ={
"{'a': 12.0, 'b': 0.8, 'c': ' bla1'}": [200, 0.0, '0.0'],
"{'a': 12.0, 'b': 0.8, 'c': ' bla2'}": [37, 44, '0.6'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla3'}": [100, 2.0, '1.0'],
"{'a': 12.0, 'b': 1.8, 'c': ' bla4'}": [400, 3.0, '1.0']
}
keys, data = zip(*dictionary.items())
index = pd.MultiIndex.from_frame(pd.DataFrame([literal_eval(i) for i in keys]))
res = pd.DataFrame(data=list(data), index=index)
print(res)
Output
                  0     1    2
a    b   c
12.0 0.8  bla1  200   0.0  0.0
          bla2   37  44.0  0.6
     1.8  bla3  100   2.0  1.0
          bla4  400   3.0  1.0
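
The key step in this answer is turning each string key back into a real dictionary. A small sketch of that step on its own, using only the standard library and two of the keys from the question; the level names and tuples it prints are what ends up in the MultiIndex (they could presumably also be passed to pd.MultiIndex.from_tuples with names=names instead of going through a DataFrame):

```python
from ast import literal_eval

keys = [
    "{'a': 12.0, 'b': 0.8, 'c': ' bla1'}",
    "{'a': 12.0, 'b': 1.8, 'c': ' bla3'}",
]

# literal_eval only parses Python literals, so it does not execute code.
parsed = [literal_eval(k) for k in keys]

names = list(parsed[0])                                # index level names
tuples = [tuple(p[n] for n in names) for p in parsed]  # one tuple per row

print(names)
print(tuples)
```

Note that the 'c' values keep their leading space (' bla1'), exactly as written in the original keys.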

I want to replace dictionary's value

I want to replace values in a dictionary. I have a dictionary named dct, like this:
dct={'A': {'a1': [[10.0, 5.0], [7.0, 7.0], [1.0, 5.0], [20.0, 30.0]],
'a2': [[50.0, 50.0], [55.0, 60.0]],
'a3': [[40.0, 100.0], [100.0, 200.0], [100.0, 140.0], [200.0, 190.0]],
'a4': [[50.0, 70.0], [140.0, 130.0], [160.0, 150.0], [200.0, 180.0]],
'a5': [[100.0, 110.0], [180.0, 210.0], [60.0, 50.0], [200.0, 190.0]] }}
If the length of a child value, such as [[10.0, 5.0], [7.0, 7.0], [1.0, 5.0], [20.0, 30.0]] or [[50.0, 50.0], [55.0, 60.0]], is divisible by 4, I want to replace that child value with 5. If its length is divisible by 2, I want to replace it with 4.
So I wrote this code:
for ky, vl in dct.items():
    for k,v in vl.items():
        if len(v) %4 == 0:
            element[ky] = 5
        elif len(v) %2 == 0:
            element[ky] = 4
        else:
            continue
print(element)
But print(element) shows {'A': {'a5': 5}}, so it only has the last value. I really cannot understand why this happens. How can I fix this? What is wrong in my code?
Your code does not do what you describe. Here is a version that works on the individual numbers instead.
The code below replaces each number inside the child lists with 5 when it is divisible by 4, and with 4 when it is divisible by 2 but not by 4:
dct = {'A': {'a1': [[10.0, 5.0], [7.0, 7.0], [1.0, 5.0], [20.0, 30.0]],
'a2': [[50.0, 50.0], [55.0, 60.0]],
'a3': [[40.0, 100.0], [100.0, 200.0], [100.0, 140.0], [200.0, 190.0]],
'a4': [[50.0, 70.0], [140.0, 130.0], [160.0, 150.0], [200.0, 180.0]],
'a5': [[100.0, 110.0], [180.0, 210.0], [60.0, 50.0], [200.0, 190.0]] }}
print (dct)
for k, v in dct.items():
    for ky, vl in v.items():
        for each_elem in range(len(vl)):
            if vl[each_elem][0] % 4 == 0:
                vl[each_elem][0] = 5
            elif vl[each_elem][0] % 2 == 0:
                vl[each_elem][0] = 4
            if vl[each_elem][1] % 4 == 0:
                vl[each_elem][1] = 5
            elif vl[each_elem][1] % 2 == 0:
                vl[each_elem][1] = 4
print("\n")
print(dct)
This gives the following output:
{'A': {'a1': [[10.0, 5.0], [7.0, 7.0], [1.0, 5.0], [20.0, 30.0]], 'a3': [[40.0, 100.0], [100.0, 200.0], [100.0, 140.0], [200.0, 190.0]], 'a2': [[50.0, 50.0], [55.0, 60.0]], 'a5': [[100.0, 110.0], [180.0, 210.0], [60.0, 50.0], [200.0, 190.0]], 'a4': [[50.0, 70.0], [140.0, 130.0], [160.0, 150.0], [200.0, 180.0]]}}
{'A': {'a1': [[4, 5.0], [7.0, 7.0], [1.0, 5.0], [5, 4]], 'a3': [[5, 5], [5, 5], [5, 5], [5, 4]], 'a2': [[4, 4], [55.0, 5]], 'a5': [[5, 4], [5, 4], [5, 4], [5, 4]], 'a4': [[4, 4], [5, 4], [5, 4], [5, 5]]}}
Hope this answer works for you. Have a good time ahead :)
The problem is that you are inserting the outer dict's key (ky) into the new dict. The original is a dict nested inside a dict, so you need to build a nested sub-dict for each outer key and then, at the end, insert that sub-dict into the main dict.
Try this code:
dct={'A': {'a1': [[10.0, 5.0], [7.0, 7.0], [1.0, 5.0], [20.0, 30.0]], 'a2': [[50.0, 50.0], [55.0, 60.0]], 'a3': [[40.0, 100.0], [100.0, 200.0], [100.0, 140.0], [200.0, 190.0]], 'a4': [[50.0, 70.0], [140.0, 130.0], [160.0, 150.0], [200.0, 180.0]], 'a5': [[100.0, 110.0], [180.0, 210.0], [60.0, 50.0], [200.0, 190.0]] }}
element={}
for ky, vl in dct.items():
    sub_dict={}
    for k, v in vl.items():
        if len(v) % 4 == 0:
            sub_dict[k] = 5
        elif len(v) % 2 == 0:
            sub_dict[k] = 4
        else:
            continue
    element[ky]=sub_dict
print(element)
output:
{'A': {'a1': 5, 'a2': 4, 'a3': 5, 'a5': 5, 'a4': 5}}
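
The same result can also be written as a nested dict comprehension. This is only a sketch under the same reading of the question (the replacement depends on the length of each child list). The helper name code_for is made up for illustration, and the sample data is shortened, with 'a3' cut to three pairs so that the skipped case is visible:

```python
dct = {'A': {'a1': [[10.0, 5.0], [7.0, 7.0], [1.0, 5.0], [20.0, 30.0]],
             'a2': [[50.0, 50.0], [55.0, 60.0]],
             'a3': [[40.0, 100.0], [100.0, 200.0], [100.0, 140.0]]}}


def code_for(child):
    """Return 5 if len(child) is a multiple of 4, 4 if a multiple of 2, else None."""
    if len(child) % 4 == 0:
        return 5
    if len(child) % 2 == 0:
        return 4
    return None


# Keys whose child list has an odd length are left out, as in the loop version.
element = {
    outer_key: {k: code_for(v) for k, v in inner.items()
                if code_for(v) is not None}
    for outer_key, inner in dct.items()
}
print(element)
```

Unlike the loop in the question, each inner key is written into its own sub-dict, so later keys do not overwrite earlier ones.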

Insert list and slice in single dict comprehension

I'm trying to build a dict comprehension that does an insert and takes a slice.
Does anybody know how to do this, or whether this is possible at all?
I'm trying to get the same output in cprd with a dict comprehension, as in newd with a for loop.
Code (Python 3.6.1)
# Initializations
hline = "-"*80
h = ['H1', 'H2', 'H3', 'H4']
d = {'A': [['Y1', 'Y2', 'Y3', 'Y4'], [-3.4, 15.9, 'NA', 6.0], [-3.4, 4.2, -7.4, 6.3], [22.7, 7.4, 2.8, 'NA']], 'B': [['Y1', 'Y2', 'Y3', 'Y4'], [-45.8, -10.7, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']], 'C': [['Y1', 'Y2', 'Y3', 'Y4'], [-10.5, 32.8, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']]}
print(f"h = {h}")
print(f"d = {d}")
print(hline)
# Without dict/list comprehension
newd = {}
for key,value in d.items():
    value.insert(1,h)
    newd[key] = value[1:]
print(f"newd = {newd}")
print(hline)
# Re-Initializations
d = {'A': [['Y1', 'Y2', 'Y3', 'Y4'], [-3.4, 15.9, 'NA', 6.0], [-3.4, 4.2, -7.4, 6.3], [22.7, 7.4, 2.8, 'NA']], 'B': [['Y1', 'Y2', 'Y3', 'Y4'], [-45.8, -10.7, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']], 'C': [['Y1', 'Y2', 'Y3', 'Y4'], [-10.5, 32.8, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']]}
# Tryout with dict comprehension
cprd = {key:value[1:] for key,value in d.items()}
print(f"cprd = {cprd}")
print(hline)
Output
h = ['H1', 'H2', 'H3', 'H4']
d = {'A': [['Y1', 'Y2', 'Y3', 'Y4'], [-3.4, 15.9, 'NA', 6.0], [-3.4, 4.2, -7.4, 6.3], [22.7, 7.4, 2.8, 'NA']], 'B': [['Y1', 'Y2', 'Y3', 'Y4'], [-45.8, -10.7, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']], 'C': [['Y1', 'Y2', 'Y3', 'Y4'], [-10.5, 32.8, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']]}
--------------------------------------------------------------------------------
newd = {'A': [['H1', 'H2', 'H3', 'H4'], [-3.4, 15.9, 'NA', 6.0], [-3.4, 4.2, -7.4, 6.3], [22.7, 7.4, 2.8, 'NA']], 'B': [['H1', 'H2', 'H3', 'H4'], [-45.8, -10.7, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']], 'C': [['H1', 'H2', 'H3', 'H4'], [-10.5, 32.8, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']]}
--------------------------------------------------------------------------------
cprd = {'A': [[-3.4, 15.9, 'NA', 6.0], [-3.4, 4.2, -7.4, 6.3], [22.7, 7.4, 2.8, 'NA']], 'B': [[-45.8, -10.7, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']], 'C': [[-10.5, 32.8, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']]}
--------------------------------------------------------------------------------
You can use list concatenation to create the desired values:
{key:[h]+value[1:] for key,value in d.items()}
# {'A': [['H1', 'H2', 'H3', 'H4'], [-3.4, 15.9, 'NA', 6.0], [-3.4, 4.2, -7.4, 6.3], [22.7, 7.4, 2.8, 'NA']], 'B': [['H1', 'H2', 'H3', 'H4'], [-45.8, -10.7, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']], 'C': [['H1', 'H2', 'H3', 'H4'], [-10.5, 32.8, 'NA', 'NA'], [5.4, 12.7, 19.2, 20.3], [22.7, 7.4, 2.8, 'NA']]}
Note that:
it returns the exact same data as newd
it does not mutate d
In your example, d was changed after having defined newd. Is it a bug or a feature? :)
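
To make the difference concrete, here is a small sketch with shortened data that compares the two approaches: the loop with insert changes the lists stored in its input dict, while the comprehension builds new lists and leaves its input alone. The variable names d_loop, d_comp and original are introduced only for this comparison:

```python
import copy

h = ['H1', 'H2']
original = {'A': [['Y1', 'Y2'], [-3.4, 15.9]],
            'B': [['Y1', 'Y2'], [5.4, 12.7]]}

# Loop version: insert() mutates the list objects stored in the dict.
d_loop = copy.deepcopy(original)
newd = {}
for key, value in d_loop.items():
    value.insert(1, h)
    newd[key] = value[1:]

# Comprehension version: [h] + value[1:] builds a new list each time.
d_comp = copy.deepcopy(original)
cprd = {key: [h] + value[1:] for key, value in d_comp.items()}

print(newd == cprd)          # both approaches give the same result
print(d_loop == original)    # False: the loop changed its input
print(d_comp == original)    # True: the comprehension did not
```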

csv row import into python array

I have csv file in the following format
a b c d
1 12.0 3.5 4.3 5.9
2 13.0 5.7 2.8 5.2
3 14.0 6.4 9.7 2.3
4 15.0 6.8 4.7 3.4
I want to export rows into a python array of arrays. Here is the pseudocode:
a = read csv
b[][] = a float 2d array that is 1x4
import rows into b
the output of b should be:
[[12.0,3.5,4.3,5.9],[13.0,5.7,2.8,5.2],[14.0,6.4,9.7,2.3],[15.0,6.8,4.7,3.4]]
How would I do this? Please let me know if you need any other clarification. Thank you.
Problems:
Not all rows are the same size. Some rows have 10 elements and others may have 7, 8 or 9.
This is what I have:
import csv
def main():
    a = range(4)
    x = 0
    with open('test.csv', 'rb') as csvfile:
        spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
        for row in spamreader:
            a[x] = row
            x += 1
    print a
Output:
[['13,4.2,2.4,5,6.4'], ['14,3.2,3.4,5.6,7.2'], ['15,8.5,3.7,8.5,0.75'], ['16,5.4,8.3,3.5,5.4']]
How do I make the arrays turn from string into floats?
Use csv.DictReader from the csv module, which skips empty lines, to get a list of dictionaries:
In [131]: import csv
...: with open('a.csv') as f:
...:     lst=list(csv.DictReader(f))
In [132]: lst
Out[132]:
[{'a': '12.0', 'b': '3.5', 'c': '4.3', 'd': '5.9'},
{'a': '13.0', 'b': '5.7', 'c': '2.8', 'd': '5.2'},
{'a': '14.0', 'b': '6.4', 'c': '9.7', 'd': '2.3'},
{'a': '15.0', 'b': '6.8', 'c': '4.7', 'd': '3.4'}]
In [134]: [{k:float(d[k]) for k in d} for d in lst] #convert values to floats
Out[134]:
[{'a': 12.0, 'b': 3.5, 'c': 4.3, 'd': 5.9},
{'a': 13.0, 'b': 5.7, 'c': 2.8, 'd': 5.2},
{'a': 14.0, 'b': 6.4, 'c': 9.7, 'd': 2.3},
{'a': 15.0, 'b': 6.8, 'c': 4.7, 'd': 3.4}]
EDIT:
to get a list of lists:
In [143]: with open('a.csv') as f:
...:     cr=csv.reader(f)
...:     skip=next(cr) #skip the first row of keys "a,b,c,d"
...:     print([[float(x) for x in l] for l in cr])
...:
[[12.0, 3.5, 4.3, 5.9], [13.0, 5.7, 2.8, 5.2], [14.0, 6.4, 9.7, 2.3], [15.0, 6.8, 4.7, 3.4]]
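
For Python 3, the same idea can be sketched as follows. The file contents here are an assumption (comma-separated with a header row, one row deliberately shorter to mimic the uneven row lengths mentioned in the question, and one blank line), and io.StringIO stands in for the real file:

```python
import csv
import io

# Stand-in for open('a.csv', newline=''); the contents are assumed for illustration.
text = "a,b,c,d\n12.0,3.5,4.3,5.9\n13.0,5.7,2.8\n\n14.0,6.4,9.7,2.3\n"

with io.StringIO(text) as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # csv.reader yields [] for blank lines; 'if row' drops them.
    b = [[float(x) for x in row] for row in reader if row]

print(b)
```

Because each row is converted on its own, rows of different lengths do not need any special handling.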
