Use slider to plot dataframe columns as scatter plots


I'm trying to use a matplotlib slider to iterate through a list of dataframes and plot the columns as scatter plots.
As the slider progresses, I'd like it to plot the first column of the first dataframe as the x-axis, then, keeping the first column as the x-axis, progress through by individually plotting the remaining columns as the y-axes.
After completing the first dataframe, I'd like it to move on to the next one in the list of dataframes and follow the same general operation.
I am having trouble creating my update function to properly reach into the nested dataframes.
Here is my code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider
data1 = {'col1': ['a1', 'b1', 'c1', 'd1', 'e1'],
         'col2': [0.0, 0.5, 1.0, 1.5, 2.0],
         'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
         'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data2 = {'col1': ['a2', 'b2', 'c2', 'd2', 'e2'],
         'col2': [0.0, 0.5, 1.0, 1.5, 2.0],
         'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
         'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data3 = {'col1': ['a3', 'b3', 'c3', 'd3', 'e3'],
         'col2': [0.0, 0.5, 1.0, 1.5, 2.0],
         'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
         'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
data4 = {'col1': ['a4', 'b4', 'c4', 'd4', 'e4'],
         'col2': [0.0, 0.5, 1.0, 1.5, 2.0],
         'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
         'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
dataframes = [df1, df2, df3, df4]
fig, ax = plt.subplots(figsize=(9, 6))
fig.subplots_adjust(bottom=0.25)
scat = ax.scatter(dataframes[0]['col1'], dataframes[0]['col2'])
ax_pos = fig.add_axes([0.25, 0.1, 0.65, 0.03])
slider = Slider(
    ax=ax_pos,
    label='Slider',
    valmin=1,
    valmax=3,
    valinit=1,
    valstep=1
)
def update(val):
    # I have no idea what to do here
    columns = df1.columns
    columns = columns[1:]
    for x in columns:
        slider.val = x
        col = slider.val
        for idx, df in enumerate(dataframes):
            ax.scatter(dataframes[idx]['col1'], dataframes[idx][col])
    fig.canvas.draw_idle()
slider.on_changed(update)
plt.show()
Running the code as-is produces the initial scatter plot, which is something, but the first adjustment of the slider plots everything, all on the same graph.
I think my issue stems from a very nascent understanding of the relationship between the original scatter plot (scat) and its counterpart inside the update function.
How can I create an update function that allows me to iterate through a list of nested dataframes?
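One possible approach, sketched under the assumption that the goal is a single scatter plot that is updated in place: create the scatter once, and let update() move its points with scat.set_offsets instead of calling ax.scatter again (each ax.scatter call adds a new collection, which is why everything piles up on the same axes). The list `pairs` and the helper `xy` are hypothetical names introduced here; each slider step maps to one (dataframe, y column) pair. Because col1 holds strings, the sketch plots against numeric positions and shows col1 as tick labels, so the a1...e1 and a2...e2 labels are not mixed on one categorical axis.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider

# Same shape of data as in the question: col1 holds labels, col2..col4 hold values.
values = {'col2': [0.0, 0.5, 1.0, 1.5, 2.0],
          'col3': [2.5, 3.0, 3.5, 4.0, 4.5],
          'col4': [5.0, 5.5, 6.0, 6.5, 7.0]}
dataframes = [pd.DataFrame({'col1': [f'{c}{n}' for c in 'abcde'], **values})
              for n in range(1, 5)]

# One slider step per (dataframe index, y column), in the requested order:
# df1 col2, df1 col3, df1 col4, df2 col2, ...
pairs = [(i, col) for i, df in enumerate(dataframes) for col in df.columns[1:]]

fig, ax = plt.subplots(figsize=(9, 6))
fig.subplots_adjust(bottom=0.25)

def xy(step):
    """Return x positions, y values, x tick labels and y column name for one step."""
    i, col = pairs[step]
    df = dataframes[i]
    x = np.arange(len(df))  # numeric x; the col1 strings become tick labels
    return x, df[col].to_numpy(), list(df['col1']), col

x, y, labels, col = xy(0)
scat = ax.scatter(x, y)  # created once; update() only changes its data

ax_pos = fig.add_axes([0.25, 0.1, 0.65, 0.03])
slider = Slider(ax=ax_pos, label='Plot', valmin=0, valmax=len(pairs) - 1,
                valinit=0, valstep=1)

def update(val):
    step = int(val)
    x, y, labels, col = xy(step)
    scat.set_offsets(np.column_stack([x, y]))  # move the existing points
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    # set_offsets does not rescale the axes, so set the y-limits by hand
    pad = 0.1 * (y.max() - y.min()) or 0.5
    ax.set_ylim(y.min() - pad, y.max() + pad)
    ax.set_title(f'df{pairs[step][0] + 1}: col1 vs {col}')
    fig.canvas.draw_idle()

slider.on_changed(update)
update(0)  # apply ticks, limits and title for the first step
plt.show()
```

Driving the plot from a flat list of steps keeps update() free of nested loops: the slider value is just an index into `pairs`.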

Related

How to stack only selected columns in pandas barh plot

I am trying to plot a horizontal bar chart with two bars per row: one stacked, and another unstacked bar beside the stacked one.
I have the first plot, which is stacked.
And another plot, with the same rows.
I want to draw the second plot's bars beside the bars of the first plot, not stacked on top of them.
This is a code snippet to replicate my problem:
import pandas as pd
import matplotlib.pyplot as plt
d = pd.DataFrame({'DC': {'col0': 257334.0,
                         'col1': 0.0,
                         'col2': 0.0,
                         'col3': 186146.0,
                         'col4': 0.0,
                         'col5': 366431.0,
                         'col6': 461.0,
                         'col7': 0.0,
                         'col8': 0.0},
                  'DC - IDC': {'col0': 32665.0,
                               'col1': 0.0,
                               'col2': 156598.0,
                               'col3': 0.0,
                               'col4': 176170.0,
                               'col5': 0.0,
                               'col6': 0.0,
                               'col7': 0.0,
                               'col8': 0.0},
                  'No Address': {'col0': 292442.0,
                                 'col1': 227.0,
                                 'col2': 298513.0,
                                 'col3': 117167.0,
                                 'col4': 249.0,
                                 'col5': 747753.0,
                                 'col6': 271976.0,
                                 'col7': 9640.0,
                                 'col8': 211410.0}})
d[['DC', 'DC - IDC']].plot.barh(stacked=True)
d[['No Address']].plot.barh(stacked=False, color='red')

Use the position parameter to draw the two groups side by side on the same index:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
d[['DC', 'DC - IDC']].plot.barh(width=0.4, position=0, stacked=True, ax=ax)
d[['No Address']].plot.barh(width=0.4, position=1, stacked=True, ax=ax, color='red')
plt.show()

You can also do this with matplotlib.pyplot alone. First, import NumPy and matplotlib:
import matplotlib.pyplot as plt
import numpy as np
Then:
plt.figure(figsize=(15, 8))
y = np.arange(len(d.index))  # numeric y position for each row
plt.barh(y, d['DC'], 0.4, label='DC', align='edge')
plt.barh(y, d['DC - IDC'], 0.4, left=d['DC'], label='DC - IDC', align='edge')  # starts where DC ends, i.e. stacked
plt.barh(y - 0.4, d['No Address'], 0.4, color='red', label='No Address', align='edge')  # shifted down by one bar height
plt.yticks(y, d.index)  # label the rows with the original index
plt.legend()
plt.show()
Here is what I did:
Increase the figure size (optional)
Create a BarContainer for each column
Set the height (thickness) of each bar to 0.4 so that two bars fit in each row
Align the bottom edges of the bars with the y positions (align='edge')
Stack DC - IDC on top of DC by starting it where DC ends (left=d['DC']); without left, both bars start at 0 and overlap instead of stacking
Move the red bars beside the stacked ones by subtracting the bar height (0.4) from each y position: y - 0.4
Label the y positions with the original index
Finally, add a legend
It should look like that:

bar plot with vertical lines for each bar

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'index': ['A', 'B', 'C', 'D'],
                   'first': [1.2, 1.23, 1.32, 1.08],
                   'second': [2, 2.2, 3, 1.08],
                   'max': [1.5, 3, 0.9, np.nan]}).set_index('index')
I want to plot a horizontal bar chart with first and second as bars.
I want to use the max column to draw a vertical line at the corresponding value for each row of bars.
So far I have only managed the bar plot.
Like this:
Any hints on how to achieve this?
thx

I replaced the NaN with a finite value; then you can use the following code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'index': ['A', 'B', 'C', 'D'], 'first': [1.2, 1.23, 1.32, 1.08],
                   'second': [2, 2.2, 3, 1.08], 'max': [1.5, 3, 0.9, 2.5]}).set_index('index')
plt.barh(range(4), df['first'], height=-0.25, align='edge')  # negative height: bar drawn below the tick
plt.barh(range(4), df['second'], height=0.25, align='edge', color='red')  # bar drawn above the tick
plt.yticks(range(4), df.index)
for i, val in enumerate(df['max']):
    plt.vlines(val, i - 0.25, i + 0.25, color='limegreen')  # spans both bars of row i
plt.show()
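If you would rather keep the missing value as a real NaN instead of replacing it, one option is to mask it out and draw the lines only for rows that have a max value. This is a sketch, not the answerer's code; the mask name `has_max` is illustrative. It uses a single vectorised ax.vlines call, which accepts arrays for x, ymin and ymax.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Keep the missing value as NaN instead of replacing it with a finite number.
df = pd.DataFrame({'index': ['A', 'B', 'C', 'D'],
                   'first': [1.2, 1.23, 1.32, 1.08],
                   'second': [2, 2.2, 3, 1.08],
                   'max': [1.5, 3, 0.9, np.nan]}).set_index('index')

y = np.arange(len(df))
fig, ax = plt.subplots()
ax.barh(y, df['first'], height=-0.25, align='edge')               # bar below each tick
ax.barh(y, df['second'], height=0.25, align='edge', color='red')  # bar above each tick
ax.set_yticks(y)
ax.set_yticklabels(df.index)

# Draw a line only where 'max' is present; row D is skipped explicitly.
has_max = df['max'].notna().to_numpy()
lines = ax.vlines(df['max'].to_numpy()[has_max],
                  y[has_max] - 0.25, y[has_max] + 0.25, color='limegreen')
plt.show()
```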

python access second element of list

When I print my list I get something like this
[[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
I want to extract first and second elements from above list into separate lists so that I can ask the plt to plot it for me.
So my results should be
[6.0, 6.1, 6.2 ... 6.8] and [0.5, 1.0, 1.5, 2.0, ... 4.5]
I want to know if there is a cleaner solution than this:
flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)
        break  # get first element of each

You can try list indexing:
data = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
d1 = [item[0] for item in data]
print(d1)
d2 = [item[1] for item in data]
print(d2)
Output:
[6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8]
[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

zip() will provide the required output.
xy = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
x,y = zip(*xy)
print(x)
print(y)
Output:
(6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8)
(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
zip() aggregates elements from each of the iterables passed to it, so zip(x, y) would give back the pairs you currently have (as tuples). Used with the * operator, zip() unzips a list.
Also, there is no need to convert the tuples to lists, since pyplot.plot() accepts array-like arguments.
import matplotlib.pyplot as plt
plt.plot(x,y)
plt.show()
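A small round-trip check of the claim above, that zip(*pairs) unzips and zip(x, y) zips back up; the variable names are illustrative. Note that zip yields tuples, so getting back the exact list-of-lists needs one more conversion.

```python
# zip(*pairs) "unzips" into columns; zip(x, y) pairs them back up.
pairs = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5]]
x, y = zip(*pairs)            # x == (6.0, 6.1, 6.2), y == (0.5, 1.0, 1.5)
rezipped = list(zip(x, y))    # a list of tuples, not lists
print(rezipped)               # [(6.0, 0.5), (6.1, 1.0), (6.2, 1.5)]

# Convert each tuple back to a list if the original shape matters.
restored = [list(p) for p in rezipped]
assert restored == pairs
```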

I would recommend using numpy arrays. For example:
import matplotlib.pyplot as plt
import numpy as np
a= np.array([[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]])
plt.plot(a[:,0], a[:,1])
plt.show()
Output:

Here is an attempt with zip: zip() makes an iterator that aggregates elements from the iterables passed to it and yields tuples, so map() is used to turn each tuple into a list:
l = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
a,b = map(list,zip(*l))
print(a,b)
Output:
[6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8] [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

One-liner using zip built-in and unpacking
>>> original = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
>>> left, right = zip(*original)
>>> left
(6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8)
>>> right
(0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5)
If you would rather have lists than tuples, you can convert them with the map built-in:
>>> left, right = map(list, zip(*original))
>>> left
[6.0, 6.1, 6.2, 6.3, 6.4, 6.5, 6.6, 6.7, 6.8]
>>> right
[0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]

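Lots of pure Python approaches here. But given that your goal is to plot the separated values, I think there's a case to be made for the simplicity of pandas: just drop the list as-is into a DataFrame and call plot():
import pandas as pd
import matplotlib.pyplot as plt
data = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
pd.DataFrame(data).plot(x=0, y=1)
plt.show()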

l = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
a,b=list(zip(*l))
print('first elements:',a)
print('second elements:', b)
To plot:
import matplotlib.pyplot as plt
l = [[6.0, 0.5], [6.1, 1.0], [6.2, 1.5], [6.3, 2.0], [6.4, 2.5], [6.5, 3.0], [6.6, 3.5], [6.7, 4.0], [6.8, 4.5]]
a,b=list(zip(*l))
print('first elements:',a)
print('second elements:', b)
plt.plot(a,b)
plt.show()

Converting dataframe into sub-list or dictionaries

I have the data in tabular format (rows and columns) which I read into a dataframe (Data1) :
       Name   D  Score
0  Angelica  D1    3.5
1  Angelica  D2    2.0
2      Bill  D1    2.0
3      Chan  D3    1.0
...
I am able to convert it into a list using:
Data2 = Data1.values.tolist()
and get the below output:
[
['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
]
What I want is, the output to be like this:
{
    'Angelica': {'D1': 3.5, 'D2': 2.0},
    'Bill': {'D1': 2.0, 'D2': 3.5},
    'Chan': {'D8': 1.0, 'D3': 3.0, 'D4': 5.0},
    'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}
}
How can I achieve this in Python?

You can use a dictionary comprehension after grouping the df by the Name column:
>>> df = pd.DataFrame([{'Name': 'Angela', 'Score': 3.5, 'D': 'D1'}, {'Name': 'Angela', 'Score': 2.0, 'D': 'D2'}, {'Name': 'Bill', 'Score': 2.0, 'D': 'D1'}, {'Name': 'Chan', 'Score': 1.0, 'D': 'D3'}])
>>> df
D Name Score
0 D1 Angela 3.5
1 D2 Angela 2.0
2 D1 Bill 2.0
3 D3 Chan 1.0
>>> data2 = {name: {df.loc[v].D: df.loc[v].Score for v in val} for name, val in df.groupby('Name').groups.items()}
>>> data2
{'Chan': {'D3': 1.0}, 'Angela': {'D1': 3.5, 'D2': 2.0}, 'Bill': {'D1': 2.0}}

You can zip up the values from each group after grouping by Name:
In [4]: l = [
...: ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
...: ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
...: ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
...: ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0]
...: ]
...: columns=["Name", "D", "Score"]
...: df = pd.DataFrame(l, columns=columns)
...:
In [5]: data = {name: dict(zip(v["D"], v["Score"])) for name, v in df.groupby("Name")}
In [6]: data
Out[6]:
{'Angelica': {'D1': 3.5, 'D2': 2.0},
'Bill': {'D1': 2.0, 'D2': 3.5},
'Chan': {'D3': 3.0, 'D4': 5.0, 'D8': 1.0},
'Dan': {'D4': 3.0, 'D5': 4.5, 'D6': 4.0}}

Without pandas, starting from the list Data2:
from collections import defaultdict
result = defaultdict(dict)
for item in Data2:
    result[item[0]].update(dict([item[1:]]))
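An equivalent variant of the defaultdict approach, sketched here with the sample rows from the question: unpacking each row into name, D and score avoids building a one-item dict per row, and converting the result to a plain dict at the end makes it print in the requested shape (a defaultdict would also silently create {} for any misspelled name you look up later).

```python
from collections import defaultdict

# Sample rows in the same [Name, D, Score] layout as Data2 in the question.
Data2 = [
    ['Angelica', 'D1', 3.5], ['Angelica', 'D2', 2.0],
    ['Bill', 'D1', 2.0], ['Bill', 'D2', 3.5],
    ['Chan', 'D8', 1.0], ['Chan', 'D3', 3.0], ['Chan', 'D4', 5.0],
    ['Dan', 'D4', 3.0], ['Dan', 'D5', 4.5], ['Dan', 'D6', 4.0],
]

result = defaultdict(dict)
for name, d, score in Data2:  # unpack each row instead of slicing it
    result[name][d] = score

result = dict(result)  # plain dict: missing names now raise KeyError
print(result)
```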

Append array to beginning of another array

I'm attempting to perform a simple task: append an array to the beginning of another array. Here is an MWE of what I mean:
import numpy as np
a = ['a','b','c','d','e','f','g','h','i']
b = [6,4,1.,2,8,784.,43,6.,2]
c = [8,4.,32.,6,1,7,2.,9,23]
# Define arrays.
a_arr = np.array(a)
bc_arr = np.array([b, c])
# Append a_arr to beginning of bc_arr
print(np.concatenate((a_arr, bc_arr), axis=1))
but I keep getting a ValueError: all the input arrays must have same number of dimensions error.
The arrays a_arr and bc_arr come like that from a different process, so I can't change the way they are created (i.e., I can't use the a, b, c lists).
How can I combine a_arr and bc_arr into a new array that looks like this:
array([['a','b','c','d','e','f','g','h','i'], [6,4,1.,2,8,784.,43,6.,2], [8,4.,32.,6,1,7,2.,9,23]])

Can you do something like this?
In [88]: a = ['a','b','c','d','e','f','g','h','i']
In [89]: b = [6,4,1.,2,8,784.,43,6.,2]
In [90]: c = [8,4.,32.,6,1,7,2.,9,23]
In [91]: joined_arr = np.array([np.array(a), np.array(b), np.array(c)], dtype=object)
In [92]: joined_arr
Out[92]:
array([['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'],
[6.0, 4.0, 1.0, 2.0, 8.0, 784.0, 43.0, 6.0, 2.0],
[8.0, 4.0, 32.0, 6.0, 1.0, 7.0, 2.0, 9.0, 23.0]], dtype=object)

This should work:
In [84]: a=np.atleast_2d(a).astype('object')
In [85]: b=np.atleast_2d(b).astype('object')
In [86]: c=np.atleast_2d(c).astype('object')
In [87]: np.vstack((a,b,c))
Out[87]:
array([[a, b, c, d, e, f, g, h, i],
[6.0, 4.0, 1.0, 2.0, 8.0, 784.0, 43.0, 6.0, 2.0],
[8.0, 4.0, 32.0, 6.0, 1.0, 7.0, 2.0, 9.0, 23.0]], dtype=object)
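Since the question says a_arr and bc_arr arrive ready-made, here is a sketch that works on those two arrays directly (the literal values below only stand in for the arrays from the other process). np.vstack treats a 1-D input as a single row of shape (1, N), which is exactly the dimension mismatch that np.concatenate complained about; casting both inputs to object first keeps the strings as strings and the numbers as floats in one array.

```python
import numpy as np

# Stand-ins for the arrays delivered by the other process (shapes as in the question).
a_arr = np.array(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i'])   # shape (9,)
bc_arr = np.array([[6, 4, 1., 2, 8, 784., 43, 6., 2],
                   [8, 4., 32., 6, 1, 7, 2., 9, 23]])             # shape (2, 9)

# vstack promotes the 1-D a_arr to shape (1, 9) before stacking,
# so no manual reshape is needed; object dtype keeps mixed types intact.
joined = np.vstack((a_arr.astype(object), bc_arr.astype(object)))
print(joined.shape)  # (3, 9)
```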
