Multiple graphs instead of one using Matplotlib - python

The code below takes a dataframe, filters it by a string in one column, and then plots the values of another column.
I plot the values using a histogram, and that worked fine until I added the mean, median and standard deviation. Now I just get an empty graph, where instead all of the variables mentioned below should be plotted together in one graph with their labels.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.read_csv(r'C:/Users/output.csv', delimiter=";", encoding='unicode_escape')
df['Plot_column'] = df['Plot_column'].str.split(',').str[0]
df['Plot_column'] = df['Plot_column'].astype('int64', copy=False)
X=df[df['goal_colum']=='start running']['Plot_column'].values
dev_x= X
mean_=np.mean(dev_x)
median_=np.median(dev_x)
standard_=np.std(dev_x)
plt.hist(dev_x, bins=5)
plt.plot(mean_, label='Mean')
plt.plot(median_, label='Median')
plt.plot(standard_, label='Std Deviation')
plt.title('Data')

https://matplotlib.org/3.1.1/gallery/statistics/histogram_features.html
There are two major ways to plot in matplotlib: the pyplot interface (the easy way) and the Axes interface (the harder way). The Axes interface lets you customize your plot more, and it is worth moving towards it. Try something like the following:
num_bins = 50
fig, ax = plt.subplots()
# the histogram of the data
n, bins, patches = ax.hist(dev_x, num_bins, density=1)
ax.plot(np.mean(dev_x))
ax.plot(np.median(dev_x))
ax.plot(np.std(dev_x))
# Tweak spacing to prevent clipping of ylabel
fig.tight_layout()
plt.show()
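One caveat on both snippets above: passing a single number to plot() gives matplotlib one point with no marker, so nothing visible is drawn for the mean, median or standard deviation. A minimal sketch of one way around this, assuming the goal is to mark the statistics on the histogram itself. The random dev_x is stand-in data, and showing the standard deviation as a shaded band is an illustrative choice, not something from the original posts:

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (an assumption); in the question this is the filtered column.
rng = np.random.default_rng(0)
dev_x = rng.normal(loc=50, scale=10, size=500)

mean_ = np.mean(dev_x)
median_ = np.median(dev_x)
std_ = np.std(dev_x)

fig, ax = plt.subplots()
ax.hist(dev_x, bins=20)

# A scalar has no extent along x, so draw each statistic as a vertical line at its value.
mean_line = ax.axvline(mean_, color="red", linestyle="--", label=f"Mean = {mean_:.1f}")
median_line = ax.axvline(median_, color="black", linestyle=":", label=f"Median = {median_:.1f}")

# The standard deviation is a width, not a position: shade mean +/- one std.
std_band = ax.axvspan(mean_ - std_, mean_ + std_, color="orange", alpha=0.2,
                      label=f"Std Deviation = {std_:.1f}")

ax.set_title("Data")
ax.legend()  # the labels only show up once legend() is called
fig.tight_layout()
```

Call plt.show() (or fig.savefig(...)) afterwards to display the result. The original code also never called legend(), which is a second reason the labels did not appear.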

Related

Matplotlib: orthographic projection of 3D data (in 2D plot)

I'm trying to plot 3D data in 2D using orthographic projection. Here is partially what I'm looking for:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(10,10),facecolor='white')
axs = [fig.add_subplot(223)]
axs.append(fig.add_subplot(224))#,sharey=axs[0]))
axs.append(fig.add_subplot(221))#,sharex=axs[0]))
rng = np.random.default_rng(12345)
values = rng.random((100,3))-.5
values[:,1] = 1.6*values[:,1]
values[:,2] = .5*values[:,2]
for ax,axis in zip(axs,['y','x','z']):
    axis1,axis2={'x':(1,2),'y':(0,2),'z':(0,1)}[axis]
    ax.add_patch(plt.Circle([0,0], radius=.2, color='pink',zorder=-20))
    ax.scatter(values[:,axis1],values[:,axis2])
axs[0].set_xlabel('x')
axs[2].set_ylabel('y')
axs[1].set_xlabel('y')
axs[0].set_ylabel('z')
fig.subplots_adjust(.08,.06,.99,.99,0,0)
plt.show()
There are some issues with this plot and with the fixes I tried. I need an 'equal' aspect so that the circles are actually circles. I also need the circles to be the same size in each subplot. Finally, I would like the space to be optimized (i.e. with as little white space inside and between the subplots as possible).
I have tried sharing the axes between the subplots, then calling .axis('scaled') or .set_aspect('equal','box',share=True) on each axes, but the axes end up not being properly shared, and the circles end up with different sizes in each subplot. And while this crops the subplots to the data, it leaves a lot of space between the subplots. Using .axis('equal') or .set_aspect('equal','datalim',share=True) without shared axes leaves white space inside the subplots, and with shared axes it leaves out some data.
Is there any way to make it work? It would be perfect if it worked on matplotlib 3.4.3.
You can use a common xlim, ylim for your subplots and set your equal ratio with ax.set_aspect(aspect='equal', adjustable='datalim'):
See full code below:
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(10,10),facecolor='white')
axs = [fig.add_subplot(223)]
axs.append(fig.add_subplot(224))#,sharey=axs[0]))
axs.append(fig.add_subplot(221))#,sharex=axs[0]))
rng = np.random.default_rng(12345)
values = rng.random((100,3))-.5
values[:,1] = 1.6*values[:,1]
values[:,2] = .5*values[:,2]
for ax,axis in zip(axs,['y','x','z']):
    axis1,axis2={'x':(1,2),'y':(0,2),'z':(0,1)}[axis]
    ax.add_patch(plt.Circle([0,0], radius=.2, color='pink',zorder=-20))
    ax.scatter(values[:,axis1],values[:,axis2])
    ax.set_xlim([np.amin(values),np.amax(values)])
    ax.set_ylim([np.amin(values),np.amax(values)])
    ax.set_aspect('equal', adjustable='datalim')
axs[0].set_xlabel('x')
axs[2].set_ylabel('y')
axs[1].set_xlabel('y')
axs[0].set_ylabel('z')
fig.subplots_adjust(.08,.06,.99,.99,0,0)
plt.show()
The output gives:
I made it work using gridspec (I replaced scatter with plot to check visually that no data gets left out). It requires some tweaking of the figsize to really minimize the white space within the axes. Thank you to @jylls for the intermediate solution.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec
%matplotlib inline
rng = np.random.default_rng(12345)
values = rng.random((100,3))-.5
values[:,1] = 1.6*values[:,1]
values[:,2] = .5*values[:,2]
fig = plt.figure(figsize=(10,8),facecolor='white')
ranges = np.ptp(values,axis=0)
gs = GridSpec(2, 2, left=.08, bottom=.06, right=.99, top=.99, wspace=0, hspace=0, width_ratios=[ranges[0], ranges[1]], height_ratios=[ranges[1], ranges[2]])
axs = [fig.add_subplot(gs[2])]
axs.append(fig.add_subplot(gs[3]))#,sharey=axs[0]))
axs.append(fig.add_subplot(gs[0]))#,sharex=axs[0]))
for ax,axis in zip(axs,['y','x','z']):
    axis1,axis2={'x':(1,2),'y':(0,2),'z':(0,1)}[axis]
    ax.add_patch(plt.Circle([0,0], radius=.2, color='pink',zorder=-20))
    ax.plot(values[:,axis1],values[:,axis2])
    ax.set_aspect('equal', adjustable='datalim')
axs[0].set_xlabel('x')
axs[2].set_ylabel('y')
axs[1].set_xlabel('y')
axs[0].set_ylabel('z')
plt.show()

How can I rotate axis tickmark labels if I set axis properties before making my plot?

I'm experimenting with seaborn and have a question about specifying axes properties. In my code below, I've taken two approaches to creating a heatmap of a matrix and placing the results on two sets of axes in a figure.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
A=np.random.randn(4,4)
labels=['a','b','c','d']
fig, ax = plt.subplots(2)
sns.heatmap(ax =ax[0], data = A)
ax[0].set_xticks(range(len(labels)))
ax[0].set_xticklabels(labels,fontsize=10,rotation=45)
ax[0].set_yticks(range(len(labels)))
ax[0].set_yticklabels(labels,fontsize=10,rotation=45)
ax[1].set_xticks(range(len(labels)))
ax[1].set_xticklabels(labels,fontsize=10,rotation=45)
ax[1].set_yticks(range(len(labels)))
ax[1].set_yticklabels(labels,fontsize=10,rotation=45)
sns.heatmap(ax =ax[1], data = A,xticklabels=labels, yticklabels=labels)
plt.show()
The resulting figure looks like this:
Normally, I would always take the first approach of creating the heatmap and then specifying axis properties. However, when creating an animation (to be embedded on a tkinter canvas), which is what I'm ultimately interested in doing, I found such an ordering in my update function leads to "flickering" of axis labels. The second approach will eliminate this effect, and it also centers the tickmarks within squares along the axes.
However, the second approach does not rotate the y-axis tickmark labels as desired. Is there a simple fix to this?
I'm not sure this is what you're looking for, but it looks like you draw the second heatmap after you set the yticklabels, so the heatmap call overwrites your yticklabels.
Moving the set_yticklabels call after the heatmap call, as below, would fix your issue.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
A=np.random.randn(4,4)
labels=['a','b','c','d']
fig, ax = plt.subplots(2)
sns.heatmap(ax =ax[0], data = A)
ax[0].set_xticks(range(len(labels)))
ax[0].set_xticklabels(labels,fontsize=10,rotation=45)
ax[0].set_yticks(range(len(labels)))
ax[0].set_yticklabels(labels,fontsize=10,rotation=45)
ax[1].set_xticks(range(len(labels)))
ax[1].set_xticklabels(labels,fontsize=10,rotation=45)
ax[1].set_yticks(range(len(labels)))
sns.heatmap(ax =ax[1], data = A,xticklabels=labels, yticklabels=labels)
ax[1].set_yticklabels(labels,fontsize=10,rotation=45)
plt.show()

Change y axis range of a secondary axis in python Matplotlib

I have two plots overlaid on each other generated by the following code:
import matplotlib.pyplot as plt
import pandas as pd
width=.5
t=pd.DataFrame({'bars':[3.4,3.1,5.1,5.1,3.8,4.2,5.2,4.0,3.6],'lines':[2.4,2.2,2.4,2.1,2.0,2.1,1.9,1.8,1.9]})
t['bars'].plot(kind='bar',width=width)
t['lines'].plot(secondary_y=True, color='red')
ax=plt.gca()
plt.xlim([-width,len(t['bars'])-width])
ax.set_xticklabels(('1','2','3','4','5','6','7','8','9'))
plt.show()
I want to be able to scale the range of the second y axis to go from 0.0 to 2.5 (instead of 1.8 to 2.4) in steps of .5. How can I define this without changing the bar chart at all?
Pandas returns the Axes on which it plots when you call the plot function. Just save that Axes object and modify its limits using the object-oriented approach.
import matplotlib.pyplot as plt
import pandas as pd
width=.5
t=pd.DataFrame({'bars':[3.4,3.1,5.1,5.1,3.8,4.2,5.2,4.0,3.6],'lines':[2.4,2.2,2.4,2.1,2.0,2.1,1.9,1.8,1.9]})
ax1 = t['bars'].plot(kind='bar',width=width)
ax2 = t['lines'].plot(secondary_y=True, color='red')
ax2.set_ylim(0, 2.5)
ax1.set_xlim([-width,len(t['bars'])-width])
ax1.set_xticklabels(('1','2','3','4','5','6','7','8','9'))
plt.show()
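The answer above fixes the range; the question also asked for steps of .5. A minimal sketch of setting both, written with plain matplotlib so it runs without pandas. Here twinx() stands in for secondary_y=True, which is an assumption about equivalence for illustration only; with pandas, the same two calls would go on the ax2 returned by plot(secondary_y=True):

```python
import matplotlib.pyplot as plt
import numpy as np

bars = [3.4, 3.1, 5.1, 5.1, 3.8, 4.2, 5.2, 4.0, 3.6]
lines = [2.4, 2.2, 2.4, 2.1, 2.0, 2.1, 1.9, 1.8, 1.9]
x = np.arange(len(bars))

fig, ax1 = plt.subplots()
ax1.bar(x, bars, width=0.5)
bar_ylim_before = ax1.get_ylim()

# A second y axis that shares the x axis with the bar chart.
ax2 = ax1.twinx()
ax2.plot(x, lines, color="red")

# Fix the range first, then place a tick every 0.5 inside it.
ax2.set_ylim(0, 2.5)
ax2.set_yticks(np.arange(0, 2.5 + 0.5, 0.5))
```

Only ax2 is touched, so the bar chart's own y axis keeps whatever limits it had.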

Plot data from two DataFrame with only one colorbar in a scatter plot

I have two DataFrames for two different datasets that contain the columns RA, Dec, and Vel. I need to plot them on the same scatter plot and show one colorbar instead of two. There's a similar question using pure matplotlib here, but I need to do it using the scatter plot function from pandas. Here's my experiment so far:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data1 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-20,10,5)})
data2 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-10,20,5)})
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='^',ax=ax,label='Methanol',vmin=-20, vmax=20)
data2.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='o',ax=ax,label='Water',vmin=-20, vmax=20)
ax.set_xlabel(r'$\Delta$RA (arcsec.)')
ax.set_ylabel(r'$\Delta$Dec. (arcsec.)')
ax.set_title('Maser Spot')
ax.invert_xaxis()
ax.legend(loc=2)
Using this code, I managed to plot the two DataFrames in one scatter plot, but it shows two colorbars.
Any help is appreciated.
You can just add colorbar = False in the first plot.
The final code will be :
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
data1 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-20,10,5)})
data2 = pd.DataFrame({'RA':np.random.randint(-100,100,5),
'Dec':np.random.randint(-100,100,5),'Vel':np.random.randint(-10,20,5)})
fig, ax = plt.subplots(figsize=(12, 10))
data1.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='^',ax=ax,label='Methanol',vmin=-20, vmax=20,
colorbar=False)
data2.plot.scatter(x='RA',y='Dec',c='Vel',cmap='rainbow',
marker='o',ax=ax,label='Water',vmin=-20, vmax=20)
ax.set_xlabel(r'$\Delta$RA (arcsec.)')
ax.set_ylabel(r'$\Delta$Dec. (arcsec.)')
ax.set_title('Maser Spot')
ax.invert_xaxis()
ax.legend(loc=2)

Vertical line at the end of a CDF histogram using matplotlib

I'm trying to create a CDF but at the end of the graph, there is a vertical line, shown below:
I've read that this is because matplotlib uses the end of the bins to draw the vertical lines, which makes sense, so I added this to my code:
bins = sorted(X) + [np.inf]
where X is the data set I'm using and set the bin size to this when plotting:
plt.hist(X, bins = bins, cumulative = True, histtype = 'step', color = 'b')
This does remove the line at the end and produce the desired effect, however when I normalise this graph now it produces an error:
ymin = max(ymin*0.9, minimum) if not input_empty else minimum
UnboundLocalError: local variable 'ymin' referenced before assignment
Is there any way to either normalise the data with
bins = sorted(X) + [np.inf]
in my code or is there another way to remove the line on the graph?
An alternative way to plot a CDF would be as follows (in my example, X is a bunch of samples drawn from the unit normal):
import numpy as np
import matplotlib.pyplot as plt
X = np.random.randn(10000)
n = np.arange(1,len(X)+1) / len(X)  # np.float was removed in NumPy 1.24; true division already yields floats
Xs = np.sort(X)
fig, ax = plt.subplots()
ax.step(Xs,n)
I needed a solution where I would not need to alter the rest of my code (using plt.hist(...) or, with pandas, dataframe.plot.hist(...)) and that I could reuse easily many times in the same jupyter notebook.
I now use this little helper function to do so:
def fix_hist_step_vertical_line_at_end(ax):
    axpolygons = [poly for poly in ax.get_children() if isinstance(poly, mpl.patches.Polygon)]
    for poly in axpolygons:
        poly.set_xy(poly.get_xy()[:-1])
Which can be used like this (without pandas):
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
X = np.sort(np.random.randn(1000))
fig, ax = plt.subplots()
plt.hist(X, bins=100, cumulative=True, density=True, histtype='step')
fix_hist_step_vertical_line_at_end(ax)
Or like this (with pandas):
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(1000))
fig, ax = plt.subplots()
ax = df.plot.hist(ax=ax, bins=100, cumulative=True, density=True, histtype='step', legend=False)
fix_hist_step_vertical_line_at_end(ax)
This works well even if you have multiple cumulative density histograms on the same axes.
Warning: this may not lead to the wanted results if your axes contain other patches falling under the mpl.patches.Polygon category. That was not my case so I prefer using this little helper function in my plots.
Assuming that your intentions are purely aesthetic, add a vertical line of the same color as your plot background:
ax.axvline(x = value, color = 'white', linewidth = 2)
Where "value" stands for the right extreme of the rightmost bin.
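A minimal sketch of where "value" can come from, assuming the histogram is drawn with hist() as in the question. The second item hist() returns is the array of bin edges, and its last entry is the right edge of the rightmost bin. The random X and the use of ax.get_facecolor() in place of a hard-coded 'white' are illustrative assumptions:

```python
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (an assumption); any 1-D sample works.
rng = np.random.default_rng(0)
X = rng.standard_normal(1000)

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(
    X, bins=100, cumulative=True, density=True, histtype="step", color="b"
)

# bin_edges has len(counts) + 1 entries; the last is the right edge of the rightmost bin.
value = bin_edges[-1]

# Paint over the closing vertical segment with the axes background color.
cover = ax.axvline(x=value, color=ax.get_facecolor(), linewidth=2)
```

The covering line is drawn after the histogram, so it sits on top of the closing segment. It only hides the segment; the polygon itself is unchanged.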
