Stacked histogram with different histtype - python

I'd like to plot different data sets in a stacked histogram, but I want the data on top to have a step type.
I have done this by splitting the data: the first two sets go into a stacked histogram, and the sum of all the data sets goes into a separate step histogram. Here is the code and the plot:
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 100, 10
x1 = list(mu + sigma*np.random.uniform(1,100,100))
x2 = list(mu + sigma*np.random.uniform(1,100,100))
x3 = list(mu + sigma*np.random.uniform(1,100,100))
plt.hist([x1, x2], bins=20, stacked=True, histtype='stepfilled', color=['green', 'red'], zorder=2)
plt.hist(x1+x2+x3, bins=20, histtype='step', ec='dodgerblue', ls='--', linewidth=3., zorder=1)
plt.show()
The problem with this example is that the borders of the 'step' histogram are wider than those of the 'stepfilled' histogram. Is there any way to fix this?

For the bars to coincide, two issues need to be solved:
The bin boundaries of both histograms must be exactly equal. They can be calculated by dividing the range from the overall minimum to the overall maximum into N equal parts, which gives N+1 boundaries (np.linspace(xmin, xmax, N+1)). Both calls to plt.hist need these same bin boundaries.
The thick edge of the 'step' histogram makes its bars look wider. Therefore, the other histogram needs edges of the same width. plt.hist doesn't seem to accept a list of edge colors for the different parts of the stacked histogram, so a fixed edge color needs to be set. Optionally, the edge colors can be changed afterwards by looping through the generated bars.
from matplotlib import pyplot as plt
import numpy as np
mu, sigma = 100, 10
x1 = mu + sigma * np.random.uniform(1, 100, 100)
x2 = mu + sigma * np.random.uniform(1, 100, 100)
x3 = mu + sigma * np.random.uniform(1, 100, 100)
xmin = np.min([x1, x2, x3])
xmax = np.max([x1, x2, x3])
bins = np.linspace(xmin, xmax, 21)
_, _, barlist = plt.hist([x1, x2], bins=bins, stacked=True, histtype='stepfilled',
                         color=['limegreen', 'crimson'], ec='black', linewidth=3, zorder=2)
plt.hist(np.concatenate([x1, x2, x3]), bins=bins, histtype='step',
         ec='dodgerblue', ls='--', linewidth=3, zorder=1)
# give each stacked part an edge in its own face color
for bars in barlist:
    for bar in bars:
        bar.set_edgecolor(bar.get_facecolor())
plt.show()
This is how it would look with cross-hatching (plt.hist(..., hatch='X')) and black edges:
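Why the shared bin edges matter can be checked with np.histogram alone, without plotting. This minimal sketch reuses the data generation from the question, with a seeded generator (an assumption, for reproducibility): with one set of edges, the per-bin counts of the separate sets add up exactly to the counts of the combined data, so the top of the stack and the 'step' outline describe the same bins.

```python
import numpy as np

# Seeded generator (assumption) so the result is reproducible
rng = np.random.default_rng(0)
mu, sigma = 100, 10
x1, x2, x3 = (mu + sigma * rng.uniform(1, 100, 100) for _ in range(3))

# One set of edges for every histogram: 20 equal bins -> 21 boundaries
all_data = np.concatenate([x1, x2, x3])
edges = np.linspace(all_data.min(), all_data.max(), 21)

c1, _ = np.histogram(x1, bins=edges)
c2, _ = np.histogram(x2, bins=edges)
c3, _ = np.histogram(x3, bins=edges)
total, _ = np.histogram(all_data, bins=edges)

stack_top = c1 + c2  # height of the 'stepfilled' stack in each bin
print(np.array_equal(total, c1 + c2 + c3))  # True: per-bin counts add up
print(np.all(stack_top <= total))            # True: the outline never dips below the stack
```

With an integer bins=20 instead, each plt.hist call derives its own range from the data it receives, so the stacked histogram and the 'step' histogram can end up with different edges.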

How to draw the normal distribution of a barplot with log x axis?

I'd like to draw a lognormal distribution over a given bar plot.
Here's the code:
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import numpy as np; np.random.seed(1)
import scipy.stats as stats
import math
inter = 33
x = np.logspace(-2, 1, num=3*inter+1)
yaxis = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0.03,0.3,0.75,1.24,1.72,2.2,3.1,3.9,
4.3,4.9,5.3,5.6,5.87,5.96,6.01,5.83,5.42,4.97,4.60,4.15,3.66,3.07,2.58,2.19,1.90,1.54,1.24,1.08,0.85,0.73,
0.84,0.59,0.55,0.53,0.48,0.35,0.29,0.15,0.15,0.14,0.12,0.14,0.15,0.05,0.05,0.05,0.04,0.03,0.03,0.03, 0.02,
0.02,0.03,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0,0]
fig, ax = plt.subplots()
ax.bar(x[:-1], yaxis, width=np.diff(x), align="center", ec='k', color='w')
ax.set_xscale('log')
plt.xlabel('Diameter (mm)', fontsize='12')
plt.ylabel('Percentage of Total Particles (%)', fontsize='12')
plt.ylim(0,8)
plt.xlim(0.01, 10)
fig.set_size_inches(12, 12)
plt.savefig("Test.png", dpi=300, bbox_inches='tight')
Resulting plot:
What I'm trying to do is to draw the Probability Density Function exactly like the one shown in red in the graph below:
One idea is to convert everything to log space, with u = log10(x), and draw the histogram there. A kde is calculated in the same space, and a normal distribution fitted in u corresponds to a lognormal distribution in x. Everything is drawn as y versus u on a twin axis at the top (ax.twiny()), while x stays on the log-scaled bottom axis. Both axes are aligned by setting the same x-limits, converted to log space for the top axis. Finally, the ticks of the top axis are hidden to get the desired result.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
inter = 33
u = np.linspace(-2, 1, num=3*inter+1)
x = 10**u
us = np.linspace(u[0], u[-1], 500)
yaxis = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0.03,0.3,0.75,1.24,1.72,2.2,3.1,3.9,
4.3,4.9,5.3,5.6,5.87,5.96,6.01,5.83,5.42,4.97,4.60,4.15,3.66,3.07,2.58,2.19,1.90,1.54,1.24,1.08,0.85,0.73,
0.84,0.59,0.55,0.53,0.48,0.35,0.29,0.15,0.15,0.14,0.12,0.14,0.15,0.05,0.05,0.05,0.04,0.03,0.03,0.03, 0.02,
0.02,0.03,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0,0]
yaxis = np.array(yaxis)
# reconstruct data from the given frequencies
u_data = np.repeat((u[:-1] + u[1:]) / 2, (yaxis * 100).astype(int))  # plain int: np.int was removed in NumPy 1.24
kde = stats.gaussian_kde((u[:-1]+u[1:])/2, weights=yaxis, bw_method=0.2)
total_area = (np.diff(u)*yaxis).sum() # total area of all bars; divide by this area to normalize
fig, ax = plt.subplots()
ax2 = ax.twiny()
ax2.bar(u[:-1], yaxis, width=np.diff(u), align="edge", ec='k', color='w', label='frequencies')
ax2.plot(us, total_area*kde(us), color='crimson', label='kde')
ax2.plot(us, total_area * stats.norm.pdf(us, u_data.mean(), u_data.std()), color='dodgerblue', label='lognormal')
ax2.legend()
ax.set_xscale('log')
ax.set_xlabel('Diameter (mm)', fontsize='12')
ax.set_ylabel('Percentage of Total Particles (%)', fontsize='12')
ax.set_ylim(0,8)
xlim = np.array([0.01,10])
ax.set_xlim(xlim)
ax2.set_xlim(np.log10(xlim))
ax2.set_xticks([]) # hide the ticks at the top
plt.tight_layout()
plt.show()
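Why multiplying by total_area lines the curve up with the bars: gaussian_kde returns a density whose integral over u is 1, so scaling it by the total bar area gives a curve with the same area as the bars. This minimal sketch checks that numerically; the bin edges and bar heights are made-up stand-ins for yaxis, and the integral is a plain trapezoid sum.

```python
import numpy as np
from scipy import stats

# Made-up stand-in for the question's data: 30 bins in u = log10(x) space
u = np.linspace(-2, 1, 31)
centers = (u[:-1] + u[1:]) / 2
heights = 6 * np.exp(-((centers + 0.3) / 0.25) ** 2)  # bar heights in %

total_area = (np.diff(u) * heights).sum()  # area of all bars
kde = stats.gaussian_kde(centers, weights=heights, bw_method=0.2)

# Evaluate on a wide, fine grid so the tails of the KDE are included
us = np.linspace(-4, 3, 4001)
curve = total_area * kde(us)

# Trapezoid rule for the area under the scaled curve
area_under_curve = np.sum((curve[1:] + curve[:-1]) / 2 * np.diff(us))
print(total_area, area_under_curve)  # the two values agree closely
```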
PS: This can also be achieved directly, without explicitly using u (at the cost of being slightly more cryptic):
# reuses inter, yaxis and the imports (np, plt, stats) from the code above
fig, ax = plt.subplots()
x = np.logspace(-2, 1, num=3*inter+1)
xs = np.logspace(-2, 1, 500)
total_area = (np.diff(np.log10(x))*yaxis).sum()  # total area of all bars; divide by this area to normalize
kde = stats.gaussian_kde((np.log10(x[:-1])+np.log10(x[1:]))/2, weights=yaxis, bw_method=0.2)
ax.bar(x[:-1], yaxis, width=np.diff(x), align="edge", ec='k', color='w')
ax.plot(xs, total_area*kde(np.log10(xs)), color='crimson')
ax.set_xscale('log')
plt.show()
Note that the bandwidth set for gaussian_kde (bw_method=0.2) is a somewhat arbitrary value. Larger values give a smoother curve; smaller values follow the data more closely. Some experimentation can help.
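The effect of bw_method can be made concrete: a wider kernel smooths the estimate, which can only lower its highest point. A minimal sketch with made-up, two-peaked bar heights (an assumption, not the question's data):

```python
import numpy as np
from scipy import stats

# Made-up two-peaked histogram in log10 space
u = np.linspace(-2, 1, 31)
centers = (u[:-1] + u[1:]) / 2
heights = np.exp(-((centers + 1.0) / 0.2) ** 2) + 0.6 * np.exp(-((centers - 0.2) / 0.2) ** 2)

us = np.linspace(-2.5, 1.5, 2001)
peaks = {}
for bw in (0.1, 0.3, 1.0):
    kde = stats.gaussian_kde(centers, weights=heights, bw_method=bw)
    density = kde(us)
    peaks[bw] = density.max()
    print(f"bw_method={bw}: peak density {density.max():.3f}")
```

With bw_method=0.1 the curve follows the individual bars; with 1.0 the two peaks merge into one broad bump.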

Python manipulate axis (x and y) update

We measure the radius over an entire device (every degree, 360 points). It is around 148 mm and should be between 146 and 150 mm.
If you plot the data with the corresponding limits, you get this:
I'd like to change the axes so that the range between -145 and 145 is compressed, while the ranges 145 to 150 and -150 to -145 are stretched, so I can clearly see the measured values between the limits.
Is that possible with Python?
import matplotlib.pyplot as plt
import matplotlib.scale as mscale
import pandas as pd
#read CSV
EBRData = pd.read_csv('C://Users/vanderey/Documents/MATLAB/EBRTest2.csv', header = 0)
# Define data
Dates = EBRData['Date']
Rx = EBRData['xCoat']
Ry = EBRData['yCoat']
RLSLx = EBRData['xCoat_LSL']
RLSLy = EBRData['yCoat_LSL']
RUSLx = EBRData['xCoat_USL']
RUSLy = EBRData['yCoat_USL']
#Create plot
my_dpi=96
plt.figure(figsize=(480/my_dpi, 480/my_dpi), dpi=my_dpi)
plt.plot(Rx, Ry, color='blue', marker='.', linewidth=1, alpha=0.4)
plt.plot(RLSLx, RLSLy, color='red', marker='.', linewidth=1, alpha=0.4)
plt.plot(RUSLx, RUSLy, color='red', marker='.', linewidth=1, alpha=0.4)
plt.title('EBR')
plt.show()
If the radius is what you want to show, I'd also recommend calculating R from the x and y measurements and plotting it together with the target limits.
You can do so by calculating the complete polar coordinates from your x/y values:
import numpy as np
df = EBRData  # the DataFrame read from the CSV in the question
phi = np.arctan2(df.yCoat, df.xCoat)
R = pd.DataFrame(np.sqrt(df.xCoat.values**2 + df.yCoat.values**2), columns=['R'], index=phi)
If you would rather plot over the nominal angles instead of the actual measured angular positions, you could also set phi to, e.g.:
phi = np.linspace(-np.pi, np.pi, 360, endpoint=False)
Either way, this can simply be plotted as a normal line plot with two limit lines, like
R.plot()
plt.hlines(146, -np.pi, np.pi, 'k')
plt.hlines(150, -np.pi, np.pi, 'k')
or, e.g., as a polar plot:
f = plt.figure()
ax = f.add_subplot(111, projection='polar')
ax.set_rlim(144, 152)
plt.plot(R, 'b.-')
ax.fill_between(np.linspace(-np.pi, np.pi, 360), 140, 146, color='gray')
ax.fill_between(np.linspace(-np.pi, np.pi, 360), 150, 160, color='gray')
To highlight samples outside the allowed range, you can simply add, e.g.:
plt.plot(R[R<146], 'r.')
plt.plot(R[R>150], 'r.')
to immediately see if there's a problem:
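The polar-coordinate step can be tried without the CSV file. In this minimal sketch, the xCoat/yCoat columns are synthetic stand-ins (an assumption: radius near 148 mm with some noise, one point per degree); np.hypot computes the same radius as the square root above, and a boolean mask picks out the points outside the 146 to 150 mm limits.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the CSV columns: one point per degree, radius near 148 mm
rng = np.random.default_rng(0)
phi_nominal = np.linspace(-np.pi, np.pi, 360, endpoint=False)
r_true = 148 + rng.normal(0, 1.0, phi_nominal.size)
df = pd.DataFrame({'xCoat': r_true * np.cos(phi_nominal),
                   'yCoat': r_true * np.sin(phi_nominal)})

# Polar coordinates from the x/y values
phi = np.arctan2(df.yCoat, df.xCoat)
R = pd.Series(np.hypot(df.xCoat, df.yCoat).to_numpy(), index=phi.to_numpy(), name='R')

# Points outside the specification limits
lsl, usl = 146, 150
out_of_spec = R[(R < lsl) | (R > usl)]
print(f"{len(out_of_spec)} of {len(R)} points outside {lsl}-{usl} mm")
```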

Some Data Points not Appearing on PyPlot in Python

I am trying to plot a chart that shows the Observation data points, along with the corresponding prediction.
However, when I plot, the red Observation dots do not appear, and I am unsure why.
They do appear when I run the following separately:
fig = plt.figure(figsize = (20,6))
plt.plot(testY, 'r.', markersize=10, label=u'Observations')
plt.plot(predictedY, 'b-', label=u'Prediction')
But the code that I am using to plot does not allow them to show up:
def plotGP(testY, predictedY, sigma):
    fig = plt.figure(figsize = (20,6))
    plt.plot(testY, 'r.', markersize=10, label=u'Observations')
    plt.plot(predictedY, 'b-', label=u'Prediction')
    x = range(len(testY))
    plt.fill(np.concatenate([x, x[::-1]]), np.concatenate([predictedY - 1.9600 * sigma, (predictedY + 1.9600 * sigma)[::-1]]),
             alpha=.5, fc='b', ec='None', label='95% confidence interval')
subset = results_dailyData['2010-01':'2010-12']
testY = subset['electricity-kWh']
predictedY = subset['predictedY']
sigma = subset['sigma']
plotGP(testY, predictedY, sigma)
My current plot, where the red Observation points are not appearing.
The plot when I run the plotting code on its own. I'd like these dots and the blue line to appear in the plot above:
You may want to consider the following example, where the two cases with and without the fill function from the question are compared.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import pandas as pd
def plotGP(ax, testY, predictedY, sigma, showfill=False):
    ax.set_title("Show fill {}".format(showfill))
    ax.plot(testY, 'r.', markersize=10, label=u'Observations')
    ax.plot(predictedY, 'b-', label=u'Prediction')
    x = range(len(testY))
    if showfill:
        ax.fill(np.concatenate([x, x[::-1]]), np.concatenate([predictedY - 1.9600 * sigma, (predictedY + 1.9600 * sigma)[::-1]]),
                alpha=.5, fc='b', ec='None', label='95% confidence interval')
x = np.linspace(-5,-2)
y = np.cumsum(np.random.normal(size=len(x)))
sigma = 2
df = pd.DataFrame({"y" : y}, index=x)
fig, (ax, ax2) = plt.subplots(2, 1)
plotGP(ax,df.y, df.y, sigma, False)
plotGP(ax2, df.y, df.y, sigma, True)
plt.show()
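As can be seen, the curves may sit at completely different positions in the diagram, depending on the index of the DataFrame: plot draws the observations and the prediction against the index, while the fill uses x = range(len(testY)), i.e. the positions 0 to N-1.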
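A minimal sketch of one way to keep everything aligned, assuming a plain numeric index as in the example above: take the x values from the index for every call, and use fill_between (a swap for the question's fill) for the band. plot_gp_aligned is a hypothetical helper name; the Agg backend is only there so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_gp_aligned(ax, testY, predictedY, sigma):
    """Draw observations, prediction and a 95% band against the same x values."""
    x = testY.index  # one source of x values for every artist
    ax.plot(x, testY, 'r.', markersize=10, label='Observations')
    ax.plot(x, predictedY, 'b-', label='Prediction')
    ax.fill_between(x, predictedY - 1.96 * sigma, predictedY + 1.96 * sigma,
                    alpha=.5, facecolor='b', edgecolor='none',
                    label='95% confidence interval')
    ax.legend()

np.random.seed(0)
x = np.linspace(-5, -2)
y = np.cumsum(np.random.normal(size=len(x)))
df = pd.DataFrame({"y": y}, index=x)

fig, ax = plt.subplots()
plot_gp_aligned(ax, df.y, df.y, 2)
print(ax.dataLim.x0, ax.dataLim.x1)  # x data range is the index range, -5 to -2
```

With the question's date index, x = testY.index would be a DatetimeIndex, which plot and fill_between should both accept as x values.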

Plot two histograms on the same graph and have their columns sum to 100

I have two sets of different sizes that I'd like to plot on the same histogram. However, since one set has ~330,000 values and the other has ~16,000 values, their frequency histograms are hard to compare. I'd like to plot a histogram comparing the two sets such that the y-axis is the % of occurrences in each bin. My code below gets close to this, except that rather than the individual bin values summing to 1.0, the integral of the histogram is 1.0 (this is because of the normed=True parameter).
How can I achieve my goal? I've already tried manually calculating the % frequency and using plt.bar(), but rather than overlaying the plots, the bars are placed side by side. I want to keep the effect of alpha=0.5.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
if plt.get_fignums():
    plt.close('all')
electric = pd.read_csv('electric.tsv', sep='\t')
gas = pd.read_csv('gas.tsv', sep='\t')
electric_df = pd.DataFrame(electric)
gas_df = pd.DataFrame(gas)
electric = electric_df['avg_daily']*30
gas = gas_df['avg_daily']*30
## Create a plot for NGMA gas usage
plt.figure("Usage Comparison")
weights_electric = np.ones_like(electric)/float(len(electric))
weights_gas = np.ones_like(gas)/float(len(gas))
bins=np.linspace(0, 200, num=50)
# density=True replaces normed=True, which newer Matplotlib versions no longer accept
n, bins, rectangles = plt.hist(electric, bins, alpha=0.5, label='electric usage', density=True, weights=weights_electric)
plt.hist(gas, bins, alpha=0.5, label='gas usage', density=True, weights=weights_gas)
plt.legend(loc='upper right')
plt.xlabel('Average 30 day use in therms')
plt.ylabel('% of customers')
plt.title('NGMA Customer Usage Comparison')
plt.show()
It sounds like you don't want the normed/density kwarg in this case. You're already using weights. If you multiply your weights by 100 and leave out the normed=True option, you should get exactly what you had in mind.
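The arithmetic behind the weights: each sample gets a weight of 100/N, so the bar heights of one dataset add up to 100 as long as every sample falls inside the bins, regardless of N. With density=True the heights are instead rescaled so that the area (height times bin width) is 1. A minimal sketch with np.histogram, which plt.hist uses to compute the bar heights; the data sizes loosely mirror the question's and the values are made up.

```python
import numpy as np

def percent_histogram(data, bins):
    """Bin heights as a percentage of the dataset, via weights of 100/N per sample."""
    data = np.asarray(data, dtype=float)
    weights = 100 * np.ones_like(data) / data.size
    return np.histogram(data, bins=bins, weights=weights)

rng = np.random.default_rng(1)
small = rng.normal(5, 2, 16_000)
large = rng.normal(2, 1, 330_000)

# Shared bins covering both datasets completely
combined = np.concatenate([small, large])
bins = np.linspace(combined.min(), combined.max(), 50)

h_small, edges = percent_histogram(small, bins)
h_large, _ = percent_histogram(large, bins)
print(h_small.sum(), h_large.sum())  # both 100 up to rounding

# For comparison: density=True makes the *area* equal 1, not the sum of heights
dens, _ = np.histogram(small, bins=bins, density=True)
print((dens * np.diff(edges)).sum())  # 1 up to rounding
```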
For example:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)
x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 3000000)
xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size
fig, ax = plt.subplots()
ax.hist(x, weights=xweights, color='lightblue', alpha=0.5)
ax.hist(y, weights=yweights, color='salmon', alpha=0.5)
ax.set(title='Histogram Comparison', ylabel='% of Dataset in Bin')
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
On the other hand, what you're currently doing (weights and normed) would result in (note the units on the y-axis):
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)
x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 3000000)
xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size
fig, ax = plt.subplots()
# density=True is the current name of the former normed=True option
ax.hist(x, weights=xweights, color='lightblue', alpha=0.5, density=True)
ax.hist(y, weights=yweights, color='salmon', alpha=0.5, density=True)
ax.set(title='Histogram Comparison', ylabel='% of Dataset in Bin')
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()

Plotting Curves aligned to Dynamic Time Warping Matrix

I have trouble plotting two arrays with the right scaling. I use the dtw package (https://pypi.python.org/pypi/dtw/1.0) to compare the two arrays, x and y. The function dtw returns, among other things, the accumulated cost matrix and the warping path.
With the following code, I can plot the matrix and the path:
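import matplotlib.pyplot as plt
from matplotlib import cm
from numpy.linalg import norm
from dtw import dtw  # the dtw package from PyPI; x and y are the two arrays being compared
dist, cost, acc, path = dtw(x, y, dist=lambda x, y: norm(x - y, ord=1))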
plt.imshow(acc.T, origin='lower', cmap=cm.gray, interpolation='nearest')
plt.colorbar()
plt.plot(path[0], path[1], 'w')
plt.ylim((-0.5, acc.shape[1]-0.5))
plt.xlim((-0.5, acc.shape[0]-0.5))
Resulting figure:
However, I would like to plot the two curves aligned to it, as shown in (http://www.psb.ugent.be/cbd/papers/gentxwarper/DTWalgorithm.htm): one curve above the matrix and the other on the left side, so that you can compare which parts are equal.
As suggested by kwinkunks (see comment), I used this example as a template. Please note that I used "plt.pcolor()" instead of "plt.imshow()" to plot the matrix. This is my code and the resulting figure:
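# Plotting
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import NullFormatter, FormatStrFormatter
nullfmt = NullFormatter()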
# definitions for the axes
left, width = 0.12, 0.60
bottom, height = 0.08, 0.60
bottom_h = 0.16 + width
left_h = left + 0.27
rect_plot = [left_h, bottom, width, height]
rect_x = [left_h, bottom_h, width, 0.2]
rect_y = [left, bottom, 0.2, height]
# start with a rectangular Figure
plt.figure(2, figsize=(8, 8))
axplot = plt.axes(rect_plot)
axx = plt.axes(rect_x)
axy = plt.axes(rect_y)
# Plot the matrix
axplot.pcolor(acc.T,cmap=cm.gray)
axplot.plot(path[0], path[1], 'w')
axplot.set_xlim((0, len(x)))
axplot.set_ylim((0, len(linear)))
axplot.tick_params(axis='both', which='major', labelsize=18)
# Plot time serie horizontal
axx.plot(x,'.', color='k')
axx.tick_params(axis='both', which='major', labelsize=18)
xloc = plt.MaxNLocator(4)
x2Formatter = FormatStrFormatter('%d')
axx.yaxis.set_major_locator(xloc)
axx.yaxis.set_major_formatter(x2Formatter)
# Plot time serie vertical
axy.plot(y,linear,'.',color='k')
axy.invert_xaxis()
yloc = plt.MaxNLocator(4)
xFormatter = FormatStrFormatter('%d')
axy.xaxis.set_major_locator(yloc)
axy.xaxis.set_major_formatter(xFormatter)
axy.tick_params(axis='both', which='major', labelsize=18)
#Limits
axx.set_xlim(axplot.get_xlim())
axy.set_ylim(axplot.get_ylim())
plt.show()
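One way to get the alignment without hand-tuned rectangles is to let Matplotlib share the axes: the top plot shares the x-axis of the matrix and the left plot shares its y-axis, so their limits always match. This is a minimal sketch, not the layout from the linked example; accumulated_cost is a hypothetical plain-NumPy stand-in for the acc matrix returned by the dtw package, and the two sine curves are made-up data. Note that pcolor(acc.T) draws cell i from i to i+1, so points plotted at integer positions sit on cell edges; imshow centers cell i on i, which lines the points up with the cell centers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; use an interactive one plus plt.show() to display
import matplotlib.pyplot as plt
import numpy as np

def accumulated_cost(x, y):
    """Hypothetical stand-in for the dtw package: DTW accumulated cost with |x_i - y_j|."""
    n, m = len(x), len(y)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            acc[i, j] = cost + min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
    return acc[1:, 1:]

x = np.sin(np.linspace(0, 2 * np.pi, 40))
y = np.sin(np.linspace(0, 2 * np.pi, 30) - 0.5)
acc = accumulated_cost(x, y)

fig = plt.figure(figsize=(8, 8))
gs = fig.add_gridspec(2, 2, width_ratios=[1, 3], height_ratios=[1, 3],
                      wspace=0.05, hspace=0.05)
axplot = fig.add_subplot(gs[1, 1])
axx = fig.add_subplot(gs[0, 1], sharex=axplot)  # curve above the matrix
axy = fig.add_subplot(gs[1, 0], sharey=axplot)  # curve left of the matrix

# imshow centers cell (i, j) on the integer coordinates (i, j)
axplot.imshow(acc.T, origin='lower', cmap='gray', interpolation='nearest', aspect='auto')
axx.plot(np.arange(len(x)), x, 'k.')
axy.plot(y, np.arange(len(y)), 'k.')
axy.invert_xaxis()  # mirror the left curve, as in the question
```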
