I am trying to plot a chart that shows the Observation data points, along with the corresponding prediction.
However, the red Observation dots do not appear on my plot, and I am unsure why.
They do appear when I run the following in another line:
fig = plt.figure(figsize = (20,6))
plt.plot(testY, 'r.', markersize=10, label=u'Observations')
plt.plot(predictedY, 'b-', label=u'Prediction')
But the code that I am using to plot does not allow them to show up:
def plotGP(testY, predictedY, sigma):
    fig = plt.figure(figsize = (20,6))
    plt.plot(testY, 'r.', markersize=10, label=u'Observations')
    plt.plot(predictedY, 'b-', label=u'Prediction')
    x = range(len(testY))
    plt.fill(np.concatenate([x, x[::-1]]), np.concatenate([predictedY - 1.9600 * sigma, (predictedY + 1.9600 * sigma)[::-1]]),
             alpha=.5, fc='b', ec='None', label='95% confidence interval')
subset = results_dailyData['2010-01':'2010-12']
testY = subset['electricity-kWh']
predictedY = subset['predictedY']
sigma = subset['sigma']
plotGP(testY, predictedY, sigma)
My current plot, where the red Observation points are not appearing.
The plot when I run the plotting code in its own line. I'd like these dots and the blue line to appear in the plot above:
You may want to consider the following example, where the two cases with and without the fill function from the question are compared.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import pandas as pd
def plotGP(ax, testY, predictedY, sigma, showfill=False):
    ax.set_title("Show fill {}".format(showfill))
    ax.plot(testY, 'r.', markersize=10, label=u'Observations')
    ax.plot(predictedY, 'b-', label=u'Prediction')
    x = range(len(testY))
    if showfill:
        ax.fill(np.concatenate([x, x[::-1]]), np.concatenate([predictedY - 1.9600 * sigma, (predictedY + 1.9600 * sigma)[::-1]]),
                alpha=.5, fc='b', ec='None', label='95% confidence interval')
x = np.linspace(-5,-2)
y = np.cumsum(np.random.normal(size=len(x)))
sigma = 2
df = pd.DataFrame({"y" : y}, index=x)
fig, (ax, ax2) =plt.subplots(2,1)
plotGP(ax,df.y, df.y, sigma, False)
plotGP(ax2, df.y, df.y, sigma, True)
plt.show()
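As can be seen, the curves and the filled band may sit at completely different positions in the diagram, depending on the index of the dataframe: plot draws a Series against its index, whereas the fill is drawn against x = range(len(testY)), i.e. 0, 1, 2, and so on. Once the fill is added, the axes expand to cover both, and the points can end up far outside the region you are looking at.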
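One way to make the band line up with the curves is to draw all three against the same x values. The following is only a sketch, assuming the inputs are pandas Series that share one index; the name plot_gp_aligned and the use of fill_between instead of fill are my own choices, not part of the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_gp_aligned(ax, test_y, predicted_y, sigma):
    """Plot observations, prediction and a 95% band against the Series index."""
    x = np.asarray(test_y.index)   # the x values that ax.plot(series) would use
    obs = np.asarray(test_y)
    pred = np.asarray(predicted_y)
    ax.plot(x, obs, 'r.', markersize=10, label='Observations')
    ax.plot(x, pred, 'b-', label='Prediction')
    band = ax.fill_between(x, pred - 1.96 * sigma, pred + 1.96 * sigma,
                           alpha=.5, facecolor='b', edgecolor='none',
                           label='95% confidence interval')
    ax.legend()
    return band

# Same kind of sample data as in the example above
np.random.seed(0)
x = np.linspace(-5, -2)
df = pd.DataFrame({"y": np.cumsum(np.random.normal(size=len(x)))}, index=x)

fig, ax = plt.subplots()
band = plot_gp_aligned(ax, df.y, df.y, sigma=2)

# The band now spans the same x range as the index
band_x = band.get_paths()[0].vertices[:, 0]
print(band_x.min(), band_x.max())
```

With a datetime index, as in the question, the same idea should apply, since the band and the lines would then share the date axis.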
Related
I'd like to draw a lognormal distribution of a given bar plot.
Here's the code
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import numpy as np; np.random.seed(1)
import scipy.stats as stats
import math
inter = 33
x = np.logspace(-2, 1, num=3*inter+1)
yaxis = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0.03,0.3,0.75,1.24,1.72,2.2,3.1,3.9,
4.3,4.9,5.3,5.6,5.87,5.96,6.01,5.83,5.42,4.97,4.60,4.15,3.66,3.07,2.58,2.19,1.90,1.54,1.24,1.08,0.85,0.73,
0.84,0.59,0.55,0.53,0.48,0.35,0.29,0.15,0.15,0.14,0.12,0.14,0.15,0.05,0.05,0.05,0.04,0.03,0.03,0.03, 0.02,
0.02,0.03,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0,0]
fig, ax = plt.subplots()
ax.bar(x[:-1], yaxis, width=np.diff(x), align="center", ec='k', color='w')
ax.set_xscale('log')
plt.xlabel('Diameter (mm)', fontsize='12')
plt.ylabel('Percentage of Total Particles (%)', fontsize='12')
plt.ylim(0,8)
plt.xlim(0.01, 10)
fig.set_size_inches(12, 12)
plt.savefig("Test.png", dpi=300, bbox_inches='tight')
Resulting plot:
What I'm trying to do is to draw the Probability Density Function exactly like the one shown in red in the graph below:
An idea is to convert everything to logspace, with u = log10(x). Then draw the density histogram in there. And also calculate a kde in the same space. Everything gets drawn as y versus u. When we have u at a top twin axes, x can stay at the bottom. Both axes get aligned by setting the same xlims, but converted to logspace on the top axis. The top axis can be hidden to get the desired result.
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
inter = 33
u = np.linspace(-2, 1, num=3*inter+1)
x = 10**u
us = np.linspace(u[0], u[-1], 500)
yaxis = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0.01,0.03,0.3,0.75,1.24,1.72,2.2,3.1,3.9,
4.3,4.9,5.3,5.6,5.87,5.96,6.01,5.83,5.42,4.97,4.60,4.15,3.66,3.07,2.58,2.19,1.90,1.54,1.24,1.08,0.85,0.73,
0.84,0.59,0.55,0.53,0.48,0.35,0.29,0.15,0.15,0.14,0.12,0.14,0.15,0.05,0.05,0.05,0.04,0.03,0.03,0.03, 0.02,
0.02,0.03,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0,0]
yaxis = np.array(yaxis)
# reconstruct data from the given frequencies
u_data = np.repeat((u[:-1] + u[1:]) / 2, (yaxis * 100).astype(int))  # np.int was removed from NumPy; plain int works
kde = stats.gaussian_kde((u[:-1]+u[1:])/2, weights=yaxis, bw_method=0.2)
total_area = (np.diff(u)*yaxis).sum() # total area of all bars; divide by this area to normalize
fig, ax = plt.subplots()
ax2 = ax.twiny()
ax2.bar(u[:-1], yaxis, width=np.diff(u), align="edge", ec='k', color='w', label='frequencies')
ax2.plot(us, total_area*kde(us), color='crimson', label='kde')
ax2.plot(us, total_area * stats.norm.pdf(us, u_data.mean(), u_data.std()), color='dodgerblue', label='lognormal')
ax2.legend()
ax.set_xscale('log')
ax.set_xlabel('Diameter (mm)', fontsize='12')
ax.set_ylabel('Percentage of Total Particles (%)', fontsize='12')
ax.set_ylim(0,8)
xlim = np.array([0.01,10])
ax.set_xlim(xlim)
ax2.set_xlim(np.log10(xlim))
ax2.set_xticks([]) # hide the ticks at the top
plt.tight_layout()
plt.show()
PS: Apparently this also can be achieved directly without explicitly using u (at the cost of being slightly more cryptic):
x = np.logspace(-2, 1, num=3*inter+1)
xs = np.logspace(-2, 1, 500)
total_area = (np.diff(np.log10(x))*yaxis).sum() # total area of all bars; divide by this area to normalize
kde = stats.gaussian_kde((np.log10(x[:-1])+np.log10(x[1:]))/2, weights=yaxis, bw_method=0.2)
ax.bar(x[:-1], yaxis, width=np.diff(x), align="edge", ec='k', color='w')
ax.plot(xs, total_area*kde(np.log10(xs)), color='crimson')
ax.set_xscale('log')
Note that the bandwidth set for gaussian_kde is a somewhat arbitrary value. Larger values give a smoother curve; smaller values stay closer to the data. Some experimentation can help.
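To get a feel for the bandwidth, it can help to compare two values on the same data. This is a small sketch with made-up bin centers and frequencies (the arrays centers and freqs are hypothetical, not the particle data above); it only shows that a larger bw_method flattens the peak while the area under the curve stays close to 1.

```python
import numpy as np
from scipy import stats

# Hypothetical bin centers (log10 units) and frequencies, for illustration only
centers = np.linspace(-2, 1, 30)
freqs = np.exp(-(centers + 0.3) ** 2 / (2 * 0.2 ** 2))

us = np.linspace(-3, 2, 2000)   # evaluation grid, wide enough to include the tails
du = us[1] - us[0]

peaks = {}
areas = {}
for bw in (0.2, 1.0):
    kde = stats.gaussian_kde(centers, weights=freqs, bw_method=bw)
    density = kde(us)
    peaks[bw] = density.max()          # height of the curve's maximum
    areas[bw] = density.sum() * du     # Riemann sum of the density

print(peaks)
print(areas)
```

Because the density itself integrates to about 1, multiplying it by total_area (as done above) is what brings it to the scale of the bars.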
I want to create a plot for two different datasets similar to the one presented in this answer:
In the above image, the author managed to fix the overlapping problem of the error bars by adding some small random scatter in x to the new dataset.
In my problem, I must plot a similar graphic, but having some categorical data in the x axis:
Any ideas on how to slightly move the error bars of the second dataset when using categorical variables on the x axis? I want to avoid overlap between the bars to make the visualization easier to read.
You can translate each errorbar by adding the default data transform to a prior translation in data space. This is possible when knowing that categories are in general one data unit away from each other.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.transforms import Affine2D
x = list("ABCDEF")
y1, y2 = np.random.randn(2, len(x))
yerr1, yerr2 = np.random.rand(2, len(x))*4+0.3
fig, ax = plt.subplots()
trans1 = Affine2D().translate(-0.1, 0.0) + ax.transData
trans2 = Affine2D().translate(+0.1, 0.0) + ax.transData
er1 = ax.errorbar(x, y1, yerr=yerr1, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(x, y2, yerr=yerr2, marker="o", linestyle="none", transform=trans2)
plt.show()
Alternatively, you could translate the errorbars after applying the data transform and hence move them in units of points.
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
from matplotlib.transforms import ScaledTranslation
x = list("ABCDEF")
y1, y2 = np.random.randn(2, len(x))
yerr1, yerr2 = np.random.rand(2, len(x))*4+0.3
fig, ax = plt.subplots()
trans1 = ax.transData + ScaledTranslation(-5/72, 0, fig.dpi_scale_trans)
trans2 = ax.transData + ScaledTranslation(+5/72, 0, fig.dpi_scale_trans)
er1 = ax.errorbar(x, y1, yerr=yerr1, marker="o", linestyle="none", transform=trans1)
er2 = ax.errorbar(x, y2, yerr=yerr2, marker="o", linestyle="none", transform=trans2)
plt.show()
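While the results look similar in both cases, they are fundamentally different: the first shifts the error bars by 0.1 data units, the second by 5 points (5/72 inch) on the canvas. You will observe this difference when interactively zooming the axes or changing the figure size.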
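The difference can also be checked numerically without zooming by hand. The sketch below (the helper pixel_shift is a name I made up) measures how far each transform moves the point (0, 0) in pixels for two different x ranges.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop for interactive use
import matplotlib.pyplot as plt
from matplotlib.transforms import Affine2D, ScaledTranslation

fig, ax = plt.subplots()

def pixel_shift(transform):
    """Horizontal offset in pixels relative to the plain data transform."""
    shifted = transform.transform((0.0, 0.0))
    plain = ax.transData.transform((0.0, 0.0))
    return shifted[0] - plain[0]

shifts = {}
for width in (5, 50):
    ax.set_xlim(0, width)   # emulate zooming out by a factor of 10
    data_units = Affine2D().translate(0.1, 0.0) + ax.transData
    points = ax.transData + ScaledTranslation(5 / 72, 0, fig.dpi_scale_trans)
    shifts[width] = (pixel_shift(data_units), pixel_shift(points))
    print(width, shifts[width])
```

The data-space shift shrinks on screen as the x range grows, while the shift in points stays the same, so the second variant keeps the two sets of error bars visually separated at any zoom level.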
Consider the following approach to highlight plots - combination of errorbar and fill_between with non-zero transparency:
import random
import matplotlib.pyplot as plt
# create sample data
N = 8
data_1 = {
'x': list(range(N)),
'y': [10. + random.random() for dummy in range(N)],
'yerr': [.25 + random.random() for dummy in range(N)]}
data_2 = {
'x': list(range(N)),
'y': [10.25 + .5 * random.random() for dummy in range(N)],
'yerr': [.5 * random.random() for dummy in range(N)]}
# plot
plt.figure()
# only errorbar
plt.subplot(211)
for data in [data_1, data_2]:
    plt.errorbar(**data, fmt='o')
# errorbar + fill_between
plt.subplot(212)
for data in [data_1, data_2]:
    plt.errorbar(**data, alpha=.75, fmt=':', capsize=3, capthick=1)
    data = {
        'x': data['x'],
        'y1': [y - e for y, e in zip(data['y'], data['yerr'])],
        'y2': [y + e for y, e in zip(data['y'], data['yerr'])]}
    plt.fill_between(**data, alpha=.25)
Result:
There is an example on the matplotlib site: https://matplotlib.org/stable/gallery/lines_bars_and_markers/errorbar_subsample.html
You need the parameter errorevery=(m, n), where n is how often the error bars are drawn (every n-th point) and m is the shift of the first one, in the range from 0 to n.
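A short sketch of how this could look for two series, with made-up data: both use every third point, but the second starts one point later, so the bars never sit at the same x.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data
x = np.arange(10)
y1 = np.sin(x / 3)
y2 = np.cos(x / 3)
yerr = np.full(x.shape, 0.2)

fig, ax = plt.subplots()
n = 3
eb1 = ax.errorbar(x, y1, yerr=yerr, errorevery=(0, n), marker="o", label="errorevery=(0, 3)")
eb2 = ax.errorbar(x, y2, yerr=yerr, errorevery=(1, n), marker="o", label="errorevery=(1, 3)")
ax.legend()

# Each container is (data line, cap lines, bar line collections);
# the number of segments is the number of error bars actually drawn
bars1 = len(eb1[2][0].get_segments())
bars2 = len(eb2[2][0].get_segments())
print(bars1, bars2)
```

Note that this thins out the error bars rather than moving them, so it fits best when the data are dense enough that not every point needs a bar.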
I have two datasets, ground_truth and prediction (both are pandas Series). I have already plotted all prediction points and all ground_truth points. What I want to do now is draw a line between each prediction and its ground_truth point, so that the line connects the prediction point (x1, y1) with the ground_truth point (x2, y2). For a better understanding I attached an image; the black lines (drawn in Paint) are what I want.
This is what I already have:
fig, ax = plt.subplots()
ax.plot(pred,'ro', label='Prediction', color = 'g')
ax.plot(GT,'^', label='Ground Truth', color = 'r' )
plt.xlabel('a')
plt.ylabel('b')
plt.title('test')
plt.xticks(np.arange(-1, 100, 5))
plt.style.use('ggplot')
plt.legend()
plt.show()
I guess the easiest and most understandable solution is to plot the respective lines between pred and GT in a loop.
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['legend.numpoints'] = 1
#generate some random data
pred = np.random.rand(10)*70
GT = pred+(np.random.randint(8,40,size= len(pred))*2.*(np.random.randint(2,size=len(pred))-.5 ))
fig, ax = plt.subplots(figsize=(6,4))
# plot a black line between the
# ith prediction and the ith ground truth
for i in range(len(pred)):
    ax.plot([i,i],[pred[i], GT[i]], c="k", linewidth=0.5)
ax.plot(pred,'o', label='Prediction', color = 'g')
ax.plot(GT,'^', label='Ground Truth', color = 'r' )
ax.set_xlim((-1,10))
plt.xlabel('a')
plt.ylabel('b')
plt.title('test')
plt.legend()
plt.show()
You can plot each line as a separate plot. You could make a loop and call plot for each line connecting the two points. However you could also give the plot(x, y, ...) two 2d arrays as arguments. Each column in x will correspond to the same column in y and are represented by a line in the plot. So you'll need to generate these two. It could look something like this:
L = len(pred)
t = np.c_[range(L), range(L)].T
ax.plot(t, np.c_[pred, GT].T, '-k')
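For reference, here is the same idea as a self-contained sketch with random sample data (the data generation is my own and only for illustration). Passing two arrays of shape (2, L) makes plot draw one line per column.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Random sample data, for illustration only
rng = np.random.default_rng(0)
pred = rng.random(10) * 70
GT = pred + rng.integers(8, 40, size=len(pred))

L = len(pred)
t = np.c_[np.arange(L), np.arange(L)].T   # shape (2, L): x of both end points
ends = np.c_[pred, GT].T                  # shape (2, L): y of both end points

fig, ax = plt.subplots()
connectors = ax.plot(t, ends, '-k', linewidth=0.5)   # one Line2D per column
ax.plot(pred, 'o', color='g', label='Prediction')
ax.plot(GT, '^', color='r', label='Ground Truth')
ax.legend()
print(len(connectors))
```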
You can achieve this using matplotlib errorbar (http://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html), with the idea of drawing error bars around the average of the two lines you are plotting:
Here is a minimal example to show my idea:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# example data
x = np.arange(0.1,10, 0.5)
y1 = pd.Series(np.exp(-x), index = x)
y2 = pd.Series(np.exp(-x)+ np.sin(x), index = x)
avg_line = (y1 + y2)*0.5
err = (avg_line - y1).abs()
fig, ax = plt.subplots(1)
y1.plot(marker = 'o', label='Prediction', color = 'g', linestyle = '', ax = ax)
y2.plot(marker = '^', label='Ground Truth', color = 'r', linestyle = '', ax = ax)
ax.errorbar(x, avg_line.values, yerr=err.values, fmt= 'none', ecolor = 'k', barsabove = False, capthick=0)
plt.style.use('ggplot')
ax.legend()
Hope this solves your problem.
I have two sets of different sizes that I'd like to plot on the same histogram. However, since one set has ~330,000 values and the other has about ~16,000 values, their frequency histograms are hard to compare. I'd like to plot a histogram comparing the two sets such that the y-axis is the % of occurrences in that bin. My code below gets close to this, except that rather than having the individual bin values sum to 1.0, the integral of the histogram sums to 1.0 (this is because of the normed=True parameter).
How can I achieve my goal? I've already tried manually calculating the % frequency and using plt.bar() but rather than overlaying the plots, the plots are compared side by side. I want to keep the effect of having the alpha=0.5
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
if plt.get_fignums():
    plt.close('all')
electric = pd.read_csv('electric.tsv', sep='\t')
gas = pd.read_csv('gas.tsv', sep='\t')
electric_df = pd.DataFrame(electric)
gas_df = pd.DataFrame(gas)
electric = electric_df['avg_daily']*30
gas = gas_df['avg_daily']*30
## Create a plot for NGMA gas usage
plt.figure("Usage Comparison")
weights_electric = np.ones_like(electric)/float(len(electric))
weights_gas = np.ones_like(gas)/float(len(gas))
bins=np.linspace(0, 200, num=50)
n, bins, rectangles = plt.hist(electric, bins, alpha=0.5, label='electric usage', normed=True, weights=weights_electric)
plt.hist(gas, bins, alpha=0.5, label='gas usage', normed=True, weights=weights_gas)
plt.legend(loc='upper right')
plt.xlabel('Average 30 day use in therms')
plt.ylabel('% of customers')
plt.title('NGMA Customer Usage Comparison')
plt.show()
It sounds like you don't want the normed/density kwarg in this case (normed is the old name; newer Matplotlib releases removed it in favor of density). You're already using weights. If you multiply your weights by 100 and leave out the normed=True option, you should get exactly what you had in mind.
For example:
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)
x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 3000000)
xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size
fig, ax = plt.subplots()
ax.hist(x, weights=xweights, color='lightblue', alpha=0.5)
ax.hist(y, weights=yweights, color='salmon', alpha=0.5)
ax.set(title='Histogram Comparison', ylabel='% of Dataset in Bin')
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
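As a quick sanity check of this weighting, the bin heights can be computed with np.histogram and summed. This is only a sketch: the helper percent_per_bin and the choice of shared bin edges are my own additions, and the sample sizes are arbitrary.

```python
import numpy as np

np.random.seed(1)
x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 30000)

# Shared bin edges covering both datasets, so the two histograms are comparable
lo = min(x.min(), y.min())
hi = max(x.max(), y.max())
bins = np.linspace(lo, hi, 41)

def percent_per_bin(values, bins):
    """Return the percentage of `values` that falls into each bin."""
    weights = 100 * np.ones_like(values) / values.size
    heights, _ = np.histogram(values, bins=bins, weights=weights)
    return heights

px = percent_per_bin(x, bins)
py = percent_per_bin(y, bins)
print(px.sum(), py.sum())
```

Each set of bin heights adds up to 100 (up to floating-point rounding) regardless of how many values the dataset has, which is what makes the two histograms directly comparable.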
On the other hand, what you're currently doing (weights and normed) would result in (note the units on the y-axis):
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(1)
x = np.random.normal(5, 2, 10000)
y = np.random.normal(2, 1, 3000000)
xweights = 100 * np.ones_like(x) / x.size
yweights = 100 * np.ones_like(y) / y.size
fig, ax = plt.subplots()
# density=True is the current spelling of the old normed=True
ax.hist(x, weights=xweights, color='lightblue', alpha=0.5, density=True)
ax.hist(y, weights=yweights, color='salmon', alpha=0.5, density=True)
ax.set(title='Histogram Comparison', ylabel='% of Dataset in Bin')
ax.margins(0.05)
ax.set_ylim(bottom=0)
plt.show()
I have the following code:
from mpl_toolkits.axisartist.axislines import SubplotZero  # mpl_toolkits.axes_grid is gone in current Matplotlib
from matplotlib.transforms import BlendedGenericTransform
import matplotlib.pyplot as plt
import numpy
if 1:
    fig = plt.figure(1)
    ax = SubplotZero(fig, 111)
    fig.add_subplot(ax)
    ax.axhline(linewidth=1.7, color="black")
    ax.axvline(linewidth=1.7, color="black")
    plt.xticks([1])
    plt.yticks([])
    ax.text(0, 1.05, 'y', transform=BlendedGenericTransform(ax.transData, ax.transAxes), ha='center')
    ax.text(1.05, 0, 'x', transform=BlendedGenericTransform(ax.transAxes, ax.transData), va='center')
    for direction in ["xzero", "yzero"]:
        ax.axis[direction].set_axisline_style("-|>")
        ax.axis[direction].set_visible(True)
    for direction in ["left", "right", "bottom", "top"]:
        ax.axis[direction].set_visible(False)
    x = numpy.linspace(-1, 1, 10000)
    ax.plot(x, numpy.tan(2*(x - numpy.pi/2)), linewidth=1.2, color="black")
    plt.ylim(-5, 5)
    plt.savefig('graph.png')
which produces this graph:
As you can see, not only is the tan graph sketched, but a portion of line is added to join the asymptotic regions of the tan graph, where an asymptote would normally be.
Is there some built-in way to skip that section? Or do I have to graph separate, disjoint domains of tan that are bounded by the asymptotes (if you get what I mean)?
Something you could try: set a finite threshold and modify your function to provide non-finite values after those points. Practical code modification:
yy = numpy.tan(2*(x - numpy.pi/2))
threshold = 10000
yy[yy>threshold] = numpy.inf
yy[yy<-threshold] = numpy.inf
ax.plot(x, yy, linewidth=1.2, color="black")
Results in:
This code creates a figure with one subplot for the tangent function. NaNs are inserted where cos(x) approaches 0 (NaN means "Not a Number"; NaNs are neither plotted nor connected).
The matplot-fmt-pi package by k-donn (https://pypi.org/project/matplot-fmt-pi/) is used to change the formatter so that the x labels and ticks correspond to multiples of π/8 in fractional format.
Plot formatting (grid, legend, limits, axis) is performed as commented.
import matplotlib.pyplot as plt
import numpy as np
from matplot_fmt_pi import MultiplePi
fig, ax = plt.subplots() # creates a figure and one subplot
x = np.linspace(-2 * np.pi, 2 * np.pi, 1000)
y = np.tan(x)
y[np.abs(np.cos(x)) <= np.abs(np.sin(x[1]-x[0]))] = np.nan
# This operation inserts a NaN where cos(x) is reaching 0
# NaN means "Not a Number" and NaNs are not plotted or connected
ax.plot(x, y, lw=2, color="blue", label='Tangent')
# Set up grid, legend, and limits
ax.grid(True)
ax.axhline(0, color='black', lw=.75)
ax.axvline(0, color='black', lw=.75)
ax.set_title("Trigonometric Functions")
ax.legend(frameon=False) # remove frame legend frame
# axis formatting
ax.set_xlim(-2 * np.pi, 2 * np.pi)
pi_manager = MultiplePi(8) # number= ticks between 0 - pi
ax.xaxis.set_major_locator(pi_manager.locator())
ax.xaxis.set_major_formatter(pi_manager.formatter())
plt.ylim(top=10) # y axis limit values
plt.ylim(bottom=-10)
y_ticks = np.arange(-10, 10, 1)
plt.yticks(y_ticks)
fig
plt.show()
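A variant of the same NaN idea that needs no threshold: tan is increasing on every branch, so any place where the sampled values drop must be a jump across an asymptote. The sketch below, using the function from the question, blanks one sample at each such jump; the names y_broken and n_breaks are mine.

```python
import numpy as np

x = np.linspace(-1, 1, 10000)
y = np.tan(2 * (x - np.pi / 2))

# On each branch tan only increases, so a decrease marks an asymptote crossing
jumps = np.diff(y) < 0

y_broken = y.copy()
y_broken[1:][jumps] = np.nan   # blank the first sample after each jump

n_breaks = int(np.isnan(y_broken).sum())
print(n_breaks)   # number of asymptotes found in the sampled range
```

Passing y_broken instead of y to ax.plot(x, ...) should then leave a gap at each asymptote instead of a connecting line. This relies on the function being monotonic between its poles, so it is not a general recipe for arbitrary discontinuous functions.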