I have two datasets, ground_truth and prediction (both are pandas Series). I already plot all prediction points and all ground_truth points. What I want to do now is draw a line between each prediction and its ground_truth point, so that the line connects the prediction point (x1, y1) to the ground_truth point (x2, y2). For a better understanding I attached an image. The black lines (drawn in Paint) show what I want to achieve.
This is what I already have:
fig, ax = plt.subplots()
ax.plot(pred,'ro', label='Prediction', color = 'g')
ax.plot(GT,'^', label='Ground Truth', color = 'r' )
plt.xlabel('a')
plt.ylabel('b')
plt.title('test')
plt.xticks(np.arange(-1, 100, 5))
plt.style.use('ggplot')
plt.legend()
plt.show()
I guess the easiest and most understandable solution is to plot the respective lines between pred and GT in a loop.
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams['legend.numpoints'] = 1
#generate some random data
pred = np.random.rand(10)*70
GT = pred+(np.random.randint(8,40,size= len(pred))*2.*(np.random.randint(2,size=len(pred))-.5 ))
fig, ax = plt.subplots(figsize=(6,4))
# plot a black line between the
# ith prediction and the ith ground truth
for i in range(len(pred)):
    ax.plot([i,i],[pred[i], GT[i]], c="k", linewidth=0.5)
ax.plot(pred,'o', label='Prediction', color = 'g')
ax.plot(GT,'^', label='Ground Truth', color = 'r' )
ax.set_xlim((-1,10))
plt.xlabel('a')
plt.ylabel('b')
plt.title('test')
plt.legend()
plt.show()
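As a possible alternative to the loop, the same vertical connectors can be drawn with a single call to ax.vlines, which takes the x positions plus the two y endpoints. This is only a sketch under the assumption that both series share the same positions 0..N-1; the random data and the names gt, x and connectors are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical example data, similar in spirit to the random data above
rng = np.random.default_rng(0)
pred = rng.random(10) * 70
gt = pred + rng.integers(8, 40, size=pred.size) * rng.choice([-1.0, 1.0], size=pred.size)
x = np.arange(pred.size)

fig, ax = plt.subplots(figsize=(6, 4))
# One thin black segment per index, from (i, pred[i]) to (i, gt[i])
connectors = ax.vlines(x, pred, gt, colors="k", linewidth=0.5)
ax.plot(x, pred, "o", color="g", label="Prediction")
ax.plot(x, gt, "^", color="r", label="Ground Truth")
ax.legend()
```

Call plt.show() (or fig.savefig) afterwards to display the result. ax.vlines returns a single LineCollection, which tends to be lighter than many separate Line2D objects when there are a lot of points.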
You can plot each line as a separate plot, for example by looping and calling plot once for each line connecting two points. However, you can also pass two 2D arrays to plot(x, y, ...). Each column in x is paired with the same column in y, and each pair is drawn as one line. So you need to build these two arrays. It could look something like this:
L = len(pred)
t = np.c_[range(L), range(L)].T
ax.plot(t, np.c_[pred, GT].T, '-k')
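To make the shapes explicit, here is a small self-contained sketch of the same idea. The three data values are invented, and x2d, y2d and lines are just illustrative names. Both arrays have shape (2, N): row 0 holds the start of each line, row 1 the end, and each column becomes one line.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical tiny data set
pred = np.array([10.0, 25.0, 40.0])
gt = np.array([14.0, 20.0, 47.0])
n = len(pred)

# Shape (2, n): column i is [i, i] for x and [pred[i], gt[i]] for y
x2d = np.vstack([np.arange(n), np.arange(n)])
y2d = np.vstack([pred, gt])

fig, ax = plt.subplots()
# plot() draws one line per column and returns one Line2D per column
lines = ax.plot(x2d, y2d, "-k", linewidth=0.5)
ax.plot(pred, "o", color="g", label="Prediction")
ax.plot(gt, "^", color="r", label="Ground Truth")
ax.legend()
```

Here lines is a list with one Line2D per prediction/ground-truth pair, so individual connectors can still be restyled afterwards if needed.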
You can achieve this using matplotlib errorbar (http://matplotlib.org/1.2.1/examples/pylab_examples/errorbar_demo.html), with the idea of drawing error bars around the average of the two lines you are plotting:
Here is a minimal example to show my idea:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# example data
x = np.arange(0.1,10, 0.5)
y1 = pd.Series(np.exp(-x), index = x)
y2 = pd.Series(np.exp(-x)+ np.sin(x), index = x)
avg_line = (y1 + y2)*0.5
err = (avg_line - y1).abs()
fig, ax = plt.subplots(1)
y1.plot(marker = 'o', label='Prediction', color = 'g', linestyle = '', ax = ax)
y2.plot(marker = '^', label='Ground Truth', color = 'r', linestyle = '', ax = ax)
ax.errorbar(x, avg_line.values, yerr=err.values, fmt= 'none', ecolor = 'k', barsabove = False, capthick=0)
plt.style.use('ggplot')
ax.legend()
Hope this solves your problem.
I have an ECDF plot like this:
penguins = sns.load_dataset("penguins")
fig, ax = plt.subplots(figsize = (10,8))
sns.ecdfplot(data=penguins, x="bill_length_mm", hue="species")
ax.axhline(.25, linestyle = '--', color ='#cfcfcf', lw = 2, alpha = 0.75)
How can I find the x values where the curves intersect this axhline?
You could loop through the generated curves (ax.get_lines()), extract their coordinates and search for the index of the first y-value larger than the desired y-value.
Here is some illustrating code (note that sns.ecdfplot() should get ax as parameter):
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
penguins = sns.load_dataset("penguins")
fig, ax = plt.subplots(figsize=(10, 8))
sns.ecdfplot(data=penguins, x="bill_length_mm", hue="species", ax=ax)
y_special = 0.25
for line in ax.get_lines():
    x, y = line.get_data()
    ind = np.argwhere(y >= y_special)[0, 0]  # first index where y >= y_special
    # x[ind] is the desired x-value
    ax.text(x[ind], y_special, f' {x[ind]:.1f}', ha='left', va='top')  # maybe color=line.get_color()
ax.axhline(y_special, linestyle='--', color='#cfcfcf', lw=2, alpha=0.75)
plt.show()
PS: Optionally you could add these x-values to the legend:
for line, legend_text in zip(ax.get_lines(), ax.legend_.get_texts()):
    x, y = line.get_data()
    ind = np.argwhere(y >= y_special)[0, 0]
    legend_text.set_text(f'{x[ind]:5.2f} {legend_text.get_text()}')
This is a case where it's better to use the computational tools that pandas provides instead of trying to back quantitative values out from a visual representation.
If you want the values corresponding to the .25 quantile for each species, you should do:
penguins.groupby("species")["bill_length_mm"].quantile(.25)
which returns
species
Adelie 36.75
Chinstrap 46.35
Gentoo 45.30
Name: bill_length_mm, dtype: float64
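If you need exactly the point where the plotted step curve first reaches the level, rather than an interpolated quantile, that can also be computed directly from the data. The helper below is a hypothetical sketch (ecdf_crossing is not a pandas or seaborn function) and assumes an unweighted ECDF. Keep in mind that pandas' quantile() interpolates linearly by default, so the two results can differ slightly.

```python
import numpy as np

def ecdf_crossing(values, level):
    """Return the smallest data value whose ECDF height is >= level.

    Hypothetical helper; assumes 0 < level <= 1 and an unweighted ECDF.
    """
    x = np.asarray(values, dtype=float)
    x = np.sort(x[~np.isnan(x)])             # drop missing values, then sort
    y = np.arange(1, x.size + 1) / x.size    # ECDF height after each sorted value
    idx = np.searchsorted(y, level, side="left")  # first index with y >= level
    return x[idx]

# Small made-up example: the ECDF steps are 0.25, 0.5, 0.75, 1.0
print(ecdf_crossing([4, 1, 3, 2], 0.25))  # 1.0
print(ecdf_crossing([4, 1, 3, 2], 0.3))   # 2.0
```

For the penguins data this could be applied per species, for example with penguins.groupby("species")["bill_length_mm"].apply(ecdf_crossing, level=0.25).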
I have an issue with customizing the legend of my plot. I did lots of customizing but couldn't get my head around this one. I want the symbols (not the labels) to be equally spaced in the legend. As you can see in the example, the space between the circles in the legend gets smaller as the circles get bigger.
Any ideas?
Also, how can I add a color bar (in addition to the size), with smaller circles being light red (for example) and bigger circles being blue (for example)?
Here is my code so far:
import pandas as pd
import matplotlib.pyplot as plt
from vega_datasets import data as vega_data
gap = pd.read_json(vega_data.gapminder.url)
df = gap.loc[gap['year'] == 2000]
fig, ax = plt.subplots(1, 1,figsize=[14,12])
ax=ax.scatter(df['life_expect'], df['fertility'],
s = df['pop']/100000,alpha=0.7, edgecolor="black",cmap="viridis")
plt.xlabel("X")
plt.ylabel("Y");
kw = dict(prop="sizes", num=6, color="lightgrey", markeredgecolor='black',markeredgewidth=2)
plt.legend(*ax.legend_elements(**kw),bbox_to_anchor=(1, 0),frameon=False,
loc="lower left",markerscale=1,ncol=1,borderpad=2,labelspacing=4,handletextpad=2)
plt.grid()
plt.show()
It's a bit tricky, but you could measure the legend elements and reposition them to have a constant distance in between. Because the positions are computed in pixels, the plot can't be resized afterwards.
I tested the code in PyCharm with the 'Qt5Agg' backend, and in a Jupyter notebook with both %matplotlib inline and %matplotlib notebook. I'm not sure whether it works well in all environments.
Note that ax.scatter doesn't return an ax (contrary to e.g. sns.scatterplot) but a PathCollection holding the created scatter dots.
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.transforms import IdentityTransform
from vega_datasets import data as vega_data
gap = pd.read_json(vega_data.gapminder.url)
df = gap.loc[gap['year'] == 2000]
fig, ax = plt.subplots(1, 1, figsize=[14, 12])
fig.subplots_adjust(right=0.8)
scat = ax.scatter(df['life_expect'], df['fertility'],
s=df['pop'] / 100000, alpha=0.7, edgecolor="black", cmap="viridis")
plt.xlabel("X")
plt.ylabel("Y")
x = 1.1
y = 0.1
is_first = True
kw = dict(prop="sizes", num=6, color="lightgrey", markeredgecolor='black', markeredgewidth=2)
handles, labels = scat.legend_elements(**kw)
inverted_transData = ax.transData.inverted()
for handle, label in zip(handles[::-1], labels[::-1]):
    plt.setp(handle, clip_on=False)
    for _ in range(1 if is_first else 2):
        plt.setp(handle, transform=ax.transAxes)
        if is_first:
            xd, yd = x, y
        else:
            xd, yd = inverted_transData.transform((x, y))
        handle.set_xdata([xd])
        handle.set_ydata([yd])
        ax.add_artist(handle)
        bbox = handle.get_window_extent(fig.canvas.get_renderer())
        y += y - bbox.y0 + 15  # 15 pixels in between
    x = (bbox.x0 + bbox.x1) / 2
    if is_first:
        xd_text, _ = inverted_transData.transform((bbox.x1+10, y))
    ax.text(xd_text, yd, label, transform=ax.transAxes, ha='left', va='center')
    y = bbox.y1
    is_first = False
plt.show()
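For the second part of the question (a color bar in addition to the sizes), one possible approach is to pass the same quantity to both s and c and then attach a colorbar to the returned PathCollection. The sketch below uses synthetic stand-ins for the gapminder columns, so the data and the scaling factor are assumptions; "RdBu" is just one colormap that runs from red (small values) to blue (large values).

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the gapminder columns (hypothetical data)
rng = np.random.default_rng(0)
life_expect = rng.uniform(40, 85, size=50)
fertility = rng.uniform(1, 7, size=50)
pop = rng.uniform(1e6, 1e9, size=50)

fig, ax = plt.subplots(figsize=(8, 6))
# Size and color both encode the population
scat = ax.scatter(life_expect, fertility, s=pop / 1e6, c=pop,
                  cmap="RdBu", alpha=0.7, edgecolor="black")
cbar = fig.colorbar(scat, ax=ax)
cbar.set_label("Population")
ax.set_xlabel("X")
ax.set_ylabel("Y")
```

Call plt.show() afterwards to display the figure. Note that once c is given, the color="lightgrey" entry in the legend_elements keyword dict is what keeps the size legend handles in a neutral color.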
I am trying to plot a chart that shows the Observation data points, along with the corresponding prediction.
However, as I am plotting, the red Observation dots are not appearing on my plot; and I am unsure as to why.
They do appear when I run the following in another line:
fig = plt.figure(figsize = (20,6))
plt.plot(testY, 'r.', markersize=10, label=u'Observations')
plt.plot(predictedY, 'b-', label=u'Prediction')
But the code that I am using to plot does not allow them to show up:
def plotGP(testY, predictedY, sigma):
    fig = plt.figure(figsize = (20,6))
    plt.plot(testY, 'r.', markersize=10, label=u'Observations')
    plt.plot(predictedY, 'b-', label=u'Prediction')
    x = range(len(testY))
    plt.fill(np.concatenate([x, x[::-1]]), np.concatenate([predictedY - 1.9600 * sigma, (predictedY + 1.9600 * sigma)[::-1]]),
             alpha=.5, fc='b', ec='None', label='95% confidence interval')
subset = results_dailyData['2010-01':'2010-12']
testY = subset['electricity-kWh']
predictedY = subset['predictedY']
sigma = subset['sigma']
plotGP(testY, predictedY, sigma)
My current plot, where the red Observation points are not appearing.
The plot when I run the plotting code in its own line. I'd like these dots and the blue line to appear in the plot above:
You may want to consider the following example, where the two cases with and without the fill function from the question are compared.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(0)
import pandas as pd
def plotGP(ax, testY, predictedY, sigma, showfill=False):
    ax.set_title("Show fill {}".format(showfill))
    ax.plot(testY, 'r.', markersize=10, label=u'Observations')
    ax.plot(predictedY, 'b-', label=u'Prediction')
    x = range(len(testY))
    if showfill:
        ax.fill(np.concatenate([x, x[::-1]]), np.concatenate([predictedY - 1.9600 * sigma, (predictedY + 1.9600 * sigma)[::-1]]),
                alpha=.5, fc='b', ec='None', label='95% confidence interval')
x = np.linspace(-5,-2)
y = np.cumsum(np.random.normal(size=len(x)))
sigma = 2
df = pd.DataFrame({"y" : y}, index=x)
fig, (ax, ax2) =plt.subplots(2,1)
plotGP(ax,df.y, df.y, sigma, False)
plotGP(ax2, df.y, df.y, sigma, True)
plt.show()
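As can be seen, the curves and the filled band can sit at completely different positions in the diagram. ax.plot(testY) uses the index of the pandas Series as x values, whereas the fill uses x = range(len(testY)), i.e. 0, 1, 2, and so on. If the index is not 0..N-1 (for example dates, as in the question), the band is drawn far away from the points and the axis limits stretch to contain both, which can make the points hard or impossible to see.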
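One way to keep the band aligned with the curves is to use the Series index as x for the fill as well. The following is a sketch, not the asker's exact setup: it swaps ax.fill for ax.fill_between, uses made-up data with a numeric index, and the name plot_gp is only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def plot_gp(ax, test_y, predicted_y, sigma):
    """Plot observations, prediction and a 95% band that share the Series index as x."""
    ax.plot(test_y, 'r.', markersize=10, label='Observations')
    ax.plot(predicted_y, 'b-', label='Prediction')
    lower = (predicted_y - 1.96 * sigma).to_numpy()
    upper = (predicted_y + 1.96 * sigma).to_numpy()
    # Use the index (not range(len(...))) so the band lines up with the curves
    band = ax.fill_between(predicted_y.index, lower, upper, alpha=0.5,
                           facecolor='b', edgecolor='none',
                           label='95% confidence interval')
    ax.legend()
    return band

# Hypothetical data with an index that does not start at 0
rng = np.random.default_rng(0)
x = np.linspace(-5, -2, 50)
y = pd.Series(np.cumsum(rng.normal(size=len(x))), index=x)

fig, ax = plt.subplots()
band = plot_gp(ax, y, y, sigma=2)
```

Call plt.show() afterwards to display it. With this change the band spans the same x range as the points, here -5 to -2, instead of 0 to 49.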
I want to specify the frequency of markers that are printed in my scatter plot.
After being unsuccessful with markevery (other stackoverflow question: Problems with using markevery) I followed the suggestion to slice my values using the notation of x[::5] and y[::5] for every 5th value.
However, now I get a different error. That is,
Traceback (most recent call last):
File "C:\Users\mkupfer\NASA_SJSU_UARC_work\Info\CodingExamples\PythonExamples\X-Y-Value_Plot_Z-SimTime_02_noSectors.py", line 26, in <module>
timePlot = ax.scatter(x[::5], y[::5], s=50, c=timeList, marker = marker.next(), edgecolors='none', norm=cNorm, cmap = plt.matplotlib.cm.jet) #cm.Spectral_r
File "C:\Python27\lib\site-packages\matplotlib\axes.py", line 5715, in scatter
colors = mcolors.colorConverter.to_rgba_array(c, alpha)
File "C:\Python27\lib\site-packages\matplotlib\colors.py", line 380, in to_rgba_array
raise ValueError("Color array must be two-dimensional")
ValueError: Color array must be two-dimensional
Here is a simplified version of my code:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors as colors
import matplotlib.lines as lines
from matplotlib import cm
import csv
import itertools
import random
#callSignList = [AMF2052,AMF2052,AMF2052,AMF2052,AMF2052]
xList = random.sample(xrange(100), 100)
yList = random.sample(xrange(100), 100)
timeList = random.sample(xrange(100), 100)
#prepare the plot
fig = plt.figure(figsize=(18,13))
ax = fig.add_subplot(111)
cNorm = plt.matplotlib.colors.Normalize(vmin=0, vmax=3600)
marker = itertools.cycle(('o', '^', '+', '8', 's', 'p', 'x'))
x = xList
y = yList
timePlot = ax.scatter(x[::5], y[::5], s=50, c=timeList, marker = marker.next(), edgecolors='none', norm=cNorm, cmap = plt.matplotlib.cm.jet) #cm.Spectral_r
fig.subplots_adjust(top=0.90, bottom=0.15, hspace=0.25,)
# Now adding the colorbar
#fig.colorbar(timePlot, shrink=0.5, aspect=10, orientation="horizontal")
cax = fig.add_axes([0.15, 0.06, 0.7, 0.05])
#The numbers in the square brackets of add_axes refer to [left, bottom, width, height],
#where the coordinates are just fractions that go from 0 to 1 of the plotting area.
#ax.colorbar(timePlot)
cbar = fig.colorbar(timePlot, cax, orientation='horizontal')
cbar.set_label('Relative Simulation Time')
plt.show()
Can someone please give me an idea where I made a mistake?
Any help is appreciated. Thanks.
Your colour list should be the same length as your data. So you need to apply the same slice.
timePlot = ax.scatter(x[::5], y[::5], s=50, c=timeList[::5],
marker = marker.next(), edgecolors='none',
norm=cNorm, cmap = plt.matplotlib.cm.jet)
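For reference, here is a self-contained sketch of the same fix written for Python 3 (range instead of xrange, next(marker) instead of marker.next()). The variable step and the seed are additions for illustration; the key point is that x, y and the color values are all sliced with the same step so their lengths match.

```python
import itertools
import random

import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

random.seed(0)
x = random.sample(range(100), 100)
y = random.sample(range(100), 100)
time_list = random.sample(range(100), 100)

step = 5  # plot every 5th point
marker = itertools.cycle(('o', '^', '+', '8', 's', 'p', 'x'))

fig, ax = plt.subplots()
# Slice positions and colors identically so all three have 20 entries
time_plot = ax.scatter(x[::step], y[::step], s=50, c=time_list[::step],
                       marker=next(marker), edgecolors='none',
                       norm=Normalize(vmin=0, vmax=3600), cmap='jet')
cbar = fig.colorbar(time_plot, ax=ax, orientation='horizontal')
cbar.set_label('Relative Simulation Time')
```

Call plt.show() afterwards to display the figure.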