Removing Data Below A Line In A Scatterplot (Python) - python

So I had code that graphed a 2D histogram of my dataset. I plotted it like so:
histogram = plt.hist2d(fehsc, ofesc, bins=nbins, range=[[-1,.5],[0.225,0.4]])
I wanted to only look at data above a certain line though, so I added the following and it worked just fine:
counts = histogram[0]
xpos = histogram[1]
ypos = histogram[2]
image = histogram[3]
newcounts = counts #we're going to iterate over this
for i in range(nbins):
    xin = xpos[i]
    yin = ypos
    yline = m*xin + b
    reset = np.where(yin < yline) #anything less than yline we want to be 0
    #index = index[0:len(index)-1]
    countout = counts[i]
    countout[reset] = 0
    newcounts[i] = countout
However, I now need to draw a regression line through that cut region. Doing so is not possible (AFAIK) with plt.hist2d, so I'm using plt.scatter. The problem is that I don't know how to make that cut anymore, because I can't index the scatterplot.
I have this now:
plt.xlim(-1,.5)
plt.ylim(.225, .4)
scatter = plt.scatter(fehsc,ofesc, marker = ".")
and I only want to retain the data above some line:
xarr = np.arange(-1,0.5, 0.015)
yarr = m*xarr + b
plt.plot(xarr, yarr, color='r')
I've tried running the loop with some variations of the variables, but I don't understand how to get it to work.

You could define a mask for your data before you plot and then plot only the data points that meet your criteria. Below is an example where all data points above a certain line are plotted in green and all data points below the line are plotted in black.
from matplotlib import pyplot as plt
import numpy as np
#the scatterplot data
xvals = np.random.rand(100)
yvals = np.random.rand(100)
#the line
b = 0.1
m = 1
x = np.linspace(0,1,num=100)
y = m*x+b
mask = yvals > m*xvals+b
plt.scatter(xvals[mask],yvals[mask],color='g')
plt.scatter(xvals[~mask],yvals[~mask],color='k')
plt.plot(x,y,'r')
plt.show()
Hope this helps.
EDIT:
If you want to create a 2D histogram, where the portion below the line is set to zero, you can do that by first generating the histogram using numpy (as an array) and then setting the values inside that array to zero, if the bins fall below the line. After that, you can plot the matrix using plt.pcolormesh:
from matplotlib import pyplot as plt
import numpy as np
#the scatterplot data
xvals = np.random.rand(1000)
yvals = np.random.rand(1000)
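histogram,xbins,ybins = np.histogram2d(xvals,yvals,bins=50)
#np.histogram2d indexes the counts as [x,y]; transpose to [y,x] so the array
#matches the meshgrid below and the row/column convention of plt.pcolormesh
histogram = histogram.T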
#computing the bin centers from the bin edges:
xcenters = 0.5*(xbins[:-1]+xbins[1:])
ycenters = 0.5*(ybins[:-1]+ybins[1:])
#the line
b = 0.1
m = 1
x = np.linspace(0,1,num=100)
y = m*x+b
#hiding the part of the histogram below the line
xmesh,ymesh = np.meshgrid(xcenters,ycenters)
mask = m*xmesh+b > ymesh
histogram[mask] = 0
#making the plot
mat = plt.pcolormesh(xcenters,ycenters,histogram)
line = plt.plot(x,y,'r')
plt.xlim([0,1])
plt.ylim([0,1])
plt.show()
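The original question also asks for a regression line through the retained points. With the boolean mask from the scatter example, the fit can be restricted to those points. The following is a minimal sketch, assuming an ordinary least-squares straight line via np.polyfit is what is wanted; the seeded random data and the names slope, intercept, x_fit and y_fit are illustrative and not part of the original post.

```python
import numpy as np

# Illustrative data; replace with the real arrays (fehsc, ofesc in the question).
rng = np.random.default_rng(0)
xvals = rng.random(100)
yvals = rng.random(100)

# The cut line y = m*x + b.
m = 1
b = 0.1

# Keep only the points strictly above the line.
mask = yvals > m * xvals + b

# Degree-1 least-squares fit through the retained points only.
slope, intercept = np.polyfit(xvals[mask], yvals[mask], 1)

# Sample the fitted line over the x-range of the retained points.
x_fit = np.linspace(xvals[mask].min(), xvals[mask].max(), 50)
y_fit = slope * x_fit + intercept

print(mask.sum(), "points kept; slope =", slope, "intercept =", intercept)
```

x_fit and y_fit can then be drawn on top of the scatter with plt.plot(x_fit, y_fit), in the same way as the red cut line.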

Related

How to set the same y-axis on a graph with matplotlib.pyplot

I am trying to plot a set of data points on the same axis showing the minor differences in their data. When I go to plot the lines, the y-axis is stacked with almost repeating data points instead of mixing the data points. I am limited to using numpy, math, and matplotlib.pyplot.
I apologize in advance if there is a better way to upload graphs.
I am able to get one graph with two lines as desired, but it resets the y-axis for each plot (stacks it). I have tried setting the plt.ylim(0,1000), putting both (x,y) couples in the same plot line amongst other things.
My wrong graph
Correct y-axis graph
I used these imports:
import numpy as np
import matplotlib.pyplot as plt
Here is the code I have been using:
getKey = [m for m in data]
#get reversed list for ideal values
ideal2 = (data[getKey[0]])
ideal2r = []
for k in ideal2:
    ideal2r.insert(0,k)
#get reversed list for actual values
actual2 = (data[getKey[1]])
actual2r = []
for k in actual2:
    actual2r.insert(0,k)
#get reversed list for measured values
measured2 = (data[getKey[2]])
measured2r = []
for k in measured2:
    measured2r.insert(0,k)
#plot the first graph comparing ideal to actual values over increments 0-30(incs)
plt.plot(incs,ideal2r,'b', label = 'ideal')
plt.plot(incs,actual2r,'r', label = 'actual')
#plt.axis('equal')
plt.legend()
plt.show()
#plot the second graph comparing actual to measured values over increments 0-30(incs)
plt.plot(incs,actual2r,'b', label = 'actual')
plt.plot(incs,measured2r,'r', label = 'measured')
#plt.axis('equal')
plt.legend()
plt.show()
When I use this method in a different function it prints on the same y-axis for each line (0,1000)
Other function graph
other method:
k = np.linspace(0,9,10)
x = [1000, 750, 563, 422, 316, 237, 178, 133, 100, 75]
z = [927,870,567,271,132,47,92,79,99,123]
# Complete the rest of the function below this line
r = 200
a = 0.75
x_est = [z[0]]
p_est = [1]
g_est = [0]
x_hat = z[0]
p = 1 #cant be 0
for i in range(len(k)-1):
    #predict
    x_hat = a*x_hat
    p = a*p*a
    #update
    g = p/(p+r) # adding noise prediction
    x_hat = x_hat + g*(z[i] - x_hat)
    p = (1-g)*p
    #store the estimates
    p_est.append(p)
    x_est.append(x_hat)
    g_est.append(g)
plt.plot(k,x,'b', label = 'true state')
plt.plot(k,z,'r', label = 'observation')
plt.plot(k, x_est, 'g', label = 'estimate')
plt.legend()
plt.show()
The data I have been using is contained here
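The data file is not shown, so the cause cannot be confirmed, but a y-axis that lists every value as its own tick, in the order the values appear, is what matplotlib draws when it is given strings rather than numbers. The second function works with plain numeric lists, which is consistent with that. A minimal sketch, assuming the values in data were read from a text file as strings; the dictionary below is a hypothetical stand-in for the real data, and to_reversed_floats is an illustrative helper name:

```python
import numpy as np

# Hypothetical stand-in for `data`: values read from a file arrive as strings.
data = {
    "ideal": ["75", "100", "133", "178"],
    "actual": ["80", "95", "140", "170"],
}

def to_reversed_floats(values):
    """Convert a sequence of numeric strings to floats and reverse the order."""
    return np.array([float(v) for v in values])[::-1]

ideal2r = to_reversed_floats(data["ideal"])
actual2r = to_reversed_floats(data["actual"])

print(ideal2r)
print(actual2r)
```

Plotting ideal2r and actual2r against incs with plt.plot should then place both lines on one shared numeric y-axis.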

Making parts of a line graph a different colour depending on their y value in Matplotlib

I'm making a program which takes a random list of data and will plot it.
I want the colour of the graph to change if it goes above a certain value.
https://matplotlib.org/gallery/lines_bars_and_markers/multicolored_line.html
Matplotlib has an entry on doing just this but it seems to require using a function as input for the graph not using lists.
Does anyone know how to either convert this to work for lists or another way of doing so?
Here's my code so far (without my horrific failed attempts to colour code them)
from matplotlib import pyplot as plt
import random
import sys
import numpy as np
#setting the max and min values where I want the colour to change
A_min = 2
B_max = 28
#makes lists for later
A_min_lin = []
B_max_lin = []
#simulating a corruption of the data where it returns all zeros
sim_crpt = random.randint(0,10)
print(sim_crpt)
randomy = []
if sim_crpt == 0:
    randomy = []
    #making the empty lists for corrupted data
    for i in range(0,20):
        randomy.append(0)
    print(randomy)
else:
    #making a random set of values for the y axis
    for i in range(0,20):
        n = random.randint(0,30)
        randomy.append(n)
    print(randomy)
#making an x axis for time
time = t = np.arange(0, 20, 1)
#Making a list to plot a straight line showing where the maximum and minimum values
for i in range(0, len(time)):
    A_min_lin.append(A_min)
    B_max_lin.append(B_max)
#Testing to see if more than 5 y values are zero to return if it's corrupted
tracker = 0
for i in (randomy):
    if i == 0:
        tracker += 1
if tracker > 5:
    sys.exit("Error, no data")
#ploting and showing the different graphs
plt.plot(time,randomy)
plt.plot(time,A_min_lin)
plt.plot(time,B_max_lin)
plt.legend(['Data', 'Minimum for linear', "Maximum for linear"])
plt.show()
You can use np.interp to generate the fine-grain data to plot:
from matplotlib.collections import LineCollection
# fine-grained time
new_time = np.linspace(time.min(), time.max(), 1000)
# interpolate the y values
new_randomy = np.interp(new_time, time, randomy)
# this is copied from the linked example with a few modifications
points = np.array([new_time, new_randomy]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
fig, axs = plt.subplots()
norm = plt.Normalize(new_randomy.min(), new_randomy.max())
lc = LineCollection(segments, cmap='viridis', norm=norm)
# Set the values used for colormapping
lc.set_array(new_randomy[1:])
lc.set_linewidth(2)
line = axs.add_collection(lc)
fig.colorbar(line, ax=axs)
# set the limits
axs.set_xlim(new_time.min(), new_time.max())
axs.set_ylim(new_randomy.min(), new_randomy.max())
plt.show()
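The code above colours the line continuously by its y value. The question asks for the colour to change only when the data crosses A_min or B_max, which amounts to sorting each interpolated sample into one of three bands. A minimal sketch of that classification, assuming np.digitize is an acceptable way to do it; the fixed randomy list and the name band are illustrative:

```python
import numpy as np

# Thresholds from the question.
A_min = 2
B_max = 28

# Fixed stand-in for the random y values so the example is reproducible.
time = np.arange(0, 20, 1)
randomy = np.array([0, 5, 12, 30, 29, 14, 1, 3, 27, 28,
                    16, 8, 2, 0, 22, 30, 11, 6, 19, 25])

# Fine-grained interpolation, as in the answer above.
new_time = np.linspace(time.min(), time.max(), 1000)
new_randomy = np.interp(new_time, time, randomy)

# Band index per sample: 0 below A_min, 1 from A_min up to (not including) B_max,
# 2 at or above B_max.
band = np.digitize(new_randomy, [A_min, B_max])

print(np.bincount(band))
```

Passing band[1:] to lc.set_array in place of new_randomy[1:], together with a three-colour ListedColormap (from matplotlib.colors) and norm=plt.Normalize(0, 2), should then give each segment one of three fixed colours instead of a gradient.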

How to grid plot 2D categorical data

I have data that looks like:
Name  X     Y
A     HIGH  MID
B     LOW   LOW
C     MID   LOW
D     HIGH  MID
How can I plot this data in a 2-D diagram with a 3x3 grid, adding a random variation to the position of each data point (labelled with its name) so that the points have enough spacing between them?
I tried the following, but I don't know how to plot the values in between the grid lines rather than exactly on them, so that they do not overlap.
import pandas as pd
import matplotlib.pyplot as plt
### Mock Data ###
data = """A0,LOW,LOW
A,MID,MID
B,LOW,MID
C,MID,HIGH
D,LOW,MID
E,HIGH,HIGH"""
df = pd.DataFrame([x.split(',') for x in data.split('\n')])
df.columns = ['name','X','Y']
### Plotting ###
fig,axs = plt.subplots()
axs.scatter(df.X,df.Y,label=df.name)
axs.set_xlabel('X')
axs.set_ylabel('Y')
for i,p in enumerate(df.name):
    axs.annotate(p, (df.X[i],df.Y[i]))
axs.grid()
axs.set_axisbelow(True)
fig.tight_layout()
plt.show()
You can control the positions directly and change the labels on the axes. Your drawing leaves some issues open, such as which label to show when more than one point falls at the same location.
In any case here is a possible solution:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
### Mock Data ###
data = """A0,LOW,LOW
A,MID,MID
B,LOW,MID
C,MID,HIGH
D,LOW,MID
E,HIGH,HIGH"""
df = pd.DataFrame([x.split(',') for x in data.split('\n')])
df.columns = ['name','X','Y']
pos = [0, 1, 2]
lbls = ["LOW", "MID", "HIGH"]
trans = {lbls[i]:pos[i] for i in range(len(pos))}
mat = np.zeros((3, 3), dtype="U10") # This is limited to 10 characters
xxs = []
yys = []
offset = 0.05
for i in range(df.shape[0]):
    xc, yc = trans[df.X[i]], trans[df.Y[i]]
    if mat[xc, yc]=="":
        mat[xc, yc] = df.name[i]
    else:
        mat[xc, yc] = mat[xc, yc] + ";" + df.name[i]
    xxs.append(xc)
    yys.append(yc)
fig,axs = plt.subplots()
axs.scatter(xxs, yys)
for i in range(df.shape[0]):
    name = mat[xxs[i], yys[i]]
    axs.text(xxs[i]+offset, yys[i]+offset, name)
axs.set_xticks(pos)
axs.set_xticklabels(lbls)
axs.set_yticks(pos)
axs.set_yticklabels(lbls)
for i in pos:
    axs.axhline(pos[i]-0.5, color="black")
    axs.axvline(pos[i]-0.5, color="black")
axs.set_xlim(-0.5, 2.5)
axs.set_ylim(-0.5, 2.5)
plt.show()
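The solution above groups the names of coinciding points into one label. The question instead asks for a random variation so that each point gets its own spot inside its grid cell. A minimal sketch of that alternative, assuming a uniform random offset is acceptable; max_jitter, x_base, y_base and the seed are illustrative choices, not from the original post:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded so the layout is reproducible

# Mock data from the question.
names = ["A0", "A", "B", "C", "D", "E"]
xcat = ["LOW", "MID", "LOW", "MID", "LOW", "HIGH"]
ycat = ["LOW", "MID", "MID", "HIGH", "MID", "HIGH"]

# Same category-to-position mapping as in the answer.
trans = {"LOW": 0, "MID": 1, "HIGH": 2}

# Assumed jitter size; it must stay below 0.5 so points remain inside their cell.
max_jitter = 0.35

# Cell centres plus an independent uniform offset per point and per axis.
x_base = np.array([trans[c] for c in xcat], dtype=float)
y_base = np.array([trans[c] for c in ycat], dtype=float)
xs = x_base + rng.uniform(-max_jitter, max_jitter, size=len(names))
ys = y_base + rng.uniform(-max_jitter, max_jitter, size=len(names))

for name, px, py in zip(names, xs, ys):
    print(name, round(px, 2), round(py, 2))
```

xs and ys can be passed to axs.scatter, with one axs.text(xs[i]+offset, ys[i]+offset, names[i]) call per point. Random offsets alone do not guarantee a minimum distance between points in the same cell, so a crowded cell may still need a re-draw or a different seed.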

Linear Regression: Extending line past data and adding a legend

I have this code:
import math
import numpy as np
import pylab as plt1
from matplotlib import pyplot as plt
uH2 = 1.90866638
uHe = 3.60187307
eH2 = 213.38
eHe = 31.96
R = float(uH2*eH2)/(uHe*eHe)
C_Values = []
Delta = []
kHeST = []
J_f21 = []
data = np.genfromtxt("Lamda_HeHCL.txt", unpack=True);
J_i1=data[1];
J_f1=data[2];
kHe=data[7]
data = np.genfromtxt("Basecol_Basic_New_1.txt", unpack=True);
J_i2=data[0];
J_f2=data[1];
kH2=data[5]
print(kHe)
print(kH2)
kHe = np.array(kHe, dtype=float)
kH2 = np.array(kH2, dtype=float)
g = len(kH2)
for n in range(0,g):
    if J_f2[n] == 1:
        Jf21 = J_f2[n]
        J_f21.append(Jf21)
        ratio = kHe[n]/kH2[n]
        C = (((math.log(float(kH2[n]),10)))-(math.log(float(kHe[n]),10)))/math.log(R,10)
        C_Values.append(C)
        St = abs(J_f1[n] - J_i1[n])
        Delta.append(St)
print(C_Values)
print(Delta)
print(J_f21)
fig, ax = plt.subplots()
ax.scatter(Delta,C_Values)
for i, txt in enumerate(J_f21):
    ax.annotate(txt, (Delta[i],C_Values[i]))
plt.plot(np.unique(Delta), np.poly1d(np.polyfit(Delta, C_Values, 1))(np.unique(Delta)))
plt.plot(Delta, C_Values)
fit = np.polyfit(Delta,C_Values,1)
fit_fn = np.poly1d(fit)
# fit_fn is now a function which takes in x and returns an estimate for y
plt.scatter(Delta,C_Values, Delta, fit_fn(Delta))
plt.xlim(0, 12)
plt.ylim(-3, 3)
In this code, I am trying to plot a linear regression that extends past the data and touches the x-axis. I am also trying to add a legend to the plot that shows the slope of the plot. Using the code, I was able to plot this graph.
Here is some trash data I have been using to try and extend the line and add a legend to my code.
x =[5,7,9,15,20]
y =[10,9,8,7,6]
I would also like everything except the linear regression line to be drawn as a scatter plot.
Given that you don't provide the data you're loading from files I was unable to test this, but off the top of my head:
To extend the line past the plot, you could turn this line
plt.plot(np.unique(Delta), np.poly1d(np.polyfit(Delta, C_Values, 1))(np.unique(Delta)))
into something like
x = np.linspace(0, 12, 50) # both 0 and 12 are from visually inspecting the plot
plt.plot(x, np.poly1d(np.polyfit(Delta, C_Values, 1))(x))
But if you want the line extended to the x-axis,
polynomial = np.polyfit(Delta, C_Values, 1)
x = np.linspace(0, *np.roots(polynomial))
plt.plot(x, np.poly1d(polynomial)(x))
As for the scatter plot thing, it seems to me you could just remove this line:
plt.plot(Delta, C_Values)
Oh right, as for the legend, add a label to the plots you make, like this:
plt.plot(x, np.poly1d(polynomial)(x), label='Linear regression')
and add a call to plt.legend() just before plt.show().
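The question also asks for the legend to show the slope. Putting the pieces together on the dummy data from the question gives the sketch below. It assumes a degree-1 np.polyfit, whose first coefficient is the slope; the names slope, intercept, x_cross, x_line, y_line and label are illustrative.

```python
import numpy as np

# Dummy data from the question.
x = np.array([5, 7, 9, 15, 20], dtype=float)
y = np.array([10, 9, 8, 7, 6], dtype=float)

# Degree-1 least-squares fit: polyfit returns [slope, intercept].
slope, intercept = np.polyfit(x, y, 1)

# The fitted line crosses y = 0 at x = -intercept / slope (slope must be non-zero).
x_cross = -intercept / slope

# Sample the line from x = 0 out to the crossing point.
x_line = np.linspace(0, x_cross, 50)
y_line = slope * x_line + intercept

# Legend text that reports the slope.
label = "Linear regression (slope = {:.3f})".format(slope)
print(label)
```

Calling plt.scatter(x, y) for the data, plt.plot(x_line, y_line, label=label) for the fit, and then plt.legend() before plt.show() should produce a scatter plot with the extended line and the slope in the legend.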

Scipy dendrogram with names

I'm using the example dendrogram from this post in my work but would also like to keep track of which row / column is from which piece of data.
I've edited the code with records of names of the data as names as follows and would like to print out the names at the bottom and to the right of the distance matrix visualization. I've tried adding labels = names in the call to dendrogram but this didn't help.
Does anyone know how to add labels to this?
import numpy as np
import pylab
import scipy.cluster.hierarchy as sch
# Generate random features and distance matrix.
x = np.random.rand(40)
D = np.zeros([40,40])
for i in range(40):
    for j in range(40):
        D[i,j] = abs(x[i] - x[j])
### new code
names = [ ]
for i in range(40):
    names.append( 'str%i'%( i ) )
    print(names[-1])
### end new code
### end new code
# Compute and plot first dendrogram.
fig = pylab.figure(figsize=(8,8))
ax1 = fig.add_axes([0.09,0.1,0.2,0.6])
Y = sch.linkage(D, method='centroid')
Z1 = sch.dendrogram(Y, orientation='right')
ax1.set_xticks([])
ax1.set_yticks([])
# Compute and plot second dendrogram.
ax2 = fig.add_axes([0.3,0.71,0.6,0.2])
Y = sch.linkage(D, method='single')
Z2 = sch.dendrogram(Y)
ax2.set_xticks([])
ax2.set_yticks([])
# Plot distance matrix.
axmatrix = fig.add_axes([0.3,0.1,0.6,0.6])
idx1 = Z1['leaves']
idx2 = Z2['leaves']
D = D[idx1,:]
D = D[:,idx2]
im = axmatrix.matshow(D, aspect='auto', origin='lower', cmap=pylab.cm.YlGnBu)
axmatrix.set_xticks([])
axmatrix.set_yticks([])
# Plot colorbar.
#axcolor = fig.add_axes([0.91,0.1,0.02,0.6])
#pylab.colorbar(im, cax=axcolor)
fig.show()
fig.savefig('dendrogram.png')
The Python package heatmapcluster (available on PyPI) that I wrote accepts (in fact, requires) labels.
Here's a simplified version of your script using heatmapcluster:
import numpy as np
import matplotlib.pyplot as plt
from heatmapcluster import heatmapcluster
# Generate random features and distance matrix.
x = np.random.rand(40)
D = np.abs(np.subtract.outer(x, x))
names = ['str%i' % i for i in range(len(x))]
h = heatmapcluster(D, names, names,
                   num_row_clusters=3, num_col_clusters=3,
                   label_fontsize=8,
                   xlabel_rotation=-75,
                   cmap=plt.cm.coolwarm,
                   show_colorbar=True,
                   top_dendrogram=True)
plt.show()
(Note that, for a symmetric array like D, there is really no point in clustering both axes. By symmetry, they will generate the same dendrogram.)
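For staying with plain SciPy, the dictionary returned by sch.dendrogram already carries the information needed to label the matrix: 'leaves' gives the reordered row indices and, when labels is passed, 'ivl' gives the matching names in the same order. A minimal sketch, assuming the 1-D features are clustered directly rather than via the square matrix D; the seed and the name ordered_names are illustrative:

```python
import numpy as np
import scipy.cluster.hierarchy as sch

# Random 1-D features, as in the question (seeded for reproducibility).
rng = np.random.default_rng(0)
x = rng.random(40)
names = ['str%i' % i for i in range(len(x))]

# Cluster the features directly: one observation per row.
Y = sch.linkage(x.reshape(-1, 1), method='single')

# no_plot=True only computes the layout; drop it to draw on the current axes.
Z = sch.dendrogram(Y, labels=names, no_plot=True)

# 'leaves' holds the original indices in plotting order, 'ivl' the matching labels.
ordered_names = Z['ivl']
print(ordered_names[:5])
```

In the question's script, the ordered names could then be applied to the matrix axes with axmatrix.set_xticks(range(40)) and axmatrix.set_xticklabels(ordered_names, rotation=90) in place of axmatrix.set_xticks([]). The calls ax1.set_yticks([]) and ax2.set_xticks([]) remove the tick labels that dendrogram draws, which would explain why passing labels=names appeared to have no effect.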
