I would like to generate random points on an x,y scatter plot that are either above or below a given line. For example, if the line is y=x, I would like to generate a list of points in the top left of the plot (above the line) and a list of points in the bottom right of the plot (below the line). Here's an example where the points are above or below y=5:
import random
import matplotlib.pyplot as plt
num_points = 10
x1 = [random.randrange(start=1, stop=9) for i in range(num_points)]
x2 = [random.randrange(start=1, stop=9) for i in range(num_points)]
y1 = [random.randrange(start=1, stop=5) for i in range(num_points)]
y2 = [random.randrange(start=6, stop=9) for i in range(num_points)]
plt.scatter(x1, y1, c='blue')
plt.scatter(x2, y2, c='red')
plt.show()
However, I generated the x and y points independently, which limits me to equations where y = c (where c is a constant). How can I expand this to any y=mx+b?
You can change the start and stop limits for y1 and y2 so that they follow the line you want. You will need to decide where the plane ends (set lower and upper).
Note this only works for integers: random.randrange needs integer bounds, so m*x+b must be an integer. You can use truncated multivariate distributions if you want something more sophisticated.
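import random
import numpy as np
import matplotlib.pyplot as plt
num_points = 10
m, b = 1, 0
lower, upper = -25, 25
x1 = [random.randrange(start=1, stop=9) for i in range(num_points)]
x2 = [random.randrange(start=1, stop=9) for i in range(num_points)]
# below the line: y in [lower, m*x+b)
y1 = [random.randrange(start=lower, stop=m*x+b) for x in x1]
# above the line: start one past m*x+b so no point lands exactly on it
y2 = [random.randrange(start=m*x+b+1, stop=upper) for x in x2]
plt.plot(np.arange(10), m*np.arange(10)+b)
plt.scatter(x1, y1, c='blue')
plt.scatter(x2, y2, c='red')
plt.show()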
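Since randrange only accepts integers, a float version can swap it for random.uniform. A minimal sketch, assuming you still choose the window yourself; the helper name points_around_line and its x_min/x_max/lower/upper parameters are hypothetical, and lower/upper must enclose the line over the chosen x-range:

```python
import random

def points_around_line(m, b, num_points, x_min, x_max, lower, upper):
    """Return (below, above) lists of float (x, y) points around y = m*x + b.

    Hypothetical helper: lower/upper are the y-limits of the window and are
    assumed to enclose the line over [x_min, x_max].
    """
    below, above = [], []
    for _ in range(num_points):
        x = random.uniform(x_min, x_max)
        # uniform() may return an endpoint, so a point can sit exactly on the line
        below.append((x, random.uniform(lower, m * x + b)))
        x = random.uniform(x_min, x_max)
        above.append((x, random.uniform(m * x + b, upper)))
    return below, above

random.seed(0)
below, above = points_around_line(m=0.5, b=1.2, num_points=10,
                                  x_min=1, x_max=9, lower=-25, upper=25)
```

The two lists can be plotted as before, e.g. plt.scatter(*zip(*below), c='blue') and plt.scatter(*zip(*above), c='red').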
You may as well have my answer too.
This way puts Gaussian noise above and below the line; taking abs() of the noise keeps each point on its intended side. I have deliberately set the mean of the noise to 20 so that the points stand out from the line, which is y = 10*x + 5. You would probably make the mean zero.
>>> import random
>>> def y(x, m, b):
...     return m*x + b
...
>>> import numpy as np
>>> X = np.linspace(0, 10, 100)
>>> y_above = [y(x, 10, 5) + abs(random.gauss(20,5)) for x in X]
>>> y_below = [y(x, 10, 5) - abs(random.gauss(20,5)) for x in X]
>>> import matplotlib.pyplot as plt
>>> plt.scatter(X, y_below, c='g')
>>> plt.scatter(X, y_above, c='r')
>>> plt.show()
Here's the plot.
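With a zero mean, as suggested, the same idea can be vectorized with NumPy's Generator API. Taking abs() of zero-mean Gaussian noise gives half-normal offsets, so points cluster near the line and thin out with distance. A sketch; the helper name noisy_points and the sigma value are arbitrary choices:

```python
import numpy as np

def noisy_points(m, b, x, sigma, rng=None):
    """Return (y_above, y_below) around y = m*x + b using half-normal offsets."""
    rng = np.random.default_rng() if rng is None else rng
    line = m * x + b
    # abs() of zero-mean Gaussian noise is half-normal, i.e. never negative
    y_above = line + np.abs(rng.normal(0.0, sigma, size=x.shape))
    y_below = line - np.abs(rng.normal(0.0, sigma, size=x.shape))
    return y_above, y_below

X = np.linspace(0, 10, 100)
y_above, y_below = noisy_points(10, 5, X, sigma=5, rng=np.random.default_rng(0))
```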
There are many approaches possible, but if your only requirement is that they are above and below the y = mx + b line, then you can simply plug the random x values into the equation and then add or subtract a random y value.
import random
import matplotlib.pyplot as plt
slope = 1
intercept = 0
def ymxb(slope, intercept, x):
    return slope * x + intercept
num_points = 10
x1 = [random.randrange(start=1, stop=9) for i in range(num_points)]
x2 = [random.randrange(start=1, stop=9) for i in range(num_points)]
y1 = [ymxb(slope, intercept, x) - random.randrange(start=1, stop=9) for x in x1]
y2 = [ymxb(slope, intercept, x) + random.randrange(start=1, stop=9) for x in x2]
plt.scatter(x1, y1, c='blue')
plt.scatter(x2, y2, c='red')
plt.show()
That looks like this:
The side of the line that a point (x, y) lies on is given by the sign of y - mx - b: positive above the line, negative below it. You can read more about it here, for example.
import random
import matplotlib.pyplot as plt
num_points = 50
x = [random.randrange(start=1, stop=9) for i in range(num_points)]
y = [random.randrange(start=1, stop=9) for i in range(num_points)]
m = 5
b = -3
colors = ['blue' if y[i] - m * x[i] - b > 0 else 'red' for i in range(num_points) ]
plt.plot([0, 10], [b, 10 * m + b], c='green')
plt.xlim((0, 10))
plt.ylim((0, 10))
plt.scatter(x, y, c=colors)
plt.show()
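The same sign test can drive rejection sampling when you need a fixed number of points on each side: draw candidates uniformly in a box, classify them by the sign of y - m*x - b, and keep drawing until both sides are full. A sketch with NumPy; the helper name split_by_line, the box limits and the batch size are assumptions, and the loop only terminates if the line actually splits the box:

```python
import numpy as np

def split_by_line(m, b, n_each, xlim=(0, 10), ylim=(0, 10), rng=None, batch=1000):
    """Return (above, below), each an (n_each, 2) array of uniform points in the box."""
    rng = np.random.default_rng() if rng is None else rng
    above, below = [], []
    n_above = n_below = 0
    while n_above < n_each or n_below < n_each:
        x = rng.uniform(*xlim, size=batch)
        y = rng.uniform(*ylim, size=batch)
        side = y - m * x - b              # > 0 above the line, < 0 below it
        pts = np.column_stack((x, y))
        above.append(pts[side > 0])
        below.append(pts[side < 0])
        n_above += int(np.count_nonzero(side > 0))
        n_below += int(np.count_nonzero(side < 0))
    return np.concatenate(above)[:n_each], np.concatenate(below)[:n_each]

above, below = split_by_line(m=5, b=-3, n_each=25, rng=np.random.default_rng(0))
```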
Related
I would like to generate a random float point above and below a line created by numpy arrays.
For example I have these line equations:
import numpy as np
import matplotlib.pyplot as plt
x_values = np.linspace(-1, 1, 100)
y1 = 2 * x_values - 5
y2 = -3 * x_values + 2
plt.plot(x_values, y1, '-k')
plt.plot(x_values, y2, '-g')
I have tried this method from Generate random points above and below a line in Python, and it works if np.arange is used like so:
import random
import numpy as np
import matplotlib.pyplot as plt
lower, upper = -25, 25
num_points = 1
x1 = [random.randrange(start=1, stop=9) for i in range(num_points)]
x2 = [random.randrange(start=1, stop=9) for i in range(num_points)]
y1 = [random.randrange(start=lower, stop=(2 * x - 5)) for x in x1]
y2 = [random.randrange(start=(2 * x - 5), stop=upper) for x in x2]
plt.plot(np.arange(10), 2 * np.arange(10) - 5)
plt.scatter(x1, y1, c='blue')
plt.scatter(x2, y2, c='red')
plt.show()
However, I wanted to find a way to generate a random point when np.linspace(-1, 1, 100) is used to create the line graph. The difference is that float coordinates should be allowed too, but I'm unsure how to do that.
Any ideas will be appreciated.
Here is an approach that uses functions for the y-values. Random x positions are chosen uniformly over the x-range. For each random x, a y value is chosen uniformly between the two lines' y-values at that x.
import numpy as np
import matplotlib.pyplot as plt
x_values = np.linspace(-1, 1, 100)
f1 = lambda x: 2 * x - 5
f2 = lambda x: -3 * x + 2
y1 = f1(x_values)
y2 = f2(x_values)
plt.plot(x_values, y1, '-k')
plt.plot(x_values, y2, '-g')
plt.fill_between (x_values, y1, y2, color='gold', alpha=0.2)
num_points = 20
xs = np.random.uniform(x_values[0], x_values[-1], num_points)
ys = np.random.uniform(f1(xs), f2(xs))
plt.scatter(xs, ys, color='crimson')
plt.show()
PS: Note that this simple approach chooses x uniformly over its range. If you need an even distribution over the area of the trapezium, x has to be less probable on the right (where the gap between the lines is narrow) and more probable on the left (where it is wide). You can visualize this with many more points and some transparency: with the simple approach, the right will look denser than the left.
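One way to get that even distribution is inverse-transform sampling: pick x with density proportional to the gap h(x) = f2(x) - f1(x), which is linear for two straight lines, then pick y uniformly inside the gap. A sketch, assuming both functions are linear, f2 >= f1 on [x0, x1] and the gap at x0 is positive; the helper name sample_between is hypothetical:

```python
import numpy as np

def sample_between(f1, f2, x0, x1, n, rng=None):
    """Uniform points over the area between two straight lines f1 <= f2 on [x0, x1]."""
    rng = np.random.default_rng() if rng is None else rng
    h0, h1 = f2(x0) - f1(x0), f2(x1) - f1(x1)
    u = rng.uniform(0.0, 1.0, n)
    # Invert the CDF of the density proportional to h0 + (h1 - h0) * s on s in [0, 1].
    # This rationalized root also covers h0 == h1, where it reduces to s = u.
    s = (h0 + h1) * u / (h0 + np.sqrt((1 - u) * h0**2 + u * h1**2))
    xs = x0 + s * (x1 - x0)
    # Given x, pick y uniformly inside the vertical gap at that x.
    ys = f1(xs) + rng.uniform(0.0, 1.0, n) * (f2(xs) - f1(xs))
    return xs, ys

f1 = lambda x: 2 * x - 5
f2 = lambda x: -3 * x + 2
xs, ys = sample_between(f1, f2, -1, 1, 100_000, rng=np.random.default_rng(0))
```

Plotting xs, ys with a small marker size and low alpha should show a uniform density, comparable to the mirroring approach in the next answer.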
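The following code first generates x,y points uniformly in a parallelogram built on the lower line, and then maps the points that fall above the upper line back to their mirror positions inside the trapezium. Because the gap between the lines is linear in x, the gaps at x and at x0 + x1 - x always add up to h0 + h1, so the mirrored points fill the trapezium exactly. The code looks like: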
import numpy as np
import matplotlib.pyplot as plt
x0, x1 = -1, 1
x_values = np.linspace(x0, x1, 100)
f1 = lambda x: 2 * x - 5
f2 = lambda x: -3 * x + 2
y1 = f1(x_values)
y2 = f2(x_values)
plt.plot(x_values, y1, '-k')
plt.plot(x_values, y2, '-g')
plt.fill_between(x_values, y1, y2, color='gold', alpha=0.2)
num_points = 100_000
h0 = f2(x0) - f1(x0)
h1 = f2(x1) - f1(x1)
xs1 = np.random.uniform(x0, x1, num_points)
ys1 = np.random.uniform(0, h0 + h1, num_points) + f1(xs1)
xs = np.where(ys1 <= f2(xs1), xs1, x0 + x1 - xs1)
ys = np.where(ys1 <= f2(xs1), ys1, f1(xs) + h0 + h1 + f1(xs1) - ys1)
plt.scatter(xs, ys, color='crimson', alpha=0.2, ec='none', s=1)
plt.show()
Plot comparing the two approaches:
First of all, if you have 2 intersecting lines, there will most likely be a triangle in which you can pick random points. This is dangerously close to Bertrand's paradox, so make sure that your RNG suits its purpose.
If you don't really care about how "skewed" your randomness is, try this:
import numpy as np
left, right = -1, 1
# x_values = np.linspace(left, right, 100)
k1, k2 = 2, -3
b1, b2 = -5, 2
y1 = lambda x: k1*x + b1
y2 = lambda x: k2*x + b2
# If you need a point above the 1st equation, but below the second one.
# Check the limits where you can pick the points under this condition.
if k1 == k2:
    # Parallel lines: y1 < y2 either everywhere or nowhere
    nosol = b1 >= b2
else:
    inters = (b2-b1)/(k1-k2)
    if k1 > k2:
        # y1 < y2 only left of the intersection; clip the range, never widen it
        right = min(right, inters)
    else:
        # y1 < y2 only right of the intersection
        left = max(left, inters)
    nosol = left >= right
if nosol:
    print('No solution')
else:
    # Pick random X between "left" and "right"
    # Pick whatever distribution you like or need
    rand_x = np.random.uniform(left,right)
    rand_y = np.random.uniform(y1(rand_x),y2(rand_x))
    print(f'Random point is ({round(rand_x,2)}, {round(rand_y,2)})')
If your random X needs to belong to a specific number sequence, use some other np.random function: randint, choice...
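For instance, if x must come from the x_values grid in the question, a sketch with NumPy's Generator.choice could look like the following. Filtering the grid with a boolean mask also sidesteps the intersection bookkeeping, at the cost of only checking the grid points themselves:

```python
import numpy as np

left, right = -1, 1
x_values = np.linspace(left, right, 100)
k1, k2 = 2, -3
b1, b2 = -5, 2
y1 = lambda x: k1 * x + b1
y2 = lambda x: k2 * x + b2

rng = np.random.default_rng()
# keep only grid points where there is room between the lines (y1 < y2)
valid_x = x_values[y1(x_values) < y2(x_values)]
if valid_x.size == 0:
    print('No solution')
else:
    rand_x = rng.choice(valid_x)                  # x restricted to the linspace grid
    rand_y = rng.uniform(y1(rand_x), y2(rand_x))  # y is still any float in the gap
    print(f'Random point is ({rand_x:.2f}, {rand_y:.2f})')
```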
I have a function that is split into 3 pieces: (0 < x < L1), (L1 < x < a) and (a < x < L2).
I need to annotate the plot at the maximum value, wherever it occurs on 0 < x < L2.
I have:
c1 = np.arange(0,L1+0.1, 0.1)
c2 = np.arange(L1,a+0.1, 0.1)
c3 = np.arange(a,L+0.1, 0.1)
y1 = -q*c1
y2 = -q*c2 + RAV
y3 = -q*c3 + RAV - P
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.fill_between(c1, y1)
ax1.fill_between(c2, y2)
ax1.fill_between(c3, y3)
ax1.set_title('S curve')
Mmax1=np.max(y1)
Mmax2=np.max(y2)
Mmax3=np.max(y3)
Mmax= round(max(Mmax1,Mmax2,Mmax3), 2)
Now I want to find the x coordinate of the y value Mmax, but I don't know how to use something like x[np.argmax(Mmax)] where x = a.any(c1, c2, c3).
I need the x coordinate so that I can annotate the point where the value occurs:
ax2.annotate(text2,
xy=(max_x, Mmax), xycoords='data',
xytext=(0, 30), textcoords='offset points',
arrowprops=dict(arrowstyle="->"))
How can I fix it? Thank you!
Here I illustrate something that may be useful for you. As I don't have your complete code to create your desired function, I have defined 3 simple functions, described below.
np.maximum.reduce([y1, y2, y3]) gives the maximum of the three functions at every x, and np.where(out_arr == np.amax(out_arr)) finds the index (or indices) where that overall maximum occurs. Max_X is therefore an index into x, and the maximum point is [x[Max_X], out_arr[Max_X]]. This assumes all three functions are sampled on the same x array.
import matplotlib.pyplot as plt
import numpy as np
x= np.arange(0., 6., 1)
y1=x
y2=x**2
y3=x**3
out_arr=np.maximum.reduce([y1,y2,y3])
result = np.where(out_arr == np.amax(out_arr))
Max_X=result[0]
print(Max_X)
# Max_X holds indices into x; x[Max_X] turns them into x coordinates
point=[x[Max_X],out_arr[Max_X]]
plt.plot(x[Max_X],out_arr[Max_X],'ro')
# red dashes, blue squares and green triangles
plt.plot(x, y1, 'r--', x, y2, 'bs', x, y3, 'g^')
plt.show()
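For the original piecewise question, where the three pieces live on different x-grids, one option is to concatenate the x-arrays and the y-arrays and take a single np.argmax, which returns the index of the first maximum. A sketch; the numeric values of q, RAV, P, L1, a and L2 are made up for illustration, since the question does not give them:

```python
import numpy as np

# Hypothetical values: q, RAV, P, L1, a and L2 are not given in the question.
q, RAV, P = 2.0, 10.0, 4.0
L1, a, L2 = 3.0, 6.0, 9.0

c1 = np.arange(0, L1 + 0.1, 0.1)
c2 = np.arange(L1, a + 0.1, 0.1)
c3 = np.arange(a, L2 + 0.1, 0.1)
y1 = -q * c1
y2 = -q * c2 + RAV
y3 = -q * c3 + RAV - P

# Join the pieces so that one argmax covers the whole 0 < x < L2 range.
x_all = np.concatenate((c1, c2, c3))
y_all = np.concatenate((y1, y2, y3))
i_max = np.argmax(y_all)
max_x, Mmax = x_all[i_max], round(y_all[i_max], 2)
```

max_x and Mmax can then go straight into the ax2.annotate(text2, xy=(max_x, Mmax), ...) call from the question.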
The goal is to fill the space between two arrays y1 and y2, similar to matplotlib's fill_between. However, instead of filling the space with a polygon (for example with hatch='|'), I want to draw vertical lines only between the data points of the two arrays.
import matplotlib.pyplot as plt
import numpy as np
n = 10
y1 = np.random.random(n)
y2 = np.random.random(n) + 1
x1 = np.arange(n)
fig, ax = plt.subplots()
ax.fill_between(x1, y1, y2, facecolor='w', hatch='|')
plt.show()
Using a LineCollection can be handy if there are lots of lines involved. It is similar to the other answer, but less expensive:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
def draw_lines_between(*, x1=None, x2=None, y1, y2, ax=None, **kwargs):
    ax = ax or plt.gca()
    x1 = x1 if x1 is not None else np.arange(len(y1))
    x2 = x2 if x2 is not None else x1
    # segments[i] = [[x1[i], y1[i]], [x2[i], y2[i]]]
    cl = LineCollection(np.stack((np.c_[x1, x2], np.c_[y1, y2]), axis=2), **kwargs)
    ax.add_collection(cl)
    return cl
n = 10
y1 = np.random.random(n)
y2 = np.random.random(n) + 1
x = np.arange(n)
color_list = [str(x) for x in np.round(np.linspace(0., 0.8, n), 2)]
fig, ax = plt.subplots()
ax.plot(x, y1, 'r')
ax.plot(x, y2, 'b')
draw_lines_between(ax=ax, x1=x, y1=y1, y2=y2, colors=color_list)
plt.show()
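If the segments are always vertical (x1 == x2), matplotlib's Axes.vlines covers this case directly; it also returns a LineCollection, so it should be similarly cheap. A sketch reusing the question's random data:

```python
import numpy as np
import matplotlib.pyplot as plt

n = 10
rng = np.random.default_rng(0)
y1 = rng.random(n)
y2 = rng.random(n) + 1
x = np.arange(n)
color_list = [str(v) for v in np.round(np.linspace(0., 0.8, n), 2)]  # grayscale strings

fig, ax = plt.subplots()
ax.plot(x, y1, 'r')
ax.plot(x, y2, 'b')
# one vertical segment per x, from y1[i] up to y2[i]
lines = ax.vlines(x, y1, y2, colors=color_list)
plt.show()
```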
I wrote a little function which takes two arrays y1, y2 (x1, x2 are optional)
and connects their data points vertically.
def draw_lines_between(*, ax, x1=None, x2=None, y1, y2, color_list=None, **kwargs):
    assert len(y1) == len(y2)
    n = len(y1)
    if x1 is None:
        x1 = np.arange(n)
    if x2 is None:
        x2 = x1
    if color_list is None:
        color_list = [None] * n
    elif isinstance(color_list, str):
        # a single color is used for every line
        color_list = [color_list] * n
    assert len(color_list) == n
    h = np.zeros(n, dtype=object)
    for i in range(n):
        h[i] = ax.plot((x1[i], x2[i]), (y1[i], y2[i]), color=color_list[i], **kwargs)[0]
    return h
import matplotlib.pyplot as plt
import numpy as np
n = 10
y1 = np.random.random(n)
y2 = np.random.random(n) + 1
x1 = np.arange(n)
color_list = [str(x) for x in np.round(np.linspace(0., 0.8, n), 2)]
fig, ax = plt.subplots()
ax.plot(x1, y1, 'r')
ax.plot(x1, y2, 'b')
draw_lines_between(ax=ax, x1=x1, y1=y1, y2=y2, color_list=color_list)
plt.show()
I want to do something like this:
I have the points but don't know how to plot the curves instead of straight lines.
Thank you.
For people interested in this question, I followed Matthew's suggestion and came up with this implementation:
import numpy as np
def hanging_line(point1, point2):
    # fit y = a*cosh(x) + b through both points
    a = (point2[1] - point1[1])/(np.cosh(point2[0]) - np.cosh(point1[0]))
    b = point1[1] - a*np.cosh(point1[0])
    x = np.linspace(point1[0], point2[0], 100)
    y = a*np.cosh(x) + b
    return (x,y)
Here is what the result looks like:
import matplotlib.pyplot as plt
point1 = [0,1]
point2 = [1,2]
x,y = hanging_line(point1, point2)
plt.plot(point1[0], point1[1], 'o')
plt.plot(point2[0], point2[1], 'o')
plt.plot(x,y)
plt.show()
You are going to need some expression for the curve you want to plot, then you can make the curve out of many line segments.
Here's a parabola:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1, 1, 100)
y = x*x
plt.plot(x, y)
Here's a sin curve:
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = np.sin(x)
plt.plot(x, y)
Each of these looks smooth, but is actually made up of many small line segments.
To get a collection of curves like you showed, you are going to need some expression for a curve you want to plot in terms of its two endpoints. The ones in your picture look like catenaries, which are (approximately) the shape a hanging chain assumes under the force of gravity:
x = np.linspace(-2*np.pi, 2*np.pi, 100)
y = 2*np.cosh(x/2)
plt.plot(x, y)
You will have to find a way of parameterizing this curve in terms of its two endpoints, which requires substituting your values of x and y into:
y = a*cosh(x/a) + b
and solving the resulting pair of equations for a and b.
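A minimal sketch of that last step, assuming the y = a*cosh(x/a) + b form above (which fixes the lowest point at x = 0). Subtracting the two endpoint equations removes b and leaves one equation in a, solved here by bisection with only the standard library; b then follows from either endpoint. The helper name catenary_through is hypothetical:

```python
import math

def catenary_through(p1, p2, tol=1e-12):
    """Solve y = a*cosh(x/a) + b for a and b through points p1 and p2.

    Assumes |x1| != |x2| and that the point farther from x = 0 is also the
    higher one; otherwise this two-parameter form has no solution.
    """
    (x1, y1), (x2, y2) = p1, p2
    if abs(x1) > abs(x2):                 # order the points so that |x2| > |x1|
        (x1, y1), (x2, y2) = (x2, y2), (x1, y1)
    dy = y2 - y1
    if abs(x1) == abs(x2) or dy <= 0:
        raise ValueError("no catenary of this form through these points")

    def g(a):
        # zero when the curve's height difference between x1 and x2 equals dy
        return a * (math.cosh(x2 / a) - math.cosh(x1 / a)) - dy

    lo = abs(x2) / 700.0                  # keeps math.cosh() below float overflow
    hi = 2.0 * lo
    while g(hi) > 0:                      # g tends to -dy < 0 as a grows
        hi *= 2.0
    # g is decreasing in a when |x2| > |x1|, so bisection finds the unique root
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    a = 0.5 * (lo + hi)
    b = y1 - a * math.cosh(x1 / a)
    return a, b

a, b = catenary_through((0, 1), (1, 2))
```

To plot it, evaluate x = np.linspace(0, 1, 100) and y = a*np.cosh(x/a) + b. A curve whose lowest point is not at x = 0 needs a horizontal shift as a third parameter, and therefore a third condition such as the chain length.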
There is a cool (at least for me) way to draw curved lines between two points: Bezier curves. With some simple code you can create lists of points that trace the curve and chart them with matplotlib, for example:
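def recta(x1, y1, x2, y2):
    # slope a and intercept b of the line through (x1, y1) and (x2, y2)
    a = (y1 - y2) / (x1 - x2)
    b = y1 - a * x1
    return (a, b)
def curva_b(xa, ya, xb, yb, xc, yc):
    # Quadratic Bezier from (xa, ya) to (xc, yc) with control point (xb, yb),
    # built by de Casteljau's construction; assumes xa != xb and xb != xc.
    (x1, y1, x2, y2) = (xa, ya, xb, yb)
    (a1, b1) = recta(xa, ya, xb, yb)
    (a2, b2) = recta(xb, yb, xc, yc)
    puntos = []
    for i in range(0, 1000):
        # skip the point if the moving chord is vertical, but keep advancing
        if x1 != x2:
            (a, b) = recta(x1, y1, x2, y2)
            x = i*(x2 - x1)/1000 + x1
            y = a*x + b
            puntos.append((x,y))
        x1 += (xb - xa)/1000
        y1 = a1*x1 + b1
        x2 += (xc - xb)/1000
        y2 = a2*x2 + b2
    return puntos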
Then, just run the function for some starting, mid and ending points, and use matplotlib:
import matplotlib.pyplot as plt
lista1 = curva_b(1, 2, 2, 1, 3, 2.5)
lista2 = curva_b(1, 2, 2.5, 1.5, 3, 2.5)
lista3 = curva_b(1, 2, 2.5, 2, 3, 2.5)
lista4 = curva_b(1, 2, 1.5, 3, 3, 2.5)
fig, ax = plt.subplots()
ax.scatter(*zip(*lista1), s=1, c='b')
ax.scatter(*zip(*lista2), s=1, c='r')
ax.scatter(*zip(*lista3), s=1, c='g')
ax.scatter(*zip(*lista4), s=1, c='k')
plt.show()
This should be the result:
[Figure: several quadratic Bezier curves]
By extending the code a little more, you can get shapes like this:
[Figure: a quartic Bezier curve]
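The curva_b loop is de Casteljau's construction written with slopes and intercepts, so it breaks when a control segment is vertical (xa == xb or xb == xc). The same quadratic curve can be evaluated directly from its Bernstein form with NumPy. A sketch; quadratic_bezier is a hypothetical helper name:

```python
import numpy as np

def quadratic_bezier(p0, p1, p2, n=1000):
    """Return an (n, 2) array of points on the quadratic Bezier curve.

    B(t) = (1 - t)**2 * p0 + 2 * (1 - t) * t * p1 + t**2 * p2, for t in [0, 1].
    Vertical control segments are fine here, since no slopes are computed.
    """
    t = np.linspace(0.0, 1.0, n)[:, None]   # column vector so it broadcasts over x and y
    p0, p1, p2 = (np.asarray(p, dtype=float) for p in (p0, p1, p2))
    return (1 - t) ** 2 * p0 + 2 * (1 - t) * t * p1 + t ** 2 * p2

curve = quadratic_bezier((1, 2), (2, 1), (3, 2.5))
```

It can be drawn as a continuous line with ax.plot(curve[:, 0], curve[:, 1]) instead of scattering 1000 dots.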
Below is my code for scatter plotting the data in my text file. The file I am opening contains two columns: the left column holds the x coordinates and the right column the y coordinates. The code creates a scatter plot of x vs. y. I need code to overplot a line of best fit on the data in the scatter plot, and none of the built-in pylab functions have worked for me.
from matplotlib import *
from pylab import *
with open('file.txt') as f:
data = [line.split() for line in f.readlines()]
out = [(float(x), float(y)) for x, y in data]
for i in out:
scatter(i[0],i[1])
xlabel('X')
ylabel('Y')
title('My Title')
show()
A one-line version of this excellent answer to plot the line of best fit is:
plt.plot(np.unique(x), np.poly1d(np.polyfit(x, y, 1))(np.unique(x)))
Using np.unique(x) instead of x handles the case where x isn't sorted or has duplicate values.
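Spelled out on the sample points used in the closed-form answer below, the one-liner does the following; the coefficients should agree with the slope 0.92 and intercept 0.80 reported there. The plotting call is left as a comment:

```python
import numpy as np

x = np.array([0, 5, 10, 15, 20])
y = np.array([0, 7, 10, 13, 20])

fit = np.poly1d(np.polyfit(x, y, 1))   # degree-1 least-squares polynomial
xs = np.unique(x)                      # sorted, duplicate-free x values
ys = fit(xs)                           # fitted y at those x values
# fit.coeffs is ordered highest power first: [slope, intercept]
# plt.plot(xs, ys) draws the same line as the one-liner
```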
Assuming the line of best fit for a set of points is:
y = a + b * x
where xbar and ybar are the means of the x and y values:
b = ( sum(xi * yi) - n * xbar * ybar ) / sum((xi - xbar)^2)
a = ybar - b * xbar
Code and plot
# sample points
X = [0, 5, 10, 15, 20]
Y = [0, 7, 10, 13, 20]
# solve for a and b
def best_fit(X, Y):
    xbar = sum(X)/len(X)
    ybar = sum(Y)/len(Y)
    n = len(X) # or len(Y)
    numer = sum([xi*yi for xi,yi in zip(X, Y)]) - n * xbar * ybar
    denum = sum([xi**2 for xi in X]) - n * xbar**2
    b = numer / denum
    a = ybar - b * xbar
    print('best fit line:\ny = {:.2f} + {:.2f}x'.format(a, b))
    return a, b
# solution
a, b = best_fit(X, Y)
#best fit line:
#y = 0.80 + 0.92x
# plot points and fit line
import matplotlib.pyplot as plt
plt.scatter(X, Y)
yfit = [a + b * xi for xi in X]
plt.plot(X, yfit)
plt.show()
You can use numpy's polyfit. I use the following (you can safely remove the bit about coefficient of determination and error bounds, I just think it looks nice):
#!/usr/bin/python3
import numpy as np
import matplotlib.pyplot as plt
import csv
with open("example.csv", "r") as f:
data = [row for row in csv.reader(f)]
xd = [float(row[0]) for row in data]
yd = [float(row[1]) for row in data]
# sort the data
reorder = sorted(range(len(xd)), key = lambda ii: xd[ii])
xd = [xd[ii] for ii in reorder]
yd = [yd[ii] for ii in reorder]
# make the scatter plot
plt.scatter(xd, yd, s=30, alpha=0.15, marker='o')
# determine best fit line
par = np.polyfit(xd, yd, 1, full=True)
slope=par[0][0]
intercept=par[0][1]
xl = [min(xd), max(xd)]
yl = [slope*xx + intercept for xx in xl]
# coefficient of determination, plot text
variance = np.var(yd)
residuals = np.var([(slope*xx + intercept - yy) for xx,yy in zip(xd,yd)])
Rsqr = np.round(1-residuals/variance, decimals=2)
plt.text(.9*max(xd)+.1*min(xd),.9*max(yd)+.1*min(yd),'$R^2 = %0.2f$'% Rsqr, fontsize=30)
plt.xlabel("X Description")
plt.ylabel("Y Description")
# error bounds
yerr = [abs(slope*xx + intercept - yy) for xx,yy in zip(xd,yd)]
par = np.polyfit(xd, yerr, 2, full=True)
yerrUpper = [(xx*slope+intercept)+(par[0][0]*xx**2 + par[0][1]*xx + par[0][2]) for xx,yy in zip(xd,yd)]
yerrLower = [(xx*slope+intercept)-(par[0][0]*xx**2 + par[0][1]*xx + par[0][2]) for xx,yy in zip(xd,yd)]
plt.plot(xl, yl, '-r')
plt.plot(xd, yerrLower, '--r')
plt.plot(xd, yerrUpper, '--r')
plt.show()
I have implemented @Micah's solution to generate a trendline with a few changes and thought I'd share:
Coded as a function
Option for a polynomial trendline (input order=2)
Function can also just return the coefficient of determination (R^2, input Rval=True)
More Numpy array optimisations
Code:
import numpy as np
import matplotlib.pyplot as plt
def trendline(xd, yd, order=1, c='r', alpha=1, Rval=False):
    """Make a line of best fit"""
    xd = np.asarray(xd, dtype=float)
    yd = np.asarray(yd, dtype=float)
    #Calculate trendline
    coeffs = np.polyfit(xd, yd, order)
    p = np.poly1d(coeffs)
    minxd = np.min(xd)
    maxxd = np.max(xd)
    # Many x samples so a curved (order >= 2) trendline is not drawn as one straight segment
    xl = np.linspace(minxd, maxxd, 100)
    yl = p(xl)
    #Plot trendline
    plt.plot(xl, yl, c, alpha=alpha)
    #Calculate R Squared
    ybar = np.sum(yd) / len(yd)
    ssreg = np.sum((p(xd) - ybar) ** 2)
    sstot = np.sum((yd - ybar) ** 2)
    Rsqr = ssreg / sstot
    if not Rval:
        #Plot R^2 value
        plt.text(0.8 * maxxd + 0.2 * minxd, 0.8 * np.max(yd) + 0.2 * np.min(yd),
                 '$R^2 = %0.2f$' % Rsqr)
    else:
        #Return the R^2 value:
        return Rsqr
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
X, Y = x.reshape(-1,1), y.reshape(-1,1)
plt.plot( X, LinearRegression().fit(X, Y).predict(X) )
NumPy 1.4 introduced a new polynomial API (numpy.polynomial). You can use this one-liner, where n is the number of points used to draw the line (more points give a smoother curve) and a is the degree of the polynomial.
plt.plot(*np.polynomial.Polynomial.fit(x, y, a).linspace(n), 'r-')
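One detail worth knowing: Polynomial.fit maps x into a scaled window before fitting, so p.coef is not the familiar slope and intercept. A sketch using the sample points from the closed-form answer above, with convert() to recover the usual coefficients; the plotting call is left as a comment:

```python
import numpy as np
from numpy.polynomial import Polynomial

x = np.array([0, 5, 10, 15, 20])
y = np.array([0, 7, 10, 13, 20])

p = Polynomial.fit(x, y, 1)     # least-squares fit in a scaled domain
xs, ys = p.linspace(100)        # 100 evenly spaced points over the data's x-range
coef = p.convert().coef         # usual-basis coefficients, lowest power first
# plt.plot(xs, ys, 'r-') draws the fitted line, as in the one-liner
```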