Difference between uniformly distributed variables, Python

I'm very new to Python, and I was trying to use this problem as a learning exercise, but I can't get anywhere with it.
What I want to do is to show that for two random variables that come uniformly distributed within a 200ns window, the probability of them arriving within 7ns of each other is ~5%:
X, Y ~ U[0, 200]
Z = X - Y
P(|Z| < 7) = ?
I wanted to know the most analytical way of doing this, because I thought Python might have some useful libraries to help, and because if I wanted to do a stochastic simulation I would do it in C++ ROOT which would take me far less time!
The way that I've done it is below, but it's different from what I've calculated analytically. Can anyone suggest a better/more accurate way of solving the same problem?
Thanks a lot!
from scipy.stats import uniform, expon
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots(1, 1)
a, b = 0, 200
size = 1000000
# Generating uniform distribution
uniform_distribution = uniform(loc=a, scale=b)
x = uniform_distribution.rvs(size=size)
y = uniform_distribution.rvs(size=size)
z = x - y
ax.hist(z)
zsmall = [i for i in z if abs(i) < 7]
n = len(zsmall)
print("probability = ",n/size)

Edit: added some code to improve the figure.
Your code is fine, and the results do agree with the analytically derived value. To see this more readily, I have modified your code slightly, scaling the domains of X and Y down to [0, 1] and computing P(|Z| < 7/200), so that this is still equivalent to your original question.
from scipy.stats import uniform
import matplotlib.pyplot as plt
a, b = 0, 1
size = 1000000
# generate uniformly distributed x and y
uniform_distribution = uniform(loc=a, scale=b)
x = uniform_distribution.rvs(size=size)
y = uniform_distribution.rvs(size=size)
z = x - y
# set up figure
fig, ax = plt.subplots(figsize = [16, 8])
ax.set_aspect('equal')
ax.set_xlim([-1, 1])
ax.set_ylim([0, 1])
ax.set_xticks([-1, 0, 1])
ax.set_xticklabels([-1, 0, 1], size=20)
ax.set_yticks([0, 1])
ax.set_yticklabels([0, 1], size=20)
# plot histogram with y-axis scaled to show density,
# increased bin number for better resolution
ax.hist(z, density=True, bins=200, alpha=0.5)
# plot lines around the area we want to estimate
plt.axvline(-7/200, color='black', linestyle='--')
ax.annotate('x = -7/200', xy=(-7/200, 0.4), xytext=(-0.05, 0.4), fontsize=16, ha='right')
plt.axvline( 7/200, color='black', linestyle='--')
ax.annotate('x = 7/200', xy=(7/200, 0.2), xytext=(0.05, 0.2), fontsize=16)
# plot theoretical probability density function
ax.plot([-1, 0], [0, 1], color='gray', linestyle=':')
ax.plot([ 0, 1], [1, 0], color='gray', linestyle=':')
zsmall = [1 for i in z if abs(i) < 7/200]
n = len(zsmall)
print("probability =", n/size)
probability = 0.06857
As you can see, this already approaches the theoretically expected triangular distribution (gray dotted lines) pretty closely. For comparison, we can calculate the theoretical probability, which is the area between the dashed lines and below the dotted lines. We can compute this as the area of the whole rectangle between the dashed lines minus the two small triangles above the dotted lines (together they form a square of side 7/200):
2*(7/200) - (7/200)**2
= 0.068775
So the theoretical value does agree with your simulation result.
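For a fully analytical check without any simulation, note that the difference of two independent U[0, 1] variables follows a triangular distribution on [-1, 1] with mode 0, which scipy exposes as scipy.stats.triang. A minimal sketch, assuming the same rescaled setup as above:
from scipy.stats import triang
# Z = X - Y for X, Y ~ U[0, 1] is triangular on [-1, 1] with its mode in the centre:
# loc=-1, scale=2, c=0.5 (the mode sits halfway across the support)
z_dist = triang(c=0.5, loc=-1, scale=2)
# P(|Z| < 7/200) from the CDF
p = z_dist.cdf(7/200) - z_dist.cdf(-7/200)
print(p)  # 0.068775, matching 2*(7/200) - (7/200)**2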

Related

Creating an incremental smooth spline curve instead of non-uniform increments in python

I am trying to smooth my data using a spline; the data is basically a cumulative percentile on the y-axis against a reference point on the x-axis. I get most of it right, but the challenge I am facing is that the y values do not just increase: as seen in the spline plot below, they keep increasing and decreasing instead of only increasing.
I still want a smooth curve, but I want the y-axis to increase with the x-axis, i.e. each subsequent y value should be equal to or slightly larger than the previous one, as opposed to increasing and then decreasing later.
Reproducible code:
import pandas as pd
import numpy as np
from scipy.interpolate import make_interp_spline
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
percentile_block_df = pd.DataFrame({
    'x' : [0.5, 100.5, 200.5, 400.5, 800.5, 900.5, 1000.5],
    'percentile' : [0.0001, 0.01, 0.065, 0.85, 0.99, 0.9973, 0.9999]
})
figure(figsize=(8, 6), dpi=80)
y = percentile_block_df.percentile
x = percentile_block_df.x
X_Y_Spline = make_interp_spline(x, y)
# Returns evenly spaced numbers
# over a specified interval.
X_ = np.linspace(x.min(), x.max(), 1000)
Y_ = X_Y_Spline(X_)
figure(figsize=(18, 6), dpi=80)
plt.subplot(1, 2, 1) # row 1, col 2 index 1
plt.plot(x, y,"ro")
plt.plot(x, y)
plt.title("Original")
plt.xlabel('X')
plt.ylabel('Percentile ')
plt.subplot(1, 2, 2) # index 2
plt.plot(x, y,"ro")
plt.plot(X_, Y_,"green")
plt.title("Spline Plot")
plt.xlabel('X')
plt.ylabel('Percentile ')
plt.show()
What you are looking for is "monotonicity preserving interpolation". A quick search shows that scipy.interpolate.PchipInterpolator does just that. Here is the result for your example when you simply import PchipInterpolator from scipy.interpolate instead of make_interp_spline and use it in its place.
Whether or not that's appropriate depends, of course, on your specific requirements for the interpolation. I encourage you to research the other options that are out there.
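As a minimal sketch of that drop-in replacement (assuming the same x and y as in the question's reproducible code), the only change is the interpolator that gets constructed:
from scipy.interpolate import PchipInterpolator
import numpy as np
X_Y_Spline = PchipInterpolator(x, y)  # instead of make_interp_spline(x, y)
X_ = np.linspace(x.min(), x.max(), 1000)
Y_ = X_Y_Spline(X_)  # monotone wherever the input data is monotone
The rest of the plotting code stays exactly as in the question.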
Similar question:
Fully monotone interpolation in python
Code that eventually worked for me:
This link explains the need for monotone cubic interpolation.
from scipy.interpolate import PchipInterpolator
# this code allows "smoothing" of the data
# (PCHIP applied to log(y), then transformed back with np.exp)
B_spline_coeff1 = PchipInterpolator(x, np.log(y))
X1_Final = np.linspace(x.min(), x.max(), 1000)
Y1_Final = np.exp(B_spline_coeff1(X1_Final))
#plot subplots
figure(figsize=(18, 6), dpi=80)
plt.subplot(1, 2, 1) # row 1, col 2 index 1
plt.plot(x, y,"ro")
plt.plot(x, y)
plt.title("Original")
plt.xlabel('X')
plt.ylabel('Percentile ')
plt.subplot(1, 2, 2) # index 2
plt.plot(x, y,"ro")
plt.plot(X1_Final, Y1_Final,"green")
plt.title("Spline Plot")
plt.xlabel('X')
plt.ylabel('Percentile ')
plt.show()

coloring lines generated from numerical points

I calculate the eigenvalues of large matrices that depend on a parameter, and I would like to plot the eigenvalues in different colors. So I do not have functions that I could conveniently plot in different colors; instead I just have sets of points that get connected by interpolation. My problem is that the eigenvalue curves should intersect, but with this numerical approach the plotted lines swap branches at the crossings instead.
Maybe it is best explained with a small example.
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
def mat(x):
    # return np.array([[np.sin(x), 0], [0, -np.sin(x)]])
    return np.array([[np.sin(x), 0, 0], [0, -np.sin(x), 0], [0, 0, np.sin(10*x)+x]])
fig = plt.figure()
fig.suptitle('wrong colors')
ax = fig.add_subplot(111)
# x = np.linspace(-1,1,100) # no, not that easy, the intersection points are difficult to find
x = np.sort(np.random.uniform(low=-1, high=1, size=1000))
# evs = np.zeros((2, len(x)))
evs = np.zeros((3, len(x)))
for i in range(len(x)):
    evs[:, i] = np.linalg.eigvalsh(mat(x[i]))
print(np.shape(evs))
ax.plot(x, evs[0,:], color='C0')
ax.plot(x, evs[1,:], color='C1')
ax.plot(x, evs[2,:], color='C2')
# just reference plot, this is how it should look like
fig2 = plt.figure()
fig2.suptitle('correct colors')
ax2 = fig2.add_subplot(111)
ax2.plot(x, np.sin(x), color='C0')
ax2.plot(x, -np.sin(x), color='C1')
ax2.plot(x, np.sin(10*x)+x, color='C2')
plt.show()
So what I get is this:
What I would like to have is this:
One difficulty is that the intersection point is difficult to calculate and usually not included in the sampled points. That's OK, I don't need the point itself, as the graphic is purely informative. But the colors should be shown correctly. Any suggestions on how I could achieve something like this easily?
To give you an idea of where this is to be used, have a look at the following picture.
Here, the straight lines in the middle should have a different color than the curved ones.
Besides the matrix being a lot more complex, the image is created in the same way as above.
EDIT: My example was not good or clear, so I have come up with one that is closer to my real problem. The matrix is numeric and I cannot diagonalize it analytically, i.e. I cannot know whether it is sin, cos or maybe something messy like np.sin(2*x+0.2)+np.cos(x)**2.
Here you go:
Just concatenate the first part of one signal with the last part of the other
import numpy as np
import numpy.linalg
import matplotlib.pyplot as plt
def mat(x):
    return np.array([[np.sin(x), 0], [0, -np.sin(x)]])
fig = plt.figure()
fig.suptitle('wrong colors')
ax = fig.add_subplot(111)
x = np.linspace(-1, 1, 100)
evs = np.zeros((2, len(x)))
for i in range(len(x)):
    evs[:, i] = np.linalg.eigvalsh(mat(x[i]))
print(np.shape(evs))
ax.plot(x, np.concatenate((evs[0, :int(len(x)//2)], evs[1, int(len(x)//2):])), color='C0')
ax.plot(x, np.concatenate((evs[1, :int(len(x)//2)], evs[0, int(len(x)//2):])), color='C1')
plt.show()
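For the more general case raised in the edit, where the crossing points are not known in advance, one possible approach (not part of the answer above, just a sketch) is to compute the eigenvectors as well and re-pair the eigenvalues between consecutive parameter values by maximum eigenvector overlap:
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import linear_sum_assignment
def mat(x):
    return np.array([[np.sin(x), 0, 0], [0, -np.sin(x), 0], [0, 0, np.sin(10*x)+x]])
x = np.sort(np.random.uniform(low=-1, high=1, size=1000))
n = mat(x[0]).shape[0]
evs = np.zeros((n, len(x)))
evs[:, 0], prev_vecs = np.linalg.eigh(mat(x[0]))
for i in range(1, len(x)):
    vals, vecs = np.linalg.eigh(mat(x[i]))
    # overlap[j, k] = |<previous branch j | current eigenvector k>|
    overlap = np.abs(prev_vecs.conj().T @ vecs)
    # assign each current eigenvector to the previous branch it overlaps most with
    _, cols = linear_sum_assignment(-overlap)
    evs[:, i] = vals[cols]
    prev_vecs = vecs[:, cols]
for k in range(n):
    plt.plot(x, evs[k, :], color=f'C{k}')
plt.show()
Matching on eigenvector overlap rather than on the eigenvalues themselves is what lets each branch be followed through a crossing, where the eigenvalues coincide but the eigenvectors stay distinct.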

Matplotlib contour plot with conditions on the curve

Suppose I want to plot the contour of z=0 where z=(19*y^3-6*x*y-1). I can use the following code:
x = np.linspace(-2,2,1000)
y = np.linspace(-1,1,1000)
X,Y = np.meshgrid(x,y)
z = (19)*Y**3-6*X*Y-1
plt.figure()
plt.contour(z, levels=[0])
plt.show()
Now I want to display the portion of the curve where 3*19*y^2-6*x<0 in solid line, and the portion where 3*19*y^2-6*x>0 in dashed line. Basically I'm doing some sort of stability analysis of a system, and I want to show different regions of the z=0 curve differently depending on whether dz/dy is positive or negative.
What I can think of is to locate the coordinates of the two parts myself, and to use scatter plot to show the two parts of the curve using different colour (or line style). I also know how to do this easily in Mathematica. I just wonder whether there is a more elegant solution in matplotlib to do this job.
Maybe the following approach is interesting, although not perfect?
A variable z2 is created holding the value of 3*19*y^2-6*x. z2 is erased everywhere except close to the z = 0 curve. Then it is colored with a red-blue colormap: red where the condition is positive, blue where it is negative, and white around 0.
The background is set to black and the contour color to white in order to have enough contrast.
Note that both the contour plot and the imshow need the extent parameter to be set in order to get informative axes.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(-2, 2, 1000)
y = np.linspace(-1, 1, 1000)
X, Y = np.meshgrid(x, y)
z = (19) * Y ** 3 - 6 * X * Y - 1
z2 = 3 * 19 * Y ** 2 - 6 * X
z2 = np.where(np.abs(z) < 0.2, z2, np.nan)
plt.gca().set_facecolor('black')
plt.imshow(z2, cmap='coolwarm', vmin=-1, vmax=+1, alpha=1, extent=[-2, 2, -1, 1], origin='lower')
plt.contour(z, levels=[0], extent=[-2, 2, -1, 1], zorder=3, colors='white')
plt.show()
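An alternative sketch (not from the answer above) is to draw the z = 0 contour twice, masking out the part of the grid that does not satisfy each condition, so that one portion comes out solid and the other dashed; small gaps can appear where the mask boundary cuts the curve:
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-2, 2, 1000)
y = np.linspace(-1, 1, 1000)
X, Y = np.meshgrid(x, y)
z = 19 * Y**3 - 6 * X * Y - 1
z2 = 3 * 19 * Y**2 - 6 * X
# solid where 3*19*y^2 - 6*x < 0, dashed where it is > 0
plt.contour(X, Y, np.ma.masked_where(z2 >= 0, z), levels=[0], colors='black', linestyles='solid')
plt.contour(X, Y, np.ma.masked_where(z2 < 0, z), levels=[0], colors='black', linestyles='dashed')
plt.show()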

Is there anything in matplotlib that behaves like alpha but reversed?

A good way to show the concentration of the data points in a plot is using a scatter plot with non-unit transparency. As a result, the areas with more concentration would appear darker.
# this is synthetic example
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
plt.scatter(x, y, marker='.', alpha=0.1) # an area full of dots, darker wherever the number of dots is more
which gives something like this:
Imagine the case where we want to emphasize the outliers. So the situation is almost reversed: a plot in which the less-concentrated areas are bolder. (There might be a trick to apply for my simple example, but imagine a general case where the distribution of points is not known a priori, or where it's difficult to define a rule for transparency/weight on color.)
I was wondering whether there is anything handy, similar to alpha, that is designed specifically for this job. Other ideas for emphasizing outliers are also welcome.
UPDATE: This is what happens when more than one data point is scattered in the same area:
I'm looking for something like the picture below: the more data points, the less transparent the marker.
To answer the question: You can calculate the density of points, normalize it and encode it in the alpha channel of a colormap.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
# this is synthetic example
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
fig, (ax,ax2) = plt.subplots(ncols=2, figsize=(8,5))
ax.scatter(x, y, marker='.', alpha=0.1)
values = np.vstack([x,y])
kernel = stats.gaussian_kde(values)
weights = kernel(values)
weights = weights/weights.max()
cols = plt.cm.Blues([0.8, 0.5])
cols[:,3] = [1., 0.005]
cmap = LinearSegmentedColormap.from_list("", cols)
ax2.scatter(x, y, c=weights, s = 1, marker='.', cmap=cmap)
plt.show()
Left is the original image, right is the image where higher density points have a lower alpha.
Note, however, that this is undesirable, because high-density transparent points are indistinguishable from low-density ones. I.e. in the right image it really looks as though you have a hole in the middle of your distribution.
Clearly, a solution with a colormap which does not contain the color of the background is a lot less confusing to the reader.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# this is synthetic example
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
fig, ax = plt.subplots(figsize=(5,5))
values = np.vstack([x,y])
kernel = stats.gaussian_kde(values)
weights = kernel(values)
weights = weights/weights.max()
ax.scatter(x, y, c = weights, s=9, edgecolor="none", marker='.', cmap="magma")
plt.show()
Here, low-density points are still emphasized by darker color, but at the same time it's clear to the viewer that the highest density lies in the middle.
As far as I know, there is no "direct" solution to this quite interesting problem. As a workaround, I propose this solution:
N = 10000 # a very very large number
x = np.random.normal(0, 1, N)
y = np.random.normal(0, 1, N)
fig = plt.figure() # create figure directly to be able to extract the bg color
ax = fig.gca()
ax.scatter(x, y, marker='.') # plot all markers without alpha
bgcolor = ax.get_facecolor() # extract current background color
# plot with alpha, "overwriting" dense points
ax.scatter(x, y, marker='.', color=bgcolor, alpha=0.2)
This will plot all points without transparency and then plot all points again with some transparency, "overwriting" those points with the highest density the most. Setting the alpha value to other higher values will put more emphasis to outliers and vice versa.
Of course the color of the second scatter plot needs to be adjusted to your background color. In my example this is done by extracting the background color and setting it as the new scatter plot's color.
This solution is independent of the kind of distribution. It only depends on the density of the points. However, it plots twice the number of points, so rendering may take slightly longer.
Reproducing the edit in the question, my solution shows exactly the desired behavior. The leftmost point is a single point and is the darkest; the rightmost consists of three points and has the lightest color.
x = [0, 1, 1, 2, 2, 2]
y = [0, 0, 0, 0, 0, 0]
fig = plt.figure() # create figure directly to be able to extract the bg color
ax = fig.gca()
ax.scatter(x, y, marker='.', s=10000) # plot all markers without alpha
bgcolor = ax.get_facecolor() # extract current background color
# plot with alpha, "overwriting" dense points
ax.scatter(x, y, marker='.', color=bgcolor, alpha=0.2, s=10000)
Assuming that the distributions are centered around a specific point (e.g. (0,0) in this case), I would use this:
import numpy as np
import matplotlib.pyplot as plt
N = 500
# 0 mean, 0.2 std
x = np.random.normal(0,0.2,N)
y = np.random.normal(0,0.2,N)
# calculate the distance to (0, 0).
color = np.sqrt((x-0)**2 + (y-0)**2)
plt.scatter(x , y, c=color, cmap='plasma', alpha=0.7)
plt.show()
Results:
I don't know if it helps you, because it's not exactly what you asked for, but you can simply color the points whose values are bigger than some threshold. For example:
import numpy as np
import matplotlib.pyplot as plt
num = 100
threshold = 80
x = np.linspace(0, 100, num=num)
y = np.random.normal(size=num)*45
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.scatter(x[np.abs(y) < threshold], y[np.abs(y) < threshold], color="#00FFAA")
ax.scatter(x[np.abs(y) >= threshold], y[np.abs(y) >= threshold], color="#AA00FF")
plt.show()

How to draw a line with matplotlib?

I cannot find a way to draw an arbitrary line with the matplotlib Python library. It allows drawing horizontal and vertical lines (with matplotlib.pyplot.axhline and matplotlib.pyplot.axvline, for example), but I do not see how to draw a line through two given points (x1, y1) and (x2, y2). Is there a way? Is there a simple way?
This will draw a line that passes through the points (-1, 1) and (12, 4), and another one that passes through the points (1, 3) and (10, 2).
x1 are the x coordinates of the points for the first line, y1 are the y coordinates for the same -- the elements in x1 and y1 must be in sequence.
x2 and y2 are the same for the other line.
import matplotlib.pyplot as plt
x1, y1 = [-1, 12], [1, 4]
x2, y2 = [1, 10], [3, 2]
plt.plot(x1, y1, x2, y2, marker = 'o')
plt.show()
I suggest you spend some time reading / studying the basic tutorials found on the very rich matplotlib website to familiarize yourself with the library.
What if I don't want line segments?
[edit]:
As shown by @thomaskeefe, starting with matplotlib 3.3 this is now built in as a convenience: plt.axline((x1, y1), (x2, y2)), rendering the following obsolete.
There is no direct way to have a line extend to infinity: matplotlib will either resize/rescale the plot so that the furthest point is on the boundary and the other inside, drawing line segments in effect; or you must choose points outside the boundary of the region you want to keep visible, and set the limits for the x and y axes.
As follows:
import matplotlib.pyplot as plt
x1, y1 = [-1, 12], [1, 10]
x2, y2 = [-1, 10], [3, -1]
plt.xlim(0, 8), plt.ylim(-2, 8)
plt.plot(x1, y1, x2, y2, marker = 'o')
plt.show()
As of matplotlib 3.3, you can do this with plt.axline((x1, y1), (x2, y2)).
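For reference, a minimal sketch using the two point pairs from the first example above (a line can also be given as one point plus a slope):
import matplotlib.pyplot as plt
# infinite line through (-1, 1) and (12, 4); axline extends it to the axes limits
plt.axline((-1, 1), (12, 4), color='C0')
# equivalently, one point and a slope: the line through (1, 3) and (10, 2) has slope -1/9
plt.axline((1, 3), slope=-1/9, color='C1')
plt.xlim(0, 8)
plt.ylim(-2, 8)
plt.show()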
I was checking how ax.axvline works, and I've written a small function that resembles part of its idea:
import matplotlib.pyplot as plt
import matplotlib.lines as mlines
def newline(p1, p2):
    ax = plt.gca()
    xmin, xmax = ax.get_xbound()
    if p2[0] == p1[0]:
        xmin = xmax = p1[0]
        ymin, ymax = ax.get_ybound()
    else:
        ymax = p1[1] + (p2[1] - p1[1]) / (p2[0] - p1[0]) * (xmax - p1[0])
        ymin = p1[1] + (p2[1] - p1[1]) / (p2[0] - p1[0]) * (xmin - p1[0])
    l = mlines.Line2D([xmin, xmax], [ymin, ymax])
    ax.add_line(l)
    return l
So, if you run the following code you will see how it works. The line will span the full range of your plot (independently of how big the plot is), and its creation doesn't rely on any data point within the axes, only on the two fixed points that you need to specify.
import numpy as np
x = np.linspace(0,10)
y = x**2
p1 = [1,20]
p2 = [6,70]
plt.plot(x, y)
newline(p1,p2)
plt.show()
Just want to mention another option here.
You can compute the coefficients using numpy.polyfit() and feed them to numpy.poly1d(). This function constructs a polynomial from the coefficients; you can find more examples here:
https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.poly1d.html
Let's say we are given two data points, (-0.3, -0.5) and (0.8, 0.8):
import numpy as np
import matplotlib.pyplot as plt
# compute coefficients
coefficients = np.polyfit([-0.3, 0.8], [-0.5, 0.8], 1)
# create a polynomial object with the coefficients
polynomial = np.poly1d(coefficients)
# for the line to extend beyond the two points,
# create the linspace using the min and max of the x limits
# (I'm using -1 and 1 here)
x_axis = np.linspace(-1, 1)
# compute the y for each x using the polynomial
y_axis = polynomial(x_axis)
fig = plt.figure()
axes = fig.add_axes([0.1, 0.1, 1, 1])
axes.set_xlim(-1, 1)
axes.set_ylim(-1, 1)
axes.plot(x_axis, y_axis)
axes.plot(-0.3, -0.5, 0.8, 0.8, marker='o', color='red')
Hope it helps.
In case somebody lands here trying to plot many segments in one go, here is a way. Say the segments are defined by two 2-D arrays of the same length, e.g. a and b. We want to plot a segment between each a[i] and b[i]. In that case:
Solution 1
ab_pairs = np.c_[a, b]
plt_args = ab_pairs.reshape(-1, 2, 2).swapaxes(1, 2).reshape(-1, 2)
ax.plot(*plt_args, ...)
Example:
np.random.seed(0)
n = 32
a = np.random.uniform(0, 1, (n, 2))
b = np.random.uniform(0, 1, (n, 2))
fig, ax = plt.subplots(figsize=(3, 3))
ab_pairs = np.c_[a, b]
ab_args = ab_pairs.reshape(-1, 2, 2).swapaxes(1, 2).reshape(-1, 2)
# segments
ax.plot(*ab_args, c='k')
# identify points: a in blue, b in red
ax.plot(*a.T, 'bo')
ax.plot(*b.T, 'ro')
plt.show()
Solution 2
The above creates many matplotlib.lines.Line2D objects. If you'd like a single line, we can do it by interleaving NaN between the pairs:
ax.plot(*np.c_[a, b, a*np.nan].reshape(-1, 2).T, ...)
Example:
# same init as example above, then
fig, ax = plt.subplots(figsize=(3, 3))
# segments (all at once)
ax.plot(*np.c_[a, b, a*np.nan].reshape(-1, 2).T, 'k')
# identify points: a in blue, b in red
ax.plot(*a.T, 'bo')
ax.plot(*b.T, 'ro')
plt.show()
(Same figure as above).
Based on @Alejandro's answer:
if you want to add a line to an existing Axes (e.g. a scatter plot), and
all you know is the slope and intercept of the desired line (e.g. a regression line), and
you want it to cover the entire visible X range (already computed), and
you want to use the object-oriented interface (not pyplot).
Then you can do this (existing Axes in ax):
# e.g. slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(xs, ys)
xmin, xmax = ax.get_xbound()
ymin = (xmin * slope) + intercept
ymax = (xmax * slope) + intercept
l = matplotlib.lines.Line2D([xmin, xmax], [ymin, ymax])
ax.add_line(l)
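Putting that recipe together into a self-contained usage sketch (the data and the regression here are made up purely for illustration):
import numpy as np
import scipy.stats
import matplotlib.lines
import matplotlib.pyplot as plt
# some made-up data with a roughly linear trend
rng = np.random.default_rng(0)
xs = np.linspace(0, 10, 50)
ys = 2.0 * xs + 1.0 + rng.normal(scale=2.0, size=xs.size)
fig, ax = plt.subplots()
ax.scatter(xs, ys, s=10)
# fit, then draw the regression line across the visible x range
slope, intercept, r_value, p_value, std_err = scipy.stats.linregress(xs, ys)
xmin, xmax = ax.get_xbound()
line = matplotlib.lines.Line2D([xmin, xmax], [xmin * slope + intercept, xmax * slope + intercept], color='red')
ax.add_line(line)
plt.show()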
