I have two Gaussian plots:
x = np.linspace(-5,9,10000)
plot1=plt.plot(x,mlab.normpdf(x,2.5,1))
plot2=plt.plot(x,mlab.normpdf(x,5,1))
and I want to find the point where the two curves intersect. Is there a way of doing this? In particular, I want to find the value of the x-coordinate where they meet.
You want to find the x's at which both Gaussian functions have the same height (i.e., where they intersect).
You can do so by equating the two Gaussian densities and solving for x. You end up with a quadratic equation whose coefficients are built from the Gaussian means and variances. Here is the final result:
import numpy as np

def solve(m1, m2, std1, std2):
    # Coefficients of the quadratic obtained by equating the two log-densities.
    a = 1/(2*std1**2) - 1/(2*std2**2)
    b = m2/(std2**2) - m1/(std1**2)
    c = m1**2/(2*std1**2) - m2**2/(2*std2**2) - np.log(std2/std1)
    return np.roots([a, b, c])

m1 = 2.5
std1 = 1.0
m2 = 5.0
std2 = 1.0
result = solve(m1, m2, std1, std2)
The output is:
array([ 3.75])
You can plot the found intersections:
x = np.linspace(-5,9,10000)
plot1=plt.plot(x,mlab.normpdf(x,m1,std1))
plot2=plt.plot(x,mlab.normpdf(x,m2,std2))
plot3=plt.plot(result,mlab.normpdf(result,m1,std1),'o')
The plot will be:
If your Gaussians have multiple intersections, the code will find all of them (say m1=2.5, std1=3.0, m2=5.0, std2=1.0):
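Note that mlab.normpdf has been removed from recent Matplotlib releases. As an alternative sketch (my addition, reusing the solve function defined above), the same plot can be produced with scipy.stats.norm.pdf:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

m1, std1 = 2.5, 1.0
m2, std2 = 5.0, 1.0
result = solve(m1, m2, std1, std2)  # solve() as defined above

x = np.linspace(-5, 9, 10000)
plt.plot(x, norm.pdf(x, m1, std1))
plt.plot(x, norm.pdf(x, m2, std2))
plt.plot(result, norm.pdf(result, m1, std1), 'o')  # mark the intersection(s)
plt.show()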
Here's a solution based purely on NumPy that is also applicable to curves other than Gaussians.
import numpy as np
import scipy as sc
import scipy.stats
import matplotlib.pyplot as plt

def get_intersection_locations(y1, y2, test=False, x=None):
    """
    Return the indices of the intersection point(s) of the two curves.
    """
    idxs = np.argwhere(np.diff(np.sign(y1 - y2))).flatten()
    if test:
        x = range(len(y1)) if x is None else x
        plt.figure(figsize=[2.5, 2.5])
        ax = plt.subplot()
        ax.plot(x, y1, color='r', label='line1', alpha=0.5)
        ax.plot(x, y2, color='b', label='line2', alpha=0.5)
        _ = [ax.axvline(x[i], color='k') for i in idxs]
        _ = [ax.text(x[i], ax.get_ylim()[1], f"{x[i]:1.1f}", ha='center', va='bottom') for i in idxs]
        ax.legend(bbox_to_anchor=[1, 1])
        ax.set(xlabel='x', ylabel='density')
    return idxs
# single intersection
x = np.arange(-10, 10, 0.001)
y1=sc.stats.norm.pdf(x,-2,2)
y2=sc.stats.norm.pdf(x,2,3)
get_intersection_locations(y1=y1, y2=y2, x=x, test=True)  # returns index array([10173])
# double intersection
x = np.arange(-10, 10, 0.001)
y1=sc.stats.norm.pdf(x,-2,1)
y2=sc.stats.norm.pdf(x,2,3)
get_intersection_locations(y1=y1,y2=y2,x=x,test=True)
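If you need the x-coordinate of a crossing rather than its array index, a linear interpolation between the two bracketing samples is a cheap refinement. This is a sketch of my own on top of the function above; refine_intersections is a hypothetical helper, not part of the original answer:
def refine_intersections(x, y1, y2, idxs):
    """Linearly interpolate each sign change of (y1 - y2) to an x-coordinate."""
    x = np.asarray(x)
    d = y1 - y2
    # For each bracketing pair (i, i+1), find where the straight line through
    # (x[i], d[i]) and (x[i+1], d[i+1]) crosses zero.
    return x[idxs] - d[idxs] * (x[idxs + 1] - x[idxs]) / (d[idxs + 1] - d[idxs])

idxs = get_intersection_locations(y1=y1, y2=y2, x=x)
print(refine_intersections(x, y1, y2, idxs))  # interpolated x-values of the crossing(s)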
Based on an answer to a similar question.
From https://stackoverflow.com/a/30460089/2202107, we can generate the CDF of a normal distribution:
import numpy as np
import matplotlib.pyplot as plt
N = 100
Z = np.random.normal(size = N)
# method 1
H, X1 = np.histogram(Z, bins=10, density=True)
dx = X1[1] - X1[0]
F1 = np.cumsum(H)*dx
#method 2
X2 = np.sort(Z)
F2 = np.array(range(N))/float(N)
# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
Question: How do we generate the "original" normal distribution, given only the x (e.g. X2) and y (e.g. F2) coordinates?
My first thought was plt.plot(x, np.gradient(y)), but the gradient of y came out flat (the data points are evenly spaced in y, but not in x). This kind of data often comes up in percentile calculations. The key is to get the data evenly spaced in x rather than in y, using interpolation:
x=X2
y=F2
num_points=10
xinterp = np.linspace(-2,2,num_points)
yinterp = np.interp(xinterp, x, y)
# normalize so that the sum of all bars equals 1.0
tot_val=1.0
normalization_factor = tot_val/np.trapz(np.ones(len(xinterp)),yinterp)
plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()
The output looks good to me:
I put my approach here for examination. Let me know if my logic is flawed.
One issue is that when num_points is large, the plot looks bad; it's a discretization issue, and I'm not sure how to avoid it.
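As a cross-check (my addition, not part of the approach above), np.gradient also accepts the sample coordinates directly, so the density can be estimated by differentiating the interpolated CDF with respect to x:
# Assumes xinterp and yinterp from the snippet above.
pdf_est = np.gradient(yinterp, xinterp)  # d(CDF)/dx, evaluated at xinterp
plt.plot(xinterp, pdf_est, 'o-')
plt.show()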
Related posts:
I failed to understand why the answer was so complicated in https://stats.stackexchange.com/a/6065/131632
I also didn't understand why my approach was different from Generate distribution given percentile ranks.
I'm looking for a way to analyze two cubic splines and find the point where they come closest to each other. I've seen a lot of solutions and posts, but I've been unable to implement the suggested methods. I know that the closest point will be either one of the end-points of the two curves or a point where the first derivative of both curves is equal. Checking the end points is easy. Finding the points where the first derivatives match is hard.
Given:
Curve 0 is B(t) (red)
Curve 1 is C(s) (blue)
A candidate for closest point is where:
B'(t) = C'(s)
The first derivative of each curve takes the following form:
B'(t) = 3a(1-t)^2 + 6b(1-t)t + 3ct^2
where the a, b, c coefficients are formed from the control points of the curves:
a=P1-P0
b=P2-P1
c=P3-P2
Taking the 4 control points for each cubic spline, I can put each curve's parametric sections into a matrix form that can be expressed with NumPy using the following Python code:
import numpy as np

def test_closest_points():
    # Control points for the two cubic splines.
    spline_0 = [(1, 28), (58, 93), (113, 95), (239, 32)]
    spline_1 = [(58, 241), (26, 76), (225, 83), (211, 205)]

    first_derivative_matrix = np.array([[3, -6, 3], [-6, 6, 0], [3, 0, 0]])

    spline_0_x_A = spline_0[1][0] - spline_0[0][0]
    spline_0_x_B = spline_0[2][0] - spline_0[1][0]
    spline_0_x_C = spline_0[3][0] - spline_0[2][0]
    spline_0_y_A = spline_0[1][1] - spline_0[0][1]
    spline_0_y_B = spline_0[2][1] - spline_0[1][1]
    spline_0_y_C = spline_0[3][1] - spline_0[2][1]
    spline_1_x_A = spline_1[1][0] - spline_1[0][0]
    spline_1_x_B = spline_1[2][0] - spline_1[1][0]
    spline_1_x_C = spline_1[3][0] - spline_1[2][0]
    spline_1_y_A = spline_1[1][1] - spline_1[0][1]
    spline_1_y_B = spline_1[2][1] - spline_1[1][1]
    spline_1_y_C = spline_1[3][1] - spline_1[2][1]

    spline_0_first_derivative_x_coefficients = np.array([[spline_0_x_A], [spline_0_x_B], [spline_0_x_C]])
    spline_0_first_derivative_y_coefficients = np.array([[spline_0_y_A], [spline_0_y_B], [spline_0_y_C]])
    spline_1_first_derivative_x_coefficients = np.array([[spline_1_x_A], [spline_1_x_B], [spline_1_x_C]])
    spline_1_first_derivative_y_coefficients = np.array([[spline_1_y_A], [spline_1_y_B], [spline_1_y_C]])

    # Show all the matrix values
    print('first_derivative_matrix:')
    print(first_derivative_matrix)
    print()
    print('spline_0_first_derivative_x_coefficients:')
    print(spline_0_first_derivative_x_coefficients)
    print()
    print('spline_0_first_derivative_y_coefficients:')
    print(spline_0_first_derivative_y_coefficients)
    print()
    print('spline_1_first_derivative_x_coefficients:')
    print(spline_1_first_derivative_x_coefficients)
    print()
    print('spline_1_first_derivative_y_coefficients:')
    print(spline_1_first_derivative_y_coefficients)
    print()

    # Now taking B(t) as spline_0 and C(s) as spline_1, I need to find the
    # values of t and s where B'(t) = C'(s).
This post has some good high-level advice, but I'm unsure how to implement a solution in Python that can find the values of t and s with matching first derivatives (slopes). The B'(t) - C'(s) = 0 problem seems like a matter of finding roots. Any advice on how to do it with Python and NumPy would be greatly appreciated.
Using NumPy suggests that the problem should be solved numerically. Without loss of generality we can assume that 0 < s <= 1 and 0 < t <= 1. You can use the SciPy package to solve the problem numerically, e.g.:
from scipy.optimize import minimize
import numpy as np

def B(t):
    """Assumed for simplicity: 0 < t <= 1"""
    return np.sin(6.28 * t), np.cos(6.28 * t)

def C(s):
    """0 < s <= 1"""
    return 10 + np.sin(3.14 * s), 10 + np.cos(3.14 * s)

def Q(x):
    """Distance function to be minimized"""
    b = B(x[0])
    c = C(x[1])
    return (b[0] - c[0]) ** 2 + (b[1] - c[1]) ** 2
res = minimize(Q, (0.5, 0.5))
print("B-Point: ", B(res.x[0]))
print("C-Point: ", C(res.x[1]))
B-Point: (0.7071067518175205, 0.7071068105555733)
C-Point: (9.292893243165555, 9.29289319446135)
This example uses two circles (a circle and an arc), but the same approach should work with splines.
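If the parameters must stay within (0, 1], bounds can be passed to minimize. Here is a minimal sketch of my own (using the bounded L-BFGS-B method; a tiny positive lower bound stands in for the open interval), reusing Q, B and C from the snippet above:
res = minimize(Q, (0.5, 0.5), method='L-BFGS-B',
               bounds=[(1e-9, 1.0), (1e-9, 1.0)])  # keep t and s in (0, 1]
print("B-Point: ", B(res.x[0]))
print("C-Point: ", C(res.x[1]))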
Your assumption that B'(t) = C'(s) is too strong.
Derivatives have direction and magnitude. The directions must coincide at the candidate points, but the magnitudes may differ.
To find the points with matching derivative directions and the closest distance, you can solve the following equation system (of high degree, unfortunately):
yb'(t) * xc'(u) - yc'(u) * xb'(t) = 0   // the cross product of (anti)collinear vectors is zero
((xb(t) - xc(u))^2 + (yb(t) - yc(u))^2)' = 0   // derivative of the squared distance
You can also use the function fmin:
import numpy as np
import matplotlib.pylab as plt
from scipy.optimize import fmin

def BCubic(t, P0, P1, P2, P3):
    a = P1 - P0
    b = P2 - P1
    c = P3 - P2
    return a*3*(1-t)**2 + b*6*(1-t)*t + c*3*t**2

def B(t):
    return BCubic(t, 4, 2, 3, 1)

def C(t):
    return BCubic(t, 1, 4, 3, 4)

def f(t):
    # L1 or manhattan distance
    return abs(B(t) - C(t))
init = 0 # 2
tmin = fmin(f,np.array([init]))
#Optimization terminated successfully.
#Current function value: 2.750000
# Iterations: 23
# Function evaluations: 46
print(tmin)
# [0.5833125]
tmin = tmin[0]
t = np.linspace(0, 2, 100)
plt.plot(t, B(t), label='B')
plt.plot(t, C(t), label='C')
plt.plot(t, abs(B(t)-C(t)), label='|B-C|')
plt.plot(tmin, B(tmin), 'r.', markersize=12, label='min')
plt.axvline(x=tmin, linestyle='--', color='k')
plt.legend()
plt.show()
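If t should stay within [0, 1], scipy.optimize.fminbound is a drop-in alternative to fmin; a small sketch of my own, reusing f from the code above:
from scipy.optimize import fminbound

# Minimize |B(t) - C(t)| on the closed interval [0, 1] instead of running an
# unconstrained simplex search from a start value.
tmin_bounded = fminbound(f, 0.0, 1.0)
print(tmin_bounded)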
I am looking for a simple approach to finding the interpolated intersection between two Numpy arrays. I know that this is simply achieved if we have two function handles, rather than two arrays, as shown in this link using Scipy or using Sympy. I want to do the same, but given two arrays, specifically between the linear spline that results from connecting array entries by lines.
For example, suppose we have two arrays, y_1 and y_2, both thought of as being evaluated at xSupport.
import numpy as np
xSupport = np.array([0,1])
y_1 = np.array([0,2])
y_2 = np.array([1,0])
I am looking for the function that returns 1/3, which is the x value of the intersection between these two lines. In my application the support is larger than two, so I am looking for an approach that is independent of the arrays' length.
Along the same lines as ser's answer:
import numpy as np

x = np.array([0, 1])
y1 = np.array([0, 2])
y2 = np.array([1, 0])

def solve(f, x):
    s = np.sign(f)
    z = np.where(s == 0)[0]
    if z.size > 0:
        return z
    else:
        s = s[0:-1] + s[1:]
        z = np.where(s == 0)[0]
        return z

def interp(f, x, z):
    m = (f[z+1] - f[z]) / (x[z+1] - x[z])
    return x[z] - f[z]/m

f = y1 - y2
z = solve(f, x)
ans = interp(f, x, z)
print(ans)
The problem can be simplified by looking for a zero of the difference of the two series. First, solve finds where a sign transition occurs (implying a zero somewhere in between), and then interp performs a linear interpolation to find the solution.
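For a support longer than two points, the same two functions apply unchanged. A quick usage sketch with made-up example curves of my own (not from the original post):
xs = np.linspace(0.0, 2.0, 8)
ya = xs**2            # example curve 1
yb = 2.0 - xs         # example curve 2; true crossing at x = 1
fs = ya - yb
zs = solve(fs, xs)
print(interp(fs, xs, zs))   # approximately [0.993]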
Over in Digitizing an analog signal, I created a function called find_transition_times. You could use that function by passing y_1 - y_2 for y and 0 for threshold:
In [5]: xSupport = np.array([0,1])
...: y_1 = np.array([0,2])
...: y_2 = np.array([1,0])
...:
In [6]: find_transition_times(xSupport, y_1 - y_2, 0)
Out[6]: array([ 0.33333333])
I have used the finite element method to approximate the Laplace equation and thus have turned it into a matrix system AU = F, where A is the stiffness matrix, and solved for U (not massively important for my question).
I now have my approximation U; when I compute AU I should get back the vector F (or at least something similar), where F is the nodal quadrature of sin(pi*x) (see NodalQuadrature below):
AU gives the following plot for x = 0 to x = 1 (say, for 20 nodes):
I then need to interpolate U onto a longer vector and compute AU (with a bigger A too, but I am not interpolating that). I interpolate U as follows:
U_inter = interp1d(x,U)
U_rich = U_inter(longer_x)
which seems to work okay until I multiply it by the longer A matrix:
It seems each spike is at a node of x (i.e. a node of the original U). Does anybody know what could be causing this? The following is my code to find A, U and F.
import numpy as np
import math
import scipy
from scipy.sparse import diags
import scipy.sparse.linalg
from scipy.interpolate import interp1d
import matplotlib
import matplotlib.pyplot as plt
def Poisson_Stiffness(x0):
    """Finds the Poisson equation stiffness matrix with any non-uniform mesh x0"""
    x0 = np.array(x0)
    N = len(x0) - 1  # The number of elements; x0, x1, ..., xN
    h = x0[1:] - x0[:-1]

    a = np.zeros(N+1)
    a[0] = 1  # BOUNDARY CONDITIONS
    a[1:-1] = 1/h[1:] + 1/h[:-1]
    a[-1] = 1/h[-1]
    a[N] = 1  # BOUNDARY CONDITIONS

    b = -1/h
    b[0] = 0  # BOUNDARY CONDITIONS

    c = -1/h
    c[N-1] = 0  # BOUNDARY CONDITIONS: DIRICHLET

    data = [a.tolist(), b.tolist(), c.tolist()]
    Positions = [0, 1, -1]
    Stiffness_Matrix = diags(data, Positions, (N+1, N+1))
    return Stiffness_Matrix

def NodalQuadrature(x0):
    """Finds the nodal quadrature approximation of sin(pi x)"""
    x0 = np.array(x0)
    h = x0[1:] - x0[:-1]
    N = len(x0) - 1

    approx = np.zeros(len(x0))
    approx[0] = 0  # BOUNDARY CONDITIONS
    for i in range(1, N):
        approx[i] = math.sin(math.pi*x0[i])
        approx[i] = (approx[i]*h[i-1] + approx[i]*h[i])/2
    approx[N] = 0  # BOUNDARY CONDITIONS
    return approx

def Solver(x0):
    Stiff_Matrix = Poisson_Stiffness(x0)
    NodalApproximation = NodalQuadrature(x0)
    NodalApproximation[0] = 0
    U = scipy.sparse.linalg.spsolve(Stiff_Matrix, NodalApproximation)
    return U
x = np.linspace(0,1,10)
rich_x = np.linspace(0,1,50)
U = Solver(x)
A_rich = Poisson_Stiffness(rich_x)
U_inter = interp1d(x,U)
U_rich = U_inter(rich_x)
AUrich = A_rich.dot(U_rich)
plt.plot(rich_x,AUrich)
plt.show()
comment 1:
I added a Stiffness_Matrix = Stiffness_Matrix.tocsr() statement to avoid an efficiency warning. FE calculations are complex enough that I'll have to print out some intermediate values before I can identify what is going on.
comment 2:
plt.plot(rich_x, A_rich.dot(Solver(rich_x))) plots nicely. The noise you get is the result of the difference between the interpolated U_rich and the true solution: U_rich - Solver(rich_x).
comment 3:
I don't think there's a problem with your code. The problem is with the idea that you can test an interpolation this way. I'm rusty on FE theory, but I think you need to use the shape functions to interpolate, not a simple linear one.
comment 4:
Intuitively, with A_rich.dot(U_rich) you are asking: what kind of forcing F would produce U_rich? Compared to Solver(rich_x), U_rich has flat spots, regions where its value is less than the true solution. What F would produce that? One that is spiky, with NodalQuadrature(x) at the x points, but near-zero values in between. That's what your plot is showing.
A higher order interpolation will eliminate the flat spots, and produce a smoother back calculated F. But you really need to revisit the FE theory.
You might find it instructive to look at
plt.plot(x,NodalQuadrature(x))
plt.plot(rich_x, NodalQuadrature(rich_x))
The second plot is much smoother, but only about 1/5 as high.
Better yet look at:
plt.plot(rich_x,AUrich,'-*') # the spikes
plt.plot(x,NodalQuadrature(x),'o') # original forcing
plt.plot(rich_x, NodalQuadrature(rich_x),'+') # new forcing
In the model the forcing isn't continuous; it is a value at each node. With more nodes (rich_x), the magnitude at each node is less.
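Following up on the higher-order interpolation point in comment 4, here is a minimal sketch of my own (assuming x, U, rich_x and A_rich from the question's code); it smooths the flat spots but is still not a substitute for FE-consistent shape-function interpolation:
# Swap the default linear interpolant for a cubic one via interp1d's kind argument.
U_inter_cubic = interp1d(x, U, kind='cubic')
U_rich_cubic = U_inter_cubic(rich_x)
AUrich_cubic = A_rich.dot(U_rich_cubic)
plt.plot(rich_x, AUrich_cubic)
plt.show()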
For the given data, I want to set the outlier values (defined by the 95% confidence level or the 95% quantile function or whatever is required) to nan. Following are my data and the code that I am using right now. I would be glad if someone could explain this to me further.
import numpy as np, matplotlib.pyplot as plt
data = np.random.rand(1000)+5.0
plt.plot(data)
plt.xlabel('observation number')
plt.ylabel('recorded value')
plt.show()
The problem with using a percentile is that the points identified as outliers are a function of your sample size.
There are a huge number of ways to test for outliers, and you should give some thought to how you classify them. Ideally, you should use a priori information (e.g. "anything above/below this value is unrealistic because...").
However, a common, not-too-unreasonable outlier test is to remove points based on their "median absolute deviation".
Here's an implementation for the N-dimensional case (from some code for a paper here: https://github.com/joferkington/oost_paper_code/blob/master/utilities.py):
import numpy as np

def is_outlier(points, thresh=3.5):
    """
    Returns a boolean array with True if points are outliers and False
    otherwise.

    Parameters:
    -----------
        points : An numobservations by numdimensions array of observations
        thresh : The modified z-score to use as a threshold. Observations with
            a modified z-score (based on the median absolute deviation) greater
            than this value will be classified as outliers.

    Returns:
    --------
        mask : A numobservations-length boolean array.

    References:
    ----------
        Boris Iglewicz and David Hoaglin (1993), "Volume 16: How to Detect and
        Handle Outliers", The ASQC Basic References in Quality Control:
        Statistical Techniques, Edward F. Mykytka, Ph.D., Editor.
    """
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)

    modified_z_score = 0.6745 * diff / med_abs_deviation

    return modified_z_score > thresh
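Applied to the data from the question, the boolean mask makes setting the flagged points to NaN a one-liner; a usage sketch of my own, assuming is_outlier as defined above:
data = np.random.rand(1000) + 5.0
data[is_outlier(data)] = np.nan   # replace detected outliers with NaN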
This is very similar to one of my previous answers, but I wanted to illustrate the sample size effect in detail.
Let's compare a percentile-based outlier test (similar to #CTZhu's answer) with a median-absolute-deviation (MAD) test for a variety of different sample sizes:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
def main():
    for num in [10, 50, 100, 1000]:
        # Generate some data
        x = np.random.normal(0, 0.5, num-3)

        # Add three outliers...
        x = np.r_[x, -3, -10, 12]
        plot(x)

    plt.show()

def mad_based_outlier(points, thresh=3.5):
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    diff = np.sum((points - median)**2, axis=-1)
    diff = np.sqrt(diff)
    med_abs_deviation = np.median(diff)

    modified_z_score = 0.6745 * diff / med_abs_deviation
    return modified_z_score > thresh

def percentile_based_outlier(data, threshold=95):
    diff = (100 - threshold) / 2.0
    minval, maxval = np.percentile(data, [diff, 100 - diff])
    return (data < minval) | (data > maxval)

def plot(x):
    fig, axes = plt.subplots(nrows=2)
    for ax, func in zip(axes, [percentile_based_outlier, mad_based_outlier]):
        sns.distplot(x, ax=ax, rug=True, hist=False)
        outliers = x[func(x)]
        ax.plot(outliers, np.zeros_like(outliers), 'ro', clip_on=False)

    kwargs = dict(y=0.95, x=0.05, ha='left', va='top')
    axes[0].set_title('Percentile-based Outliers', **kwargs)
    axes[1].set_title('MAD-based Outliers', **kwargs)
    fig.suptitle('Comparing Outlier Tests with n={}'.format(len(x)), size=14)
main()
Notice that the MAD-based classifier works correctly regardless of sample size, while the percentile-based classifier flags more points the larger the sample size is, regardless of whether or not they are actually outliers.
Detection of outliers in one-dimensional data depends on its distribution.
1 - Normal distribution:
Data values are almost equally distributed over the expected range:
In this case you can easily use all the methods that involve the mean, like a confidence interval of 2 or 3 standard deviations (95% or 99.7%) for normally distributed data (central limit theorem and sampling distribution of the sample mean). It is a highly effective method.
This is explained in the Khan Academy statistics and probability - sampling distribution library.
Another way is a prediction interval, if you want a confidence interval for the data points rather than for the mean.
Data values are randomly distributed over a range:
The mean may not be a fair representation of the data, because the average is easily influenced by outliers (very small or very large values in the data set that are not typical).
The median is another way to measure the center of a numerical data set.
Median absolute deviation - a method which measures the distance of all points from the median in terms of the median distance.
http://www.itl.nist.gov/div898/handbook/eda/section3/eda35h.htm has a good explanation, as covered in Joe Kington's answer above.
2 - Symmetric distribution: Again, median absolute deviation is a good method if the z-score calculation and threshold are changed accordingly.
Explanation:
http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers/
3 - Asymmetric distribution: Double MAD - double median absolute deviation
Explanation in the link attached above.
Attaching my Python code for reference:
import sys
import numpy as np

def is_outlier_doubleMAD(self, points):
    """
    FOR ASYMMETRIC DISTRIBUTIONS
    Returns : filtered array excluding the outliers
    Parameters : the actual data points array
    Calculates the median to divide the data into 2 halves (skew conditions handled).
    Then those two halves are treated as separate data sets, with the same calculation
    as for a symmetric distribution (first answer).
    The only difference is that the thresholds are now the distances of the right and
    left medians from the overall median.
    """
    if len(points.shape) == 1:
        points = points[:, None]
    median = np.median(points, axis=0)
    medianIndex = points.size // 2
    leftData = np.copy(points[0:medianIndex])
    rightData = np.copy(points[medianIndex:points.size])
    median1 = np.median(leftData, axis=0)
    diff1 = np.sum((leftData - median1)**2, axis=-1)
    diff1 = np.sqrt(diff1)
    median2 = np.median(rightData, axis=0)
    diff2 = np.sum((rightData - median2)**2, axis=-1)
    diff2 = np.sqrt(diff2)
    med_abs_deviation1 = max(np.median(diff1), 0.000001)
    med_abs_deviation2 = max(np.median(diff2), 0.000001)
    threshold1 = ((median - median1)/med_abs_deviation1)*3
    threshold2 = ((median2 - median)/med_abs_deviation2)*3

    # if either threshold is 0 -> no outliers on that side
    if threshold1 == 0:
        threshold1 = sys.maxsize
    if threshold2 == 0:
        threshold2 = sys.maxsize

    # multiplied by a factor so that only the outermost points are removed
    modified_z_score1 = 0.6745 * diff1 / med_abs_deviation1
    modified_z_score2 = 0.6745 * diff2 / med_abs_deviation2

    filtered1 = []
    i = 0
    for data in modified_z_score1:
        if data < threshold1:
            filtered1.append(leftData[i])
        i += 1

    i = 0
    filtered2 = []
    for data in modified_z_score2:
        if data < threshold2:
            filtered2.append(rightData[i])
        i += 1

    filtered = filtered1 + filtered2
    return filtered
I've adapted the code from http://eurekastatistics.com/using-the-median-absolute-deviation-to-find-outliers and it gives the same results as Joe Kington's, but it uses the L1 distance instead of the L2 distance and has support for asymmetric distributions. The original R code did not have Joe's 0.6745 multiplier, so I also added it for consistency within this thread. I'm not 100% sure if it's necessary, but it makes the comparison apples-to-apples.
def doubleMADsfromMedian(y, thresh=3.5):
    # warning: this function does not check for NAs
    # nor does it address issues when
    # more than 50% of your data have identical values
    m = np.median(y)
    abs_dev = np.abs(y - m)
    left_mad = np.median(abs_dev[y <= m])
    right_mad = np.median(abs_dev[y >= m])
    y_mad = left_mad * np.ones(len(y))
    y_mad[y > m] = right_mad
    modified_z_score = 0.6745 * abs_dev / y_mad
    modified_z_score[y == m] = 0
    return modified_z_score > thresh
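A small usage sketch of my own (assuming NumPy and the function above): on a right-skewed sample the left and right MADs differ, which is the point of the double-MAD approach.
import numpy as np

skewed = np.random.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed example data
mask = doubleMADsfromMedian(skewed)
print(mask.sum(), "points flagged as outliers")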
Well, a simple solution can also be to remove anything outside 2 standard deviations (or 1.96):
import random
def outliers(tmp):
    """tmp is a list of numbers"""
    outs = []
    mean = sum(tmp)/(1.0*len(tmp))
    var = sum((tmp[i] - mean)**2 for i in range(0, len(tmp)))/(1.0*len(tmp))
    std = var**0.5
    outs = [tmp[i] for i in range(0, len(tmp)) if abs(tmp[i]-mean) > 1.96*std]
    return outs

lst = [random.randrange(-10, 55) for _ in range(40)]
print(lst)
print(outliers(lst))
Use np.percentile as @Martin suggested:
percentiles = np.percentile(data, [2.5, 97.5])

# keep the values within the 95% band (use >=, <= to include the boundaries)
data[(percentiles[0] < data) & (percentiles[1] > data)]

# set the outliers to np.nan
data[(percentiles[0] > data) | (percentiles[1] < data)] = np.nan