Get value at certain confidence percentile - python

I'm trying to obtain a value from a sorted list of α-values (e.g. 0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3) with the confidence level set to 0.8. I want the value at the position reached after traversing 80% of the array, and I want to use that alpha score to build prediction intervals.
import numpy as np
alpha_scores = np.array([0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3])
confidence_level = 0.80
confidence_percentile = int(np.floor(confidence_level * (alpha_scores.size + 1))) - 1  # index of the confidence percentile
alpha_index = min(max(confidence_percentile, 0), alpha_scores.size - 1)  # clamp the index (not the level) to a valid range
err_dist = alpha_scores[alpha_index]
Would this be the correct way to obtain this value? I get a score, but it does not always match the value I expect.
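A minimal sketch of the lookup as a reusable helper, assuming the question's rank convention floor(level * (n + 1)) is the intended one. The helper name alpha_at_confidence and the use_ceil flag are hypothetical. If the alpha scores come from split conformal prediction, the usual finite-sample rule takes the ceil of level * (n + 1) instead of the floor, which for these ten scores selects 5.7 rather than 4.0; that difference may explain results that don't match expectations.

```python
import math
import numpy as np

def alpha_at_confidence(alpha_scores, confidence_level, use_ceil=False):
    """Return the alpha score at `confidence_level` (hypothetical helper).

    Rank k = floor(level * (n + 1)) as in the question, or ceil(...) when
    use_ceil=True. The 0-based index is k - 1, clamped to [0, n - 1].
    """
    scores = np.sort(np.asarray(alpha_scores, dtype=float))  # make sure the scores are sorted
    n = scores.size
    rank = confidence_level * (n + 1)
    k = math.ceil(rank) if use_ceil else math.floor(rank)
    idx = min(max(k - 1, 0), n - 1)  # clamp the index, not the confidence level
    return scores[idx]

alpha_scores = np.array([0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3])
print(alpha_at_confidence(alpha_scores, 0.80))                 # 4.0
print(alpha_at_confidence(alpha_scores, 0.80, use_ceil=True))  # 5.7
```

Clamping keeps very high or very low confidence levels from producing an out-of-range index; they simply map to the largest or smallest score.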

Related

Modifying array elements based on an absolute difference value

I have two arrays of the same length as shown below.
import numpy as np
y1 = [12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5]
y2 = [8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4]
y1_a = np.array(y1)
y2_a = np.array(y2)
print(y1_a)
print(y2_a)
y3_a = np.empty_like(y1_a)  # preallocate the result array
for i in range(len(y2_a)):
    y3_a[i] = abs(y1_a[i] - y2_a[i])
I am computing the absolute difference at each index/location between the two arrays. Whenever the absolute difference at a given index/location exceeds 2.0, I have to replace the value from 'y1_a' with the one from 'y2_a' and write the result to a new array variable 'y3_a'. The starter code is included above.
First of all, let NumPy do the heavy lifting for you. You can calculate your absolute differences without a manual for loop:
abs_diff = np.abs(y2_a - y1_a)  # element-wise absolute difference
Now you can replace all values where the absolute difference is more than 2.0:
mask = abs_diff > 2.0
y3_a = y1_a.copy()  # copy first: y3_a = y1_a would only be a second name for y1_a, so y1_a would be modified too
y3_a[mask] = y2_a[mask]
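An alternative sketch using np.where, which picks each element from one of two arrays based on a condition and returns a new array, so y1_a is never modified:

```python
import numpy as np

y1_a = np.array([12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5])
y2_a = np.array([8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4])

# Where |y1 - y2| > 2.0 take the value from y2_a, otherwise keep y1_a.
# np.where allocates a fresh array, so y1_a is left unchanged.
y3_a = np.where(np.abs(y1_a - y2_a) > 2.0, y2_a, y1_a)
print(y3_a)  # [8.2 6.2 1.4 0.8 2.5 4.2 6.4]
```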

KeyError making pandas dataframe

I am trying to find the equation of a function using a pandas DataFrame. This has worked in the past on other projects; however, now nothing seems to work.
I am aware that there might be easier ways to solve this, but I need this to work somehow.
additional_cols = ['xVerdier','fDer']
fdata = pd.DataFrame({"idx":findex,"x":xVerdier[:-1],"y":fDer})
print(fdata)
fdata = fdata.reindex(fdata.columns.tolist() + additional_cols, axis = 1)
fdata=fdata [[xVerdier[:-1],fDer]]
fdata = mpd.DataFrame(fdata)
train=fdata[:(int((len(fdata))))]
test=fdata[(int((len(fdata)))):]
regr=linear_model.LinearRegression()
train_x=np.array(train[[xVerdier]])
train_y=np.array(train[[fDer]])
regr.fit(train_x,train_y)
xVerdier is a list of x-values of a graph
[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999, 4.999999999999998]
fDer is a list of y-values of said graph
[1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912]
This is the error message
KeyError: "None of [Index([(0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999), (1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912)], dtype='object')] are in the [columns]"
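The KeyError comes from the line fdata = fdata [[xVerdier[:-1],fDer]]: double brackets select columns by label, so pandas looks for columns whose labels are those two lists of numbers (the two tuples shown in the message) and finds none. A minimal sketch of selecting by the column names instead, assuming stand-in data that matches the listed values (y = 4x + 1.2 on x = 0.0 ... 4.9). np.polyfit is swapped in for LinearRegression here only to keep the example dependency-light; regr.fit accepts the same train_x and train_y arrays.

```python
import numpy as np
import pandas as pd

# Stand-in data (assumption): the question's values follow y = 4x + 1.2,
# with xVerdier[:-1] paired with fDer.
xVerdier = list(np.linspace(0.0, 5.0, 51))
fDer = [4 * x + 1.2 for x in xVerdier[:-1]]

# Build the frame with named columns; the names are what you select by later.
fdata = pd.DataFrame({"x": xVerdier[:-1], "y": fDer})

# Select by column label, not by the data lists themselves.
train_x = fdata[["x"]].to_numpy()  # shape (50, 1), the 2-D layout scikit-learn expects
train_y = fdata["y"].to_numpy()    # shape (50,)

# Straight-line fit: polyfit returns [slope, intercept] for degree 1.
slope, intercept = np.polyfit(train_x.ravel(), train_y, 1)
print(slope, intercept)  # approximately 4.0 and 1.2
```

Note also that train=fdata[:len(fdata)] takes every row, which leaves test empty; a split point such as int(0.8 * len(fdata)) would keep some rows for testing.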

python matplotlib: How can I add a point mark to curve knowing only the x value?

For example, in matplotlib, I plot a simple curve based on a few points:
from matplotlib import pyplot as plt
import numpy as np
x=[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9]
y=[0.0, 0.19, 0.36, 0.51, 0.64, 0.75, 0.8400000000000001, 0.91, 0.96, 0.99, 1.0,
0.99, 0.96, 0.9099999999999999, 0.8399999999999999, 0.75, 0.6399999999999997,
0.5099999999999998, 0.3599999999999999, 0.18999999999999995, 0.0,
-0.20999999999999996, -0.4400000000000004, -0.6900000000000004,
-0.9600000000000009, -1.25, -1.5600000000000005, -1.8900000000000006,
-2.240000000000001, -2.610000000000001]
plt.plot(x,y)
plt.show()
Hypothetically, say I want to highlight the point on the curve where the x value is 0.25, but I don't know the y value for this point. What should I do?
The easiest solution is to perform a linear interpolation between the neighboring points around the provided x value. Here is some sample code showing the general principle:
X=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9]
Y=[0.0, 0.19, 0.36, 0.51, 0.64, 0.75, 0.8400000000000001, 0.91, 0.96,
0.99, 1.0, 0.99, 0.96, 0.9099999999999999, 0.8399999999999999, 0.75,
0.6399999999999997, 0.5099999999999998, 0.3599999999999999,
0.18999999999999995, 0.0, -0.20999999999999996, -0.4400000000000004,
-0.6900000000000004, -0.9600000000000009, -1.25, -1.5600000000000005,
-1.8900000000000006, -2.240000000000001, -2.610000000000001]
def interpolate(X, Y, xval):
    for n, x in enumerate(X):
        if x > xval:
            break
    else:
        return None  # xval >= last x value
    if n == 0:
        return None  # xval < first x value
    xa, xb = X[n-1], X[n]  # get surrounding x values
    ya, yb = Y[n-1], Y[n]  # get surrounding y values
    if xb == xa:
        return ya  # defensive guard against division by zero
    return ya + (xval - xa) * (yb - ya) / (xb - xa)  # compute yval by interpolation
print(interpolate(X, Y, 0.25)) # --> 0.435
print(interpolate(X, Y, 0.85)) # --> 0.975
print(interpolate(X, Y, 2.15)) # --> -0.3259999999999997
print(interpolate(X, Y, -1.0)) # --> None (out of bounds)
print(interpolate(X, Y, 3.33)) # --> None (out of bounds)
Note: When the provided xval is not within the range of x values, the function returns None
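You could also do the linear interpolation manually like this:
def get_y_val(p):
    # indices of the nearest x values at or below / at or above p
    # (max()/min() raise ValueError if p lies outside the range of x)
    lower_i = max(i for (i, v) in enumerate(x) if v <= p)
    upper_i = min(i for (i, v) in enumerate(x) if v >= p)
    d = x[upper_i] - x[lower_i]
    if d == 0:
        return y[lower_i]  # p coincides with a data point
    y_pt = y[lower_i] * (x[upper_i] - p) / d + y[upper_i] * (p - x[lower_i]) / d
    return y_pt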
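NumPy's np.interp performs the same piecewise-linear interpolation, and the result can then be drawn as a marker on top of the curve. A minimal sketch, assuming the question's data follow y = 2x - x**2 on x = 0.0, 0.1, ..., 2.9 (which matches the listed values). Unlike the functions above, np.interp does not return None out of range: by default it clamps to the first or last y value, so pass left=np.nan, right=np.nan if you want a missing marker instead.

```python
import numpy as np
from matplotlib import pyplot as plt

# Assumption: same data as the question, generated from y = 2x - x**2.
x = np.linspace(0.0, 2.9, 30)
y = 2 * x - x**2

x_mark = 0.25
# Piecewise-linear interpolation; x must be increasing.
y_mark = np.interp(x_mark, x, y)

plt.plot(x, y)
plt.plot(x_mark, y_mark, "ro")  # red circle marker at the interpolated point
plt.show()
```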

SciPy warning message: "Ill-conditioned matrix detected"

I am running some code that I originally developed with SciPy 0.18. Now using SciPy 0.19 I often get warning messages like this:
/usr/lib/python3/dist-packages/scipy/linalg/basic.py:223: RuntimeWarning: scipy.linalg.solve
Ill-conditioned matrix detected. Result is not guaranteed to be accurate.
Reciprocal condition number: 1.8700410190617105e-17
  ' condition number: {}'.format(rcond), RuntimeWarning)
Here is a small snippet that generates the message above:
from scipy import interpolate
xx = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
yy = [2.5, 1.5, 0.5, 2.5, 1.5, 0.5, 2.5, 1.5, 0.5]
vals = [30.0, 20.0, 10.0, 31.0, 21.0, 11.0, 32.0, 22.0, 12.0]
f = interpolate.Rbf(xx, yy, vals, epsilon=100)
In spite of the warning the results are correct. What is causing this warning? Can it be suppressed somehow?
When inspecting the matrix with
numpy.linalg.cond(f.A)
which gives
6.213533820748747e+16
you'll find that its condition number exceeds the reciprocal of machine epsilon (about 4.5e15 for double precision), meaning that your solution may contain no significant digits.
Try, e.g.,
b = numpy.random.rand(f.A.shape[0])
x = numpy.linalg.solve(f.A, b)
print(numpy.dot(f.A, x) - b)
[-0.22342786 -0.06718507 -0.13027724 -0.09972579 -0.16589076 -0.06328093
0.05480577 -0.12606864 0.02067541]
If x were indeed a solution, all those numbers would be close to 0. Take it easy on the epsilon (i.e., use a much smaller value) to get something meaningful.
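A minimal sketch of both options, assuming an epsilon on the order of the node spacing (1.0 here) is acceptable for your data; SciPy's default epsilon, used when you omit the argument, approximates the average distance between nodes. The second part silences the warning locally with the standard warnings module; filtering RuntimeWarning also covers scipy.linalg.LinAlgWarning, which subclasses it in newer SciPy versions. Suppressing the warning does not make the large-epsilon result any more accurate.

```python
import warnings
import numpy as np
from scipy import interpolate

xx = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
yy = [2.5, 1.5, 0.5, 2.5, 1.5, 0.5, 2.5, 1.5, 0.5]
vals = [30.0, 20.0, 10.0, 31.0, 21.0, 11.0, 32.0, 22.0, 12.0]

# Option 1: a smaller epsilon keeps the RBF system well conditioned.
f = interpolate.Rbf(xx, yy, vals, epsilon=1.0)
print(np.linalg.cond(f.A))  # far below 1e16
print(f(1.5, 1.5))          # reproduces the data value 21.0 at that node

# Option 2: keep epsilon=100 but silence the warning only inside this block.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)
    g = interpolate.Rbf(xx, yy, vals, epsilon=100)
```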

Can I use np.arange with lists as my inputs?

The relevant excerpt of my code is as follows:
import numpy as np
def create_function(duration, start, stop):
    rates = np.linspace(start, stop, duration*1000)
    return rates
def generate_spikes(duration, start, stop):
    rates = [create_function(duration, start, stop)]
    array = [np.arange(0, (duration*1000), 1)]
    start_value = [np.repeat(start, duration*1000)]
    double_array = [np.add(array,array)]
    times = np.arange(np.add(start_value,array), np.add(start_value,double_array), rates)
    return times/1000.
I know this is really inefficient coding (especially the start_value and double_array stuff), but it's all a product of trying to somehow use arange with lists as my inputs.
I keep getting this error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Essentially, an example of what I'm trying to do is this:
If I had two arrays a = [1, 2, 3, 4] and b = [0.1, 0.2, 0.3, 0.4], I'd want to use np.arange to generate [1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8, 5.2]. (I'd be using a different step size for every element in the array.)
Is this even possible? And if so, would I have to flatten my list?
You can use broadcasting here for efficiency (a and b need to be NumPy arrays):
(a + (b[:,None] * a)).ravel('F')
Sample run -
In [52]: a
Out[52]: array([1, 2, 3, 4])
In [53]: b
Out[53]: array([ 0.1, 0.2, 0.3, 0.4])
In [54]: (a + (b[:,None] * a)).ravel('F')
Out[54]:
array([ 1.1, 1.2, 1.3, 1.4, 2.2, 2.4, 2.6, 2.8, 3.3, 3.6, 3.9,
4.2, 4.4, 4.8, 5.2, 5.6])
Looking at the expected output, it seems you are using just the first three elements of b for the computation. So, to achieve that target, we just slice the first three elements and do the same computation, like so:
In [55]: (a + (b[:3,None] * a)).ravel('F')
Out[55]:
array([ 1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8,
5.2])
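np.arange only accepts scalar start, stop and step values, which is why passing lists fails. A minimal sketch wrapping the broadcasting answer above in a reusable function that also accepts plain lists; the name scaled_steps and the parameter k (how many elements of b to use) are hypothetical:

```python
import numpy as np

def scaled_steps(a, b, k):
    """Return a[i] * (1 + b[j]) for j in range(k), grouped by element of a."""
    a = np.asarray(a, dtype=float)  # plain lists are accepted too
    b = np.asarray(b, dtype=float)
    # b[:k, None] * a has shape (k, len(a)); row j is b[j] * a.
    grid = a + b[:k, None] * a
    # Column-major ('F') flattening walks down each column,
    # i.e. one element of a at a time.
    return grid.ravel('F')

print(scaled_steps([1, 2, 3, 4], [0.1, 0.2, 0.3, 0.4], 3))
```

Passing k=4 reproduces the full 16-element result from the first sample run.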
