Get value at certain confidence percentile - python

I'm trying to pick a value from a sorted list of α-values (e.g. 0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3) at a confidence level of 0.8, i.e. the value at the position reached after traversing 80% of the array. I want to use this alpha score to build prediction intervals.
import numpy as np
alpha_scores = np.array([0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3])
confidence_level = 0.80
confidence_percentile = int(np.floor(confidence_level * (alpha_scores.size + 1))) - 1  # index of the confidence percentile
alpha_index = min(max(confidence_percentile, 0), alpha_scores.size - 1)  # clamp to a valid index
err_dist = alpha_scores[alpha_index]
Would this be the correct way to obtain this? I get a score, but it is not always the value I expect.
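A minimal sketch of the index calculation described above, wrapped in a function so that the clamping is easy to check. The name score_at_confidence is hypothetical, and the floor(level * (n + 1)) - 1 rule is simply the one from the question, not a claim about the one correct formula for prediction intervals.

```python
import numpy as np

def score_at_confidence(scores, confidence_level):
    """Return the sorted score at the given confidence level (hypothetical helper)."""
    scores = np.sort(np.asarray(scores, dtype=float))
    # Same rule as in the question: floor(level * (n + 1)) - 1, as a 0-based index.
    index = int(np.floor(confidence_level * (scores.size + 1))) - 1
    # Clamp so that very low or very high levels still give a valid index.
    index = min(max(index, 0), scores.size - 1)
    return scores[index]

alpha_scores = [0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3]
err_dist = score_at_confidence(alpha_scores, 0.80)
print(err_dist)
```

With the ten scores above and a level of 0.80, the index works out to 7, so the selected value is 4.0.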


Modifying array elements based on an absolute difference value

I have two arrays of the same length as shown below.
import numpy as np
y1 = [12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5]
y2 = [8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4]
y1_a = np.array(y1)
y2_a = np.array(y2)
for i in range(len(y2_a)):
    y3_a[i] = abs(y2_a[i] - y2_a[i])
I am computing the absolute difference at each index/location between the two arrays. I have to replace 'y1_a' with 'y2_a' whenever the absolute difference exceeds 2.0 at a given index/location and write it to a new array variable 'y3_a'. The starter code is added.
First of all, let numpy do the lifting for you. You can calculate your absolute differences without a manual for loop:
abs_diff = np.abs(y2_a - y1_a) # I assume your original code has a typo
Now you can replace the values where the absolute difference is more than 2.0. Work on a copy so that y1_a itself is not modified:
y3_a = y1_a.copy()
y3_a[abs_diff > 2.0] = y2_a[abs_diff > 2.0]
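As a self-contained variant of the same idea, np.where can pick between the two arrays element by element without touching either input. This is only a sketch using the arrays from the question.

```python
import numpy as np

y1_a = np.array([12.1, 6.2, 1.4, 0.8, 5.6, 6.8, 8.5])
y2_a = np.array([8.2, 5.6, 2.8, 1.4, 2.5, 4.2, 6.4])

# Take y2_a where the arrays differ by more than 2.0, otherwise keep y1_a.
y3_a = np.where(np.abs(y2_a - y1_a) > 2.0, y2_a, y1_a)
print(y3_a)
```

Because np.where builds a new array, y1_a and y2_a keep their original values.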

KeyError making pandas dataframe

I am trying to find the equation of a function using a pandas DataFrame. This has worked in the past on other projects; however, now nothing seems to work.
I am aware that there might be easier ways to solve this, but I need this to work somehow.
additional_cols = ['xVerdier','fDer']
fdata = pd.DataFrame({"idx":findex,"x":xVerdier[:-1],"y":fDer})
fdata = fdata.reindex(fdata.columns.tolist() + additional_cols, axis = 1)
fdata=fdata [[xVerdier[:-1],fDer]]
fdata = mpd.DataFrame(fdata)
xVerdier is a list of x-values of a graph
[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999, 4.999999999999998]
fDer is a list of y-values of said graph
[1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912]
This is the error message
KeyError: "None of [Index([(0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999), (1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912)], dtype='object')] are in the [columns]"
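The error message suggests that pandas treated the two lists inside the double brackets as column labels and found no columns with those names. A minimal sketch, assuming the goal is to keep the "x" and "y" columns; the short lists are hypothetical stand-ins for xVerdier and fDer.

```python
import pandas as pd

# Short stand-ins for the lists in the question (hypothetical values).
x_verdier = [0.0, 0.1, 0.2, 0.3]
f_der = [1.2, 1.6, 2.0]

fdata = pd.DataFrame({"idx": range(len(f_der)), "x": x_verdier[:-1], "y": f_der})

# Select columns by their labels, not by the lists used to fill them.
selected = fdata[["x", "y"]]
print(selected)
```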

python matplotlib: How can I add a point mark to curve knowing only the x value?

For example, in matplotlib, I plot a simple curve based on a few points:
from matplotlib import pyplot as plt
import numpy as np
x=[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9]
y=[0.0, 0.19, 0.36, 0.51, 0.64, 0.75, 0.8400000000000001, 0.91, 0.96, 0.99, 1.0,
0.99, 0.96, 0.9099999999999999, 0.8399999999999999, 0.75, 0.6399999999999997,
0.5099999999999998, 0.3599999999999999, 0.18999999999999995, 0.0,
-0.20999999999999996, -0.4400000000000004, -0.6900000000000004,
-0.9600000000000009, -1.25, -1.5600000000000005, -1.8900000000000006,
-2.240000000000001, -2.610000000000001]
Hypothetically, say I want to highlight the point on the curve where the x value is 0.25, but I don't know the y value for this point. What should I do?
The easiest solution is to perform a linear interpolation between the neighboring points for the provided x value. Here is some sample code that shows the general principle:
X=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9]
Y=[0.0, 0.19, 0.36, 0.51, 0.64, 0.75, 0.8400000000000001, 0.91, 0.96,
0.99, 1.0, 0.99, 0.96, 0.9099999999999999, 0.8399999999999999, 0.75,
0.6399999999999997, 0.5099999999999998, 0.3599999999999999,
0.18999999999999995, 0.0, -0.20999999999999996, -0.4400000000000004,
-0.6900000000000004, -0.9600000000000009, -1.25, -1.5600000000000005,
-1.8900000000000006, -2.240000000000001, -2.610000000000001]
def interpolate(X, Y, xval):
    for n, x in enumerate(X):
        if x > xval: break  # first x strictly greater than xval
    else: return None  # no x is greater than xval (xval >= last x value)
    if n == 0: return None  # xval < first x value
    xa, xb = X[n-1], X[n]  # get surrounding x values
    ya, yb = Y[n-1], Y[n]  # get surrounding y values
    if xb == xa: return ya  # duplicate x values: avoid dividing by zero
    return ya + (xval - xa) * (yb - ya) / (xb - xa)  # compute yval by interpolation
print(interpolate(X, Y, 0.25)) # --> 0.435
print(interpolate(X, Y, 0.85)) # --> 0.975
print(interpolate(X, Y, 2.15)) # --> -0.3259999999999997
print(interpolate(X, Y, -1.0)) # --> None (out of bounds)
print(interpolate(X, Y, 3.33)) # --> None (out of bounds)
Note: When the provided xval is not within the range of x values, the function returns None. This includes an xval equal to the last x value, because the loop looks for a strictly larger x.
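You could do the linear interpolation manually like this:
def get_y_val(p):
    # indices of the nearest x values on either side of p (x and y are the lists from the question)
    lower_i = max(i for (i, v) in enumerate(x) if v <= p)
    upper_i = min(i for (i, v) in enumerate(x) if v >= p)
    d = x[upper_i] - x[lower_i]
    if d == 0:
        # p coincides with a data point
        return y[lower_i]
    # weighted average of the two neighboring y values
    y_pt = y[lower_i] * (x[upper_i] - p) / d + y[upper_i] * (p - x[lower_i]) / d
    return y_pt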
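Both answers interpolate by hand; NumPy's np.interp performs the same linear interpolation, and the result can be drawn as a marker on top of the curve. This is a sketch only: the data are regenerated from 2*x - x**2, which appears to match the sample values up to rounding, and x_mark and y_mark are illustrative names.

```python
import numpy as np
from matplotlib import pyplot as plt

# Stand-in data: 2*x - x**2 reproduces the sample values up to rounding.
x = np.linspace(0.0, 2.9, 30)
y = 2 * x - x**2

x_mark = 0.25
y_mark = np.interp(x_mark, x, y)  # linear interpolation; x must be increasing

fig, ax = plt.subplots()
ax.plot(x, y)
ax.plot(x_mark, y_mark, "ro")  # red circle at the interpolated point
# plt.show()  # uncomment to display the figure
```

The interpolated value is about 0.435, which agrees with the hand-written interpolate function above.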

SciPy warning message: "Ill-conditioned matrix detected"

I am running some code that I originally developed with SciPy 0.18. Now using SciPy 0.19 I often get warning messages like this:
RuntimeWarning: scipy.linalg.solve Ill-conditioned matrix detected.
Result is not guaranteed to be accurate. Reciprocal condition number:
1.8700410190617105e-17 ' condition number: {}'.format(rcond), RuntimeWarning)
Here is a small snippet that generates the message above:
from scipy import interpolate
xx = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
yy = [2.5, 1.5, 0.5, 2.5, 1.5, 0.5, 2.5, 1.5, 0.5]
vals = [30.0, 20.0, 10.0, 31.0, 21.0, 11.0, 32.0, 22.0, 12.0]
f = interpolate.Rbf(xx, yy, vals, epsilon=100)
In spite of the warning the results are correct. What is causing this warning? Can it be suppressed somehow?
When you inspect the matrix f.A, you'll find that its reciprocal condition number is at the level of machine precision, meaning that your solution contains no significant digits.
Try, e.g.,
import numpy
b = numpy.random.rand(f.A.shape[0])
x = numpy.linalg.solve(f.A, b)
print(numpy.dot(f.A, x) - b)  # residual A x - b
[-0.22342786 -0.06718507 -0.13027724 -0.09972579 -0.16589076 -0.06328093
0.05480577 -0.12606864 0.02067541]
If x was indeed a solution, all those numbers would be close to 0. Take it easy on the epsilon to get something meaningful.
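To see why a large epsilon causes trouble, here is a rough sketch that rebuilds the kernel matrix by hand instead of reading it from f.A. It assumes the default multiquadric kernel documented for Rbf, sqrt((r / epsilon)**2 + 1); the helper name multiquadric_matrix is hypothetical. With epsilon=100 every entry is very close to 1, so the rows are nearly identical.

```python
import numpy as np

xx = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
yy = [2.5, 1.5, 0.5, 2.5, 1.5, 0.5, 2.5, 1.5, 0.5]

def multiquadric_matrix(xs, ys, epsilon):
    """Pairwise multiquadric kernel matrix, sqrt((r / epsilon)**2 + 1)."""
    pts = np.column_stack([xs, ys])
    # r[i, j] is the Euclidean distance between points i and j.
    r = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    return np.sqrt((r / epsilon) ** 2 + 1)

cond_large = np.linalg.cond(multiquadric_matrix(xx, yy, 100.0))
cond_unit = np.linalg.cond(multiquadric_matrix(xx, yy, 1.0))
print(cond_large, cond_unit)
```

Under these assumptions the condition number for epsilon=100 should come out many orders of magnitude larger than for an epsilon close to the spacing of the points.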

Can I use np.arange with lists as my inputs?

The relevant excerpt of my code is as follows:
import numpy as np
def create_function(duration, start, stop):
    rates = np.linspace(start, stop, duration*1000)
    return rates
def generate_spikes(duration, start, stop):
    rates = [create_function(duration, start, stop)]
    array = [np.arange(0, (duration*1000), 1)]
    start_value = [np.repeat(start, duration*1000)]
    double_array = [np.add(array,array)]
    times = np.arange(np.add(start_value,array), np.add(start_value,double_array), rates)
    return times/1000.
I know this is really inefficient coding (especially the start_value and double_array stuff), but it's all a product of trying to somehow use arange with lists as my inputs.
I keep getting this error:
TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'
Essentially, an example of what I'm trying to do is this:
If I had two arrays a = [1, 2, 3, 4] and b = [0.1, 0.2, 0.3, 0.4], I'd want to use np.arange to generate [1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8, 5.2]. (I'd be using a different step size for every element in the array.)
Is this even possible? And if so, would I have to flatten my list?
You can use broadcasting there for efficiency purposes -
(a + (b[:,None] * a)).ravel('F')
Sample run -
In [52]: a
Out[52]: array([1, 2, 3, 4])
In [53]: b
Out[53]: array([ 0.1, 0.2, 0.3, 0.4])
In [54]: (a + (b[:,None] * a)).ravel('F')
Out[54]:
array([ 1.1, 1.2, 1.3, 1.4, 2.2, 2.4, 2.6, 2.8, 3.3, 3.6, 3.9,
        4.2, 4.4, 4.8, 5.2, 5.6])
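np.arange takes scalar start, stop and step values, so it cannot be called with lists directly. If the question is read literally (a separate step size for each element), a broadcasting sketch along the following lines may be closer to the intent. The variable n_steps is hypothetical and stands for the number of values generated per element.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([0.1, 0.2, 0.3, 0.4])
n_steps = 3  # hypothetical: number of values generated per element

# Row i holds a[i] + k * b[i] for k = 1..n_steps, built by broadcasting.
k = np.arange(1, n_steps + 1)
times = (a[:, None] + b[:, None] * k).ravel()
print(times)
```

For these particular inputs the result agrees, up to floating-point rounding, with the expected output listed in the question.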
Looking at the expected output, it seems you are using just the first three elements of b for the computation. So, to achieve that target, we just slice the first three elements and do that computation, like so -
In [55]: (a + (b[:3,None] * a)).ravel('F')
Out[55]:
array([ 1.1, 1.2, 1.3, 2.2, 2.4, 2.6, 3.3, 3.6, 3.9, 4.4, 4.8,
