KeyError making pandas dataframe - python

I am trying to find the equation of a function using a pandas DataFrame. This has worked in the past on other projects, but now nothing seems to work.
I am aware that there might be easier ways to solve this, but I need this approach to work somehow.
additional_cols = ['xVerdier','fDer']
fdata = pd.DataFrame({"idx":findex,"x":xVerdier[:-1],"y":fDer})
print(fdata)
fdata = fdata.reindex(fdata.columns.tolist() + additional_cols, axis = 1)
fdata=fdata [[xVerdier[:-1],fDer]]
fdata = mpd.DataFrame(fdata)
train=fdata[:(int((len(fdata))))]
test=fdata[(int((len(fdata)))):]
regr=linear_model.LinearRegression()
train_x=np.array(train[[xVerdier]])
train_y=np.array(train[[fDer]])
regr.fit(train_x,train_y)
xVerdier is a list of x-values of a graph
[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999, 4.999999999999998]
fDer is a list of y-values of said graph
[1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912]
This is the error message
KeyError: "None of [Index([(0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999), (1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912)], dtype='object')] are in the [columns]"
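The KeyError most likely comes from `fdata[[xVerdier[:-1], fDer]]`: double brackets select columns by label, so pandas looks for two columns whose labels are the lists themselves, and the error message shows exactly those two tuples. Below is a minimal sketch that selects by the column names `"x"` and `"y"` given in the constructor instead. The data is regenerated from the pattern visible in the question (assumed to be y = 1.2 + 4x), and `np.polyfit` stands in for `LinearRegression` only to keep the example small.

```python
import numpy as np
import pandas as pd

# Assumption: data regenerated from the pattern in the question (y = 1.2 + 4x).
xVerdier = [i * 0.1 for i in range(51)]
fDer = [1.2 + 4 * x for x in xVerdier[:-1]]

fdata = pd.DataFrame({"x": xVerdier[:-1], "y": fDer})

# Select columns by label (strings), not by the lists that filled them.
train_x = fdata[["x"]].to_numpy()  # 2-D array, shape (50, 1)
train_y = fdata["y"].to_numpy()    # 1-D array, shape (50,)

# Least-squares line; np.polyfit returns coefficients from highest degree down.
slope, intercept = np.polyfit(train_x.ravel(), train_y, 1)
print(f"y = {slope:.3f} * x + {intercept:.3f}")
```

With scikit-learn, the same two arrays could be passed as `regr.fit(train_x, train_y)`.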

Related

Get value at certain confidence percentile

I have a sorted list of α-values (e.g. 0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3) and a confidence level of 0.8. I want the value at the position reached after traversing 80% of the array, and I want to use that alpha score to build prediction intervals.
alpha_scores = array([0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3])
confidence_level = 0.80
confidence_percentile = int(np.floor(confidence_level * (alpha_scores.size + 1))) - 1 #Calculate the confidence percentile
alpha_index = min(max(confidence_level , 0), alpha_scores.size - 1)
err_dist = alpha_scores[alpha_index]
Would this be the correct way to obtain it? I get a score, but it does not always match the value I expect.
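As posted, the clamp is applied to `confidence_level` (0.8) rather than to the computed `confidence_percentile`, so the computed index is never used. Below is a minimal sketch of the same formula with the clamp applied to the index, written in plain Python. The helper name `alpha_at_confidence` is hypothetical, and whether this index convention is the right one depends on the prediction-interval method in use.

```python
import math

def alpha_at_confidence(sorted_scores, confidence_level):
    """Return the score at the index given by the question's formula."""
    n = len(sorted_scores)
    # Same formula as in the question: floor(level * (n + 1)) - 1.
    index = int(math.floor(confidence_level * (n + 1))) - 1
    # Clamp the index (not the confidence level) into the valid range.
    index = min(max(index, 0), n - 1)
    return sorted_scores[index]

alpha_scores = [0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3]
err_dist = alpha_at_confidence(alpha_scores, 0.80)
print(err_dist)  # 4.0
```

For ten scores and a level of 0.8 the formula gives index 7, which is the eighth value in the sorted list.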

How to L2 Normalize a list of lists in Python using Sklearn

s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
from sklearn.preprocessing import normalize
X = normalize(s2)
This throws an error:
ValueError: setting an array element with a sequence.
How can I L2-normalize a list of lists in Python using sklearn?
I don't have enough reputation to comment, so I am posting this as an answer.
Let's take a quick look at your data.
I converted the given data into a NumPy array. Because the inner lists do not all have the same length, the result looks like this:
>>> n2 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n2
array([list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]),
list([0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831]),
list([0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925]),
list([0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194])],
dtype=object)
As you can see, the result is an array of list objects rather than a 2-D array of numbers. To get a numeric array, all inner lists must have the same length (it looks like 0.16666666666666666 was copied several times in your array; if not, fix the length some other way). With equal lengths, it looks like this:
>>> n3 = np.array([[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.16666666666666666, 0.319381788645692], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]])
>>> n3
array([[0.2 , 0.2 , 0.2 , 0.30216512, 0.24462871],
[0.2 , 0.48925742, 0.2 , 0.2 , 0.38325815],
[0.31938179, 0.16666667, 0.16666667, 0.16666667, 0.31938179],
[0.2 , 0.2 , 0.2 , 0.30216512, 0.24462871]])
As you can see, n3 is now a regular 2-D array of numbers,
and the normalize function simply works on it:
>>> X = normalize(n3)
>>> X
array([[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139],
[0.28108867, 0.6876236 , 0.28108867, 0.28108867, 0.53864762],
[0.59581303, 0.31091996, 0.31091996, 0.31091996, 0.59581303],
[0.38408524, 0.38408524, 0.38408524, 0.58028582, 0.46979139]])
For more on avoiding this issue with NumPy arrays, see the SO question "ValueError: setting an array element with a sequence".
Important: I removed one element from the 3rd list so that all lists have the same length.
I did that because I believe it is a copy-paste error. If not, comment below and I will modify my answer.
import numpy as np
from sklearn.preprocessing import normalize
s2 = [[0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194], [0.2, 0.4892574205256839, 0.2, 0.2, 0.383258146374831], [0.3193817886456925, 0.16666666666666666, 0.16666666666666666, 0.3193817886456925, 0.3193817886456925], [0.2, 0.2, 0.2, 0.3021651247531982, 0.24462871026284194]]
X = normalize(np.array(s2))
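If the six-element row is not a copy-paste error, `normalize` cannot be used directly, because it expects a rectangular 2-D array. One option in that case is to normalize each row on its own. The sketch below does this in plain Python; the helper name `l2_normalize_rows` is hypothetical and the sample rows are illustrative values, not the data from the question.

```python
import math

def l2_normalize_rows(rows):
    """Scale each row to unit Euclidean length; rows may differ in length."""
    result = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        # Leave all-zero rows unchanged to avoid dividing by zero.
        result.append([v / norm for v in row] if norm else list(row))
    return result

# Illustrative ragged input (rows of different lengths).
rows = [[0.2, 0.2, 0.2, 0.3, 0.25], [3.0, 4.0], [0.0, 0.0, 0.0]]
normalized = l2_normalize_rows(rows)
print(normalized[1])  # [0.6, 0.8]
```

Each non-zero row of the result has a sum of squares equal to 1, which is what L2 normalization means row by row.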

python matplotlib: How can I add a point mark to curve knowing only the x value?

For example, in matplotlib, I plot a simple curve based on few points:
from matplotlib import pyplot as plt
import numpy as np
x=[0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9]
y=[0.0, 0.19, 0.36, 0.51, 0.64, 0.75, 0.8400000000000001, 0.91, 0.96, 0.99, 1.0,
0.99, 0.96, 0.9099999999999999, 0.8399999999999999, 0.75, 0.6399999999999997,
0.5099999999999998, 0.3599999999999999, 0.18999999999999995, 0.0,
-0.20999999999999996, -0.4400000000000004, -0.6900000000000004,
-0.9600000000000009, -1.25, -1.5600000000000005, -1.8900000000000006,
-2.240000000000001, -2.610000000000001]
plt.plot(x,y)
plt.show()
Hypothetically, say I want to highlight the point on the curve where the x value is 0.25, but I don't know the y value for this point. What should I do?
The easiest solution is to perform a linear interpolation between the neighboring points for the provided x value. Here is some sample code to show the general principle:
X=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1, 1.2,
1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5,
2.6, 2.7, 2.8, 2.9]
Y=[0.0, 0.19, 0.36, 0.51, 0.64, 0.75, 0.8400000000000001, 0.91, 0.96,
0.99, 1.0, 0.99, 0.96, 0.9099999999999999, 0.8399999999999999, 0.75,
0.6399999999999997, 0.5099999999999998, 0.3599999999999999,
0.18999999999999995, 0.0, -0.20999999999999996, -0.4400000000000004,
-0.6900000000000004, -0.9600000000000009, -1.25, -1.5600000000000005,
-1.8900000000000006, -2.240000000000001, -2.610000000000001]
def interpolate(X, Y, xval):
    for n, x in enumerate(X):
        if x > xval: break
    else: return None # xval >= last x value
    if n == 0: return None # xval < first x value
    xa, xb = X[n-1], X[n] # get surrounding x values
    ya, yb = Y[n-1], Y[n] # get surrounding y values
    if xb == xa: return ya # duplicate x values: avoid division by zero
    return ya + (xval - xa) * (yb - ya) / (xb - xa) # compute yval by interpolation
print(interpolate(X, Y, 0.25)) # --> 0.435
print(interpolate(X, Y, 0.85)) # --> 0.975
print(interpolate(X, Y, 2.15)) # --> -0.3259999999999997
print(interpolate(X, Y, -1.0)) # --> None (out of bounds)
print(interpolate(X, Y, 3.33)) # --> None (out of bounds)
Note: When the provided xval is not within the range of x values, the function returns None.
You could do the linear interpolation manually like this:
def get_y_val(p):
    lower_i = max(i for (i, v) in enumerate(x) if v <= p)
    upper_i = min(i for (i, v) in enumerate(x) if v >= p)
    d = x[upper_i] - x[lower_i]
    if d == 0:
        return y[lower_i]
    y_pt = y[lower_i] * (x[upper_i] - p) / d + y[upper_i] * (p - x[lower_i]) / d
    return y_pt
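NumPy also ships a linear interpolation routine, `np.interp`, which could replace the hand-written loops. Below is a minimal sketch, assuming the curve from the question can be regenerated as y = 1 - (x - 1)^2 (this matches the listed values); the `left` and `right` arguments are set to NaN here so that out-of-range requests are visible instead of being clamped to the end values.

```python
import numpy as np

# Assumption: same curve as in the question, regenerated as y = 1 - (x - 1)^2.
x = np.arange(30) / 10.0
y = 1.0 - (x - 1.0) ** 2

x0 = 0.25
# np.interp expects increasing x; left/right define the out-of-range result.
y0 = np.interp(x0, x, y, left=np.nan, right=np.nan)
print(x0, y0)
```

The point could then be drawn on the existing axes with `plt.plot(x0, y0, "o")` before calling `plt.show()`.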

Regex python: match multi-line float values between brackets

Match grouped multi-line float values between brackets
In the below example data, I want to extract all the float values between
brackets belonging only to "group1" using regex, but not the values
from other groups ("group2", "group3" etc.). A requirement is that it is done via regex in python. Is this
possible with regex at all?
Regex patterns attempts:
I tried the following patterns, but they capture either everything or nothing:
Matches every float value in all groups: ([+-]*\d+\.\d+),
Matches no value in any groups: group1 = \[ ([+-]*\d+\.\d+), \]
What should I do to make this work? Any suggestions would be very welcome!
Example data:
group1 = [
1.0,
-2.0,
3.5,
-0.3,
1.7,
4.2,
]
group2 = [
2.0,
1.5,
1.8,
-1.8,
0.7,
-0.3,
]
group1 = [
0.0,
-0.5,
1.3,
0.8,
-0.4,
0.1,
]
Here's a regex I created r'group1 = \[\n([ *-?\d\.\d,\n]+)\]':
import re
s = '''group1 = [
1.0,
-2.0,
3.5,
-0.3,
1.7,
4.2,
]
group2 = [
2.0,
1.5,
1.8,
-1.8,
0.7,
-0.3,
]
group1 = [
0.0,
-0.5,
1.3,
0.8,
-0.4,
0.1,
]'''
groups = re.findall(r'group1 = \[\n([ *-?\d\.\d,\n]+)\]', s)
groups = [float(f) for l in map(lambda p: p.split(','), groups) for f in l if f.strip()]
print(groups)
Output:
[1.0, -2.0, 3.5, -0.3, 1.7, 4.2, 0.0, -0.5, 1.3, 0.8, -0.4, 0.1]
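A two-step variant may be easier to read: first capture the body of each `group1` block, then extract the floats from each body. This is only a sketch on a shortened version of the sample data, and it assumes that each group header starts at the beginning of a line and that no `]` appears inside a block.

```python
import re

# Shortened sample in the same layout as the question's data.
text = """group1 = [
    1.0,
    -2.0,
]
group2 = [
    2.0,
]
group1 = [
    0.0,
    -0.5,
]"""

# Step 1: capture the body of every "group1 = [ ... ]" block (non-greedy, across lines).
blocks = re.findall(r"^group1 = \[(.*?)\]", text, flags=re.DOTALL | re.MULTILINE)

# Step 2: pull the signed floats out of each captured body.
values = [float(m) for block in blocks for m in re.findall(r"[+-]?\d+\.\d+", block)]
print(values)  # [1.0, -2.0, 0.0, -0.5]
```

Splitting the work this way keeps the block pattern independent of the number format, so either can be changed without touching the other.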
Try this:
\bgroup2 = \[([\s+\d+.\d+[,-\]]+)
This probably isn't the most optimized solution, but I made it in just a few minutes using this website: http://www.regexr.com/
This is by far the best resource I have found yet for creating regular expressions. It has great examples, reference and a cheat sheet. Paste your example text and you can tweak the regex and see it update in real time. Hover over the expression and it will give you details on each part.

How can I get the product of all elements in a one dimensional numpy array

I have a one dimensional NumPy array:
a = numpy.array([2,3,3])
I would like to have the product of all elements, 18 in this case.
The only way I could find to do this would be:
b = reduce(lambda x,y: x*y, a)
Which looks pretty, but is not very fast (I need to do this a lot).
Is there a numpy method that does this? If not, what is the most efficient way of doing this? My real world arrays have 39 float elements.
In NumPy you can try:
numpy.prod(a)
For a larger array numpy.arange(1,40) / 10.:
array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1,
1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3,
3.4, 3.5, 3.6, 3.7, 3.8, 3.9])
your reduce(lambda x,y: x*y, a) needs 24.2µs,
numpy.prod(a) needs 3.9µs.
EDIT: a.prod() needs 2.67µs. Thanks to J.F. Sebastian!
Or if the loss of numerical accuracy is not a problem, we can do
>>> numpy.exp(numpy.sum(numpy.log(a)))
17.999999999999996
>>> numpy.prod(a)
18
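The options above can be put side by side in a short sketch. Note that in Python 3 `reduce` has to be imported from `functools`, and that `math.prod` (Python 3.8+) is a standard-library alternative not mentioned in the answers. No timings are shown because they depend on the machine.

```python
import math
import operator
from functools import reduce

import numpy as np

a = np.array([2, 3, 3])

p_reduce = reduce(operator.mul, a)  # Python-level loop, as in the question
p_numpy = np.prod(a)                # vectorised; a.prod() is equivalent
p_math = math.prod(a.tolist())      # standard library, Python 3.8+

# The log/exp trick only applies to strictly positive entries and is approximate.
p_log = np.exp(np.sum(np.log(a)))

print(p_reduce, p_numpy, p_math, p_log)
```

For integer arrays, `np.prod` uses fixed-width integers, so very large products can overflow silently; converting to float first (for example `a.astype(float).prod()`) avoids that at the cost of rounding.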
