Regex python: match multi-line float values between brackets - python

Match grouped multi-line float values between brackets
In the example data below, I want to extract all the float values between
brackets that belong to "group1" only, using regex, and not the values
from other groups ("group2", "group3", etc.). It is a requirement that this is done via regex in Python. Is this
possible with regex at all?
Regex pattern attempts:
I tried the following patterns, but they capture either everything or nothing:
Matches every float value in all groups: ([+-]*\d+\.\d+),
Matches no value in any groups: group1 = \[ ([+-]*\d+\.\d+), \]
What should I do to make this work? Any suggestions would be very welcome!
Example data:
group1 = [
1.0,
-2.0,
3.5,
-0.3,
1.7,
4.2,
]
group2 = [
2.0,
1.5,
1.8,
-1.8,
0.7,
-0.3,
]
group1 = [
0.0,
-0.5,
1.3,
0.8,
-0.4,
0.1,
]

Here's a regex I created r'group1 = \[\n([ *-?\d\.\d,\n]+)\]':
import re
s = '''group1 = [
1.0,
-2.0,
3.5,
-0.3,
1.7,
4.2,
]
group2 = [
2.0,
1.5,
1.8,
-1.8,
0.7,
-0.3,
]
group1 = [
0.0,
-0.5,
1.3,
0.8,
-0.4,
0.1,
]'''
groups = re.findall(r'group1 = \[\n([ *-?\d\.\d,\n]+)\]', s)
groups = [float(f) for block in groups for f in block.split(',') if f.strip()]
print(groups)
Output:
[1.0, -2.0, 3.5, -0.3, 1.7, 4.2, 0.0, -0.5, 1.3, 0.8, -0.4, 0.1]
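The second pattern in the question matches nothing because its literal spaces cannot match the newlines in the data, and it allows only a single number between the brackets. An alternative that may be easier to read splits the job into two steps: first capture the body of every group1 block, then pull the floats out of each body. This is only a sketch: the helper name `extract_group` is made up for illustration, and it assumes that a group body never contains a `]`.

```python
import re

def extract_group(text, name):
    """Return all floats listed inside every `name = [ ... ]` block."""
    # Step 1: capture everything between the brackets of each matching block.
    # [^\]]* also matches newlines, so no re.DOTALL flag is needed.
    block_pattern = re.compile(
        r'^' + re.escape(name) + r'\s*=\s*\[([^\]]*)\]', re.MULTILINE
    )
    # Step 2: pull the signed decimal numbers out of each captured body.
    float_pattern = re.compile(r'[+-]?\d+\.\d+')
    return [
        float(number)
        for body in block_pattern.findall(text)
        for number in float_pattern.findall(body)
    ]

sample = """group1 = [
1.0,
-2.0,
]
group2 = [
2.0,
]
group1 = [
0.0,
-0.5,
]"""

result = extract_group(sample, 'group1')
print(result)  # [1.0, -2.0, 0.0, -0.5]
```

Anchoring the block pattern with `^` under `re.MULTILINE` keeps a name such as "mygroup1" from matching by accident.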

Try this:
\bgroup2 = \[([\s+\d+.\d+[,-\]]+)
This probably isn't the most optimized solution, but I made it in just a few minutes using this website: http://www.regexr.com/
It is by far the best resource I have found for creating regular expressions. It has great examples, a reference and a cheat sheet. Paste your example text, tweak the regex, and see the matches update in real time. Hover over the expression and it will give you details on each part.

Related

KeyError making pandas dataframe

I am trying to find the equation of a function using a pandas DataFrame. This has worked in the past on other projects; however, now nothing seems to work.
I am aware that there might be easier ways to solve this, but I need this to work somehow.
additional_cols = ['xVerdier','fDer']
fdata = pd.DataFrame({"idx":findex,"x":xVerdier[:-1],"y":fDer})
print(fdata)
fdata = fdata.reindex(fdata.columns.tolist() + additional_cols, axis = 1)
fdata=fdata [[xVerdier[:-1],fDer]]
fdata = mpd.DataFrame(fdata)
train=fdata[:(int((len(fdata))))]
test=fdata[(int((len(fdata)))):]
regr=linear_model.LinearRegression()
train_x=np.array(train[[xVerdier]])
train_y=np.array(train[[fDer]])
regr.fit(train_x,train_y)
xVerdier is a list of x-values of a graph
[0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999, 4.999999999999998]
fDer is a list of y-values of said graph
[1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912]
This is the error message
KeyError: "None of [Index([(0, 0.1, 0.2, 0.30000000000000004, 0.4, 0.5, 0.6, 0.7, 0.7999999999999999, 0.8999999999999999, 0.9999999999999999, 1.0999999999999999, 1.2, 1.3, 1.4000000000000001, 1.5000000000000002, 1.6000000000000003, 1.7000000000000004, 1.8000000000000005, 1.9000000000000006, 2.0000000000000004, 2.1000000000000005, 2.2000000000000006, 2.3000000000000007, 2.400000000000001, 2.500000000000001, 2.600000000000001, 2.700000000000001, 2.800000000000001, 2.9000000000000012, 3.0000000000000013, 3.1000000000000014, 3.2000000000000015, 3.3000000000000016, 3.4000000000000017, 3.5000000000000018, 3.600000000000002, 3.700000000000002, 3.800000000000002, 3.900000000000002, 4.000000000000002, 4.100000000000001, 4.200000000000001, 4.300000000000001, 4.4, 4.5, 4.6, 4.699999999999999, 4.799999999999999, 4.899999999999999), (1.2, 1.6000000000000003, 2.0000000000000004, 2.4, 2.799999999999999, 3.1999999999999984, 3.5999999999999988, 3.999999999999999, 4.3999999999999995, 4.8, 5.2, 5.600000000000005, 6.000000000000005, 6.400000000000006, 6.800000000000006, 7.200000000000006, 7.600000000000007, 7.999999999999998, 8.400000000000016, 8.79999999999999, 9.200000000000017, 9.600000000000009, 10.000000000000018, 10.40000000000001, 10.800000000000018, 11.20000000000001, 11.600000000000001, 12.000000000000028, 12.40000000000002, 12.799999999999976, 13.200000000000038, 13.60000000000003, 14.000000000000021, 14.400000000000013, 14.80000000000004, 15.199999999999996, 15.600000000000023, 16.00000000000005, 16.400000000000006, 16.799999999999926, 17.19999999999999, 17.59999999999991, 17.99999999999997, 18.399999999999892, 18.799999999999955, 19.199999999999946, 19.599999999999866, 19.99999999999993, 20.39999999999999, 20.799999999999912)], dtype='object')] are in the [columns]"

Get value at certain confidence percentile

I have a sorted list of α-values (e.g. 0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3) and a confidence level set at 0.8. I want the value at the position reached after traversing 80% of the array, and I want to use that alpha score to build prediction intervals.
alpha_scores = array([0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3])
confidence_level = 0.80
confidence_percentile = int(np.floor(confidence_level * (alpha_scores.size + 1))) - 1 #Calculate the confidence percentile
alpha_index = min(max(confidence_level , 0), alpha_scores.size - 1)
err_dist = alpha_scores[alpha_index]
Would this be the correct way to obtain it? I get a score, but it does not always match the value I expect.
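One thing that stands out in the snippet above is that the clamping line uses `confidence_level` (0.8) where `confidence_percentile` was presumably intended, so the computed index is never used. Below is a minimal sketch of the calculation as it appears to be intended, written in plain Python so that it runs without NumPy. The function name `score_at_confidence` is hypothetical, and the index convention is carried over from the question as an assumption, not verified against any particular prediction-interval method.

```python
import math

def score_at_confidence(sorted_scores, confidence_level):
    """Pick the score at index floor(level * (n + 1)) - 1, clamped to the list."""
    n = len(sorted_scores)
    # Same index formula as in the question.
    index = int(math.floor(confidence_level * (n + 1))) - 1
    # Clamp the index (not the confidence level) to the valid range.
    index = min(max(index, 0), n - 1)
    return sorted_scores[index]

alpha_scores = [0.01, 0.2, 0.5, 1.1, 1.5, 2.4, 3.1, 4.0, 5.7, 6.3]
err_dist = score_at_confidence(alpha_scores, 0.80)
print(err_dist)  # 4.0
```

With ten scores and a level of 0.8, the formula gives floor(8.8) - 1 = 7, so the eighth value in the sorted list is returned.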

Python for loop sum function trouble

I'm not sure what is wrong with this function, but I would appreciate any help I could get with it. I'm new to Python and a bit confused.
def summer(tables):
"""
MODIFIES the table to add a column summing the previous elements in the row.
Example: Suppose that a is
[['First', 'Second', 'Third'], [0.1, 0.3, 0.5], [0.6, 0.2, 0.7], [0.5, 1.1, 0.1]]
then place_sums(a) modifies the table a so that it is now
[['First', 'Second', 'Third', 'Sum'],
[0.1, 0.3, 0.5, 0.9], [0.6, 0.2, 0.7, 1.5], [0.5, 1.1, 0.1, 1.7]]
Parameter table: the nested list to process
"""
numrows = len(tables)
sums = []
for n in range(numrows):
sums = [sum(item) for item in tables]
return sums
This is what you are looking for. You don't need to create a new list; you just need to update your variable tables. Also, putting a return statement inside your loop makes it run only one iteration. You should look at how for loops work and what the return statement actually does.
def summer(tables):
"""
MODIFIES the table to add a column summing the previous elements in the row.
Example: Suppose that a is
[['First', 'Second', 'Third'], [0.1, 0.3, 0.5], [0.6, 0.2, 0.7], [0.5, 1.1, 0.1]]
then place_sums(a) modifies the table a so that it is now
[['First', 'Second', 'Third', 'Sum'],
[0.1, 0.3, 0.5, 0.9], [0.6, 0.2, 0.7, 1.5], [0.5, 1.1, 0.1, 1.7]]
Parameter table: the nested list to process
"""
tables[0].append('Sum')
for i in range(1, len(tables)):
tables[i].append(sum(tables[i]))

SciPy warning message: "Ill-conditioned matrix detected"

I am running some code that I originally developed with SciPy 0.18. Now using SciPy 0.19 I often get warning messages like this:
/usr/lib/python3/dist-packages/scipy/linalg/basic.py:223:
RuntimeWarning: scipy.linalg.solve Ill-conditioned matrix detected.
Result is not guaranteed to be accurate. Reciprocal condition number:
1.8700410190617105e-17 ' condition number: {}'.format(rcond), RuntimeWarning)
Here is a small snippet that generates the message above:
from scipy import interpolate
xx = [0.5, 0.5, 0.5, 1.5, 1.5, 1.5, 2.5, 2.5, 2.5]
yy = [2.5, 1.5, 0.5, 2.5, 1.5, 0.5, 2.5, 1.5, 0.5]
vals = [30.0, 20.0, 10.0, 31.0, 21.0, 11.0, 32.0, 22.0, 12.0]
f = interpolate.Rbf(xx, yy, vals, epsilon=100)
In spite of the warning the results are correct. What is causing this warning? Can it be suppressed somehow?
When inspecting the matrix with
numpy.linalg.cond(f.A)
6.213533820748747e+16
you'll find that its condition number is on the order of the reciprocal of machine precision (roughly 1e16 for double-precision floats), meaning that your solution contains essentially no significant digits.
Try, e.g.,
b = numpy.random.rand(f.A.shape[0])
x = numpy.linalg.solve(f.A, b)
print(numpy.dot(f.A, x) - b)
[-0.22342786 -0.06718507 -0.13027724 -0.09972579 -0.16589076 -0.06328093
0.05480577 -0.12606864 0.02067541]
If x were indeed a solution, all those numbers would be close to 0. Take it easy on the epsilon (try a much smaller value) to get something meaningful.
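On the second part of the question (whether the warning can be suppressed): Python's standard `warnings` module can filter it out locally. Given the residuals above, hiding the warning also hides a real accuracy problem, so treat this as a sketch and not a recommendation. `noisy_solve` is a hypothetical stand-in for the `interpolate.Rbf(...)` call, used so that the example runs without SciPy.

```python
import warnings

def noisy_solve():
    """Hypothetical stand-in for a call that emits the SciPy warning."""
    warnings.warn("Ill-conditioned matrix detected.", RuntimeWarning)
    return 1.0

def quiet_solve():
    """Run noisy_solve() with RuntimeWarning silenced for this call only."""
    # catch_warnings() restores the previous filters when the block exits,
    # so warnings raised elsewhere in the program are unaffected.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return noisy_solve()

value = quiet_solve()
print(value)
```

Scoping the filter to a `with` block keeps other `RuntimeWarning`s visible, which a global `warnings.simplefilter("ignore")` would not.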

How can I get the product of all elements in a one dimensional numpy array

I have a one dimensional NumPy array:
a = numpy.array([2,3,3])
I would like to have the product of all elements, 18 in this case.
The only way I could find to do this would be:
b = reduce(lambda x,y: x*y, a)
Which looks pretty, but is not very fast (I need to do this a lot).
Is there a numpy method that does this? If not, what is the most efficient way of doing this? My real world arrays have 39 float elements.
In NumPy you can try:
numpy.prod(a)
For a larger array numpy.arange(1,40) / 10.:
array([ 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. , 1.1,
1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2,
2.3, 2.4, 2.5, 2.6, 2.7, 2.8, 2.9, 3. , 3.1, 3.2, 3.3,
3.4, 3.5, 3.6, 3.7, 3.8, 3.9])
your reduce(lambda x,y: x*y, a) needs 24.2µs,
numpy.prod(a) needs 3.9µs.
EDIT: a.prod() needs 2.67µs. Thanks to J.F. Sebastian!
Or if the loss of numerical accuracy is not a problem, we can do
>>> numpy.exp(numpy.sum(numpy.log(a)))
17.999999999999996
>>> numpy.prod(a)
18
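For completeness, the same three approaches can be written with the standard library alone. On Python 3, `reduce` has to be imported from `functools`, and `math.prod` is available from Python 3.8 onward. This is a sketch for comparison only; no timings are claimed for it, and for NumPy arrays the `numpy.prod(a)` or `a.prod()` calls shown above remain the natural choice.

```python
import math
import operator
from functools import reduce

values = [2, 3, 3]

# Explicit fold, equivalent to the lambda-based reduce in the question.
via_reduce = reduce(operator.mul, values, 1)

# Built-in product of an iterable (Python 3.8+).
via_prod = math.prod(values)

# Sum of logs, then exponentiate: loses a little precision and
# only works when every element is positive.
via_logs = math.exp(sum(math.log(v) for v in values))

print(via_reduce, via_prod, via_logs)
```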
