Efficient Numpy Array Multiplication and Reshaping

Efficient Numpy Array Multiplication and Reshaping - python

Is there a simpler way to get this done?
Consider that I have an array of data points of length m - for instance, the amount of rain that accumulated at a weather station over the course of a single day for m many days. Now, we want to add n many small, semi-random perturbations to each day's data to create m * n many perturbed observations. Furthermore, we can divide the day into q many periods, and we have an estimate of the proportion of any day's rain that accumulates in each of those periods; we are assuming that the proportion of rain that accumulates during any period is not dependent on the day.
So I have an array of the daily observations of length m, (EDIT: length n was fixed to length m) which when perturbed becomes an array of shape [m,n], and an an array of period proportions of length q. What I want now is an array of shape [n,m*q] with one row for each perturbation, where each row is the concatenation of the "period-expanded" perturbed estimates of the daily rainfall observations.
As an example, we can define a toy set of data:
import numpy as np
m = 4
n = 3
q = 5
X = np.arange(m*n).reshape(m,n)
E = (np.arange(1,m*n + 1) /10).reshape(m,n)
X = X - E
Y = np.arange(1,q)
np.random.shuffle(Y)
Y = Y / np.sum(np.arange(1,q))
print(f'X : \n{X}')
print(f'Y : \n{Y}')
which gives us
X :
[[-0.1 0.8 1.7]
[ 2.6 3.5 4.4]
[ 5.3 6.2 7.1]
[ 8. 8.9 9.8]]
Y :
[0.2 0.3 0.1 0.4]
My solution is :
res = (X[:,:,np.newaxis] * Y[np.newaxis,np.newaxis,:]).transpose(1,0,2).reshape(X.shape[1],-1)
print(f'res : \n{res}')
which gives us the appropriate answer :
res :
[[-0.02 -0.03 -0.01 -0.04 0.52 0.78 0.26 1.04 1.06 1.59 0.53 2.12
1.6 2.4 0.8 3.2 ]
[ 0.16 0.24 0.08 0.32 0.7 1.05 0.35 1.4 1.24 1.86 0.62 2.48
1.78 2.67 0.89 3.56]
[ 0.34 0.51 0.17 0.68 0.88 1.32 0.44 1.76 1.42 2.13 0.71 2.84
1.96 2.94 0.98 3.92]]
Admittedly it would be easier to expand the observatons first and then randomly perturb, but the order of operations is a hard requirement on this particular problem: the observations must first be perturbed then the perturbed observations must be expanded and concatenated.

Related

How to solve NaN values error using Lmfit with Python

I'm trying to fit a set of data taken by an external simulation, and stored in a vector, with the Lmfit library.
Below there's my code:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit import Parameters
def DGauss3Par(x,I1,sigma1,sigma2):
I2 = 2.63 - I1
return (I1/np.sqrt(2*np.pi*sigma1))*np.exp(-(x*x)/(2*sigma1*sigma1)) + (I2/np.sqrt(2*np.pi*sigma2))*np.exp(-(x*x)/(2*sigma2*sigma2))
#TAKE DATA
xFull = []
yFull = []
fileTypex = np.dtype([('xFull', np.float)])
fileTypey = np.dtype([('yFull', np.float)])
fDatax = "xValue.dat"
fDatay = "yValue.dat"
xFull = np.loadtxt(fDatax, dtype=fileTypex)
yFull = np.loadtxt(fDatay, dtype=fileTypey)
xGauss = xFull[:]["xFull"]
yGauss = yFull[:]["yFull"]
#MODEL'S DEFINITION
gmodel = Model(DGauss3Par)
params = Parameters()
params.add('I1', value=1.66)
params.add('sigma1', value=1.04)
params.add('sigma2', value=1.2)
result3 = gmodel.fit(yGauss, x=xGauss, params=params)
#PLOTS
plt.plot(xGauss, result3.best_fit, 'y-')
plt.show()
When I run it, I get this error:
File "Overlap.py", line 133, in <module>
result3 = gmodel.fit(yGauss, x=xGauss, params=params)
ValueError: The input contains nan values
These are the values of the data contained in the vector xGauss (related to the x axis):
[-3.88 -3.28 -3.13 -3.08 -3.03 -2.98 -2.93 -2.88 -2.83 -2.78 -2.73 -2.68
-2.63 -2.58 -2.53 -2.48 -2.43 -2.38 -2.33 -2.28 -2.23 -2.18 -2.13 -2.08
-2.03 -1.98 -1.93 -1.88 -1.83 -1.78 -1.73 -1.68 -1.63 -1.58 -1.53 -1.48
-1.43 -1.38 -1.33 -1.28 -1.23 -1.18 -1.13 -1.08 -1.03 -0.98 -0.93 -0.88
-0.83 -0.78 -0.73 -0.68 -0.63 -0.58 -0.53 -0.48 -0.43 -0.38 -0.33 -0.28
-0.23 -0.18 -0.13 -0.08 -0.03 0.03 0.08 0.13 0.18 0.23 0.28 0.33
0.38 0.43 0.48 0.53 0.58 0.63 0.68 0.73 0.78 0.83 0.88 0.93
0.98 1.03 1.08 1.13 1.18 1.23 1.28 1.33 1.38 1.43 1.48 1.53
1.58 1.63 1.68 1.73 1.78 1.83 1.88 1.93 1.98 2.03 2.08 2.13
2.18 2.23 2.28 2.33 2.38 2.43 2.48 2.53 2.58 2.63 2.68 2.73
2.78 2.83 2.88 2.93 2.98 3.03 3.08 3.13 3.28 3.88]
And these ones the ones in the vector yGauss (related to y axis):
[0.00173977 0.00986279 0.01529543 0.0242624 0.0287456 0.03238484
0.03285927 0.03945234 0.04615091 0.05701618 0.0637672 0.07194268
0.07763934 0.08565687 0.09615262 0.1043281 0.11350606 0.1199406
0.1260062 0.14093328 0.15079665 0.16651464 0.18065023 0.1938894
0.2047541 0.21794024 0.22806706 0.23793043 0.25164404 0.2635118
0.28075974 0.29568682 0.30871501 0.3311846 0.34648062 0.36984661
0.38540666 0.40618835 0.4283945 0.45002014 0.48303911 0.50746062
0.53167057 0.5548792 0.57835128 0.60256181 0.62566436 0.65704847
0.68289386 0.71332794 0.73258027 0.769608 0.78769989 0.81407275
0.83358852 0.85210239 0.87109068 0.89456217 0.91618782 0.93760247
0.95680234 0.96919757 0.9783219 0.98486193 0.9931429 0.9931429
0.98486193 0.9783219 0.96919757 0.95680234 0.93760247 0.91618782
0.89456217 0.87109068 0.85210239 0.83358852 0.81407275 0.78769989
0.769608 0.73258027 0.71332794 0.68289386 0.65704847 0.62566436
0.60256181 0.57835128 0.5548792 0.53167057 0.50746062 0.48303911
0.45002014 0.4283945 0.40618835 0.38540666 0.36984661 0.34648062
0.3311846 0.30871501 0.29568682 0.28075974 0.2635118 0.25164404
0.23793043 0.22806706 0.21794024 0.2047541 0.1938894 0.18065023
0.16651464 0.15079665 0.14093328 0.1260062 0.1199406 0.11350606
0.1043281 0.09615262 0.08565687 0.07763934 0.07194268 0.0637672
0.05701618 0.04615091 0.03945234 0.03285927 0.03238484 0.0287456
0.0242624 0.01529543 0.00986279 0.00173977]
I've also tried to print the values returned by my function, to see if there really were some NaN values:
params = Parameters()
params.add('I1', value=1.66)
params.add('sigma1', value=1.04)
params.add('sigma2', value=1.2)
func = DGauss3Par(xGauss,I1,sigma1,sigma2)
print func
but what I obtained is:
[0.04835225 0.06938855 0.07735839 0.08040181 0.08366964 0.08718237
0.09096169 0.09503048 0.0994128 0.10413374 0.10921938 0.11469669
0.12059333 0.12693754 0.13375795 0.14108333 0.14894236 0.15736337
0.16637406 0.17600115 0.18627003 0.19720444 0.20882607 0.22115413
0.23420498 0.24799173 0.26252377 0.27780639 0.29384037 0.3106216
0.32814069 0.34638266 0.3653266 0.38494543 0.40520569 0.42606735
0.44748374 0.46940149 0.49176057 0.51449442 0.5375301 0.56078857
0.58418507 0.60762948 0.63102687 0.65427809 0.6772804 0.69992818
0.72211377 0.74372824 0.76466232 0.78480729 0.80405595 0.82230355
0.83944875 0.85539458 0.87004937 0.88332762 0.89515085 0.90544838
0.91415806 0.92122688 0.92661155 0.93027889 0.93220625 0.93220625
0.93027889 0.92661155 0.92122688 0.91415806 0.90544838 0.89515085
0.88332762 0.87004937 0.85539458 0.83944875 0.82230355 0.80405595
0.78480729 0.76466232 0.74372824 0.72211377 0.69992818 0.6772804
0.65427809 0.63102687 0.60762948 0.58418507 0.56078857 0.5375301
0.51449442 0.49176057 0.46940149 0.44748374 0.42606735 0.40520569
0.38494543 0.3653266 0.34638266 0.32814069 0.3106216 0.29384037
0.27780639 0.26252377 0.24799173 0.23420498 0.22115413 0.20882607
0.19720444 0.18627003 0.17600115 0.16637406 0.15736337 0.14894236
0.14108333 0.13375795 0.12693754 0.12059333 0.11469669 0.10921938
0.10413374 0.0994128 0.09503048 0.09096169 0.08718237 0.08366964
0.08040181 0.07735839 0.06938855 0.04835225]
So it doesn't seems that there are NaN values, I'm not understanding for which reason it returns me that error.
Could anyone help me, please? Thanks!

If you add a print function to your fit function, printing out sigma1 and sigma2, you'll find that
DGauss3Par is evaluated already a few times before the error occurs.
Both sigma variables have a negative value at the time the error occurs.
Taking the square root of a negative value causes, of course, a NaN.
You should add a min bound or similar to your sigma1 and sigma2 parameters to prevent this. Using min=0.0 as an additional argument to params.add(...) will result in a good fit.
Be aware that for some analyses, setting explicit bounds to your fitting parameters may make these analyses invalid. For most cases, you'll be fine, but for some cases, you'll need to check whether the fitting parameters should be allowed to vary from negative infinity to positive infinity, or are allowed to be bounded.

Convert elements in masked astropy Table to np.nan

Consider the simple process of reading a data file with some non-valid entries. This is my test.dat file:
16 1035.22 1041.09 24.54 0.30 1.39 0.30 1.80 0.30 2.26 0.30 1.14 0.30 0.28 0.30 0.2884
127 824.57 1105.52 25.02 0.29 0.87 0.29 1.30 0.29 2.12 0.29 0.66 0.29 0.10 0.29 0.2986
182 1015.83 904.93 INDEF 0.28 1.80 0.28 1.64 0.28 2.38 0.28 1.04 0.28 0.06 0.28 0.3271
185 1019.15 1155.09 24.31 0.28 1.40 0.28 1.78 0.28 2.10 0.28 0.87 0.28 0.35 0.28 0.3290
192 1024.80 1045.57 24.27 0.27 1.24 0.27 2.01 0.27 2.40 0.27 0.90 0.27 0.09 0.27 0.3328
197 1035.99 876.04 24.10 0.27 1.23 0.27 1.52 0.27 2.59 0.27 0.45 0.27 0.25 0.27 0.3357
198 1110.80 1087.97 24.53 0.27 1.49 0.27 1.71 0.27 2.33 0.27 0.22 0.27 0.00 0.27 0.3362
1103 1168.39 1065.97 24.35 0.27 1.28 0.27 1.29 0.27 2.68 0.27 0.43 0.27 0.26 0.27 0.3388
And this is the code to read it, and replace the "bad" values (INDEF) with a float (99.999)
import numpy as np
from astropy.io import ascii
data = ascii.read("test.dat", fill_values=[('INDEF', '0')])
data = data.filled(99.999)
This works just fine, but if I instead try to replace the bad values with a np.nan (i.e., I use the line data = data.filled(np.nan)) I get:
ValueError: cannot convert float NaN to integer
why is this and how can I get around it?

As noted the issue is that the numpy MaskedArray.filled() method seems to try converting the fill value to the appropriate type before checking if there is actually anything to fill. Since the table in the example has an int column, this fails within numpy (and astropy.Table is just calling the filled() method on each column).
This should work:
In [44]: def fill_cols(tbl, fill=np.nan, kind='f'):
...: """
...: In-place fill of ``tbl`` columns which have dtype ``kind``
...: with ``fill`` value.
...: """
...: for col in tbl.itercols():
...: if col.dtype.kind == kind:
...: col[...] = col.filled(fill)
...:
In [45]: t = simple_table(masked=True)
In [46]: t
Out[46]:
<Table masked=True length=3>
a b c
int64 float64 str1
----- ------- ----
-- 1.0 c
2 2.0 --
3 -- e
In [47]: fill_cols(t)
In [48]: t
Out[48]:
<Table masked=True length=3>
a b c
int64 float64 str1
----- ------- ----
-- 1.0 c
2 2.0 --
3 nan e

I don't think it's primarily a numpy problem, as it works with individual columns:
>>> data['col4'].filled(np.nan)
<Column name='col4' dtype='float64' length=8>
24.54
25.02
nan
24.31
24.27
24.1
24.53
24.35
but you still can't construct a Table from this -
Table([data[n].filled(np.nan) for n in data.colnames])
raises the same error in np.ma.core.
You can explicitly set
data['col4'] = data['col4'].filled(np.nan)
but this apparently lets the table lose its .filled() method...
I am not that familiar with masked arrays and tables, but as you've already filed a related issue on Github, you might want to add this problem.

This is happening fairly deep in numpy, in numpy.ma.filled. fill values have to be scalars, basically.
A messy solution that fills with nan's and still returns a table could look like:
import numpy as np
from astropy.io import ascii
from astropy.table import Table
def fill_with_nan(t):
arr = t.as_array()
arr_list = arr.tolist()
arr = np.array(arr_list)
arr[np.equal(arr, None)] = np.nan
arr = np.array(arr.tolist())
return Table(arr)
data = ascii.read("test.dat", fill_values=[('INDEF', '0')])
data = fill_with_nan(data)

Cut out the middleman? fill_values=[('INDEF', np.nan)]) seems to work.

Scipy curve_fit bounds and conditions

I am trying to use curve_fit to fit some data. it is working great, I would just like to improve the fit with additional parameters to match assumptions (such as mechanical efficiency cannot be greater than 100% etc)
y_data = [0.90 0.90 0.90 0.90 0.90 0.90 0.90 1.30 1.30 1.30 1.30 1.20 1.65 1.65 1.65 1.65 1.65 1.65 1.80 1.80 1.80 1.80 1.80 1.80 1.80 1.80 1.80 3.50 6.60 6.60 6.70 6.70 6.70 6.70 6.70 8.50 12.70] # I am aware this does not have commas
x_data = [0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.38 0.46 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 0.53 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02 1.02] # ditto
def poly2(x, a, b, c): return a*x**2+ b*x+c
def poly3(x,a,b,c,d): return a*x**3+b*x**2+c*b*x+d
pars = fit(poly2, x_data, y_data, bounds=bounds)
But I would like to additionally specify bounds to relations between parameters eg.
B**2 -4*a*c > 0 #for poly2
b**2-3*a*c=0 #for poly3
To ensure that the fit has horizontal inflection.
Is there a way to achieve this?
Edit: I found this, it may help once I investigate:How do I put a constraint on SciPy curve fit?
How would this be done using lmfit as suggested?

So I believe I have solved this, based on #9dogs comment using lmfit.
relevant documentation here:
https://lmfit.github.io/lmfit-py/constraints.html
and a helpful tutorial here:
http://blog.danallan.com/projects/2013/model/
For my function poly3 this seams to work to enforce horizontal or positive inflection.
from lmfit import Parameters, Model
def poly3(x,a,b,c,d): return a*x**3+b*x**2+c*b*x+d
model = Model(poly3, independent_vars=['x'], )
params = Parameters()
apologies for teh terrible maths: the cubic dicriminant is given here as https://brilliant.org/wiki/cubic-discriminant/ b**2*c**2-4*a*c**3-4*b**3*d-27*a**2*d**2+18*a*b*c*d
params = Parameters()
params..add('a', value=1, min=0, vary=True)
params.add('b', value=1, vary=True)
params.add('c', value=1, vary=True)
params.add('d', value=1, vary=True)
params.add('discr', value = 0, vary= False, expr='(b**2*c**2-4*a*c**3-4*b**3*d-27*a**2*d**2+18*a*b*c*d)')
result = model.fit(y_data, x=x_data, params=params) # do the work
pars = [] # list that will contain the optimized parameters for analysis
# create a parameters list for use in the rest of code, this is a stopgap until I refactor the rest of my code
pars.append(result.values['a'])
pars.append(result.values['b'])
pars.append(result.values['c'])
pars.append(result.values['d'])
## rest of code such as plotting
If there are questions I will expand the example further.

Pairwise Elements Using Python - Calculating Average of individual elements of array

So I have a query; I am accessing an API that gives the following response:
[["22014",201939,"0021401229","APR 15 2015",Team1 vs. Team2","W",
19,4,10,0.4,2,4,0.5,0,0,0,2,2,4,7,5,0,2,1,10,14,1],["22014",201939,"0021401","APR
13 2015",Team1 vs. Team3","W",
15,4,13,0.4,2,8,0.5,0,0,0,2,2,4,7,5,0,8,1,12,14,1],["22014",201939,"0021401192","APR
11 2015",Team1 vs. Team4","W",
22,5,10,0.4,2,6,0.5,0,0,0,2,2,4,7,5,0,2,1,8,14,1]]
I could just as easily have 16 different variables that I assign zero to, then print them out like the following example:
sum_pts = 0
for n in range(0,len(shot_data)): #range of games; these lengths vary per player
sum_pts= sum_pts+float(json.dumps(shots_array[n][24]))
print sum_pts/float(len(shots_array))
Output:
>>>
23.75
But I'd rather not create 16 different variables that calculate the average of the individual elements in this list. I'm looking for an easier way that I could get the average of Team1
I would like it the output to eventually be, so that I can apply this to infinite number of players or individual stats:
Team1 AVGPTS AVGAST AVGSTL AVGREB...
23.75 5.3 2.1 3.2
Or it could be:
Player1 AVGPTS AVGAST AVGSTL AVGREB ...
23.75 5.3 2.1 3.2 ...

To get the averages of the last 16 entries of each entry, you could use the following approach, this avoids the need to define multiple variables for each column:
data = [
["22014",201939,"0021401229","APR 15 2015", "Team1 vs. Team2","W", 19,4,10,0.4,2,4,0.5,0,0,0,2,2,4,7,5,0,2,1,10,14,1],
["22014",201939,"0021401","APR 13 2015","Team1 vs. Team3","W", 15,4,13,0.4,2,8,0.5,0,0,0,2,2,4,7,5,0,8,1,12,14,1],
["22014",201939,"0021401192","APR 11 2015","Team1 vs. Team4","W", 22,5,10,0.4,2,6,0.5,0,0,0,2,2,4,7,5,0,2,1,8,14,1]]
length = float(len(data))
values = []
for entry in data:
values.append(entry[6:])
values = zip(*values)
averages = [sum(v) / length for v in values]
for col in averages:
print "{:.2f} ".format(col),
This would display:
18.67 4.33 11.00 0.40 2.00 6.00 0.50 0.00 0.00 0.00 2.00 2.00 4.00 7.00 5.00 0.00 4.00 1.00 10.00 14.00 1.00
Note, your data is missing an opening quote before each Team1 vs Team2.

NumPy array acts differently based on origin (np.max() and np.argmax())

I have a function which creates a NumPy array from a data file. I want to then get the maximum value in the array and the index of that value:
import numpy as np
def dostuff():
# open .txt file into lists
# copy lists into numpy array
# nested for loops and values copied into numpy array called a
print a
print np.max(a)
print np.argmax(a)
dostuff()
Running this gives:
[[ 0.64 0.47 0.22 0.1 0.05 0.02]
[ 2.19 9.13 10.68 6.44 3.36 1.77]
[ 1.84 8.81 12.6 8.31 4.45 2.35]]
2.35
0
Clearly something has gone wrong with the np.max() and np.argmax(). This can be shown with the following code
def test():
a = np.array([[0.64, 0.47, 0.22, 0.1, 0.05, 0.02],
[2.19, 9.13, 10.68, 6.44, 3.36, 1.77],
[1.84, 8.81, 12.6, 8.31, 4.45, 2.35]])
print a
print np.max(a)
print np.argmax(a)
test()
This gives:
[[ 0.64 0.47 0.22 0.1 0.05 0.02]
[ 2.19 9.13 10.68 6.44 3.36 1.77]
[ 1.84 8.81 12.6 8.31 4.45 2.35]]
12.6
14
...which is what I would have expected. I have no idea why these two (apparently) identical arrays give different results here. Does anyone know what I may have done?

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Efficient Numpy Array Multiplication and Reshaping - python

Related

How to solve NaN values error using Lmfit with Python

Convert elements in masked astropy Table to np.nan

Scipy curve_fit bounds and conditions

Pairwise Elements Using Python - Calculating Average of individual elements of array

NumPy array acts differently based on origin (np.max() and np.argmax())

Categories

Resources