Using two np.linspace, how to fill 2D array with complex values? - python

I'm trying to fill a 2D array with complex(x,y), where x and y are from two arrays:
xstep = np.linspace(xmin, xmax, Nx)
ystep = np.linspace(ymin, ymax, Ny)
However I can't figure out how to "spread" these values out on a 2D array.
So far my attempts are not really working out. I was hoping for something along the lines of:
result = np.array(xstep + (1j * ystep))
Maybe something from fromfunction, meshgrid or full, but I can't quite make it work.
As an example, say I do this:
xstep = np.linspace(0, 1, 2) # array([0., 1.])
ystep = np.linspace(0, 1, 3) # array([0. , 0.5, 1. ])
I'm trying to construct an answer:
array([
[0+0j, 0+0.5j, 0+1j],
[1+0j, 1+0.5j, 1+1j]
])
Note that I am not married to the linspace, so any quicker method would also do, it is just my natural starting point for creating this array, being new to Numpy.

In [4]: xstep = np.linspace(0, 1, 2)
In [5]: ystep = np.linspace(0, 1, 3)
In [6]: xstep[:, None] + 1j*ystep
Out[6]:
array([[0.+0.j , 0.+0.5j, 0.+1.j ],
[1.+0.j , 1.+0.5j, 1.+1.j ]])
xstep[:, None] is equivalent to xstep[:, np.newaxis] and its purpose is to add a new axis to xstep on the right. Thus, xstep[:, None] is a 2D array of shape (2, 1).
In [19]: xstep[:, None].shape
Out[19]: (2, 1)
xstep[:, None] + 1j*ystep is thus the sum of a 2D array of shape (2, 1) and a 1D array of shape (3,).
NumPy broadcasting resolves this apparent shape conflict by automatically adding new axes (of length 1) on the left. So, by NumPy broadcasting rules, 1j*ystep is promoted to an array of shape (1, 3).
(Notice that xstep[:, None] is required to explicitly add new axes on the right, but broadcasting will automatically add axes on the left. This is why 1j*ystep[None, :] was unnecessary though valid.)
Broadcasting further promotes both arrays to the common shape (2, 3) (but in a memory-efficient way, without copying the data). The values along the axes of length 1 are repeated:
In [15]: X, Y = np.broadcast_arrays(xstep[:, None], 1j*ystep)
In [16]: X
Out[16]:
array([[0., 0., 0.],
[1., 1., 1.]])
In [17]: Y
Out[17]:
array([[0.+0.j , 0.+0.5j, 0.+1.j ],
[0.+0.j , 0.+0.5j, 0.+1.j ]])
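As a cross-check on the broadcasting explanation above, here is a minimal sketch that builds the same grid twice: once with an added axis and once with np.meshgrid using indexing="ij". The names via_broadcast and via_meshgrid are illustrative only.

```python
import numpy as np

xstep = np.linspace(0, 1, 2)   # array([0., 1.])
ystep = np.linspace(0, 1, 3)   # array([0. , 0.5, 1. ])

# Shape (2, 1) plus shape (3,) broadcasts to shape (2, 3).
via_broadcast = xstep[:, None] + 1j * ystep

# meshgrid with indexing="ij" materializes the two full (2, 3) arrays.
X, Y = np.meshgrid(xstep, ystep, indexing="ij")
via_meshgrid = X + 1j * Y

print(via_broadcast.shape)                          # (2, 3)
print(np.array_equal(via_broadcast, via_meshgrid))  # True
```

The broadcasting form avoids allocating the intermediate X and Y arrays, which is the main practical difference between the two.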

You can use np.ogrid with imaginary "step" to obtain linspace semantics:
y, x = np.ogrid[0:1:2j, 0:1:3j]
y + 1j*x
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
# [1.+0.j , 1.+0.5j, 1.+1.j ]])
Here the ogrid line means: make an open 2D grid with axis 0 running from 0 to 1 in 2 points and axis 1 running from 0 to 1 in 3 points. The type of the slice "step" acts as a switch: if it is imaginary (in fact, anything of complex type), its absolute value is taken as the number of points and the expression is treated like a linspace. Otherwise, range semantics apply.
The return values
y, x
# (array([[0.],
# [1.]]), array([[0. , 0.5, 1. ]]))
are "broadcast ready", so in the example we can simply add them and obtain a full 2D grid.
If we allow ourselves an imaginary "stop" parameter in the second slice (which only works with linspace semantics, so depending on your style you may prefer to avoid it) this can be condensed to one line:
sum(np.ogrid[0:1:2j, 0:1j:3j])
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
# [1.+0.j , 1.+0.5j, 1.+1.j ]])
A similar but potentially more performant method would be preallocation and then broadcasting:
out = np.empty((y.size, x.size), complex)
out.real[...], out.imag[...] = y, x
out
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
# [1.+0.j , 1.+0.5j, 1.+1.j ]])
And another one using outer sum:
np.add.outer(np.linspace(0,1,2), np.linspace(0,1j,3))
# array([[0.+0.j , 0.+0.5j, 0.+1.j ],
# [1.+0.j , 1.+0.5j, 1.+1.j ]])
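To make the "step as a switch" behaviour concrete, the sketch below compares an imaginary step with a real step in np.ogrid. The variable names are illustrative, and the comparison against np.linspace uses a tolerance because ogrid computes its points as start + k*step rather than by calling linspace.

```python
import numpy as np

# Imaginary step: the magnitude is a point count and the stop is included.
rows, cols = np.ogrid[0:1:2j, 0:1:3j]

# Real step: arange-like spacing, and the stop is excluded.
r_rows, r_cols = np.ogrid[0:2:1, 0:3:1]

print(rows.shape, cols.shape)                          # (2, 1) (1, 3)
print(np.allclose(cols.ravel(), np.linspace(0, 1, 3)))  # True
print(r_rows.ravel(), r_cols.ravel())                  # [0 1] [0 1 2]
```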

Use reshape(-1,1) for xstep as:
xstep = np.linspace(0, 1, 2) # array([0., 1.])
ystep = np.linspace(0, 1, 3) # array([0. , 0.5, 1. ])
result = np.array(xstep.reshape(-1,1) + (1j * ystep))
result
array([[0.+0.j , 0.+0.5j, 0.+1.j ],
[1.+0.j , 1.+0.5j, 1.+1.j ]])

Related

scipy.stats.norm for array of values with different accuracy in different method

Generate two arrays:
np.random.seed(1)
x = np.random.rand(30, 2)
np.random.seed(2)
x_test = np.random.rand(5,2)
Calculate scipy.stats.norm axis by axis:
gx0 = scipy.stats.norm(np.mean(x[:,0]), np.std(x[:,0])).pdf(x_test[:,0])
gx1 = scipy.stats.norm(np.mean(x[:,1]), np.std(x[:,1])).pdf(x_test[:,1])
and get:
gx0 = array([1.29928091, 1.1344507 , 1.30920536, 1.10709298, 1.26903949])
gx1 = array([0.29941644, 1.36808598, 1.13817727, 1.34149231, 0.95054596])
Calculate using NumPy broadcasting:
gx = scipy.stats.norm(np.mean(x, axis = 0), np.std(x, axis = 0)).pdf(x_test)
and get:
gx = array([[1.29928091, 0.29941644],
[1.1344507 , 1.36808598],
[1.30920536, 1.13817727],
[1.10709298, 1.34149231],
[1.26903949, 0.95054596]])
gx[:,0] and gx0 look the same, but subtracting one from the other, gx[:,0] - gx0, gives:
array([-4.44089210e-16, -2.22044605e-16, -4.44089210e-16, 0.00000000e+00,
0.00000000e+00])
Why is that?
Not sure why they calculate the answer to different precisions, but converting the input arrays to 128 bit floats solves the problem:
np.random.seed(1)
x = np.random.rand(30, 2).astype(np.float128)
np.random.seed(2)
x_test = np.random.rand(5,2).astype(np.float128)
...
print(gx[:,0] - gx0)
results in:
[0. 0. 0. 0. 0.]
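The differences reported in the question are whole multiples of the float64 machine epsilon, which is the size of a last-bit rounding difference for values between 1 and 2. A plausible reading (an assumption, not checked against SciPy's source) is that the two calls perform the same arithmetic along slightly different paths. The sketch below, using only the numbers printed above, shows the scale of the discrepancy and how a tolerance-based comparison treats it.

```python
import numpy as np

eps = np.finfo(np.float64).eps   # about 2.22e-16

# Differences copied from the question's output.
diff = np.array([-4.44089210e-16, -2.22044605e-16, -4.44089210e-16, 0.0, 0.0])
in_eps = diff / eps              # the differences expressed in units of eps

# Values copied from the question; all lie in [1, 2).
gx0 = np.array([1.29928091, 1.1344507, 1.30920536, 1.10709298, 1.26903949])

# A tolerance-based comparison treats such results as equal.
same = np.allclose(gx0 + diff, gx0)

print(np.round(in_eps))   # each entry is -2, -1 or 0 (zeros may print as -0.)
print(same)               # True
```

Comparing with np.allclose or np.isclose is usually a more portable choice than widening the dtype, since np.float128 is not available on every platform.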

How to make a ufunc output a matrix given two array_like operands (instead of trying to broadcast them)?

I would like to get a matrix of values given two ndarray's from a ufunc, for example:
degs = numpy.array(range(5))
pnts = numpy.array([0.0, 0.1, 0.2])
values = scipy.special.eval_chebyt(degs, pnts)
The above code doesn't work (it gives a ValueError because it tries to broadcast two arrays and fails since they have different shapes: (5,) and (3,)); I would like to get a matrix of values with rows corresponding to degrees and columns to points at which polynomials are evaluated (or vice versa, it doesn't matter).
Currently my workaround is simply to use for-loop:
values = numpy.zeros((5,3))
for j in range(5):
    values[j] = scipy.special.eval_chebyt(j, pnts)
Is there a way to do that? In general, how would you let a ufunc know you want an n-dimensional array if you have n array_like arguments?
I know about numpy.vectorize, but that seems neither faster nor more elegant than just a simple for-loop (and I'm not even sure you can apply it to an existing ufunc).
UPDATE: What about ufuncs that take 3 or more parameters? Trying the outer method gives a ValueError: outer product only supported for binary functions. An example is scipy.special.eval_jacobi.
What you need is exactly the outer method of ufuncs:
ufunc.outer(A, B, **kwargs)
Apply the ufunc op to all pairs (a, b) with a in A and b in B.
values = scipy.special.eval_chebyt.outer(degs, pnts)
#array([[ 1. , 1. , 1. ],
# [ 0. , 0.1 , 0.2 ],
# [-1. , -0.98 , -0.92 ],
# [-0. , -0.296 , -0.568 ],
# [ 1. , 0.9208, 0.6928]])
UPDATE
For more parameters, you must broadcast by hand. meshgrid often helps with that, spanning each parameter in its own dimension. For example:
n=3
alpha = numpy.array(range(5))
beta = numpy.array(range(3))
x = numpy.array(range(2))
data = numpy.meshgrid(n,alpha,beta,x)
values = scipy.special.eval_jacobi(*data)
Reshape the input arguments for broadcasting. In this case, change the shape of degs to be (5, 1) instead of just (5,). The shape (5, 1) broadcast with the shape (3,) results in the shape (5, 3):
In [185]: import numpy as np
In [186]: import scipy.special
In [187]: degs = np.arange(5).reshape(-1, 1) # degs has shape (5, 1)
In [188]: pnts = np.array([0.0, 0.1, 0.2])
In [189]: values = scipy.special.eval_chebyt(degs, pnts)
In [190]: values
Out[190]:
array([[ 1. , 1. , 1. ],
[ 0. , 0.1 , 0.2 ],
[-1. , -0.98 , -0.92 ],
[-0. , -0.296 , -0.568 ],
[ 1. , 0.9208, 0.6928]])
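The reshape approach extends to three or more arguments: give each argument its own axis and let broadcasting do the rest. The sketch below uses np.ix_ to build those shapes. Note that three_arg is a hypothetical stand-in so the example needs only NumPy; a broadcasting-aware ufunc such as scipy.special.eval_jacobi should be callable the same way, but that call is not exercised here.

```python
import numpy as np

def three_arg(a, b, x):
    # Hypothetical stand-in for a three-argument ufunc; it only needs
    # to broadcast its inputs, as real ufuncs do.
    return a + b * x

alpha = np.arange(5)
beta = np.arange(3)
x = np.linspace(0.0, 1.0, 2)

# np.ix_ reshapes the 1D inputs to (5, 1, 1), (1, 3, 1) and (1, 1, 2).
A, B, X = np.ix_(alpha, beta, x)
values = three_arg(A, B, X)

print(A.shape, B.shape, X.shape)  # (5, 1, 1) (1, 3, 1) (1, 1, 2)
print(values.shape)               # (5, 3, 2)
```

Unlike the default meshgrid call, np.ix_ keeps the axes in argument order and does not materialize the full grids.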

Vectorizing A Function With Array Parameter

I'm currently trying to apply Chi-Squared analysis to some data.
I want to plot a colourmap of varying values depending on the two coefficients of a model
def f(x, coeff):
    return coeff[0] + numpy.exp(coeff[1] * x)
def chi_squared(coeff, x, y, y_err):
    return numpy.sum(((y - f(x, coeff) / y_err)**2)
us = numpy.linspace(u0, u1, n)
vs = numpy.linspace(v0, v1, n)
rs = numpy.meshgrid(us, vs)
chi = numpy.vectorize(chi_squared)
chi(rs, x, y, y_error)
I tried vectorizing the function to be able to pass a meshgrid of the varying coefficients to produce the colormap.
The values of x, y, y_err are all 1D arrays of length n.
And u, v are the various changing coefficients.
However this doesn't work, resulting in
IndexError: invalid index to scalar variable.
This is because coeff is passed as a scalar rather than a vector, however I don't know how to correct this.
Update
My aim is to take an array of coordinates
rs = [[[u0, v0], [u1, v0],..,[un, v0]],...,[[u0, vm],..,[un,vm]]]
Where each coordinate is the coefficient parameters to be passed to the chi-squared method.
This should return a 2D array populated with Chi-Squared values for the appropriate coordinate
chi = [[c00, c10, ..., cn0], ..., [c0m, c1m, ..., cnm]]
I can then use this data to plot a colormap using imshow
Here's my first attempt to run your code:
In [44]: def f(x, coeff):
    ...:     return coeff[0] + numpy.exp(coeff[1] * x)
    ...:
    ...: def chi_squared(coeff, x, y, y_err):
    ...:     return numpy.sum((y - f(x, coeff) / y_err)**2)
(I had to remove the ( in that last line)
First guess at possible array values:
In [45]: x = np.arange(3)
In [46]: y = x
In [47]: y_err = x
In [48]: us = np.linspace(0,1,3)
In [49]: rs = np.meshgrid(us,us)
In [50]: rs
Out[50]:
[array([[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ],
[ 0. , 0.5, 1. ]]),
array([[ 0. , 0. , 0. ],
[ 0.5, 0.5, 0.5],
[ 1. , 1. , 1. ]])]
In [51]: chi_squared(rs, x, y, y_err)
/usr/local/bin/ipython3:5: RuntimeWarning: divide by zero encountered in true_divide
import sys
Out[51]: inf
oops, y_err shouldn't have a 0. Try again:
In [52]: y_err = np.array([1,1,1])
In [53]: chi_squared(rs, x, y, y_err)
Out[53]: 53.262865105526018
It also works if I turn the rs list into an array:
In [55]: np.array(rs).shape
Out[55]: (2, 3, 3)
In [56]: chi_squared(np.array(rs), x, y, y_err)
Out[56]: 53.262865105526018
Now, what was the purpose of vectorize?
The f function returns a (n,n) array:
In [57]: f(x, rs)
Out[57]:
array([[ 1. , 1.5 , 2. ],
[ 1. , 2.14872127, 3.71828183],
[ 1. , 3.21828183, 8.3890561 ]])
Let's modify chi_squared to give sum an axis:
In [61]: def chi_squared(coeff, x, y, y_err, axis=None):
    ...:     return numpy.sum((y - f(x, coeff) / y_err)**2, axis=axis)
In [62]: chi_squared(np.array(rs), x, y, y_err)
Out[62]: 53.262865105526018
In [63]: chi_squared(np.array(rs), x, y, y_err, axis=0)
Out[63]: array([ 3. , 6.49033483, 43.77253028])
In [64]: chi_squared(np.array(rs), x, y, y_err, axis=1)
Out[64]: array([ 1.25 , 5.272053 , 46.74081211])
I'm tempted to change the coeff to coeff0, coeff1, to give more control from the start on how this parameter is passed, but it probably doesn't make a difference.
update
Now that you've been more specific about how the coeff values relate to x, y etc, I see that this can be solved with simple broadcasting. No need to use np.vectorize.
First, define a grid that has a different size; that way we, and the code, won't think that each dimension of the coeff grid has anything to do with the x,y values.
In [134]: rs = np.meshgrid(np.linspace(0,1,4), np.linspace(0,1,5), indexing='ij')
In [135]: coeff=np.array(rs)
In [136]: coeff.shape
Out[136]: (2, 4, 5)
Now look at what f looks like when given this coeff and x.
In [137]: f(x, coeff[...,None]).shape
Out[137]: (4, 5, 3)
coeff is effectively (4,5,1), while x is (1,1,3), resulting in a (4,5,3) (by broadcasting rules)
The same thing happens inside chi_squared, with the final step of sum on the last axis (size 3):
In [138]: chi_squared(coeff[...,None], x, y, y_err, axis=-1)
Out[138]:
array([[ 2. , 1.20406718, 1.93676807, 8.40646968,
32.99441808],
[ 2.33333333, 2.15923164, 3.84810347, 11.80559574,
38.73264336],
[ 3.33333333, 3.78106277, 6.42610554, 15.87138846,
45.13753532],
[ 5. , 6.06956056, 9.67077427, 20.60384785,
52.20909393]])
In [139]: _.shape
Out[139]: (4, 5)
One value for each coeff pair of values, the (4,5) grid.
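The same broadcasting idea can be written with the two coefficients passed separately, as suggested earlier in this answer. This is a sketch under stated assumptions: model and chi_squared_grid are illustrative names, and the residual is taken as (y - f) / y_err, the conventional chi-squared form, which differs from the parenthesization in the code above.

```python
import numpy as np

def model(x, c0, c1):
    # Same functional form as f in the question, with separate coefficients.
    return c0 + np.exp(c1 * x)

def chi_squared_grid(us, vs, x, y, y_err):
    # us -> axis 0, vs -> axis 1, data points -> axis 2.
    c0 = us[:, None, None]
    c1 = vs[None, :, None]
    # Assumed residual form: (y - f) / y_err.
    resid = (y - model(x, c0, c1)) / y_err
    return np.sum(resid**2, axis=-1)

x = np.arange(3.0)
y = model(x, 0.5, 0.25)        # synthetic data with known coefficients
y_err = np.ones_like(x)
us = np.linspace(0, 1, 5)      # contains 0.5 at index 2
vs = np.linspace(0, 1, 9)      # contains 0.25 at index 2

chi = chi_squared_grid(us, vs, x, y, y_err)
best = np.unravel_index(chi.argmin(), chi.shape)
print(chi.shape)   # (5, 9)
print(best)        # index pair of the smallest chi-squared value
```

The resulting 2D array can be passed to imshow directly; with this axis order, rows correspond to us and columns to vs.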

efficient numpy array creation

Given x, I want to produce x, log(x) as a numpy array whereby x has shape s, the result has shape (*s, 2). What's the neatest way to do this? x may just be a float, in which case I want a result with shape (2,).
An ugly way to do this is:
import numpy as np
x = np.asarray(x)
result = np.empty((*x.shape, 2))
result[..., 0] = x
result[..., 1] = np.log(x)
It's important to separate aesthetics from performance. Sometimes ugly code is
fast. In fact, that's the case here. Although creating an empty array and then
assigning values to slices may not look beautiful, it is fast.
import numpy as np
import timeit
import itertools as IT
import pandas as pd
def using_empty(x):
    x = np.asarray(x)
    result = np.empty(x.shape + (2,))
    result[..., 0] = x
    result[..., 1] = np.log(x)
    return result
def using_concat(x):
    x = np.asarray(x)
    return np.concatenate([x, np.log(x)], axis=-1).reshape(x.shape+(2,), order='F')
def using_stack(x):
    x = np.asarray(x)
    return np.stack([x, np.log(x)], axis=x.ndim)
def using_ufunc(x):
    return np.array([x, np.log(x)])
using_ufunc = np.vectorize(using_ufunc, otypes=[np.ndarray])
tests = [np.arange(600),
         np.arange(600).reshape(20,30),
         np.arange(960).reshape(8,15,8)]
# check that all implementations return the same result
for x in tests:
    assert np.allclose(using_empty(x), using_concat(x))
    assert np.allclose(using_empty(x), using_stack(x))
timing = []
funcs = ['using_empty', 'using_concat', 'using_stack', 'using_ufunc']
for test, func in IT.product(tests, funcs):
    timing.append(timeit.timeit(
        '{}(test)'.format(func),
        setup='from __main__ import test, {}'.format(func), number=1000))
timing = pd.DataFrame(np.array(timing).reshape(-1, len(funcs)), columns=funcs)
print(timing)
yields the following timeit results on my machine:
   using_empty  using_concat  using_stack  using_ufunc
0     0.024754      0.025182     0.030244     2.414580
1     0.025766      0.027692     0.031970     2.408344
2     0.037502      0.039644     0.044032     3.907487
So using_empty is the fastest (of the options tested applied to tests).
Note that np.stack does exactly what you want, so
np.stack([x, np.log(x)], axis=x.ndim)
looks reasonably pretty, but it is also the slowest of the three options tested.
Note that along with being much slower, using_ufunc returns an array of object dtype:
In [236]: x = np.arange(6)
In [237]: using_ufunc(x)
Out[237]:
array([array([ 0., -inf]), array([ 1., 0.]),
array([ 2. , 0.69314718]),
array([ 3. , 1.09861229]),
array([ 4. , 1.38629436]), array([ 5. , 1.60943791])], dtype=object)
which is not the same as the desired result:
In [240]: using_empty(x)
Out[240]:
array([[ 0. , -inf],
[ 1. , 0. ],
[ 2. , 0.69314718],
[ 3. , 1.09861229],
[ 4. , 1.38629436],
[ 5. , 1.60943791]])
In [238]: using_ufunc(x).shape
Out[238]: (6,)
In [239]: using_empty(x).shape
Out[239]: (6, 2)
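The question also asks about the case where x is a plain float. A minimal sketch of the np.stack variant with axis=-1 is shown below; with_log is an illustrative name, and the float conversion is an added assumption so that integer input behaves the same way.

```python
import numpy as np

def with_log(x):
    # Accept scalars and arrays alike; a scalar becomes a 0-d array.
    x = np.asarray(x, dtype=float)
    # Stacking along a new last axis gives shape x.shape + (2,).
    return np.stack([x, np.log(x)], axis=-1)

print(with_log(2.0).shape)              # (2,)
print(with_log(np.ones((3, 4))).shape)  # (3, 4, 2)
```

The preallocating version handles scalars in the same way, since np.empty(() + (2,)) also has shape (2,).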

calculation of residuals with numpy lstsq

I have x,y data:
import numpy as np
x = np.array([ 2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125])
y = np.array([ 2448636.,1232116.,617889.,310678.,154454.,78338.])
X = np.vstack((x, np.zeros(len(x))))
popt,res,rank,val = np.linalg.lstsq(X.T,y)
popt,res,rank,val
Gives me:
(array([ 981270.29919414, 0. ]),
array([], dtype=float64),
1,
array([ 2.88639894, 0. ]))
Why are the residuals zero? If I add ones instead of zeros, the residuals are calculated:
X = np.vstack((x, np.ones(len(x)))) # added ones instead of zeros
popt,res,rank,val = np.linalg.lstsq(X.T,y)
popt,res,rank,val
(array([ 978897.28500355, 4016.82089552]),
array([ 42727293.12864216]),
2,
array([ 3.49623683, 1.45176681]))
Additionally, if I calculate the sum of squared residuals in Excel, I get 9261214 if the intercept is set to zero and 5478137 if ones are added to x.
lstsq is going to have a tough time fitting to that column of zeros: any value of the corresponding parameter (presumably intercept) will do.
To fix the intercept to 0, if that's what you need to do, just send the x array, but make sure that it's the right shape for lstsq:
In [214]: popt,res,rank,val = np.linalg.lstsq(np.atleast_2d(x).T,y)
In [215]: popt
Out[215]: array([ 981270.29919414])
In [216]: res
Out[216]: array([ 92621214.2278382])
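The empty residuals array in the question follows from lstsq's documented behaviour: residuals are returned only when the design matrix has full column rank (and more rows than columns). The column of zeros makes the rank 1 for a two-column matrix, so the array comes back empty. The sketch below reproduces this and computes the sum of squared residuals by hand; rcond=None is passed explicitly, and the names manual_res and res_single are illustrative.

```python
import numpy as np

x = np.array([2.5, 1.25, 0.625, 0.3125, 0.15625, 0.078125])
y = np.array([2448636., 1232116., 617889., 310678., 154454., 78338.])

# Two columns, one of them all zeros: rank 1, so residuals come back empty.
X = np.vstack((x, np.zeros(len(x)))).T
popt, res, rank, sv = np.linalg.lstsq(X, y, rcond=None)

# The sum of squared residuals can still be computed directly.
manual_res = np.sum((y - X @ popt) ** 2)

# Single-column fit (intercept fixed at 0) for comparison.
res_single = np.linalg.lstsq(x[:, None], y, rcond=None)[1]

print(rank, res.size)                         # 1 0
print(np.isclose(manual_res, res_single[0]))  # True
```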
