I'm having an issue where I'm adding two 4x1 arrays and the result is a 4x4 array where the first column is repeated 4 times. The result I need is a 4x1 array.
I've initialized an array as such (m = 4): z = np.zeros((m, len(t))
Later in my code I pass this array into a function as z[:,k+1] so the dimensionality becomes a 4x1 array. (Note that when I print this array to my terminal is shows up as a row vector and not a column vector: [0. 0. 0. 0.], I'm not sure why this is either). The array that I'm trying to add to z has the following structure when printed to my terminal:
[[#]
[#]
[#]
[#]]
Clearly the addition is pulling the above array into each element of z instead of adding their respective components together, but I'm not sure why as they should both be column vectors. I'd appreciate any help with this.
EDIT: I have a lot of code so I've included a condensed version that hopefully gets the idea accross.
n = 4 # Defines number of states
m = 4 # Defines number of measurements
x = np.zeros((n, len(t)), dtype=np.float64) # Initializes states
z = np.zeros((m, len(t)), dtype=np.float64) # Initializes measurements
u = np.zeros((1, len(t)), dtype=np.float64) # Initializes input
...
C = np.eye(m) # Defines measurement matrix
...
for k in range(len(t)-1):
...
x_ukf[:,k+1], P_ukf[k+1,:,:] = function_call(x_ukf[:,k], z[:,k+1], u[:,k], P_ukf[k,:,:], C, Q, R, T) # Calls UKF function
This then leads to the function where the following occurrs (note that measurement_matrix = C (4x4 matrix), X is a 4x9 matrix, and W a 1x9 row vector):
Z = measurement_matrix # X # Calculates measurements based on sigma points
zhat = Z # W.T
...
state_vec = state_vec + K # (measurement_vec - zhat) # Updates state estimates
The issue I'm having is with the expression (measurement_vec - zhat). This is where the result should be a 4x1 vector but I'm getting a 4x4 matric.
This is sometimes called broadcasting:
a, b = np.arange(4), np.arange(8,12)
c = a + b[:,None]
Output:
array([[ 8, 9, 10, 11],
[ 9, 10, 11, 12],
[10, 11, 12, 13],
[11, 12, 13, 14]])
Related
I have an n row, m column numpy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following python function to do this, but would like to implement something more efficient and faster:
def sample_array_cols(MyMatrix, nelements):
vmat = []
TempMat = MyMatrix.T
for v in TempMat:
v = np.ndarray.tolist(v)
subv = random.sample(v, nelements)
vmat = vmat + [subv]
return(np.array(vmat).T)
One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?
One alternative is to randomly generate the indices first, and then use take_along_axis to map them to the original array:
arr = np.random.randn(1000, 5000) # arbitrary
k = 10 # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)
Output (shape):
in [215]: new.shape
out[215]: (10, 500) # (k x m)
To sample each column without replacement just like your original solution
import numpy as np
matrix = np.arange(4*3).reshape(4,3)
matrix
Output
array([[ 0, 1, 2],
[ 3, 4, 5],
[ 6, 7, 8],
[ 9, 10, 11]])
k = 2
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)
Output
array([[ 9, 1, 2],
[ 3, 4, 11]])
I would
Pre-allocate the result array, and fill in columns, and
Use numpy index based indexing
def sample_array_cols(matrix, n_result):
(n,m) = matrix.shape
vmat = numpy.array([n_result, m], dtype= matrix.dtype)
for c in range(m):
random_indices = numpy.random.randint(0, n, n_result)
vmat[:,c] = matrix[random_indices, c]
return vmat
Not quite fully vectorized, but better than building up a list, and the code scans just like your description.
I'm working on some code for dehazing images, based on this paper, and I started with an abandoned Py2.7 implementation. Since then, particularly with Numba, I've made some real performance improvements (important since I'll have to run this on 8K images).
I'm pretty convinced my last significant performance bottleneck is in performing the box filter step (I've already shaved off almost a minute per image, but this last slow step is ~30s/image), and I'm close to getting it to run as nopython in Numba:
#njit # Row dependencies means can't be parallel
def yCumSum(a):
"""
Numba based computation of y-direction
cumulative sum. Can't be parallel!
"""
out = np.empty_like(a)
out[0, :] = a[0, :]
for i in prange(1, a.shape[0]):
out[i, :] = a[i, :] + out[i - 1, :]
return out
#njit(parallel= True)
def xCumSum(a):
"""
Numba-based parallel computation
of X-direction cumulative sum
"""
out = np.empty_like(a)
for i in prange(a.shape[0]):
out[i, :] = np.cumsum(a[i, :])
return out
#jit
def _boxFilter(m, r, gpu= hasGPU):
if gpu:
m = cp.asnumpy(m)
out = __boxfilter__(m, r)
if gpu:
return cp.asarray(out)
return out
#jit(fastmath= True)
def __boxfilter__(m, r):
"""
Fast box filtering implementation, O(1) time.
Parameters
----------
m: a 2-D matrix data normalized to [0.0, 1.0]
r: radius of the window considered
Return
-----------
The filtered matrix m'.
"""
#H: height, W: width
H, W = m.shape
#the output matrix m'
mp = np.empty(m.shape)
#cumulative sum over y axis
ySum = yCumSum(m) #np.cumsum(m, axis=0)
#copy the accumulated values of the windows in y
mp[0:r+1,: ] = ySum[r:(2*r)+1,: ]
#differences in y axis
mp[r+1:H-r,: ] = ySum[(2*r)+1:,: ] - ySum[ :H-(2*r)-1,: ]
mp[(-r):,: ] = np.tile(ySum[-1,: ], (r, 1)) - ySum[H-(2*r)-1:H-r-1,: ]
#cumulative sum over x axis
xSum = xCumSum(mp) #np.cumsum(mp, axis=1)
#copy the accumulated values of the windows in x
mp[:, 0:r+1] = xSum[:, r:(2*r)+1]
#difference over x axis
mp[:, r+1:W-r] = xSum[:, (2*r)+1: ] - xSum[:, :W-(2*r)-1]
mp[:, -r: ] = np.tile(xSum[:, -1][:, None], (1, r)) - xSum[:, W-(2*r)-1:W-r-1]
return mp
There's plenty to do around the edges, but if I can get the tile operation as a nopython call, I can nopython the whole boxfilter step and get a big performance boost. I'm not super inclined to do something really really specific as I'd love to reuse this code elsewhere, but I wouldn't particularly object to it being limited to a 2D scope. For whatever reason I'm just staring at this and not really sure where to start.
np.tile is a bit too complicated to reimplement in full, but unless I'm misreading it looks like you only need to take a vector and then repeat it along a different axis r times.
A Numba-compatible way to do this is to write
y = x.repeat(r).reshape((-1, r))
Then x will be repeated r times along the second dimension, so that y[i, j] == x[i].
Example:
In [2]: x = np.arange(5)
In [3]: x.repeat(3).reshape((-1, 3))
Out[3]:
array([[0, 0, 0],
[1, 1, 1],
[2, 2, 2],
[3, 3, 3],
[4, 4, 4]])
If you want x to be repeated along the first dimension instead, just take the transpose y.T.
I'll try to explain my issue here without going into too much detail on the actual application so that we can stay grounded in the code. Basically, I need to do operations to a vector field. My first step is to generate the field as
x,y,z = np.meshgrid(np.linspace(-5,5,10),np.linspace(-5,5,10),np.linspace(-5,5,10))
Keep in mind that this is a generalized case, in the program, the bounds of the vector field are not all the same. In the general run of things, I would expect to say something along the lines of
u,v,w = f(x,y,z).
Unfortunately, this case requires so more difficult operations. I need to use a formula similar to
where the vector r is defined in the program as np.array([xgrid-x,ygrid-y,zgrid-z]) divided by its own norm. Basically, this is a vector pointing from every point in space to the position (x,y,z)
Now Numpy has implemented a cross product function using np.cross(), but I can't seem to create a "meshgrid of vectors" like I need.
I have a lambda function that is essentially
xgrid,ygrid,zgrid=np.meshgrid(np.linspace(-5,5,10),np.linspace(-5,5,10),np.linspace(-5,5,10))
B(x,y,z) = lambda x,y,z: np.cross(v,np.array([xgrid-x,ygrid-y,zgrid-z]))
Now the array v is imported from another class and seems to work just fine, but the second array, np.array([xgrid-x,ygrid-y,zgrid-z]) is not a proper shape because it is a "vector of meshgrids" instead of a "meshgrid of vectors". My big issue is that I cannot seem to find a method by which to format the meshgrid in such a way that the np.cross() function can use the position vector. Is there a way to do this?
Originally I thought that I could do something along the lines of:
x,y,z = np.meshgrid(np.linspace(-2,2,5),np.linspace(-2,2,5),np.linspace(-2,2,5))
A = np.array([x,y,z])
cross_result = np.cross(np.array(v),A)
This, however, returns the following error, which I cannot seem to circumvent:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 1682, in cross
raise ValueError(msg)
ValueError: incompatible dimensions for cross product
(dimension must be 2 or 3)
There's a work around with reshape and broadcasting:
A = np.array([x_grid, y_grid, z_grid])
# A.shape == (3,5,5,5)
def B(v, p):
'''
v.shape = (3,)
p.shape = (3,)
'''
shape = A.shape
Ap = A.reshape(3,-1) - p[:,None]
return np.cross(v[None,:], Ap.reshape(3,-1).T).reshape(shape)
print(B(v,p).shape)
# (3, 5, 5, 5)
I think your original attempt only lacks the specification of the axis along which the cross product should be executed.
x, y, z = np.meshgrid(np.linspace(-2, 2, 5),np.linspace(-2, 2, 5), np.linspace(-2, 2, 5))
A = np.array([x, y, z])
cross_result = np.cross(np.array(v), A, axis=0)
I tested this with the code below. As an alternative to np.array([x, y, z]), you can also use np.stack(x, y, z, axis=0), which clearly shows along which axis the meshgrids are stacked to form a meshgrid of vectors, the vectors being aligned with axis 0. I also printed the shape each time and used random input for testing. In the test, the output of the formula is compared at a random index to the cross product of the input-vector at the same index with vector v.
import numpy as np
x, y, z = np.meshgrid(np.linspace(-5, 5, 10), np.linspace(-5, 5, 10), np.linspace(-5, 5, 10))
p = np.random.rand(3) # random reference point
A = np.array([x-p[0], y-p[1], z-p[2]]) # vectors from positions to reference
A_bis = np.stack((x-p[0], y-p[1], z-p[2]), axis=0)
print(f"A equals A_bis? {np.allclose(A, A_bis)}") # the two methods of stacking yield the same
v = -1 + 2*np.random.rand(3) # random vector v
B = np.cross(v, A, axis=0) # cross-product for all points along correct axis
print(f"Shape of v: {v.shape}")
print(f"Shape of A: {A.shape}")
print(f"Shape of B: {B.shape}")
print("\nComparison for random locations: ")
point = np.random.randint(0, 9, 3) # generate random multi-index
a = A[:, point[0], point[1], point[2]] # look up input-vector corresponding to index
b = B[:, point[0], point[1], point[2]] # look up output-vector corresponding to index
print(f"A[:, {point[0]}, {point[1]}, {point[2]}] = {a}")
print(f"v = {v}")
print(f"Cross-product as v x a: {np.cross(v, a)}")
print(f"Cross-product from B (= v x A): {b}")
The resulting output looks like:
A equals A_bis? True
Shape of v: (3,)
Shape of A: (3, 10, 10, 10)
Shape of B: (3, 10, 10, 10)
Comparison for random locations:
A[:, 8, 1, 1] = [-4.03607312 3.72661831 -4.87453077]
v = [-0.90817859 0.10110274 -0.17848181]
Cross-product as v x a: [ 0.17230515 -3.70657882 -2.97637688]
Cross-product from B (= v x A): [ 0.17230515 -3.70657882 -2.97637688]
I have a set of data in python likes:
x y angle
If I want to calculate the distance between two points with all possible value and plot the distances with the difference between two angles.
x, y, a = np.loadtxt('w51e2-pa-2pk.log', unpack=True)
n = 0
f=(((x[n])-x[n+1:])**2+((y[n])-y[n+1:])**2)**0.5
d = a[n]-a[n+1:]
plt.scatter(f,d)
There are 255 points in my data.
f is the distance and d is the difference between two angles.
My question is can I set n = [1,2,3,.....255] and do the calculation again to get the f and d of all possible pairs?
You can obtain the pairwise distances through broadcasting by considering it as an outer operation on the array of 2-dimensional vectors as follows:
vecs = np.stack((x, y)).T
np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
For example,
In [1]: import numpy as np
...: x = np.array([1, 2, 3])
...: y = np.array([3, 4, 6])
...: vecs = np.stack((x, y)).T
...: np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
...:
Out[1]:
array([[ 0. , 1.41421356, 3.60555128],
[ 1.41421356, 0. , 2.23606798],
[ 3.60555128, 2.23606798, 0. ]])
Here, the (i, j)'th entry is the distance between the i'th and j'th vectors.
The case of the pairwise differences between angles is similar, but simpler, as you only have one dimension to deal with:
In [2]: a = np.array([10, 12, 15])
...: a[np.newaxis, :] - a[: , np.newaxis]
...:
Out[2]:
array([[ 0, 2, 5],
[-2, 0, 3],
[-5, -3, 0]])
Moreover, plt.scatter does not care that the results are given as matrices, and putting everything together using the notation of the question, you can obtain the plot of angles by distances by doing something like
vecs = np.stack((x, y)).T
f = np.linalg.norm(vecs[np.newaxis, :] - vecs[:, np.newaxis], axis=2)
d = angle[np.newaxis, :] - angle[: , np.newaxis]
plt.scatter(f, d)
You have to use a for loop and range() to iterate over n, e.g. like like this:
n = len(x)
for i in range(n):
# do something with the current index
# e.g. print the points
print x[i]
print y[i]
But note that if you use i+1 inside the last iteration, this will already be outside of your list.
Also in your calculation there are errors. (x[n])-x[n+1:] does not work because x[n] is a single value in your list while x[n+1:] is a list starting from n+1'th element. You can not subtract a list from an int or whatever it is.
Maybe you will have to even use two nested loops to do what you want. I guess that you want to calculate the distance between each point so a two dimensional array may be the data structure you want.
If you are interested in all combinations of the points in x and y I suggest to use itertools, which will give you all possible combinations. Then you can do it like follows:
import itertools
f = [((x[i]-x[j])**2 + (y[i]-y[j])**2)**0.5 for i,j in itertools.product(255,255) if i!=j]
# and similar for the angles
But maybe there is even an easier way...
I'm very new to Python and currently trying to replicate plots etc that I previously used GrADs for. I want to calculate the divergence at each grid box using u and v wind fields (which are just scaled by specific humidity, q), from a netCDF climate model file.
From endless searching I know I need to use some combination of np.gradient and np.sum, but can't find the right combination. I just know that to do it 'by hand', the calculation would be
divg = dqu/dx + dqv/dy
I know the below is wrong, but it's the best I've got so far...
nc = Dataset(ifile)
q = np.array(nc.variables['hus'][0,:,:])
u = np.array(nc.variables['ua'][0,:,:])
v = np.array(nc.variables['va'][0,:,:])
lon=nc.variables['lon'][:]
lat=nc.variables['lat'][:]
qu = q*u
qv = q*v
dqu/dx, dqu/dy = np.gradient(qu, [dx, dy])
dqv/dx, dqv/dy = np.gradient(qv, [dx, dy])
divg = np.sum(dqu/dx, dqv/dy)
This gives the error 'SyntaxError: can't assign to operator'.
Any help would be much appreciated.
try something like:
dqu_dx, dqu_dy = np.gradient(qu, [dx, dy])
dqv_dx, dqv_dy = np.gradient(qv, [dx, dy])
you can not assign to any operation in python; any of those are syntax errors:
a + b = 3
a * b = 7
# or, in your case:
a / b = 9
UPDATE
following Pinetwig's comment: a/b is not a valid identifier name; it is (the return value of) an operator.
Try removing the [dx, dy].
[dqu_dx, dqu_dy] = np.gradient(qu)
[dqv_dx, dqv_dy] = np.gradient(qv)
Also to point out if you are recreating plots. Gradient changed in numpy between 1.82 and 1.9. This had an effect for recreating matlab plots in python as 1.82 was the matlab method. I am not sure how this relates to GrADs. Here is the wording for both.
1.82
"The gradient is computed using central differences in the interior
and first differences at the boundaries. The returned gradient hence has
the same shape as the input array."
1.9
"The gradient is computed using second order accurate central differences in the interior and either first differences or second order accurate one-sides (forward or backwards) differences at the boundaries. The returned gradient hence has the same shape as the input array."
The gradient function for 1.82 is here.
def gradient(f, *varargs):
"""
Return the gradient of an N-dimensional array.
The gradient is computed using central differences in the interior
and first differences at the boundaries. The returned gradient hence has
the same shape as the input array.
Parameters
----------
f : array_like
An N-dimensional array containing samples of a scalar function.
`*varargs` : scalars
0, 1, or N scalars specifying the sample distances in each direction,
that is: `dx`, `dy`, `dz`, ... The default distance is 1.
Returns
-------
gradient : ndarray
N arrays of the same shape as `f` giving the derivative of `f` with
respect to each dimension.
Examples
--------
>>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float)
>>> np.gradient(x)
array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ])
>>> np.gradient(x, 2)
array([ 0.5 , 0.75, 1.25, 1.75, 2.25, 2.5 ])
>>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float))
[array([[ 2., 2., -1.],
[ 2., 2., -1.]]),
array([[ 1. , 2.5, 4. ],
[ 1. , 1. , 1. ]])]
"""
f = np.asanyarray(f)
N = len(f.shape) # number of dimensions
n = len(varargs)
if n == 0:
dx = [1.0]*N
elif n == 1:
dx = [varargs[0]]*N
elif n == N:
dx = list(varargs)
else:
raise SyntaxError(
"invalid number of arguments")
# use central differences on interior and first differences on endpoints
outvals = []
# create slice objects --- initially all are [:, :, ..., :]
slice1 = [slice(None)]*N
slice2 = [slice(None)]*N
slice3 = [slice(None)]*N
otype = f.dtype.char
if otype not in ['f', 'd', 'F', 'D', 'm', 'M']:
otype = 'd'
# Difference of datetime64 elements results in timedelta64
if otype == 'M' :
# Need to use the full dtype name because it contains unit information
otype = f.dtype.name.replace('datetime', 'timedelta')
elif otype == 'm' :
# Needs to keep the specific units, can't be a general unit
otype = f.dtype
for axis in range(N):
# select out appropriate parts for this dimension
out = np.empty_like(f, dtype=otype)
slice1[axis] = slice(1, -1)
slice2[axis] = slice(2, None)
slice3[axis] = slice(None, -2)
# 1D equivalent -- out[1:-1] = (f[2:] - f[:-2])/2.0
out[slice1] = (f[slice2] - f[slice3])/2.0
slice1[axis] = 0
slice2[axis] = 1
slice3[axis] = 0
# 1D equivalent -- out[0] = (f[1] - f[0])
out[slice1] = (f[slice2] - f[slice3])
slice1[axis] = -1
slice2[axis] = -1
slice3[axis] = -2
# 1D equivalent -- out[-1] = (f[-1] - f[-2])
out[slice1] = (f[slice2] - f[slice3])
# divide by step size
outvals.append(out / dx[axis])
# reset the slice object in this dimension to ":"
slice1[axis] = slice(None)
slice2[axis] = slice(None)
slice3[axis] = slice(None)
if N == 1:
return outvals[0]
else:
return outvals
If your grid is Gaussian and the wind names in the file are "u" and "v" you can also calculate divergence directly using cdo:
cdo uv2dv in.nc out.nc
See https://code.mpimet.mpg.de/projects/cdo/embedded/index.html#x1-6850002.13.2 for more details.