How to prevent Python from reordering dimensions

It's hard to articulate the question I have exactly. I want to subscript all values in the array, but I want the order to be different. The code is below:
# import numpy
import numpy as np
# variables
y = np.zeros([30,120,1440])
composite_events = np.zeros([30,120,1440])
# longitude array
lon = np.linspace(-179.25,179.25,1440)
# index for the starting point
center_lon_index = 500  # plain int; the np.int alias was removed in NumPy 1.24
# index for shifting the longitude array
lonindex = (np.arange(start=0,stop=np.size(lon),step=1) + center_lon_index) % np.size(lon)
# set the composite events to be the same as the data, but centering a particular point
composite_events[0,:,:] = y[0,:,lonindex]
The code returns the following error.
ValueError: could not broadcast input array from shape (1440,120) into shape (120,1440)
I understand the error, but as far as I can tell, the shape of y should be the same as the shape of composite_events. This type of code works in other languages I've used. What is python doing here? Thanks!
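A likely explanation, sketched on a smaller array: in y[0,:,lonindex] the scalar 0 and the array lonindex both count as advanced indices, and because a slice sits between them NumPy places the advanced-index dimension first in the result. That gives (1440,120) instead of (120,1440). The array sizes below are made-up stand-ins for the ones in the question, and the three workarounds shown are only some of the possible options.

```python
import numpy as np

# Small stand-in for the (30, 120, 1440) array in the question.
y = np.arange(3 * 4 * 10, dtype=float).reshape(3, 4, 10)
center_lon_index = 3
nlon = y.shape[-1]
lonindex = (np.arange(nlon) + center_lon_index) % nlon

# Scalar + slice + index array: the indexed dimension moves to the front.
mixed = y[0, :, lonindex]
print(mixed.shape)  # (10, 4), not (4, 10)

# Option 1: index in two steps so only one advanced index is used at a time.
two_step = y[0][:, lonindex]

# Option 2: keep the one-step index and transpose the result back.
transposed = y[0, :, lonindex].T

# Option 3: a cyclic shift along the longitude axis does the same job.
rolled = np.roll(y[0], -center_lon_index, axis=1)

# With a single advanced index the dimension stays in place,
# so all events can be shifted at once.
all_events = y[:, :, lonindex]
print(two_step.shape, all_events.shape)  # (4, 10) (3, 4, 10)
```

With any of these, an assignment such as composite_events[0,:,:] = y[0][:,lonindex] should have matching shapes on both sides.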

Related

Is it possible to use an array as a list of indices of a matrix to define a new matrix WITHOUT for loops?

I have a 3D problem where the final output is an array in the xy plane. I have an array in the x-z plane (dimensions (xsiz, zsiz)) and an array in the y-plane (dimension ysiz) as below:
xz = np.zeros((xsiz, zsiz))
y = (np.arange(ysiz)*(zsiz/ysiz)).astype(int)
xz can be thought of as an array of (zsiz) column vectors of size (xsiz) and labelled by z in range (0, zsiz-1). These are not conveniently accessible given the current setup - I've been retrieving them by np.transpose(xz)[z]. I would like the y array to act like a list of z values and take the column vectors labelled by these z values and combine them in a matrix with final dimension (xsiz, ysiz). (It seems likely to me that it will be easier to work with the transpose of xz so the row vectors can be retrieved as above and combined giving a (ysiz, xsiz) matrix which can then be transposed but I may be wrong.)
This would be simple using for loops, and I've given an example of such a loop below that does what I want, in case my explanation isn't clear. However, the final intention is for this code to be parallelized using CuPy, so ideally I would like the entire process to be carried out by matrix manipulation. It seems like it should be possible, but I can't think how!
Any help greatly appreciated.
import numpy as np
xsiz = 5 #sizes given random values for example
ysiz = 6
zsiz = 4
xz = np.arange(xsiz*zsiz).reshape(xsiz, zsiz)
y = (np.arange(ysiz)*(zsiz/ysiz)).astype(int)
xzT = np.transpose(xz)
final_xyT = np.zeros((0, xsiz))
for i in range(ysiz):
    index = y[i]
    xvec = xzT[index]
    final_xyT = np.vstack((final_xyT, xvec))
#indexing could go wrong here if y contained large numbers
#CuPy's indexing wraps around so hopefully this shouldn't be too big an issue
final_xy = np.transpose(final_xyT)
print(xz)
print(final_xy)
If I understand your problem correctly, you need this:
xz[:,y]
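As a quick check, here is a sketch that compares the loop from the question with the single indexing expression, using the same example sizes. The names loop_result and indexed are made up for the comparison.

```python
import numpy as np

xsiz, ysiz, zsiz = 5, 6, 4
xz = np.arange(xsiz * zsiz).reshape(xsiz, zsiz)
y = (np.arange(ysiz) * (zsiz / ysiz)).astype(int)  # [0 0 1 2 2 3]

# Loop version from the question: stack the selected columns one by one.
rows = [xz.T[y[i]] for i in range(ysiz)]
loop_result = np.vstack(rows).T

# Indexing version: y picks columns of xz directly, no loop and no transpose.
indexed = xz[:, y]

print(indexed.shape)                         # (5, 6)
print(np.array_equal(loop_result, indexed))  # True
```

Since this is plain integer-array indexing, it should carry over to CuPy arrays as well, though that is not tested here.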

Xarray add (as in sum) two rows along same dimension but at different coordinate value

I have used xarray to create two different DataArrays with the same dimensions and coordinates. However I want to add two different coordinates in one of these dimensions. I'm trying to add coordinate 'a' to coordinate 'b' in dimension 'x'. There is an easy workaround if these are the only dimensions of my matrix but more complicated if I have more dimensions and I want to keep the normal xarray behaviour for the other dimensions. Please see the example below that fails on the last line. I know how to manually fix this in numpy but the beauty of xarray is that I shouldn't have to.
Does xarray allow an easy solution for this kind of operation?
import xarray as xr
import numpy as np
# create simple DataArray M and N to show what I would like to do
M = xr.DataArray([1, 2], dims="x",coords={'x':['a','b']})
N = xr.DataArray([3, 4], dims="x",coords={'x':['a','b']})
print(M.sel(x='a')+N.sel(x='b')) # this will NOT give me the value
print(M.sel(x='a').values+N.sel(x='b').values) # this will give me the value
# create more complex DataArrays M and N to show the challenge
m = np.arange(3*2*4)
m = m.reshape(3,2,4)
n = np.arange(4*2*3)
n = n.reshape(4,2,3)
M = xr.DataArray(m, dims=['z1',"x","z2"],coords={'x':['a','b']})
N = xr.DataArray(n, dims=["z2",'x','z1'],coords={'x':['a','b']})
print(M.sel(x='a')+N.sel(x='b')) # this will NOT give me the value
print(M.sel(x='a').values+N.sel(x='b').values) # this will result in an error

scipy ND Interpolating over NaNs

I have been having trouble working out how to use the scipy.interpolate functions (either LinearNDInterpolator, griddata or, preferably, NearestNDInterpolator).
There are some tutorials online, but I am confused about what form my data needs to be in.
The online documentation for nearestND is terrible.
The function asks for:
x : (Npoints, Ndims) ndarray of floats
Data point coordinates.
y : (Npoints,) ndarray of float or complex
Data point values.
I have data in the form: lat,long,data,time held within an xarray dataset. There are some gaps in the data I would like to fill in.
I don't understand how to tell the function my x points.
I have tried (lat,long) as a tuple and np.meshgrid(lat,long) but can't seem to get it going.
Any help on how I can pass my lat,long coordinates into the function? Bonus points for time coordinates as well to make the estimates more robust through the third dimension.
Thanks!
"I have tried (lat,long) as a tuple"
If lat and long are 1D arrays or lists, try this:
points = np.array((lat, long)).T # make a 2D array of shape Npoints x 2
nd = NearestNDInterpolator(points, data)
Then you can compute interpolated values as nd(lat1, long1), etc.
Scipy provides multivariate interpolation methods for both unstructured data and data points regularly placed on a grid. Unstructured data means the data can be provided as a list of unordered points. It seems that your data is structured: it is an array of size (480, 2040). However, NearestNDInterpolator works on unstructured data. The flatten method can be used to transform the array into a 1D list of values (of length 480*2040). The same has to be done for the coordinates: meshgrid gives the coordinates of every point of the grid, and flatten is used again to obtain a "list" of 2D coordinates (an array of shape 480*2040 x 2).
Here is an example which goes from structured data to unstructured:
import numpy as np
lat = np.linspace(2, 6, 10)
lon = np.linspace(5, 9, 14)
latM, lonM = np.meshgrid(lat, lon) # M is for Matrix
dataM = np.sin(latM)*np.cos(lonM) # example of data, Matrix form
from scipy.interpolate import NearestNDInterpolator
points = np.array((latM.flatten(), lonM.flatten())).T
print( points.shape )
# >>> (140, 2)
f_nearest = NearestNDInterpolator(points, dataM.flatten())
f_nearest(5, 5)
Working with NaNs should not be a big problem in this case, because it is just a missing point in the list, except that the coordinates of the missing points have to be removed from the list too.
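A minimal sketch of that NaN handling, reusing the example grid above. The gap positions are invented for illustration, and the names valid and filled are not from the question. The idea is to build the interpolator from the non-NaN points only, then evaluate it at the coordinates of the gaps.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

lat = np.linspace(2, 6, 10)
lon = np.linspace(5, 9, 14)
latM, lonM = np.meshgrid(lat, lon)
dataM = np.sin(latM) * np.cos(lonM)

# Punch a few artificial gaps into the data (positions are arbitrary).
dataM[3, 4] = np.nan
dataM[7:9, 1] = np.nan

# Keep only the points that carry a value, together with their coordinates.
valid = ~np.isnan(dataM)
points = np.column_stack((latM[valid], lonM[valid]))  # shape (n_valid, 2)
f_nearest = NearestNDInterpolator(points, dataM[valid])

# Fill the gaps by evaluating the interpolator at the missing coordinates.
filled = dataM.copy()
filled[~valid] = f_nearest(latM[~valid], lonM[~valid])

print(np.isnan(filled).any())  # False
```

For a time dimension, the same pattern should work with a third coordinate column, although the time axis would probably need rescaling so that "nearest" means something sensible across units.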

Getting Around "ValueError: operands could not be broadcast together"

The code below yields the following value error.
ValueError: operands could not be broadcast together with shapes (8,8) (64,)
It first arose when I expanded the "training" data set from 10 images to 100. The interpreter seems to be telling me that I can't perform any coordinate-wise operations on these data points because one of the coordinate pairs is missing a value. I can't argue with that. Unfortunately, my workarounds haven't exactly worked out. I attempted to insert an if condition followed by a continue statement (i.e., if this specific coordinate comes up, it should continue from the top of the loop). The interpreter didn't like this idea and muttered something about the truth of that statement not being as cut and dried as I thought. It suggested I try a.any() or a.all(). I checked out examples of both, and tried placing the problematic coordinate pair in the parentheses and in place of the "a". Both approaches got me nowhere. I'm unaware of any Python functions similar to the functions I would use in C to exclude inputs that don't meet specific criteria. Other answers pertaining to similar problems recommend changing the math one uses, but I was told that this is how I am to proceed, so I'm looking at it as an error handling problem.
Does anyone have any insight concerning how one might handle this issue? Any thoughts would be greatly appreciated!
Here's the code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
digits = datasets.load_digits()
#print the 0th image in the image database as an integer matrix
print(digits.images[0])
#plot the 0th image in the database assigning each pixel an intensity of black
plt.figure()
plt.imshow(digits.images[0], cmap = plt.cm.gray_r, interpolation = 'nearest')
plt.show()
#create training subsets of images and targets(labels)
X_train = digits.images[0:1000]
Y_train = digits.target[0:1000]
#pick a test point from images (345)
X_test = digits.images[345]
#view test data point
plt.figure()
plt.imshow(digits.images[345], cmap = plt.cm.gray_r, interpolation = 'nearest')
plt.show()
#distance
def dist(x, y):
    return np.sqrt(np.sum((x - y)**2))
#expand set of test data
num = len(X_train)
no_errors = 0
distance = np.zeros(num)
for j in range(1697, 1797):
    X_test = digits.data[j]
    for i in range(num):
        distance[i] = dist(X_train[i], X_test)
    min_index = np.argmin(distance)
    if Y_train[min_index] != digits.target[j]:
        no_errors += 1
print(no_errors)
You need to show us where the error occurs, and some of the error stack.
Then you need to identify which arrays are causing the problem, and examine their shape. Actually the error tells us that. One operand is an 8x8 2d array. The other has the same number of elements but with a 1d shape. You may have to trace some variables back to your own code.
Just to illustrate the problem:
In [381]: x = np.ones((8,8),int)
In [384]: y = np.arange(64)
In [385]: x*y
...
ValueError: operands could not be broadcast together with shapes (8,8) (64,)
In [386]: x[:] = y
...
ValueError: could not broadcast input array from shape (64) into shape (8,8)
Since the 2 arrays have the same number of elements, a fix likely involves reshaping one or the other:
In [387]: x.ravel() + y
Out[387]:
array([ 1, 2, 3, 4, 5, ... 64])
or x-y.reshape(8,8).
My basic point is, you need to understand what array shapes mean, and how arrays of different shape can be used together. You don't 'get around' the error, you fix the inputs so they are 'broadcasting' compatible.
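Applied to the question's code, the mismatch most likely comes from X_train being sliced from digits.images (8x8 arrays) while X_test is taken from digits.data (flat, 64 elements). The sketch below uses random stand-in arrays instead of the real digits set so that it runs without scikit-learn; the names images and data are placeholders for digits.images and digits.data, on the assumption that data is simply images flattened.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins: 20 fake 8x8 "images" and their flattened (20, 64) counterpart.
images = rng.integers(0, 17, size=(20, 8, 8)).astype(float)
data = images.reshape(len(images), -1)

def dist(x, y):
    return np.sqrt(np.sum((x - y) ** 2))

X_train = images[:10]   # shape (10, 8, 8)
x_test_flat = data[15]  # shape (64,), this is what triggers the error

# Fix: make both operands the same shape before subtracting.
x_test_img = x_test_flat.reshape(8, 8)
distances = np.array([dist(img, x_test_img) for img in X_train])

# Alternative: stay in the flat representation and vectorize over the rows.
flat_distances = np.sqrt(((data[:10] - x_test_flat) ** 2).sum(axis=1))

print(np.argmin(distances), np.allclose(distances, flat_distances))
```

Either choice works as long as the training and test points come from the same representation, so taking both from images or both from data avoids the reshape altogether.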
I don't think the problem is with the value of a specific element.
The truth value error occurs when you try to test an array in an if context. if expects a simple True or False, not an array of True/False values.
In [389]: if x>0:print('yes')
....
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
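A short sketch of what any() and all() are meant for: they are called on the boolean array produced by the comparison and reduce it to a single True/False that an if can use. The variable names here are made up for illustration.

```python
import numpy as np

x = np.ones((8, 8), int)

# The comparison yields an (8, 8) boolean array; reduce it before testing.
mask = x > 0
all_positive = bool(mask.all())  # True only if every element passes
any_positive = bool(mask.any())  # True if at least one element passes

if all_positive:
    print("every element is positive")

# To skip inputs by shape rather than by value, compare the shape tuples:
# that gives a plain bool, so no any()/all() is needed.
flat = np.arange(64)
shapes_match = x.shape == flat.shape
if not shapes_match:
    flat = flat.reshape(x.shape)
diff = x - flat
print(diff.shape)  # (8, 8)
```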

NumPy: Pick 2D indices of minimum values over 4D array

I have a function f(x,y,v,w) that I've evaluated over a range of values in (x,y,v,w) and stored in a 4D NumPy array, let's call it A.
I want a way to find two 2D arrays, V_best and W_best that hold the values of v,w that minimize f(x,y,v,w) over x,y. I've approached this by attempting to retrieve the indices of the values of (v,w) that give the minimum values of A over (x,y).
I've tried to use argmin for this, but I can't wrap my head around what the 3D arrays I get in return are, or how to use them in this context. As with many things I'm sure there's an obvious way to do this.
What I have is,
x = np.linspace(0,1,N1)
y = np.linspace(0,1,N2)
v = np.linspace(-5,5,N3)
w = np.linspace(-5,5,N4)
V,W,X,Y = np.meshgrid(v,w,x,y)
VALUEGRID = myfunc(V,W,X,Y)
V_besti = np.argmin(VALUEGRID,axis=0)
W_besti = np.argmin(VALUEGRID,axis=1)
Ideally, V_best and W_best will be of shape (N1,N2), corresponding to the dimensions of the range of x,y. I hope this is sufficiently clear.
Thank you in advance.
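One possible approach, sketched under some assumptions: myfunc below is a made-up stand-in, the grid sizes are arbitrary, and the goal is read as "for each (x,y), find the (v,w) that minimizes f". Two things seem to get in the way in the code above. np.meshgrid defaults to indexing='xy', which swaps the first two axes, and taking argmin over axis 0 and axis 1 separately minimizes over one variable at a time and returns 3D arrays. Flattening the (v,w) axes together, taking one argmin, and converting back with np.unravel_index gives a joint minimum.

```python
import numpy as np

def myfunc(v, w, x, y):
    # Hypothetical stand-in for the real function.
    return (v - x) ** 2 + (w + y) ** 2

N1, N2, N3, N4 = 4, 5, 21, 21
x = np.linspace(0, 1, N1)
y = np.linspace(0, 1, N2)
v = np.linspace(-5, 5, N3)
w = np.linspace(-5, 5, N4)

# indexing="ij" keeps the axis order (x, y, v, w) as passed in.
X, Y, V, W = np.meshgrid(x, y, v, w, indexing="ij")
A = myfunc(V, W, X, Y)  # shape (N1, N2, N3, N4)

# Merge the v and w axes, find the joint minimum for every (x, y),
# then split the flat position back into separate v and w indices.
flat_idx = A.reshape(N1, N2, -1).argmin(axis=-1)  # shape (N1, N2)
vi, wi = np.unravel_index(flat_idx, (N3, N4))

V_best = v[vi]  # shape (N1, N2)
W_best = w[wi]  # shape (N1, N2)
print(V_best.shape, W_best.shape)
```

V_besti and W_besti from the question correspond to vi and wi here, and both have shape (N1, N2).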
