Hi, I'm reading two rasters, A and B, as arrays.
What I'm looking for is to apply an operation to certain cells within two 2D arrays (two rasters). I need to subtract 3.0 from the cells in one array (B) that are greater than the corresponding cells in the other array (A).
All the other cells don't need to change, so my answer will be the 2D array B with the cells that meet that condition changed, and the other 2D array A untouched.
I tried this, but it doesn't seem to work (it also takes TOO long):
A = Raster_A.GetRasterBand(1).ReadAsArray()
B = Raster_B.GetRasterBand(1).ReadAsArray()
A = np.array([917.985028, 916.284480, 918.525323, 920.709505,
              921.835315, 922.328555, 920.283029, 922.229594,
              922.928670, 925.315534, 922.280360, 922.715303,
              925.933969, 925.897328, 923.880606, 923.864701])
B = np.array([913.75785758, 914.45941854, 915.17586919, 915.90724705,
              916.6534542 , 917.4143068 , 918.18957846, 918.97902532,
              919.78239295, 920.59941086, 921.42978108, 922.27316565,
              923.12917544, 923.99736194, 924.87721232, 925.76814782])
for i in np.nditer(A, op_flags=['readwrite']):
    for j in np.nditer(B, op_flags=['readwrite']):
        if j[...] > i[...]:
            B = j[...] - 3.0
So the answer, the array B should be something like:
B = np.array([913.75785758, 914.45941854, 915.17586919, 915.90724705,
              916.6534542 , 917.4143068 , 918.18957846, 918.97902532,
              919.78239295, 920.59941086, 921.42978108, 922.27316565,
              923.12917544, 923.99736194, 921.87721232, 922.76814782])
Please notice the two bottom right values :)
I'm a bit dizzy already from trying this while doing other stuff at the same time, so I apologize if I did anything stupid right there. Any suggestion is greatly appreciated. Thanks!
Based on your example, I conclude that you want to subtract values from the array B. This can be done via
B[A<B] -= 3
The "mask" A<B is a boolean array that is true at all the values that you want to change. Now, B[A<B] returns a view to exactly these values. Finally, B[A<B] -= 3 changes all these values in place.
It is crucial that you use the inplace operator -=, because otherwise a new array will be created that contain only the values where A<B. Thereby, the array is flattened, i.e. looses its shape, and you do not want that.
Regarding speed, avoid for loops as much as you can when working with numpy. Fancy indexing and slicing offer very neat (and very fast) ways to work with your data; the numpy documentation on indexing is a good place to start.
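For example, with A and B as pasted above (reshaped to 4x4 here purely for illustration, since the question describes 2D rasters but pastes the values flat):
A2 = A.reshape(4, 4)
B2 = B.reshape(4, 4)
B2[A2 < B2] -= 3.0   # subtract 3 only where B exceeds A; every other cell is untouched
print(B2)            # the two bottom-right cells become 921.877... and 922.768...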
Related
First time asking a question, and I'm also really bad at coding. I have a 2D array (shape=(100,4000)) and want to manipulate all values in each row that are greater than the corresponding value in a 1D array (shape=(100,)).
I want to do this in the most efficient way, as it will potentially soon be an array with shape=(1000,8000). I'm not sure how to broadcast in a boolean mask, or if I even can. Below is an example of my code that is obviously not working. I could do what I want by using a for loop or by duplicating the LX array to shape=(100,4000), but I want it to be faster than those options.
dX = np.random.rand(100,4000)
LX = np.random.rand(100)
LX2 = LX/2
dX[dX>LX2] = dX[dX>LX2]-LX[dX>LX2]
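For what it's worth, a minimal sketch of the broadcasting being asked about, assuming the intent of the snippet above is to subtract LX wherever dX exceeds the matching LX2 value: adding a trailing axis with [:, None] turns the (100,) arrays into (100, 1), which broadcasts row-wise against (100, 4000).
import numpy as np

dX = np.random.rand(100, 4000)
LX = np.random.rand(100)
LX2 = LX / 2

# LX2[:, None] has shape (100, 1), so the comparison broadcasts to (100, 4000)
dX = np.where(dX > LX2[:, None], dX - LX[:, None], dX)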
I've been trying to do the following as a batch operation in numpy or torch (no looping). Is this possible?
Suppose I have:
indices: [[3],[2]] (2x1)
output: [[0,0,0,0,1], [0,0,0,1,1]] (2xfixed_num) where fixed_num is 5 here
Essentially, for each row I want to make the entries up to and including that index value 0 and the rest 1.
OK, so I actually assume this is some sort of HW assignment, but maybe it's not; either way, it was fun to do. Here's a solution for your specific example; maybe you can generalize it to an array of any shape:
import numpy as np

def fill_ones(arr, idxs):
    x = np.where(np.arange(arr.shape[1]) <= idxs[0], 0, 1)  # This is the important logic.
    y = np.where(np.arange(arr.shape[1]) <= idxs[1], 0, 1)
    return np.array([x, y])
So where the comment is located, we use a condition to assign 0 to all indices up to some index value, and 1 after it. This actually creates a new array, as opposed to a mask that we could apply to the original array, so maybe it's "dirtier".
Also, I suspect it's possible to generalize this to arrays of more than 2 dimensions, but the solution I'm imagining now uses a for-loop; a broadcast version for the 2-D case is sketched below. Hope this helps!
Note: arr is just a numpy array of whatever shape you want the output to be, and idxs holds the indices past which you want the array elements to turn into 1's. Hope that is clear.
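Here is that broadcast version (a sketch, assuming idxs arrives as a 2x1 array as in the question): comparing a (fixed_num,) range against the (n, 1) idxs column yields the whole mask at once.
import numpy as np

def fill_ones_vec(idxs, fixed_num):
    # np.arange(fixed_num) has shape (fixed_num,), idxs has shape (n, 1),
    # so the comparison broadcasts to (n, fixed_num)
    return np.where(np.arange(fixed_num) <= idxs, 0, 1)

print(fill_ones_vec(np.array([[3], [2]]), 5))
# [[0 0 0 0 1]
#  [0 0 0 1 1]]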
For a 3D array like this:
import numpy as np
m = np.random.rand(5,4,3)
What's an efficient way to remove all the elements meeting these conditions?
(m[:,:,0] > 0.5) & (m[:,:,1] > 0.5) & (m[:,:,2] < 0.5)
Your question is somewhat ill-defined, but I'll answer what I think you meant to ask. The problem is that if we remove some of the elements, you won't get a proper tensor (multidimensional np array), since it will have 'holes' in it. So instead of removing, I'll show a way to set those values to np.nan (you can set them to whatever you see fit, such as -1 or None, etc.). To make it clearer: no single element of m can meet all three conditions at once, since each condition refers to a different element. Answering your question literally would just give you back the same array.
Also, it is worth mentioning that efficiency will not be cutting-edge here, as you're going to check a condition for every value anyway, but here is a common numpy'ish way of doing so:
m[np.where(m[:,:,:2] > 0.5)] = np.nan
m[np.where(m[:,:,2] < 0.5)] = np.nan
What we did here is set all values that met part of your condition to np.nan. This works by creating a boolean np.array of elements that meet the condition (the m[:,:,:2] > 0.5 part), then using np.where to find the coordinates of the values set to True. Then, by indexing m at exactly those coordinates, we assign them a new value with broadcasting.
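If the intent was instead to blank the whole length-3 vector at positions where all three conditions hold simultaneously, the conditions can be combined into a single 2D mask; a sketch of that reading:
import numpy as np

m = np.random.rand(5, 4, 3)

# (5, 4) mask: True where the full conjunction holds for the length-3 vector
mask = (m[:, :, 0] > 0.5) & (m[:, :, 1] > 0.5) & (m[:, :, 2] < 0.5)
m[mask] = np.nan   # blanks all three components at those positions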
I want to print the index of the row containing the minimum element of the matrix.
My matrix is matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
and the code:
matrix = [[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]]
a = np.array(matrix)
buff_min = matrix.argmin(axis = 0)
print(buff_min) #index of the row containing the minimum element
min = np.array(matrix[buff_min])
print(str(min.min(axis=0))) #print the minium of that row
print(min.argmin(axis = 0)) #index of the minimum
print(matrix[buff_min]) # print all row containing the minimum
After running, my result is:
1
3
1
[22, 3, 4, 12]
The first number should be 2, because the minimum, 2, is in the third list ([34,6,4,5,8,2]), but it returns 1. And it returns 3 as the minimum of the matrix.
What's the error?
I am not sure which version of Python you are using; I tested it for Python 2.7 and 3.2. As mentioned, your syntax for argmin is not correct; it should be in the format
import numpy as np
np.argmin(array_name, axis)
Next, although Numpy knows about arrays of arbitrary objects, it's optimized for homogeneous arrays of numbers with fixed dimensions. If you really need arrays of arrays, better use a nested list. But depending on the intended use of your data, different data structures might be even better, e.g. a masked array if you have some invalid data points.
If you really want flexible Numpy arrays, use something like this:
np.array([[22,33,44,55],[22,3,4,12],[34,6,4,5,8,2]], dtype=object)
However this will create a one-dimensional array that stores references to lists, which means that you will lose most of the benefits of Numpy (vector processing, locality, slicing, etc.).
Also, to mention: if you can resize your numpy array so all rows have equal length, things might work; I haven't tested it, but conceptually that should be an easy solution. But I would prefer to use a nested list for this kind of input matrix.
Does this work?
np.where(a == a.min())[0][0]
Note that all rows of the matrix need to contain the same number of elements.
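As an aside, once the rows have equal length, np.unravel_index gives both coordinates of the minimum in one step; a sketch with a trimmed version of the question's matrix:
import numpy as np

a = np.array([[22, 33, 44, 55],
              [22,  3,  4, 12],
              [34,  6,  4,  5]])   # equal-length rows, unlike the ragged original

row, col = np.unravel_index(a.argmin(), a.shape)
print(row)   # 1 -> index of the row containing the minimum (3)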
Suppose I have an N*M*X-dimensional array "data", where N and M are fixed, but X is variable for each entry data[n][m].
(Edit: To clarify, I just used np.array() on the 3D python list which I used for reading in the data, so the numpy array is of dimensions N*M and its entries are variable-length lists)
I'd now like to compute the average over the X-dimension, so that I'm left with an N*M-dimensional array. Using np.average/mean with the axis argument doesn't work, so the way I'm doing it right now is just iterating over N and M and appending the manually computed average to a new list, but that just doesn't feel very "pythonic":
avgData = []
for n in data:
    temp = []
    for m in n:
        temp.append(np.average(m))
    avgData.append(temp)
Am I missing something obvious here? I'm trying to freshen up my python skills while I'm at it, so interesting/varied responses are more than welcome! :)
Thanks!
What about using np.vectorize:
do_avg = np.vectorize(np.average)
data_2d = do_avg(data)
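For example, applied to a 2x2 object array of variable-length lists (built here with dtype=object, since the rows are ragged):
import numpy as np

data = np.array([[1, 2, 3], [0, 3, 2, 4], [0, 2], [1]], dtype=object).reshape(2, 2)

do_avg = np.vectorize(np.average)
print(do_avg(data))
# [[2.   2.25]
#  [1.   1.  ]]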
import numpy as np

data = np.array([[1,2,3],[0,3,2,4],[0,2],[1]], dtype=object).reshape(2,2)
avg = np.zeros(data.shape)
avg.flat = [np.average(x) for x in data.flat]
print(avg)
# [[2.   2.25]
#  [1.   1.  ]]
This still iterates over the elements of data (there's nothing un-Pythonic about that). But since there's nothing special about the shape or axes of data, I'm just using data.flat. Instead of appending to a Python list, with numpy it is better to assign values to the elements of an existing array.
There are fast numeric methods for working with numpy arrays, but most (if not all) only work with simple numeric dtypes. Here the array elements are objects (either lists or arrays), so numpy has to resort to the usual Python iteration and list operations.
For this small example, this solution is a bit faster than Zwicker's vectorize. For larger data the two solutions take about the same time.