For a 3D array like this:
import numpy as np
m = np.random.rand(5,4,3)
What's an efficient way to remove all the elements meeting these conditions?
(m[:,:,0] > 0.5) & (m[:,:,1] > 0.5) & (m[:,:,2] < 0.5)
Your question is somewhat ill-defined, but I'll answer what I think you meant to ask. The problem with your question is that if we remove some of the elements, you won't get a proper tensor (multidimensional np array), since it will have 'holes' in it. So instead of removing, I'll show a way to set those values to np.nan (you can set them to whatever you see fit, such as -1 or None, etc.). To make it clearer: no single element of m can meet all three conditions at once, since each condition refers to a different element. Answering your question literally would just give you the same array back.
Also, it is worth mentioning that efficiency will not be cutting-edge in this case, since you're going to check a condition for every value anyway, but here is a common numpy-ish way of doing it:
m[np.where(m[:,:,:2] > 0.5)] = np.nan
m[np.where(m[:,:,2] < 0.5)] = np.nan
What we did here is set all values that met part of your condition to np.nan. This works by creating a boolean np.array of elements that meet a condition (the m[:,:,:2] > 0.5 part), and then using np.where to find the coordinates of the values that are True. Then, by indexing m at those coordinates only, we assign them a new value with broadcasting.
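For reference, the same assignment also works without np.where, since boolean arrays can index an array directly (a small sketch; the seed is only there to make the run reproducible):

```python
import numpy as np

np.random.seed(0)
m = np.random.rand(5, 4, 3)

# Boolean masks can index m directly; np.where is optional here.
# Indexing through the basic slice m[:, :, :2] writes into m itself,
# because basic slicing returns a view.
m[:, :, :2][m[:, :, :2] > 0.5] = np.nan   # first two "channels" above 0.5
m[:, :, 2][m[:, :, 2] < 0.5] = np.nan     # last "channel" below 0.5

print(np.isnan(m).sum(), "values were masked")
```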
Hi, I'm reading two rasters A & B as arrays.
What I'm looking for is to apply an operation to certain cells within two 2D arrays (two rasters). I need to subtract 3.0 from the cells in one array (B) that are greater than the corresponding cells in the other 2D array (A).
All the other cells don't need to change, so my answer will be the 2D array (B) with some cells changed where they fit that condition, and the other 2D array (A) untouched.
I tried this, but it doesn't seem to work (and it also takes TOO long):
A = Raster_A.GetRasterBand(1).ReadAsArray()
B = Raster_B.GetRasterBand(1).ReadAsArray()
A = array([ 917.985028, 916.284480, 918.525323, 920.709505,
921.835315, 922.328555, 920.283029, 922.229594,
922.928670, 925.315534, 922.280360, 922.715303,
925.933969, 925.897328, 923.880606, 923.864701])
B = array([ 913.75785758, 914.45941854, 915.17586919, 915.90724705,
916.6534542 , 917.4143068 , 918.18957846, 918.97902532,
919.78239295, 920.59941086, 921.42978108, 922.27316565,
923.12917544, 923.99736194, 924.87721232, 925.76814782])
for i in np.nditer(A, op_flags=['readwrite']):
    for j in np.nditer(B, op_flags=['readwrite']):
        if j[...] > i[...]:
            B = j[...]-3.0
So the answer, the array B should be something like:
B = array([ 913.75785758, 914.45941854, 915.17586919, 915.90724705,
916.6534542 , 917.4143068 , 918.18957846, 918.97902532,
919.78239295, 920.59941086, 921.42978108, 922.27316565,
923.12917544, 923.99736194, 921.87721232, 922.76814782])
Please notice the two bottom right values :)
I'm a bit dizzy already from trying this while doing other stuff at the same time, so I apologize if I did anything stupid right there; any suggestion is greatly appreciated. Thanks!
Based on your example, I conclude that you want to subtract values from the array B. This can be done via
B[A<B] -= 3
The "mask" A<B is a boolean array that is true at all the values that you want to change. Now, B[A<B] returns a view to exactly these values. Finally, B[A<B] -= 3 changes all these values in place.
It is crucial that you use the in-place operator -=, because otherwise a new array will be created that contains only the values where A<B. In the process, the array is flattened, i.e. loses its shape, and you do not want that.
Regarding speed, avoid for loops as much as you can when working with numpy. Fancy indexing and slicing offer very neat (and super fast) ways to work with your data. Maybe have a look here and here.
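A tiny worked example of the in-place masked subtraction (the numbers are made up for illustration):

```python
import numpy as np

A = np.array([920.0, 921.0, 924.8, 925.7])
B = np.array([919.5, 921.4, 924.9, 925.8])

# Subtract 3 only where B exceeds A, modifying B in place.
B[A < B] -= 3.0

print(B)  # -> [919.5 918.4 921.9 922.8]
```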
I have the following variables, which are a list, floats and a numpy array.
dt=list(range(1,12))
c=18
limit=2.75
Energy=np.zeros(len(dt))
I want to distribute the value c=18 into the numpy array Energy. However, there is a condition: no value in the Energy vector can be greater than limit=2.75. Since c=18 is greater than limit=2.75, it should be capped at 2.75 and assigned at the current index position of the loop and at the following index positions of the vector Energy, until the total of 18 is reached. I made this code, but it does not really work efficiently.
for i in range(0,1):
    if c>limit:
        tmp2=c-(limit)
        if tmp2>(limit):
            tmp3=tmp2-(limit)
            if tmp3>limit:
                tmp4=tmp3-(limit)
                if tmp4>(limit):
                    tmp5=tmp4-(limit)
                    if tmp5>(limit):
                        tmp6=tmp5-(limit)
                        if tmp6>limit:
                            tmp7=tmp6-(limit)
                            if tmp7>(limit):
                                tmp8=tmp7-(limit)
                            else:
                                Energy[i]=limit
                                Energy[i+1]=limit
                                Energy[i+2]=limit
                                Energy[i+3]=limit
                                Energy[i+4]=limit
                                Energy[i+5]=limit
                                Energy[i+6]=tmp7
Do you have an idea of how to make it better? Thank you!
Welcome to stackoverflow!
Your code presently uses a loop where it doesn't need one and doesn't use a loop where it could be used.
Stepping into your code:
for i in range(0,1):
If we change this to:
for i in range(0,1):
    print(i)
We will get the result 0 - it only runs once, so there is no need to loop it; i isn't referred to in your code, so there is no need to loop through it.
You could use a loop to allocate your c to an array but it isn't needed and I'll leave that as an exercise for yourself.
It can be approached in a different, more efficient way.
First of all when you're assigning variables try and make them descriptive and readable - you'll spend a lot more time coming back to code than you do reading it.
I don't know what system you're describing so I've just given generic names:
import numpy as np
length_array=12
limit=2.75
value_to_be_assigned=18
energy_result=np.zeros(length_array)
Now what we are really asking is two things: how many times does limit fit into value_to_be_assigned (an integer), and what is the remainder.
Python has two operations for this, floor division (//) and modulus (%), which give:
11 // 5 = 2
11 % 5 = 1
So we know the first value_to_be_assigned // limit elements of the array need to be equal to the limit, and the element right after them needs to be equal to value_to_be_assigned % limit.
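With the actual numbers from the question, the two operations give (both results are floats because limit is a float):

```python
limit = 2.75
value_to_be_assigned = 18

full_slots = value_to_be_assigned // limit   # how many whole limits fit
remainder = value_to_be_assigned % limit     # what is left over

print(full_slots, remainder)  # -> 6.0 1.5
```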
Finally Python has an easy way to access elements of a list - we can set the first x elements to be equal to a value with:
array[:x]=value
x just needs to be an integer.
Putting it together we get:
filled_values=int(value_to_be_assigned//limit)
energy_result[:filled_values]=limit
energy_result[filled_values] = value_to_be_assigned % limit
and we can check with
energy_result.sum() # gives us 18
I have a very large 400x300x60x27 array (let's call it 'A'). I took the maximum values along the last axis, which gives a 400x300x60 array called 'B'. Basically I need to find the index in 'A' of each value in 'B'. I have converted them both to lists and set up a for loop to find the indices, but it takes an absurdly long time to get through because there are over 7 million values. This is what I have:
B=np.zeros((400,300,60))
C=np.zeros((400*300*60))
B=np.amax(A,axis=3)
A=np.ravel(A)
A=A.tolist()
B=np.ravel(B)
B=B.tolist()
for i in range(0,400*300*60):
    C[i]=A.index(B[i])
Is there a more efficient way to do this? It's taking hours and hours, and the program is still stuck on the last line.
You don't need amax, you need argmax. With argmax the array will contain only the indices rather than the values, and finding values from indices is computationally much cheaper than the other way around.
So I would recommend that you store only the indices, before flattening the array:
instead of np.amax, run A.argmax; this will contain the indices.
But before you flatten it to 1D, you will need a mapping function that converts the indices to 1D as well. This is probably a trivial problem, as you just need some basic operations to achieve it. It will consume some time, since it needs to be executed quite a few times, but it won't be a searching problem, so it will save you quite some time overall.
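One way to do that mapping is np.ravel_multi_index, which converts per-axis indices into linear indices. A sketch on a small stand-in array (the shape here is made up so the result is easy to check):

```python
import numpy as np

np.random.seed(1)
A = np.random.rand(4, 3, 2, 5)     # small stand-in for the 400x300x60x27 array

k = A.argmax(axis=3)               # index of the max along the last axis
i0, i1, i2 = np.indices(k.shape)   # the remaining three coordinates
C = np.ravel_multi_index((i0, i1, i2, k), A.shape)

# every linear index in C points at a per-row maximum of the flattened A
print(np.array_equal(A.ravel()[C], A.max(axis=3)))  # -> True
```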
You are getting those argmax indices, and because of the flattening you basically need to convert them to their linear index equivalents.
Thus, a solution would be to add in the proper offsets into the argmax indices in steps leveraging broadcasting at each one of them, like so -
m,n,r,s = A.shape
idx = A.argmax(axis=3)
idx += s*np.arange(r)
idx += r*s*np.arange(n)[:,None]
idx += n*r*s*np.arange(m)[:,None,None] # idx is your C output
Alternatively, a compact way to put it would be like so -
m,n,r,s = A.shape
I,J,K = np.ogrid[:m,:n,:r]
idx = n*r*s*I + r*s*J + s*K + A.argmax(axis=3)
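Either variant can be sanity-checked on a small array: the computed linear indices should pick the axis-3 maxima out of the flattened array:

```python
import numpy as np

np.random.seed(0)
A = np.random.rand(4, 3, 2, 5)     # small stand-in shape

m, n, r, s = A.shape
I, J, K = np.ogrid[:m, :n, :r]
# linear index of element (i, j, k, l) in a C-ordered (m, n, r, s) array
# is i*n*r*s + j*r*s + k*s + l
idx = n*r*s*I + r*s*J + s*K + A.argmax(axis=3)

print(np.array_equal(A.ravel()[idx], A.max(axis=3)))  # -> True
```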
I'm just starting out with Python, all my previous experience being C++ type languages.
In the attempt to learn "good" Python I've been trying to convert this C-like function into Python.
var MMI(var *Data,int Length)
{
    var m = Median(Data,Length);
    int i, nh=0, nl=0;
    for(i=1; i<Length; i++) {
        if(Data[i] > m && Data[i] > Data[i-1])
            nl++;
        else if(Data[i] < m && Data[i] < Data[i-1])
            nh++;
    }
    return 100.*(nl+nh)/(Length-1);
}
I'm pretty sure I can do it easily with a for loop, but I've been trying to do it using a series of array operations, rather than an explicit loop. I came up with:
import numpy as np
import pandas as pd
from pandas import Series
def MMI( buffer, mmi_length ):
    window = Series( buffer[:mmi_length] )
    m = window.median()
    nh = np.logical_and( [window > m], [window > window.shift(1)] )
    nl = np.logical_and( [window < m], [window < window.shift(1)] )
    nl = np.logical_and( [not nh], [nl] )
    return 100 * ( nh.sum() + nl.sum() ) / mmi_length
The final np.logical_and( [not nh], [nl] ) gives a "truth value ambiguous" error, which I don't understand, but more importantly I'm not sure whether this approach will actually yield a valid result in Python.
Could someone provide a pointer in how I should code this elegantly, or slap me on the head and tell me to just use a loop?
Ian
Python is implicit, unlike C++, where you almost have to declare everything. Python, and numpy/pandas or other modules, have a ton of optimized functionality built in, so that you can work without a lot of loops or value-by-value comparison (what the modules do in the background is often a for loop though, so don't think that it's necessarily faster; it's often just a pretty cover).
Now, let's look at your code
import numpy as np  # no need for pandas here
def MMI( buffer, mmi_length ):
    # we will need to define two arrays here,
    # shift(n) does not do what you want
    window = np.asarray(buffer[1:mmi_length])
    window_shifted = np.asarray(buffer[:mmi_length-1])
    m = np.median(window)
    # instead of using all these explicit functions, simply do:
    nh = (window > m) & (window > window_shifted)
    nl = (window < m) & (window < window_shifted)
    nl = ~nh & nl  # ~ inverts a lot of things,
                   # one of them being boolean arrays;
                   # this does the right thing
    return 100*(nh.sum()+nl.sum())/mmi_length
Now let's explain:
A Series is basically an array; in this context a Series seems like overkill. If you compare such an object with a scalar, you will get an array of booleans, expressing which values met the condition and which didn't (the same goes for comparing two arrays: it will result in a boolean array expressing the value-by-value comparison).
In the first step, you compare an array to a scalar (remember, this yields a boolean array) and another array to another array (we'll get to the shift part), and then want to combine the results of the comparisons with a logical and. The good thing is that you want to combine two boolean arrays, and this works implicitly via the & operation.
The second step is analogous and will implicitly work the same way.
In the third step, you want to invert a boolean array and combine it with another boolean array. Inversion is done by the ~ operator and can be used in lots of other places too (e.g. for inverting subset selections, etc.). You cannot use the not operator here, since its purpose is to convert its argument into a truth value (True/False) and return the opposite - but what is the truth value of an array? The logical and of all its components? It's not defined, therefore you get the "truth value ambiguous" error.
The sum() of a boolean array is always the count of True values in the array, thus it will yield the right results.
The only problem with your code, is that if you apply shift(1) to this Series, it will prepend a NaN and truncate the last element of the Series, so that you end up with an equal length object. Now your comparisons are not yielding what you want anymore, because anything compared to a numpy.NaN will return False.
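The shift(1) behaviour is easy to see on a toy Series (values made up):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0])

# shift(1) prepends a NaN and drops the last element
print(s.shift(1).tolist())        # -> [nan, 1.0, 2.0]
# any comparison against NaN is False
print((s > s.shift(1)).tolist())  # -> [False, True, True]
```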
In order to overcome that, you can simply define a second array in the beginning (which then makes pandas obsolete), using the same syntax you already used for window before.
PS: a numpy array is not a python list (all of the above are numpy arrays!). A numpy array is a complex object that allows for all these operations; with standard python lists, you have to write your own for loops.
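Putting the rewritten function to work on a short made-up series (the numbers are arbitrary; this just shows the call):

```python
import numpy as np

def MMI(buffer, mmi_length):
    # current values and the values one step earlier
    window = np.asarray(buffer[1:mmi_length])
    window_shifted = np.asarray(buffer[:mmi_length-1])
    m = np.median(window)
    nh = (window > m) & (window > window_shifted)
    nl = (window < m) & (window < window_shifted)
    nl = ~nh & nl
    return 100*(nh.sum() + nl.sum())/mmi_length

data = [1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5, 5.0]
print(MMI(data, len(data)))  # -> 50.0
```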
I need to do a lot of operations on multidimensional numpy arrays, and therefore I am experimenting to find the best approach.
So let's say i have an array like this:
A = np.random.uniform(0, 1, size = 100).reshape(20, 5)
My goal is to get the maximum value (numpy.amax()) of each row and its index. So A[0] may be something like this:
A[0] = [ 0.64570441 0.31781716 0.07268926 0.84183753 0.72194227]
I want to get the maximum and the index of that maximum: [0.84183753][0, 3]. No specific representation of the results is needed, just an example. In fact I only need the horizontal index.
I tried using numpy's nditer object:
A_it = np.nditer(A, flags=['multi_index'], op_flags=['readwrite'])
while not A_it.finished:
    print(np.amax(A_it.value))
    print(A_it.multi_index[1])
    A_it.iternext()
I can access every element of the array and its index over the iterations that way, but I don't seem to be able to bring the numpy.amax() function and the index together syntax-wise for each element. Can I even do it using the nditer object?
Also, in Numpy: Beginner nditer I read that using nditer, or using iterations in numpy at all, usually means that I am doing something wrong. But I can't find another convenient way to achieve my goal here without any iterations. Obviously I am a total beginner in numpy and python in general, so any keyword to search for or hint is very much appreciated.
A major problem with nditer is that it iterates over each element, not each row. It's best used as a stepping stone toward a Cython or C rewrite of your code.
If you just want the maximum for each row of your array, a simple iteration or list comprehension will do nicely.
for row in A: print(np.amax(row))
or to turn it back into an array:
np.array([np.amax(row) for row in A])
But you can get the same values by giving amax an axis parameter:
np.amax(A,axis=1)
np.argmax identifies the location of the maximum.
np.argmax(A,axis=1)
With the argmax values you could then select the max values as well:
ind=np.argmax(A,axis=1)
A[np.arange(A.shape[0]),ind]
(speed's about the same as repeating the np.amax call).
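On NumPy 1.15 or newer, np.take_along_axis does the same index-based selection in one call:

```python
import numpy as np

A = np.random.uniform(0, 1, size=100).reshape(20, 5)

ind = np.argmax(A, axis=1)
# take_along_axis expects the index array to have the same number of
# dimensions as A, hence the [:, None]
maxima = np.take_along_axis(A, ind[:, None], axis=1)[:, 0]

print(np.array_equal(maxima, np.amax(A, axis=1)))  # -> True
```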