Manipulating an array in Python

I have a numpy array that is obtained by reading an image:
data = band.ReadAsArray(0, 0, rows, cols)
Now the problem is that manipulating the data with loops takes around 13 minutes. How can I reduce this time? Is there any other solution?
Sample code:
for i in range(rows):
    for j in range(cols):
        if data[i][j] > 1 and data[i][j] < 30:
            data[i][j] = 255
        elif data[i][j] < 1:
            data[i][j] = 0
        else:
            data[i][j] = 1
It takes too long. Is there a shorter method?

With numpy you can use a mask to select all elements with a certain condition, as shown in the code example below:
import numpy as np
a = np.random.random((5,5))
a[a<0.5] = 0.0
print(a)
# [[ 0. 0.94925686 0.8946333 0.51562938 0.99873065]
# [ 0. 0. 0. 0. 0. ]
# [ 0.86719795 0. 0.8187514 0. 0.72529116]
# [ 0.6036299 0.9463493 0.78283466 0.6516331 0.84991734]
# [ 0.72939806 0.85408697 0. 0.59062025 0.6704499 ]]
If you wished to re-write your code then it could be something like:
data = band.ReadAsArray(0, 0, rows, cols)
data[(data > 1) & (data < 30)] = 255
data[data < 1] = 0
Note the parentheses around each comparison: & binds more tightly than the comparison operators in Python, so data > 1 & data < 30 would be evaluated as data > (1 & data) < 30 and raise an error on arrays.

Instead of looping, you can assign using a boolean array to select the values you're interested in changing. For example, if we have an array
>>> a = np.array([[0.1, 0.5, 1], [10, 20, 30], [40, 50, 60]])
>>> a
array([[  0.1,   0.5,   1. ],
       [ 10. ,  20. ,  30. ],
       [ 40. ,  50. ,  60. ]])
We can apply your logic with something like
>>> anew = np.empty_like(a)
>>> anew.fill(1)
>>> anew[a < 1] = 0
>>> anew[(a > 1) & (a < 30)] = 255
>>> anew
array([[   0.,    0.,    1.],
       [ 255.,  255.,    1.],
       [   1.,    1.,    1.]])
This works because of how numpy indexing works:
>>> a < 1
array([[ True,  True, False],
       [False, False, False],
       [False, False, False]], dtype=bool)
>>> anew[a < 1]
array([ 0., 0.])
Note: we don't really need anew -- you can act on a itself -- but then you have to be careful about the order in which you apply things, in case your conditions and the target values overlap.
Note #2: your conditions mean that if there's an element of the array which is exactly 30, or anything greater, it will become 1, and not 255. That seems a little odd, but it's what your code does, so I reproduced it.
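Putting the pieces together, here is a minimal sketch of the in-place version (with a small made-up array standing in for the image data) that sidesteps the ordering problem by computing every mask before assigning anything:

```python
import numpy as np

# Hypothetical sample standing in for the image band data.
data = np.array([[0.5, 10.0, 40.0],
                 [25.0, 0.0, 3.0]])

# Compute all masks before assigning, so a later assignment
# cannot re-match values written by an earlier one.
low = data < 1
mid = (data > 1) & (data < 30)
data[low] = 0
data[mid] = 255
data[~(low | mid)] = 1

print(data)
# [[  0. 255.   1.]
#  [255.   0. 255.]]
```

Because low and mid are saved before any writes, assigning 255 to the middle band cannot accidentally make those elements match a later condition.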

Related

Returning list of arrays from a function having as argument a vector

I have a function such as:
def f(x):
    A = np.array([[0, 1], [0, -1/x]])
    return A
If I use a scalar I will obtain:
>>> x = 1
>>> f(x)
array([[ 0.,  1.],
       [ 0., -1.]])
and if I use an array as an input, I will obtain:
>>> x = np.linspace(1,3,3)
>>> f(x)
array([[0, 1],
       [0, array([-1.        , -0.5       , -0.33333333])]], dtype=object)
Actually I would like to obtain a list of arrays, namely:
A = [A_1, A_2, ..., A_n]
Right now I do not care much whether it is an array of arrays or a list containing several arrays.
I know I can do that using a for loop over x, but I think there is probably another, perhaps more efficient, way to do it.
So the output that I would like would be something like:
>>> x = np.linspace(1,3,3)
>>> r = f(x)
array([[[0, 1],[0,-1]],
       [[0, 1],[0,-0.5]],
       [[0, 1],[0,-0.33333]]])
>>> r[0]
array([[0, 1],[0,-1]])
or something like
>>> x = np.linspace(1,3,3)
>>> r = f(x)
[array([[0, 1],[0,-1]]),
 array([[0, 1],[0,-0.5]]),
 array([[0, 1],[0,-0.33333]])]
>>> r[0]
array([[0, 1],[0,-1]])
Thanks
In your function we could check the type of the given parameter. Here, if x is of type np.ndarray we create the nested list we want; otherwise we return the output as before.
import numpy as np
def f(x):
    if isinstance(x, np.ndarray):
        v = -1/x
        A = np.array([[[0, 1], [0, i]] for i in v])
    else:
        A = np.array([[0, 1], [0, -1/x]])
    return A
x = np.linspace(1,3,3)
print(f(x))
Output:
[[[ 0.          1.        ]
  [ 0.         -1.        ]]

 [[ 0.          1.        ]
  [ 0.         -0.5       ]]

 [[ 0.          1.        ]
  [ 0.         -0.33333333]]]
You can do something like:
import numpy as np
def f(x):
    x = np.array([x]) if type(x) == float or type(x) == int else x
    A = np.stack([np.array([[0, 1], [0, -1/i]]) for i in x])
    return A
The first line deals with the case when x is an int or a float, since a scalar is not iterable. Then:
f(1)
array([[[ 0.,  1.],
        [ 0., -1.]]])
f(np.linspace(1,3,3))
array([[[ 0.        ,  1.        ],
        [ 0.        , -1.        ]],

       [[ 0.        ,  1.        ],
        [ 0.        , -0.5       ]],

       [[ 0.        ,  1.        ],
        [ 0.        , -0.33333333]]])
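If the Python-level loop over x itself becomes a bottleneck, the stacked result can also be built with pure array operations, since only one entry actually depends on x. A sketch of that idea (keeping the function name f from the question; np.atleast_1d is used so scalars and arrays take the same path):

```python
import numpy as np

def f(x):
    # np.atleast_1d makes scalars and arrays look the same
    x = np.atleast_1d(np.asarray(x, dtype=float))
    A = np.zeros((x.size, 2, 2))
    A[:, 0, 1] = 1.0       # the constant top row [0, 1]
    A[:, 1, 1] = -1.0 / x  # the only x-dependent entry
    return A

print(f(np.linspace(1, 3, 3)))
```

This fills the (n, 2, 2) result in a handful of vectorized assignments instead of constructing n small arrays and stacking them.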

filter tensorflow array with specific condition over numpy array

I have a tensorflow array named tf_array and a numpy array named np_array. I want to find specific rows in tf_array with regard to np_array.
tf_array = tf.constant(
    [[9.968594, 8.655439, 0.,        0.       ],
     [0.,       8.3356,   0.,        8.8974   ],
     [0.,       0.,       6.103182,  7.330564 ],
     [6.609862, 0.,       3.0614321, 0.       ],
     [9.497023, 0.,       3.8914037, 0.       ],
     [0.,       8.457685, 8.602337,  0.       ],
     [0.,       0.,       5.826657,  8.283971 ]])
I also have an np_array:
np_array = np.matrix(
    [[2, 5, 1],
     [1, 6, 4],
     [0, 0, 0],
     [2, 3, 6],
     [4, 2, 4]])
Now I want to keep the elements in tf_array for which a combination of n (here n is 2) of their indexes appears in a row of np_array. What does that mean?
For example, in the first column of tf_array, the indexes with a value are (0, 3, 4). Is there any row in np_array which contains any pair of these indexes: (0,3), (0,4) or (3,4)? There is no such row, so all the elements in that column become zero.
The indexes with a value in the second column of tf_array are (0, 1, 5), giving the pairs (0,1), (0,5) and (1,5). As you can see, the pair (1,5) is present in the first row of np_array. That's why we keep those elements in tf_array.
So the final result should be like this:
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
I am looking for a very efficient approach as I have large number of data.
Update 1
I could get this far with the code below, which gives True where there is a value and False where the element is zero:
[[ True True False False]
[False True False True]
[False False True True]
[ True False True False]
[ True False True False]
[False True True False]
[False False True True]]
with tf.Session() as sess:
    where = tf.not_equal(tf_array, 0.0)
    print(sess.run(where))
But how can I compare these matrices with np_array?
Thank you in advance!
Here is the solution from https://stackoverflow.com/a/56510832/7207392 with the necessary modifications. For the sake of simplicity I use np.array for all data. I'm no tensorflow expert, so if translating is not entirely straightforward, you'll have to ask somebody else how to do it.
import numpy as np
def f(a1, a2, n):
    N, M = a1.shape
    a1p = np.concatenate([a1, np.zeros((1, a1.shape[1]), a1.dtype)], axis=0)
    a2 = np.sort(a2, axis=1)
    a2[:,1:][a2[:,1:] == a2[:,:-1]] = N
    y, x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
    out = np.zeros_like(a1p)
    out[a2[y], x[:,None]] = a1p[a2[y], x[:,None]]
    return out[:-1]
a1 = np.array(
[[9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ]])
a2 = np.array(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
print(f(a1,a2,2))
Output:
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
One efficient way you can try is to make a bit flag for each row recording which values are there; for example, (0,3,4) becomes 1<<0 | 1<<3 | 1<<4. You will have an array of flag values. The << and | operators work elementwise in numpy.
Do the same for the other array; tf arrays are essentially wrapped numpy arrays.
After having the two arrays of flags, take the bitwise "and" of them. Where your condition is true for a row, the result will have at least two non-zero bits. Counting the set bits can also be done efficiently.
This however won't work with floats - you'd need to convert those to fairly small ints first.
import numpy as np
arr_one = np.array(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
arr_two = np.array(
[[2, 0, 7],
[1, 3, 4],
[5, 5, 6],
[1, 3, 6],
[4, 2, 4]])
print('1 << arr_one.T[0] ' , 1 << arr_one.T[0] )
arr_one_flags = 1 << arr_one.T[0] | 1 << arr_one.T[1] | 1 << arr_one.T[2]
print('arr_one_flags ', arr_one_flags)
arr_two_flags = 1 << arr_two.T[0] | 1 << arr_two.T[1] | 1 << arr_two.T[2]
arr_and = arr_one_flags & arr_two_flags
print('arr_and ', arr_and)
def get_bit_count(value):
    n = 0
    while value:
        n += 1
        value &= value - 1
    return n
arr_matches = np.array([get_bit_count(x) for x in arr_and])
print('arr_matches ', arr_matches )
arr_two_filtered = arr_two[arr_matches > 1]
print('arr_two_filtered ', arr_two_filtered )
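The per-element get_bit_count loop can also be vectorized. One possible sketch, assuming the flag values fit in 64 bits: view each uint64 as 8 bytes and let np.unpackbits expose the individual bits, then sum them per row.

```python
import numpy as np

# Hypothetical flag values (e.g. results of the bitwise "and" step).
arr_and = np.array([0b101, 0b11, 0, 0b1001], dtype=np.uint64)

# View each uint64 as 8 bytes, unpack to individual bits,
# and sum per element to get the population count.
bit_counts = np.unpackbits(arr_and.view(np.uint8).reshape(-1, 8),
                           axis=1).sum(axis=1)
print(bit_counts)  # [2 2 0 2]
```

Byte order does not matter here, since only the total number of set bits is needed, not their positions.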

Where clause with numpy with single array and / or empty_like

I am trying to figure out how the np.where clause works. I create a simple df:
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))
print(df)
A B C D
0 5 8 9 5
1 0 0 1 7
2 6 9 2 4
Now when I implement:
print(np.where(df.values, 1, np.nan))
I receive:
[[ 1. 1. 1. 1.]
[ nan nan 1. 1.]
[ 1. 1. 1. 1.]]
But when I create an empty_like array from df and put it into the where clause, I receive this:
print(np.where(np.empty_like(df.values), 1, np.nan))
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
I could really use help explaining how the where clause works on a single array.
np.empty_like()
Docs:-
numpy.empty_like(prototype, dtype=None, order='K', subok=True)
Return a new array with the same shape and type as a given array.
>>> a = ([1,2,3], [4,5,6]) # a is array-like
>>> np.empty_like(a)
array([[-1073741821, -1073741821, 3], #random
[ 0, 0, -1073741821]])
np.empty_like() creates an array of the same shape and type as the given array, but filled with whatever arbitrary values happened to be in that memory (it is uninitialized rather than truly random). This array now goes into np.where()
numpy.where()
Docs:-
numpy.where(condition[, x, y])
Return elements that are chosen from x or y depending on condition.
Example:-
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
>>> np.where(a, 1, np.nan)
array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
In Python, any number other than zero is considered True, whereas zero is considered False.
When np.where() gets an np.array, the array itself acts as the condition: np.where evaluates to True where the array elements are non-zero and False where they are 0. So the "True" elements are replaced by 1 and the "False" elements by np.nan. In your empty_like example, the uninitialized values all happened to be non-zero, which is why every element of the result is 1.
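That conversion can be made explicit: np.where behaves the same whether you pass the numeric array or its boolean view. A small sketch with made-up data:

```python
import numpy as np

a = np.array([[5, 0], [0, 7]])

# np.where converts its condition to boolean internally;
# doing the conversion ourselves gives identical results.
cond = a.astype(bool)
print(cond)
# [[ True False]
#  [False  True]]
print(np.where(cond, 1, np.nan))
# [[ 1. nan]
#  [nan  1.]]
```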
Reference:-
numpy.where()
numpy.empty_like()

Re-assign values with multiple if statements Numpy

I have a large Numpy ndarray, here is a sample of that:
myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],[np.nan,0.2,0.3,4.2,15.1]])
myarray
array([[ 1.01, 9.4 , 0.0 , 6.9 , 5.7 ],
[ 1.9 , 2.6 , nan, 4.7 , -2.45],
[ nan, 0.2 , 0.3 , 4.2 , 15.1 ]])
As you can see, my array contains floats: positive, negative, zeros and NaNs. I would like to re-assign (re-classify) the values in the array based on multiple if statements. I've read many answers and docs, but all the ones I've seen refer to one or two simple conditions, which can easily be resolved using np.where, for example.
I have multiple condition, for the sake of simplicity let's say I have four conditions (the desired solution should be able to handle more conditions). My conditions are:
if x > 6*y:
    x = 3
elif x < 4*z:
    x = 2
elif x == np.nan:
    x = np.nan  # maybe pass is better?
else:
    x = 0
where x is a value in the array, y and z are variable that will change among arrays. For example, array #1 will have y=5, z=2, array #2 will have y = 0.9, z= 0.5 etc. The condition for np.nan just means that if a value is nan, do not alter it, keep it nan.
Note that this needs to be evaluated in one pass, because if I use several np.where calls one after another, condition #2 will overwrite condition #1.
I tried to create a function and then apply it to the array, but with no success. It seems that in order to apply a function to an array, the function must take only one argument (the array), whereas if I were to use a function here it would need three arguments: the array, and the y and z values.
What would be the most efficient way to achieve my goal?
In [11]: myarray = np.array([[1.01,9.4,0.0,6.9,5.7],[1.9,2.6,np.nan,4.7,-2.45],
    ...:                      [np.nan,0.2,0.3,4.2,15.1]])
In [13]: y, z = 0.9, 0.5
If I perform one of your tests on the whole array:
In [14]: mask1 = myarray >6*y
/usr/local/bin/ipython3:1: RuntimeWarning: invalid value encountered in greater
It's the np.nan that cause this warning.
So lets first identify those nan (and replace):
In [25]: mask0 = np.isnan(myarray)
In [26]: mask0
Out[26]:
array([[False, False, False, False, False],
[False, False, True, False, False],
[ True, False, False, False, False]])
In [27]: arr = myarray.copy()
In [28]: arr[mask0] = 0 # temp replace the nan with 0
myarray == np.nan does not work; it produces False everywhere.
arr = np.nan_to_num(myarray) also works, replacing the nan with 0.
Now find the masks for the y and z tests. It doesn't matter how these handle the original nan (now 0). Calculate both masks first to reduce mutual interference.
In [29]: mask1 = arr > 6*y
In [30]: mask2 = arr < 4*z
In [31]: arr[mask1]
Out[31]: array([ 9.4, 6.9, 5.7, 15.1])
In [32]: arr[mask2]
Out[32]: array([ 1.01, 0. , 1.9 , 0. , -2.45, 0. , 0.2 , 0.3 ])
In [33]: arr[mask0]
Out[33]: array([0., 0.])
Since you want everything else to be 0, let's initialize an array of zeros:
In [34]: res = np.zeros_like(arr)
now apply the 3 masks:
In [35]: res[mask1] = 3
In [36]: res[mask2] = 2
In [37]: res[mask0] = np.nan
In [38]: res
Out[38]:
array([[ 2., 3., 2., 3., 3.],
[ 2., 0., nan, 0., 2.],
[nan, 2., 2., 0., 3.]])
I could have applied the masks to arr:
In [40]: arr[mask1] = 3 # np.where(mask1, 3, arr) should also work
In [41]: arr[mask2] = 2
In [42]: arr[mask0] = np.nan
In [43]: arr
Out[43]:
array([[2. , 3. , 2. , 3. , 3. ],
[2. , 2.6, nan, 4.7, 2. ],
[nan, 2. , 2. , 4.2, 3. ]])
I would still have to use some logic on the combined masks to identify the slots that are supposed to be 0 (here 2.6, 4.7 and 4.2 kept their original values).
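An alternative that expresses the whole if/elif chain in one call is np.select, where conditions are checked in order and the first match wins, mirroring elif semantics. A sketch on a smaller made-up array, reusing the y and z values from above:

```python
import numpy as np

myarray = np.array([[1.01, 9.4, 0.0], [1.9, np.nan, -2.45]])
y, z = 0.9, 0.5

nanmask = np.isnan(myarray)
arr = np.nan_to_num(myarray)   # avoid nan-comparison warnings

# Conditions are evaluated in order, like an if/elif chain:
# nan stays nan, then x > 6*y -> 3, then x < 4*z -> 2, else 0.
res = np.select([nanmask, arr > 6*y, arr < 4*z],
                [np.nan,  3,         2],
                default=0)
print(res)
# [[ 2.  3.  2.]
#  [ 2. nan  2.]]
```

Adding a fifth or sixth condition just means appending one entry to each list, which scales better than chaining masks by hand.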

Apply logarithm only on positive entries of array

SciPy thoughtfully provides the scipy.log function, which will take an array and then log all elements in that array. Is there a way to log only the positive (i.e. positive non-zero) elements of an array?
What about where()?
import numpy as np
a = np.array([ 1., -1., 0.5, -0.5, 0., 2. ])
la = np.where(a>0, np.log(a), a)
print(la)
# Gives [ 0. -1. -0.69314718 -0.5 0. 0.69314718]
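Note that this version still evaluates np.log on the negative and zero entries (emitting RuntimeWarnings) before np.where discards those results. The where= and out= arguments of the ufunc avoid that by computing the log only where wanted; a sketch:

```python
import numpy as np

a = np.array([1., -1., 0.5, -0.5, 0., 2.])
# Evaluate the log only where a > 0; elsewhere the out array
# (a copy of a) is left untouched, so no warnings are raised.
la = np.log(a, out=a.copy(), where=a > 0)
print(la)
# [ 0.         -1.         -0.69314718 -0.5         0.          0.69314718]
```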
With boolean indexing:
In [695]: a = np.array([ 1. , -1. , 0.5, -0.5, 0. , 2. ])
In [696]: I=a>0
In [697]: a[I]=np.log(a[I])
In [698]: a
Out[698]:
array([ 0. , -1. , -0.69314718, -0.5 , 0. ,
0.69314718])
or if you just want to keep the logged terms
In [707]: np.log(a[I])
Out[707]: array([ 0. , -0.69314718, 0.69314718])
Here's a vectorized solution that keeps the original array and leaves non-positive values unchanged:
In [1]: import numpy as np
In [2]: a = np.array([ 1., -1., 0.5, -0.5, 0., 2. ])
In [3]: loga = np.log(a)
In [4]: loga
Out[4]: array([ 0., nan, -0.69314718, nan, -inf, 0.69314718 ])
In [5]: # Replace the NaNs and infs with the original values
In [6]: loga[np.where(~np.isfinite(loga))] = a[np.where(~np.isfinite(loga))]
In [7]: loga
Out[7]: array([ 0., -1., -0.69314718, -0.5, 0., 0.69314718])
Here, np.where(~np.isfinite(loga)) returns the indexes of non-finite entries in the loga array, and we replace these values with the corresponding originals from a.
Probably not the answer you're looking for but I'll just put this here:
for i in range(0,rows):
for j in range(0,cols):
if array[i,j] > 0:
array[i,j]=log(array[i,j])
You can vectorize a custom function.
import numpy as np
def pos_log(x):
    if x > 0:
        return np.log(x)
    return x

v_pos_log = np.vectorize(pos_log, otypes=[float])
result = v_pos_log(np.array([-1, 1]))
#>>> np.array([-1, 0])
But as the documentation for numpy.vectorize says "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
