Where clause with numpy with single array and / or empty_like - python

I am trying to figure out how the np.where clause works. I create a simple df:
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))
print(df)
A B C D
0 5 8 9 5
1 0 0 1 7
2 6 9 2 4
Now when I implement:
print(np.where(df.values, 1, np.nan))
I receive:
[[ 1. 1. 1. 1.]
[ nan nan 1. 1.]
[ 1. 1. 1. 1.]]
But when I create an empty_like array from df: and put it into where clause I receive this:
print(np.where(np.empty_like(df.values), 1, np.nan))
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
Really could use help on explaining how where clause works on a single array.

np.empty_like()
Docs:-
numpy.empty_like(prototype, dtype=None, order='K', subok=True)
Return a new array with the same shape and type as a given array.
>>> a = ([1,2,3], [4,5,6]) # a is array-like
>>> np.empty_like(a)
array([[-1073741821, -1073741821, 3], #random
[ 0, 0, -1073741821]])
np.empty_like() creates an array of the same shape and type as the given array but with random numbers. This array now goes into np.where()
numpy.where()
Docs:-
numpy.where(condition[, x, y])
Return elements that are chosen from x or y depending on condition.
Example:-
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
>>>np.where(a,1,np.nan)
array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
In Python any number other than zero is considered to be TRUE whereas zero is considered to FALSE.
When np.where() gets a np.array it checks for the condition, Here the array itself acts as condition i.e, the np.where evaluates to TRUE when the array elements are not zero and FALSE when they are 0. So the "True" elements are replaced by 1 and "False" elements by np.nan.
Reference:-
numpy.where()
numpy.empty_like()

Related

How to divide an array by an other array element wise in numpy?

I have two arrays, and I want all the elements of one to be divided by the second. For example,
In [24]: a = np.array([1,2,3])
In [25]: b = np.array([1,2,3])
In [26]: a/b
Out[26]: array([1., 1., 1.])
In [27]: 1/b
Out[27]: array([1. , 0.5 , 0.33333333])
This is not the answer I want, the output I want is like (we can see all of the elements of a are divided by b)
In [28]: c = []
In [29]: for i in a:
...: c.append(i/b)
...:
In [30]: c
Out[30]:
[array([1. , 0.5 , 0.33333333]),
array([2. , 1. , 0.66666667]),
In [34]: np.array(c)
Out[34]:
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
But I don't like for loop, it's too slow for big data, so is there a function that included in numpy package or any good (faster) way to solve this problem?
It is simple to do in pure numpy, you can use broadcasting to calculate the outer product (or any other outer operation) of two vectors:
import numpy as np
a = np.arange(1, 4)
b = np.arange(1, 4)
c = a[:,np.newaxis] / b
# array([[1. , 0.5 , 0.33333333],
# [2. , 1. , 0.66666667],
# [3. , 1.5 , 1. ]])
This works, since a[:,np.newaxis] increases the dimension of the (3,) shaped array a into a (3, 1) shaped array, which can be used for the desired broadcasting operation.
First you need to cast a into a 2D array (same shape as the output), then repeat for the dimension you want to loop over. Then vectorized division will work.
>>> a.reshape(-1,1)
array([[1],
[2],
[3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1) / b
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
# Transpose will let you do it the other way around, but then you just get 1 for everything
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T / b
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
This should do the job:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(a.reshape(-1, 1) / b)
Output:
[[ 1. 0.5 0.33333333]
[ 2. 1. 0.66666667]
[ 3. 1.5 1. ]]

How to insert rows multiple times to a numpy array?

I have an array that I need to check if it has missing values in its rows. The second column must follow a sequence and if a missing value is found I need to insert it.
[[123 1 0
123 2 0
123 4 0
123 5 0
123 8 0
123 9 0
...]]
In this example I'd need to insert at row 2 the values [123 3 0] and at row 4 [[123 6 0], [123 7 0]].
I am iterating the array row by row checking if there is a missing row, using numpy.insert to do it, but it returns a copy every time an insert is done, increasing the index at which the rows should be inserted every time this operation is done.
Is this a reasonable way to do it?
Look at this way without using insert:
import numpy as np
x = np.array([[123, 1, 0],
[123, 2, 0],
[123, 4, 0],
[123, 5, 0],
[123, 8, 0],
[123, 9, 0]])
y = np.zeros((x[-1, 1], x.shape[1]))
y[x[:,1] - 1] = x
indexes = np.where((y[:,0] == 0) & (y[:,1] == 0) & (y[:,2] == 0))[0]
y[indexes] = [[123, i + 1, 0] for i in indexes]
So now,
print(y)
[[123., 1., 0.]
[123., 2., 0.]
[123., 3., 0.]
[123., 4., 0.]
[123., 5., 0.]
[123., 6., 0.]
[123., 7., 0.]
[123., 8., 0.]
[123., 9., 0.]]
Hope this can help you :)

Understanding axes in NumPy

I was going through NumPy documentation, and am not able to understand one point. It mentions, for the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
How does the first dimension (axis) have a length of 2?
Edit:
The reason for my confusion is the below statement in the documentation.
The coordinates of a point in 3D space [1, 2, 1] is an array of rank
1, because it has one axis. That axis has a length of 3.
In the original 2D ndarray, I assumed that the number of lists identifies the rank/dimension, and I wrongly assumed that the length of each list denotes the length of each dimension (in that order). So, as per my understanding, the first dimension should be having a length of 3, since the length of the first list is 3.
In numpy, axis ordering follows zyx convention, instead of the usual (and maybe more intuitive) xyz.
Visually, it means that for a 2D array where the horizontal axis is x and the vertical axis is y:
x -->
y 0 1 2
| 0 [[1., 0., 0.],
V 1 [0., 1., 2.]]
The shape of this array is (2, 3) because it is ordered (y, x), with the first axis y of length 2.
And verifying this with slicing:
import numpy as np
a = np.array([[1, 0, 0], [0, 1, 2]], dtype=np.float)
>>> a
Out[]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
>>> a[0, :] # Slice index 0 of first axis
Out[]: array([ 1., 0., 0.]) # Get values along second axis `x` of length 3
>>> a[:, 2] # Slice index 2 of second axis
Out[]: array([ 0., 2.]) # Get values along first axis `y` of length 2
You may be confusing the other sentence with the picture example below. Think of it like this: Rank = number of lists in the list(array) and the term length in your question can be thought of length = the number of 'things' in the list(array)
I think they are trying to describe to you the definition of shape which is in this case (2,3)
in that post I think the key sentence is here:
In NumPy dimensions are called axes. The number of axes is rank.
If you print the numpy array
print(np.array([[ 1. 0. 0.],[ 0. 1. 2.]])
You'll get the following output
#col1 col2 col3
[[ 1. 0. 0.] # row 1
[ 0. 1. 2.]] # row 2
Think of it as a 2 by 3 matrix... 2 rows, 3 columns. It is a 2d array because it is a list of lists. ([[ at the start is a hint its 2d)).
The 2d numpy array
np.array([[ 1. 0., 0., 6.],[ 0. 1. 2., 7.],[3.,4.,5,8.]])
would print as
#col1 col2 col3 col4
[[ 1. 0. , 0., 6.] # row 1
[ 0. 1. , 2., 7.] # row 2
[3., 4. , 5., 8.]] # row 3
This is a 3 by 4 2d array (3 rows, 4 columns)
The first dimensions is the length:
In [11]: a = np.array([[ 1., 0., 0.], [ 0., 1., 2.]])
In [12]: a
Out[12]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
In [13]: len(a) # "length of first dimension"
Out[13]: 2
The second is the length of each "row":
In [14]: [len(aa) for aa in a] # 3 is "length of second dimension"
Out[14]: [3, 3]
Many numpy functions take axis as an argument, for example you can sum over an axis:
In [15]: a.sum(axis=0)
Out[15]: array([ 1., 1., 2.])
In [16]: a.sum(axis=1)
Out[16]: array([ 1., 3.])
The thing to note is that you can have higher dimensional arrays:
In [21]: b = np.array([[[1., 0., 0.], [ 0., 1., 2.]]])
In [22]: b
Out[22]:
array([[[ 1., 0., 0.],
[ 0., 1., 2.]]])
In [23]: b.sum(axis=2)
Out[23]: array([[ 1., 3.]])
Keep the following points in mind when considering Numpy axes:
Each sub-level of a list (or array) represents an axis. For example:
import numpy as np
a = np.array([1,2]) # 1 axis
b = np.array([[1,2],[3,4]]) # 2 axes
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # 3 axes
Axis labels correspond to the level of the sub-list they represent, starting with axis 0 for the outer most list.
To illustrate this, consider the following array of different shape, each with 24 elements:
# 1D Array
a0 = np.array(
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
)
a0.shape # (24,) - here, the length along the 0-axis is 24
# 2D Array
a01 = np.array(
[
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4],
[4.1, 4.2, 4.3, 4.4],
[5.1, 5.2, 5.3, 5.4],
[6.1, 6.2, 6.3, 6.4]
]
)
a01.shape # (6, 4) - now, the length along the 0-axis is 6
# 3D Array
a012 = np.array(
[
[
[1.1.1, 1.1.2],
[1.2.1, 1.2.2],
[1.3.1, 1.3.2]
],
[
[2.1.1, 2.1.2],
[2.2.1, 2.2.2],
[2.3.1, 2.3.2]
],
[
[3.1.1, 3.1.2],
[3.2.1, 3.2.2],
[3.3.1, 3.3.2]
],
[
[4.1.1, 4.1.2],
[4.2.1, 4.2.2],
[4.3.1, 4.3.2]
]
)
a012.shape # (4, 3, 2) - and finally, the length along the 0-axis is 4

Manipulating an array in python

I have an numpy array that is obtained by reading an image.
data=band.ReadAsArray(0,0,rows,cols)
Now the problem is that while using loops to manipulate the data it took around 13 min. how can I reduce this time. is there any other solution.
sample code
for i in range(rows):
for j in range(cols):
if data[i][j]>1 and data[i][j]<30:
data[i][j]=255
elif data[i][j]<1:
data[i][j]=0
else:
data[i][j]=1
it takes too long. any short method
With numpy you can use a mask to select all elements with a certain condition, as shown in the code example below:
import numpy as np
a = np.random.random((5,5))
a[a<0.5] = 0.0
print(a)
# [[ 0. 0.94925686 0.8946333 0.51562938 0.99873065]
# [ 0. 0. 0. 0. 0. ]
# [ 0.86719795 0. 0.8187514 0. 0.72529116]
# [ 0.6036299 0.9463493 0.78283466 0.6516331 0.84991734]
# [ 0.72939806 0.85408697 0. 0.59062025 0.6704499 ]]
If you wished to re-write your code then it could be something like:
data=band.ReadAsArray(0,0,rows,cols)
data[data >= 1 & data<30] = 255
data[data<1] = 0
Instead of looping, you can assign using a boolean array to select the values you're interested in changing. For example, if we have an array
>>> a = np.array([[0.1, 0.5, 1], [10, 20, 30], [40, 50, 60]])
>>> a
array([[ 0.1, 0.5, 1. ],
[ 10. , 20. , 30. ],
[ 40. , 50. , 60. ]])
We can apply your logic with something like
>>> anew = np.empty_like(a)
>>> anew.fill(1)
>>> anew[a < 1] = 0
>>> anew[(a > 1) & (a < 30)] = 255
>>> anew
array([[ 0., 0., 1.],
[ 255., 255., 1.],
[ 1., 1., 1.]])
This works because of how numpy indexing works:
>>> a < 1
array([[ True, True, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> anew[a < 1]
array([ 0., 0.])
Note: we don't really need anew-- you can act on a itself -- but then you have to be careful about the order you apply things in case your conditions and the target values overlap.
Note #2: your conditions mean that if there's an element of the array which is exactly 30, or anything greater, it will become 1, and not 255. That seems a little odd, but it's what your code does, so I reproduced it.

Python: Resize an existing array and fill with zeros

I think that my issue should be really simple, yet I can not find any help
on the Internet whatsoever. I am very new to Python, so it is possible that
I am missing something very obvious.
I have an array, S, like this [x x x] (one-dimensional). I now create a
diagonal matrix, sigma, with np.diag(S) - so far, so good. Now, I want to
resize this new diagonal array so that I can multiply it by another array that
I have.
import numpy as np
...
shape = np.shape((6, 6)) #This will be some pre-determined size
sigma = np.diag(S) #diagonalise the matrix - this works
my_sigma = sigma.resize(shape) #Resize the matrix and fill with zeros - returns "None" - why?
However, when I print the contents of my_sigma, I get "None". Can someone please
point me in the right direction, because I can not imagine that this should be
so complicated.
Thanks in advance for any help!
Casper
Graphical:
I have this:
[x x x]
I want this:
[x 0 0]
[0 x 0]
[0 0 x]
[0 0 0]
[0 0 0]
[0 0 0] - or some similar size, but the diagonal elements are important.
There is a new numpy function in version 1.7.0 numpy.pad that can do this in one-line. Like the other answers, you can construct the diagonal matrix with np.diag before the padding.
The tuple ((0,N),(0,0)) used in this answer indicates the "side" of the matrix which to pad.
import numpy as np
A = np.array([1, 2, 3])
N = A.size
B = np.pad(np.diag(A), ((0,N),(0,0)), mode='constant')
B is now equal to:
[[1 0 0]
[0 2 0]
[0 0 3]
[0 0 0]
[0 0 0]
[0 0 0]]
sigma.resize() returns None because it operates in-place. np.resize(sigma, shape), on the other hand, returns the result but instead of padding with zeros, it pads with repeats of the array.
Also, the shape() function returns the shape of the input. If you just want to predefine a shape, just use a tuple.
import numpy as np
...
shape = (6, 6) #This will be some pre-determined size
sigma = np.diag(S) #diagonalise the matrix - this works
sigma.resize(shape) #Resize the matrix and fill with zeros
However, this will first flatten out your original array, and then reconstruct it into the given shape, destroying the original ordering. If you just want to "pad" with zeros, instead of using resize() you can just directly index into a generated zero-matrix.
# This assumes that you have a 2-dimensional array
zeros = np.zeros(shape, dtype=np.int32)
zeros[:sigma.shape[0], :sigma.shape[1]] = sigma
I see the edit... you do have to create the zeros first and then move some numbers into it. np.diag_indices_from might be useful for you
bigger_sigma = np.zeros(shape, dtype=sigma.dtype)
diag_ij = np.diag_indices_from(sigma)
bigger_sigma[diag_ij] = sigma[diag_ij]
This solution works with resize function
Take a sample array
S= np.ones((3))
print (S)
# [ 1. 1. 1.]
d= np.diag(S)
print(d)
"""
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
"""
This dosent work, it just add a repeating values
np.resize(d,(6,3))
"""
adds a repeating value
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
"""
This does work
d.resize((6,3),refcheck=False)
print(d)
"""
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
"""
Another pure python solution is
a = [1, 2, 3]
b = []
for i in range(6):
b.append((([0] * i) + a[i:i+1] + ([0] * (len(a) - 1 - i)))[:len(a)])
b is now
[[1, 0, 0], [0, 2, 0], [0, 0, 3], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
it's a hideous solution, I'll admit that.
However, it illustrates some functions of the list type that can be used.

Categories