Apply logarithm only on positive entries of array - python

SciPy thoughtfully provides the scipy.log function, which will take an array and then log all elements in that array. Is there a way to log only the positive (i.e. positive non-zero) elements of an array?

What about where()?
import numpy as np
a = np.array([ 1., -1., 0.5, -0.5, 0., 2. ])
la = np.where(a>0, np.log(a), a)
print(la)
# Gives [ 0. -1. -0.69314718 -0.5 0. 0.69314718]
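Note that np.where evaluates np.log(a) for every element before selecting, so the zero and negative entries still trigger RuntimeWarnings. A possible alternative, sketched here under the assumption that your NumPy version supports the where= and out= arguments of ufuncs, is to compute the logarithm only at the masked positions:

```python
import numpy as np

a = np.array([1., -1., 0.5, -0.5, 0., 2.])

# Start from a copy so that non-positive entries keep their original values.
la = a.copy()

# The log is computed only where the mask is True; other slots of `out` are left untouched.
np.log(a, out=la, where=a > 0)
print(la)
```

Passing out= matters here: without it, the positions skipped by where= would be left uninitialized.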

With boolean indexing:
In [695]: a = np.array([ 1. , -1. , 0.5, -0.5, 0. , 2. ])
In [696]: I=a>0
In [697]: a[I]=np.log(a[I])
In [698]: a
Out[698]:
array([ 0. , -1. , -0.69314718, -0.5 , 0. , 0.69314718])
Or, if you only want the logged terms (computed from the original a, before the in-place assignment above):
In [707]: np.log(a[I])
Out[707]: array([ 0. , -0.69314718, 0.69314718])

Here's a vectorized solution that keeps the original array and leaves non-positive values unchanged:
In [1]: import numpy as np
In [2]: a = np.array([ 1., -1., 0.5, -0.5, 0., 2. ])
In [3]: loga = np.log(a)
In [4]: loga
Out[4]: array([ 0., nan, -0.69314718, nan, -inf, 0.69314718 ])
In [5]: # Remove nasty nanses and infses
In [6]: loga[np.where(~np.isfinite(loga))] = a[np.where(~np.isfinite(loga))]
In [7]: loga
Out[7]: array([ 0., -1., -0.69314718, -0.5, 0., 0.69314718])
Here, np.where(~np.isfinite(loga)) returns the indexes of non-finite entries in the loga array, and we replace these values with the corresponding originals from a.
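The np.log(a) call above emits RuntimeWarnings for the zero and negative entries. A minimal sketch of the same idea, assuming you only want to silence those warnings locally and compute the mask once (the name bad is introduced here for illustration):

```python
import numpy as np

a = np.array([1., -1., 0.5, -0.5, 0., 2.])

# Suppress the divide-by-zero and invalid-value warnings only inside this block.
with np.errstate(divide="ignore", invalid="ignore"):
    loga = np.log(a)  # nan for negative inputs, -inf for zero

# Boolean mask of entries that are nan or +/-inf; it can index both arrays directly.
bad = ~np.isfinite(loga)
loga[bad] = a[bad]
print(loga)
```

One caveat: an input of +inf also produces a non-finite logarithm, so it would be replaced by the original value as well.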

Probably not the answer you're looking for but I'll just put this here:
import numpy as np
rows, cols = array.shape  # array is assumed to be a 2-D float ndarray
for i in range(rows):
    for j in range(cols):
        if array[i, j] > 0:
            array[i, j] = np.log(array[i, j])

You can vectorize a custom function.
import numpy as np
def pos_log(x):
    if x > 0:
        return np.log(x)
    return x
# np.float was removed in NumPy 1.24; the builtin float works as the output type
v_pos_log = np.vectorize(pos_log, otypes=[float])
result = v_pos_log(np.array([-1, 1]))
# array([-1.,  0.])
But as the documentation for numpy.vectorize says "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."

Related

How to divide an array by an other array element wise in numpy?

I have two arrays, and I want all the elements of one to be divided by the second. For example,
In [24]: a = np.array([1,2,3])
In [25]: b = np.array([1,2,3])
In [26]: a/b
Out[26]: array([1., 1., 1.])
In [27]: 1/b
Out[27]: array([1. , 0.5 , 0.33333333])
This is not the result I want. I want every element of a divided by each element of b, like this:
In [28]: c = []
In [29]: for i in a:
...: c.append(i/b)
...:
In [30]: c
Out[30]:
[array([1. , 0.5 , 0.33333333]),
array([2. , 1. , 0.66666667]),
array([3. , 1.5 , 1. ])]
In [34]: np.array(c)
Out[34]:
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
But I don't like the for loop; it is too slow for big data. Is there a function in the numpy package, or any other faster way, to solve this problem?
This is simple to do in pure numpy: you can use broadcasting to calculate the outer product (or any other outer operation) of two vectors:
import numpy as np
a = np.arange(1, 4)
b = np.arange(1, 4)
c = a[:,np.newaxis] / b
# array([[1. , 0.5 , 0.33333333],
# [2. , 1. , 0.66666667],
# [3. , 1.5 , 1. ]])
This works, since a[:,np.newaxis] increases the dimension of the (3,) shaped array a into a (3, 1) shaped array, which can be used for the desired broadcasting operation.
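As a hedged alternative to the explicit np.newaxis, every binary NumPy ufunc has an outer method that applies the operation to all pairs of elements. A short sketch with the same a and b:

```python
import numpy as np

a = np.arange(1, 4)
b = np.arange(1, 4)

# c[i, j] == a[i] / b[j] for every pair (i, j)
c = np.divide.outer(a, b)
print(c)

# Same result as the broadcasting version
print(np.array_equal(c, a[:, np.newaxis] / b))
```

Both forms avoid the Python-level loop; which one reads better is largely a matter of taste.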
First, reshape a into a 2D column, then repeat it along the axis you want to loop over so that it has the same shape as the output. Vectorized division will then work.
>>> a.reshape(-1,1)
array([[1],
[2],
[3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1) / b
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
# Transpose will let you do it the other way around, but then you just get 1 for everything
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T / b
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
This should do the job:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(a.reshape(-1, 1) / b)
Output:
[[ 1. 0.5 0.33333333]
[ 2. 1. 0.66666667]
[ 3. 1.5 1. ]]

Assign 1d numpy ndarray into columns of a 2d array

Assume dst is an ndarray with shape (5, N), and ramp is an ndarray with shape (5,). (In this case, N = 2):
>>> dst = np.zeros((5, 2))
>>> dst
array([[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.]])
>>> ramp = np.linspace(1.0, 2.0, 5)
>>> ramp
array([1. , 1.25, 1.5 , 1.75, 2. ])
Now I'd like to copy ramp into the columns of dst, resulting in this:
>>> dst
array([[1.  , 1.  ],
       [1.25, 1.25],
       [1.5 , 1.5 ],
       [1.75, 1.75],
       [2.  , 2.  ]])
I didn't expect this to work, and it doesn't:
>>> dst[:] = ramp
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (5) into shape (5,2)
This works, but I'm certain there's a more "numpyesque" way to accomplish this:
>>> dst[:] = ramp.repeat(dst.shape[1]).reshape(dst.shape)
>>> dst
array([[1. , 1. ],
[1.25, 1.25],
[1.5 , 1.5 ],
[1.75, 1.75],
[2. , 2. ]])
Any ideas?
note
Unlike "Cloning" row or column vectors, I want to assign ramp into dst (or even a subset of dst). In addition, the solution given there uses a python array as the source, not an ndarray, and thus requires calls to .transpose, etc.
Method 1: Use broadcasting:
As the OP mentioned in a comment, broadcasting works on assignment too:
dst[:] = ramp[:,None]
Method 2: Use column_stack
N = dst.shape[1]
dst[:] = np.column_stack([ramp.tolist()]*N)
Out[479]:
array([[1. , 1. ],
[1.25, 1.25],
[1.5 , 1.5 ],
[1.75, 1.75],
[2. , 2. ]])
Method 3: use np.tile
N = dst.shape[1]
dst[:] = np.tile(ramp[:,None], (1,N))
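The question also asks about assigning into a subset of dst. A small sketch of the broadcasting approach (Method 1) applied to a column slice; the (5, 4) shape and the 1:3 slice are assumptions chosen for illustration:

```python
import numpy as np

dst = np.zeros((5, 4))
ramp = np.linspace(1.0, 2.0, 5)

# ramp[:, np.newaxis] has shape (5, 1), which broadcasts across the
# two selected columns of the (5, 2) target slice.
dst[:, 1:3] = ramp[:, np.newaxis]
print(dst)
```

Columns 0 and 3 stay at zero, while columns 1 and 2 each receive a copy of ramp.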

What does the rcond parameter of numpy.linalg.pinv do?

While looking up how to calculate pseudo-inverses in numpy (1.15.4) I noticed that numpy.linalg.pinv has a parameter rcond for which the description reads:
rcond : (…) array_like of float
Cutoff for small singular values. Singular values smaller (in
modulus) than rcond * largest_singular_value (again, in modulus)
are set to zero. Broadcasts against the stack of matrices
From my understanding, if rcond is a scalar float, all entries in the output of pinv which would have been smaller than rcond should be set to zero instead (which would be really useful), but this is not what happens, e.g.:
>>> A = np.array([[ 0., 0.3, 1., 0.],
...               [ 0., 0.4, -0.3, 0.],
...               [ 0., 1., -0.1, 0.]])
>>> np.linalg.pinv(A, rcond=1e-3)
array([[ 8.31963531e-17, -4.52584594e-17, -5.09901252e-17],
[ 1.82668420e-01, 3.39032588e-01, 8.09586439e-01],
[ 8.95805933e-01, -2.97384188e-01, -1.49788105e-01],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])
What does this parameter actually do? And can I only get the behaviour I actually want by iterating over the whole output matrix again?
Under the hood, a pseudoinverse is calculated using a singular value decomposition. An initial matrix A=UDV^T is inverted as A^+=VD^+U^T, where D is a diagonal matrix with non-negative real values (the singular values) and D^+ holds the reciprocals of the non-zero ones. rcond is used to zero out small entries in D before they are inverted. For example:
import numpy as np
# Initial matrix
a = np.array([[1, 0],
[0, 0.1]])
# SVD with diagonal entries in D = [1. , 0.1]
print(np.linalg.svd(a))
# (array([[1., 0.],
# [0., 1.]]),
# array([1. , 0.1]),
# array([[1., 0.],
# [0., 1.]]))
# Pseudoinverse
c = np.linalg.pinv(a)
print(c)
# [[ 1. 0.]
# [ 0. 10.]]
# Reconstruction is perfect
print(np.dot(a, np.dot(c, a)))
# [[1. 0. ]
# [0. 0.1]]
# Zero out all entries in D below rcond * largest_singular_value = 0.2 * 1
# Not entries of the initial or inverse matrices!
d = np.linalg.pinv(a, rcond=0.2)
print(d)
# [[1. 0.]
# [0. 0.]]
# Reconstruction is imperfect
print(np.dot(a, np.dot(d, a)))
# [[1. 0.]
# [0. 0.]]
To just zero out small values of a matrix (comparing absolute values, so that small negative entries are zeroed too):
a = np.array([[1, 2],
[3, 0.1]])
a[np.abs(a) < 0.5] = 0
print(a)
# [[1. 2.]
# [3. 0.]]
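To make the role of rcond concrete, the cutoff can be reproduced by hand from the SVD. This is only an illustrative sketch for real-valued matrices, not NumPy's actual implementation; the function name pinv_via_svd, its default rcond, and the tolerance tol are assumptions introduced here:

```python
import numpy as np

def pinv_via_svd(a, rcond=1e-15):
    """Pseudoinverse of a real matrix with an explicit singular-value cutoff."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    cutoff = rcond * s.max()
    # Invert only the singular values above the cutoff; the rest become 0.
    s_inv = np.zeros_like(s)
    keep = s > cutoff
    s_inv[keep] = 1.0 / s[keep]
    # A^+ = V D^+ U^T
    return vt.T @ (s_inv[:, np.newaxis] * u.T)

A = np.array([[0., 0.3, 1., 0.],
              [0., 0.4, -0.3, 0.],
              [0., 1., -0.1, 0.]])

P = pinv_via_svd(A, rcond=1e-3)
print(np.allclose(P, np.linalg.pinv(A, rcond=1e-3)))

# Cleaning up tiny entries of the *result* is a separate, explicit step.
tol = 1e-10
P[np.abs(P) < tol] = 0.0
print(P)
```

The final masking step is what the question was hoping rcond would do; it needs no explicit iteration over the output matrix.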

numpy: multiply arbitrary shape array along first axis

I want to multiply an array along its first axis by some vector.
For instance, if a is 2D, b is 1D, and a.shape[0] == b.shape[0], we can do:
a *= b[:, np.newaxis]
What if a has an arbitrary shape? In numpy, the ellipsis "..." can be interpreted as "fill the remaining indices with ':'". Is there an equivalent for filling the remaining axes with None/np.newaxis?
The code below generates the desired result, but I would prefer a general vectorized way to accomplish this without falling back to a for loop.
from __future__ import print_function
import numpy as np
def foo(a, b):
    """
    Multiply a along its first axis by b
    """
    if len(a.shape) == 1:
        a *= b
    elif len(a.shape) == 2:
        a *= b[:, np.newaxis]
    elif len(a.shape) == 3:
        a *= b[:, np.newaxis, np.newaxis]
    else:
        n = a.shape[0]
        for i in range(n):
            a[i, ...] *= b[i]
n = 10
b = np.arange(n)
a = np.ones((n, 3))
foo(a, b)
print(a)
a = np.ones((n, 3, 3))
foo(a, b)
print(a)
Just reverse the order of the axes:
transpose = a.T
transpose *= b
a.T is a transposed view of a, where "transposed" means reversing the order of the dimensions for arbitrary-dimensional a. We assign a.T to a separate variable so the *= doesn't try to set the a.T attribute; the results still apply to a, since the transpose is a view.
Demo:
In [55]: a = numpy.ones((2, 2, 3))
In [56]: a
Out[56]:
array([[[1., 1., 1.],
[1., 1., 1.]],
[[1., 1., 1.],
[1., 1., 1.]]])
In [57]: transpose = a.T
In [58]: transpose *= [2, 3]
In [59]: a
Out[59]:
array([[[2., 2., 2.],
[2., 2., 2.]],
[[3., 3., 3.],
[3., 3., 3.]]])
Following the idea of the accepted answer, you could skip the variable assignment to the transpose as follows:
arr = np.tile(np.arange(10, dtype=float), 3).reshape(3, 10)
print(arr)
factors = np.array([0.1, 1, 10])
arr.T[:, :] *= factors
print(arr)
Which would print
[[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]]
[[ 0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9]
[ 0. 1. 2. 3. 4. 5. 6. 7. 8. 9. ]
[ 0. 10. 20. 30. 40. 50. 60. 70. 80. 90. ]]
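Another option, closer to the "fill the remaining axes with np.newaxis" idea from the question, is to reshape b so that it has one trailing length-1 axis for every extra dimension of a. A minimal sketch; scale_first_axis is a hypothetical helper name introduced here:

```python
import numpy as np

def scale_first_axis(a, b):
    """Multiply a in place along its first axis by the 1-D vector b."""
    b = np.asarray(b)
    # Shape (n, 1, ..., 1) with a.ndim - 1 trailing singleton axes,
    # so b broadcasts against every remaining axis of a.
    a *= b.reshape((-1,) + (1,) * (a.ndim - 1))
    return a

a = np.ones((2, 2, 3))
scale_first_axis(a, [2, 3])
print(a)
```

Unlike the transpose approach, this leaves the axis order untouched, which some readers may find easier to follow.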

Manipulating an array in python

I have a numpy array that is obtained by reading an image.
data=band.ReadAsArray(0,0,rows,cols)
The problem is that manipulating the data with loops takes around 13 minutes. How can I reduce this time? Is there another solution?
Sample code:
for i in range(rows):
    for j in range(cols):
        if data[i][j]>1 and data[i][j]<30:
            data[i][j]=255
        elif data[i][j]<1:
            data[i][j]=0
        else:
            data[i][j]=1
It takes too long. Is there a shorter method?
With numpy you can use a mask to select all elements with a certain condition, as shown in the code example below:
import numpy as np
a = np.random.random((5,5))
a[a<0.5] = 0.0
print(a)
# [[ 0. 0.94925686 0.8946333 0.51562938 0.99873065]
# [ 0. 0. 0. 0. 0. ]
# [ 0.86719795 0. 0.8187514 0. 0.72529116]
# [ 0.6036299 0.9463493 0.78283466 0.6516331 0.84991734]
# [ 0.72939806 0.85408697 0. 0.59062025 0.6704499 ]]
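To rewrite your code this way, compute the masks before modifying the array, and parenthesize each comparison, because & binds more tightly than > and <:
data=band.ReadAsArray(0,0,rows,cols)
low = data < 1
mid = (data > 1) & (data < 30)
data[...] = 1
data[low] = 0
data[mid] = 255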
Instead of looping, you can assign using a boolean array to select the values you're interested in changing. For example, if we have an array
>>> a = np.array([[0.1, 0.5, 1], [10, 20, 30], [40, 50, 60]])
>>> a
array([[ 0.1, 0.5, 1. ],
[ 10. , 20. , 30. ],
[ 40. , 50. , 60. ]])
We can apply your logic with something like
>>> anew = np.empty_like(a)
>>> anew.fill(1)
>>> anew[a < 1] = 0
>>> anew[(a > 1) & (a < 30)] = 255
>>> anew
array([[ 0., 0., 1.],
[ 255., 255., 1.],
[ 1., 1., 1.]])
This works because of how numpy indexing works:
>>> a < 1
array([[ True, True, False],
[False, False, False],
[False, False, False]], dtype=bool)
>>> anew[a < 1]
array([ 0., 0.])
Note: we don't really need anew-- you can act on a itself -- but then you have to be careful about the order you apply things in case your conditions and the target values overlap.
Note #2: your conditions mean that if there's an element of the array which is exactly 30, or anything greater, it will become 1, and not 255. That seems a little odd, but it's what your code does, so I reproduced it.
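The same three-way logic can also be written as a single expression with np.select, which takes a list of conditions, a matching list of values, and a default for everything else. A hedged sketch using the example array from above:

```python
import numpy as np

a = np.array([[0.1, 0.5, 1], [10, 20, 30], [40, 50, 60]])

# The first matching condition wins; elements matching none get the default.
result = np.select(
    [(a > 1) & (a < 30), a < 1],
    [255, 0],
    default=1,
)
print(result)
```

Because the result is a new array, the order-of-assignment concern mentioned in the first note does not arise.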
