Assign 1d numpy ndarray into columns of a 2d array - python

Assume dst is an ndarray with shape (5, N), and ramp is an ndarray with shape (5,). (In this case, N = 2):
>>> dst = np.zeros((5, 2))
>>> dst
array([[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.],
[0., 0.]])
>>> ramp = np.linspace(1.0, 2.0, 5)
>>> ramp
array([1. , 1.25, 1.5 , 1.75, 2. ])
Now I'd like to copy ramp into the columns of dst, resulting in this:
>>> dst
array([[1. , 1. ],
[1.25, 1.25],
[1.5 , 1.5 ],
[1.75, 1.75],
[2. , 2. ]])
I didn't expect this to work, and it doesn't:
>>> dst[:] = ramp
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (5) into shape (5,2)
This works, but I'm certain there's a more "numpyesque" way to accomplish this:
>>> dst[:] = ramp.repeat(dst.shape[1]).reshape(dst.shape)
>>> dst
array([[1. , 1. ],
[1.25, 1.25],
[1.5 , 1.5 ],
[1.75, 1.75],
[2. , 2. ]])
Any ideas?
note
Unlike "Cloning" row or column vectors, I want to assign ramp into dst (or even a subset of dst). In addition, the solution given there uses a python array as the source, not an ndarray, and thus requires calls to .transpose, etc.

Method 1: Use broadcasting:
As the OP mentioned in the comments, broadcasting works on assignment too:
dst[:] = ramp[:,None]
Method 2: Use column_stack
N = dst.shape[1]
dst[:] = np.column_stack([ramp.tolist()]*N)
Out[479]:
array([[1. , 1. ],
[1.25, 1.25],
[1.5 , 1.5 ],
[1.75, 1.75],
[2. , 2. ]])
Method 3: use np.tile
N = dst.shape[1]
dst[:] = np.tile(ramp[:,None], (1,N))
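Since the question also mentions assigning into a subset of dst, here is a minimal sketch of the broadcasting approach applied to a column slice. The wider (5, 4) shape and the slice 1:3 are assumptions chosen for illustration; they do not come from the question.

```python
import numpy as np

ramp = np.linspace(1.0, 2.0, 5)   # shape (5,)
dst = np.zeros((5, 4))            # illustrative shape; the question uses (5, 2)

# ramp[:, None] has shape (5, 1), so it broadcasts across the selected columns
dst[:, 1:3] = ramp[:, None]

# np.broadcast_to builds a read-only (5, 2) view of ramp without copying it
view = np.broadcast_to(ramp[:, None], (5, 2))
```

Columns 0 and 3 of dst stay at zero, and only the selected columns receive the ramp.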

Related

Discretize only certain arrays in a tensor with TensorFlow

I have the following array:
import numpy as np
import tensorflow as tf
input = np.array([[-1.5, 1.0, 3.4, .5], [0.0, 3.0, 1.3, 0.0]])
layer = tf.keras.layers.Discretization(num_bins=2, epsilon=0.01)
layer.adapt(input)
layer(input)
<tf.Tensor: shape=(2, 4), dtype=int64, numpy=
array([[0, 1, 1, 1],
[0, 1, 1, 0]])>
This discretizes the whole tensor. I would like to know if there is a way through which I can just discretize the second array in the tensor.
We can create a mask based on the index of the array that needs to be discretized:
def get_mask(x, array_index):
    # Start from an all-ones mask with the same shape as x
    mask = tf.Variable(tf.ones_like(x, dtype=tf.float32))
    indices = tf.Variable(array_index, dtype=tf.int32)
    # One row of zeros for each row index that should be discretized
    updates = tf.Variable(tf.zeros((indices.shape[0], mask.shape[1])), dtype=tf.float32)
    return tf.compat.v1.scatter_nd_update(mask, indices, updates)
And calling
mask = get_mask(input, np.array([[1]])) # second array
returns the mask:
array([[1., 1., 1., 1.],
[0., 0., 0., 0.]])
Then we can apply the mask: tf.cast(layer(input), tf.float32) * (1-mask) + input*mask, which returns:
array([[-1.5, 1. , 3.4, 0.5],
[ 0. , 1. , 1. , 0. ]])
The above should work for any array and any array index to discretize.
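The blending step can be checked without TensorFlow. The sketch below uses plain NumPy and hard-codes the discretized values shown in the question's output, so it only illustrates the mask arithmetic, not the Discretization layer itself.

```python
import numpy as np

x = np.array([[-1.5, 1.0, 3.4, 0.5], [0.0, 3.0, 1.3, 0.0]])

# Discretized values copied from the layer output shown in the question
discretized = np.array([[0, 1, 1, 1], [0, 1, 1, 0]], dtype=float)

# 1 keeps the original row, 0 selects the discretized row
mask = np.ones_like(x)
mask[1] = 0.0

blended = discretized * (1 - mask) + x * mask
```

Row 0 of blended keeps the original values and row 1 takes the discretized ones, which matches the result quoted in the answer.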

How to divide an array by another array element wise in numpy?

I have two arrays, and I want all the elements of one to be divided by the second. For example,
In [24]: a = np.array([1,2,3])
In [25]: b = np.array([1,2,3])
In [26]: a/b
Out[26]: array([1., 1., 1.])
In [27]: 1/b
Out[27]: array([1. , 0.5 , 0.33333333])
This is not the answer I want. The output I want looks like this (every element of a is divided by the whole of b):
In [28]: c = []
In [29]: for i in a:
...:     c.append(i/b)
...:
In [30]: c
Out[30]:
[array([1. , 0.5 , 0.33333333]),
array([2. , 1. , 0.66666667]),
array([3. , 1.5 , 1. ])]
In [34]: np.array(c)
Out[34]:
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
But I don't like the for loop; it's too slow for big data. Is there a function included in the numpy package, or any other faster way, to solve this problem?
This is simple to do in pure numpy: you can use broadcasting to calculate the outer product (or any other outer operation, such as the division here) of two vectors:
import numpy as np
a = np.arange(1, 4)
b = np.arange(1, 4)
c = a[:,np.newaxis] / b
# array([[1. , 0.5 , 0.33333333],
# [2. , 1. , 0.66666667],
# [3. , 1.5 , 1. ]])
This works, since a[:,np.newaxis] increases the dimension of the (3,) shaped array a into a (3, 1) shaped array, which can be used for the desired broadcasting operation.
First you need to cast a into a 2D array (same shape as the output), then repeat for the dimension you want to loop over. Then vectorized division will work.
>>> a.reshape(-1,1)
array([[1],
[2],
[3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1)
array([[1, 1, 1],
[2, 2, 2],
[3, 3, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1) / b
array([[1. , 0.5 , 0.33333333],
[2. , 1. , 0.66666667],
[3. , 1.5 , 1. ]])
# Transpose will let you do it the other way around, but then you just get 1 for everything
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
>>> a.reshape(-1,1).repeat(b.shape[0], axis=1).T / b
array([[1., 1., 1.],
[1., 1., 1.],
[1., 1., 1.]])
This should do the job:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1, 2, 3])
print(a.reshape(-1, 1) / b)
Output:
[[ 1. 0.5 0.33333333]
[ 2. 1. 0.66666667]
[ 3. 1.5 1. ]]
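As a complement to the broadcasting answers above, every NumPy ufunc also has an outer method, so the same table can be written without reshaping. This is a minimal sketch using the a and b from the question.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 3])

# c[i, j] == a[i] / b[j]; np.divide is true division, so the result is float
c = np.divide.outer(a, b)

# Same result through explicit broadcasting
c_broadcast = a[:, np.newaxis] / b
```

Both forms avoid the Python loop; which one reads better is mostly a matter of taste.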

What does the rcond parameter of numpy.linalg.pinv do?

While looking up how to calculate pseudo-inverses in numpy (1.15.4) I noticed that numpy.linalg.pinv has a parameter rcond for which the description reads:
rcond : (…) array_like of float
Cutoff for small singular values. Singular values smaller (in
modulus) than rcond * largest_singular_value (again, in modulus)
are set to zero. Broadcasts against the stack of matrices
From my understanding, if rcond is a scalar float, all entries in the output of pinv which would have been smaller than rcond should be set to zero instead (which would be really useful), but this is not what happens, e.g.:
>>> A = np.array([[ 0., 0.3, 1., 0.],
[ 0., 0.4, -0.3, 0.],
[ 0., 1., -0.1, 0.]])
>>> np.linalg.pinv(A, rcond=1e-3)
array([[ 8.31963531e-17, -4.52584594e-17, -5.09901252e-17],
[ 1.82668420e-01, 3.39032588e-01, 8.09586439e-01],
[ 8.95805933e-01, -2.97384188e-01, -1.49788105e-01],
[ 0.00000000e+00, 0.00000000e+00, 0.00000000e+00]])
What does this parameter actually do? And can I only get the behaviour I actually want by iterating over the whole output matrix again?
Under the hood, a pseudoinverse is calculated using a singular value decomposition. An initial matrix A=UDV^T is inverted as A^+=VD^+U^T, where D is a diagonal matrix with non-negative real values (the singular values). rcond is used to zero out small entries in D. For example:
import numpy as np
# Initial matrix
a = np.array([[1, 0],
[0, 0.1]])
# SVD with diagonal entries in D = [1. , 0.1]
print(np.linalg.svd(a))
# (array([[1., 0.],
# [0., 1.]]),
# array([1. , 0.1]),
# array([[1., 0.],
# [0., 1.]]))
# Pseudoinverse
c = np.linalg.pinv(a)
print(c)
# [[ 1. 0.]
# [ 0. 10.]]
# Reconstruction is perfect
print(np.dot(a, np.dot(c, a)))
# [[1. 0. ]
# [0. 0.1]]
# Zero out all entries in D below rcond * largest_singular_value = 0.2 * 1
# Not entries of the initial or inverse matrices!
d = np.linalg.pinv(a, rcond=0.2)
print(d)
# [[1. 0.]
# [0. 0.]]
# Reconstruction is imperfect
print(np.dot(a, np.dot(d, a)))
# [[1. 0.]
# [0. 0.]]
To just zero out small values of a matrix:
a = np.array([[1, 2],
[3, 0.1]])
a[np.abs(a) < 0.5] = 0 # compare magnitudes so that large negative entries are kept
print(a)
# [[1. 2.]
# [3. 0.]]
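To make the role of rcond concrete, here is a rough sketch that rebuilds the pseudoinverse from the SVD by hand and compares it with np.linalg.pinv. The helper name pinv_via_svd is hypothetical, and the 1e-10 tolerance used to clean the output is an arbitrary choice for illustration.

```python
import numpy as np

def pinv_via_svd(a, rcond=1e-15):
    """Illustrative pseudoinverse: drop singular values <= rcond * largest."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    cutoff = rcond * s.max()
    keep = s > cutoff
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]          # invert only the retained singular values
    return vt.T @ (s_inv[:, None] * u.T)

A = np.array([[0., 0.3, 1., 0.],
              [0., 0.4, -0.3, 0.],
              [0., 1., -0.1, 0.]])

manual = pinv_via_svd(A, rcond=1e-3)
reference = np.linalg.pinv(A, rcond=1e-3)

# Zeroing tiny entries of the *output* is a separate, explicit step
cleaned = np.where(np.abs(reference) < 1e-10, 0.0, reference)
```

The cutoff acts on the singular values before inversion, which is why tiny entries such as 8.3e-17 can still appear in the result until they are thresholded separately.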

Understanding axes in NumPy

I was going through NumPy documentation, and am not able to understand one point. It mentions, for the example below, the array has rank 2 (it is 2-dimensional). The first dimension (axis) has a length of 2, the second dimension has a length of 3.
[[ 1., 0., 0.],
[ 0., 1., 2.]]
How does the first dimension (axis) have a length of 2?
Edit:
The reason for my confusion is the below statement in the documentation.
The coordinates of a point in 3D space [1, 2, 1] is an array of rank
1, because it has one axis. That axis has a length of 3.
In the original 2D ndarray, I assumed that the number of lists identifies the rank/dimension, and I wrongly assumed that the length of each list denotes the length of each dimension (in that order). So, as per my understanding, the first dimension should be having a length of 3, since the length of the first list is 3.
In numpy, axis ordering follows zyx convention, instead of the usual (and maybe more intuitive) xyz.
Visually, it means that for a 2D array where the horizontal axis is x and the vertical axis is y:
x -->
y 0 1 2
| 0 [[1., 0., 0.],
V 1 [0., 1., 2.]]
The shape of this array is (2, 3) because it is ordered (y, x), with the first axis y of length 2.
And verifying this with slicing:
import numpy as np
a = np.array([[1, 0, 0], [0, 1, 2]], dtype=float) # np.float was removed in NumPy 1.24
>>> a
Out[]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
>>> a[0, :] # Slice index 0 of first axis
Out[]: array([ 1., 0., 0.]) # Get values along second axis `x` of length 3
>>> a[:, 2] # Slice index 2 of second axis
Out[]: array([ 0., 2.]) # Get values along first axis `y` of length 2
You may be confusing that sentence with the example below. Think of it like this: the rank is the number of nesting levels in the list (array), and the length in your question is the number of 'things' in the list (array) at a given level.
I think they are trying to describe the definition of shape, which in this case is (2, 3).
In that post, I think the key sentence is this:
In NumPy dimensions are called axes. The number of axes is rank.
If you print the numpy array
print(np.array([[1., 0., 0.], [0., 1., 2.]]))
You'll get the following output
#col1 col2 col3
[[ 1. 0. 0.] # row 1
[ 0. 1. 2.]] # row 2
Think of it as a 2 by 3 matrix: 2 rows, 3 columns. It is a 2d array because it is a list of lists (the [[ at the start is a hint that it's 2d).
The 2d numpy array
np.array([[1., 0., 0., 6.], [0., 1., 2., 7.], [3., 4., 5., 8.]])
would print as
#col1 col2 col3 col4
[[1. 0. 0. 6.] # row 1
[0. 1. 2. 7.] # row 2
[3. 4. 5. 8.]] # row 3
This is a 3 by 4 2d array (3 rows, 4 columns)
The length of the first dimension is the length of the array itself:
In [11]: a = np.array([[ 1., 0., 0.], [ 0., 1., 2.]])
In [12]: a
Out[12]:
array([[ 1., 0., 0.],
[ 0., 1., 2.]])
In [13]: len(a) # "length of first dimension"
Out[13]: 2
The second is the length of each "row":
In [14]: [len(aa) for aa in a] # 3 is "length of second dimension"
Out[14]: [3, 3]
Many numpy functions take axis as an argument, for example you can sum over an axis:
In [15]: a.sum(axis=0)
Out[15]: array([ 1., 1., 2.])
In [16]: a.sum(axis=1)
Out[16]: array([ 1., 3.])
The thing to note is that you can have higher dimensional arrays:
In [21]: b = np.array([[[1., 0., 0.], [ 0., 1., 2.]]])
In [22]: b
Out[22]:
array([[[ 1., 0., 0.],
[ 0., 1., 2.]]])
In [23]: b.sum(axis=2)
Out[23]: array([[ 1., 3.]])
Keep the following points in mind when considering Numpy axes:
Each sub-level of a list (or array) represents an axis. For example:
import numpy as np
a = np.array([1,2]) # 1 axis
b = np.array([[1,2],[3,4]]) # 2 axes
c = np.array([[[1,2],[3,4]],[[5,6],[7,8]]]) # 3 axes
Axis labels correspond to the level of the sub-list they represent, starting with axis 0 for the outer most list.
To illustrate this, consider the following arrays of different shapes, each with 24 elements:
# 1D Array
a0 = np.array(
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
)
a0.shape # (24,) - here, the length along the 0-axis is 24
# 2D Array
a01 = np.array(
[
[1.1, 1.2, 1.3, 1.4],
[2.1, 2.2, 2.3, 2.4],
[3.1, 3.2, 3.3, 3.4],
[4.1, 4.2, 4.3, 4.4],
[5.1, 5.2, 5.3, 5.4],
[6.1, 6.2, 6.3, 6.4]
]
)
a01.shape # (6, 4) - now, the length along the 0-axis is 6
# 3D Array (each value encodes its position: 321 is block 3, row 2, column 1)
a012 = np.array(
    [
        [
            [111, 112],
            [121, 122],
            [131, 132]
        ],
        [
            [211, 212],
            [221, 222],
            [231, 232]
        ],
        [
            [311, 312],
            [321, 322],
            [331, 332]
        ],
        [
            [411, 412],
            [421, 422],
            [431, 432]
        ]
    ]
)
a012.shape # (4, 3, 2) - and finally, the length along the 0-axis is 4
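One way to check which axis is which is to reduce over each axis in turn and look at the resulting shape. The sketch below assumes a (4, 3, 2) array built with np.arange rather than the hand-written values above.

```python
import numpy as np

a = np.arange(24).reshape(4, 3, 2)

# Summing over an axis removes that axis from the shape
shapes = {axis: a.sum(axis=axis).shape for axis in range(a.ndim)}
# shapes[0] is (3, 2), shapes[1] is (4, 2), shapes[2] is (4, 3)

# keepdims=True keeps the reduced axis with length 1 instead of dropping it
kept = a.sum(axis=1, keepdims=True).shape
```

The axis passed to the reduction is the one that disappears, so axis=0 always refers to the outermost level of nesting.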

Apply logarithm only on positive entries of array

SciPy thoughtfully provides the scipy.log function, which will take an array and then log all elements in that array. Is there a way to log only the positive (i.e. positive non-zero) elements of an array?
What about where()?
import numpy as np
a = np.array([ 1., -1., 0.5, -0.5, 0., 2. ])
la = np.where(a>0, np.log(a), a)
print(la)
# Gives [ 0. -1. -0.69314718 -0.5 0. 0.69314718]
With boolean indexing:
In [695]: a = np.array([ 1. , -1. , 0.5, -0.5, 0. , 2. ])
In [696]: I=a>0
In [697]: a[I]=np.log(a[I])
In [698]: a
Out[698]:
array([ 0. , -1. , -0.69314718, -0.5 , 0. ,
0.69314718])
or if you just want to keep the logged terms
In [707]: np.log(a[I])
Out[707]: array([ 0. , -0.69314718, 0.69314718])
Here's a vectorized solution that keeps the original array and leaves non-positive values unchanged:
In [1]: import numpy as np
In [2]: a = np.array([ 1., -1., 0.5, -0.5, 0., 2. ])
In [3]: loga = np.log(a)
In [4]: loga
Out[4]: array([ 0., nan, -0.69314718, nan, -inf, 0.69314718 ])
In [5]: # Remove nasty nanses and infses
In [6]: loga[np.where(~np.isfinite(loga))] = a[np.where(~np.isfinite(loga))]
In [7]: loga
Out[7]: array([ 0., -1., -0.69314718, -0.5, 0., 0.69314718])
Here, np.where(~np.isfinite(loga)) returns the indexes of non-finite entries in the loga array, and we replace these values with the corresponding originals from a.
Probably not the answer you're looking for but I'll just put this here:
rows, cols = array.shape # array is assumed to be a 2D float ndarray
for i in range(rows):
    for j in range(cols):
        if array[i, j] > 0:
            array[i, j] = np.log(array[i, j])
You can vectorize a custom function.
import numpy as np
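def pos_log(x):
    # Take the log of positive values and pass everything else through
    if x > 0:
        return np.log(x)
    return x
v_pos_log = np.vectorize(pos_log, otypes=[float]) # np.float was removed in NumPy 1.24
result = v_pos_log(np.array([-1, 1]))
# result: array([-1., 0.])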
But, as the documentation for numpy.vectorize says: "The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop."
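Another option, not mentioned in the answers above, is the where and out arguments that NumPy ufuncs accept. In this minimal sketch, np.log is evaluated only where the condition holds, and the remaining entries keep whatever was already in the output buffer (here, a copy of a).

```python
import numpy as np

a = np.array([1., -1., 0.5, -0.5, 0., 2.])

# Entries where the condition is False keep the value already stored in `out`
result = np.log(a, out=a.copy(), where=a > 0)
```

Because the logarithm is never computed for non-positive entries, this avoids the nan and -inf values (and the accompanying runtime warnings) that appear when np.log is applied to the whole array first.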
