Numpy array of rank > 1 and one of dimensions == 0

I am implementing a function that reads data from file into a multi-dimensional numpy array. Data is regularly structured in sense of dimension lengths, however, some dimensions may be missing, in which case, I would let the length of that dimension be 0. So I have stumbled upon this behavior:
In [1]: np.random.random((3,3))
Out[1]:
array([[ 0.59756568,  0.47198749,  0.23442854],
       [ 0.29374254,  0.58289927,  0.40497268],
       [ 0.00481053,  0.63471263,  0.90053086]])
In [2]: np.random.random((0,3,3))
Out[2]: array([], shape=(0, 3, 3), dtype=float64)
OK, so I get an empty array. That makes sense if I view the 2nd and 3rd dimensions as nested inside the 1st: the 1st is empty, so the whole array is empty. However, I would expect np.random.random((3,3,0)) to be equivalent to np.random.random((3,3)). Instead,
In [3]: np.random.random((3,3,0))
Out[3]: array([], shape=(3, 3, 0), dtype=float64)
An empty array again.
Is this expected behavior? I understand the difference between arrays of shape (3,3), (3,3,1), and (1,3,3), but I am looking for an explanation of why a dimension of length 0 degenerates the whole array rather than just that dimension. Is it just me, or is this one of those Python/numpy WTFs?

As I state in a comment, you are getting an empty array because the size of an array is always zero if any of the dimensions are zero. Can I ask what you are trying to do? If you want an empty 3rd dimension you can try something like the following:
>>> x = numpy.random.random((3,3))
>>> y = x[..., numpy.newaxis]
>>> y
array([[[ 0.92418241],
        [ 0.76716579],
        [ 0.82485034]],

       [[ 0.30571695],
        [ 0.71012271],
        [ 0.54609355]],

       [[ 0.98192734],
        [ 0.25505518],
        [ 0.75473749]]])
>>> y.shape
(3, 3, 1)
>>> x.shape
(3, 3)
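The "size is the product of the dimensions" point from the answer can be checked directly. A minimal sketch (the variable names are mine) showing why any zero-length dimension empties the whole array, while a length-1 dimension keeps all the data:

```python
import numpy as np

# The number of elements is the product of all dimension lengths,
# so a single zero-length dimension forces the size to 0.
a = np.random.random((3, 3, 0))
print(a.size)                       # 3 * 3 * 0 -> 0
print(a.size == np.prod(a.shape))   # True, by definition

# A length-1 trailing dimension, by contrast, keeps all the data.
b = np.random.random((3, 3, 1))
print(b.size)                       # 3 * 3 * 1 -> 9
```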

Related

python, tensorflow, how to get a tensor shape with half the features

I need the shape of a tensor, except instead of feature_size as the -1 dimension I need feature_size//2
The code I'm currently using is
_, half_output = tf.split(output,2,axis=-1)
half_shape = tf.shape(half_output)
This works, but it's incredibly inelegant. I don't need an extra copy of half the tensor; I just need its shape. I've tried other approaches, but nothing besides this clumsy solution has worked yet.
Anyone know a simple way to do this?
A simple way to get the shape with the last value halved:
half_shape = tf.shape(output[..., 1::2])
What it does is slice output along its last dimension with step 2, starting from the second element (index 1).
The ... leaves the other dimensions untouched. As a result, output[..., 1::2] has the same dimensions as output, except for the last one, which is sampled as in the following example, ending up with half the original length.
>>> a = np.random.rand(5,5)
>>> a
array([[ 0.21553665,  0.62008421,  0.67069869,  0.74136913,  0.97809012],
       [ 0.70765302,  0.14858418,  0.47908281,  0.75706245,  0.70175868],
       [ 0.13786186,  0.23760233,  0.31895335,  0.69977537,  0.40196103],
       [ 0.7601455 ,  0.09566717,  0.02146819,  0.80189659,  0.41992885],
       [ 0.88053697,  0.33472285,  0.84303012,  0.10148065,  0.46584882]])
>>> a[..., 1::2]
array([[ 0.62008421,  0.74136913],
       [ 0.14858418,  0.75706245],
       [ 0.23760233,  0.69977537],
       [ 0.09566717,  0.80189659],
       [ 0.33472285,  0.10148065]])
This half_shape prints the following Tensor:
Tensor("Shape:0", shape=(3,), dtype=int32)
Alternatively you could get the shape of output and create the shape you want manually:
s = output.get_shape().as_list()
half_shape = tf.TensorShape(s[:-1] + [s[-1] // 2])
This half_shape prints a TensorShape showing the shape halved in the last dimension.
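As a plain-NumPy sanity check (outside any TensorFlow graph), the 1::2 slice always yields floor(n/2) elements along the last axis, so the trick also behaves sensibly for odd sizes:

```python
import numpy as np

# The 1::2 slice picks indices 1, 3, 5, ... along the last axis,
# which is exactly n // 2 elements for a length-n axis.
for n in (4, 5, 6, 7):
    a = np.zeros((3, n))
    half = a[..., 1::2]
    print(n, half.shape)   # last dimension becomes n // 2
```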

How can I initialize an empty Numpy array with a given number of dimensions?

I basically want to initialize an empty 6-tensor, like this:
a = np.array([[[[[[]]]]]])
Is there a better way than writing the brackets explicitly?
You can use empty or zeros.
For example, to create a new 2x3 array filled with zeros, use numpy.zeros(shape=(2,3)).
You can do something like np.empty(shape = [1] * (dimensions - 1) + [0]).
Example:
>>> a = np.array([[[[[[]]]]]])
>>> b = np.empty(shape = [1] * 5 + [0])
>>> a.shape == b.shape
True
What about iteratively adding rows of that rank, one at a time, using np.concatenate((a, b), axis=0)?
Don't. Creating an array iteratively is slow, since it has to create a new array at each step. Plus a and b have to match in all dimensions except the concatenation one.
np.concatenate((np.array([[[]]]),np.array([1,2,3])), axis=0)
will give you dimensions error.
The only thing you can concatenate to such an array is an array with a size-0 dimension:
In [348]: np.concatenate((np.array([[]]),np.array([[]])),axis=0)
Out[348]: array([], shape=(2, 0), dtype=float64)
In [349]: np.concatenate((np.array([[]]),np.array([[1,2]])),axis=0)
------
ValueError: all the input array dimensions except for the concatenation axis must match exactly
In [354]: np.array([[]])
Out[354]: array([], shape=(1, 0), dtype=float64)
In [355]: np.concatenate((np.zeros((1,0)),np.zeros((3,0))),axis=0)
Out[355]: array([], shape=(4, 0), dtype=float64)
To work iteratively, start with an empty list and append to it; then make the array at the end.
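That advice as a minimal sketch (the row values here are just placeholders): collect rows in a Python list and convert once at the end, instead of concatenating inside the loop:

```python
import numpy as np

rows = []                      # a plain Python list: cheap to append to
for i in range(4):
    rows.append([i, i + 1, i + 2])

result = np.array(rows)        # one conversion at the very end
print(result.shape)            # (4, 3)
```

Each list append is O(1), whereas each np.concatenate call copies the whole accumulated array.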
a = np.zeros((1,1,1,1,1,0)) could be concatenated on the last axis with an np.ones((1,1,1,1,1,n)) array.
In [363]: np.concatenate((a,np.array([[[[[[1,2,3]]]]]])),axis=-1)
Out[363]: array([[[[[[ 1., 2., 3.]]]]]])
You could directly use the ndarray constructor:
numpy.ndarray(shape=(1,) * 6)
Or the empty variant, since it seems to be more popular:
numpy.empty(shape=(1,) * 6)
This gives you an empty array, though only a 1-D one (shape (0,)), not a 6-tensor:
x = np.array([])

Why does numpy's broadcasting sometimes allow comparing arrays of different lengths?

I'm trying to understand how numpy's broadcasting affects the output of np.allclose.
>>> np.allclose([], [1.])
True
I don't see why that works, but this does not:
>>> np.allclose([], [1., 2.])
ValueError: operands could not be broadcast together with shapes (0,) (2,)
What are the rules here? I can't find anything in the numpy docs regarding empty arrays.
Broadcasting rules apply to addition as well,
In [7]: np.array([])+np.array([1.])
Out[7]: array([], dtype=float64)
In [8]: np.array([])+np.array([1.,2.])
....
ValueError: operands could not be broadcast together with shapes (0,) (2,)
Let's look at the shapes.
In [9]: np.array([]).shape,np.array([1.]).shape,np.array([1,2]).shape
Out[9]: ((0,), (1,), (2,))
(0,) and (1,) - the (1,) can be adjusted to match the shape of the other array. A size-1 dimension can be adjusted to match the other array, for example increased from 1 to 3. But here it was (apparently) adjusted from 1 to 0. I don't usually work with arrays that have a 0 dimension, but this looks like a proper generalization of the higher-dimensional rules.
Try (0,) and (1,1). The result is (1,0):
In [10]: np.array([])+np.array([[1.]])
Out[10]: array([], shape=(1, 0), dtype=float64)
(0,), (1,1) => (1,0),(1,1) => (1,0)
As for the 2nd case with shapes (0,) and (2,); there isn't any size 1 dimension to adjust, hence the error.
Shapes (0,) and (2,1) do broadcast (to (2,0)):
In [12]: np.array([])+np.array([[1.,2]]).T
Out[12]: array([], shape=(2, 0), dtype=float64)
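If your NumPy is recent enough (1.20+), these shape gymnastics can be checked without building any arrays at all: np.broadcast_shapes applies the same rules to bare shape tuples.

```python
import numpy as np

# Broadcasting applied directly to shapes, no arrays involved.
print(np.broadcast_shapes((0,), (1,)))     # (0,)   -- the 1 stretches to 0
print(np.broadcast_shapes((0,), (1, 1)))   # (1, 0)
print(np.broadcast_shapes((0,), (2, 1)))   # (2, 0)

try:
    np.broadcast_shapes((0,), (2,))        # no size-1 dimension to adjust
except ValueError as e:
    print("ValueError:", e)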
Broadcasting doesn't affect np.allclose in any other way than it affects any other function.
As in the comment by @cel, [1.] has length 1 and so can be broadcast to any other length, including 0. On the other hand, [1., 2.] has length 2 and thus cannot be broadcast.
Now why allclose([],[1.]) == True? This actually makes sense: it means that all elements in [] are close to 1.. The opposite would mean that there is at least one element in [] which is not close to 1. which is obviously False since there are no elements at all in [].
Another way to think about it is to ask yourself how you would actually code allclose():
def allclose(array, target=1.):
    for x in array:
        if not isclose(x, target):
            return False
    return True
This would return True when called with [].

why do we need np.squeeze()?

Very often, arrays are squeezed with np.squeeze(). In the documentation, it says
Remove single-dimensional entries from the shape of a.
However, I'm still wondering: why do single-dimensional entries appear in the shape of a at all? Or to put it differently: why do both a.shape = (2,1) and (2,) exist?
Besides the mathematical differences between the two things, there is the issue of predictability. If your suggestion was followed, you could at no point rely on the dimension of your array. So any expression of the form my_array[x,y] would need to be replaced by something that first checks if my_array is actually two-dimensional and did not have an implicit squeeze at some point. This would probably obfuscate code far more than the occasional squeeze, which does a clearly specified thing.
Actually, it might even be very hard to tell which axis has been removed, leading to a whole host of new problems.
In the spirit of The Zen of Python, also Explicit is better than implicit, we can also say that we should prefer explicit squeeze to implicit array conversion.
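In that explicit spirit, np.squeeze takes an axis argument, so you can state exactly which length-1 axis should go, and it refuses if the named axis is not length 1. A short sketch:

```python
import numpy as np

a = np.ones((1, 2, 1))
print(np.squeeze(a).shape)           # (2,)   -- all length-1 axes removed
print(np.squeeze(a, axis=0).shape)   # (2, 1) -- only the named axis removed

try:
    np.squeeze(a, axis=1)            # axis 1 has length 2, not 1
except ValueError as e:
    print("ValueError:", e)
```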
This helps you get rid of useless length-1 dimensions, e.g. using
[7,8,9] instead of [[[7,8,9]]]
or [[1,2,3],[4,5,6]] instead of [[[[1,2,3],[4,5,6]]]].
One example of where it matters is multiplication. Two 2-dimensional arrays multiply element by element, e.g.
>>> x = np.ones((2, 1))*2
>>> y = np.ones((2, 1))*3
>>> x.shape
(2,1)
>>> x*y
array([[ 6.],
       [ 6.]])
If you multiply a 1d array by a 2d array then the behaviour is different
>>> z = np.ones((2,))*3
>>> x*z
array([[ 6.,  6.],
       [ 6.,  6.]])
Secondly, you also might want to squeeze the earlier dimensions e.g. a.shape = (1,2,2) to a.shape = (2,2)
When you squeeze a (2,1) array, you get (2,) which works as both (2,1) and (1,2):
>>> a = np.ones(2)
>>> a.shape
(2,)
>>> a.T.shape
(2,)
>>> X = np.ones((2,2))*2
>>> np.dot(a,X)
array([ 4.,  4.])
>>> np.dot(X,a)
array([ 4.,  4.])
This cannot happen with a (2,1) array:
>>> b = np.ones((2,1))
>>> np.dot(b,X)
Traceback (most recent call last):
ValueError: shapes (2,1) and (2,2) not aligned: 1 (dim 1) != 2 (dim 0)

derivative with numpy.diff problems

I have this problem:
I have an array of 7 elements:
vector = [array([ 76.27789424]), array([ 76.06870298]), array([ 75.85016864]), array([ 75.71155968]), array([ 75.16982466]), array([ 73.08832948]), array([ 68.59935515])]
(this array is the result of a lot of operation)
now I want to calculate the derivative with numpy.diff(vector), but I know that the argument must be a numpy array.
For this, I type:
vector = numpy.array(vector)
if I print the vector, now, the result is:
[[ 76.27789424]
[ 76.06870298]
[ 75.85016864]
[ 75.71155968]
[ 75.16982466]
[ 73.08832948]
[ 68.59935515]]
but if I try to calculate the derivative, the result is [].
Can You help me, please?
Thanks a lot!
vector is a list of arrays, to get a 1-D NumPy array use a list comprehension and pass it to numpy.array:
>>> vector = numpy.array([x[0] for x in vector])
>>> numpy.diff(vector)
array([-0.20919126, -0.21853434, -0.13860896, -0.54173502, -2.08149518,
       -4.48897433])
vector = numpy.array(vector);
gives you a two-dimensional array with seven rows and one column:
>>> vector.shape
(7, 1)
The shape reads as: (length of axis 0, length of axis 1, length of axis 2, ...).
As you can see, the last axis is axis 1 and its length is 1.
from the docs
numpy.diff(a, n=1, axis=-1)
...
axis : int, optional
The axis along which the difference is taken, default is the last axis.
There is no way to take the difference of a single value. So let's use the first axis instead, which has a length of 7. Since axis counting starts at zero, the first axis is axis 0:
>>> np.diff(vector, axis=0)
array([[-0.20919126],
       [-0.21853434],
       [-0.13860896],
       [-0.54173502],
       [-2.08149518],
       [-4.48897433]])
Note that each degree of the derivative is one element shorter, so the new shape is (7-1, 1), which is (6, 1). Let's verify that:
>>> np.diff(vector, axis=0).shape
(6, 1)
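An alternative to the list comprehension from the first answer, assuming you simply want a flat 1-D result: flatten the (7, 1) array first with ravel, then diff:

```python
import numpy as np

vector = np.array([[76.27789424], [76.06870298], [75.85016864],
                   [75.71155968], [75.16982466], [73.08832948],
                   [68.59935515]])
flat = vector.ravel()          # shape (7, 1) -> (7,)
print(np.diff(flat))           # six differences, same values as axis=0 above
```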
