What does *variable.shape mean in Python?

I know "*variable_name" assists in packing and unpacking.
But how does variable_name.shape work? Unable to visualize why the second dimension is squeezed out when prefixing with ""?
print("top_class.shape {}".format(top_class.shape))
top_class.shape torch.Size([64, 1])
print("*top_class.shape {}".format(*top_class.shape))
*top_class.shape 64

For numpy.array, which is used extensively in math-related and image-processing programs, .shape describes the size of the array along all of its dimensions:
>>> import numpy as np
>>> a = np.zeros((3,3,3))
>>> a
array([[[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]],

       [[ 0.,  0.,  0.],
        [ 0.,  0.,  0.],
        [ 0.,  0.,  0.]]])
>>> a.shape
(3, 3, 3)
>>>
The asterisk "unpacks" the tuple into several separate arguments, in your case (64,1) becomes 64, 1, so only the first one get printed because there's only one format specification.

Related

numpy: Stop numpy.array() from trying to reconcile elements. Create an ndarray from a list without trying to merge / reconcile the elements

I have two 2d matrices in a list which I want to convert to a numpy array. Below are three examples, a, b, c.
>>> import numpy as np
>>> a = [np.zeros((3,5)), np.zeros((2,9))]
>>> np.array(a)
array([array([[0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
>>> b = [np.zeros((3,5)), np.zeros((3,9))]
>>> np.array(b)
Traceback (most recent call last):
File "C:\Program Files\JetBrains\PyCharm 2019.2.4\helpers\pydev\_pydevd_bundle\pydevd_exec.py", line 3, in Exec
exec exp in global_vars, local_vars
File "<input>", line 1, in <module>
ValueError: could not broadcast input array from shape (3,5) into shape (3)
>>> c = [np.zeros((3,5)), np.zeros((4,9))]
>>> np.array(c)
array([array([[0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
As one can observe, cases a and c work, but b throws an exception. The difference is that in example b the first dimensions of the two matrices match.
I found the following answer, which explains why this behaviour occurs.
If only the first dimension does not match, the arrays are still matched, but as individual objects, no attempt is made to reconcile them into a new (four dimensional) array.
My Question: I don't want numpy to reconcile the matrices. I just want the same behaviour as when the first dimension doesn't match: I want them to be kept as individual objects even if they have the same first dimension. How do I achieve this?
Numpy still complains even if you explicitly pass object as the dtype:
>>> np.array(b, dtype=object)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not broadcast input array from shape (3,5) into shape (3)
Essentially, numpy is not really written around using dtype=object; it always assumes you want an array with a primitive numeric or structured dtype.
So I think your only option is something like:
>>> arr = np.empty(len(b), dtype=object)
>>> arr[:] = b
>>> arr
array([array([[0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
And just for fun, you can use the actual np.ndarray type constructor, although this isn't very easy:
>>> np.ndarray(dtype=object, shape=len(b), buffer=np.array(list(map(id, b)),dtype=np.uint64))
array([array([[0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
And note that this relies on a CPython implementation detail: id is simply the address of the Python object. So mostly I'm just showing it for fun.
In the latest version we are starting to see a warning:
In [185]: np.__version__
Out[185]: '1.19.0'
In [187]: np.array([np.zeros((3,5)), np.zeros((2,9))])
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
#!/usr/bin/python3
Out[187]:
array([array([[0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0.]]),
       array([[0., 0., 0., 0., 0., 0., 0., 0., 0.],
              [0., 0., 0., 0., 0., 0., 0., 0., 0.]])], dtype=object)
It still makes the object dtype array. In the matching first dimension case we get the warning and error.
In [188]: np.array([np.zeros((3,5)), np.zeros((3,9))])
/usr/local/bin/ipython3:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
#!/usr/bin/python3
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-188-b6a4475774d0> in <module>
----> 1 np.array([np.zeros((3,5)), np.zeros((3,9))])
ValueError: could not broadcast input array from shape (3,5) into shape (3)
Basically np.array tries, as a first step, to make a multidimensional numeric array. Failing that, it takes one of two routes: make an object dtype array, or raise an error. The details are buried in compiled code.
Preallocating and assigning is the best way if you want full control over how the object array is created.
In [189]: res=np.empty(2,object)
In [191]: res[:] = [np.zeros((3,5)), np.zeros((3,9))]

Removing NaN rows from a three dimensional array

How can I remove the NaN rows from the array below using indices (since I will need to remove the same rows from a different array)?
array([[[nan,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.],
        [ 0.,  0.,  0.,  0.]],

       [[ 0.,  0.,  0.,  0.],
        [ 0., nan,  0.,  0.],
        [ 0.,  0.,  0.,  0.]]])
I get the indices of the rows to be removed by using the command
a[np.isnan(a).any(axis=2)]
But applying what I would normally use on a 2D array does not produce the desired result; the array structure is lost.
a[~np.isnan(a).any(axis=2)]
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
How can I remove the rows I want using the indices obtained from my first command?
You need to reshape:
a[~np.isnan(a).any(axis=2)].reshape(a.shape[0], -1, a.shape[2])
But be aware that the number of NaN rows in each 2D subarray must be the same to get a new 3D array.
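If the goal is to drop the same rows from a second array, a minimal sketch under that assumption (the names b and keep are illustrative) is to compute the boolean mask once and reuse it:
import numpy as np

a = np.zeros((2, 3, 4))
a[0, 0, 0] = np.nan
a[1, 1, 1] = np.nan
b = np.ones_like(a)                      # a second array with the same layout

keep = ~np.isnan(a).any(axis=2)          # boolean mask over rows, shape (2, 3)
a_clean = a[keep].reshape(a.shape[0], -1, a.shape[2])
b_clean = b[keep].reshape(b.shape[0], -1, b.shape[2])   # same rows removed from b
As noted above, this only reshapes cleanly when each 2D subarray loses the same number of rows.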

Numpy: raise ValueError("shape too large to be a matrix.")

I am using a 12x12 numpy matrix, and I am getting "shape too large to be a matrix." My best guess is that numpy's kron function is causing the trouble.
Here's my code:
a = np.matrix("0 1 0; 0 0 1; 0 0 0 ")
a_dag = np.matrix("0 0 0; 1 0 0 ; 0 1 0")
Sp = np.matrix("0 1; 0 0")
Sm = np.matrix("0 0; 1 0")
...
119 H_I1 = (np.exp(1j*(phi-omega*t))*kron(np.eye(3),Sp,np.eye(2))
120 +np.exp(-1j*(phi-omega*t))*kron(np.eye(3),Sm,np.eye(2)))
121 H_I2 = kron(a,Sp,np.eye(2)) + kron(a_dag,Sm,np.eye(2))
Here's the error:
Traceback (most recent call last):
File "/home/fyodr/qc_final.py", line 121, in <module>
H_I2 = kron(a,Sp,np.eye(2)) + kron(a_dag,Sm,np.eye(2))
File "/home/fyodr/qc_final.py", line 70, in kron
return np.kron(m[0],kron(m[1:]))
File "/usr/lib/python2.7/dist-packages/numpy/lib/shape_base.py", line 754, in kron
result = wrapper(result)
File "/usr/lib/python2.7/dist-packages/numpy/matrixlib/defmatrix.py", line 303, in __array_finalize__
raise ValueError("shape too large to be a matrix.")
ValueError: shape too large to be a matrix.
Thanks!
EDIT: I defined kron as
def kron(*m):
    if len(m) == 1:
        return m
    else:
        return np.kron(m[0], kron(m[1:]))
If np.kron were computing a regular Kronecker product, then this should not be a problem.
As I commented, your kron with 3 arguments is unknown. But if it produces a 3d array at some stage, it could produce your error.
In [264]: np.kron(a.A, np.ones((3,3,3))).shape
Out[264]: (3, 9, 9)
A kron of a 2d array with a 3d array returns a 3d array. But if a is a np.matrix, numpy tries to convert the result back to a matrix, which raises the error: np.matrix is always 2d.
In [265]: np.kron(a, np.ones((3,3,3))).shape
---------------------------------------------------------------------------
....
ValueError: shape too large to be a matrix.
Experienced numpy users don't use np.matrix unless they really need its features and can live with its drawbacks.
With the kron that you added, the recursive step does:
In [270]: m = (a, Sp, np.eye(2))
In [271]: kron(m[1:])
Out[271]:
((matrix([[0, 1],
          [0, 0]]), array([[ 1., 0.],
                           [ 0., 1.]])),)
In [272]: np.array(_)
Out[272]:
array([[[[ 0., 1.],
         [ 0., 0.]],

        [[ 1., 0.],
         [ 0., 1.]]]])
In [273]: _.shape
Out[273]: (1, 2, 2, 2)
For 2 items, your kron returns a nested tuple of arrays. np.kron applies np.asanyarray(b) to that second argument, which results in a 4d array.
Applying your kron to full *m, but turning the matrices into arrays:
In [275]: kron(a.A, Sp.A, np.eye(2))
Out[275]:
array([[[[ 0., 0., 0., 1., 0., 0.],
         [ 0., 0., 0., 0., 0., 0.],
         [ 0., 0., 0., 0., 0., 1.],
         [ 0., 0., 0., 0., 0., 0.],
         [ 0., 0., 0., 0., 0., 0.],
         [ 0., 0., 0., 0., 0., 0.]],

        [[ 0., 0., 1., 0., 0., 0.],
         [ 0., 0., 0., 1., 0., 0.],
         [ 0., 0., 0., 0., 1., 0.],
         [ 0., 0., 0., 0., 0., 1.],
         [ 0., 0., 0., 0., 0., 0.],
         [ 0., 0., 0., 0., 0., 0.]]]])
In [276]: _.shape
Out[276]: (1, 2, 6, 6)
Did you even test the kron function by itself? It should have been debugged before use in a more complicated task.
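For reference, a corrected recursive helper might look like this (a sketch only, not tested against the original script): the base case must return the array itself rather than the 1-tuple holding it, and the recursive call must unpack the remaining arguments.
import numpy as np

def kron(*m):
    # base case: a single array -- return it, not the tuple wrapping it
    if len(m) == 1:
        return m[0]
    # unpack the tail so the recursion sees separate arguments again
    return np.kron(m[0], kron(*m[1:]))
With plain ndarrays, kron(a.A, Sp.A, np.eye(2)) should then give the expected (12, 12) result.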

Issues using Keras np_utils.to_categorical

I'm trying to convert an array of integer vectors into an array of one-hot vectors that Keras will be able to use to fit my model. Here's the relevant part of the code:
Y_train = np.hstack(np.asarray(dataframe.output_vector)).reshape(len(dataframe),len(output_cols))
dummy_y = np_utils.to_categorical(Y_train)
Below is an image showing what Y_train and dummy_y actually are.
I couldn't find any documentation for to_categorical that could help me.
Thanks in advance.
np_utils.to_categorical is used to convert an array of labels (integers from 0 to nb_classes - 1) into a one-hot matrix.
The official doc with an example.
In [1]: from keras.utils import np_utils # from keras import utils as np_utils
Using Theano backend.
In [2]: np_utils.to_categorical?
Signature: np_utils.to_categorical(y, num_classes=None)
Docstring:
Convert class vector (integers from 0 to nb_classes) to binary class matrix, for use with categorical_crossentropy.
# Arguments
y: class vector to be converted into a matrix
nb_classes: total number of classes
# Returns
A binary matrix representation of the input.
File: /usr/local/lib/python3.5/dist-packages/keras/utils/np_utils.py
Type: function
In [3]: y_train = [1, 0, 3, 4, 5, 0, 2, 1]
In [4]: """ Assuming the labeled dataset has total six classes (0 to 5), y_train is the true label array """
In [5]: np_utils.to_categorical(y_train, num_classes=6)
Out[5]:
array([[ 0., 1., 0., 0., 0., 0.],
       [ 1., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 1., 0., 0.],
       [ 0., 0., 0., 0., 1., 0.],
       [ 0., 0., 0., 0., 0., 1.],
       [ 1., 0., 0., 0., 0., 0.],
       [ 0., 0., 1., 0., 0., 0.],
       [ 0., 1., 0., 0., 0., 0.]])
from keras.utils.np_utils import to_categorical
UPDATED --- keras.utils.np_utils doesn't work in newer versions; if so use:
from tensorflow.keras.utils import to_categorical
In both cases the call has the same form, to_categorical(y, num_classes). It assumes the class values are integers starting at 0 (if they were strings, you would label-encode them first), so the classes always run from 0 up to the largest value; if num_classes is omitted it is inferred as max(y) + 1.
For example, consider the array [1, 2, 3, 4, 2]. The output has one column for each class value from 0 to 4:
array([[ 0., 1., 0., 0., 0.],
       [ 0., 0., 1., 0., 0.],
       [ 0., 0., 0., 1., 0.],
       [ 0., 0., 0., 0., 1.],
       [ 0., 0., 1., 0., 0.]])
Let's look at another example. For an array with three distinct classes, Y = [4, 8, 9, 4, 9], to_categorical(Y) will output a matrix with max(Y) + 1 = 10 columns:
array([[0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
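A small sketch tying the two import paths to the examples above (assuming a recent TensorFlow installation):
from tensorflow.keras.utils import to_categorical
import numpy as np

y = np.array([1, 2, 3, 4, 2])
print(to_categorical(y))                  # 5 columns: num_classes inferred as max(y) + 1
print(to_categorical(y, num_classes=6))   # force 6 columns explicitly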

Inexplicable behavior when using vlen with h5py

I am using h5py to build a dataset. Since I want to store arrays with a varying number of rows, I use the h5py special dtype vlen. However, I am seeing behavior I can't explain; maybe you can help me understand what is happening:
>>>> import h5py
>>>> import numpy as np
>>>> fp = h5py.File(datasource_fname, mode='w')
>>>> dt = h5py.special_dtype(vlen=np.dtype('float32'))
>>>> train_targets = fp.create_dataset('target_sequence', shape=(9549, 5,), dtype=dt)
>>>> test
Out[130]:
array([[ 0., 1., 1., 1., 0., 1., 1., 0., 1., 0., 0.],
       [ 1., 0., 0., 0., 1., 0., 0., 1., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1.]])
>>>> train_targets[0] = test
>>>> train_targets[0]
Out[138]:
array([ array([ 0., 1., 0., 0., 0., 1., 0., 0., 0., 0., 1.], dtype=float32),
        array([ 1., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0.], dtype=float32),
        array([ 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0.], dtype=float32),
        array([ 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0.], dtype=float32),
        array([ 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0.], dtype=float32)], dtype=object)
I do expect train_targets[0] to have this shape; however, I can't recognize the rows of my array in it. They seem to be totally jumbled, but consistently so: every time I run the above code, train_targets[0] looks the same.
To clarify: the first element of my train_targets, in this case test, has shape (5,11), but the second element might have shape (5,38), which is why I use vlen.
Thank you for your help
Mat
I think
train_targets[0] = test
has stored your (11,5) array as an F ordered array in a row of train_targets. According to the (9549,5) shape, that's a row of 5 elements. And since it is vlen, each element is a 1d array of length 11.
That's what you get back in train_targets[0] - an array of 5 arrays, each shape (11,), with values taken from test (order F).
So I think there are 2 issues - what a 2d shape means, and what vlen allows.
My version of h5py is pre v2.3, so I only get string vlen. But I suspect your problem may be that vlen only works with 1d arrays, an extension, so to speak, of byte strings.
Does the 5 in shape=(9549, 5,) have anything to do with 5 in the test.shape? I don't think it does, at least not as numpy and h5py see it.
When I make a file following the string vlen example:
>>> f = h5py.File('foo.hdf5')
>>> dt = h5py.special_dtype(vlen=str)
>>> ds = f.create_dataset('VLDS', (100,100), dtype=dt)
and then do:
ds[0]='this one string'
and look at ds[0], I get an object array with 100 elements, each being this string. That is, I've set a whole row of ds.
ds[0,0]='another'
is the correct way to set just one element.
vlen is 'variable length', not 'variable shape'. While the https://www.hdfgroup.org/HDF5/doc/TechNotes/VLTypes.html documentation is not entirely clear on this, I think you can store 1d arrays with shape (11,) and (38,) with vlen, but not 2d ones.
Actually, train_targets output is reproduced with:
In [54]: test1=np.empty((5,),dtype=object)
In [55]: for i in range(5):
    ...:     test1[i] = test.T.flatten()[i:i+11]
It's 11 values taken from the transpose (F order), but shifted for each sub array.
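If the goal is simply to store each (5, N) target as five variable-length rows, a sketch along these lines (assigning one element at a time, as with the string example above; the file name is a placeholder) should avoid the reordering:
import h5py
import numpy as np

dt = h5py.special_dtype(vlen=np.dtype('float32'))
with h5py.File('targets.hdf5', 'w') as fp:
    train_targets = fp.create_dataset('target_sequence', shape=(9549, 5), dtype=dt)
    test = np.zeros((5, 11), dtype='float32')
    # write the 5 rows of this sample one element at a time
    for j in range(test.shape[0]):
        train_targets[0, j] = test[j]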
