Arrays and Matrix in Python vs Matlab - python

Is there a more concise syntax for stacking arrays/matrices? In MATLAB, you can simply write [x, y] to stack horizontally and [x; y] to stack vertically, and the two chain easily, as in [x, x; y, y]. In Python it seems to be more tedious, see below:
import numpy as np
x = np.array([[1, 1, 1], [1, 2, 3]])
y = x*10
np.vstack((x, y))
array([[ 1, 1, 1],
[ 1, 2, 3],
[10, 10, 10],
[10, 20, 30]])
np.hstack((x, y))
array([[ 1, 1, 1, 10, 10, 10],
[ 1, 2, 3, 10, 20, 30]])
np.vstack((np.hstack((x, x)), np.hstack((y, y))))
array([[ 1, 1, 1, 1, 1, 1],
[ 1, 2, 3, 1, 2, 3],
[10, 10, 10, 10, 10, 10],
[10, 20, 30, 10, 20, 30]])

MATLAB has its own interpreter, so it can interpret the ; etc. to suit its needs. numpy uses the Python interpreter, so it can't use or reuse basic syntactic characters like [],;. So the basic array constructor wraps a nested list of lists (it takes a list as its argument):
np.array([[1,2,3], [4,5,6]])
But that nesting can be carried to any depth, np.array([]), np.array([[[[['foo']]]]]), because arrays can have 0,1, 2 etc dimensions.
MATLAB initially only had 2d matrices, and still can't have 1 or 0d.
In MATLAB that matrix is the basic object (cell and struct came later). In Python lists are the basic object (with tuples and dicts close behind).
np.matrix takes a string argument that imitates the MATLAB syntax. np.matrix('1 2; 3 4'). But np.matrix like the original MATLAB is fixed at 2d.
https://docs.scipy.org/doc/numpy/reference/arrays.classes.html#matrix-objects
https://docs.scipy.org/doc/numpy/reference/generated/numpy.bmat.html#numpy.bmat
But seriously, who makes real, useful matrices with the 1, 2; 3, 4 syntax? Those are toys. I prefer to use np.arange(12).reshape(3,4) if I need a simple example.
numpy has added a np.stack which gives more ways of joining arrays into new constructs. And a np.block:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.block.html#numpy.block
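As a rough illustration of how close np.block comes to the MATLAB form [x, x; y, y], here is a small sketch that reuses the x and y from the question. The variable names blocked, stacked, vertical and horizontal are only for this example.

```python
import numpy as np

x = np.array([[1, 1, 1], [1, 2, 3]])
y = x * 10

# np.block takes a nested list: inner lists join horizontally,
# the outer list joins vertically, much like MATLAB's [x, x; y, y].
blocked = np.block([[x, x], [y, y]])

# The same result with explicit hstack/vstack calls, for comparison.
stacked = np.vstack((np.hstack((x, x)), np.hstack((y, y))))

# For 2-D inputs, np.r_ joins along the first axis and np.c_ along the last.
vertical = np.r_[x, y]      # same as np.vstack((x, y)) here
horizontal = np.c_[x, y]    # same as np.hstack((x, y)) here

print(blocked)
print(np.array_equal(blocked, stacked))
```

np.r_ and np.c_ use indexing brackets rather than a function call, which is about as short as the Python syntax allows.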

Related

Multiplication with two 2D matrix to output 3D matrix

I am wondering whether there is a good way to compute this type of multiplication.
It simply multiplies each row x[i] by x element-wise, resulting in an array of shape [2, 2, 3].
>>> x
array([[0, 1, 2],
[3, 4, 5]])
>>> output
array([[[ 0, 1, 4],
[ 0, 4, 10]],
[[ 0, 4, 10],
[ 9, 16, 25]]])
I tried the code below and am wondering whether there is a faster version using numpy.
np.array([
np.multiply(x[i], x)
for i in range(x.shape[0])
])
There are two straightforward ways to do so: the first uses broadcasting, and the second uses einsum. I'd recommend using timeit to compare the speed of the various versions for the application you have in mind:
out_broadcast = x[:, None, :] * x
out_einsum = np.einsum('ij,kj->ikj',x,x)
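To check that both one-liners reproduce the loop from the question, a minimal sketch could look like the following. It uses the same x as the question; out_loop is just a name chosen here for the reference result.

```python
import numpy as np

x = np.arange(6).reshape(2, 3)

# Reference: the list-comprehension version from the question.
out_loop = np.array([np.multiply(x[i], x) for i in range(x.shape[0])])

# Broadcasting: (2, 1, 3) * (2, 3) -> (2, 2, 3)
out_broadcast = x[:, None, :] * x

# einsum: out[i, k, j] = x[i, j] * x[k, j]
out_einsum = np.einsum('ij,kj->ikj', x, x)

print(out_broadcast.shape)
print(np.array_equal(out_loop, out_broadcast))
print(np.array_equal(out_loop, out_einsum))
```

Which variant is faster likely depends on the array sizes, so timing it on realistic data is still the safest guide.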

Is there an array method for testing multiple equality values?

I want to know where array a is equal to any of the values in array b.
For example,
a = np.random.randint(0,16, size=(3,4))
b = np.array([2,3,9])
# like this, but for any size b:
locations = np.nonzero((a==b[0]) | (a==b[1]) | (a==b[2]))
The reason is so I can change the values in a from (any of b) to another value:
a[locations] = 99
Or-ing the equality checks is not a great solution, because I would like to do this without knowing the size of b ahead of time. Is there an array solution?
[edit]
There are now 2 good answers to this question, one using broadcasting with extra dimensions, and another using np.in1d. Both work for the specific case in this question. I ended up using np.isin instead, since it seems like it is more agnostic to the shapes of both a and b.
I accepted the answer that taught me about in1d since that led me to my preferred solution.
You can use np.in1d then reshape back to a's shape so you can set the values in a to your special flag.
import numpy as np
np.random.seed(410012)
a = np.random.randint(0, 16, size=(3, 4))
#array([[ 8, 5, 5, 15],
# [ 3, 13, 8, 10],
# [ 3, 11, 0, 10]])
b = np.array([[2,3,9], [4,5,6]])
a[np.in1d(a, b).reshape(a.shape)] = 999
#array([[ 8, 999, 999, 15],
# [999, 13, 8, 10],
# [999, 11, 0, 10]])
Or-ing the equality checks is not a great solution, because I would like to do this without knowing the size of b ahead of time.
EDIT:
Vectorized equivalent to the code you have written above -
a = np.random.randint(0,16, size=(3,4))
b = np.array([2,3,9])
locations = np.nonzero((a==b[0]) | (a==b[1]) | (a==b[2]))
locations2 = np.nonzero((a[None,:,:]==b[:,None,None]).any(0))
np.allclose(locations, locations2)
True
This shows that your output is exactly the same as this output, without the need of explicitly mentioning b[0], b[1]... or using a for loop.
Explanation -
Broadcasting can help in this case. You want to compare each element of the (3,4) matrix a to each value in b, which has shape (3,). The resulting boolean array is therefore three (3,4) matrices, i.e. shape (3,3,4).
Once you have that, take an ANY (OR) across the three (3,4) matrices element-wise. That reduces the (3,3,4) array to (3,4).
Finally, use np.nonzero to identify the locations where the value is True.
The above 3 steps can be done as follows -
Broadcasting comparison operation:
a[None,:,:]==b[:,None,None] #(1,3,4) == (3,1,1) -> (3,3,4)
Reduction using OR logic:
(a[None,:,:]==b[:,None,None]).any(0) #(3,3,4) -> (3,4)
Get non-zero locations:
np.nonzero((a[None,:,:]==b[:,None,None]).any(0))
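The three steps above can be wrapped so that they do not depend on the number of dimensions of a or on the shape of b. This is only a sketch: any_equal_mask is a hypothetical helper name, and the random data is made up for illustration.

```python
import numpy as np

def any_equal_mask(a, b):
    """Boolean mask with a's shape, True where a equals any value in b."""
    a = np.asarray(a)
    b = np.asarray(b).ravel()   # flatten b so its shape does not matter
    # a[..., None] has shape a.shape + (1,). Comparing it with b, of shape
    # (len(b),), broadcasts to a.shape + (len(b),). any(-1) ORs over b.
    return (a[..., None] == b).any(axis=-1)

rng = np.random.default_rng(0)
a = rng.integers(0, 16, size=(3, 4))
b = np.array([2, 3, 9])

mask = any_equal_mask(a, b)
locations = np.nonzero(mask)

a_flagged = a.copy()
a_flagged[locations] = 99

# The mask should agree with NumPy's built-in membership test.
print(np.array_equal(mask, np.isin(a, b)))
```

The intermediate boolean array holds a.size * b.size elements, so for large inputs np.isin is probably the more memory-friendly choice.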
numpy.isin works on multi-dimensional a and b.
In [1]: import numpy as np
In [2]: a = np.random.randint(0, 16, size=(3, 4)); a
Out[2]:
array([[12, 2, 15, 11],
[12, 15, 5, 10],
[ 4, 2, 14, 7]])
In [3]: b = [2, 4, 5, 12]
In [4]: c = [[2, 4], [5, 12]]
In [5]: np.isin(a, b).astype(int)
Out[5]:
array([[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 0, 0]])
In [6]: np.isin(a, c).astype(int)
Out[6]:
array([[1, 1, 0, 0],
[1, 0, 1, 0],
[1, 1, 0, 0]])
In [7]: a[np.isin(a, b)] = 99; a
Out[7]:
array([[99, 99, 15, 11],
[99, 15, 99, 10],
[99, 99, 14, 7]])

Choose indices in numpy arrays on particular dimensions [duplicate]

It is hard to find a clear title but an example will put it clearly.
For example, my inputs are:
c = np.full((4, 3, 2), 5)
c[:,:,1] *= 2
ix = np.random.randint(0, 2, (4, 3))
if ix is:
array([[1, 0, 1],
[0, 0, 1],
[0, 0, 1],
[1, 1, 0]])
I want as a result:
array([[10, 5, 10],
[ 5, 5, 10],
[ 5, 5, 10],
[10, 10, 5]])
My c array can have arbitrary dimensions, and so can the dimension I want to sample in.
It sounds like interpolation, but I'm reluctant to construct a big array of indices each time I want to apply this. Is there a way of doing this using some kind of indexing on numpy arrays? Or do I have to use some interpolation method?
Speed and memory are a concern here because I have to do this many times, and the arrays can be really large.
Thanks for any insight !
Create the x, y indices with numpy.ogrid, and then use advanced indexing:
idx, idy = np.ogrid[:c.shape[0], :c.shape[1]]
c[idx, idy, ix]
#array([[10, 5, 10],
# [ 5, 5, 10],
# [ 5, 5, 10],
# [10, 10, 5]])
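For the more general case where the sampled dimension is not fixed, np.take_along_axis (available in NumPy 1.15 and later) may be a convenient alternative. The sketch below assumes ix has the shape of c with the sampled axis removed. The names axis and picked are chosen here for illustration.

```python
import numpy as np

c = np.full((4, 3, 2), 5)
c[:, :, 1] *= 2
ix = np.array([[1, 0, 1],
               [0, 0, 1],
               [0, 0, 1],
               [1, 1, 0]])

axis = 2  # the dimension being sampled; assumed to be known by the caller

# ix has one dimension fewer than c, so add a length-1 axis at the sampled
# position, gather along it, then drop that axis again.
picked = np.take_along_axis(c, np.expand_dims(ix, axis), axis=axis).squeeze(axis)

# Compare against the ogrid-based advanced indexing.
idx, idy = np.ogrid[:c.shape[0], :c.shape[1]]
print(np.array_equal(picked, c[idx, idy, ix]))
print(picked)
```

Only the index array gets one extra length-1 axis, so no large index grids need to be built by hand.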

C++ equivalent of numpy.expand_dims() and numpy.concatenate()

As mentioned in the title, I currently have some 2D image data read with OpenCV, and I need to change its dimensions to 4D, e.g. from [320, 720] to [1, 320, 720, 1], and then combine all the data into a single 4D matrix.
In Python, I can just call numpy.expand_dims() on each of those images and then numpy.concatenate() them together. I'm wondering if there are equivalent APIs that I could use in C++. I've found expand_dims() in Tensorflow, but it only works on tensors, and I haven't found anything for concatenate() yet.
Libraries like OpenCV, Tensorflow and Boost are welcome. But I want to keep things light, so it would be better if I could implement this myself (if it is not too complicated). Thank you in advance.
expand_dims plus concatenate, could in a more iterative language be written as:
In [107]: x = np.arange(12).reshape(3,4)
In [109]: y = np.zeros((2,3,4,3),dtype=int)
In [110]: for i in range(2):
     ...:     for j in range(3):
     ...:         y[i,:,:,j] = x
     ...:
In [111]: y
Out[111]:
array([[[[ 0, 0, 0],
[ 1, 1, 1],
[ 2, 2, 2],
[ 3, 3, 3]],
[[ 4, 4, 4],
[ 5, 5, 5],
....
[10, 10, 10],
[11, 11, 11]]]])
In other words, all you need is the ability to create a target array of the right size, and the ability to copy the 2d arrays into appropriate slots.
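Applied to the image case from the question, the same idea can be sketched in Python as follows. The images here are small made-up stand-ins for the OpenCV data, and batch_np and batch_manual are names chosen for this example. The explicit loop is the part that would carry over to C++.

```python
import numpy as np

# Hypothetical stand-ins for images read from OpenCV (small for readability).
images = [np.full((3, 4), k, dtype=np.uint8) for k in range(5)]

# NumPy way: [H, W] -> [1, H, W, 1] for each image, then join along axis 0.
expanded = [np.expand_dims(np.expand_dims(img, 0), -1) for img in images]
batch_np = np.concatenate(expanded, axis=0)

# Explicit way: allocate the target once and copy each image into its slot.
h, w = images[0].shape
batch_manual = np.zeros((len(images), h, w, 1), dtype=images[0].dtype)
for n, img in enumerate(images):
    batch_manual[n, :, :, 0] = img

print(batch_np.shape)
print(np.array_equal(batch_np, batch_manual))
```

Because the added axes have length 1, each image occupies one contiguous H*W block of a row-major [N, H, W, 1] buffer, so a plain buffer plus one block copy per image should be enough in C++.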

Creating a non trivial view of numpy array

TL;DR:
I am looking for a way to get a non-trivial, and in particular non-contiguous, view of a numpy ndarray.
E.g., given a 1D ndarray, x = np.array([1, 2, 3, 4]), is there a way to get a non trivial view of it, e.g. np.array([2, 4, 3, 1])?
Longer Version
The context of the question is the following: I have a 4D ndarray of shape (U, V, S, T) which I would like to reshape to a 2D ndarray of shape (U*S, V*T) in a non-trivial way, i.e. a simple np.reshape() does not do the trick, as I have a more complex indexing scheme in mind in which the reshaped array will not be contiguous in memory. The arrays in my case are rather large, and I would like to get a view and not a copy of the array.
Example
Given an array x(u, v, s, t) of shape (2, 2, 2, 2):
x = np.array([[[[1, 1], [1, 1]],[[2, 2], [2, 2]]],
[[[3, 3], [3, 3]], [[4, 4], [4, 4]]]])
I would like to get the view z(a, b) of the array:
np.array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
This corresponds to an indexing scheme of a = u * S + s and b = v * T + t, where in this case S = 2 = T.
What I have tried
Various approaches using np.reshape or even as_strided. Standard reshaping will not change the order of elements as they appear in memory. I tried playing around with order='F' and transposing a bit, but found nothing that gave me the correct result.
Since I know the indexing scheme, I tried to operate on the flattened view of the array using np.ravel(). My idea was to create an array of indices following the desired indexing scheme and apply it to the flattened array view, but unfortunately, fancy/advanced indexing gives a copy of the array, not a view.
Question
Is there any way to achieve the indexing view that I'm looking for?
In principle, I think this should be possible, as for example ndarray.sort() performs an in place non-trivial indexing of the array. On the other hand, this is probably implemented in C/C++, so it might even not be possible in pure Python?
Let's review the basics of an array - it has a flat data buffer, a shape, strides, and dtype. Those three attributes are used to view the elements of the data buffer in a particular way, whether it is a simple 1d sequence, 2d or higher dimensions.
A true view then uses the same data buffer, but applies a different shape, strides or dtype to it.
To get [2, 4, 3, 1] from [1,2,3,4] requires starting at 2, jumping forward 2 to reach 4, then stepping back 1 to reach 3 and back 2 to reach 1. That's not a regular pattern that can be represented by strides.
arr[1::2] gives the [2,4], and arr[0::2] gives the [1,3].
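A small sketch of the difference between a regularly strided slice (a view) and an arbitrary reordering (a copy) follows. The variable names are illustrative, and the index list [1, 3, 2, 0] is simply the ordering asked for in the question.

```python
import numpy as np

arr = np.array([1, 2, 3, 4])

evens = arr[1::2]   # [2, 4]: start at index 1, step 2
odds = arr[0::2]    # [1, 3]: start at index 0, step 2
rev = arr[::-1]     # [4, 3, 2, 1]: a negative stride is still regular

# All three are views onto arr's data buffer.
print(np.shares_memory(arr, evens), np.shares_memory(arr, odds), np.shares_memory(arr, rev))

# Fancy indexing can produce any order, but it returns a copy.
shuffled = arr[[1, 3, 2, 0]]   # [2, 4, 3, 1]
print(np.shares_memory(arr, shuffled))

# Writing through a view changes the original; writing to the copy does not.
evens[0] = 20
shuffled[0] = -1
print(arr)
```

np.shares_memory is a convenient way to check whether a given operation returned a view or a copy.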
(U, V, S, T) to (U*S, V*T) requires a transpose to (U, S, V, T), followed by a reshape:
arr.transpose(0,2,1,3).reshape(U*S, V*T)
That will require a copy, no way around that.
In [227]: arr = np.arange(2*3*4*5).reshape(2,3,4,5)
In [230]: arr1 = arr.transpose(0,2,1,3).reshape(2*4, 3*5)
In [231]: arr1.shape
Out[231]: (8, 15)
In [232]: arr1
Out[232]:
array([[ 0, 1, 2, 3, 4, 20, 21, 22, 23, 24, 40, 41, 42,
43, 44],
[ 5, 6, 7, 8, 9, 25, 26, 27, 28, 29, 45, 46, 47,
48, 49],
....)
Or with your x
In [234]: x1 = x.transpose(0,2,1,3).reshape(4,4)
In [235]: x1
Out[235]:
array([[1, 1, 2, 2],
[1, 1, 2, 2],
[3, 3, 4, 4],
[3, 3, 4, 4]])
Notice that the elements are in a different order:
In [254]: x.ravel()
Out[254]: array([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4])
In [255]: x1.ravel()
Out[255]: array([1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 4, 3, 3, 4, 4])
ndarray.sort is in-place and changes the order of bytes in the data buffer. It is operating at a low level that we don't have access to. It isn't a view of the original array.
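To confirm which step makes the copy, one can check memory sharing at each stage. The sketch below uses the x from the question; xt is a name introduced here for the intermediate transposed array.

```python
import numpy as np

x = np.array([[[[1, 1], [1, 1]], [[2, 2], [2, 2]]],
              [[[3, 3], [3, 3]], [[4, 4], [4, 4]]]])

xt = x.transpose(0, 2, 1, 3)   # only shape and strides change
x1 = xt.reshape(4, 4)          # this layout cannot be expressed with strides

print(np.shares_memory(x, xt))   # the transpose is a view
print(np.shares_memory(x, x1))   # the reshape had to copy
print(x.strides, xt.strides)     # same buffer, permuted strides
```

If the goal is only to index the data as z(a, b) without copying, working with the transposed 4D view xt directly (indexing it as xt[u, s, v, t]) may be an acceptable substitute for the 2D array.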
