Union of two matrices in Python - python

Is it posible to make de union between two matrices in python? I mean to have in one matrix all the elements from other two matrices without reapiting any of them.
For example, if we have:
A = [[1,2],[3,4],[5,6]]
B = [[5,6],[7,8]]
The union would be C = [[1,2],[3,4],[5,6],[7,8]]
There is a numpy command for arrays: np.union1d but I cannot find any for matrices. I just have found np.concatenate and np.vstack but they write twice the repeated elements.

If I correctly understood your question, you can achieve using np.unique on the concated result of A and B, as below
import numpy as np
A = np.array([[1,2],[3,4],[5,6]])
B = np.array([[5,6],[7,8]])
np.unique(np.concatenate([A, B]), axis=0)
outputs
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
or a bit more concise stacking would be np.unique(np.r_[A,B], axis=0)

Related

Numpy horizontal concat with failure

I want to concatenate two numpy arrays with the shape (100,3) and (100,7) to get a (100,10) matrix.
I've tried it using hstack, concatenate but only receives a ValueError: all the int arrays must have same number of dimensions
In a dummy example like the following it works ...
x=np.arange(30).reshape(10,3)
y=np.arange(20).reshape(10,2)
np.concatenate((x,y), axis=1)
UPDATE 1:
I've created the first two metrics's with sklearn's preprocessing module (RobustScaler and OneHotEncoder).
UPDATE 2:
When using scipy.sparse.hstack it works, but why
The sparse hstack joins the coo attributes and builds a new coo sparse matrix from those. The numpy hstack knows nothing about the different sparse structure. To explain this further I'd have to explain sparse construction, and quote from the respective functions.
If you want to concatenate it vertically axis must beequal to 0. This is explained in the doc for concatenate.
In this link we have this example:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
[3, 4, 6]])
This works perfectly fine for me:
import numpy as np
x=np.arange(100 * 3).reshape(100,3)
y=np.arange(100 * 7).reshape(100,7)
np.hstack((x,y)).shape # (100, 10)

How to multiply two vector and get a matrix?

In numpy operation, I have two vectors, let's say vector A is 4X1, vector B is 1X5, if I do AXB, it should result a matrix of size 4X5.
But I tried lot of times, doing many kinds of reshape and transpose, they all either raise error saying not aligned or return a single value.
How should I get the output product of matrix I want?
Normal matrix multiplication works as long as the vectors have the right shape. Remember that * in Numpy is elementwise multiplication, and matrix multiplication is available with numpy.dot() (or with the # operator, in Python 3.5)
>>> numpy.dot(numpy.array([[1], [2]]), numpy.array([[3, 4]]))
array([[3, 4],
[6, 8]])
This is called an "outer product." You can get it using plain vectors using numpy.outer():
>>> numpy.outer(numpy.array([1, 2]), numpy.array([3, 4]))
array([[3, 4],
[6, 8]])
If you are using numpy.
First, make sure you have two vectors. For example, vec1.shape = (10, ) and vec2.shape = (26, ); in numpy, row vector and column vector are the same thing.
Second, you do res_matrix = vec1.reshape(10, 1) # vec2.reshape(1, 26) ;.
Finally, you should have: res_matrix.shape = (10, 26).
numpy documentation says it will deprecate np.matrix(), so better not use it.
Function matmul (since numpy 1.10.1) works fine:
import numpy as np
a = np.array([[1],[2],[3],[4]])
b = np.array([[1,1,1,1,1],])
ab = np.matmul(a, b)
print (ab)
print(ab.shape)
You have to declare your vectors right. The first has to be a list of lists of one number (this vector has to have columns in one row), and the second - a list of list (this vector has to have rows in one column) like in above example.
Output:
[[1 1 1 1 1]
[2 2 2 2 2]
[3 3 3 3 3]
[4 4 4 4 4]]
(4, 5)

Numpy: get 1D array as 2D array without reshape

I have need for hstacking multple arrays with with the same number of rows (although the number of rows is variable between uses) but different number of columns. However some of the arrays only have one column, eg.
array = np.array([1,2,3,4,5])
which gives
#array.shape = (5,)
but I'd like to have the shape recognized as a 2d array, eg.
#array.shape = (5,1)
So that hstack can actually combine them.
My current solution is:
array = np.atleast_2d([1,2,3,4,5]).T
#array.shape = (5,1)
So I was wondering, is there a better way to do this? Would
array = np.array([1,2,3,4,5]).reshape(len([1,2,3,4,5]), 1)
be better?
Note that my use of [1,2,3,4,5] is just a toy list to make the example concrete. In practice it will be a much larger list passed into a function as an argument. Thanks!
Check the code of hstack and vstack. One, or both of those, pass the arguments through atleast_nd. That is a perfectly acceptable way of reshaping an array.
Some other ways:
arr = np.array([1,2,3,4,5]).reshape(-1,1) # saves the use of len()
arr = np.array([1,2,3,4,5])[:,None] # adds a new dim at end
np.array([1,2,3],ndmin=2).T # used by column_stack
hstack and vstack transform their inputs with:
arrs = [atleast_1d(_m) for _m in tup]
[atleast_2d(_m) for _m in tup]
test data:
a1=np.arange(2)
a2=np.arange(10).reshape(2,5)
a3=np.arange(8).reshape(2,4)
np.hstack([a1.reshape(-1,1),a2,a3])
np.hstack([a1[:,None],a2,a3])
np.column_stack([a1,a2,a3])
result:
array([[0, 0, 1, 2, 3, 4, 0, 1, 2, 3],
[1, 5, 6, 7, 8, 9, 4, 5, 6, 7]])
If you don't know ahead of time which arrays are 1d, then column_stack is easiest to use. The others require a little function that tests for dimensionality before applying the reshaping.
Numpy: use reshape or newaxis to add dimensions
If I understand your intent correctly, you wish to convert an array of shape (N,) to an array of shape (N,1) so that you can apply np.hstack:
In [147]: np.hstack([np.atleast_2d([1,2,3,4,5]).T, np.atleast_2d([1,2,3,4,5]).T])
Out[147]:
array([[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
In that case, you could use avoid reshaping the arrays and use np.column_stack instead:
In [151]: np.column_stack([[1,2,3,4,5], [1,2,3,4,5]])
Out[151]:
array([[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
I followed Ludo's work and just changed the size of v from 5 to 10000. I ran the code on my PC and the result shows that atleast_2d seems to be a more efficient method in the larger scale case.
import numpy as np
import timeit
v = np.arange(10000)
print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1))) # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None])) # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T)) # used by column_stack
The result is:
atleast2d: 1.3809496470021259
reshape: 27.099974197000847
v[:,None]: 28.58291715100131
np.array(v,ndmin=2).T: 30.141663907001202
My suggestion is that use [:None] when dealing with a short vector and np.atleast_2d when your vector goes longer.
Just to add info on hpaulj's answer. I was curious about how fast were the four methods described. The winner is the method adding a column at the end of the 1d array.
Here is what I ran:
import numpy as np
import timeit
v = [1,2,3,4,5]
print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1))) # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None])) # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T)) # used by column_stack
And the results:
atleast2d: 4.455070924214851
reshape: 2.0535152913971615
v[:,None]: 1.8387219828073285
np.array(v,ndmin=2).T: 3.1735243063353664

Create matrix with 2 arrays in numpy

I want to find a command in numpy for a column vector times a row vector equals to a matrix
[1,1,1,1 ] ^T * [ 2,3 ] = [[2,3],[2,3],[2,3],[2,3]]
First, let's define your 1-D numpy arrays:
In [5]: one = np.array([ 1,1,1,1 ]); two = np.array([ 2,3 ])
Now, lets multiply them:
In [6]: one[:, np.newaxis] * two[np.newaxis, :]
Out[6]:
array([[2, 3],
[2, 3],
[2, 3],
[2, 3]])
This used numpy's newaxis to add the appropriate axes to get a 4x2 output matrix.
The problem you are encountering is that both of your vectors are neither column nor row vectors - they're just vectors. If you look at len(vec.shape) it's 1.
What you can do is use numpy.reshape to turn your column vector into shape (m, 1) and your row vector into shape (1, n).
import numpy as np
colu = np.reshape(u, (u.shape[0], 1))
rowv = np.reshape(v, (1, v.shape[0]))
Now when you multiply colu and rowv you'll get a matrix with shape (m, n).
If you need a matrix - use matrices. This way you can use your expression nearly verbatim:
np.matrix([1,1,1,1]).T * np.matrix([2,3])
You might want to use numpy.kron(a,b) it takes the Kronecker product of two arrays. You can see the b vector as a block. The function puts this block, multiplied by the corresponding coefficient of the a vector, on the position of that coefficient. You can also use it for matrices.
For your example it would look like:
import numpy as np
vecA = np.array([[1],[1],[1],[1]])
vecB = np.array([2,3])
Out = np.kron(vecA,vecB)
this returns
>>> Out
array([[2, 3],
[2, 3],
[2, 3],
[2, 3]])
Hope this helps you.

numpy vstack vs. column_stack

What exactly is the difference between numpy vstack and column_stack. Reading through the documentation, it looks as if column_stack is an implementation of vstack for 1D arrays. Is it a more efficient implementation? Otherwise, I cannot find a reason for just having vstack.
I think the following code illustrates the difference nicely:
>>> np.vstack(([1,2,3],[4,5,6]))
array([[1, 2, 3],
[4, 5, 6]])
>>> np.column_stack(([1,2,3],[4,5,6]))
array([[1, 4],
[2, 5],
[3, 6]])
>>> np.hstack(([1,2,3],[4,5,6]))
array([1, 2, 3, 4, 5, 6])
I've included hstack for comparison as well. Notice how column_stack stacks along the second dimension whereas vstack stacks along the first dimension. The equivalent to column_stack is the following hstack command:
>>> np.hstack(([[1],[2],[3]],[[4],[5],[6]]))
array([[1, 4],
[2, 5],
[3, 6]])
I hope we can agree that column_stack is more convenient.
hstack stacks horizontally, vstack stacks vertically:
The problem with hstack is that when you append a column you need convert it from 1d-array to a 2d-column first, because 1d array is normally interpreted as a vector-row in 2d context in numpy:
a = np.ones(2) # 2d, shape = (2, 2)
b = np.array([0, 0]) # 1d, shape = (2,)
hstack((a, b)) -> dimensions mismatch error
So either hstack((a, b[:, None])) or column_stack((a, b)):
where None serves as a shortcut for np.newaxis.
If you're stacking two vectors, you've got three options:
As for the (undocumented) row_stack, it is just a synonym of vstack, as 1d array is ready to serve as a matrix row without extra work.
The case of 3D and above proved to be too huge to fit in the answer, so I've included it in the article called Numpy Illustrated.
In the Notes section to column_stack, it points out this:
This function is equivalent to np.vstack(tup).T.
There are many functions in numpy that are convenient wrappers of other functions. For example, the Notes section of vstack says:
Equivalent to np.concatenate(tup, axis=0) if tup contains arrays that are at least 2-dimensional.
It looks like column_stack is just a convenience function for vstack.

Categories