I want to concatenate two numpy arrays with shapes (100,3) and (100,7) to get a (100,10) matrix.
I've tried hstack and concatenate, but both only raise ValueError: all the input arrays must have same number of dimensions
In a dummy example like the following it works ...
x=np.arange(30).reshape(10,3)
y=np.arange(20).reshape(10,2)
np.concatenate((x,y), axis=1)
UPDATE 1:
I created the two matrices with sklearn's preprocessing module (RobustScaler and OneHotEncoder).
UPDATE 2:
When using scipy.sparse.hstack it works, but why?
The sparse hstack joins the coo attributes and builds a new coo sparse matrix from those. The numpy hstack knows nothing about the different sparse structure. To explain this further I'd have to explain sparse construction, and quote from the respective functions.
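As a rough sketch of the two ways out, assuming the OneHotEncoder output is a scipy sparse matrix (its default, as far as I know) and scipy is installed. The arrays below are hypothetical stand-ins for the real scaler and encoder outputs:

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-ins for the RobustScaler and OneHotEncoder outputs.
dense = np.arange(100 * 3, dtype=float).reshape(100, 3)
onehot = sparse.csr_matrix(np.eye(7)[np.arange(100) % 7])  # sparse, shape (100, 7)

# Option 1: stay sparse; scipy.sparse.hstack understands sparse inputs.
stacked_sparse = sparse.hstack([sparse.csr_matrix(dense), onehot])

# Option 2: densify the sparse part, then use the regular numpy hstack.
stacked_dense = np.hstack([dense, onehot.toarray()])

print(stacked_sparse.shape, stacked_dense.shape)
```

Option 2 is only reasonable when the dense (100, 7) array fits comfortably in memory.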
If you want to concatenate vertically, axis must be equal to 0. This is explained in the documentation for concatenate.
The documentation gives this example:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b), axis=0)
array([[1, 2],
[3, 4],
[5, 6]])
np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
[3, 4, 6]])
This works perfectly fine for me:
import numpy as np
x=np.arange(100 * 3).reshape(100,3)
y=np.arange(100 * 7).reshape(100,7)
np.hstack((x,y)).shape # (100, 10)
Is it possible to take the union of two matrices in Python? I mean one matrix containing all the rows of the other two, without repeating any of them.
For example, if we have:
A = [[1,2],[3,4],[5,6]]
B = [[5,6],[7,8]]
The union would be C = [[1,2],[3,4],[5,6],[7,8]]
There is a numpy command for 1d arrays, np.union1d, but I cannot find one for matrices. I have only found np.concatenate and np.vstack, but they keep the repeated rows twice.
If I understood your question correctly, you can achieve this by using np.unique on the concatenated result of A and B, as below
import numpy as np
A = np.array([[1,2],[3,4],[5,6]])
B = np.array([[5,6],[7,8]])
np.unique(np.concatenate([A, B]), axis=0)
outputs
array([[1, 2],
[3, 4],
[5, 6],
[7, 8]])
Or, a bit more concisely, stack with np.r_: np.unique(np.r_[A,B], axis=0)
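One thing to keep in mind is that np.unique returns the rows sorted, not in their original order. If the order of first appearance matters, a minimal sketch using return_index could look like this (the input rows are reordered here just to make the difference visible):

```python
import numpy as np

A = np.array([[3, 4], [1, 2], [5, 6]])
B = np.array([[5, 6], [7, 8]])

stacked = np.concatenate([A, B])

# np.unique returns the unique rows in sorted order ...
sorted_union = np.unique(stacked, axis=0)

# ... so use return_index to recover the first-occurrence order instead.
_, first_idx = np.unique(stacked, axis=0, return_index=True)
ordered_union = stacked[np.sort(first_idx)]

print(sorted_union)
print(ordered_union)
```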
Given an array X of shape (100,8192), I want to repeat each of the 100 subarrays of length 8192 ten times, so that the resulting array has shape (100,8192,10).
I'm confused about how the tile function works. I can sort of copy a 1d array (although probably not elegantly): given a 1d array of shape (8192,), I can create a 2d array by copying it like this: np.tile(x,(10,1)).transpose(). But once I try this on a 2d array, I have no idea what tile is actually doing when you provide a tuple of values; the documentation is unclear about that.
Can anybody tell me how to do this please?
EDIT: Example, given the 2d array:
In [229]: x
Out[229]:
array([[1, 2, 3],
[4, 5, 6]])
By copying each element 3 times along a new last axis, I want to get the following array:
In [233]: y
Out[233]:
array([[[1, 1, 1],
[2, 2, 2],
[3, 3, 3]],
[[4, 4, 4],
[5, 5, 5],
[6, 6, 6]]])
One way to do this is using np.repeat, e.g.:
Let X be the array of shape (100,8192). To replicate each element 10 times along a new last dimension, do the following:
X_new = np.repeat(X,10).reshape(100,8192,10)
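To check this against the small example from the question, here is a sketch that also shows an equivalent form with an explicit axis, which avoids having to spell out the final shape by hand:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

# Add a trailing axis of length 1, then repeat along it.
y = np.repeat(x[:, :, None], 3, axis=2)

# Same result as the flatten-and-reshape version.
y_flat = np.repeat(x, 3).reshape(2, 3, 3)

print(y.shape)
print(y[0])
```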
Are you really asking for a shape (100,8192,10)? From your description, I would have expected something like (100,10,8192). Could you provide an example? If you're actually asking for (100,10,8192), maybe you want:
np.tile(x,10).reshape((100,10,8192))
Is it what you're asking for?
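Regarding what tile does with a tuple: each entry of reps is the number of copies along the corresponding axis. A small sketch on the 2d example from the question, with variable names of my own choosing:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6]])

# reps=(2, 1): two copies along axis 0, one along axis 1 -> shape (4, 3)
rows_twice = np.tile(x, (2, 1))

# reps=(1, 2): one copy along axis 0, two along axis 1 -> shape (2, 6)
cols_twice = np.tile(x, (1, 2))

# Tiling a trailing length-1 axis gives the (rows, cols, copies) layout.
y = np.tile(x[:, :, None], (1, 1, 3))  # shape (2, 3, 3)

print(rows_twice.shape, cols_twice.shape, y.shape)
```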
I need to hstack multiple arrays with the same number of rows (although the number of rows varies between uses) but different numbers of columns. However, some of the arrays only have one column, e.g.
array = np.array([1,2,3,4,5])
which gives
#array.shape = (5,)
but I'd like to have the shape recognized as a 2d array, eg.
#array.shape = (5,1)
So that hstack can actually combine them.
My current solution is:
array = np.atleast_2d([1,2,3,4,5]).T
#array.shape = (5,1)
So I was wondering, is there a better way to do this? Would
array = np.array([1,2,3,4,5]).reshape(len([1,2,3,4,5]), 1)
be better?
Note that my use of [1,2,3,4,5] is just a toy list to make the example concrete. In practice it will be a much larger list passed into a function as an argument. Thanks!
Check the code of hstack and vstack. They pass their arguments through atleast_1d and atleast_2d, respectively. That is a perfectly acceptable way of reshaping an array.
Some other ways:
arr = np.array([1,2,3,4,5]).reshape(-1,1) # saves the use of len()
arr = np.array([1,2,3,4,5])[:,None] # adds a new dim at end
np.array([1,2,3],ndmin=2).T # used by column_stack
hstack and vstack transform their inputs with:
arrs = [atleast_1d(_m) for _m in tup]
[atleast_2d(_m) for _m in tup]
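A quick sketch of what that means for a 1d input (variable names are mine): vstack sees it as a single row, while hstack leaves it 1d and joins the inputs end to end.

```python
import numpy as np

v = np.arange(3)               # shape (3,)

as_row = np.atleast_2d(v)      # shape (1, 3): what vstack sees
stacked_v = np.vstack([v, v])  # shape (2, 3)
stacked_h = np.hstack([v, v])  # shape (6,): 1-D inputs are joined end to end

print(as_row.shape, stacked_v.shape, stacked_h.shape)
```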
test data:
a1=np.arange(2)
a2=np.arange(10).reshape(2,5)
a3=np.arange(8).reshape(2,4)
np.hstack([a1.reshape(-1,1),a2,a3])
np.hstack([a1[:,None],a2,a3])
np.column_stack([a1,a2,a3])
result:
array([[0, 0, 1, 2, 3, 4, 0, 1, 2, 3],
[1, 5, 6, 7, 8, 9, 4, 5, 6, 7]])
If you don't know ahead of time which arrays are 1d, then column_stack is easiest to use. The others require a little function that tests for dimensionality before applying the reshaping.
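Such a "little function" might look like the sketch below. The name hstack_columns is hypothetical, not a numpy function, and the helper assumes every input is either 1d or 2d:

```python
import numpy as np

def hstack_columns(arrays):
    """Hypothetical helper: treat 1-D inputs as columns, then hstack."""
    cols = []
    for a in arrays:
        a = np.asarray(a)
        # Promote a 1-D array of shape (n,) to a column of shape (n, 1).
        cols.append(a[:, None] if a.ndim == 1 else a)
    return np.hstack(cols)

a1 = np.arange(2)
a2 = np.arange(10).reshape(2, 5)
a3 = np.arange(8).reshape(2, 4)

result = hstack_columns([a1, a2, a3])
print(result)
```

For these inputs it gives the same array as np.column_stack([a1, a2, a3]), which is why column_stack is usually the simpler choice.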
Numpy: use reshape or newaxis to add dimensions
If I understand your intent correctly, you wish to convert an array of shape (N,) to an array of shape (N,1) so that you can apply np.hstack:
In [147]: np.hstack([np.atleast_2d([1,2,3,4,5]).T, np.atleast_2d([1,2,3,4,5]).T])
Out[147]:
array([[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
In that case, you could avoid reshaping the arrays and use np.column_stack instead:
In [151]: np.column_stack([[1,2,3,4,5], [1,2,3,4,5]])
Out[151]:
array([[1, 1],
[2, 2],
[3, 3],
[4, 4],
[5, 5]])
I followed Ludo's work and just changed the size of v from 5 to 10000. I ran the code on my PC and the result shows that atleast_2d seems to be a more efficient method in the larger scale case.
import numpy as np
import timeit
v = np.arange(10000)
print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1))) # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None])) # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T)) # used by column_stack
The result is:
atleast2d: 1.3809496470021259
reshape: 27.099974197000847
v[:,None]: 28.58291715100131
np.array(v,ndmin=2).T: 30.141663907001202
My suggestion is to use [:, None] when dealing with a short vector and np.atleast_2d when your vector gets longer.
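One caveat about these timings, as far as I can tell: v is already an ndarray here, so np.array(v) makes a fresh copy of all 10000 elements on every call, while np.atleast_2d(v) does not copy. Most of the gap is therefore the copy, not the reshaping. A sketch of a fairer comparison, using np.asarray (which returns an existing ndarray unchanged) and a smaller number of repetitions so it finishes quickly; I am not quoting any numbers, since they depend on the machine:

```python
import timeit
import numpy as np

v = np.arange(10000)

# np.asarray does not copy an existing ndarray, so each candidate below
# measures only the reshaping step.
candidates = {
    'atleast_2d': lambda: np.atleast_2d(v).T,
    'reshape': lambda: np.asarray(v).reshape(-1, 1),
    'v[:, None]': lambda: np.asarray(v)[:, None],
}

for name, func in candidates.items():
    print(name, timeit.timeit(func, number=10000))
```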
Just to add info on hpaulj's answer. I was curious about how fast the four methods described are. The winner is v[:, None], the method that adds a new axis at the end of the 1d array.
Here is what I ran:
import numpy as np
import timeit
v = [1,2,3,4,5]
print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1))) # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None])) # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T)) # used by column_stack
And the results:
atleast2d: 4.455070924214851
reshape: 2.0535152913971615
v[:,None]: 1.8387219828073285
np.array(v,ndmin=2).T: 3.1735243063353664
I want to find a numpy command that multiplies a column vector by a row vector to give a matrix:
[1,1,1,1 ] ^T * [ 2,3 ] = [[2,3],[2,3],[2,3],[2,3]]
First, let's define your 1-D numpy arrays:
In [5]: one = np.array([ 1,1,1,1 ]); two = np.array([ 2,3 ])
Now, let's multiply them:
In [6]: one[:, np.newaxis] * two[np.newaxis, :]
Out[6]:
array([[2, 3],
[2, 3],
[2, 3],
[2, 3]])
This used numpy's newaxis to add the appropriate axes to get a 4x2 output matrix.
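For 1d inputs, np.outer should give the same matrix without any manual axis handling. A minimal sketch with the vectors from the question:

```python
import numpy as np

one = np.array([1, 1, 1, 1])
two = np.array([2, 3])

# Outer product: result[i, j] = one[i] * two[j]
via_outer = np.outer(one, two)

# Broadcasting only needs the column axis added; `two` broadcasts as a row.
via_broadcast = one[:, None] * two

print(via_outer)
```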
The problem you are encountering is that neither of your vectors is a column or a row vector; they're just vectors. If you look at len(vec.shape), it's 1.
What you can do is use numpy.reshape to turn your column vector into shape (m, 1) and your row vector into shape (1, n).
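import numpy as np
u = np.array([1, 1, 1, 1]) # example data from the question
v = np.array([2, 3])
colu = np.reshape(u, (u.shape[0], 1)) # shape (m, 1)
rowv = np.reshape(v, (1, v.shape[0])) # shape (1, n)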
Now when you multiply colu and rowv you'll get a matrix with shape (m, n).
If you need a matrix, use matrices. This way you can use your expression nearly verbatim (note that the numpy documentation no longer recommends np.matrix for new code):
np.matrix([1,1,1,1]).T * np.matrix([2,3])
You might want to use numpy.kron(a,b), which takes the Kronecker product of two arrays. You can see the b vector as a block. The function puts this block, multiplied by the corresponding coefficient of the a vector, at the position of that coefficient. You can also use it for matrices.
For your example it would look like:
import numpy as np
vecA = np.array([[1],[1],[1],[1]])
vecB = np.array([2,3])
Out = np.kron(vecA,vecB)
this returns
>>> Out
array([[2, 3],
[2, 3],
[2, 3],
[2, 3]])
Hope this helps you.
What exactly is the difference between numpy vstack and column_stack? Reading through the documentation, it looks as if column_stack is an implementation of vstack for 1D arrays. Is it a more efficient implementation? Otherwise, I cannot find a reason for not just having vstack.
I think the following code illustrates the difference nicely:
>>> np.vstack(([1,2,3],[4,5,6]))
array([[1, 2, 3],
[4, 5, 6]])
>>> np.column_stack(([1,2,3],[4,5,6]))
array([[1, 4],
[2, 5],
[3, 6]])
>>> np.hstack(([1,2,3],[4,5,6]))
array([1, 2, 3, 4, 5, 6])
I've included hstack for comparison as well. Notice how column_stack stacks along the second dimension whereas vstack stacks along the first dimension. The equivalent to column_stack is the following hstack command:
>>> np.hstack(([[1],[2],[3]],[[4],[5],[6]]))
array([[1, 4],
[2, 5],
[3, 6]])
I hope we can agree that column_stack is more convenient.
hstack stacks horizontally, vstack stacks vertically.
The problem with hstack is that when you append a column you need to convert it from a 1d array to a 2d column first, because a 1d array is normally interpreted as a row vector in a 2d context in numpy:
a = np.ones((2, 2)) # 2d, shape = (2, 2)
b = np.array([0, 0]) # 1d, shape = (2,)
np.hstack((a, b)) # raises ValueError: dimensions mismatch
So use either np.hstack((a, b[:, None])) or np.column_stack((a, b)),
where None serves as a shortcut for np.newaxis.
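A small runnable sketch of both fixes, using the same a and b as in the failing example:

```python
import numpy as np

a = np.ones((2, 2))   # 2d, shape (2, 2)
b = np.array([0, 0])  # 1d, shape (2,)

# Turn b into a (2, 1) column explicitly ...
via_hstack = np.hstack((a, b[:, None]))

# ... or let column_stack do the promotion.
via_column_stack = np.column_stack((a, b))

print(via_hstack)
```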
If you're stacking two vectors, you've got three options: vstack, hstack, and column_stack.
As for the (undocumented) row_stack, it is just a synonym of vstack, since a 1d array is ready to serve as a matrix row without extra work.
The case of 3D and above proved to be too huge to fit in the answer, so I've included it in the article called Numpy Illustrated.
In the Notes section to column_stack, it points out this:
This function is equivalent to np.vstack(tup).T.
There are many functions in numpy that are convenient wrappers of other functions. For example, the Notes section of vstack says:
Equivalent to np.concatenate(tup, axis=0) if tup contains arrays that are at least 2-dimensional.
It looks like column_stack is just a convenience function for vstack.
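That equivalence holds for 1d inputs, though. For 2d inputs column_stack stacks the arrays as they are, like hstack, so the two expressions can differ. A short sketch with made-up data:

```python
import numpy as np

# 1-D inputs: column_stack and vstack(...).T agree.
u, v = [1, 2, 3], [4, 5, 6]
same_for_1d = np.array_equal(np.column_stack((u, v)), np.vstack((u, v)).T)

# 2-D inputs: column_stack behaves like hstack, and vstack(...).T differs.
p = np.array([[1, 2], [3, 4]])
q = np.array([[5, 6], [7, 8]])
cs = np.column_stack((p, q))  # [[1, 2, 5, 6], [3, 4, 7, 8]]
vt = np.vstack((p, q)).T      # [[1, 3, 5, 7], [2, 4, 6, 8]]

print(same_for_1d, np.array_equal(cs, vt))
```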