Is there a way to concatenate two numpy arrays with different dimensions? - python

I am working with a deep learning model and am trying to concatenate a label with dimensions (1,2) with a numpy array of (25,25). I'm not really sure it is possible to get a dimension of (627,); however, the model summary says that is the input shape it expects.
I've tried to concatenate them, but I get the error "all the input array dimensions except for the concatenation axis must match exactly", as expected.
x = np.concatenate((X[1], to_categorical(Y_train[1])))
Where X is (25,25) and Y_train is (1,), making to_categorical(Y_train[1]) equal to (2,1).
Is there a way to get this (627,) dimension from these arrays?

Psidom has a great answer to this:
Let's say you have a 2-d array and a 1-d array. You can use numpy.column_stack:
np.column_stack((array_1, array_2))
which converts the 1-d array to 2-d implicitly, and is thus equivalent to np.concatenate((array_1, array_2[:,None]), axis=1).
a = np.arange(6).reshape(2,3)
b = np.arange(2)
a
#array([[0, 1, 2],
#       [3, 4, 5]])
b
#array([0, 1])
np.column_stack((a, b))
#array([[0, 1, 2, 0],
#       [3, 4, 5, 1]])
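For completeness, here is that equivalence spelled out, together with a sketch of how the (627,) input from the original question could be built by flattening before concatenating. This is a minimal sketch assuming a (25,25) image and a 2-element one-hot label; image and label are stand-in names:
import numpy as np
a = np.arange(6).reshape(2,3)
b = np.arange(2)
# Explicit equivalent of np.column_stack((a, b)):
np.concatenate((a, b[:,None]), axis=1)
#array([[0, 1, 2, 0],
#       [3, 4, 5, 1]])
# For the original question: 25*25 + 2 = 627
image = np.zeros((25, 25))   # stand-in for X[1]
label = np.array([0., 1.])   # stand-in for to_categorical(Y_train[1])
x = np.concatenate((image.ravel(), label.ravel()))
x.shape
#(627,)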

Related

What is the difference between using [..., :<number>] and [:<number>] on a numpy array?

In practice, using both [..., :2] and [:2] on np.array([1,2,3]) results in np.array([1,2]). Are there also cases where the result differs when you use an ellipsis like this on an array?
NumPy arrays are designed to handle n-dimensional data, indexed as [rows, columns] in the 2-d case. In the case of np.array([1, 2, 3]), [..., :2] and [:2] yield the same result because the array is 1-dimensional, of shape (3,): the ellipsis expands to zero axes, so both expressions slice the one and only axis.
If we instead input np.array([[1,2,3], [4,5,6]]), i.e. a 2-dimensional array of shape (2, 3), this changes: [:2] slices the first axis (rows), while [..., :2] slices the last axis (columns). For basic 2-d indexing, [:1, :2] gives array([[1, 2]]) because we are asking for everything up to (but not including) row index 1, i.e. the first row, and everything up to (but not including) column index 2, i.e. the first two columns.
Hope this makes sense.
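To make the difference concrete, here is a small sketch where the two notations diverge on a 2-d array:
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
arr[:2]        # slices the first axis (rows): both rows, unchanged
#array([[1, 2, 3],
#       [4, 5, 6]])
arr[..., :2]   # slices the last axis (columns): first two columns
#array([[1, 2],
#       [4, 5]])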

How to convert sparse to dense adjacency matrix?

I am trying to convert a sparse adjacency matrix/list that only contains the indices of the non-zero elements ([[rows], [columns]]) into a dense matrix that contains 1s at those indices and 0s otherwise. I found a solution using to_dense_adj from PyTorch Geometric (Documentation), but it does not do exactly what I want, since the shape of the dense matrix is not as expected. Here is an example:
sparse_adj = torch.tensor([[0, 1, 2, 1, 0], [0, 1, 2, 3, 4]])
So the dense matrix should be of size 3x5 (the first array stores the rows, whose elements are all less than or equal to 2; the second array "stores" the columns), with non-zero elements at (0,0), (1,1), (2,2), (1,3) and (0,4).
However,
dense_adj = to_dense_adj(sparse_adj)[0]
outputs a dense matrix, but of shape (5,5). Is it possible to define the output shape or is there a different solution to get what I want?
Edit: I now have a working solution to convert it back to the sparse representation:
dense_adj = torch.sparse.FloatTensor(sparse_adj, torch.ones(5), torch.Size([3,5])).to_dense()
ind = dense_adj.nonzero(as_tuple=False).t().contiguous()
sparse_adj = torch.stack((ind[1], ind[0]), dim=0)
Or is there any alternative way that is better?
You can achieve this by first constructing a sparse matrix with torch.sparse and then converting it to a dense matrix. For this you will need to provide torch.sparse.FloatTensor with a 2D tensor of indices, a tensor of values, as well as an output size:
sparse_adj = torch.tensor([[0, 1, 2, 1, 0], [0, 1, 2, 3, 4]])
torch.sparse.FloatTensor(sparse_adj, torch.ones(5), torch.Size([3,5])).to_dense()
You can get the size of the output matrix dynamically with
sparse_adj.max(axis=1).values + 1
So it becomes:
torch.sparse.FloatTensor(
    sparse_adj,
    torch.ones(sparse_adj.shape[1]),
    (sparse_adj.max(axis=1).values + 1).tolist())
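As a side note beyond the original answer: newer PyTorch versions deprecate the torch.sparse.FloatTensor constructor in favor of torch.sparse_coo_tensor; a minimal sketch of the same construction:
import torch
sparse_adj = torch.tensor([[0, 1, 2, 1, 0], [0, 1, 2, 3, 4]])
size = (sparse_adj.max(dim=1).values + 1).tolist()  # [3, 5], inferred per axis
dense = torch.sparse_coo_tensor(
    sparse_adj,                        # 2 x nnz index tensor
    torch.ones(sparse_adj.shape[1]),   # one value per index pair
    size).to_dense()
#tensor([[1., 0., 0., 0., 1.],
#        [0., 1., 0., 1., 0.],
#        [0., 0., 1., 0., 0.]])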

Getting an error with Numpy where condition

I am trying to create a new column using an np.where condition on other columns in the DataFrame.
My code
df5['RiskSubType'] = np.where(new_df['Snow_Risk'] == 1,
    ' Heavy Snow forecasted at ' + df5.LOCATION.mask(new_df.LOCATION == '', df5.LOCATION_CITY),
    np.where(df5['Wind_Risk'] == 1,
        ' Heavy Wind forecasted at ' + df5.LOCATION.mask(df5.LOCATION == '', df5.LOCATION_CITY),
        np.where(df5['Precip_Risk'] == 1,
            ' Heavy Rain forecasted at ' + df5.LOCATION.mask(df5.LOCATION == '', df5.LOCATION_CITY),
            "No Risk Identified")))
Error
ValueError: operands could not be broadcast together with shapes
How can I fix this, or is there an alternative way to do it?
So first of all, your design/code style is really hard to read; you should think about simplifying it.
Your problem occurs because you are trying to smash strings and arrays of different shapes together in the np.where function. The documentation says:
numpy.where(condition[, x, y])
Return elements chosen from x or y depending on condition.
Parameters:
condition : array_like, bool
Where True, yield x, otherwise yield y.
x, y : array_like
Values from which to choose. x, y and condition need to be broadcastable to some shape.
Returns:
out : ndarray
An array with elements from x where condition is True, and elements from y elsewhere.
As you can see, x and y need to be broadcastable to some shape. Looking at the documentation on broadcasting:
6.4. Broadcasting
Another powerful feature of Numpy is broadcasting. Broadcasting takes
place when you perform operations between arrays of different shapes.
For instance
>>> a = np.array([
        [0, 1],
        [2, 3],
        [4, 5],
    ])
>>> b = np.array([10, 100])
>>> a * b
array([[  0, 100],
       [ 20, 300],
       [ 40, 500]])
The shapes of a and b don’t match. In order to proceed, Numpy will
stretch b into a second dimension, as if it were stacked three times
upon itself. The operation then takes place element-wise.
One of the rules of broadcasting is that only dimensions of size 1 can
be stretched (if an array only has one dimension, all other dimensions
are considered for broadcasting purposes to have size 1). In the
example above b is 1D, and has shape (2,). For broadcasting with a,
which has two dimensions, Numpy adds another dimension of size 1 to b.
b now has shape (1, 2). This new dimension can now be stretched three
times so that b’s shape matches a’s shape of (3, 2).
The other rule is that dimensions are compared from the last to the
first. Any dimensions that do not match must be stretched to become
equally sized. However, according to the previous rule, only
dimensions of size 1 can stretch. This means that some shapes cannot
broadcast and Numpy will give you an error:
>>> c = np.array([
        [0, 1, 2],
        [3, 4, 5],
    ])
>>> b = np.array([10, 100])
>>> c * b
ValueError: operands could not be broadcast together with shapes (2,3) (2,)
What happens here is that Numpy, again, adds a dimension to b, making
it of shape (1, 2). The sizes of the last dimensions of b and c (2 and
3, respectively) are then compared and found to differ. Since none of
these dimensions is of size 1 (therefore, unstretchable) Numpy gives
up and produces an error.
The solution to multiplying c and b above is to specifically tell
Numpy that it must add that extra dimension as the second dimension of
b. This is done by using None to index that second dimension. The
shape of b then becomes (2, 1), which is compatible for broadcasting
with c:
>>> c = np.array([
        [0, 1, 2],
        [3, 4, 5],
    ])
>>> b = np.array([10, 100])
>>> c * b[:, None]
array([[  0,  10,  20],
       [300, 400, 500]])
A good visual description of these rules, together with some advanced
broadcasting applications can be found in this tutorial of Numpy broadcasting rules.
So the problem is that you are trying to broadcast an (n,) array (the first where) against a scalar (the first string), against an (m,) array (the second where), against a scalar (the second string), against a (k,) array (the third where), and so on. Since n != m != k can and will be the case, and none of the mismatched dimensions is of size 1, the broadcasting fails.
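A minimal reproduction of that failure mode, with made-up shapes:
import numpy as np
cond = np.array([True, False, True])   # shape (3,)
x = np.array(['a', 'b'])               # shape (2,), not broadcastable against cond
np.where(cond, x, 'z')
#ValueError: operands could not be broadcast together with shapes (3,) (2,) ()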
Please provide something like this:
d = {'LOCATION': ['?', '?'],
     'LOCATION_CITY': ['?', '?'],
     'Wind_Risk': [1, 0],
     'Precip_Risk': [1, 0],
     'Snow_Risk': [1, 0]}
df = pd.DataFrame(data=d)
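In the meantime, here is a hedged sketch of one way the nested np.where calls could be avoided altogether with np.select, assuming everything lives in one frame (the sample values are made up, and df5 is used consistently instead of mixing df5 and new_df):
import numpy as np
import pandas as pd

df5 = pd.DataFrame({
    'LOCATION': ['Plant A', '', 'Plant C'],
    'LOCATION_CITY': ['Oslo', 'Bergen', 'Trondheim'],
    'Snow_Risk': [1, 0, 0],
    'Wind_Risk': [0, 1, 0],
    'Precip_Risk': [0, 0, 0]})

# Fall back to the city when LOCATION is empty
place = df5.LOCATION.mask(df5.LOCATION == '', df5.LOCATION_CITY)

conditions = [df5['Snow_Risk'] == 1,
              df5['Wind_Risk'] == 1,
              df5['Precip_Risk'] == 1]
choices = [' Heavy Snow forecasted at ' + place,
           ' Heavy Wind forecasted at ' + place,
           ' Heavy Rain forecasted at ' + place]

# np.select checks the conditions in order; all operands share the frame's
# length, so broadcasting succeeds, and the default covers the no-risk rows.
df5['RiskSubType'] = np.select(conditions, choices, default='No Risk Identified')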

Numpy horizontal concat with failure

I want to concatenate two numpy arrays with the shape (100,3) and (100,7) to get a (100,10) matrix.
I've tried hstack and concatenate, but I only receive a ValueError: all the input arrays must have same number of dimensions
In a dummy example like the following it works ...
x=np.arange(30).reshape(10,3)
y=np.arange(20).reshape(10,2)
np.concatenate((x,y), axis=1)
UPDATE 1:
I've created the first two matrices with sklearn's preprocessing module (RobustScaler and OneHotEncoder).
UPDATE 2:
When using scipy.sparse.hstack it works, but why?
The sparse hstack joins the coo attributes and builds a new coo sparse matrix from those; the numpy hstack knows nothing about that sparse structure. OneHotEncoder returns a scipy sparse matrix by default, which is most likely why the plain numpy functions failed here. To explain this further I'd have to explain sparse construction, and quote from the respective functions.
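A hedged sketch of the two workarounds this implies, using scipy.sparse.random as a stand-in for the encoder output:
import numpy as np
from scipy import sparse

x = np.arange(100 * 3).reshape(100, 3)                # dense block (e.g. RobustScaler)
y = sparse.random(100, 7, density=0.3, format='csr')  # stand-in for OneHotEncoder output

np.hstack((x, y.toarray())).shape   # densify the sparse block first
#(100, 10)
sparse.hstack((sparse.csr_matrix(x), y)).shape   # or stay sparse throughout
#(100, 10)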
If you want to concatenate them vertically, axis must be equal to 0. This is explained in the docs for concatenate.
In this link we have this example:
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])
np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])
This works perfectly fine for me:
import numpy as np
x=np.arange(100 * 3).reshape(100,3)
y=np.arange(100 * 7).reshape(100,7)
np.hstack((x,y)).shape # (100, 10)

Numpy: get 1D array as 2D array without reshape

I need to hstack multiple arrays with the same number of rows (although the number of rows varies between uses) but different numbers of columns. However, some of the arrays only have one column, e.g.
array = np.array([1,2,3,4,5])
which gives
#array.shape = (5,)
but I'd like to have the shape recognized as a 2d array, e.g.
#array.shape = (5,1)
So that hstack can actually combine them.
My current solution is:
array = np.atleast_2d([1,2,3,4,5]).T
#array.shape = (5,1)
So I was wondering, is there a better way to do this? Would
array = np.array([1,2,3,4,5]).reshape(len([1,2,3,4,5]), 1)
be better?
Note that my use of [1,2,3,4,5] is just a toy list to make the example concrete. In practice it will be a much larger list passed into a function as an argument. Thanks!
Check the code of hstack and vstack. One, or both, of those pass their arguments through atleast_1d or atleast_2d. That is a perfectly acceptable way of reshaping an array.
Some other ways:
arr = np.array([1,2,3,4,5]).reshape(-1,1) # saves the use of len()
arr = np.array([1,2,3,4,5])[:,None] # adds a new dim at end
np.array([1,2,3],ndmin=2).T # used by column_stack
hstack and vstack transform their inputs with:
arrs = [atleast_1d(_m) for _m in tup]   # hstack
[atleast_2d(_m) for _m in tup]          # vstack
test data:
a1=np.arange(2)
a2=np.arange(10).reshape(2,5)
a3=np.arange(8).reshape(2,4)
np.hstack([a1.reshape(-1,1),a2,a3])
np.hstack([a1[:,None],a2,a3])
np.column_stack([a1,a2,a3])
result:
array([[0, 0, 1, 2, 3, 4, 0, 1, 2, 3],
       [1, 5, 6, 7, 8, 9, 4, 5, 6, 7]])
If you don't know ahead of time which arrays are 1d, then column_stack is easiest to use. The others require a little function that tests for dimensionality before applying the reshaping.
See also: Numpy: use reshape or newaxis to add dimensions
If I understand your intent correctly, you wish to convert an array of shape (N,) to an array of shape (N,1) so that you can apply np.hstack:
In [147]: np.hstack([np.atleast_2d([1,2,3,4,5]).T, np.atleast_2d([1,2,3,4,5]).T])
Out[147]:
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])
In that case, you could avoid reshaping the arrays and use np.column_stack instead:
In [151]: np.column_stack([[1,2,3,4,5], [1,2,3,4,5]])
Out[151]:
array([[1, 1],
       [2, 2],
       [3, 3],
       [4, 4],
       [5, 5]])
I followed Ludo's work and just changed the size of v from 5 to 10000. I ran the code on my PC, and the results show that atleast_2d seems to be the more efficient method at larger scale.
import numpy as np
import timeit
v = np.arange(10000)
print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1))) # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None])) # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T)) # used by column_stack
The result is:
atleast2d: 1.3809496470021259
reshape: 27.099974197000847
v[:,None]: 28.58291715100131
np.array(v,ndmin=2).T: 30.141663907001202
My suggestion is to use [:, None] when dealing with a short vector and np.atleast_2d when your vector gets longer.
Just to add info to hpaulj's answer: I was curious about how fast the four methods described were. The winner is v[:, None], the method that adds a new axis at the end of the 1d array.
Here is what I ran:
import numpy as np
import timeit
v = [1,2,3,4,5]
print('atleast2d:',timeit.timeit(lambda:np.atleast_2d(v).T))
print('reshape:',timeit.timeit(lambda:np.array(v).reshape(-1,1))) # saves the use of len()
print('v[:,None]:', timeit.timeit(lambda:np.array(v)[:,None])) # adds a new dim at end
print('np.array(v,ndmin=2).T:', timeit.timeit(lambda:np.array(v,ndmin=2).T)) # used by column_stack
And the results:
atleast2d: 4.455070924214851
reshape: 2.0535152913971615
v[:,None]: 1.8387219828073285
np.array(v,ndmin=2).T: 3.1735243063353664
