Numpy concatenate 2D arrays with 1D array - python

I am trying to concatenate 4 arrays: one 1D array of shape (78427,) and three 2D arrays of shape (78427, 375/81/103). Basically these are 4 arrays with features for 78427 images, in which the 1D array only has 1 value for each image.
I tried concatenating the arrays as follows:
>>> print X_Cscores.shape
(78427, 375)
>>> print X_Mscores.shape
(78427, 81)
>>> print X_Tscores.shape
(78427, 103)
>>> print X_Yscores.shape
(78427,)
>>> np.concatenate((X_Cscores, X_Mscores, X_Tscores, X_Yscores), axis=1)
This results in the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: all the input arrays must have same number of dimensions
The problem seems to be the 1D array, but I can't really see why (it also has 78427 values). I tried to transpose the 1D array before concatenating it, but that also didn't work.
Any help on what's the right method to concatenate these arrays would be appreciated!

Try concatenating X_Yscores[:, None] (or X_Yscores[:, np.newaxis] as imaluengo suggests). This creates a 2D array out of a 1D array.
Example:
import numpy as np
A = np.array([1, 2, 3])
print(A.shape)
print(A[:, None].shape)
Output:
(3,)
(3, 1)
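Applied to the arrays in the question, the fix can be sketched as below. The random arrays are stand-ins with an assumed 5 rows instead of 78427; only the column counts match the question. Transposing did not help because .T on a 1D array returns the array unchanged, so it stays 1D.

```python
import numpy as np

n = 5  # stand-in for the 78427 images in the question
X_Cscores = np.random.rand(n, 375)
X_Mscores = np.random.rand(n, 81)
X_Tscores = np.random.rand(n, 103)
X_Yscores = np.random.rand(n)  # 1D: one value per image

# X_Yscores[:, None] has shape (n, 1), so every input is now 2D
X = np.concatenate(
    (X_Cscores, X_Mscores, X_Tscores, X_Yscores[:, None]), axis=1
)
print(X.shape)  # (5, 560)
```

The last column of X is the original 1D array, since 375 + 81 + 103 + 1 = 560 columns.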

I am not sure if you want something like:
a = np.array( [ [1,2],[3,4] ] )
b = np.array( [ 5,6 ] )
c = a.ravel()
con = np.concatenate( (c,b ) )
array([1, 2, 3, 4, 5, 6])
OR
np.column_stack( (a,b) )
array([[1, 2, 5],
[3, 4, 6]])
np.row_stack( (a,b) )
array([[1, 2],
[3, 4],
[5, 6]])

You can try this one-liner:
concat = numpy.hstack([a.reshape(dim,-1) for a in [Cscores, Mscores, Tscores, Yscores]])
The "secret" here is to reshape each array with the known, common length of the first axis (dim, which is 78427 in the question) and -1 for the other axis. NumPy then infers the second size automatically, creating a new axis of length 1 for a 1D array.
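A minimal sketch of that one-liner on small arrays follows. The value of dim and the array contents are made up for illustration; the names mirror the answer above rather than the X_-prefixed names in the question.

```python
import numpy as np

dim = 4  # hypothetical number of rows shared by all arrays
Cscores = np.arange(12).reshape(dim, 3)
Mscores = np.arange(8).reshape(dim, 2)
Yscores = np.arange(dim)  # 1D, shape (4,)

# reshape(dim, -1) leaves 2D arrays unchanged and turns (4,) into (4, 1)
parts = [a.reshape(dim, -1) for a in (Cscores, Mscores, Yscores)]
concat = np.hstack(parts)
print([p.shape for p in parts])  # [(4, 3), (4, 2), (4, 1)]
print(concat.shape)              # (4, 6)
```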

Related

Generating an array of arrays in Python

I want to multiply each element of B to the whole array A to obtain P. The current and desired outputs are attached. The desired output is basically an array consisting of 2 arrays since there are two elements in B.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P=B*A
print(P)
It currently produces an error:
ValueError: operands could not be broadcast together with shapes (2,) (3,3)
The desired output is
array(([[0.02109, 0.04218, 0.06327],
[0.08436, 0.10545, 0.12654],
[0.14763, 0.16872, 0.18981]]),
([[0.00775858, 0.01551716, 0.02327574],
[0.03103432, 0.0387929 , 0.04655148],
[0.05431006, 0.06206864, 0.06982722]]))
You can do this by:
B.reshape(-1, 1, 1) * A
or
B[:, None, None] * A
Here -1 (or :) stands for B.shape[0], which is 2, and the 1, 1 (or None, None) add two extra axes of length 1 to B, so the product broadcasts to the desired shape (2, 3, 3).
The easiest way I can think of is using a list comprehension and then casting back to numpy.ndarray:
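To make the broadcasting step visible, here is a small sketch that prints the intermediate shapes, using the same A and B as in the question:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)
B = 0.02109 * np.exp(-np.linspace(0, 1, 2))  # shape (2,)

B3 = B[:, None, None]     # shape (2, 1, 1)
P = B3 * A                # (2, 1, 1) broadcast against (3, 3)
print(B3.shape, P.shape)  # (2, 1, 1) (2, 3, 3)

# Each slice P[k] is A scaled by B[k]
slices_match = all(np.allclose(P[k], B[k] * A) for k in range(len(B)))
print(slices_match)       # True
```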
np.asarray([A*i for i in B])
Answer :
array([[[0.02109 , 0.04218 , 0.06327 ],
[0.08436 , 0.10545 , 0.12654 ],
[0.14763 , 0.16872 , 0.18981 ]],
[[0.00775858, 0.01551715, 0.02327573],
[0.03103431, 0.03879289, 0.04655146],
[0.05431004, 0.06206862, 0.0698272 ]]])
There are many possible ways to do this.
Here is an overview of their runtimes for the given array (bear in mind these will change for bigger arrays):
reshape: 0.000174 sec
tensordot: 0.000550 sec
einsum: 0.000196 sec
manual loop: 0.000326 sec
See the implementation for each of these:
numpy reshape
Gives a new shape to an array without changing its data.
Here we reshape the array B so we can later multiply it:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = B.reshape(-1, 1, 1) * A
print(P)
numpy tensordot
Given two tensors, a and b, and an array_like object containing two
array_like objects, (a_axes, b_axes), sum the products of a’s and b’s
elements (components) over the axes specified by a_axes and b_axes.
The third argument can be a single non-negative integer_like scalar,
N; if it is such, then the last N dimensions of a and the first N
dimensions of b are summed over.
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
P = np.tensordot(B, A, 0)
print(P)
numpy einsum (Einstein summation)
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
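P = np.einsum('ij,k->kij', A, B)
print(P)
Note: A has two dimensions, so we assign ij to its indices. B has one dimension, so we assign k to its index. The explicit output ->kij puts B's axis first, giving shape (2, 3, 3); without it, einsum orders the output axes alphabetically (ijk), giving shape (3, 3, 2).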
manual loop
Another simple approach would be a loop (it is faster than tensordot for the given input). This approach could be made "numpy free" if you don't want to use numpy for some reason. Here is the version with numpy:
import numpy as np
A=np.array([[1, 2, 3],
[4, 5, 6],
[7 , 8, 9]])
t = np.linspace(0,1,2)
B = 0.02109*np.exp(-t)
products = []
for b in B:
    products.append(b*A)
P = np.array(products)
print(P)
#or the same as one-liner: np.asarray([A * elem for elem in B])
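As a quick sanity check, the four approaches can be compared against each other. This is only a sketch: the dictionary and its keys are made up for illustration, and the einsum call spells out the output order explicitly so that all results share the shape (2, 3, 3).

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = 0.02109 * np.exp(-np.linspace(0, 1, 2))

results = {
    "reshape": B.reshape(-1, 1, 1) * A,
    "tensordot": np.tensordot(B, A, 0),     # axes=0 is an outer product
    "einsum": np.einsum('k,ij->kij', B, A),  # explicit output order
    "loop": np.array([b * A for b in B]),
}
for name, P in results.items():
    print(name, P.shape)

all_equal = all(np.allclose(P, results["reshape"]) for P in results.values())
print(all_equal)
```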

Concatenate two arrays into a new array

I have two arrays,
A = np.array([[1,2,3],[4,5,6]])
b = np.array([100,101])
I want to concatenate them so that b is added as a column on the right-hand side, giving a new array A | b that would look something like:
1 2 3 100
4 5 6 101
I am trying with concatenate this way:
new = np.concatenate((A, b), axis=1)
But I get the next error:
ValueError: all the input arrays must have the same number of dimensions, but the array at index 0 has 2 dimension(s), and the array at index 1 has 1 dimension(s)
How can I concatenate these two arrays?
You can use column_stack:
>>> np.column_stack((A, b))
array([[ 1, 2, 3, 100],
[ 4, 5, 6, 101]])
which takes care of b not being 2D.
To make concatenate work, we manually make b of shape (2, 1):
>>> np.concatenate((A, b[:, np.newaxis]), axis=1)
array([[ 1, 2, 3, 100],
[ 4, 5, 6, 101]])
You could also transpose A, do a vertical stack, and then transpose the result back: np.vstack((A.T,b)).T
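The variants above can be checked against each other with a short sketch. The hstack line is an extra variant added here for comparison; it is not from the answers.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([100, 101])

r1 = np.column_stack((A, b))                        # handles the 1D b itself
r2 = np.concatenate((A, b[:, np.newaxis]), axis=1)  # b made (2, 1) by hand
r3 = np.vstack((A.T, b)).T                          # stack as a row, then transpose
r4 = np.hstack((A, b.reshape(-1, 1)))               # extra variant, same idea as r2
print(r1)
```

All four produce the same (2, 4) array, so the choice is mostly a matter of readability.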

How to calculate x*x.T in python

I want to calculate x*x.T (a column vector multiplied by its transpose),
but I have no idea how to do this in Python. I do not want to implement this manually but rather use a predefined function, something from numpy for example.
But numpy seems to ignore that x.T should be transposed.
Code:
import numpy as np
x = np.array([1, 5])
print(np.dot(x, x.T)) # = 26, This is not the matrix it should be!
While your vectors are defined as 1-d arrays, you can use np.outer:
np.outer(x, x.T)
> array([[ 1, 5],
> [ 5, 25]])
Alternatively, you could also define your vectors as matrices and use normal matrix multiplication:
x = np.array([[1], [5]])
x @ x.T
> array([[ 1, 5],
> [ 5, 25]])
You can do:
x = np.array([[1], [5]])
print(np.dot(x, x.T))
Your original x is of shape (2,), while you need a shape of (2,1). Another way is reshaping your x:
x = np.array([1, 5]).reshape(-1,1)
print(np.dot(x, x.T))
.reshape(-1,1) reshapes your array to have 1 column and implicitly takes care of the number of rows.
output:
[[ 1 5]
[ 5 25]]
np.matmul(x[:, np.newaxis], [x])
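The reason np.dot(x, x.T) returns 26 is that .T does nothing to a 1D array, so the call is an inner product. A short sketch comparing several ways to get the 2x2 matrix (the variable names are illustrative only):

```python
import numpy as np

x = np.array([1, 5])
print(x.T.shape)  # (2,) : .T is a no-op on 1D arrays

col = x[:, np.newaxis]             # shape (2, 1)
outer_a = np.outer(x, x)           # works directly on 1D input
outer_b = col @ col.T              # (2, 1) @ (1, 2) -> (2, 2)
outer_c = x[:, None] * x[None, :]  # plain broadcasting
print(outer_a)
```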

advanced 3d indexing in theano

My question is very similar to
Indexing tensor with index matrix in theano?
except that I have 3 dimensions. First I want to get it working in numpy. With 2 dimensions there is no problem:
>>> idx = np.random.randint(3, size=(4, 2, 3))
>>> d = np.random.rand(4*2*3).reshape((4, 2, 3))
>>> d[1]
array([[ 0.37057415, 0.73066383, 0.76399376],
[ 0.12155831, 0.12552545, 0.87648523]])
>>> idx[1]
array([[2, 0, 1],
[2, 2, 2]])
>>> d[1][np.arange(d.shape[1])[:, np.newaxis], idx[1]]
array([[ 0.76399376, 0.37057415, 0.73066383],
[ 0.87648523, 0.87648523, 0.87648523]]) #All correct
But I have no idea how to make it work with all 3 dimensions. Example of a failed attempt:
>>> d[np.arange(d.shape[0])[:, np.newaxis], np.arange(d.shape[1]), idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (4,1) (2,) (4,2,3)
Does this work?
d[
np.arange(d.shape[0])[:, np.newaxis, np.newaxis],
np.arange(d.shape[1])[:, np.newaxis],
idx
]
You need the index arrays to collectively have broadcastable dimensions
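A small NumPy-only sketch of why this works: the three index arrays have shapes (4, 1, 1), (2, 1) and (4, 2, 3), which broadcast together to (4, 2, 3). The random data and the comparison against np.take_along_axis are additions for illustration, not part of the answer.

```python
import numpy as np

rng = np.random.default_rng(0)
d = rng.random((4, 2, 3))
idx = rng.integers(3, size=(4, 2, 3))

i = np.arange(d.shape[0])[:, np.newaxis, np.newaxis]  # shape (4, 1, 1)
j = np.arange(d.shape[1])[:, np.newaxis]              # shape (2, 1)
out = d[i, j, idx]                                    # shape (4, 2, 3)

# out[a, b, c] == d[a, b, idx[a, b, c]], which is what take_along_axis computes
same = np.array_equal(out, np.take_along_axis(d, idx, axis=2))
print(out.shape, same)  # (4, 2, 3) True
```

In the failed attempt, the shapes (4, 1) and (2,) broadcast to (4, 2), which cannot be matched against (4, 2, 3) because broadcasting aligns trailing axes.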

Convert a numpy array to an array of numpy arrays

How can I convert numpy array a to numpy array b in a (num)pythonic way. Solution should ideally work for arbitrary dimensions and array lengths.
import numpy as np
a=np.arange(12).reshape(2,3,2)
b=np.empty((2,3),dtype=object)
b[0,0]=np.array([0,1])
b[0,1]=np.array([2,3])
b[0,2]=np.array([4,5])
b[1,0]=np.array([6,7])
b[1,1]=np.array([8,9])
b[1,2]=np.array([10,11])
For a start:
In [638]: a=np.arange(12).reshape(2,3,2)
In [639]: b=np.empty((2,3),dtype=object)
In [640]: for index in np.ndindex(b.shape):
   .....:     b[index]=a[index]
   .....:
In [641]: b
Out[641]:
array([[array([0, 1]), array([2, 3]), array([4, 5])],
[array([6, 7]), array([8, 9]), array([10, 11])]], dtype=object)
It's not ideal since it uses iteration. But I wonder whether it is even possible to access the elements of b in any other way. By using dtype=object you break the basic vectorization that numpy is known for. b is essentially a list with numpy multiarray shape overlay. dtype=object puts an impenetrable wall around those size 2 arrays.
For example, a[:,:,0] gives me all the even numbers, in a (2,3) array. I can't get those numbers from b with just indexing. I have to use iteration:
[b[index][0] for index in np.ndindex(b.shape)]
# [0, 2, 4, 6, 8, 10]
np.array tries to make the highest dimension array that it can, given the regularity of the data. To fool it into making an array of objects, we have to give an irregular list of lists or objects. For example we could:
mylist = list(a.reshape(-1,2)) # list of arrays
mylist.append([]) # make the list irregular
b = np.array(mylist, dtype=object) # array of objects (recent NumPy requires dtype=object for irregular input)
b = b[:-1].reshape(2,3) # cleanup
The last solution suggests that my first one can be cleaned up a bit:
b = np.empty((6,),dtype=object)
b[:] = list(a.reshape(-1,2))
b = b.reshape(2,3)
I suspect that under the covers, the list() call does an iteration like
[x for x in a.reshape(-1,2)]
So time wise it might not be much different from the ndindex time.
One thing that I wasn't expecting about b is that I can do math on it, with nearly the same generality as on a:
b-10
b += 10
b *= 2
An alternative to an object dtype would be a structured dtype, e.g.
In [785]: b1=np.zeros((2,3),dtype=[('f0',int,(2,))])
In [786]: b1['f0'][:]=a
In [787]: b1
Out[787]:
array([[([0, 1],), ([2, 3],), ([4, 5],)],
[([6, 7],), ([8, 9],), ([10, 11],)]],
dtype=[('f0', '<i4', (2,))])
In [788]: b1['f0']
Out[788]:
array([[[ 0, 1],
[ 2, 3],
[ 4, 5]],
[[ 6, 7],
[ 8, 9],
[10, 11]]])
In [789]: b1[1,1]['f0']
Out[789]: array([8, 9])
And b and b1 can be added: b+b1 (producing an object dtype). Curiouser and curiouser!
Based on hpaulj's answer, I provide a slightly more generic solution. a is an array of dimension N that is converted to an array b of dimension N1 with dtype object, holding arrays of dimension (N-N1).
In the example N equals 5 and N1 equals 3.
import numpy as np
N=5
N1=3
#create array a with dimension N
a=np.random.random(np.random.randint(2,20,size=N))
a_shape=a.shape
b_shape=a_shape[:N1] # shape of array b
b_arr_shape=a_shape[N1:] # shape of arrays in b
#Solution 1 with list() method (faster)
b=np.empty(np.prod(b_shape),dtype=object) #init b
b[:]=list(a.reshape((-1,)+b_arr_shape))
b=b.reshape(b_shape)
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
#Solution 2 with ndindex loop (slower)
b=np.empty(b_shape,dtype=object)
for index in np.ndindex(b_shape):
    b[index]=a[index]
print("Dimension of b: {}".format(len(b.shape))) # dim of b
print("Dimension of array in b: {}".format(len(b[0,0,0].shape))) # dim of arrays in b
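The ndindex variant could be wrapped in a small reusable function, roughly as sketched here. The name to_object_array and its n_outer parameter are hypothetical, chosen only for this example.

```python
import numpy as np

def to_object_array(a, n_outer):
    """Hypothetical helper: keep the first n_outer axes of `a` and store
    the remaining axes as separate sub-arrays in an object-dtype array."""
    outer_shape = a.shape[:n_outer]
    b = np.empty(outer_shape, dtype=object)
    for index in np.ndindex(outer_shape):
        b[index] = a[index]  # basic indexing: a view into `a`, not a copy
    return b

a = np.arange(12).reshape(2, 3, 2)
b = to_object_array(a, 2)
print(b.shape, b.dtype)  # (2, 3) object
print(b[1, 1])           # [8 9]
```

Because the stored sub-arrays are views, writing to them also changes a; add .copy() inside the loop if independent data is needed.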
