Extracting a column as a vector from a matrix in Python

I have a CSV file, which I am converting to a matrix using the following commands:
reader = csv.reader(open("spambase_X.csv", "r"), delimiter=",")
x = list(reader)
result = numpy.array(x)
print(result.shape) #outputs (57,4601)
Now I want to extract a column of the matrix result, which I am doing as follows:
col1 = result[:, 1]  # note: index 1 is the second column; the first column is result[:, 0]
print(col1.shape) #outputs (57,)
Why isn't it printing (57,1)? How can I get that shape?
TIA

Yes, it will return an array of shape (57,). If you want the shape to be (57,1), you can use reshape():
col1=(result[:, 1]).reshape(-1,1)

You can wrap the column index in a list, [1], which keeps the second dimension:
result[:,[1]].shape
Out[284]: (2, 1)
Data input
result
Out[285]:
array([[1, 2, 3],
       [1, 2, 3]])
More Information
result[:,[1]]
Out[286]:
array([[2],
       [2]])
result[:,1]
Out[287]: array([2, 2])

col1 = result[:, 1] is a 1D array, which is why its shape is (57,).
You can convert it to a 2D array with a single column by doing:
col1[:, np.newaxis] # shape: (57, 1)
If you want a 2D array with a single row you can do:
col1[np.newaxis, :] # shape: (1, 57)
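The approaches from the answers above can be compared side by side. This is a minimal sketch that uses a small stand-in array (an assumption, not the spambase data from the question); the variable names are illustrative only.

```python
import numpy as np

# Small stand-in for the (57, 4601) matrix from the question.
result = np.arange(12).reshape(3, 4)

flat = result[:, 1]                # integer index drops the axis: shape (3,)
by_reshape = flat.reshape(-1, 1)   # -1 lets NumPy infer the row count: shape (3, 1)
by_list = result[:, [1]]           # list index keeps the axis: shape (3, 1)
by_newaxis = flat[:, np.newaxis]   # inserts a length-1 axis: shape (3, 1)
by_slice = result[:, 1:2]          # a length-1 slice also keeps the axis: shape (3, 1)

print(flat.shape, by_reshape.shape, by_list.shape, by_newaxis.shape, by_slice.shape)
```

All four 2D variants hold the same values; they differ only in whether NumPy returns a view (slicing, newaxis, usually reshape) or a copy (list indexing).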

Related

Dot product with sparse matrix and vector

I'm having a very hard time trying to program a dot product between a matrix in sparse format and a vector.
My matrix has shape 3 x 3 and is stored in the following format:
Ms=[[0, 0, 0.6153414193508929],[1, 1, 0.9884632853575251],[2, 1, 0.22943483758936845],[2, 2, 0.336180557968783]]
Here the first entry of each triplet is the row number, the second is the column number and the third is the value.
The vector "b" is:
b=np.array([[0.32599637],[0.31726302],[0.67265016]])
My question is: how do I write the for-loop so that it iterates over the triplets in Ms, multiplies each value by the entry of "b" selected by its column index, adds the products that belong to the same row, and then moves on to the next row (as in the definition of the dot product)?
Please ask me to clarify if you don't understand.
Thanks in advance!
You can take advantage of the fact that if A is a matrix of shape (M, N) and b is a vector of shape (N, 1), then A.b is a vector c of shape (M, 1).
Row i of c is the sum over j of A[i, j] * b[j], so each stored triplet (i, j, value) contributes value * b[j] to row i.
import numpy as np
def dot(sparse_mat, dense_vec, sparse_shape):
    # sparse_mat is a list of [row, col, value] triplets
    assert sparse_shape[1] == dense_vec.shape[0], "Columns of matrix must be equal to rows of vector."
    output = np.zeros((sparse_shape[0], dense_vec.shape[1]))
    for (row, col, val) in sparse_mat:
        row, col = int(row), int(col)
        # accumulate this entry's contribution to its output row
        output[row] += dense_vec[col] * val
    return output
Ms = [[0, 0, 0.6153414193508929],
      [1, 1, 0.9884632853575251],
      [2, 1, 0.22943483758936845],
      [2, 2, 0.336180557968783]]
b = np.array([[0.32599637],
              [0.31726302],
              [0.67265016]])
print(dot(Ms, b, (3, 3)))
# [[0.20059907]
# [0.31360285]
# [0.2989231 ]]
We can verify the above with scipy's sparse matrices.
from scipy.sparse import csr_matrix
Ms = np.array(Ms)
sparse_M = csr_matrix((Ms[:, 2], (Ms[:, 0].astype(int), Ms[:, 1].astype(int))), (3, 3))
print(sparse_M)
# (0, 0) 0.6153414193508929
# (1, 1) 0.9884632853575251
# (2, 1) 0.22943483758936845
# (2, 2) 0.336180557968783
print(sparse_M @ b)
# [[0.20059907]
# [0.31360285]
# [0.2989231 ]]
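If the explicit loop becomes a bottleneck, the same accumulation can be written without a Python-level loop. The sketch below is one possible variant, not part of the answer above: the function name dot_coo is made up for illustration, and it relies on np.add.at, which performs unbuffered in-place addition so that several triplets pointing at the same row all get summed.

```python
import numpy as np

def dot_coo(triplets, dense_vec, shape):
    """Multiply a matrix given as [row, col, value] triplets by a dense (N, k) array."""
    t = np.asarray(triplets, dtype=float)
    rows = t[:, 0].astype(int)
    cols = t[:, 1].astype(int)
    vals = t[:, 2]
    assert shape[1] == dense_vec.shape[0], "Columns of matrix must equal rows of vector."
    out = np.zeros((shape[0], dense_vec.shape[1]))
    # Each triplet contributes vals * dense_vec[col] to its output row.
    # np.add.at accumulates correctly even when a row index repeats.
    np.add.at(out, rows, vals[:, None] * dense_vec[cols])
    return out

Ms = [[0, 0, 0.6153414193508929],
      [1, 1, 0.9884632853575251],
      [2, 1, 0.22943483758936845],
      [2, 2, 0.336180557968783]]
b = np.array([[0.32599637],
              [0.31726302],
              [0.67265016]])
print(dot_coo(Ms, b, (3, 3)))
```

A plain `out[rows] += ...` would not work here, because fancy-index assignment applies only one update per repeated row (row 2 appears twice in Ms).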

Dot product of a vector with each vector in another matrix

weight = np.array([[[ 0.38932115, -0.27430567]],
[[-0.04543304, -0.05643598]],
[[ 0.46912688, -0.07695298]]])
data = np.array([[-0.2056065, 0.7889058]])
like,
data = np.array([[1, 2, 3], [4, 5, 6]])
I want to take the dot product of the row in data with each row in weight, how could I accomplish this? I tried tensordot but it seems a bit convoluted / non-obvious the way axes works. Is there an easier way?
Your use of row and vector is a bit ambiguous:
In [7]: weight = np.array([[[ 0.38932115, -0.27430567]],
...:
...: [[-0.04543304, -0.05643598]],
...:
...: [[ 0.46912688, -0.07695298]]])
...:
...: data = np.array([[-0.2056065, 0.7889058]])
In [8]: weight.shape
Out[8]: (3, 1, 2)
In [9]: data.shape
Out[9]: (1, 2)
Are your rows of shape (2,) or (1,2)?
dot is a 'sum of products' function, but sum on which axis?
With einsum we can control the sum axis.
Sum on both the 1 and 2's:
In [11]: np.einsum('ijk,jk',weight, data)
Out[11]: array([-0.29644829, -0.03518134, -0.15716419]) # shape (3,)
or just the 1's:
In [12]: np.einsum('ijk,jm',weight, data)
Out[12]:
array([[[-0.08004696, 0.30713771],
[ 0.05639903, -0.21640133]],
[[ 0.00934133, -0.03584239],
[ 0.0116036 , -0.04452267]],
[[-0.09645554, 0.37009692],
[ 0.01582203, -0.06070865]]])
In [13]: _.shape
Out[13]: (3, 2, 2)
Or just the 2's:
In [14]: np.einsum('ijk,mk',weight, data)
Out[14]:
array([[[-0.29644829]],
[[-0.03518134]],
[[-0.15716419]]])
In [16]: _.shape
Out[16]: (3, 1, 1)
matmul/@ also does this sum. data.T changes the (1,2) array to a (2,1). This pairs the (3,1,2) with a (2,1) to fit the "last of A with the second to the last of B" rule for dot/@.
In [17]: weight @ data.T
Out[17]:
array([[[-0.29644829]],
[[-0.03518134]],
[[-0.15716419]]])
You ask about a multidimensional data. Just what do you mean by that? It already is 2d. Do you mean a (n,2) array, or a (n,1,2)? What's the relation between this n dimension and the 3 dimension of weight? No hand waving please :)
You can also use np.apply_along_axis
np.apply_along_axis(lambda x:np.dot(x,data.T),2,weight)
which gives
array([[[-0.29644829]],
[[-0.03518134]],
[[-0.15716419]]])
If data contains more than one row, this will also work, for example
weight = np.array([[[ 0.38932115, -0.27430567]],
[[-0.04543304, -0.05643598]],
[[ 0.46912688, -0.07695298]]])
data = np.array([[-0.2056065, 0.7889058],[-0.2056065, 0.7889058]])
np.apply_along_axis(lambda x:np.dot(x,data.T),2,weight)
gives you
array([[[-0.29644829, -0.29644829]],
[[-0.03518134, -0.03518134]],
[[-0.15716419, -0.15716419]]])
First transpose data before taking the dot product.
>>> weight.dot(data.T)
array([[[-0.29644829]],
[[-0.03518134]],
[[-0.15716419]]])
# Multiple rows of data.
data = np.array([[-0.2056065, 0.7889058],
[0.7889058, -.2056065]])
>>> weight.dot(data.T)
array([[[-0.29644829, 0.36353674]],
[[-0.03518134, -0.02423878]],
[[-0.15716419, 0.38591895]]])
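The einsum, matmul and dot answers above all contract the last axis of weight with the last axis of data. A short sketch that checks this on the arrays from the question follows; the explicit '->ijm' output specification and the variable names are illustrative additions.

```python
import numpy as np

weight = np.array([[[ 0.38932115, -0.27430567]],
                   [[-0.04543304, -0.05643598]],
                   [[ 0.46912688, -0.07695298]]])   # shape (3, 1, 2)
data = np.array([[-0.2056065, 0.7889058],
                 [0.7889058, -0.2056065]])          # shape (2, 2): two rows of length 2

# Sum over k, the shared last axis; m indexes the rows of data.
via_einsum = np.einsum('ijk,mk->ijm', weight, data)
via_matmul = weight @ data.T
via_dot = weight.dot(data.T)

print(via_einsum.shape)   # one value per (weight row, data row) pair

# Drop the length-1 middle axis to get a plain (3, 2) table of dot products.
table = via_einsum[:, 0, :]
print(table)
```

With a single row in data, `table[:, 0]` is the (3,) vector of dot products.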

Index of maximum values along a plane in a numpy 3D array

I have a 3D numpy array of shape (3,3,3). I would like to obtain the indices of the maximum value in each plane, where by "plane" I mean the following:
a = np.random.rand(3,3,3)
>>> a[:,:,0]
array([[0.98423332, 0.44410844, 0.06945133],
[0.69876575, 0.87411547, 0.53595041],
[0.53418486, 0.16186808, 0.60579623]])
>>> a[:,:,1]
array([[0.38969199, 0.80202126, 0.62189662],
[0.66609605, 0.09771614, 0.74061269],
[0.77081531, 0.20068743, 0.72762023]])
>>> a[:,:,2]
array([[0.57110332, 0.29021439, 0.15433043],
[0.21762439, 0.93112448, 0.05763075],
[0.77880124, 0.36637245, 0.29070822]])
I have a solution, but I would like something shorter and quicker without for loops. My solution is below:
for i in range(3):
    x = a[:,:,i].argmax() // 3  # integer division, so x can be used as an index
    y = a[:,:,i].argmax() % 3
    z = i
    print(x, y, z)
    print(a[x][y][z])
0 0 0
0.9842333247061394
0 1 1
0.8020212566990867
1 1 2
0.9311244845473187
We simply need to reshape the input array to 2D by merging the last two axes and then applying argmax along the second one i.e. the merged one to give ourselves a vectorized approach -
def argmax_each_plane(a):
    a2D = a.reshape(a.shape[0],-1)
    idx = a2D.argmax(1)
    indices = np.unravel_index(idx, a.shape[1:])
    vals = a2D[np.arange(len(idx)), idx]
    return vals, np.c_[indices]
Sample run -
In [60]: np.random.seed(0)
...: a = np.random.rand(3,3,3)
In [61]: a
Out[61]:
array([[[0.5488135 , 0.71518937, 0.60276338],
[0.54488318, 0.4236548 , 0.64589411],
[0.43758721, 0.891773 , 0.96366276]],
[[0.38344152, 0.79172504, 0.52889492],
[0.56804456, 0.92559664, 0.07103606],
[0.0871293 , 0.0202184 , 0.83261985]],
[[0.77815675, 0.87001215, 0.97861834],
[0.79915856, 0.46147936, 0.78052918],
[0.11827443, 0.63992102, 0.14335329]]])
In [62]: v, ind = argmax_each_plane(a)
In [63]: v
Out[63]: array([0.96366276, 0.92559664, 0.97861834])
In [64]: ind
Out[64]:
array([[2, 2],
[1, 1],
[0, 2]])
If you need z indices as well, use: np.c_[indices[0], indices[1], range(len(a2D))].
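Note that argmax_each_plane treats a[i, :, :] as plane i, whereas the question defines plane i as a[:, :, i]. A possible adaptation for planes along the last axis is sketched below; the function name argmax_last_axis_planes is hypothetical, and it assumes a 3D input.

```python
import numpy as np

def argmax_last_axis_planes(a):
    """For each plane a[:, :, z], return the (x, y, z) index and value of its maximum."""
    n_planes = a.shape[-1]
    # Bring the plane axis to the front, then flatten each plane into one row.
    planes = np.moveaxis(a, -1, 0).reshape(n_planes, -1)
    flat_idx = planes.argmax(axis=1)
    # Convert flat positions back to (x, y) within a plane of shape a.shape[:-1].
    x, y = np.unravel_index(flat_idx, a.shape[:-1])
    z = np.arange(n_planes)
    return np.column_stack([x, y, z]), a[x, y, z]

np.random.seed(0)
a = np.random.rand(3, 3, 3)
idx, vals = argmax_last_axis_planes(a)
print(idx)
print(vals)
```

Each row of idx is an (x, y, z) triple that can be used directly as a[x, y, z].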

Dynamically indexing/choosing the dimension of numpy array

Just working on a CNN and am stuck on a tensor algorithm.
I want to be able to iterate through a list, or tuple, of dimensions and choose a range of elements of X (a multi dimensional array) from that dimension, while leaving the other dimensions alone.
x = np.random.random((10,3,32,32)) #some multi dimensional array
dims = [2,3] #aka the 32s
#for a dimension in dims
#I want the array of numbers from i:i+window in that dimension
#something like
arr1 = x.index(i:i+3,axis = dim[0])
#returns shape 10,3,3,32
arr2 = arr1.index(i:i+3,axis = dim[1])
#returns shape 10,3,3,3
np.take should work for you (read its docs)
In [237]: x=np.ones((10,3,32,32),int)
In [238]: dims=[2,3]
In [239]: arr1=x.take(range(1,1+3), axis=dims[0])
In [240]: arr1.shape
Out[240]: (10, 3, 3, 32)
In [241]: arr2=x.take(range(1,1+3), axis=dims[1])
In [242]: arr2.shape
Out[242]: (10, 3, 32, 3)
You can try slicing with
arr1 = x[:,:,i:i+3,:]
and
arr2 = arr1[:,:,:,i:i+3]
Shape is then
>>> x[:,:,i:i+3,:].shape
(10, 3, 3, 32)
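Neither answer above picks the axis at run time and chains the result as the question asks (the take example builds arr2 from x rather than from arr1). One way to do that is to build the index tuple programmatically with slice objects. This is a sketch; the helper name window and the start value i = 1 are assumptions for illustration.

```python
import numpy as np

def window(x, start, size, axis):
    """Return x[..., start:start+size, ...] along the given axis, leaving other axes whole."""
    index = [slice(None)] * x.ndim          # equivalent to ':' on every axis
    index[axis] = slice(start, start + size)
    return x[tuple(index)]                  # basic slicing, so this is a view

x = np.random.random((10, 3, 32, 32))
dims = [2, 3]
i = 1

arr = x
for dim in dims:
    arr = window(arr, i, 3, dim)            # narrow one axis at a time
print(arr.shape)

# The same result with chained np.take calls (these copy instead of viewing).
arr_take = x.take(range(i, i + 3), axis=dims[0]).take(range(i, i + 3), axis=dims[1])
print(arr_take.shape)
```

The slice-based version avoids copying, which may matter when sliding a window over large CNN feature maps.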

Convert array of lists to array of tuples/triple

I have a 2D NumPy array with 3 columns. It looks something like this: array([[0, 20, 1], [1,2,1], ........, [20,1,1]]). It is basically an array of lists. How can I convert this matrix into array([(0,20,1), (1,2,1), ........., (20,1,1)])? I want the output to be an array of triples. I have been trying to use the tuple and map functions described in "Convert numpy array to tuple":
R = mydata #my data is sparse matrix of 1's and 0's
#First row
#R[0] = array([0,0,1,1]) #Just a sample
(rows, cols) = np.where(R)
vals = R[rows, cols]
QQ = zip(rows, cols, vals)
QT = tuple(map(tuple, np.array(QQ))) #type of QT is tuple
QTA = np.array(QT) #type is array
#QTA gives an array of lists
#QTA[0] = array([0, 2, 1])
#QTA[1] = array([0, 3, 1])
But the desired output is QTA should be array of tuples i.e QTA = array([(0,2,1), (0,3,1)]).
Your 2d array is not a list of lists, but it readily converts to that
a.tolist()
As Jimbo shows, you can convert this to a list of tuples with a comprehension (a map will also work). But when you try to wrap that in an array, you get the 2d array again. That's because np.array tries to create as large a dimensioned array as the data allows. And with sublists (or tuples) all of the same length, that's a 2d array.
To preserve tuples you have to switch to an object-dtype or structured array. For example:
a = np.array([[0, 20, 1], [1,2,1]])
a1=np.empty((2,), dtype=object)
a1[:]=[tuple(i) for i in a]
a1
# array([(0, 20, 1), (1, 2, 1)], dtype=object)
Here I create an empty array with dtype object, the most general kind. Then I assign values, using a list of tuples, which is the proper data structure for this task.
An alternative dtype is
a1=np.empty((2,), dtype='int,int,int')
....
array([(0, 20, 1), (1, 2, 1)],
dtype=[('f0', '<i4'), ('f1', '<i4'), ('f2', '<i4')])
Or in one step: np.array([tuple(i) for i in a], dtype='int,int,int')
a1=np.empty((2,), dtype='(3,)int') produces the 2d array. dt=np.dtype([('f0', '<i4', 3)]) produces
array([([0, 20, 1],), ([1, 2, 1],)],
dtype=[('f0', '<i4', (3,))])
which nests 1d arrays in the tuples. So it looks like object or 3 fields is the closest we can get to an array of tuples.
Not a great solution, but this will work:
# take from your sample
>>> a = np.array([[0, 20, 1], [1,2,1], [20,1,1]])
# construct an empty array with matching length
>>> b = np.empty((3,), dtype=tuple)
# manually put values into tuple and store in b
>>> for i,n in enumerate(a):
...     b[i] = (n[0],n[1],n[2])
>>> b
array([(0, 20, 1), (1, 2, 1), (20, 1, 1)], dtype=object)
>>>type(b)
numpy.ndarray
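For the structured-array route mentioned above, a short sketch of the round trip may help. It uses the 'i8,i8,i8' dtype string (an assumption; any three-field integer dtype would do) and the default field names f0, f1, f2 that NumPy assigns.

```python
import numpy as np

a = np.array([[0, 20, 1], [1, 2, 1], [20, 1, 1]])

# One record per row; each record has three integer fields named f0, f1, f2.
triples = np.array([tuple(row) for row in a], dtype='i8,i8,i8')

print(triples.shape)       # 1D: one element per original row
print(triples.dtype.names)
print(triples['f1'])       # a whole "column" is accessed by field name
print(triples.tolist())    # back to a plain list of Python tuples
```

Unlike the object-dtype version, the structured array keeps the data in one typed buffer, so field access such as triples['f1'] stays vectorized.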
