I have a matrix A with 500 rows and 1024 columns. I would like to generate a matrix consisting of evenly spaced columns from A, say with step size 2^5. How do I do this in Numpy? I haven't seen this explained in the references I have.
You can just use slicing:
import numpy as np
arr = np.random.rand(500, 1024)
step_size = 2 ** 5
arr[:, ::step_size]  # shape is (500, 32)
This keeps all the rows while taking every column at the desired step size. You can read about NumPy indexing at the following link:
https://numpy.org/doc/stable/user/basics.indexing.html?highlight=indexing#other-indexing-options
You can apply the same logic to the rows, or to both rows and columns, for more sophisticated slicing.
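For example, a minimal sketch of slicing both axes at once (the row step of 10 here is just illustrative):
import numpy as np

arr = np.random.rand(500, 1024)
sub = arr[::10, ::32]   # keep every 10th row and every 32nd column
print(sub.shape)        # (50, 32)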
I am wondering how to use np.reshape to reshape a long vector into an array with n columns without specifying the number of rows.
Normally I can find the number of rows with len(a)//n:
a = np.arange(0, 10)
n = 2
b = a.reshape(len(a)//n,n)
Is there a more direct way that avoids len(a)//n?
You can use -1 for one dimension; NumPy will figure out what that number should be:
a = np.arange(0, 10)
n = 2
b = a.reshape(-1, n)
The doc is pretty clear about this feature: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html
One shape dimension can be -1. In this case, the value is inferred
from the length of the array and remaining dimensions.
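As a quick check, the inferred dimension works out to len(a) // n, so both calls give the same shape (this small array is just for illustration):
import numpy as np

a = np.arange(0, 10)
n = 2
print(a.reshape(-1, n).shape)           # (5, 2)
print(a.reshape(len(a) // n, n).shape)  # (5, 2), same result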
Let's say I have a square matrix with 20 rows and 20 columns. Using NumPy, what should I do to transform this matrix into a 1D array with a single line and 400 columns (that is, 20 × 20 = 400, all in one line)?
So far, I've tried:
1) array = np.ravel(matrix)
2) array = np.squeeze(np.asarray(matrix))
But when I print array, it's still a square matrix.
Use the reshape method:
array = matrix.reshape((1, 400))
This works for both NumPy array and matrix types.
UPDATE: As sacul noted, matrix.reshape(-1) is more general in terms of dimensions.
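As a minimal sketch of the two shapes involved (using a plain ndarray built with np.arange, purely for illustration):
import numpy as np

m = np.arange(400).reshape(20, 20)   # a 20x20 ndarray

row = m.reshape((1, 400))            # still 2-D: one row, 400 columns
flat = m.reshape(-1)                 # truly 1-D: 400 elements

print(row.shape)    # (1, 400)
print(flat.shape)   # (400,)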
I have two numpy arrays; the first array is of size 100*4*200 and the second array is of size 150*6*200. In fact, I am storing 100 samples of 200-dimensional vector representations of 4 fields in array 1 and 150 samples of 200-dimensional vectors of 6 fields in array 2.
Now I want to compute the similarity vector between the samples and create a similarity matrix. For each pair of samples, I would like to calculate the similarity between each combination of fields and store it such that I get a 15000*24 dimensional array.
The first 150 rows will be the similarity vectors between the 1st row of array 1 and the 150 rows of array 2, the next 150 rows will be the similarity vectors between the 2nd row of array 1 and the 150 rows of array 2, etc.
Each similarity vector has length # fields in array 1 * # fields in array 2, i.e. the 1st element of the similarity vector is the cosine similarity between field 1 of array 1 and field 1 of array 2, the 2nd element is the similarity between field 1 of array 1 and field 2 of array 2, and so on, with the last element being the similarity between the last field of array 1 and the last field of array 2.
What is the best way to do this using numpy arrays?
So every "row" (i assume the first axis, that I'll call axis 0) is the sample axis. That means you have 100 samples from one vector, each with fieldsxdimentions 4x200.
Doing this the way you describe, then the first row of the first array would have (4,200) and the second one would then have (150,6,200). Then you'd want to do a cos distance between an (m,n), and (m,n,k) array, which does not make sense (the closest you have to a dot product here would be the tensor product, which I'm fairly sure is not what you want).
So we have to extract these first and then iterate over all the others.
To do this I actually recommend just splitting the arrays with np.split and iterating over both of them. This is just because I've never come across a faster way in numpy. You could use tensorflow to gain efficiency, but I'm not going into that here in my answer.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
a = np.random.rand(100, 4, 200)
b = np.random.rand(150, 6, 200)
# We know the output will be (100*150) x (4*6) = 15000 x 24
c = np.empty([15000, 24])
# Make an array with the rows of a, and the same for b
a_splitted = np.split(a, a.shape[0], 0)
b_splitted = np.split(b, b.shape[0], 0)
i = 0
for alpha in a_splitted:
    for beta in b_splitted:
        # Gives a 4x6 matrix
        sim = cosine_similarity(alpha[0], beta[0])
        c[i, :] = sim.ravel()
        i += 1
For the similarity function above I just chose what @StefanFalk suggested: sklearn.metrics.pairwise.cosine_similarity. If this similarity measure is not sufficient, you could write your own.
I am not at all claiming that this is the best way to do this in all of Python. I think the most efficient way would be to do this symbolically using, as mentioned, tensorflow.
Anyways, hope it helps!
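If it helps, here is a small sketch of how to read one 4x6 block back out of c, assuming the loop layout above (the indices ia and ib are just illustrative):
# Row index for sample ia of a vs sample ib of b, given the loop order above
ia, ib = 3, 7
row = c[ia * b.shape[0] + ib]                # 24 values: the 4x6 field pairs, flattened
block = row.reshape(a.shape[1], b.shape[1])  # back to a 4x6 similarity matrix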
How can I take an inner product of 2 column vectors in Python's numpy?
The code below does not work as expected:
import numpy as np
x = np.array([[1], [2]])
np.inner(x, x)
It returned
array([[1, 2],
[2, 4]])
instead of 5
The inner product of a vector with dimensions 2x1 (2 rows, 1 column) with another vector of dimensions 2x1 (2 rows, 1 column) is a matrix with dimensions 2x2 (2 rows, 2 columns). When you take the inner product of any tensors, the innermost dimensions must match (which is 1 in this case) and the result is a tensor whose dimensions match the outer ones, i.e. a 2x1 * 1x2 = 2x2.
What you want to do is transpose both such that when you multiply the dimensions are 1x2 * 2x1 = 1x1.
More generally, multiplying anything with dimensions NxM by something with dimensions MxK yields something with dimensions NxK. Note the inner dimensions must both be M. For more, review your matrix multiplication rules.
The np.inner function will automatically transpose the second argument, thus when you pass in two 2x1, you get a 2x2, but if you pass in two 1x2 you will get a 1x1.
Try this:
import numpy as np
x = np.array([[1], [2]])
np.inner(np.transpose(x), np.transpose(x))
Or simply define your x as a row vector initially:
import numpy as np
x = np.array([1,2])
np.inner(x, x)
I think you mean to have:
x= np.array([1,2])
In order to get 5 as output, your vector needs to be 1xN, not Nx1, if you want to apply np.inner on it.
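For example, a quick sketch that flattens the column vector first (ravel is just one way to do it):
import numpy as np

x = np.array([[1], [2]])               # column vector, shape (2, 1)
print(np.inner(x.ravel(), x.ravel()))  # 5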
Try the following; it will work:
np.dot(np.transpose(a), a)
Make sure col_vector has shape (N, 1), where N is the number of elements,
then simply sum the element-wise multiplication result:
np.sum(col_vector * col_vector)
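A minimal runnable sketch of both suggestions, using the column vector from the question:
import numpy as np

col_vector = np.array([[1], [2]])                    # shape (2, 1)
print(np.dot(np.transpose(col_vector), col_vector))  # [[5]], a 1x1 array
print(np.sum(col_vector * col_vector))               # 5, a plain scalar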
I have a rather large matrix (500000 * 24) as an ndarray and I want to multiply each of its cells by the corresponding column minimum. I have already done this with for loops, but I keep reading that this is not the NumPy way of doing things.
Is there a proper way of doing such an operation (I might also want to subtract a constant later)?
Thanks in advance
Yes, you can simply multiply your array by the minimum vector directly; an example is shown below.
import numpy as np
data = np.random.random((500000, 24))
# This returns an array of 500,000 values: the minimum of each row of 24 values
minimum = data.min(axis=1)
# Reshape to (500000, 1) so it broadcasts across the 24 columns
data = data * minimum[:, np.newaxis]
If you wish to create a minimum array of size 24 (where the minimum over the 500,000 values in each column is taken), then you would choose axis=0; that vector broadcasts against the (500000, 24) array directly.
This set of slides discusses how such operations can work.
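For the column-minimum case the question describes, a minimal sketch (the subtracted constant 0.5 is just an example):
import numpy as np

data = np.random.random((500000, 24))

col_min = data.min(axis=0)   # shape (24,): the minimum of each column
scaled = data * col_min      # broadcasts across all 500,000 rows
shifted = scaled - 0.5       # subtracting a constant works the same way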
Would normal multiply not do?
import numpy
a = numpy.random.random((4, 2))
b = a * numpy.min(a, axis=0)