I have a function that creates a 2-dimensional array, a Vandermonde matrix, and is called as:
vandermonde(generator, rank)
where generator is an n-sized array, for example
generator = np.array([-1/2, 1/2, 3/2, 5/2, 7/2, 9/2])
and rank=4.
I then need to create 4 Vandermonde matrices (because rank=4), each skewed by h in my space (h is arbitrary here; let's say h=1).
Therefore I came up with the following deterministic code:
V = np.array([
    vandermonde(generator - 0*h, rank),
    vandermonde(generator - 1*h, rank),
    vandermonde(generator - 2*h, rank),
    vandermonde(generator - 3*h, rank)
])
Then, instead of making multiple manual calls to vandermonde, I used a for-loop:
V = []
for i in range(rank):
    V.append(vandermonde(generator - h*i, rank))
V = np.array(V)
This approach works fine, but seems too "patchy". I tried an np.append approach as below:
M = np.array([])
for i in range(rank):
    M = np.append(M, [vandermonde(generator - h*i, rank)])
But it didn't work as I expected; np.append seems to flatten and grow the array instead of adding a new element.
My questions are:
How can I avoid standard Python lists and use a direct NumPy approach? np.append does not behave as I expect: it just grows the array instead of adding a new array element.
Is there a more direct NumPy approach to this?
My vandermonde function is:
def vandermonde(generator, rank=None):
    """Returns a Vandermonde matrix.
    If rank is not passed, returns a square Vandermonde matrix.
    """
    if rank is None:
        rank = len(generator)
    # Row k holds generator**k, so the result has shape (rank, len(generator)).
    return np.tile(generator, (rank, 1)) ** np.arange(rank).reshape((rank, 1))
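As an aside, NumPy's built-in np.vander builds the same matrix column-wise; with increasing=True its transpose should match this row-per-power layout (a quick check, assuming the generator above):

import numpy as np
g = np.array([-1/2, 1/2, 3/2, 5/2, 7/2, 9/2])
# vandermonde puts g**k in row k; np.vander(..., increasing=True) puts it in column k
assert np.allclose(vandermonde(g, 4), np.vander(g, 4, increasing=True).T)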
The expected answer is a 3-dimensional array of shape (rank, rank, len(generator)), where each entry along the first axis is one of the skewed Vandermonde matrices. For the constants above (generator, rank, h) we have:
V= array([[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -0.5 , 0.5 , 1.5 , 2.5 , 3.5 , 4.5 ],
[ 0.25, 0.25, 2.25, 6.25, 12.25, 20.25],
[ -0.12, 0.12, 3.38, 15.62, 42.88, 91.12]],
[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -1.5 , -0.5 , 0.5 , 1.5 , 2.5 , 3.5 ],
[ 2.25, 0.25, 0.25, 2.25, 6.25, 12.25],
[ -3.38, -0.12, 0.12, 3.38, 15.62, 42.88]],
[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -2.5 , -1.5 , -0.5 , 0.5 , 1.5 , 2.5 ],
[ 6.25, 2.25, 0.25, 0.25, 2.25, 6.25],
[-15.62, -3.38, -0.12, 0.12, 3.38, 15.62]],
[[ 1. , 1. , 1. , 1. , 1. , 1. ],
[ -3.5 , -2.5 , -1.5 , -0.5 , 0.5 , 1.5 ],
[ 12.25, 6.25, 2.25, 0.25, 0.25, 2.25],
[-42.88, -15.62, -3.38, -0.12, 0.12, 3.38]]])
Some related ideas can be found in this discussion: efficient-way-to-compute-the-vandermonde-matrix
Use broadcasting to get the final 3D array in a vectorized manner -
r = np.arange(rank)
V_out = (generator - h*r[:,None,None]) ** r[:,None]
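A quick sanity check, assuming the same generator, h and rank as in the question; V_out should match the loop-built V element for element:

print(V_out.shape)            # (4, 4, 6): one (rank, len(generator)) matrix per shift
print(np.allclose(V_out, V))  # True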
We can also use cumprod to build up the powers for another solution -
gr = np.repeat(generator - h*r[:,None,None], rank, axis=1)
gr[:,0] = 1
out = gr.cumprod(1)
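Here cumprod builds the powers by repeated multiplication: after row 0 is set to 1, the cumulative product along axis 1 turns the repeated bases into g**0, g**1, g**2, ... It should agree with the broadcasting version:

print(np.allclose(out, V_out))  # True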
I have vectors of this form:
test = np.linspace(0, 1, 10)
I want to stack them horizontally in order to make a matrix. The problem is that I define them in a loop, so the first stack is between an empty matrix and the first column vector, which gives the following error:
ValueError: all the input arrays must have same number of dimensions
Bottom line: I have a for-loop that creates a vector p1 on every iteration, and I want to add it to a final matrix of the form
[p1 p2 p3 p4]
which I could then do matrix operations on, such as multiplying by the transpose.
If you've got a list of 1D arrays that you want horizontally stacked, you could convert them each to a column first, but it's probably easier to just vertically stack them and then transpose:
In [6]: vector_list = [np.linspace(0, 1, 10) for _ in range(3)]
In [7]: np.vstack(vector_list).T
Out[7]:
array([[0. , 0. , 0. ],
[0.11111111, 0.11111111, 0.11111111],
[0.22222222, 0.22222222, 0.22222222],
[0.33333333, 0.33333333, 0.33333333],
[0.44444444, 0.44444444, 0.44444444],
[0.55555556, 0.55555556, 0.55555556],
[0.66666667, 0.66666667, 0.66666667],
[0.77777778, 0.77777778, 0.77777778],
[0.88888889, 0.88888889, 0.88888889],
[1. , 1. , 1. ]])
How did you get this dimension error? What does the empty array have to do with it?
A list of arrays of the same length:
In [610]: alist = [np.linspace(0,1,6), np.linspace(10,11,6)]
In [611]: alist
Out[611]:
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]),
array([10. , 10.2, 10.4, 10.6, 10.8, 11. ])]
Several ways of making an array from them:
In [612]: np.array(alist)
Out[612]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
In [614]: np.stack(alist)
Out[614]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
[10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
If you want to join them in columns, you can transpose one of the above, or use:
In [615]: np.stack(alist, axis=1)
Out[615]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
np.column_stack is also handy.
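A minimal demo with the same alist as above:

np.column_stack(alist)   # same 6x2 result as np.stack(alist, axis=1)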
In newer numpy versions you can do:
In [617]: np.linspace((0,10),(1,11),6)
Out[617]:
array([[ 0. , 10. ],
[ 0.2, 10.2],
[ 0.4, 10.4],
[ 0.6, 10.6],
[ 0.8, 10.8],
[ 1. , 11. ]])
You don't specify how you create the 'empty array' or how you attempt to stack. I can't exactly recreate the error message (a full traceback would have helped). But given that message, did you check the number of dimensions of the inputs? Did they match?
Array stacking in a loop is tricky. You have to pay close attention to the shapes, especially of the initial 'empty' array. There isn't a close analog to the empty list []. np.array([]) is 1d with shape (0,). np.empty((0,6)) is 2d with shape (0,6). Also, all the stacking functions create a new array with each call (none operates in-place), so they are inefficient compared to list append.
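Putting that together, the usual pattern is to collect the vectors in a plain Python list inside the loop and stack once at the end; a minimal sketch (the scaled linspace is just a stand-in for however p1, p2, ... are really built):

import numpy as np

cols = []                          # list append is cheap and shape-agnostic
for k in range(4):
    p = np.linspace(0, 1, 10) * k  # stand-in for the loop's real vector
    cols.append(p)
M = np.stack(cols, axis=1)         # one allocation at the end; shape (10, 4)
MtM = M.T @ M                      # e.g. multiply by the transpose: (4, 4)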
I have written the following code to scale an image to 50%. However, it took this algorithm 65 seconds to shrink a 3264x2448 image. Can someone who understands numpy explain why this algorithm is so inefficient and suggest more efficient changes?
def shrinkX2(im):
    # Integer division so the new dimensions are ints (plain / breaks range() on Python 3)
    X, Y = im.shape[1] // 2, im.shape[0] // 2
    new = np.zeros((Y, X, 3))
    for y in range(Y):
        for x in range(X):
            # Average each 2x2 block across all 3 channels
            new[y, x] = im[2*y:2*y + 2, 2*x:2*x + 2].reshape(4, 3).mean(axis=0)
    return new
Going by the text of the question, you are shrinking the image by 50%, and going by the code, you are doing it in blocks. We can reshape to split each of the two spatial axes of the input by the required block sizes to get a higher-dimensional array, and then compute the mean along the axes corresponding to the block sizes, like so -
def block_mean(im, BSZ):
    m, n = im.shape[:2]
    # Split (m, n) into (m//bh, bh, n//bw, bw), keep any trailing channel axis,
    # then average over the two block axes (1 and 3).
    return im.reshape(m//BSZ[0], BSZ[0], n//BSZ[1], BSZ[1], -1).mean((1, 3))
Sample run -
In [44]: np.random.seed(0)
...: im = np.random.randint(0,9,(6,8,3))
In [45]: im[:2,:2,:].mean((0,1)) # average of first block across all 3 channels
Out[45]: array([3.25, 3.75, 3.5 ])
In [46]: block_mean(im, BSZ=(2,2))
Out[46]:
array([[[3.25, 3.75, 3.5 ],
[4. , 4.5 , 3.75],
[5.75, 2.75, 5. ],
[3. , 3.5 , 3.25]],
[[4. , 5.5 , 5.25],
[6.25, 1.75, 2. ],
[4.25, 2.75, 1.75],
[2. , 4.75, 3.75]],
[[3.25, 3.5 , 5.25],
[4.25, 1.5 , 5.25],
[3.5 , 3.5 , 4.25],
[0.75, 5. , 5.5 ]]])
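Applied to the original problem, the whole shrink becomes one call (a sketch, assuming the image is stored height-first as a (2448, 3264, 3) array; the reshape trick requires both spatial dimensions to be divisible by the block size):

small = block_mean(im, BSZ=(2, 2))   # shape (1224, 1632, 3)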
I was reading and came across the formula for cosine similarity:
sim(u, v) = u·v / (||u|| ||v||)
I thought this looked interesting, and I created a numpy array that has user_id as rows and item_id as columns. For instance, let M be this matrix:
M = [[2,3,4,1,0],[0,0,0,0,5],[5,4,3,0,0],[1,1,1,1,1]]
Here the entries in the matrix are the ratings person u has given to item i, based on row u and column i. I want to calculate this cosine similarity between items (columns) for this matrix. This should yield a 5 x 5 matrix, I believe. I tried to do
df = pd.DataFrame(M)
item_mean_subtracted = df.sub(df.mean(axis=0), axis=1)
similarity_matrix = item_mean_subtracted.fillna(0).corr(method="pearson").values
However, this does not seem right.
Here's a possible implementation of the adjusted cosine similarity:
import numpy as np
from scipy.spatial.distance import pdist, squareform
M = np.asarray([[2, 3, 4, 1, 0],
                [0, 0, 0, 0, 5],
                [5, 4, 3, 0, 0],
                [1, 1, 1, 1, 1]])
M_u = M.mean(axis=1)
item_mean_subtracted = M - M_u[:, None]
similarity_matrix = 1 - squareform(pdist(item_mean_subtracted.T, 'cosine'))
Remarks:
I'm taking advantage of NumPy broadcasting to subtract the mean.
If M is a sparse matrix, you could do something like this: M.toarray().
From the docs:
Y = pdist(X, 'cosine')
Computes the cosine distance between vectors u and v:
1 − u·v / (||u||_2 ||v||_2)
where ||·||_2 is the 2-norm of its argument, and u·v is the dot product of u and v.
Array transposition is performed through the T method.
Demo:
In [277]: M_u
Out[277]: array([ 2. , 1. , 2.4, 1. ])
In [278]: item_mean_subtracted
Out[278]:
array([[ 0. , 1. , 2. , -1. , -2. ],
[-1. , -1. , -1. , -1. , 4. ],
[ 2.6, 1.6, 0.6, -2.4, -2.4],
[ 0. , 0. , 0. , 0. , 0. ]])
In [279]: np.set_printoptions(precision=2)
In [280]: similarity_matrix
Out[280]:
array([[ 1. , 0.87, 0.4 , -0.68, -0.72],
[ 0.87, 1. , 0.8 , -0.65, -0.91],
[ 0.4 , 0.8 , 1. , -0.38, -0.8 ],
[-0.68, -0.65, -0.38, 1. , 0.27],
[-0.72, -0.91, -0.8 , 0.27, 1. ]])
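For reference, the same similarity matrix can be computed with plain NumPy, without scipy (a sketch using the item_mean_subtracted array above: normalize the item columns to unit length, then take the Gram matrix):

X = item_mean_subtracted               # shape (4, 5): users x items
Xn = X / np.linalg.norm(X, axis=0)     # unit 2-norm per item column
similarity_matrix_np = Xn.T @ Xn       # 5x5 cosine similarities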
There is scipy.misc.imresize for resampling the first two dimensions of 3D arrays. It also supports bilinear interpolation. However, there does not seem to be an existing function for resizing all dimensions of arrays with any number of dimensions. How can I resample any array given a new shape of the same rank, using multi-linear interpolation?
You want scipy.ndimage.zoom, which can be used as follows:
>>> import scipy.ndimage
>>> x = np.arange(8, dtype=np.float64).reshape(2, 2, 2)
>>> scipy.ndimage.zoom(x, 1.5, order=1)
array([[[ 0. , 0.5, 1. ],
[ 1. , 1.5, 2. ],
[ 2. , 2.5, 3. ]],
[[ 2. , 2.5, 3. ],
[ 3. , 3.5, 4. ],
[ 4. , 4.5, 5. ]],
[[ 4. , 4.5, 5. ],
[ 5. , 5.5, 6. ],
[ 6. , 6.5, 7. ]]])
Note that this function always preserves the boundaries of the image, essentially resampling a mesh with a node at each pixel center. You might want to look at other functions in scipy.ndimage if you need more control over exactly where the resampling occurs.
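If you start from a target shape rather than a factor, zoom also accepts a sequence of per-axis factors, so you can derive them (a sketch reusing the x above):

target = (4, 6, 5)
factors = [t / s for t, s in zip(target, x.shape)]  # per-axis zoom factors
y = scipy.ndimage.zoom(x, factors, order=1)
print(y.shape)   # (4, 6, 5)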
I generate a matrix that I want to get the covariance of:
test=np.array([4,2,.6,4.2,2.1,.59,3.9,2,.58,4.3,2.1,.62,4.1,2.2,.63]).reshape(5,3)
test
array([[ 4. , 2. , 0.6 ],
[ 4.2 , 2.1 , 0.59],
[ 3.9 , 2. , 0.58],
[ 4.3 , 2.1 , 0.62],
[ 4.1 , 2.2 , 0.63]])
I calculate the covariance with the numpy function:
np.cov(test)
array([[ 2.92 , 3.098 , 2.846 , 3.164 , 2.966 ],
[ 3.098 , 3.28703333, 3.0199 , 3.3566 , 3.1479 ],
[ 2.846 , 3.0199 , 2.7748 , 3.0832 , 2.8933 ],
[ 3.164 , 3.3566 , 3.0832 , 3.4288 , 3.2122 ],
[ 2.966 , 3.1479 , 2.8933 , 3.2122 , 3.0193 ]])
This, however, is different from what I get by following the covariance formula:
mean=np.mean(test,0)
np.dot(test-mean,(test-mean).T)/(5-1)
array([[ 0.004104, -0.002886, 0.006624, -0.005416, -0.002426],
[-0.002886, 0.002649, -0.005316, 0.005044, 0.000509],
[ 0.006624, -0.005316, 0.011744, -0.010496, -0.002556],
[-0.005416, 0.005044, -0.010496, 0.010164, 0.000704],
[-0.002426, 0.000509, -0.002556, 0.000704, 0.003769]])
This does not match the numpy calculation.
In fact, I took a peek at the source code, and the equation is (x-m) * (x-m).T.conj() / (N - 1), which I believe I am implementing.
The difference comes from the fact that np.cov calculates the covariance between row vectors, which is why the result is 5x5 instead of 3x3. But np.mean(test, 0) calculates the average of the column vectors, and in test - mean the subtraction is also broadcast along columns, which differs from what np.cov is doing. The fix takes two steps:
Firstly, make sure the mean is calculated for each row, which can be done by simply transposing the test matrix:
mean = np.mean(test.T, 0)
Then, when calculating x - x_bar, reshape the mean vector so that the subtraction is along the rows as well. Also, since the variables here are row vectors, the number of observations is 3 instead of 5, so the divisor is 3 - 1. After these fixes, it gives results consistent with np.cov:
np.dot(test-mean[:, None],(test-mean[:, None]).T)/(3-1)
# array([[ 2.92 , 3.098 , 2.846 , 3.164 , 2.966 ],
# [ 3.098 , 3.28703333, 3.0199 , 3.3566 , 3.1479 ],
# [ 2.846 , 3.0199 , 2.7748 , 3.0832 , 2.8933 ],
# [ 3.164 , 3.3566 , 3.0832 , 3.4288 , 3.2122 ],
# [ 2.966 , 3.1479 , 2.8933 , 3.2122 , 3.0193 ]])
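Equivalently, you can skip the transpose and subtract the row means directly with keepdims (a compact check against np.cov):

centered = test - test.mean(axis=1, keepdims=True)    # subtract each row's mean
manual = centered @ centered.T / (test.shape[1] - 1)  # divide by N-1 with N=3 columns
print(np.allclose(manual, np.cov(test)))              # True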