Optimizing numpy least squares by omitting loop and cast

Currently I am trying to calculate least squares with 2 numpy arrays (X, Y), each containing n sub-arrays with the same number of values. The output I want is 2 numpy arrays that contain the slopes and intercepts respectively. Right now I have the following inefficient code:
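import numpy as np

def fit_lines(X, Y):  # function name is illustrative
    M = []
    C = []
    for i in range(len(X)):
        x = X[i]
        y = Y[i]
        # Design matrix with columns [x, 1], so that y is approximately m*x + c
        A = np.vstack([x, np.ones(len(x))]).T
        m, c = np.linalg.lstsq(A, y, rcond=None)[0]
        M.append(m)
        C.append(c)
    return np.array(M), np.array(C)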
Since this code relies on a couple conversions and a for loop I think there has to be a more efficient way to solve this problem, but I can't think of one that doesn't rely on the for loop and casts. I need to keep the order of M and C maintained as well.

Actually, it should not be that bad: numpy is highly optimized for this kind of stuff (stacking matrices, casting...).
You cannot avoid the for loop with np.linalg.lstsq, since it does not support so-called "batch processing", but it is possible to avoid repeated memory allocation by reusing the design matrix. I don't know if it will really improve anything:
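def fit_lines_reuse(X, Y):  # function name is illustrative
    M, C = [], []
    for i, (x, y) in enumerate(zip(X, Y)):
        if not i:
            # Allocate the design matrix [x, 1] once
            A = np.stack((x, np.ones(len(x))), axis=-1)
        else:
            # Reuse it: only the x column changes
            A[:, 0] = x
        m, c = np.linalg.lstsq(A, y, rcond=None)[0]
        M.append(m)
        C.append(c)
    return np.array(M), np.array(C)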

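After thinking twice about it, there is a way to do it by "emulating" batch processing, by reformulating your problem as a single block-diagonal (sparse) problem:
import numpy as np
import scipy.linalg
# One [x, 1] block per fit, placed on the diagonal of one large matrix
A = scipy.linalg.block_diag(*[
    np.stack((x, np.ones(len(x))), axis=-1) for x in X])
y_all = np.concatenate(Y)
sol = np.linalg.lstsq(A, y_all, rcond=None)[0]
# The solution is interleaved as [m0, c0, m1, c1, ...]
M, C = sol[0::2], sol[1::2]
The solution vector interleaves the slope and intercept of each block, so slicing it with a stride of 2 recovers M and C in the original order.
I haven't tried it myself, but I think it could lead to a speedup thanks to efficient matrix algebra. Be aware that the dense block-diagonal matrix grows quadratically with the number of fits, so this should be measured. You could probably improve efficiency further by using SciPy's sparse lsqr solver (scipy.sparse.linalg.lsqr) in this case. Good luck with that!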
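For a straight-line fit specifically, the loop can also be dropped by using the closed-form solution (slope = covariance / variance, intercept from the means). The following is only a minimal sketch, assuming X and Y can be converted to 2-D arrays of shape (n, k) and that no row of X is constant; the helper name fit_lines_vectorized is made up for illustration:

```python
import numpy as np

def fit_lines_vectorized(X, Y):
    """Slope and intercept of each row's best-fit line (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Center each row; the slope is cov(x, y) / var(x) per row
    xc = X - X.mean(axis=1, keepdims=True)
    yc = Y - Y.mean(axis=1, keepdims=True)
    M = (xc * yc).sum(axis=1) / (xc ** 2).sum(axis=1)
    # The fitted line passes through the point of means
    C = Y.mean(axis=1) - M * X.mean(axis=1)
    return M, C

# Illustrative data: 5 noisy lines with 20 points each
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 20))
Y = 3.0 * X + 1.5 + rng.normal(scale=0.1, size=X.shape)
M, C = fit_lines_vectorized(X, Y)
```

The order of M and C follows the row order of X and Y, so nothing has to be rearranged afterwards.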

Related

How to fill np array with values of function?

Imagine I have some function, for example
def f(x, y):
    return x * y
And I want to fill some matrix with its values. The easiest way to do it is, for example
N = 10
X = np.arange(N)
Y = np.arange(N)
matrix = np.zeros((N, N))
for i, x in enumerate(X):
    for j, y in enumerate(Y):
        matrix[i][j] = f(x,y)
How can I do it in a Pythonic way? For example using np.vectorize?

Using something like np.fromfunction or np.vectorize will give you little, if any, advantage over a normal for loop. In numpy, you can take advantage of the fact that vectorized operations use loops implemented in C. The problem is that there is no general solution to vectorize your function. For the example you give, it's possible though:
x = np.arange(N)
y = np.arange(N)
matrix = x[:, None] * y
For more complex operations that cannot be reduced to numpy function calls, you may want to consider using Cython or Numba.
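The broadcasting trick is not limited to a plain product. As a rough sketch, assuming f is written only with NumPy-compatible arithmetic and ufuncs (the np.sin term below is an invented example, not from the question), the same function can be called once on a column vector and a row vector:

```python
import numpy as np

def f(x, y):
    # Built only from NumPy-compatible operations, so it broadcasts
    return x * y + np.sin(x)

N = 10
X = np.arange(N)
Y = np.arange(N)

# Column vector (N, 1) against row vector (N,) broadcasts to (N, N)
matrix = f(X[:, None], Y)

# Reference result from the explicit double loop
expected = np.zeros((N, N))
for i, x in enumerate(X):
    for j, y in enumerate(Y):
        expected[i, j] = f(x, y)
```

Both arrays hold the same values; only the first avoids the Python-level loops.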

Increase performance of np.where() loop

I am trying to speed up the code for the following script (ideally >4x) without multiprocessing. In a future step, I will implement multiprocessing, but the current speed is too slow even if I split it across 40 cores. Therefore I'm trying to optimize the code first.
import numpy as np
def loop(x,y,q,z):
    matchlist = []
    for ind in range(len(x)):
        matchlist.append(find_match(x[ind],y[ind],q,z))
    return matchlist
def find_match(x,y,q,z):
    A = np.where(q == x)
    B = np.where(z == y)
    return np.intersect1d(A,B)
# N will finally scale up to 10^9
N = 1000
M = 300
X = np.random.randint(M, size=N)
Y = np.random.randint(M, size=N)
# Q and Z size is fixed at 120000
Q = np.random.randint(M, size=120000)
Z = np.random.randint(M, size=120000)
# convert int32 arrays to str64 arrays, to represent original data (which are strings and not numbers)
X = np.char.mod('%d', X)
Y = np.char.mod('%d', Y)
Q = np.char.mod('%d', Q)
Z = np.char.mod('%d', Z)
matchlist = loop(X,Y,Q,Z)
I have two arrays (X and Y) which are identical in length. Each row of these arrays corresponds to one DNA sequencing read (basically strings of the letters 'A','C','G','T'; details not relevant for the example code here).
I also have two 'reference arrays' (Q and Z) which are identical in length and I want to find the occurrence (with np.where()) of every element of X within Q, as well as of every element of Y within Z (basically the find_match() function). Afterwards I want to know whether there is an overlap/intersect between the indexes found for X and Y.
Example output (matchlist; some rows of X/Y have matching indexes in Q/Z, some don't, e.g. index 11):
The code works fine so far, but it would take quite long to execute with my final dataset where N=10^9 (in this code example N=1000 to make the tests quicker). 1000 rows of X/Y need about 2.29s to execute on my laptop:
Every find_match() takes about 2.48 ms to execute which is roughly 1/1000 of the final loop.
One first approach would be to combine (x with y) as well as (q with z) and then I only need to run np.where() once, but I couldn't get that to work yet.
I've tried to loop and lookup within Pandas (.loc()) but this was about 4x slower than np.where().
The question is closely related to a recent question from philshem (Combine several NumPy "where" statements to one to improve performance), however, the solutions provided on this question do not work for my approach here.

Numpy isn't too helpful here, since what you need is a lookup into a jagged array, with strings as the indexes.
lookup = {}
for i, (q, z) in enumerate(zip(Q, Z)):
    lookup.setdefault((q, z), []).append(i)
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]
If you don't need the output as a jagged array, but are OK with just a boolean denoting presence, and can preprocess each string to a number, there is a much faster method.
lookup = np.zeros((300, 300), dtype=bool)
lookup[Q, Z] = True
matchlist = lookup[X, Y]
You typically won't want to use this method to replace the former jagged case, as dense variants (eg. Daniel F's solution) will be memory inefficient and numpy does not support sparse arrays well. However, if more speed is needed then a sparse solution is certainly possible.
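To make the dictionary approach concrete, here is a small self-contained sketch that builds the lookup and keeps the question's np.where-based find_match for comparison. The array sizes are deliberately tiny and chosen only for illustration; they are not the question's sizes:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, R = 5, 50, 200  # small illustrative sizes, not the question's
# String arrays, as in the question
X = rng.integers(M, size=N).astype(str)
Y = rng.integers(M, size=N).astype(str)
Q = rng.integers(M, size=R).astype(str)
Z = rng.integers(M, size=R).astype(str)

# One pass over the reference arrays: (q, z) pair -> list of indices
lookup = {}
for i, (q, z) in enumerate(zip(Q, Z)):
    lookup.setdefault((q, z), []).append(i)

# One dictionary lookup per read instead of two full scans of Q and Z
matchlist = [lookup.get((x, y), []) for x, y in zip(X, Y)]

def find_match(x, y, q, z):
    # The question's original approach, kept for comparison
    return np.intersect1d(np.where(q == x)[0], np.where(z == y)[0])
```

The lookup is built once in time proportional to the length of Q, after which each read costs a single hash lookup, so the total work no longer scales with the product of the two lengths.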

You only have 300*300 = 90000 unique answers. Pre-compute.
Q_ = np.arange(300)[:, None] == Q
Z_ = np.arange(300)[:, None] == Z
lookup = np.logical_and(Q_[:, None, :], Z_)
lookup.shape
Out[]: (300, 300, 120000)
Then the result is just:
out = lookup[X, Y]
If you really want the indices you can do:
i = np.where(out)
out2 = np.split(i[1], np.flatnonzero(np.diff(i[0]))+1)
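You'll need to process X and Y in chunks (which can then be parallelized) with this method, since a boolean array of shape (120000, 1000000000) will throw a MemoryError. Note that lookup itself already needs 300 * 300 * 120000 bytes, roughly 10.8 GB.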

Solving A*x=x with numpy

I am new to numpy and I want to solve the equation A * x = x, where A is a matrix and x is a vector.
I am searching for the vector x, if it exists.
I found the np.linalg.solve() function, but didn't get it to work as intended.

The issue here is not so much a problem with numpy as with your understanding of the linear algebra involved. The question you are asking can be rephrased as:
A @ x = x
A @ x = I @ x
(A - I) @ x = 0
This is a specific formulation of the general eigenvector problem, with the stipulation that the eigenvalue is 1.
Numpy solves this problem with the function np.linalg.eig:
w, v = np.linalg.eig(A)
You need to check if any of the values are 1 (there could be more than one):
mask = np.isclose(w, 1)
if mask.any():
    for vec in v[:, mask].T:
        print(vec)
else:
    print('Nope!')
The columns of v are unit vectors. Keep in mind that any scalar multiple of such a vector is also a solution.
If the eigenvalue computation is numerically delicate, you may want to apply scipy.linalg.svd to A - I instead:
_, s, vh = svd(A - np.eye(len(A)))
The solutions are then the rows of vh whose singular value in s is close to zero; the check with np.isclose works the same way.
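As a worked illustration, the sketch below applies the eigenvalue test to a small column-stochastic matrix (an invented example, chosen because such matrices always have eigenvalue 1) and cross-checks the result against the null space of A - I obtained from np.linalg.svd:

```python
import numpy as np

# Column-stochastic matrix: each column sums to 1, so eigenvalue 1 exists
A = np.array([[0.9, 0.5],
              [0.1, 0.5]])

w, v = np.linalg.eig(A)
mask = np.isclose(w, 1)
# Columns of v are eigenvectors; keep those with eigenvalue 1
solutions = v[:, mask].T

# Cross-check: null space of (A - I) from the SVD
_, s, vh = np.linalg.svd(A - np.eye(2))
null_vec = vh[np.isclose(s, 0)]

for x in solutions:
    print(x, np.allclose(A @ x, x))
```

Both routes return a unit vector along the same line; they may differ in sign, which is fine because any scalar multiple is also a solution.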

Is there a more efficient (i.e. faster) way to compute correlated random walks?

Is there any way to vectorize or otherwise speed this up? I'm already jitting it using numba, but it is still a major bottleneck. Using numba to jit my functions on 1-d numpy arrays leads to code that is orders of magnitude faster, but there is essentially a negligible improvement when using numba on the 2-d arrays below. decomposition is a numpy matrix representing the cholesky decomposition of the correlation matrix, and x and n are constants, and nrand is numpy.random
@jit
def generate_random_correlated_walks(decomposition, x, n):
    uncorrelated_walks = np.empty((2*x, n))
    for i in range(x):
        # Generate the random uncorrelated walks
        wv = nrand.normal(loc=0, scale=1, size=n)
        ws = nrand.normal(loc=0, scale=1, size=n)
        uncorrelated_walks[2*i] = wv
        uncorrelated_walks[(2*i) + 1] = ws
    # Create a matrix out of these walks
    uncorrelated_walks = np.matrix(uncorrelated_walks)
    m, n = uncorrelated_walks.shape
    correlated_walks = np.empty((m, n))
    # Go through column and correlate the walk values
    for i in range(n):
        correlated_timestep = np.transpose(uncorrelated_walks[:, i]) * decomposition
        correlated_walks[:, i] = correlated_timestep
    return correlated_walks
EDIT: I have made the suggested changes, and now my code is as below, but unfortunately still is a major bottleneck. Any ideas?
@jit
def generate_random_correlated_walks(self, decomposition, x, n):
    rows = 2*x
    uncorrelated_walks = np.random.normal(loc=0, scale=1, size=(rows, n))
    correlated_walks = np.empty((rows, n))
    for i in range(n):
        correlated_timestep = np.dot(np.transpose(uncorrelated_walks[:, i]), decomposition)
        correlated_walks[:, i] = correlated_timestep
    return correlated_walks

First thing you can improve is to remove your for loop. There is no advantage to calling np.random.normal a bunch of times with the same input parameters if you believe that it really generates random numbers.
Instead of using np.matrix, use np.array. This will make your life easier when you consider that the previous item can be used to shorten the entire first portion of your function into one step.
You can of course completely remove the final loop with a simple matrix multiplication: uncorrelated_walks.T @ decomposition will give you the transpose of your current correlated_walks. You can avoid one of the transposes by changing the order of the arguments.
You end up with something like:
def generate_random_correlated_walks(decomposition, x, n):
    uncorrelated_walks = nrand.normal(loc=0, scale=1, size=(2*x, n))
    correlated_walks = np.dot(decomposition.T, uncorrelated_walks)
    return correlated_walks
Not sure how much this will help you, but removing the Python-level loops should be some kind of boost since it will reduce the overhead of multiple numpy calls.
You could sacrifice legibility to turn the whole thing into a one-liner:
def generate_random_correlated_walks(decomposition, x, n):
    return np.dot(decomposition.T, nrand.normal(loc=0, scale=1, size=(2*x, n)))
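To check that the single matrix product really reproduces the per-column loop, the sketch below runs both on the same random input. The correlation matrix (1 on the diagonal, 0.5 elsewhere) and the sizes are assumptions made up for the example, and the decomposition is taken as the upper-triangular Cholesky factor to match the row-vector convention used in the question:

```python
import numpy as np

rng = np.random.default_rng(0)
x, n = 3, 1000  # illustrative sizes
rows = 2 * x

# Hypothetical correlation matrix: 1 on the diagonal, 0.5 elsewhere
corr = np.full((rows, rows), 0.5) + 0.5 * np.eye(rows)
# Upper-triangular factor U with U.T @ U == corr
decomposition = np.linalg.cholesky(corr).T

uncorrelated = rng.normal(size=(rows, n))

# Loop version from the question: one row-vector product per timestep
looped = np.empty((rows, n))
for i in range(n):
    looped[:, i] = uncorrelated[:, i] @ decomposition

# Single matrix product from the answer
vectorized = decomposition.T @ uncorrelated

print(np.allclose(looped, vectorized))
```

Because both versions consume the same pre-generated random numbers, they can be compared element by element rather than only statistically.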

DFT matrix in python

What's the easiest way to get the DFT matrix for 2-d DFT in python? I could not find such function in numpy.fft. Thanks!

The easiest and most likely the fastest method would be using fft from SciPy.
import numpy as np
from scipy.fft import fft
def dftmtx(N):
    return fft(np.eye(N))
If you know an even faster way (it might be more complicated), I'd appreciate your input.
Just to make it more relevant to the main question - you can also do it with numpy:
import numpy as np
dftmtx = np.fft.fft(np.eye(N))
When I benchmarked both of them, I had the impression that the scipy one was marginally faster, but I did not do it thoroughly and it was some time ago, so don't take my word for it.
Here's a pretty good source on FFT implementations in python:
http://nbviewer.ipython.org/url/jakevdp.github.io/downloads/notebooks/UnderstandingTheFFT.ipynb
It's written from a speed perspective, but in this case we can see that speed sometimes comes with simplicity too.

I don't think this is built in. However, direct calculation is straightforward:
import numpy as np
def DFT_matrix(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2 * np.pi * 1j / N)
    W = np.power(omega, i * j) / np.sqrt(N)
    return W
EDIT: For a 2D FFT matrix, you can use the following:
x = np.zeros((N, N)) # x is any input data with those dimensions
W = DFT_matrix(N)
dft_of_x = W.dot(x).dot(W)
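A quick way to sanity-check this construction is to compare it with np.fft.fft2. The sketch below assumes a small N and random input; since each W carries a factor 1/sqrt(N) and np.fft.fft2 is unscaled by default, the reference is divided by N:

```python
import numpy as np

def DFT_matrix(N):
    # Unitary DFT matrix: W[j, k] = omega**(j*k) / sqrt(N)
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    omega = np.exp(-2j * np.pi / N)
    return np.power(omega, i * j) / np.sqrt(N)

N = 8
rng = np.random.default_rng(0)
x = rng.normal(size=(N, N))

W = DFT_matrix(N)
# Left factor transforms the columns, right factor transforms the rows
dft_of_x = W @ x @ W

# Each W contributes 1/sqrt(N), so the product is fft2(x) / N
reference = np.fft.fft2(x) / N
print(np.allclose(dft_of_x, reference))
```

The right-hand factor can be W rather than its transpose because the DFT matrix is symmetric.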

As of scipy 0.14 there is a built-in scipy.linalg.dft:
Example with 16 point DFT matrix:
>>> import scipy.linalg
>>> import numpy as np
>>> m = scipy.linalg.dft(16)
Validate unitary property, note matrix is unscaled thus 16*np.eye(16):
>>> np.allclose(np.abs(np.dot( m.conj().T, m )), 16*np.eye(16))
True
For the 2D DFT matrix, it's just a matter of the tensor product, or more specifically the Kronecker product in this case, as we are dealing with matrix algebra.
>>> m2 = np.kron(m, m) # 256x256 matrix, flattened from (16,16,16,16) tensor
Now we can give it a tiled visualization, it's done by rearranging each row into a square block
>>> import matplotlib.pyplot as plt
>>> m2tiled = m2.reshape((16,)*4).transpose(0,2,1,3).reshape((256,256))
>>> plt.subplot(121)
>>> plt.imshow(np.real(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.subplot(122)
>>> plt.imshow(np.imag(m2tiled), cmap='gray', interpolation='nearest')
>>> plt.show()
Result (real and imag part separately):
As you can see they are 2D DFT basis functions
Link to documentation
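To see that the Kronecker product really acts as a 2D DFT, one can apply it to a flattened input and compare with np.fft.fft2. This is a small sketch with an assumed size of N = 4 and random data, relying on row-major flattening (NumPy's default for ravel):

```python
import numpy as np
from scipy.linalg import dft

N = 4
m = dft(N)            # unscaled N x N DFT matrix
m2 = np.kron(m, m)    # (N*N) x (N*N) matrix acting on flattened input

rng = np.random.default_rng(0)
X = rng.normal(size=(N, N))

# Applying m2 to the row-major flattened input gives the flattened 2D DFT
via_kron = (m2 @ X.ravel()).reshape(N, N)
reference = np.fft.fft2(X)
print(np.allclose(via_kron, reference))
```

No scaling factor is needed here because both scipy.linalg.dft (with its default scale) and np.fft.fft2 are unnormalized.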

@Alex is basically correct; I add here the version I used for the 2-d DFT:
def DFT_matrix_2d(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N))
    A=np.multiply.outer(i.flatten(), i.flatten())
    B=np.multiply.outer(j.flatten(), j.flatten())
    omega = np.exp(-2*np.pi*1J/N)
    W = np.power(omega, A+B)/N
    return W

Lambda functions work too:
dftmtx = lambda N: np.fft.fft(np.eye(N))
You can call it by using dftmtx(N). Example:
In [62]: dftmtx(2)
Out[62]:
array([[ 1.+0.j,  1.+0.j],
       [ 1.+0.j, -1.+0.j]])

If you wish to compute the 2D DFT as a single matrix operation, it is necessary to unravel the matrix X on which you wish to compute the DFT into a vector, as each output of the DFT has a sum over every index in the input, and a single square matrix multiplication does not have this ability. Taking care to be sure we are handling the indices correctly, I find the following works:
import numpy as np
M = 16
N = 16
X = np.random.random((M,N)) + 1j*np.random.random((M,N))
Y = np.fft.fft2(X)
W = np.zeros((M*N,M*N),dtype=complex)

# hold[j] maps a flat index j back to its (row, column) pair
hold = []
for m in range(M):
    for n in range(N):
        hold.append((m,n))

for j in range(M*N):
    for i in range(M*N):
        k,l = hold[j]
        m,n = hold[i]
        W[j,i] = np.exp(-2*np.pi*1j*(m*k/M + n*l/N))
np.allclose(np.dot(W,X.ravel()),Y.ravel())
True
If you wish to change the normalization to orthonormal, you can multiply by 1/sqrt(MN). If you wish to have the inverse transformation, just change the sign in the exponent (and, for the unnormalized W above, also scale by 1/(MN)).

This might be a little late, but there is a better alternative for creating the DFT matrix that performs faster, using NumPy's vander.
Also, this implementation does not use loops (explicitly):
def dft_matrix(signal):
    N = signal.shape[0] # num of samples
    w = np.exp((-2 * np.pi * 1j) / N) # remove the '-' for inverse fourier
    r = np.arange(N)
    w_matrix = np.vander(w ** r, increasing=True) # faster than meshgrid
    return w_matrix
If I'm not mistaken, the main improvement is that this method generates each power from the (already calculated) previous elements.
You can read about vander in the documentation:
numpy.vander
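As a quick check of the Vandermonde construction, the sketch below compares it with np.fft.fft. For brevity this variant takes the size N directly instead of the signal, which is a small deviation from the function above, and the test signal is random:

```python
import numpy as np

def dft_matrix(N):
    # Variant of the function above that takes the size directly (an assumption)
    w = np.exp(-2j * np.pi / N)
    # Row i holds the powers (w**i)**0, (w**i)**1, ..., i.e. w**(i*j)
    return np.vander(w ** np.arange(N), increasing=True)

N = 8
W = dft_matrix(N)
signal = np.random.default_rng(0).normal(size=N)

# Multiplying by W should reproduce the (unnormalized) FFT of the signal
spectrum = W @ signal
print(np.allclose(spectrum, np.fft.fft(signal)))
```

The matrix itself also matches np.fft.fft(np.eye(N)), the one-line construction from the earlier answers.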
