I have a matrix of matrices that I need to do computations on (i.e. an x by y by z by z matrix): I need to perform a computation on each of the z by z matrices, so x*y total operations. Currently I am looping over the first two dimensions of the matrix and performing the computation, but the computations are entirely independent. Is there a way I can compute these in parallel, without knowing in advance how many such matrices there will be (i.e. x and y unknown)?
Yes; see the multiprocessing module. Here's an example (tweaked from the one in the docs to suit your use case). (I assume z = 1 here for simplicity, so that f takes a scalar.)
from multiprocessing import Pool

# x = 2, y = 3, z = 1 - needn't be known in advance
matrix = [[1, 2, 3], [4, 5, 6]]

def f(x):
    # Your computation for each z-by-z submatrix goes here.
    return x**2

if __name__ == '__main__':  # guard required so worker processes can safely import this module
    with Pool() as p:
        flat_results = p.map(f, [x for row in matrix for x in row])

    # If you don't need to preserve the x*y shape of the input matrix,
    # you can use `flat_results` and skip the rest.
    x = len(matrix)
    y = len(matrix[0])
    results = [flat_results[i*y:(i+1)*y] for i in range(x)]
    # Now `results` contains `[[1, 4, 9], [16, 25, 36]]`
This will divide up the x * y computations across several processes (one per CPU core; this can be tweaked by providing an argument to Pool()).
Depending on what you're doing, consider trying vectorized operations (as opposed to for loops) with numpy first; you might find that it's fast enough to make multiprocessing unnecessary. (If matrix were a numpy array in this example, the code would just be results = matrix**2.)
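Many numpy routines also broadcast over leading dimensions, so an x-by-y-by-z-by-z array is treated as a stack of x*y z-by-z matrices in a single call. A minimal sketch (np.linalg.inv here is just a hypothetical stand-in for your actual per-matrix computation):

import numpy as np

A = np.random.rand(2, 3, 4, 4)  # x=2, y=3, z=4 - shape needn't be known in advance
# np.linalg.inv broadcasts over the leading (x, y) dimensions,
# inverting each 4x4 submatrix without an explicit Python loop.
inverses = np.linalg.inv(A)     # shape (2, 3, 4, 4)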
The way I approach parallel processing in Python is to define a function to do what I want, then apply it in parallel using multiprocessing.Pool.starmap. It's hard to suggest any code for your problem without knowing what you are computing and how.
import multiprocessing as mp

def my_function(matrices_to_compare, matrix_of_matrices):
    m1 = matrices_to_compare[0]
    m2 = matrices_to_compare[1]
    result = m1 - m2  # or whatever you want to do
    return result

matrices_x = <list of x matrices>
matrices_y = <list of y matrices>
matrices_to_compare = list(zip(matrices_x, matrices_y))

with mp.Pool(mp.cpu_count()) as pool:
    results = pool.starmap(my_function,
                           [(x, matrix_of_matrices) for x in matrices_to_compare],
                           chunksize=1)
An alternative to the multiprocessing pool approach proposed by other answers: if you have a GPU available, possibly the most straightforward approach would be to use a tensor algebra package that takes advantage of it, such as cupy or torch.
You could also get some more speedup by JIT-compiling your code (for the CPU) with cython or numba (and for GPU programming there is also numba.cuda, which however requires some background to use).
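For the numba route, a minimal sketch (assuming the per-block computation can be compiled in nopython mode, and that SciPy is installed so numba can use BLAS for the matrix product; A[i, j] @ A[i, j] is just a placeholder for the real computation):

import numpy as np
from numba import njit, prange

@njit(parallel=True)
def apply_blocks(A):  # A has shape (x, y, z, z)
    out = np.empty_like(A)
    for i in prange(A.shape[0]):  # iterations distributed across threads
        for j in range(A.shape[1]):
            out[i, j] = A[i, j] @ A[i, j]  # placeholder per-block computation
    return out

result = apply_blocks(np.random.rand(2, 3, 4, 4))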
I'm trying to implement the n-mode tensor-matrix product (as defined by Kolda and Bader: https://www.sandia.gov/~tgkolda/pubs/pubfiles/SAND2007-6702.pdf) efficiently in Python using Numpy. The operation effectively comes down to (for a matrix U, a tensor X and an axis/mode k):
Extract all vectors along axis k from X by collapsing all other axes.
Multiply these vectors on the left by U using standard matrix multiplication.
Insert the vectors back into the output tensor, keeping the same shape apart from X.shape[k], which is now equal to U.shape[0] (initially, X.shape[k] must be equal to U.shape[1] for the matrix multiplication to be defined).
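In index notation (Kolda and Bader's definition), the three steps amount to computing Y = X x_k U with

Y[i1, ..., i(k-1), j, i(k+1), ..., iN] = sum over ik of U[j, ik] * X[i1, ..., ik, ..., iN]

i.e. every mode-k fiber of X is multiplied on the left by U.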
I've been using an explicit implementation for a while which performs all these steps separately:
Transpose the tensor to bring axis k to the front (in my full code I added an exception for k == X.ndim - 1, in which case it's faster to leave the axis where it is and transpose all subsequent operations instead, at least in my application, but that's not relevant here).
Reshape the tensor to collapse all other axes.
Calculate the matrix multiplication.
Reshape the tensor to reconstruct all other axes.
Transpose the tensor back into the original order.
I would think this implementation creates a lot of unnecessary (big) arrays, so once I discovered np.einsum I thought this would speed things up considerably. However using the code below I got worse results:
import numpy as np
from time import time

def mode_k_product(U, X, mode):
    transposition_order = list(range(X.ndim))
    transposition_order[mode] = 0
    transposition_order[0] = mode
    Y = np.transpose(X, transposition_order)
    transposed_ranks = list(Y.shape)
    Y = np.reshape(Y, (Y.shape[0], -1))
    Y = U @ Y
    transposed_ranks[0] = Y.shape[0]
    Y = np.reshape(Y, transposed_ranks)
    Y = np.transpose(Y, transposition_order)
    return Y

def einsum_product(U, X, mode):
    axes1 = list(range(X.ndim))
    axes1[mode] = X.ndim + 1
    axes2 = list(range(X.ndim))
    axes2[mode] = X.ndim
    return np.einsum(U, [X.ndim, X.ndim + 1], X, axes1, axes2, optimize=True)

def test_correctness():
    A = np.random.rand(3, 4, 5)
    for i in range(3):
        B = np.random.rand(6, A.shape[i])
        X = mode_k_product(B, A, i)
        Y = einsum_product(B, A, i)
        print(np.allclose(X, Y))

def test_time(method, amount):
    U = np.random.rand(256, 512)
    X = np.random.rand(512, 512, 256)
    start = time()
    for i in range(amount):
        method(U, X, 1)
    return (time() - start)/amount

def test_times():
    print("Explicit:", test_time(mode_k_product, 10))
    print("Einsum:", test_time(einsum_product, 10))

test_correctness()
test_times()
Timings for me:
Explicit: 3.9450525522232054
Einsum: 15.873924326896667
Is this normal or am I doing something wrong? I know there are circumstances where storing intermediate results can decrease complexity (e.g. chained matrix multiplication), however in this case I can't think of any calculations that are being repeated. Is matrix multiplication so optimized that it removes the benefits of not transposing (which technically has a lower complexity)?
I'm more familiar with the subscripts style of using einsum, so worked out these equivalences:
In [194]: np.allclose(np.einsum('ij,jkl->ikl',B0,A), einsum_product(B0,A,0))
Out[194]: True
In [195]: np.allclose(np.einsum('ij,kjl->kil',B1,A), einsum_product(B1,A,1))
Out[195]: True
In [196]: np.allclose(np.einsum('ij,klj->kli',B2,A), einsum_product(B2,A,2))
Out[196]: True
With a mode parameter, your approach in einsum_product may be best. But the equivalences help me visualize the calculation better, and may help others.
Timings should basically be the same. There's an extra setup time in einsum_product that should disappear in larger dimensions.
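For an arbitrary mode and ndim, the subscripts string can also be built programmatically. A sketch (the function name is mine; it reproduces the pattern of the three equivalences above, up to letter renaming):

import numpy as np

def einsum_product_subscripts(U, X, mode):
    # Build e.g. 'ab,cbe->cae' for mode=1 on a 3-D tensor: axis `mode`
    # of X is contracted with the columns of U (subscript 'b'), and the
    # rows of U (subscript 'a') take its place in the output.
    letters = 'abcdefghijklmnopqrstuvwxyz'
    x_subs = letters[2:2 + X.ndim]
    out_subs = x_subs[:mode] + 'a' + x_subs[mode + 1:]
    x_subs = x_subs[:mode] + 'b' + x_subs[mode + 1:]
    return np.einsum('ab,' + x_subs + '->' + out_subs, U, X, optimize=True)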
After updating Numpy, Einsum is only slightly slower than the explicit method, with or without multi-threading (see comments to my question).
I am trying to operate on a large sparse matrix (currently 12000 x 12000).
What I want to do is to set blocks of it to zero but keep the largest value within this block.
I already have a running solution for dense matrices:
import numpy as np
from scipy.sparse import random

np.set_printoptions(precision=2)

#x = random(10,10,density=0.5)
x = np.random.random((10,10))
x = x.T * x
print(x)

def keep_only_max(a,b,c,d):
    sub = x[a:b,c:d]
    z = np.max(sub)
    sub[sub < z] = 0

sizes = np.asarray([0,1,5,4])
sizes_sum = np.cumsum(sizes)

for i in range(1,len(sizes)):
    current_i_min = sizes_sum[i-1]
    current_i_max = sizes_sum[i]
    for j in range(1,len(sizes)):
        if i >= j:
            continue
        current_j_min = sizes_sum[j-1]
        current_j_max = sizes_sum[j]
        keep_only_max(current_i_min, current_i_max, current_j_min, current_j_max)
        keep_only_max(current_j_min, current_j_max, current_i_min, current_i_max)

print(x)
This, however, doesn't work for sparse matrices (try uncommenting the line on top).
Any ideas how I could efficiently implement this without calling todense()?
def keep_only_max(a,b,c,d):
    sub = x[a:b,c:d]
    z = np.max(sub)
    sub[sub < z] = 0
For a sparse x, the sub slicing works for csr format. It won't be as fast as the equivalent dense slice, but it will create a copy of that part of x.
I'd have to check the sparse max functions. But I can imagine converting sub to coo format, using np.argmax on the .data attribute, and with the corresponding row and col values, constructing a new matrix of the same shape but with just one nonzero value.
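A minimal sketch of that idea (assuming the block has at least one stored nonzero):

from scipy.sparse import coo_matrix

def sparse_block_max(sub):
    # Keep only the largest stored value of a sparse block.
    c = sub.tocoo()
    k = c.data.argmax()  # position of the max among the stored values
    return coo_matrix(([c.data[k]], ([c.row[k]], [c.col[k]])),
                      shape=sub.shape)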
If your blocks covered x in a regular, nonoverlapping manner, I'd suggest constructing a new matrix with sparse.bmat. That basically collects the coo attributes of all the components, joins them into one set of arrays with the appropriate offsets, and makes a new coo matrix.
If the blocks are scattered or overlap, you might have to generate the blocks and insert them back into x one by one. csr format should work for that, but it will issue a sparse efficiency warning. lil is supposed to be faster for changing values. I think it will accept blocks.
I can imagine doing this with sparse matrices, but it will take time to setup a test case and debug the process.
Thanks to hpaulj I managed to implement a solution using scipy.sparse.bmat:
from scipy.sparse import coo_matrix
from scipy.sparse import csr_matrix
from scipy.sparse import rand
from scipy.sparse import bmat
import numpy as np

np.set_printoptions(precision=2)

# my matrices are symmetric, so generate a random symmetric matrix
x = rand(10,10,density=0.4)
x = x.T * x

def keep_only_max(a,b,c,d):
    sub = x[a:b,c:d]
    z = np.unravel_index(sub.argmax(), sub.shape)
    i1 = z[0]
    j1 = z[1]
    new = csr_matrix(([sub[i1,j1]], ([i1],[j1])), shape=(b-a,d-c))
    return new

def keep_all(a,b,c,d):
    return x[a:b,c:d].copy()

# we want to create a chessboard pattern where the first central block
# is 1x1, the second 5x5 and the last 4x4
sizes = np.asarray([0,1,5,4])
sizes_sum = np.cumsum(sizes)

# acquire a 2D array to store our chessboard blocks
r = range(len(sizes)-1)
blocks = [[0 for x in r] for y in r]

for i in range(1,len(sizes)):
    current_i_min = sizes_sum[i-1]
    current_i_max = sizes_sum[i]
    for j in range(i,len(sizes)):
        current_j_min = sizes_sum[j-1]
        current_j_max = sizes_sum[j]
        if i == j:
            # keep the blocks on the diagonal completely
            sub = keep_all(current_i_min, current_i_max, current_j_min, current_j_max)
            blocks[i-1][j-1] = sub
        else:
            # the blocks not on the diagonal only keep their maximum value;
            # we can leverage the matrix symmetry and only calculate one new matrix
            m1 = keep_only_max(current_i_min, current_i_max, current_j_min, current_j_max)
            m2 = m1.T
            blocks[i-1][j-1] = m1
            blocks[j-1][i-1] = m2

z = bmat(blocks)
print(z.todense())
I tried to translate a piece of code from Matlab to Python and I'm running into some errors:
Matlab:
function [beta] = linear_regression_train(traindata)
    y = traindata(:,1); % output
    ind2 = find(y == 2);
    ind3 = find(y == 3);
    y(ind2) = -1;
    y(ind3) = 1;
    X = traindata(:,2:257); % X matrix, with size 1389x256
    beta = inv(X'*X)*X'*y;
Python:
def linear_regression_train(traindata):
    y = traindata[:,0] # This is the output
    ind2 = (labels==2).nonzero()
    ind3 = (labels==3).nonzero()
    y[ind2] = -1
    y[ind3] = 1
    X = traindata[:, 1:256]
    X_T = numpy.transpose(X)
    beta = inv(X_T*X)*X_T*y
    return beta
I am receiving an error: operands could not be broadcast together with shapes (257,0,1389) (1389,0,257) on the line where beta is calculated.
Any help is appreciated!
Thanks!
The problem is that you are working with numpy arrays, not matrices as in MATLAB. Matrices, by default, do matrix mathematical operations. So X*Y does a matrix multiplication of X and Y. With arrays, however, the default is to use element-by-element operations. So X*Y multiplies each corresponding element of X and Y. This is the equivalent of MATLAB's .* operation.
But just like how MATLAB's matrices can do element-by-element operations, Numpy's arrays can do matrix multiplication. So what you need to do is use numpy's matrix multiplication instead of its element-by-element multiplication. For Python 3.5 or higher (which is the version you should be using for this sort of work), that is just the @ operator. So your line becomes:
beta = inv(X_T @ X) @ X_T @ y
Or, better yet, you can use the simpler .T transpose, which is the same as np.transpose but much more concise (you can get rid of the np.transpose line entirely):
beta = inv(X.T @ X) @ X.T @ y
For Python 3.4 or earlier, you will need to use np.dot since those versions of Python don't have the @ matrix multiplication operator:
beta = np.dot(np.dot(inv(np.dot(X.T, X)), X.T), y)
Numpy has a matrix object that uses matrix operations by default like the MATLAB matrix. Do not use it! It is slow, poorly-supported, and almost never what you really want. The Python community has standardized around arrays, so use those.
There may also be some issues with the dimensions of traindata: for y and X to be 2D as written, traindata would need to be 3D (traindata.ndim == 3).
This could be an issue if traindata is 2D and you want y to be a MATLAB-style "vector" (what MATLAB calls "vectors" aren't really vectors). In numpy, using a single index like traindata[:, 0] reduces the number of dimensions, while taking a slice like traindata[:, :1] doesn't. So to keep y 2D when traindata is 2D, just take a length-1 slice, traindata[:, :1]. This gives exactly the same values, but keeps the same number of dimensions as traindata.
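For example, assuming a 2D traindata:

import numpy as np

traindata = np.ones((1389, 257))
print(traindata[:, 0].shape)   # (1389,)   - single index, dimension dropped
print(traindata[:, :1].shape)  # (1389, 1) - length-1 slice, dimension kept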
Notes: Your code can be significantly simplified using logical indexing:
def linear_regression_train(traindata):
    y = traindata[:, 0] # This is the output
    y[labels == 2] = -1
    y[labels == 3] = 1
    X = traindata[:, 1:257]
    return inv(X.T @ X) @ X.T @ y
Also, your slice is wrong when defining X. Python slicing excludes the last value, so to get a 256-column slice you need to use 1:257, as I did above.
Finally, please keep in mind that modifications to arrays inside functions carry over outside the functions, and indexing does not make a copy. So your changes to y (setting some values to 1 and others to -1), will affect traindata outside of your function. If you want to avoid that, you need to make a copy before you make your changes:
y = traindata[:, 0].copy()
Python 2.7.3
numpy 1.8.0
Hi all,
I have been using numpy for a few months and I need help with some basic stuff. The code below should work, and the bit I need help with is highlighted (# <<<<<<<):
import numpy as np

rng = np.random.RandomState(12345)
samples = np.array(np.arange(400).reshape(50, 8))
nSamples = samples.shape[0]
FOLDS = 15
foldSize = nSamples / FOLDS
indices = np.arange(nSamples)
rng.shuffle(indices)

slices = [slice(i * foldSize,
                (i + 1) * foldSize, 1) for i in xrange(FOLDS + 1)]

for i in xrange(len(slices)):
    y = samples[indices[slices[i]]]
    x = np.array([x for x in samples if x not in samples[slices[i]]]) # <<<<<<<
    # do some processing with x and y
Basically, this randomly slices a 2D array row-wise, processes with the full array except the sliced bit, tests on that slice, then repeats with another slice until everything is done (this is called a cross-validation experiment).
My question is: is there a better way to select all rows of an ndarray except those in a slice? Am I missing something? What is the advised way to write [x for x in samples if x not in samples[indices][0:3]]?
Thanks in advance.
PS: masked arrays do not solve my problem.
PS1: I know this is already implemented elsewhere; I just need to learn.
You can create a boolean array for the rows to select as follows:
indices_to_ignore = [1, 2, 3]
mask = np.ones(samples.shape[:1], dtype=np.bool)
mask[indices_to_ignore] = 0
samples[mask].shape
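Applied to your cross-validation loop, the mask replaces the list comprehension. A sketch reusing your nSamples, indices and slices (indexing through indices consistently for both folds):

for s in slices:
    mask = np.ones(nSamples, dtype=np.bool)
    mask[s] = 0                   # hide the current test slice
    y = samples[indices[s]]       # the test fold
    x = samples[indices[mask]]    # every other row, in shuffled order
    # do some processing with x and y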