I have the code below. I have already optimised the algorithm to make it as fast as possible, but it is still too slow. So I was thinking about using multiprocessing (I have no experience with this kind of thing). I tried a few things with Pool and threading, but either it was slower than before or it didn't work. How should I do this so that it works and is faster? And are there options other than multithreading to make this kind of code faster?
import itertools

def calc(indices, data):
    matrix = [[0] * len(indices) for i in range(len(indices))]
    for i_a, i_b in itertools.combinations(indices, 2):
        a_res, b_res = algorithm(data[i_a], data[i_b])
        matrix[i_b][i_a] = a_res
        matrix[i_a][i_b] = b_res
    return matrix

def algorithm(a, b):
    # Very slow and complex
    ...

Building upon Simon's answer, here is an example applying a multiprocessing pool to a version of your problem. Your mileage will vary depending on how many cores you have on your machine but I hope that this will be a helpful demonstration of how you could structure a solution to your problem:
import itertools
import numpy as np
import multiprocessing as mp
import time

def calc_mp(indices, data):
    # we are going to populate the matrix; organize all the inputs; then map them
    matrix = [[0] * len(indices) for i in range(len(indices))]
    pairs = list(itertools.combinations(indices, 2))
    args = [(data[i_a], data[i_b]) for i_a, i_b in pairs]
    # construct pool; the context manager shuts it down once the work is done
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.starmap(algorithm, args)
    # unpack the results into the matrix
    for (i_a, i_b), (a_res, b_res) in zip(pairs, results):
        # set it in the matrix
        matrix[i_b][i_a] = a_res
        matrix[i_a][i_b] = b_res
    return matrix

def calc_single(indices, data):
    # do the simple single process version
    matrix = [[0] * len(indices) for i in range(len(indices))]
    for i_a, i_b in itertools.combinations(indices, 2):
        a_res, b_res = algorithm(data[i_a], data[i_b])
        matrix[i_b][i_a] = a_res
        matrix[i_a][i_b] = b_res
    return matrix

def algorithm(a, b):
    # Very slow and complex
    return a + b, a - b

if __name__ == "__main__":
    # generate test data
    indices = range(5)
    data = range(len(indices))
    # test single
    time_start = time.time()
    print(calc_single(indices, data))
    print("Took {}".format(time.time() - time_start))
    # mp
    time_start = time.time()
    print(calc_mp(indices, data))
    print("Took {}".format(time.time() - time_start))
The results, with 8 cores, are
[[0, -1, -2, -3, -4], [1, 0, -1, -2, -3], [2, 3, 0, -1, -2], [3, 4, 5, 0, -1], [4, 5, 6, 7, 0]]
Took 20.02155065536499
[[0, -1, -2, -3, -4], [1, 0, -1, -2, -3], [2, 3, 0, -1, -2], [3, 4, 5, 0, -1], [4, 5, 6, 7, 0]]
Took 4.073369264602661

Your best bet is multiprocessing. You will need to partition your data into chunks and pass each chunk to a process. Threading won't help here because, in CPython, the global interpreter lock (GIL) lets only one thread execute Python bytecode at a time. Threads are still useful for some use cases, such as when you have several activities going on, some of which might block (for example on I/O), but not for CPU-bound parallel workloads.
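To make the chunking step concrete, here is a minimal sketch. The names chunked, process_chunk, calc_chunked and the mapper parameter are made up for illustration, and algorithm is only a cheap stand-in for the slow function from the question. It assumes, like the question's code, that the indices are 0..n-1.

```python
import itertools

def algorithm(a, b):
    # Hypothetical stand-in for the slow pairwise computation.
    return a + b, a - b

def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_chunk(chunk):
    """Run the algorithm on every (i_a, i_b, a, b) task in one chunk."""
    return [(i_a, i_b, algorithm(a, b)) for i_a, i_b, a, b in chunk]

def calc_chunked(indices, data, chunk_size=4, mapper=map):
    """Fill the result matrix chunk by chunk; `mapper` decides where chunks run."""
    tasks = [(i_a, i_b, data[i_a], data[i_b])
             for i_a, i_b in itertools.combinations(indices, 2)]
    matrix = [[0] * len(indices) for _ in indices]
    for chunk_result in mapper(process_chunk, chunked(tasks, chunk_size)):
        for i_a, i_b, (a_res, b_res) in chunk_result:
            matrix[i_b][i_a] = a_res
            matrix[i_a][i_b] = b_res
    return matrix

print(calc_chunked(range(5), range(5)))
```

With the default mapper=map everything runs serially. Passing pool.map from a multiprocessing.Pool (created under an if __name__ == "__main__": guard) should send each chunk to a worker process instead; whether that pays off depends on how expensive one call of the real algorithm is compared with the cost of pickling its inputs.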


How to calculate the correlation coefficient on a rolling window of a vector using numpy?

I'm able to calculate a rolling correlation coefficient for a 1D-array (data against [0, 1, 2, 3, 4]) using a loop.
I'm looking for a smarter solution using numpy (not pandas).
Here is my current code:
import numpy as np
data = np.array([10,5,8,9,15,22,26,11,15,16,18,7,4,8,-2,-3,-4,-6,-2,0,10,0,5,8])
x = np.zeros_like(data).astype('float32')
length = 5
for i in range(length, data.shape[0]):
    x[i] = np.corrcoef(data[i - length:i], np.arange(length))[0, 1]
x gives :
[ 0. 0. 0. 0. 0. 0.607 0.959 0.98 0.328 -0.287
-0.61 -0.314 -0.18 -0.8 -0.782 -0.847 -0.811 -0.825 -0.869 -0.283
0.566 0.863 0.643 0.454]
Any solution without the loop please?
Use a numpy.lib.stride_tricks.sliding_window_view (available in numpy v1.20.0+)
swindow = np.lib.stride_tricks.sliding_window_view(data, (length,))
which gives a view on the data array that looks like so:
array([[10, 5, 8, 9, 15],
[ 5, 8, 9, 15, 22],
[ 8, 9, 15, 22, 26],
[ 9, 15, 22, 26, 11],
[15, 22, 26, 11, 15],
[22, 26, 11, 15, 16],
[26, 11, 15, 16, 18],
[11, 15, 16, 18, 7],
[15, 16, 18, 7, 4],
[16, 18, 7, 4, 8],
[18, 7, 4, 8, -2],
[ 7, 4, 8, -2, -3],
[ 4, 8, -2, -3, -4],
[ 8, -2, -3, -4, -6],
[-2, -3, -4, -6, -2],
[-3, -4, -6, -2, 0],
[-4, -6, -2, 0, 10],
[-6, -2, 0, 10, 0],
[-2, 0, 10, 0, 5],
[ 0, 10, 0, 5, 8]])
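As a small aside on what "view" means here, the following sketch (using a short made-up array rather than your data) checks that the windows share memory with the original array and are read-only by default:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(8)
windows = sliding_window_view(data, 5)

print(windows.shape)                    # (4, 5): 8 - 5 + 1 windows of length 5
print(np.shares_memory(windows, data))  # True: no copy of the data is made
print(windows.flags.writeable)          # False: the view is read-only by default
```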
Now, we want to apply the correlation coefficient calculation to each row of this array. Unfortunately, np.corrcoef doesn't take an axis argument; it applies the calculation to the entire matrix and doesn't provide a way to correlate each row/column with a single vector.
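However, the calculation for the correlation coefficient of two vectors x and y is quite simple:
$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}$$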
Applying that here:
def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d
Now, call this function with our array and arange:
cc = vec_corrcoef(swindow, np.arange(length))
which gives the desired result:
array([ 0.60697698, 0.95894955, 0.98 , 0.3279521 , -0.28709766,
-0.61035663, -0.31390158, -0.17995394, -0.80041656, -0.78192905,
-0.84702587, -0.81091772, -0.82464375, -0.86892667, -0.28347335,
0.56568542, 0.86304424, 0.64326752, 0.45374261, 0.38135638])
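If you want to convince yourself that the row-wise function agrees with np.corrcoef, a quick check along these lines should do. It is only a sanity-check sketch: it repeats the function so that it runs on its own, and the names ramp and reference are just for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7,
                 4, 8, -2, -3, -4, -6, -2, 0, 10, 0, 5, 8])
length = 5
ramp = np.arange(length)
swindow = sliding_window_view(data, length)

cc = vec_corrcoef(swindow, ramp)
# Reference values: one np.corrcoef call per window, as in the original loop.
reference = np.array([np.corrcoef(row, ramp)[0, 1] for row in swindow])
print(np.allclose(cc, reference))
```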
To get your x, just set the appropriate indices of a zeros array of the correct size.
Note: I think your x should contain nonzero values starting at index 4 (because that's where the first sliding window is full) instead of starting at index 5.
x = np.zeros(data.shape)
x[-len(cc):] = cc
If you are sure that your values should start at the index 5, then you can do:
x = np.zeros(data.shape)
x[length:] = cc[:-1] # Ignore the last value in cc
Comparing the runtimes of your original approach with those suggested in the answers here:
f_OP_loopy is your approach, which implements a sliding window using a loop
f_PH_numpy is my approach, which uses the sliding_window_view and the vectorized function for row-wise calculation of the vector correlation coefficient
f_RA_numpy is Rontogiannis's approach, which tiles the arange, calculates the correlation coefficient for the entire matrices, and only selects the first len(data) - length rows of the last column
f_RA_recur is Rontogiannis's recursive approach, but I didn't time this because it misses out on the last correlation coefficient.
Unsurprisingly, the numpy-only solution is faster than the loopy approach.
My numpy solution, which computes the row-wise correlation coefficient, is faster than that shown by Rontogiannis below, because the extra work involved in tiling the vector input and calculating the correlation of the entire matrix, only to discard the unwanted elements, is avoided by my approach.
As the input data size increases, this "extra work" in Rontogiannis's approach increases so much that its runtime is worse even than the loopy approach! I am unsure if this extra time is in the np.corrcoef calculation or in the np.tile operation.
Note: This plot was obtained on my 2.2GHz i7 Macbook Air with 8GB RAM, Python 3.10.7 and numpy 1.23.3. Similar results were obtained on Google Colab
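To give a feel for the "extra work" mentioned above, here is a small sketch that only looks at array shapes. It does not settle whether np.tile or np.corrcoef dominates the runtime; it just shows how much of the computed matrix is thrown away for your 24-element example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.array([10, 5, 8, 9, 15, 22, 26, 11, 15, 16, 18, 7,
                 4, 8, -2, -3, -4, -6, -2, 0, 10, 0, 5, 8])
length = 5
windows = sliding_window_view(data, length)            # shape (20, 5)
ramps = np.tile(np.arange(length), (len(windows), 1))  # shape (20, 5)

# np.corrcoef treats every row of both inputs as a variable, so it
# correlates all 40 rows with each other.
full = np.corrcoef(windows, ramps)
wanted = full[:len(windows), -1]

print(full.shape)    # (40, 40)
print(wanted.shape)  # (20,)
```

For k windows the intermediate matrix has (2k)**2 entries while only k of them are kept, so the wasted share grows with the input size.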
If you're interested in the timing code, here it is:
import timeit
import numpy as np
from matplotlib import pyplot as plt

def time_funcs(funcs, sizes, arg_gen, N=20):
    times = np.zeros((len(sizes), len(funcs)))
    gdict = globals().copy()
    for i, s in enumerate(sizes):
        args = arg_gen(s)
        for j, f in enumerate(funcs):
            # expose the current function and arguments to the timed statement
            gdict.update(f=f, args=args)
            try:
                times[i, j] = timeit.timeit("f(*args)", globals=gdict, number=N) / N
                print(f"{i}/{len(sizes)}, {j}/{len(funcs)}, {times[i, j]}")
            except ValueError:
                print(f"ERROR in {f}, with args=", *args)
    return times

def plot_times(times, funcs):
    # relies on the module-level `sizes` defined in the main block below
    fig, ax = plt.subplots()
    for j, f in enumerate(funcs):
        ax.plot(sizes, times[:, j], label=f.__name__)
    ax.set_xlabel("Array size")
    ax.set_ylabel("Time per function call (s)")
    ax.legend()
    return fig, ax

def arg_gen(n):
    return [np.random.randint(-100, 100, (n,)), 5]

def f_OP_loopy(data, length):
    x = np.zeros_like(data).astype('float32')
    for i in range(length-1, data.shape[0]):
        x[i] = np.corrcoef(data[i - length + 1:i+1], np.arange(length))[0, 1]
    return x

def f_PH_numpy(data, length):
    swindow = np.lib.stride_tricks.sliding_window_view(data, (length,))
    cc = vec_corrcoef(swindow, np.arange(length))
    x = np.zeros(data.shape)
    x[-len(cc):] = cc
    return x

def f_RA_recur(data, length):
    # NOTE: the middle of this call is missing in the source; padding with
    # `length` leading zeros is an assumption chosen to match the question's x
    return np.concatenate((
        np.zeros(length),
        rolling_correlation_recurse(data, 0, length)
    ))

def f_RA_numpy(data, length):
    n = len(data)
    cc = np.corrcoef(np.lib.stride_tricks.sliding_window_view(data, length), np.tile(np.arange(length), (n-length+1, 1)))[:n-length+1, -1]
    x = np.zeros(data.shape)
    x[-len(cc):] = cc
    return x

def rolling_correlation_recurse(data, i, length):
    assert i+length < data.size
    left = np.array([np.corrcoef(data[i:i+length], np.arange(length))[0, 1]])
    if i+length+1 == data.size:
        return left
    right = rolling_correlation_recurse(data, i+1, length)
    return np.concatenate((left, right))

def vec_corrcoef(X, y, axis=1):
    Xm = np.mean(X, axis=axis, keepdims=True)
    ym = np.mean(y)
    n = np.sum((X - Xm) * (y - ym), axis=axis)
    d = np.sqrt(np.sum((X - Xm)**2, axis=axis) * np.sum((y - ym)**2))
    return n / d

if __name__ == "__main__":
    #%% Set up sim
    sizes = [5, 10, 50, 100, 500, 1000, 5000, 10_000] #, 50_000, 100_000]
    funcs = [f_OP_loopy, #f_RA_recur,
             f_PH_numpy, f_RA_numpy]
    #%% Run timing
    time_fcalls = np.zeros((len(sizes), len(funcs))) * np.nan
    time_fcalls = time_funcs(funcs, sizes, arg_gen)
    fig, ax = plot_times(time_fcalls, funcs)
    ax.set_xlabel("Input size")
    input("Enter x to exit")
Ask and you shall receive. Here is a solution that uses recursion:
import numpy as np

data = np.array([10,5,8,9,15,22,26,11,15,16,18,7,4,8,-2,-3,-4,-6,-2,0,10,0,5,8])
length = 5

def rolling_correlation_recurse(data, i, length):
    assert i+length < data.size
    left = np.array([np.corrcoef(data[i:i+length], np.arange(length))[0, 1]])
    if i+length+1 == data.size:
        return left
    right = rolling_correlation_recurse(data, i+1, length)
    return np.concatenate((left, right))

def rolling_correlation(data, length):
    # NOTE: the middle of this call is missing in the source; padding with
    # `length` leading zeros is an assumption chosen to match the question's x
    return np.concatenate((
        np.zeros(length),
        rolling_correlation_recurse(data, 0, length)
    ))

print(rolling_correlation(data, length))
Edit: here is a numpy solution too:
n = len(data)
print(np.corrcoef(np.lib.stride_tricks.sliding_window_view(data, length), np.tile(np.arange(length), (n-length+1, 1)))[:n-length+1, -1])

Python Loop to generate random binary matrix of -1 or 1

Instead of the typical binary 0 or 1, or two consecutive numbers, I need to create a loop which will generate three random -1s and 1s, 100 times. My code so far looks like:
new = []
i = 1
while i <= 100:
    random = np.random.randint(low=-1, high=1, size=(1,3))
    i += 1
Though this just returns None, and even if it were working, it would return 0s as well, which is not wanted.
No need to bother importing numpy - it may be easier to just use random.choices() (from the standard library) to generate three random choices from [-1, 1], and do it 100 times.
import random
new = [random.choices([-1, 1], k=3) for _ in range(100)]
# [[1, 1, -1],
# [1, 1, 1],
# [1, 1, -1],
# [-1, -1, -1],
# ...
# [1, -1, 1]]
If you're doing this inside a function, with the intent for that function to produce new for outside use, then don't forget to do return new at the end of the function.
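If you do want to stay with numpy, a sketch along the following lines should also work. It relies on the fact that the high bound of np.random.randint is exclusive, which is why low=-1, high=1 can only ever produce -1 and 0. The names rng, signs and old_draws are just for illustration.

```python
import numpy as np

# The upper bound is exclusive, so this draws from {-1, 0} only and never yields 1.
old_draws = np.random.randint(low=-1, high=1, size=1000)
print(old_draws.max() <= 0)  # True

# Choosing directly from the two allowed values avoids the problem.
rng = np.random.default_rng(seed=0)
signs = rng.choice([-1, 1], size=(100, 3))
print(signs.shape)  # (100, 3)
```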

Find all close numerical matches in two 2D arrays

Update: I made the solution into a library called close-numerical-matches.
I am looking for a way to find all close matches (within some tolerance) between two 2D arrays and get an array of the indices of the found matches. Multiple answers on SO show how to solve this problem for exact matches (typically with a dictionary), but that is not what I am looking for. Let me give an example:
>>> arr1 = [
...     [19.21, 19.19],
...     [13.18, 11.55],
...     [21.45, 5.83]
... ]
>>> arr2 = [
...     [13.11, 11.54],
...     [19.20, 19.19],
...     [51.21, 21.55],
...     [19.22, 19.18],
...     [11.21, 11.55]
... ]
>>> find_close_match_indices(arr1, arr2, tol=0.1)
[[0, 1], [0, 3], [1, 0]]
Above, [[0, 1], [0, 3], [1, 0]] is returned because element 0 in arr1, [19.21, 19.19], is within tolerance of elements 1 and 3 in arr2, and element 1 in arr1, [13.18, 11.55], is within tolerance of element 0 in arr2. Order is not important to me, i.e. [[0, 3], [1, 0], [0, 1]] would be just as acceptable.
The shape of arr1 is (n, 2) and arr2 is (m, 2). You can expect that n and m will be huge. Now, I can easily implement this using a nested for loop but I am sure there must be some smarter way than comparing every element against all other elements.
I thought about using k-means clustering to divide the problem into k buckets and thus make the nested for-loop approach more tractable, but I think there may be a small risk two close elements are just at the "border" of each of their clusters and therefore wouldn't get compared.
Any external dependencies such as Numpy, Scipy, etc. are fine, and it is also fine to use O(n + m) extra space.
You can't do it with NO loops, but you can do it with ONE loop by taking advantage of the boolean indexing:
import numpy as np

xarr1 = np.array([
    [19.21, 19.19],
    [13.18, 11.55],
    [21.45, 5.83]
])
xarr2 = np.array([
    [13.11, 11.54],
    [19.20, 19.19],
    [51.21, 21.55],
    [19.22, 19.18],
    [11.21, 11.55]
])

def find_close_match_indices(arr1, arr2, tol=0.1):
    results = []
    for i, r1 in enumerate(arr1[:, 0]):
        # boolean mask over arr2: True where the first column is within tol of r1
        x1 = np.abs(arr2[:, 0] - r1) < tol
        results.extend([i, int(k)] for k in np.where(x1)[0])
    return results

print(find_close_match_indices(xarr1, xarr2))
# [[0, 1], [0, 3], [1, 0]]
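Note that the function above only compares the first column. A possible variant that requires both columns to be within tolerance could look like the sketch below; find_close_match_indices_2d is a made-up name, and "within tol in every column" is just one reasonable reading of "close".

```python
import numpy as np

def find_close_match_indices_2d(arr1, arr2, tol=0.1):
    arr1 = np.asarray(arr1, dtype=float)
    arr2 = np.asarray(arr2, dtype=float)
    results = []
    for i, row in enumerate(arr1):
        # True for the rows of arr2 whose every column is within tol of `row`
        close = np.all(np.abs(arr2 - row) < tol, axis=1)
        results.extend([i, int(k)] for k in np.flatnonzero(close))
    return results

arr1 = [[19.21, 19.19], [13.18, 11.55], [21.45, 5.83]]
arr2 = [[13.11, 11.54], [19.20, 19.19], [51.21, 21.55],
        [19.22, 19.18], [11.21, 11.55]]
print(find_close_match_indices_2d(arr1, arr2))  # [[0, 1], [0, 3], [1, 0]]
```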
Perhaps you might find the following useful. It might be faster than @Tim Roberts's solution because there are no explicit for loops, but it will use more memory.
import numpy as np

xarr1 = np.array([
    [19.21, 19.19],
    [13.18, 11.55],
    [21.45, 5.83]
])
xarr2 = np.array([
    [13.11, 11.54],
    [19.20, 19.19],
    [51.21, 21.55],
    [19.22, 19.18],
    [11.21, 11.55]
])
# broadcasting: pairwise differences, shape (len(xarr2), len(xarr1), 2)
cc = xarr2[:, None, :] - xarr1[None, :, :]
cc = np.apply_along_axis(np.linalg.norm, -1, cc)
# or you can use other metrics of closeness e.g. as below
#cc = np.apply_along_axis(np.abs,-1,cc)
#cc = np.apply_along_axis(np.max,-1,cc)
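One way the broadcasting idea could be carried through to the index pairs the question asks for is sketched below. It assumes Euclidean distance as the closeness metric, and the names diff, dist and matches are only illustrative.

```python
import numpy as np

xarr1 = np.array([[19.21, 19.19], [13.18, 11.55], [21.45, 5.83]])
xarr2 = np.array([[13.11, 11.54], [19.20, 19.19], [51.21, 21.55],
                  [19.22, 19.18], [11.21, 11.55]])
tol = 0.1

# Pairwise differences via broadcasting: shape (3, 5, 2).
diff = xarr1[:, None, :] - xarr2[None, :, :]
# Euclidean distance between every pair of rows: shape (3, 5).
dist = np.linalg.norm(diff, axis=-1)
# Index pairs [i, j] with row i of xarr1 close to row j of xarr2.
matches = np.argwhere(dist < tol)
print(matches.tolist())  # [[0, 1], [0, 3], [1, 0]]
```

The intermediate arrays have n * m entries, so this is only practical while both inputs are reasonably small.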
I got an idea for how to use buckets to solve this problem. The idea is that a key is formed based on the values of the elements and the tolerance level. To make sure that potential matches at the "edge" of a bucket are compared against elements at the "edges" of adjacent buckets, all neighbouring buckets are compared as well. Finally, I slightly modified @Tim Roberts' approach for performing the actual matching so that it matches on both columns.
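As a rough illustration of the bucket idea (a simplified sketch, not the actual implementation in the library), one can key every row by its grid cell of side length tol and compare each row only against the rows in the same and neighbouring cells. The function name bucket_matches is made up, and "close" is taken to mean within tol in every column.

```python
from collections import defaultdict
from itertools import product

import numpy as np

def bucket_matches(arr0, arr1, tol):
    arr0 = np.asarray(arr0, dtype=float)
    arr1 = np.asarray(arr1, dtype=float)
    # Bucket the rows of arr1 by the integer grid cell they fall into.
    buckets = defaultdict(list)
    for j, cell in enumerate(np.floor(arr1 / tol).astype(int).tolist()):
        buckets[tuple(cell)].append(j)
    # Rows closer than tol in every column lie in the same or an adjacent cell.
    offsets = list(product((-1, 0, 1), repeat=arr0.shape[1]))
    matches = []
    for i, cell in enumerate(np.floor(arr0 / tol).astype(int).tolist()):
        for offset in offsets:
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            for j in buckets.get(neighbour, ()):
                if np.all(np.abs(arr0[i] - arr1[j]) < tol):
                    matches.append([i, j])
    return matches

arr0 = [[19.21, 19.19], [13.18, 11.55], [21.45, 5.83]]
arr1 = [[13.11, 11.54], [19.20, 19.19], [51.21, 21.55],
        [19.22, 19.18], [11.21, 11.55]]
print(sorted(bucket_matches(arr0, arr1, tol=0.1)))  # [[0, 1], [0, 3], [1, 0]]
```

Each row is then compared only with the rows in at most nine cells instead of with every row of the other array.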
I made this into a library called close-numerical-matches. Sample usage:
>>> import numpy as np
>>> from close_numerical_matches import find_matches
>>> arr0 = np.array([[25, 24], [50, 50], [25, 26]])
>>> arr1 = np.array([[25, 23], [25, 25], [50.6, 50.6], [60, 60]])
>>> find_matches(arr0, arr1, tol=1.0001)
array([[0, 0], [0, 1], [1, 2], [2, 1]])
>>> find_matches(arr0, arr1, tol=0.9999)
array([[1, 2]])
>>> find_matches(arr0, arr1, tol=0.60001)
array([], dtype=int64)
>>> find_matches(arr0, arr1, tol=0.60001, dist='max')
array([[1, 2]])
>>> manhatten_dist = lambda arr: np.sum(np.abs(arr), axis=1)
>>> matches = find_matches(arr0, arr1, tol=0.11, dist=manhatten_dist)
>>> matches
array([[0, 1], [0, 1], [2, 1]])
>>> indices0, indices1 = matches.T
>>> arr0[indices0]
array([[25, 24], [25, 24], [25, 26]])
Some profiling:
from timeit import default_timer as timer
import numpy as np
from close_numerical_matches import naive_find_matches, find_matches
arr0 = np.random.rand(320_000, 2)
arr1 = np.random.rand(44_000, 2)
start = timer()
naive_find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start) # 255.335 s
start = timer()
find_matches(arr0, arr1, tol=0.001)
end = timer()
print(end - start) # 5.821 s

Use information of two arrays to create a third one

I have two numpy arrays and want to create a third one from the information in these two.
Here is a simple example:
have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])
solution = np.array([[1, 1, 3, 4], [5, 5, 5, 8]])
What I want is to use the "use" array, which tells me how many times the first element of each row of "have" should appear at the start of that row.
So the 2 in "use" means that I want the new array "solution" to contain "1" twice. Similarly, the 3 in "use" means that the new array should contain "5" three times. The rest of "have" should stay the same.
It is important to use the "use"-array for doing this (or a numpy-array in general).
Do you have some ideas?
If the data structures are small and performance is not an issue, you can do it as simply as this:
np.array([ [a[0]]*b[0]+list(a[b[0]:]) for a,b in zip(have,use)])
Simply iterate through the have and replace the values based on the use.
for i in range(use.shape[0]):
    have[i, :use[i, 0]] = np.repeat(have[i, 0], use[i, 0])
Using only numpy operations:
First create a boolean mask of the same shape as have. mask[i, j] is True if j < use[i, 0], otherwise it's False. So mask is True for the indices which are to be replaced by the first column value. Now use np.where to do the replacement.
n, m = have.shape
mask = np.repeat(np.arange(m)[None, :], n, axis = 0) < use
have = np.where(mask, have[:, 0:1], have)
>>> have
array([[1, 1, 3, 4],
[5, 5, 5, 8]])
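As a side note, the np.repeat in the mask construction can probably be left to broadcasting, since comparing a length-m vector with an (n, 1) array already yields an (n, m) result. A minimal sketch of that variant:

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])

# (4,) compared with (2, 1) broadcasts to a (2, 4) boolean mask.
mask = np.arange(have.shape[1]) < use
solution = np.where(mask, have[:, :1], have)
print(solution.tolist())  # [[1, 1, 3, 4], [5, 5, 5, 8]]
```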
If performance matters, you can use np.apply_along_axis().
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])

def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep+1:]])
    return res

solution = np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))
As @hpaulj said, the method using apply_along_axis above is actually not as efficient as I expected; I had misunderstood it. Reference: numpy np.apply_along_axis function speed up?.
However, I ran some tests on the current methods:
import numpy as np
from timeit import timeit

def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep + 1:]])
    return res

def test(row, col, run):
    have = np.random.randint(0, 100, size=(row, col))
    use = np.random.randint(0, col, size=(row, 1))
    # the timed statements need np and rep1st as well as the local arrays
    d = {**globals(), **locals()}
    # method by me
    t1 = timeit("np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))", number=run, globals=d)
    # method by @quantummind
    t2 = timeit("np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])", number=run, globals=d)
    # method by @Amit Vikram Singh
    t3 = timeit(
        "np.where(np.repeat(np.arange(have.shape[1])[None, :], have.shape[0], axis=0) < use, have[:, 0:1], have)",
        number=run, globals=d
    )
    print(f"{t1:8.6f}, {t2:8.6f}, {t3:8.6f}")

test(1000, 10, 10)
test(100, 100, 10)
test(10, 1000, 10)
test(1000000, 10, 1)
test(100000, 100, 1)
test(10000, 1000, 1)
test(1000, 10000, 1)
test(100, 100000, 1)
test(10, 1000000, 1)
0.062488, 0.028484, 0.000408
0.010787, 0.013811, 0.000270
0.001057, 0.009146, 0.000216
6.146863, 3.210017, 0.044232
0.585289, 1.186013, 0.034110
0.091086, 0.961570, 0.026294
0.039448, 0.917052, 0.022553
0.028719, 0.919377, 0.022751
0.035121, 1.027036, 0.025216
It shows that the second method proposed by @Amit Vikram Singh (the np.where mask) is the fastest in every test, even when the arrays are huge.

Unravel Index numpy - own implementation

I am trying to implement np.unravel_index and np.ravel_multi_index on my own.
For np.ravel_multi_index I could write this short function:
def coord2index(coord, shape):
    return np.concatenate((np.asarray(shape[1:])[::-1].cumprod()[::-1],[1])).dot(coord)
But I struggle with finding a similar, short (one-liner) function for np.unravel_index. Does somebody have an idea?
This is one possible implementation:
import numpy as np

def index2coord(index, shape):
    return ((np.expand_dims(index, 1) // np.r_[1, shape[:0:-1]].cumprod()[::-1]) % shape).T

shape = (2, 3, 4)
coord = [[0, 1], [2, 0], [1, 3]]
print(index2coord(coord2index(coord, shape), shape))
# [[0 1]
#  [2 0]
#  [1 3]]
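To check both one-liners against numpy's own functions, a small comparison over every flat index of the example shape could look like this. It repeats the two functions so that it runs on its own; flat and coords are just illustrative names.

```python
import numpy as np

def coord2index(coord, shape):
    return np.concatenate((np.asarray(shape[1:])[::-1].cumprod()[::-1], [1])).dot(coord)

def index2coord(index, shape):
    return ((np.expand_dims(index, 1) // np.r_[1, shape[:0:-1]].cumprod()[::-1]) % shape).T

shape = (2, 3, 4)
flat = np.arange(np.prod(shape))    # every valid flat index: 0..23
coords = index2coord(flat, shape)   # shape (3, 24), one row per dimension

print(np.array_equal(coords, np.array(np.unravel_index(flat, shape))))               # True
print(np.array_equal(coord2index(coords, shape), flat))                              # True
print(np.array_equal(np.ravel_multi_index(tuple(coords), shape), flat))              # True
```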
