Efficient way to compute pairwise differences in a 1d numpy array - python

I have a specific problem.
I wrote this code to compute the differences between all pairs of elements in a 1d array:
np.array([j-i for m, i in enumerate(X[:]) for j in X[m+1:]])
For example, for an input X=np.array([0, 1, 2, 0, 1, 2, 0, 1, 2]), this code returns a 9*8/2=36 element array, which is:
np.array([1,2,0,1,2,0,1,2,1,-1,0,1,-1,0,1,-2,-1,0,-2,-1,0,1,2,0,1,2,1,-1,0,1,-2,-1,0,1,2,1])
Although I understand that this code is inherently O(n^2), it takes a lot of time and uses a lot of memory even for a modest array X (n~400). I suspect the double-loop indexing causes the slowdown and that vectorizing it may make it faster. Do you have any ideas, or know of a standard module that computes this?

You can do this (time-)efficiently using broadcasting (which uses vectorization). The solution for an X of length 400 is instantaneous on my machine:
import numpy as np
# X = np.random.rand(400)
X=np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
X = X.reshape(-1,1)
M = X.T - X
idx = np.triu_indices(len(X), k=1)
solution = M[idx]
array([ 1, 2, 0, 1, 2, 0, 1, 2, 1, -1, 0, 1, -1, 0, 1, -2, -1,
0, -2, -1, 0, 1, 2, 0, 1, 2, 1, -1, 0, 1, -2, -1, 0, 1,
2, 1])
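A minimal sketch packaging the broadcasting answer as a function and checking it against the question's list comprehension. The name `pairwise_differences` is illustrative, not from the original answer. Note that it still builds the full n x n matrix (about 160,000 entries for n = 400) before extracting the upper triangle, so memory use remains O(n^2).

```python
import numpy as np

def pairwise_differences(x):
    """Return x[j] - x[i] for every pair i < j, in row-major (i, then j) order."""
    x = np.asarray(x)
    # diff[i, j] == x[j] - x[i]; this is the same matrix as X.T - X above
    diff = x[np.newaxis, :] - x[:, np.newaxis]
    # k=1 keeps only the strict upper triangle, i.e. pairs with i < j
    rows, cols = np.triu_indices(len(x), k=1)
    return diff[rows, cols]

X = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
loop_result = np.array([j - i for m, i in enumerate(X) for j in X[m + 1:]])
vec_result = pairwise_differences(X)
print(np.array_equal(loop_result, vec_result))  # True
```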

You want to compute the difference between all possible pairs? That's inherently O(n^2).
You're always going to run into trouble at some point, but you can go a lot further by not keeping the entire square in memory, only lazily generating and using each value as you iterate.
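One way to read the lazy suggestion, sketched under the assumption that you only need to consume the differences (for example, accumulate a statistic) rather than keep them all: yield one row of differences at a time, vectorized within the row. The generator name and the running-mean consumer are hypothetical illustrations.

```python
import numpy as np

def iter_pairwise_differences(x):
    """Yield, one row at a time, x[m+1:] - x[m] for each m.

    Only one row (at most n - 1 values) is materialised at once, instead of
    the full n x n matrix.
    """
    x = np.asarray(x)
    for m in range(len(x) - 1):
        # Vectorised within the row, lazy across rows
        yield x[m + 1:] - x[m]

X = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])

# Example consumer: a running mean that never needs the whole result in memory
total = 0
count = 0
for row in iter_pairwise_differences(X):
    total += row.sum()
    count += row.size
print(count, total / count)
```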

Related

How to vectorize this pytorch code over (at least) the batch dimension?

I want to build an adjacency matrix such that, for example,
if X[0] is [0, 1, 2, 0, 1, 0], then the following entries of A[0] are set:
A[0, 1] = 1
A[1, 2] = 1
A[2, 0] = 1
A[0, 1] = 1
A[1, 0] = 1
The following code works, but it's too slow! Please help me vectorize it, at least over the batch (first) dimension:
import torch
A = torch.zeros((3, 3, 3), dtype = torch.float)
X = torch.tensor([[0, 1, 2, 0, 1, 0], [1, 0, 0, 2, 1, 1], [0, 0, 2, 2, 1, 1]])
for a, x in zip(A, X):
    for i, j in zip(x, x[1:]):
        a[i, j] = 1
Thanks! :)
I am pretty sure that there is a much simpler way of doing this, but I tried to keep within the realm of torch function calls, to make sure that any gradient operation could be properly tracked.
In case this is not required for backpropagation, I strongly suggest you look into solutions that use NumPy functions, because I think you are more likely to find something suitable there. But, without further ado, here is the solution I came up with.
It essentially transforms your X vector into a series of index tuples that correspond to positions in A. For this, we need to align some of the indices: the first dimension is only implicitly given in X, since the first list in X corresponds to A[0,:,:], the second list to A[1,:,:], and so on.
This is also probably where you can start optimizing the code, because I did not find a reasonable description of such a matrix, and therefore had to come up with my own way of creating it.
# Start by "aligning" your shifted view of X.
# Essentially, take all but the last element
# and put it on top of all but the first element.
X_shift = torch.stack([X[:,:-1], X[:,1:]], dim=2)
# X_shift.shape: (3,5,2) in your example
# To assign this properly, we need to turn it into a "concatenated" list,
# where each entry corresponds to a 2D tuple in the respective dimension of A.
temp_tuples = X_shift.view(-1,2).transpose(0,1)
# temp_tuples.shape: (2,15) in your example. Below are the values:
# tensor([[0, 1, 2, 0, 1, 1, 0, 0, 2, 1, 0, 0, 2, 2, 1],
#         [1, 2, 0, 1, 0, 0, 0, 2, 1, 1, 0, 2, 2, 1, 1]])
# Now we have to create a matrix to indicate the proper "first dimension index"
fix_dims = torch.repeat_interleave(torch.arange(0,3,1), len(X[0])-1, 0).unsqueeze(dim=0)
# fix_dims.shape: (1,15)
# Long story short, this creates the following vector:
# tensor([[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2]])
# Note that the unsqueeze is necessary to properly concatenate the two matrices:
access_tuples = tuple(torch.cat([fix_dims, temp_tuples], dim=0))
A[access_tuples] = 1
This further assumes that every row of X has the same number of entries. If that is not the case, you have to build the fix_dims vector manually, repeating each batch index len(X[i]) - 1 times (once per consecutive pair). If the lengths are equal, as in your example, you can safely use the proposed solution.
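For comparison, a sketch in NumPy (in the spirit of the NumPy suggestion above) that skips the tuple construction by broadcasting a batch index against the consecutive-pair arrays. It assumes every row of X has the same length and that the number of nodes (3 here) is known. PyTorch tensors accept the same broadcast advanced-indexing form, e.g. `A[torch.arange(B)[:, None], X[:, :-1], X[:, 1:]] = 1` with `B` the batch size, but this sketch only exercises NumPy.

```python
import numpy as np

# Same example as the question; every row has the same length.
X = np.array([[0, 1, 2, 0, 1, 0],
              [1, 0, 0, 2, 1, 1],
              [0, 0, 2, 2, 1, 1]])
batch, n_nodes = X.shape[0], 3
A = np.zeros((batch, n_nodes, n_nodes), dtype=np.float32)

# Batch index as a column so it broadcasts against the (batch, len - 1) pair arrays
b = np.arange(batch)[:, np.newaxis]
# Consecutive pairs: source X[:, :-1], target X[:, 1:]
A[b, X[:, :-1], X[:, 1:]] = 1

# Reference: the original double loop from the question
A_loop = np.zeros_like(A)
for a, x in zip(A_loop, X):
    for i, j in zip(x, x[1:]):
        a[i, j] = 1
print(np.array_equal(A, A_loop))  # True
```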
Make X a tuple instead of a tensor:
A = torch.zeros((3, 3, 3), dtype = torch.float)
X = ([0, 1, 2, 0, 1, 0], [1, 0, 0, 2, 1, 1], [0, 0, 2, 2, 1, 1])
A[X] = 1
For example, by casting it like this: A[tuple(X)]

Python Genetic Algorithm "Natural" Selection

How can I perform selection (i.e. deletion of elements) in an array in a way that favours lower numbers?
If I have an array of fitnesses sorted lowest to highest, how can I use random number generation that favours the smaller numbers to delete those elements at random?
pop_sorted_by_fitness = [1, 4, 10, 330]
I want to randomly delete one of those smaller elements, where it's 1 most of the time, sometimes 4, rarely 10, and almost never 330. How can I implement this?
How about using an exponential distribution to sample your indexes, with numpy.random.exponential?
import numpy as np
s = [1, 4, 10, 330]
limit = len(s)
scale = 10**int(np.log10(limit))
index = int(np.random.exponential()*scale)%limit
Test it
In [37]: sorted([int(np.random.exponential()*scale)%limit for _ in range(20)])
Out[37]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 3, 3]
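A different technique from the exponential-modulo trick, sketched here as an alternative: give each rank an explicit, decreasing probability and let NumPy's `Generator.choice` draw from it. The exponential weights and the `temperature` parameter are my own illustrative choices (any decreasing weights would work); unlike `% limit`, this never folds a large draw back onto an arbitrary index.

```python
import numpy as np

def pick_index_to_delete(fitnesses, temperature=1.0, rng=None):
    """Pick an index, favouring the low end of an ascending-sorted list.

    Weight for rank k is exp(-k / temperature): a smaller temperature gives a
    stronger bias toward index 0. `temperature` is an illustrative parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    ranks = np.arange(len(fitnesses))
    weights = np.exp(-ranks / temperature)
    probs = weights / weights.sum()
    return rng.choice(len(fitnesses), p=probs)

pop_sorted_by_fitness = [1, 4, 10, 330]
rng = np.random.default_rng(0)
idx = pick_index_to_delete(pop_sorted_by_fitness, rng=rng)
# Remove the selected element, keeping the rest of the population
survivors = pop_sorted_by_fitness[:idx] + pop_sorted_by_fitness[idx + 1:]
```

With `temperature=1.0` the four probabilities come out at roughly 64%, 24%, 9% and 3%, which matches the "mostly 1, sometimes 4, rarely 10, almost never 330" pattern from the question.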

Calculate persistence of sign in numpy array

This is hard to describe so consider this example. Let's say I have this array
import numpy as np
X=np.array([-1.94198425, 2.29219632, 0.35505434,
-0.06408812, -1.25963731, -0.32275248, -0.4178637 , 0.37951672])
Now I want to count the number of times (number of consecutive indices) that the sign of the elements remains the same. In this case the answer would be [1, 2, 4, 1], because there's 1 negative number, followed by 2 positive numbers, followed by 4 negatives and so on. I can calculate this by doing
times=[0]
sig=np.sign(X[0])
for x in X:
    if sig==np.sign(x):
        times[-1]+=1
    else:
        times.append(1)
        sig=np.sign(x)
print(times)
Which yields the correct result.
However, if I have a 400x1000 array and want to do this along one of the axes, things get pretty slow.
Is there any way to use NumPy/SciPy to do this easily along an axis of an n-dimensional array?
I figured I could start with something like
a=X.copy()
a[a<=0]=-1
a[a>0]=1
And use stuff like cumsum() but so far I got nothing.
PS: I could probably use f2py, Cython or Numba, but I'm trying to avoid that because of flexibility.
Approach #1 : Vectorized one-liner solution -
np.diff(np.r_[0,np.flatnonzero(np.diff(np.sign(X))!=0)+1, len(X)])
Approach #2 : Alternatively, for some performance boost, we can use slicing instead of differencing the sign values, and the faster np.concatenate in place of np.r_ for the concatenation step, like so -
s = np.sign(X)
out = np.diff(np.concatenate(( [0], np.flatnonzero(s[1:]!=s[:-1])+1, [len(X)] )))
Approach #3 : Alternatively again, if the number of sign changes is considerable compared to the length of the input array, you might want to do the concatenation on the boolean mask of sign changes instead. Boolean arrays are much more memory efficient than int or float arrays, which might bring a further performance boost.
Thus, one more method would be -
s = np.sign(X)
mask = np.concatenate(( [True], s[1:]!=s[:-1], [True] ))
out = np.diff(np.flatnonzero(mask))
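A small sketch wrapping approach #3 in a function and checking it against the question's original loop, including arrays with zeros (np.sign treats 0 as its own sign in both versions). Both function names are illustrative.

```python
import numpy as np

def sign_run_lengths(x):
    """Lengths of runs of equal sign in a 1-D array (approach #3)."""
    s = np.sign(x)
    # True at the start, at every sign change, and one past the end
    mask = np.concatenate(([True], s[1:] != s[:-1], [True]))
    return np.diff(np.flatnonzero(mask))

def sign_run_lengths_loop(x):
    """The question's original loop, kept as a reference."""
    times = [0]
    sig = np.sign(x[0])
    for v in x:
        if sig == np.sign(v):
            times[-1] += 1
        else:
            times.append(1)
            sig = np.sign(v)
    return times

X = np.array([-1.94198425, 2.29219632, 0.35505434,
              -0.06408812, -1.25963731, -0.32275248, -0.4178637, 0.37951672])
print(sign_run_lengths(X))  # [1 2 4 1]
```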
Extending to 2D case
We can extend approach #3 to the 2D case with a bit of additional work, explained in the code comments. The good thing is that the concatenation step lets us keep the code vectorized during the extension. Thus, for a 2D array where we need the sign persistence on a per-row basis, the implementation would look something like this -
# Get signs. Get one-off shifted mask for each row.
# Concatenate at either ends of each row with True values, getting us 2D mask
s = np.sign(X)
T = np.ones((X.shape[0],1),dtype=bool)
mask2D = np.column_stack(( T, s[:,1:]!=s[:,:-1], T ))
# Get flattened nonzeros indices on the 2D mask.
all_intervals = np.diff(np.flatnonzero(mask2D.ravel()))
# We need to remove the indices that were generated because of the True values
# concatenation. So, get those indices and delete those.
rm_idx = (mask2D[:-1].sum(1)-1).cumsum()
all_intervals1 = np.delete(all_intervals, rm_idx + np.arange(X.shape[0]-1))
# Finally, split the indices into a list of arrays, with each array giving us
# the counts of sign persistences
out = np.split(all_intervals1, rm_idx )
Sample input, output -
In [212]: X
Out[212]:
array([[-3, 1, -3, -2, 2, 3, -3, 1, 1, -1],
[-2, -3, 0, -2, -2, 0, 3, -1, -2, 2],
[ 0, -1, -3, -2, -2, 3, -3, -2, 1, 1],
[ 1, -3, 0, -1, -2, 1, -1, 1, 3, 2],
[-1, 1, 0, -2, 0, -1, -1, -3, 0, 1]])
In [213]: out
Out[213]:
[array([1, 1, 2, 2, 1, 2, 1]),
array([2, 1, 2, 1, 1, 2, 1]),
array([1, 4, 1, 2, 2]),
array([1, 1, 1, 2, 1, 1, 3]),
array([1, 1, 1, 1, 1, 3, 1, 1])]
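A sketch that wraps the 2D code above in a function and checks it row by row against the 1D approach #3, on the sample input and on random integers. Returning a list of arrays (rather than a 2D array) is necessary because rows can contain different numbers of runs. The function names are illustrative.

```python
import numpy as np

def sign_run_lengths(x):
    """1-D approach #3, used here as a per-row reference."""
    s = np.sign(x)
    mask = np.concatenate(([True], s[1:] != s[:-1], [True]))
    return np.diff(np.flatnonzero(mask))

def sign_run_lengths_2d(X):
    """Per-row run lengths, vectorised as in the 2D code above."""
    s = np.sign(X)
    T = np.ones((X.shape[0], 1), dtype=bool)
    mask2D = np.column_stack((T, s[:, 1:] != s[:, :-1], T))
    all_intervals = np.diff(np.flatnonzero(mask2D.ravel()))
    # Drop the spurious intervals spanning one row's end and the next row's start
    rm_idx = (mask2D[:-1].sum(1) - 1).cumsum()
    all_intervals1 = np.delete(all_intervals, rm_idx + np.arange(X.shape[0] - 1))
    return np.split(all_intervals1, rm_idx)

X = np.array([[-3,  1, -3, -2,  2,  3, -3,  1,  1, -1],
              [-2, -3,  0, -2, -2,  0,  3, -1, -2,  2],
              [ 0, -1, -3, -2, -2,  3, -3, -2,  1,  1],
              [ 1, -3,  0, -1, -2,  1, -1,  1,  3,  2],
              [-1,  1,  0, -2,  0, -1, -1, -3,  0,  1]])

vectorised = sign_run_lengths_2d(X)
per_row = [sign_run_lengths(row) for row in X]
print(all(np.array_equal(a, b) for a, b in zip(vectorised, per_row)))
```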

Iterating through matrices Python

If I have two lists and want to iterate through subtracting one from the other how would I go about this? I was thinking broadcasting. Right now I have:
array1 = [0,2,2,0]
array2 = [2,2,0,1]
I would like to subtract array1 from each value in array2 and make a new matrix of outputs:
output = [2, 0, 0, 2,
2, 0, 0, 2,
0, -2, -2, 0,
1, -1, -1, 1]
so in the end it's a 4x4 matrix.
Is this possible? Is the easiest way to use broadcasting? I was thinking of making each row value in array2 into its own array, subtracting that from array2 using broadcasting, then summing all the arrays at the end into one big array (using Numpy)... is there an easier way?
Broadcasting with numpy:
>>> a1 = np.array([0,2,2,0])
>>> a2 = np.array([2,2,0,1])
>>> a2[:, np.newaxis] - a1
array([[ 2, 0, 0, 2],
[ 2, 0, 0, 2],
[ 0, -2, -2, 0],
[ 1, -1, -1, 1]])
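An equivalent spelling, for reference: `np.subtract.outer(a2, a1)` computes `a2[i] - a1[j]` for every pair and produces the same matrix; `ravel()` then gives the flat 16-element list shown in the question.

```python
import numpy as np

a1 = np.array([0, 2, 2, 0])
a2 = np.array([2, 2, 0, 1])

# out[i, j] = a2[i] - a1[j], the same matrix as a2[:, np.newaxis] - a1
out = np.subtract.outer(a2, a1)
# Row-major flattening gives the 16-element list from the question
flat = out.ravel()
```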
Something like this?
def all_differences(x, y):
    return (a - b for a in y for b in x)
print(list(all_differences([0, 2, 2, 0], [2, 2, 0,1])))
# -> [2, 0, 0, 2, 2, 0, 0, 2, 0, -2, -2, 0, 1, -1, -1, 1]
For every item in the second list, it just iterates over every item in the first list and yields their difference (second minus first).
This can also be solved with itertools.product and generalised to multiple lists:
import itertools
import functools
import operator
difference = functools.partial(functools.reduce, operator.sub)
def all_differences(*lists):
    return map(difference, itertools.product(*reversed(lists)))
print(list(all_differences([0, 2, 2, 0], [2, 2, 0,1])))
Or, handling just two lists:
import itertools
def all_differences(x, y):
    return (a - b for (a, b) in itertools.product(y, x))
print(list(all_differences([0, 2, 2, 0], [2, 2, 0,1])))

Interpreting (and comparing) output from numpy.correlate

I have looked at this question but it hasn't really given me any answers.
Essentially, how can I determine whether a strong correlation exists using np.correlate? I expect the same output as I get from MATLAB's xcorr with the coeff option, which I can understand (1 is a strong correlation at lag l and 0 is no correlation at lag l), but np.correlate produces values greater than 1, even when the input vectors have been normalised between 0 and 1.
Example input
import numpy as np
x = np.random.rand(10)
y = np.random.rand(10)
np.correlate(x, y, 'full')
This gives the following output:
array([ 0.15711279, 0.24562736, 0.48078652, 0.69477838, 1.07376669,
1.28020871, 1.39717118, 1.78545567, 1.85084435, 1.89776181,
1.92940874, 2.05102884, 1.35671247, 1.54329503, 0.8892999 ,
0.67574802, 0.90464743, 0.20475408, 0.33001517])
How can I tell what is a strong correlation and what is weak if I don't know what the maximum possible correlation value is?
Another example:
In [10]: x = [0,1,2,1,0,0]
In [11]: y = [0,0,1,2,1,0]
In [12]: np.correlate(x, y, 'full')
Out[12]: array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])
Edit: This was a badly asked question, but the marked answer does answer what was asked. I think it is important to note what I found while digging around in this area: you cannot compare outputs from cross-correlation. In other words, it would not be valid to use the outputs from cross-correlation to say that signal x is better correlated with signal y than with signal z. Cross-correlation does not provide this kind of information.
numpy.correlate is under-documented. I think that we can make sense of it, though. Let's start with your sample case:
>>> import numpy as np
>>> x = [0,1,2,1,0,0]
>>> y = [0,0,1,2,1,0]
>>> np.correlate(x, y, 'full')
array([0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0])
Those numbers are the cross-correlations for each of the possible lags. To make that more clear, let's put the lag numbers above the correlations:
>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[ 0, 0, 1, 4, 6, 4, 1, 0, 0, 0, 0]])
Here, we can see that the cross-correlation reaches its peak at a lag of -1. If you look at x and y above, that makes sense: if one shifts y to the left by one place, it matches x exactly.
To verify this, let's try again, this time shifting y further:
>>> y = [0, 0, 0, 0, 1, 2]
>>> np.concatenate((np.arange(-5, 6)[None,...], np.correlate(x, y, 'full')[None,...]), axis=0)
array([[-5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5],
[ 0, 2, 5, 4, 1, 0, 0, 0, 0, 0, 0]])
Now, the correlation peaks at a lag of -3, meaning that the best match between x and y occurs when y is shifted to the left by 3 places.
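If you want values on the scale of MATLAB's 'coeff' option, one option (a sketch, assuming equal-length inputs) is to divide the 'full' output by sqrt(sum(x**2) * sum(y**2)). This makes the zero-lag autocorrelation exactly 1 and, by Cauchy-Schwarz, bounds every value to [-1, 1]. The helper name `xcorr_coeff` is hypothetical, the lag vector follows the labelling used in this answer, and no mean is subtracted, so this is not a Pearson correlation.

```python
import numpy as np

def xcorr_coeff(x, y):
    """Cross-correlation scaled so that the autocorrelation at lag 0 is 1.

    Assumes x and y have equal length; divides by sqrt(sum(x**2) * sum(y**2)).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c = np.correlate(x, y, 'full')
    c /= np.sqrt(np.dot(x, x) * np.dot(y, y))
    # Lags matching the order of np.correlate's 'full' output
    lags = np.arange(-(len(y) - 1), len(x))
    return lags, c

x = [0, 1, 2, 1, 0, 0]
y = [0, 0, 1, 2, 1, 0]
lags, c = xcorr_coeff(x, y)
print(lags[np.argmax(c)], c.max())  # -1 1.0
```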
