Create a permutation with same autocorrelation - python
My question is similar to this one, with the difference that I need an array of zeros and ones as output. I have an original time series of zeros and ones with high autocorrelation (i.e., the ones are clustered). For some significance testing I need to create random arrays with the same number of zeros and ones, i.e. permutations of the original array. However, the autocorrelation should also stay the same as, or similar to, that of the original, so a simple np.random.permutation does not help me.
Since I'm doing multiple realizations I would need a solution which is as fast as possible. Any help is much appreciated.
According to the question to which you refer, you would like to permute x such that
np.corrcoef(x[0: len(x) - 1], x[1: ])[0][1]
doesn't change.
Say the sequence x is composed of
z1 o1 z2 o2 z3 o3 ... zk ok,
where each zi is a sequence of 0s, and each oi is a sequence of 1s. (There are four cases, depending on whether the sequence starts with 0s or 1s, and whether it ends with 0s or 1s, but they're all the same in principle).
Suppose p and q are each permutations of {1, ..., k}, and consider the sequence
zp[1] oq[1] zp[2] oq[2] zp[3] oq[3] ... zp[k] oq[k],
that is, the runs of 0s have been permuted among themselves and the runs of 1s have been permuted among themselves, while each run stays intact.
For example, suppose the original sequence is
0, 0, 0, 1, 1, 0, 1.
Then
0, 0, 0, 1, 0, 1, 1,
is such a permutation, as well as
0, 1, 1, 0, 0, 0, 1,
and
0, 1, 0, 0, 0, 1, 1.
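Performing this permutation will not change the correlation, because the lag-1 correlation depends only on how often each adjacent pair (0, 0), (0, 1), (1, 0), (1, 1) occurs:
within each run, the adjacent pairs are the same as before;
the number and type of boundaries between the runs are the same as before.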
Therefore, this gives a way to generate permutations which do not affect the correlation. (Also, see at the end another far simpler and more efficient way which can work in many common cases.)
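As a quick sanity check of the argument above, here is a minimal sketch that compares the lag-1 correlation of the example sequence with that of its three run-shuffled versions. The helper name lag1_corr is my own and is not part of the answer's code.

```python
import numpy as np

def lag1_corr(x):
    """Lag-1 autocorrelation, computed the same way as in the question."""
    x = np.asarray(x, dtype=float)
    return np.corrcoef(x[:-1], x[1:])[0, 1]

original = [0, 0, 0, 1, 1, 0, 1]
shuffled = [
    [0, 0, 0, 1, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 1],
    [0, 1, 0, 0, 0, 1, 1],
]

reference = lag1_corr(original)
for s in shuffled:
    # Compare up to floating-point rounding, since the summation order differs.
    print(np.isclose(lag1_corr(s), reference))
```

All four sequences contain the same adjacent pairs (two (0, 0), two (0, 1), one (1, 0), one (1, 1)), which is why the values agree.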
We start with the function preprocess, which takes the sequence, and returns a tuple starts_with_zero, zeros, ones, indicating, respectively,
whether x began with 0
The 0 runs
The 1 runs
In code, this is
import numpy as np
import itertools

def preprocess(x):
    def find_runs(x, val):
        # Pad with zeros so that runs touching either end are detected too.
        matches = np.concatenate(([0], np.equal(x, val).view(np.int8), [0]))
        absdiff = np.abs(np.diff(matches))
        # Each run contributes one start index and one end index.
        ranges = np.where(absdiff == 1)[0].reshape(-1, 2)
        return ranges[:, 1] - ranges[:, 0]

    starts_with_zero = x[0] == 0
    run_lengths_0 = find_runs(x, 0)
    run_lengths_1 = find_runs(x, 1)
    # One array per run; note that np.zeros / np.ones produce floats.
    zeros = [np.zeros(l) for l in run_lengths_0]
    ones = [np.ones(l) for l in run_lengths_1]
    return starts_with_zero, zeros, ones
(This function borrows from an answer to this question.)
To use this function, you could do, e.g.,
x = (np.random.uniform(size=10000) > 0.2).astype(int)
starts_with_zero, zeros, ones = preprocess(x)
Now we write a function that shuffles the order of the 0 runs and of the 1 runs, and concatenates the results:
def get_next_permutation(starts_with_zero, zeros, ones):
    # Shuffle the order of the 0 runs and of the 1 runs (in place).
    np.random.shuffle(zeros)
    np.random.shuffle(ones)
    # Interleave the runs, starting with the same value as the original.
    # (In Python 2, zip_longest is called izip_longest.)
    if starts_with_zero:
        all_ = itertools.zip_longest(zeros, ones, fillvalue=np.array([]))
    else:
        all_ = itertools.zip_longest(ones, zeros, fillvalue=np.array([]))
    all_ = [e for p in all_ for e in p]
    x_tag = np.concatenate(all_)
    return x_tag
To generate another permutation (with same correlation), you would use
x_tag = get_next_permutation(starts_with_zero, zeros, ones)
To generate many permutations, you could do:
starts_with_zero, zeros, ones = preprocess(x)
for i in range(n_permutations):  # n_permutations: the number of permutations needed
    x_tag = get_next_permutation(starts_with_zero, zeros, ones)
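Since the question asks for speed and for an output of integer zeros and ones, here is a hedged sketch of the same run-shuffling idea that works on run lengths only and rebuilds the array with np.repeat. The function names run_lengths and shuffle_runs are hypothetical, the seed is arbitrary, and np.random.default_rng requires a reasonably recent NumPy; treat it as one possible variant rather than the method above.

```python
import numpy as np

def run_lengths(x):
    """Return (first_value, lengths) for the consecutive runs of a 0/1 array."""
    x = np.asarray(x)
    # A new run starts at position 0 and wherever the value changes.
    starts = np.concatenate(([0], np.flatnonzero(x[1:] != x[:-1]) + 1))
    lengths = np.diff(np.concatenate((starts, [len(x)])))
    return int(x[0]), lengths

def shuffle_runs(first_value, lengths, rng):
    """Shuffle the 0 runs and the 1 runs separately, then rebuild the array."""
    out_lengths = lengths.copy()
    # Runs alternate in value, so even slots hold one value and odd slots the other.
    out_lengths[0::2] = rng.permutation(lengths[0::2])
    out_lengths[1::2] = rng.permutation(lengths[1::2])
    values = (first_value + np.arange(len(lengths))) % 2
    return np.repeat(values, out_lengths)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
x = (rng.uniform(size=10000) > 0.2).astype(int)
first_value, lengths = run_lengths(x)
x_tag = shuffle_runs(first_value, lengths, rng)
```

The run lengths are computed once, and each call to shuffle_runs only permutes two small integer arrays before a single np.repeat, which avoids building and concatenating many small arrays per realization.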
Example
Suppose we run
x = (np.random.uniform(size=10000) > 0.2).astype(int)
print(np.corrcoef(x[0: len(x) - 1], x[1: ])[0][1])
starts_with_zero, zeros, ones = preprocess(x)
for i in range(10):
    x_tag = get_next_permutation(starts_with_zero, zeros, ones)
    print(x_tag[: 50])
    print(np.corrcoef(x_tag[0: len(x_tag) - 1], x_tag[1: ])[0][1])
Then we get:
0.00674330566615
[ 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 0.
1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 0. 0. 1. 0. 1. 1. 1. 1.
0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
0.00674330566615
[ 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0.
1. 1. 1. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.]
0.00674330566615
[ 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1.
1. 1. 1. 1. 1. 1. 0. 1. 0. 0. 1. 1. 1. 0.]
0.00674330566615
[ 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0.
1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1.
1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0. 1.]
0.00674330566615
[ 1. 1. 1. 1. 0. 0. 0. 0. 1. 1. 0. 1. 1. 0. 0. 1. 0. 1.
1. 1. 0. 1. 0. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 0. 0. 1.
0. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1. 1. 1. 1.]
0.00674330566615
[ 1. 1. 0. 1. 1. 1. 0. 0. 1. 1. 0. 1. 1. 0. 0. 1. 1. 0.
1. 1. 1. 0. 1. 1. 1. 1. 0. 0. 0. 1. 1. 1. 1. 1. 1. 1.
0. 1. 1. 1. 1. 0. 1. 1. 0. 1. 0. 0. 1. 1.]
0.00674330566615
[ 1. 1. 0. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1. 1.
1. 1. 0. 1. 0. 1. 1. 0. 1. 0. 1. 1. 1. 1.]
0.00674330566615
[ 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 0. 1. 0. 1. 1.
1. 1. 1. 0. 1. 0. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 1. 0.
0. 1. 1. 1. 0. 1. 1. 0. 1. 1. 0. 1. 1. 1.]
0.00674330566615
[ 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1.
0. 1. 1. 1. 1. 1. 1. 0. 1. 1. 0. 1. 1. 1.]
0.00674330566615
[ 1. 1. 0. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 0. 1. 0. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 0. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 0. 1. 0. 1. 0. 1. 1. 1. 1. 1. 1. 0.]
Note that there is a much simpler solution if
your sequence is of length n,
some number m has m << n, and
m! is much larger than the number of permutations you need.
In this case, simply divide your sequence into m (approximately) equal parts, and permute them randomly. As noted before, only the adjacent pairs at the m - 1 block boundaries can change in a way that affects the correlation. Since m << n, this is negligible.
To put some numbers on this, say you have a sequence with 10000 elements. 20! = 2432902008176640000, which is probably far more permutations than you need. By dividing your sequence into 20 parts and permuting them, you're affecting at most 19 of the 9999 adjacent pairs, which might be small enough. For these sizes, this is the method I'd use.
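A minimal sketch of this block approach, assuming np.array_split is acceptable for producing the (approximately) equal parts. The function name block_permutation and the toy clustered series are illustrative only.

```python
import numpy as np

def block_permutation(x, m, rng):
    """Split x into m nearly equal contiguous blocks and shuffle their order."""
    blocks = np.array_split(np.asarray(x), m)
    order = rng.permutation(m)
    return np.concatenate([blocks[i] for i in order])

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
# Toy clustered series: long alternating runs of 0s and 1s, 10000 elements in total.
x = np.repeat([0, 1, 0, 1, 0], [3000, 2000, 1500, 2500, 1000])
x_tag = block_permutation(x, 20, rng)

print(np.corrcoef(x[:-1], x[1:])[0, 1])
print(np.corrcoef(x_tag[:-1], x_tag[1:])[0, 1])
```

Unlike the run-shuffling method, this only approximately preserves the correlation, but it also keeps most of the longer-range structure inside each block.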
Related
How to modify values of a column of a 2D tensor based on condition - Tensorflow?
I have a 2D tensor and want values of its last column to be 0 if values > 0 and 1 otherwise. It should behave somewhat similar to the following block of numpy code:
x = np.random.rand(8, 4)
x[:, -1] = np.where(x[:, -1] > 0, 0, 1)
Is there a way to achieve the same behavior for a 2D tensor in Tensorflow?
This might not be the most elegant solution, but it works:
x=tf.ones((5,10))
rows=tf.stack(tf.range(tf.shape(x)[0]))
column=tf.ones_like(rows)*tf.shape(x)[1]-1
idx=tf.stack((rows,column),axis=1)
x_new=tf.tensor_scatter_nd_update(x, idx, tf.where(x[:, -1] > 0, 0., 1.))
print(x_new)
And the result looks like this (the original x is a tf.ones):
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 0.]]
random number from 0-100 opposite numbers in the upper triangle and the one tringle in symmetrical matrix
I made an NxN matrix of zeros and ones that is symmetric and has a zero diagonal. Now I want to make another matrix in which each one is replaced by a random number from 0-100, such that the mirrored entries in the upper and lower triangles have the same value, as in the picture. I want to do this for all ones in the new matrix. Thank you.
All you should need to do is generate an NxN array of random numbers and multiply:
import numpy as np

N = 7
base = np.zeros((N,N))
for _ in range(15):
    a = np.random.randint(N)
    b = np.random.randint(N)
    if a != b:
        base[a,b] = 1
        base[b,a] = 1
print(base)

# Fetch the location of the 1s.
ones = np.argwhere(base==1)
ones = ones[ones[:,0] < ones[:,1],:]

# Assign random values.
for a,b in ones:
    base[a,b] = base[b,a] = np.random.randint(100)
print(base)
Note that my array creation is just for this example. You said you already have the 1/0 matrix so I'm not worried about that part.
Output:
[[0. 1. 0. 1. 1. 1. 1.]
 [1. 0. 1. 0. 1. 1. 0.]
 [0. 1. 0. 1. 1. 0. 0.]
 [1. 0. 1. 0. 1. 0. 1.]
 [1. 1. 1. 1. 0. 0. 1.]
 [1. 1. 0. 0. 0. 0. 0.]
 [1. 0. 0. 1. 1. 0. 0.]]
[[ 0. 37.  0.  7. 43. 40. 54.]
 [37.  0. 45.  0. 87. 40.  0.]
 [ 0. 45.  0. 74.  8.  0.  0.]
 [ 7.  0. 74.  0. 47.  0. 75.]
 [43. 87.  8. 47.  0.  0. 41.]
 [40. 40.  0.  0.  0.  0.  0.]
 [54.  0.  0. 75. 41.  0.  0.]]
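The answer's opening sentence describes multiplying by a random array, while its code assigns values pair by pair. A minimal sketch of the multiply variant, assuming a symmetric random weight matrix is built first; the variable names and the stand-in 0/1 matrix are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
N = 7

# Stand-in for the asker's symmetric 0/1 matrix with a zero diagonal.
upper = np.triu(rng.integers(0, 2, size=(N, N)), k=1)
base = upper + upper.T

# Random integers in [0, 100], kept only above the diagonal, then mirrored.
weights = np.triu(rng.integers(0, 101, size=(N, N)), k=1)
weights = weights + weights.T

# Elementwise product: zeros stay zero, ones take the symmetric random value.
result = base * weights
print(result)
```

Note that a drawn weight of 0 turns a 1 into a 0; if that is not wanted, draw from rng.integers(1, 101, ...) instead.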
Generating incorrect graphs from adjacency matrices using graph-tool on Python
I am trying to generate a graph from an adjacency matrix. I know it is something that has already been asked here but I can't get to generate one correctly. My code is
import numpy as np
import graph_tool.all as gt

L = 10; p = 0.6
Adj = np.zeros((L,L))
for i in range(0,L):
    for j in range(i+1,L):
        if np.random.rand() < p:
            Adj[i,j] = 1
Adj = Adj + np.transpose(Adj)
print('Adjacency matrix is \n', Adj)
g = gt.Graph(directed=False)
g.add_edge_list(Adj.nonzero())
gt.graph_draw(g, vertex_text=g.vertex_index, output="two-nodes.pdf")
It generates an adjacency matrix with each connection happening with a probability of 60%. One result is
Adjacency matrix is
[[0. 1. 1. 0. 1. 0. 1. 1. 1. 0.]
 [1. 0. 1. 1. 1. 1. 1. 0. 1. 1.]
 [1. 1. 0. 1. 1. 0. 1. 1. 1. 0.]
 [0. 1. 1. 0. 1. 1. 1. 0. 1. 1.]
 [1. 1. 1. 1. 0. 1. 1. 1. 0. 1.]
 [0. 1. 0. 1. 1. 0. 0. 0. 1. 0.]
 [1. 1. 1. 1. 1. 0. 0. 1. 0. 1.]
 [1. 0. 1. 0. 1. 0. 1. 0. 0. 0.]
 [1. 1. 1. 1. 0. 1. 0. 0. 0. 1.]
 [0. 1. 0. 1. 1. 0. 1. 0. 1. 0.]]
But I don't know why the graphical result is this one which is clearly incorrect.
As stated in add_edge_list docs, you need an iterator of (source, target) pairs where both source and target are vertex indexes, or a numpy.ndarray of shape (E,2), where E is the number of edges, and each line specifies a (source, target) pair. In your case, you're passing a single tuple (check the result of Adj.nonzero()). To fix it, just try this:
g.add_edge_list(np.transpose(Adj.nonzero()))
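To see what the transpose changes, here is a NumPy-only sketch on a small, made-up adjacency matrix (graph-tool itself is not needed to run it). The upper-triangle step is my own addition, on the assumption that each undirected edge should be listed once rather than as both (i, j) and (j, i).

```python
import numpy as np

# Small symmetric adjacency matrix, made up for illustration.
Adj = np.array([[0, 1, 1],
                [1, 0, 0],
                [1, 0, 0]])

rows, cols = Adj.nonzero()            # a tuple of two index arrays
edges = np.transpose(Adj.nonzero())   # shape (E, 2): one (source, target) pair per row
print(edges)

# In a symmetric matrix every undirected edge shows up twice;
# keeping only the upper triangle lists each edge once.
unique_edges = np.transpose(np.triu(Adj).nonzero())
print(unique_edges)
```

If duplicate edges turn out to matter for the drawing, passing unique_edges to add_edge_list instead would be one way to avoid them.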
Convert for loop into numpy array
for timeprojection in range(100):
    for term in range(8):
        zerocouponbondprice[timeprojection,term] = zerocouponbondprice[timeprojection-1,term-1]*cashflow[timeprojection,term]
How can I convert something like this into numpy array form, so that I can remove the two for loops and increase the speed? (If timeprojection and term are dynamic numbers.)
You can construct the numpy array from a nested list comprehension:
import numpy as np
zerocouponbondprice = np.array([[k * l for k,l in zip(i,j)] for i,j in zip(zerocouponbondprice, cashflow[1:])])
If I get the question right, you can replace the two loops / ranges by using appropriate indexing. A simplified example:
import numpy as np

# these would be your input arrays zerocouponbondprice and cashflow:
arr0, arr1 = np.ones((10,10)), np.ones((10,10))
# these would be your ranges:
idx0, idx1 = 3, 9
# now you can do the calculation as simple as
arr0[idx0:idx1, idx0:idx1] = arr0[idx0-1:idx1-1, idx0-1:idx1-1] + arr1[idx0:idx1, idx0:idx1]
print(arr0)
[[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
 [1. 1. 1. 2. 2. 2. 2. 2. 2. 1.]
 [1. 1. 1. 2. 2. 2. 2. 2. 2. 1.]
 [1. 1. 1. 2. 2. 2. 2. 2. 2. 1.]
 [1. 1. 1. 2. 2. 2. 2. 2. 2. 1.]
 [1. 1. 1. 2. 2. 2. 2. 2. 2. 1.]
 [1. 1. 1. 2. 2. 2. 2. 2. 2. 1.]
 [1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
Update 3 and 4 dimension elements of numpy array
I have a numpy array of shape [12, 8, 5, 5]. I want to modify the values of 3rd and 4th dimension for each element. For e.g.
import numpy as np
x = np.zeros((12, 80, 5, 5))
print(x[0,0,:,:])
Output:
[[ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]
 [ 0. 0. 0. 0. 0.]]
Modify values:
y = np.ones((5,5))
x[0,0,:,:] = y
print(x[0,0,:,:])
Output:
[[ 1. 1. 1. 1. 1.]
 [ 1. 1. 1. 1. 1.]
 [ 1. 1. 1. 1. 1.]
 [ 1. 1. 1. 1. 1.]
 [ 1. 1. 1. 1. 1.]]
I can modify for all x[i,j,:,:] using two for loops. But, I was wondering if there is any pythonic way to do it without running two loops. Just curious to know :)
UPDATE
Actual use case:
dict_weights = copy.deepcopy(combined_weights)
for i in range(0, len(combined_weights[each_layer][:, 0, 0, 0])):
    for j in range(0, len(combined_weights[each_layer][0, :, 0, 0])):
        # Extract 5x5
        trans_weight = combined_weights[each_layer][i,j]
        trans_weight = np.fliplr(np.flipud(trans_weight ))
        # Update
        dict_weights[each_layer][i, j] = trans_weight
NOTE: The dimensions i, j of combined_weights can vary. There are around 200 elements in this list with varied i and j dimensions, but 3rd and 4th dimensions are always same (i.e. 5x5). I just want to know if I can updated the elements combined_weights[:,:,5, 5] with transposed values without running 2 for loops. Thanks.
Simply do - dict_weights[each_layer] = combined_weights[each_layer][...,::-1,::-1]
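A short check that the slicing one-liner matches the double loop from the question. The array w is a hypothetical stand-in for combined_weights[each_layer], with arbitrary leading dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility only
w = rng.normal(size=(3, 4, 5, 5))  # stand-in for combined_weights[each_layer]

# Loop version, as in the question.
looped = np.empty_like(w)
for i in range(w.shape[0]):
    for j in range(w.shape[1]):
        looped[i, j] = np.fliplr(np.flipud(w[i, j]))

# Sliced version, as in the answer: reverse the last two axes.
sliced = w[..., ::-1, ::-1]

print(np.array_equal(looped, sliced))
```

Keep in mind that the sliced result is a view onto the original data; append .copy() if the two arrays must be independent.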