I want to check if all entries of a matrix A within 10 indices of a given entry (x,y) are zero. I think something like this should do it
(numpy.take(A,[x-10:x+10,y-10:y+10]) == 0).all()
but I'm getting an invalid syntax error. I think I'm not constructing the index ranges correctly. Any suggestions?
Don't worry about using take, just index your array like this:
(A[x-10:x+10,y-10:y+10] == 0).all()
A simple boolean check against the entries of the submatrix will do
np.all(A[x-10:x+11,y-10:y+11]==0)
(note the upper index is not included, so I changed the slices to x-10:x+11 and y-10:y+11)
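The effect of the excluded upper index can be seen in a small check. This is only an illustrative sketch: the 30x30 array and the position (15, 15) are made-up values, chosen so that the window stays inside the array.

```python
import numpy as np

# Hypothetical test data: all zeros except one entry exactly 10 rows below (x, y).
A = np.zeros((30, 30), dtype=int)
x = y = 15
A[x + 10, y] = 1

# The slice x-10:x+10 stops at row x+9, so the nonzero entry is missed.
short_window = bool((A[x-10:x+10, y-10:y+10] == 0).all())
# The slice x-10:x+11 includes row x+10, so the nonzero entry is seen.
full_window = bool(np.all(A[x-10:x+11, y-10:y+11] == 0))

print(short_window)  # True
print(full_window)   # False
```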
Suppose A is an array of shape (19,19):
import numpy as np
H = W = 19
x, y = 1, 1
N = 10
A = np.random.randint(10, size=(H,W))
Then
In [433]: A[x-N:x+N,y-N:y+N]
Out[433]: array([[4]])
Since x-N is 1-10 = -9, A[x-N:x+N,y-N:y+N] is equivalent to A[-9:11,-9:11],
which is equivalent to A[19-9:11,19-9:11] which is the same as A[10:11,10:11].
So only one value is selected.
That's not giving you "all entries of a matrix A within 10 indices of a given
entry (x,y)".
Instead, you could generate the desired subregion using a boolean mask:
X, Y = np.ogrid[0:H,0:W]
mask = (np.abs(X - x) < N) & (np.abs(Y - y) < N)
Once you have the mask, you can select the subregion where the mask is True using A[mask], and test if every value is zero with
(A[mask] == 0).all()
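If building a full-size mask is undesirable, one alternative is to clamp the lower slice bounds at 0 so that they can never go negative. The sketch below is not part of the answer above: `window_all_zero` is a hypothetical helper name, and it assumes "within n indices" means an inclusive distance of n, whereas the mask above uses a strict `< N`.

```python
import numpy as np

def window_all_zero(A, x, y, n):
    """Hypothetical helper: True if every entry with |i - x| <= n and |j - y| <= n is zero."""
    # Clamp the lower bounds so a negative start does not wrap around to the end.
    r0 = max(x - n, 0)
    c0 = max(y - n, 0)
    # Upper bounds past the array edge are truncated by slicing automatically.
    return bool((A[r0:x + n + 1, c0:y + n + 1] == 0).all())

A = np.zeros((19, 19), dtype=int)
A[12, 12] = 5

print(window_all_zero(A, 1, 1, 10))  # True: the window covers rows/cols 0..11 only
print(window_all_zero(A, 3, 3, 10))  # False: the window covers rows/cols 0..13
```

A slice returns a view rather than a copy, so this avoids allocating a boolean array of the same shape as A.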
import numpy as np
np.random.seed(2015)
H = W = 19
x, y = 1, 1
N = 10
A = np.random.randint(10, size=(H,W))
print(A[x-N:x+N,y-N:y+N])
# [[4]]
X, Y = np.ogrid[0:H,0:W]
mask = (np.abs(X - x) < N) & (np.abs(Y - y) < N)
print(mask.astype(int))
# [[1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]]
print((A[mask] == 0).all())
# False
Related
I need to generate the following adjacency matrices:
No of Nodes = 3
A B C AB AC BC
A 0 1 1 0 0 1
B 1 0 1 0 1 0
C 1 1 0 1 0 0
AB 0 0 1 0 0 0
AC 0 1 0 0 0 0
BC 1 0 0 0 0 0
To generate an adjacency matrix for 3 nodes, I can use the code available here, which is
out = np.block([
[1 - np.eye(3), np.eye(3) ],
[ np.eye(3), np.zeros((3, 3))]
]).astype(int)
But it cannot be used for a different number of nodes. For example, if we have 5 nodes then:
No of Nodes = 5
A B C D E AB AC AD AE BC BD BE CD CE DE
A 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1
B 1 0 1 1 1 0 1 1 1 0 0 0 1 1 1
C 1 1 0 1 1 1 0 1 1 0 1 1 0 0 1
D 1 1 1 0 1 1 1 0 1 1 0 1 0 1 0
E 1 1 1 1 0 1 1 1 0 1 1 0 1 0 0
AB 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0
AC 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0
AD 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0
AE 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0
BC 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0
BD 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0
BE 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0
CD 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0
CE 1 1 0 1 0 0 0 0 0 0 0 0 0 0 0
DE 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0
Is there a simple way to generate these adjacency matrices?
Inferring the logic of the output from your two examples, I think something like this might do what you want:
import numpy as np
def make_matrix(N, dtype=int):
    n_comb = N * (N - 1) // 2
    upper_left = 1 - np.eye(N, dtype=dtype)
    lower_right = np.zeros((n_comb, n_comb), dtype=dtype)
    cross = np.ones((N, N, N), dtype=dtype)
    i = np.arange(N)
    cross[i, :, i] = 0
    cross[:, i, i] = 0
    cross = cross[(np.triu_indices(N, k=1))]
    return np.block([[upper_left, cross.T],
                     [cross, lower_right]])
print(make_matrix(3))
[[0 1 1 0 0 1]
[1 0 1 0 1 0]
[1 1 0 1 0 0]
[0 0 1 0 0 0]
[0 1 0 0 0 0]
[1 0 0 0 0 0]]
print(make_matrix(5))
[[0 1 1 1 1 0 0 0 0 1 1 1 1 1 1]
[1 0 1 1 1 0 1 1 1 0 0 0 1 1 1]
[1 1 0 1 1 1 0 1 1 0 1 1 0 0 1]
[1 1 1 0 1 1 1 0 1 1 0 1 0 1 0]
[1 1 1 1 0 1 1 1 0 1 1 0 1 0 0]
[0 0 1 1 1 0 0 0 0 0 0 0 0 0 0]
[0 1 0 1 1 0 0 0 0 0 0 0 0 0 0]
[0 1 1 0 1 0 0 0 0 0 0 0 0 0 0]
[0 1 1 1 0 0 0 0 0 0 0 0 0 0 0]
[1 0 0 1 1 0 0 0 0 0 0 0 0 0 0]
[1 0 1 0 1 0 0 0 0 0 0 0 0 0 0]
[1 0 1 1 0 0 0 0 0 0 0 0 0 0 0]
[1 1 0 0 1 0 0 0 0 0 0 0 0 0 0]
[1 1 0 1 0 0 0 0 0 0 0 0 0 0 0]
[1 1 1 0 0 0 0 0 0 0 0 0 0 0 0]]
The logic here is that each row of the upper-right matrix is like the flattened upper-triangle of a straightforward NxN matrix. For example:
# construct second row of upper-right matrix for 5x5 case:
x = np.ones((5, 5), dtype=int)
x[1] = 0
x[:, 1] = 0
print(x)
# [[1 0 1 1 1]
# [0 0 0 0 0]
# [1 0 1 1 1]
# [1 0 1 1 1]
# [1 0 1 1 1]]
print(x[np.triu_indices(5, k=1)])
# [0 1 1 1 0 0 0 1 1 1]
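Read the other way round, each row of the lower-left block belongs to one pair of nodes and is 1 for every node that is not in that pair. The sketch below builds that block with an explicit loop over pairs, which may be easier to follow than the 3-D indexing. `cross_block` is a hypothetical name, and the use of `itertools.combinations` is an assumption that relies on it yielding pairs in the same order as `np.triu_indices(N, k=1)`.

```python
import numpy as np
from itertools import combinations

def cross_block(n):
    """Hypothetical helper: rows are node pairs (i, j) with i < j, columns are single nodes."""
    pairs = list(combinations(range(n), 2))
    block = np.ones((len(pairs), n), dtype=int)
    for row, (i, j) in enumerate(pairs):
        # A pair is not connected to either of its own members.
        block[row, [i, j]] = 0
    return block

print(cross_block(3))
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]]
```

For 3 nodes this reproduces the AB, AC and BC rows of the lower-left block in the question.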
I have a matrix M with 60,000 rows and 10,000 columns. Each element is either a 0 or 1. For each column, I would like to keep only the first '1' within each range of rows from x to y; the remaining 1s must be replaced with 0. Note that the ranges [x1:y1] and [x2:y2] never overlap, i.e. it is never the case that x2 < y1. The ranges are also not necessarily all the same length.
What makes this problem trickier is that y depends on the row in which the 1 is located. For example, consider the first column of the matrix M. Suppose there is a 1 in rows=[3,7,9,10,25]. You can find the y for each row by indexing into y_bound. If y_bound[3]=10, then you would replace the '1's in column one between rows 4 and 9 inclusive with '0' (note that x=3+1=4 and y=10-1=9). You then move on to the next remaining '1', which is in row 10 (rows 7 and 9 are skipped because those 1s have already been removed). Suppose y_bound[10]=23; then you would replace all '1's between rows 11 and 22 inclusive with '0' (there happen to be none in this case).
This has to be done for all columns in the matrix. The good news is that y_bound depends only on which row a 1 is located and not in which column it is. Here is a reproducible example of what I am trying to achieve:
import numpy as np
M = np.random.randint(2, size=(20,10)) # random matrix of 0s and 1s
y_bound = np.array([[4,6,7,8,9,7,11,12,14,16,19,20,20,20,20,20,20,20,20,20]]).T # length = 20
# Replace 1s with 0s
for column_number, col in enumerate(M.transpose()):
    rows_with_ones = np.argwhere(col==1)
    previous_y = 0
    for row in rows_with_ones:
        x = int(row[0]) # convert one-element array to integer
        if x >= previous_y:
            y = int(y_bound[x, 0]) # index into this array
            previous_y = y
            # Replace 1s between the current row and the row just before that given by y_bound in the line above
            M[x+1:y,column_number] = np.where(M[x+1:y,column_number] == 1,0,0)
Any help is much appreciated! It is worth noting that I am running this on a GPU in Google Colab (using CuPy, the GPU equivalent of NumPy). I would ideally like an implementation that does not take too long to run (hopefully without for loops), as this process needs to be repeated 10,000 times.
This solves the problem, but it's just brute force. You can see my attempt at using np.where to return the list of 1s in the column. It does do that, but I don't think it is more efficient. np still has to troll through the whole column, and then you have the issue of handling rows already cleared.
import numpy as np
M = np.random.randint(2, size=(20,10)) # random matrix of 0s and 1s
y_bound = [4,6,7,8,9,7,11,12,14,16,19,20,20,20,20,20,20,20,20,20]
print(M)
print()
#for col in range(x.shape[0]):
# ones = np.where( x[:,col] )
# print( col, ones )
def mine(x):
    for col in range(x.shape[1]):
        for row in range(x.shape[0]):
            if x[row,col]:
                print( col, [row+1,y_bound[row]])
                x[row+1:y_bound[row],col] = 0
# Replace 1s with 0s
def his(M):
    for column_number, col in enumerate(M.transpose()):
        rows_with_ones = np.argwhere(col==1)
        previous_y = 0
        for row in rows_with_ones:
            x = row[0]
            if x >= previous_y:
                y = int(y_bound[x]) # index into this array
                previous_y = y
                # Replace 1s between the current row and the row just before that given by y_bound in the line above
                print( column_number, x+1, y )
                M[x+1:y,column_number] = np.where(M[x+1:y,column_number] == 1,0,0)
m = M.copy()
mine(m)
print(m)
print()
m = M.copy()
his(m)
print(m)
Output:
[[1 0 1 1 0 0 0 0 0 1]
[0 1 0 1 0 0 1 1 0 0]
[0 1 1 0 1 1 1 1 1 1]
[0 1 0 1 1 1 1 0 0 0]
[1 0 0 0 0 0 1 1 1 0]
[0 0 0 1 1 1 0 0 0 1]
[0 1 0 1 0 1 0 0 0 0]
[1 0 1 1 0 0 1 1 0 0]
[1 0 1 0 1 0 0 0 1 0]
[1 1 0 0 0 1 1 1 0 0]
[1 1 0 0 1 0 0 0 1 1]
[1 1 0 1 0 0 1 0 0 1]
[1 0 0 0 1 0 1 0 0 0]
[0 1 0 1 0 0 0 1 0 1]
[0 1 0 1 0 0 1 0 1 0]
[1 1 1 1 1 1 0 1 0 0]
[1 1 1 0 1 0 1 0 0 1]
[1 1 0 0 0 1 1 0 0 0]
[0 0 1 0 1 1 0 1 1 0]
[0 0 0 0 0 0 0 1 1 1]]
[[1 0 1 1 0 0 0 0 0 1]
[0 1 0 0 0 0 1 1 0 0]
[0 0 0 0 1 1 0 0 1 0]
[0 0 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 1]
[0 1 0 0 0 0 0 0 0 0]
[0 0 1 1 0 0 1 1 0 0]
[0 0 0 0 1 0 0 0 1 0]
[1 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 1 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 1 0 1 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]]
[[1 0 1 1 0 0 0 0 0 1]
[0 1 0 0 0 0 1 1 0 0]
[0 0 0 0 1 1 0 0 1 0]
[0 0 0 0 0 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 1]
[0 1 0 0 0 0 0 0 0 0]
[0 0 1 1 0 0 1 1 0 0]
[0 0 0 0 1 0 0 0 1 0]
[1 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 0 0 0]
[0 0 0 1 0 0 0 1 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 1 0 1 0 0 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 1]]
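Because y_bound depends only on the row, the brute-force loop above can probably be restructured so that it runs once per row and handles all columns in that row together. The following is a sketch of that idea, not a benchmarked solution: `clear_after_first` is a hypothetical name, it has only been reasoned through for NumPy (not CuPy), and it assumes the same semantics as the double loop above.

```python
import numpy as np

def clear_after_first(M, y_bound):
    """Hypothetical helper: same result as the per-column loop, looping over rows only."""
    out = M.copy()
    bounds = np.ravel(y_bound)  # accepts a list or a column vector
    for row in range(out.shape[0]):
        # Columns that still hold a 1 in this row (1s cleared earlier are skipped).
        active = out[row] == 1
        # Zero the rows row+1 .. y_bound[row]-1 in those columns in one assignment.
        out[row + 1:int(bounds[row]), active] = 0
    return out

# Small made-up example with 5 rows and 2 columns.
M = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 0],
              [1, 1]])
y_bound = [3, 3, 3, 5, 5]
result = clear_after_first(M, y_bound)
print(result)
# [[1 0]
#  [0 1]
#  [0 0]
#  [1 0]
#  [0 1]]
```

This still loops in Python, but the loop count is the number of rows rather than rows times columns; whether that is fast enough for the 60,000 x 10,000 case would need to be measured.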
I would like to do something similar to this question, or this other one, but using periodic boundary conditions (wrapping). I'll make a quick example.
Let's say I have the following numpy array:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 0
0 0 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0
0 0 1 1 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
Then, by using one of the methods proposed in the two linked questions, I am able to extract the bounding box of non-zero values:
0 0 0 1 1 1 1 1
0 0 0 1 0 0 0 0
0 0 0 1 0 0 0 0
1 1 1 1 0 0 0 0
However, if the non-zero elements "cross" the border and come back on the other side, like so:
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0
Then the result is:
1 1 0 0 0 0 0 0 1 1 1
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 0 0 0 1 0 0
0 0 0 0 0 1 1 1 1 0 0
which is not what I want. I would like the result to be the same as the previous case. I am trying to figure out an intelligent way to do this, but I am stuck. Anybody have ideas?
We can adapt this answer like so:
import numpy as np
def wrapped_bbox(a):
    dims = [*range(1,a.ndim)]
    bb = np.empty((a.ndim,2),int)
    i = 0
    while True:
        n = a.shape[i]
        r = np.arange(1,2*n+1)
        ai = np.any(a,axis=tuple(dims))
        r1_a = np.where(ai,r.reshape(2,n),0).ravel()
        aux = np.maximum.accumulate(r1_a)
        aux = r-aux
        idx = aux.argmax()
        mx = aux[idx]
        if mx > n:
            bb[i] = 0,n
        else:
            bb[i] = idx+1, idx+1 - mx
            if bb[i,0] >= n:
                bb[i,0] -= n
            elif bb[i,1] == 0:
                bb[i,1] = n
        if i == len(dims):
            return bb
        dims[i] -= 1
        i += 1
# example
x = """
......
.x...-
..x...
.....x
"""
x = np.array(x.strip().split())[:,None].view("U1")
x = (x == 'x').view('u1')
print(x)
for r in range(x.shape[1]):
    print(wrapped_bbox(np.roll(x,r,axis=1)))
Run:
[[0 0 0 0 0 0] # x
[0 1 0 0 0 0]
[0 0 1 0 0 0]
[0 0 0 0 0 1]]
[[1 4] # bbox vert
[5 3]] # bbox horz, note wraparound (left > right)
[[1 4]
[0 4]] # roll by 1
[[1 4]
[1 5]] # roll by 2
[[1 4]
[2 6]] # etc.
[[1 4]
[3 1]]
[[1 4]
[4 2]]
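The idea behind the function above can be restated per axis: project the array onto one axis, find the longest circular run of empty positions, and take the bounding interval to be everything outside that run. The sketch below is a separate, simpler formulation of that idea for a single axis, not the code from the answer. `wrapped_extent_1d` is a hypothetical name, and returning (0, 0) for an all-empty axis is an arbitrary choice that differs from the function above.

```python
import numpy as np

def wrapped_extent_1d(occupied):
    """Hypothetical helper: half-open (start, stop); start > stop means the box wraps."""
    occupied = np.asarray(occupied, dtype=bool)
    n = occupied.size
    idx = np.flatnonzero(occupied)
    if idx.size == 0:
        return 0, 0  # assumption: report an empty interval when nothing is set
    # Circular distance from each occupied position to the next occupied one.
    nxt = np.roll(idx, -1)
    gaps = (nxt - idx) % n
    # The box starts right after the largest gap and ends where that gap begins.
    g = int(gaps.argmax())
    return int(nxt[g]), int(idx[g]) + 1

a = np.array([[0, 0, 0, 0, 0, 0],
              [0, 1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 0],
              [0, 0, 0, 0, 0, 1]])
print(wrapped_extent_1d(a.any(axis=1)))  # (1, 4)
print(wrapped_extent_1d(a.any(axis=0)))  # (5, 3)
```

For this input the two intervals agree with the first pair printed in the run above, including the wrap-around on the horizontal axis.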
Short version: I would like to use the values in a 2D array to index the third dimension of a corresponding subset of a larger array - and then increment those elements.
I would appreciate help making the two incorporate_votes algorithms quicker. Actually sliding the classifier over the array and calculating optimal strides is not the point here.
Long version:
I have an algorithm, which classifies each element in R1xC1 2D array as 1 of N classes.
I would like to classify a larger 2D array of size R2xC2. Rather than tessellating the larger array into multiple R1xC1 2D arrays, I would like to slide the classifier over the larger array, such that each element in the larger array is classified multiple times. This means that I will have an R2xC2xN array to store the results in, and as the window slides across the large array, each pixel in the window will increment one of the elements in the third dimension (i.e. one of the N classes).
After all the sliding is finished we can simply get the argmax in the dimension corresponding to the classification to get the per element classification.
I intend to scale this up to classify an array of several million pixels with a few dozen classes, so I am concerned with the efficiency of using the classification results to increment one value in the classification dimension per element.
Below is the toy version of the problem I have been crafting all evening in Python3. It has a naive double for loop implementation and a slightly better one obtained by index swizzling and some smart indexing. The classifier is just random.
import numpy as np
map_rows = 8
map_cols = 10
num_candidates = 3
vote_rows = 6
vote_cols = 5
def display_tally(the_tally):
    print("{:25s}{:25s}{:25s}".format("Class 0", "Class 1", "Class 2"))
    for i in range(map_rows):
        for k in range(num_candidates):
            for j in range(map_cols):
                print("{:<2}".format(the_tally[i, j, k]), end='')
            print(" ", end='')
        print("")
def incorporate_votes(current_tally, this_vote, left, top):
    for i in range(vote_rows):
        for j in range(vote_cols):
            current_tally[top + i, left + j, this_vote[i, j]] += 1
    return current_tally
def incorporate_votes2(current_tally, this_vote, left, top):
    for i in range(num_candidates):
        current_tally[i, top:top + vote_rows, left:left + vote_cols][this_vote == i] += 1
    return current_tally
tally = np.zeros((map_rows, map_cols, num_candidates), dtype=int)
swizzled_tally = np.zeros((num_candidates, map_rows, map_cols), dtype=int)
print("Before voting")
display_tally(tally)
print("\n Votes from classifier A (centered at (2,2))")
votes = np.random.randint(num_candidates, size=vote_rows*vote_cols).reshape((vote_rows, vote_cols))
print(votes)
tally = incorporate_votes(tally, votes, 0, 0)
swizzled_tally = incorporate_votes2(swizzled_tally, votes, 0, 0)
print("\nAfter classifier A voting (centered at (2,2))")
display_tally(tally)
print("\n Votes from classifier B (Centered at (5, 4))")
votes2 = np.random.randint(num_candidates, size=vote_rows*vote_cols).reshape((vote_rows, vote_cols))
print(votes2)
tally = incorporate_votes(tally, votes2, 3, 2)
swizzled_tally = incorporate_votes2(swizzled_tally, votes2, 3, 2)
print("\nAfter classifier B voting (Centered at (5, 4))")
print("Naive vote counting")
display_tally(tally)
print("\nSwizzled vote counting")
display_tally(np.moveaxis(swizzled_tally, [-2, -1], [0, 1]))
classifications = np.argmax(tally, axis=2)
print("\nNaive classifications")
print(classifications)
print("\nSwizzled classifications")
classifications = np.argmax(swizzled_tally, axis=0)
print(classifications)
And some sample output:
Before voting
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Votes from classifier A (centered at (2,2))
[[1 1 2 2 1]
[0 2 0 2 1]
[0 2 2 0 2]
[1 1 1 2 0]
[1 0 0 2 1]
[2 1 1 1 0]]
After classifier A voting (centered at (2,2))
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 1 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Votes from classifier B (Centered at (5, 4))
[[2 2 2 0 0]
[0 1 2 1 2]
[2 0 0 2 0]
[2 2 1 1 1]
[1 2 0 2 1]
[1 1 1 1 2]]
After classifier B voting (Centered at (5, 4))
Naive vote counting
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 0 0
0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
0 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
Swizzled vote counting
Class 0 Class 1 Class 2
0 0 0 0 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0
1 0 0 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 2 1 0 0 0 0
0 0 0 1 1 0 0 0 0 0 1 1 1 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 0 0
0 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 2 0 0 1 0 0 0
0 0 0 0 1 0 0 0 0 0 0 1 1 1 0 1 1 1 0 0 1 0 0 1 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0
Naive classifications
[[1 1 2 2 1 0 0 0 0 0]
[0 2 0 2 1 0 0 0 0 0]
[0 2 2 0 2 2 0 0 0 0]
[1 1 1 0 0 2 1 2 0 0]
[1 0 0 2 0 0 2 0 0 0]
[2 1 1 1 0 1 1 1 0 0]
[0 0 0 1 2 0 2 1 0 0]
[0 0 0 1 1 1 1 2 0 0]]
Swizzled classifications
[[1 1 2 2 1 0 0 0 0 0]
[0 2 0 2 1 0 0 0 0 0]
[0 2 2 0 2 2 0 0 0 0]
[1 1 1 0 0 2 1 2 0 0]
[1 0 0 2 0 0 2 0 0 0]
[2 1 1 1 0 1 1 1 0 0]
[0 0 0 1 2 0 2 1 0 0]
[0 0 0 1 1 1 1 2 0 0]]
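For the vote-counting step in this question, one possible way to drop both loops is to index the window rows, window columns and vote values together, so that each window cell increments exactly one class counter. This is a sketch under the assumption that the tally keeps the (rows, cols, classes) layout of the naive version; `incorporate_votes_vectorized` is a hypothetical name, and no timing has been done.

```python
import numpy as np

def incorporate_votes_vectorized(current_tally, this_vote, left, top):
    """Hypothetical loop-free variant of incorporate_votes for a (rows, cols, classes) tally."""
    n_rows, n_cols = this_vote.shape
    # Open grids of absolute row and column positions; they broadcast to the window shape.
    i, j = np.ogrid[top:top + n_rows, left:left + n_cols]
    # Every (i, j) pair occurs once per call, so a plain += does not lose any votes.
    current_tally[i, j, this_vote] += 1
    return current_tally

tally = np.zeros((8, 10, 3), dtype=int)
votes = np.random.default_rng(0).integers(0, 3, size=(6, 5))
incorporate_votes_vectorized(tally, votes, left=0, top=0)
incorporate_votes_vectorized(tally, votes, left=3, top=2)
print(tally.sum())  # 60: each call adds one vote per window cell (6 * 5)
```

If several windows were ever combined into a single call, the same cell could appear more than once in the index arrays, and `np.add.at` would be needed in place of `+=`.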
I would like to scan through sequences and return the value either 1 or 0 to indicate whether they are present or absent. For example: XYZXYZ
X Y Z X Y Z
1 0 0 1 0 0 - X
0 1 0 0 1 0 - Y
0 0 1 0 0 1 - Z
0 0 0 0 0 0 - XX
1 1 0 1 1 0 - XY
0 0 0 0 0 0 - XZ
0 0 0 0 0 0 - YX
0 0 0 0 0 0 - YY
0 1 1 0 1 1 - YZ
0 0 1 1 0 0 - ZX
0 0 0 0 0 0 - ZY
0 0 0 0 0 0 - ZZ
For a two-element pattern such as XY, both positions covered by a match get the value one: the position of X and the position of Y.
The example code below only scans one element at a time. When I replace this line of code,
CHARS = ['X','Y','Z']
with
CHARS = ['X','Y','Z','XX','XY','XZ',...,'ZZ']
it can't match the two-element patterns.
The code below returns binary values in one line starting from X first and then Y and then followed by Z.
import numpy as np
seqs = ["XYZXYZ","YZYZYZ"]
CHARS = ['X','Y','Z']
CHARS_COUNT = len(CHARS)
maxlen = max(map(len, seqs))
res = np.zeros((len(seqs), CHARS_COUNT * maxlen), dtype=np.uint8)
for si, seq in enumerate(seqs):
    seqlen = len(seq)
    arr = np.array(list(seq))  # one character per array element
    for ii, char in enumerate(CHARS):
        res[si][ii*seqlen:(ii+1)*seqlen][arr == char] = 1
print(res)
Example output of the code above:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1]]
How can I make it scan single elements first, followed by pairs of elements?
Expected output:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0]]
I'm not sure if I completely get all the details, but this is what I'd do
import numpy as np
seqs = ['xyzxyz', 'yzyzyz']
chars = ['x','y','z','xx','xy','xz','yx','yy','yz','zx','zy','zz']
N = len(chars)
out = []
for i, seq in enumerate(seqs):
    M = len(seq) # if different seqs have different lengths, this will break!
    tmp = np.array([], dtype=int)
    for c in chars:
        o = np.array([0]*M)
        index = -1
        try:
            # seq.index raises ValueError once there are no more matches
            while True:
                index = seq.index(c, index+1)
                o[index:(index+len(c))] = 1
        except ValueError:
            pass
        finally:
            tmp = np.r_[tmp, o]
    out.append(tmp)
out = np.array(out)
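The list of patterns does not have to be typed out by hand. The sketch below generates all patterns up to length two with `itertools.product` and applies the same marking logic, using `str.find` in place of the try/except around `str.index`. `make_patterns` and `encode` are hypothetical names, and the upper-case alphabet follows the question rather than the answer above.

```python
from itertools import product
import numpy as np

def make_patterns(alphabet, max_len=2):
    """Hypothetical helper: all strings over alphabet of length 1..max_len, shortest first."""
    return [''.join(p)
            for k in range(1, max_len + 1)
            for p in product(alphabet, repeat=k)]

def encode(seq, patterns):
    """Hypothetical helper: one row per pattern, 1 at every position covered by a match."""
    out = np.zeros((len(patterns), len(seq)), dtype=np.uint8)
    for row, pat in enumerate(patterns):
        start = seq.find(pat)
        while start != -1:
            out[row, start:start + len(pat)] = 1
            start = seq.find(pat, start + 1)  # step by 1 so overlapping matches count
    return out

patterns = make_patterns('XYZ')
encoded = encode('XYZXYZ', patterns)
print(patterns)
# ['X', 'Y', 'Z', 'XX', 'XY', 'XZ', 'YX', 'YY', 'YZ', 'ZX', 'ZY', 'ZZ']
print(encoded[patterns.index('XY')])  # [1 1 0 1 1 0]
```

Calling `encoded.ravel()` flattens the rows into the single-line layout shown in the expected output of the question.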