In a 2D Numpy Array find max streak of consecutive 1's - python

I have a 2d numpy array like so. I want to find the maximum consecutive streak of 1's for every row.
a = np.array([[1, 1, 1, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 0, 1, 0],
[0, 0, 0, 0, 0],
[1, 1, 1, 0, 1],
[1, 0, 0, 0, 0],
[0, 1, 1, 0, 0],
[1, 0, 1, 1, 0],
]
)
Desired Output: [5, 1, 2, 0, 3, 1, 2, 2]
I have found a solution to the above for a 1D array:
a = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])
d = np.diff(np.concatenate(([0], a, [0])))
np.max(np.flatnonzero(d == -1) - np.flatnonzero(d == 1))
> 4
Along similar lines, I wrote the following, but it doesn't work:
d = np.diff(np.column_stack(([0] * a.shape[0], a, [0] * a.shape[0])))
np.max(np.flatnonzero(d == -1) - np.flatnonzero(d == 1))
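The attempt above already computes every run length correctly. Because each row is padded with 0s, no run crosses a row boundary, so the flat index differences equal the in-row lengths; the only problem is that np.max collapses all rows into one number. A minimal sketch of one way to keep the per-row grouping, assuming the 2D sample array from the question: recover each run's row from its flat start index and take a per-row maximum with np.maximum.at (an alternative to the reduceat-based answer, not taken from it).

```python
import numpy as np

a = np.array([[1, 1, 1, 1, 1],
              [1, 0, 1, 0, 1],
              [1, 1, 0, 1, 0],
              [0, 0, 0, 0, 0],
              [1, 1, 1, 0, 1],
              [1, 0, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0]])

# Same padding and diff as the attempt: one 0 column on each side
zeros = np.zeros((a.shape[0], 1), dtype=a.dtype)
d = np.diff(np.column_stack((zeros, a, zeros)))  # diff along the last axis

starts = np.flatnonzero(d == 1)   # flat indices of run starts
ends = np.flatnonzero(d == -1)    # flat indices of run ends, same order
lengths = ends - starts           # correct length of every run
rows = starts // d.shape[1]       # row that each run belongs to

out = np.zeros(a.shape[0], dtype=int)  # rows without any 1 stay at 0
np.maximum.at(out, rows, lengths)      # unbuffered per-row maximum
print(out)  # [5 1 2 0 3 1 2 2]
```

np.maximum.at is used rather than fancy-index assignment because it applies the maximum correctly when the same row index appears several times.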

The 2D equivalent of your current code would use pad, diff, where and maximum.reduceat:
# pad with a column of 0s on left/right
# and get the diff on axis=1
d = np.diff(np.pad(a, ((0,0), (1,1)), constant_values=0), axis=1)
# get row/col indices of -1
row, col = np.where(d==-1)
# get groups of rows
val, idx = np.unique(row, return_index=True)
# subtract col indices of -1/1 to get lengths
# use np.maximum.reduceat to get max length per group of rows
out = np.zeros(a.shape[0], dtype=int)
out[val] = np.maximum.reduceat(col-np.where(d==1)[1], idx)
Output: array([5, 1, 2, 0, 3, 1, 2, 2])
Intermediates:
np.pad(a, ((0,0), (1,1)), constant_values=0)
array([[0, 1, 1, 1, 1, 1, 0],
[0, 1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 0, 1, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 1, 0, 1, 1, 0, 0]])
np.diff(np.pad(a, ((0,0), (1,1)), constant_values=0), axis=1)
array([[ 1, 0, 0, 0, 0, -1],
[ 1, -1, 1, -1, 1, -1],
[ 1, 0, -1, 1, -1, 0],
[ 0, 0, 0, 0, 0, 0],
[ 1, 0, 0, -1, 1, -1],
[ 1, -1, 0, 0, 0, 0],
[ 0, 1, 0, -1, 0, 0],
[ 1, -1, 1, 0, -1, 0]])
np.where(d==-1)
(array([0, 1, 1, 1, 2, 2, 4, 4, 5, 6, 7, 7]),
array([5, 1, 3, 5, 2, 4, 3, 5, 1, 3, 1, 4]))
col-np.where(d==1)[1]
array([5, 1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 2])
np.unique(row, return_index=True)
(array([0, 1, 2, 4, 5, 6, 7]),
array([ 0, 1, 4, 6, 8, 9, 10]))
out = np.zeros(a.shape[0], dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0])
out[val] = np.maximum.reduceat(col-np.where(d==1)[1], idx)
array([5, 1, 2, 0, 3, 1, 2, 2])
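For reuse, the same steps can be wrapped in a function. This is a sketch that follows the answer's pad/diff/where/reduceat approach; the name max_streak_per_row is hypothetical, and the early return for inputs without any 1 is a precaution added here so that reduceat is never called with empty indices.

```python
import numpy as np

def max_streak_per_row(a):
    """Longest run of 1s in each row of a 2D 0/1 array (hypothetical helper)."""
    a = np.asarray(a)
    # Pad a 0 column on the left/right and diff along each row
    d = np.diff(np.pad(a, ((0, 0), (1, 1)), constant_values=0), axis=1)
    # np.where scans in row-major order, so starts and ends pair up row by row
    row, end_col = np.where(d == -1)
    start_col = np.where(d == 1)[1]
    out = np.zeros(a.shape[0], dtype=int)
    if row.size == 0:  # nothing to reduce when a contains no 1s
        return out
    # Position of the first run of every row that has at least one run
    val, idx = np.unique(row, return_index=True)
    out[val] = np.maximum.reduceat(end_col - start_col, idx)
    return out

sample = np.array([[1, 1, 1, 1, 1],
                   [1, 0, 1, 0, 1],
                   [1, 1, 0, 1, 0],
                   [0, 0, 0, 0, 0],
                   [1, 1, 1, 0, 1],
                   [1, 0, 0, 0, 0],
                   [0, 1, 1, 0, 0],
                   [1, 0, 1, 1, 0]])
print(max_streak_per_row(sample))  # [5 1 2 0 3 1 2 2]
```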

Related

Concatenate all 2 dimensional values in a dictionary. (Output is Torch tensor)

I want to concatenate all two-dimensional values in a dictionary.
The number of rows of these values is always the same.
D = {'a': [[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 0, 0]],
'b': [[1, 1],
[1, 1],
[1, 1]],
'c': [[2, 2, 2, 2],
[2, 2, 2, 2],
[2, 2, 2, 2]]
}
The output must be a torch tensor:
tensor([[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]])
Any help would be appreciated!!
import torch
print(torch.cat(tuple([torch.tensor(D[name]) for name in D.keys()]), dim=1))
Output:
tensor([[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]])
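from itertools import chain
n_rows = len(next(iter(D.values())))  # all values share the same row count
l = []
for i in range(n_rows):
    t = [D[k][i] for k in D]                # i-th row of every value
    l.append(list(chain.from_iterable(t)))  # flatten into one output row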
Output:
[[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2],
[0, 0, 0, 0, 0, 1, 1, 2, 2, 2, 2]]
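Since every value is a nested list with the same number of rows, the join can also be sketched with NumPy: np.hstack concatenates 2D arrays along axis 1, and the values are taken in the dictionary's insertion order (guaranteed for dict since Python 3.7). The conversion to a tensor with torch.from_numpy is left as a comment so the sketch runs without PyTorch installed.

```python
import numpy as np

D = {'a': [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]],
     'b': [[1, 1],
           [1, 1],
           [1, 1]],
     'c': [[2, 2, 2, 2],
           [2, 2, 2, 2],
           [2, 2, 2, 2]]}

# Turn each value into a 2D array and join them column-wise
combined = np.hstack([np.asarray(v) for v in D.values()])
print(combined.shape)  # (3, 11)

# With PyTorch installed, the tensor would then be:
# import torch
# t = torch.from_numpy(combined)
```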

Negative Binomial GLM in R and Python - Differences in coefficients

I'm trying to implement a negative binomial regression model in R which was originally implemented in Python with statsmodels.
In some cases, the estimated coefficients are very similar:
Python code:
import statsmodels.api as sm
import numpy as np
response = [17, 18, 10, 9, 8, 5, 6, 5, 15351, 9637, 9981, 9306, 16752, 11993, 13622, 9800]
design = np.array(
[[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 0, 0, 1]])
theta = 0.00792199771866427
sm.GLM(response, design, family=sm.families.NegativeBinomial(alpha=theta)).fit().params
Which gives:
array([ 1.79175947, 0.97496014, -0.16402993, 7.68415156])
And the equivalent model in R:
library(MASS)
response = c(17, 18, 10, 9, 8, 5, 6, 5, 15351, 9637, 9981, 9306, 16752, 11993, 13622, 9800)
design = as.data.frame(matrix(c(rep(1, 16),                    # V1
                                rep(c(1, 0), c(4, 12)),        # V2
                                rep(c(1, 0, 1, 0), each = 4),  # V3
                                rep(c(0, 1), each = 8)),       # V4
                              ncol = 4, byrow = F))
theta = 0.00792199771866427
coef(glm(response ~. +0, design, family = negative.binomial(theta)))
Which gives:
V1 V2 V3 V4
1.7917595 0.9749601 -0.1640299 7.6841516
So for these two models the estimated coefficients are essentially identical, agreeing to every digit R prints. I am also fitting a reduced model that drops the second column of the model matrix; here, however, the estimated coefficients differ considerably between R and Python.
Python
sm.GLM(response, design[:, [0,2,3]], family=sm.families.NegativeBinomial(alpha=theta)).fit().params
array([ 2.32804838, -0.10095997, 7.11684136])
R
coef(glm(response ~. +0, design[, c(1,3,4)] , family = negative.binomial(theta)))
V1 V3 V4
2.0650897 0.3232355 7.1965690
Why does this occur? I have noticed this characteristic for a number of different feature sets.
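One plausible explanation, worth verifying against your own fits rather than taking as settled: the full design has four distinct rows and four coefficients, so the model is saturated and each group's fitted mean is simply its sample mean, whatever the variance function. That alone would make R and Python agree on the full model. In the reduced model the variance function does change the weights, and the two libraries define the dispersion differently: statsmodels' NegativeBinomial(alpha) uses Var = mu + alpha*mu**2, while MASS's negative.binomial(theta) uses Var = mu + mu**2/theta. Passing the same number to both is therefore not the same model; alpha = 1/theta would be the matching choice. The sketch below reproduces the full-model coefficients from the group means alone with plain NumPy and compares the two variance conventions.

```python
import numpy as np

response = np.array([17, 18, 10, 9, 8, 5, 6, 5,
                     15351, 9637, 9981, 9306, 16752, 11993, 13622, 9800])
# Same 16x4 design as in the question: each distinct row repeated 4 times
unique_rows = np.array([[1, 1, 1, 0],
                        [1, 0, 0, 0],
                        [1, 0, 1, 1],
                        [1, 0, 0, 1]])
design = np.repeat(unique_rows, 4, axis=0)

# Saturated model: log(group mean) = unique_row @ beta holds exactly per group,
# so beta follows from the group means alone, with no variance function involved.
group_means = response.reshape(4, 4).mean(axis=1)
beta = np.linalg.solve(unique_rows, np.log(group_means))
fitted = np.exp(design @ beta)  # equals each observation's group mean
print(beta)  # approximately [1.7918, 0.9750, -0.1640, 7.6842], as in both fits

# The same number means very different variances under the two conventions
theta = 0.00792199771866427
var_statsmodels = group_means + theta * group_means**2  # NegativeBinomial(alpha=theta)
var_mass = group_means + group_means**2 / theta         # negative.binomial(theta)
print(var_mass / var_statsmodels)
```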

alternating values in numpy

I'm trying to make my code more efficient and readable, and I'm stuck. Assume I want to build something like a chess board, with alternating black and white colors on an 8x8 grid. Using NumPy, I have done this:
import numpy as np
board = np.zeros((8,8), np.int32)
for ri in range(8):
    for ci in range(8):
        if (ci + ri) % 2 == 0:
            board[ri, ci] = 1
Which nicely outputs:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]], dtype=int32)
I can then interpret these as white or black squares. However, in practice my array is much larger, and this approach is slow and hard to read. I assumed NumPy already has this figured out, so I tried this:
board = np.zeros(64, np.int32)
board[::2] = 1
board = board.reshape(8,8)
But that output is wrong, and looks like this:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0]], dtype=int32)
Is there a better way to achieve what I want that works efficiently (and preferably, is readable)?
Note: I'm not attached to 1s and 0s; this could just as well use other types of values, even True/False or two kinds of strings, as long as it works.
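The flat attempt goes wrong because the row length, 8, is even: every row starts at a flat index that is a multiple of 8, so every row begins with 1. A minimal sketch of one way to keep the flat idea (not taken from the answers here): fill the pattern into an array whose row length is odd, adding one throwaway column when n is even, then slice that column off. It relies on np.resize, which repeats its input to fill the requested shape.

```python
import numpy as np

n = 8
# An odd row length makes row i start at a flat index with the same parity as i
width = n if n % 2 else n + 1
# np.resize repeats [1, 0] in row-major order to fill the requested shape
flat_pattern = np.resize(np.array([1, 0], dtype=np.int32), (n, width))
board = flat_pattern[:, :n]  # drop the extra column (no-op when n is odd)
print(board)
```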
Here's one approach using slicing, with the proper start offsets and a step size of 2, in two steps -
board = np.zeros((8,8), np.int32)
board[::2,::2] = 1
board[1::2,1::2] = 1
Sample run -
In [229]: board = np.zeros((8,8), np.int32)
...: board[::2,::2] = 1
...: board[1::2,1::2] = 1
...:
In [230]: board
Out[230]:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]], dtype=int32)
Other tricky ways -
1) Broadcasted comparison:
In [254]: r = np.arange(8)%2
In [255]: (r[:,None] == r)*1
Out[255]:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
2) Broadcasted addition:
In [279]: r = np.arange(8)
In [280]: 1-(r[:,None] + r)%2
Out[280]:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
Just found out an alternative answer by myself, so posting it here for future reference to anyone who's interested:
a = np.array([[1,0],[0,1]])
b = np.tile(a, (4,4))
Results:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
I think the following is also a good way of doing it for a variable input:
import sys
import numpy as np
lines = sys.stdin.readlines()
n = int(lines[0])  # board size from the first input line
a = np.array([[1, 0], [0, 1]], dtype=int)  # np.int was removed in NumPy 1.24
outputData = np.tile(a, (n//2, n//2))
print(outputData)
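For a single even input number n, you can also do:
import numpy as np
n = 8  # any even board size
i = np.eye(2)   # 2x2 identity matrix
i = i[::-1]     # flip the rows: [[0, 1], [1, 0]]
k = np.array(i, dtype=int)  # np.int was removed in NumPy 1.24
print(np.tile(k, (n//2, n//2)))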
I tried this and found it to be a shorter version for any given even number:
n = int(input())
import numpy as np
c = np.array([[0,1], [1, 0]])
print(np.tile(c, reps=(n//2, n//2)))
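The np.tile answers assume an even n: n//2 tiles of size 2 give a board of size 2*(n//2), one row and one column short when n is odd. A hedged sketch that also handles odd sizes and, per the note in the question, any two values: it builds the (row + col) parity by broadcasting, as in the broadcasted-addition answer, and picks values with np.where. The function name and its parameters are illustrative, not from any answer.

```python
import numpy as np

def checkerboard(n, m=None, on=1, off=0):
    """n x m board with `on` where (row + col) is even and `off` elsewhere."""
    m = n if m is None else m
    r = np.arange(n)[:, None]  # column vector of row indices
    c = np.arange(m)           # row vector of column indices
    return np.where((r + c) % 2 == 0, on, off)

print(checkerboard(3))
# [[1 0 1]
#  [0 1 0]
#  [1 0 1]]
print(checkerboard(2, on='white', off='black'))
```

Passing on=True, off=False gives a boolean board, which can be used directly as a mask.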

Modify every two largest elements of matrix rows and columns

In Python, I have a matrix, and I want to find the two largest elements in every row and every column and change their values to 1, separately: I want two matrices, one with the rows modified and the other with the columns modified.
The main goal is to get a corresponding matrix with zeros everywhere except the 1s I've put at the 2 largest elements of each row and column (using np.where(mat == 1, 1, 0)).
I'm trying to use the np.argpartition in order to do so but without success.
Please help.
See image below.
Here's an approach with np.argpartition -
idx_row = np.argpartition(-a,2,axis=1)[:,:2]
out_row = np.zeros(a.shape,dtype=int)
out_row[np.arange(idx_row.shape[0])[:,None],idx_row] = 1
idx_col = np.argpartition(-a,2,axis=0)[:2]
out_col = np.zeros(a.shape,dtype=int)
out_col[idx_col,np.arange(idx_col.shape[1])] = 1
Sample input, output -
In [40]: a
Out[40]:
array([[ 3, 7, 1, -5, 14, 2, 8],
[ 5, 8, 1, 4, -3, 3, 10],
[11, 3, 5, 1, 9, 2, 5],
[ 6, 4, 12, 6, 1, 15, 4],
[ 8, 2, 0, 1, -2, 3, 5]])
In [41]: out_row
Out[41]:
array([[0, 0, 0, 0, 1, 0, 1],
[0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 1]])
In [42]: out_col
Out[42]:
array([[0, 1, 0, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0]])
Alternatively, if you prefer compact code, we can skip the initialization and use broadcasting to get the outputs directly from idx_row and idx_col, like so -
out_row = (idx_row[...,None] == np.arange(a.shape[1])).any(1).astype(int)
out_col = (idx_col[...,None] == np.arange(a.shape[0])).any(0).astype(int).T
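A sketch of the same argpartition idea that writes the 1s with np.put_along_axis (available since NumPy 1.15) and generalizes to the k largest entries along either axis; top_k_mask is a hypothetical name. As with any partition-based selection, ties, such as the two 3s competing for second place in the sixth column of the sample, may be resolved either way.

```python
import numpy as np

def top_k_mask(a, k=2, axis=1):
    """0/1 mask marking the k largest entries of `a` along `axis` (ties arbitrary)."""
    # Partitioning -a at k-1 puts the indices of the k largest values first
    idx = np.take(np.argpartition(-a, k - 1, axis=axis), np.arange(k), axis=axis)
    out = np.zeros(a.shape, dtype=int)
    np.put_along_axis(out, idx, 1, axis=axis)  # write 1 at those positions
    return out

a = np.array([[ 3,  7,  1, -5, 14,  2,  8],
              [ 5,  8,  1,  4, -3,  3, 10],
              [11,  3,  5,  1,  9,  2,  5],
              [ 6,  4, 12,  6,  1, 15,  4],
              [ 8,  2,  0,  1, -2,  3,  5]])
out_row = top_k_mask(a, k=2, axis=1)  # two largest per row
out_col = top_k_mask(a, k=2, axis=0)  # two largest per column
print(out_row)
print(out_col)
```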

How to get the length of repeated numbers column wise?

I am trying to get the length of repeated numbers in Python Numpy. For example, let's consider a simple ndarray
import numpy as np
a = np.array([
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 1, 1, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
])
The first column is [0, 1, 0, 1], and the first 1 is at position 1. Counting from there, we get ones = 2 and zeros = 1. So I have to start counting ones and zeros when the first 1 is encountered (the starting position).
So the answer for a would be
ones = [2, 2, 1, 1, 1, 3, 2, 2, 1, 1]
zeros = [1, 0, 2, 1, 0, 0, 1, 1, 1, 2]
Can any one please help me out?
Update
3D array:
a = np.array([
[
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
[0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
],
[
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
[0, 1, 0, 1, 0, 0, 0, 1, 0, 0],
[1, 1, 0, 1, 0, 1, 1, 1, 0, 0],
]
])
The expected output should be
ones = [
[2, 3, 0, 0, 1, 3, 2, 2, 1, 0],
[1, 3, 0, 2, 1, 1, 1, 2, 1, 1]
]
zeros = [
[1, 0, 0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 2, 0, 0, 0, 2, 2]
]
With a focus on performance, here's one generic approach for ndarrays -
ones_count = a.sum(-2)
zeros_count = (a.shape[-2] - ones_count - a.argmax(-2))*a.any(-2)
An alternative way to get zeros_count, selecting with np.where, would be -
zeros_count = np.where(a.any(-2),a.shape[-2] - ones_count - a.argmax(-2),0)
Sample runs
2D case :
In [60]: a
Out[60]:
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 0, 0, 1, 1, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 0, 0, 1, 0],
[1, 1, 0, 0, 1, 1, 1, 1, 0, 0]])
In [61]: ones_count = a.sum(-2)
...: zeros_count = (a.shape[-2] - ones_count - a.argmax(-2))*a.any(-2)
...:
In [62]: ones_count
Out[62]: array([2, 2, 1, 1, 1, 3, 2, 2, 1, 1])
In [63]: zeros_count
Out[63]: array([1, 0, 2, 1, 0, 0, 1, 1, 1, 2])
3D case :
In [65]: a = np.array([
...: [
...: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...: [1, 1, 0, 0, 0, 1, 1, 1, 0, 0],
...: [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
...: [1, 1, 0, 0, 1, 1, 1, 1, 0, 0],
...: ],
...: [
...: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
...: [0, 1, 0, 0, 1, 0, 0, 0, 1, 1],
...: [0, 1, 0, 1, 0, 0, 0, 1, 0, 0],
...: [1, 1, 0, 1, 0, 1, 1, 1, 0, 0],
...: ]
...: ])
In [66]: ones_count = a.sum(-2)
...: zeros_count = (a.shape[-2] - ones_count - a.argmax(-2))*a.any(-2)
...:
In [67]: ones_count
Out[67]:
array([[2, 3, 0, 0, 1, 3, 2, 2, 1, 0],
[1, 3, 0, 2, 1, 1, 1, 2, 1, 1]])
In [68]: zeros_count
Out[68]:
array([[1, 0, 0, 0, 0, 0, 1, 1, 1, 0],
[0, 0, 0, 0, 2, 0, 0, 0, 2, 2]])
And so on for higher-dimensional arrays.
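Why the formula works: along the counting axis, a.argmax(-2) is the index of the first 1, so a.shape[-2] - a.argmax(-2) is the length of the segment from the first 1 to the end. Every 1 lies in that segment, so subtracting ones_count leaves the number of 0s in it; multiplying by a.any(-2) zeroes out columns without any 1, where argmax returns 0. A plain-loop reference (hypothetical helper names) is handy for cross-checking the vectorized version on 2D inputs:

```python
import numpy as np

def counts_vectorized(a):
    """The answer's formulas: counts along axis -2 from the first 1 onward."""
    ones = a.sum(-2)
    zeros = (a.shape[-2] - ones - a.argmax(-2)) * a.any(-2)
    return ones, zeros

def counts_loop(a2d):
    """Slow reference for a 2D array: count 1s and 0s per column from the first 1."""
    ones, zeros = [], []
    for col in a2d.T:
        hits = np.flatnonzero(col)
        tail = col[hits[0]:] if hits.size else col[:0]  # empty if the column has no 1
        ones.append(int((tail == 1).sum()))
        zeros.append(int((tail == 0).sum()))
    return np.array(ones), np.array(zeros)

# Compare both versions on random 0/1 data
rng = np.random.default_rng(0)
random_a = rng.integers(0, 2, size=(6, 12))
for got, ref in zip(counts_vectorized(random_a), counts_loop(random_a)):
    assert np.array_equal(got, ref)
```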
