Negative Binomial GLM in R and Python - Differences in coefficients

Negative Binomial GLM in R and Python - Differences in coefficients - python

I'm trying to implement a negative binomial regression model in R which was originally implemented in Python with statsmodels.
In some cases, the estimated coefficients are very similar:
Python code:
import statsmodels.api as sm
import numpy as np
response = [17, 18, 10, 9, 8, 5, 6, 5, 15351, 9637, 9981, 9306, 16752, 11993, 13622, 9800]
design = np.array(
[[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 0, 0],
[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 1, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 0, 0, 1],
[1, 0, 0, 1]])
theta = 0.00792199771866427
sm.GLM(response, design, family=sm.families.NegativeBinomial(alpha=theta)).fit().params
Which gives:
array([ 1.79175947, 0.97496014, -0.16402993, 7.68415156])
And the equivalent model in R:
library(MASS)
response = c(17, 18, 10, 9, 8, 5, 6, 5, 15351, 9637, 9981, 9306, 16752, 11993, 13622, 9800)
design = as.data.frame(matrix(c(1, 1, 1, 1, 1, 1, 1, 1,1 , 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,
1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 1, 1, 1, 1, 1, 1, 1, 1), ncol = 4, byrow = F))
theta = 0.00792199771866427
coef(glm(response ~. +0, design, family = negative.binomial(theta)))
Which gives:
V1 V2 V3 V4
1.7917595 0.9749601 -0.1640299 7.6841516
So for these two models, the estimated coefficients are very similar, down to the second decimal place. I am also fitting a reduced model dropping the second column of the model matrix, however, here the estimated coefficients are quite different between R and Python.
Python
sm.GLM(response, design[:, [0,2,3]], family=sm.families.NegativeBinomial(alpha=theta)).fit().params
array([ 2.32804838, -0.10095997, 7.11684136])
R
coef(glm(response ~. +0, design[, c(1,3,4)] , family = negative.binomial(theta)))
V1 V3 V4
2.0650897 0.3232355 7.1965690
Why does this occur? I have noticed this characteristic for a number of different feature sets.

Related

In a 2D Numpy Array find max streak of consecutive 1's

I have a 2d numpy array like so. I want to find the maximum consecutive streak of 1's for every row.
a = np.array([[1, 1, 1, 1, 1],
[1, 0, 1, 0, 1],
[1, 1, 0, 1, 0],
[0, 0, 0, 0, 0],
[1, 1, 1, 0, 1],
[1, 0, 0, 0, 0],
[0, 1, 1, 0, 0],
[1, 0, 1, 1, 0],
]
)
Desired Output: [5, 1, 2, 0, 3, 1, 2, 2]
I have found the solution to above for a 1D array:
a = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0])
d = np.diff(np.concatenate(([0], a, [0])))
np.max(np.flatnonzero(d == -1) - np.flatnonzero(d == 1))
> 4
On similar lines, I wrote the following but it doesn't work.
d = np.diff(np.column_stack(([0] * a.shape[0], a, [0] * a.shape[0])))
np.max(np.flatnonzero(d == -1) - np.flatnonzero(d == 1))

The 2D equivalent of you current code would be using pad, diff, where and maximum.reduceat:
# pad with a column of 0s on left/right
# and get the diff on axis=1
d = np.diff(np.pad(a, ((0,0), (1,1)), constant_values=0), axis=1)
# get row/col indices of -1
row, col = np.where(d==-1)
# get groups of rows
val, idx = np.unique(row, return_index=True)
# subtract col indices of -1/1 to get lengths
# use np.maximum.reduceat to get max length per group of rows
out = np.zeros(a.shape[0], dtype=int)
out[val] = np.maximum.reduceat(col-np.where(d==1)[1], idx)
Output: array([5, 1, 2, 0, 3, 1, 2, 2])
Intermediates:
np.pad(a, ((0,0), (1,1)), constant_values=0)
array([[0, 1, 1, 1, 1, 1, 0],
[0, 1, 0, 1, 0, 1, 0],
[0, 1, 1, 0, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 0, 1, 0],
[0, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 0, 0, 0],
[0, 1, 0, 1, 1, 0, 0]])
np.diff(np.pad(a, ((0,0), (1,1)), constant_values=0), axis=1)
array([[ 1, 0, 0, 0, 0, -1],
[ 1, -1, 1, -1, 1, -1],
[ 1, 0, -1, 1, -1, 0],
[ 0, 0, 0, 0, 0, 0],
[ 1, 0, 0, -1, 1, -1],
[ 1, -1, 0, 0, 0, 0],
[ 0, 1, 0, -1, 0, 0],
[ 1, -1, 1, 0, -1, 0]])
np.where(d==-1)
(array([0, 1, 1, 1, 2, 2, 4, 4, 5, 6, 7, 7]),
array([5, 1, 3, 5, 2, 4, 3, 5, 1, 3, 1, 4]))
col-np.where(d==1)[1]
array([5, 1, 1, 1, 2, 1, 3, 1, 1, 2, 1, 2])
np.unique(row, return_index=True)
(array([0, 1, 2, 4, 5, 6, 7]),
array([ 0, 1, 4, 6, 8, 9, 10]))
out = np.zeros(a.shape[0], dtype=int)
array([0, 0, 0, 0, 0, 0, 0, 0])
out[val] = np.maximum.reduceat(col-np.where(d==1)[1], idx)
array([5, 1, 2, 0, 3, 1, 2, 2])

Array with a single value cause error when used in Algorithm in Python

I am building a RandomForest Model
It works fine in some cases and crashes in others.
>>> type(z)
<class 'numpy.ndarray'>
>>> z
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 1, ..., 0, 0, 0],
...,
[0, 0, 1, ..., 1, 0, 0],
[0, 0, 1, ..., 1, 0, 1],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
I build the label list through this line of code
y = z[:, i]
i is an array that can have 1 or more values
i = [14,83,33]
or
i = [26]
Here is my code
i=[4,15,33]
y = z[:, i]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=TestSize, random_state=42)
mdl= RandomForestClassifier(n_estimators=500, n_jobs=-1)
mdl.fit(X_train, y_train)
y_hat = mdl.predict(X_test)
Here are the cases that I have
CASE 1
>>> i=[4,15,33]
>>> z[:, i]
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
...,
[0, 0, 0],
[0, 0, 0],
[0, 0, 1]], dtype=uint8)
works fine
CASE 2
here I replaced i with hardcoding inside z
>>> z[:, (4,15,33)]
array([[0, 0, 0],
[0, 0, 0],
[0, 0, 0],
...,
[0, 0, 0],
[0, 0, 0],
[0, 0, 1]], dtype=uint8)
CASE 3
i is a single value
>>> i=[26]
>>> z[:, i]
array([[1],
[0],
[1],
[0],
[1],
[1],
[1],
...
[1],
[0],
[0]], dtype=uint8)
and this is causing an error when running in RandomForest algorithm
>>> mdl.fit(X_train, y_train)
<stdin>:1: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples,), for example using ravel().
RandomForestClassifier(n_estimators=500, n_jobs=-1)
CASE 4
when I hardcode case 3
>>> z[:, (26)]
array([1, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0,
0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0,
1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 1,
1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 1,
0, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 1,
1, 0, 1, 1, 0, 1, 0, 1, 0, 0], dtype=uint8)
and RandomForest works fine.
My question,
Why doesn't CASE 3 work and how do I get it working without hardcoding?

If you are using scikit-learn, when the target has a single column you should pass a 1d array instead of a column vector (nx1 array) as stated in this link.
You can change it as z[:,i[0] if len(i)==1 else i]!

DBSCAN eps and min_samples

I have been trying to use DBSCAN in order to detect outliers, from my understanding DBSCAN outputs -1 as outlier and 1 as inliner, but after I ran the code, I'm getting numbers that are not -1 or 1, can someone please explain why? Also is it normal to find the best value of eps using trial and error, because I couldn't figure out a way to find the best possible eps value.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.cluster import DBSCAN
df = pd.read_csv('Final After Simple Filtering.csv',index_col=None,low_memory=True)
# Dropping columns with low feature importance
del df['AmbTemp_DegC']
del df['NacelleOrientation_Deg']
del df['MeasuredYawError']
#applying DBSCAN
DBSCAN = DBSCAN(eps = 1.8, min_samples =10,n_jobs=-1)
df['anomaly'] = DBSCAN.fit_predict(df)
np.unique(df['anomaly'],return_counts=True)
(array([ -1, 0, 1, ..., 8462, 8463, 8464]),
array([1737565, 3539278, 4455734, ..., 13, 8, 8]))
Thank you.

Well, you did not actually get the real idea of DBSCAN.
This is a copy from wikipedia:
A point p is a core point if at least minPts points are within
distance ε of it (including p).
A point q is directly reachable from p if point q is within distance ε
from core point p. Points are only said to be directly reachable from
core points.
A point q is reachable from p if there is a path p1, ..., pn with p1 =
p and pn = q, where each pi+1 is directly reachable from pi. Note that
this implies that all points on the path must be core points, with the
possible exception of q.
All points not reachable from any other point are outliers or noise
points.
So saying in easier words, The idea is that:
Any sample who has min_samples neigbours by the distance of epsilon is a core sample.
Any data sample which is not core, but has at least one core neighbor (with a distance less than eps), is a directly reachable sample and can be added to the cluster.
Any data sample which is not directly reachable nor a core, but has at least one directly reachable neighbor (with a distance less than eps) is a reachable sample and will be added to the cluster.
Any other examples are considered to be noise, outlier or whatever you want to name it.( and those will be labeled by -1)
Depending on the parameters of the clustering (eps and min_samples) , you are very likely to have more than two clusters. You see, that is the reason you are seeing other values than 0 and -1 in the result of your clustering.
To answer your second question
Also is it normal to find the best value of eps using trial and error,
If you mean doing cross-validation( over a set where you know the cluster labels or you can approximate the correct clustering), yes I think that is the normal way to do it
PS: The paper is very good and comprehensive. I highly suggest you have a look. Good luck.

sklearn.cluster.DBSCAN gives -1 for noise, which is an outlier, all the other values other than -1 is the cluster number or cluster group. To see the total number of clusters you can use the command DBSCAN.labels_
What is eps or Epsilon value used in DBScan?
Epsilon is the local radius for expanding clusters. Think of it as a step size - DBSCAN never takes a step larger than this, but by doing multiple steps DBSCAN clusters can become much larger than eps.
How to find the best eps value?
use any hyperparameter tuning method / package like GridSearchCV or Hyperopt. You can use any of the following indices mentioned here.

I have found this to be a really good example of getting to understand how DBSCAN works.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(n_samples=750, centers=centers, cluster_std=0.4,
random_state=0)
X = StandardScaler().fit_transform(X)
# #############################################################################
# Compute DBSCAN
db = DBSCAN(eps=0.3, min_samples=10).fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
print('Estimated number of clusters: %d' % n_clusters_)
print('Estimated number of noise points: %d' % n_noise_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print("Adjusted Rand Index: %0.3f"
% metrics.adjusted_rand_score(labels_true, labels))
print("Adjusted Mutual Information: %0.3f"
% metrics.adjusted_mutual_info_score(labels_true, labels))
print("Silhouette Coefficient: %0.3f"
% metrics.silhouette_score(X, labels))
# #############################################################################
# Plot result
import matplotlib.pyplot as plt
# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [plt.cm.Spectral(each)
for each in np.linspace(0, 1, len(unique_labels))]
for k, col in zip(unique_labels, colors):
if k == -1:
# Black used for noise.
col = [0, 0, 0, 1]
class_member_mask = (labels == k)
xy = X[class_member_mask & core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=14)
xy = X[class_member_mask & ~core_samples_mask]
plt.plot(xy[:, 0], xy[:, 1], 'o', markerfacecolor=tuple(col),
markeredgecolor='k', markersize=6)
plt.title('Estimated number of clusters: %d' % n_clusters_)
plt.show()
a = np.array(labels)
a
Result:
array([ 0, 1, 0, 2, 0, 1, 1, 2, 0, 0, 1, 1, 1, 2, 1, 0, -1,
1, 1, 2, 2, 2, 2, 2, 1, 1, 2, 0, 0, 2, 0, 1, 1, 0,
1, 0, 2, 0, 0, 2, 2, 1, 1, 1, 1, 1, 0, 2, 0, 1, 2,
2, 1, 1, 2, 2, 1, 0, 2, 1, 2, 2, 2, 2, 2, 0, 2, 2,
0, 0, 0, 2, 0, 0, 2, 1, -1, 1, 0, 2, 1, 1, 0, 0, 0,
0, 1, 2, 1, 2, 2, 0, 1, 0, 1, -1, 1, 1, 0, 0, 2, 1,
2, 0, 2, 2, 2, 2, -1, 0, -1, 1, 1, 1, 1, 0, 0, 1, 0,
1, 2, 1, 0, 0, 1, 2, 1, 0, 0, 2, 0, 2, 2, 2, 0, -1,
2, 2, 0, 1, 0, 2, 0, 0, 2, 2, -1, 2, 1, -1, 2, 1, 1,
2, 2, 2, 0, 1, 0, 1, 0, 1, 0, 2, 2, -1, 1, 2, 2, 1,
0, 1, 2, 2, 2, 1, 1, 2, 2, 0, 1, 2, 0, 0, 2, 0, 0,
1, 0, 1, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 1, 2, 2, 2,
2, 0, 2, 0, 2, 2, 0, 2, 2, 2, 0, 0, 1, 1, 1, 2, 2,
2, 2, 1, 2, 2, 0, 0, 2, 0, 0, 0, 1, 0, 1, 1, 1, 2,
1, 1, 0, 1, 2, 2, 1, 2, 2, 1, 0, 0, 1, 1, 1, 0, 1,
0, 2, 0, 2, 2, 2, 2, 2, 1, 1, 0, 0, 1, 1, 0, 0, 2,
1, -1, 2, 1, 1, 2, 1, 2, 0, 2, 2, 0, 1, 2, 2, 0, 2,
2, 0, 0, 2, 0, 2, 0, 2, 1, 0, 0, 0, 1, 2, 1, 2, 2,
0, 2, 2, 0, 0, 2, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0,
0, 1, 1, 1, 0, 2, 0, 1, 2, 2, 0, 0, 2, 0, 2, 1, 0,
2, 0, 2, 0, 2, 2, 0, 1, 0, 1, 0, 2, 2, 1, 1, 1, 2,
0, 2, 0, 2, 1, 2, 2, 0, 1, 0, 1, 0, 0, 0, 0, 2, 0,
2, 0, 1, 0, 1, 2, 1, 1, 1, 0, 1, 1, 0, 2, 1, 0, 2,
2, 1, 1, 2, 2, 2, 1, 2, 1, 2, 0, 2, 1, 2, 1, 0, 1,
0, 1, 1, 0, 1, 2, -1, 1, 0, 0, 2, 1, 2, 2, 2, 2, 1,
0, 0, 0, 0, 1, 0, 2, 1, 0, 1, 2, 0, 0, 1, 0, 1, 1,
0, -1, 0, 2, 2, 2, 1, 1, 2, 0, 1, 0, 0, 1, 0, 1, 1,
2, 2, -1, 0, 1, 2, 2, 1, 1, 1, 1, 0, 0, 0, 2, 2, 1,
2, 1, 0, 0, 1, 2, 1, 0, 0, 2, 0, 1, 0, 2, 1, 0, 2,
2, 1, 0, 0, 0, 2, 1, 1, 0, 2, 0, 0, 1, 1, 1, 1, 0,
1, 0, 1, 0, 0, 2, 0, 1, 1, 2, 1, 1, 0, 1, 0, 2, 1,
0, 0, 1, 0, 1, 1, 2, 2, 1, 2, 2, 1, 2, 1, 1, 1, 1,
2, 0, 0, 0, 1, 2, 2, 0, 2, 0, 2, 1, 0, 1, 1, 0, 0,
1, 2, 1, 2, 2, 0, 2, 1, 1, 1, 2, 0, 0, 2, 0, 2, 2,
0, 2, 0, 1, 1, 1, 1, 0, 0, 0, 2, 1, 1, 1, 1, 2, 2,
2, 0, 2, 1, 1, 0, 0, 1, 0, 2, 1, 2, 1, 0, 2, 2, 0,
0, 1, 0, 0, 2, 0, 0, 0, 2, 0, 2, 0, 0, 1, 1, 0, 0,
1, 2, 2, 0, 0, 0, 0, 2, -1, 1, 1, 2, 1, 0, 0, 2, 2,
0, 1, 2, 0, 1, 2, 2, 1, 0, 0, -1, -1, 2, 0, 0, 0, 2,
-1, 2, 0, 1, 1, 1, 1, 1, 0, 0, 2, 1, 2, 0, 1, 1, 1,
0, 2, 1, 1, -1, 2, 1, 2, 0, 2, 2, 1, 0, 0, 0, 1, 1,
2, 0, 0, 2, 2, 1, 2, 2, 2, 0, 2, 1, 2, 1, 1, 1, 2,
0, 2, 0, 2, 2, 0, 0, 2, 1, 2, 0, 2, 0, 0, 0, 1, 0,
2, 1, 2, 0, 1, 0, 0, 2, 0, 2, 1, 1, 2, 1, 0, 1, 2,
1, 2], dtype=int64)
Those -1 data points are outliers. Let's count the number of outliers and see if it matches what we see in the image above.
list(a)
b = a.tolist()
count = b.count(-1)
count
Result:
18
We got 18! Perfect!!
Relevant Link:
https://scikit-learn.org/stable/auto_examples/cluster/plot_dbscan.html#sphx-glr-auto-examples-cluster-plot-dbscan-py

alternating values in numpy

Trying to make my code more efficient and readable and i'm stuck. Assume I want to build something like a chess board, with alternating black and white colors on an 8x8 grid. So, using numpy, I have done this:
import numpy as np
board = np.zeros((8,8), np.int32)
for ri in range(8):
for ci in range(8):
if (ci + ri) % 2 == 0:
board[ri,ci] = 1
Which nicely outputs:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]], dtype=int32)
That I can then parse as white squares or black squares. However, in practice my array is much larger, and this way is very inefficient and unreadable. I assumed numpy already has this figured out, so I tried this:
board = np.zeros(64, np.int32)
board[::2] = 1
board = board.reshape(8,8)
But that output is wrong, and looks like this:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0],
[1, 0, 1, 0, 1, 0, 1, 0]], dtype=int32)
Is there a better way to achieve what I want that works efficiently (and preferably, is readable)?
Note: i'm not attached to 1's and 0's, this can easily be done with other types of values, even True/False or strings of 2 kinds, as long as it works

Here's one approach using slicing with proper starts and stepsize of 2 in two steps -
board = np.zeros((8,8), np.int32)
board[::2,::2] = 1
board[1::2,1::2] = 1
Sample run -
In [229]: board = np.zeros((8,8), np.int32)
...: board[::2,::2] = 1
...: board[1::2,1::2] = 1
...:
In [230]: board
Out[230]:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]], dtype=int32)
Other tricky ways -
1) Broadcasted comparison :
In [254]: r = np.arange(8)%2
In [255]: (r[:,None] == r)*1
Out[255]:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])
2) Broadcasted addition :
In [279]: r = np.arange(8)
In [280]: 1-(r[:,None] + r)%2
Out[280]:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])

Just found out an alternative answer by myself, so posting it here for future reference to anyone who's interested:
a = np.array([[1,0],[0,1]])
b = np.tile(a, (4,4))
Results:
array([[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1],
[1, 0, 1, 0, 1, 0, 1, 0],
[0, 1, 0, 1, 0, 1, 0, 1]])

I think the following is also a good way of doing it for a variable input
import sys
lines = sys.stdin.readlines()
n = int(lines[0])
import numpy as np
a = np.array([[1,0], [0,1]],dtype=np.int)
outputData= np.tile(a,(n//2,n//2))
print(outputData)

You can achieve this for single even input number n
import numpy as np
i = np.eye(2)
i = i[::-1]
k = np.array(i, dtype = np.int)
print(np.tile(k,(n//2,n//2)))

I tried and found this to be shorter one for any giver number:
n = int(input())
import numpy as np
c = np.array([[0,1], [1, 0]])
print(np.tile(c, reps=(n//2, n//2)))

Modify every two largest elements of matrix rows and columns

In python, I have a matrix and I want to find the two largest elements in every row and every column and change their values to 1 (seperately, I mean get two matrices where one of them modified the rows and the other modified the cols).
The main goal is to get a corresponding matrix with zeros everywhere except those ones I've put in the 2 largest element of each row and column (using np.where(mat == 1, 1, 0).
I'm trying to use the np.argpartition in order to do so but without success.
Please help.
See image below.

Here's an approach with np.argpartition -
idx_row = np.argpartition(-a,2,axis=1)[:,:2]
out_row = np.zeros(a.shape,dtype=int)
out_row[np.arange(idx_row.shape[0])[:,None],idx_row] = 1
idx_col = np.argpartition(-a,2,axis=0)[:2]
out_col = np.zeros(a.shape,dtype=int)
out_col[idx_col,np.arange(idx_col.shape[1])] = 1
Sample input, output -
In [40]: a
Out[40]:
array([[ 3, 7, 1, -5, 14, 2, 8],
[ 5, 8, 1, 4, -3, 3, 10],
[11, 3, 5, 1, 9, 2, 5],
[ 6, 4, 12, 6, 1, 15, 4],
[ 8, 2, 0, 1, -2, 3, 5]])
In [41]: out_row
Out[41]:
array([[0, 0, 0, 0, 1, 0, 1],
[0, 1, 0, 0, 0, 0, 1],
[1, 0, 0, 0, 1, 0, 0],
[0, 0, 1, 0, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 1]])
In [42]: out_col
Out[42]:
array([[0, 1, 0, 0, 1, 0, 1],
[0, 1, 0, 1, 0, 1, 1],
[1, 0, 1, 0, 1, 0, 0],
[0, 0, 1, 1, 0, 1, 0],
[1, 0, 0, 0, 0, 0, 0]])
Alternatively, if you are into compact codes, we can skip the initialization and use broadcasting to get the outputs from idx_row and idx_col directly, like so -
out_row = (idx_row[...,None] == np.arange(a.shape[1])).any(1).astype(int)
out_col = (idx_col[...,None] == np.arange(a.shape[0])).any(0).astype(int).T

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.