NumPy indexing problem: with advanced indexing, what is X[0] doing here? - python

import numpy as np
X = np.array([[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1], [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])
counts = {}
print(X[y == 0])
# prints = [[0 1 0 1]
# [0 0 0 1]]
I want to know why X[y == 0] prints two data points. Shouldn't it print only [0 1 0 1], because that is X[0]?

y == 0 gives a boolean array with the same shape as y, with True where the corresponding element of y is 0 and False otherwise.
Here, y equals 0 at indices 0 and 2, so X[y == 0] gives you an array containing X[0] and X[2].
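To make the difference between a boolean mask and plain integer indexing concrete, here is a minimal sketch that reuses X and y from the question. The names mask and rows are introduced here for illustration only.

```python
import numpy as np

X = np.array([[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 0, 1], [1, 0, 1, 0]])
y = np.array([0, 1, 0, 1])

mask = y == 0                 # boolean array, one entry per row of X
rows = np.flatnonzero(mask)   # integer positions where the mask is True
print(rows)                   # [0 2]

# Boolean-mask indexing keeps every row whose mask entry is True, which is
# the same as integer-array indexing with those positions.
print(np.array_equal(X[mask], X[rows]))   # True

# X[0], by contrast, is plain integer indexing and returns only the first row.
print(X[0])                   # [0 1 0 1]
```

So the 0 in y == 0 is a value being compared against, not a row index.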

Related

How to change diagonal elements in a matrix from 1 to 0, 0 to 1

Could someone please help me flip the elements on the diagonal of the matrix rmat, changing each 1 to 0 and each 0 to 1?
mat = np.random.binomial(1,.5,4)
rmat = np.array([mat,]*4)
Thank you
You can use numpy.fill_diagonal.
NB. the operation is in place
diagonal = rmat.diagonal()
np.fill_diagonal(rmat, 1-diagonal)
input:
array([[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0],
[1, 1, 1, 0]])
output:
array([[0, 1, 1, 0],
[1, 0, 1, 0],
[1, 1, 0, 0],
[1, 1, 1, 1]])
Try this -
Unlike np.fill_diagonal, this method is not in place, so it doesn't need an explicit copy of the input matrix rmat.
n = rmat.shape[0]
output = np.where(np.eye(n, dtype=bool), np.logical_not(rmat), rmat)
output
#Original
[[0 1 0 0]
[0 1 0 0]
[0 1 0 0]
[0 1 0 0]]
#diagonal inverted
[[1 1 0 0]
[0 0 0 0]
[0 1 1 0]
[0 1 0 1]]
Another way to do this would be to use np.diag_indices along with np.logical_not
n = rmat.shape[0]
idx = np.diag_indices(n)
rmat[idx] = np.logical_not(rmat[idx])
print(rmat)
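As a quick cross-check, the following sketch runs the three approaches on one fixed input and confirms that they agree. It assumes rmat holds only 0s and 1s, and it uses 1 - value instead of np.logical_not so that every result stays an integer array; the variable names a, b, c and expected are illustrative.

```python
import numpy as np

rmat = np.array([[1, 1, 1, 0]] * 4)
n = rmat.shape[0]

# 1) np.fill_diagonal works in place, so operate on a copy
a = rmat.copy()
np.fill_diagonal(a, 1 - a.diagonal())

# 2) np.where builds a new array and leaves rmat untouched
b = np.where(np.eye(n, dtype=bool), 1 - rmat, rmat)

# 3) np.diag_indices also assigns in place, so again use a copy
c = rmat.copy()
idx = np.diag_indices(n)
c[idx] = 1 - c[idx]

expected = np.array([[0, 1, 1, 0],
                     [1, 0, 1, 0],
                     [1, 1, 0, 0],
                     [1, 1, 1, 1]])
print(np.array_equal(a, expected), np.array_equal(b, expected), np.array_equal(c, expected))
```

Which one to prefer mostly depends on whether the original matrix has to be kept.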

Python: adding index as new column to 2D array

Suppose I have a NumPy array like the one below:
dat = array([[ 0, 1, 0],
[ 1, 0, 0],
[0, 0, 1]]
)
What I want to do is add (row index + 1) as a new column to this array, like this:
newdat = array([[ 0, 1, 0, 1],
[ 1, 0, 0, 2],
[0, 0, 1, 3]]
)
How can I achieve this?
You can also use np.append(). Indexing with [..., None] appends a new axis of length 1, which turns the 1-D range into an (n, 1) column.
import numpy as np
dat = np.array([
[0, 1, 0],
[1, 0, 0],
[0, 0, 1]
])
a = np.array(range(1,4))[...,None] # None adds a trailing axis, giving shape (n, 1)
dat = np.append(dat, a, 1)
print (dat)
The output of this will be:
[[0 1 0 1]
[1 0 0 2]
[0 0 1 3]]
Or you can use hstack()
a = np.array(range(1,4))[...,None] # None adds a trailing axis, giving shape (n, 1)
dat = np.hstack((dat, a))
And as hpaulj mentioned, np.concatenate is the way to go; see the np.concatenate documentation and the further examples on Stack Overflow.
dat = np.concatenate([dat, a], 1)
Use numpy.column_stack:
newdat = np.column_stack([dat, range(1,dat.shape[0] + 1)])
print(newdat)
#[[0 1 0 1]
# [1 0 0 2]
# [0 0 1 3]]
Try something like this using numpy.insert():
import numpy as np
dat = np.array([
[0, 1, 0],
[1, 0, 0],
[0, 0, 1]
])
dat = np.insert(dat, 3, values=[range(1, 4)], axis=1)
print(dat)
Output:
[[0 1 0 1]
[1 0 0 2]
[0 0 1 3]]
More generally, you can make use of numpy.ndarray.shape for the appropriate sizing:
dat = np.insert(dat, dat.shape[1], values=[range(1, dat.shape[0] + 1)], axis=1)
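The answers above can be wrapped into a small reusable function. This is only a sketch: the helper name append_row_numbers is hypothetical, and it assumes a 2D input whose dtype can hold the row numbers.

```python
import numpy as np

def append_row_numbers(arr):
    """Return a copy of the 2D array with a 1-based row-number column appended."""
    row_numbers = np.arange(1, arr.shape[0] + 1)
    # column_stack treats the 1-D row_numbers as a single new column
    return np.column_stack([arr, row_numbers])

dat = np.array([[0, 1, 0],
                [1, 0, 0],
                [0, 0, 1]])
newdat = append_row_numbers(dat)
print(newdat)
# [[0 1 0 1]
#  [1 0 0 2]
#  [0 0 1 3]]

# The same result with np.concatenate and an explicit (n, 1) column
col = np.arange(1, dat.shape[0] + 1)[:, None]
same = np.concatenate([dat, col], axis=1)
```

Both variants return a new array and leave dat unchanged.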

How to create a numpy array from itertools.combinations without looping

Is there a way to get this result without a loop? I've made a couple attempts at fancy indexing with W[range(W.shape[0]),... but have been so far unsuccessful.
import itertools
import numpy as np
n = 4
ct = 2
one_index_tuples = list(itertools.combinations(range(n), r=ct))
W = np.zeros((len(one_index_tuples), n), dtype='int')
for row_index, col_index in enumerate(one_index_tuples):
    W[row_index, col_index] = 1
print(W)
Result:
[[1 1 0 0]
[1 0 1 0]
[1 0 0 1]
[0 1 1 0]
[0 1 0 1]
[0 0 1 1]]
You can use fancy indexing (advanced indexing) as follows:
# reshape the row index to 2d since your column index is also 2d so that the row index and
# column index will broadcast properly
W[np.arange(len(one_index_tuples))[:, None], one_index_tuples] = 1
W
#array([[1, 1, 0, 0],
# [1, 0, 1, 0],
# [1, 0, 0, 1],
# [0, 1, 1, 0],
# [0, 1, 0, 1],
# [0, 0, 1, 1]])
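For reference, here is a self-contained version of the fancy-indexing approach together with a check against the loop from the question. The names cols, rows and W_loop are introduced here for illustration.

```python
import itertools
import numpy as np

n, ct = 4, 2
cols = np.array(list(itertools.combinations(range(n), ct)))  # shape (6, 2)
rows = np.arange(len(cols))[:, None]                          # shape (6, 1)

W = np.zeros((len(cols), n), dtype=int)
# rows (6, 1) broadcasts against cols (6, 2), so each row index is paired
# with both of its column indices.
W[rows, cols] = 1

# Reference result built with an explicit loop, for comparison only
W_loop = np.zeros_like(W)
for r, c in enumerate(itertools.combinations(range(n), ct)):
    W_loop[r, list(c)] = 1

print(np.array_equal(W, W_loop))  # True
```

Note that building cols still iterates over the combinations once; only the assignment into W is vectorised.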
Try this:
[[ 1 if i in x else 0 for i in range(n) ] for x in itertools.combinations( range(n), ct )]

Select two rows from bit array based on int array python

I have two arrays: one of integers and one of bits.
s = [[1]
     [4]
     [9]
     [0]
     [3]]
x = [[1 0 0 0 0]
     [1 1 1 1 0]
     [0 1 1 1 0]
     [0 0 1 0 0]
     [0 1 1 0 0]]
I want to find the two smallest elements in s (randomly generated), then select and print the corresponding two rows of x (also randomly generated).
For example, the two smallest elements of s are s[3] = 0 and s[0] = 1, so I want to select x[3] = [0 0 1 0 0] and x[0] = [1 0 0 0 0].
import sys
import numpy as np
np.set_printoptions(threshold=sys.maxsize)  # print arrays in full; current NumPy rejects threshold=np.nan
s= np.random.randint(5, size=(5))
x= np.random.randint (2, size=(5, 5))
print (s)
print (x)
I tried my best using the "for loop" but no luck, any advice will be appreciated.
You can use numpy.argpartition to find out the index of the two smallest elements from s and use it as row index to subset x:
s
# array([3, 0, 0, 1, 2])
x
# array([[1, 0, 0, 0, 1],
# [1, 0, 1, 1, 1],
# [0, 0, 1, 0, 0],
# [1, 0, 0, 1, 1],
# [0, 0, 1, 0, 1]])
x[s.argpartition(2)[:2], :]
# array([[1, 0, 1, 1, 1],
# [0, 0, 1, 0, 0]])
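One detail worth knowing: np.argpartition only guarantees that the first k positions hold the indices of the k smallest values, not that they are in sorted order. The sketch below reuses s and x from this answer and shows a stable np.argsort as an alternative when the order matters; the names k, idx_part and idx_sorted are illustrative.

```python
import numpy as np

s = np.array([3, 0, 0, 1, 2])
x = np.array([[1, 0, 0, 0, 1],
              [1, 0, 1, 1, 1],
              [0, 0, 1, 0, 0],
              [1, 0, 0, 1, 1],
              [0, 0, 1, 0, 1]])
k = 2

# Indices of the k smallest values, in no guaranteed order
idx_part = np.argpartition(s, k - 1)[:k]

# Indices of the k smallest values, smallest first; the stable sort keeps
# tied values in their original order
idx_sorted = np.argsort(s, kind="stable")[:k]
print(idx_sorted)        # [1 2]
print(x[idx_sorted])
# [[1 0 1 1 1]
#  [0 0 1 0 0]]
```

For small arrays the full sort costs little; argpartition pays off when s is large and k is small.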

Centralising data in numpy

I have matrices whose rows need to be centralised. Each row has zeros padding both ends, with the actual data in between. I need the number of padding zeros to be equal at both ends; in other words, what I call the data (the values between the leading and trailing zeros) should be centred in the middle of the row. Here is an example:
array:
[[0, 1, 2, 0, 2, 1, 0, 0, 0],
[2, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 2, 0, 0, 0]]
centred_array:
[[0, 0, 1, 2, 0, 2, 1, 0, 0],
[0, 0, 0, 2, 1, 1, 0, 0, 0],
[0, 0, 1, 0, 0, 2, 0, 0, 0]]
I hope that explains it well enough to show some of the issues I am having. First, the size of the "data" can be even or odd, so when there is no exact middle the function needs to pick a centre consistently; the same applies to the rows themselves (a row might have an even length, in which case one position has to be chosen as the centre).
EDIT: I should probably note that I already have a function that does this; it's just that I can get on the order of 10^3 rows to centralise and my function is too slow, so efficiency would really help.
#HYRY
a = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
              [2, 1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 2, 0, 0, 0]])
cd = []
(x, y) = np.shape(a)
for row in a:
    trim = np.trim_zeros(row)
    to_add = y - np.size(trim)
    left = to_add // 2        # integer division, needed under Python 3
    right = to_add - left     # any odd leftover zero goes on the right
    cd.append(np.pad(trim, (left, right), 'constant', constant_values=(0, 0)).tolist())
result = np.array(cd)
print(result)
[[0 0 1 2 0 2 1 0 0]
[0 0 0 2 1 1 0 0 0]
[0 0 1 0 0 2 0 0 0]]
import numpy as np
def centralise(arr):
    # Find the x and y indexes of the nonzero elements:
    x, y = arr.nonzero()
    # Find the index of the left-most and right-most elements for each row:
    nonzeros = np.bincount(x)
    nonzeros_idx = nonzeros.cumsum()
    left = y[np.r_[0, nonzeros_idx[:-1]]]
    right = y[nonzeros_idx-1]
    # Calculate how much each y has to be shifted
    shift = ((arr.shape[1] - (right-left) - 0.5)//2 - left).astype(int)
    shift = np.repeat(shift, nonzeros)
    new_y = y + shift
    # Create centered_arr
    centered_arr = np.zeros_like(arr)
    centered_arr[x, new_y] = arr[x, y]
    return centered_arr
arr = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
                [2, 1, 1, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 2, 0, 0, 0]])
print(centralise(arr))
yields
[[0 0 1 2 0 2 1 0 0]
[0 0 0 2 1 1 0 0 0]
[0 0 1 0 0 2 0 0 0]]
A benchmark comparing the original code to centralise:
def orig(a):
    cd = []
    (x, y) = np.shape(a)
    for row in a:
        trim = np.trim_zeros(row)
        to_add = y - np.size(trim)
        left = to_add // 2        # integer division, needed under Python 3
        right = to_add - left
        cd.append(np.pad(trim, (left, right), 'constant', constant_values=(0, 0)).tolist())
    result = np.array(cd)
    return result
In [481]: arr = np.tile(arr, (1000, 1))
In [482]: %timeit orig(arr)
10 loops, best of 3: 140 ms per loop
In [483]: %timeit centralise(arr)
1000 loops, best of 3: 537 µs per loop
In [486]: (orig(arr) == centralise(arr)).all()
Out[486]: True
If you only have 10^3 rows in your array, you can probably afford a python loop if you'd like a more explicit solution:
import numpy as np
a = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
              [2, 1, 1, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 0, 0, 2, 0, 0, 0]])
for i, r in enumerate(a):
    w = np.where(r!=0)[0]
    nend = len(r) - w[-1] - 1
    nstart = w[0]
    shift = (nend - nstart)//2
    a[i] = np.roll(r, shift)
print(a)
gives:
[[0 0 1 2 0 2 1 0 0]
[0 0 0 2 1 1 0 0 0]
[0 0 1 0 0 2 0 0 0]]
A solution using np.apply_along_axis:
import numpy as np
def centerRow(a):
    i = np.nonzero(a != 0)   # "<>" is Python 2 only; use "!="
    ifirst = i[0][0]
    ilast = i[0][-1]
    count = ilast-ifirst+1
    padleft = (np.size(a) - count) // 2   # integer division, needed under Python 3
    padright = np.size(a) - padleft - count
    b = np.r_[np.repeat(0, padleft), a[ifirst:ilast+1], np.repeat(0, padright)]
    return b
arr = np.array(
    [[0, 1, 2, 0, 2, 1, 0, 0, 0],
     [2, 1, 1, 0, 0, 0, 0, 0, 0],
     [0, 0, 1, 0, 0, 2, 0, 0, 0]]
)
barr = np.apply_along_axis(centerRow, 1, arr)
print(barr)
Algorithm:
find positions of non-zero values on the row of length n
find the difference, d, between 1st and the last non-zero element
store meaningful vector, x, in the row given by length d
find the mid-point of d, d_m, if it is even, get the right element
find the mid-point of row length, n_m, if it is even, pick the right
subtract d_m from n_m and place x starting at this position in the row of zeros of length n
repeat for all rows
Quick Octave Prototype (Will Soon post Python version):
mat = [[0, 1, 2, 0, 2, 1, 0, 0, 0],
[2, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 0, 0, 2, 0, 0, 0]];
newMat = zeros(size(mat)); %new matrix to be filled
n = size(mat, 2);
for i = 1:size(mat,1)
newRow = newMat(i,:);
nonZeros = find(mat(i,:));
x = mat(i, nonZeros(1):nonZeros(end));
d = nonZeros(end)- nonZeros(1);
d_m = ceil(d/2);
n_m = ceil(n/2);
newRow(n_m-d_m:n_m-d_m+d) = x;
newMat(i,:) = newRow;
end
newMat
> [[0 0 1 2 0 2 1 0 0]
[0 0 0 2 1 1 0 0 0]
[0 0 1 0 0 2 0 0 0]]
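A possible Python rendering of the same idea is sketched below. It is not a line-by-line port of the Octave prototype: instead of the ceil-based midpoints it computes the start position as (row length - data length) // 2, which is an assumption made here to keep the 0-based indexing simple, and it leaves all-zero rows untouched. The function name centre_rows is hypothetical.

```python
import numpy as np

def centre_rows(mat):
    """Centre the span between the first and last nonzero entry of each row."""
    mat = np.asarray(mat)
    n = mat.shape[1]
    out = np.zeros_like(mat)
    for i, row in enumerate(mat):
        nz = np.flatnonzero(row)          # positions of nonzero values
        if nz.size == 0:
            continue                      # all-zero row: nothing to centre
        x = row[nz[0]:nz[-1] + 1]         # the meaningful part of the row
        start = (n - x.size) // 2         # any odd leftover zero goes on the right
        out[i, start:start + x.size] = x
    return out

mat = np.array([[0, 1, 2, 0, 2, 1, 0, 0, 0],
                [2, 1, 1, 0, 0, 0, 0, 0, 0],
                [0, 0, 1, 0, 0, 2, 0, 0, 0]])
print(centre_rows(mat))
# [[0 0 1 2 0 2 1 0 0]
#  [0 0 0 2 1 1 0 0 0]
#  [0 0 1 0 0 2 0 0 0]]
```

Like the other loop-based answers, this iterates row by row in Python, so it favours readability over speed.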
