I have a numpy array like this:
array([[ 3., 2., 3., ..., 0., 0., 0.],
[ 3., 2., -4., ..., 0., 0., 0.],
[ 3., -4., 1., ..., 0., 0., 0.],
...,
[-1., -2., 4., ..., 0., 0., 0.],
[ 4., -2., -2., ..., 0., 0., 0.],
[-2., 2., 4., ..., 0., 0., 0.]], dtype=float32)
What I want to do is remove all rows that do not sum to zero, while also saving those rows' indices/positions so that I can remove the same rows from another array.
I'm trying the following:
for i in range(len(arr1)):
    count = 0
    for j in arr1[i]:
        count += j
    if count != 0:
        arr_1 = np.delete(arr1, i, axis=0)
        arr_2 = np.delete(arr2, i, axis=0)
the resulting arr_1 and arr_2 still contain rows that do not sum to zero. What am I doing wrong?
You can compute the row sums and then keep the rows whose sum == 0, like below:
a=np.array([
[ 3., 2., 3., 0., 0., 0.],
[ 3., 2., -4., 0., 0., 0.],
[ 3., -4., 1., 0., 0., 0.]])
b = a.sum(axis=1)
# array([8., 1., 0.])
print(a[b==0])
Output:
array([[ 3., -4., 1., 0., 0., 0.]])
Just use sum(axis=1):
mask = a.sum(axis=1) != 0
do_sum_to_0 = a[~mask]
dont_sum_to_0 = a[mask]
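Since the question also asks for the indices of the removed rows so the same rows can be dropped from a second array, here is a minimal sketch of that, assuming `arr2` is a companion array with the same number of rows as `arr1`:

```python
import numpy as np

arr1 = np.array([[3., 2., 3., 0.],
                 [3., 2., -4., 0.],
                 [3., -4., 1., 0.]], dtype=np.float32)
arr2 = np.arange(12.).reshape(3, 4)   # companion array with matching row count

mask = arr1.sum(axis=1) == 0          # True for rows that sum to zero
bad_idx = np.flatnonzero(~mask)       # indices of the rows being removed

arr_1 = arr1[mask]                    # keep only the zero-sum rows
arr_2 = arr2[mask]                    # drop the same rows from the companion array
```

Applying one boolean mask to both arrays also avoids the bug in the original loop, where each `np.delete` call starts again from the unmodified input.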
I have an array that is grouped and looks like this:
import numpy as np
y = np.array(
[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.]]
)
n_repeats = 4
The array contains three groups, here marked as 0, 1, and 2. Every group appears n_repeats times. Here n_repeats=4. Currently I do the following to compute the mean and variance of chunks of that array:
mean = np.array([np.mean(y[i: i+n_repeats], axis=0) for i in range(0, len(y), n_repeats)])
var = np.array([np.var(y[i: i+n_repeats], axis=0) for i in range(0, len(y), n_repeats)])
Is there a better and faster way to achieve this?
Yes, reshape and then use .mean and .var along the appropriate dimension:
>>> y.reshape(-1, 4, 6)
array([[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]],
[[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.]],
[[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.]]])
>>> y.reshape(-1, 4, 6).mean(axis=1)
array([[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.]])
>>> y.reshape(-1, 4, 6).var(axis=1)
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
In case you do not know the number of groups or the number of repeats, you can try:
>>> np.vstack([y[y == i].reshape(-1,y.shape[1]).mean(axis=0) for i in np.unique(y)])
array([[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.]])
>>> np.vstack([y[y == i].reshape(-1,y.shape[1]).var(axis=0) for i in np.unique(y)])
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
What's the most Pythonic way of writing a function that returns an n×n boundary mask for convolution? E.g., for 3×3 it will return [[1,1,1],[1,0,1],[1,1,1]]; for 5×5 it will return [[1,1,1,1,1],[1,0,0,0,1],[1,0,0,0,1],[1,0,0,0,1],[1,1,1,1,1]]; and so on.
This works (but isn't very Pythonic):
def boundaryMask(size):
    mask = np.zeros((size, size))
    for i in range(size):
        mask[0][i] = 1
        mask[i][0] = 1
        mask[i][size-1] = 1
        mask[size-1][i] = 1
    return mask
One option would be to create an array of ones, and then assign zeros to the center of the array using slicing:
N = 4
x = np.ones((N, N))
x[1:-1, 1:-1] = 0
x
#array([[ 1., 1., 1., 1.],
# [ 1., 0., 0., 1.],
# [ 1., 0., 0., 1.],
# [ 1., 1., 1., 1.]])
Put in a function and test on various sizes:
def boundaryMask(size):
    mask = np.ones((size, size))
    mask[1:-1, 1:-1] = 0
    return mask
boundaryMask(1)
# array([[ 1.]])
boundaryMask(2)
#array([[ 1., 1.],
# [ 1., 1.]])
boundaryMask(3)
#array([[ 1., 1., 1.],
# [ 1., 0., 1.],
# [ 1., 1., 1.]])
boundaryMask(4)
#array([[ 1., 1., 1., 1.],
# [ 1., 0., 0., 1.],
# [ 1., 0., 0., 1.],
# [ 1., 1., 1., 1.]])
I have a multidimensional array test[:,:,:], and I would like to get averaged values along the test.shape[0] dimension for every 4 "frames". I would like to keep the same dimensions of my array and substitute each group of 4 values with its mean value.
As example:
test=np.array([[[ 2., 1., 1.],
[ 1., 1., 1.]],
[[ 3., 1., 1.],
[ 1., 1., 1.]],
[[ 3., 1., 1.],
[ 1., 1., 1.]],
[[ 5., 1., 1.],
[ 1., 1., 1.]],
[[ 2., 1., 1.],
[ 1., 1., 1.]],
[[ 3., 1., 1.],
[ 1., 1., 1.]],
[[ 3., 1., 1.],
[ 1., 1., 1.]],
[[ 5., 1., 1.],
[ 1., 1., 1.]],
[[ 2., 1., 1.],
[ 1., 1., 1.]]])
for i in range(test.shape[0]-1, 4):
    test_mean = (test[i,:,:] + test[i+1,:,:] + test[i+2,:,:] + test[i+3,:,:]) / 4.
But I don't keep the same dimensions... What is the best way to do that?
You are overwriting test_mean every time. A good start is:
test_mean = np.zeros_like(test)
for i in range(test.shape[0] - 4):
    test_mean[i] = test[i:i+4].mean(axis=0)
Here is a more efficient implementation from scipy:
from scipy.ndimage import uniform_filter1d
test_mean2 = uniform_filter1d(test, 4, axis=0)
Check the documentation to understand how the result is stored and what options you have to treat boundary values.
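Another pure-NumPy sketch, assuming NumPy 1.20 or newer, uses `sliding_window_view`. Note that unlike the `zeros_like` loop above it returns only the full windows, so the first axis is shorter than the input:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

test = np.arange(9 * 2 * 3, dtype=float).reshape(9, 2, 3)  # illustrative data

# every length-4 window along axis 0; the window lands on a new last axis
windows = sliding_window_view(test, 4, axis=0)   # shape (6, 2, 3, 4)
rolling_mean = windows.mean(axis=-1)             # shape (6, 2, 3)
```

This is a view, so no copies are made until the `mean` reduction runs.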
Let's say I have a matrix
x = np.array([[ 0., 0., 0.],
[ 0., 0., 1.],
[ 0., 1., 0.],
[ 0., 1., 1.],
[ 1., 0., 0.],
[ 1., 0., 1.],
[ 1., 1., 0.],
[ 1., 1., 1.]])
I want to get
array([[ 0., 0., 0.],
[ 0., 0., 2.],
[ 0., 3., 0.],
[ 0., 4., 4.],
[ 5., 0., 0.],
[ 6., 0., 6.],
[ 7., 7., 0.],
[ 8., 8., 8.]])
How do I write this as a one-line expression using x and range(1, 9)? And what is the code for the same operation on columns?
x * np.arange(1, 9).reshape(-1, 1)
or
x * np.arange(1, 9)[:, np.newaxis]
Both forms turn np.arange(1, 9) into a column vector, which broadcasts along the rows of x.
"The same operation for columns" is just the transpose of the above, i.e. skip the reshape step:
x * np.arange(1, 4)
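A quick sanity check of both broadcasting directions; the bit-twiddling line is just a compact way to rebuild the 8×3 binary matrix from the question:

```python
import numpy as np

# rebuild the 8x3 matrix of 3-bit patterns from the question
x = ((np.arange(8)[:, None] >> np.arange(2, -1, -1)) & 1).astype(float)

rows = x * np.arange(1, 9)[:, None]   # (8,1) weights: row i scaled by i + 1
cols = x * np.arange(1, 4)            # (3,) weights: column j scaled by j + 1
```

The (8, 1) column vector pairs with axis 0 of x, while the plain (3,) vector pairs with axis 1, per NumPy's broadcasting rules.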