Related
I have a numpy array like this:
array([[ 3., 2., 3., ..., 0., 0., 0.],
[ 3., 2., -4., ..., 0., 0., 0.],
[ 3., -4., 1., ..., 0., 0., 0.],
...,
[-1., -2., 4., ..., 0., 0., 0.],
[ 4., -2., -2., ..., 0., 0., 0.],
[-2., 2., 4., ..., 0., 0., 0.]], dtype=float32)
what I want to do is removing all the rows that do not sum to zero and remove them, while also saving such rows indexes/positions in order to eliminate them to another array.
I'm trying the following:
for i in range(len(arr1)):
count=0
for j in arr1[i]:
count+=j
if count != 0:
arr_1 = np.delete(arr1,i,axis=0)
arr_2 = np.delete(arr2,i,axis=0)
the resulting arr_1 and arr_2 still contain rows that do not sum to zero. What am I doing wrong?
You can compute sum then keep row that have sum == 0 like below:
a=np.array([
[ 3., 2., 3., 0., 0., 0.],
[ 3., 2., -4., 0., 0., 0.],
[ 3., -4., 1., 0., 0., 0.]])
b = a.sum(axis=1)
# array([8., 1., 0.])
print(a[b==0])
Output:
array([[ 3., -4., 1., 0., 0., 0.]])
Just use sum(axis=1):
mask = a.sum(axis=1) != 0
do_sum_to_0 = a[~mask]
dont_sum_to_0 = a[mask]
I have a numpy array:
arr=np.array([[1., 2., 0.],
[2., 4., 1.],
[1., 3., 2.],
[-1., -2., 4.],
[-1., -2., 5.],
[1., 2., 6.]])
I want to flip the second half of this array upward. I mean I want to have:
flipped_arr=np.array([[-1., -2., 4.],
[-1., -2., 5.],
[1., 2., 6.],
[1., 2., 0.],
[2., 4., 1.],
[1., 3., 2.]])
When I try this code:
fliped_arr=np.flip(arr, 0)
It gives me:
fliped_arr= array([[1., 2., 6.],
[-1., -2., 5.],
[-1., -2., 4.],
[1., 3., 2.],
[2., 4., 1.],
[1., 2., 0.]])
In advance, I do appreciate any help.
You can simply concatenate rows below the nth row (included) with np.r_ for instance, with row index n of your choice, at the top and the other ones at the bottom:
import numpy as np
n = 3
arr_flip_n = np.r_[arr[n:],arr[:n]]
>>> array([[-1., -2., 4.],
[-1., -2., 5.],
[ 1., 2., 6.],
[ 1., 2., 0.],
[ 2., 4., 1.],
[ 1., 3., 2.]])
you can do this by slicing the array using the midpoint:
ans = np.vstack((arr[int(arr.shape[0]/2):], arr[:int(arr.shape[0]/2)]))
to break this down a little:
find the midpoint of arr, by finding its shape, the first index of which is the number of rows, dividing by two and converting to an integer:
midpoint = int(arr.shape[0]/2)
the two halves of the array can then be sliced like so:
a = arr[:midpoint]
b = arr[midpoint:]
then stack them back together using np.vstack:
ans = np.vstack((a, b))
(note vstack takes a single argument, which is a tuple containing a and b: (a, b))
You can do this with array slicing and vstack -
arr=np.array([[1., 2., 0.],
[2., 4., 1.],
[1., 3., 2.],
[-1., -2., 4.],
[-1., -2., 5.],
[1., 2., 6.]])
mid = arr.shape[0]//2
np.vstack([arr[mid:],arr[:mid]])
array([[-1., -2., 4.],
[-1., -2., 5.],
[ 1., 2., 6.],
[ 1., 2., 0.],
[ 2., 4., 1.],
[ 1., 3., 2.]])
I have an array that is grouped and looks like this:
import numpy as np
y = np.array(
[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.]]
)
n_repeats = 4
The array contains three groups, here marked as 0, 1, and 2. Every group appears n_repeats times. Here n_repeats=4. Currently I do the following to compute the mean and variance of chunks of that array:
mean = np.array([np.mean(y[i: i+n_repeats], axis=0) for i in range(0, len(y), n_repeats)])
var = np.array([np.var(y[i: i+n_repeats], axis=0) for i in range(0, len(y), n_repeats)])
Is there a better and faster way to achieve this?
Yes, reshape and then use .mean and .var along the appropriate dimension:
>>> arr.reshape(-1, 4, 6)
array([[[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]],
[[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.],
[1., 1., 1., 1., 1., 1.]],
[[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2.]]])
>>> arr.reshape(-1, 4, 6).mean(axis=1)
array([[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.]])
>>> arr.reshape(-1, 4, 6).var(axis=1)
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
In case you do not know how many groups, or number of repeats, you can try:
>>> np.vstack([y[y == i].reshape(-1,y.shape[1]).mean(axis=0) for i in np.unique(y)])
array([[0., 0., 0., 0., 0., 0.],
[1., 1., 1., 1., 1., 1.],
[2., 2., 2., 2., 2., 2.]])
>>> np.vstack([y[y == i].reshape(-1,y.shape[1]).var(axis=0) for i in np.unique(y)])
array([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])
I want to copared each vector from one array with all vectors from another array, and count how many symbols matches per vector. Let me show an example.
I have two arrays, a and b.
For each vector in a, I want to compare it with each vector in b. I then want to return a new array which is with dimension np.array((len(a),14)) where each vector holds the number of times vectors in a had 0,1,2,3,4,..,12,13 matches with vectors from b. The wished results are shown in array c below.
I already have solved this problem using np.newaxis() but my issue is (see my function below), that this takes up so much memory so my computer can't handle it when a and b gets larger. Hence, I am looking for a more efficient way to do this calculation, as it hurts my memory big time to add on dimensions to the vectors. One solution is to go with a normal for loop, but this method is rather slow.
Is it possible to make these calculations more efficient?
a = array([[1., 1., 1., 2., 1., 1., 2., 1., 0., 2., 2., 2., 2.],
[0., 2., 2., 0., 1., 1., 0., 1., 1., 0., 2., 1., 2.],
[0., 0., 0., 1., 1., 0., 2., 1., 2., 0., 1., 2., 2.],
[1., 2., 2., 0., 1., 1., 0., 2., 0., 1., 1., 0., 2.],
[1., 2., 0., 2., 2., 0., 2., 0., 0., 1., 2., 0., 0.]])
b = array([[0., 2., 0., 0., 0., 0., 0., 1., 1., 1., 0., 2., 2.],
[1., 0., 1., 2., 2., 0., 1., 1., 1., 1., 2., 1., 2.],
[1., 2., 1., 2., 0., 0., 0., 1., 1., 2., 2., 0., 2.],
[0., 1., 2., 0., 2., 1., 0., 1., 2., 0., 0., 0., 2.],
[0., 2., 2., 1., 2., 1., 0., 1., 1., 1., 2., 2., 2.],
[0., 2., 2., 1., 0., 1., 1., 0., 1., 0., 2., 2., 1.],
[1., 0., 2., 2., 0., 1., 0., 1., 0., 1., 1., 2., 2.],
[1., 1., 0., 2., 1., 1., 1., 1., 0., 2., 0., 2., 2.],
[1., 2., 0., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1.],
[1., 2., 1., 2., 2., 1., 2., 0., 2., 0., 0., 1., 1.]])
c = array([[0, 0, 0, 2, 1, 2, 2, 2, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 2, 3, 1, 2, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 3, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 4, 0, 3, 0, 1, 0, 0, 0, 0, 0]])
My solution:
def new_method_test(a,b):
test = (a[:,np.newaxis] == b).sum(axis=2)
zero = (test == 0).sum(axis=1)
one = (test == 1).sum(axis=1)
two = (test == 2).sum(axis=1)
three = (test == 3).sum(axis=1)
four = (test == 4).sum(axis=1)
five = (test == 5).sum(axis=1)
six = (test == 6).sum(axis=1)
seven = (test == 7).sum(axis=1)
eight = (test == 8).sum(axis=1)
nine = (test == 9).sum(axis=1)
ten = (test == 10).sum(axis=1)
eleven = (test == 11).sum(axis=1)
twelve = (test == 12).sum(axis=1)
thirteen = (test == 13).sum(axis=1)
c = np.concatenate((zero,one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thirteen), axis = 0).reshape(14,len(a)).T
return c
Thank you for you help.
welcome to Stackoverflow! I think a for loop is the way to go if you want to save memory (and it's really not that slow). Additionally you can directly go from one test to your c output matrix with np.bincount. I think this method will be approximately equally fast as yours and it will use significantly less memory by comparison.
import numpy as np
c = np.empty(a.shape, dtype=int)
for i in range(a.shape[0]):
test_one_vector = (a[i,:]==b).sum(axis=1)
c[i,:] = np.bincount(test_one_vector, minlength=a.shape[1])
Small sidenote if you are really dealing with floating point numbers in a and b you should consider dropping the equality check (==) in favor of a proximity check like e.g. np.isclose
I have a set of data which is in columns, where the first column is the x values. How do i read this in?
If you want to store both, x and y values you can do
ydat = np.zeros((data.shape[1]-1,data.shape[0],2))
# write the x data
ydat[:,:,0] = data[:,0]
# write the y data
ydat[:,:,1] = data[:,1:].T
Edit:
If you want to store only the y-data in the sub arrays you can simply do
ydat = data[:,1:].T
Working example:
t = np.array([[ 0., 0., 1., 2.],
[ 1., 0., 1., 2.],
[ 2., 0., 1., 2.],
[ 3., 0., 1., 2.],
[ 4., 0., 1., 2.]])
a = t[:,1:].T
a
array([[ 0., 0., 0., 0., 0.],
[ 1., 1., 1., 1., 1.],
[ 2., 2., 2., 2., 2.]])