I have a 1D NumPy int array, which looks like this:
[0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0]
Basically, it's an array of mostly zeros with some signals that are ints [1,2,3,4,5,...] and the signals always have a "width" of 1, meaning they are surrounded by 0s.
I want to add "width" to each signal so instead of taking only 1 space in the array it would take width space in the array.
So, in this example with a width of 3, I would get:
[0,0,0,0,1,1,1,0,0,2,2,2,0,0,5,5,5,0,1,1,1,0,0,0,0,0,0,0,0,0]
The length of the array stays the same. The width can be 3, 5, 7, but nothing too outrageous.
What would be the fastest way to do this? I feel like there is probably an easy way, but I'm not sure what this operation is called.
Convolution might be what you're looking for?
>>> import numpy as np
>>> width = 3
>>> a = np.array([0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0])
>>> np.convolve(a, np.ones(width))
array([0., 0., 0., 0., 0., 1., 1., 1., 0., 0., 2., 2., 2., 0., 0., 5., 5.,
5., 0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
This does not preserve the length of the array, though. To preserve the length, use the 'same' mode:
>>> np.convolve(a, np.ones(width), mode='same')
array([0., 0., 0., 0., 1., 1., 1., 0., 0., 2., 2., 2., 0., 0., 5., 5., 5.,
0., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.])
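If this is not fast enough, take a look at scipy.signal.fftconvolve. Keep in mind that FFT-based convolution mainly pays off for long kernels; for widths like 3 to 7, direct np.convolve is likely to be at least as fast.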
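A minimal sketch wrapping the 'same'-mode convolution in a helper, assuming odd widths as in the question (so the span is centred). The name widen_signals is hypothetical. Passing an integer kernel keeps the result in the input's integer dtype instead of float. Note that if two spans overlap, convolution adds them together rather than keeping one of them.

```python
import numpy as np


def widen_signals(a: np.ndarray, width: int) -> np.ndarray:
    """Spread every nonzero entry of a 1D integer array over `width` cells.

    Hypothetical helper; assumes an odd width so the span is centred.
    Overlapping spans are summed, because that is what convolution does.
    """
    kernel = np.ones(width, dtype=a.dtype)       # integer kernel -> integer result
    return np.convolve(a, kernel, mode="same")   # 'same' keeps len(a)


a = np.array([0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0])
print(widen_signals(a, 3))
```

With width 5 the spans of the 5 and the 1 meet at index 17, where the result is 6 (5 + 1). Check for that case if your signals can be close together.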
I know it's not the perfect solution, but here it is:
I made a copy of the initial list and created a width range, so when I find a number different from 0, I replace the surrounding zeros with that number.
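arr = [0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0]
arr1 = list(arr)  # copy, so the scan below still reads the original signals
width = 3
# offsets around each signal, e.g. [-1, 0, 1] for width 3
width_range = list(range(width // -2 + 1, width // 2 + 1))
print('width_range: ', width_range)
for idx, elem in enumerate(arr):
    if elem != 0:
        for i in width_range:
            # skip positions past either end (a negative index would wrap around)
            if 0 <= idx + i < len(arr1):
                arr1[idx + i] = elem
print(arr1)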
Output:
width_range: [-1, 0, 1]
[0, 0, 0, 0, 1, 1, 1, 0, 0, 2, 2, 2, 0, 0, 5, 5, 5, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
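PS: With this data it only works cleanly for a width of 3. Neighbouring signals need at least width - 1 zeros between them; otherwise their spans overlap and the later signal overwrites part of the earlier one. Here the 5 and the 1 are only three zeros apart, so widths of 5 or 7 already collide unless you add more zeros between the signals.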
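The same index-based idea can be vectorized with NumPy instead of a Python loop. This is a sketch, not the answerer's code. widen_by_index is a hypothetical name, and the sketch assumes neighbouring spans do not overlap, because NumPy does not promise which value wins when one fancy assignment writes the same index twice. Spans are clipped at both ends of the array.

```python
import numpy as np


def widen_by_index(arr, width):
    """Copy each nonzero value onto its neighbours (hypothetical helper).

    Uses the same offsets as the loop above, e.g. [-1, 0, 1] for width 3,
    and assumes the resulting spans do not overlap.
    """
    arr = np.asarray(arr)
    out = arr.copy()
    idx = np.flatnonzero(arr)                                  # positions of the signals
    offsets = np.arange(width // -2 + 1, width // 2 + 1)
    pos = idx[:, None] + offsets                               # (n_signals, width) targets
    vals = np.repeat(arr[idx][:, None], offsets.size, axis=1)  # value for each target
    keep = (pos >= 0) & (pos < arr.size)                       # drop targets past either end
    out[pos[keep]] = vals[keep]
    return out


arr = [0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,5,0,0,0,1,0,0,0,0,0,0,0,0,0,0]
print(widen_by_index(arr, 3))
print(widen_by_index([3, 0, 0, 0, 4], 3))  # signals at the edges are clipped, not wrapped
```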
Related
Assuming that I have the following matrix consisting of one-hot encoded rows:
X = np.array([[0., 0., 0., 1., 0.], [1., 0., 0., 0., 0.], [0., 0., 1., 0., 0.]])
What I aim to do is smooth/expand the one-hot encoding in a way such that I will obtain the following output:
Y = np.array([[0., 0., 1., 1., 1.], [1., 1., 0., 0., 0.], [0., 1., 1., 1., 0.]])
assuming that I want to expand by 1 element to the left and to the right of the one-hot element. Thank you for the help!
We can use convolution -
In [22]: from scipy.signal import convolve2d
In [23]: convolve2d(X,np.ones((1,3)),'same')
Out[23]:
array([[0., 0., 1., 1., 1.],
[1., 1., 0., 0., 0.],
[0., 1., 1., 1., 0.]])
With binary-dilation to be more memory-efficient -
In [43]: from scipy.ndimage import binary_dilation
In [46]: binary_dilation(X,np.ones((1,3), dtype=bool)).view('i1')
Out[46]:
array([[0, 0, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0]], dtype=int8)
Or, since we only have 0s and 1s, a uniform filter would also work. Additionally, it can be applied along a generic axis (axis=1 in our case) and should perform better -
In [47]: from scipy.ndimage import uniform_filter1d
In [50]: (uniform_filter1d(X,size=3,axis=1)>0).view('i1')
Out[50]:
array([[0, 0, 1, 1, 1],
[1, 1, 0, 0, 0],
[0, 1, 1, 1, 0]], dtype=int8)
You could convolve X with an array of ones:
import numpy as np
from scipy.signal import convolve2d
convolve2d(X, np.ones((1,3)), mode='same')
array([[0., 0., 1., 1., 1.],
[1., 1., 0., 0., 0.],
[0., 1., 1., 1., 0.]])
Solution based on standard np.convolve:
import numpy as np
np.array([np.convolve(x, np.array([1,1,1]), mode='same') for x in X])
This iterates over the rows with a list comprehension, convolves each row, and converts the result back to an np.array.
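A NumPy-only sketch that needs neither SciPy nor a Python-level loop over rows, assuming NumPy 1.20 or newer (where numpy.lib.stride_tricks.sliding_window_view was added). It pads k zero columns on each side and marks a cell when any value in its (2k+1)-wide window is nonzero. expand_one_hot and k are hypothetical names. The windows are strided views rather than copies, but .any() still inspects every window element.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def expand_one_hot(X, k=1):
    """Set every cell within k columns of a nonzero entry (hypothetical helper)."""
    X = np.asarray(X)
    padded = np.pad(X, ((0, 0), (k, k)))                      # k zero columns per side
    windows = sliding_window_view(padded, 2 * k + 1, axis=1)  # (rows, cols, 2k+1) view
    return windows.any(axis=-1).astype(X.dtype)


X = np.array([[0., 0., 0., 1., 0.], [1., 0., 0., 0., 0.], [0., 0., 1., 0., 0.]])
print(expand_one_hot(X))
```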
I want to compare each vector from one array with all vectors from another array, and count how many symbols match per vector. Let me show an example.
I have two arrays, a and b.
For each vector in a, I want to compare it with each vector in b. I then want to return a new array of shape (len(a), 14), where each row holds the number of times the vector in a had 0, 1, 2, 3, 4, ..., 12, 13 matches with the vectors from b. The desired result is shown in array c below.
I have already solved this problem using np.newaxis (see my function below), but it takes up so much memory that my computer can't handle it when a and b get larger. Hence, I am looking for a more efficient way to do this calculation, since adding dimensions to the vectors costs a lot of memory. One option is a plain for loop, but that is rather slow.
Is it possible to make these calculations more efficient?
a = array([[1., 1., 1., 2., 1., 1., 2., 1., 0., 2., 2., 2., 2.],
[0., 2., 2., 0., 1., 1., 0., 1., 1., 0., 2., 1., 2.],
[0., 0., 0., 1., 1., 0., 2., 1., 2., 0., 1., 2., 2.],
[1., 2., 2., 0., 1., 1., 0., 2., 0., 1., 1., 0., 2.],
[1., 2., 0., 2., 2., 0., 2., 0., 0., 1., 2., 0., 0.]])
b = array([[0., 2., 0., 0., 0., 0., 0., 1., 1., 1., 0., 2., 2.],
[1., 0., 1., 2., 2., 0., 1., 1., 1., 1., 2., 1., 2.],
[1., 2., 1., 2., 0., 0., 0., 1., 1., 2., 2., 0., 2.],
[0., 1., 2., 0., 2., 1., 0., 1., 2., 0., 0., 0., 2.],
[0., 2., 2., 1., 2., 1., 0., 1., 1., 1., 2., 2., 2.],
[0., 2., 2., 1., 0., 1., 1., 0., 1., 0., 2., 2., 1.],
[1., 0., 2., 2., 0., 1., 0., 1., 0., 1., 1., 2., 2.],
[1., 1., 0., 2., 1., 1., 1., 1., 0., 2., 0., 2., 2.],
[1., 2., 0., 0., 0., 1., 2., 1., 0., 1., 2., 0., 1.],
[1., 2., 1., 2., 2., 1., 2., 0., 2., 0., 0., 1., 1.]])
c = array([[0, 0, 0, 2, 1, 2, 2, 2, 0, 0, 1, 0, 0, 0],
[0, 0, 0, 0, 2, 3, 1, 2, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 3, 2, 4, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 3, 0, 3, 2, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 4, 0, 3, 0, 1, 0, 0, 0, 0, 0]])
My solution:
def new_method_test(a,b):
    test = (a[:,np.newaxis] == b).sum(axis=2)
    zero = (test == 0).sum(axis=1)
    one = (test == 1).sum(axis=1)
    two = (test == 2).sum(axis=1)
    three = (test == 3).sum(axis=1)
    four = (test == 4).sum(axis=1)
    five = (test == 5).sum(axis=1)
    six = (test == 6).sum(axis=1)
    seven = (test == 7).sum(axis=1)
    eight = (test == 8).sum(axis=1)
    nine = (test == 9).sum(axis=1)
    ten = (test == 10).sum(axis=1)
    eleven = (test == 11).sum(axis=1)
    twelve = (test == 12).sum(axis=1)
    thirteen = (test == 13).sum(axis=1)
    c = np.concatenate((zero,one,two,three,four,five,six,seven,eight,nine,ten,eleven,twelve,thirteen), axis = 0).reshape(14,len(a)).T
    return c
Thank you for your help.
Welcome to Stack Overflow! I think a for loop is the way to go if you want to save memory (and it's really not that slow). Additionally, you can go directly from each row of test to the corresponding row of your c output matrix with np.bincount. I think this method will be about as fast as yours, and it will use significantly less memory.
import numpy as np
# one row per vector in a, one column per possible match count 0..a.shape[1]
c = np.empty((a.shape[0], a.shape[1] + 1), dtype=int)
for i in range(a.shape[0]):
    # number of matching positions between a[i] and every vector in b
    test_one_vector = (a[i,:] == b).sum(axis=1)
    c[i,:] = np.bincount(test_one_vector, minlength=a.shape[1] + 1)
Small side note: if you are really dealing with floating-point numbers in a and b, you should consider replacing the equality check (==) with a proximity check, e.g. np.isclose.
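If the vectors only contain a few known symbols (0., 1. and 2. in the example), the match counts can be computed without the len(a) x len(b) x 13 boolean intermediate. For each symbol, a matrix product of the two indicator matrices counts the positions where both vectors hold that symbol. This is a different technique from the loop above, offered as a sketch. match_histogram and its symbols parameter are hypothetical, and positions holding a value not listed in symbols are not counted. The intermediate is still a len(a) x len(b) array, so very large inputs may need to be processed in chunks of rows of a.

```python
import numpy as np


def match_histogram(a, b, symbols=(0.0, 1.0, 2.0)):
    """Row i counts how many vectors of b share exactly 0..n_feat positions with a[i].

    Hypothetical helper; assumes every entry of a and b is one of `symbols`.
    """
    n_a, n_feat = a.shape
    matches = np.zeros((n_a, b.shape[0]))
    for s in symbols:
        # (a == s) @ (b == s).T counts the positions where both rows equal s
        matches += (a == s).astype(float) @ (b == s).astype(float).T
    matches = matches.astype(np.int64)  # sums of 0/1 products are exact in float64
    n_bins = n_feat + 1
    # One bincount for all rows: shift row i into bins [i*n_bins, (i+1)*n_bins)
    shifted = matches + np.arange(n_a)[:, None] * n_bins
    counts = np.bincount(shifted.ravel(), minlength=n_a * n_bins)
    return counts.reshape(n_a, n_bins)


a_demo = np.array([[0., 1., 2.], [2., 2., 2.]])
b_demo = np.array([[0., 1., 2.], [0., 0., 0.], [2., 1., 0.]])
print(match_histogram(a_demo, b_demo))
```

With the arrays from the question, c = match_histogram(a, b) has shape (5, 14).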
Let's suppose I have a matrix with a number of binary values:
matrix([[1., 1., 1., 0., 0.],
[0., 0., 1., 1., 1.],
[0., 0., 0., 1., 0.],
[0., 0., 0., 0., 1.]])
Using np.sum(M, 0) produces:
matrix([[1., 1., 2., 2., 2.]])
How do I remove all of the columns from the matrix that have only the value of 1?
Easier to have an array here:
M = M.A
Now using simple slicing:
M[:, np.sum(M, 0)!=1]
array([[1., 0., 0.],
[1., 1., 1.],
[0., 1., 0.],
[0., 0., 1.]])
You can convert the matrix to an array, find the indices of the columns whose sum is 1, and then use those indices to delete those columns. For example, you can do the following.
import numpy as np
M = np.matrix([[1, 1, 1, 0, 0], [0, 0, 1, 1, 1], [0, 0, 0, 1, 0], [0, 0, 0, 0, 1]])
# column sums as a flat 1D array
col_sums = np.squeeze(np.asarray(np.sum(M, 0)))
index_of_columns_with_sum_1 = [i for i, val in enumerate(col_sums) if val == 1]
# delete those columns (axis=1) from the matrix, converted to a plain array
result = np.delete(np.asarray(M), index_of_columns_with_sum_1, axis=1)
print(result)
A is a NumPy array with shape (6, 8).
I want:
x_id = np.array([0, 3])
y_id = np.array([1, 3, 4, 7])
A[x_id, y_id] += 1  # this doesn't actually work.
Tricks like ::2 won't work because the indices do not increase regularly.
I don't want to use extra memory to repeat [0, 3] and make a new array [0, 3, 0, 3] because that is slow.
The indices for the two dimensions do not have equal length.
What I want is equivalent to:
A[0, 1] += 1
A[3, 3] += 1
A[0, 4] += 1
A[3, 7] += 1
Can numpy do something like this?
Update:
I'm not sure whether broadcast_to or stride_tricks is faster than nested Python loops (see "Repeat NumPy array without replicating data?").
You can reshape y_id into a 2D array whose second dimension matches the size of x_id; the two index arrays are then broadcast against each other automatically:
import numpy as np
x_id = np.array([0, 3])
y_id = np.array([1, 3, 4, 7])
A = np.zeros((6,8))
A[x_id, y_id.reshape(-1, x_id.size)] += 1
A
array([[ 0., 1., 0., 0., 1., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 1., 0., 0., 0., 1.],
[ 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 0., 0., 0., 0., 0., 0.]])
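One caveat with the answer above: A[...] += 1 with fancy indices is buffered, so if the same (row, column) pair appears twice, it is only incremented once. A sketch using np.add.at with the same broadcast indices, which applies every increment. add_pattern is a hypothetical helper, and the reshape still assumes y_id.size is a multiple of x_id.size.

```python
import numpy as np


def add_pattern(A, x_id, y_id, value=1):
    """Add `value` at (x_id[j % len(x_id)], y_id[j]) for every j (hypothetical helper)."""
    cols = y_id.reshape(-1, x_id.size)  # each row of cols lines up with x_id
    np.add.at(A, (x_id, cols), value)   # unbuffered: repeated pairs all count
    return A


A = np.zeros((6, 8))
add_pattern(A, np.array([0, 3]), np.array([1, 3, 4, 7]))
print(np.argwhere(A))
```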
I have a large numpy matrix M. Some of the rows of the matrix have all of their elements as zero and I need to get the indices of those rows. The naive approach I'm considering is to loop through each row in the matrix and then check each elements.
What would be a better and a faster approach to accomplish this using numpy?
Here's one way. I assume numpy has been imported using import numpy as np.
In [20]: a
Out[20]:
array([[0, 1, 0],
[1, 0, 1],
[0, 0, 0],
[1, 1, 0],
[0, 0, 0]])
In [21]: np.where(~a.any(axis=1))[0]
Out[21]: array([2, 4])
It's a slight variation of this answer: How to check that a matrix contains a zero column?
Here's what's going on:
The any method returns True if any value in the array is "truthy". Nonzero numbers are considered True, and 0 is considered False. By using the argument axis=1, the method is applied to each row. For the example a, we have:
In [32]: a.any(axis=1)
Out[32]: array([ True, True, False, True, False], dtype=bool)
So each value indicates whether the corresponding row contains a nonzero value. The ~ operator is the binary "not" or complement:
In [33]: ~a.any(axis=1)
Out[33]: array([False, False, True, False, True], dtype=bool)
(An alternative expression that gives the same result is (a == 0).all(axis=1).)
To get the row indices, we use the where function. It returns the indices where its argument is True:
In [34]: np.where(~a.any(axis=1))
Out[34]: (array([2, 4]),)
Note that where returned a tuple containing a single array. where works for n-dimensional arrays, so it always returns a tuple. We want the single array in that tuple.
In [35]: np.where(~a.any(axis=1))[0]
Out[35]: array([2, 4])
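A small check of the equivalences mentioned above, with one more option: for a 1D mask, np.flatnonzero returns the same indices as np.where(...)[0], without a tuple to unpack.

```python
import numpy as np

a = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 0, 0],
              [1, 1, 0],
              [0, 0, 0]])

zero_rows = ~a.any(axis=1)                   # True where a row has no nonzero entry
print(np.where(zero_rows)[0])                # [2 4]
print(np.flatnonzero(zero_rows))             # [2 4], no tuple to unpack
print(np.flatnonzero((a == 0).all(axis=1)))  # [2 4], the alternative expression
```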
The accepted answer checks for exact zeros (0 and 0.0 alike). If the rows may contain values that are only approximately zero, such as floating-point residue, use np.isclose():
print(x)
# output
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 1., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
1., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0.]])
np.where(np.all(np.isclose(x, 0), axis=1))
(array([0, 3]),)
Note: this also works with (CPU) PyTorch tensors, which is handy for finding all-zero multi-hot encoding vectors.
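When the comparison target is 0, the relative tolerance of np.isclose has no effect: the test is |x - 0| <= atol + rtol * |0|. So only atol (default 1e-08) decides what counts as zero. A sketch with made-up values showing how an explicit atol changes the result:

```python
import numpy as np

x = np.array([[0.0, 0.0,   0.0],
              [0.0, 1e-12, 0.0],
              [0.0, 1e-6,  0.0],
              [0.0, 0.0,   1.0]])

# default atol=1e-08: the 1e-12 row counts as zero, the 1e-6 row does not
print(np.flatnonzero(np.isclose(x, 0).all(axis=1)))             # [0 1]
# an explicit, looser atol also treats the 1e-6 row as zero
print(np.flatnonzero(np.isclose(x, 0, atol=1e-5).all(axis=1)))  # [0 1 2]
```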
Solution using np.sum, useful if you want to use a threshold:
import numpy as np
a = np.array([[1.0, 1.0, 2.99],
[0.0000054, 0.00000078, 0.00000232],
[0, 0, 0],
[1, 1, 0.0],
[0.0, 0.0, 0.0]])
print(np.where(np.sum(np.abs(a), axis=1)==0)[0])
>>[2 4]
print(np.where(np.sum(np.abs(a), axis=1)<0.0001)[0])
>>[1 2 4]
Use np.prod to check whether a row contains at least one zero element:
print(np.where(np.prod(a, axis=1)==0)[0])
>>[2 3 4]
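A caveat on the np.prod approach: a product of small but nonzero values can underflow to exactly 0.0, so prod == 0 may flag rows that contain no zero at all. Testing the elements directly avoids that. The data here is made up to trigger the underflow:

```python
import numpy as np

a = np.array([[1e-200, 1e-200],
              [1.0,    0.0],
              [2.0,    3.0]])

print(np.flatnonzero(np.prod(a, axis=1) == 0))  # [0 1]: row 0 underflows to 0.0
print(np.flatnonzero((a == 0).any(axis=1)))     # [1]: only row 1 has a zero
```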
import numpy
a = numpy.array([[10,0],[0,0],[0,10]])
isZero = numpy.all(a == 0, axis=1)
deleteFullZero = a[~isZero]
# isZero >> [False  True False]
# deleteFullZero >> [[10  0] [ 0 10]]