Replace consecutive identical elements in the beginning of an array with 0 - python

I want to replace the first N identical consecutive numbers at the beginning of an array with 0.
import numpy as np
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
OUT -> np.array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
A loop works, but what would be a faster, vectorized implementation?
i = 0
first = x[0]
# check the bound before indexing so an all-equal array does not raise IndexError
while i < x.size and x[i] == first:
    x[i] = 0
    i += 1

You can use argmax on a boolean array to get the index of the first changing value.
Then slice and replace:
n = (x!=x[0]).argmax() # 4
x[:n] = 0
output:
array([0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2])
intermediate array:
(x!=x[0])
# n=4
# [False False False False True True True True True True True True
# True True True True True True True]
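One edge case is worth noting: if every element equals x[0], the boolean array contains no True, argmax returns 0, and nothing is replaced. A minimal sketch that guards against this, assuming you want the whole array zeroed in that case (the helper name zero_leading_run is made up for illustration):

```python
import numpy as np

def zero_leading_run(x):
    """Return a copy of x with its leading run of identical values set to 0."""
    out = np.array(x)  # copy, so the caller's array is left untouched
    if out.size == 0:
        return out
    changed = out != out[0]
    # argmax returns 0 when nothing differs, so fall back to the full length
    n = changed.argmax() if changed.any() else out.size
    out[:n] = 0
    return out

print(zero_leading_run(np.array([1, 1, 1, 1, 2, 3, 1])))
print(zero_leading_run(np.array([5, 5, 5])))
```

The any() check costs one extra pass over the boolean array, which is usually negligible next to the comparison itself.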

My solution is based on itertools.groupby, so start with import itertools.
This function groups consecutive equal values, unlike e.g. the pandas
version of groupby, which collects all equal values from the input into a
single group.
Another important feature is that you can assign any value to N, and only
the first N values of each run of at least N consecutive equal values are replaced.
To test my code, I set N = 4 and defined the source array as:
x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
Note that it contains 5 consecutive values of 2 at the end.
Then, to get the expected result, run:
rv = []
for key, grp in itertools.groupby(x):
    lst = list(grp)
    lgth = len(lst)
    if lgth >= N:
        lst[0:N] = [0] * N
    rv.extend(lst)
xNew = np.array(rv)
The result is:
[0, 0, 0, 0, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 0, 0, 0, 0, 2]
Note that a sequence of 4 zeroes occurs:
at the beginning (all 4 values of 1 have been replaced),
almost at the end (of the 5 values of 2, the first 4 have been replaced).
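If the Python-level loop becomes a bottleneck, the same rule (zero the first N values of every run of at least N equal values) can probably be expressed with NumPy alone. The following is only a sketch under that reading of the rule; the function name zero_first_n_of_runs is hypothetical:

```python
import numpy as np

def zero_first_n_of_runs(x, n):
    """Zero the first n values of every run of at least n equal values."""
    x = np.asarray(x)
    # indices at which a new run of equal values starts
    starts = np.concatenate(([0], np.flatnonzero(np.diff(x)) + 1))
    # length of each run
    lengths = np.diff(np.concatenate((starts, [x.size])))
    # per-element run length and position inside its run
    run_len = np.repeat(lengths, lengths)
    pos_in_run = np.arange(x.size) - np.repeat(starts, lengths)
    return np.where((run_len >= n) & (pos_in_run < n), 0, x)

x = np.array([1, 1, 1, 1, 2, 3, 1, 2, 3, 2, 2, 2, 3, 3, 3, 1, 1, 2, 2, 2, 2, 2])
print(zero_first_n_of_runs(x, 4))
```

It returns a new array rather than modifying x in place, mirroring the xNew variable of the groupby version.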

Related

Number of times an array is present in another array in Python

How can I count the number of times an array is present in a larger array?
a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([1, 1, 1])
The count for the number of times b is present in a should be 3.
b can be any combination of 1s and 0s.
I'm working with huge arrays, so for loops are pretty slow.
If the subarray being searched for contains all 1s, you can count the number of times the subarray appears in the larger array by convolving the two arrays with np.convolve and counting the number of entries in the result that equal the size of the subarray:
# 'valid' = convolve only over the complete overlap of the signals
>>> np.convolve(a, b, mode='valid')
array([1, 1, 2, 3, 2, 2, 2, 3, 3, 2, 1, 1])
# ^ ^ ^ <= Matches
>>> win_size = min(a.size, b.size)
>>> np.count_nonzero(np.convolve(a, b) == win_size)
3
For subarrays that may contain 0s, you can start by using convolution to transform a into an array containing the binary numbers encoded by each window of size b.size. Then just compare each element of the transformed array with the binary number encoded by b and count the matches:
>>> b = np.array([0, 1, 1]) # encodes '3'
>>> weights = 2 ** np.arange(b.size) # == [1, 2, 4, 8, ..., 2**(b.size-1)]
>>> np.convolve(a, weights, mode='valid')
array([4, 1, 3, 7, 6, 5, 3, 7, 7, 6, 4, 1])
# ^ ^ Matches
>>> target = (b * np.flip(weights)).sum() # target==3
>>> np.count_nonzero(np.convolve(a, weights, mode='valid') == target)
2
Not a super fast method, but you can view a as a windowed array using np.lib.stride_tricks.sliding_window_view:
window = np.lib.stride_tricks.sliding_window_view(a, b.shape)
You can now equate this to b directly and find where they match:
result = (window == b).all(-1).sum()
For older versions of numpy (pre-1.20.0), you can use np.lib.stride_tricks.as_strided to achieve a similar result:
window = np.lib.stride_tricks.as_strided(
    a, shape=(*(np.array(a.shape) - b.shape + 1), *b.shape),
    strides=a.strides + (a.strides[0],) * b.ndim)
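Put together, the windowed approach might look like the sketch below. It assumes 1-D inputs and NumPy 1.20 or newer; the wrapper name count_occurrences is invented for this example:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def count_occurrences(a, b):
    """Count the (possibly overlapping) positions where 1-D b occurs in 1-D a."""
    if b.size > a.size:
        return 0  # no window of that size fits
    windows = sliding_window_view(a, b.size)  # shape: (a.size - b.size + 1, b.size)
    return int((windows == b).all(axis=1).sum())

a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
print(count_occurrences(a, np.array([1, 1, 1])))
print(count_occurrences(a, np.array([0, 1, 1])))
```

The view itself costs no extra memory, but the comparison windows == b materializes a boolean array of size (a.size - b.size + 1) * b.size, which matters for very large inputs.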
Here is a solution using a list comprehension:
a = [1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1]
b = [1, 1, 1]
# + 1 so that the final window is also checked
sum(a[i:i+len(b)]==b for i in range(len(a)-len(b)+1))
output: 3
Here are a few improvements on @Brian's answer:
Use np.correlate rather than np.convolve; they are nearly identical, but convolve reads a and b in opposite directions.
To deal with templates that contain zeros, convert the zeros to -1. For example:
a = np.array([1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1])
b = np.array([0,1,1])
np.correlate(a,2*b-1)
# array([-1, 1, 2, 1, 0, 0, 2, 1, 1, 0, -1, 1])
The template fits where the correlation equals the number of ones in the template. The indices can be extracted like so:
(np.correlate(a,2*b-1)==np.count_nonzero(b)).nonzero()[0]
# array([2, 6])
If you only need the count, use np.count_nonzero:
np.count_nonzero((np.correlate(a,2*b-1)==np.count_nonzero(b)))
# 2

Python How to count intersection of a specific value in both array?

I have two arrays, A and B, of the same size, both containing the values 0, 1 and 2.
I want to count the indices at which both arrays have the value 1. In other words, I want to check the precision for value 1, based on array A.
So far I have tried the map function, but it doesn't work.
temp = list(map(lambda x,y: (x is y) == 1 ,A ,B))
However, the result is not what I expected. Can you give some advice or an example of how to solve this problem?
Try this:
x = np.array([0, 1, 2, 3, 1, 4, 5])
y = np.array([0, 1, 2, 4, 1, 3, 5])
print(np.sum(list(map(lambda x,y: (x==y==1) , x, y))))
output:
2
Tensorflow code:
elems = (np.array([0, 1, 2, 3, 1, 4, 5, 0, 1, 2, 3, 1, 4, 5]), np.array([0, 1, 2, 4, 1, 3, 5, 0, 1, 2, 3, 1, 4, 5]))
alternate = tf.map_fn(lambda x: tf.math.logical_and(tf.equal(x[0], 1), tf.equal(x[0], x[1])), elems, dtype=tf.bool)
print(alternate)
print(tf.reduce_sum(tf.cast(alternate, tf.float32)))
output:
tf.Tensor([False True False False True False False False True False False True False False], shape=(14,), dtype=bool)
tf.Tensor(4.0, shape=(), dtype=float32)
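For plain NumPy arrays the same count can be obtained without map by combining two boolean masks. A short sketch, reusing the x and y from the first answer; treating the "precision" as the share of 1s in x that are also 1 in y is an assumption about what the question means:

```python
import numpy as np

x = np.array([0, 1, 2, 3, 1, 4, 5])
y = np.array([0, 1, 2, 4, 1, 3, 5])

# True only where both arrays hold the value 1 at the same index
both_one = (x == 1) & (y == 1)
count = int(np.count_nonzero(both_one))

# assumed definition: fraction of the 1s in x that are matched by a 1 in y
# (requires x to contain at least one 1)
share = count / np.count_nonzero(x == 1)

print(count, share)
```

This avoids the Python-level lambda call per element, so it should scale better to large arrays.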

Replacing values in Numpy array based on permutations

I have a large np array called X (size:32000) filled with duplicate values of 0, 1, 2, 3.
I want to replace each of the values(0, 1, 2, 3) with permutations of the following numbers: 0, 1, 2, 3, 4, 5
For example, 0, 1, 2, 3 can be replaced with following:
1, 5, 3, 4
5, 2, 4, 3
0, 5, 1, 4
and so on (there are 360 such permutations in total).
How can I take each of the 360 permutations and replace the 32000 values in X accordingly such that finally I have 360 versions of X for each permutation?
You can try the method numpy.choose:
import numpy as np
x = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3,])
perm = [1, 5, 3, 4,]
x = np.choose(x, perm)
np.choose(x, perm) will choose a value from perm for each value of x, taking x as a list of indices. I recommend looking at the documentation since this function can lead to confusion.
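To build all 360 versions at once, one option is to generate the permutations with itertools.permutations and use x as a column index into the resulting table. This is a sketch with a short stand-in for the 32000-element array; the variable names are illustrative:

```python
import itertools
import numpy as np

x = np.array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3])  # stand-in for the large array

# every ordered selection of 4 distinct numbers out of 0..5: 6*5*4*3 = 360 rows
perms = np.array(list(itertools.permutations(range(6), 4)))

# row k of all_versions is x with value v replaced by perms[k, v]
all_versions = perms[:, x]

print(perms.shape, all_versions.shape)
```

For 32000 elements the result has 360 * 32000 entries, so a small integer dtype such as np.uint8 for perms may be worth considering to limit memory use.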

generate numbers between min and max with equal counts

I have a minimum value and maximum value, I'd like to generate a list of numbers between them such that all the numbers have equal counts. Is there a numpy function or any function out there?
Example: GenerateNums(start=1, stop=5, nums=10)
Expected output: [1,1,2,2,3,3,4,4,5,5] i.e each number has an almost equal count
This takes "almost equal" to heart: the difference between the counts of the most common and the least common number is at most 1. There is no guarantee about which number is the mode.
def gen_nums(start, stop, nums):
    binsize = (1 + stop - start) * 1.0 / nums
    # list comprehension instead of Python 2's map/xrange, so a list is returned on Python 3
    return [int(start + binsize * x) for x in range(nums)]
gen_nums(1, 5, 10)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
There is a numpy function:
In [3]: np.arange(1,6).repeat(2)
Out[3]: array([1, 1, 2, 2, 3, 3, 4, 4, 5, 5])
def GenerateNums(start=1, stop=5, nums=10):
    result = []
    # integer number of repetitions per value
    rep = nums // (stop - start + 1)
    # stop + 1 so that stop itself is included
    for i in range(start, stop + 1):
        for j in range(rep):
            result.append(i)
    return result
For almost equal counts, you can sample from a uniform distribution. numpy.random.randint does this:
>>> import numpy as np
>>> np.random.randint(low=1, high=6, size=10)
array([4, 5, 5, 4, 5, 5, 2, 1, 4, 2])
To get these values in sorted order:
>>> sorted(np.random.randint(low=1, high=6, size=10))
[1, 1, 1, 2, 3, 3, 3, 3, 5, 5]
This process is just like rolling dice :) As you sample more times, the counts of each value should become very similar:
>>> from collections import Counter
>>> Counter(np.random.randint(low=1, high=6, size=10000))
Counter({1: 1978, 2: 1996, 3: 2034, 4: 1982, 5: 2010})
For exactly equal counts:
>>> list(range(1,6)) * 2
[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
>>> sorted(list(range(1,6)) * 2)
[1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
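def GenerateNums(start=0, stop=0, nums=0, result=None):
    # avoid a mutable default argument, which would accumulate values across calls
    result = [] if result is None else result
    assert nums and stop > start, "nums must be non-zero and stop must exceed start"
    # number of times each value is repeated
    iter_val = int(round(nums / (stop - start)))
    # go through start..stop-1 and append each value iter_val times
    for x in range(start, stop):
        result.extend([x] * iter_val)
    return result
print(GenerateNums(start=0, stop=5, nums=30))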
>>> [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
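When nums is not a multiple of the number of distinct values, a NumPy variant that still keeps the counts within 1 of each other could look like this. It is a sketch that treats stop as inclusive, as in the question's example; the name gen_equal_counts is made up:

```python
import numpy as np

def gen_equal_counts(start, stop, nums):
    """Return nums sorted integers in [start, stop] with counts differing by at most 1."""
    n_values = stop - start + 1  # stop is inclusive here
    # cycling through the values spreads any remainder one item at a time
    return np.sort(start + np.arange(nums) % n_values)

print(gen_equal_counts(1, 5, 10))
print(gen_equal_counts(1, 5, 12))
```

With this scheme the surplus items always go to the smallest values; shuffling the values before cycling would change which ones receive the extra item.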

Comparing 2 numpy arrays

I have 2 numpy arrays, and wherever an element of B is 1, I want the corresponding element of A to be set to 0. Both arrays always have the same shape:
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
I tried to do numpy slicing but I still can't get it to work.
B[A==1]=0
How can I achieve this in numpy without a conventional loop?
First, they need to be numpy arrays, not lists. Second, you simply swapped A and B.
import numpy as np
A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])
A[B==1]=0 ## array([1, 2, 3, 0, 5])
If you use lists instead, here is what you get:
A = [1, 2, 3, 4, 5]
B = [0, 0, 0, 1, 0]
A[B==1]=0 ## [0, 2, 3, 4, 5]
That's because, for a list, B == 1 evaluates to the single value False, which acts as index 0 (instead of producing a boolean array). So you essentially write A[0] = 0.
Isn't this what you want to do?
A[B==1] = 0
A
array([1, 2, 3, 0, 5])
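If the original A should be left unchanged, np.where can build a new array instead of assigning in place. A minimal sketch with the arrays from the question (the name A_masked is just for illustration):

```python
import numpy as np

A = np.array([1, 2, 3, 4, 5])
B = np.array([0, 0, 0, 1, 0])

# take 0 where B is 1, otherwise keep the value from A
A_masked = np.where(B == 1, 0, A)

print(A_masked)
print(A)  # unchanged
```

The in-place form A[B == 1] = 0 avoids the extra allocation, so which one to prefer depends on whether the original values are still needed.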
