This might be a basic question, but I still haven't been able to figure out how to do it properly.
I have an array [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3] (arbitrary elements from 0-5) and I want a counter for the occurrences of zeros in a row:
1 time 6 zeros in a row
1 time 4 zeros in a row
2 times 1 zero in a row
=> (2,0,0,1,0,1)
So the dictionary uses the run length (number of consecutive zeros) as the key and the number of such runs as the value.
The real array consists of 500+ million values, unsorted like the one above.
This should get you what you want:
import numpy as np
a = [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3]
# Find indexes of all zeroes
index_zeroes = np.where(np.array(a) == 0)[0]
# Find discontinuities in indexes, denoting separated groups of zeroes
# Note: Adding True at the end because otherwise the last zero is ignored
index_zeroes_disc = np.where(np.hstack((np.diff(index_zeroes) != 1, True)))[0]
# Count the number of zeroes in each group
# Note: Adding 0 at the start so first group of zeroes is counted
count_zeroes = np.diff(np.hstack((0, index_zeroes_disc + 1)))
# Count the number of groups with the same number of zeroes
groups_of_n_zeroes = {}
for count in count_zeroes:
    if count in groups_of_n_zeroes:
        groups_of_n_zeroes[count] += 1
    else:
        groups_of_n_zeroes[count] = 1
groups_of_n_zeroes holds:
{1: 2, 4: 1, 6: 1}
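If you also need the array form from the question (index = run length), a minimal conversion sketch (not part of the original answer; the variable names are just illustrative and run lengths are assumed to start at 1):

max_len = max(groups_of_n_zeroes)
as_array = [groups_of_n_zeroes.get(n, 0) for n in range(1, max_len + 1)]
# as_array == [2, 0, 0, 1, 0, 1]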
Similar to @fgb's answer, but with a more numpythonic handling of the counting of the occurrences:
items = np.array([0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3])
group_end_idx = np.concatenate(([-1],
                                np.nonzero(np.diff(items == 0))[0],
                                [len(items) - 1]))
group_len = np.diff(group_end_idx)
zero_lens = group_len[::2] if items[0] == 0 else group_len[1::2]
counts = np.bincount(zero_lens)
>>> counts[1:]
array([2, 0, 0, 1, 0, 1], dtype=int64)
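If you prefer the dict form produced by the previous answer, the counts array can be converted with a small sketch (not part of the original answer):

groups = {n: c for n, c in enumerate(counts) if n > 0 and c > 0}
# {1: 2, 4: 1, 6: 1}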
This seems awfully complicated, but I can't seem to find anything better:
>>> l = [0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 2, 1, 0, 2, 3]
>>> import itertools
>>> seq = [len(list(j)) for i, j in itertools.groupby(l) if i == 0]
>>> seq
[6, 4, 1, 1]
>>> import collections
>>> counter = collections.Counter(seq)
>>> [counter.get(i, 0) for i in range(1, max(counter) + 1)]
[2, 0, 0, 1, 0, 1]
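Note that counter itself already holds the dict form described in the question, mapping each run length to how often it occurs: {1: 2, 4: 1, 6: 1} (ordering aside).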
Related
For a one-dimensional numpy array of 1's and 0's, how can I effectively "mask" the array such that after the occurrence of a 1, the next n elements of the array are converted to zeros? After those n elements have passed, the pattern repeats: the next eligible occurrence of a 1 is preserved, followed again by n zeros.
It is important that the first eligible occurrences of 1 are preserved, so a simple mask such as:
[true, false, false, true ...] won't work.
Furthermore, the data set is massive, so efficiency is important.
I've written crude python code to give me the desired results, but it is way too slow for what I need.
Here is an example:
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3
newData = []
tail = 0
for x in data:
    if x == 1 and tail <= 0:
        newData.append(1)
        tail = n
    else:
        newData.append(0)
        tail -= 1
print(newData)
newData: [0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
Is there possibly a vectorized numpy solution to this problem?
I'm processing tens of thousands of arrays, with more than a million elements in each array. So far using numpy functions has been the only way to manage this.
As far as I know, there is no way to do this entirely in numpy. You could still use numpy to reduce the time spent grabbing the indices, though.
import numpy as np

data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]
n = 3

def get_new_data(data, n):
    new_data = np.zeros(len(data))
    non_zero = np.argwhere(data).ravel()
    idx = non_zero[0]
    new_data[idx] = 1
    idx += n
    for i in non_zero[1:]:
        if i > idx:
            new_data[i] = 1
            idx = i + n  # suppress the next n positions after this kept 1
    return new_data
get_new_data(data, n)
A function like this should give you a better run time since you are not looping over the whole array.
If this is still not optimal to you, you can look at using numba, which works very well with numpy and is relatively easy to use.
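For illustration, here is a minimal numba sketch of the same greedy scan (assuming numba is installed; get_new_data_nb is just an illustrative name, not from the original answer):

import numpy as np
from numba import njit

@njit
def get_new_data_nb(data, n):
    # Keep a 1 only if it lies beyond the suppression window
    # opened by the previously kept 1.
    new_data = np.zeros(len(data))
    idx = -1  # last index covered by the current suppression window
    for i in range(len(data)):
        if data[i] != 0 and i > idx:
            new_data[i] = 1
            idx = i + n
    return new_data

print(get_new_data_nb(np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1]), 3))
# [0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1.]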
You could do it like this:
N = 3
data = [0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 1, 1]
newData = data.copy()
i = 0
M = [0 for _ in range(N)]
while i < len(newData) - N:
    if newData[i] == 1:
        newData[i + 1:i + 1 + N] = M
        i += N
    i += 1
print(newData)
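For the sample data above this prints (a quick check, not shown in the original answer):
[0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]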
I have an array indexs. It's very long (>10k elements), and each int value is rather small (<100), e.g.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0]) # int index array
indexs_max = 4 # already known
Now I want to count the occurrences of each index value (e.g. 0 occurs 3 times, 1 occurs 2 times, ...) and get counts as np.array([3, 2, 1, 1, 1]). I have tested 4 methods as follows:
UPDATE: _test4 is @Ch3steR's solution:
indexs = np.random.randint(0, 10, (20000,))
indexs_max = 9
def _test1():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    for ind in indexs:
        counts[ind] += 1
    return counts

def _test2():
    counts = np.zeros((indexs_max + 1,), dtype=np.int32)
    uniq_vals, uniq_cnts = np.unique(indexs, return_counts=True)
    counts[uniq_vals] = uniq_cnts
    # this is because some value in range may be missing
    return counts

def _test3():
    therange = np.arange(0, indexs_max + 1)
    counts = np.sum(indexs[None] == therange[:, None], axis=1)
    return counts

def _test4():
    return np.bincount(indexs, minlength=indexs_max + 1)
Run 500 times each, their timings are respectively 32.50 s, 0.314 s, 0.141 s, and 0.0177 s. Although _test3 was the fastest of my original three methods, it uses a lot of additional memory.
So I'm asking for any better methods. Thank you :) (thanks @Ch3steR)
UPDATE: np.bincount seems optimal so far.
You can use np.bincount to count the occurrences in an array.
indexs = np.array([1, 4, 3, 0, 0, 1, 2, 0])
np.bincount(indexs)
# array([3, 2, 1, 1, 1])
# 0's 1's 2's 3's 4's count
There's a caveat to it: np.bincount(x).size == np.amax(x) + 1.
Example:
indexs = np.array([5, 10])
np.bincount(indexs)
# array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1])
# 5's 10's count
Here it would count occurrences of every value from 0 up to the max in the array; a workaround can be:
c = np.bincount(indexs) # indexs is [5, 10]
c = c[c>0]
# array([1, 1])
# 5's 10's count
If you have no missing values, i.e. from 0 to your_max, you can use np.bincount directly.
Another caveat:
From docs:
Count the number of occurrences of each value in an array of non-negative ints.
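So if the data could ever contain negative ints (not the case in this question), a minimal workaround sketch is to shift the values so the minimum maps to bin 0 (the example values here are hypothetical):

import numpy as np

vals = np.array([-2, 0, 3, -2, 1])
offset = vals.min()
counts = np.bincount(vals - offset)  # counts[k] is the count of value k + offset
# array([2, 0, 1, 1, 0, 1])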
With
input = [0,0,5,9,0,4,10,3,0]
as list
I need an output which keeps the two highest values in input while setting all other list elements to zero.
output = [0,0,0,9,0,0,10,0,0]
The closest I got:
from itertools import compress
import numpy as np
import operator
input = [0,0,5,9,0,4,10,3,0]
top_2_idx = np.argsort(input)[-2:]
input[top_2_idx[0]]
input[top_2_idx[1]]
Can you please help?
You can sort, find the two largest values, and then use a list comprehension:
input = [0,0,5,9,0,4,10,3,0]
*_, c1, c2 = sorted(input)
result = [0 if i not in {c1, c2} else i for i in input]
Output:
[0, 0, 0, 9, 0, 0, 10, 0, 0]
Not as pretty as Ajax's solution, but an O(n) solution and a little more dynamic:
from collections import deque
def zero_non_max(lst, keep_top_n):
    """
    Returns a list with all numbers zeroed out
    except the keep_top_n.
    >>> zero_non_max([0, 0, 5, 9, 0, 4, 10, 3, 0], 3)
    >>> [0, 0, 5, 9, 0, 0, 10, 0, 0]
    """
    lst = lst.copy()
    top_n = deque(maxlen=keep_top_n)
    for index, x in enumerate(lst):
        if len(top_n) < top_n.maxlen or x > top_n[-1][0]:
            top_n.append((x, index))
        lst[index] = 0
    for val, index in top_n:
        lst[index] = val
    return lst
lst = [0, 0, 5, 9, 0, 4, 10, 3, 0]
print(zero_non_max(lst, 2))
Output:
[0, 0, 0, 9, 0, 0, 10, 0, 0]
Pure numpy approach:
import numpy as np
arr = np.array([0, 0, 5, 9, 0, 4, 10, 3, 0])
top_2_idx = np.argsort(arr)[-2:]
np.put(arr, np.argwhere(~np.isin(arr, arr[top_2_idx])), 0)
print(arr)
The output:
[ 0 0 0 9 0 0 10 0 0]
See numpy.put.
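Note that np.put modifies arr in place: it writes 0 at every index whose value is not among the two largest values. If you need a fresh array instead, see the zeros_like pattern in the answer further below.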
It's possible to achieve this with a single list traversal, making the algorithm O(n):
First find the two highest values with a single traversal;
Then create a list of zeros and add in the found maxima.
Code
def two_max(lst):
    # Find two highest values in a single traversal
    max_i, max_j = 0, 1
    for i in range(len(lst)):
        _, max_i, max_j = sorted((max_i, max_j, i), key=lst.__getitem__)
    # Make a new list with zeros and replace both maxima
    new_lst = [0] * len(lst)
    new_lst[max_i], new_lst[max_j] = lst[max_i], lst[max_j]
    return new_lst

lst = [0, 0, 5, 9, 0, 4, 10, 3, 0]
print(two_max(lst))  # [0, 0, 0, 9, 0, 0, 10, 0, 0]
Note that if the maximum value in the list appears more than twice, only two of its occurrences are kept (with this implementation, the right-most ones).
As a sidenote, do not use names such as input in your code as this overshadows the built-in function of the same name.
Here is another numpy-based solution that avoids sorting the entire array, which would take O(n log n) time.
import numpy as np
arr = np.array([0,0,5,9,0,4,10,3,0])
arr[np.argpartition(arr,-2)[:-2]] = 0
If you want to create a new array as output:
result = np.zeros_like(arr)
idx = np.argpartition(arr,-2)[-2:]
result[idx] = arr[idx]
A corresponding Python-native solution is to use heapq.nlargest, which also avoids sorting the entire array.
import heapq
arr = [0,0,5,9,0,4,10,3,0]
l = len(arr)
idx1, idx2 = heapq.nlargest(2, range(l), key=arr.__getitem__)
result = [0] * l
result[idx1] = arr[idx1]
result[idx2] = arr[idx2]
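As a quick check (not shown in the original answer), printing the result reproduces the desired output:
print(result)  # [0, 0, 0, 9, 0, 0, 10, 0, 0]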
Consider a sequence of coin tosses: 1, 0, 0, 1, 0, 1 where tail = 0 and head = 1.
The desired output is the sequence: 0, 1, 2, 0, 1, 0
Each element of the output sequence counts the number of tails since the last head.
I have tried a naive method:
def timer(seq):
    if seq[0] == 1: time = [0]
    if seq[0] == 0: time = [1]
    for x in seq[1:]:
        if x == 0: time.append(time[-1] + 1)
        if x == 1: time.append(0)
    return time
Question: Is there a better method?
Using NumPy:
import numpy as np
seq = np.array([1,0,0,1,0,1,0,0,0,0,1,0])
arr = np.arange(len(seq))
result = arr - np.maximum.accumulate(arr * seq)
print(result)
yields
[0 1 2 0 1 0 1 2 3 4 0 1]
Why arr - np.maximum.accumulate(arr * seq)? The desired output seemed related to a simple progression of integers:
arr = np.arange(len(seq))
So the natural question is, if seq = np.array([1, 0, 0, 1, 0, 1]) and the expected result is expected = np.array([0, 1, 2, 0, 1, 0]), then what value of x makes
arr - x = expected
Since
In [220]: expected - arr
Out[220]: array([ 0, 0, 0, -3, -3, -5])
it looks like x (that is, arr - expected) should be the cumulative max of arr * seq:
In [234]: arr * seq
Out[234]: array([0, 0, 0, 3, 0, 5])
In [235]: np.maximum.accumulate(arr * seq)
Out[235]: array([0, 0, 0, 3, 3, 5])
Step 1: Invert l:
In [311]: l = [1, 0, 0, 1, 0, 1]
In [312]: out = [int(not i) for i in l]; out
Out[312]: [0, 1, 1, 0, 1, 0]
Step 2: List comp; add previous value to current value if current value is 1.
In [319]: [out[0]] + [x + y if y else y for x, y in zip(out[:-1], out[1:])]
Out[319]: [0, 1, 2, 0, 1, 0]
This gets rid of windy ifs by zipping adjacent elements.
Using itertools.accumulate:
>>> from itertools import accumulate
>>> a = [1, 0, 0, 1, 0, 1]
>>> b = [1 - x for x in a]
>>> list(accumulate(b, lambda total,e: total+1 if e==1 else 0))
[0, 1, 2, 0, 1, 0]
accumulate is only defined in Python 3. There's the equivalent Python code in the above documentation, though, if you want to use it in Python 2.
It's required to invert a because the first element returned by accumulate is the first list element, independently of the accumulator function:
>>> list(accumulate(a, lambda total,e: 0))
[1, 0, 0, 0, 0, 0]
The required output is an array with the same length as the input, and each of its values depends on the input. Therefore, just constructing the output array is already O(n), and every value of the input must be scanned as well. All of these operations are O(n), so it will not get any more efficient: the constants may differ, but your method is already O(n) and will not go any lower.
Using reduce (in Python 3, import it from functools):
time = reduce(lambda l, r: l + [(l[-1]+1)*(not r)], seq, [0])[1:]
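For example, applied to the question's sequence (a quick check, not part of the original answer):

from functools import reduce

seq = [1, 0, 0, 1, 0, 1]
print(reduce(lambda l, r: l + [(l[-1] + 1) * (not r)], seq, [0])[1:])
# [0, 1, 2, 0, 1, 0]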
I try to be clear in the following code and differ from the original in using an explicit accumulator.
>>> s = [1,0,0,1,0,1,0,0,0,0,1,0]
>>> def zero_run_length_or_zero(seq):
...     "Return the run length of zeroes so far in the sequence or zero"
...     accumulator, answer = 0, []
...     for item in seq:
...         accumulator = 0 if item == 1 else accumulator + 1
...         answer.append(accumulator)
...     return answer
...
>>> zero_run_length_or_zero(s)
[0, 1, 2, 0, 1, 0, 1, 2, 3, 4, 0, 1]
>>>
I am creating a list in Python 2.7.
The list consists of 1's and 0's; however, I need the 1's to appear randomly in the list and a set amount of times.
Here is a way I found of doing this, however it can take a long time to create the list:
import random

numcor = 0
while numcor != wordlen:  # wordlen being the set amount of times
    usewrong = []
    for l in list(mymap):
        if l == "L":  # L is in my map, telling how long the list needs to be
            use = random.choice((True, False))
            if use == True:
                usewrong.append(0)
            else:
                usewrong.append(1)
                numcor = numcor + 1
Is there a more efficient way of doing this?
A simpler way to create the list with 0's and 1's is:
>>> n, m = 5, 10
>>> [0]*n + [1]*m
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
where n is the number of 0s and m is the number of 1s
However, if you want the list to be shuffled into a random order, you may use random.shuffle():
>>> from random import shuffle
>>> mylist = [0]*n + [1]*m # n and m are from above example
>>> shuffle(mylist)
>>> mylist
[1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1]
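As a side note, random.shuffle() shuffles the list in place and returns None, which is why mylist is displayed on its own line afterwards.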
Here is a different approach:
from random import sample
# create a list full of 0's
ls = [0 for _ in range(10)]
# pick e.g. 3 non-duplicate random indexes in range(len(ls))
random_indexes = sample(range(len(ls)), 3)
# rebuild the list, placing a 1 at each of the 3 randomly chosen indexes
ls = [1 if i in random_indexes else x for i, x in enumerate(ls)]
The output will be:
>>> ls
[0, 1, 0, 1, 0, 0, 0, 0, 1, 0]