Compute the length of consecutive true values in a list - python

Essentially this problem can be split into two parts. I have a set of binary values that indicate whether a given signal is present or not. Given that the each value also corresponds to a unit of time (in this case minutes) I am trying to determine how long the signal exists on average given its occurrence within the overall list of values throughout the period I'm analyzing. For example, if I have the following list:
[0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
I can see that the signal occurs 3 separate times for variable lengths of time (i.e. in the first case for 3 minutes). If I want to calculate the average length of time for each occurrence however I need an indication of how many independent instances of the signal exist (i.e. 3). I have tried various index based strategies such as:
arb_ops.index(1)
to find the next occurrence of true values and correspondingly finding the next occurrence of 0 to find the length but am having trouble contextualizing this into a recursive function for the entire array.

You could use itertools.groupby() to group consecutive equal elements. To calculate a group's length convert the iterator to a list and apply len() to it:
>>> from itertools import groupby
>>> lst = [0 ,0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0 ,1, 1, 1, 1, 0]
>>> for k, g in groupby(lst):
... g = list(g)
... print(k, g, len(g))
...
0 [0, 0, 0] 3
1 [1, 1, 1] 3
0 [0, 0] 2
1 [1] 1
0 [0, 0, 0] 3
1 [1, 1, 1, 1] 4
0 [0] 1

Another option may be MaskedArray.count, which counts non-masked elements of an array along a given axis:
import numpy.ma as ma
a = ma.arange(6).reshape((2, 3))
a[1, :] = ma.masked
a
masked_array(data =
[[0 1 2]
[-- -- --]],
mask =
[[False False False]
[ True True True]],
fill_value = 999999)
a.count()
3
You can extend Masked Arrays quite far...

#eugene-yarmash solution with the groupby is decent. However, if you wanted to go with a solution that requires no import, and where you do the grouping yourself --for learning purposes-- you could try this::
>>> l = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
>>> def size(xs):
... sz = 0
... for x in xs:
... if x == 0 and sz > 0:
... yield sz
... sz = 0
... if x == 1:
... sz += 1
... if sz > 0:
... yield sz
...
>>> list(size(l))
[3, 1, 4]

I think this problem is actually pretty simple--you know you have a new signal if you see a value is 1, and the previous value is 0.
The code I provided is kind of long, but super simple, and done without imports.
signal = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
def find_number_of_signals(signal):
index = 0
signal_counter = 0
signal_duration = 0
for i in range(len(signal) - 1):
if signal[index] == 1:
signal_duration += 1.0
if signal[index- 1] == 0:
signal_counter += 1.0
index += 1
print signal_counter
print signal_duration
print float(signal_duration / signal_counter)
find_number_of_signals(signal)

Related

Find a sequence and remove previous entries

i have a question regarding the optimization of my code.
Signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
I have a pandas series containing periodic bitcode. My aim is remove all entries that start before a certain sequence e.g. 1,1,0,0. So in my example I would expect a reduced series like that:
[1, 1, 0, 0, 1, 1, 0, 0]
I already have a solution for 1, 1 but its not very elegant and not that easily modified for my example: 1,1,0,0.
i = 0
bool = True
while bool:
a = Signal.iloc[i]
b = Signal.iloc[i + 1]
if a == 1 and b == 1:
bool = False
else:
i = i + 1
Signal = Signal[i:]
I appreciate your help.
We could take a rolling window view of the series with view_as_windows, check for equality with the sequence and find the first occurrence with argmax:
from skimage.util import view_as_windows
seq = [1, 1, 0, 0]
m = (view_as_windows(Signal.values, len(seq))==seq).all(1)
Signal[m.argmax():]
3 1
4 1
5 0
6 0
7 1
8 1
9 0
10 0
dtype: int64
I would go for regex - use a web interface to identify the pattern.
for example:
1,1,0,0,(.*\d)
would result in a group(1) output, that consists of all digits after the 1,1,0,0 pattern.

How can I get exactly the same amount elements replaced in numpy 2D matrix?

I got a symmetrical 2D numpy matrix, it only contains ones and zeros and diagonal elements are always 0.
I want to replace part of the elements from one to zero, and the result need to keep symmetrical too. How many elements will be selected depends on the parameterreplace_rate.
Since it's a symmetrical matrix, I take half of the matrix and select the elements(those values are 1) randomly, change them from 1 to 0. And then with a mirror operation, make sure the whole matrix are still symmetrical.
For example
com = np.array ([[0, 1, 1, 1, 1],
[1, 0, 1, 1, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 0, 1],
[1, 1, 1, 1, 0]])
replace_rate = 0.1
com = np.triu(com)
mask = np.random.choice([0,1],size=(com.shape),p=((1-replace_rate),replace_rate)).astype(np.bool)
r1 = np.random.rand(*com.shape)
com[mask] = r1[mask]
com += com.T - np.diag(com.diagonal())
com is a (5,5) symmetrical matrix, and 10% of elements (only include those values are 1, the diagonal elements are excluded) will be replaced to 0 randomly.
The question is , how can I make sure the amount of elements changed keep the same each time?
Keep the same replace_rate = 0.1, sometimes I will get result like:
com = np.array([[0 1 1 1 1]
[1 0 1 1 1]
[1 1 0 1 1]
[1 1 1 0 1]
[1 1 1 1 0]])
Actually no one changed this time, and if I repeat it, I got 2 elements changed :
com = np.array([[0 1 1 1 1]
[1 0 1 1 1]
[1 1 0 1 0]
[1 1 1 0 1]
[1 1 0 1 0]])
I want to know how to fix the amount of elements changed with the same replace_rate?
Thanks in advance!!
How about something like this:
def make_transform(m, replace_rate):
changed = [] # keep track of indices we already changed
def get_random():
# Get a random pair of indices which are not equal (i.e. not on the diagonal)
c1, c2 = random.choices(range(len(com)), k=2)
if c1 == c2 or (c1,c2) in changed or (c2,c1) in changed:
return get_random() # Recurse until we find an i,j pair : i!=j , that hasnt already been changed
else:
changed.append((c1,c2))
return c1, c2
n_changes = int(m.shape[0]**2 * replace_rate) # the number of changes to make
print(n_changes)
for _ in range(n_changes):
i, j = get_random() # Get an valid index
m[i][j] = m[j][i] = 0
return m
This is the solution I suggest:
def rand_zero(mat, replace_rate):
triu_mat = np.triu(mat)
_ind = np.where(triu_mat != 0) # gets indices of non-zero elements, not just non-diagonals
ind = [x for x in zip(*_ind)]
chng = np.random.choice(range(len(ind)), # select some indices, at rate 'replace_rate'
size = int(replace_rate*mat.size),
replace = False) # do not select duplicates
mod_mat = triu_mat
for c in chng:
mod_mat[ind[c]] = 0
mod_mat = mod_mat + mod_mat.T
return mod_mat
I use int() to truncate to an integer in size, but you can use round() if that's what you desire.
Hope this gives consistent results!

I want to take the XOR of all the elements of 1 list with another. How do I do it? [duplicate]

This question already has answers here:
How do you get the logical xor of two variables in Python?
(28 answers)
Closed 3 years ago.
I have a bunch of lists in the form of say [0,0,1,0,1...], and I want to take the XOR of 2 lists and give the output as a list.
Like:
[ 0, 0, 1 ] XOR [ 0, 1, 0 ] -> [ 0, 1, 1 ]
res = []
tmp = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
tmp = [i[index] ^ j[index] for index in range(len(i))]
res.append(temp)
The size of each of my lists / vectors is around 3500 elements, so I need something to save time, since this piece of code is taking more than 20 mins to run.
I have 3085 lists, each of which need an XOR operation with 4089 other lists.
How do I do this without iterating through each list explicitly?
Use map:
answer = list(map(operator.xor, lst1, lst2)).
or zip:
answer = [x ^ y for x,y in zip(lst1, lst2)]
If you need something faster, consider using NumPy instead of Python lists to hold your data.
Assuming a and b are the same size you can use the xor operation (i.e. ^) with simple list indexing:
a = [0, 0, 1]
b = [0, 1, 1]
c = [a[index] ^ b[index] for index in range(len(a))]
print(c) # [0, 1, 0]
or you can use zip with the xor:
a = [0, 0, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip(a, b)]
print(c) # [0, 1, 0]
zip will only go to the shortest list (if they are not the same size). If they are not the same size and you want to go to the longer list you can use zip_longest:
from itertools import zip_longest
a = [0, 0, 1, 1]
b = [0, 1, 1]
c = [x ^ y for x, y in zip_longest(a, b, fillvalue=0)]
print(c) # [0, 1, 0, 1]
Using numpy you should have some performance gains, the function you need is bitwise_xor, like so:
import numpy as np
results = []
for i in Employee_Specific_Vocabulary_Dict['Binary Vector']:
for j in Course_Specific_Vocabulary_Dict['Binary Vector']:
results.append(np.bitwise_xor(i, j))
A proof of concept:
a = [1,0,0,1,1]
b = [1,1,0,0,1]
x = np.bitwise_xor(a,b)
print("a\tb\tres")
for i in range(len(a)):
print("{}\t{}\t{}".format(a[i], b[i], x[i]))
output:
a b x
1 1 0
0 1 1
0 0 0
1 0 1
1 1 0
Edit
Note that if your arrays have the same size, you can simply do one operation and the bitwise_xor will still work, so:
a = [[1,1,0], [0,0,1]]
b = [[0,1,0], [1,0,1]]
res = np.bitwise_xor(a, b)
will still work, and you'll have:
res: [[1, 0, 0], [1, 0, 0]]
In your case, a workaround would possibily be:
results = []
n = len(Course_Specific_Vocabulary_Dict['Binary Vector'])
for a in Employee_Specific_Vocabulary_Dict['Binary Vector']:
# Get same size array w.r.t Course_Specific_Vocabulary_Dict["Binary Vector]
repeated_a = np.repeat([a], n, axis=0)
results.append(np.bitwise_xor(repeated_a, Course_Specific_Vocabulary_Dict['Binary Vector']))
However I don't know if that would actually improve performance, it is to be checked; for sure it will require some more memory.

find the number of separated pairs in array

Good Morning,
I have a numpy array like:
[0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
and I would like to find the number of separated pairs of 1.
Three (or more) consecutive 1s count also as pair, i.e.: in this example the returned number should be 3.
What is the best technique to achieve this?
Thanks a lot!
using itertools.groupby,
k hold the unique key 0/1 based list lst below, g hold the correspond group iterator for the unique key k
import itertools
target = 1
lst = [0,1,1,0,1,1,1,0,0,1,0,1,1,0]
pair_count = 0
for k,g in itertools.groupby(lst):
if k==target and len(list(g))>1: # match target and more than 1 count as pair
pair_count += 1
# pair_count = 3
You can easily implement it yourself, see my code below:
l = [0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0]
# Create flag to skip all elements that have >2 1 in sequence
added = False
pairs_counter = 0
for i in range(1, len(l)):
rem = l[i - 1]
if l[i] == 1 and rem == 1 and not added:
added = True
pairs_counter +=1
if l[i] == 0:
added = False
print (pairs_counter)
Complexity of this method is O(n)
I would go for something like this:
a = [0 1 1 0 1 1 1 0 0 1 0 1 1 0]
sum([
a[i-2] == a[i-1] == 1 and a[i] == 0
for i in xrange(2,len(a))
]) + (len(a) > 2 and a[-1] == a[-2] == 1)
It just keeps adding Trues and Falses together. I guess some people will find it ugly, I think it is OK.
It should be noted though, that if the list is really large, this is not a good option, since it creates the list of bools in memory. It would be easy to avoid that.
Use itertools.groupby and sum
sum(1 for target, group_count in itertools.groupby(lst)
if target == 1 and len(list(group_count)) >= 2)

Efficient way of re-numbering elements in an array

I am reasonably new to python and am trying to implement a genetic algorithm, but need some assistance with the code for one of the operations.
I have formulated the problem this way:
each individual I is represented by a string of M integers
each element e in I takes a value from 0 to N
every number from 0 - N must appear in I at least once
the value of e is not important, so long as each uniquely valued element takes the same unique value (think of them as class labels)
e is less than or equal to N
N can be different for each I
after applying the crossover operation i can potentially generate children which violate one or more of these constraints, so i need to find a way to re-number the elements so that they retain their properties, but fit with the constraints.
for example:
parent_1 (N=5): [1 3 5 4 2 1|0 0 5 2]
parent_2 (N=3): [2 0 1 3 0 1|0 2 1 3]
*** crossover applied at "|" ***
child_1: [1 3 5 4 2 1 0 2 1 3]
child_2: [2 0 1 3 0 1 0 0 5 2]
child_1 obviously still satisfies all of the constraints, as N = 5 and all values 0-5 appear at least once in the array.
The problem lies with child 2 - if we use the max(child_2) way of calculating N we get a value of 5, but if we count the number of unique values then N = 4, which is what the value for N should be. What I am asking (in a very long winded way, granted) is what is a good, pythonic way of doing this:
child_2: [2 0 1 3 0 1 0 0 5 2]
*** some python magic ***
child_2': [2 0 1 3 0 1 0 0 4 2]
*or*
child_2'': [0 1 2 3 1 2 1 1 4 0]
child_2'' is there to illustrate that the values themselves dont matter, so long as each element of a unique value maps to the same value, the constraints are satisfied.
here is what i have tried so far:
value_map = []
for el in child:
if el not in value_map:
value_map.append(el)
for ii in range(0,len(child)):
child[ii] = value_map.index(child[ii])
this approach works and returns a result similar to child_2'', but i can't imagine that it is very efficient in the way it iterates over the string twice, so i was wondering if anyone has any suggestions of how to make it better.
thanks, and sorry for such a long post for such a simple question!
You will need to iterates the list more than once, I don't think there's any way around this. After all, you first have to determine the number of different elements (first pass) before you can start changing elements (second pass). Note, however, that depending on the number of different elements you might have up to O(n^2) due to the repetitive calls to index and not in, which have O(n) on a list.
Alternatively, you could use a dict instead of a list for your value_map. A dictionary has much faster lookup than a list, so this way, the complexity should indeed be on the order of O(n). You can do this using (1) a dictionary comprehension to determine the mapping of old to new values, and (2) a list comprehension for creating the updated child.
value_map = {el: i for i, el in enumerate(set(child))}
child2 = [value_map[el] for el in child]
Or change the child in-place using a for loop.
for i, el in enumerate(child):
child[i] = value_map[el]
You can do it with a single loop like this:
value_map = []
result = []
for el in child:
if el not in value_map:
value_map.append(el)
result.append(value_map.index(el))
One solution I can think of is:
Determine the value of N and determine unused integers. (this forces you to iterate over the array once)
Go through the array and each time you meet a number superior to N, map it to an unused integer.
This forces you to go through the arrays twice, but it should be faster than your example (that forces you to go through the value_map at each element of the array at each iteration)
child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
used = set(child)
N = len(used) - 1
unused = set(xrange(N+1)) - used
value_map = dict()
for i, e in enumerate(child):
if e <= N:
continue
if e not in value_map:
value_map[e] = unused.pop()
child[i] = value_map[e]
print child # [2, 0, 1, 3, 0, 1, 0, 0, 4, 2]
I like #Selçuk Cihan answer. It can also be done in place.
>>> child = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
>>>
>>> value_map = []
>>> for i in range(len(child)):
... el = child[i]
... if el not in value_map:
... value_map.append(el)
... child[i] = value_map.index(el)
...
>>> child
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
I believe that this works, although I didn't test it for more than the single case that is given in the question.
The only thing that bothers me is that value_map appears three times in the code...
def renumber(individual):
"""
>>> renumber([2, 0, 1, 3, 0, 1, 0, 0, 4, 2])
[0, 1, 2, 3, 1, 2, 1, 1, 4, 0]
"""
value_map = {}
return [value_map.setdefault(e, len(value_map)) for e in individual]
Here is a fast solution, which iterates the list only once.
a = [2, 0, 1, 3, 0, 1, 0, 0, 5, 2]
b = [-1]*len(a)
j = 0
for i in range(len(a)):
if b[a[i]] == -1:
b[a[i]] = j
a[i] = j
j += 1
else:
a[i] = b[a[i]]
print(a) # [0, 1, 2, 3, 1, 2, 1, 1, 4, 0]

Categories