Find a sequence and remove previous entries - python

I have a question regarding the optimization of my code.
Signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
I have a pandas Series containing a periodic bit code. My aim is to remove all entries before the first occurrence of a certain sequence, e.g. 1,1,0,0. So in my example I would expect a reduced series like this:
[1, 1, 0, 0, 1, 1, 0, 0]
I already have a solution for 1, 1, but it's not very elegant and not easily adapted to my example: 1,1,0,0.
i = 0
bool = True
while bool:
    a = Signal.iloc[i]
    b = Signal.iloc[i + 1]
    if a == 1 and b == 1:
        bool = False
    else:
        i = i + 1
Signal = Signal[i:]
I appreciate your help.

We could take a rolling window view of the series with view_as_windows, check for equality with the sequence and find the first occurrence with argmax:
from skimage.util import view_as_windows
seq = [1, 1, 0, 0]
m = (view_as_windows(Signal.values, len(seq))==seq).all(1)
Signal[m.argmax():]
3 1
4 1
5 0
6 0
7 1
8 1
9 0
10 0
dtype: int64
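If scikit-image is not installed, the same idea can probably be done with NumPy alone. This is only a sketch: it assumes NumPy 1.20 or newer (for sliding_window_view), and the helper name trim_before_sequence and its choice to return an empty series when the sequence is missing are my own assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def trim_before_sequence(series, seq):
    """Hypothetical helper: drop everything before the first occurrence of seq."""
    values = series.to_numpy()
    if len(values) < len(seq):
        return series.iloc[0:0]
    # One row per window of len(seq) consecutive values
    windows = sliding_window_view(values, len(seq))
    matches = (windows == np.asarray(seq)).all(axis=1)
    if not matches.any():
        # argmax would return 0 here and silently keep the whole series,
        # so the no-match case is handled explicitly (assumed behaviour)
        return series.iloc[0:0]
    return series.iloc[matches.argmax():]


signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
trimmed = trim_before_sequence(signal, [1, 1, 0, 0])
print(trimmed.tolist())  # [1, 1, 0, 0, 1, 1, 0, 0]
```

The explicit no-match branch matters because argmax on an all-False mask returns 0, which would look like a match at the very start.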

I would go for a regex; an online regex tester helps to work out the pattern.
For example:
1,1,0,0,(.*\d)
would result in a group(1) output, that consists of all digits after the 1,1,0,0 pattern.
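A rough sketch of how the regex idea could be applied to the series itself. It assumes every value is a single digit, so each entry takes exactly two characters in the joined string; the variable names are only illustrative. Unlike the capture group above, it keeps the 1,1,0,0 pattern in the result, which is what the question asks for.

```python
import re

import pandas as pd

signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])

# Assumption: single-digit values only, so entry k starts at character 2 * k
text = ",".join(str(v) for v in signal)

match = re.search(r"1,1,0,0", text)
# Convert the character offset back to a position in the series;
# if nothing matches, start at the end so the result is empty
start = match.start() // 2 if match else len(signal)
trimmed = signal.iloc[start:]
print(trimmed.tolist())  # [1, 1, 0, 0, 1, 1, 0, 0]
```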

Related

Bit wise operator manipulation

I have a column within a dataframe that consists of either 1, 0 or -1.
For example:
<0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0>
How can I create a new column in that dataframe that holds a run of 1s from the first 1 up to and including the next -1, and then goes back to 0 until the next 1?
In this example, it would be:
<0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1, 1,0,0>
Essentially I’m trying to create a trading strategy where I buy when the price is >1.25 and sell when it goes below 0.5. 1 represents buy and -1 represents sell. If I can get it into the form above I can easily implement it.
Not sure how your data is stored, but an algorithm similar to the following might work without the use of bitwise operators:
x = [0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0]
newcol = []
flag = 0
for char in x:
    # Switch on at a buy signal
    if char == 1 and flag == 0:
        flag = 1
    newcol.append(flag)
    # Switch off after the sell signal has been recorded
    if char == -1 and flag == 1:
        flag = 0
print(newcol)
Seems like a good use case for pandas:
import pandas as pd
s = pd.Series([0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0])
s2 = s.groupby(s[::-1].eq(-1).cumsum()).cummax()
print(s2.to_list())
Output:
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
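To see why the one-liner works, it may help to look at the intermediate group keys on a shorter series. This is just an illustration with made-up data; the names keys and result are mine.

```python
import pandas as pd

s = pd.Series([0, 1, 0, -1, 0, 1, -1, 0])

# Count the sell markers (-1) from the end of the series. Every row up to
# and including a given -1 gets the same key, so each -1 closes its group.
keys = s[::-1].eq(-1).cumsum()
print(keys.sort_index().tolist())  # [2, 2, 2, 2, 1, 1, 1, 0]

# groupby aligns the keys on the index, so the reversed order does not matter.
# Within each group, cummax turns everything after the first 1 into 1.
result = s.groupby(keys).cummax()
print(result.tolist())  # [0, 1, 1, 1, 0, 1, 1, 0]
```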

Pandas - Row number since last greater than 0 value

Let's say I have a Pandas series like so:
import pandas as pd
pd.Series([1, 0, 0, 1, 0, 0, 0], name='series')
How would I add a column with a row count since the last >0 number, like so:
pd.DataFrame({
'series': [1, 0, 0, 1, 0, 0, 0],
'row_num': [0, 1, 2, 0, 1, 2, 3]
})
Try this:
s.groupby(s.cumsum()).cumcount()
Output:
0 0
1 1
2 2
3 0
4 1
5 2
6 3
dtype: int64
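The cumsum trick relies on the series holding only 0s and positive numbers. If negative values could show up, a hedged variant is to group on a boolean "greater than 0" mask instead. The sketch below assumes that this is what "last >0 number" means, and also attaches the result as the row_num column from the question.

```python
import pandas as pd

s = pd.Series([1, 0, 0, 1, 0, 0, 0], name='series')

# s.gt(0).cumsum() increases by one at every value > 0,
# so each positive value starts a new group
group_id = s.gt(0).cumsum()

df = s.to_frame()
# cumcount numbers the rows inside each group starting from 0
df['row_num'] = s.groupby(group_id).cumcount()
print(df['row_num'].tolist())  # [0, 1, 2, 0, 1, 2, 3]
```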
Numpy
Find the places where the series/array is greater than 0
Calculate the differences from one place to the next
Subtract those values from a sequence
import numpy as np
i = np.flatnonzero(s)
n = len(s)
delta = np.diff(np.append(i, n))
r = np.arange(n)
r - r[i].repeat(delta)
array([0, 1, 2, 0, 1, 2, 3])

Compute the length of consecutive true values in a list

Essentially this problem can be split into two parts. I have a set of binary values that indicate whether a given signal is present or not. Given that each value also corresponds to a unit of time (in this case minutes), I am trying to determine how long the signal lasts on average each time it occurs within the period I'm analyzing. For example, if I have the following list:
[0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
I can see that the signal occurs 3 separate times for variable lengths of time (i.e. in the first case for 3 minutes). If I want to calculate the average length of time for each occurrence however I need an indication of how many independent instances of the signal exist (i.e. 3). I have tried various index based strategies such as:
arb_ops.index(1)
to find the next occurrence of true values and correspondingly finding the next occurrence of 0 to find the length but am having trouble contextualizing this into a recursive function for the entire array.
You could use itertools.groupby() to group consecutive equal elements. To calculate a group's length convert the iterator to a list and apply len() to it:
>>> from itertools import groupby
>>> lst = [0 ,0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0 ,1, 1, 1, 1, 0]
>>> for k, g in groupby(lst):
...     g = list(g)
...     print(k, g, len(g))
...
0 [0, 0, 0] 3
1 [1, 1, 1] 3
0 [0, 0] 2
1 [1] 1
0 [0, 0, 0] 3
1 [1, 1, 1, 1] 4
0 [0] 1
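Building on the grouping above, the average the question asks for could be computed roughly like this. It is a sketch; the names run_lengths and average are mine, and returning 0.0 when there is no signal at all is an assumption.

```python
from itertools import groupby

lst = [0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0]

# Keep only the groups of 1s and record how long each one is
run_lengths = [sum(1 for _ in group) for key, group in groupby(lst) if key == 1]
print(run_lengths)  # [3, 1, 4]

# Average duration per occurrence; 0.0 if the signal never occurs (assumed)
average = sum(run_lengths) / len(run_lengths) if run_lengths else 0.0
print(average)
```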
Another option may be MaskedArray.count, which counts non-masked elements of an array along a given axis:
import numpy.ma as ma
a = ma.arange(6).reshape((2, 3))
a[1, :] = ma.masked
a
masked_array(data =
[[0 1 2]
[-- -- --]],
mask =
[[False False False]
[ True True True]],
fill_value = 999999)
a.count()
3
You can extend Masked Arrays quite far...
@eugene-yarmash's solution with groupby is decent. However, if you want a solution that requires no import and does the grouping itself (for learning purposes), you could try this:
>>> l = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
>>> def size(xs):
...     sz = 0
...     for x in xs:
...         if x == 0 and sz > 0:
...             yield sz
...             sz = 0
...         if x == 1:
...             sz += 1
...     if sz > 0:
...         yield sz
...
>>> list(size(l))
[3, 1, 4]
I think this problem is actually pretty simple--you know you have a new signal if you see a value is 1, and the previous value is 0.
The code I provided is kind of long, but super simple, and done without imports.
signal = [0,0,0,1,1,1,0,0,1,0,0,0,1,1,1,1,0]
def find_number_of_signals(signal):
    signal_counter = 0
    signal_duration = 0
    for index, value in enumerate(signal):
        if value == 1:
            signal_duration += 1
            # A new signal starts at the first element or right after a 0
            if index == 0 or signal[index - 1] == 0:
                signal_counter += 1
    print(signal_counter)
    print(signal_duration)
    # Guard against division by zero when no signal is present
    print(signal_duration / signal_counter if signal_counter else 0.0)
find_number_of_signals(signal)
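The same "a 1 preceded by a 0" rule can also be sketched with NumPy if the data is already in an array. This assumes NumPy 1.16 or newer for the prepend argument of np.diff; the variable names are only illustrative.

```python
import numpy as np

signal = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0])

# A run starts wherever the value rises from 0 to 1;
# prepend=0 makes a run that begins at index 0 count as well
run_starts = np.diff(signal, prepend=0) == 1

n_runs = int(run_starts.sum())
total_minutes = int(signal.sum())
# 0.0 when there is no signal at all (assumed convention)
average = total_minutes / n_runs if n_runs else 0.0
print(n_runs, total_minutes)  # 3 8
```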

In order to generate all combinations of 1's and 0's we use a simple binary table. How can I easily create this binary table in an array?

For example the binary table for 3 bit:
0 0 0
0 0 1
0 1 0
1 1 1
1 0 0
1 0 1
And I want to store this into an n*n*2 array so it would be:
0 0 0
0 0 1
0 1 0
1 1 1
1 0 0
1 0 1
For generating the combinations automatically, you can use itertools.product from the standard library, which generates the Cartesian product of the input iterables, i.e. every possible combination of their elements. The repeat argument comes in handy because all of our sequences here are the same range.
from itertools import product
x = [i for i in product(range(2), repeat=3)]
Now if we want an array instead of a list of tuples from that, we can just pass this to numpy.array.
import numpy as np
x = np.array(x)
# [[0 0 0]
# [0 0 1]
# [0 1 0]
# [0 1 1]
# [1 0 0]
# [1 0 1]
# [1 1 0]
# [1 1 1]]
If you want all elements in a single list, so you could index them with a single index, you could chain the iterable:
from itertools import chain, product
x = list(chain.from_iterable(product(range(2), repeat=3)))
result: [0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1]
Most people would expect 2^n x n as in
np.c_[tuple(i.ravel() for i in np.mgrid[:2,:2,:2])]
# array([[0, 0, 0],
# [0, 0, 1],
# [0, 1, 0],
# [0, 1, 1],
# [1, 0, 0],
# [1, 0, 1],
# [1, 1, 0],
# [1, 1, 1]])
Explanation: np.mgrid as used here creates the coordinates of the corners of a unit cube which happen to be all combinations of 0 and 1. The individual coordinates are then ravelled and joined as columns by np.c_
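Another way to get the same 2^n x n table, sketched here as an alternative rather than as part of the answer above, is to read off the bits of the numbers 0 to 2^n - 1 with shifts. The function name binary_table is hypothetical.

```python
import numpy as np


def binary_table(n_bits):
    """Hypothetical helper: all n_bits-wide bit patterns, one per row."""
    # Column vector of the numbers 0 .. 2**n_bits - 1
    numbers = np.arange(2 ** n_bits)[:, None]
    # Shift amounts from the most significant bit down to bit 0
    shifts = np.arange(n_bits - 1, -1, -1)
    # Broadcasting gives one column per bit position
    return (numbers >> shifts) & 1


table = binary_table(3)
print(table.shape)  # (8, 3)
print(table[5].tolist())  # [1, 0, 1]
```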
Here's a recursive, pure Python (no libraries) version of it:
def allBinaryPossiblities(maxLength, s=""):
    if len(s) == maxLength:
        return s
    else:
        temp = allBinaryPossiblities(maxLength, s + "0") + "\n"
        temp += allBinaryPossiblities(maxLength, s + "1")
        return temp
print(allBinaryPossiblities(3))
It prints all possible combinations:
000
001
010
011
100
101
110
111

Counting same elements in an array and create dictionary

This question might be too noob, but I was still not able to figure out how to do it properly.
I have a given array [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3] (arbitrary elements from 0-5) and I want to count how often each run length of consecutive zeros occurs:
1 times 6 zeros in a row
1 times 4 zeros in a row
2 times 1 zero in a row
=> (2,0,0,1,0,1)
So the dictionary uses the number of zeros in a row as the key and the number of such runs as the value.
The final array consists of 500+ million values that are unsorted like the one above.
This should get you what you want:
import numpy as np
a = [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3]
# Find indexes of all zeroes
index_zeroes = np.where(np.array(a) == 0)[0]
# Find discontinuities in indexes, denoting separated groups of zeroes
# Note: Adding True at the end because otherwise the last zero is ignored
index_zeroes_disc = np.where(np.hstack((np.diff(index_zeroes) != 1, True)))[0]
# Count the number of zeroes in each group
# Note: Adding 0 at the start so first group of zeroes is counted
count_zeroes = np.diff(np.hstack((0, index_zeroes_disc + 1)))
# Count the number of groups with the same number of zeroes
groups_of_n_zeroes = {}
for count in count_zeroes:
    # dict.has_key() no longer exists in Python 3; use the in operator
    if count in groups_of_n_zeroes:
        groups_of_n_zeroes[count] += 1
    else:
        groups_of_n_zeroes[count] = 1
groups_of_n_zeroes holds:
{1: 2, 4: 1, 6: 1}
Similar to @fgb's, but with a more numpythonic handling of the counting of the occurrences:
items = np.array([0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3])
group_end_idx = np.concatenate(([-1],
                                np.nonzero(np.diff(items == 0))[0],
                                [len(items)-1]))
group_len = np.diff(group_end_idx)
zero_lens = group_len[::2] if items[0] == 0 else group_len[1::2]
counts = np.bincount(zero_lens)
>>> counts[1:]
array([2, 0, 0, 1, 0, 1], dtype=int64)
This seems awfully complicated, but I can't seem to find anything better:
>>> l = [0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 2, 1, 0, 2, 3]
>>> import itertools
>>> seq = [len(list(j)) for i, j in itertools.groupby(l) if i == 0]
>>> seq
[6, 4, 1, 1]
>>> import collections
>>> counter = collections.Counter(seq)
>>> [counter.get(i, 0) for i in range(1, max(counter) + 1)]
[2, 0, 0, 1, 0, 1]
