I have a column within a dataframe that consists of either 1, 0 or -1.
For example:
<0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0>
How can I create a new column in that DataFrame that is 1 from the first 1 up to and including the next -1, and then drops back to 0 until the next 1?
In this example, it would be:
<0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1,0,0,1,1,1,1, 1,0,0>
Essentially I’m trying to create a trading strategy where I buy when the price is >1.25 and sell when it goes below 0.5. 1 represents buy and -1 represents sell. If I can get it into the form above, I can easily implement it.
Not sure how your data is stored, but an algorithm similar to the following might work without the use of bitwise operators:
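x = [0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0]
newcol = []
flag = 0  # 1 while a position is held, 0 otherwise
for char in x:
    if char == 1 and flag == 0:
        flag = 1  # a buy signal opens the position
    newcol.append(flag)
    if char == -1 and flag == 1:
        flag = 0  # a sell signal closes it, after the current row is marked 1
print(newcol)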
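To fill a new DataFrame column, the same flag-based loop can be wrapped in a small helper. A minimal sketch; `hold_positions`, `df`, `signal` and `position` are hypothetical names, not taken from the question:

```python
import pandas as pd


def hold_positions(signals):
    """Return 1 from each buy (1) through the next sell (-1), else 0."""
    positions = []
    holding = 0
    for value in signals:
        if value == 1 and holding == 0:
            holding = 1  # buy signal opens a position
        positions.append(holding)
        if value == -1 and holding == 1:
            holding = 0  # sell signal closes it after the current row is marked
    return positions


df = pd.DataFrame({"signal": [0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, -1, 0, 0, 1, 0, 0, -1,
                              0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, -1, 0, 0]})
df["position"] = hold_positions(df["signal"])
print(df["position"].tolist())
```

A plain Python loop like this is easy to follow but runs row by row, so for long columns a vectorized pandas expression is likely to be faster.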
Seems like a good use case for pandas:
import pandas as pd
s = pd.Series([0,0,0,1,0,0,1,0,0,1,0,-1,0,0,1,0,0,-1,0,0,1,0,0,1,0,0,1,0,0,0,-1,0,0])
s2 = s.groupby(s[::-1].eq(-1).cumsum()).cummax()
print(s2.to_list())
Output:
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
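To see why the one-liner works, it can help to look at the grouping key on its own. Counting -1 values on the reversed series gives every row the number of sells at or after it, so each group runs from just after one -1 up to and including the next -1; `cummax` then turns everything from the first 1 in a group onward into 1. A sketch that keeps the intermediate key visible (the names `key`, `position` and the DataFrame columns are illustrative):

```python
import pandas as pd

s = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, -1, 0, 0, 1, 0, 0, -1,
               0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, -1, 0, 0])

# Number of -1 values at or after each row. It is computed on the reversed
# series and reversed back only for display; groupby aligns on the index anyway.
key = s[::-1].eq(-1).cumsum()[::-1]

# Within each group, cummax keeps the 1 until the group's closing -1.
position = s.groupby(key).cummax()

df = pd.DataFrame({"signal": s, "key": key, "position": position})
print(df.head(15))
```

One caveat, based on how `cummax` behaves: if two -1 values can appear back to back, the second one forms a group of its own and stays -1, so a final `.clip(lower=0)` may be needed.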
Related
I'm trying to implement an algorithm on a boolean list. It looks like:
price_list = [0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
Let's say it captures the daily price change of a stock, where 0 means a decrease and 1 means an increase in price.
Consider a block of 4 days. Starting on the first day, if the price drops (value 0), return -1 and bid 1 on the second day. If the price increases (value 1), return 1 and bid 2 on the second day.
In the second day, if the price drops, return -2 and bid 1 on the third day. If it increases, return 2 and bid 3 on the third day.
On the third day, if the price drops, return -3 and bid 1 on the fourth day. If it increases, return 3 and bid 4 on the fourth day.
On the fourth day, if the price drops, return -4. If it increases, return 4. Either way, bid 1 on the next day; at that point a cycle of bidding is complete. The catch is that a cycle can only be completed, or converged, after 3 consecutive days of price increase, so that regardless of the result on day 4 we go back to bidding 1. Otherwise, the bid for the next day depends on the previous day(s).
Applying this rule for the above list, the return list will be:
price_list = [ 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
return_list = [-1, 1, 3, -2, 1, -3, 1, 3, 2, 4, -1, -1, -1, 1, 3, -2, -1, 1]
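Working through the example by hand suggests that `return_list` follows a 1-3-2-4 stake sequence (bid 1, then 3, then 2, then 4 after consecutive increases, and back to 1 after any drop or after the fourth step), rather than the 1-2-3-4 wording above. A minimal sketch of a stake-progression simulator with the sequence passed in as a parameter; `simulate_returns` and `stakes` are hypothetical names:

```python
def simulate_returns(price_list, stakes=(1, 3, 2, 4)):
    """Return +stake on an increase (1) and -stake on a drop (0).

    After an increase, move to the next stake; after a drop, or after the
    last stake in the cycle, restart from the first stake.
    """
    returns = []
    step = 0  # position within the stake cycle
    for change in price_list:
        stake = stakes[step]
        if change == 1:
            returns.append(stake)
            step = (step + 1) % len(stakes)  # wraps to 0 after the last stake
        else:
            returns.append(-stake)
            step = 0
    return returns


price_list = [0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
print(simulate_returns(price_list))
```

Passing `stakes=(1, 2, 3, 4)` gives the progression as worded in the question instead. With the DataFrame from the edit below, usage would be something like `df_temp['return'] = simulate_returns(df_temp['price'])`; since each bid depends on the previous outcome, a sequential loop is a natural fit.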
EDIT: I got a few comments asking for my code, presumably to see an attempt. It's still unfinished, though (in fact I think I'm going nowhere with this):
import numpy as np
import pandas as pd
df_temp = pd.DataFrame({'price' : np.random.randint(0, 2, 1000) })
df_temp['block'] = df_temp['price'].groupby(df_temp['price'].diff().ne(0).cumsum()).transform('count')
df_temp['block_shift'] = df_temp['block'].shift(1)
df_temp['mask'] = df_temp['block'] == df_temp['block_shift']
df_temp['cs'] = df_temp['mask'].cumsum() - df_temp['mask'].cumsum().where(~df_temp['mask']).ffill().fillna(0) + 1
df_temp['block_2'] = df_temp['block']
df_temp.loc[(df_temp['block'] > 4) & (df_temp['cs'] <= 4) , 'block_2'] = 4
df_temp.loc[(df_temp['cs'] > 4) & (df_temp['cs'] % 4 != 0) , 'block_2'] = df_temp['cs'] % 4
df_temp.loc[(df_temp['cs'] > 4) & (df_temp['cs'] % 4 == 0) , 'block_2'] = 4
I have a question about optimizing my code.
import pandas as pd
Signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
I have a pandas Series containing a periodic bit pattern. My aim is to remove all entries before the first occurrence of a certain sequence, e.g. 1, 1, 0, 0. So in my example I would expect a reduced series like this:
[1, 1, 0, 0, 1, 1, 0, 0]
I already have a solution for 1, 1, but it's not very elegant and not easy to adapt to my example, 1, 1, 0, 0:
i = 0
searching = True
# advance i until Signal[i], Signal[i + 1] is the pair 1, 1
while searching:
    a = Signal.iloc[i]
    b = Signal.iloc[i + 1]
    if a == 1 and b == 1:
        searching = False
    else:
        i = i + 1
Signal = Signal[i:]
I appreciate your help.
We could take a rolling window view of the series with view_as_windows, check for equality with the sequence and find the first occurrence with argmax:
from skimage.util import view_as_windows
seq = [1, 1, 0, 0]
m = (view_as_windows(Signal.values, len(seq))==seq).all(1)
Signal[m.argmax():]
3 1
4 1
5 0
6 0
7 1
8 1
9 0
10 0
dtype: int64
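If scikit-image isn't available, NumPy 1.20+ provides `numpy.lib.stride_tricks.sliding_window_view`, which builds the same windows. One caveat of the `argmax` trick is that it returns 0 when no window matches, so the sketch below checks for that explicitly (`trim_before` is a hypothetical name):

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def trim_before(series, seq):
    """Drop everything before the first occurrence of seq; None if absent."""
    windows = sliding_window_view(series.to_numpy(), len(seq))
    matches = (windows == np.asarray(seq)).all(axis=1)
    if not matches.any():
        return None  # argmax alone would wrongly report position 0
    return series.iloc[matches.argmax():]


Signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])
print(trim_before(Signal, [1, 1, 0, 0]))
```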
I would go for a regex; an online regex tester can help you work out the pattern.
For example:
1,1,0,0,(.*\d)
would capture in group(1) all the digits after the 1,1,0,0 pattern, assuming the values are first joined into a comma-separated string.
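A regex approach needs the series turned into a string first. A minimal sketch, assuming every value is a single digit so that one character maps to one position and the match offset can be used directly with `iloc`. Unlike the `(.*\d)` group, it keeps the pattern itself, which is what the question asks for:

```python
import re

import pandas as pd

Signal = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0])

# One character per value; only valid for single-digit values.
text = "".join(map(str, Signal))
match = re.search("1100", text)
trimmed = Signal.iloc[match.start():] if match else None
print(trimmed)
```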
So say we have some values:
import numpy as np
data = np.random.standard_normal(size=10)
I want my function to output an array which identifies whether the values in data are positive or not, something like:
[1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
I've tried:
def myfunc():
    for a in data > 0:
        if a:
            return 1
        else:
            return 0
But I'm only getting the boolean for the first value in the random array data, and I don't know how to loop this function to output an array.
Thanks
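The original function returns during the first loop iteration, so only one value ever comes back. Collecting the results in a list and returning after the loop fixes that; a sketch that also takes `data` as a parameter instead of reading a global:

```python
import numpy as np


def myfunc(data):
    """Return 1 for each positive value in data and 0 otherwise."""
    result = []
    for is_positive in data > 0:  # boolean array, one entry per value
        result.append(1 if is_positive else 0)
    return result  # return only after every value has been processed


data = np.random.standard_normal(size=10)
print(myfunc(data))
```

A vectorized expression such as `(data > 0).astype(int)` gives the same values without a Python-level loop.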
You can use np.where; it's your friend:
np.where(data>0,1,0)
Demo:
print(np.where(data>0,1,0))
Output:
[1 0 1 1 0 1 1 0 0 0]
Use np.where(data>0,1,0).tolist() to get a plain Python list (printed with commas); the output would be:
[1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
It's very simple with numpy:
posvals = data > 0
>> [True, False, True, True, False, True, True, False, False, False]
If you explicitly want 1s and 0s:
posvals.astype(int)
>> [1, 0, 1, 1, 0, 1, 1, 0, 0, 0]
You can use ternary operators alongside list comprehension.
data = [10, 15, 58, 97, -50, -1, 1, -33]
output = [ 1 if number >= 0 else 0 for number in data ]
print(output)
This would output:
[1, 1, 1, 1, 0, 0, 1, 0]
Each number becomes 1 if it is greater than or equal to 0, and 0 otherwise (so 0 itself is counted as positive here).
If you'd like this in function form, then it's as simple as:
def pos_or_neg(my_list):
    return [1 if number >= 0 else 0 for number in my_list]
You are attempting to combine an if and a for statement.
Seeing as you want to manipulate each element in the array using the same criteria and then return an updated array, what you want is the map function:
def my_func(data):
    def map_func(num):
        return num > 0
    return list(map(map_func, data))
map() applies map_func() to each number in data and produces the results in order; it does not modify data in place. In Python 3, map() returns a lazy iterator, which is why the result is wrapped in list().
If you explicitly want 1 and 0, map_func() would be:
def map_func(num):
    if num > 0:
        return 1
    return 0
A rectangle is defined as any rectangular-shaped section of zeros within a 2-d array of 1s and 0s. Typical example:
[
[1, 1, 1, 1, 1, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 0],
[1, 1, 1, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 0, 0, 0, 1, 0, 0],
[1, 0, 1, 1, 1, 1, 1, 1, 1],
[1, 0, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 0, 0, 1, 1, 1, 1],
[1, 1, 1, 1, 1, 1, 1, 1, 1],
]
In this example, there are three such rectangles.
My goal is to determine the coordinates (outer extremities) of each rectangle.
I start by converting the 2-d list into a numpy array:
import numpy as np
image_as_np_array = np.array(two_d_list)
I can then get the coordinates of all the zeros thus:
np.argwhere(image_as_np_array == 0)
But this merely provides a shortcut to getting the indices by iterating over each row and calling .index(), then combining with the index of that row within the 2-d list.
I envisage now doing something like removing any element of np.argwhere() (or np.where()) where there is only a single occurrence of a 0 (effectively disregarding any row that cannot form part of a rectangle), and then trying to align contiguous coordinates, but I'm stuck with figuring out how to handle cases where any row may contain part of more than just one single rectangle (as is the case in the 3rd and 4th rows above). Is there a numpy function or functions I can leverage?
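One way to lean on library code here is connected-component labelling. SciPy (an extra dependency beyond NumPy) provides `scipy.ndimage.label`, which numbers each group of touching zeros, and `scipy.ndimage.find_objects`, which returns each group's bounding box; a group is a rectangle exactly when it fills its bounding box. A sketch, assuming 4-connectivity (zeros touching only diagonally count as separate groups) and that rectangles must be at least 2 x 2; `find_zero_rectangles` and `min_size` are hypothetical names:

```python
import numpy as np
from scipy import ndimage


def find_zero_rectangles(grid, min_size=2):
    """Return sorted (top, bottom, left, right) tuples, bounds inclusive,
    for every connected group of zeros that exactly fills its bounding box."""
    labels, _ = ndimage.label(np.asarray(grid) == 0)  # 4-connectivity by default
    rectangles = []
    for label_id, box in enumerate(ndimage.find_objects(labels), start=1):
        if box is None:  # defensive: skip label ids with no pixels
            continue
        rows, cols = box
        tall_enough = rows.stop - rows.start >= min_size
        wide_enough = cols.stop - cols.start >= min_size
        # A group is a rectangle only if every cell in its bounding box belongs to it
        if tall_enough and wide_enough and (labels[box] == label_id).all():
            rectangles.append((rows.start, rows.stop - 1, cols.start, cols.stop - 1))
    return sorted(rectangles)


two_d_list = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
]
print(find_zero_rectangles(two_d_list))
```

The shape on the right is rejected because its bounding box also contains a 1, and the single-column run of zeros fails the width check.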
I don't know numpy, so here's a plain Python solution:
from collections import namedtuple
Rectangle = namedtuple("Rectangle", "top bottom left right")
def find_rectangles(arr):
    # Deeply copy the array so that it can be modified safely
    arr = [row[:] for row in arr]
    rectangles = []
    for top, row in enumerate(arr):
        start = 0
        # Look for rectangles whose top row is here
        while True:
            try:
                left = row.index(0, start)
            except ValueError:
                break
            # Set start to one past the last 0 in the contiguous line of 0s
            try:
                start = row.index(1, left)
            except ValueError:
                start = len(row)
            right = start - 1
            if (  # Width == 1
                    left == right or
                    # There are 0s above
                    top > 0 and not all(arr[top-1][left:right + 1])):
                continue
            bottom = top + 1
            while (bottom < len(arr) and
                   # No extra zeroes on the sides
                   (left == 0 or arr[bottom][left-1]) and
                   (right == len(row) - 1 or arr[bottom][right + 1]) and
                   # All zeroes in the row
                   not any(arr[bottom][left:right + 1])):
                bottom += 1
            # The loop ends when bottom has gone too far, so backtrack
            bottom -= 1
            if (  # Height == 1
                    bottom == top or
                    # There are 0s beneath
                    (bottom < len(arr) - 1 and
                     not all(arr[bottom + 1][left:right+1]))):
                continue
            rectangles.append(Rectangle(top, bottom, left, right))
            # Remove the rectangle so that it doesn't affect future searches
            for i in range(top, bottom+1):
                arr[i][left:right+1] = [1] * (right + 1 - left)
    return rectangles
For the given input, the output is:
[Rectangle(top=2, bottom=3, left=3, right=5),
Rectangle(top=5, bottom=6, left=3, right=4)]
This is correct because the comments indicate that the 'rectangle' on the right is not to be counted since there is an extra 0 sticking out. I suggest you add more test cases though.
I expect it to be reasonably fast since much of the low-level iteration is done with calls to index and any, so there's decent usage of C code even without the help of numpy.
I have written a simple algorithm using the sweep line method. The idea is that you go through your array column by column and detect runs of zeros as potential new rectangles. In each column you have to check whether the rectangles detected earlier have ended, and if so, add them to the results.
import numpy as np
from collections import namedtuple
example = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 0, 0, 1, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1, 1],
])
Rectangle = namedtuple("Rectangle", "left top bottom right")
def sweep(A):
    height = A.shape[0]
    length = A.shape[1]
    rectangles = dict()  # detected rectangles {(rowstart, rowend): col}
    result = []
    # sweep the matrix column by column
    for i in range(length):
        column = A[:, i]
        # for currently detected rectangles check if we should extend them or end them
        # (iterate over a copy of the keys, since entries are deleted inside the loop)
        for r in list(rectangles.keys()):
            # detect non-rectangular shapes (as requested in the question edit) and delete those rectangles
            if all([x == 0 for x in column[r[0]:r[1]+1]]) and ((r[0]-1 >= 0 and column[r[0]-1] == 0) or (r[1]+1 < height and column[r[1]+1] == 0)):
                del rectangles[r]
            elif any([x == 0 for x in column[r[0]:r[1]+1]]) and not all([x == 0 for x in column[r[0]:r[1]+1]]):
                del rectangles[r]
            # special case in the last column - add detected rectangles
            elif i == length - 1 and all([x == 0 for x in column[r[0]:r[1]+1]]):
                result.append(Rectangle(rectangles[r], r[0], r[1], i))
            # if detected rectangle is not extended - add to result and delete from the dict
            elif all([x == 1 for x in column[r[0]:r[1]+1]]):
                result.append(Rectangle(rectangles[r], r[0], r[1], i-1))
                del rectangles[r]
        newRectangle = False
        start = 0
        # go through the column and check if any new rectangles appear
        for j in range(height):
            # new rectangle in column detected
            if column[j] == 0 and not newRectangle and j+1 < height and column[j+1] == 0:
                start = j
                newRectangle = True
            # new rectangle in column ends
            elif column[j] == 1 and newRectangle:
                # check if new detected rectangle is already on the list
                if not (start, j-1) in rectangles:
                    rectangles[(start, j-1)] = i
                newRectangle = False
    # delete single column rectangles
    resultWithout1ColumnRectangles = []
    for r in result:
        if r[0] != r[3]:
            resultWithout1ColumnRectangles.append(r)
    return resultWithout1ColumnRectangles
print(example)
print(sweep(example))
prints:
[[1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 0]
[1 1 1 0 0 0 1 0 0]
[1 0 1 0 0 0 1 0 0]
[1 0 1 1 1 1 1 1 1]
[1 0 1 0 0 1 1 1 1]
[1 1 1 0 0 1 1 1 1]
[1 1 1 1 1 1 1 1 1]]
[Rectangle(left=3, top=5, bottom=6, right=4),
Rectangle(left=3, top=2, bottom=3, right=5)]
This question might be too noob, but I was still not able to figure out how to do it properly.
I have a given array [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3] (arbitrary elements from 0-5) and I want to count how often each number of zeros in a row occurs.
1 times 6 zeros in a row
1 times 4 zeros in a row
2 times 1 zero in a row
=> (2,0,0,1,0,1)
So the dictionary uses the run length n (n zeros in a row) as the key and the number of such runs as the value.
The final array consists of 500+ million values that are unsorted like the one above.
This should get you what you want:
import numpy as np
a = [0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3]
# Find indexes of all zeroes
index_zeroes = np.where(np.array(a) == 0)[0]
# Find discontinuities in indexes, denoting separated groups of zeroes
# Note: Adding True at the end because otherwise the last zero is ignored
index_zeroes_disc = np.where(np.hstack((np.diff(index_zeroes) != 1, True)))[0]
# Count the number of zeroes in each group
# Note: Adding 0 at the start so first group of zeroes is counted
count_zeroes = np.diff(np.hstack((0, index_zeroes_disc + 1)))
# Count the number of groups with the same number of zeroes
groups_of_n_zeroes = {}
for count in count_zeroes.tolist():  # plain Python ints as keys
    groups_of_n_zeroes[count] = groups_of_n_zeroes.get(count, 0) + 1
groups_of_n_zeroes holds:
{1: 2, 4: 1, 6: 1}
Similar to @fgb's, but with a more numpythonic handling of the counting of the occurrences:
items = np.array([0,0,0,0,0,0,1,1,2,1,0,0,0,0,1,0,1,2,1,0,2,3])
group_end_idx = np.concatenate(([-1],
np.nonzero(np.diff(items == 0))[0],
[len(items)-1]))
group_len = np.diff(group_end_idx)
zero_lens = group_len[::2] if items[0] == 0 else group_len[1::2]
counts = np.bincount(zero_lens)
>>> counts[1:]
array([2, 0, 0, 1, 0, 1], dtype=int64)
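A variation on the same idea avoids the special case of whether the array starts with a zero: padding the zero mask with `False` on both sides guarantees that every run has a start edge and an end edge, including runs that touch either end of the array. A sketch (`count_zero_runs` is a hypothetical name):

```python
import numpy as np


def count_zero_runs(values):
    """Return counts where counts[n - 1] is the number of runs of exactly n zeros."""
    is_zero = np.asarray(values) == 0
    # Pad with False so runs at either end still produce a start and an end.
    padded = np.concatenate(([False], is_zero, [False])).astype(np.int8)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)   # False -> True transitions
    ends = np.flatnonzero(edges == -1)    # True -> False transitions
    run_lengths = ends - starts
    return np.bincount(run_lengths)[1:]   # drop the always-empty length-0 bin


print(count_zero_runs([0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 2, 1, 0, 2, 3]))
```

For 500+ million values the int8 mask and its difference take roughly one byte per element each, on top of the input array.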
This seems awfully complicated, but I can't seem to find anything better:
>>> l = [0, 0, 0, 0, 0, 0, 1, 1, 2, 1, 0, 0, 0, 0, 1, 0, 1, 2, 1, 0, 2, 3]
>>> import itertools
>>> seq = [len(list(j)) for i, j in itertools.groupby(l) if i == 0]
>>> seq
[6, 4, 1, 1]
>>> import collections
>>> counter = collections.Counter(seq)
>>> [counter.get(i, 0) for i in range(1, max(counter) + 1)]
[2, 0, 0, 1, 0, 1]