Pandas: flag consecutive values - python

I have a pandas series of the form [0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0 , 0 , 1].
0: indicates economic increase.
1: indicates economic decline.
A recession is signaled by two consecutive declines (1).
The end of the recession is signaled by two consecutive increases (0).
In the above dataset there are two recessions: one beginning at index 3 and ending at index 5, and one beginning at index 8 and ending at index 11.
I am at a loss for how to approach this with pandas. I would like to identify the indices of the start and end of each recession. Any assistance would be appreciated.
Here is my Python attempt at a solution.
import numpy as np

np_decline = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
recession_start_flag = 0
recession_end_flag = 0
recession_start = []
recession_end = []
for i in range(len(np_decline) - 1):
    if recession_start_flag == 0 and np_decline[i] == 1 and np_decline[i + 1] == 1:
        recession_start.append(i)
        recession_start_flag = 1
    if recession_start_flag == 1 and np_decline[i] == 0 and np_decline[i + 1] == 0:
        recession_end.append(i - 1)
        recession_start_flag = 0
print(recession_start)
print(recession_end)
Is there a more pandas-centric approach?

The start of a run of 1's satisfies the condition
x_prev = x.shift(1)
x_next = x.shift(-1)
((x_prev != 1) & (x == 1) & (x_next == 1))
That is to say, the value at the start of a run is 1 and the previous value is not 1 and the next value is 1. Similarly, the end of a run satisfies the condition
((x == 1) & (x_next == 0) & (x_next2 == 0))
since the value at the end of a run is 1 and the next two values are 0.
We can find indices where these conditions are true using np.flatnonzero:
import numpy as np
import pandas as pd
x = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0 , 0 , 1])
x_prev = x.shift(1)
x_next = x.shift(-1)
x_next2 = x.shift(-2)
df = pd.DataFrame(
    dict(start=np.flatnonzero((x_prev != 1) & (x == 1) & (x_next == 1)),
         end=np.flatnonzero((x == 1) & (x_next == 0) & (x_next2 == 0))))
print(df[['start', 'end']])
yields
   start  end
0      3    5
1      8   11

You can use shift:
df = pd.DataFrame([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0 , 0 , 1], columns=['signal'])
df_prev = df.shift(1)['signal']
df_next = df.shift(-1)['signal']
df_next2 = df.shift(-2)['signal']
df.loc[(df_prev != 1) & (df['signal'] == 1) & (df_next == 1), 'start'] = 1
df.loc[(df['signal'] != 0) & (df_next == 0) & (df_next2 == 0), 'end'] = 1
df.fillna(0, inplace=True)
df = df.astype(int)
    signal  start  end
0        0      0    0
1        1      0    0
2        0      0    0
3        1      1    0
4        1      0    0
5        1      0    1
6        0      0    0
7        0      0    0
8        1      1    0
9        1      0    0
10       0      0    0
11       1      0    1
12       0      0    0
13       0      0    0
14       1      0    0

Similar idea using shift, but writing the result as a single Boolean column:
# Boolean indexers for recession start and stops.
rec_start = (df['signal'] == 1) & (df['signal'].shift(-1) == 1)
rec_end = (df['signal'] == 0) & (df['signal'].shift(-1) == 0)
# Mark the recession start/stops as True/False.
df.loc[rec_start, 'recession'] = True
df.loc[rec_end, 'recession'] = False
# Forward fill the recession column with the last known Boolean.
# Fill any NaN's as False (i.e. locations before the first start/stop).
df['recession'] = df['recession'].ffill().fillna(False)
The resulting output:
    signal  recession
0        0      False
1        1      False
2        0      False
3        1       True
4        1       True
5        1       True
6        0      False
7        0      False
8        1       True
9        1       True
10       0       True
11       1       True
12       0      False
13       0      False
14       1      False

use rolling(2)
s = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0 , 0 , 1])
I subtract .5 so the rolling sum is 1 when a recession starts and -1 when it stops.
s2 = s.sub(.5).rolling(2).sum()
Since both 1 and -1 evaluate to True, I can mask the rolling signal down to just the starts and stops and then ffill. gt(0) then gives True where the last flag was positive (a start) and False where it was negative (a stop).
pd.concat([s, s2.mask(~s2.astype(bool)).ffill().gt(0)], axis=1, keys=['signal', 'isRec'])
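For reference, here is roughly what the intermediate series look like when the same steps are run with named variables (the names s2 and is_rec are mine):
import pandas as pd

s = pd.Series([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
s2 = s.sub(.5).rolling(2).sum()
# s2: NaN, 0, 0, 0, 1, 1, 0, -1, 0, 1, 0, 0, 0, -1, 0
is_rec = s2.mask(~s2.astype(bool)).ffill().gt(0)
# is_rec should be True at indices 4-6 and 9-12, i.e. one index after the
# transitions, because rolling(2) labels each window at its right edge.
print(pd.concat([s, is_rec], axis=1, keys=['signal', 'isRec']))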

You can use scipy.signal.find_peaks for this problem.
import numpy as np
from scipy.signal import find_peaks

np_decline = np.array([0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1])
peaks = find_peaks(np_decline, width=2)
recession_start_loc = peaks[1]['left_bases'][0]
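find_peaks returns a tuple of the peak positions and a properties dict; since width is given, the properties should also include 'left_bases' and 'right_bases' alongside 'widths'. A small sketch of unpacking it (variable names are mine, and I have not verified the exact values for this series):
peak_positions, properties = find_peaks(np_decline, width=2)
print(peak_positions)              # indices of the detected flat peaks
print(properties['left_bases'])    # left base of each detected peak
print(properties['right_bases'])   # right base of each detected peak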

Related

Create several new dataframes or dictionaries from one dataframe

I have a dataframe like this:
evt pcle bin_0 bin_1 bin_2 ... bin_49
1 pi 1 0 0 0
1 pi 1 0 0 0
1 k 0 0 0 1
1 pi 0 0 1 0
2 pi 0 0 1 0
2 k 0 1 0 0
3 J 0 1 0 0
3 pi 0 0 0 1
3 pi 1 0 0 0
3 k 0 1 0 0
...
5000 J 0 0 1 0
5000 pi 0 1 0 0
5000 k 0 0 0 1
With this information, I want to create several other dataframes df_{evt} (or maybe dictionaries should be better?):
df_1 :
pcle cant bin_0 bin_1 bin_2 ... bin_49
pi 3 2 0 1 0
k 1 0 0 0 1
df_2 :
pcle cant bin_0 bin_1 bin_2 ... bin_49
pi 1 0 0 1 0
k 0 1 0 0 0
In total there would be 5000 dataframes (1 for each evt) where in each of them:
* the column "cant" has the number of occurrences of "pcle" in the particular "evt".
* bin_0 ... bin_49 have the sum of the values for this particular "pcle" in the particular "evt".
Which is the best way to achieve this goal?
Here's a possible solution:
import pandas as pd
import numpy as np
columns = ["evt", "pcle", "bin_0", "bin_1", "bin_2", "bin_3"]
data = [[1, "pi", 1, 0, 0, 0],
        [1, "pi", 0, 0, 0, 0],
        [1, "k", 0, 0, 0, 1],
        [1, "pi", 0, 0, 1, 0],
        [2, "pi", 0, 0, 1, 0],
        [2, "k", 0, 1, 0, 0],
        [3, "J", 0, 1, 0, 0],
        [3, "pi", 0, 0, 0, 1],
        [3, "pi", 1, 0, 0, 0],
        [3, "k", 0, 1, 0, 0]]
df = pd.DataFrame(data=data, columns=columns)
# group your data by the columns you want
grouped = df.groupby(["evt", "pcle"])
# compute the aggregates for the bin_X
df_t = grouped.aggregate(np.sum)
# move pcle from index to column
df_t.reset_index(level=["pcle"], inplace=True)
# count occurrences of pcle
df_t["cant"] = grouped.size().values
# filter evt with .loc
df_t.loc[1]
If you want to make it into a dictionary then you can run:
d = {i:j.reset_index(drop=True) for i, j in df_t.groupby(df_t.index)}
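If you go with the dictionary, each event's aggregated table can then be looked up by its evt key, for example (continuing from the code above; the name df_1 is mine):
df_1 = d[1]   # aggregated rows for evt == 1
print(df_1)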

Numpy - How to shift values at indexes where change happened

I would like to shift the values in a 1D numpy array at the indexes where a change happened. The size of the shift should be configurable.
input = np.array([0,0,0,0,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0])
shiftSize = 2
out = np.magic(input, shiftSize)
print out
np.array([0,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0])
For example, the first switch happens at index 4, so indices 2 and 3 become '1'.
The switch back happens at index 5, so indices 5 and 6 become '1'.
EDIT: It would also be important to avoid a for loop, because that might be slow (this is needed for large data sets).
EDIT2: fixed indexes and variable name
I tried np.diff, so I get where the changes happened, and then np.put, but with multiple index ranges it seems impossible.
Thank you for the help in advance!
What you want is called "binary dilation" and is contained in scipy.ndimage:
import numpy as np
import scipy.ndimage
input = np.array([0,0,0,0,1,0,0,0,0,0,1,1,1,0,0,1,0,0,0,0], dtype=bool)
out = scipy.ndimage.morphology.binary_dilation(input, iterations=2).astype(int)
# array([0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
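A side note: in recent SciPy versions the same function is, as far as I know, also exposed directly as scipy.ndimage.binary_dilation (the morphology submodule path is deprecated), so the call can equivalently be written as:
out = scipy.ndimage.binary_dilation(input, iterations=2).astype(int)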
Nils' answer seems good. Here is an alternative using NumPy only:
import numpy as np
def dilate(ar, amount):
    # Convolve with a kernel as big as the dilation scope
    dil = np.convolve(np.abs(ar), np.ones(2 * amount + 1), mode='same')
    # Crop in case the convolution kernel was bigger than array
    dil = dil[-len(ar):]
    # Take non-zero and convert to input type
    return (dil != 0).astype(ar.dtype)
# Test
inp = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
print(inp)
print(dilate(inp, 2))
Output:
[0 0 0 0 1 0 0 0 0 0 1 1 1 0 0 1 0 0 0 0]
[0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0]
Another numpy solution :
def dilatation(seed, shift):
    out = seed.copy()
    for sh in range(1, shift + 1):
        out[sh:] |= seed[:-sh]
    for sh in range(-shift, 0):
        out[:sh] |= seed[-sh:]
    return out
Example (shift = 2) :
in : [0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0]
out: [0 0 0 0 0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1]
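Applied to the input from the question (with shift = 2, and an integer array so that |= works), this should reproduce the desired output:
import numpy as np

seed = np.array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0])
print(dilatation(seed, 2))
# expected: [0 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0]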

Best way to go towards an index in numpy, with wrap

Let's say I have the 2D array below:
[[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 0 ]
[ 0 0 0 0 0 2 ]
[ 0 1 0 0 0 0 ]
[ 0 0 0 0 0 0 ]]
I would like to get the direction from where '1' (index 4,1) is to '2' (index 3,5). Assuming directions are only up, down, left, right. Thus no diagonal movement.
One way to get the directions:
"right" if destination.x > start.x else "left" if target.x < start.x else None
"down" if destination.y > start.y else "up" if destination.y < start.y else None
So for this example, we can go to '2' or the destination by either going "up" or "right". That of course is just one step, once you moved, can perform the same logic to move closer to the destination.
The problem with this logic is that it doesnt take the wrapping into account. With this logic it will take 5 steps to get to the destination. There is a shorter way by actually going left or up that can get to the destination in just 3 steps, because of the wrap.
Was thinking of generating another array where the start will be the middle of the array and perform the same logic. The problem is if the array is even (like this is 6x6, need to pad to get a middle. For example:
[[ 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0]
[ 0 2 0 0 0 0 0]
[ 0 0 0 1 0 0 0]
[ 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0]]
Here the array is now 7x7. I believe there is a simpler way to get the answer without this extra step, but I can't think of it.
Can you consider using this method?
import numpy as np

# build the array
a = np.zeros((6, 6), dtype=int)
a[4][1] = 1
a[3][5] = 2

# extract the required information
i, j = np.where(a == 1)
h, k = np.where(a == 2)

print(i - h)  # [1]
print(j - k)  # [-4]
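These raw differences do not yet take the wrap into account. One way to finish the idea (a sketch; the names di and dj are mine) is to fold each difference into the range [-n/2, n/2), so that the shorter direction around the board is picked:
n_rows, n_cols = a.shape
# fold the raw differences into [-n/2, n/2) so the wrap-around is honoured
di = (h - i + n_rows // 2) % n_rows - n_rows // 2   # signed row steps (negative = up)
dj = (k - j + n_cols // 2) % n_cols - n_cols // 2   # signed column steps (negative = left)
print(di, dj)   # [-1] [-2], i.e. 1 up and 2 left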
Well, there is a quite simple formula for computing the distance in case of periodic boundary conditions. Below I consider only periodic b.c. on x-axis:
import numpy as np
# periodic boundary condition for the x-axis only
def steps(start, dest, L_x):
    x_start = start[1]
    y_start = start[0]
    x_dest = dest[1]
    y_dest = dest[0]
    dx = x_dest - x_start
    if np.abs(dx) <= L_x / 2:
        steps_x = x_dest - x_start
    else:
        if dx > 0:
            steps_x = (x_dest - L_x) - x_start
        else:
            steps_x = (x_dest + L_x) - x_start
    steps_y = y_dest - y_start
    return steps_x, steps_y
Example:
grid = np.array([[0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 2],
                 [0, 1, 0, 0, 0, 0],
                 [0, 0, 0, 0, 0, 0]])
L_x = grid.shape[1]
start = (4, 1) # (y, x) or (i, j)
dest = (3, 5)
steps_x, steps_y = steps(start, dest, L_x)
dir_x = 'left' if steps_x < 0 else 'right'
dir_y = 'up' if steps_y < 0 else 'down'
print(abs(steps_x), dir_x, ',', abs(steps_y), dir_y)
Out: 2 left , 1 up
I tried another way:
On a horizontal axis of length size, to go from a to b, let delta = ((b-a)%size*2-1)//size.
If delta == -1, then a == b: you don't move.
If delta == 0: you have to go right.
If delta == 1: you have to go left.
So this code seems to work:
size=10
vertical=['down','up',None]
horizontal=['right','left',None]
def side(a, b):
    return ((b - a) % size * 2 - 1) // size

def step(M1, M2):
    x1, y1 = M1
    x2, y2 = M2
    return (vertical[side(x1, x2)], horizontal[side(y1, y2)])
For example :
In [6]: step((2,1),(2,8))
Out[6]: (None, 'left')

Drawing a checkerboard out of 1s and 0s with a nested for loop

I'm using just normal Python to make a checkerboard grid out of alternating 1s and 0s. I know that I can use a nested for loop with the modulus operator, but I don't know exactly what to do with the modulus inside the for loop.
def print_board(board):
    for i in range(len(board)):
        print " ".join([str(x) for x in board[i]])

my_grid = []
for i in range(8):
    my_grid.append([0] * 8)
    for j in range(8):
        pass  # This is where I'm stuck.

print_board(my_grid)
Here's my solution, using nested for loops. Note that whether i+j is even or odd is a good way to determine where it should be 1 and where it should be 0, as it always alternates between adjacent 'cells'.
def checkerboard(n):
    board = []
    for i in range(n):
        board.append([])
        for j in range(n):
            board[i].append((i + j) % 2)
    return board

for row in checkerboard(8):
    print(row)
Prints
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
Perhaps we should first aim to solve a different problem: how to generate a list with a checkerboard pattern.
Such a list thus interleaves a [0,1,0,...] row and a [1,0,1,...] row.
Let us first construct the first row with length n. We can do this like:
[i%2 for i in range(n)]
Now the next row should be:
[(i+1)%2 for i in range(n)]
the next one can be:
[(i+2)%2 for i in range(n)]
Do you see a pattern emerge? We can construct such a pattern like:
[[(i+j)%2 for i in range(n)] for j in range(m)]
Now the only thing that is left is producing it as a string. We can do this by converting the data in the list to strings and joining them together (optionally using generators instead of list comprehensions). So:
'\n'.join(''.join(str((i+j)%2) for i in range(n)) for j in range(m))
So we can construct an m×n grid like:
def print_board(m, n):
    print('\n'.join(''.join(str((i + j) % 2) for i in range(n)) for j in range(m)))
A 10x15 board then is:
>>> print_board(10,15)
010101010101010
101010101010101
010101010101010
101010101010101
010101010101010
101010101010101
010101010101010
101010101010101
010101010101010
101010101010101
N.B.: we can make the code a bit more efficient, by using &1 instead of %2:
def print_board(m, n):
    print('\n'.join(''.join(str((i + j) & 1) for i in range(n)) for j in range(m)))
A simple approach
# Function to draw a checkerboard
def drawBoard(length):
    for row in xrange(0, length):
        for col in xrange(0, length):
            # Even rows will start with a 0 (no offset)
            # Odd rows will start with a 1 (offset of 1)
            offset = 0
            if row % 2 == 0:
                offset = 1
            # alternate each column in a row between 1 and 0
            if (col + offset) % 2 == 0:
                print '1',
            else:
                print '0',
        # print a new line at the end of a row
        print ""

drawBoard(8)
For even widths, you could avoid loops all together and just multiply some strings:
def print_board(width):
    print ('0 1 ' * (width // 2) + '\n' + '1 0 ' * (width // 2) + '\n') * (width // 2)

print_board(10)
Giving:
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0 1 0
This works as follows for a 10 x 10 grid:
Take the string
0 1
Multiply the string by 5, giving
0 1 0 1 0 1 0 1 0 1
Do the same with 1 0, giving:
1 0 1 0 1 0 1 0 1 0
Add a newline to the end of each and join them together:
0 1 0 1 0 1 0 1 0 1 \n1 0 1 0 1 0 1 0 1 0 \n
Now multiply this whole string by 5 to get the grid.
You can use list comprehension and a modulo:
new_board = [[0 if b%2 == 0 else 1 for b in range(8)] if i%2 == 0 else [1 if b%2 == 0 else 0 for b in range(8)] for i in range(8)]
for row in new_board:
print(row)
Output:
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
[0, 1, 0, 1, 0, 1, 0, 1]
[1, 0, 1, 0, 1, 0, 1, 0]
For a more custom finish:
for row in new_board:
print(' '.join(map(str, row)))
Output:
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0
0 1 0 1 0 1 0 1
1 0 1 0 1 0 1 0

Count Positive Consecutive Elements in Dataframe

Question
Is there a way to count elements along an axis in a dataframe that conform to a condition?
Background
I am trying to count the consecutive positive values from left to right along the horizontal axis (axis=1). For example, row zero would result in 0 because the row starts with a negative number, while row one would result in 2 as there are two consecutive positive numbers. Row two would result in 4, and so on.
I've tried looping over it and applying methods, but I am at a loss.
Code
df = pd.DataFrame(np.random.randn(5, 5))
df
0 1 2 3 4
0 -1.017333 -0.322464 0.635497 0.248172 1.567705
1 0.038626 0.335656 -1.374040 0.273872 1.613521
2 1.655696 1.456255 0.051992 1.559657 -0.256284
3 -0.776232 -0.386942 0.810013 -0.054174 0.696907
4 -0.250789 -0.135062 1.285705 -0.326607 -1.363189
binary = np.where(df < 0, 0, 1)
binary
array([[0, 0, 1, 1, 1],
[1, 1, 0, 1, 1],
[1, 1, 1, 1, 0],
[0, 0, 1, 0, 1],
[0, 0, 1, 0, 0]])
Here's a similar approach in Pandas
In [792]: df_p = df > 0
In [793]: df_p
Out[793]:
0 1 2 3 4
0 False False True True True
1 True True False True True
2 True True True True False
3 False False True False True
4 False False True False False
In [794]: df_p[0] * (df_p < df_p.shift(1, axis=1)).idxmax(axis=1).astype(int)
Out[794]:
0 0
1 2
2 4
3 0
4 0
dtype: int32
Here's one approach -
def count_pos_consec_elems(a):
    count = (a[:, 1:] < a[:, :-1]).argmax(1) + 1
    count[a[:, 0] < 1] = 0
    count[a.all(1)] = a.shape[1]
    return count
Sample run -
In [145]: df
Out[145]:
0 1 2 3 4
0 0.602198 -0.899124 -1.104486 -0.106802 -0.092505
1 0.012199 -1.415231 0.604574 -0.133460 -0.264506
2 -0.878637 1.607330 -0.950801 -0.594610 -0.718909
3 1.200000 1.200000 1.200000 1.200000 1.200000
4 1.434637 0.500000 0.421560 -1.001847 -0.980985
In [146]: binary = df.values > 0
In [147]: count_pos_consec_elems(binary)
Out[147]: array([1, 1, 0, 5, 3])
