Creating a trend streak in Pandas

I'm trying to create a trend streak from a win column that holds 1, -1, or 0 (win/loss/no movement) in a pandas DataFrame. I want the streak to increase while the value is positive, reset to 0 on a 0, and reset and start a negative streak on -1. The desired result would be something like this:
win streak
0 0
1 1
1 2
1 3
1 4
0 0
0 0
-1 -1
-1 -2
1 1
Currently I have this that creates the win column.
dataframe.loc[dataframe['close'] > dataframe['close_1h'].shift(1), 'win'] = 1
dataframe.loc[dataframe['close'] < dataframe['close_1h'].shift(1), 'win'] = -1
dataframe.loc[dataframe['close'] == dataframe['close_1h'].shift(1), 'win'] = 0
dataframe['streak'] = numpy.nan_to_num(dataframe['win'].cumsum())
But that doesn't reset the streaks the way I want. I've also tried groupby, with dataframe['streak'] = dataframe.groupby([(dataframe['win'] != dataframe['win'].shift()).cumsum()]), but that raised "ValueError: Length of values (927) does not match length of index (1631)".

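Try this, which starts a new group every time win changes and takes the cumulative sum of win within each group: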
df['streak'] = df.groupby(df['win'].diff().ne(0).cumsum())['win'].cumsum()
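
To see why the grouping key works, here is a minimal sketch that runs the same expression on the win values from the question. The DataFrame is rebuilt by hand from the desired output, and the intermediate name run_id is only for illustration.

```python
import pandas as pd

# Sample data copied from the desired output in the question.
df = pd.DataFrame({"win": [0, 1, 1, 1, 1, 0, 0, -1, -1, 1]})

# diff() is non-zero (or NaN on the first row) wherever the value changes,
# so the running total of those flags labels each run of equal values.
run_id = df["win"].diff().ne(0).cumsum()

# Summing win inside each run counts up for 1, down for -1, and stays 0 for 0.
df["streak"] = df.groupby(run_id)["win"].cumsum()

print(df)
```

Because the sum restarts with every run, a 0 always yields a streak of 0 and a switch from 1 to -1 starts again at -1.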

Related

Pandas create a conditional column based on cumulative logic operations of the other columns

I have two columns that act as on-switch and off-switch indicators. I want to create a column called last_switch that records the last direction of the switch (on or off). One extra condition: if both on and off are 1 in the same row, last_switch should take the opposite sign of the previous last_switch. My current solution is almost correct, except that it gives the wrong result in rows where both on and off are 1.
I also attached a screenshot with the desired output. Any help is appreciated.
df=pd.DataFrame([[1,0],[1,0],[0,1],[0,1],[0,0],[0,0],[1,0],[1,1],[0,1],[1,0],[1,1],[1,1],[0,1]], columns=['on','off'])
df['last_switch']=(df['on']-df['off']).replace(0,method='ffill')

Add the following lines to your existing code:
for i in range(df.shape[0]):
    df['prev']=df['last_switch'].shift()
    df.loc[(df['on']==1) & (df['off']==1), 'last_switch']=df['prev'] * (-1)
df.drop('prev', axis=1, inplace=True)
df['last_switch']=df['last_switch'].astype(int)
Output:
on off last_switch
0 1 0 1
1 1 0 1
2 0 1 -1
3 0 1 -1
4 0 0 -1
5 0 0 -1
6 1 0 1
7 1 1 -1
8 0 1 -1
9 1 0 1
10 1 1 -1
11 1 1 1
12 0 1 -1
Let me know if you need an explanation of the code.

df=pd.DataFrame([[1,0],[1,0],[0,1],[0,1],[0,0],[0,0],[1,0],[1,1],[0,1],[1,0],[1,1],[1,1],[0,1]], columns=['on','off'])
df['last_switch']=(df['on']-df['off']).replace(0,method='ffill')
# Holds the previously processed row, including any flipped last_switch
prev_row = None
def apply_logic(row):
    global prev_row
    if prev_row is not None:
        if (row["on"] == 1) and (row["off"] == 1):
            row["last_switch"] = -prev_row["last_switch"]
    prev_row = row.copy()
    return row
# apply returns a new DataFrame, so assign the result back
df = df.apply(apply_logic,axis=1)
Personally, I am not a big fan of looping over a DataFrame. shift won't work in this case, because the last_switch column is dynamic and changes based on the on and off status.
Using your intermediate result with apply, while carrying the value over from the previous row, should do the trick. Hope it makes sense.
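
The same rule can also be written as a plain loop that carries the current state forward, which avoids the global variable. This is only a sketch: the function name last_switch_states and the starting state of 0 (no known direction before the first row) are assumptions, not part of the answers above.

```python
import pandas as pd

df = pd.DataFrame(
    [[1, 0], [1, 0], [0, 1], [0, 1], [0, 0], [0, 0], [1, 0],
     [1, 1], [0, 1], [1, 0], [1, 1], [1, 1], [0, 1]],
    columns=["on", "off"],
)

def last_switch_states(on, off):
    """Return the last switch direction (1 = on, -1 = off) for each row."""
    state = 0  # assumption: direction is unknown before the first row
    states = []
    for is_on, is_off in zip(on, off):
        if is_on == 1 and is_off == 1:
            state = -state   # both set: flip the previous direction
        elif is_on == 1:
            state = 1
        elif is_off == 1:
            state = -1
        # neither set: keep the previous direction
        states.append(state)
    return states

df["last_switch"] = last_switch_states(df["on"], df["off"])
print(df)
```

Each row depends on the already updated value of the row before it, which is why a single shift cannot express this and some form of sequential pass is needed.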

Python - how to replace values greater and smaller than a particular value

How can I replace the values of a DataFrame if they are smaller or greater than a particular value?
print(df)
name seq1 seq11
0 seq102 -14 -5.99
1 seq103 -5.25 -7.94
I want to set the values less than -8.5 to 1 and the values greater than -8.5 to 0.
I tried this, but all the values become zero:
import pandas as pd
df = pd.read_csv('df.csv')
num = df._get_numeric_data()
num[num < -8.50] = 1
num[num > -8.50] = 0
The desired output should be:
name seq1 seq11
0 seq102 1 0
1 seq103 0 0
Thank you

Try this, which skips the first column (name) and maps every remaining value:
df.iloc[:,1:] = df.iloc[:,1:].applymap(lambda x: 1 if x < -8.50 else 0)
Note that values equal to -8.50 will be set to zero here.
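
The attempt in the question ends up with all zeros because the second assignment runs on the already modified data: every value that was just set to 1 is greater than -8.50, so it is overwritten with 0. A single comparison avoids that ordering problem. The sketch below rebuilds the two sample rows by hand instead of reading df.csv, and the name cols is only for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "name": ["seq102", "seq103"],
    "seq1": [-14.0, -5.25],
    "seq11": [-5.99, -7.94],
})

# Only touch the numeric columns; "name" is left alone.
cols = df.select_dtypes(include="number").columns

# One boolean comparison, then cast True/False to 1/0.
# Values equal to -8.5 become 0 here.
df[cols] = (df[cols] < -8.5).astype(int)

print(df)
```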

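def thresh(x):
    # Map a single value: 1 below the threshold, 0 above it
    if x < -8.5:
        return 1
    elif x > -8.5:
        return 0
    return x
# applymap calls thresh on each element; apply would pass whole columns
print(df[["seq1", "seq11"]].applymap(thresh))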

Eliminating Negative or Non_Negative values in pandas

- I'm working on an automation task in Python in which, in each row, the first negative value should be added to the first non-negative value from the left. The result should then replace the positive value, and 0 should replace the negative value.
- This process should continue until the row contains only negative or only non-negative values.
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC -2 23 2 3 2 2 -1
(>360Days)+(180-360Days)
-1 + 2
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC -2 23 2 3 2 1 0
(<30Days)+(180-360Days)
-2 + 1
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC 0 23 2 3 2 -1 0
(180-360Days)+(120-180Days)
-1 + 2
CUSTOMER <30Days 31-60 Days 61-90Days 91-120Days 120-180Days 180-360Days >360Days
ABC 0 23 2 3 2 0 0

Check this code:
import pandas as pd
#Enter the data (DataFrame.append was removed in pandas 2.0, so build the frame directly)
new_row={'CUSTOMER':'ABC','<30Days':-2,'31-60 Days':23,'61-90Days':2,'91-120Days':3,'120-180Days':2,'180-360Days':2,'>360Days':-1}
df=pd.DataFrame([new_row])
#Keep columns order as per the requirement
df=df[['CUSTOMER','<30Days','31-60 Days','61-90Days','91-120Days','120-180Days','180-360Days','>360Days']]
#Take column names and reverse the order
ls=list(df.columns)
ls.reverse()
#Remove non integer column
ls.remove('CUSTOMER')
#Initialize variables
flag1=1
flag=0
new_ls=[]
new_ls_index=[]
for j in range(len(df)):
    while flag1!=0:
        #Perform logic
        for i in ls:
            if int(df[i][j]) < 0 and flag == 0:
                new_ls.append(int(df[i][j]))
                new_ls_index.append(i)
                flag=1
            elif flag==1 and int(df[i][j]) >= 0 :
                new_ls.append(int(df[i][j]))
                new_ls_index.append(i)
                flag=2
            elif flag==2:
                df[new_ls_index[1]]=new_ls[0]+new_ls[1]
                df[new_ls_index[0]]=0
                flag=0
                new_ls=[]
                new_ls_index=[]
        #Check all values in row either positive or negative
        if new_ls==[]:
            new_ls_neg=[]
            new_ls_pos=[]
            for i in ls:
                if int(df[i][j]) < 0:
                    new_ls_neg.append(int(df[i][j]))
                if int(df[i][j]) >= 0 :
                    new_ls_pos.append(int(df[i][j]))
            if len(new_ls_neg)==len(ls) or len(new_ls_pos)==len(ls):
                flag1=0 #Set flag to stop the loop

How do I count how often a column value changes in a pandas dataframe?

I have a pandas data frame that looks like:
Index Activity
0 0
1 0
2 1
3 1
4 1
5 0
...
1167 1
1168 0
1169 0
I want to count how many times it changes from 0 to 1 and when it changes from 1 to 0, but I do not want to count how many 1's or 0's there are.
For example, if I only wanted to count index 0 to 5, the count for 0 to 1 would be one.
How would I go about this? I have tried using some_value

This is a simple approach that can also tell you the index value when the change happens. Just add the index to a list.
c_1to0 = 0
c_0to1 = 0
for i in range(0, df.shape[0]-1):
    if df.iloc[i]['Activity'] == 0 and df.iloc[i+1]['Activity'] == 1:
        c_0to1 +=1
    elif df.iloc[i]['Activity'] == 1 and df.iloc[i+1]['Activity'] == 0:
        c_1to0 +=1
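
The same counts can be obtained without a Python loop by looking at the difference between consecutive rows. This is a sketch on the first six rows of the question's data, assuming Activity only ever holds 0 or 1; the names step, idx_0to1 and idx_1to0 are made up for the example.

```python
import pandas as pd

# First six rows from the question (index 0 to 5).
df = pd.DataFrame({"Activity": [0, 0, 1, 1, 1, 0]})

# diff() is +1 on a 0 -> 1 change, -1 on a 1 -> 0 change, 0 otherwise.
step = df["Activity"].diff()

c_0to1 = int((step == 1).sum())
c_1to0 = int((step == -1).sum())

# Index labels of the rows where the new value first appears.
idx_0to1 = df.index[step == 1].tolist()
idx_1to0 = df.index[step == -1].tolist()

print(c_0to1, c_1to0, idx_0to1, idx_1to0)
```

For index 0 to 5 this gives one change from 0 to 1, matching the count stated in the question.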

Integers wont append to my array for grid - python

For a school project I've been making a "Treasure hunt" game in Python, where the player finds treasure and bandits on a grid. I have a way to build the grid at a fixed size, but for an extra point we are asked to make the size of the grid, the number of chests and the number of bandits configurable.
Here is the code for my grid maker. It builds the "playergrid" array, but it won't build the "grid" array:
def gridmaker(gridsize, debug):
  global grid
  global playergrid
  gridinator = 1
  grid = [[0]]
  playergrid = [[" "]]
  if debug == 1:
    while gridinator <= gridsize:
      grid[gridinator].append(0)
      gridinator = gridinator + 1
    gridinator = 1
  else:
    while gridinator <= gridsize:
      playergrid[0].append(gridinator)
      gridinator = gridinator + 1
    gridinator = 1
  while gridinator <= gridsize:
    if debug == 1:
      grid.append([0])
      for i in range(gridsize):
        grid[gridinator].append(0)
    else:
      playergrid.append([gridinator])
      for i in range(gridsize):
        playergrid[gridinator].append("#")
    gridinator = gridinator+1
  if debug == 1:
    grid[1][1] = 1
  else:
    playergrid[1][1] = "P"
gridmaker(9, 1)
for row in grid:
  print(" ".join(map(str,row)))
Sorry if it is formatted differently as there are 2 space tabs rather than 4, it works best on repl.it
print(grid) should return a grid like this:
0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
Please let me know,
Thanks!

You have to remember that lists are 0-indexed.
Which means that to access the 1st element of the grid list you would use the index 0.
With grid = [[0]] you create a list with one item (you can get that item with grid[0]), which is a list whose 1st item (grid[0][0]) is 0.
But your gridinator's starting value is 1. So when your first append runs:
grid[gridinator].append(0)
it tries to access the 2nd element of grid:
grid[1].append(0)
This gives you an IndexError because, as the traceback should tell you*, the list index is out of range.
You can try this yourself:
grid = [[0]]
grid[0]
grid[1]
One solution is to start gridinator at 0 and use a strict less-than instead of less-than-or-equal in gridinator <= gridsize (because grid[8] gives you the 9th element of the grid).
*Please remember to include the traceback for errors in the future. They really help both yourself and the people trying to help you.
Let me know if this helps, or if I should find another way to explain it.
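
As a rough sketch of the 0-indexed approach, the debug grid from the question could be built with a nested list comprehension, so no element is accessed before it exists. The function name make_grid is hypothetical, and the sketch assumes the goal is the plain gridsize by gridsize grid shown in the expected output, without the extra header row and column.

```python
def make_grid(gridsize):
    """Build a gridsize x gridsize grid of zeros with the start cell marked."""
    # Each row is created as its own list, so rows do not share state.
    grid = [[0 for _ in range(gridsize)] for _ in range(gridsize)]
    # Mark the second row, second column, as in the question (needs gridsize >= 2).
    grid[1][1] = 1
    return grid

grid = make_grid(9)
for row in grid:
    print(" ".join(map(str, row)))
```

Returning the grid instead of assigning to a global also makes it easier to build grids of different sizes for the chest and bandit settings.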
