Python: how to replace values greater and smaller than a particular value

How can I replace the values of a DataFrame if they are smaller or greater than a particular value?
print(df)
name seq1 seq11
0 seq102 -14 -5.99
1 seq103 -5.25 -7.94
I want to set the values less than -8.5 to 1 and the values greater than -8.5 to 0.
I tried this, but all the values become zero:
import pandas as pd
df = pd.read_csv('df.csv')
num = df._get_numeric_data()
num[num < -8.50] = 1
num[num > -8.50] = 0
The desired output should be:
name seq1 seq11
0 seq102 1 0
1 seq103 0 0
Thank you

Try
df.iloc[:, 1:] = df.iloc[:, 1:].applymap(lambda x: 1 if x < -8.50 else 0)
Note that values equal to -8.50 will be set to zero here.

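def thresh(x):
    # Map one scalar: 1 below the threshold, 0 above it, unchanged when equal.
    if x < -8.5:
        return 1
    elif x > -8.5:
        return 0
    return x
# applymap calls thresh on each cell; apply would pass a whole column instead.
print(df[["seq1", "seq11"]].applymap(thresh))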
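The attempt in the question ends up with all zeros because the two assignments run in sequence: every value set to 1 by the first line is greater than -8.50, so the second line resets it to 0. Below is a minimal sketch of a single-pass alternative. It assumes the sample frame from the question is rebuilt by hand instead of read from df.csv, and the name threshold and the use of select_dtypes are choices made for this illustration only.

```python
import pandas as pd

# Sample data rebuilt from the question (assumption: df.csv holds these rows).
df = pd.DataFrame({
    "name": ["seq102", "seq103"],
    "seq1": [-14.0, -5.25],
    "seq11": [-5.99, -7.94],
})

threshold = -8.5  # illustrative name for the cut-off

# Pick the numeric columns, compare once, and turn True/False into 1/0.
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = (df[num_cols] < threshold).astype(int)

print(df)
```

Because the comparison is evaluated once on the original values, no value is rewritten twice. Values exactly equal to -8.5 compare as False and therefore become 0.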

Related

Replacing positive, negative, and zero values by 1, -1, and 0 respectively

I have a pandas dataframe(100,000 obs) with 11 columns.
I'm trying to assign df['trade_sign'] values based on the df['diff'] (which is a pd.series object of integer values)
If diff is positive, then trade_sign = 1
if diff is negative, then trade_sign = -1
if diff is 0, then trade_sign = 0
What I've tried so far:
pos['trade_sign'] = (pos['trade_sign']>0)
pos['trade_sign'].replace({False: -1, True: 1}, inplace=True)
But this obviously doesn't take into account 0 values.
I also tried for loops with if conditions but that didn't work.
Essentially, how do I fix my .replace call to take diff values of 0 into account?
Ideally, I'd prefer a solution that uses numpy over for loops with if conditions.
There's a sign function in numpy:
df["trade_sign"] = np.sign(df["diff"])
If you want integers,
df["trade_sign"] = np.sign(df["diff"]).astype(int)
# Plain list comprehension; zeros map to 0 as the question requires.
a = [-1 if v < 0 else 1 if v > 0 else 0 for v in df['diff'].values]
df['trade_sign'] = a
You could do it this way:
pos['trade_sign'] = (pos['diff'] > 0) * 1 + (pos['diff'] < 0) * -1
The boolean results of the element-wise > and < comparisons automatically get converted to int in order to allow multiplication with 1 and -1, respectively.
This sample input and test code:
import pandas as pd
pos = pd.DataFrame({'diff':[-9,0,9,-8,0,8,-7-6-5,4,3,2,0]})
pos['trade_sign'] = (pos['diff'] > 0) * 1 + (pos['diff'] < 0) * -1
print(pos)
... gives this output:
diff trade_sign
0 -9 -1
1 0 0
2 9 1
3 -8 -1
4 0 0
5 8 1
6 -18 -1
7 4 1
8 3 1
9 2 1
10 0 0
UPDATE: In addition to the solution above, as well as some of the other excellent ideas in other answers, you can use numpy where:
pos['trade_sign'] = np.where(pos['diff'] > 0, 1, np.where(pos['diff'] < 0, -1, 0))
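As a quick cross-check, the sketch below runs the three vectorised approaches from this thread (np.sign, boolean arithmetic, and nested np.where) on the same column and confirms that they agree. The sample values and the variable names by_sign, by_bool, by_where and agree are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with negative, zero and positive values.
pos = pd.DataFrame({"diff": [-9, 0, 9, -8, 0, 8, 4, -3]})

# 1) numpy's sign function, cast to plain integers.
by_sign = np.sign(pos["diff"]).astype(int)

# 2) Boolean comparisons multiplied by 1 and -1.
by_bool = (pos["diff"] > 0) * 1 + (pos["diff"] < 0) * -1

# 3) Nested np.where: positive -> 1, negative -> -1, otherwise 0.
by_where = np.where(pos["diff"] > 0, 1, np.where(pos["diff"] < 0, -1, 0))

# Compare as plain lists so small dtype differences do not matter.
agree = by_sign.tolist() == by_bool.tolist() == by_where.tolist()

pos["trade_sign"] = by_sign
print(pos)
print("all three agree:", agree)
```

All three give the same result on integer data, so the choice is mostly about readability; np.sign is the shortest.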

Creating a trend streak in Pandas

I'm trying to create a trend streak from a win column that holds 1, -1 or 0 (win/loss/no movement) in a pandas DataFrame. The streak should increase while the value is positive, reset on 0, and reset and start a negative streak on -1. The desired result would be something like this:
win streak
0 0
1 1
1 2
1 3
1 4
0 0
0 0
-1 -1
-1 -2
1 1
Currently I have this that creates the win column.
dataframe.loc[dataframe['close'] > dataframe['close_1h'].shift(1), 'win'] = 1
dataframe.loc[dataframe['close'] < dataframe['close_1h'].shift(1), 'win'] = -1
dataframe.loc[dataframe['close'] == dataframe['close_1h'].shift(1), 'win'] = 0
dataframe['streak'] = numpy.nan_to_num(dataframe['win'].cumsum())
But that doesn't reset the streaks the way I would like. I've played around with groupby, doing dataframe['streak'] = dataframe.groupby([(dataframe['win'] != dataframe['win'].shift()).cumsum()]), but that raised "ValueError: Length of values (927) does not match length of index (1631)".
try this:
df['streak'] = df.groupby(df['win'].diff().ne(0).cumsum())['win'].cumsum()
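To see why this works, here is a small sketch on the win values from the question's desired output. The intermediate name run_id is introduced here for illustration: it labels each run of identical consecutive values, and the cumulative sum is then taken inside each run.

```python
import pandas as pd

# The win column from the desired output in the question.
df = pd.DataFrame({"win": [0, 1, 1, 1, 1, 0, 0, -1, -1, 1]})

# A new run starts wherever the value differs from the previous row.
# diff() is NaN on the first row, and NaN != 0, so the first row opens run 1.
run_id = df["win"].diff().ne(0).cumsum()

# Summing cumulatively within each run restarts the count at every change.
df["streak"] = df.groupby(run_id)["win"].cumsum()

print(df)
```

Unlike the attempt in the question, this selects the win column from the groupby and calls cumsum on it, so the result has one value per row and can be assigned as a column.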

add column values according to value with if

I would like to create the following dataframe:
df = pd.DataFrame({
'A': ['0','0','0','8.020833015','8.009259224','8.003472328','8.020833015','0','0','5','4.994213104','0','0','0','8.012152672','8.009259224','0'],
'Step_ID': ['Step_1','Step_1','Step_1','Step_2','Step_2','Step_2','Step_2','Step_3','Step_3','Step_4','Step_4','Step_5','Step_5','Step_5','Step_6','Step_6','Step_7']})
print (df)
What I have is column A, and I would like to set the values in column Step_ID according to its values.
Step_ID begins at Step_1. When the numbers become greater than 0, the step changes to Step_2 and stays there for all values greater than 0 until zero values are reached again. Those zero values are then assigned Step_3, and so on.
# add a Step ID
df = pd.DataFrame({
'A': ['0','0','0','8.020833015','8.009259224','8.003472328','8.020833015','0','0','5','4.994213104','0','0','0','8.012152672','8.009259224','0']})
step = 0
value = None
def get_step(x):
    global step
    global value
    if x != value:
        value = x
        step += 1
    return f'Step_{step}'
df['Step_ID'] = df['A'].apply(get_step)
df.to_csv('test.csv' , index=None)
The code above does something similar, but it starts a new step whenever the number changes. Should there be one more "if" (if value > 0) to get the desired behaviour?
I can see you implemented an XOR gate, but it needs some customisation, so I have added a new function, check.
import pandas as pd
df = pd.DataFrame({
'A': ['0','0','0','8.020833015','8.009259224','8.003472328','8.020833015','0','0','5','4.994213104','0','0','0','8.012152672','8.009259224','0']})
step = 0
value = None
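def check(x, y):
    # Return 1 when exactly one of x and y is zero (a zero/non-zero transition).
    try:
        x = float(x)
        y = float(y)
        if x == 0 and y == 0:
            return 0
        elif x == 0 and y > 0:
            return 1
        elif x > 0 and y == 0:
            return 1
        else:
            return 0
    except (TypeError, ValueError):
        # value is None on the first call, so float() raises; start the first step.
        return 1
def get_step(x):
    global step
    global value
    # if x != value:
    if check(x, value):
        step += 1
    value = x
    return f'Step_{step}'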
df['Step_ID'] = df['A'].apply(get_step)
df.to_csv('GSH0211.csv' , index=None)
Try this. You can adjust the threshold to the value you want.
df = pd.DataFrame({'A': ['0','0','0','8.020833015','8.009259224','8.003472328','8.020833015','0','0','5','4.994213104','0','0','0','8.012152672','8.009259224','0']})
df['A'] = df['A'].astype(float)
diff = df['A']-df['A'].shift().fillna(0)
threshold = 0.1
df['Step_ID'] = (abs(diff)>threshold).cumsum().add(1)
df['Step_ID'] = 'Step_' + df['Step_ID'].astype(str)
df
A Step_ID
0 0.000000 Step_1
1 0.000000 Step_1
2 0.000000 Step_1
3 8.020833 Step_2
4 8.009259 Step_2
5 8.003472 Step_2
6 8.020833 Step_2
7 0.000000 Step_3
8 0.000000 Step_3
9 5.000000 Step_4
10 4.994213 Step_4
11 0.000000 Step_5
12 0.000000 Step_5
13 0.000000 Step_5
14 8.012153 Step_6
15 8.009259 Step_6
16 0.000000 Step_7
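The difference threshold above would also start a new step if two non-zero plateaus followed each other directly (for example 8 then 5 with no zero in between). If the steps should depend only on whether a value is zero or positive, as the question describes, a sketch like the following may be closer. It assumes all values are non-negative; the names is_positive and step_no are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': ['0', '0', '0', '8.020833015', '8.009259224',
                         '8.003472328', '8.020833015', '0', '0', '5',
                         '4.994213104', '0', '0', '0', '8.012152672',
                         '8.009259224', '0']})
df['A'] = df['A'].astype(float)

# 1 where the value is positive, 0 where it is zero (assumes no negatives).
is_positive = (df['A'] > 0).astype(int)

# A step starts wherever the zero/positive state flips.
# diff() is NaN on the first row, and NaN != 0, so numbering starts at 1.
step_no = is_positive.diff().ne(0).cumsum()

df['Step_ID'] = 'Step_' + step_no.astype(str)
print(df)
```

On the sample data this gives the same Step_1 to Step_7 labels as the expected output, without needing to tune a threshold.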

Pandas set value if most columns are equal in a dataframe

This follows up on a question I asked yesterday, "Pandas set value if all columns are equal in a dataframe".
Starting from @anky_91's solution, I'm working on something similar.
Instead of putting 1 or -1 only when all columns are equal, I want something more flexible.
Specifically, I want 1 if (for example) 70% of the columns are 1, -1 if the same share of columns are 0, and 0 otherwise.
This is what I've written:
# Instead of using .all, I use .sum to count the occurrences of 1 and 0 in each row
m1 = local_df.eq(1).sum(axis=1)
m2 = local_df.eq(0).sum(axis=1)
# Debug print; this works
print(m1)
print(m2)
But I don't know how to change this part:
local_df['enseamble'] = np.select([m1, m2], [1, -1], 0)
m = local_df.drop(local_df.columns.difference(['enseamble']), axis=1)
Here is what I want, in pseudocode:
tot = m1 + m2
if m1 > m2
    if (m1 * 100) / tot > 0.7  # simple percentage calculation
        df['enseamble'] = 1
else if m2 > m1
    if (m2 * 100) / tot > 0.7  # simple percentage calculation
        df['enseamble'] = -1
else:
    df['enseamble'] = 0
Thanks
Edit 1
This is an example of expected output:
NET_0 NET_1 NET_2 NET_3 NET_4 NET_5 NET_6
date
2009-08-02 0 1 1 1 0 1
2009-08-03 1 0 0 0 1 0
2009-08-04 1 1 1 0 0 0
date enseamble
2009-08-02 1 # because 1 is more than 70%
2009-08-03 -1 # because 0 is more than 70%
2009-08-04 0 # because 0 and 1 are 50-50
You could obtain the specified output from the following conditions:
thr = 0.7
c1 = (df.eq(1).sum(1)/df.shape[1]).gt(thr)
c2 = (df.eq(0).sum(1)/df.shape[1]).gt(thr)
c2.astype(int).mul(-1).add(c1)
Output
2009-08-02 0
2009-08-03 0
2009-08-04 0
2009-08-05 0
2009-08-06 -1
2009-08-07 1
dtype: int64
Or using np.select:
pd.DataFrame(np.select([c1,c2], [1,-1], 0), index=df.index, columns=['result'])
result
2009-08-02 0
2009-08-03 0
2009-08-04 0
2009-08-05 0
2009-08-06 -1
2009-08-07 1
Try this (m1, m2 and tot are the same as in your code):
cond1=(m1>m2)&((m1 * 100/tot).gt(0.7))
cond2=(m2>m1)&((m2 * 100/tot).gt(0.7))
df['enseamble'] =np.select([cond1,cond2],[1,-1],0)
m =df.drop(df.columns.difference(['enseamble']), axis=1)
print(m)
enseamble
date
2009-08-02 1
2009-08-03 -1
2009-08-04 0
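One thing to watch in the pseudocode and in the snippet above: they compare a percentage (0 to 100) against 0.7, so whenever m1 > m2 the share is already above 50 and the second test is always true. In effect this is a simple majority vote. If a strict 70% cut-off is wanted, the shares can be compared as fractions instead. The following is a sketch on made-up data; the frame votes and the names thr, frac_ones and frac_zeros are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 0/1 votes from six columns.
votes = pd.DataFrame(
    [[1, 1, 1, 1, 1, 0],   # 5/6 ones  -> above 70%
     [0, 0, 0, 0, 0, 1],   # 5/6 zeros -> above 70%
     [1, 1, 1, 0, 0, 0],   # 50/50
     [1, 1, 1, 1, 0, 0]],  # 4/6 ones  -> a majority, but below 70%
    columns=[f"NET_{i}" for i in range(6)],
)

thr = 0.7  # required share, expressed as a fraction

# The row-wise mean of a boolean frame is the share of True cells in that row.
frac_ones = votes.eq(1).mean(axis=1)
frac_zeros = votes.eq(0).mean(axis=1)

# 1 when enough ones, -1 when enough zeros, 0 otherwise.
votes["enseamble"] = np.select([frac_ones > thr, frac_zeros > thr], [1, -1], default=0)
print(votes["enseamble"])
```

With a fraction threshold the last row becomes 0, whereas the majority-style check would label it 1. Which behaviour is wanted depends on how strictly the 70% figure is meant.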

Pandas convert positive number to 1 and negative number to -1

I have a column of positive and negative numbers. How can I convert it into a new column in which positive numbers become 1 and negative numbers become -1?
You need numpy.sign
df['new'] = np.sign(df['col'])
Sample:
df = pd.DataFrame({ 'col':[-1,3,-5,7,1,0]})
df['new'] = np.sign(df['col'])
print (df)
col new
0 -1 -1
1 3 1
2 -5 -1
3 7 1
4 1 1
5 0 0
This is easy to do with boolean indexing.
For the whole DataFrame:
df[df < 0] = -1
df[df > 0] = 1
For a specific column (using .loc to avoid chained assignment):
df.loc[df['column_name'] < 0, 'column_name'] = -1
df.loc[df['column_name'] > 0, 'column_name'] = 1
Note that df[df < 0] = -1 and df[df > 0] = 1 define no behaviour for df == 0, so zeros are left unchanged.
