I created a DataFrame with pandas and calculated the percentage gain or loss. I would like to build two columns, entered market and trail, for a trailing-stop backtest (for example, a 5% stop), like this:
earning/losing  entered market  trail
0               0               0
1               1               1
2               1               2
3               1               3
7               1               7
4               1               7
5               1               7
8               1               8
2               0               0
5               0               0
4               0               0
I tried using np.select with a condition list, but I can't work out the remaining conditions:
condition = [(df['earning/losing'] > 0)
             & (df['earning/losing'] > df['earning/losing'].shift(-1))
             & (df['earning/losing'] - df['earning/losing'].shift(-1) < 5)]
value = [df['earning/losing']]
df['trail'] = np.select(condition, value, default=0)
I think if I could create a trail column like the one above, I could then evaluate the trailing-stop condition, but I don't know how to create it in pandas. Can anyone help me out? Thanks a lot!
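Judging from the expected output, trail behaves like a running maximum of earning/losing that resets each time the market is re-entered, and is 0 while out of the market. A minimal sketch of that idea (this is one reading of the table, not necessarily the asker's exact rule; the sample data is copied from the table above):

```python
import pandas as pd

df = pd.DataFrame({
    "earning/losing": [0, 1, 2, 3, 7, 4, 5, 8, 2, 5, 4],
    "entered market": [0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0],
})

# label consecutive runs of in-market / out-of-market rows, so the
# running maximum resets on each new entry into the market
run_id = df["entered market"].ne(df["entered market"].shift()).cumsum()

# running maximum within each run; zero out rows where we are not in the market
df["trail"] = (df["earning/losing"].groupby(run_id).cummax()
               .where(df["entered market"].eq(1), 0))
```

With a trail column like this, the 5% trailing-stop exit is just a comparison between earning/losing and trail.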
I have a question about a specific problem in pandas:
I have in a df a column with the following values:
5
4
3
2
1
0
0
0
0
1
2
3
4
5
I want to select all the rows from the first 5 to the last 0:
5
4
3
2
1
0
0
0
0
I tried with drop_duplicates, but I lose the last three zeroes.
I'm thinking about using a for loop that stops when the i-th value of the column is greater than the (i-1)-th value, but I don't know how to write such a loop over a DataFrame in pandas.
Can someone help me?
Thank you in advance; I hope I've explained the problem clearly.
You could use DataFrame.shift to compare each row with the previous one, and keep only those that are less than or equal to it. Here I use np.r_ to include the first value too:
import numpy as np
df[np.r_[True, df.col.le(df.col.shift()).to_numpy()[1:]]]
col
0 5
1 4
2 3
3 2
4 1
5 0
6 0
7 0
8 0
Let us try cummax:
s = df['col']
df.loc[s.eq(5).cummax() & s[::-1].eq(0).cummax()]
col
0 5
1 4
2 3
3 2
4 1
5 0
6 0
7 0
8 0
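To see why this works, the two cummax masks can be inspected separately; a sketch with the sample column from the question:

```python
import pandas as pd

s = pd.Series([5, 4, 3, 2, 1, 0, 0, 0, 0, 1, 2, 3, 4, 5], name="col")

forward = s.eq(5).cummax()           # True from the first 5 onwards
backward = s[::-1].eq(0).cummax()    # True up to and including the last 0
result = s.loc[forward & backward]   # the & aligns the two masks on the index labels
```

The reversed series keeps its original index labels, so the & combines the masks row by row even though backward was computed back to front.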
I am new to Python. I have a column in an MS Excel file in which four tags are used: LOC, ORG, PER and MISC. The given data looks like this:
1 LOC/Thai Buddhist temple;
2 PER/louis;
3 ORG/WikiLeaks;LOC/Southern Ocean;
4 ORG/queen;
5 PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;
6
7 PER/Thomas Watson;
...................
...................
.............#continue upto 2,000 rows
I want a result that shows, for each row, whether each tag is present or not: if a tag is present, put "1" in its dedicated column (shown below), and if not, put "0". I want all four columns (LOC/ORG/PER/MISC) added to this Excel file as the 2nd, 3rd, 4th and 5th columns, with the given data as the first column. The file contains almost 2815 rows, and every row has a different mix of these tags.
My goal is to count, from the new columns, the total number of LOC, ORG, PER and MISC tags.
The result will be like this:
given data LOC ORG PER MISC
1 LOC/Thai Buddhist temple; 1 0 0 0 #here only LOC is present
2 PER/louis; 0 0 1 0 #here only PER is present
3 ORG/WikiLeaks;LOC/Southern Ocean; 1 1 0 0 #here LOC and ORG is present
4 PER/Eli Wallach;MISC/The Good; 0 0 1 1 #here PER and MISC is present
5 .................................................
6 0 0 0 0 #here no tag is present
7 .....................................................
.......................................................
..................................continue up to 2815 rows....
I am a beginner in Python, so I have tried my best to find a solution, but I could not find any program related to my problem; that's why I posted here. Can anyone kindly help me?
I assume you have successfully read the data from Excel and created a DataFrame in Python using pandas (to read the Excel file: df1 = pd.read_excel("File/path/name.xls")).
Here is the layout of your dataframe df1
Colnum | Tagstring
1 |LOC/Thai Buddhist temple;
2 |PER/louis;
3 |ORG/WikiLeaks;LOC/Southern Ocean;
4 |ORG/queen;
5 |PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;
6 |PER/Thomas Watson;
Now, there are a couple of ways to search for text in a string.
I will demonstrate the find function, which returns the index of the match, or -1 if the substring is not found:
Syntax: str.find(sub, beg=0, end=len(string))
str1 = "LOC"
str2 = "PER"
str3 = "ORG"
str4 = "MISC"
df1["LOC"] = (df1["Tagstring"].str.find(str1) >= 0).astype('int')
df1["PER"] = (df1["Tagstring"].str.find(str2) >= 0).astype('int')
df1["ORG"] = (df1["Tagstring"].str.find(str3) >= 0).astype('int')
df1["MISC"] = (df1["Tagstring"].str.find(str4) >= 0).astype('int')
Once you have read your data into df, you can do:
pd.concat([df,pd.DataFrame({i:df.Tagstring.str.contains(i).astype(int) for i in 'LOC ORG PER MISC'.split()})],axis=1)
Out[716]:
Tagstring LOC ORG PER MISC
Colnum
1 LOC/Thai Buddhist temple; 1 0 0 0
2 PER/louis; 0 0 1 0
3 ORG/WikiLeaks;LOC/Southern Ocean; 1 1 0 0
4 ORG/queen; 0 1 0 0
5 PER/Sanchez;PER/Eli Wallach;MISC/The Good, The... 0 0 1 1
6 PER/Thomas Watson; 0 0 1 0
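For the counting goal at the end of the question, a self-contained sketch (the DataFrame here is hypothetical, built from the sample rows; matching "TAG/" rather than "TAG" avoids accidental hits inside the free text):

```python
import pandas as pd

df = pd.DataFrame({"Tagstring": [
    "LOC/Thai Buddhist temple;",
    "PER/louis;",
    "ORG/WikiLeaks;LOC/Southern Ocean;",
    "ORG/queen;",
    "PER/Sanchez;PER/Eli Wallach;MISC/The Good, The Bad and the Ugly;",
    "",
    "PER/Thomas Watson;",
]})

tags = ["LOC", "ORG", "PER", "MISC"]
for tag in tags:
    # 1 if the row contains the tag at the start of an entry, else 0
    df[tag] = df["Tagstring"].str.contains(tag + "/", regex=False).astype(int)

# number of rows containing each tag (a row with PER twice still counts once)
totals = df[tags].sum()
```

df.to_excel can then write the augmented table back to an Excel file.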
I'm a new Python user (making the shift from VBA) and am having trouble figuring out Python's loops. I have a dataframe df, and I want to create a column of values based on some condition being met in another column, via a loop. Something like the below:
cycle = 5
dummy = 1
for i in range(1, cycle + 1):
    if (df["high"].iloc[i] >= df["exit"].iloc[i]
            and df["low"].iloc[i] <= df["exit"].iloc[i]):
        df["signal"] = dummy
        break
    elif i == cycle:
        df["signal"] = cycle + 1
        break
    else:
        dummy = dummy + 1
Basically, for each row I'm trying to find at which of the next rows, up to the cycle variable, the conditions in the if statement are met, and if they're never met, assign cycle + 1. So df["signal"] will be a column of numbers ranging from 1 to (cycle + 1). Also, there are some NaN values in df["exit"]; I'm not sure how that affects the loop.
I've found fairly extensive documentation on row iterations on the site, I feel like this is close to where I need to get to, but can't figure out how to adapt it. Thanks for any advice!
EDIT: INCLUDED DATA SAMPLE FROM EXCEL CELLS BELOW:
high  low  EXIT  test  signal/(OUTPUT COLUMN)
4     3    4     1     1
2     2    2     1     1
2     3    5     0     6
4     3    1     0     5
2     5    2     0     4
5     5    1     0     3
3     1    5     0     2
5     1    5     1     1
1     1    4     0     0
EDIT 2: FURTHER CLARIFICATION AROUND SCRIPT
Once the condition
df["high"].iloc[i] >= df["exit"].iloc[i] and
df["low"].iloc[i] <= df["exit"].iloc[i]
is met in the loop, it should terminate for that particular instance/row.
EDIT 3: EXPECTED OUTPUT
The expected output is the df["signal"] column - it is the first instance in the loop where the condition
df["high"].iloc[i] >= df["exit"].iloc[i] and
df["low"].iloc[i] <= df["exit"].iloc[i]
is met in any given row. The output in df["signal"] is effectively i from the loop, or the given iteration.
Here is how I would solve the problem; the column 'gr' must not exist before doing this:
# first check all the rows meeting the conditions and add 1 in a temporary column gr
df.loc[(df["high"] >= df["exit"]) & (df["low"] <= df["exit"]), 'gr'] = 1
# manipulate column gr to use groupby after
df['gr'] = df['gr'].cumsum().bfill()
# use cumcount after groupby to recalculate signal
df.loc[:,'signal'] = df.groupby('gr').cumcount(ascending=False).add(1)
# cut the value in signal to the value cycle + 1
df.loc[df['signal'] > cycle, 'signal'] = cycle + 1
# drop the temporary column gr
df = df.drop(columns='gr')
and you get
high low exit signal
0 4 3 4 1
1 2 2 2 1
2 2 3 5 6
3 4 3 1 5
4 2 5 2 4
5 5 5 1 3
6 3 1 5 2
7 5 1 5 1
8 1 1 4 1
Note: the last row is not handled properly, because no row meeting the condition ever comes after it; I'm not sure how this will look in the full data or how you want to handle it. You may consider adding df = df.dropna(subset=['gr']) after the line starting with df['gr'] = ... to drop these trailing rows; up to you.
I have a df with badminton scores. Each set of a game for a team is on a row, and the score at each point is in the columns, like so:
0 0 1 1 2 3 4
0 1 2 3 3 4 4
I want to obtain only 0 and 1 depending on whether a point is scored, like so (to analyse whether there is any pattern in the points):
0 0 1 0 1 1 1
0 1 1 1 0 1 0
I was thinking of using df.itertuples() and iloc with conditions, writing 1 to the new dataframe if next score = score + 1, and 0 if the score is unchanged.
But I don't know how to iterate through the generated tuples, or how to build my new df with the 0s and 1s in the right places.
I hope that is clear; thanks for your help.
Oh, also: any suggestions on how to analyse the patterns after that?
You just need diff (if you need to convert it back, try cumsum):
df.diff(axis=1).fillna(0).astype(int)
Out[1382]:
1 2 3 4 5 6 7
0 0 0 1 0 1 1 1
1 0 1 1 1 0 1 0
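A sketch of the round trip on the sample scores, showing that cumsum undoes diff:

```python
import pandas as pd

scores = pd.DataFrame([[0, 0, 1, 1, 2, 3, 4],
                       [0, 1, 2, 3, 3, 4, 4]])

# 1 where the team scored a point, 0 where the score stayed flat
points = scores.diff(axis=1).fillna(0).astype(int)

# cumulative sum along each row restores the original running score
recovered = points.cumsum(axis=1)
```

The fillna(0) handles the first column, where diff has nothing to subtract and produces NaN.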
I have a Pandas DataFrame which looks like this:
top heading page_no
0 000000 Intro 0
1 100164 Summary 1
2 100451 Experience 1
3 200131 Awards 2
4 200287 Skills 2
5 300147 Education 3
6 300273 Awards 3
7 300329 Interests 3
8 300434 Certifications 3
9 401135 End 4
I have used a filter which uses this dataframe to get the contents from another dataframe. It needs to filter everything between consecutive tops, i.e. from 000000 to 100164, and so on up to 300434 to 401135.
for index, row in df_heads.iterrows():
    begin = int(row['top'])
    end = ???
    filter_result = result['data'][(result.top < end) & (result.top > begin)]
    print(row['heading'])
    print(filter_result)
    sections[row['heading']] = filter_result
    end = begin
What should end be initialized with so that the filter produces the contents correctly?
I think you can create a new column with shift and then, if necessary, replace the last NaN with 0 using fillna:
df_heads['shifted_top'] = df_heads['top'].shift(-1).fillna(0)
print (df_heads)
top heading page_no shifted_top
0 0 Intro 0 100164.0
1 100164 Summary 1 100451.0
2 100451 Experience 1 200131.0
3 200131 Awards 2 200287.0
4 200287 Skills 2 300147.0
5 300147 Education 3 300273.0
6 300273 Awards 3 300329.0
7 300329 Interests 3 300434.0
8 300434 Certifications 3 401135.0
9 401135 End 4 0.0
for index, row in df_heads.iterrows():
    begin = int(row['top'])
    end = int(row['shifted_top'])
    print(begin, end)
0 100164
100164 100451
100451 200131
200131 200287
200287 300147
300147 300273
300273 300329
300329 300434
300434 401135
401135 0
You cannot access a different row's data inside a for index, row in df_heads.iterrows() loop. An additional column holding the other row's data needs to be created outside of the loop, as in the example above:
df_heads['shifted_top'] = df_heads['top'].shift(-1).fillna(0)
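Putting it together, a sketch of the whole pattern. Here result_top is a hypothetical series of content positions, the lower bound is taken as inclusive (the question used a strict >, which is a design choice), and the last section's upper bound is +infinity rather than 0 so its contents are not lost:

```python
import pandas as pd

df_heads = pd.DataFrame({
    "top": [0, 100164, 100451, 401135],
    "heading": ["Intro", "Summary", "Experience", "End"],
})

# upper bound of each section is the next section's top;
# the final section gets +infinity so everything after the last top is kept
df_heads["shifted_top"] = df_heads["top"].shift(-1).fillna(float("inf"))

result_top = pd.Series([50, 100200, 100300, 200000])  # hypothetical positions

sections = {}
for _, row in df_heads.iterrows():
    mask = (result_top >= row["top"]) & (result_top < row["shifted_top"])
    sections[row["heading"]] = result_top[mask]
```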