Create new column based on conditions of others

I have this df:
Segnale Prezzo Prezzo_exit
0 Long 44645 43302
1 Short 41169 44169
2 Long 44322 47093
3 Short 45323 42514
Sample code to generate it:
import pandas as pd
tbl2 = {
"Segnale" : ["Long", "Short", "Long", "Short"],
"Prezzo" : [44645, 41169, 44322, 45323],
"Prezzo_exit" : [43302, 44169, 47093, 42514]}
df = pd.DataFrame(tbl2)
I need to create a new column named "esito" with these conditions:
if df["Segnale"] == "Long" and df["Prezzo"] < df["Prezzo_exit"]  # row gets "target"
if df["Segnale"] == "Long" and df["Prezzo"] > df["Prezzo_exit"]  # row gets "stop"
if df["Segnale"] == "Short" and df["Prezzo"] < df["Prezzo_exit"]  # row gets "stop"
if df["Segnale"] == "Short" and df["Prezzo"] > df["Prezzo_exit"]  # row gets "target"
So the final result will be:
Segnale Prezzo Prezzo_exit esito
0 Long 44645 43302 stop
1 Short 41169 44169 stop
2 Long 44322 47093 target
3 Short 45323 42514 target
I tried this, without success:
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] < df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'

This will do what your question asks:
df.loc[(df.Segnale=='Long') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'target'
df.loc[(df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'target'
Output:
Segnale Prezzo Prezzo_exit esito
0 Long 44645 43302 stop
1 Short 41169 44169 stop
2 Long 44322 47093 target
3 Short 45323 42514 target
UPDATE:
You could also do this:
df['esito'] = ( pd.Series(['stop']*len(df)).where(
((df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit)) | ((df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit)),
'target') )
... or this, using NumPy:
import numpy as np
df['esito'] = ( np.where(
((df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit)) | ((df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit)),
'stop', 'target') )

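You need to add parentheses around the following comparison, because in Python & binds more tightly than < and >, so a & b < c is evaluated as (a & b) < c:
(df['Prezzo'] < df['Prezzo_exit'])
To simplify, you can use np.select to list every condition and its choice in a single statement.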
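A minimal np.select sketch for the "esito" column, rebuilding the sample frame from the question. Grouping the four rules into two conditions (one per outcome) is my own choice, and rows where Prezzo equals Prezzo_exit match neither condition, so they get the default (None here).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Segnale": ["Long", "Short", "Long", "Short"],
    "Prezzo": [44645, 41169, 44322, 45323],
    "Prezzo_exit": [43302, 44169, 47093, 42514],
})

# Each comparison is wrapped in its own boolean Series, so & and | combine
# complete comparisons instead of hitting the operator-precedence trap.
is_long = df["Segnale"] == "Long"
is_short = df["Segnale"] == "Short"
rising = df["Prezzo"] < df["Prezzo_exit"]
falling = df["Prezzo"] > df["Prezzo_exit"]

conditions = [
    (is_long & rising) | (is_short & falling),  # price moved in the trade's favour
    (is_long & falling) | (is_short & rising),  # price moved against the trade
]
choices = ["target", "stop"]

# The first matching condition wins; rows matching none get the default.
df["esito"] = np.select(conditions, choices, default=None)
print(df)
```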

Related

Excel IF/AND logic to data frame

I have a lot of Excel files I am trying to convert to Python code and need some help :)
I have a data frame like this:
Date STD-3 STD-25 STD-2 STD-15 STD-1 Data STD1 STD15 STD2 STD25 STD3
11.05.2022 -0,057406797 -0,047838998 -0,038271198 -0,028703399 -0,019135599 0,021233631 0,019135599 0,028703399 0,038271198 0,047838998 0,057406797
I need to check for this logic:
"Data" < "STD1" and "Data" > "STD-1" = 0
"Data" > "STD1" and "Data" < "STD15" = 1
"Data" > "STD15" and "Data" < "STD2" = 1,5
"Data" > "STD2" and "Data" < "STD25" = 2
"Data" > "STD25" and "Data" < "STD3" = 2,5
"Data" > "STD3" = 3
"Data" < "STD-1" and "Data" > "STD-15" = -1
"Data" < "STD-15" and "Data" > "STD-2" = -1,5
"Data" < "STD-2" and "Data" > "STD-25" = -2
"Data" < "STD-25" and "Data" > "STD-3" = -2,5
"Data" < "STD-3" = -3
And add the output to a new column.

import numpy as np
condition = [
    (df['Data'] < df['STD1']) & (df['Data'] > df['STD-1']),
    (df['Data'] > df['STD1']) & (df['Data'] < df['STD15']),
    (df['Data'] > df['STD15']) & (df['Data'] < df['STD2']),
    (df['Data'] > df['STD2']) & (df['Data'] < df['STD25']),
    (df['Data'] > df['STD25']) & (df['Data'] < df['STD3']),
    df['Data'] > df['STD3'],
    (df['Data'] < df['STD-1']) & (df['Data'] > df['STD-15']),
    (df['Data'] < df['STD-15']) & (df['Data'] > df['STD-2']),
    (df['Data'] < df['STD-2']) & (df['Data'] > df['STD-25']),
    (df['Data'] < df['STD-25']) & (df['Data'] > df['STD-3']),
    df['Data'] < df['STD-3'],
]
result = [0, 1, 1.5, 2, 2.5, 3, -1, -1.5, -2, -2.5, -3]
df['RESULT'] = np.select(condition, result, None)
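Because the STD columns form an ordered ladder of thresholds, the eleven conditions can also be generated from that ladder instead of written out by hand, which makes it harder to skip a band. A sketch, assuming the decimal commas in the sample row have already been converted to dots (for example with read_csv(..., decimal=",")); values that land exactly on a threshold match no band and get NaN, as with the strict comparisons above.

```python
import numpy as np
import pandas as pd

# The sample row from the question, with decimal commas converted to dots.
df = pd.DataFrame({
    "Date": ["11.05.2022"],
    "STD-3": [-0.057406797], "STD-25": [-0.047838998], "STD-2": [-0.038271198],
    "STD-15": [-0.028703399], "STD-1": [-0.019135599], "Data": [0.021233631],
    "STD1": [0.019135599], "STD15": [0.028703399], "STD2": [0.038271198],
    "STD25": [0.047838998], "STD3": [0.057406797],
})

# Thresholds ordered from lowest to highest, and the score of each band:
# below the first edge, between each pair of neighbouring edges, above the last.
edges = ["STD-3", "STD-25", "STD-2", "STD-15", "STD-1",
         "STD1", "STD15", "STD2", "STD25", "STD3"]
scores = [-3, -2.5, -2, -1.5, -1, 0, 1, 1.5, 2, 2.5, 3]

conditions = [df["Data"] < df[edges[0]]]
conditions += [(df["Data"] > df[lo]) & (df["Data"] < df[hi])
               for lo, hi in zip(edges, edges[1:])]
conditions.append(df["Data"] > df[edges[-1]])

df["RESULT"] = np.select(conditions, scores, default=np.nan)
print(df[["Date", "Data", "RESULT"]])
```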

My pandas logic doesn't give the result I want, no matter how many tests and changes I make

df['col1'] = df.loc[((df['NGPC PT'] > 1) | ((df['SC'] < 2)& (df['SC'] > 5)) & ((df['NGPC PT'] >4) & (df['NGPC PT'] <7))),'RULE OF NGPC'] ='SO'
Basically, this is supposed to produce one of two values, OBS or SO. OBS is the standard value for col1, and whenever a row doesn't meet the requirements below, the value changes to SO.
Note: the requirements below are written in C#:
d["RULE OF NGPC"] = (v["PT"].Equals("5") || v["PT"].Equals("6")) ? "OBS" : "SO";
int COUNT = 0;
if(v["PT"].Equals("1"))
{
if ((v["SC"].Equals("2"))&&(COUNT==0))
{
COUNT = COUNT + 1;
d["RULE OF NGPC"] = (v["PT"].Equals("1") && v["SC"].Equals("2")) ? "OBS" : "SO";
}
if ((v["SC"].Equals("3")) && (COUNT == 0))
{
COUNT = COUNT + 1;
d["RULE OF NGPC"] = (v["PT"].Equals("1") && v["SC"].Equals("3")) ? "OBS" : "SO";
}
if ((v["SC"].Equals("4")) && (COUNT == 0))
{
COUNT = COUNT + 1;
d["RULE OF NGPC"] = (v["PT"].Equals("1") && v["SC"].Equals("4")) ? "OBS" : "SO";
}
if ((v["SC"].Equals("5")) && (COUNT == 0))
{
COUNT = COUNT + 1;
d["RULE OF NGPC"] = (v["PT"].Equals("1") && v["SC"].Equals("5")) ? "OBS" : "SO";
}
}
d["NGPC PT"] = v["PT"];
d["SC"] = v["SC"];
The code in C# above is not mine, I am trying to convert it to python.

I suggest you first initialize the column with 'OBS', then replace the values based on your conditions:
df['output'] = 'OBS'
df.loc[((df['NGPC PT'] > 1) | ((df['SC'] < 2)& (df['SC'] > 5)) & ((df['NGPC PT'] >4) & (df['NGPC PT'] <7))),'output'] = 'SO'
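This should work as long as the conditions in the expression are themselves correct. Note that (df['SC'] < 2) & (df['SC'] > 5) can never be true, so that part of the condition needs revisiting.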
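As a sketch of a more direct translation: reading the C# snippet, a row gets OBS when PT is 5 or 6, or when PT is 1 and SC is 2, 3, 4 or 5, and every other row gets SO (COUNT only stops later checks once one has matched, and every match sets OBS). This assumes the columns hold numbers rather than the strings the C# compares, and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the real data's column types may differ.
df = pd.DataFrame({"NGPC PT": [1, 1, 5, 6, 3, 1], "SC": [2, 7, 1, 9, 4, 5]})

# OBS if PT is 5 or 6, or if PT is 1 with SC in 2..5; otherwise SO.
is_obs = df["NGPC PT"].isin([5, 6]) | (
    (df["NGPC PT"] == 1) & df["SC"].isin([2, 3, 4, 5])
)
df["RULE OF NGPC"] = np.where(is_obs, "OBS", "SO")
print(df)
```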

Another Traceback Error When I Run My Python Code

I get a new traceback error when I run my Python code. It appears to be related to the very last ) parenthesis, and maybe also the last ] in my code.
((df['Location'].str.contains('- Display')) &
df['Lancaster'] != 'L' &
df['Dakota'] == 'D' &
df['Spitfire'] == 'SS' &
df['Hurricane'] != 'H'))
)]
And here is the traceback error I get:
File "<ipython-input-5-6d53e7e5ec10>", line 31
)
^
SyntaxError: invalid syntax
Here is my latest complete code, John S, which works. I will let you know if I run into more issues. Many thanks for your help:
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/june07.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
df = df[1]
df = df.rename(columns=df.iloc[0])
df = df.iloc[2:]
df.head(15)
display = df[(df['Location'].str.contains('- Display')) & (df['Dakota'].str.contains('D')) & (df['Spitfire'].str.contains('S')) & (df['Lancaster'] != 'L')]
display

You just have too many brackets:
((df['Location'].str.contains('- Display') &
df['Lancaster'] == '' &
df['Dakota'] == 'D' &
df['Spitfire'] == 'SS' &
df['Hurricane'] == ''))
You needed to remove the extra ')' after ('- Display'). This should get you past the syntax error, but you will probably still have problems filtering your data: & binds more tightly than == and !=, so each comparison needs its own parentheses.
See this online version for my edits:
https://onlinegdb.com/Skceaucyr
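For reference, a sketch of the same filter with every comparison in its own parentheses, so that & combines whole boolean Series. The rows are made-up data using the column names from the question, and regex=False makes str.contains treat '- Display' as plain text.

```python
import pandas as pd

# Hypothetical rows using the question's column names.
df = pd.DataFrame({
    "Location": ["Southport - Display", "Southport - Flypast", "Duxford - Display"],
    "Lancaster": ["", "L", ""],
    "Dakota": ["D", "D", "D"],
    "Spitfire": ["SS", "S", "S"],
    "Hurricane": ["", "H", "H"],
})

# Each comparison is parenthesized before being combined with &.
mask = (
    df["Location"].str.contains("- Display", regex=False)
    & (df["Lancaster"] != "L")
    & (df["Dakota"] == "D")
    & (df["Spitfire"] == "SS")
    & (df["Hurricane"] != "H")
)
display_rows = df[mask]
print(display_rows)
```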

You need to close the expression with ")]" at the end. Also, each comparison needs its own parentheses, and the three alternatives should be combined with | inside a single df[...] rather than by OR-ing three filtered DataFrames. Your Southport variable then becomes:
Southport = df[
    (df['Location'].str.contains('- Display')
     & (df['Lancaster'] != 'L')
     & (df['Dakota'] == 'D')
     & (df['Spitfire'] == 'S')
     & (df['Hurricane'] == 'H'))
    | (df['Location'].str.contains('- Display')
       & (df['Lancaster'] != 'L')
       & (df['Dakota'] == 'D')
       & (df['Spitfire'] == 'S')
       & (df['Hurricane'] != 'H'))
    | (df['Location'].str.contains('- Display')
       & (df['Lancaster'] != 'L')
       & (df['Dakota'] == 'D')
       & (df['Spitfire'] == 'SS')
       & (df['Hurricane'] != 'H'))
]

Inverse line graph year count matplotlib pandas python

I'm trying to create a line plot of the counts of three different groups (desktop, mobile and tablet), with the x axis showing the years 2014, 2015 and 2016, but I am getting the error
My code is currently:
#year-by-year change
desktop14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
desktop15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
desktop16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "DESKTOP"), "Gross_Demand_Pre_Credit"]
mobile14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
mobile15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
mobile16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "MOBILE"), "Gross_Demand_Pre_Credit"]
tablet14 = od.loc[(od.Account_Year_Week >= 201401) & (od.Account_Year_Week <= 201453) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
tablet15 = od.loc[(od.Account_Year_Week >= 201501) & (od.Account_Year_Week <= 201553) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
tablet16 = od.loc[(od.Account_Year_Week >= 201601) & (od.Account_Year_Week <= 201653) & (od.online_device_type_detail == "TABLET"), "Gross_Demand_Pre_Credit"]
devicedata = [["Desktop", desktop14.count(), desktop15.count(), desktop16.count()], ["Mobile", mobile14.count(), mobile15.count(), mobile16.count()], ["Tablet", tablet14.count(), tablet15.count(), tablet16.count()]]
df = pd.DataFrame(devicedata, columns=["Device", "2014", "2015", "2016"]).set_index("Device")
plt.show()
I want each line to represent a device type, with the x axis showing the change by year. How do I do this (essentially swapping the axes)?
Any help is greatly appreciated.

Just do:
df.transpose().plot()
Transposing makes the years the index (the x axis), so plot() draws one line per device.
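Alternatively, the year-by-device table can be built directly in the orientation that plot() expects, replacing the nine .loc filters with one groupby. A sketch, assuming Account_Year_Week is an integer like 201401 (so integer division by 100 gives the year); the rows below are made up.

```python
import pandas as pd

# Hypothetical rows using the question's column names.
od = pd.DataFrame({
    "Account_Year_Week": [201401, 201420, 201405, 201510, 201511,
                          201530, 201602, 201640, 201641],
    "online_device_type_detail": ["DESKTOP", "DESKTOP", "MOBILE", "DESKTOP",
                                  "MOBILE", "TABLET", "MOBILE", "MOBILE", "TABLET"],
    "Gross_Demand_Pre_Credit": [10.0, 12.5, 8.0, 11.0, 9.5, 4.0, 7.0, 6.5, 3.0],
})

# 201401 // 100 -> 2014
year = (od["Account_Year_Week"] // 100).rename("Year")

# Count non-null demand values per (year, device), then pivot devices into
# columns; a device with no rows in a year gets 0 instead of NaN.
counts = (
    od.groupby([year, "online_device_type_detail"])["Gross_Demand_Pre_Credit"]
    .count()
    .unstack("online_device_type_detail", fill_value=0)
)
print(counts)
```

With years as the index and devices as columns, counts.plot() draws one line per device against the years (plotting requires matplotlib).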

python data frame filter conditions: any faster way

import datetime as DT
import pandas as pd
parts_list = imp_parts_df['Parts'].tolist()
sub_week_list = ['2016-12-11', '2016-12-04', '2016-11-27', '2016-11-20', '2016-11-13']
i = 0
start = DT.datetime.now()
for p in parts_list:
    for thisdate in sub_week_list:
        thisweek_start = pd.to_datetime(thisdate, format='%Y-%m-%d')  # e.g. '2016-12-11'
        thisweek_end = thisweek_start + DT.timedelta(days=7)  # add 7 days to the week date
        val_shipped = len(shipment_df[(shipment_df['loc'] == 'USW1') & (shipment_df['part'] == str(p)) & (shipment_df['shipped_date'] >= thisweek_start) & (shipment_df['shipped_date'] < thisweek_end)])
print((DT.datetime.now() - start).total_seconds())
shipment_df has around 35,000 records, parts_list has 436 parts, and sub_week_list has 5 dates in it.
Overall, this code took 438.13 seconds to run.
Is there any faster way to do it?

parts_list = imp_parts_df['Parts'].astype(str).tolist()
i = 0
start = DT.datetime.now()
for p in parts_list:
    # @p refers to the local Python variable p inside query()
    q = 'loc == "xxx" & part == @p & "2016-11-20" <= shipped_date < "2016-11-27"'
    val_shipped = len(shipment_df.query(q))
print((DT.datetime.now() - start).total_seconds())
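Most of the time in the original goes into filtering the whole frame 436 × 5 = 2,180 times. A sketch of a different approach (not the one in the answer above): filter the location once, loop only over the 5 weeks, and let value_counts count every part in one pass, then reindex so parts with no shipments show 0. The sample data is made up; 'USW1' and the half-open 7-day weeks come from the question.

```python
import pandas as pd

# Hypothetical shipments; column names follow the question.
shipment_df = pd.DataFrame({
    "loc": ["USW1", "USW1", "USW1", "USE2", "USW1", "USW1"],
    "part": ["A1", "A1", "B2", "A1", "B2", "A1"],
    "shipped_date": pd.to_datetime(["2016-12-12", "2016-12-05", "2016-12-13",
                                    "2016-12-12", "2016-11-01", "2016-12-17"]),
})
parts_list = ["A1", "B2", "C3"]
sub_week_list = ["2016-12-11", "2016-12-04", "2016-11-27", "2016-11-20", "2016-11-13"]

loc_mask = shipment_df["loc"] == "USW1"  # computed once, reused for every week
counts = {}
for thisdate in sub_week_list:
    week_start = pd.to_datetime(thisdate)
    week_end = week_start + pd.Timedelta(days=7)
    in_week = (loc_mask
               & (shipment_df["shipped_date"] >= week_start)
               & (shipment_df["shipped_date"] < week_end))
    # One value_counts per week counts all parts at once.
    counts[thisdate] = shipment_df.loc[in_week, "part"].value_counts()

# Rows: parts, columns: weeks; missing (part, week) pairs become 0.
result = (pd.DataFrame(counts)
          .reindex([str(p) for p in parts_list])
          .fillna(0)
          .astype(int))
print(result)
```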
