Another Traceback Error When I Run My Python Code - python

I have a new traceback error when I run my Python code. It appears to be related to the very last ) parenthesis, and maybe also the last ] in my code.
((df['Location'].str.contains('- Display')) &
df['Lancaster'] != 'L' &
df['Dakota'] == 'D' &
df['Spitfire'] == 'SS' &
df['Hurricane'] != 'H'))
)]
And here is the traceback error I get:
File "<ipython-input-5-6d53e7e5ec10>", line 31
)
^
SyntaxError: invalid syntax
Here is my latest complete code, John S, which works. I will let you know if I run into
more issues. Many thanks for your help:
import pandas as pd
import requests
from bs4 import BeautifulSoup
res = requests.get("http://web.archive.org/web/20070701133815/http://www.bbmf.co.uk/june07.html")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0]
df = pd.read_html(str(table))
df = df[1]
df = df.rename(columns=df.iloc[0])
df = df.iloc[2:]
df.head(15)
display = df[(df['Location'].str.contains('- Display')) & (df['Dakota'].str.contains('D')) & (df['Spitfire'].str.contains('S')) & (df['Lancaster'] != 'L')]
display

You just have too many brackets:
((df['Location'].str.contains('- Display') &
df['Lancaster'] == '' &
df['Dakota'] == 'D' &
df['Spitfire'] == 'SS' &
df['Hurricane'] == ''))
You needed to remove a ')' after each ('- Display'). It looks like you will still have some problems with sorting through your data, but this should get you past your syntax error.
Look at this online version to see my edits.
https://onlinegdb.com/Skceaucyr
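One likely remaining problem is operator precedence: in Python, & binds more tightly than == and !=, so each comparison needs its own parentheses before the results are combined. Here is a minimal sketch of that idea. The three rows are made up for illustration and are not taken from the BBMF table:

```python
import pandas as pd

# Hypothetical rows, only to illustrate the filter; not the real BBMF data.
df = pd.DataFrame({
    "Location": ["Southport - Display", "Duxford - Flypast", "Eastbourne - Display"],
    "Lancaster": ["", "L", "L"],
    "Dakota": ["D", "D", "D"],
    "Spitfire": ["SS", "S", "SS"],
    "Hurricane": ["", "H", ""],
})

# Each comparison is wrapped in its own parentheses so that it is evaluated
# before the results are combined with &.
mask = (
    df["Location"].str.contains("- Display", regex=False)
    & (df["Lancaster"] != "L")
    & (df["Dakota"] == "D")
    & (df["Spitfire"] == "SS")
    & (df["Hurricane"] != "H")
)
display_rows = df[mask]
print(display_rows)
```

Building the mask as a separate variable also makes it easier to see which bracket closes what.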

You need to add the closing ")" and "]" at the end. Each comparison also needs its own parentheses, because & binds more tightly than == and !=, and the three groups are easier to combine as masks with | inside a single df[...]. So your variable Southport will now be:
Southport = df[
    (
        df['Location'].str.contains('- Display')
        & (df['Lancaster'] != 'L')
        & (df['Dakota'] == 'D')
        & (df['Spitfire'] == 'S')
        & (df['Hurricane'] == 'H')
    ) | (
        df['Location'].str.contains('- Display')
        & (df['Lancaster'] != 'L')
        & (df['Dakota'] == 'D')
        & (df['Spitfire'] == 'S')
        & (df['Hurricane'] != 'H')
    ) | (
        df['Location'].str.contains('- Display')
        & (df['Lancaster'] != 'L')
        & (df['Dakota'] == 'D')
        & (df['Spitfire'] == 'SS')
        & (df['Hurricane'] != 'H')
    )
]

Related

Optimize dataframe filtering on large datasets, pandas

I have a little challenge here and to be honest, I have absolutely no idea how to handle it.
I have this dataframe composed of 660,000 rows and about 50 columns. I need to filter this dataframe very frequently and retrieve the filtered dataframe as fast as possible (goal is to have a processing time <1second). I'd like to be able to run that locally on a laptop, therefore my "processing power" is limited.
I have multiple inputs to filter the dataframe, some are set manually (see input 1) some are retrieved from another script (see input 2, the other script is not included in the code here for simplicity).
I was hoping to simply filter through the dataset using df[(df.column == filtervalue)]. However, it seems that the processing time is way too long.
Therefore, I am wondering whether there are techniques to reduce this processing time, or whether the only option is to move to a server with more CPU and memory capacity.
Thanks for the help
import pandas as pd
df = pd.read_csv('xxxxxxxx', sep=";", dtype={"id": str,"dataset1": str,"dataset2":str,"myposition":str,"bet_1_preflop":float,"bet_2_preflop":float,"bet_3_preflop":float,"bet_1_flop":float,"bet_2_flop":float,
"bet_3_flop":float,"bet_1_turn":float ,"bet_2_turn":float,"bet_3_turn":float,"bet_1_river":float,"bet_2_river":float, "bet_3_river":float,
"myhand":str,"cards_flop":str,"cards_turn":str,"cards_river":str,"action1_preflop":str,"action2_preflop":str,
"action3_preflop":str,"action4_preflop":str, "action1_flop":str, "action2_flop":str, "action3_flop":str,"action4_flop":str,"action1_turn":str,
"action2_turn":str, "action3_turn":str, "action4_turn":str, "action1_river":str,"action2_river":str, "action3_river":str, "action4_river":str,
"action1_preflop_binary":'Int64', "action2_preflop_binary":'Int64', "action3_preflop_binary":'Int64', "action4_preflop_binary":'Int64',
"action1_flop_binary":'Int64',"action2_flop_binary":'Int64', "action3_flop_binary":'Int64', "action4_flop_binary":'Int64', "action1_turn_binary":'Int64',
"action2_turn_binary":'Int64', "action3_turn_binary":'Int64', "action4_turn_binary":'Int64',"action1_river_binary":'Int64', "action2_river_binary":'Int64',
"action3_river_binary":'Int64', "action4_river_binary":'Int64', "tiers":'Int64',"assorties":str,
"besthand_flop":str,"checker_flop":float,"handtype_flop":str,"topsuite_flop":'Int64',"topcolor_flop":'Int64',"besthand_turn":str,"checker_turn":float,"handtype_turn":str,
"topsuite_turn":'Int64',"topcolor_turn":'Int64',"besthand_river":str,"checker_river":float,"handtype_river":str,"topsuite_river":'Int64',"topcolor_river":'Int64'})
df = df.reset_index()
#Inputs for filters 1
myposition ="sb"
myhand = "ackc"
flop = "ad9d4h"
turn = "8d"
river = "th"
a1_preflop = "r"
a2_preflop = "r"
a3_preflop = "c"
a4_preflop = ""
a1_flop = "r"
a2_flop = "f"
a3_flop = ""
a4_flop = ""
a1_turn = ""
a2_turn = ""
a3_turn = ""
a4_turn = ""
a1_river = ""
a2_river = ""
a3_river = ""
a4_river = ""
#Inputs for filters 2 (from a different script)
tiers
assorties_status
best_allhands_flop[0]
best_allhands_flop[1]
best_allhands_flop[2]
highest_suite_flop
highest_color_flop
best_allhands_turn[0]
best_allhands_turn[1]
best_allhands_turn[2]
highest_suite_turn
highest_color_turn
best_allhands_river[0]
best_allhands_river[1]
best_allhands_river[2]
highest_suite_river
highest_color_river
#filtre_preflop_a1 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop)]
#filtre_preflop_a2 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop) & (df.action2_preflop == a2_preflop)]
#filtre_preflop_a3 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop) & (df.action2_preflop == a2_preflop) & (df.action3_preflop == a3_preflop)]
#filtre_preflop_a4 = df[(df.myposition == myposition) & (df.tiers == tiers) & (df.assorties == assorties_status) & (df.action1_preflop == a1_preflop) & (df.action2_preflop == a2_preflop) & (df.action3_preflop == a3_preflop) & (df.action4_preflop == a4_preflop)]
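The four commented filters above repeat the same comparisons, so one thing that may help is to compute each boolean mask once and build the next one from it. The sketch below shows only that idea on a tiny made-up frame. The data values are hypothetical, and whether this is enough to get under one second on 660,000 rows has not been measured here:

```python
import pandas as pd

# Tiny hypothetical frame with a few of the columns from the question.
df = pd.DataFrame({
    "myposition": ["sb", "sb", "bb", "sb"],
    "tiers": [1, 1, 1, 2],
    "assorties": ["o", "o", "o", "o"],
    "action1_preflop": ["r", "r", "r", "r"],
    "action2_preflop": ["r", "c", "r", "r"],
})

# Filter inputs (values are illustrative).
myposition, tiers, assorties_status = "sb", 1, "o"
a1_preflop, a2_preflop = "r", "r"

# Conditions shared by every filter are evaluated once.
base = (
    (df["myposition"] == myposition)
    & (df["tiers"] == tiers)
    & (df["assorties"] == assorties_status)
)

# Each later mask reuses the previous one instead of recomputing it.
mask_a1 = base & (df["action1_preflop"] == a1_preflop)
mask_a2 = mask_a1 & (df["action2_preflop"] == a2_preflop)

filtre_preflop_a1 = df[mask_a1]
filtre_preflop_a2 = df[mask_a2]
print(len(filtre_preflop_a1), len(filtre_preflop_a2))
```

Timing each step (for example with time.perf_counter) on the real data would show which comparisons dominate before deciding on more hardware.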

Create new column based on conditions of others

I have this df:
  Segnale  Prezzo  Prezzo_exit
0    Long   44645        43302
1   Short   41169        44169
2    Long   44322        47093
3   Short   45323        42514
sample code to generate it:
import pandas as pd
tbl2 = {
    "Segnale": ["Long", "Short", "Long", "Short"],
    "Prezzo": [44645, 41169, 44322, 45323],
    "Prezzo_exit": [43302, 44169, 47093, 42514]}
df = pd.DataFrame(tbl2)
I need to create a new column named "esito" with these conditions:
if df["Segnale"] =="Long" and df["Prezzo"] < df["Prezzo_exit"] #row with "target"
if df["Segnale"] =="Long" and df["Prezzo"] > df["Prezzo_exit"] #row with "stop"
if df["Segnale"] =="Short" and df["Prezzo"] < df["Prezzo_exit"] #row with "stop"
if df["Segnale"] =="Short" and df["Prezzo"] > df["Prezzo_exit"] #row with "target"
So the final result will be:
  Segnale  Prezzo  Prezzo_exit   esito
0    Long   44645        43302    stop
1   Short   41169        44169    stop
2    Long   44322        47093  target
3   Short   45323        42514  target
I tried with no success:
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] < df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Long') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Target'
df.loc[(df['Segnale'].str.contains('Short') & df['Prezzo'] > df['Prezzo_exit']), 'Esito'] = 'Stop'
This will do what your question asks:
df.loc[(df.Segnale=='Long') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'target'
df.loc[(df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit), 'esito'] = 'stop'
df.loc[(df.Segnale=='Short') & (df.Prezzo > df.Prezzo_exit), 'esito'] = 'target'
Output:
  Segnale  Prezzo  Prezzo_exit   esito
0    Long   44645        43302    stop
1   Short   41169        44169    stop
2    Long   44322        47093  target
3   Short   45323        42514  target
UPDATE:
You could also do this:
df['esito'] = ( pd.Series(['stop']*len(df)).where(
((df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit)) | ((df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit)),
'target') )
... or this, using NumPy (import numpy as np):
df['esito'] = ( np.where(
((df.Segnale=='Long') & (df.Prezzo > df.Prezzo_exit)) | ((df.Segnale=='Short') & (df.Prezzo < df.Prezzo_exit)),
'stop', 'target') )
You need to add parentheses around the following comparison:
(df['Prezzo'] < df['Prezzo_exit'])
To simplify, you can use np.select to pair each condition with its choice in one statement.
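A minimal sketch of the np.select suggestion, using the sample data from the question. The helper names (is_long, price_up, and so on) are introduced here for readability and are not from the original posts:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Segnale": ["Long", "Short", "Long", "Short"],
    "Prezzo": [44645, 41169, 44322, 45323],
    "Prezzo_exit": [43302, 44169, 47093, 42514],
})

# Named masks; each comparison is evaluated on its own before combining.
is_long = df["Segnale"] == "Long"
is_short = df["Segnale"] == "Short"
price_up = df["Prezzo"] < df["Prezzo_exit"]
price_down = df["Prezzo"] > df["Prezzo_exit"]

# np.select takes parallel lists: the first condition that is True for a
# row decides which choice that row gets.
conditions = [
    (is_long & price_up) | (is_short & price_down),
    (is_long & price_down) | (is_short & price_up),
]
choices = ["target", "stop"]

# Rows matching neither condition (e.g. equal prices) get an empty string.
df["esito"] = np.select(conditions, choices, default="")
print(df)
```

The string default keeps the result column a single string type; rows with equal prices are left empty because the question does not say how they should be labelled.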

Trying to Generate an NFT Using .CSV Metadata and Pandas

I have been scratching my head for a long time over generating my actual NFTs from a .csv file, and finding resources has been challenging, at the very least for my hardcoding method (following a guide). If anyone could look through what I have and offer some help figuring out what's going on, I would be forever indebted to you!
def generateOneRandRow(ADATvID):
    FILENAME = "ADA Tv" + str(ADATvID)
    NO = ADATvID
    BACKGROUND = randBackground()
    ACCESSORIES = randAccessories()
    HEAD = randHead()
    HAT = randHat()
    BODY = randBody()
    CHEST = randChest()
    ARMS = randArms()
    FACE = randFace()
    singleRow = [FILENAME,NO,BACKGROUND,ACCESSORIES,HEAD,HAT,BODY,CHEST,ARMS,FACE]
testThisRow =["ADA Tv2925","2925","cnft","couchbear","bnw","mullet","damagedorange","bluesuit","greenlightsaber","inlove"]
def checkIfExists(checkRow):
    aData = pd.read_csv('adalist.csv')
    index_list = aData[(aData['Background'] == checkRow[2])] & (aData['Accessories'] == checkRow[3]) & (aData['Head'] == checkRow[4]) & (aData['Hat'] == checkRow[5]) & (aData['Body'] == checkRow[6]) &(aData['Chest'] ==checkRow[7]) & (aData['Arms'] ==checkRow[7]) & (aData['Face'] == checkRow[8]).index.tolist()
    print(index_list)
    if index_list == []:
        return False
    else:
        return True
checkIfExists(testThisRow)
Error Messages... Help a Python Noob Out Please! and Feel Free To FLAME Me If It's Super Obvious. THANKS!!
Change:
index_list = aData[(aData['Background'] == checkRow[2])] & (aData['Accessories'] == checkRow[3]) & (aData['Head'] == checkRow[4]) & (aData['Hat'] == checkRow[5]) & (aData['Body'] == checkRow[6]) &(aData['Chest'] ==checkRow[7]) & (aData['Arms'] ==checkRow[7]) & (aData['Face'] == checkRow[8]).index.tolist()
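to:
index_list = aData[(aData['Background'] == checkRow[2]) &
                   (aData['Accessories'] == checkRow[3]) &
                   (aData['Head'] == checkRow[4]) &
                   (aData['Hat'] == checkRow[5]) &
                   (aData['Body'] == checkRow[6]) &
                   (aData['Chest'] == checkRow[7]) &
                   (aData['Arms'] == checkRow[8]) &
                   (aData['Face'] == checkRow[9])].index.tolist()
The closing ] has to come after the last condition, so that all the conditions are combined before indexing. Note that the indices for Arms and Face also look off by one in your version: in testThisRow, Arms is element 8 and Face is element 9, so they are adjusted above.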
Because you did not provide data, I reproduced your error as follows:
df = pd.DataFrame({'a':[1,'2'], 'b':[5,6]})
df[(df['a']=='2')]&(df['b']==6).index.tolist()
With error:
TypeError: unsupported operand type(s) for &: 'str' and 'int'
Editing the brackets:
df = pd.DataFrame({'a':[1,'2'], 'b':[5,6]})
df[(df['a']=='2')&(df['b']==6)].index.tolist()
With no error.
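As a follow-up sketch, the existence check can also return a boolean directly from the combined mask. The function name row_exists, the TRAIT_COLUMNS list, and the in-memory frame standing in for adalist.csv are assumptions made for illustration; the column names are the ones used in the question:

```python
import pandas as pd

# Column order assumed to match positions 2..9 of the row being checked.
TRAIT_COLUMNS = ["Background", "Accessories", "Head", "Hat",
                 "Body", "Chest", "Arms", "Face"]

def row_exists(frame, check_row):
    """Return True if some row of frame has the same traits as check_row."""
    traits = check_row[2:]  # skip FILENAME and NO
    mask = pd.Series(True, index=frame.index)
    for column, value in zip(TRAIT_COLUMNS, traits):
        # Narrow the mask one trait at a time.
        mask &= frame[column] == value
    return bool(mask.any())

# Hypothetical stand-in for the contents of adalist.csv.
a_data = pd.DataFrame(
    [["cnft", "couchbear", "bnw", "mullet",
      "damagedorange", "bluesuit", "greenlightsaber", "inlove"]],
    columns=TRAIT_COLUMNS,
)

test_row = ["ADA Tv2925", "2925", "cnft", "couchbear", "bnw", "mullet",
            "damagedorange", "bluesuit", "greenlightsaber", "inlove"]
other_row = test_row[:-1] + ["angry"]

print(row_exists(a_data, test_row))
print(row_exists(a_data, other_row))
```

Looping over the column names avoids writing each index by hand, which is where the Arms and Face indices slipped in the original.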

Alternative to irregular nested np.where clauses

I'm struggling to simplify my irregular nested np.where clauses. Is there a way to make the code more readable?
df["COL"] = np.where(
    (df["A1"] == df["B1"]) & (df["A1"].notna()),
    np.where(
        (df["A1"] == df["C"]),
        np.where(
            (df["A"] == df["B"]) & df["A"].notna() & (df["A"] != df["A1"]),
            "Text1",
            df["A1"]
        ),
        "Text2"
    ),
    np.where(
        (df["A"] == df["B"]) & (df["A"].notna()),
        np.where(
            (df["A"] == df["C"]),
            df["A"],
            "Text1"
        ),
        np.where(
            (df["C"].notna()),
            df["C"],
            "Text3"
        )
    )
)
Using np.select as suggested by @sammywemmy:
# Create boolean masks
m1 = (df["A1"] == df["B1"]) & (df["A1"].notna())
m11 = (df["A1"] == df["C"])
m12 = (df["A"] == df["B"]) & (df["A"].notna())
m111 = (df["A"] == df["B"]) & df["A"].notna() & (df["A"] != df["A1"])
m121 = (df["A"] == df["C"])
m122 = (df["C"].notna())
# Combine them
condlist = [m1 & m11 & m111,
            m1 & m11 & ~m111,
            m1 & ~m11,
            ~m1 & m12 & m121,
            ~m1 & m12 & ~m121,
            ~m1 & ~m12 & m122,
            ~m1 & ~m12 & ~m122]
# Values for each combination
choicelist = ["Text1", df["A1"], "Text2", df["A"], "Text1", df["C"], "Text3"]
out = np.select(condlist, choicelist)

Why does the pandas boolean mask not give me the desired result? What am I missing here?

All help is appreciated on the following:
I have the following code implemented, which filters results from a pandas DataFrame in 4 steps:
mask = ( (stock_hist['confirmed']== True) &\
(stock_hist['prevday_confirmed'] == False) & \
(stock_hist['nextday_confirmed'] == False) &\
(stock_hist['nextday_above_supp'] == True) &\
(stock_hist['prevday_above_supp'] == True)
)
result1 = stock_hist[mask]
mask = ( \
(stock_hist['confirmed'] == True) & \
(stock_hist['prevday_confirmed'] == False) & \
(stock_hist['prevday_above_supp'] == True) &\
(stock_hist['nextday_confirmed'] == True) & \
(stock_hist['current_dist'] < stock_hist['nextday_dist']) \
)
result2 = stock_hist[mask]
mask = ( (stock_hist['confirmed']== True) &\
(stock_hist['prevday_confirmed'] == True) & \
(stock_hist['nextday_confirmed'] == False) &\
(stock_hist['nextday_above_supp'] == True) &\
(stock_hist['current_dist'] < stock_hist['prevday_dist'])
)
result3 = stock_hist[mask]
mask = ( (stock_hist['confirmed']== True) &\
(stock_hist['prevday_confirmed'] == True) & \
(stock_hist['nextday_confirmed'] == True) &\
(stock_hist['current_dist'] < stock_hist['prevday_dist']) &\
(stock_hist['current_dist'] < stock_hist['nextday_dist'])
)
result4 = stock_hist[mask]
result = pd.concat([result1, result2, result3, result4])  # DataFrame.append was removed in pandas 2.0
Now, this code does exactly what I expect it to do.
However, I would expect that I should be able to do this in one single mask, like so:
mask = ( (stock_hist['confirmed']== True) &\
~(stock_hist['prevday_confirmed'] == False) & \
~(stock_hist['nextday_confirmed'] == False) &\
~(stock_hist['nextday_above_supp'] == True) &\
~(stock_hist['prevday_above_supp'] == True) \
| \
(stock_hist['confirmed'] == True) & \
~(stock_hist['prevday_confirmed'] == False) & \
~(stock_hist['prevday_above_supp'] == True) &\
~(stock_hist['nextday_confirmed'] == True) & \
~(stock_hist['current_dist'] < stock_hist['nextday_dist']) \
| \
:
:
:
But when I do that, it is as if the | acts as an &: the entire mask evaluates to False, even for the rows that the first version selects successfully.
What am I missing here?
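This is a typical example of why the order of operations matters: just like 2 + 3 x 4 is not equal to (2 + 3) x 4, it is safest to add one more layer of parentheses around each group of conditions.
((A & B) | (C & D))
In Python, & binds more tightly than |, so A & B | C & D is already read as (A & B) | (C & D); the extra parentheses mainly make the grouping explicit. In your case, each sub-mask should be wrapped in parentheses so that they can be combined into one condition using |. Also check the ~ you added in front of the comparisons: ~(stock_hist['prevday_confirmed'] == False) inverts that test, so the single mask no longer expresses the same conditions as the four separate ones, which would explain why it comes out False.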
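A minimal sketch of combining two of the sub-masks into one, with each group in its own parentheses and no extra ~ in front of the comparisons. The four rows of data are hypothetical, and only the first two of the four masks from the question are shown:

```python
import pandas as pd

# Hypothetical data; column names follow the question.
stock_hist = pd.DataFrame({
    "confirmed": [True, True, True, False],
    "prevday_confirmed": [False, False, True, False],
    "nextday_confirmed": [False, True, False, False],
    "prevday_above_supp": [True, True, False, True],
    "nextday_above_supp": [True, False, True, True],
    "current_dist": [1.0, 1.0, 1.0, 1.0],
    "nextday_dist": [2.0, 2.0, 0.5, 2.0],
})

# On a boolean column, ~column means "column == False".
mask1 = (
    stock_hist["confirmed"]
    & ~stock_hist["prevday_confirmed"]
    & ~stock_hist["nextday_confirmed"]
    & stock_hist["nextday_above_supp"]
    & stock_hist["prevday_above_supp"]
)
mask2 = (
    stock_hist["confirmed"]
    & ~stock_hist["prevday_confirmed"]
    & stock_hist["prevday_above_supp"]
    & stock_hist["nextday_confirmed"]
    & (stock_hist["current_dist"] < stock_hist["nextday_dist"])
)

# Each sub-mask is a complete group; | combines the groups.
combined = (mask1) | (mask2)
result = stock_hist[combined]

# Same rows as filtering separately and concatenating.
separate = pd.concat([stock_hist[mask1], stock_hist[mask2]])
print(result.index.tolist(), separate.index.tolist())
```

If a row can satisfy more than one sub-mask, the combined mask returns it once, whereas concatenating the separate results would return it once per match.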
