Python - Next in for loop on try

I have a for loop that iterates over a list of URLs. The URLs are then loaded by the Chrome driver. Some URLs load a page in a 'bad' format that fails the first XPath test. If that happens I want to go back to the next element in the loop. My cleanup code works, but I can't seem to get it to go to the next element in the for loop. I have an except block that closes my web browser, but nothing I tried would allow me to then loop back to 'for row in mysql_cats'.
for row in mysql_cats:
    print('Here is the url -', row[1])
    cat_url = row[1]
    driver = webdriver.Chrome()
    driver.get(cat_url)  # Download the URL passed from mysql
    try:
        CategoryName = driver.find_element_by_xpath('//h1[@class="categoryL3"]|//h1[@class="categoryL4"]').text  # finds either L3 or L4 category
    except:
        driver.close()
        # this does close the webdriver okay if it can't find the xpath,
        # but I can't get code here to go to the next row in mysql_cats

I hope that you're also closing the driver at the end of this code if no exception occurs.
If you want to move on to the next iteration of the loop when an exception is raised, you may add continue, as suggested in other answers:
try:
    CategoryName = driver.find_element_by_xpath('//h1[@class="categoryL3"]|//h1[@class="categoryL4"]').text  # finds either L3 or L4 category
except NoSuchElementException:  # needs: from selenium.common.exceptions import NoSuchElementException
    driver.close()
    continue  # skips to the next iteration of the for loop
Since I do not know your code, the following tip may be useless, but a common way to handle these cases is a try/except/finally clause:
for row in mysql_cats:
    print('Here is the url -', row[1])
    cat_url = row[1]
    driver = webdriver.Chrome()
    driver.get(cat_url)  # Download the URL passed from mysql
    try:
        ...  # my code, with dangerous stuff
    except NoSuchElementException:
        ...  # handling of 'NoSuchElementException'; no need to 'continue'
    except SomeOtherUglyException:
        ...  # handling of 'SomeOtherUglyException'
    finally:  # Code that is ALWAYS executed, with or without exceptions
        driver.close()
I'm also assuming that you're creating a new driver each time for a reason. If it is not intentional, you may use something like this:
driver = webdriver.Chrome()
for row in mysql_cats:
    print('Here is the url -', row[1])
    cat_url = row[1]
    driver.get(cat_url)  # Download the URL passed from mysql
    try:
        ...  # my code, with dangerous stuff
    except NoSuchElementException:
        ...  # handling of 'NoSuchElementException'; no need to 'continue'
    except SomeOtherUglyException:
        ...  # handling of 'SomeOtherUglyException'
driver.close()
In this way, you have only one driver that manages all the pages you're trying to open in the for loop
Have a look at how try/except/finally is really useful when handling connections and drivers.
As a footnote, I'd like you to notice how in the code I always specify which exception I am expecting: catching all exceptions can be dangerous. That said, probably no one will die if you simply use a bare except:
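To make the control flow concrete, here is a runnable sketch of the same try/except/finally pattern. The StubDriver class and LookupError are stand-ins (for webdriver.Chrome() and NoSuchElementException) so the flow can be exercised without a browser:

```python
# Stand-ins so the pattern runs without a browser: StubDriver fakes
# webdriver.Chrome(), LookupError fakes NoSuchElementException.

class StubDriver:
    def get(self, url):
        if "bad" in url:
            raise LookupError("no such element")
    def close(self):
        pass

def process(rows):
    visited, skipped = [], []
    for row in rows:
        driver = StubDriver()
        try:
            driver.get(row)        # the dangerous stuff
            visited.append(row)
        except LookupError:        # catch only the expected exception
            skipped.append(row)
            continue               # move on to the next row
        finally:
            driver.close()         # ALWAYS executed, with or without exceptions
    return visited, skipped

print(process(["http://ok/1", "http://bad/2", "http://ok/3"]))
# → (['http://ok/1', 'http://ok/3'], ['http://bad/2'])
```

Note that the finally block runs even on the iteration that hits continue, so the driver gets closed on every path.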

Else statement is not getting executed. IF statement is the only one being run? [duplicate]

I have this script that goes to Google Takeout and downloads the takeout file. The issue I am having is that some accounts require a re-login when the "Takeout_dlbtn" is clicked and others can download directly. I tried to put in this IF ELSE statement: if "password" is displayed, we run through the steps to re-enter the password and then download; and if "password" does not appear, we assume we do not need to log in and can download right away.
The issue I am having is: my IF statement runs, and if the password page is shown, it will enter the password and then download. But if the password page is not shown, the IF statement is still run and my ELSE statement does not get executed. It throws this error: "no such element: Unable to locate element: {"method":"css selector","selector":"[name='password']"}"
How do I get the ELSE function to run if the "password" element is not shown? Any help would be appreciated.
def DLTakeout(self, email, password):
    self.driver.get("https://takeout.google.com/")
    time.sleep(2)
    self.driver.find_element_by_xpath(takeout_DLbtn).click()
    time.sleep(4)
    if self.driver.find_element_by_name("password").is_displayed():
        print("IF statement")
        time.sleep(3)
        self.driver.find_element_by_name("password").clear()
        time.sleep(5)
        self.driver.find_element_by_name("password").send_keys(password)
        time.sleep(5)
        self.driver.find_element_by_xpath("//*[@id='passwordNext']/div/button/div[2]").click()
        print("Downloading")
        logging.info("Downloading")
        time.sleep(10)  # change time.sleep to a higher number when download file is in GBs #1500
        self.write_yaml(email)
        print("Write to yaml successfully")
        logging.info("Write to yaml successfully")
        print("Zip move process about to start")
        logging.info("Zip move process about to start")
        time.sleep(4)
    else:
        print("else statement")
        time.sleep(5)
        print("Downloading")
        logging.info("Downloading")
According to the Selenium docs:
4.2. Locating by Name
Use this when you know the name attribute of an element. With this
strategy, the first element with a matching name attribute will be
returned. If no element has a matching name attribute, a
NoSuchElementException will be raised.
So instead of if/else, use a try/except block:
from selenium.common.exceptions import NoSuchElementException

try:
    self.driver.find_element_by_name("password")
    print("if statement")
    ...
except NoSuchElementException:
    print("else statement")
    ...
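Another way to avoid the exception entirely is the plural find_elements_by_name, which returns an empty list (rather than raising) when nothing matches, so the original if/else shape can be kept. A sketch of that pattern; the StubDriver here is a hypothetical stand-in for Selenium so it runs without a browser:

```python
# find_elements (plural) returns a list, empty when nothing matches,
# so presence can be tested with a plain if/else. StubDriver is a fake.

class StubDriver:
    def __init__(self, elements):
        self._elements = elements
    def find_elements_by_name(self, name):
        # Real Selenium also returns [] here instead of raising.
        return self._elements.get(name, [])

def needs_password(driver):
    # In the real code: self.driver.find_elements_by_name("password")
    if driver.find_elements_by_name("password"):
        return True     # password field present: re-login branch
    else:
        return False    # field absent: download directly

print(needs_password(StubDriver({"password": ["<input>"]})))  # → True
print(needs_password(StubDriver({})))                         # → False
```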

switching between windows using chrome webdriver is not working

I am new to Python and web scraping. I want to retrieve some information from a website, but some of the information is displayed in a popup window. The problem I'm having now is switching from the main page to the popup window to get the HTML, and then switching back to the main page.
In other words, after getting some information from page A, I need to switch to this link
https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6
to obtain the company name, and then switch back to page A.
This is a section of the code that I have:
for batch in containers:
    try:
        print(" ")
        awardid = batch.find('a', text=True).text
        ordernumber = batch.find_all('span')[1].text
        cagecode = batch.find_all('span')[6].text
        price = batch.find_all('span')[7].text
        date = batch.find_all('span')[8].text
        NSN = batch.find_all('span')[10].text
        Nomenclature = batch.find_all('span')[11].text
        purchasereq = batch.find_all('span')[12].text
        if cagecode:
            cagelink = 'https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=' + cagecode
            # switch to pop-up window
            try:
                driver.execute_script("window.open("+cagelink+",'new window')")
            except Exception as e:
                print("error from driver ", str(e))
                continue
            try:
                cagesoup = BeautifulSoup(driver.page_source, "lxml")
                bodycontainer = cagesoup.find("tbody")
                print('cage code body', bodycontainer)
            except Exception as e:
                print("error from soup ", str(e))
                continue
    except Exception as e:
        print(colorama.Fore.MAGENTA + "award error.." + str(e))
        # print(container1)
        continue
    except Exception as e:
        continue
The problem I have now is that I am getting an error saying that I am missing a ) at the end of the argument here:
driver.execute_script("window.open("+cagelink+",'new window')")
and when I try to remove the cagelink and use
https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6
it displays None. What am I doing wrong, and how can I switch between the windows to obtain the company name in the popup window?
One thing that is certainly incorrect is the argument you are passing to the JS method: it is not quoted, i.e. not treated as a string. This line here:
driver.execute_script("window.open("+cagelink+",'new window')")
Should be:
driver.execute_script("window.open('"+cagelink+"','new window')")
Even more readable would be to use string formatting:
driver.execute_script("window.open('{}','new window')".format(cagelink))
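For the other half of the question, actually reaching the pop-up: the usual dance is to remember the current handle, open the window, switch to the new entry in driver.window_handles, scrape, close the pop-up, and switch back. The Selenium names below (window_handles, switch_to.window, close, page_source) are real; the StubDriver is a hypothetical stand-in so the flow runs without a browser:

```python
# Switch to a pop-up, scrape, and switch back. StubDriver fakes just
# enough of the Selenium window API to exercise the flow.

class _SwitchTo:
    def __init__(self, driver):
        self._driver = driver
    def window(self, handle):
        self._driver.current_window_handle = handle

class StubDriver:
    def __init__(self):
        self.window_handles = ["main"]
        self.current_window_handle = "main"
        self.switch_to = _SwitchTo(self)
    def execute_script(self, script):
        self.window_handles.append("popup")   # window.open adds a handle
    def close(self):
        self.window_handles.remove(self.current_window_handle)

def scrape_popup(driver, cagelink):
    main = driver.current_window_handle                       # remember page A
    driver.execute_script("window.open('{}','new window')".format(cagelink))
    popup = [h for h in driver.window_handles if h != main][0]
    driver.switch_to.window(popup)                            # now on the pop-up
    # ... read driver.page_source here for the company name ...
    driver.close()                                            # close the pop-up only
    driver.switch_to.window(main)                             # back to page A
    return driver.current_window_handle

d = StubDriver()
print(scrape_popup(d, "https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6"))
# → main
```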

How to reuse a selenium driver instance during parallel processing?

To scrape a pool of URLs, I am parallel processing Selenium with joblib. In this context, I am facing two challenges:
Challenge 1 is to speed up this process. At the moment, my code opens and closes a driver instance for every URL (ideally it would be one for every process).
Challenge 2 is to get rid of the CPU-intensive while loop that I think I need to continue on empty results (I know that this is most likely wrong).
Pseudocode:
URL_list = [URL1, URL2, URL3, ..., URL100000]  # List of URLs to be scraped

def scrape(URL):
    while True:  # Loop needed to use continue
        try:  # Try scraping
            driver = webdriver.Firefox(executable_path=path)  # Set up driver
            website = driver.get(URL)  # Get URL
            results = do_something(website)  # Get results from URL content
            driver.close()  # Close worker
            if len(results) == 0:  # If do_something() failed:
                continue  # THEN worker to skip URL
            else:  # If do_something() worked:
                safe_results("results.csv")  # THEN save results
                break  # Go to next worker/URL
        except Exception as e:  # If something weird happens:
            save_exception(URL, e)  # THEN save error message
            break  # Go to next worker/URL

Parallel(n_jobs=40)(delayed(scrape)(URL) for URL in URL_list)  # Run in 40 processes
My understanding is that in order to re-use a driver instance across iterations, the # Set up driver line needs to be placed outside scrape(URL). However, everything outside scrape(URL) will not find its way to joblib's Parallel(n_jobs=40). This would imply that you can't reuse driver instances while scraping with joblib, which can't be true.
Q1: How to reuse driver instances during parallel processing in the above example?
Q2: How to get rid of the while-loop while maintaining functionality in the above-mentioned example?
Note: Flash and image loading is disabled in firefox_profile (code not shown)
1) You should first create a bunch of drivers: one for each process, and pass an instance to each worker. I don't know how to pass drivers to a Parallel object, but you could use the threading.current_thread().name key to identify drivers. To do that, use backend="threading". Now each thread will have its own driver.
2) You don't need a loop at all. The Parallel object itself iterates over all your URLs (if I correctly understand your intention in using a loop).
import threading
from joblib import Parallel, delayed
from selenium import webdriver

def scrape(URL):
    try:
        driver = drivers[threading.current_thread().name]
    except KeyError:
        drivers[threading.current_thread().name] = webdriver.Firefox()
        driver = drivers[threading.current_thread().name]
    driver.get(URL)
    results = do_something(driver)
    if results:
        safe_results("results.csv")

drivers = {}
Parallel(n_jobs=-1, backend="threading")(delayed(scrape)(URL) for URL in URL_list)
for driver in drivers.values():
    driver.quit()
But I don't really think you profit from using n_jobs greater than the number of CPUs you have, so n_jobs=-1 is best (of course, I may be wrong; try it).

Selenium won't open a new url in for loop (Python & Chrome)

I can't seem to get Selenium to run this for loop correctly. It runs the first time without issue, but when it starts the second loop the program just stops running with no error message. I get the same results when I attempt this with a Firefox browser. Maybe it has to do with me trying to start a browser instance when one is already running?
def bookroom(self):
    sessionrooms = ["611", "618"]  # the list being used by the for loop
    driver = webdriver.Firefox()
    # for loop trying each room
    for rooms in sessionrooms:
        room = roomoptions[rooms][0]
        sidenum = roomoptions[rooms][1]
        bookingurl = "https://coparooms.newschool.edu/copa-scheduler/Web/reservation.php?rid="+room+"&sid="+sidenum+"&rd="+self.startdate
        driver.get(bookingurl)
        time.sleep(3)
        usernamefield = driver.find_element_by_id("email")
        usernamefield.send_keys(self.username)
        passwordfield = driver.find_element_by_id("password")
        passwordfield.send_keys(self.password)
        passwordfield.send_keys(Keys.RETURN)
        time.sleep(5)
        begin = Select(driver.find_element_by_name("beginPeriod"))
        print(self.starttime)
        begin.select_by_visible_text(convertarmy(self.starttime))
        end = Select(driver.find_element_by_name("endPeriod"))
        end.select_by_visible_text(convertarmy(self.endtime))
        creates = driver.find_element_by_xpath("//button[contains(.,'Create')]")
        creates.click()  # clicks the confirm button
        time.sleep(8)
        xpathtest = driver.find_element_by_id("result").text
        # if statement checks if creation was a success. If it is, exit the browser.
        if "success" in xpathtest or "Success" in xpathtest:
            print "Success!"
            driver.exit()
        else:
            print "Failure"
            time.sleep(2)
        # if creation was not a success, try the next room in sessionrooms
Update:
I found the problem, it was just a matter of uneven spacing. Only some of the loop was "in the loop".
if "success" in xpathtest or "Success" in xpathtest:
    print "Success!"
    driver.exit()
You will want to break out of your loop here: you're closing the driver, but then using it in the next iteration of the loop without starting a new driver object.
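A control-flow sketch of that fix: once a booking succeeds, quit the driver and break out, instead of looping on with a dead driver. The try_room callable is a hypothetical stand-in for all the Selenium steps; note also that the real cleanup method is driver.quit(), as driver.exit() does not exist:

```python
# Stop the loop as soon as one room books successfully; otherwise
# record the failure and try the next room. try_room stubs the
# Selenium form-filling steps and returns True on success.

def book(rooms, try_room):
    log = []
    for room in rooms:
        if try_room(room):
            log.append("Success!")
            # driver.quit() would go here -- then stop using the driver
            break
        log.append("Failure")
    return log

print(book(["611", "618"], lambda room: room == "618"))
# → ['Failure', 'Success!']
```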

Pandas: Repeat function for current keyword if except

I have built a web scraper. The program enters searchterm into a searchbox and grabs the results. Pandas goes through a spreadsheet line-by-line in a column to retrieve each searchterm.
Sometimes the page doesn't load properly, prompting a refresh.
I need a way for it to repeat the function and try the same searchterm if it fails. Right now, if I return, it goes on to the next line in the spreadsheet.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

df = pd.read_csv("searchterms.csv", delimiter=",")

def scrape(searchterm):
    # Loads url
    searchbox = driver.find_element_by_name("searchbox")
    searchbox.clear()
    searchbox.send_keys(searchterm)
    print "Searching for %s ..." % searchterm
    no_result = True
    while no_result is True:
        try:
            # Find results, grab them
            no_result = False
        except:
            # Refresh page and do the above again for the current searchterm - How?
            driver.refresh()
    return pd.Series([col1, col2])

df[["Column 1", "Column 2"]] = df["searchterm"].apply(scrape)
# Executes crawl for each line in csv
The try/except construct comes with an else clause. The else block is executed if everything goes OK:
def scrape(searchterm):
    # Loads url
    no_result = True
    while no_result:
        # Find results, grab them
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.clear()
        try:  # assumes that an exception is thrown if there are no results
            searchbox.send_keys(searchterm)
            print "Searching for %s ..." % searchterm
        except:
            # Refresh page and do the above again for the current searchterm
            driver.refresh()
        else:  # executed if no exceptions were thrown
            no_result = False
            # .. some post-processing code here
    return pd.Series([col1, col2])
(There is also a finally block that is executed no matter what, which is useful for cleanup tasks that don't depend on the success or failure of the preceding code)
Also, note that a bare except catches any exception and is almost never a good idea. I'm not familiar with how Selenium handles errors, but when catching exceptions you should specify which exception you are expecting to handle. This way, if an unexpected exception occurs, your code will abort and you'll know that something bad happened.
That is why you should also try to keep as few lines as possible within the try block.
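Putting both points together, a bounded-retry sketch: the try block stays small, only the expected exception is caught, and the loop gives up after a few attempts instead of spinning forever. LookupError and the flaky fetcher are stand-ins (for NoSuchElementException and the real page-scraping code) so this runs without Selenium:

```python
# Retry the same search term on failure, but at most max_attempts times.
# LookupError stands in for NoSuchElementException.

def scrape_with_retry(fetch, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()          # keep the try block small
        except LookupError:         # only the exception we expect
            if attempt == max_attempts:
                raise               # give up loudly instead of swallowing it
            # driver.refresh() would go here before the next attempt

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:              # fails twice, then succeeds
        raise LookupError("page not ready")
    return "results"

print(scrape_with_retry(flaky))     # → results
```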
