I am new to Python and web scraping. I want to retrieve some information from a website, but some of it is displayed in a popup window. The problem I'm having is switching from the main page to the popup window to get its HTML, and then switching back to the main page.
In other words, after getting some information from page A, I need to switch to this link
https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6
to obtain the company name, and then switch back to page A.
This is a section of the code that I have:
for batch in containers:
    try:
        print(" ")
        awardid = batch.find('a', text=True).text
        ordernumber = batch.find_all('span')[1].text
        cagecode = batch.find_all('span')[6].text
        price = batch.find_all('span')[7].text
        date = batch.find_all('span')[8].text
        NSN = batch.find_all('span')[10].text
        Nomenclature = batch.find_all('span')[11].text
        purchasereq = batch.find_all('span')[12].text
        if cagecode:
            cagelink = 'https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=' + cagecode
            # switch to pop-up window
            try:
                driver.execute_script("window.open("+cagelink+",'new window')")
            except Exception as e:
                print("error from driver ", str(e))
                continue
            try:
                cagesoup = BeautifulSoup(driver.page_source, "lxml")
                bodycontainer = cagesoup.find("tbody")
                print('cage code body', bodycontainer)
            except Exception as e:
                print("error from soup ", str(e))
                continue
    except Exception as e:
        print(colorama.Fore.MAGENTA + "award error.." + str(e))
        # print(container1)
        continue
    except Exception as e:
        continue
The problem I have now is that I am getting an error saying I am missing a ) at the end of the argument here:
driver.execute_script("window.open("+cagelink+",'new window')")
and when I remove cagelink and hard-code
https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6
it displays None. What am I doing wrong, and how can I switch between the windows to obtain the company name from the popup window?
One thing that is definitely incorrect is the argument you are passing to the JavaScript call: it is not quoted, i.e. it is not treated as a string. This line here:
driver.execute_script("window.open("+cagelink+",'new window')")
Should be:
driver.execute_script("window.open('"+cagelink+"','new window')")
Even more readable would be to use string formatting:
driver.execute_script("window.open('{}','new window')".format(cagelink))
The ID of a button I want to click changes dynamically. For example, the ID will be id=Button7, and the next time I run my code it is id=Button19. I noticed it loops through a set of IDs, but in no particular order.
I would like to iterate through all possible IDs until one of them works, with something similar to this logic:
try:
    source8 = driver.find_element_by_xpath('//*[@id="xl_dijit-bootstrap_Button_99"]')
    ActionChains(driver).click(source8).perform()
except Exception as e:
    source8 = driver.find_element_by_xpath('//*[@id="xl_dijit-bootstrap_Button_7"]')
    ActionChains(driver).click(source8).perform()
except Exception as e:
    source8 = driver.find_element_by_xpath('//*[@id="xl_dijit-bootstrap_Button_27"]')
    ActionChains(driver).click(source8).perform()
Just iterate over the xpaths:
for xpath in ['//xpath1', '//xpath2', '//xpath3']:
    try:
        # do something with xpath
        break
    except:
        print(xpath + " failed!")
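For instance, with the button IDs from the question, the loop might look like this (a sketch assuming the same driver and the legacy find_element_by_xpath API used in the question):

from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.action_chains import ActionChains

button_xpaths = [
    '//*[@id="xl_dijit-bootstrap_Button_99"]',
    '//*[@id="xl_dijit-bootstrap_Button_7"]',
    '//*[@id="xl_dijit-bootstrap_Button_27"]',
]
for xpath in button_xpaths:
    try:
        button = driver.find_element_by_xpath(xpath)
        ActionChains(driver).click(button).perform()
        break  # stop at the first ID that works
    except NoSuchElementException:
        print(xpath + " failed!")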
You can use the XPath contains() function to detect the ID first and then perform the necessary actions.
elementId = driver.find_element_by_xpath("//input[contains(@id, 'xl_dijit-bootstrap_Button_')]")
elementId.click()
Alternatively, if you wish to perform a different action for a specific ID, retrieve its id attribute and use that attribute ('idTextAttribute') to implement a switch-like dispatch.
idTextAttribute = elementId.get_attribute("id")

def SwitchToId(idTextAttribute):
    switcher = {
        "xl_dijit-bootstrap_Button_99": ...,  # do something, like click or sendKeys
        "xl_dijit-bootstrap_Button_7": ...,   # do something, like click or sendKeys
        "xl_dijit-bootstrap_Button_27": ...,  # do something, like click or sendKeys
    }
    return switcher.get(idTextAttribute, "ID not Found")
Note: Python doesn't have a switch-case like Java, so you can use a dictionary like switcher above or an if-elif block.
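For completeness, the same dispatch written as an if-elif block (a sketch; the click and send_keys actions are placeholders for whatever each button needs):

buttonId = elementId.get_attribute("id")
if buttonId == "xl_dijit-bootstrap_Button_99":
    elementId.click()
elif buttonId == "xl_dijit-bootstrap_Button_7":
    elementId.send_keys("some text")
elif buttonId == "xl_dijit-bootstrap_Button_27":
    elementId.click()
else:
    print("ID not Found")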
I have a for loop that iterates over a list of URLs, which are then loaded by the Chrome driver. Some URLs load a page in a 'bad' format and fail the first XPath test; when that happens, I want to move on to the next element in the loop. My cleanup code works, but I can't seem to get it to continue to the next element. I have an except block that closes my web browser, but nothing I tried would let me loop back to 'for row in mysql_cats'.
for row in mysql_cats:
    print ('Here is the url -', row[1])
    cat_url = (row[1])
    driver = webdriver.Chrome()
    driver.get(cat_url)  # download the URL passed from mysql
    try:
        CategoryName = driver.find_element_by_xpath('//h1[@class="categoryL3"]|//h1[@class="categoryL4"]').text  # finds either the L3 or L4 category
    except:
        driver.close()
        # this closes the webdriver okay if it can't find the xpath, but I can't get code here to go to the next row in mysql_cats
I hope that you're also closing the driver at the end of this code when no exception occurs.
If you want to skip to the next iteration of the loop when an exception is raised, you may add continue, as suggested in other answers:
try:
    CategoryName = driver.find_element_by_xpath('//h1[@class="categoryL3"]|//h1[@class="categoryL4"]').text  # finds either the L3 or L4 category
except NoSuchElementException:
    driver.close()
    continue  # jumps to the next iteration of the for loop
Since I do not know the rest of your code, the following tip may be useless, but a common way to handle these cases is a try/except/finally clause:
for row in mysql_cats:
    print ('Here is the url -', row[1])
    cat_url = (row[1])
    driver = webdriver.Chrome()
    driver.get(cat_url)  # download the URL passed from mysql
    try:
        ...  # my code, with dangerous stuff
    except NoSuchElementException:
        ...  # handling of 'NoSuchElementException'; no need to 'continue'
    except SomeOtherUglyException:
        ...  # handling of 'SomeOtherUglyException'
    finally:  # code that is ALWAYS executed, with or without exceptions
        driver.close()
I'm also assuming that you're creating a new driver each time for a reason. If that is not intentional, you may use something like this:
driver = webdriver.Chrome()
for row in mysql_cats:
    print ('Here is the url -', row[1])
    cat_url = (row[1])
    driver.get(cat_url)  # download the URL passed from mysql
    try:
        ...  # my code, with dangerous stuff
    except NoSuchElementException:
        ...  # handling of 'NoSuchElementException'; no need to 'continue'
    except SomeOtherUglyException:
        ...  # handling of 'SomeOtherUglyException'
driver.close()
In this way, you have only one driver that manages all the pages you're trying to open in the for loop.
Have a look at how try/except/finally is really useful when handling connections and drivers.
As a footnote, I'd like you to notice how in the code I always specify which exception I am expecting: catching every exception can be dangerous. BTW, probably no one will die if you simply use except:
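Putting the pieces together, a complete sketch of the single-driver version (assuming mysql_cats and the XPath from the question; NoSuchElementException lives in selenium.common.exceptions):

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException

driver = webdriver.Chrome()
for row in mysql_cats:
    print('Here is the url -', row[1])
    driver.get(row[1])
    try:
        CategoryName = driver.find_element_by_xpath(
            '//h1[@class="categoryL3"]|//h1[@class="categoryL4"]').text
    except NoSuchElementException:
        continue  # skip to the next row in mysql_cats
    # ... do something with CategoryName here ...
driver.close()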
I'm trying out the Python Client for Google Maps Services to pull a list of places using the Places API.
Here is the GitHub page: https://github.com/googlemaps/google-maps-services-python
Here is the documentation page: https://googlemaps.github.io/google-maps-services-python/docs/2.4.4/#module-googlemaps.exceptions
In the .places def, I need to pass page_token as a string in order to get the next 10 listings. When I run the code below, it keeps showing me INVALID_REQUEST.
Here is my code:
places_result = gmaps.places(query="hotels in Singapore", page_token='')

for result in places_result['results']:
    print(result['name'])

try:
    places_result = gmaps.places(query="hotels in Singapore", page_token=places_result['next_page_token'])
except ApiError as e:
    print(e)
else:
    for result in places_result['results']:
        print(result['name'])
Alright, after hours of trial and error, I noticed I need to add a time.sleep(2) to make it work. I'm not sure why, but it works.
It failed with time.sleep(1); time.sleep(2) and above solve the problem.
Hopefully someone can shed some light on the reason behind this. (The Places API documentation does note that there is a short delay between when a next_page_token is issued and when it becomes valid, which would explain why the pause is needed.)
Here is my code that works to retrieve Places and go to the next page until the end. Remember to replace
1. your key at 'YOUR KEY HERE' and
2. the string you want to search at 'SEARCH SOMETHING'.
import googlemaps
import time
from googlemaps.exceptions import ApiError  # needed for the except clause below

gmaps = googlemaps.Client(key='YOUR KEY HERE')

def printHotels(searchString, next=''):
    try:
        places_result = gmaps.places(query=searchString, page_token=next)
    except ApiError as e:
        print(e)
    else:
        for result in places_result['results']:
            print(result['name'])
        time.sleep(2)
        try:
            places_result['next_page_token']
        except KeyError as e:
            print('Complete')
        else:
            printHotels(searchString, next=places_result['next_page_token'])

if __name__ == '__main__':
    printHotels('SEARCH SOMETHING')
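An iterative variant of the same idea, for anyone who prefers to avoid the recursion (a sketch under the same assumptions, including the pause before the next page is requested):

def printHotelsIter(searchString):
    page_token = ''
    while True:
        try:
            places_result = gmaps.places(query=searchString, page_token=page_token)
        except ApiError as e:
            print(e)
            break
        for result in places_result['results']:
            print(result['name'])
        if 'next_page_token' not in places_result:
            print('Complete')
            break
        page_token = places_result['next_page_token']
        time.sleep(2)  # the token needs a moment to become valid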
I am trying to scrape web pages using Python and Selenium. I have a URL which takes a single parameter, and a list of valid parameters. I navigate to that URL with one parameter at a time and click on a link; a popup window opens with a page.
The popup window automatically opens a print dialogue on page load.
Also, the URL bar is disabled for that popup.
My code:
def packAmazonOrders(self, order_ids):
    order_window_handle = self.driver.current_window_handle
    for each in order_ids:
        self.driver.find_element_by_id('sc-search-field').send_keys(Keys.CONTROL, "a")
        self.driver.find_element_by_id('sc-search-field').send_keys(Keys.DELETE)
        self.driver.find_element_by_id('sc-search-field').send_keys(each)
        self.driver.find_element_by_class_name('sc-search-button').click()
        src = self.driver.page_source.encode('utf-8')
        if 'Unshipped' in src and 'Easy Ship - Schedule pickup' in src:
            is_valid = True
        else:
            is_valid = False
        if is_valid:
            print 'Packing Slip Start - %s' % each
            self.driver.find_element_by_link_text('Print order packing slip').click()
            handles = self.driver.window_handles
            print handles
            try:
                handles.remove(order_window_handle)
            except:
                pass
            self.driver.switch_to_window(handles.pop())
            print handles
            packing_slip_page = ''
            packing_slip_page = self.driver.page_source.encode('utf-8')
            if each in packing_slip_page:
                print 'Packing Slip Window'
            else:
                print 'not found'
            self.driver.close()
            self.driver.switch_to_window(order_window_handle)
Now I have two questions:
1. How can I download that popup page as a PDF?
2. For the first parameter everything works fine, but for the other parameters in the list packing_slip_page does not update (which I think is because of the disabled URL bar, but I'm not sure). I tried printing the handles (print handles) for each parameter, but it always prints the same value. So how do I access the correct page source for the other parameters?
I have built a web scraper. The program enters a searchterm into a searchbox and grabs the results. Pandas goes through a spreadsheet line by line in a column to retrieve each searchterm.
Sometimes the page doesn't load properly, prompting a refresh.
I need a way for it to repeat the function and retry the same searchterm if it fails. Right now, if I return, it goes on to the next line in the spreadsheet.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

df = pd.read_csv("searchterms.csv", delimiter=",")

def scrape(searchterm):
    # Loads url
    searchbox = driver.find_element_by_name("searchbox")
    searchbox.clear()
    searchbox.send_keys(searchterm)
    print "Searching for %s ..." % searchterm
    no_result = True
    while no_result is True:
        try:
            # Find results, grab them
            no_result = False
        except:
            # Refresh page and do the above again for the current searchterm - How?
            driver.refresh()
    return pd.Series([col1, col2])

df[["Column 1", "Column 2"]] = df["searchterm"].apply(scrape)  # executes the crawl for each line in the csv
The try/except construct comes with an else clause. The else block is executed if everything goes OK:
def scrape(searchterm):
    # Loads url
    no_result = True
    while no_result:
        # Find results, grab them
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.clear()
        try:  # assumes that an exception is thrown if there are no results
            searchbox.send_keys(searchterm)
            print "Searching for %s ..." % searchterm
        except:
            # Refresh page and do the above again for the current searchterm
            driver.refresh()
        else:  # executed if no exceptions were thrown
            no_result = False
            # .. some post-processing code here
    return pd.Series([col1, col2])
(There is also a finally block that is executed no matter what, which is useful for cleanup tasks that don't depend on the success or failure of the preceding code.)
Also, note that a bare except catches any exception and is almost never a good idea. I'm not familiar with how Selenium handles errors, but when catching exceptions you should specify which exception you expect to handle. This way, if an unexpected exception occurs, your code will abort and you'll know that something bad happened.
That is also why you should try to keep as few lines as possible within the try block.
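To illustrate, here is the same retry loop with a specific exception and a minimal try block (a sketch; WebDriverException is Selenium's base exception class, and the result-grabbing step is a placeholder):

from selenium.common.exceptions import WebDriverException

def scrape(searchterm):
    while True:
        try:
            # re-find the box on each attempt; a refresh invalidates old element references
            searchbox = driver.find_element_by_name("searchbox")
            searchbox.clear()
            searchbox.send_keys(searchterm)
        except WebDriverException:
            driver.refresh()  # retry the same searchterm after a refresh
        else:
            break
    # ... find results, grab them into col1 and col2 ...
    return pd.Series([col1, col2])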