I am trying to scrape web pages using Python and Selenium. I have a URL that takes a single parameter, and a list of valid parameters. I navigate to that URL with one parameter at a time and click a link, which opens a popup window containing a page.
The popup window automatically opens a print dialog on page load.
The URL bar is also disabled for that popup.
My code:
def packAmazonOrders(self, order_ids):
    order_window_handle = self.driver.current_window_handle
    for each in order_ids:
        self.driver.find_element_by_id('sc-search-field').send_keys(Keys.CONTROL, "a")
        self.driver.find_element_by_id('sc-search-field').send_keys(Keys.DELETE)
        self.driver.find_element_by_id('sc-search-field').send_keys(each)
        self.driver.find_element_by_class_name('sc-search-button').click()
        src = self.driver.page_source.encode('utf-8')
        if 'Unshipped' in src and 'Easy Ship - Schedule pickup' in src:
            is_valid = True
        else:
            is_valid = False
        if is_valid:
            print 'Packing Slip Start - %s' %each
            self.driver.find_element_by_link_text('Print order packing slip').click()
            handles = self.driver.window_handles
            print handles
            try:
                handles.remove(order_window_handle)
            except:
                pass
            self.driver.switch_to_window(handles.pop())
            print handles
            packing_slip_page = ''
            packing_slip_page = self.driver.page_source.encode('utf-8')
            if each in packing_slip_page:
                print 'Packing Slip Window'
            else:
                print 'not found'
            self.driver.close()
            self.driver.switch_to_window(order_window_handle)
Now I have two questions:
How can I download that popup page as a PDF?
For the first parameter everything works fine. But for the other parameters in the list, packing_slip_page does not update (which I think is because of the disabled URL bar, though I am not sure). I tried printing the handles (print handles) for each parameter, but it always prints the same value. So how can I access the correct page source for the other parameters?
I'm having a hard time coming up with a solution. I'm writing Python code using the Selenium library to pull reports from a site we use. To access the reports, the code has to click a link that expands a folder so it can find the report names. The code runs fine and clicks the link, but the folder doesn't expand to reveal the reports. Debugging shows that the link is found and clicked, with a POST response of 200, but nothing happens. Since the response code says everything is fine, I'm at a loss as to why the folder isn't expanding. Due to the nature of the site I cannot share much, but I have attached the code I can share along with screenshots.
wait = WebDriverWait(driver_chrom, 30)
def Try_Wait(args, thing_string):
    try:
        wait.until(EC.presence_of_element_located(args))
        logger_write.debug("found element "+thing_string)
        #print("found args for ", thing_string)
        return
    except:
        Exit_LogOut()
        driver_chrom.quit()
        sys.exit()
def Button_Click(args, thing_string):
    try:
        driver_chrom.find_element(*args).click()
        logger_write.debug("Found button click"+thing_string)
        #print('clicking button for ', thing_string)
        return
    except:
        #print('did not click button ', thing_string)
        logger_write.error("Did not find button "+thing_string)
        return
thing_string = 'Opening folder for reports'
Try_Wait((By.XPATH,'//div[@id="folder1"]/table/tbody/tr//td[a[starts-with(@href, "javascript:clickOnNode(") and contains(@href, "1") and contains(text(),"Standard")]]'), thing_string)
Button_Click((By.XPATH,'//div[@id="folder1"]/table/tbody/tr//td[a[starts-with(@href, "javascript:clickOnNode(") and contains(@href, "1") and contains(text(),"Standard")]]'), thing_string)
This is what it looks like after code above runs
This is what it should look like so that reports are loaded into the html
Here is the inspect screenshot:
The following XPath should work for you:
//a[text()="Standard"]
instead of this:
//div[@id="folder1"]/table/tbody/tr//td[a[starts-with(@href, "javascript:clickOnNode(") and contains(@href, "1") and contains(text(),"Standard")]]
Or use LINK_TEXT, since it is an anchor tag:
Try_Wait((By.LINK_TEXT, "Standard"), thing_string)
Button_Click((By.LINK_TEXT, "Standard"), thing_string)
I'm developing a Python app that sets form components on a certain web page (built with Vue 3).
I'm able to set a datepicker's value, but the next operation then clears the datepicker.
I must be doing something really foolish, but I'm out of ideas.
Here's my code:
import sys
from selenium import webdriver
#-----------------------------------------------------------------------------------------------------------------------
driver_headfull = None
try:
    driver_headfull = webdriver.Firefox()
    firefox_options = webdriver.FirefoxOptions()
except Exception as e:
    print('ERROR WHEN CREATING webdriver.Firefox()')
    print("Please, verify you have installed: firefox and that geckodriver.exe's is in %PATH%")
    print(e)
    sys.exit(5)
#navigate to url
driver_headfull.get('http://a_certain_ip')
#set a datepicker
element_to_interact_with_headfull = driver_headfull.find_element_by_id('datesPolicyEffectiveDate')
driver_headfull.execute_script("arguments[0].value = '2020-07-01';", element_to_interact_with_headfull)
#set a <div> that behaves like a <select>.
element_to_interact_with_headfull = driver_headfull.find_element_by_id('industryDescription')
driver_headfull.execute_script("arguments[0].click();", element_to_interact_with_headfull)
element_pseudo_select_option_headfull = driver_headfull.find_element_by_id('descriptionIndustryoption0')
driver_headfull.execute_script("arguments[0].click();", element_pseudo_select_option_headfull)
# this very last instruction resets value of html_id=datesPolicyEffectiveDate (datepicker)
while(True):
    pass
Any ideas would be very welcome!
Well, this was a pain. I'll post it in case it's of any use to someone.
It seems the component was being reloaded. I was setting the value directly on the child element of the component, by means of
arguments[0].value = '2020-07-01';
so the parent never saw the change and automatically reloaded the child with its default (empty) value.
Dispatching an input event right after setting the value solved my problem:
driver_headfull.execute_script("arguments[0].value = '2021-07-01';", element_to_interact_with_headfull)
driver_headfull.execute_script("arguments[0].dispatchEvent(new Event('input', { bubbles: true }));", element_to_interact_with_headfull)
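The two calls above can be wrapped in one helper so that the value change and the event always travel together. This is only a minimal sketch, not the poster's actual code: `set_value_and_notify` is a hypothetical name, `driver` is assumed to be any Selenium WebDriver, and the component is assumed to listen for `input` (some widgets listen for `change` instead, hence the parameter).

```python
def set_value_and_notify(driver, element, value, event_name="input"):
    """Set element.value via JavaScript, then fire a bubbling event so the
    owning framework component notices the change (hypothetical helper)."""
    script = (
        "arguments[0].value = arguments[1];"
        "arguments[0].dispatchEvent("
        "new Event(arguments[2], { bubbles: true }));"
    )
    # Passing the value and event name as script arguments avoids
    # hand-quoting them inside the JavaScript source.
    driver.execute_script(script, element, value, event_name)
```

With the names from the question, a call might look like `set_value_and_notify(driver_headfull, element_to_interact_with_headfull, '2021-07-01')`.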
I am new to Python and web scraping. I want to retrieve some information from a website, but some of it is displayed in a popup window. The problem I'm having is switching from the main page to the popup window to get its HTML, and then switching back to the main page.
In other words, after getting some information from page A, I need to switch to this link
https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6
to obtain the company name, and then switch back to page A.
This is the section of the code that I have:
for batch in containers:
    try:
        print(" ")
        awardid = batch.find('a', text=True).text
        ordernumber = batch.find_all('span')[1].text
        cagecode = batch.find_all('span')[6].text
        price = batch.find_all('span')[7].text
        date = batch.find_all('span')[8].text
        NSN = batch.find_all('span')[10].text
        Nomenclature= batch.find_all('span')[11].text
        purchasereq = batch.find_all('span')[12].text
        if cagecode :
            cagelink = 'https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage='+cagecode
            #switch to pop-up window
            try:
                driver.execute_script("window.open("+cagelink+",'new window')")
            except Exception as e:
                print("error from driver ", str(e))
                continue
            try:
                cagesoup = BeautifulSoup(driver.page_source,"lxml")
                bodycontainer = cagesoup.find("tbody")
                print('cage code body',bodycontainer)
            except Exception as e:
                print("error from soup ", str(e))
                continue
    except Exception as e:
        print(colorama.Fore.MAGENTA + "award error.."+ str(e) )
        # print(container1)
        continue
    except Exception as e:
        continue
The problem I have now is that I get an error saying I am missing a ) at the end of the argument here:
driver.execute_script("window.open("+cagelink+",'new window')")
and when I remove cagelink and hard-code
https://www.dibbs.bsm.dla.mil/Refs/cage.aspx?Cage=0VET6
it displays None. What am I doing wrong, and how can I switch between the windows to obtain the company name from the popup window?
One thing that is definitely incorrect is the argument you are passing to the JavaScript method: it is not quoted, i.e. it is not treated as a string. This line here:
driver.execute_script("window.open("+cagelink+",'new window')")
Should be:
driver.execute_script("window.open('"+cagelink+"','new window')")
Even more readable would be to use string formatting:
driver.execute_script("window.open('{}','new window')".format(cagelink))
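For the second half of the question (reading the popup and switching back), here is a rough sketch rather than a tested solution for that site. `build_open_script` and `page_source_from_popup` are hypothetical helper names; the sketch assumes `driver` is a Selenium WebDriver, uses `json.dumps` to do the quoting instead of hand-written quotes, and picks the new window by comparing `window_handles` before and after the call.

```python
import json

def build_open_script(url, window_name="new_window"):
    """Return a window.open(...) call with both arguments safely quoted."""
    # json.dumps emits a double-quoted, escaped string literal, which is
    # also a valid JavaScript string literal for ordinary URLs.
    return "window.open({}, {});".format(json.dumps(url), json.dumps(window_name))

def page_source_from_popup(driver, url):
    """Open url in a new window, return its HTML, then switch back."""
    original = driver.current_window_handle
    before = set(driver.window_handles)
    driver.execute_script(build_open_script(url))
    # The new window is whichever handle did not exist before the call.
    new_handles = [h for h in driver.window_handles if h not in before]
    if not new_handles:
        raise RuntimeError("no new window was opened")
    driver.switch_to.window(new_handles[0])
    try:
        return driver.page_source
    finally:
        driver.close()                     # closes the popup only
        driver.switch_to.window(original)  # back to the original page
```

The returned HTML could then be passed to `BeautifulSoup(html, "lxml")` as in the question. Depending on how quickly the popup loads, an explicit wait before reading `page_source` may also be needed.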
I have built a web scraper. The program enters searchterm into a searchbox and grabs the results. Pandas goes through a spreadsheet line-by-line in a column to retrieve each searchterm.
Sometimes the page doesn't load properly, prompting a refresh.
I need a way for it to repeat the function and try the same searchterm if it fails. Right now, if I return, it would go on to the next line in the spreadsheet.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
df = pd.read_csv("searchterms.csv", delimiter=",")
def scrape(searchterm):
    #Loads url
    searchbox = driver.find_element_by_name("searchbox")
    searchbox.clear()
    searchbox.send_keys(searchterm)
    print "Searching for %s ..." % searchterm
    no_result = True
    while no_result is True:
        try:
            #Find results, grab them
            no_result = False
        except:
            #Refresh page and do the above again for the current searchterm - How?
            driver.refresh()
    return pd.Series([col1, col2])
df[["Column 1", "Column 2"]] = df["searchterm"].apply(scrape)
#Executes crawl for each line in csv
The try/except construct also has an else clause. The else block is executed only if no exception was raised:
def scrape(searchterm):
    #Loads url
    no_result = True
    while no_result:
        #Find results, grab them
        searchbox = driver.find_element_by_name("searchbox")
        searchbox.clear()
        try: #assumes that an exception is thrown if there is no results
            searchbox.send_keys(searchterm)
            print "Searching for %s ..." % searchterm
        except:
            #Refresh page and do the above again for the current searchterm
            driver.refresh()
        else: # executed if no exceptions were thrown
            no_result = False
    # .. some post-processing code here
    return pd.Series([col1, col2])
(There is also a finally block, which is executed no matter what; it is useful for cleanup tasks that don't depend on the success or failure of the preceding code.)
Also, note that a bare except catches every exception and is almost never a good idea. I'm not familiar with how Selenium reports errors, but when catching exceptions you should name the exception you expect to handle. That way, if an unexpected exception occurs, your code will abort and you'll know that something went wrong.
That is also why you should keep as few lines as possible inside the try block.
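To make the advice about named exceptions and short try blocks concrete, here is a small sketch of a bounded retry loop built on try/except/else. The `retry` helper and its parameter names are made up for illustration, and the attempt limit is an addition the question did not ask for; it is there so a page that never loads cannot loop forever.

```python
def retry(action, recover, exceptions, max_attempts=3):
    """Call action() until it succeeds or max_attempts is reached.

    recover() runs after each failed attempt (for example, a page refresh).
    Only the listed exception types are retried; anything else propagates.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = action()
        except exceptions:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            recover()
        else:
            # Reached only when action() raised nothing.
            return result
```

In the scraper, `action` could be a function that grabs the results for the current search term, `recover` could be `driver.refresh`, and `exceptions` could be something like `(NoSuchElementException, TimeoutException)` from `selenium.common.exceptions`, depending on what the page actually raises when it fails to load.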
I am REALLY confused. I'm basically trying to fill out a form on a website with mechanize for python. I got everything to work except the dropdown menu. What do I use to select it and what do I put for the value? I don't know if I'm supposed to put the name of the selection or the numerical value of it. Help would be greatly appreciated, thanks.
Code snippet:
try:
    br.open("http://www.website.com/")
    try:
        br.select_form(nr=0)
        br['number'] = "mynumber"
        br['from'] = "herpderp@gmail.com"
        br['subject'] = "Yellow"
        br['carrier'] = "203"
        br['message'] = "Hello, World!"
        response = br.submit()
    except:
        pass
except:
    print "Couldn't connect!"
    quit
I'm having trouble with the carrier, which is a dropdown menu.
According to the mechanize documentation examples, you need to access attributes of the form object, not the browser object. Also, for the select control, you need to set the value to a list:
br.open("http://www.website.com/")
br.select_form(nr=0)
form = br.form
form['number'] = "mynumber"
form['from'] = "herpderp@gmail.com"
form['subject'] = "Yellow"
form['carrier'] = ["203"]
form['message'] = "Hello, World!"
response = br.submit()
Sorry for reviving a long-dead post, but this was still the best answer I could find on Google, and it doesn't work. After more time than I care to admit, I figured it out. infrared is right about the form object, but not about the rest, and his code doesn't work. Here's some code that works for me (though I'm sure a more elegant solution exists):
# Select the form
br.open("http://www.website.com/")
br.select_form(nr=0) # you might need to change the 0 depending on the website
# find the carrier drop down menu
control = br.form.find_control("carrier")
# loop through items to find the match
for item in control.items:
    if item.name == "203":
        # it matches, so select it
        item.selected = True
        # now fill out the rest of the form and submit
        br.form['number'] = "mynumber"
        br.form['from'] = "herpderp@gmail.com"
        br.form['subject'] = "Yellow"
        br.form['message'] = "Hello, World!"
        response = br.submit()
        # exit the loop
        break