Python to select drop-down menus on web pages without using Selenium - python

I need to use Python to open a web page, select options from drop-down lists, and do some clicks, all without making use of Selenium. Is there any built-in library in Python which can help me do this?

Have you tried Splinter?
Here is a simple example of browser interaction using it:
from splinter import Browser

with Browser() as browser:
    # Visit URL
    url = "http://www.google.com"
    browser.visit(url)
    browser.fill('q', 'splinter - python acceptance testing for web applications')
    # Find and click the 'search' button
    button = browser.find_by_name('btnG')
    # Interact with elements
    button.click()
    if browser.is_text_present('splinter.readthedocs.io'):
        print("Yes, the official website was found!")
    else:
        print("No, it wasn't found... We need to improve our SEO techniques")

Related

Automate the DHL CSV dump download using Python

I have a requirement to automate the mail-sending process with data from DHL. Currently what we are doing is:
We have a DHL account; someone has to manually log in to the account, download the CSV dump which contains the order tracking details, upload it to the server, then port the data from it and process it.
So I thought of automating the whole process so that it requires minimal manual intervention.
1) Is there any way we can automate the download process from DHL?
Note: I'm using Python
I'd start by looking for something more convenient to access with code.
Searching Google for "dhl order tracking api" gives:
https://developer.dhl/api-catalog
as its first result, which looks useful and exposes quite a bit of functionality.
You then need to figure out how to make a RESTful request, which has answers here like Making a request to a RESTful API using Python, and there are lots of tutorials on the internet if you search for things like "python tutorial rest client", which points to articles like this.
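For example, here is a rough sketch of calling a tracking endpoint with the requests library. The endpoint URL, parameter name, and auth header below are my assumptions and should be checked against DHL's API catalog:
import requests

# Hypothetical endpoint and credentials; confirm against the DHL API catalog.
API_KEY = "your-dhl-api-key"

resp = requests.get(
    "https://api-eu.dhl.com/track/shipments",   # assumed tracking endpoint
    params={"trackingNumber": "1234567890"},    # assumed parameter name
    headers={"DHL-API-Key": API_KEY},           # assumed auth header
    timeout=30,
)
resp.raise_for_status()
print(resp.json())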
You can use Selenium for Python. Selenium is a package that automates a browser session; you can simulate mouse clicks and other actions with it.
To install:
pip install selenium
You will also have to install the webdriver for the browser you prefer to use:
https://www.seleniumhq.org/projects/webdriver/
Make sure that the browser version you are using is up to date.
Selenium documentation: https://selenium-python.readthedocs.io/
Since you are dealing with passwords and sensitive data, I am not including the code.
Login and Download
You can automate the download process using Selenium. Below is sample code to automate a login process and download items from a webpage. As the requirements are not specific, I'm taking a general use case and explaining how to automate the login and download process using Python.
# Libraries - selenium for scraping and time for delays
from selenium import webdriver
import time

chromeOptions = webdriver.ChromeOptions()
prefs = {"download.default_directory": "Path to the directory to store downloaded files"}
chromeOptions.add_experimental_option("prefs", prefs)
chromedriver = r"Path to the directory where chromedriver is stored"
browser = webdriver.Chrome(executable_path=chromedriver, chrome_options=chromeOptions)
# Maximize the browser window
browser.maximize_window()
# Web link for the login page
browser.get('login page link')
time.sleep(3)  # wait for the page to load
# Enter your username and password here.
username = "YOUR USER NAME"
password = "YOUR PASSWORD"
# Send the username.
# You can find the XPath to an element in Chrome's developer tools; reference answer:
# https://stackoverflow.com/questions/3030487/is-there-a-way-to-get-the-xpath-in-google-chrome
a = browser.find_element_by_xpath("xpath to username text box")  # find the xpath for the username text box and replace inside the quotes
a.send_keys(username)  # pass your username
# Send the password.
b = browser.find_element_by_xpath("xpath to password text box")  # find the xpath for the password text box and replace inside the quotes
b.send_keys(password)  # pass your password
# Click the submit button.
browser.find_element_by_xpath("xpath to submit button").click()  # find the xpath for the submit or login button and replace inside the quotes
time.sleep(2)  # wait for the login to complete
print('Login Successful')  # if there was no error, you will see the "Login Successful" message
# Navigate to the menu or any section using its xpath; you can click it with the click() function
browser.find_element_by_xpath("x-path of the section/menu").click()
time.sleep(1)
# Download the file
browser.find_element_by_xpath("xpath of the download file button").click()
time.sleep(1)
# Close the browser window after successful completion of the process.
browser.close()
This way you can automate the login and the downloading process.
Mail automation
For mail automation use the smtplib module; explore its documentation: https://docs.python.org/3/library/smtplib.html
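A minimal sketch of mailing the processed report with smtplib and email.message; the SMTP host, addresses, credentials, and file path are all placeholders:
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "DHL order tracking report"
msg["From"] = "sender@example.com"       # placeholder address
msg["To"] = "recipient@example.com"      # placeholder address
msg.set_content("Please find the latest tracking data attached.")

# Attach the downloaded CSV dump (placeholder path).
with open("/path/to/tracking.csv", "rb") as f:
    msg.add_attachment(f.read(), maintype="text", subtype="csv", filename="tracking.csv")

with smtplib.SMTP("smtp.example.com", 587) as server:  # placeholder SMTP host
    server.starttls()
    server.login("sender@example.com", "app-password")
    server.send_message(msg)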
Process automation (Scheduling)
To automate the whole process on an everyday basis, create a cron job for both tasks. Please refer to the python-crontab module. Documentation: https://pypi.org/project/python-crontab/
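A sketch of scheduling the two tasks with python-crontab; the script paths and times are assumptions for illustration:
from crontab import CronTab

cron = CronTab(user=True)  # current user's crontab

# Placeholder paths to the download and mail scripts.
download_job = cron.new(command='python /path/to/download_dump.py')
download_job.setall('0 6 * * *')   # every day at 06:00

mail_job = cron.new(command='python /path/to/send_mail.py')
mail_job.setall('30 6 * * *')      # every day at 06:30

cron.write()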
By using Selenium, smtplib, and python-crontab, you can automate your complete process with minimal or no manual intervention.

Selenium: difference between Chrome and PhantomJS? - python

I want to scrape Bing's search results. Basically, I am using Selenium; the idea is to use Selenium to click 'Next' automatically and scrape the URLs of the search results on each page. I made it run with the Chrome browser on my Ubuntu:
from selenium import webdriver
import os
import time

class bingURL(object):
    def __init__(self):
        self.driver = webdriver.Chrome(os.path.expanduser('./chromedriver'))

    def get_urls(self, url):
        driver = self.driver
        driver.get(url)
        elems = driver.find_elements_by_xpath("//a[@href]")
        href = []
        for elem in elems:
            link = elem.get_attribute("href")
            try:
                if 'bing.com' not in link and 'http' in link and 'microsoft.com' not in link and 'smashboards.com' not in link:
                    href.append(link)
            except:
                pass
        return list(set(href))

    def search_urls(self, keyword, pagenum):
        driver = self.driver
        searchurl = self.lookup(keyword)  # URL of the first page of Bing search results
        driver.get(searchurl)
        results = self.get_urls(searchurl)
        for i in range(pagenum):
            driver.find_elements_by_class_name("sb_pagN")[0].click()  # click 'Next' on the Bing results page
            time.sleep(5)  # wait for the page to load
            current_url = driver.current_url
            #print(current_url)
            #print(self.get_urls(current_url))
            results[0:0] = self.get_urls(current_url)
        driver.quit()
        return results

    def lookup(self, query):
        return "https://www.bing.com/search?q=" + query

if __name__ == "__main__":
    g = bingURL()
    result = g.search_urls('Stackoverflow is good', 10)
It works perfectly: when I run the code, it launches a Chrome browser, and I can see it go to the next page automatically and get URLs for 10 pages of search results.
However, my goal is to run this code on AWS successfully. The original code failed with the error 'Chrome failed to start'. After googling, it seems I need to use a headless browser like PhantomJS on AWS. So I installed PhantomJS and changed the def __init__(self): to:
def __init__(self):
    self.driver = webdriver.PhantomJS()
However, it cannot click 'Next' anymore and cannot scrape URLs using the old code. The error message is:
File ".../SEARCH_BING_MODULE.py", line 70, in search_urls
driver.find_elements_by_class_name("sb_pagN")[0].click()
IndexError: list index out of range
It looks like changing the browser completely changes the rules. How should I modify the original code to make it work again? Or how can I scrape Bing search results' URLs using Selenium + PhantomJS?
Thanks for your help!
Yes, you can perform all of these operations using a headless browser. Don't use HTMLUnit as it has many configuration issues.
PhantomJS was another approach for a headless browser, but PhantomJS has bugs these days because it is poorly maintained.
You can use chromedriver itself for headless jobs.
You just need to pass one option to chromedriver, as below:
chromeOptions.addArguments("--headless");
The full code (this answer's snippet is in Java) will look like this:
System.setProperty("webdriver.chrome.driver","D:\\Workspace\\JmeterWebdriverProject\\src\\lib\\chromedriver.exe");
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.addArguments("--headless");
chromeOptions.addArguments("--start-maximized");
WebDriver driver = new ChromeDriver(chromeOptions);
driver.get("https://www.google.co.in/");
Hope it will help you :)

How to click an invisible element with selenium?

I used this code to check Splinter's button-clicking option:
from splinter import Browser

with Browser() as browser:
    # Visit URL
    url = "http://www.google.com"
    browser.visit(url)
    browser.fill('q', 'splinter - python acceptance testing for web applications')
    # Find and click the 'search' button
    button = browser.find_by_name('btnG')
    # Interact with elements
    button.click()
    if browser.is_text_present('splinter.readthedocs.org'):
        print("Yes, the official website was found!")
    else:
        print("No, it wasn't found... We need to improve our SEO techniques")
and I got the exception:
Element is not currently visible and so may not be interacted with.
Waiting for the browser is not the solution (because I added a sleep for a long time and it still doesn't work).
This is the sample code shown at https://splinter.readthedocs.org/en/latest/#sample-code, but it doesn't work for me.
If you want to wait for an element to become invisible, you can use an explicit wait:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(self.driver, 30)
# Replace By.XX with a real locator strategy, e.g. By.ID or By.NAME.
wait.until(EC.invisibility_of_element_located((By.XX, "something")))
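Conversely, if the goal is to click the element once it does become visible, a sketch using the complementary condition (the locator here is an assumption based on the question's code):
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

wait = WebDriverWait(driver, 30)
# Wait until the button is visible and enabled, then click it.
button = wait.until(EC.element_to_be_clickable((By.NAME, "btnG")))  # assumed locator
button.click()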

Python transfer mechanize browser session

I'm having a little difficulty trying to navigate a website past the login screen. I've done this using mechanize. However, once I navigate past the login page I want to interact with the page, click elements, etc., which mechanize cannot do. I also want to do this all "behind the curtain" so the browser window is invisible (trying not to use Selenium).
Here is the code I use to log in. What can I do past this to start interacting with the page?
import mechanize

br = mechanize.Browser()       # get browser
br.set_handle_robots(False)    # what robots?
br.open("http://www.website.com")  # open website
br.select_form(nr=0)           # get the main form
br.set_all_readonly(False)
for control in br.form.controls:
    print(control)
user_control = br.form.controls[0]
user_control._value = 'username'
user_password = br.form.controls[1]
user_password._value = 'password'
br.submit()
One option would be to "transfer" cookies from mechanize to selenium and use selenium with a headless browser like PhantomJS or with a virtual display. Or, just switch to selenium+PhantomJS completely (including the authentication step).
See also:
Python: how to dump cookies of a mechanize.Browser instance?
How to save and load cookies using python selenium webdriver
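A minimal sketch of the cookie transfer, assuming the mechanize login above; the URL is the question's placeholder, and cookie fields beyond name/value/path/domain are dropped for brevity:
import cookielib  # http.cookiejar on Python 3
import mechanize
from selenium import webdriver

# Log in with mechanize, keeping an explicit cookie jar.
cj = cookielib.LWPCookieJar()
br = mechanize.Browser()
br.set_cookiejar(cj)
br.open("http://www.website.com")  # placeholder URL from the question
# ... select the form and submit credentials as in the question ...

# Hand the session cookies to a headless Selenium browser.
driver = webdriver.PhantomJS()
driver.get("http://www.website.com")  # must be on the domain before adding its cookies
for cookie in cj:
    driver.add_cookie({'name': cookie.name, 'value': cookie.value,
                       'path': cookie.path, 'domain': cookie.domain})
driver.get("http://www.website.com")  # reload; the session should now be authenticated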

Selenium download file

I'm trying to make a Selenium program to automatically download and upload some files.
Note that I am not doing this for testing but for trying to automate some tasks.
So here's my set_preference for the Firefox profile
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)  # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/home/jj/web')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/json, text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream')
profile.set_preference("browser.helperApps.alwaysAsk.force", False)
Yet, I still see the download dialog.
The Selenium Firefox webdriver runs the Firefox browser GUI. When a download is invoked, Firefox presents a popup asking if you want to view the file or save the file. As far as I can tell this is a property of the browser, and there is no way to disable it using the Firefox preferences or by setting the Firefox profile variables.
The only way I could avoid the Firefox download popup was to use Mechanize along with Selenium. I used Selenium to obtain the download link and then passed this link to Mechanize to perform the actual download. Mechanize is not associated with a GUI implementation and therefore does not present user-interface popups.
This clip is in Python and is part of a class that will perform the download action.
# These imports are required
from selenium import webdriver
import mechanize
import time
# Start the firefox browser using Selenium
self.driver = webdriver.Firefox()
# Load the download page using its URL.
self.driver.get(self.dnldPageWithKey)
time.sleep(3)
# Find the download link and click it
elem = self.driver.find_element_by_id("regular")
dnldlink = elem.get_attribute("href")
logfile.write("Download Link is: " + dnldlink)
pos = dnldlink.rfind("/")
dnldFilename = dnldlink[pos+1:]
dnldFilename = "/home/<mydir>/Downloads/" + dnldFilename
logfile.write("Download filename is: " + dnldFilename)
#### Now Using Mechanize ####
# Above, Selenium retrieved the download link. Because of Selenium's
# firefox download issue: it presents a download dialog that requires
# user input, Mechanize will be used to perform the download.
# Setup the mechanize browser. The browser does not get displayed.
# It is managed behind the scenes.
br = mechanize.Browser()
# Open the login page, the download requires a login
resp = br.open(webpage.loginPage)
# Select the form to use on this page. There is only one, it is the
# login form.
br.select_form(nr=0)
# Fill in the login form fields and submit the form.
br.form['login_username'] = theUsername
br.form['login_password'] = thePassword
br.submit()
# The page returned after the submit is a transition page with a link
# to the welcome page. In a user-interactive session the browser would
# automatically switch us to the welcome page.
# The first link on the transition page will take us to the welcome page.
# This step may not be necessary, but it puts us where we should be after
# logging in.
br.follow_link(nr=0)
# Now download the file
br.retrieve(dnldlink, dnldFilename)
# After the download, close the Mechanize browser; we are done.
br.close()
This does work for me. I hope it helps. If there is an easier solution I would love to know it.
