Related
I'm facing a problem that when I click on a button, then Javascript handle the action then it redirect to a new page with new window (It's similar to when you click on <a> with target _Blank). In the scrapy/splash I don't know how to get content from the new page (I means I don't know how to control that new page).
Anyone can help!
script = """
function main(splash)
assert(splash:go(splash.args.url))
splash:wait(0.5)
local element = splash:select('div.result-content-columns div.result-title')
local bounds = element:bounds()
element:mouse_click{x=bounds.width/2, y=bounds.height/2}
return splash:html()
end
"""
def start_requests(self):
for url in self.start_urls:
yield SplashRequest(url, self.parse, endpoint='execute', args={'lua_source': self.script})
Issue:
The problem that you can't scrape html which is out of your selection scope. When a new link is clicked, if there is an iframe involved, it rarely brings it into scope for scraping.
Solution:
Choose a method of selecting the new iframe, and then proceed to parse the new html.
The Scrapy-Splash method
(This is an adaptation of Mikhail Korobov's solution from this answer)
If you are able to get the src link of the new page that pops up, it may be the most reliable, however, you can also try selecting iframe this way:
# ...
yield SplashRequest(url, self.parse_result, endpoint='render.json',
args={'html': 1, 'iframes': 1})
def parse_result(self, response):
iframe_html = response.data['childFrames'][0]['html']
sel = parsel.Selector(iframe_html)
item = {
'my_field': sel.xpath(...),
# ...
}
The Selenium method
(requires pip install selenium,bs4, and possibly a chrome driver download from here for your os: Selenium Chromedrivers) Supports Javascript parsing! Woohoo!
With the following code, this will switch scopes to the new frame:
# Goes at the top
from bs4 import BeautifulSoup
from selenium.webdriver.chrome.options import Options
import time
# Your path depends on where you downloaded/located your chromedriver.exe
CHROME_PATH = 'C:\Program Files (x86)\Google\Chrome\Application\chrome.exe'
CHROMEDRIVER_PATH = 'chromedriver.exe'
WINDOW_SIZE = "1920,1080"
chrome_options = Options()
chrome_options.add_argument("--log-level=3")
chrome_options.add_argument("--headless") # Speeds things up if you don't need gui
chrome_options.add_argument("--window-size=%s" % WINDOW_SIZE)
chrome_options.binary_location = CHROME_PATH
browser = webdriver.Chrome(executable_path=CHROMEDRIVER_PATH, chrome_options=chrome_options)
url = "example_js_site.com" # Your site goes here
browser.get(url)
time.sleep(3) # An unsophisticated way to wait for the new page to load.
browser.switch_to.frame(0)
soup = BeautifulSoup(browser.page_source.encode('utf-8').strip(), 'lxml')
# This will return any content found in tags called '<table>'
table = soup.find_all('table')
My favorite of the two options is Selenium, but try the first solution if you are more comfortable with it!
I'm building a web scraper to automate the process of downloading tweet data using selenium and the headless chrome browser.
I've written a function which logs into twitter, navigates to the analytics page and downloads the csv file, but is there any way to use the pandas.read_csv function to read csv from the source directly without downloading as an intermediary step? I'm pushing data to a SQL database and eventually want to schedule on AWS Lambda so would be good if I could eliminate the need for creating new files.
code as follows (twt is how i've called TwitterBrowser() in the if name == "main": line)
class TwitterBrowser:
def __init__(self):
global LOGIN, PASSWORD, browser
chrome_options = Options()
chrome_options.add_argument("--incognito")
chrome_driver = os.getcwd() +"\\chromedriver.exe"
browser = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver)
parser = ConfigParser()
parser.read("apikeys.ini")
LOGIN = parser.get('TWITTER', 'USERNAME')
PASSWORD = parser.get('TWITTER', 'PASSWORD')
def get_url(self, url, sec):
load_page = browser.get(url)
try:
WebDriverWait(browser, timeout=sec)
except TimeoutException:
print('TIMED OUT!')
return load_page
def login(self):
twt.get_url('https://twitter.com/login', 5)
browser.find_element_by_xpath('//*[#id="page-container"]/div/div[1]/form/fieldset/div[1]/input').send_keys(LOGIN)
browser.find_element_by_xpath('//*[#id="page-container"]/div/div[1]/form/fieldset/div[2]/input').send_keys(PASSWORD)
WebDriverWait(browser, 5)
browser.find_element_by_xpath('//*[#id="page-container"]/div/div[1]/form/div[2]/button').click()
def tweet_analytics(self):
twt.get_url('https://analytics.twitter.com/user/'+LOGIN+'/tweets', 5)
WebDriverWait(browser, 5)
browser.find_element_by_xpath('/html/body/div[2]/div/div[2]/div').click()
WebDriverWait(browser, 5)
browser.find_element_by_xpath('/html/body/div[5]/div[4]/ul/li[1]').click()
WebDriverWait(browser, 5)
browser.find_element_by_xpath('//*[#id="export"]/button/span[2]').click()
WebDriverWait(browser, 10)
Pandas can read csv directly from url as stated here. So I'd get the raw csv link and read it directly. I'm not sure tho if Twitter analytics has the raw csv hosted on their server (raw csv exemple) or they generate a download link, generating the csv on the fly, where you'd be stuck, which is probably the case as I don't see them hosting unnecessary csvs.
In case you have to download it, you can then read it from you
Using Firefox/Python/Selenium-- I am able to use click() on a file link on a webpage to download it, and the file downloads to my Downloads folder as expected.
However, when I add more lines to click() on more than 1 link, the script no longer runs as expected. Instead of the files being downloaded, they are all opening in separate browser windows, which all close after the script completes.
Is this by design or is there a way around it or a better way to download multiple files on a webpage?
This is the website in question: https://www.treasury.gov/about/organizational-structure/ig/Pages/igdeskbook.aspx
I am trying to download the links to the Introduction and all parts of Volumes 1-4.
I have a dictionary of the locators:
IgDeskbookPageMap = dict(IgDeskbookBannerXpath = "//div[contains(text(), 'The Inspector General Deskbook')]",
IgDeskbookIntroId = "anch_202",
IgDeskbookVol1Part1Id = "anch_203",
IgDeskbookVol1Part2Id = "anch_204",
IgDeskbookVol1Part3Id = "anch_205",
IgDeskbookVol1Part4Id = "anch_206",
IgDeskbookVol2Id = "anch_207",
IgDeskbookVol3Id = "anch_208",
IgDeskbookVol4Part1Id = "anch_209",
IgDeskbookVol4Part2Id = "anch_210",
IgDeskbookVol4Part3Id = "anch_211"
This is the method:
def click(self, waitTime, locatorMode, Locator):
self.wait_until_element_clickable(waitTime, locatorMode, Locator).click()
These are the click() calls (there are more than 3, but just truncating here for space:
self.click(10,
"id",
IgDeskbookPageMap['IgDeskbookIntroId']
)
self.click(10,
"id",
IgDeskbookPageMap['IgDeskbookVol1Part1Id']
)
self.click(10,
"id",
IgDeskbookPageMap['IgDeskbookVol1Part2Id']
)
I added the following code for launching Firefox and now the download behavior works as expected when clicking on each file:
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2)
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.helperApps.alwaysAsk.force', False)
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'application/pdf,application/x-pdf')
profile.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
profile.set_preference("pdfjs.disabled", True)
self.driver = webdriver.Firefox(profile)
A way to download such multiple files if opened in different tabs could be to follow these algorithmic steps in your own coding language :
for( all such links) :
click() the pdf link
findElement the download element
click() the download link
close the tab
switch back to last tab //should ideally be completed with previous step
We developed a Chrome extension, and I want to test our extension with Selenium. I created a test, but the problem is that our extension opens a new tab when it's installed, and I think I get an exception from the other tab. Is it possible to switch to the active tab I'm testing? Or another option is to start with the extension disabled, then login to our website and only then enable the extension. Is it possible? Here is my code:
def login_to_webapp(self):
self.driver.get(url='http://example.com/logout')
self.driver.maximize_window()
self.assertEqual(first="Web Editor", second=self.driver.title)
action = webdriver.ActionChains(driver=self.driver)
action.move_to_element(to_element=self.driver.find_element_by_xpath(xpath="//div[#id='header_floater']/div[#class='header_menu']/button[#class='btn_header signature_menu'][text()='My signature']"))
action.perform()
self.driver.find_element_by_xpath(xpath="//ul[#id='signature_menu_downlist'][#class='menu_downlist']/li[text()='Log In']").click()
self.driver.find_element_by_xpath(xpath="//form[#id='atho-form']/div[#class='input']/input[#name='useremail']").send_keys("[email]")
self.driver.find_element_by_xpath(xpath="//form[#id='atho-form']/div[#class='input']/input[#name='password']").send_keys("[password]")
self.driver.find_element_by_xpath(xpath="//form[#id='atho-form']/button[#type='submit'][#class='atho-button signin_button'][text()='Sign in']").click()
The test fails with ElementNotVisibleException: Message: element not visible, because in the new tab (opened by the extension) "Log In" is not visible (I think the new tab is opened only after the command self.driver.get(url='http://example.com/logout')).
Update: I found out that the exception is not related to the extra tab, it's from our website. But I closed the extra tab with this code, according to #aberna's answer:
def close_last_tab(self):
if (len(self.driver.window_handles) == 2):
self.driver.switch_to.window(window_name=self.driver.window_handles[-1])
self.driver.close()
self.driver.switch_to.window(window_name=self.driver.window_handles[0])
After closing the extra tab, I can see my tab in the video.
This actually worked for me in 3.x:
driver.switch_to.window(driver.window_handles[1])
window handles are appended, so this selects the second tab in the list
to continue with first tab:
driver.switch_to.window(driver.window_handles[0])
Some possible approaches:
1 - Switch between the tabs using the send_keys (CONTROL + TAB)
self.driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)
2 - Switch between the tabs using the using ActionsChains (CONTROL+TAB)
actions = ActionChains(self.driver)
actions.key_down(Keys.CONTROL).key_down(Keys.TAB).key_up(Keys.TAB).key_up(Keys.CONTROL).perform()
3 - Another approach could make usage of the Selenium methods to check current window and move to another one:
You can use
driver.window_handles
to find a list of window handles and after try to switch using the following methods.
- driver.switch_to.active_element
- driver.switch_to.default_content
- driver.switch_to.window
For example, to switch to the last opened tab, you can do:
driver.switch_to.window(driver.window_handles[-1])
The accepted answer didn't work for me.
To open a new tab and have selenium switch to it, I used:
driver.execute_script('''window.open("https://some.site/", "_blank");''')
sleep(1) # you can also try without it, just playing safe
driver.switch_to.window(driver.window_handles[-1]) # last opened tab handle
# driver.switch_to_window(driver.window_handles[-1]) # for older versions
if you need to switch back to the main tab, use:
driver.switch_to.window(driver.window_handles[0])
Summary:
The window_handles contains a list of the handles of opened tabs, use it as argument in switch_to.window() to switch between tabs.
Pressing ctrl+t or choosing window_handles[0] assumes that you only have one tab open when you start.
If you have multiple tabs open then it could become unreliable.
This is what I do:
old_tabs=self.driver.window_handles
#Perform action that opens new window here
new_tabs=self.driver.window_handles
for tab in new_tabs:
if tab in old tabs:
pass
else:
new_tab=tab
driver.switch_to.window(new_tab)
This is something that would positively identify the new tab before switching to it and sets the active window to the desired new tab.
Just telling the browser to send ctrl+tab does not work because it doesn't tell the webdriver to actually switch to the new tab.
if you want to close only active tab and need to keep the browser window open, you can make use of switch_to.window method which has the input parameter as window handle-id. Following example shows how to achieve this automation:
from selenium import webdriver
import time
driver = webdriver.Firefox()
driver.get('https://www.google.com')
driver.execute_script("window.open('');")
time.sleep(5)
driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
time.sleep(5)
driver.close()
time.sleep(5)
driver.switch_to.window(driver.window_handles[0])
driver.get("https://www.yahoo.com")
time.sleep(5)
#driver.close()
The tip from the user "aberna" worked for me the following way:
First I got a list of the tabs:
tab_list = driver.window_handles
Then I selectet the tab:
driver.switch_to.window(test[1])
Going back to previous tab:
driver.switch_to.window(test[0])
TLDR: there is a workaround solution, with some limitations.
I am working with the already opened browser, as shown here. The problem is that every time I launch the script, selenium internally selects a random tab. The official documentation says:
Clicking a link which opens in a new window will focus the new window
or tab on screen, but WebDriver will not know which window the
Operating System considers active.
It sounds very strange to me. Because is not that the first task of selenium to handle and automate browser interaction? More of that, switching to any tab with driver.switch_to.window(...) actually will switch the active tab in gui. Seems that it is a bug. At the moment of writing the python-selenium version is 4.1.0.
Let's look which approaches could we use.
Using selenium window_handles[0] approach
The approach from the answer above is not reliable. It does not always work. For example when you switch between different tabs, chromium/vivaldi may start returning not a current tab.
print("Current driver tab:", driver.title) # <- the random tab title
driver.switch_to.window(chromium_driver.window_handles[0])
print("Current driver tab:", driver.title) # <-- the currently opened tab title. But not always reliable.
So skip this method.
Using remote debugging approach
Provides nothing additional to what is in selenium driver from previous approach.
Getting the list of tabs via the remote debugging protocol like
r = requests.get("http://127.0.0.1:9222/json")
j = r.json()
found_tab = False
for el in j:
if el["type"] == "page": # Do this check, because if that is background-page, it represents one of installed extensions
found_tab = el
break
if not found_tab:
print("Could not find tab", file=sys.stderr)
real_opened_tab_handle = "CDwindow-" + found_tab["id"]
driver.switch_to(real_opened_tab_handle)
actually returns the same as what is in driver.window_handles. So also skip this method.
Workaround solution for X11
from wmctrl import Window
all_x11_windows = Window.list()
chromium_windows = [ el for el in all_x11_windows if el.wm_class == 'chromium.Chromium' ]
if len(chromium_windows) != 1:
print("unexpected numbner of chromium windows")
exit(1)
real_active_tab_name = chromium_windows[0].wm_name.rstrip(" – Chromium")
chrome_options = Options()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
# https://stackoverflow.com/a/70088095/7869636 - Selenium connect to existing browser.
# Need to start chromium as: chromium --remote-debugging-port=9222
driver = webdriver.Chrome(service=Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()), options=chrome_options)
tabs = driver.window_handles
found_active_tab = False
for tab in tabs:
driver.switch_to.window(tab)
if driver.title != real_active_tab_name:
continue
else:
found_active_tab = True
break
if not found_active_tab:
print("Cannot switch to needed tab, something went wrong")
exit(1)
else:
print("Successfully switched to opened tab")
print("Working with tab called:", driver.title)
The idea is to get the window title from wmctrl, which will let you know the active tab name.
Workaround solution for Wayland
Previous solution has a limitation, wmctrl only works with x11 windows.
I currently found out how to get the title of a window at which you click.
print("Please click on the browser window")
opened_tab = subprocess.run("qdbus org.kde.KWin /KWin queryWindowInfo | grep caption", shell=True, capture_output=True).stdout.decode("utf-8")
opened_tab_title = opened_tab.rstrip(" - Vivaldi\n").lstrip("caption: ")
Then the script from the previous solution could be used.
The solution could be improved using kwin window list query on wayland. I would be glad if somebody helps to improve this. Unfortunately, I do not know currently how to get list of wayland windows.
Here is the full script.
Note: Remove the spaces in the two lines for tiny URL below. Stack Overflow does not allow the tiny link in here.
import ahk
import win32clipboard
import traceback
import appJar
import requests
import sys
import urllib
import selenium
import getpass
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import socket
import time
import urllib.request
from ahk import AHK, Hotkey, ActionChain # You want to play with AHK.
from appJar import gui
try:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU64.exe")
except:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU32.exe")
finally:
pass
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--start-maximized')
chrome_options.add_experimental_option("excludeSwitches", ['enable-automation']);
chromeDriver = webdriver.Chrome('C:\\new_software\\chromedriver.exe', chrome_options = chrome_options)
def ahk_disabledevmodescript():
try:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU64.exe")
except:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU32.exe")
finally:
pass
ahk_disabledevmodescriptt= [
str('WinActivate,ahk_exe chrome.exe'),
str('Send {esc}'),
]
#Run-Script
for snipet in ahk_disabledevmodescriptt:
ahk.run_script(snipet, blocking=True )
return
def launchtabsagain():
chromeDriver.execute_script("window.open('https://developers.google.com/edu/python/introduction', 'tab2');")
chromeDriver.execute_script("window.open('https://www.facebook.com/', 'tab3');")
chromeDriver.execute_script("window.open('https://developer.mozilla.org/en-US/docs/Web/API/Window/open', 'tab4');")
chromeDriver.execute_script("window.open('https://www.easyespanol.org/', 'tab5');")
chromeDriver.execute_script("window.open('https://www.google.com/search?source=hp&ei=EPO2Xf3EMLPc9AO07b2gAw&q=programming+is+not+difficult&oq=programming+is+not+difficult&gs_l=psy-ab.3..0i22i30.3497.22282..22555...9.0..0.219.3981.21j16j1......0....1..gws-wiz.....6..0i362i308i154i357j0j0i131j0i10j33i22i29i30..10001%3A0%2C154.h1w5MmbFx7c&ved=0ahUKEwj9jIyzjb_lAhUzLn0KHbR2DzQQ4dUDCAg&uact=5', 'tab6');")
chromeDriver.execute_script("window.open('https://www.google.com/search?source=hp&ei=NvO2XdCrIMHg9APduYzQDA&q=dinner+recipes&oq=&gs_l=psy-ab.1.0.0i362i308i154i357l6.0.0..3736...0.0..0.179.179.0j1......0......gws-wiz.....6....10001%3A0%2C154.gsoCDxw8cyU', 'tab7');")
return
chromeDriver.get('https://ebc.cybersource.com/ebc2/')
compoanionWindow = ahk.active_window
launchtabs = launchtabsagain()
disabledevexetmessage = ahk_disabledevmodescript()
def copyUrl():
try:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU64.exe")
except:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU32.exe")
finally:
pass
snipet = str('WinActivate,ahk_exe chrome.exe')
ahk.run_script(snipet, blocking=True )
compoanionWindow.activate()
ahk_TinyChromeCopyURLScript=[
str('WinActivate,ahk_exe chrome.exe'),
str('send ^l'),
str('sleep 10'),
str('send ^c'),
str('BlockInput, MouseMoveoff'),
str('clipwait'),
]
#Run-AHK Script
if ahk:
for snipet in ahk_TinyChromeCopyURLScript:
ahk.run_script(snipet, blocking=True )
win32clipboard.OpenClipboard()
urlToShorten = win32clipboard.GetClipboardData()
win32clipboard.CloseClipboard()
return(urlToShorten)
def tiny_url(url):
try:
apiurl = "https: // tinyurl. com / api - create. php? url= " #remove spaces here
tinyp = requests.Session()
tinyp.proxies = {"https" : "https://USER:PASSWORD." + "#userproxy.visa.com:443", "http" : "http://USER:PASSWORD." + "#userproxy.visa.com:8080"}
tinyUrl = tinyp.get(apiurl+url).text
returnedresponse = tinyp.get(apiurl+url)
if returnedresponse.status_code == 200:
print('Success! response code =' + str(returnedresponse))
else:
print('Code returned = ' + str(returnedresponse))
print('From IP Address =' +IPadd)
except:
apiurl = "https: // tinyurl. com / api - create. php? url= " #remove spaces here
tinyp = requests.Session()
tinyUrl = tinyp.get(apiurl+url).text
returnedresponse = tinyp.get(apiurl+url)
if returnedresponse.status_code == 200:
print('Success! response code =' + str(returnedresponse))
print('From IP Address =' +IPadd)
else:
print('Code returned = ' + str(returnedresponse))
return tinyUrl
def tinyUrlButton():
longUrl = copyUrl()
try:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU64.exe")
except:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU32.exe")
finally:
pass
try:
shortUrl = tiny_url(longUrl)
win32clipboard.OpenClipboard()
win32clipboard.EmptyClipboard()
win32clipboard.SetClipboardText(shortUrl)
win32clipboard.CloseClipboard()
if ahk:
try:
if str(shortUrl) == 'Error':
ahk.run_script("Msgbox,262144 ,Done.,"+ shortUrl + "`rPlease make sure there is a link to copy and that the page is fully loaded., 5.5" )
else:
ahk.run_script("Msgbox,262144 ,Done.,"+ shortUrl + " is in your clipboard., 1.5" )
# ahk.run_script("WinActivate, tinyUrl" )
except:
traceback.print_exc()
print('error during ahk script')
pass
except:
print('Error getting tinyURl')
traceback.print_exc()
def closeChromeTabs():
try:
try:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU64.exe")
except:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU32.exe")
finally:
pass
compoanionWindow.activate()
ahk_CloseChromeOtherTabsScript = [
str('WinActivate,ahk_exe chrome.exe'),
str('Mouseclick, Right, 30, 25,,1'),
str('Send {UP 3} {enter}'),
str('BlockInput, MouseMoveOff'),
]
#Run-Script
if ahk:
for snipet in ahk_CloseChromeOtherTabsScript:
ahk.run_script(snipet, blocking=True )
return(True)
except:
traceback.print_exc()
print("Failed to run closeTabs function.")
ahk.run_script('Msgbox,262144,,Failed to run closeTabs function.,2')
return(False)
# create a GUI and testing this library.
window = gui("tinyUrl and close Tabs test ", "200x160")
window.setFont(9)
window.setBg("blue")
window.removeToolbar(hide=True)
window.addLabel("description", "Testing AHK Library.")
window.addLabel("title", "tinyURL")
window.setLabelBg("title", "blue")
window.setLabelFg("title", "white")
window.addButtons(["T"], tinyUrlButton)
window.addLabel("title1", "Close tabs")
window.setLabelBg("title1", "blue")
window.setLabelFg("title1", "white")
window.addButtons(["C"], closeChromeTabs)
window.addLabel("title2", "Launch tabs")
window.setLabelBg("title2", "blue")
window.setLabelFg("title2", "white")
window.addButtons(["L"], launchtabsagain)
window.go()
if window.exitFullscreen():
chromeDriver.quit()
def closeTabs():
try:
try:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU64.exe")
except:
ahk = AHK(executable_path="C:\\Program Files\\AutoHotkey\\AutoHotkeyU32.exe")
finally:
pass
compoanionWindow.activate()
ahk_CloseChromeOtherTabsScript = [
str('WinActivate,ahk_exe chrome.exe'),
str('Mouseclick, Right, 30, 25,,1'),
str('Send {UP 3} {enter}'),
str('BlockInput, MouseMoveOff'),
]
#Run-Script
if ahk:
for snipet in ahk_CloseChromeOtherTabsScript:
ahk.run_script(snipet, blocking=True )
return(True)
except:
traceback.print_exc()
print("Failed to run closeTabs function.")
ahk.run_script('Msgbox,262144,Failed,Failed to run closeTabs function.,2')
return(False)
Found a way using ahk library. Very easy for us non-programmers that need to solve this problem. used Python 3.7.3
Install ahk with. pip install ahk
import ahk
from ahk import AHK
import selenium
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ['enable-automation']); #to disable infobar about chrome controlled by automation.
chrome_options.add_argument('--start-maximized')
chromeDriver = webdriver.Chrome('C:\\new_software\\chromedriver.exe', chrome_options = options) #specify your chromedriver location
chromeDriver.get('https://www.autohotkey.com/')#launch a tab
#launch some other random tabs for testing.
chromeDriver.execute_script("window.open('https://developers.google.com/edu/python/introduction', 'tab2');")
chromeDriver.execute_script("window.open('https://www.facebook.com/', 'tab3');")
chromeDriver.execute_script("window.open('https://developer.mozilla.org/en-US/docs/Web/API/Window/open', 'tab4');"`)
seleniumwindow = ahk.active_window #as soon as you open you Selenium session, get a handle of the window frame with AHK.
seleniumwindow.activate() #will activate whatever tab you have active in the Selenium browser as AHK is activating the window frame
#To activate specific tabs I would use chromeDriver.switchTo()
#chromeDriver.switch_to_window(chromeDriver.window_handles[-1]) This takes you to the last opened tab in Selenium and chromeDriver.switch_to_window(chromeDriver.window_handles[1])to the second tab, etc..
I am working on python and selenium. I want to download file from clicking event using selenium. I wrote following code.
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
browser = webdriver.Firefox()
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.close()
I want to download both files from links with name "Export Data" from given url. How can I achieve it as it works with click event only?
Find the link using find_element(s)_by_*, then call click method.
from selenium import webdriver
# To prevent download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference('browser.download.folderList', 2) # custom location
profile.set_preference('browser.download.manager.showWhenStarting', False)
profile.set_preference('browser.download.dir', '/tmp')
profile.set_preference('browser.helperApps.neverAsk.saveToDisk', 'text/csv')
browser = webdriver.Firefox(profile)
browser.get("http://www.drugcite.com/?q=ACTIMMUNE")
browser.find_element_by_id('exportpt').click()
browser.find_element_by_id('exporthlgt').click()
Added profile manipulation code to prevent download dialog.
I'll admit this solution is a little more "hacky" than the Firefox Profile saveToDisk alternative, but it works across both Chrome and Firefox, and doesn't rely on a browser-specific feature which could change at any time. And if nothing else, maybe this will give someone a little different perspective on how to solve future challenges.
Prerequisites: Ensure you have selenium and pyvirtualdisplay installed...
Python 2: sudo pip install selenium pyvirtualdisplay
Python 3: sudo pip3 install selenium pyvirtualdisplay
The Magic
import pyvirtualdisplay
import selenium
import selenium.webdriver
import time
import base64
import json
root_url = 'https://www.google.com'
download_url = 'https://www.google.com/images/branding/googlelogo/2x/googlelogo_color_272x92dp.png'
print('Opening virtual display')
display = pyvirtualdisplay.Display(visible=0, size=(1280, 1024,))
display.start()
print('\tDone')
print('Opening web browser')
driver = selenium.webdriver.Firefox()
#driver = selenium.webdriver.Chrome() # Alternately, give Chrome a try
print('\tDone')
print('Retrieving initial web page')
driver.get(root_url)
print('\tDone')
print('Injecting retrieval code into web page')
driver.execute_script("""
window.file_contents = null;
var xhr = new XMLHttpRequest();
xhr.responseType = 'blob';
xhr.onload = function() {
var reader = new FileReader();
reader.onloadend = function() {
window.file_contents = reader.result;
};
reader.readAsDataURL(xhr.response);
};
xhr.open('GET', %(download_url)s);
xhr.send();
""".replace('\r\n', ' ').replace('\r', ' ').replace('\n', ' ') % {
'download_url': json.dumps(download_url),
})
print('Looping until file is retrieved')
downloaded_file = None
while downloaded_file is None:
# Returns the file retrieved base64 encoded (perfect for downloading binary)
downloaded_file = driver.execute_script('return (window.file_contents !== null ? window.file_contents.split(\',\')[1] : null);')
print(downloaded_file)
if not downloaded_file:
print('\tNot downloaded, waiting...')
time.sleep(0.5)
print('\tDone')
print('Writing file to disk')
fp = open('google-logo.png', 'wb')
fp.write(base64.b64decode(downloaded_file))
fp.close()
print('\tDone')
driver.close() # close web browser, or it'll persist after python exits.
display.popen.kill() # close virtual display, or it'll persist after python exits.
Explaination
We first load a URL on the domain we're targeting a file download from. This allows us to perform an AJAX request on that domain, without running into cross site scripting issues.
Next, we're injecting some javascript into the DOM which fires off an AJAX request. Once the AJAX request returns a response, we take the response and load it into a FileReader object. From there we can extract the base64 encoded content of the file by calling readAsDataUrl(). We're then taking the base64 encoded content and appending it to window, a gobally accessible variable.
Finally, because the AJAX request is asynchronous, we enter a Python while loop waiting for the content to be appended to the window. Once it's appended, we decode the base64 content retrieved from the window and save it to a file.
This solution should work across all modern browsers supported by Selenium, and works whether text or binary, and across all mime types.
Alternate Approach
While I haven't tested this, Selenium does afford you the ability to wait until an element is present in the DOM. Rather than looping until a globally accessible variable is populated, you could create an element with a particular ID in the DOM and use the binding of that element as the trigger to retrieve the downloaded file.
In chrome what I do is downloading the files by clicking on the links, then I open chrome://downloads page and then retrieve the downloaded files list from shadow DOM like this:
docs = document
.querySelector('downloads-manager')
.shadowRoot.querySelector('#downloads-list')
.getElementsByTagName('downloads-item')
This solution is restrained to chrome, the data also contains information like file path and download date. (note this code is from JS, may not be the correct python syntax)
Here is the full working code. You can use web scraping to enter the username password and other field. For getting the field names appearing on the webpage, use inspect element. Element name(Username,Password or Click Button) can be entered through class or name.
from selenium import webdriver
# Using Chrome to access web
options = webdriver.ChromeOptions()
options.add_argument("download.default_directory=C:/Test") # Set the download Path
driver = webdriver.Chrome(options=options)
# Open the website
try:
driver.get('xxxx') # Your Website Address
password_box = driver.find_element_by_name('password')
password_box.send_keys('xxxx') #Password
download_button = driver.find_element_by_class_name('link_w_pass')
download_button.click()
driver.quit()
except:
driver.quit()
print("Faulty URL")