Blocking downloads on Chrome using Selenium with Python

I made a simple bot in Python using Selenium that searches for certain content on a website and returns a specific result to the user. However, that website has an ad that downloads a file every time I click a page element or press ENTER to advance (or the bot does, in this case). I only found out about this after running a few tests while improving the bot. I tried it manually and the same thing happened, so it's a problem with the website itself.
But I'm guessing there's a way to completely block the downloads, since it saves that file automatically. I don't think it makes much difference, but this is what triggers the download:
driver.find_element(By.ID, "hero-search").send_keys(Keys.ENTER)
And I can't go around that because I need to advance to the next page. So, is there a way to block this on selenium?

You can block the downloads by using the Chrome preferences (note the dict needs colons, and Python's booleans are capitalized):
from selenium import webdriver

options = webdriver.ChromeOptions()
prefs = {
    "download.prompt_for_download": False,
    "download_restrictions": 3,
}
options.add_experimental_option("prefs", prefs)

driver = webdriver.Chrome(options=options)
driver.get("https://www.an_url.com")
driver.close()
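For reference, download_restrictions takes an integer. As I understand Chrome's DownloadRestrictions enterprise policy (this mapping is from the policy docs, not the Selenium docs, so treat it as an assumption), 3 means "block all downloads". A minimal sketch, with a helper name of my own invention:

```python
# Assumed meaning of Chrome's DownloadRestrictions policy values
# (taken from the enterprise policy documentation, not verified via Selenium):
DOWNLOAD_RESTRICTIONS = {
    0: "no special restrictions",
    1: "block dangerous downloads",
    2: "block potentially dangerous downloads",
    3: "block all downloads",
}

def download_blocking_prefs(level=3):
    """Build the prefs dict to pass to options.add_experimental_option("prefs", ...)."""
    return {
        "download.prompt_for_download": False,
        "download_restrictions": level,
    }

print(download_blocking_prefs())
```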

Related

Navigating to a web page and downloading a report using Python

Could you please let me know how to improve the following script to actually click on the export button.
The following script goes to the report's page but does not click on the export button:
from selenium import webdriver
options = webdriver.ChromeOptions()
options.add_argument("<Path to Chrome profile>") #Path to your chrome profile
url = '<URL of the report>'
driver = webdriver.Chrome(executable_path="C:/tools/selenium/chromedriver.exe", chrome_options=options)
driver.get(url)
exportButton = driver.find_element_by_xpath('//*[@id="js_2o"]')
clickexport = exportButton.click()
How would you make the script actually click on the export button?
I would appreciate your help.
Thank you!
Try with an XPath, for example:
driver.find_element_by_xpath('//button[@id="export_button"]').click()
Selenium isn't designed for this. Do you actually care about using Selenium and the browser, or do you just want the file? If the latter, use requests. You can use the browser's network inspector and right-click → "Copy as cURL" to get all the headers and cookies you need.
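As a sketch of that workflow: a "Copy as cURL" command carries the headers as -H 'Name: value' pairs, which map directly onto a requests headers dict. The parser below is illustrative only (it handles the common -H/--header flags, not the full cURL grammar), and the URL/cookie values are made up:

```python
import shlex

def headers_from_curl(curl_command):
    """Extract -H/--header flags from a 'Copy as cURL' command into a dict
    suitable for requests.get(url, headers=...)."""
    tokens = shlex.split(curl_command)
    headers = {}
    for i, tok in enumerate(tokens):
        if tok in ("-H", "--header") and i + 1 < len(tokens):
            name, _, value = tokens[i + 1].partition(":")
            headers[name.strip()] = value.strip()
    return headers

cmd = "curl 'https://example.com/report' -H 'Cookie: session=abc' -H 'User-Agent: Mozilla/5.0'"
print(headers_from_curl(cmd))
# -> {'Cookie': 'session=abc', 'User-Agent': 'Mozilla/5.0'}
```

You would then replay the request with requests.get(url, headers=headers_from_curl(cmd)).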

Taking Web screenshot using imgkit

I was trying to take screenshots using imgkit as follows,
options = {
'width': 1000,
'height': 1000
}
imgkit.from_url('https://www.ozbargain.com.au/', 'out1.jpg', options=options)
What I am getting is different from how the page actually looks. Possibly this is because JavaScript is not being executed (it's a guess). Could you please tell me a way to do this with imgkit? Any suggested library would be helpful too.
You could use Selenium to control a web browser (Chrome or Firefox) that can run JavaScript, and the browser has a function to take a screenshot. But JavaScript may display windows with messages that you have to close using click() in code; you would have to find manually (in the browser's DevTools) the class name, id, or other value that lets Selenium recognize the button on the page.
from selenium import webdriver
from time import sleep
#driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get('https://www.ozbargain.com.au/')
driver.set_window_size(1000, 1000)
sleep(2)
# close first message
driver.find_element_by_class_name('qc-cmp-button').click()
sleep(1)
# close second message with details
driver.find_element_by_class_name('qc-cmp-button.qc-cmp-save-and-exit').click()
sleep(1)
driver.get_screenshot_as_file("screenshot.png")
#driver.quit()
Eventually, you could use PyAutoGUI or mss to take a screenshot of the full desktop or of some region of the desktop.
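Alternatively, staying with imgkit: wkhtmltoimage (the tool imgkit wraps) accepts a javascript-delay option, in milliseconds, to wait before rendering. Whether it fixes this particular page is untested here, so take this as a sketch rather than a confirmed fix:

```python
# Hedged sketch: give wkhtmltoimage time to run the page's JavaScript
# before capturing. The 3000 ms value is an arbitrary example.
options = {
    'width': 1000,
    'height': 1000,
    'javascript-delay': 3000,  # wait 3 s for scripts before rendering
}
# imgkit.from_url('https://www.ozbargain.com.au/', 'out1.jpg', options=options)
print(options)
```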

How to use Selenium to click a button in a popup modal box

I am trying to use Selenium in Python to pull some data from https://www.seekingalpha.com. The front page has a "Sign-in/Join now" link. I used Selenium to click it, which brought up a popup asking for sign-in information with another "Sign in" button. It seems my code below can enter my username and password, but my attempt to click the "Sign in" button didn't get the right response (it clicked on the ad below the popup box).
I am using Python 3.5.
Here is my code:
driver = webdriver.Chrome()
url = "https://seekingalpha.com"
driver.get(url)
sleep(5)
driver.find_element_by_xpath('//*[@id="sign-in"]').click()
sleep(5)
driver.find_element_by_xpath('//*[@id="authentication_login_email"]').send_keys("xxxx@gmail.com")
driver.find_element_by_xpath('//*[@id="authentication_login_password"]').send_keys("xxxxxxxxx")
driver.find_element_by_xpath('//*[@id="log-btn"]').click()
Any advice/suggestion is greatly appreciated.
EDIT: previous 'answer' was wrong so I have updated it.
Got you man, this is what you need to do:
1.) grab the latest firefox
2.) grab the latest geckodriver
3.) use a firefox driver
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

driver = webdriver.Firefox(executable_path=r'd:\Python_projects\geckodriver.exe')
url = "https://seekingalpha.com"
driver.get(url)
sign_in = driver.find_element_by_xpath('//*[@id="sign-in"]')
driver.execute_script('arguments[0].click()', sign_in)
time.sleep(1)
email = driver.find_element_by_xpath('//*[@id="authentication_login_email"]')
email.send_keys("xxxx@gmail.com")
pw = driver.find_element_by_xpath('//*[@id="authentication_login_password"]')
pw.send_keys("xxxxxxxxx")
pw.send_keys(Keys.ENTER)
Explanation:
It is easy to detect whether Selenium is being used when the browser itself exposes that information (and it seems this page does not want to be scraped):
The webdriver read-only property of the navigator interface indicates whether the user agent is controlled by automation.
I looked for an answer on how to bypass detection and found this article.
Your best bet for avoiding detection when using Selenium would require you to use one of the latest builds of Firefox, which don't appear to give off any obvious sign that you are using Selenium.
I gave it a shot; after launch, the correct page design loaded and the login attempt gave the same result as a manual attempt.
Also, with a bit more searching, I found that if you modify your chromedriver, you have a chance to bypass detection even with chromedriver.
Learned something new today too. \o/
An additional idea:
I have made a little experiment using embedded Chromium (CEF). If you open a Chrome window via Selenium, open the console, and check navigator.webdriver, the result will be True. If you open a CEF window and then remote-debug it, however, the flag will be False. I did not check edge cases, but non-edge-case scenarios should be fine with CEF.
So what you may want to check out in the future:
1.) in command line: pip install cefpython3
2.) git clone https://github.com/cztomczak/cefpython.git
3.) open your CEF project and find hello.py in the examples
4.) update the startup to cef.Initialize(settings={"remote_debugging_port":9222})
5.) run hello.py
(this was the initial, one time setup, you may customize it in the future, but the main thing is done, you have a browser with a debug port open)
6.) modify chrome startup to:
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.debugger_address = "127.0.0.1:9222"
driver = webdriver.Chrome(chrome_options=chrome_options, executable_path=chrome_driver_executable)
7.) now you have a driver without the 'automated' signature in the browser. There may be some drawbacks, like:
CEF is not on the very latest Chromium; right now the latest released Chrome is v76 while CEF is on v66.
Also, "some stuff" may not work; for example, window.Notification is not a thing in CEF.
I tried the code you provided and it works fine. I added Selenium waits just to check other options, and those also worked well; I changed two lines to use them instead of the sleeps:
driver.get(url)
wait = WebDriverWait(driver, 10)
signin = wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='sign-in']")))
#sleep(5)
signin.click()
#driver.find_element_by_xpath('//*[@id="sign-in"]').click()
#sleep(5)
wait.until(EC.element_to_be_clickable((By.XPATH, "//*[@id='authentication_login_email']")))
driver.find_element_by_xpath('//*[@id="authentication_login_email"]').send_keys("xxxx@gmail.com")
And it does click the Sign in button. What I found is that there is captcha handling on the site; the browser console after clicking the sign-in button tells the story. I went ahead and added a user agent to your script, but that did not work either. Notice the blockscript parameter in the response of the login API and the errors in the browser console. However, there is no captcha on the UI.
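For readers unfamiliar with explicit waits: WebDriverWait(driver, 10).until(condition) simply polls the condition until it returns something truthy or the timeout expires. A pure-Python sketch of that behavior (the names here are illustrative, not Selenium's internals):

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds
    pass; mirrors the semantics of WebDriverWait.until()."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() > deadline:
            raise TimeoutError("condition was not met in time")
        time.sleep(poll_interval)

# Example: the "element" becomes available on the third poll.
attempts = []
def fake_element_ready():
    attempts.append(1)
    return "element" if len(attempts) >= 3 else None

print(wait_until(fake_element_ready, timeout=5, poll_interval=0.01))  # prints "element"
```

This is why the explicit-wait version above is more robust than fixed sleep(5) calls: it returns as soon as the element is ready and fails loudly if it never is.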

logging into a website and downloading a file

I would like to log into a website and download a file. I'm using Selenium and ChromeDriver, and would like to know if there is a better way. It currently opens up a Chrome browser window and sends the info; I don't want to see the browser window opened up and the data being sent. I just want to send it and get the data back into a variable.
from selenium import webdriver

driver = webdriver.Chrome()

def site_login(URL, ID_username, ID_password, ID_submit, name, pas):
    driver.get(URL)
    driver.find_element_by_id(ID_username).send_keys(name)
    driver.find_element_by_id(ID_password).send_keys(pas)
    driver.find_element_by_id(ID_submit).click()

URL = "www.mywebsite.com/login"
ID_username = "name"
ID_password = "password"
ID_submit = "submit"
name = "myemail@mail.com"
pas = "mypassword"
resp = site_login(URL, ID_username, ID_password, ID_submit, name, pas)
You can run Chrome in headless mode, in which case the Chrome UI won't show up while it still performs the task. Some reading on this: https://intoli.com/blog/running-selenium-with-headless-chrome/. Hope this helps.
First option: if you are able to change the driver, you can use PhantomJS as the driver. It is a headless browser and you can use it with Selenium.
Second option: if the site is not dynamic (i.e., not an SPA), or you are able to trace the requests (which can be done in Chrome's dev tools), you can use requests directly, with the help of BeautifulSoup if you need to extract some data from the page.
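Tracing the login request usually means replaying the form fields, including hidden ones such as CSRF tokens. As an illustrative, stdlib-only sketch (the form HTML and field names below are made up):

```python
from html.parser import HTMLParser

class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of <input> elements from a login page,
    so they can be replayed in a requests.post() payload."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag == "input":
            attrs = dict(attrs)
            if "name" in attrs:
                self.fields[attrs["name"]] = attrs.get("value") or ""

html = """
<form action="/login" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="email">
  <input type="password" name="password">
</form>
"""
parser = FormFieldCollector()
parser.feed(html)
print(parser.fields)  # {'csrf_token': 'abc123', 'email': '', 'password': ''}
```

You would fill in the email/password keys and POST the dict to the form's action URL with requests, carrying the session cookie.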
Just add these two lines (with the corresponding import) and pass the options to the driver:
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
This should make Chrome run in the background.

Facebook account download automation with python?

As a regular task within my job role, I often have to download full Facebook accounts from within a user's account. I am trying to improve my workflow and automate this if I can.
I have tried searching for this topic on the site, and although many questions cover the login part, I am yet to locate one that deals with Facebook's popup windows. If I am wrong, I apologise; please amend the post accordingly.
As a starting point I have decided to start learning Python, and am using it to script the process with a little help from Selenium and ChromeDriver. I have managed to write the code to log in, navigate to the correct page, and click the initial 'Download a copy' link. I am struggling, however, to get the script to locate and click the 'Start My Archive' button within the popup window.
Here is the code that I have used so far, including the several alternative code blocks that I have tried, commented out at the bottom:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
#Global Variables
target = "john.doe.7"
username = "john#doe.com"
password = "Password"
sleep = 5
#Finds and locates the ChromeDriver
driver = webdriver.Chrome("C:\Python35-32\ChromeDriver\chromedriver.exe")
#Set Chrome Options
chrome_options = webdriver.ChromeOptions()
prefs = {"profile.default_content_setting_values.notifications": 2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome(chrome_options=chrome_options)
#Directs browser to webpage
driver.get('http://www.facebook.com');
#code block to log in user
def logmein():
    search_box = driver.find_element_by_name('email')
    search_box.send_keys(username)
    search_box = driver.find_element_by_name('pass')
    search_box.send_keys(password)
    search_box.submit()

#code to go to specific URL
def GoToURL(URL):
    time.sleep(sleep) # Let the user actually see something!
    driver.get(URL)
logmein()
GoToURL('https://www.facebook.com/'+'settings')
link = driver.find_element_by_link_text('Download a copy')
link.click()
#driver.find_element_by.xpath("//button[contains(., 'Start My Archive')]").click()
#driver.find_element_by_css_selector('button._42ft._42fu.selected._42gz._42gy').click()
#driver.find_element_by_xpath("//*[contains(text(), 'Start My Archive')]").click()
#driver.find_element_by_css_selector('button._42ft._42fu.layerConfirm.uiOverlayButton.selected._42g-._42gy').click()
#from selenium.webdriver.common.action_chains.ActionChains import driver
#buttons = driver.find_elements_by_xpath("//title[contains(text(),'Start My Archive')]")
#actions = ActionChains(self.driver)
#time.sleep(2)
#actions.click(button)
#actions.perform()
Just add this to your code and run it to see whether it works. Always use a CSS selector.
Look, I ran the code below in Eclipse with Java, and I'm not familiar with Python, so if something is wrong in the syntax, I apologise. Just try this and see:
driver.implicitly_wait(10)
Startmyarchive = driver.find_element_by_css_selector("._42ft._42fu.selected._42gz._42gy")
Startmyarchive.click()
driver.implicitly_wait(10)
Acknowledge = driver.find_element_by_css_selector("._42ft._42fu.layerConfirm.uiOverlayButton.selected._42g-._42gy")
Acknowledge.click()
driver.implicitly_wait(10)
ClickOkay = driver.find_element_by_css_selector("._42ft._42fu.layerCancel.uiOverlayButton.selected._42g-._42gy")
ClickOkay.click()
Happy Learning :-) Do reply back for any query.
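The selectors in this answer are just the button's classes joined with dots. A quick sketch of how an element's class attribute maps to such a compound CSS selector (the helper name is mine, and the class strings are the ones from the answer above):

```python
def css_from_classes(class_attr):
    """Turn an element's class attribute (space-separated classes)
    into a compound CSS class selector."""
    return "." + ".".join(class_attr.split())

# The 'Start My Archive' button's classes, as copied from DevTools:
print(css_from_classes("_42ft _42fu selected _42gz _42gy"))
# -> ._42ft._42fu.selected._42gz._42gy
```

Note that such auto-generated class names tend to change whenever Facebook ships a redesign, so text-based XPath (e.g. matching 'Start My Archive') can be more stable.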
