How to change proxy in selenium firefox webdriver while running? [duplicate] - python

Is there any way to dynamically change the proxy being used by Firefox when using selenium webdriver?
Currently I have proxy support using a proxy profile but is there a way to change the proxy when the browser is alive and running?
My current code:
proxy = Proxy({
'proxyType': 'MANUAL',
'httpProxy': proxy_ip,
'ftpProxy': proxy_ip,
'sslProxy': proxy_ip,
'noProxy': '' # set this value as desired
})
browser = webdriver.Firefox(proxy=proxy)
Thanks in advance.

This is a slightly old question, but it is actually possible to change the proxy dynamically in a "hacky" way.
I am going to use Selenium JS with Firefox, but you can follow along in the language of your choice.
Step 1: Visiting "about:config"
driver.get("about:config");
Step 2: Run a script that changes the proxy
var setupScript = `var prefs = Components.classes["@mozilla.org/preferences-service;1"]
  .getService(Components.interfaces.nsIPrefBranch);
prefs.setIntPref("network.proxy.type", 1);
prefs.setCharPref("network.proxy.http", "${proxyUsed.host}");
prefs.setIntPref("network.proxy.http_port", ${proxyUsed.port});
prefs.setCharPref("network.proxy.ssl", "${proxyUsed.host}");
prefs.setIntPref("network.proxy.ssl_port", ${proxyUsed.port});
prefs.setCharPref("network.proxy.ftp", "${proxyUsed.host}");
prefs.setIntPref("network.proxy.ftp_port", ${proxyUsed.port});
`;
// run the script
driver.executeScript(setupScript);
// sleep for 1 second
driver.sleep(1000);
Wherever ${abcd} appears is where your own variables go. The example above uses ES6 template literals for interpolation; use whatever string concatenation your language offers.
Step 3: Visit your site
driver.get("http://whatismyip.com");
Explanation: the above code takes advantage of Firefox's API to change the preferences using JavaScript code.
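For Python users, the script from Step 2 could be generated with a small helper like the one below. This is only a sketch: build_proxy_script is a made-up name, the host and port are placeholder values, and it assumes the script is executed while about:config is loaded, as in the steps above.

```python
import json


def build_proxy_script(host: str, port: int) -> str:
    """Return JavaScript that points Firefox's manual proxy prefs at host:port.

    Assumption: the script runs while the about:config page is loaded,
    so that the privileged Components API is reachable.
    """
    host_js = json.dumps(host)  # quotes and escapes the host as a JS string
    port_js = int(port)         # setIntPref is given a number, not a string
    lines = [
        'var prefs = Components.classes["@mozilla.org/preferences-service;1"]'
        ".getService(Components.interfaces.nsIPrefBranch);",
        'prefs.setIntPref("network.proxy.type", 1);',
    ]
    for scheme in ("http", "ssl", "ftp"):
        lines.append(f'prefs.setCharPref("network.proxy.{scheme}", {host_js});')
        lines.append(f'prefs.setIntPref("network.proxy.{scheme}_port", {port_js});')
    return "\n".join(lines)


script = build_proxy_script("10.0.0.5", 8080)
# With a live driver one would then run, for example:
#   driver.get("about:config")
#   driver.execute_script(script)
```

Calling the helper again with another host and port gives a new script to execute whenever the proxy should change.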

As far as I know there are only two ways to change the proxy setting: one via a profile (which you are using) and the other via the driver's capabilities when you instantiate it. Sadly, neither method does what you want, because both take effect before or while the driver is created.
I have to ask: why do you want to change your proxy settings? The only solution I can easily think of is to point Firefox at a proxy that you can reconfigure at runtime. I am not sure, but that might be possible with browsermob-proxy.

One possible solution is to close the webdriver instance and create it again after each operation, passing a new configuration in the browser profile.
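A rough sketch of that restart-per-operation idea is below. run_with_fresh_driver, make_driver and task are all hypothetical names: make_driver stands for whatever code builds a driver from a proxy address (for example the Proxy/profile code from the question), and task is the work to do with it.

```python
from typing import Callable, Iterable, List


def run_with_fresh_driver(
    proxies: Iterable[str],
    make_driver: Callable[[str], object],
    task: Callable[[object], object],
) -> List[object]:
    """Run task once per proxy, each time in a newly created driver.

    make_driver is a hypothetical factory: given "host:port" it should
    return a WebDriver configured to use that proxy.
    """
    results = []
    for proxy in proxies:
        driver = make_driver(proxy)
        try:
            results.append(task(driver))
        finally:
            driver.quit()  # always release the browser before the next proxy
    return results
```

The obvious cost is a full browser start for every proxy change, which is why the other answers look for ways to avoid it.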

Give selenium-wire a try. It can even override header fields.
from seleniumwire import webdriver
options = {
    'proxy': {
        "http": "http://" + IP_PORT,
        "https": "http://" + IP_PORT,
        'custom_authorization': AUTH
    },
    'connection_keep_alive': True,
    'connection_timeout': 30,
    'verify_ssl': False
}
# Create a new instance of the Firefox driver
driver = webdriver.Firefox(seleniumwire_options=options)
driver.header_overrides = {
    'Proxy-Authorization': AUTH
}
# Open a page that shows the IP address in use
driver.get("http://whatismyip.com")
driver.close()
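As for switching while the browser is running: the selenium-wire README describes assigning a new dict to driver.proxy on an existing driver. A minimal sketch, assuming a selenium-wire version that supports this (switch_proxy is just an illustrative helper name, and plain selenium drivers have no such attribute):

```python
def switch_proxy(driver, ip_port: str) -> dict:
    """Point an existing selenium-wire driver at a different upstream proxy.

    Assumption: driver comes from seleniumwire.webdriver and accepts
    assignment to driver.proxy.
    """
    proxy = {
        "http": "http://" + ip_port,
        "https": "http://" + ip_port,
    }
    driver.proxy = proxy  # later requests should go through the new proxy
    return proxy
```

Usage would be something like switch_proxy(driver, NEW_IP_PORT) followed by the next driver.get(...), where NEW_IP_PORT is a placeholder for your next proxy address.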

Related

How to login to website which is detecting bot usage using Selenium [duplicate]

I am running the Chrome driver through Selenium on an Ubuntu server behind a residential proxy network, yet my Selenium session is being detected. Is there a way to make the Chrome driver and Selenium 100% undetectable?
I have been trying for so long I lost track of the many things I have done including:
Trying different versions of Chrome
Adding several flags and removing some words from the Chrome driver file.
Running it behind a proxy (residential ones also) using incognito mode.
Loading profiles.
Random mouse movements.
Randomising everything.
I am looking for a version of Selenium that is truly 100% undetectable, if such a thing exists, or another automation approach that bot trackers cannot detect.
This is part of the starting of the browser:
sx = random.randint(1000, 1500)
sn = random.randint(3000, 4500)
display = Display(visible=0, size=(sx,sn))
display.start()
randagent = random.randint(0,len(useragents_desktop)-1)
uag = useragents_desktop[randagent]
#this is to prevent ip leaking
preferences = {
    "webrtc.ip_handling_policy": "disable_non_proxied_udp",
    "webrtc.multiple_routes_enabled": False,
    "webrtc.nonproxied_udp_enabled": False
}
chrome_options.add_experimental_option("prefs", preferences)
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-impl-side-painting")
chrome_options.add_argument("--disable-setuid-sandbox")
chrome_options.add_argument("--disable-seccomp-filter-sandbox")
chrome_options.add_argument("--disable-breakpad")
chrome_options.add_argument("--disable-client-side-phishing-detection")
chrome_options.add_argument("--disable-cast")
chrome_options.add_argument("--disable-cast-streaming-hw-encoding")
chrome_options.add_argument("--disable-cloud-import")
chrome_options.add_argument("--disable-popup-blocking")
chrome_options.add_argument("--ignore-certificate-errors")
chrome_options.add_argument("--disable-session-crashed-bubble")
chrome_options.add_argument("--disable-ipv6")
chrome_options.add_argument("--allow-http-screen-capture")
chrome_options.add_argument("--start-maximized")
wsize = "--window-size=" + str(sx-10) + ',' + str(sn-10)
chrome_options.add_argument(str(wsize) )
prefs = {"profile.managed_default_content_settings.images": 2}
chrome_options.add_experimental_option("prefs", prefs)
chrome_options.add_argument("blink-settings=imagesEnabled=true")
chrome_options.add_argument("start-maximized")
chrome_options.add_argument("user-agent="+uag)
chrome_options.add_extension(pluginfile)#this is for the residential proxy
driver = webdriver.Chrome(executable_path="/usr/bin/chromedriver", chrome_options=chrome_options)
Whether a Selenium-driven WebDriver gets detected doesn't depend on any specific Selenium, Chrome or ChromeDriver version. Websites can inspect the network traffic and identify the browser client as WebDriver-controlled.
However, some generic approaches to avoid getting detected while web scraping are as follows:
The first and foremost attribute a website can use to identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
If you need to send multiple requests to a website, keep changing the user agent on each request. You can find a detailed discussion in Way to change Google Chrome user agent in Selenium?
To simulate human-like behavior you may need to slow down script execution even beyond WebDriverWait and expected_conditions by adding time.sleep(secs). You can find a detailed discussion in How to sleep webdriver in python for milliseconds
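The last two points could be sketched roughly as follows. human_pause and pick_user_agent are made-up helper names, and the delay bounds are arbitrary values chosen for illustration, not recommendations.

```python
import random
import time


def human_pause(low: float = 0.5, high: float = 2.5, sleep=time.sleep) -> float:
    """Sleep for a random duration between low and high seconds."""
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


def pick_user_agent(agents):
    """Return a randomly chosen user-agent string from a non-empty list."""
    return random.choice(agents)
```

human_pause() would be called between actions, and pick_user_agent(...) when building the options for each new browser session.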
@Antoine Vastel, in his blog post Detecting Chrome Headless, mentioned several approaches that distinguish the Chrome browser from a headless Chrome browser.
User agent: The user agent attribute is commonly used to detect the OS as well as the browser of the user. With Chrome version 59 it has the following value:
Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/59.0.3071.115 Safari/537.36
A check for the presence of Chrome headless can be done through:
if (/HeadlessChrome/.test(window.navigator.userAgent)) {
console.log("Chrome headless detected");
}
Plugins: navigator.plugins returns an array of plugins present in the browser. Typically, on Chrome we find default plugins, such as Chrome PDF viewer or Google Native Client. By contrast, in headless mode the returned array contains no plugins.
A check for the presence of Plugins can be done through:
if(navigator.plugins.length == 0) {
console.log("It may be Chrome headless");
}
Languages: In Chrome, two JavaScript attributes give access to the languages used by the user: navigator.language and navigator.languages. The first is the language of the browser UI, while the second is an array of strings representing the user's preferred languages. However, in headless mode, navigator.languages returns an empty string.
A check for the presence of Languages can be done through:
if(navigator.languages == "") {
console.log("Chrome headless detected");
}
WebGL: WebGL is an API to perform 3D rendering in an HTML canvas. With this API, it is possible to query for the vendor of the graphic driver as well as the renderer of the graphic driver. With a vanilla Chrome and Linux, we can obtain the following values for renderer and vendor: Google SwiftShader and Google Inc.. In headless mode, we can obtain Mesa OffScreen, which is the technology used for rendering without using any sort of window system and Brian Paul, which is the program that started the open source Mesa graphics library.
A check for the presence of WebGL can be done through:
var canvas = document.createElement('canvas');
var gl = canvas.getContext('webgl');
var debugInfo = gl.getExtension('WEBGL_debug_renderer_info');
var vendor = gl.getParameter(debugInfo.UNMASKED_VENDOR_WEBGL);
var renderer = gl.getParameter(debugInfo.UNMASKED_RENDERER_WEBGL);
if(vendor == "Brian Paul" && renderer == "Mesa OffScreen") {
console.log("Chrome headless detected");
}
Not every headless Chrome will have the same values for vendor and renderer. Some keep values that can also be found on non-headless versions. However, Mesa OffScreen and Brian Paul indicate the presence of the headless version.
Browser features: The Modernizr library makes it possible to test whether a wide range of HTML and CSS features are present in a browser. The only difference we found between Chrome and headless Chrome was that the latter did not have the hairline feature, which detects support for hidpi/retina hairlines.
A check for the presence of hairline feature can be done through:
if(!Modernizr["hairline"]) {
console.log("It may be Chrome headless");
}
Missing image: The last item on our list, which also seems to be the most robust, comes from the dimensions of the image Chrome uses when an image cannot be loaded. In a vanilla Chrome, the image has a width and height that depend on the zoom of the browser but are different from zero. In a headless Chrome, the image has a width and a height equal to zero.
A check for the presence of Missing image can be done through:
var body = document.getElementsByTagName("body")[0];
var image = document.createElement("img");
image.src = "http://iloveponeydotcom32188.jg";
image.setAttribute("id", "fakeimage");
body.appendChild(image);
image.onerror = function(){
if(image.width == 0 && image.height == 0) {
console.log("Chrome headless detected");
}
}
References
You can find a couple of similar discussions in:
How to bypass Google captcha with Selenium and python?
How to make Selenium script undetectable using GeckoDriver and Firefox through Python?
tl; dr
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
How does recaptcha 3 know I'm using selenium/chromedriver?
Selenium and non-headless browser keeps asking for Captcha
Why not try undetected-chromedriver?
Optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io Automatically downloads the driver binary and patches it.
Tested until current chrome beta versions
Works also on Brave Browser and many other Chromium based browsers
Python 3.6++
You can install it with: pip install undetected-chromedriver
There are important things you should be aware of:
Due to the inner workings of the module, you need to browse programmatically (i.e. using .get(url)). Never use the GUI to navigate. Using your keyboard and mouse for navigation can cause detection! New tabs: same story. If you really need multiple tabs, open the tab with a blank page (hint: the url is data:, including the comma, and yes, the driver accepts it) and do your thing as usual. If you follow these "rules" (actually it's the default behaviour), then you will have a great time for now.
In [1]: import undetected_chromedriver as uc
In [2]: driver = uc.Chrome()
In [3]: driver.execute_script('return navigator.webdriver')
Out[3]: True # Detectable
In [4]: driver.get('https://distilnetworks.com') # starts magic
In [5]: driver.execute_script('return navigator.webdriver')
Out[5]: None # Undetectable!
For Python with Chrome or Chromium-based browsers, there's Selenium-Profiles
It currently supports:
Overwrite device metrics with fake-profiles
Mobile and Desktop emulation
Undetected by Google, Cloudflare, ..
Modifying headers supported using Selenium-Interceptor
Touch Actions
proxies with authentication
making single POST, GET or other requests using driver.requests.fetch(url, options) (syntax)
Installation
pip install selenium-profiles
Example script
from selenium_profiles.driver import driver as mydriver
from selenium_profiles.profiles import profiles
mydriver = mydriver()
driver = mydriver.start(profiles.Windows()) # or .Android
# get url
driver.get('https://nowsecure.nl/#relax') # test undetectability
input("Press ENTER to exit: ")
driver.quit() # Execute on the End!
Notes:
The package is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, which means that if you want to use it for something commercial, you need to ask the author first.
Headless support currently isn't guaranteed, but you can use pyvirtualdisplay.
What about:
import random
from selenium import webdriver
import time
driver = webdriver.Chrome("C:\\Users\\DusEck\\Desktop\\chromedriver.exe")
username = "username" # data_user
password = "password" # data_pass
driver.get("https://www.depop.com/login/") # get URL
driver.find_element_by_xpath('/html/body/div[1]/div/div[3]/div[2]/button[2]').click() # Accept cookies
split_char = [] # Characters of the username
split_char_pw = [] # Characters of the password
n = 1 # Chunk size: one character at a time
for index in range(0, len(username), n):
    split_char.append(username[index: index + n])
for user_letter in split_char:
    time.sleep(random.uniform(0.1, 0.8)) # Random pause between keystrokes
    driver.find_element_by_id("username").send_keys(user_letter)
for index in range(0, len(password), n):
    split_char_pw.append(password[index: index + n])
for pw_letter in split_char_pw:
    time.sleep(random.uniform(0.1, 0.8))
    driver.find_element_by_id("password").send_keys(pw_letter)

Implementing Selenium to use authentication proxies that change

I am trying to get selenium to use a proxy that will change at a certain point.
from seleniumwire import webdriver
def proxyManage():
    proxyChange("test.com", "8000", "user1", "password1")
def proxyChange(host, port, username, password):
    options = {
        'proxy': {
            'http': 'http://'+username+':'+password+'@'+host+':'+port,
            'https': 'https://'+username+':'+password+'@'+host+':'+port,
        }
    }
    PATH = "D:/Programming/undetectable chrome/chromedriver.exe"
    browser = webdriver.Chrome(PATH, options=options)
    browser.get("https://whatismyipaddress.com/")
proxyManage()
I import seleniumwire because I am unsure how plain selenium handles proxies. When I run the program to test whether it works on the website, I get the error below:
Traceback (most recent call last):
File "D:\Programming\Python\proxyTest.py", line 20, in <module>
proxyManage()
File "D:\Programming\Python\proxyTest.py", line 6, in proxyManage
proxyChange("test.com", "8000", "user1", "password1")
File "D:\Programming\Python\proxyTest.py", line 16, in proxyChange
browser = webdriver.Chrome(PATH, options=options)
File "C:\Users\User\AppData\Local\Programs\Python\Python39\lib\site-packages\seleniumwire\webdriver\browser.py", line 97, in __init__
chrome_options.add_argument('proxy-bypass-list=<-loopback>')
AttributeError: 'dict' object has no attribute 'add_argument'
So first, is there a way I can pass these arguments to proxyChange() without the error? The idea is to have a counter in proxyManage(); every time it runs, it moves to the next line in the proxy.txt file and passes those values to proxyChange(), hopefully updating the proxy without the program closing. It would be nice to multithread this.
You can try constructing the strings first, before putting them in the dict:
http_proxy = f"http://{username}:{password}@{host}:{port}"
https_proxy = f"https://{username}:{password}@{host}:{port}"
options = {
    'proxy': {
        'http': http_proxy,
        'https': https_proxy
    }
}
This only works for Python >= 3.6. If you're using a lower version, concatenating strings should work as well.
Selenium is not suitable for auth. It's just web UI automation; please check the SeleniumHQ site.
Two-factor authentication, or 2FA for short, is an authorization mechanism in which a one-time password (OTP) is generated by an "Authenticator" mobile app such as Google Authenticator or Microsoft Authenticator, or sent by SMS or e-mail. Automating this seamlessly and consistently is a big challenge in Selenium. There are some ways to automate the process, but they add another layer on top of the Selenium tests and are not secure either, so it is better to avoid automating 2FA.
There are a few options to get around 2FA checks:
Disable 2FA for certain Users in the test environment, so that you can use those user credentials in the automation.
Disable 2FA in your test environment.
Disable 2FA if you login from certain IPs. That way we can configure our test machine IPs to avoid this.
Change
browser = webdriver.Chrome(PATH, options=options)
to
browser = webdriver.Chrome(PATH, seleniumwire_options=options)
works for me
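For the "next line of proxy.txt" part of the question, something like the following might do. It is a sketch under a loud assumption: each line of the file is taken to look like host:port:username:password, which the question does not actually specify. build_options and proxy_options are illustrative names.

```python
from itertools import cycle
from typing import Dict, Iterable, Iterator


def build_options(host: str, port: str, username: str, password: str) -> Dict:
    """Build a selenium-wire options dict for an authenticated proxy."""
    auth = f"{username}:{password}@{host}:{port}"
    return {"proxy": {"http": f"http://{auth}", "https": f"https://{auth}"}}


def proxy_options(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield options for each proxy in turn, starting over at the end.

    Assumption: every non-empty line is host:port:username:password.
    """
    entries = [line.strip().split(":", 3) for line in lines if line.strip()]
    for host, port, username, password in cycle(entries):
        yield build_options(host, port, username, password)


# Typical use (not executed here):
#   with open("proxy.txt") as f:
#       rotation = proxy_options(f.readlines())
#   browser = webdriver.Chrome(PATH, seleniumwire_options=next(rotation))
```

Each call to next(rotation) plays the role of the counter in proxyManage(): it hands out the settings for the following line and wraps around after the last one.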

how to detect unexpected url change python webdriver selenium?

I am automating a browser process, but the same credentials are used by everyone (only one user can access the portal at a time), so whenever somebody else logs in, the current user is automatically kicked out and the URL changes to "http://172.17.3.248:8889/ameyoreports/?acpMode=false#loggedOut".
Is there any way to constantly check for a URL change while my automation script is running, and to end the script when a logout is detected?
I am using python selenium webdriver.
In Java we can use an event listener, WebDriverEventListener (https://seleniumhq.github.io/selenium/docs/api/java/org/openqa/selenium/support/events/WebDriverEventListener.html). For example, if you implement it:
public class Test2 implements WebDriverEventListener {
    @Override
    public void beforeFindBy(By arg0, WebElement arg1, WebDriver driver) {
        if (driver.getCurrentUrl().equals("http://172.17.3.248:8889/ameyoreports/?acpMode=false#loggedOut")) {
            // do what you want.
        }
    }
    // the other WebDriverEventListener methods are omitted here
}
then register it as shown below so that the URL is checked before any action (in the example above, before finding an element):
FirefoxDriver driver = new FirefoxDriver();
EventFiringWebDriver eventDriver = new EventFiringWebDriver(driver);
Test2 handler = new Test2();
eventDriver.register(handler);
eventDriver.get("url");
For Java, see http://toolsqa.com/selenium-webdriver/event-listener/ and for Python see http://selenium-python.readthedocs.io/api.html#module-selenium.webdriver.support.abstract_event_listener
Hey, there is a current_url attribute on the selenium webdriver object; you can fetch the changed URL with webdriver.current_url.
Keep checking it and you can break out of your script whenever you want.
You can test it with the following code:
# using the Chrome webdriver
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
browser = Options()
instance = webdriver.Chrome(webdriver_path, options=browser)
instance.get(url)
instance.current_url # this will give the current url opened in the browser
# manually enter another url in the browser, then check again
instance.current_url
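One way to "keep checking" is a small guard called before each step of the automation. This is a sketch only: LoggedOut and ensure_logged_in are hypothetical names, and the "#loggedOut" marker is taken from the URL in the question.

```python
class LoggedOut(Exception):
    """Raised when the browser has been redirected to the logged-out URL."""


def ensure_logged_in(driver, marker: str = "#loggedOut") -> None:
    """Raise LoggedOut if driver.current_url ends with the logout marker."""
    if driver.current_url.endswith(marker):
        raise LoggedOut(driver.current_url)
```

Wrapping the main loop in try/except LoggedOut then gives a single place to stop the script cleanly.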

How to use Crawlera with selenium (Python, Chrome, Windows) without Polipo

Basically, I am trying to use the Crawlera proxy from Scrapinghub with Selenium and Chrome on Windows, using Python.
I checked the documentation and they suggested using Polipo like this:
1) adding the following lines to /etc/polipo/config
parentProxy = "proxy.crawlera.com:8010"
parentAuthCredentials = "<CRAWLERA_APIKEY>:"
2) adding this to selenium driver
polipo_proxy = "127.0.0.1:8123"
proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'httpProxy': polipo_proxy,
    'ftpProxy' : polipo_proxy,
    'sslProxy' : polipo_proxy,
    'noProxy' : ''
})
capabilities = dict(DesiredCapabilities.CHROME)
proxy.add_to_capabilities(capabilities)
driver = webdriver.Chrome(desired_capabilities=capabilities)
Now I'd like to drop Polipo and use the proxy directly.
Is there a way to replace the polipo_proxy variable with the Crawlera one? Each time I try, it isn't taken into account and the browser runs without a proxy.
The Crawlera proxy format is as follows: [API KEY]:@[HOST]:[PORT]
I tried adding the proxy using the following line:
chrome_options.add_argument('--proxy-server=http://[API KEY]:@[HOST]:[PORT]')
but the problem is that I need to specify HTTP and HTTPS separately.
Thank you in advance!
Polipo is no longer maintained, so there are challenges in using it. Crawlera requires authentication, which ChromeDriver does not seem to support as of now. You can try the Firefox webdriver instead: set the proxy authentication in a custom Firefox profile and use that profile, as shown in Running selenium behind a proxy server and http://toolsqa.com/selenium-webdriver/http-proxy-authentication/.
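To at least separate the two concerns, the Crawlera-style string can be split into a credential-free address (the part a --proxy-server argument can carry) and the credentials, which then have to be supplied some other way. A sketch using only the standard library; split_proxy is an illustrative name and APIKEY is a placeholder:

```python
from urllib.parse import urlsplit


def split_proxy(proxy: str):
    """Split 'user:password@host:port' into (address, user, password)."""
    parts = urlsplit("//" + proxy)  # the leading // makes urlsplit parse a netloc
    address = f"{parts.hostname}:{parts.port}"
    return address, parts.username, parts.password


address, user, password = split_proxy("APIKEY:@proxy.crawlera.com:8010")
chrome_argument = "--proxy-server=http://" + address
```

This does not solve the authentication itself; it only shows which part goes where.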
I struggled with the same problem and found a workaround; I hope it helps you as well. To solve it, use the Firefox driver and put the proxy information in its profile, like this:
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy.http", "proxy.server.address")
profile.set_preference("network.proxy.http_port", port_number) # port_number is an int, e.g. 8010
profile.update_preferences()
driver = webdriver.Firefox(firefox_profile=profile)
This totally worked for me. For reference, see the sites linked above.
Scrapinghub has created a new project for this. You set up a forwarding proxy using your API key, and then configure webdriver to use that proxy. The project is zyte-smartproxy-headless-proxy.
You can have a look.

Phantomjs through selenium in python

I am trying to test a webpage's behaviour to requests from different referrers. I am doing the following so far
webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.customHeaders.referer'] = referer
The problem is that the webpage makes ajax requests which change some things in the HTML, and those ajax requests should have the webpage itself as referer, not the referer I gave at the start. It seems that the referer is set once at the start, and every subsequent request, be it ajax, image or anchor, takes that same referer; it never changes no matter how deep you browse. Is there a way to choose the referer only for the first request and have it be dynamic for the rest?
After some searching I found this and tried to achieve it through selenium, but I have not had any success with it yet:
webdriver.DesiredCapabilities.PHANTOMJS['phantomjs.page.onInitialized'] = """function() {page.customHeaders = {};};"""
Any ideas?
From what I can tell you would need to patch PhantomJS to achieve this.
PhantomJS contains a module called GhostDriver which provides the HTTP API that WebDriver uses to communicate with the PhantomJS instance. So anything you want to do via WebDriver needs to be supported by GhostDriver, but it doesn't seem that onInitialized is supported by GhostDriver.
If you're feeling adventurous you could clone the PhantomJS repository and patch the src/ghostdriver/session.js file to do what you want.
The _init method looks like this:
_init = function() {
var page;
// Ensure a Current Window is available, if it's found to be `null`
if (_currentWindowHandle === null) {
// Create the first Window/Page
page = require("webpage").create();
// Decorate it with listeners and helpers
page = _decorateNewWindow(page);
// set session-specific CookieJar
page.cookieJar = _cookieJar;
// Make the new Window, the Current Window
_currentWindowHandle = page.windowHandle;
// Store by WindowHandle
_windows[_currentWindowHandle] = page;
}
},
You could try using the code you found:
page.onInitialized = function() {
page.customHeaders = {};
};
on the page object created there.
Depending on what you test though you might be able to save a lot of effort and ditch the browser and just test HTTP requests directly using something like the requests module.
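As a rough illustration of testing at the HTTP level, here is a sketch that uses the standard library's urllib.request in place of requests so that it has no dependencies. The URLs are placeholders and request_with_referer is a made-up helper name.

```python
import urllib.request


def request_with_referer(url: str, referer: str) -> urllib.request.Request:
    """Prepare a GET request that carries the given Referer header."""
    return urllib.request.Request(url, headers={"Referer": referer})


req = request_with_referer("http://example.com/page", "http://referrer.example/")
# urllib.request.urlopen(req) would send it. Each request carries its own
# header, so follow-up requests can use the page itself as the referer.
```

Because the header is set per request, the "first request only" behaviour comes for free, although nothing on the page (ajax, scripts) is executed this way.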
