I recently wrote a script that lets me create multiple browser sessions targeting one URL. I would like to add proxy support so that I don't get banned when running it. I tried the Proxy class from Selenium, but it simply gets ignored.
My question: how can I add proxy support to this script while using Selenium in Python? (Each session should get a random proxy.)
Here is my code:
You could use the stem library which allows you to use Tor in python. Read the docs here to see how to use it.
The two basic parts missing from your code are the following:
from selenium.webdriver.common.proxy import Proxy, ProxyType
chrome_options.add_argument('--proxy-server=#yourproxyhere#')
Tor!
Here you can see how I set up my stem + selenium project:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
from time import sleep
from stem import Signal
from stem.control import Controller
# This gives you a new identity
with Controller.from_port(port=9051) as controller:
    controller.authenticate()
    controller.signal(Signal.NEWNYM)
#set the proxy in selenium to 127.0.0.1:9150 and have your Tor Browser open!
link = 'https://some-link.com' #target url
prox= 'socks5://127.0.0.1:9150' #Here you connect to your localhost which connects to a Tor network
#some chrome_options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % prox)
chrome_options.add_argument("--window-size=400,600")
#the following also deactivates location tracking!
prefs = {"profile.default_content_setting_values.geolocation" :2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome("path_to_chromedriver", chrome_options=chrome_options)
driver.get(link)
Here is an updated version of my code including the new proxy support elements. I want it to use proxies from a text file via proxies = read_from_txt("proxies.txt") and rotate randomly between them using the random library:
Thank you for quick replies, really appreciate it.
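For the rotation described above, a minimal sketch might look like the following. The name read_from_txt comes from the question; proxy_argument and new_session_options are hypothetical helpers, and the file is assumed to hold one host:port entry per line. Creating a fresh options object per session avoids carrying an old proxy switch over to the next session.

```python
import random
from pathlib import Path


def read_from_txt(path):
    """Return the non-empty, stripped lines of a text file (one proxy per line)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def proxy_argument(proxy):
    """Build the Chrome command-line switch for a proxy such as 'host:port'."""
    return f"--proxy-server={proxy}"


def new_session_options(proxies, rng=random):
    """Create fresh ChromeOptions with one randomly chosen proxy (needs selenium)."""
    # Imported lazily so the two helpers above work without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(rng.choice(proxies)))
    return options
```

Each session would then call new_session_options(read_from_txt("proxies.txt")) and pass the result to webdriver.Chrome.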
Related
I would like to write a program that changes my IP address when I visit a site with Selenium. I use the Firefox webdriver. Unfortunately, the site I use for my tests returns my own IP and not the one I specified in the options. Could you tell me what the error is, please?
The program launches and the Firefox page opens (I don't use headless mode for the test), but my own IP is returned and not the one from the specified proxy.
Here is my program:
from selenium import webdriver
options = webdriver.FirefoxOptions()
proxy = f'{"137.74.65.101"}:{"80"}'
options.add_argument(f'--proxy-server={proxy}')
driver = webdriver.Firefox(options=options)
driver.get('https://httpbin.org/ip')
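A likely cause is that --proxy-server is a Chromium command-line switch, which Firefox does not recognise. A sketch of the preference-based alternative follows, assuming Selenium 4 (where FirefoxOptions has set_preference) and an HTTP proxy; the two helper names are hypothetical.

```python
def firefox_proxy_prefs(host, port):
    """Return Firefox preferences that route HTTP and HTTPS through host:port."""
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": host,
        "network.proxy.http_port": int(port),
        "network.proxy.ssl": host,
        "network.proxy.ssl_port": int(port),
    }


def firefox_options_with_proxy(host, port):
    """Build FirefoxOptions carrying the proxy preferences (needs selenium 4)."""
    # Imported lazily so firefox_proxy_prefs can be used without selenium.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_proxy_prefs(host, port).items():
        options.set_preference(name, value)
    return options
```

With this, driver = webdriver.Firefox(options=firefox_options_with_proxy("137.74.65.101", 80)) would replace the add_argument call from the question.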
I need to connect to a website that requires digital certificate authentication. When I try to connect with Selenium, a pop-up opens asking me to choose the certificate.
This pop-up is an OS-level dialog, so it can't be controlled by Selenium.
I'm trying to configure a connection so that it authenticates with the certificate automatically, but it's not working.
Python has a function, load_cert_chain, which I think could be used to solve this problem. This is how I'm trying to use it:
import http.client
import ssl
from selenium import webdriver
certificate_file = 'cert.pem'
certificate_secret= 'key.pem'
host = 'example.com'
context = ssl.SSLContext(ssl.PROTOCOL_SSLv23)
context.load_cert_chain(certfile=certificate_file, keyfile=certificate_secret)
connection = http.client.HTTPSConnection(host, port=443, context=context)
driver = webdriver.Chrome()
driver.get("example.com")
I think I'm missing a way to pass the connection into the ChromeDriver options, but I can't find the right way to do that.
I've run into a problem while trying to automate some tests with the Selenium Chrome webdriver in Python. My goal is to switch the user agent and proxy server after every request I make, to avoid getting banned. Here is my code:
from selenium import webdriver
import time
import random
from fake_useragent import UserAgent
from datetime import datetime as dt
def add_random_ua():
    # Generate a random user-agent using the fake_useragent library
    fake_ua = UserAgent().random
    # If there is already a "user-agent" argument in options, delete it
    for item in options.arguments:
        if '--user-agent=' in item:
            options.arguments.remove(item)
    # Add the generated user-agent to options
    options.add_argument(f'--user-agent={fake_ua}')
    print(f'User-agent: {fake_ua}')
def add_random_proxy():
    # Same logic as with add_random_ua
    # PROXY_LIST is just a list read from a .txt file
    random_proxy = PROXY_LIST[random.randint(0, len(PROXY_LIST) - 1)]
    for item in options.arguments:
        if '--proxy-server=' in item:
            options.arguments.remove(item)
    options.add_argument(f'--proxy-server={random_proxy}')
    print(f'Proxy-server: {random_proxy}')
chromedriver = 'C:\\Users\\User\\Desktop\\chromedriver\\chromedriver.exe'
# Initial chromedriver options
options = webdriver.ChromeOptions()
options.add_argument("--window-size=1024,768")
options.add_argument("--disable-notifications")
options.add_argument("--disable-popup-blocking")
options.add_argument("--user-data-dir=C:\\Users\\User\\AppData\\Local\\Google\\Chrome\\User Data")
options.add_argument("--profile-directory=Default")
options.add_argument("--ignore-certificate-errors")
while True:
    print(f'Timestamp: {dt.now().strftime("%Y-%m-%d %H:%M:%S")}')
    add_random_ua()
    add_random_proxy()
    web = webdriver.Chrome(chromedriver, options=options)
    web.implicitly_wait(30)
    web.get(URL)
    # Some testing code...
    web.quit()
So the basic logic of the script is to set some initial chromedriver options and then, in a while loop, add a random proxy server and user agent, create a webdriver instance with the new options, execute some code, destroy the webdriver using .quit(), and repeat.
The main problem is that it doesn't work with add_random_proxy(): every time, I can't connect to the internet because the proxy connection fails (the proxies themselves are totally fine). When I comment that line out, everything works well. What am I doing wrong?
Thanks in advance!
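One way to sidestep editing options.arguments in place is to build a new argument list, and a new ChromeOptions object, on every iteration. This is only a sketch and does not claim to explain the failed proxy connection; build_arguments and fresh_options are hypothetical names, and the proxy entries are assumed to be in a form Chrome accepts (for example host:port).

```python
import random


def build_arguments(base_args, user_agent, proxy):
    """Return a new argument list; base_args itself is left untouched."""
    kept = [
        arg for arg in base_args
        if not arg.startswith(("--user-agent=", "--proxy-server="))
    ]
    return kept + [f"--user-agent={user_agent}", f"--proxy-server={proxy}"]


def fresh_options(base_args, user_agents, proxies, rng=random):
    """Create a brand-new ChromeOptions for one session (needs selenium)."""
    # Imported lazily so build_arguments can be used without selenium.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    chosen = build_arguments(base_args, rng.choice(user_agents), rng.choice(proxies))
    for arg in chosen:
        options.add_argument(arg)
    return options
```

Because every session starts from the same base_args, a stale --proxy-server or --user-agent switch from a previous iteration cannot leak into the next one.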
I'm trying to capture all network logs using seleniumwire. When chromedriver is in normal mode, it is able to capture all requests. But when it is in headless mode, it is not capturing all requests.
I tried adding sleep(10) and assert driver.last_request.response.status_code == 200, but neither helped.
Since seleniumwire is not that popular, I'm adding a sample guide below in the hope that people who know Selenium will be able to help me fix the problem.
Working with seleniumwire
Installing seleniumwire
pip install seleniumwire
Sample script:
from seleniumwire import webdriver # Import from seleniumwire
# Create a new instance of the Chrome driver
driver = webdriver.Chrome()
# Go to the YouTube homepage.
driver.get('https://www.youtube.com')
# Access requests via the `requests` attribute
for request in driver.requests:
    if request.response:
        print(
            request.path,
            request.response.status_code,
            request.response.headers['Content-Type']
        )
Try capturing all requests:
options = {
    'ignore_http_methods': []  # Capture all requests, including OPTIONS requests
}
# Raw string so the backslash in the Windows path is not treated as an escape
driver = webdriver.Chrome(r"C:\chromedriver.exe", seleniumwire_options=options)
By default, it ignores requests that use the OPTIONS method.
When the Chrome browser is opened by Selenium, it uses its own profile rather than the existing default one. Try using a custom profile; for Chrome, you can use the ChromeOptions class to specify one.
I have the following code in Python:
from selenium.webdriver import Firefox
from contextlib import closing
with closing(Firefox()) as browser:
    browser.get(url)
I would like to print the user-agent HTTP header and possibly change it. Is it possible?
There is no way in Selenium to read the request or response headers. You could do it by instructing your browser to connect through a proxy that records this kind of information.
Setting the User Agent in Firefox
The usual way to change the user agent for Firefox is to set the variable "general.useragent.override" in your Firefox profile. Note that this is independent from Selenium.
You can direct Selenium to use a profile different from the default one, like this:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "whatever you want")
driver = webdriver.Firefox(profile)
Setting the User Agent in Chrome
With Chrome, what you want to do is use the user-agent command line option. Again, this is not a Selenium thing. You can invoke Chrome at the command line with chrome --user-agent=foo to set the agent to the value foo.
With Selenium you set it like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=whatever you want")
driver = webdriver.Chrome(chrome_options=opts)
Both methods above were tested and found to work. I don't know about other browsers.
Getting the User Agent
Selenium does not have methods to query the user agent from an instance of WebDriver. Even in the case of Firefox, you cannot discover the default user agent by checking what general.useragent.override would be if not set to a custom value. (This setting does not exist before it is set to some value.)
Once the browser is started, however, you can get the user agent by executing:
agent = driver.execute_script("return navigator.userAgent")
The agent variable will contain the user agent.
To build on Louis's helpful answer...
Setting the User Agent in PhantomJS
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
...
caps = DesiredCapabilities.PHANTOMJS
caps["phantomjs.page.settings.userAgent"] = "whatever you want"
driver = webdriver.PhantomJS(desired_capabilities=caps)
The only minor issue is that, unlike for Firefox and Chrome, this does not return your custom setting:
driver.execute_script("return navigator.userAgent")
So, if anyone figures out how to do that in PhantomJS, please edit my answer or add a comment below! Cheers.
This is a short solution to change the request UserAgent on the fly.
Change UserAgent of a request with Chrome
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
driver = webdriver.Chrome(driver_path)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent":"python 2.7", "platform":"Windows"})
driver.get('http://amiunique.org')
then return your useragent:
agent = driver.execute_script("return navigator.userAgent")
Some sources
The source code of webdriver.py from SeleniumHQ (https://github.com/SeleniumHQ/selenium/blob/11c25d75bd7ed22e6172d6a2a795a1d195fb0875/py/selenium/webdriver/chrome/webdriver.py) extends its functionalities through the Chrome Devtools Protocol
def execute_cdp_cmd(self, cmd, cmd_args):
    """
    Execute Chrome Devtools Protocol command and get returned result
We can use the Chrome Devtools Protocol Viewer to list more extended functionalities (https://chromedevtools.github.io/devtools-protocol/tot/Network#method-setUserAgentOverride) as well as the parameters type to use.
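As a sketch of how the call above could be wrapped, the helper below builds the parameter dictionary for Network.setUserAgentOverride and then reads the value back. The function names are hypothetical; userAgent, platform and acceptLanguage are parameters listed in the protocol viewer, and the driver is assumed to be a Chromium-based webdriver that offers execute_cdp_cmd.

```python
def user_agent_override_params(user_agent, platform=None, accept_language=None):
    """Build the parameter dict for the Network.setUserAgentOverride command."""
    params = {"userAgent": user_agent}  # the only required field
    if platform is not None:
        params["platform"] = platform
    if accept_language is not None:
        params["acceptLanguage"] = accept_language
    return params


def set_user_agent(driver, user_agent, **extra):
    """Send the override, then return the user agent the page reports."""
    driver.execute_cdp_cmd(
        "Network.setUserAgentOverride",
        user_agent_override_params(user_agent, **extra),
    )
    return driver.execute_script("return navigator.userAgent")
```

Calling set_user_agent(driver, "python 2.7", platform="Windows") before driver.get(...) would mirror the example above.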
FirefoxProfile is deprecated; set the preference through the Firefox options instead, like this:
from selenium.webdriver import FirefoxOptions
opts = FirefoxOptions()
opts.add_argument("--headless")
opts.add_argument("--width=800")
opts.add_argument("--height=600")
# The preference value is the user-agent string itself, with no "userAgent=" prefix
opts.set_preference(
    "general.useragent.override",
    "Mozilla/5.0 (iPhone; CPU iPhone OS 15_4 like Mac OS X) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) CriOS/101.0.4951.44 Mobile/15E148 Safari/604.1",
)
To build on JJC's helpful answer that builds on Louis's helpful answer...
With PhantomJS 2.1.1-windows this line works:
driver.execute_script("return navigator.userAgent")
If it doesn't work, you can still get the user agent via the log (to build on Mma's answer):
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import json
from fake_useragent import UserAgent
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (UserAgent().random)
driver = webdriver.PhantomJS(executable_path=r"your_path", desired_capabilities=dcap)
har = json.loads(driver.get_log('har')[0]['message']) # get the log
print('user agent: ', har['log']['entries'][0]['request']['headers'][1]['value'])
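The lookup above relies on the User-Agent header being at index 1 of the header list, which may not always hold. A slightly more defensive sketch searches by header name instead; find_request_header is a hypothetical helper, and the dictionary is assumed to follow the HAR layout used above (log, entries, request, headers with name/value pairs).

```python
def find_request_header(har, name, entry_index=0):
    """Return the value of a request header by case-insensitive name, or None."""
    headers = har["log"]["entries"][entry_index]["request"]["headers"]
    wanted = name.lower()
    for header in headers:
        if header["name"].lower() == wanted:
            return header["value"]
    return None
```

With the har variable from the snippet above, print('user agent: ', find_request_header(har, 'User-Agent')) would replace the indexed lookup.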