Rotating proxies and user-agents with selenium chrome on python

Rotating proxies and user-agents with selenium chrome on python - python

I've faced a problem while trying to automate some test with selenium chrome webdriver on python. My goal is to switch user-agent and proxy-server after every request that I make to avoid getting banned. Here is my code:
from selenium import webdriver
import time
import random
from fake_useragent import UserAgent
from datetime import datetime as dt
def add_random_ua():
# Generate a random user-agent useing fake_useragent library
fake_ua = UserAgent().random
# If there is already a "user-agent" argument in options delete it
for item in options.arguments:
if '--user-agent=' in item:
options.arguments.remove(item)
# Add generated user-agent to options
options.add_argument(f'--user-agent={fake_ua}')
print(f'User-agent: {fake_ua}')
def add_random_proxy():
# Same logic as with add_random_ua
# PROXY_LIST is just a list read from .txt file
random_proxy = PROXY_LIST[random.randint(0, len(PROXY_LIST) - 1)]
for item in options.arguments:
if '--proxy-server=' in item:
options.arguments.remove(item)
options.add_argument(f'--proxy-server={random_proxy}')
print(f'Proxy-server: {random_proxy}')
chromedriver = 'C:\\Users\\User\\Desktop\\chromedriver\\chromedriver.exe'
# Initial chromedriver options
options = webdriver.ChromeOptions()
options.add_argument("--window-size=1024,768")
options.add_argument("--disable-notifications")
options.add_argument("--disable-popup-blocking")
options.add_argument("--user-data-dir=C:\\Users\\User\\AppData\\Local\\Google\\Chrome\\User Data")
options.add_argument("--profile-directory=Default")
options.add_argument("--ignore-certificate-errors")
while True:
print(f'Timestamp: {dt.now().strftime("%Y-%m-%d %H:%M:%S")}')
add_random_ua()
add_random_proxy()
web = webdriver.Chrome(chromedriver, options=options)
web.implicitly_wait(30)
web.get(URL)
# Some testing code...
web.quit()
So the basic logic of the script is to set some initial chromedriver options then add random proxy-server and user-agent in a while loop, create a webdriver instance with new options, execute some code, destroy webdriver using .quit() and then make all this stuff again.
Main problem is that it doesn't work with add_random_proxy() - every time I can't connect to internet because of failed proxy connection (proxies themselves are totally fine). When I comment that line out everything works well. What am I doing wrong?
Thanks in advance!

Related

Specific web page doesn't load (empty page) HTML and CSS with Selenium?

I started working with Selenium, it works for any website I tried except one (myvisit.com) that doesn't load the page.
It opens Chrome but the page is empty. I tried to set number of delays but it still doesn't load.
When I go to the website on a regular Chrome (without Selenium) it loads everything.
Here is my simple code, not sure how to continue from that:
import os
import random
import time
# selenium libraries
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import ChromiumOptions
def delay():
time.sleep(random.randint(2,3))
driver = webdriver.Chrome(os.getcwd()+"\\webdriver\\chromedriver.exe")
driver.get("https://myvisit.com")
delay()
delay()
delay()
delay()
I also tried to use ChromiumOptions with flags like --no-sandbox but it didn't help:

from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(os.getcwd()+"\\webdriver\\chromedriver.exe",options=options)
Simply add the arguement to remove it from determining it's an automation.

Selenium closes browser after Python code is finished filling out form [duplicate]

This question already has answers here:
Python selenium keep browser open
(4 answers)
Closed 5 months ago.
I'm using Selenium to auto-fill forms using forminfo.py for the inputted information. (Right now I'm testing it out on Instagram just to figure everything out.) What I have done is, I have it check to see if the google chrome driver is already installed in a particular location and if not it installs it and if so it just continues and runs the code to fill out the form.
The problem I'm having is, everything runs fine IF the driver isn't installed and it installs it manually. BUT If the driver is already installed, after the form is filled out and it hits the submit button it closes the browser immediately. Ive only been learning Python for about a few weeks now and I cant figure out whats wrong. Also every where I've look I can find a solution to the problem. Also, I'm not getting any errors in my terminal.
################## IMPORTS ##################
import forminfo
import os
import urllib.request as urllib
import webbrowser
import zipfile
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.proxy import Proxy, ProxyType
from time import sleep
################## CHROME DRIVER DOWNLOAD ##################
path = "C:/Users/"+os.getlogin()+"/Documents/chromedriver.exe"
isFile = os.path.isfile(path)
if not os.path.isfile(path):
webbrowser.get('windows-default').open("http://www.google.com", new=0)
zip_path, _ = urllib.urlretrieve(forminfo.downloadurl)
urllib.urlopen = zip_path
extract_dir = "C:/Users/"+os.getlogin()+"/Documents"
zipfile.ZipFile(zip_path, "r").extractall(extract_dir)
else:
print("Starting now....")
################## PROXY INFO ##################
prox = Proxy()
prox.proxy_type = ProxyType.MANUAL
prox.socks_version = 5
prox.socks_proxy = forminfo.socks5ip
#prox.http_proxy = "ip_addr:port"
#prox.ssl_proxy = "ip_addr:port"
################## BROWSER ##################
capabilities = webdriver.DesiredCapabilities.CHROME
prox.add_to_capabilities(capabilities)
#
capabilities['acceptInsecureCerts'] = True
capabilities['acceptSslCerts'] = True
#
driver_service = Service(executable_path="C:/Users/"+os.getlogin()+"/Documents/chromedriver")
chrome_options = Options()
chrome_options.add_argument("--incognito")
chrome_options.add_argument("start-maximized")
chrome_options.add_experimental_option("useAutomationExtension", False)
chrome_options.add_experimental_option("excludeSwitches",["enable-automation"])
#
chrome_options.add_argument("--ignore-certificate-error")
chrome_options.add_argument("--ignore-ssl-errors")
#
driver = webdriver.Chrome(service=driver_service, options=chrome_options, desired_capabilities=capabilities)
driver.get('https://www.instagram.com')
################## BOT ##################
sleep(3)
elem = driver.find_element(By.NAME, "username")
elem.send_keys(forminfo.username)
elem = driver.find_element(By.NAME, "password")
elem.send_keys(forminfo.password)
sleep(3)
elem = driver.find_element(By.XPATH, "//*[#id=\"loginForm\"]/div/div[3]/button/div")
elem.click()
The only solution Ive found to the problem is here by adding a breakpoint() at the very end of my code. But I feel like this would mess some things up down the road if I decided to expand on this code and eventually turn it into a .exe file with pyIntsaller or Nuitak. So I wanted to see If there was a solution before just moving forward with that resolution.
I know it's a problem with the ################## CHROME DRIVER DOWNLOAD ################## code because I wasn't having this problem until I decided to add that code. This Is the very most recent part I added and everything was fine until then.

Converting comment to an answer.
You can use the detach option to let the browser Window stay open after using it with Selenium, example:
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option("detach", True)

Adding proxy support to Selenium multi session opening script in Python

I recently made a script that allow me to create multiple Browser sessions targeting one URL. I would like to add proxy support to it in order to not get banned when running it. I tried to use the Proxy lib from selenium but it just get ignored.
My Question : How can I add proxy support into this script while using Selenium in python ? (each session will get a random proxy)
Here is my code

You could use the stem library which allows you to use Tor in python. Read the docs here to see how to use it.
The two basic parts missing from your code are the following:
from selenium.webdriver.common.proxy import Proxy, ProxyType
chrome_options.add_argument('--proxy-server=#yourproxyhere#'
Tor!
Here you can see how I set up my stem + selenium project:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
from time import sleep
from stem import Signal
from stem.control import Controller
#this gives you a new identity
with Controller.from_port(port = 9051) as controller:
controller.authenticate()
controller.signal(Signal.NEWNYM)
#set the proxy in selenium to 127.0.0.1:9150 and have your Tor Browser open!
link = 'https://some-link.com' #target url
prox= 'socks5://127.0.0.1:9150' #Here you connect to your localhost which connects to a Tor network
#some chrome_options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % prox)
chrome_options.add_argument("--window-size=400,600")
#the following also deactivates location tracking!
prefs = {"profile.default_content_setting_values.geolocation" :2}
chrome_options.add_experimental_option("prefs",prefs)
driver = webdriver.Chrome("path_to_chromedriver", chrome_options=chrome_options)
driver.get(link)

Here is an updated version of my code including the new proxy support elements, I want it to use proxies form a txt file via a proxies = read_from_txt("proxies.txt") and randomly rotate between them using random lib :
Thank you for quick replies, really appreciate it.

Reopen same browser window using selenium python and Firefox

Hey i'm trying to make an automatic program to send Whatsapp messages.
I'm currently using python, Firefox and selenium to achieve that.
The problem is that every time i'm calling driver.get(url) it opens a new instance of the firefox browser, blank with no memories of the last run. It makes me scan the bar code every time I run it.
from selenium import webdriver
from selenium.webdriver.firefox.webdriver import FirefoxProfile
cp_profile = webdriver.FirefoxProfile("/Users/Hodai/AppData/Roaming/Mozilla/Firefox/Profiles/v27qat5d.whatsapp_profile")
driver = webdriver.Firefox(executable_path="/Users/Hodai/Desktop/geckodriver",firefox_profile=cp_profile)
driver.get('http://web.whatsapp.com')
#Scan the code before proceeding further
input('Enter anything after scanning QR code')
I've tried to use profile but it seems like it has no affect.
cp_profile = webdriver.FirefoxProfile("/Users/Hodai/AppData/Roaming/Mozilla/Firefox/Profiles/v27qat5d.whatsapp_profile")
driver = webdriver.Firefox(executable_path="/Users/Hodai/Desktop/geckodriver",firefox_profile=cp_profile)

At the end I used chromedriver to achive my goal.
I tried cookies with pickle but it was a bit tricky because it remembered the cookies just for the same domain.
So I used user data for chrome.
now it works like a charm. thank you all.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
options = Options()
options.add_argument("user-data-dir=C:/Users/Designer1/AppData/Local/Google/Chrome/User Data/Profile 1")
driver = webdriver.Chrome(chrome_options=options,executable_path="C:\webdrivers\chromedriver.exe")

The easiest way I think is to save your cookies after scanned the qrcode and push them to Selenium manually.
# Load page to be able to set cookies
driver.get('http://web.whatsapp.com')
# Set saved cookies
cookies = {'name1': 'value1', 'name2', 'value2'}
for name in cookies:
driver.add_cookie({
'name': name,
'value': cookies[name],
})
# Load page using cookies
driver.get('http://web.whatsapp.com')
To get your cookies you can use the console (F12), Network tab, right click on the request, Copy => Copy Request Headers.

It should not be like that. It only opens the new window when initialized with new variable or the program starts again. Here is the code for chrome. It doesn't matter how many times you call driver.get(url) it would open the url in the same browser window
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
import time
driver = webdriver.Chrome(executable_path=r"C:\new\chromedriver.exe")
driver.get('https://www.olx.com.pk/lahore/apple/q-iphone-6s/?search%5Bfilter_float_price%3Afrom%5D=40000&search%5Bfilter_float_price%3Ato%5D=55000')
time.sleep(10)
driver.get('https://www.olx.com.pk/lahore/apple/q-iphone-6s/?search%5Bfilter_float_price%3Afrom%5D=40000&search%5Bfilter_float_price%3Ato%5D=55000')
time.sleep(10)
driver.get('https://www.olx.com.pk/lahore/apple/q-iphone-6s/?search%5Bfilter_float_price%3Afrom%5D=40000&search%5Bfilter_float_price%3Ato%5D=55000')
time.sleep(10)
Let me know if the issue is resolved or you are trying to do something else.

Change user-agent for Selenium web-driver

I have the following code in Python:
from selenium.webdriver import Firefox
from contextlib import closing
with closing(Firefox()) as browser:
browser.get(url)
I would like to print the user-agent HTTP header and
possibly change it. Is it possible?

There is no way in Selenium to read the request or response headers. You could do it by instructing your browser to connect through a proxy that records this kind of information.
Setting the User Agent in Firefox
The usual way to change the user agent for Firefox is to set the variable "general.useragent.override" in your Firefox profile. Note that this is independent from Selenium.
You can direct Selenium to use a profile different from the default one, like this:
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_preference("general.useragent.override", "whatever you want")
driver = webdriver.Firefox(profile)
Setting the User Agent in Chrome
With Chrome, what you want to do is use the user-agent command line option. Again, this is not a Selenium thing. You can invoke Chrome at the command line with chrome --user-agent=foo to set the agent to the value foo.
With Selenium you set it like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
opts = Options()
opts.add_argument("user-agent=whatever you want")
driver = webdriver.Chrome(chrome_options=opts)
Both methods above were tested and found to work. I don't know about other browsers.
Getting the User Agent
Selenium does not have methods to query the user agent from an instance of WebDriver. Even in the case of Firefox, you cannot discover the default user agent by checking what general.useragent.override would be if not set to a custom value. (This setting does not exist before it is set to some value.)
Once the browser is started, however, you can get the user agent by executing:
agent = driver.execute_script("return navigator.userAgent")
The agent variable will contain the user agent.

To build on Louis's helpful answer...
Setting the User Agent in PhantomJS
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
...
caps = DesiredCapabilities.PHANTOMJS
caps["phantomjs.page.settings.userAgent"] = "whatever you want"
driver = webdriver.PhantomJS(desired_capabilities=caps)
The only minor issue is that, unlike for Firefox and Chrome, this does not return your custom setting:
driver.execute_script("return navigator.userAgent")
So, if anyone figures out how to do that in PhantomJS, please edit my answer or add a comment below! Cheers.

This is a short solution to change the request UserAgent on the fly.
Change UserAgent of a request with Chrome
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
driver = webdriver.Chrome(driver_path)
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent":"python 2.7", "platform":"Windows"})
driver.get('http://amiunique.org')
then return your useragent:
agent = driver.execute_script("return navigator.userAgent")
Some sources
The source code of webdriver.py from SeleniumHQ (https://github.com/SeleniumHQ/selenium/blob/11c25d75bd7ed22e6172d6a2a795a1d195fb0875/py/selenium/webdriver/chrome/webdriver.py) extends its functionalities through the Chrome Devtools Protocol
def execute_cdp_cmd(self, cmd, cmd_args):
"""
Execute Chrome Devtools Protocol command and get returned result
We can use the Chrome Devtools Protocol Viewer to list more extended functionalities (https://chromedevtools.github.io/devtools-protocol/tot/Network#method-setUserAgentOverride) as well as the parameters type to use.

Firefox Profile is deprecated, you have to use it in Firefox options like this:
opts = FirefoxOptions()
opts.add_argument("--headless")
opts.add_argument("--width=800")
opts.add_argument("--height=600")
opts.set_preference("general.useragent.override", "userAgent=Mozilla/5.0
(iPhone; CPU iPhone OS 15_4 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like
Gecko) CriOS/101.0.4951.44 Mobile/15E148 Safari/604.1")

To build on JJC's helpful answer that builds on Louis's helpful answer...
With PhantomJS 2.1.1-windows this line works:
driver.execute_script("return navigator.userAgent")
If it doesn't work, you can still get the user agent via the log (to build on Mma's answer):
from selenium import webdriver
import json
from fake_useragent import UserAgent
dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (UserAgent().random)
driver = webdriver.PhantomJS(executable_path=r"your_path", desired_capabilities=dcap)
har = json.loads(driver.get_log('har')[0]['message']) # get the log
print('user agent: ', har['log']['entries'][0]['request']['headers'][1]['value'])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Rotating proxies and user-agents with selenium chrome on python - python

Related

Specific web page doesn't load (empty page) HTML and CSS with Selenium?

Selenium closes browser after Python code is finished filling out form [duplicate]

Adding proxy support to Selenium multi session opening script in Python

Reopen same browser window using selenium python and Firefox

Change user-agent for Selenium web-driver

Categories

Resources