I'm trying to use a proxy with authentication on selenium.
I have seen a lot of examples like this one
from selenium import webdriver
PROXY = "23.23.23.23:3128" # IP:PORT or HOST:PORT
options = webdriver.ChromeOptions()
options.add_argument('--proxy-server=%s' % PROXY)
chrome = webdriver.Chrome('./chromedriver',options=options)
chrome.get("http://whatismyipaddress.com")
I'm not sure where I should enter the username and password or how, for the authentication.
There is any documentation about that?
As I have answered here.
If you need to use a proxy with no authentication (no username or password) with python and Selenium library with chromedriver you usually use the following code:
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=%s' % hostname + ":" + port)
driver = webdriver.Chrome(chrome_options=chrome_options)
It works fine unless proxy requires authentication. if the proxy requires you to log in with a username and password it will not work. In this case, you have to use more tricky solution that is explained below.
To set up proxy authentication we will generate a file and upload it to chromedriver dynamically using the following code below. This effectively create a chrome extension. This code configures selenium with chromedriver to use HTTP proxy that requires authentication with user/password pair.
import os
import zipfile
from selenium import webdriver
def create_chromedriver(PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS, USER_AGENT):
manifest_json = """
{
"version": "1.0.0",
"manifest_version": 2,
"name": "Chrome Proxy",
"permissions": [
"proxy",
"tabs",
"unlimitedStorage",
"storage",
"<all_urls>",
"webRequest",
"webRequestBlocking"
],
"background": {
"scripts": ["background.js"]
},
"minimum_chrome_version":"22.0.0"
}
"""
background_js = """
var config = {
mode: "fixed_servers",
rules: {
singleProxy: {
scheme: "http",
host: "%s",
port: parseInt(%s)
},
bypassList: ["localhost"]
}
};
chrome.proxy.settings.set({value: config, scope: "regular"}, function() {});
function callbackFn(details) {
return {
authCredentials: {
username: "%s",
password: "%s"
}
};
}
chrome.webRequest.onAuthRequired.addListener(
callbackFn,
{urls: ["<all_urls>"]},
['blocking']
);
""" % (PROXY_HOST, PROXY_PORT, PROXY_USER, PROXY_PASS)
def get_chromedriver(use_proxy=True, user_agent=USER_AGENT):
path = os.path.dirname(os.path.abspath(__file__))
chrome_options = webdriver.ChromeOptions()
if use_proxy:
pluginfile = 'proxy_auth_plugin.zip'
with zipfile.ZipFile(pluginfile, 'w') as zp:
zp.writestr("manifest.json", manifest_json)
zp.writestr("background.js", background_js)
chrome_options.add_extension(pluginfile)
if user_agent:
chrome_options.add_argument('--user-agent=%s' % USER_AGENT)
driver = webdriver.Chrome(
os.path.join(path, 'chromedriver'),
chrome_options=chrome_options)
return driver
driver = get_chromedriver(use_proxy=True)
# driver.get('https://www.google.com/search?q=my+ip+address')
driver.get('https://httpbin.org/ip')
user_agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36'
create_chromedriver('192.168.3.2', 8080, 'user', 'pass', user_agent)
Function get_chromedriver returns configured selenium web driver that you can use in your application. This code is tested and works just fine.
To implement this into your own code just call the function create_chromedriver with args of host, port, user, pass and user-agent. If you do not want to use proxy or user-agent just edit get_chromedriver accordingly so the args are false.
Related
Hoping an expert can help me with a Selenium/Cloudflare mystery. I can get a website to load in normal (non-headless) Selenium, but no matter what I try, I can't get it to load in headless.
I have followed the suggestions from the StackOverflow posts like Is there a version of Selenium WebDriver that is not detectable?. I've also looked at all the properties of window and window.navigator objects and fixed all the diffs between headless and non-headless, but somehow headless is still being detected. At this point I am extremely curious how Cloudflare could possibly figure out the difference. Thank you for the time!
List of the things I have tried:
User-agent
Replace cdc_ with another string in chromedriver
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled') (this was necessary to get website to load in non-headless)
Set navigator.webdriver = undefined
Set navigator.plugins, navigator.languages, and navigator.mimeTypes
Set window.ScreenY, window.screenTop, window.outerWidth, window.outerHeight to be nonzero
Set window.chrome and window.navigator.chrome
Set width and height of images to be nonzero
Set WebGL parameters
Fix Modernizr
Replicating the experiment
In order to get the website to load in normal (non-headless) Selenium, you have to follow a _blank link from another website (so that the target website opens in another tab). To replicate the experiment, first create an html file with the content link, and then paste the path to this html file in the following code.
The version below (non-headless) runs fine and loads the website, but if you set options.headless = True, it will get stuck on Cloudflare.
from selenium import webdriver
import time
# Replace this with the path to your html file
FULL_PATH_TO_HTML_FILE = 'file:///Users/simplepineapple/html/url_page.html'
def visit_website(browser):
browser.get(FULL_PATH_TO_HTML_FILE)
time.sleep(3)
links = browser.find_elements_by_xpath("//a[#href]")
links[0].click()
time.sleep(10)
# Switch webdriver focus to new tab so that we can extract html
tab_names = browser.window_handles
if len(tab_names) > 1:
browser.switch_to.window(tab_names[1])
time.sleep(1)
html = browser.page_source
print(html)
print()
print()
if 'Charts' in html:
print('Success')
else:
print('Fail')
time.sleep(10)
options = webdriver.ChromeOptions()
# If options.headless = True, the website will not load
options.headless = False
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument('user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36')
browser = webdriver.Chrome(options = options)
browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
"source": '''
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
Object.defineProperty(navigator, 'plugins', {
get: function() { return {"0":{"0":{}},"1":{"0":{}},"2":{"0":{},"1":{}}}; }
});
Object.defineProperty(navigator, 'languages', {
get: () => ["en-US", "en"]
});
Object.defineProperty(navigator, 'mimeTypes', {
get: function() { return {"0":{},"1":{},"2":{},"3":{}}; }
});
window.screenY=23;
window.screenTop=23;
window.outerWidth=1337;
window.outerHeight=825;
window.chrome =
{
app: {
isInstalled: false,
},
webstore: {
onInstallStageChanged: {},
onDownloadProgress: {},
},
runtime: {
PlatformOs: {
MAC: 'mac',
WIN: 'win',
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
OPENBSD: 'openbsd',
},
PlatformArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
PlatformNaclArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
RequestUpdateCheckStatus: {
THROTTLED: 'throttled',
NO_UPDATE: 'no_update',
UPDATE_AVAILABLE: 'update_available',
},
OnInstalledReason: {
INSTALL: 'install',
UPDATE: 'update',
CHROME_UPDATE: 'chrome_update',
SHARED_MODULE_UPDATE: 'shared_module_update',
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic',
},
},
};
window.navigator.chrome =
{
app: {
isInstalled: false,
},
webstore: {
onInstallStageChanged: {},
onDownloadProgress: {},
},
runtime: {
PlatformOs: {
MAC: 'mac',
WIN: 'win',
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
OPENBSD: 'openbsd',
},
PlatformArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
PlatformNaclArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
RequestUpdateCheckStatus: {
THROTTLED: 'throttled',
NO_UPDATE: 'no_update',
UPDATE_AVAILABLE: 'update_available',
},
OnInstalledReason: {
INSTALL: 'install',
UPDATE: 'update',
CHROME_UPDATE: 'chrome_update',
SHARED_MODULE_UPDATE: 'shared_module_update',
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic',
},
},
};
['height', 'width'].forEach(property => {
const imageDescriptor = Object.getOwnPropertyDescriptor(HTMLImageElement.prototype, property);
// redefine the property with a patched descriptor
Object.defineProperty(HTMLImageElement.prototype, property, {
...imageDescriptor,
get: function() {
// return an arbitrary non-zero dimension if the image failed to load
if (this.complete && this.naturalHeight == 0) {
return 20;
}
return imageDescriptor.get.apply(this);
},
});
});
const getParameter = WebGLRenderingContext.getParameter;
WebGLRenderingContext.prototype.getParameter = function(parameter) {
if (parameter === 37445) {
return 'Intel Open Source Technology Center';
}
if (parameter === 37446) {
return 'Mesa DRI Intel(R) Ivybridge Mobile ';
}
return getParameter(parameter);
};
const elementDescriptor = Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'offsetHeight');
Object.defineProperty(HTMLDivElement.prototype, 'offsetHeight', {
...elementDescriptor,
get: function() {
if (this.id === 'modernizr') {
return 1;
}
return elementDescriptor.get.apply(this);
},
});
'''
})
visit_website(browser)
browser.quit()
Using the latest Google Chrome v96.0 if you retrive the useragent
For the google-chrome browser the following user-agent is in use:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
Where as for google-chrome-headless browser the following user-agent is in use:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36
In majority of the cases the presence of the additional Headless string/parameter/attribute is intercepted as a bot and cloudflare blocks the access to the website.
Solution
There are different approaches to evade the Cloudflare detection even using Chrome in headless mode and some of the efficient approaches are as follows:
An efficient solution would be to use the undetected-chromedriver to initialize the Chrome Browsing Context. undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
Code Block:
import undetected_chromedriver as uc
from selenium import webdriver
options = webdriver.ChromeOptions()
options.headless = True
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
You can find a couple of relevant detailed discussions in:
Selenium app redirect to Cloudflare page when hosted on Heroku
Is there any possible ways to bypass cloudflare security checks?
The most efficient solution would be to use Selenium Stealth to initialize the Chrome Browsing Context. selenium-stealth is a python package to prevent detection. This programme tries to make python selenium more stealthy.
Code Block:
from selenium import webdriver
from selenium_stealth import stealth
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r"C:\path\to\chromedriver.exe")
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://bot.sannysoft.com/")
You can find a couple of relevant detailed discussions in:
Can a website detect when you are using Selenium with chromedriver?
How to automate login to a site which is detecting my attempts to login using
selenium-stealth
The cloudflare protection IUAM is used primary to avoid ddos attacks and for consequence it also protect sites from automation bot exploitation so no matter what you are using in the client side the cloudflare server is fingerprinting you.
After that they send to the client side the cf_clearance a cookie that allows you to connect for the next 15 minutes.
#undetected Selenium's answer works perfectly with https://github.com/diprajpatra/selenium-stealth
If you are using the latest version of selenium, you will need to change executable_path parameter as it's depreciated, example code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium_stealth import stealth
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s=Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://bot.sannysoft.com/")
print(driver.find_element(By.XPATH, "/html/body").text)
driver.close()
Hoping an expert can help me with a Selenium/Cloudflare mystery. I can get a website to load in normal (non-headless) Selenium, but no matter what I try, I can't get it to load in headless.
I have followed the suggestions from the StackOverflow posts like Is there a version of Selenium WebDriver that is not detectable?. I've also looked at all the properties of window and window.navigator objects and fixed all the diffs between headless and non-headless, but somehow headless is still being detected. At this point I am extremely curious how Cloudflare could possibly figure out the difference. Thank you for the time!
List of the things I have tried:
User-agent
Replace cdc_ with another string in chromedriver
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled') (this was necessary to get website to load in non-headless)
Set navigator.webdriver = undefined
Set navigator.plugins, navigator.languages, and navigator.mimeTypes
Set window.ScreenY, window.screenTop, window.outerWidth, window.outerHeight to be nonzero
Set window.chrome and window.navigator.chrome
Set width and height of images to be nonzero
Set WebGL parameters
Fix Modernizr
Replicating the experiment
In order to get the website to load in normal (non-headless) Selenium, you have to follow a _blank link from another website (so that the target website opens in another tab). To replicate the experiment, first create an html file with the content link, and then paste the path to this html file in the following code.
The version below (non-headless) runs fine and loads the website, but if you set options.headless = True, it will get stuck on Cloudflare.
from selenium import webdriver
import time
# Replace this with the path to your html file
FULL_PATH_TO_HTML_FILE = 'file:///Users/simplepineapple/html/url_page.html'
def visit_website(browser):
browser.get(FULL_PATH_TO_HTML_FILE)
time.sleep(3)
links = browser.find_elements_by_xpath("//a[#href]")
links[0].click()
time.sleep(10)
# Switch webdriver focus to new tab so that we can extract html
tab_names = browser.window_handles
if len(tab_names) > 1:
browser.switch_to.window(tab_names[1])
time.sleep(1)
html = browser.page_source
print(html)
print()
print()
if 'Charts' in html:
print('Success')
else:
print('Fail')
time.sleep(10)
options = webdriver.ChromeOptions()
# If options.headless = True, the website will not load
options.headless = False
options.add_argument("--window-size=1920,1080")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
options.add_argument('--disable-blink-features=AutomationControlled')
options.add_argument('user-agent=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36')
browser = webdriver.Chrome(options = options)
browser.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {
"source": '''
Object.defineProperty(navigator, 'webdriver', {
get: () => undefined
});
Object.defineProperty(navigator, 'plugins', {
get: function() { return {"0":{"0":{}},"1":{"0":{}},"2":{"0":{},"1":{}}}; }
});
Object.defineProperty(navigator, 'languages', {
get: () => ["en-US", "en"]
});
Object.defineProperty(navigator, 'mimeTypes', {
get: function() { return {"0":{},"1":{},"2":{},"3":{}}; }
});
window.screenY=23;
window.screenTop=23;
window.outerWidth=1337;
window.outerHeight=825;
window.chrome =
{
app: {
isInstalled: false,
},
webstore: {
onInstallStageChanged: {},
onDownloadProgress: {},
},
runtime: {
PlatformOs: {
MAC: 'mac',
WIN: 'win',
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
OPENBSD: 'openbsd',
},
PlatformArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
PlatformNaclArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
RequestUpdateCheckStatus: {
THROTTLED: 'throttled',
NO_UPDATE: 'no_update',
UPDATE_AVAILABLE: 'update_available',
},
OnInstalledReason: {
INSTALL: 'install',
UPDATE: 'update',
CHROME_UPDATE: 'chrome_update',
SHARED_MODULE_UPDATE: 'shared_module_update',
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic',
},
},
};
window.navigator.chrome =
{
app: {
isInstalled: false,
},
webstore: {
onInstallStageChanged: {},
onDownloadProgress: {},
},
runtime: {
PlatformOs: {
MAC: 'mac',
WIN: 'win',
ANDROID: 'android',
CROS: 'cros',
LINUX: 'linux',
OPENBSD: 'openbsd',
},
PlatformArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
PlatformNaclArch: {
ARM: 'arm',
X86_32: 'x86-32',
X86_64: 'x86-64',
},
RequestUpdateCheckStatus: {
THROTTLED: 'throttled',
NO_UPDATE: 'no_update',
UPDATE_AVAILABLE: 'update_available',
},
OnInstalledReason: {
INSTALL: 'install',
UPDATE: 'update',
CHROME_UPDATE: 'chrome_update',
SHARED_MODULE_UPDATE: 'shared_module_update',
},
OnRestartRequiredReason: {
APP_UPDATE: 'app_update',
OS_UPDATE: 'os_update',
PERIODIC: 'periodic',
},
},
};
['height', 'width'].forEach(property => {
const imageDescriptor = Object.getOwnPropertyDescriptor(HTMLImageElement.prototype, property);
// redefine the property with a patched descriptor
Object.defineProperty(HTMLImageElement.prototype, property, {
...imageDescriptor,
get: function() {
// return an arbitrary non-zero dimension if the image failed to load
if (this.complete && this.naturalHeight == 0) {
return 20;
}
return imageDescriptor.get.apply(this);
},
});
});
const getParameter = WebGLRenderingContext.getParameter;
WebGLRenderingContext.prototype.getParameter = function(parameter) {
if (parameter === 37445) {
return 'Intel Open Source Technology Center';
}
if (parameter === 37446) {
return 'Mesa DRI Intel(R) Ivybridge Mobile ';
}
return getParameter(parameter);
};
const elementDescriptor = Object.getOwnPropertyDescriptor(HTMLElement.prototype, 'offsetHeight');
Object.defineProperty(HTMLDivElement.prototype, 'offsetHeight', {
...elementDescriptor,
get: function() {
if (this.id === 'modernizr') {
return 1;
}
return elementDescriptor.get.apply(this);
},
});
'''
})
visit_website(browser)
browser.quit()
Using the latest Google Chrome v96.0 if you retrive the useragent
For the google-chrome browser the following user-agent is in use:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
Where as for google-chrome-headless browser the following user-agent is in use:
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/96.0.4664.110 Safari/537.36
In majority of the cases the presence of the additional Headless string/parameter/attribute is intercepted as a bot and cloudflare blocks the access to the website.
Solution
There are different approaches to evade the Cloudflare detection even using Chrome in headless mode and some of the efficient approaches are as follows:
An efficient solution would be to use the undetected-chromedriver to initialize the Chrome Browsing Context. undetected-chromedriver is an optimized Selenium Chromedriver patch which does not trigger anti-bot services like Distill Network / Imperva / DataDome / Botprotect.io. It automatically downloads the driver binary and patches it.
Code Block:
import undetected_chromedriver as uc
from selenium import webdriver
options = webdriver.ChromeOptions()
options.headless = True
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = uc.Chrome(options=options)
driver.get('https://bet365.com')
You can find a couple of relevant detailed discussions in:
Selenium app redirect to Cloudflare page when hosted on Heroku
Is there any possible ways to bypass cloudflare security checks?
The most efficient solution would be to use Selenium Stealth to initialize the Chrome Browsing Context. selenium-stealth is a python package to prevent detection. This programme tries to make python selenium more stealthy.
Code Block:
from selenium import webdriver
from selenium_stealth import stealth
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r"C:\path\to\chromedriver.exe")
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://bot.sannysoft.com/")
You can find a couple of relevant detailed discussions in:
Can a website detect when you are using Selenium with chromedriver?
How to automate login to a site which is detecting my attempts to login using
selenium-stealth
The cloudflare protection IUAM is used primary to avoid ddos attacks and for consequence it also protect sites from automation bot exploitation so no matter what you are using in the client side the cloudflare server is fingerprinting you.
After that they send to the client side the cf_clearance a cookie that allows you to connect for the next 15 minutes.
#undetected Selenium's answer works perfectly with https://github.com/diprajpatra/selenium-stealth
If you are using the latest version of selenium, you will need to change executable_path parameter as it's depreciated, example code:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
from selenium_stealth import stealth
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--headless")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
s=Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=s, options=options)
stealth(driver,
languages=["en-US", "en"],
vendor="Google Inc.",
platform="Win32",
webgl_vendor="Intel Inc.",
renderer="Intel Iris OpenGL Engine",
fix_hairline=True,
)
driver.get("https://bot.sannysoft.com/")
print(driver.find_element(By.XPATH, "/html/body").text)
driver.close()
I am creating a script that logs onto Discord by using a token.
This is my code:
import requests
headers = {
'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
with requests.Session() as s:
url = 'https://discord.com/login'
r = s.get(url, headers=headers)
print(r.content)
In order to actually login using a token I need to run the following code in the browser's developer console:
function login(token) {
setInterval(() => {
document.body.appendChild(document.createElement `iframe`).contentWindow.localStorage.token = `"${token}"`
}, 50);
setTimeout(() => {
location.reload();
}, 2500);
}
login ("[enter your token here]")
I know that this is possible in selenium but I really need to do it in requests.
<canvas class="word-cloud-canvas" id="word-cloud-canvas-1892" height="270" width="320"></canvas>
How to get the URL source of an image from HTML5 canvas using Selenium Python?
I tried to use
driver.execute_script("return arguments[0].toDataURL('image/png');", canvasElement)
But it only return the binary? of the image.
I don't want to save the image, but get the URL of the image. Is it possible?
I faced a similar issue and the only alternative I could find is to use subprocess and phantomjs
Here is the Python code
import json, subprocess
output = check_output(['phantomjs', 'getResources.js', main_url])
urls = json.loads(output)
for url in urls:
#filter and process URLs
and the Javascript file content
// getResources.js
// Usage:
// phantomjs getResources.js your_url
var page = require('webpage').create();
var system = require('system');
var urls = Array();
page.onResourceRequested = function(request, networkRequest) {
urls.push(request.url)
};
page.onLoadFinished = function(status) {
setTimeout(function() {
console.log(JSON.stringify(urls));
phantom.exit();
}, 16000);
};
page.onResourceError = function() {
return false;
}
page.onError = function() {
return false;
}
page.open(system.args[1]);
PhantomJS supports various options as well; for example to change the user agent you can use something like this:
page.settings.userAgent = 'Mozilla/5.0 (Windows NT 6.1; WOW64) ...';
This is a simplified version of this answer which I used for my issue.
I'm using my own resolver and would like to use urllib2 to just connect to the IP (no resolving in urllib2) and I would like set the HTTP Host-header myself. But urllib2 is just ignoring my Host-header:
txheaders = { 'User-Agent': UA, "Host: ": nohttp_url }
robots = urllib2.Request("http://" + ip + "/robots.txt", txdata, txheaders)
You have included ": " in the "Host" string.
txheaders = { "User-Agent": UA, "Host": nohttp_url }
robots = urllib2.Request("http://" + ip + "/robots.txt", txdata, txheaders)