This is my setup:
Raspberry Pi 3 Model B Plus Rev 1.3
Linux 4.19.66-v7+ (Raspbian GNU/Linux 9 (stretch))
Selenium 3.141.0
Browsermob-Proxy 2.1.4
Chromium 72.0.3626.121
ChromeDriver 72.0.3626.121
Python 3.5.3
I would like to record the network traffic when I visit an HTTPS page. So far, it actually works quite well. The problem is that the response bodies BrowserMob Proxy records are encrypted.
Here is my code:
import pprint
import time
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from browsermobproxy import Server
# Source: https://github.com/ArturSpirin/YouTube-WebDriver-Tutorials/blob/master/proxy/BmpProxy.py
class ProxyManger:
    __BMP = "/usr/local/bin/browsermob-proxy-2.1.4/bin/browsermob-proxy"

    def __init__(self):
        self.__server = Server(ProxyManger.__BMP, options={'port': 8089})
        self.__client = None

    def start_server(self):
        self.__server.start()
        return self.__server

    def start_client(self):
        self.__client = self.__server.create_proxy(params={"trustAllServers": "true"})
        return self.__client

    @property
    def client(self):
        return self.__client

    @property
    def server(self):
        return self.__server
# set up virtual display
display = Display(visible=0, size=(800, 600))
display.start()

# set up browsermob-proxy
proxy = ProxyManger()
server = proxy.start_server()
client = proxy.start_client()

url = "https://example.com"  # the page whose traffic should be recorded
client.new_har(url)
# set chrome options
opts = webdriver.ChromeOptions()
opts.add_argument("--proxy-server={}".format(client.proxy))
opts.add_argument("--disable-dev-shm-usage")
opts.add_argument("--no-sandbox")
opts.add_argument("--ignore-certificate-errors")
browser = webdriver.Chrome(options=opts)
browser.get(url)
time.sleep(10)
pprint.pprint(client.har)
browser.quit()
server.stop()
display.stop()
The code works quite well so far; I receive the entries I want.
The problem is the encrypted content. It is clear to me that BrowserMob Proxy acts as a MITM and cannot read the contents of these responses due to the end-to-end encryption.
...
'content': {'comment': '',
'encoding': 'base64',
'mimeType': 'application/json',
'size': 10493,
'text': 'IUQHACBHdln10z6SWSgCD9DkLZ0OUL9H9+NwllhRXLaI+7nOI023mVdkr5uCJV115AeolXUwyJUgklGU8z/0tYu/n/iuQCnAQJIG8JwmwaOcwRRLTheZ8abRSDFM/gQTqc6nP03QiSiJ/ZuxVZTkH/6SKKpir/SsMAt5+RMiPU+eJ3fN+U8JBjguGdWoNCGCrSqOw9gBeKORKcY4Ek014310aXl3BUqBnJ01VqPyeaJQasKY1hxRkkYTfFGAefuYQ5pbF1588ghm1VDPrdoKB1lERMVl/j0Y2HWEt+tbdHYe3t9fCrtSN+5Nq++ejmp/pg9UUuyVF8FlWvJiA6YB'},
...
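For comparison, when the proxy's CA certificate is trusted, that `text` field base64-decodes to readable JSON rather than ciphertext. A minimal stdlib sketch with made-up sample data (not real captured traffic):

```python
import base64
import json

# A HAR 'content' object of the shape shown above (sample data, not real traffic)
content = {
    "encoding": "base64",
    "mimeType": "application/json",
    "text": base64.b64encode(b'{"status": "ok"}').decode("ascii"),
}

# Decode the body; with a trusted MITM certificate this yields plaintext,
# without it you only recover ciphertext bytes like those in the question.
body = base64.b64decode(content["text"])
if content["mimeType"] == "application/json":
    data = json.loads(body)
    print(data)  # {'status': 'ok'}
```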
I run the Raspberry Pi headless. That means I only have access via SSH and no X server. According to the GitHub page of BrowserMob Proxy, it is possible to add its certificate to my browser.
According to some internet research, this is usually done in Chrome via the GUI.
After doing some more research, I found this:
https://github.com/ThomasLeister/root-certificate-deployment
I ran linux-browser-import.sh, but unfortunately it had no effect.
Where is my mistake? Does anyone have a solution to my problem? How can I read the decrypted contents of an SSL connection?
Is there any other known method for reading XHR payloads?
Thanks,
Mike
Related
My test environment is behind a corporate proxy ("proxy.ptbc.std.com:2538"). I want to open a particular YouTube video for a period of time (e.g. 200 seconds) and capture the HAR file for each visit; the process is repeated several times for a large-scale test. I have tried different examples found here, but the Firefox/Chrome browsers do not connect to the internet because they are behind the proxy.
How can I run "python-selenium + browsermobproxy" behind a corporate proxy and capture the HAR file for each instance?
Example code:
from browsermobproxy import Server
server = Server("C:\\Utility\\browsermob-proxy-2.1.4\\bin\\browsermob-proxy")
server.start()
proxy = server.create_proxy()
from selenium import webdriver
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("google")
driver.get("http://www.google.co.in")
proxy.har # returns a HAR JSON blob
server.stop()
driver.quit()
Any help would be appreciated
According to browsermob-proxy documentation:
Sometimes you will want to route requests through an upstream proxy
server. In this case specify your proxy server by adding the httpProxy
parameter to your create proxy request:
[~]$ curl -X POST http://localhost:8080/proxy?httpProxy=yourproxyserver.com:8080
{"port":8081}
According to the source code of the browsermob-proxy Python API:
def create_proxy(self, params=None):
    """
    Gets a client class that allow to set all the proxy details that you
    may need to.

    :param dict params: Dictionary where you can specify params
        like httpProxy and httpsProxy
    """
    params = params if params is not None else {}
    client = Client(self.url[7:], params)
    return client
So all you need to do is specify params in create_proxy, depending on which proxy you use (HTTP or HTTPS):
from browsermobproxy import Server
from selenium import webdriver
import json
server = Server("C:\\Utility\\browsermob-proxy-2.1.4\\bin\\browsermob-proxy")
server.start()
# httpProxy or httpsProxy
proxy = server.create_proxy(params={'httpProxy': 'proxy.ptbc.std.com:2538'})
profile = webdriver.FirefoxProfile()
profile.set_proxy(proxy.selenium_proxy())
driver = webdriver.Firefox(firefox_profile=profile)
proxy.new_har("google")
driver.get("http://www.google.co.in")
result = json.dumps(proxy.har, ensure_ascii=False)
print(result)
server.stop()
driver.quit()
Here's the situation:
I have a .pac URL as a proxy. In Ubuntu, the proxy can be used by setting the network proxy to automatic mode and filling in the .pac URL as the Configuration URL.
When I use Python to crawl Google Images, plain requests to Google don't work. So I use Selenium's Chrome WebDriver to simulate a user's mouse & keyboard actions, and that works.
Then I add the '--headless' argument to increase concurrency, and I get a TimeoutException.
Then I downloaded the .pac file and tried "options.add_argument('--proxy-pac-url=xxx.pac')" to solve this problem, but the proxy still doesn't work.
Then I found a possible solution: a Chrome extension called 'SwitchyOmega' that can apply a .pac file proxy.
When I download the latest release from GitHub and use "options.add_extension('xxx/SwitchyOmega_Chromium.crx')" to load the extension, I get: "from unknown error: CRX verification failed: 3".
Finally, I configured SwitchyOmega in Chrome, used the developer tools to pack the local extension files into a .crx, and the extension loaded correctly in WebDriver. But the extension turned out to be unconfigured.
So how can I fix this proxy problem? Thanks!
Here is my code:
import time

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support.expected_conditions import element_to_be_clickable


class GoogleCrawler:
    def __init__(self):
        driver_executable = self.get_driver_executable()
        options = webdriver.ChromeOptions()
        options.add_argument('blink-settings=imagesEnabled=false')
        # options.add_argument('--headless')
        # options.add_argument('--proxy-pac-url=./xxx.pac')
        # options.add_extension('./SwitchyOmega_Chromium.crx')
        self.browser = webdriver.Chrome(driver_executable,
                                        chrome_options=options)
        self.driver_version_check()

    def get_google_image_urls(self, keyword):
        self.browser.get(f'https://www.google.com/search?q={keyword}&tbm=isch')
        time.sleep(2)

        img_urls = []
        first_thumbnail_image_xpath = '//div[@data-ri="0"]'
        image_xpath = ('//div[@class="irc_c i8187 immersive-container"]'
                       '//img[@class="irc_mi"]')

        body_element = self.browser.find_element_by_tag_name('body')
        wait = WebDriverWait(self.browser, 15)
        first_thumbnail_image = wait.until(
            element_to_be_clickable((By.XPATH, first_thumbnail_image_xpath)))
        first_thumbnail_image.click()

        scroll_flag = 0
        last_scroll_distance = 0
        while scroll_flag <= 50:
            image_elements = self.browser.find_elements(By.XPATH, image_xpath)
            img_urls.extend([
                image_element.get_attribute('src')
                for image_element in image_elements
            ])
            body_element.send_keys(Keys.RIGHT)
            scroll_distance = self.browser.execute_script(
                'return window.pageYOffset;')
            if scroll_distance == last_scroll_distance:
                scroll_flag += 1
            else:
                last_scroll_distance = scroll_distance
                scroll_flag = 0

        self.browser.close()
        img_urls = set(img_urls)
        print(
            f'[INFO]Scraping Image urls DONE: Keyword: {keyword}, Total: {len(img_urls)}'
        )
        return keyword, img_urls
Since headless Chrome doesn't support PAC files, and since it doesn't support Chrome extensions, I don't think there is a way to make this work with PAC files for you.
Can you run your own proxy, with the routing logic in that proxy, and pass it to Chrome via the --proxy-server flag?
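As a sketch of what "logic in that proxy" could mean: the PAC file's FindProxyForURL() decision can be re-implemented inside your own proxy. The rules, host names, and proxy address below are hypothetical placeholders, not taken from any real PAC file:

```python
from fnmatch import fnmatch

# Hypothetical translation of typical PAC-file rules into Python:
# direct connection for internal hosts, corporate proxy for everything else.
RULES = [
    ("*.internal.example", "DIRECT"),
    ("10.*", "DIRECT"),
    ("*", "PROXY proxy.example.com:3128"),
]


def find_proxy_for_url(host):
    """Mimic a PAC FindProxyForURL() decision for a given host."""
    for pattern, decision in RULES:
        if fnmatch(host, pattern):
            return decision
    return "DIRECT"


print(find_proxy_for_url("intranet.internal.example"))  # DIRECT
print(find_proxy_for_url("www.google.com"))             # PROXY proxy.example.com:3128
```

A proxy built around a decision function like this can then be handed to Chrome as `--proxy-server=host:port`, which headless mode does honor.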
I will start by describing the infrastructure I am working within. It contains multiple proxy servers behind a load balancer that forwards user authentications to the appropriate proxy; the proxies are directly tied to an Active Directory. The authentication uses the credentials and source IP that were used to log into the computer the request comes from. The server caches the IP and credentials for 60 minutes. I am using a test account specifically for this process, and it is only used on the unit testing server.
I am working on some automation with Selenium WebDriver on a remote server using a Docker container, with Python as the scripting language. I am trying to run tests on both internal and external webpages/applications. I was able to get a basic test working on an internal website with the following script:
Note: 10.1.54.118 is the server hosting the docker container with the selenium web driver
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
browser = webdriver.Remote(command_executor='http://10.1.54.118:4444/wd/hub', desired_capabilities=DesiredCapabilities.CHROME)
browser.get("http://10.0.0.2")
print (browser.find_element_by_tag_name('body').text)
bodyText = browser.find_element_by_tag_name('body').text
print (bodyText)
if 'Hello' in bodyText:
print ('Found hello in body')
else:
print ('Hello not found in body')
browser.quit()
The script is able to access the internal webpage and print all the text on it.
However, I am experiencing problems trying to run test scripts against external websites.
I have tried the following articles and tutorials and it doesn't seem to work for me.
The articles and tutorials I have tried:
https://www.seleniumhq.org/docs/04_webdriver_advanced.jsp
Pass driver ChromeOptions and DesiredCapabilities?
https://www.programcreek.com/python/example/100023/selenium.webdriver.Remote
https://github.com/webdriverio/webdriverio/issues/324
https://www.programcreek.com/python/example/96010/selenium.webdriver.common.desired_capabilities.DesiredCapabilities.CHROME
Running Selenium Webdriver with a proxy in Python
how do i set proxy for chrome in python webdriver
https://docs.proxymesh.com/article/4-python-proxy-configuration
I have tried creating 4 versions of a script to access an external site, e.g. google.com, and simply print the text off of it. Every script returns a timeout error. I apologize for posting a lot of code, but maybe the community is able to see where I am going wrong with the coding aspect.
Code 1:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
PROXY = "10.32.51.169:3128" # IP:PORT or HOST:PORT
desired_capabilities = webdriver.DesiredCapabilities.CHROME.copy()
desired_capabilities['proxy'] = {
"httpProxy":PROXY,
"ftpProxy":PROXY,
"sslProxy":PROXY,
"socksUsername":"myusername",
"socksPassword":"mypassword",
"noProxy":None,
"proxyType":"MANUAL",
"class":"org.openqa.selenium.Proxy",
"autodetect":False
}
browser = webdriver.Remote('http://10.1.54.118:4444/wd/hub', desired_capabilities)
browser.get("https://www.google.com/")
print (browser.find_element_by_tag_name('body').text)
bodyText = browser.find_element_by_tag_name('body').text
print (bodyText)
if 'Hello' in bodyText:
print ('Found hello in body')
else:
print ('Hello not found in body')
browser.quit()
Is my code incorrect in any way? Am I able to pass configuration parameters to the dockerized Chrome Selenium WebDriver, or do I need to build the Docker container with the proxy settings preconfigured? I look forward to your replies and any help that can point me in the right direction.
A little late on this one, but a couple ideas + improvements:
Remove the user/pass from the socks proxy config and add them to your Proxy connection uri.
Use the selenium Proxy object to help abstract some of the other bits of the proxy capability.
Add the scheme to the proxy connection string.
Use a try/finally block to make sure the browser quits despite any failures
Note... I'm using Python3, selenium version 3.141.0, and I'm leaving out the FTP config for brevity/simplicity:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
from selenium.webdriver.common.proxy import Proxy
# Note the addition of the scheme (http) and the user/pass into the connection string.
PROXY = 'http://myusername:mypassword@10.32.51.169:3128'
# Use the selenium Proxy object to add proxy capabilities
proxy_config = {'httpProxy': PROXY, 'sslProxy': PROXY}
proxy_object = Proxy(raw=proxy_config)
capabilities = DesiredCapabilities.CHROME.copy()
proxy_object.add_to_capabilities(capabilities)
browser = webdriver.Remote('http://10.1.54.118:4444/wd/hub', desired_capabilities=capabilities)
# Use try/finally so the browser quits even if there is an exception
try:
browser.get("https://www.google.com/")
print(browser.find_element_by_tag_name('body').text)
bodyText = browser.find_element_by_tag_name('body').text
print(bodyText)
if 'Hello' in bodyText:
print('Found hello in body')
else:
print('Hello not found in body')
finally:
browser.quit()
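As a side note on points 1 and 3, standard URI parsing shows why the scheme and credentials belong in the connection string: they split out cleanly into their components (stdlib only, with the same hypothetical credentials as above):

```python
from urllib.parse import urlparse

# Same shape as the PROXY value above (hypothetical credentials and address)
proxy_uri = "http://myusername:mypassword@10.32.51.169:3128"

# urlparse() separates scheme, userinfo, host, and port for us
parts = urlparse(proxy_uri)
print(parts.scheme)    # http
print(parts.username)  # myusername
print(parts.password)  # mypassword
print(parts.hostname)  # 10.32.51.169
print(parts.port)      # 3128
```

Without the scheme, `urlparse` (and many proxy consumers) would treat the whole string as a path, which is one common reason a bare `host:port` string with embedded credentials fails.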
Now I am working on tests in Python. I am using BrowserMob Proxy and Selenium to capture HTTP requests.
robot_globals = {'proxy': None, 'selenium': None}

def robot_setup():
    server = Server(settings.BROWSERMOB_PROXY_PATH, options={'port': settings.BROWSERMOB_PROXY_PORT})
    server.start()
    proxy = server.create_proxy()
    proxy.selenium_proxy()
    if settings.BROWSER_TO_TEST == 'FIREFOX':
        from selenium.webdriver.firefox.webdriver import WebDriver
        selenium = WebDriver(proxy=proxy, timeout=10)
    elif settings.BROWSER_TO_TEST == 'CHROME':
        from selenium.webdriver.chrome.webdriver import WebDriver
        from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
        capabilities = DesiredCapabilities.CHROME
        proxy.add_to_capabilities(capabilities)
        selenium = WebDriver(executable_path=settings.CHROME_DRIVER_PATH,
                             desired_capabilities=capabilities,
                             service_log_path=settings.DRIVER_LOG_PATH)
    selenium.maximize_window()
    return (selenium, proxy)

class BaseGATestCase(unittest.TestCase):
    def setUp(self):
        self.proxy = robot_globals['proxy']
        self.selenium = robot_globals['selenium']
    ....

class TestHomePage(BaseGATestCase):
    def test_01_homepage_utme_vars(self):
        self.proxy.new_har('home_page')
        self.selenium.get('%s%s' % (settings.SERVER_URL_TO_TEST, '/'))
This code usually works correctly. But once or twice a month the system launches the browser but does not load the URL in it. The browser just waits, and the page never loads at all.
However, the browser can load the page without self.proxy.new_har('..'). Code like this works:
class TestHomePage(BaseGATestCase):
    def test_01_homepage_utme_vars(self):
        self.selenium.get('%s%s' % (settings.SERVER_URL_TO_TEST, '/'))
server.log:
INFO 10/09 03:09:18 n.l.b.p.j.h.HttpSer~ - Version Jetty/5.1.x
INFO 10/09 03:09:18 n.l.b.p.j.u.Contain~ - Started HttpContext[/,/]
INFO 10/09 03:09:18 n.l.b.p.j.h.SocketL~ - Started SocketListener on 0.0.0.0:9159
INFO 10/09 03:09:18 n.l.b.p.j.u.Contain~ - Started net.lightbody.bmp.proxy.jetty.jetty.Server#6a1192e9
INFO 10/09 03:10:25 n.l.b.p.j.u.Threade~ - Stopping Acceptor ServerSocket[addr=0.0.0.0/0.0.0.0,localport=9154]
It is really weird to me: the last time this happened I couldn't fix the problem, but it fixed itself the next day, and I do not understand why. Now I have the same problem again. It would be great if anyone knows how I can fix it. Thanks!
I had a test script that was working, and it stopped working 2 weeks ago. The test is to log in to Hotmail, click on new mail, fill in the email address, subject, and text in the body, and send the email. Currently I can't enter text into the body of the mail. I tried with ID, CSS, and XPath. I also tried switching to the frame, but to no avail. I have attached the Python code and would appreciate help...
The aim of the script is to capture the traffic via Wireshark specifically for Hotmail send mail, with the current Hotmail protocol.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select
from selenium.common.exceptions import NoSuchElementException
import unittest, time, re
class HotmailloginpythonWebdriver(unittest.TestCase):
    def setUp(self):
        self.driver = webdriver.Firefox()
        self.driver.implicitly_wait(30)
        self.base_url = "https://login.live.com/"
        self.verificationErrors = []

    def test_hotmailloginpython_webdriver(self):
        driver = self.driver
        driver.get(self.base_url + "/login.srf?wa=wsignin1.0&rpsnv=11&ct=1321965448&rver=6.1.6206.0&wp=MBI&wreply=http:%2F%2Fmail.live.com%2Fdefault.aspx&lc=1033&id=64855&mkt=en-us&cbcxt=mai&snsc=1")
        driver.find_element_by_id("i0116").clear()
        driver.find_element_by_id("i0116").send_keys("address@hotmail.com")
        driver.find_element_by_id("i0118").clear()
        driver.find_element_by_id("i0118").send_keys("password")
        driver.find_element_by_id("idSIButton9").click()
        driver.find_element_by_id("h_inboxCount").click()
        driver.find_element_by_id("NewMessage").click()
        driver.find_element_by_id("AutoCompleteTo$InputBox").clear()
        driver.find_element_by_id("AutoCompleteTo$InputBox").send_keys("address@hotmail.com")
        driver.find_element_by_id("fSubject").clear()
        driver.find_element_by_id("fSubject").send_keys("testsubject")
        driver.find_element_by_css_selector("body..RichText").clear()
        driver.find_element_by_css_selector("body..RichText").send_keys("gggggggggggg")
        driver.find_element_by_id("SendMessage").click()
        driver.find_element_by_id("c_signout").click()

    def is_element_present(self, how, what):
        try:
            self.driver.find_element(by=how, value=what)
        except NoSuchElementException:
            return False
        return True

    def tearDown(self):
        self.driver.quit()
        self.assertEqual([], self.verificationErrors)

if __name__ == "__main__":
    unittest.main()
It is very much possible that Microsoft is blocking automated services (like Selenium) that try to access the Hotmail or live.com page. According to the Terms of Service (TOS) at Microsoft, you may not use automated services to log in etc. Here is what the TOS (point #2) says:
You must not use the service to harm others or the service. For example, you must not use the service to harm, threaten, or harass another person, organization, or Microsoft. You must not: damage, disable, overburden, or impair the service (or any network connected to the service); resell or redistribute the service or any part of it; use any unauthorized means to modify, reroute, or gain access to the service or attempt to carry out these activities; or use any automated process or service (such as a bot, a spider, periodic caching of information stored by Microsoft, or metasearching) to access or use the service.
Full text is available here: http://windows.microsoft.com/en-US/windows-live/microsoft-service-agreement.
I had a similar experience myself once while testing something with the Twitter UI. Maybe you can look for a third-party service that lets you log in via SMTP or POP3 etc. to measure the network traffic, instead of going through the frontend UI.
I suspect this has something to do with cookies. Maybe you removed the cookies from your browser?
Try debugging the script up to the password entry, or up to
driver.find_element_by_id("idSIButton9").click()
to see if that part works fine. Perhaps MS changed their UI, so it would be wise to debug your app from that point to see whether you have to update your script's object IDs.
Regards.
Try to use XPath, not IDs. In XPath you can use following-sibling. It will work.
System.setProperty("webdriver.chrome.driver",
        "F:\\batch230\\chromedriver.exe");
WebDriver driver = new ChromeDriver();

// open hotmail site
driver.get("http://www.hotmail.com/");
Thread.sleep(5000);
driver.manage().window().maximize();
Thread.sleep(5000);

// do login
driver.switchTo().activeElement().sendKeys("mail id");
driver.findElement(By.id("idSIButton9")).click();
Thread.sleep(5000);
driver.switchTo().activeElement().sendKeys("password");
driver.findElement(By.id("idSIButton9")).click();
Thread.sleep(5000);

// compose mail
driver.findElement(By.xpath("//*[contains(@title,'new message')]")).click();
Thread.sleep(5000);
driver.findElement(By.xpath("(//*[@role='textbox'])[1]"))
        .sendKeys("er.anil900@gmail.com", Keys.TAB, "selenium",
                Keys.TAB, "Hi", Keys.ENTER, "How are you");
Thread.sleep(5000);

// send mail
driver.findElement(By.xpath("(//*[@title='Send'])[1]")).click();
Thread.sleep(10000);

// do logout
WebElement e = driver.findElement(By.xpath("(//*[@role='menuitem'])[11]"));
Actions a = new Actions(driver);
a.click(e).build().perform();
Thread.sleep(5000);
WebElement e1 = driver.findElement(By.xpath("//*[text()='Sign out']"));
a.click(e1).build().perform();
Thread.sleep(10000);
driver.close();