I'm trying to test a sample in Selenium using Python. My Internet connection goes through a proxy server that requires authentication. When I run the following code:
from selenium import webdriver

if __name__ == '__main__':
    proxy = "some_IP"
    port = 3128
    fp = webdriver.FirefoxProfile()
    fp.set_preference('network.proxy.ssl_port', int(port))
    fp.set_preference('network.proxy.ssl', proxy)
    fp.set_preference('network.proxy.http_port', int(port))
    fp.set_preference('network.proxy.http', proxy)
    fp.set_preference('network.proxy.ftp', proxy)
    fp.set_preference('network.proxy.ftp_port', int(port))
    fp.set_preference('network.proxy.socks', proxy)
    fp.set_preference('network.proxy.socks_port', int(port))
    fp.set_preference('network.proxy.type', 1)
    browser = webdriver.Firefox(firefox_profile=fp)
    browser.set_page_load_timeout(15)
    browser.get('http://www.google.com')
    print browser.title
Firefox opens without any problem, its proxy configuration is all correct, and even the authentication pop-up appears. If I authenticate, I can navigate without any problem. The problem is that, behind the scenes, I get the following errors:
Traceback (most recent call last):
  File "D:/_Vkt0r/iStuffs/Jobs/Projects/test-proxy/test.py", line 25, in <module>
    browser = webdriver.Firefox(firefox_profile=fp)
  File "D:\_Vkt0r\iStuffs\Jobs\Projects\test-proxy\selenium\webdriver\firefox\webdriver.py", line 62, in __init__
    desired_capabilities=capabilities)
  File "D:\_Vkt0r\iStuffs\Jobs\Projects\test-proxy\selenium\webdriver\remote\webdriver.py", line 72, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "D:\_Vkt0r\iStuffs\Jobs\Projects\test-proxy\selenium\webdriver\remote\webdriver.py", line 114, in start_session
    'desiredCapabilities': desired_capabilities,
  File "D:\_Vkt0r\iStuffs\Jobs\Projects\test-proxy\selenium\webdriver\remote\webdriver.py", line 165, in execute
    self.error_handler.check_response(response)
  File "D:\_Vkt0r\iStuffs\Jobs\Projects\test-proxy\selenium\webdriver\remote\errorhandler.py", line 136, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: '<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html><head> <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> <title>ERROR: Acceso Denegado a la Cach\xc3\xa9</title> <style type="text/css"><!-- /*\n Stylesheet for Squid Error pages\n Adapted from design by Free CSS Templates\n http://www.freecsstemplates.org\n Released for free under a Creative Commons Attribution 2.5 License\n*/\n\n/* Page basics */\n* {\n\tfont-family: verdana, sans-serif;\n}\n\nhtml body {\n\tmargin: 0;\n\tpadding: 0;\n\tbackground: #efefef;\n\tfont-size: 12px;\n\tcolor: #1e1e1e;\n}\n\n/* Page displayed title area */\n#titles {\n\tmargin-left: 15px;\n\tpadding: 10px;\n\tpadding-left: 100px;\n\tbackground: url(\'http://www.squid-cache.org/Artwork/SN.png\') no-repeat left;\n}\n\n/* initial title */\n#titles h1 {\n\tcolor: #000000;\n}\n#titles h2 {\n\tcolor: #000000;\n}\n\n/* special event: FTP success page titles */\n#titles ftpsuccess {\n\tbackground-color:#00ff00;\n\twidth:100%;\n}\n\n/* Page displayed body content area */\n#content {\n\tpadding: 10px;\n\tbackground: #ffffff;\n}\n\n/* General text */\np {\n}\n\n/* error brief description */\n#error p {\n}\n\n/* some data which may have caused the problem */\n#data {\n}\n\n/* the error message received from the system or other software */\n#sysmsg {\n}\n\npre {\n font-family:sans-serif;\n}\n\n/* special event: FTP / Gopher directory listing */\n#dirmsg {\n font-family: courier;\n color: black;\n font-size: 10pt;\n}\n#dirlisting {\n margin-left: 2%;\n margin-right: 2%;\n}\n#dirlisting tr.entry td.icon,td.filename,td.size,td.date {\n border-bottom: groove;\n}\n#dirlisting td.size {\n width: 50px;\n text-align: right;\n padding-right: 5px;\n}\n\n/* horizontal lines */\nhr {\n\tmargin: 0;\n}\n\n/* page displayed footer area */\n#footer {\n\tfont-size: 9px;\n\tpadding-left: 10px;\n}\n body :lang(fa) { direction: rtl; font-size: 100%; 
font-family: Tahoma, Roya, sans-serif; float: right; } :lang(he) { direction: rtl; } --></style> </head><body id=ERR_CACHE_ACCESS_DENIED> <div id="titles"> <h1>ERROR</h1> <h2>Cache Acceso Denegado</h2> </div> <hr> <div id="content"> <p>Se encontr\xc3\xb3 el siguiente error al intentar recuperar la direcci\xc3\xb3n URL: http://127.0.0.1:12233/hub/session</p> <blockquote id="error"> <p><b>Acceso Denegado a la Cach\xc3\xa9</b></p> </blockquote> <p>Lo lamento, tu no est\xc3\xa1s autorizado a solicitar http://127.0.0.1:12233/hub/session de este cach\xc3\xa9 hasta que te hayas autenticado.</p> <p>Please contact the cache administrator if you have difficulties authenticating yourself.</p> <br> </div> <hr> <div id="footer"> <p>Generado Tue, 05 Nov 2013 19:44:22 GMT por squid.proxy (squid/3.1.19)</p> <!-- ERR_CACHE_ACCESS_DENIED --> </div> </body></html> '
I'm working with selenium 2.34 and Firefox 17. Any help is appreciated.
After four days of looking for a solution to this problem, I finally found it. The problem is with the browser's proxy exclusions, that is, the exceptions list in the proxy configuration of any browser. In every browser you use, you need to add these two addresses to the exclusions:
localhost and 127.0.0.1
This is very important: if you leave them out, Selenium fails, because it tries to connect to a local address like the two mentioned above (http://127.0.0.1:12233/hub/session in the error shown in the question), and that request gets sent to the proxy.
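The same exclusion can also be expressed from Python. The sketch below is only an illustration, under the assumption that the HTTP client in use reads the conventional NO_PROXY / no_proxy environment variables (Python's urllib does; whether a given Selenium release does is not confirmed by this thread). The helper name add_local_proxy_exclusions is hypothetical.

```python
import os

# The two addresses the answer says must bypass the proxy.
LOCAL_HOSTS = ("localhost", "127.0.0.1")


def add_local_proxy_exclusions(env=None):
    """Append localhost and 127.0.0.1 to NO_PROXY/no_proxy, keeping existing entries.

    `env` is any dict-like mapping; it defaults to os.environ.
    Returns the resulting comma-separated value.
    """
    env = os.environ if env is None else env
    current = env.get("NO_PROXY") or env.get("no_proxy") or ""
    hosts = [h.strip() for h in current.split(",") if h.strip()]
    for host in LOCAL_HOSTS:
        if host not in hosts:
            hosts.append(host)
    value = ",".join(hosts)
    # Set both spellings, since tools differ in which one they read.
    env["NO_PROXY"] = value
    env["no_proxy"] = value
    return value
```

Calling add_local_proxy_exclusions() before creating the driver would apply it to the current process. On the Firefox side, the exceptions list corresponds to the network.proxy.no_proxies_on preference, so the profile from the question could additionally call fp.set_preference('network.proxy.no_proxies_on', 'localhost, 127.0.0.1').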
I have successfully used the Google API sandbox to create posts on my Blogger website through HTTP POST requests:
POST https://blogger.googleapis.com/v3/blogs/206150456/posts?key=[YOUR_API_KEY] HTTP/1.1
Authorization: Bearer [YOUR_ACCESS_TOKEN]
Accept: application/json
Content-Type: application/json
{
"title": "Test",
"content": "Hello World Test"
}
Basically, I want to convert the above request into Python code.
Attempts:
My code so far is:
payload = '{"title": "A new post", "content": "With <b>exciting</b> content..."}'
r = requests.post("https://www.googleapis.com/blogger/v3/blogs/" + blogid + "/posts/" + auth + "/application/json/" + payload)
And I get the response
b'<!DOCTYPE html>\n<html lang=en>\n <meta charset=utf-8>\n <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">\n **<title>Error 404** (Not Found)!!1</title>\n <style>\n *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 100% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}#media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.google.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}#media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0% 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}#media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}\n </style>\n <a href=//www.google.com/><span id=logo aria-label=Google></span></a>\n <p><b>404.</b> <ins>That\xe2\x80\x99s an error.</ins>\n <p>The requested URL <code>/blogger/v3/blogs/2061504564313173903/posts/48738461363-jcoqroi84fk9q81nfsdcsvup6ve7hj75.apps.googleusercontent.com/application/json/%7B%22title%22:%20%22A%20new%20post%22,%20%22content%22:%20%22With%20%3Cb%3Eexciting%3C/b%3E%20content...%22%7D</code> was not found on this server. <ins>That\xe2\x80\x99s all we know.</ins>\n'
It's not an authorisation error but more like a formatting error (Error 404)?
I was expecting to see a new post generated on my blog.
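The requested URL echoed in the 404 response shows that the client ID, the content type and the JSON payload were all concatenated into the URL path. A minimal sketch of sending them as headers and as a request body instead, mirroring the sandbox request above, might look like this. It uses only the standard library so that it runs without extra packages; the function name build_blogger_post_request is hypothetical, and the access token is assumed to be a valid OAuth 2.0 bearer token obtained separately.

```python
import json
import urllib.request


def build_blogger_post_request(blog_id, access_token, title, content):
    """Build (but do not send) a POST request that creates a Blogger post.

    The URL and headers follow the sandbox request shown in the question.
    """
    url = "https://blogger.googleapis.com/v3/blogs/%s/posts" % blog_id
    # The post itself travels in the request body as JSON, not in the URL.
    body = json.dumps({"title": title, "content": content}).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + access_token,
        "Accept": "application/json",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Sending would then be urllib.request.urlopen(req). With the requests library used in the question, the equivalent call would be requests.post(url, headers=headers, json={"title": ..., "content": ...}).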
I have a Flask application with a few custom-built tools. I'm trying to bring some other tools into that Flask application to have a single place for everything. One of those tools is MicroStrategy. I'm rendering a template and the MicroStrategy login page loads, but when I log in, it just kicks me back to the login page. When I look at the request, there are two Set-Cookie headers with errors.
Is it possible to do what I'm trying to do? Is there a way to read the headers from the MicroStrategy page in the iframe and set SameSite=None?
Here is my flask app:
@dash_app.server.route("/mstr")
def mstr():
    resp = make_response(render_template("mstr.html"))
    return resp
mstr.html:
<div style="position:fixed; width:100%; top:50px; left:0px; right:0px; bottom:0px; z-index:1;">
<iframe src="https://webserver.com/MicroStrategy/asp/Main.aspx" title="MicroStrategy" style="width:100%; height:100%; border:none; margin:0; padding:0; overflow:hidden;"></iframe>
</div>
I am testing how to use Selenium in Python, and I successfully opened a page with the code below on Ubuntu 16.04:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
firefox_options = Options()
firefox_options.binary_location = '/usr/bin/firefox'
driver= webdriver.Firefox(executable_path='/home/myname/geckodriver',firefox_options=firefox_options)
driver.get('https://www.toutiao.com')
However, some data/content is missing compared to opening this page ('https://www.toutiao.com') manually.
My Firefox version is 72.0.2 and my geckodriver version is 0.26.0. Could anybody help me with this issue? Thanks in advance!
I took your code, simplified the script, and on execution I encountered a similar issue, i.e. data/content is missing compared to opening the page manually, as follows:
Code Block:
from selenium import webdriver
driver = webdriver.Firefox(executable_path=r'C:\Utility\BrowserDrivers\geckodriver.exe')
driver.get('https://www.toutiao.com')
print(driver.page_source)
Console Output:
<html><head><style class="vjs-styles-defaults">
.video-js {
width: 300px;
height: 150px;
}
.vjs-fluid {
padding-top: 56.25%
}
</style><meta charset="utf-8"><title>????</title><meta http-equiv="x-dns-prefetch-control" content="on"><meta name="renderer" content="webkit"><link rel="dns-prefetch" href="//s3.pstatp.com/"><link rel="dns-prefetch" href="//s3a.pstatp.com/"><link rel="dns-prefetch" href="//s3b.pstatp.com"><link rel="dns-prefetch" href="//p1.pstatp.com/"><link rel="dns-prefetch" href="//p3.pstatp.com/"><meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests"><meta http-equiv="Content-Type" content="text/html; charset=utf-8"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"><meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1,minimum-scale=1,user-scalable=no,minimal-ui"><meta name="360-site-verification" content="b96e1758dfc9156a410a4fb9520c5956"><meta name="360_ssp_verify" content="2ae4ad39552c45425bddb738efda3dbb"><meta name="google-site-verification" content="3PYTTW0s7IAfkReV8wAECfjIdKY-bQeSkVTyJNZpBKE"><meta name="shenma-site-verification" content="34c05607e2a9430ad4249ed48faaf7cb_1432711730"><meta name="baidu_union_verify" content="b88dd3920f970845bad8ad9f90d687f7"><meta name="domain_verify" content="pmrgi33nmfuw4ir2ej2g65lunfqw6ltdn5wselbcm52wszbchirdqyztge3tenrsgq3dknjume2tayrvmqytemlfmiydimddgu4gcnzcfqrhi2lnmvjwc5tfei5dcnbwhazdcobuhe2dqobrpu"><meta name="keywords" content="????,??,???,????,??????"><meta name="description" content="«????»(www.toutiao.com)????????????????,?????????????????,?????????????,??????????????????????"><link rel="alternate" media="only screen and (max-width: 640px)" href="//m.toutiao.com/"><link rel="shortcut icon" href="//s3a.pstatp.com/toutiao/resource/ntoutiao_web/static/image/favicon_5995b44.ico" type="image/x-icon"><link rel="stylesheet" href="//s3.pstatp.com/toutiao/player/dist/pc_vue2.css" media="screen" title="no title"><!--[if lt IE 9]>
<p>?????????,??????</p>
.
.
.
<script>var imgUrl = '/c/9ubkblw9out4h9t6ya05r7h0uu7q2u341jhsdh7l4r4yphpuxlqgdm/';</script><script>tac='i+2gv2ch1tigds!i$1dmgs"yZl!%s"l"u&kLs#l l#vr*charCodeAtx0[!cb^i$1em7b*0d#>>>s j\uffeel s#0,<8~z|\x7f#QGNCJF[\\^D\\KFYSk~^WSZhg,(lfi~ah`{md"inb|1d<,%Dscafgd"in,8[xtm}nLzNEGQMKAdGG^NTY\x1ckgd"inb<b|1d<g,&TboLr{m,(\x02)!jx-2n&vr$testxg,%#tug{mn ,%vrfkbm[!cb|'</script><script type="text/javascript" crossorigin="anonymous" src="//s3b.pstatp.com/toutiao/static/js/vendor.63b66d4280309ac2fb48.js"></script><script type="text/javascript" crossorigin="anonymous" src="//s3a.pstatp.com/toutiao/static/js/page/index_node/index.e6afc60a3a3f653cfdba.js"></script><script type="text/javascript" crossorigin="anonymous" src="//s3b.pstatp.com/toutiao/static/js/ttstatistics.a083f6cd9b1a9a970725.js"></script><script src="//s3.pstatp.com/inapp/lib/raven.js" crossorigin="anonymous"></script><script>;(function(window) {
// sentry
window.Raven && Raven.config('//key#m.toutiao.com/log/sentry/v2/96', {
whitelistUrls: [/pstatp\.com/],
shouldSendCallback: function(data) {
var ua = navigator && navigator.userAgent;
var isDeviceOK = !/Mobile|Linux/i.test(navigator.userAgent);
return isDeviceOK;
},
tags: {
bid: 'toutiao_pc',
pid: 'index_new'
},
autoBreadcrumbs: {
'xhr': false,
'console': true,
'dom': true,
'location': true
}
}).install();
})(window);</script><script>document.getElementsByTagName('body')[0].addEventListener('click', function(e) {
var target = e.target,
ga_event,
ga_category,
ga_label,
ga_value;
while(target && target.nodeName.toUpperCase() !== 'BODY') {
ga_event = target.getAttribute('ga_event');
ga_category = target.getAttribute('ga_category') || '/';
ga_label = target.getAttribute('ga_label') || '';
ga_value = target.getAttribute('ga_value') || 1;
ga_event && window.ttAnalysis && ttAnalysis.send('event', { ev: ga_event });
target = target.parentNode;
}
});</script><script src="https://xxbg.snssdk.com/websdk/v1/getInfo?q=YOsueEs6CjZquUQrQwttBa2p27c%2FmJBGcEmZKypwf%2Fh%2B%2FFzCVrIwzk9L3bo%2FZb2O8gVTNaA4L2Bk10qWfZ2s94e6qe8KRXlOEjnI%2FrONB4jQynV3bfJ9exD2E4QPsgydRGjRLlDXE9uYD7HU3IZ%2FOU2MJG2vMgfNU55%2FmsOAlVSrPQH2wo4Eor0lgghKHjRi28vVvBdKY7JT4gG7S7ThRFD2YBIc%2Fs4JYViQu1Ll1Bg5Xn5bKuD6jZRz3AzfFqzSOWguO6vUbzL0wBc4mpa22mdpmAXIvUNWtjg5MUfXh9rfWI0ti7saL%2B0r4%2BaBdN5y4lrmxAcQZq2oeAKl4WjOeJsN%2BePpYmisoxTzdBZ6TL8IGE0E7ZUUlFlPGyUWhU3E4IRbtbCCd0QdVaJajiSOIhg9cImqTZYI56kIao1yVnV%2Bxu4%2BhaC1kHu5xsk49%2BX%2FNdwGcel%2BlOUzagkE5s8X6jEswA7jzW%2ByD6%2FusfkNyyx8WOWCJmZlTGQ4SNQr%2FQHvmK2QscQ7KnTvKVqjedUd7IFcvyTyYz3iFFrmRkOMRN9042sLiQwerXsn0f%2Fc%2Bh46PNdeU1S6BsFKq%2BZhMDxw1vI2Y1C%2Fa0RBdZC%2BGZq%2BkbNaoVotfvslg05ahevHTainlZR9DHEiWawFBJbTwjMeYrmo4NZiL5eNBUvslFn%2BDPHk%2F6Oj0Nbb89Rx8Ihi2pRH04voRog9848H2o2LR9gx0N0i0o6%3D&callback=_8712_1581940674310"></script></body></html>
Analysis
While inspecting the DOM tree of the webpage, you will find that some of the <script> and <link> tags refer to resources whose paths contain the keyword dist. For example:
<link rel="stylesheet" href="//s3.pstatp.com/toutiao/player/dist/pc_vue2.css" media="screen" title="no title">
<script src="//unpkg.pstatp.com/byted/sec_sdk_build/1.1.12/dist/captcha.js"></script>
//s3a.pstatp.com/toutiao/picc_mig/dist/img.min.js
This is a clear indication that the website is protected by the bot management service provider Distil Networks, and that navigation by the WebDriver-driven browser (GeckoDriver in this case) gets detected and subsequently blocked.
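The inspection described above can be sketched in code. The example below is an illustration only: it lists the src/href values of <script> and <link> tags that contain a /dist/ path segment, which is the heuristic this answer relies on. The names DistResourceFinder and find_dist_resources are hypothetical, and the check is only as strong as that heuristic.

```python
from html.parser import HTMLParser


class DistResourceFinder(HTMLParser):
    """Collect src/href values of <script> and <link> tags containing '/dist/'."""

    def __init__(self):
        super().__init__()
        self.matches = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "link"):
            return
        for name, value in attrs:
            # Attributes without a value are reported as None, so guard for that.
            if name in ("src", "href") and value and "/dist/" in value:
                self.matches.append(value)


def find_dist_resources(page_source):
    """Return the matching resource URLs in document order."""
    finder = DistResourceFinder()
    finder.feed(page_source)
    return finder.matches
```

With Selenium, the input would be driver.page_source, as printed in the code block above.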
Distil
As per the article There Really Is Something About Distil.it...:
Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.
Further,
"One pattern with **Selenium** was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".
Reference
You can find a couple of detailed discussions in:
Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
Akamai Bot Manager detects WebDriver driven Chrome Browsing Context
Is there a version of selenium webdriver that is not detectable?
I am new to Python (Selenium, Scrapy, etc.) and web scraping in general, but I am pretty familiar with other languages such as Java, so please forgive me if I am missing something very simple!
My end goal is to visit a page, sit there for around 10 seconds, then close the browser and repeat. However, I am trying to practice rotating my IP address via a proxy with each request. I have been able to visit the page, but when I throw the rotating proxy into the mix, I get a long connection error that I can't figure out and that seems to include a bunch of CSS.
Complete Code Snippet
The issue seems to be caused by the second line in the try-block where the driver is trying to access the website
import time  # needed for time.sleep() below
import scrapy
import requests
from selenium import webdriver
from selenium.webdriver.common.proxy import Proxy, ProxyType
from scrapy.http import Request
from lxml.html import fromstring
from itertools import cycle


class VisitPageSpider(scrapy.Spider):
    name = 'visitpage'
    allowed_domains = ['books.toscrape.com']

    def start_requests(self):
        test_url = 'http://books.toscrape.com'
        proxies = self.get_proxies()
        proxy_pool = cycle(proxies)
        prox = Proxy()
        prox.proxy_type = ProxyType.MANUAL
        view_count = 0
        url = 'https://httpbin.org/ip'
        for i in range(1, 11):
            # Rotate to the next proxy for every request
            proxy = next(proxy_pool)
            prox.http_proxy = proxy
            prox.socks_proxy = proxy
            prox.ssl_proxy = proxy
            capabilities = webdriver.DesiredCapabilities.INTERNETEXPLORER
            prox.add_to_capabilities(capabilities)
            print("Request #%d" % i)
            try:
                self.driver = webdriver.Ie(desired_capabilities=capabilities)
                self.driver.get(test_url)
                view_count += 1
                time.sleep(10)
                self.driver.quit()
            except Exception:
                print("Skipping. Connection error")
        # view_count is an int, so convert it before concatenating
        print('Total New Views ' + str(view_count))
        yield Request(test_url, callback=self.visit_page)

    def visit_page(self, response):
        pass

    def get_proxies(self):
        url = 'https://free-proxy-list.net/'
        response = requests.get(url)
        parser = fromstring(response.text)
        proxies = set()
        for i in parser.xpath('//tbody/tr')[:10]:
            # Keep only rows whose 7th column contains "yes"
            if i.xpath('.//td[7][contains(text(),"yes")]'):
                proxy = ":".join([i.xpath('.//td[1]/text()')[0], i.xpath('.//td[2]/text()')[0]])
                proxies.add(proxy)
        print(proxies)
        return proxies
CMD Output
For the first 2 lines in the try block respectively
2018-07-26 18:19:21 [selenium.webdriver.remote.remote_connection] DEBUG: POST http://127.0.0.1:52898/session {"capabilities": {"firstMatch": [{}], "alwaysMatch": {"browserName": "internet explorer", "platformName": "windows", "proxy": {"proxyType": "manual", "httpProxy": "46.227.162.167:8080", "sslProxy": "46.227.162.167:8080", "socksProxy": "46.227.162.167:8080"}}}, "desiredCapabilities": {"browserName": "internet explorer", "version": "", "platform": "WINDOWS", "proxy": {"proxyType": "MANUAL", "httpProxy": "46.227.162.167:8080", "sslProxy": "46.227.162.167:8080", "socksProxy": "46.227.162.167:8080"}}}
2018-07-26 18:19:21 [selenium.webdriver.remote.remote_connection] DEBUG: b'<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">\n<html><head>\n<meta type="copyright" content="Copyright (C) 1996-2015 The Squid Software Foundation and contributors">\n<meta http-equiv="Content-Type" CONTENT="text/html; charset=utf-8">\n<title>ERROR: The requested URL could not be retrieved</title>\n<style type="text/css"><!-- \n /*\n * Copyright (C) 1996-2016 The Squid Software Foundation and contributors\n *\n * Squid software is distributed under GPLv2+ license and includes\n * contributions from numerous individuals and organizations.\n * Please see the COPYING and CONTRIBUTORS files for details.\n */\n\n/*\n Stylesheet for Squid Error pages\n Adapted from design by Free CSS Templates\n http://www.freecsstemplates.org\n Released for free under a Creative Commons Attribution 2.5 License\n*/\n\n/* Page basics */\n* {\n\tfont-family: verdana, sans-serif;\n}\n\nhtml body {\n\tmargin: 0;\n\tpadding: 0;\n\tbackground: #efefef;\n\tfont-size: 12px;\n\tcolor: #1e1e1e;\n}\n\n/* Page displayed title area */\n#titles {\n\tmargin-left: 15px;\n\tpadding: 10px;\n\tpadding-left: 100px;\n\tbackground: url(\'/squid-internal-static/icons/SN.png\') no-repeat left;\n}\n\n/* initial title */\n#titles h1 {\n\tcolor: #000000;\n}\n#titles h2 {\n\tcolor: #000000;\n}\n\n/* special event: FTP success page titles */\n#titles ftpsuccess {\n\tbackground-color:#00ff00;\n\twidth:100%;\n}\n\n/* Page displayed body content area */\n#content {\n\tpadding: 10px;\n\tbackground: #ffffff;\n}\n\n/* General text */\np {\n}\n\n/* error brief description */\n#error p {\n}\n\n/* some data which may have caused the problem */\n#data {\n}\n\n/* the error message received from the system or other software */\n#sysmsg {\n}\n\npre {\n font-family:sans-serif;\n}\n\n/* special event: FTP / Gopher directory listing */\n#dirmsg {\n font-family: courier;\n color: black;\n font-size: 
10pt;\n}\n#dirlisting {\n margin-left: 2%;\n margin-right: 2%;\n}\n#dirlisting tr.entry td.icon,td.filename,td.size,td.date {\n border-bottom: groove;\n}\n#dirlisting td.size {\n width: 50px;\n text-align: right;\n padding-right: 5px;\n}\n\n/* horizontal lines */\nhr {\n\tmargin: 0;\n}\n\n/* page displayed footer area */\n#footer {\n\tfont-size: 9px;\n\tpadding-left: 10px;\n}\n\n\nbody\n:lang(fa) { direction: rtl; font-size: 100%; font-family: Tahoma, Roya, sans-serif; float: right; }\n:lang(he) { direction: rtl; }\n --></style>\n</head><body id=ERR_CONNECT_FAIL>\n<div id="titles">\n<h1>ERROR</h1>\n<h2>The requested URL could not be retrieved</h2>\n</div>\n<hr>\n\n<div id="content">\n<p>The following error was encountered while trying to retrieve the URL: http://127.0.0.1:52898/session</p>\n\n<blockquote id="error">\n<p><b>Connection to 127.0.0.1 failed.</b></p>\n</blockquote>\n\n<p id="sysmsg">The system returned: <i>(111) Connection refused</i></p>\n\n<p>The remote host or network may be down. Please try the request again.</p>\n\n<p>Your cache administrator is webmaster.</p>\n\n<br>\n</div>\n\n<hr>\n<div id="footer">\n<p>Generated Fri, 27 Jul 2018 04:19:20 GMT by vps188962 (squid/3.5.23)</p>\n<!-- ERR_CONNECT_FAIL -->\n</div>\n</body></html>\n'
My guess is that this is a problem with these proxies. Free proxies are often unreliable (in my experience - very often) and you must be prepared for them to yield anything realistically possible - errors, timeouts or even mangled responses. The second line of your log seems like a generic response from squid proxy software indicating a proxy error in this case.
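Since a failing proxy can hand back an HTML error page instead of the expected response, it can help to recognise that case explicitly. The sketch below is a hypothetical helper, not part of Selenium or Scrapy: it relies on the observation that the Squid error pages quoted in this thread mention "squid" in the footer and end with a comment such as <!-- ERR_CONNECT_FAIL -->. Other proxy software, or a customised Squid, may format errors differently.

```python
import re

# Squid error pages in this thread end with a marker like <!-- ERR_CONNECT_FAIL -->
_SQUID_ERROR = re.compile(r"<!--\s*(ERR_[A-Z_]+)\s*-->")


def squid_error_code(body):
    """Return the Squid error code found in a response body, or None.

    Accepts str or bytes. Only bodies that mention 'squid' are considered.
    """
    if isinstance(body, bytes):
        body = body.decode("utf-8", errors="replace")
    if "squid" not in body.lower():
        return None
    match = _SQUID_ERROR.search(body)
    return match.group(1) if match else None
```

A proxy whose response yields a code here could be dropped from the pool before the next request.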
I'm trying to enter text into a login page. The login page is:
https://ppair.uspto.gov/TruePassSample/AuthenticateUserLocalEPF.html
Using "inspect element" in Internet Explorer (the website only loads in Internet Explorer), it seems to me that the name of the "select Digital certificate" text field is "username".
This is my script:
import os
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# Digital Certificate Path (raw strings keep the backslashes literal)
DigitalCertificateFolder = r'C:\FOLDER'
DigitalCertificateFile = 'FILE.epf'
DigitalCertificatePath = os.path.join(DigitalCertificateFolder, DigitalCertificateFile)
password = 'PASSWORD'
# get the path of IEDriverServer
driver_dir = r'C:\FOLDER2'
ie_driver_path = os.path.join(driver_dir, "IEDriverServer.exe")
# create a new Internet Explorer session
driver = webdriver.Ie(ie_driver_path)
driver.implicitly_wait(30)
driver.maximize_window()
# navigate to the application home page
driver.get("https://ppair.uspto.gov/TruePassSample/AuthenticateUserLocalEPF.html")
# get the search textbox
Select_Digital_Certificate = driver.find_element_by_name("username")
Select_Digital_Certificate.send_keys(DigitalCertificatePath)
This is the output from inspect element in Internet Explorer:
<INPUT name=username style="CURSOR: auto; BACKGROUND-IMAGE: url(); BACKGROUND-REPEAT: no-repeat; BACKGROUND-ATTACHMENT: scroll; BACKGROUND-POSITION: right center" type=text size=38 lpcachedvisval="1" lpcachedvistime="1491220212">
When I run the script in the console, I receive the following error: "NameError: name 'Select_Digital_Certificate' is not defined".
Can someone please explain to me what I'm doing wrong?
The required input field is located inside an iframe, so you need to switch to that iframe before handling the input:
driver.get("https://ppair.uspto.gov/TruePassSample/AuthenticateUserLocalEPF.html")
driver.switch_to.frame('entrustTruePassGuiFrame')
Select_Digital_Certificate = driver.find_element_by_name("username")
...
To switch back to the main HTML document afterwards, you may need to use:
driver.switch_to.default_content()