Python - requests.exceptions.SSLError - dh key too small

I'm scraping some internal pages using Python and requests. I've turned off SSL verifications and warnings.
requests.packages.urllib3.disable_warnings()
page = requests.get(url, verify=False)
On certain servers I receive an SSL error I can't get past.
Traceback (most recent call last):
File "scraper.py", line 6, in <module>
page = requests.get(url, verify=False)
File "/cygdrive/c/Users/jfeocco/VirtualEnv/scraping/lib/python3.4/site-packages/requests/api.py", line 71, in get
return request('get', url, params=params, **kwargs)
File "/cygdrive/c/Users/jfeocco/VirtualEnv/scraping/lib/python3.4/site-packages/requests/api.py", line 57, in request
return session.request(method=method, url=url, **kwargs)
File "/cygdrive/c/Users/jfeocco/VirtualEnv/scraping/lib/python3.4/site-packages/requests/sessions.py", line 475, in request
resp = self.send(prep, **send_kwargs)
File "/cygdrive/c/Users/jfeocco/VirtualEnv/scraping/lib/python3.4/site-packages/requests/sessions.py", line 585, in send
r = adapter.send(request, **kwargs)
File "/cygdrive/c/Users/jfeocco/VirtualEnv/scraping/lib/python3.4/site-packages/requests/adapters.py", line 477, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: [SSL: SSL_NEGATIVE_LENGTH] dh key too small (_ssl.c:600)
This happens both in and out of Cygwin, on Windows and OSX. My research hinted at outdated OpenSSL on the server. I'm ideally looking for a client-side fix.
Edit:
I was able to resolve this by using a cipher set
import requests
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS += 'HIGH:!DH:!aNULL'
try:
    requests.packages.urllib3.contrib.pyopenssl.DEFAULT_SSL_CIPHER_LIST += 'HIGH:!DH:!aNULL'
except AttributeError:
    # no pyopenssl support used / needed / available
    pass
page = requests.get(url, verify=False)

This is not a new answer; it just combines the solution code from the question with some extra information, so others can copy it directly without further trial and error.
The problem is not only a weak DH key on the server side; mismatched library versions in the Python modules can trigger it as well.
The code segment below ignores those security issues because they may not be fixable on the server side, for example when it is an internal legacy server that nobody wants to update.
Besides the 'HIGH:!DH:!aNULL' cipher-string hack, urllib3 can also be imported to disable the warning:
import requests
import urllib3
requests.packages.urllib3.disable_warnings()
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
try:
    requests.packages.urllib3.contrib.pyopenssl.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
except AttributeError:
    # no pyopenssl support used / needed / available
    pass
page = requests.get(url, verify=False)

This also worked for me:
import requests
import urllib3
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ALL:@SECLEVEL=1'
openssl SECLEVELs documentation:
https://www.openssl.org/docs/manmaster/man3/SSL_CTX_set_security_level.html
SECLEVEL=2 is the openssl default nowadays (at least on my setup: Ubuntu 20.04, openssl 1.1.1f); SECLEVEL=1 lowers the bar.
Security levels are intended to avoid the complexity of tinkering with individual ciphers.
I believe most of us mere mortals don't have in depth knowledge of the security strength/weakness of individual ciphers, I surely don't.
Security levels seem a nice method to keep some control over how far you are opening the security door.
Note: I got a different SSL error, WRONG_SIGNATURE_TYPE instead of SSL_NEGATIVE_LENGTH, but the underlying issue is the same.
Error:
Traceback (most recent call last):
[...]
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 581, in post
return self.request('POST', url, data=data, json=json, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='somehost.com', port=443): Max retries exceeded with url: myurl (Caused by SSLError(SSLError(1, '[SSL: WRONG_SIGNATURE_TYPE] wrong signature type (_ssl.c:1108)')))
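If you want to see which protocol and cipher the server actually negotiates once the security level is lowered, here is a minimal standard-library sketch (the host name is just a placeholder):
import socket
import ssl

ctx = ssl.create_default_context()
ctx.set_ciphers("DEFAULT@SECLEVEL=1")  # lower the security level for this context only
ctx.check_hostname = False             # mirrors verify=False from the question
ctx.verify_mode = ssl.CERT_NONE

with socket.create_connection(("legacy-server.example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="legacy-server.example.com") as tls:
        print(tls.version(), tls.cipher())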

Disabling warnings or certificate validation will not help. The underlying problem is a weak DH key used by the server which can be misused in the Logjam Attack.
To work around this you need to choose a cipher which does not make any use of Diffie-Hellman key exchange and is therefore not affected by the weak DH key. This cipher must also be supported by the server. It is unknown what the server supports, but you might try the cipher AES128-SHA or a cipher set of HIGH:!DH:!aNULL.
Using requests with your own cipher set is tricky. See Why does Python requests ignore the verify parameter? for an example.
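For illustration only (not the linked answer verbatim), a minimal sketch of mounting a requests adapter that excludes Diffie-Hellman ciphers for one host; the host name is a placeholder:
import ssl

import requests
from requests.adapters import HTTPAdapter


class NoDHAdapter(HTTPAdapter):
    """Adapter whose connection pool uses a cipher set without DH key exchange."""

    def init_poolmanager(self, *args, **kwargs):
        ctx = ssl.create_default_context()
        ctx.set_ciphers("HIGH:!DH:!aNULL")  # or a single cipher such as "AES128-SHA"
        ctx.check_hostname = False          # matches verify=False in the question
        ctx.verify_mode = ssl.CERT_NONE
        kwargs["ssl_context"] = ctx
        return super().init_poolmanager(*args, **kwargs)


session = requests.Session()
session.mount("https://legacy-server.example.com", NoDHAdapter())
page = session.get("https://legacy-server.example.com/", verify=False)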

I had the same issue.
It was fixed by commenting out the
CipherString = DEFAULT@SECLEVEL=2
line in /etc/ssl/openssl.cnf.

Someone from the requests python library's core development team has documented
a recipe to keep the changes limited to one or a few servers:
https://lukasa.co.uk/2017/02/Configuring_TLS_With_Requests/
If your code interacts with multiple servers, it makes sense not to lower the security requirements of all connections because one server has a problematic configuration.
The code worked for me out of the box.
That is, using my own value for CIPHERS, 'ALL:@SECLEVEL=1'.

I will package my solution here. I had to modify the Python SSL library, which was possible since I was running my code inside a Docker container, but it's probably something you don't want to do.
Get the supported ciphers of your server. In my case it was a third-party e-mail server, and I used the script described in "list SSL/TLS cipher suite":
check_supported_ciphers.sh
#!/usr/bin/env bash
# OpenSSL requires the port number.
SERVER=$1
DELAY=1
ciphers=$(openssl ciphers 'ALL:eNULL' | sed -e 's/:/ /g')
echo Obtaining cipher list from $(openssl version).
for cipher in ${ciphers[@]}
do
echo -n Testing $cipher...
result=$(echo -n | openssl s_client -cipher "$cipher" -connect $SERVER 2>&1)
if [[ "$result" =~ ":error:" ]] ; then
error=$(echo -n $result | cut -d':' -f6)
echo NO \($error\)
else
if [[ "$result" =~ "Cipher is ${cipher}" || "$result" =~ "Cipher :" ]] ; then
echo YES
else
echo UNKNOWN RESPONSE
echo $result
fi
fi
sleep $DELAY
done
Give it permissions:
chmod +x check_supported_ciphers.sh
And execute it:
./check_supported_ciphers.sh myremoteserver.example.com:443 | grep YES
After some seconds you will see an output similar to:
Testing AES128-SHA...YES (AES128-SHA_set_cipher_list)
So will use "AES128-SHA" as SSL cipher.
Force the error in your code:
Traceback (most recent call last):
File "my_custom_script.py", line 52, in
imap = IMAP4_SSL(imap_host)
File "/usr/lib/python2.7/imaplib.py", line 1169, in init
IMAP4.init(self, host, port)
File "/usr/lib/python2.7/imaplib.py", line 174, in init
self.open(host, port)
File "/usr/lib/python2.7/imaplib.py", line 1181, in open
self.sslobj = ssl.wrap_socket(self.sock, self.keyfile, self.certfile)
File "/usr/lib/python2.7/ssl.py", line 931, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 599, in init
self.do_handshake()
File "/usr/lib/python2.7/ssl.py", line 828, in do_handshake
self._sslobj.do_handshake()
ssl.SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:727)
Get the python SSL library path used, in this case:
/usr/lib/python2.7/ssl.py
Edit it:
cp /usr/lib/python2.7/ssl.py /usr/lib/python2.7/ssl.py.bak
vim /usr/lib/python2.7/ssl.py
And replace:
_DEFAULT_CIPHERS = (
'ECDH+AESGCM:ECDH+CHACHA20:DH+AESGCM:DH+CHACHA20:ECDH+AES256:DH+AES256:'
'ECDH+AES128:DH+AES:ECDH+HIGH:DH+HIGH:RSA+AESGCM:RSA+AES:RSA+HIGH:'
'!aNULL:!eNULL:!MD5:!3DES'
)
By:
_DEFAULT_CIPHERS = (
'AES128-SHA'
)
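If you are on Python 3, a less invasive sketch is possible: imaplib accepts an ssl_context argument, so the same cipher restriction can be applied without editing ssl.py (the host name is a placeholder):
import ssl
from imaplib import IMAP4_SSL

ctx = ssl.create_default_context()
ctx.set_ciphers("AES128-SHA")  # the cipher found with check_supported_ciphers.sh

imap = IMAP4_SSL("imap.example.com", ssl_context=ctx)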

I encountered this problem after upgrading to Ubuntu 20.04 from 18.04; the following command worked for me.
pip install --ignore-installed pyOpenSSL --upgrade

It may be safer not to override the default global ciphers, and instead create a custom HTTPAdapter with the required ciphers for a specific session:
import ssl
from typing import Any
import requests
class ContextAdapter(requests.adapters.HTTPAdapter):
    """Allows to override the default context."""

    def __init__(self, *args: Any, **kwargs: Any) -> None:
        self.ssl_context: ssl.SSLContext | None = kwargs.pop("ssl_context", None)
        super().__init__(*args, **kwargs)

    def init_poolmanager(self, *args: Any, **kwargs: Any) -> Any:
        # See available keys in urllib3.poolmanager.SSL_KEYWORDS
        kwargs.setdefault("ssl_context", self.ssl_context)
        return super().init_poolmanager(*args, **kwargs)
Then you need to create a custom context, for example:
import ssl
def create_context(
    ciphers: str,
    minimum_version: int = ssl.TLSVersion.TLSv1_2,
    verify: bool = True,
) -> ssl.SSLContext:
    """See https://peps.python.org/pep-0543/."""
    ctx = ssl.create_default_context()
    # Allow the use of untrusted certificates.
    if not verify:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    # Just for example.
    if minimum_version == ssl.TLSVersion.TLSv1:
        ctx.options &= (
            ~getattr(ssl, "OP_NO_TLSv1_3", 0)
            & ~ssl.OP_NO_TLSv1_2
            & ~ssl.OP_NO_TLSv1_1
        )
    ctx.minimum_version = minimum_version
    ctx.set_ciphers(ciphers)
    return ctx
and then you need to configure each website with custom context rules:
session = requests.Session()
session.mount(
    "https://dh.affected-website.com",
    ContextAdapter(
        ssl_context=create_context(ciphers="HIGH:!DH:!aNULL"),
    ),
)
session.mount(
    "https://only-elliptic.modern-website.com",
    ContextAdapter(
        ssl_context=create_context(ciphers="ECDHE+AESGCM"),
    ),
)
session.mount(
    "https://only-tls-v1.old-website.com",
    ContextAdapter(
        ssl_context=create_context(
            ciphers="DEFAULT:@SECLEVEL=1",
            minimum_version=ssl.TLSVersion.TLSv1,
        ),
    ),
)
result = session.get("https://only-tls-v1.old-website.com/object")
After reading all the answers, I can say that @bgoeman's answer is close to mine; you can follow their link to learn more.

On CentOS 7, search for the following content in /etc/pki/tls/openssl.cnf:
[ crypto_policy ]
.include /etc/crypto-policies/back-ends/opensslcnf.config
[ new_oids ]
Set 'ALL:@SECLEVEL=1' in /etc/crypto-policies/back-ends/opensslcnf.config.

In a Docker image you can add the following command to your Dockerfile to get rid of this issue:
RUN sed -i '/CipherString = DEFAULT/s/^#\?/#/' /etc/ssl/openssl.cnf
This automatically comments out the problematic CipherString line.

If you are using the httpx library, this skips the warning:
import httpx
httpx._config.DEFAULT_CIPHERS += ":HIGH:!DH:!aNULL"
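Since httpx._config.DEFAULT_CIPHERS is a private attribute and may change between httpx versions, a sketch that avoids it (assuming a reasonably recent httpx) is to hand the client a custom SSLContext instead; the URL is a placeholder:
import ssl

import httpx

ctx = ssl.create_default_context()
ctx.set_ciphers("DEFAULT@SECLEVEL=1")  # or "HIGH:!DH:!aNULL"

client = httpx.Client(verify=ctx)
response = client.get("https://legacy-server.example.com/")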

I had the following error:
SSLError: [SSL: DH_KEY_TOO_SMALL] dh key too small (_ssl.c:727)
I solved it like this (Fedora):
python2.7 -m pip uninstall requests
python2.7 -m pip uninstall pyopenssl
python2.7 -m pip install pyopenssl==yourversion
python2.7 -m pip install requests==yourversion
The module installation order had caused
requests.packages.urllib3.contrib.pyopenssl.util.ssl_.DEFAULT_CIPHERS
to raise an AttributeError for "pyopenssl" in "requests.packages.urllib3.contrib", even though the module did exist.

Based on the answer given by the user bgoeman, the following code, which keeps the default ciphers and only adds the security level, works.
import requests
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS += '@SECLEVEL=1'

Related

Python HTTPS request SSLError CERTIFICATE_VERIFY_FAILED

PYTHON
import requests
url = "https://REDACTED/pb/s/api/auth/login"
r = requests.post(
    url,
    data = {
        'username': 'username',
        'password': 'password'
    }
)
NIM
import httpclient, json
let client = newHttpClient()
client.headers = newHttpHeaders({ "Content-Type": "application/json" })
let body = %*{
  "username": "username",
  "password": "password"
}
let resp = client.request("https://REDACTED.com/pb/s/api/auth/login", httpMethod = httpPOST, body = $body)
echo resp.body
I'm calling an API to get some data. Running the python code I get the traceback below. However, the nim code works perfectly so there must be something wrong with the python code or setup.
I'm running Python version 2.7.15.
requests lib version 2.19.1
Traceback (most recent call last):
File "C:/Python27/testht.py", line 21, in <module>
"Referer": "https://REDACTED.com/pb/a/"
File "C:\Python27\lib\site-packages\requests\api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "C:\Python27\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests\sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests\adapters.py", line 511, in send
raise SSLError(e, request=request)
SSLError: HTTPSConnectionPool(host='REDACTED.com', port=443): Max retries exceeded with url: /pb/s/api/auth/login (Caused by SSLError(SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:726)'),))
The requests module will verify the cert it gets from the server, much like a browser would. Rather than being able to click through and say "add exception" like you would in your browser, requests will raise that exception.
There's a way around it though: try adding verify=False to your post call.
However, the nim code works perfectly so there must be something wrong with the python code or setup.
Actually, your Python code or setup is less to blame; it is rather the nim code, or better, the defaults of the httpclient library. The nim documentation shows that httpclient.request by default uses an SSL context returned by getDefaultSSL, which according to this code creates a context that does not verify the certificate:
proc getDefaultSSL(): SSLContext =
  result = defaultSslContext
  when defined(ssl):
    if result == nil:
      defaultSSLContext = newContext(verifyMode = CVerifyNone)
Your Python code instead attempts to properly verify the certificate since the requests library does this by default. And it fails to verify the certificate because something is wrong - either with your setup or the server.
It is unclear who has issued the certificate for your site but if it is not in your default CA store you can use the verify argument of requests to specify the issuer CA. See this documentation for details.
If the site you are trying to access works with the browser but fails with your program it might be that it uses a special CA which was added as trusted to the browser (like a company certificate). Browsers and Python use different trust stores so this added certificate needs to be added to Python or at least to your program as trusted too. It might also be that the setup of the server has problems. Browsers can sometimes work around problems like a missing intermediate certificate but Python doesn't. In case of a public accessible site you could use SSLLabs to check what's wrong.
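As an illustration of that verify argument (the bundle path is hypothetical; point it at the CA that actually issued the server certificate):
import requests

r = requests.post(
    "https://REDACTED/pb/s/api/auth/login",
    data={"username": "username", "password": "password"},
    verify="/path/to/internal-ca-bundle.pem",  # hypothetical path to the issuing CA bundle
)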

python httplib2 certificate verify failed

I have tried everything I can find to get this to work...
I'm working on a plugin for a python-based task program (called GTG). I'm running Gnome on Opensuse Linux.
Code (Python 2.7):
def initialize(self):
    """
    Intialize backend: try to authenticate. If it fails, request an authorization.
    """
    super(Backend, self).initialize()
    path = os.path.join(CoreConfig().get_data_dir(), 'backends/gtask', 'storage_file-%s' % self.get_id())
    # Try to create leading directories that path
    path_dir = os.path.dirname(path)
    if not os.path.isdir(path_dir):
        os.makedirs(path_dir)
    self.storage = Storage(path)
    self.authenticate()

def authenticate(self):
    """ Try to authenticate by already existing credences or request an authorization """
    self.authenticated = False
    credentials = self.storage.get()
    if credentials is None or credentials.invalid == True:
        self.request_authorization()
    else:
        self.apply_credentials(credentials)
        # Request periodic import, avoid waiting a long time
        # self.start_get_tasks()

def apply_credentials(self, credentials):
    """ Finish authentication or request for an authorization by applying the credentials """
    http = httplib2.Http(ca_certs = '/etc/ssl/certs/ca_certs.pem', disable_ssl_certificate_validation=True)
    http = credentials.authorize(http)
    # Build a service object for interacting with the API.
    self.service = build_service(serviceName='tasks', version='v1', http=http, developerKey='AIzaSyAmUlk8_iv-rYDEcJ2NyeC_KVPNkrsGcqU')
    # self.service = build_service(serviceName='tasks', version='v1')
    self.authenticated = True

def _authorization_step2(self, code):
    credentials = self.flow.step2_exchange(code)
    # credential = self.flow.step2_exchange(code)
    self.storage.put(credentials)
    credentials.set_store(self.storage)
    return credentials

def request_authorization(self):
    """ Make the first step of authorization and open URL for allowing the access """
    self.flow = OAuth2WebServerFlow(client_id=self.CLIENT_ID,
                                    client_secret=self.CLIENT_SECRET,
                                    scope='https://www.googleapis.com/auth/tasks',
                                    redirect_uri='http://localhost:8080',
                                    user_agent='GTG')
    oauth_callback = 'oob'
    auth_uri = self.flow.step1_get_authorize_url(oauth_callback)
    # credentials = self.flow.step2_exchange(code)
    # url = self.flow.step1_get_authorize_url(oauth_callback)
    browser_thread = threading.Thread(target=lambda: webbrowser.open_new(auth_uri))
    browser_thread.daemon = True
    browser_thread.start()
    # Request the code from user
    BackendSignals().interaction_requested(self.get_id(), _(
        "You need to <b>authorize GTG</b> to access your tasks on <b>Google</b>.\n"
        "<b>Check your browser</b>, and follow the steps there.\n"
        "When you are done, press 'Continue'."),
        BackendSignals().INTERACTION_TEXT,
        "on_authentication_step")

def on_authentication_step(self, step_type="", code=""):
    if step_type == "get_ui_dialog_text":
        return _("Code request"), _("Paste the code Google has given you"
                                    "here")
    elif step_type == "set_text":
        try:
            credentials = self._authorization_step2(code)
        except FlowExchangeError, e:
            # Show an error to user and end
            self.quit(disable = True)
            BackendSignals().backend_failed(self.get_id(),
                                            BackendSignals.ERRNO_AUTHENTICATION)
            return
        self.apply_credentials(credentials)
        # Request periodic import, avoid waiting a long time
        self.start_get_tasks()
The browser window opens up and I am presented with a code from Google. The program opens a small window where I can enter the code from Google. When that happens I get this in the console:
No handlers could be found for logger "oauth2client.util"
Created new window in existing browser session.
[522:549:0108/063825:ERROR:nss_util.cc(821)] After loading Root Certs, loaded==false: NSS error code: -8018
but the SSL icon is green in Chrome...
then when I submit the code, I get :
Exception in thread Thread-10:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 810, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 763, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib/python2.7/site-packages/GTG/backends/backend_gtask.py", line 204, in on_authentication_step
credentials = self._authorization_step2(code)
File "/usr/lib/python2.7/site-packages/GTG/backends/backend_gtask.py", line 151, in _authorization_step2
credentials = self.flow.step2_exchange(code)
File "/usr/lib/python2.7/site-packages/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/usr/lib/python2.7/site-packages/oauth2client/client.py", line 1283, in step2_exchange
headers=headers)
File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1586, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1328, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1250, in _conn_request
conn.connect()
File "/usr/lib/python2.7/site-packages/httplib2/__init__.py", line 1037, in connect
raise SSLHandshakeError(e)
SSLHandshakeError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:581)
The file is called backend_gtask.py...
I have tried importing the certificate as stated here : How to update cacerts.txt of httplib2 for Github?
I have tried to disable verification (httplib2.Http(disable_ssl_certificate_validation=True)) as stated all over the web,
I have updated the python packages (which seemed to make things worse)
I have copied ca_certs.pem back and forth between /etc/ssl... and /usr/lib/python2.7/...
When I visit the auth page in a browser, it says the certificate is verified...
What else can I possibly check?
SHORT TEST CODE :
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.tools import run
from oauth2client.file import Storage
CLIENT_ID = 'id'
CLIENT_SECRET = 'secret'
flow = OAuth2WebServerFlow(client_id=CLIENT_ID,
client_secret=CLIENT_SECRET,
scope='https://www.googleapis.com/auth/tasks',
redirect_uri='http://localhost:8080')
storage = Storage('creds.data')
credentials = run(flow, storage)
print "access_token: %s" % credentials.access_token
Found that here: https://github.com/burnash/gspread/wiki/How-to-get-OAuth-access-token-in-console%3F
OK...
Big thanks to Steffen Ullrich.
httplib2 version 0.9 tries to use the system certificates and not the certs.txt file that used to be shipped with it. It also enforces verification.
httplib2 can take a couple of useful parameters - notably ca_certs. Use it to point to the actual *.pem file in your SSL installation. It cannot be a folder; it must be a real file.
I use the following in the initialization of the plugin :
self.http = httplib2.Http(ca_certs = '/etc/ssl/ca-bundle.pem')
Then, for all subsequent calls to httplib or google client libraries, I pass my pre-built http object as a parameter like this:
credentials = self.flow.step2_exchange(code, self.http)
self.http = credentials.authorize(self.http)
Now ssl connections work with the new httplib2...
I will eventually have to make sure the plugin can find certificates on any system, but at least I know what the problem was.
Thanks again to Steffen Ullrich for walking me through this.
See this answer for an easier fix without touching your code: just set your certificate bundle pem file path in an environment variable:
export HTTPLIB2_CA_CERTS="\path\to\your\ca-bundle"

python: how to use/change proxy with mechanize

I'm writing a web scraping program in Python using mechanize. The problem I'm having is that the website I'm scraping from limits the amount of time that you can be on the website. When I was doing everything by hand, I would use a SOCKS proxy as a work-around.
What I tried to do is go to the network preferences (Macbook Pro Retina 13', Mavericks) and change the proxy. However, the program didn't respond to that change. It kept running without the proxy.
Then I added .set_proxies() so now the code to open the website looks something like this:
b=mechanize.Browser() #open browser
b.set_proxies({"http":"96.8.113.76:8080"}) #proxy
DBJ=b.open(URL) #open url
When I ran the program, I got this error:
Traceback (most recent call last):
File "GM1.py", line 74, in <module>
DBJ=b.open(URL)
File "build/bdist.macosx-10.9-intel/egg/mechanize/_mechanize.py", line 203, in open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_mechanize.py", line 230, in _mech_open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_opener.py", line 193, in open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 344, in _open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 332, in _call_chain
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 1142, in http_open
File "build/bdist.macosx-10.9-intel/egg/mechanize/_urllib2_fork.py", line 1118, in do_open
urllib2.URLError: <urlopen error [Errno 54] Connection reset by peer>
I'm assuming that the proxy was changed and that this error is in response to that proxy.
Maybe I am misusing .set_proxies().
I'm not sure if the proxy itself is the issue or if the connection is just really slow.
Should I even be using SOCKS proxies for this type of thing or is there a better alternative for what I am trying to do?
Any information would be extremely helpful. Thanks in advance.
A SOCKS proxy is not the same as an HTTP proxy. The protocol between client and proxy is different. The line:
b.set_proxies({"http":"96.8.113.76:8080"})
tells mechanize to use the HTTP proxy at 96.8.113.76:8080 for requests having the http scheme in the URL, e.g. a request for URL http://httpbin.org/get will be sent via the proxy at 96.8.113.76:8080. Mechanize expects this to be an HTTP proxy server, and uses the corresponding protocol. It seems that your SOCKS proxy is closing the connection because it is not receiving a valid SOCKS proxy request (because it is actually an HTTP proxy request).
I don't think that mechanize has builtin support for SOCKS, so you may have to resort to some dirty tricks such as those in this answer. For that you will need to install the PySocks package. This might work for you:
import socks
import socket
from mechanize import Browser
SOCKS_PROXY_HOST = '96.8.113.76'
SOCKS_PROXY_PORT = 8080
def create_connection(address, timeout=None, source_address=None):
    sock = socks.socksocket()
    sock.connect(address)
    return sock
# add username and password arguments if proxy authentication required.
socks.setdefaultproxy(socks.PROXY_TYPE_SOCKS5, SOCKS_PROXY_HOST, SOCKS_PROXY_PORT)
# patch the socket module
socket.socket = socks.socksocket
socket.create_connection = create_connection
br = Browser()
response = br.open('http://httpbin.org/get')
>>> print response.read()
{
"args": {},
"headers": {
"Accept-Encoding": "identity",
"Connection": "close",
"Host": "httpbin.org",
"User-Agent": "Python-urllib/2.7",
"X-Request-Id": "e728cd40-002c-4f96-a26a-78ce4d651fda"
},
"origin": "192.161.1.100",
"url": "http://httpbin.org/get"
}

Max retries exceeded with URL in requests

I'm trying to get the content of App Store > Business:
import requests
from lxml import html
page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
    app = tree.xpath("//div[@class='column first']/ul/li/a/@href")
    ap = app[0]
    page1 = requests.get(ap)
When I try the range with (0,2) it works, but when I put the range in 100s it shows this error:
Traceback (most recent call last):
File "/home/preetham/Desktop/eg.py", line 17, in <module>
page1 = requests.get(ap)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
Just use requests features:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.get(url)
This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor will apply delays between attempts, to avoid failing again when hitting a periodic request quota.
Take a look at urllib3.util.retry.Retry, it has many options to simplify retries.
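For illustration, a slightly fuller sketch of those options (the values are examples, not recommendations):
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=5,                                     # overall cap on retries
    connect=3,                                   # retries for connection errors
    backoff_factor=0.5,                          # 0.5s, 1s, 2s, ... between attempts
    status_forcelist=[429, 500, 502, 503, 504],  # also retry on these status codes
)
adapter = HTTPAdapter(max_retries=retry)
session = requests.Session()
session.mount('http://', adapter)
session.mount('https://', adapter)
response = session.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")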
What happened here is that the iTunes server refuses your connection (you're sending too many requests from the same IP address in a short period of time).
Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8
The error trace is misleading; it should be something like "No connection could be made because the target machine actively refused it".
There is an issue about the python-requests lib on GitHub, check it out here.
To overcome this issue (not so much an issue as a misleading debug trace) you should catch connection-related exceptions like so:
try:
    page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
    r.status_code = "Connection refused"
Another way to overcome this problem is to leave a large enough time gap between requests to the server; this can be achieved with the sleep(timeinsec) function in Python (don't forget to import sleep):
from time import sleep
All in all requests is awesome python lib, hope that solves your problem.
Just do this,
Paste the following code in place of page = requests.get(url):
import time
page = ''
while page == '':
    try:
        page = requests.get(url)
        break
    except:
        print("Connection refused by the server..")
        print("Let me sleep for 5 seconds")
        print("ZZzzzz...")
        time.sleep(5)
        print("Was a nice sleep, now let me continue...")
        continue
You're welcome :)
I got a similar problem, but the following code worked for me.
url = <some REST url>
page = requests.get(url, verify=False)
"verify=False" disables SSL verification. Try and catch can be added as usual.
pip install pyopenssl seemed to solve it for me.
https://github.com/requests/requests/issues/4246
Specifying the proxy in a corporate environment solved it for me.
page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})
The full error is:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
It is always good to implement exception handling. It not only helps to avoid an unexpected exit of the script but can also help to log errors and info notifications. When using Python requests I prefer to catch exceptions like this:
try:
    res = requests.get(adress, timeout=30)
except requests.ConnectionError as e:
    print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
    print(str(e))
    renewIPadress()
    continue
except requests.Timeout as e:
    print("OOPS!! Timeout Error")
    print(str(e))
    renewIPadress()
    continue
except requests.RequestException as e:
    print("OOPS!! General Error")
    print(str(e))
    renewIPadress()
    continue
except KeyboardInterrupt:
    print("Someone closed the program")
Here renewIPadress() is a user-defined function which can change the IP address if it gets blocked. You can go without this function.
Adding my own experience for those who are experiencing this in the future. My specific error was
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'
It turns out that this was actually because I had reached the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
When I was writing a Selenium browser test script, I encountered this error when calling driver.quit() before a JS API call. Remember that quitting the webdriver is the last thing to do!
I wasn't able to make it work on Windows even after installing pyopenssl and trying various Python versions (while it worked fine on Mac), so I switched to urllib and it works on Python 3.6 (from python.org) and 3.7 (Anaconda):
import urllib
from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)
Just import time and add:
time.sleep(6)
somewhere in the for loop, to avoid sending too many requests to the server in a short time.
The number 6 means 6 seconds.
Keep testing numbers starting from 1 until you reach the minimum number of seconds that avoids the problem (see the sketch below).
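Applied to the loop from the question, the idea is roughly this (a sketch; it assumes app already holds the URLs to fetch):
import time

import requests

for ap in app:                 # 'app' is the list of URLs from the question
    page1 = requests.get(ap)
    time.sleep(6)              # pause between requests; tune the delay as described above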
It could also be a network configuration issue. In that case you need to reconfigure your network settings.
For Ubuntu:
sudo vim /etc/network/interfaces
Add 8.8.8.8 to dns-nameservers and save it.
Restart your network: /etc/init.d/networking restart
Now try again.
Adding my own experience: I ran
r = requests.get(download_url)
when I tried to download a file specified in the URL.
The error was
HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I corrected it by adding verify=False to the call, as follows:
r = requests.get(download_url + filename, verify=False)
open(filename, 'wb').write(r.content)
Check your network connection. I had this and the VM did not have a proper network connection.
I had the same error when I ran the route in the browser, but it worked fine in Postman. The issue with mine was the / after the route, before the query string.
127.0.0.1:5000/api/v1/search/?location=Madina raised the error, and removing the / after search worked for me.
This happens when you send too many requests to the public IP address of https://itunes.apple.com. As you can see, it is caused by something that blocks or disallows access to the public IP address mapping of https://itunes.apple.com. One better solution is the following Python script, which determines the public IP address of any domain and writes that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple

ENDPOINT = 'https://anydomainname.example.com/'
ENDPOINT = 'https://itunes.apple.com/'


def get_public_ip() -> Tuple[str, str, str]:
    """
    Command to get public_ip address of host machine and endpoint domain

    Returns
    -------
    my_public_ip : str
        Ip address string of host machine.
    end_point_ip_address : str
        Ip address of endpoint domain host.
    end_point_domain : str
        domain name of endpoint.
    """
    # bash_command = """host myip.opendns.com resolver1.opendns.com | \
    #     grep "myip.opendns.com has" | awk '{print $4}'"""
    # bash_command = """curl ifconfig.co"""
    # bash_command = """curl ifconfig.me"""
    bash_command = """curl icanhazip.com"""
    my_public_ip = subprocess.getoutput(bash_command)
    my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
    end_point_domain = (
        ENDPOINT.replace("https://", "")
        .replace("http://", "")
        .replace("/", "")
    )
    end_point_ip_address = socket.gethostbyname(end_point_domain)
    return my_public_ip, end_point_ip_address, end_point_domain


def set_etc_host(ip_address: str, domain: str) -> str:
    """
    A function to write mapping of ip_address and domain name in /etc/hosts.
    Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build

    Parameters
    ----------
    ip_address : str
        IP address of the domain.
    domain : str
        domain name of endpoint.

    Returns
    -------
    str
        Message to identify success or failure of the operation.
    """
    bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
    output = subprocess.getoutput(bash_command)
    return output


if __name__ == "__main__":
    my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
    output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
    print("My public IP address:", my_public_ip)
    print("ENDPOINT public IP address:", end_point_ip_address)
    print("ENDPOINT Domain Name:", end_point_domain)
    print("Command output:", output)
You can call the above script before running your desired function :)
My situation is rather special. I tried the answers above and none of them worked. I suddenly wondered whether it had something to do with my Internet proxy: I'm in mainland China and can't access sites like Google without one. I turned off my Internet proxy and the problem was solved.
In my case, I am deploying some Docker containers inside the Python script and then calling one of the deployed services. The error is fixed when I add some delay before calling the service; I think it needs time to get ready to accept connections.
from time import sleep
#deploy containers
#get URL of the container
sleep(5)
response = requests.get(url,verify=False)
print(response.json())
First I ran the run.py file and then I ran the unit_test.py file; that worked for me.
Add headers for this request.
headers = {
    'Referer': 'https://itunes.apple.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
requests.get(ap, headers=headers)
I am coding a test with Gauge and I encountered this error as well; it was because I was trying to request an internal URL without activating the VPN.

Proxy using Twython

I keep getting this error every time I try running my code through a proxy. I have gone through every single link available on how to get my code running behind a proxy and am simply unable to get this done.
import twython
import requests
TWITTER_APP_KEY = 'key' #supply the appropriate value
TWITTER_APP_KEY_SECRET = 'key-secret'
TWITTER_ACCESS_TOKEN = 'token'
TWITTER_ACCESS_TOKEN_SECRET = 'secret'
t = twython.Twython(app_key=TWITTER_APP_KEY,
                    app_secret=TWITTER_APP_KEY_SECRET,
                    oauth_token=TWITTER_ACCESS_TOKEN,
                    oauth_token_secret=TWITTER_ACCESS_TOKEN_SECRET,
                    client_args = {'proxies': {'http': 'proxy.company.com:10080'}})
now if I do
t = twython.Twython(app_key=TWITTER_APP_KEY,
                    app_secret=TWITTER_APP_KEY_SECRET,
                    oauth_token=TWITTER_ACCESS_TOKEN,
                    oauth_token_secret=TWITTER_ACCESS_TOKEN_SECRET,
                    client_args = client_args)
print t.client_args
I get only a {}
and when I try running
t.update_status(status='See how easy this was?')
I get this problem :
Traceback (most recent call last):
File "<pyshell#40>", line 1, in <module>
t.update_status(status='See how easy this was?')
File "build\bdist.win32\egg\twython\endpoints.py", line 86, in update_status
return self.post('statuses/update', params=params)
File "build\bdist.win32\egg\twython\api.py", line 223, in post
return self.request(endpoint, 'POST', params=params, version=version)
File "build\bdist.win32\egg\twython\api.py", line 213, in request
content = self._request(url, method=method, params=params, api_call=url)
File "build\bdist.win32\egg\twython\api.py", line 134, in _request
response = func(url, **requests_args)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\sessions.py", line 377, in post
return self.request('POST', url, data=data, **kwargs)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\sessions.py", line 335, in request
resp = self.send(prep, **send_kwargs)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\sessions.py", line 438, in send
r = adapter.send(request, **kwargs)
File "C:\Python27\lib\site-packages\requests-1.2.3-py2.7.egg\requests\adapters.py", line 327, in send
raise ConnectionError(e)
ConnectionError: HTTPSConnectionPool(host='api.twitter.com', port=443): Max retries exceeded with url: /1.1/statuses/update.json (Caused by <class 'socket.gaierror'>: [Errno 11004] getaddrinfo failed)
I have searched everywhere. Tried everything that I possibly could. The only resources available were :
https://twython.readthedocs.org/en/latest/usage/advanced_usage.html#manipulate-the-request-headers-proxies-etc
https://groups.google.com/forum/#!topic/twython-talk/GLjjVRHqHng
https://github.com/fumieval/twython/commit/7caa68814631203cb63231918e42e54eee4d2273
https://groups.google.com/forum/#!topic/twython-talk/mXVL7XU4jWw
There were no topics I could find here (on Stack Overflow) either.
Please help. Hope someone replies. If you have already done this please help me with some code example.
Your code isn't using your proxy. The example shows you specified a proxy for plain HTTP, but your stack trace shows an HTTPSConnectionPool. Your local machine probably can't resolve external domains.
Try setting your proxy like this:
client_args = {'proxies': {'https': 'http://proxy.company.com:10080'}}
In combination with @t-8ch's answer (which is that you must use a proxy as he has defined it), you should also realize that as of this moment, requests (the underlying library of Twython) does not support proxying over HTTPS. This is a problem with requests underlying library urllib3. It's a long running issue as far as I'm aware.
On top of that, reading a bit of Twython's source explains why t.client_args returns an empty dictionary. In short, if you were to instead print t.client.proxies, you'd see that indeed your proxies are being processed as they very well should be.
Finally, complaining about your workplace while on StackOverflow and linking to GitHub commits that have your GitHub username (and real name) associated with them in the comments is not the best idea. StackOverflow is indexed quite thoroughly by Google and there is little doubt that someone else might find this and associate it with you as easily as I have. On top of that, that commit has absolutely no effect on Twython's current behaviour. You're running down a rabbit hole with no end by chasing the author of that commit.
It looks like a domain name lookup failed. Assuming your configured DNS server can resolve Twitter's domain name (and surely it can), I would presume your DNS lookup for proxy.company.com failed. Try using a proxy by IP address instead of by hostname.
