make any internet-accessing python code work (proxy + custom .crt)

The situation
Unless the following is done, all outgoing HTTP or HTTPS requests made with Python end in a WinError 10054 Connection Reset or an SSL bad handshake error:
set the HTTP_PROXY and HTTPS_PROXY environment variables (or their counterparts), and
verify whatever needs SSL verification against a custom .crt file.
For example, assuming the .crt file is in place, the first two calls below get me a 200 OK:
import os
import requests

os.environ['HTTP_PROXY'] = '<some appropriate address>'   # placeholder
os.environ['HTTPS_PROXY'] = '<some appropriate address>'  # placeholder

requests.get('http://www.google.com', verify=r"C:\the_file.crt")  # 200 OK
requests.get('http://httpbin.org', verify=False)  # 200 OK, but unsafe
requests.get('http://httpbin.org')  # SSL bad handshake error
The Problem
I have a massive jumble of pre-written code (heavily using urllib3 and requests, and possibly other internet-accessing libraries), and I have to make it work under the conditions outlined above.
Sure, I can write verify=r'C:\the_file.crt' for every requests.get(), but that can very quickly get hairy, right? And the code may also be using some other library (that is not requests). So I am looking for a global setting (an environment variable, etc.) I can alter so that everything works (returns a 200 OK upon a GET request to a server), whether or not the code is written with requests.
Also, if there is no such way, I would like an explanation as to why.
What I tried (am trying)
Maybe editing the .condarc file (via conda config) is a solution. I tried, to no avail: Python gives me an "SSL verification failed" error. In contrast, note that the code snippet above gave me a 200 OK. To my knowledge, this does not fit neatly with many situations previously discussed on Stack Overflow.
By the way, setting ssl_verify to false does not solve the problem either; I still get a bad handshake error for some reason.
I am using Win 10, Python 3.7.4 (Anaconda).
Update
I have edited the question to prevent future misunderstandings about the content of this question. A few answers below are a reiteration of what was written here from the start.
The current answers are not entirely satisfactory either, as they only seem to address the case where I am using requests or urllib3.

You should be able to get any Python code that uses the requests module (which bundles urllib3) to work behind a proxy, without modifying the Python code itself, by setting the following environment variables in Windows.
HTTP_PROXY http://[<user>:<pwd>@]<http_host>:<http_port>
HTTPS_PROXY http://[<user>:<pwd>@]<https_host>:<https_port>
REQUESTS_CA_BUNDLE <path_to_ca_bundle.crt>
CURL_CA_BUNDLE <path_to_ca_bundle.crt>
You can set environment variables by doing the following:
Press Windows-Key + R, enter sysdm.cpl ,3 (mind the space before the comma) and press Enter
Click the Environment variables button
In either section (User variables or System variables), add the four variables
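If you cannot (or would rather not) touch the machine-wide settings, here is a minimal sketch of the same idea done from inside Python; the proxy address and bundle path are placeholders, and the legacy code must be imported only after the variables are set, so that libraries reading them at import time pick them up.
import os

# Placeholder proxy address and CA bundle path; adjust to your environment.
proxy = "http://user:pwd@proxy.example.com:8080"
bundle = r"C:\the_file.crt"

os.environ["HTTP_PROXY"] = proxy
os.environ["HTTPS_PROXY"] = proxy
os.environ["REQUESTS_CA_BUNDLE"] = bundle  # read by requests for SSL verification
os.environ["CURL_CA_BUNDLE"] = bundle      # fallback also read by requests

import requests  # imported after the variables are set

print(requests.get("https://www.google.com").status_code)  # expect 200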

According to the proxies section of the Requests documentation:
https://requests.readthedocs.io/en/master/user/advanced/#proxies
you can use a proxy in this way:
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
Then, if you also need to present a client-side certificate (a .crt/key pair or a single combined .pem), pass it with cert:
requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
requests.get('https://kennethreitz.org', cert='/path/client.pem')
https://2.python-requests.org//en/v1.0.4/user/advanced/
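Putting the two together for the setup in the question, a short sketch (the proxy addresses are the documentation's examples and the bundle path is a placeholder):
import requests

proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}

# verify= points requests at the custom CA bundle for server verification;
# cert= would instead supply a client-side certificate if one is required.
r = requests.get('https://example.org',
                 proxies=proxies,
                 verify=r"C:\the_file.crt")
print(r.status_code)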

You are trying to make HTTPS requests to an external URL, and you need to provide the proper certificate files for verification. You are currently making these configurations inside each component. I would suggest making them globally and system-wide instead, so that none of the components needs to provide certificates and deal with SSL verification itself.
I am awful at Windows-related networking configuration, but I would suggest you check out Proxifier; I am pretty sure you can configure an SSL proxy with the proper certificates there.

Related

SSLError("bad handshake: Error([('SSL routines', 'tls_process_ske_dhe', 'dh key too small' in Python

I have seen a few links about this issue, and most people want the server to be updated for security reasons. I am making an internal-only tool and connecting to a server that cannot be modified. My code is below; I am hoping for clarity on how I can accept the small key and process the request.
Thank you all in advance
import requests
from requests.auth import HTTPBasicAuth
import warnings
warnings.filterwarnings("ignore")
requests.packages.urllib3.disable_warnings()
# Exclude DH cipher suites so the server picks a non-DH key exchange
# (note the leading ':' when appending to the cipher string).
requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS += ':HIGH:!DH:!aNULL'
# requests.packages.urllib3.contrib.pyopenssl.DEFAULT_SSL_CIPHER_LIST += ':HIGH:!DH:!aNULL'
url = "https://x.x.x.x/place/stuff"
userName = 'stuff'
passW = 'otherstuff'
dataR = requests.get(url,auth=HTTPBasicAuth(userName, passW),verify=False)
print(dataR.text)
The problem with too-small DH keys is discussed at length at https://weakdh.org, along with various remediations.
Now, in your case it depends on the OpenSSL that Python uses under the hood, which hardcodes the threshold at which too-small values are rejected.
Have a look at: How to reject weak DH parameters in an OpenSSL client?
Currently, OpenSSL in client mode aborts the handshake only if the key length of the server-selected DH parameters is less than 768 bits (hardcoded in the source).
Based on the answer there, you could use SSL_CTX_set_tmp_dh_callback and SSL_set_tmp_dh_callback to control things more to your liking... except that at the time it did not seem to work on the client side, only the server side.
Based on http://openssl.6102.n7.nabble.com/How-to-enforce-DH-field-size-in-the-client-td60442.html it seems that some work was done on this problem in the 1.1.0 branch. It also hints at commit 2001129f096d10bbd815936d23af3e97daf7882d in 1.0.2, so first maybe try a newer version of OpenSSL (you did not specify which versions you are using).
However, even if you manage to get everything working with OpenSSL, you still need your Python to use it (so probably compile Python yourself) and then have a specific API inside Python to act on it... To be honest, I think you will lose far less time fixing the service (even if you say you cannot modify it) than trying to cripple the client, since rejecting small keys is a good thing (for the reasons explained in the first link).
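If you still decide to work around it on the client, one commonly cited approach is to relax OpenSSL's security level through the cipher string on a custom SSL context and mount it on a requests adapter. This is only a sketch, not the answer's method: it assumes Python is linked against OpenSSL 1.1.x (where SECLEVEL is honored), and the adapter class name is made up.
import ssl

import requests
from requests.adapters import HTTPAdapter
from requests.auth import HTTPBasicAuth
from urllib3.poolmanager import PoolManager


class SmallDHAdapter(HTTPAdapter):
    """Adapter using an SSL context that tolerates small DH keys."""

    def init_poolmanager(self, connections, maxsize, block=False, **kwargs):
        ctx = ssl.create_default_context()
        # SECLEVEL=1 relaxes the minimum DH key size enforced by OpenSSL 1.1.x.
        ctx.set_ciphers("DEFAULT@SECLEVEL=1")
        # Mirror the verify=False usage from the question (self-signed server).
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
        kwargs["ssl_context"] = ctx
        self.poolmanager = PoolManager(num_pools=connections, maxsize=maxsize,
                                       block=block, **kwargs)


session = requests.Session()
session.mount("https://", SmallDHAdapter())
dataR = session.get("https://x.x.x.x/place/stuff",
                    auth=HTTPBasicAuth("stuff", "otherstuff"),
                    verify=False)
print(dataR.text)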

How can a CGI server based on CGIHTTPRequestHandler require that a script start its response with headers that include a `content-type`?

Later note: the issues in the original posting below have been largely resolved.
Here's the background: For an introductory comp sci course, students develop html and server-side Python 2.7 scripts using a server provided by the instructors. That server is based on CGIHTTPRequestHandler, like the one at pointlessprogramming. When the students' html and scripts seem correct, they port those files to a remote, slow Apache server. Why support two servers? Well, the initial development using a local server has the benefit of reducing network issues and dependency on the remote, weak machine that is running Apache. Eventually porting to the Apache-running machine has the benefit of publishing their results for others to see.
For the local development to be most useful, the local server should closely resemble the Apache server. Currently there is an important difference: Apache requires that a script start its response with headers that include a content-type; if the script fails to provide such a header, Apache sends the client a 500 error ("Internal Server Error"), which is too generic to help the students, who cannot use the server logs. CGIHTTPRequestHandler imposes no similar requirement. So it is common for a student to write header-free scripts that work with the local server, but get the baffling 500 error after copying files to the Apache server. It would be helpful to have a version of the local server that checks for a content-type header and gives a good error if there is none.
I seek advice about creating such a server. I am new to Python and to writing servers. Here are the issues that occur to me, but any helpful advice would be appreciated.
1. Is a content-type header required by the CGI standard? If so, other people might benefit from an answer to the main question here. Also, if so, I despair of finding a way to disable Apache's requirement. Maybe the relevant part of the CGI RFC is section 6.3.1 (CGI Response, Content-Type): "If an entity body is returned, the script MUST supply a Content-Type field in the response."
2. To make a local server that checks for the content-type header, perhaps I should subclass CGIHTTPServer.CGIHTTPRequestHandler, to override run_cgi() with a version that issues an error for a missing header. I am looking at CGIHTTPServer.py (__version__ = "0.4"), which was installed with Python 2.7.3. But run_cgi() does a lot of processing, so it is a little unappealing to copy all its code just to add a couple of calls to a header-checking routine. Is there a better way?
3. If the answer to (2) is something like "No, overriding run_cgi() is recommended," I anticipate writing a version that invokes the desired script, then checks the script's output for headers before that output is sent to the client. There are apparently two places in the existing run_cgi() where the script is invoked:
3a. When run_cgi() is executed on a non-Unix system, the script is executed using Python's subprocess module. As a result, the standard output from the script will be available as an in-memory string, which I can presumably check for headers before the call to self.wfile.write. Does this sound right?
3b. But when run_cgi() is executed on a *nix system, the script is executed by a forked process. I think the child's stdout will write directly to self.wfile (I'm a little hazy on this), so I see no opportunity for the code in run_cgi() to check the output. Ugh. Any suggestions?
4. If analyzing the script's output is recommended, is email.parser the standard way to recognize whether there is a content-type header? Is another standard module recommended instead? (A small sketch follows after this list.)
5. Is there a more appropriate forum for asking the main question ("How can a CGI server based on CGIHTTPRequestHandler require...")? It seems odd to ask if there is a better forum for asking programming questions than Stack Overflow, but I guess anything is possible.
Thanks for any help.
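For what it's worth, here is a minimal sketch of the header check from item 4 (Python 2.7); the script path and the way its output is captured are assumptions, not part of CGIHTTPServer itself.
import subprocess
from email.parser import Parser  # stdlib header parsing


def has_content_type(raw_output):
    """Return True if the CGI output starts with headers including Content-Type."""
    # CGI scripts separate headers from the body with a blank line.
    head, sep, _body = raw_output.partition("\r\n\r\n")
    if not sep:
        head, sep, _body = raw_output.partition("\n\n")
    headers = Parser().parsestr(head)
    return sep != "" and "content-type" in headers


# Hypothetical student script, run the way the subprocess branch of run_cgi() would.
raw = subprocess.check_output(["python", "cgi-bin/student_script.py"])
if not has_content_type(raw):
    print "Error: script did not emit a Content-Type header"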

Httplib2 ssl error

Today I faced one interesting issue.
I'm using the Foursquare-recommended Python library httplib2, which raises
SSLHandshakeError(SSLError(1, '_ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed'),)
while trying to request an OAuth token:
response, body = h.request(url, method, headers=headers, body=data)
in the _process_request_with_httplib2 function.
Does anyone know why this happens?
If you know that the site you're trying to get is a "good guy", you can try creating your "opener" like this:
import httplib2
if __name__ == "__main__":
    h = httplib2.Http(".cache", disable_ssl_certificate_validation=True)
    resp, content = h.request("https://site/whose/certificate/is/bad/", "GET")
(the interesting part is disable_ssl_certificate_validation=True)
From the docs:
http://bitworking.org/projects/httplib2/doc/html/libhttplib2.html#httplib2.Http
EDIT 01:
Since your question was actually why does this happen, you can check this or this.
EDIT 02:
Seeing how this answer has been visited by more people than I expected, I'd like to explain a bit when disabling certificate validation could be useful.
First, a bit of light background on how these certificates work. There's quite a lot of information in the links provided above, but here it goes, anyway.
The SSL certificates need to be verified by a well known (at least, well known to your browser) Certificate Authority. You usually buy the whole certificate from one of those authorities (Symantec, GoDaddy...)
Broadly speaking, the idea is: Those Certificate Authorities (CA) give you a certificate that also contains the CA information in it. Your browsers have a list of well known CAs, so when your browser receives a certificate, it will do something like: "HmmmMMMmmm.... [the browser makes a suspicious face here] ... I received a certificate, and it says it's verified by Symantec. Do I know that "Symantec" guy? [the browser then goes to its list of well known CAs and checks for Symantec] Oh, yeah! I do. Ok, the certificate is good!"
You can see that information yourself if you click on the little lock by the URL in your browser.
However, there are cases in which you just want to test the HTTPS, and you create your own Certificate Authority using a couple of command line tools and you use that "custom" CA to sign a "custom" certificate that you just generated as well, right? In that case, your browser (which, by the way, in the question is httplib2.Http) is not going to have your "custom" CA among the list of trusted CAs, so it's going to say that the certificate is invalid. The information is still going to travel encrypted, but what the browser is telling you is that it doesn't fully trust that is traveling encrypted to the place you are supposing it's going.
For instance, let's say you created a set of custom keys and CAs and all the mumbo-jumbo following this tutorial for your localhost FQDN. You could very well have a server running on https://localhost:4443 using your custom certificates and whatnot. Now, your CA certificate file is located in the current directory, in the file ./ca.crt (the same directory your Python script is going to run in). You could use httplib2 like this:
h = httplib2.Http(ca_certs='./ca.crt')
response, body = h.request('https://localhost:4443')
print(response)
print(body)
... and you wouldn't see the warning anymore. Why? Because you told httplib2 to go look for the CA's certificate at ./ca.crt.
However, since Chrome (to cite a browser) doesn't know about this CA's certificate, it will consider it invalid.
Also, certificates expire. There's a chance you are working in a company which uses an internal site with SSL encryption. It works ok for a year, and then your browser starts complaining. You go to the person that is in charge of the security, and ask "Yo!! I get this warning here! What's happening?" And the answer could very well be "Oh boy!! I forgot to renew the certificate! It's ok, just accept it from now, until I fix that." (true story, although there were swearwords in the answer I received :-D )
Recent versions of httplib2 default to their own bundled certificate store.
# Default CA certificates file bundled with httplib2.
CA_CERTS = os.path.join(
    os.path.dirname(os.path.abspath(__file__)), "cacerts.txt")
If you're using Ubuntu/Debian, you can explicitly pass the path to the system certificate file, like:
httplib2.HTTPSConnectionWithTimeout(HOST, ca_certs="/etc/ssl/certs/ca-certificates.crt")
Maybe this could be the case:
I had the same problem, and while debugging the Google library I found out that the reason was an older version of httplib2 (0.9.2). When I updated to the most recent version (0.14.0), it worked.
If you have already installed the most recent version, make sure no other library is pulling in an older httplib2 among its dependencies.
When you see this error with a self-signed certificate, as often happens behind a corporate proxy, you can point httplib2 to your custom certificate bundle using an environment variable, for example when you don't want to (or can't) modify the code to pass the ca_certs parameter.
You can also do this when you don't want to modify the system certificate store to append your CA certificate.
export HTTPLIB2_CA_CERTS="\path\to\your\CA_certs_bundle"
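A small sketch of doing the same from inside Python (the bundle path is a placeholder); recent httplib2 versions read this variable when choosing their default certificate store, so set it before importing httplib2:
import os

os.environ["HTTPLIB2_CA_CERTS"] = r"C:\path\to\CA_certs_bundle.crt"  # placeholder path

import httplib2  # imported after the variable is set

h = httplib2.Http()
resp, content = h.request("https://example.com/", "GET")
print(resp.status)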

Suds ignoring proxy setting

I'm trying to use the salesforce-python-toolkit to make web services calls to the Salesforce API, however I'm having trouble getting the client to go through a proxy. Since the toolkit is based on top of suds, I tried going down to use just suds itself to see if I could get it to respect the proxy setting there, but it didn't work either.
This is tested on suds 0.3.9 on both OS X 10.7 (python 2.7) and ubuntu 12.04.
An example request I've made that did not end up going through the proxy (just Burp or Charles Proxy running locally):
import suds
ws = suds.client.Client('file://sandbox.xml',proxy={'http':'http://localhost:8888'})
ws.service.login('user','pass')
I've tried various things with the proxy - dropping http://, using an IP, using a FQDN. I've stepped through the code in pdb and see it setting the proxy option. I've also tried instantiating the client without the proxy and then setting it with:
ws.set_options(proxy={'http':'http://localhost:8888'})
Is proxy not used by suds any longer? I don't see it listed directly here http://jortel.fedorapeople.org/suds/doc/suds.options.Options-class.html, but I do see it under transport. Do I need to set it differently through a transport? When I stepped through in pdb it did look like it was using a transport, but I'm not sure how.
Thank you!
I went into #suds on freenode and Xelnor/rbarrois provided a great answer! Apparently the custom mapping in suds overrides urllib2's behavior for using the system configuration environment variables. This solution now relies on having the http_proxy/https_proxy/no_proxy environment variables set accordingly.
I hope this helps anyone else running into issues with proxies and suds (or other libraries that use suds). https://gist.github.com/3721801
from suds.transport.http import HttpTransport as SudsHttpTransport


class WellBehavedHttpTransport(SudsHttpTransport):
    """HttpTransport which properly obeys the ``*_proxy`` environment variables."""

    def u2handlers(self):
        """Return a list of specific handlers to add.

        The urllib2 logic regarding ``build_opener(*handlers)`` is:
        - It has a list of default handlers to use
        - If a subclass or an instance of one of those default handlers is given
          in ``*handlers``, it overrides the default one.

        Suds uses a custom {'protocol': 'proxy'} mapping in self.proxy, and adds
        a ProxyHandler(self.proxy) to that list of handlers.
        This overrides the default behaviour of urllib2, which would otherwise
        use the system configuration (environment variables on Linux, System
        Configuration on Mac OS, ...) to determine which proxies to use for
        the current protocol, and when not to use a proxy (no_proxy).

        Thus, passing an empty list will use the default ProxyHandler which
        behaves correctly.
        """
        return []


client = suds.client.Client(my_wsdl, transport=WellBehavedHttpTransport())
I think you can do it by using a urllib2 opener, like below:
import urllib2
import suds

t = suds.transport.http.HttpTransport()
proxy = urllib2.ProxyHandler({'http': 'http://localhost:8888'})
opener = urllib2.build_opener(proxy)
t.urlopener = opener
ws = suds.client.Client('file://sandbox.xml', transport=t)
I was actually able to get it working by doing two things:
making sure there were keys in the proxy dict for http as well as https.
setting the proxy using set_options AFTER creation of the client.
So, my relevant code looks like this:
self.suds_client = suds.client.Client(wsdl)
self.suds_client.set_options(proxy={'http': 'http://localhost:8888', 'https': 'http://localhost:8888'})
I had multiple issues using suds: even though my proxy was configured properly, I could not connect to the endpoint WSDL. After spending significant time attempting to formulate a workaround, I decided to give soap2py (pysimplesoap) a shot: https://code.google.com/p/pysimplesoap/wiki/SoapClient
It worked straight off the bat.
For anyone who's attempting cji's solution over HTTPS, you actually need to keep one of the handlers for basic authentication. I am also using Python 3.7, so urllib2 has been replaced with urllib.request.
from suds.transport.https import HttpAuthenticated as SudsHttpsTransport
from urllib.request import HTTPBasicAuthHandler


class WellBehavedHttpsTransport(SudsHttpsTransport):
    """HttpsTransport which properly obeys the ``*_proxy`` environment variables."""

    def u2handlers(self):
        """Return a list of specific handlers to add.

        The urllib2 logic regarding ``build_opener(*handlers)`` is:
        - It has a list of default handlers to use
        - If a subclass or an instance of one of those default handlers is given
          in ``*handlers``, it overrides the default one.

        Suds uses a custom {'protocol': 'proxy'} mapping in self.proxy, and adds
        a ProxyHandler(self.proxy) to that list of handlers.
        This overrides the default behaviour of urllib2, which would otherwise
        use the system configuration (environment variables on Linux, System
        Configuration on Mac OS, ...) to determine which proxies to use for
        the current protocol, and when not to use a proxy (no_proxy).

        Thus, passing an empty list (aside from the BasicAuthHandler)
        will use the default ProxyHandler which behaves correctly.
        """
        return [HTTPBasicAuthHandler(self.pm)]

Inexplicable urllib2 problem between virtualenvs

I have some test code (as a part of a webapp) that uses urllib2 to perform an operation I would usually perform via a browser:
Log in to a remote website
Move to another page
Perform a POST by filling in a form
I've created 4 separate, clean virtualenvs (with --no-site-packages) on 3 different machines, all with different versions of Python but the exact same packages (via a pip requirements file), and the code only works in the two virtualenvs on my local development machine (2.6.1 and 2.7.2); it won't work on either of my production VPSs.
In the failing cases, I can log in successfully, move to the correct page but when I submit the form, the remote server replies telling me that there has been an error - it's an application server error page ('we couldn't complete your request') and not a webserver error.
because I can successfully log in and maneuver to a second page, this doesn't seem to be a session or a cookie problem - it's particular to the final POST
because I can perform the operation on a particular machine with the EXACT same headers and data, this doesn't seem to be a problem with what I am requesting/posting
because I am trying the code on two separate VPSs rented from different companies, this doesn't seem to be a problem with the VPS physical environment
because the code works on 2 different Python versions, I can't imagine it being an incompatibility problem
I'm completely lost at this stage as to why this wouldn't work. I've even 'turned-it-off-and-turn-it-on-again' because I just can't see what the problem could be.
I think it has to be something to do with the final POST coming from a VPS that the remote server doesn't like, but I can't figure out what that could be. I feel like there is something going on under the hood of urllib2 that is causing the remote server to dislike the request.
EDIT
I've installed the exact same Python version (2.6.1) on the VPS as on my working local copy and it still doesn't work remotely, so it must be something to do with originating from a VPS. How could this affect the HTTP request? Is it something lower level?
You might try setting debuglevel=1 for urllib2 and see what it comes up with:
import urllib2
h=urllib2.HTTPHandler(debuglevel=1)
opener = urllib2.build_opener(h)
...
This is a total shot in the dark, but are your VPSs 64-bit and your home computer 32-bit, or vice versa? Maybe a difference in default sizes or accuracies of something could be freaking out the server.
Barring that, can you try to find out any information on the software stack the web server is using?
I had similar issues with urllib2 (working with Zimbra's REST API); in the end I switched to pycurl with success.
PS
For operations like login/navigate/post, I usually find Mechanize useful and easier to use. Maybe you can give it a shot.
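For reference, a rough sketch of the login/navigate/post flow with Mechanize; the URLs, form indices, and field names here are placeholders, not the asker's actual site:
import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)

# Log in to the remote website.
br.open("https://example.com/login")
br.select_form(nr=0)          # first form on the page
br["username"] = "user"
br["password"] = "secret"
br.submit()

# Move to the page with the target form and POST it.
br.open("https://example.com/target-form")
br.select_form(nr=0)
br["some_field"] = "some value"
response = br.submit()
print response.code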
Well, it looks like I know why the problem was happening, but I'm not 100% sure of the reason for it.
I simply had to make the script wait (time.sleep()) after it sent the 2nd request (move to another page) before doing the 3rd request (perform a POST by filling in a form).
I don't know if it is because of a condition on the 3rd-party server, or if it's some sort of odd issue with urllib2. The reason it seemed to work on my development machine is presumably that it was slower than the server at running the code.
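A rough sketch of that pacing workaround with urllib2; the URLs, form fields, and the two-second pause are placeholders:
import time
import urllib
import urllib2
from cookielib import CookieJar

opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(CookieJar()))

# 1. Log in to the remote website.
opener.open("https://example.com/login",
            urllib.urlencode({"user": "me", "pass": "secret"}))

# 2. Move to the page with the form.
opener.open("https://example.com/form")

# Give the application server a moment before the final POST.
time.sleep(2)

# 3. Perform the POST by filling in the form.
response = opener.open("https://example.com/form",
                       urllib.urlencode({"field": "value"}))
print response.read()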
