I've been using the requests library in some Python code, and I need some help navigating the murky waters of corporate proxy servers.
Consider the following code:
response = requests.get(url, proxies={...})
All good so far. The requests call uses the proxies passed to it.
In the next example, requests uses proxies defined in the environment variables HTTP_PROXY and HTTPS_PROXY:
response = requests.get(url)
(The proxies parameter defaults to None, which triggers it to go and look at the environment variables.)
What I want to do is something different. I want to leave the environment variables as they are, because they're necessary for other applications I use. But I want requests NOT to use a proxy. I've tried:
response = requests.get(url, proxies={})
but requests still goes off and gets the environment variable proxies. I can't seem to stop it doing that, without unsetting my environment variables.
Any ideas?
Set the trust_env attribute on the session to False. When it is False (the default is True), proxy information from the environment is ignored altogether:
import requests

session = requests.Session()
session.trust_env = False
response = session.get(url)
This also disables .netrc authentication support. If you still need that, then you have two more options that I can see:
Add a NO_PROXY environment variable; setting it to * means no proxies should be used at all. You could do this by setting the key directly in the os.environ dictionary.
Simply delete the proxy keys from os.environ. (Both approaches are sketched below.)
Take into account that on OS X and Windows, Python will look for proxies in the system configuration too (so the registry on Windows, and the System Configuration framework on Mac OS X).
Altering os.environ is safe: it is a regular dictionary, so adding or deleting keys in your program is fine, and the parent shell's environment won't be altered.
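For illustration, here is a minimal sketch of both approaches (either one on its own is enough):
import os
import requests

# Option 1: a NO_PROXY value of * tells requests to bypass proxies for every host
os.environ['NO_PROXY'] = '*'

# Option 2: drop the proxy variables from this process's environment entirely
for var in ('HTTP_PROXY', 'HTTPS_PROXY', 'http_proxy', 'https_proxy'):
    os.environ.pop(var, None)

response = requests.get(url)  # url is a placeholder, as in the question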
Related
Up until now, whenever I have needed to store a "secret" for a simple python application, I have relied on environment variables. In Windows, I set the variables via the Computer Properties dialog and I access them in my Python code like this:
database_password = os.environ['DB_PASS']
The simplicity of this approach has served me well. Now I have a project that uses Oauth2 authentication and I have a need to store tokens to the environment that may change throughout program execution. I want them to persist the next time I execute the program. This is what I have come up with:
#fetch a new token
token = oauth.fetch_token('https://api.example.com/oauth/v2/token', code=secretcode)
access_token = token['access_token']
#make sure it persists in the current session
os.environ['TOKEN'] = access_token
#store to the system environment (Windows)
cmd = 'SETX /M TOKEN ' + access_token
os.system(cmd)
It gets the job done quickly for me today, but does not seem like the right approach to add to my toolbox. Does anyone have a more elegant way of doing what I am trying to do that does not add too many layers of complexity? If the solution worked across platforms that would be a bonus.
I have used the Python keyring module with great success. It's an interface to the credential vaults provided by the operating system (e.g., Windows Credential Manager). I haven't used it on Linux, but it appears to be supported as well.
Storing a password/token and then retrieving it can be as simple as:
import keyring
keyring.set_password("system", "username", "password")
keyring.get_password("system", "username")
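Applied to the OAuth token from the question, a minimal sketch might look like this (the service and account names are arbitrary placeholders, not anything keyring requires):
import keyring

SERVICE = "my_oauth_app"      # arbitrary service name (placeholder)
ACCOUNT = "api.example.com"   # arbitrary account name (placeholder)

# after fetching the token
keyring.set_password(SERVICE, ACCOUNT, access_token)

# on the next run
access_token = keyring.get_password(SERVICE, ACCOUNT)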
The situation
If the following is not done, all outgoing HTTP or HTTPS requests made with Python end in a WinError 10054 Connection Reset, or an SSL bad handshake error.
Set the HTTP_PROXY and HTTPS_PROXY environment variables, or their lowercase counterparts.
Anything that needs SSL verification must be verified against a custom .crt file.
For example, assuming the .crt file is in place, the first two calls below get me a 200 OK:
import os
import requests

os.environ['HTTP_PROXY'] = '...'   # some appropriate address
os.environ['HTTPS_PROXY'] = '...'  # some appropriate address

requests.get('http://www.google.com', verify=r"C:\the_file.crt")  # 200 OK
requests.get('http://httpbin.org', verify=False)  # 200 OK, but unsafe
requests.get('http://httpbin.org')  # SSL bad handshake error
The Problem
I have a massive jumble of pre-written code (heavily using urllib3 and requests, and possibly other internet-accessing libraries), and I have to make it work under the conditions outlined above.
Sure, I can write verify=r'C:\the_file.crt' for every requests.get(), but that can very quickly get hairy, right? And the code may also be using some other library (that is not requests). So I am looking for a global setting (an environment variable, etc.) I should alter, so that everything works (returns a 200 OK on a GET request to a server, whether or not the code is written with requests).
Also, if there is no such way, I would like an explanation as to why.
What I tried (am trying)
Maybe editing the .condarc file (via conda config) is a solution. I tried, to no avail: Python gives me an "SSL verification failed" error. In contrast, note that the code snippet above gave me a 200 OK. To my knowledge, this does not fit nicely with many situations that were previously discussed on Stack Overflow.
By the way, setting ssl_verify to false does not solve the problem either; I still get a bad handshake error for some reason.
I am using Win 10, Python 3.7.4 (Anaconda).
Update
I have edited the question to prevent future misunderstandings about the content of this question. A few answers below are a reiteration of what was written here from the start.
The current answers are not entirely satisfactory either, as they only seem to address the case where I am using requests or urllib3.
You should be able to get any Python code that uses the requests module (which builds on urllib3) to work behind a proxy, without modifying the Python code itself, by setting the following environment variables in Windows.
http_proxy http://[<user>:<pwd>@]<http_host>:<http_port>
https_proxy http://[<user>:<pwd>@]<https_host>:<https_port>
requests_ca_bundle <path_to_ca_bundle.crt>
curl_ca_bundle <path_to_ca_bundle.crt>
You can set environment variables by doing the following:
Press Windows-Key + R, enter sysdm.cpl ,3 (mind the space before the comma) and press Enter
Click the Environment variables button
In either of the fields (User variables or System variables), add the four variables
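If you cannot (or don't want to) touch the system-wide settings, a rough equivalent is to set the same variables for the current process before any of the pre-written code runs. This is only a sketch; the proxy address and bundle path are placeholders:
import os

os.environ['HTTP_PROXY'] = 'http://proxy.example.com:8080'   # placeholder proxy address
os.environ['HTTPS_PROXY'] = 'http://proxy.example.com:8080'  # placeholder proxy address
os.environ['REQUESTS_CA_BUNDLE'] = r'C:\the_file.crt'        # CA bundle picked up by requests
os.environ['CURL_CA_BUNDLE'] = r'C:\the_file.crt'            # fallback bundle variable, also checked by requests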
According to the proxies section of the Requests documentation:
https://requests.readthedocs.io/en/master/user/advanced/#proxies
you can use a proxy this way:
proxies = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',
}
requests.get('http://example.org', proxies=proxies)
Then, depending on whether you want to use a .crt or a .pem file:
requests.get('https://kennethreitz.com', cert=('/path/server.crt', '/path/key'))
requests.get('https://kennethreitz.org', cert='/path/client.pem')
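If you want these settings to apply to every call in a piece of requests-based code without editing each requests.get(), one option is to put them on a Session. This is only a sketch; the proxy addresses and certificate path are placeholders:
import requests

session = requests.Session()
session.proxies = {'http': 'http://10.10.1.10:3128', 'https': 'http://10.10.1.10:1080'}
session.verify = r'C:\the_file.crt'  # CA bundle used for every request made on this session

response = session.get('https://httpbin.org/get')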
https://2.python-requests.org//en/v1.0.4/user/advanced/
You are trying to make HTTPS requests to an external URL, and you need to provide the proper certificate files for verification. You are trying to make these configurations inside each component. I would suggest instead that you make those configurations globally and system-wide, so that none of the components needs to provide certificates and deal with SSL verification.
I am awful at Windows-related networking configuration, but I would suggest you check out Proxifier; I am pretty sure you can configure an SSL proxy with the proper certificates there.
I'm trying to use the salesforce-python-toolkit to make web services calls to the Salesforce API, however I'm having trouble getting the client to go through a proxy. Since the toolkit is based on top of suds, I tried going down to use just suds itself to see if I could get it to respect the proxy setting there, but it didn't work either.
This is tested on suds 0.3.9 on both OS X 10.7 (Python 2.7) and Ubuntu 12.04.
An example request I've made that did not end up going through the proxy (with just Burp or Charles Proxy running locally):
import suds.client

ws = suds.client.Client('file://sandbox.xml', proxy={'http': 'http://localhost:8888'})
ws.service.login('user', 'pass')
I've tried various things with the proxy - dropping http://, using an IP, using a FQDN. I've stepped through the code in pdb and see it setting the proxy option. I've also tried instantiating the client without the proxy and then setting it with:
ws.set_options(proxy={'http':'http://localhost:8888'})
Is proxy not used by suds any longer? I don't see it listed directly here http://jortel.fedorapeople.org/suds/doc/suds.options.Options-class.html, but I do see it under transport. Do I need to set it differently through a transport? When I stepped through in pdb it did look like it was using a transport, but I'm not sure how.
Thank you!
I went into #suds on freenode and Xelnor/rbarrois provided a great answer! Apparently the custom mapping in suds overrides urllib2's behavior for using the system configuration environment variables. This solution now relies on having the http_proxy/https_proxy/no_proxy environment variables set accordingly.
I hope this helps anyone else running into issues with proxies and suds (or other libraries that use suds). https://gist.github.com/3721801
import suds.client
from suds.transport.http import HttpTransport as SudsHttpTransport


class WellBehavedHttpTransport(SudsHttpTransport):
    """HttpTransport which properly obeys the ``*_proxy`` environment variables."""

    def u2handlers(self):
        """Return a list of specific handlers to add.

        The urllib2 logic regarding ``build_opener(*handlers)`` is:
        - It has a list of default handlers to use
        - If a subclass or an instance of one of those default handlers is given
          in ``*handlers``, it overrides the default one.

        Suds uses a custom {'protocol': 'proxy'} mapping in self.proxy, and adds
        a ProxyHandler(self.proxy) to that list of handlers.
        This overrides the default behaviour of urllib2, which would otherwise
        use the system configuration (environment variables on Linux, System
        Configuration on Mac OS, ...) to determine which proxies to use for
        the current protocol, and when not to use a proxy (no_proxy).

        Thus, passing an empty list will use the default ProxyHandler, which
        behaves correctly.
        """
        return []


client = suds.client.Client(my_wsdl, transport=WellBehavedHttpTransport())
I think you can do it by using a urllib2 opener, like below.
import urllib2
import suds.client
import suds.transport.http

t = suds.transport.http.HttpTransport()
proxy = urllib2.ProxyHandler({'http': 'http://localhost:8888'})
opener = urllib2.build_opener(proxy)
t.urlopener = opener
ws = suds.client.Client('file://sandbox.xml', transport=t)
I was actually able to get it working by doing two things:
making sure there were keys in the proxy dict for http as well as https.
setting the proxy using set_options AFTER creation of the client.
So, my relevant code looks like this:
self.suds_client = suds.client.Client(wsdl)
self.suds_client.set_options(proxy={'http': 'http://localhost:8888', 'https': 'http://localhost:8888'})
I had multiple issues using suds; even though my proxy was configured properly, I could not connect to the endpoint WSDL. After spending significant time attempting to formulate a workaround, I decided to give soap2py a shot - https://code.google.com/p/pysimplesoap/wiki/SoapClient
Worked straight off the bat.
For anyone who's attempting cji's solution over HTTPS, you actually need to keep one of the handlers for basic authentication. I am also using Python 3.7, so urllib2 has been replaced with urllib.request.
from suds.transport.https import HttpAuthenticated as SudsHttpsTransport
from urllib.request import HTTPBasicAuthHandler


class WellBehavedHttpsTransport(SudsHttpsTransport):
    """HttpsTransport which properly obeys the ``*_proxy`` environment variables."""

    def u2handlers(self):
        """Return a list of specific handlers to add.

        The urllib2 logic regarding ``build_opener(*handlers)`` is:
        - It has a list of default handlers to use
        - If a subclass or an instance of one of those default handlers is given
          in ``*handlers``, it overrides the default one.

        Suds uses a custom {'protocol': 'proxy'} mapping in self.proxy, and adds
        a ProxyHandler(self.proxy) to that list of handlers.
        This overrides the default behaviour of urllib2, which would otherwise
        use the system configuration (environment variables on Linux, System
        Configuration on Mac OS, ...) to determine which proxies to use for
        the current protocol, and when not to use a proxy (no_proxy).

        Thus, passing an empty list (aside from the BasicAuthHandler)
        will use the default ProxyHandler, which behaves correctly.
        """
        return [HTTPBasicAuthHandler(self.pm)]
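As with the HTTP variant above, you would then pass this transport when constructing the client; my_wsdl is a placeholder for your WSDL URL, as in the earlier answer:
import suds.client

client = suds.client.Client(my_wsdl, transport=WellBehavedHttpsTransport())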
I am writing a Python app that needs to send and retrieve some information from the Internet. I would like to auto-detect the proxy settings (to avoid asking the user to set up the proxy configuration). It seems that urllib can do this on Windows and Mac OS X, but not on Unix/Linux.
I need/prefer to use the mechanize module instead of urllib/urllib2. (It is easier to handle data encoded as "multipart/form-data".)
Can the mechanize module auto-detect the proxy settings? If so, will it work on Windows, Mac OS X and Linux?
The following code does not work (I am behind a proxy on Linux), unless I uncomment the fourth line.
import mechanize
br = mechanize.Browser()
#br.set_proxies({'http': 'myproxy.com:3128'})
br.open('http://www.google.com')
response = br.geturl()
print response
I guess this means that mechanize can't auto-detect the proxy settings (or maybe I am doing something wrong).
How can I auto-detect the proxy setting on Linux (using python)?
EDIT: added on 9th September
I can confirm that mechanize auto-detects the proxy settings on Windows, but not on Linux.
As mru correctly pointed out, there is no standardized way under Linux to determine the proxy, so I guess the best solution is to check whether the user is on Linux and, in that case, try to get the proxy settings from the http_proxy environment variable, or from gconf (for GNOME), or from kioslaverc (KDE). If everything fails, I will ask the user to provide the correct proxy settings. (I think this is a fair solution because, on one hand, most Linux users know what a proxy is and, on the other hand, at least I tried to make things easier for them :-) )
One way is to check the HTTP_PROXY environment variable (that's the way wget checks if it has to use a proxy). The code could look like this for example:
import os
import mechanize

br = mechanize.Browser()
proxy = os.environ.get('HTTP_PROXY')
if proxy is not None:
    br.set_proxies({'http': proxy})
br.open('http://www.google.com')
response = br.geturl()
print response
But this won't work on Windows (I don't know about macOS, as it's UNIX-based).
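A variation that covers more platforms is to ask urllib for the proxies it detects (environment variables on Unix, the registry on Windows, System Configuration on Mac OS) and feed the result to mechanize. This is only a sketch, written in the same Python 2 style as the code above:
import urllib
import mechanize

br = mechanize.Browser()
# getproxies() reads the environment on Unix and the system settings on Windows/Mac OS
detected = urllib.getproxies()
proxies = {scheme: addr for scheme, addr in detected.items() if scheme in ('http', 'https')}
if proxies:
    br.set_proxies(proxies)
br.open('http://www.google.com')
print br.geturl()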
I am working on a web crawler [using Python].
The situation is, for example, that I am behind server-1 and I use a proxy setting to connect to the outside world. So in Python, using a proxy handler, I can fetch the URLs.
Now the thing is, I am building a crawler, so I cannot use only one IP [otherwise I will be blocked]. To solve this, I have a bunch of proxies I want to shuffle through.
My question is: this is a two-level proxy setup. I use one proxy to connect to the main server-1, and after that I want to shuffle through the other proxies. How can I achieve this?
Update: Sounds like you're looking to connect to proxy A and from there initiate HTTP connections via proxies B, C, D which are outside of A. You might look into the proxychains project, which says it can "tunnel any protocol via a user-defined chain of TOR, SOCKS 4/5, and HTTP proxies".
Version 3.1 is available as a package in Ubuntu Lucid. If it doesn't work directly for you, the proxychains source code may provide some insight into how this capability could be implemented for your app.
Original answer:
Check out the urllib2.ProxyHandler. Here is an example of how you can use several different proxies to open urls:
import random
import urllib2

# put the urls for all of your proxies in a list
proxies = ['http://localhost:8080/']

# construct your list of url openers which each use a different proxy
openers = []
for proxy in proxies:
    opener = urllib2.build_opener(urllib2.ProxyHandler({'http': proxy}))
    openers.append(opener)

# select a url opener randomly, round-robin, or with some other scheme
opener = random.choice(openers)
req = urllib2.Request(url)
res = opener.open(req)
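If you would rather rotate through the proxies round-robin instead of picking one at random, an itertools.cycle over the same list of openers is one simple way to do it. This is just a sketch; urls is a placeholder for the pages your crawler wants to fetch:
import itertools

rotation = itertools.cycle(openers)
for page_url in urls:  # urls is a placeholder list of pages to crawl
    opener = next(rotation)  # each request goes through the next proxy in the list
    res = opener.open(urllib2.Request(page_url))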
I recommend you take a look at CherryProxy. It lets you send a proxy request to an intermediate server (where CherryProxy is running), which then forwards your HTTP request to a proxy on a second-level machine (e.g. a Squid proxy on another server) for processing. Voila! A two-level proxy chain.
http://www.decalage.info/python/cherryproxy