When using urllib2 (and maybe urllib) on Windows, Python seems to magically pick up the authenticated proxy setting configured in Internet Explorer. However, it doesn't seem to check and process the Advanced settings "Exceptions" list.
Is there a way I can get it to process the exceptions list? Or ignore the IE proxy setting and apply my own proxy opener to address this issue?
I played with creating a proxy opener before, but couldn't get it to work. Here's what I managed to dig out, but I still don't see how/where to apply any exceptions, and I'm not even sure if this is right:
proxy_info = {
    'host': 'myproxy.com',
    'user': Username,
    'pass': Password,
    'port': 1080
}
http_str = "http://%(user)s:%(pass)s@%(host)s:%(port)d" % proxy_info
authInfo = urllib2.HTTPBasicAuthHandler()
authInfo.add_password()
proxy_dict = {'http': http_str}
proxyHandler = urllib2.ProxyHandler(proxy_dict)
# apply the handler to an opener
proxy_opener = urllib2.build_opener(proxyHandler, urllib2.HTTPHandler)
urllib2.install_opener(proxy_opener)
By default urllib2 gets the proxy settings from the environment (and, on Windows, from the registry, which is where the Internet Explorer settings live), which is why it is picking up the IE settings. This is very handy, because you don't need to set up authentication yourself.
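If you want to see exactly what was picked up, you can inspect the detected settings; a minimal sketch (Python 2):
import urllib

# On Windows this reflects the Internet Explorer/registry proxy settings,
# otherwise the http_proxy/https_proxy environment variables.
print urllib.getproxies()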
You can't apply exceptions the way you want to; the easiest way to handle this is to have two openers and decide which one to use depending on whether the domain is in your exception list or not.
Use the default opener for when you want to use the proxy, and one without a proxy for when you don't need it:
>>> no_proxy = urllib2.ProxyHandler({})
>>> opener = urllib2.build_opener(no_proxy)
>>> urllib2.install_opener(opener)
Edit:
Here's how I'd do it:
exclusion_list = ['http://www.google.com/', 'http://localhost/']
no_proxy = urllib2.ProxyHandler({})
no_proxy_opener = urllib2.build_opener(no_proxy)
default_proxy_opener = urllib2.build_opener()
url = 'http://www.example.com/'
if url in exclusion_list:
    opener = no_proxy_opener
else:
    opener = default_proxy_opener
page = opener.open(url)
print page
Your biggest problem will be matching the url to the exclusion list, but that's a whole new question.
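As a rough starting point, you could match on the hostname instead of the full URL; a sketch (exclusion_hosts and pick_opener are made-up names, and the openers are the ones built above):
import urlparse

exclusion_hosts = ['www.google.com', 'localhost']

def pick_opener(url):
    # compare only the hostname, so paths and query strings don't matter
    host = urlparse.urlparse(url).hostname
    if host in exclusion_hosts:
        return no_proxy_opener
    return default_proxy_opener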
I've looked through some of the other posts on this and I hope I'm not duplicating, but I'm stuck on a real head-scratcher with setting a proxy server for urllib2. I'm running the code below:
import urllib2
from sys import argv

file, site = argv
uri = 'https://' + site
http_proxy_server = "http://newyork.wonderproxy.com"
http_proxy_port = "11001"
http_proxy_user = "user"
http_proxy_passwd = "password"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)
proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
html = opener.open(uri).read()
print html, 'it opened!'
I'm running this against an IP info site, but try as I might the response always comes out with my non-proxy IP address. When I manually set my proxy through system settings I do get a different response, so I've confirmed it's not an issue with the proxy criteria itself.
Any help that could be offered would be much appreciated!
Well this is a bit silly, but I tried a different example and my connection is working fine now.
import urllib2

proxlist = ['minneapolis.wonderproxy.com', 'newyork.wonderproxy.com']
ports = [0, 1, 2, 3]

for prox in proxlist:
    for port in ports:
        proxy = urllib2.ProxyHandler({'http': 'http://user:password@%s:1100%s' % (prox, port)})
        auth = urllib2.HTTPBasicAuthHandler()
        opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
        urllib2.install_opener(opener)
        try:
            conn = urllib2.urlopen('http://www.howtofindmyipaddress.com/')
            return_str = conn.read()
            str_find = '<span style="font-size: 80px; color: #22BB22; font-family: Calibri,Arial;">'
            strt = return_str.find(str_find) + len(str_find)
            print prox, port, return_str[strt:return_str.find('</span', strt) - 1]
        except urllib2.URLError:
            print prox, port, 'That\'s a no go'
The only difference I can see is that the second one uses HTTPHandler as well as the ProxyHandler. Since I apparently have a solution I'm not too worried, but I would still be interested to know why I had this issue in the first place.
Your question sets the proxy URL to
http://user:password@http://newyork.wonderproxy.com:11001
which isn't valid. If you changed http_proxy_server to newyork.wonderproxy.com then your first solution might work better.
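For illustration, the corrected construction (reusing the variable names from the question) might look like:
http_proxy_server = "newyork.wonderproxy.com"  # host only, no scheme
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)
# -> 'http://user:password@newyork.wonderproxy.com:11001'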
I'm making HTTP requests with Python's urllib2 which go through a proxy.
proxy_handler = urllib2.ProxyHandler({'http': 'http://myproxy'})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
r = urllib2.urlopen('http://www.pbr.com')
I'd like to log all headers from this request. I know that using a standard HTTPHandler you can do:
handler = urllib2.HTTPHandler(debuglevel=1)
Is there something like this for ProxyHandler?
I'm pretty sure debuglevel isn't documented.
In practice, it's actually a feature of httplib that urllib2 just forwards along for convenience, so you don't have to pass lambda: httplib.HTTPConnection(debuglevel=1) in place of the default httplib.HTTPConnection as your HTTP object factory. So, you're unlikely to find anything similar in any of the other handlers.
But if you want to rely on an undocumented feature of the implementation, you're really going to need to read the source to see for yourself.
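That said, if all you need is the header traffic, passing an HTTPHandler with debuglevel=1 into the same opener as your ProxyHandler should be enough, since the proxied request still goes through the HTTP handler; a sketch reusing the proxy URL from the question:
import urllib2

proxy_handler = urllib2.ProxyHandler({'http': 'http://myproxy'})
debug_handler = urllib2.HTTPHandler(debuglevel=1)  # httplib prints request/response headers to stdout
opener = urllib2.build_opener(proxy_handler, debug_handler)
opener.open('http://www.pbr.com')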
At any rate, the obvious way to add debugging to any of the handlers is to subclass them and do it yourself. For example:
class LoggingProxyHandler(urllib2.ProxyHandler):
    def proxy_open(self, req, proxy, type):
        had_proxy = req.has_proxy()
        # ProxyHandler is an old-style class in Python 2, so call the parent explicitly
        response = urllib2.ProxyHandler.proxy_open(self, req, proxy, type)
        if not had_proxy and req.has_proxy():
            # log stuff here, for example:
            print 'request for %s routed via proxy %s' % (req.get_full_url(), req.get_host())
        return response
I'm relying on internal knowledge that ProxyHandler calls set_proxy on the request if it doesn't have one and needs one. It might be cleaner to instead examine the response… but you may not get all the information you want that way.
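As a usage sketch (the proxy URL is just a placeholder), the subclass drops in wherever a plain ProxyHandler would go:
handler = LoggingProxyHandler({'http': 'http://myproxy'})
opener = urllib2.build_opener(handler)
opener.open('http://www.pbr.com')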
My problem is that I want to save a file given by a URL.
Say the URL is something like 'http://www.somesitename.com/Something/filename.fileextension'
For example:
some_url = 'http://www.fordantitrust.com/files/python.pdf'
filename = 'myfile.pdf'
I want to download this file.
I know I can do it easily with urllib.urlretrieve(some_url, filename) as long as there is no proxy between your system and the requested URL.
I am behind a proxy, so each time I want to download this file I have to go through that proxy.
I don't know how to do this.
Any help is appreciated.
urllib.urlopen is deprecated since Python 2.6; use urllib2 instead. Generally, the proxy is handled by urllib2 transparently if a global proxy is set. If not, try using urllib2.ProxyHandler to set your proxy.
Sample code from the Python docs:
proxy_handler = urllib2.ProxyHandler({'http': 'http://www.example.com:3128/'})
proxy_auth_handler = urllib2.ProxyBasicAuthHandler()
proxy_auth_handler.add_password('realm', 'host', 'username', 'password')
opener = urllib2.build_opener(proxy_handler, proxy_auth_handler)
# This time, rather than install the OpenerDirector, we use it directly:
opener.open('http://www.example.com/login.html')
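To actually save the PDF from your example through that opener, something along these lines should work (reusing some_url and the filename from the question):
response = opener.open(some_url)
with open('myfile.pdf', 'wb') as f:
    f.write(response.read())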
I have this program that checks a website, and I want to know how I can check it via a proxy in Python...
This is the code, just as an example:
while True:
    try:
        h = urllib.urlopen(website)
        break
    except:
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)
By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:
$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py # Using http://myproxy.example.com:1234 as a proxy
If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:
proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?
candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except:
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
Python 3 is slightly different here. It will try to auto-detect proxy settings, but if you need specific or manual proxy settings, think about this kind of code:
#!/usr/bin/env python3
import urllib.request

proxy_support = urllib.request.ProxyHandler({'http': 'http://user:pass@server:port',
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

with urllib.request.urlopen(url) as response:
    # ... implement things such as 'html = response.read()'
    pass
Refer also to the relevant section in the Python 3 docs
Here is example code showing how to use urllib to connect via a proxy:
import urllib.request

authinfo = urllib.request.HTTPBasicAuthHandler()
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.google.com/')
"""
For http and https use:
proxies = {'http': 'http://proxy-source-ip:proxy-port',
           'https': 'https://proxy-source-ip:proxy-port'}
More proxies can be added similarly:
proxies = {'http': 'http://proxy1-source-ip:proxy-port',
           'http': 'http://proxy2-source-ip:proxy-port',
           ...
           }
Usage:
filehandle = urllib.urlopen(external_url, proxies=proxies)
To not use any proxies (e.g. for links within your network):
filehandle = urllib.urlopen(external_url, proxies={})
To use proxy authentication via username and password:
proxies = {'http': 'http://username:password@proxy-source-ip:proxy-port',
           'https': 'https://username:password@proxy-source-ip:proxy-port'}
Note: avoid using special characters such as : and @ in usernames and passwords
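If your credentials do contain such characters, one common workaround is to percent-encode them before building the proxy URL; a small sketch using urllib.quote (the host and port are placeholders):
import urllib

username = urllib.quote('user@name', safe='')
password = urllib.quote('p@ss:word', safe='')
proxies = {'http': 'http://%s:%s@proxy-source-ip:proxy-port' % (username, password)}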
What's the best way to specify a proxy with username and password for an http connection in python?
This works for me:
import urllib2
proxy = urllib2.ProxyHandler({'http': 'http://username:password@proxyurl:proxyport'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')
return_str = conn.read()
Use this:
import requests
proxies = {"http":"http://username:password#proxy_ip:proxy_port"}
r = requests.get("http://www.example.com/", proxies=proxies)
print(r.content)
I think it's much simpler than using urllib. I don't understand why people love using urllib so much.
Set an environment variable named http_proxy like this: http://username:password@proxy_url:port
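For example, roughly (set it before the first request, or export it in the shell before starting Python):
import os

# urllib/urllib2 pick this up automatically when no explicit proxy handler is installed
os.environ['http_proxy'] = 'http://username:password@proxy_url:port'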
The best way of going through a proxy that requires authentication is to use urllib2 to build a custom URL opener, then use it to make all the requests you want to go through the proxy. Note in particular that you probably don't want to embed the proxy password in the URL or the Python source code (unless it's just a quick hack).
import urllib2
def get_proxy_opener(proxyurl, proxyuser, proxypass, proxyscheme="http"):
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxyurl, proxyuser, proxypass)
    proxy_handler = urllib2.ProxyHandler({proxyscheme: proxyurl})
    proxy_auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)
    return urllib2.build_opener(proxy_handler, proxy_auth_handler)

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 4:
        url_opener = get_proxy_opener(*sys.argv[1:4])
        for url in sys.argv[4:]:
            print url_opener.open(url).headers
    else:
        print "Usage:", sys.argv[0], "proxy user pass fetchurls..."
In a more complex program, you can separate these components out as appropriate (for instance, only using one password manager for the lifetime of the application). The Python documentation has more examples of how to do complex things with urllib2 that you might also find useful.
Or if you want to install it, so that it is always used with urllib2.urlopen (so you don't need to keep a reference to the opener around):
import urllib2
url = 'www.proxyurl.com'
username = 'user'
password = 'pass'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# None, with the "WithDefaultRealm" password manager means
# that the user/pass will be used for any realm (where
# there isn't a more specific match).
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
print urllib2.urlopen("http://www.example.com/folder/page.html").read()
Here is the method using urllib.request:
import urllib.request
# set up authentication info
authinfo = urllib.request.HTTPBasicAuthHandler()
proxy_support = urllib.request.ProxyHandler({"http" : "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.python.org/')
"""