I have this program that checks a website, and I want to know how I can check it via a proxy in Python...
Here is the code, just as an example:
import urllib
import time

# 'website' is assumed to be defined earlier in the program
while True:
    try:
        h = urllib.urlopen(website)
        break
    except:
        print '[' + time.strftime('%Y/%m/%d %H:%M:%S') + '] ERROR. Trying again in a few seconds...'
        time.sleep(5)
By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:
$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py # Using http://myproxy.example.com:1234 as a proxy
If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:
proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?
candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except:
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
Python 3 is slightly different here. It will try to auto-detect proxy settings, but if you need specific or manual proxy settings, consider code like this:
#!/usr/bin/env python3
import urllib.request
proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port',
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
with urllib.request.urlopen(url) as response:
    html = response.read()  # ... process the response as needed
Refer also to the relevant section in the Python 3 docs.
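Regarding the auto-detection mentioned above: if you just want to see which proxies urllib would pick up, getproxies() is a quick way to check. A minimal sketch:

import urllib.request

# Print the proxy settings urllib auto-detects from the environment
# (http_proxy, https_proxy, no_proxy and, on some platforms, system settings)
print(urllib.request.getproxies())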
Here is example code showing how to use urllib to connect via a proxy:
authinfo = urllib.request.HTTPBasicAuthHandler()
proxy_support = urllib.request.ProxyHandler({"http": "http://ahad-haam:3128"})
# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)
f = urllib.request.urlopen('http://www.google.com/')
"""
For HTTP and HTTPS use:
proxies = {'http': 'http://proxy-source-ip:proxy-port',
           'https': 'https://proxy-source-ip:proxy-port'}
Proxies for more schemes can be added similarly (note that a dictionary can only hold one proxy per scheme):
proxies = {'http': 'http://proxy-source-ip:proxy-port',
           'https': 'https://proxy-source-ip:proxy-port',
           'ftp': 'ftp://proxy-source-ip:proxy-port'}
Usage:
filehandle = urllib.urlopen(external_url, proxies=proxies)
To not use any proxies (e.g. for links within your own network):
filehandle = urllib.urlopen(external_url, proxies={})
To use proxy authentication via username and password:
proxies = {'http': 'http://username:password@proxy-source-ip:proxy-port',
           'https': 'https://username:password@proxy-source-ip:proxy-port'}
Note: avoid using special characters such as ':' and '@' in usernames and passwords, or percent-encode them as shown below.
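If the credentials do contain such characters, one option is to percent-encode them first. A minimal sketch with hypothetical credentials (Python 2 urllib, matching the code above):

from urllib import quote

# Percent-encode the username and password so characters such as
# ':' or '@' don't break the structure of the proxy URL
username = quote('user@example.com', safe='')
password = quote('p:ssword', safe='')
proxies = {'http': 'http://%s:%s@proxy-source-ip:proxy-port' % (username, password)}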
Related
I am trying to scrape using proxies (this proxy server is a free one from the internet); in particular, I would like to use their IP, not my private one. To test my script I am trying to access "http://whatismyipaddress.com/" to see which IP this site sees. As it turns out, it sees my private IP. Can somebody tell me what's wrong here?
import requests
from fake_useragent import UserAgent

def getMyIP(proxyServer, myPrivateIP):
    scrape_website = "http://whatismyipaddress.com/"
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(scrape_website, headers=headers, proxies={"https": proxyServer})
    except:
        faultString = proxyServer + " did not work;\n"
        print(faultString)
        return
    if myPrivateIP in str(response.content):
        print("They found my private IP.")

proxyServer = "http://103.250.158.23:61219"
myPrivateIP = "xxx.xxx.xxx.xxx"
getMyIP(proxyServer, myPrivateIP)
Two things:
You set an {'https': ...} proxy configuration. This means any HTTPS request will use that proxy. You're requesting an HTTP URL, however, so that proxy isn't being used. Configure an 'http' proxy instead or in addition; see the sketch below.
If the proxy forwards your IP in an HTTP header, and the target server heeds that header, that's bad luck and there's nothing you can do about it, besides using a different proxy which doesn't forward your IP. I think point 1 is more likely the issue, though.
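Here is a minimal sketch of point 1 with requests, reusing the proxy address from the question:

import requests

# Route both plain-HTTP and HTTPS requests through the proxy; an
# HTTP URL is only proxied if an 'http' key is present
proxy = "http://103.250.158.23:61219"
proxies = {"http": proxy, "https": proxy}
response = requests.get("http://whatismyipaddress.com/", proxies=proxies)
print(response.status_code)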
I've looked through some of the other posts on this and I hope I'm not duplicating, but I'm stuck on a real head-scratcher with setting a proxy server for urllib2. I'm running the code below:
file, site = argv
uri = 'https://' + site

http_proxy_server = "http://newyork.wonderproxy.com"
http_proxy_port = "11001"
http_proxy_user = "user"
http_proxy_passwd = "password"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)

proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)
html = opener.open(uri).read()
print html, 'it opened!'
I'm running this against an IP info site, but try as I might the response always comes out with my non-proxy IP address. When I manually set my proxy through system settings I do get a different response, so I've confirmed it's not an issue with the proxy criteria itself.
Any help that could be offered would be much appreciated!
Well this is a bit silly, but I tried a different example and my connection is working fine now.
import urllib2

proxlist = ['minneapolis.wonderproxy.com', 'newyork.wonderproxy.com']
ports = [0, 1, 2, 3]
for prox in proxlist:
    for port in ports:
        proxy = urllib2.ProxyHandler({'http': 'http://user:password@%s:1100%s' % (prox, port)})
        auth = urllib2.HTTPBasicAuthHandler()
        opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
        urllib2.install_opener(opener)
        try:
            conn = urllib2.urlopen('http://www.howtofindmyipaddress.com/')
            return_str = conn.read()
            str_find = '<span style="font-size: 80px; color: #22BB22; font-family: Calibri,Arial;">'
            strt = return_str.find(str_find) + len(str_find)
            print prox, port, return_str[strt:return_str.find('</span', strt) - 1]
        except urllib2.URLError:
            print prox, port, 'That\'s a no go'
The only difference I can see is that the second one used HTTPHandler as well; since I apparently have a solution I'm not too worried, but I would still be interested to know why I had this issue in the first place.
Your question sets the proxy URL to
http://user:password@http://newyork.wonderproxy.com:11001
which isn't valid. If you changed http_proxy_server to newyork.wonderproxy.com then your first solution might work better.
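A minimal sketch of that fix, reusing the variables from the question (the scheme now appears only once, in the format string):

http_proxy_server = "newyork.wonderproxy.com"
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)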
How can I set a proxy for the latest urllib in Python 3?
I am doing the following:
from urllib import request as urlrequest

ask = urlrequest.Request(url)  # note that Request has a capital R here, unlike in previous versions
open = urlrequest.urlopen(ask)
open.read()
I tried adding a proxy as follows:
ask=urlrequest.Request.set_proxy(ask,proxies,'http')
However, I don't know how correct this is, since I am getting the following error:
    336     def set_proxy(self, host, type):
--> 337         if self.type == 'https' and not self._tunnel_host:
    338             self._tunnel_host = self.host
    339         else:
AttributeError: 'NoneType' object has no attribute 'type'
You should be calling set_proxy() on an instance of class Request, not on the class itself:
from urllib import request as urlrequest
proxy_host = 'localhost:1234' # host and port of your proxy
url = 'http://www.httpbin.org/ip'
req = urlrequest.Request(url)
req.set_proxy(proxy_host, 'http')
response = urlrequest.urlopen(req)
print(response.read().decode('utf8'))
I needed to disable the proxy in our company environment because I wanted to access a server on localhost. I could not disable the proxy server with the approach from @mhawke (I tried passing {}, None, and [] as proxies).
This worked for me (can also be used for setting a specific proxy, see comment in code).
import urllib.request as request
# disable proxies by passing an empty dictionary
proxy_handler = request.ProxyHandler({})
# alternatively you could set a proxy for http with
# proxy_handler = request.ProxyHandler({'http': 'http://www.example.com:3128/'})
opener = request.build_opener(proxy_handler)
url = 'http://www.example.org'
# open the website with the opener
req = opener.open(url)
data = req.read().decode('utf8')
print(data)
Urllib will automatically detect proxies set up in the environment, so one can just set the HTTP_PROXY variable, either in your environment, e.g. for Bash:
export HTTP_PROXY=http://proxy_url:proxy_port
or using Python e.g.
import os
os.environ['HTTP_PROXY'] = 'http://proxy_url:proxy_port'
Note from the urllib docs: "HTTP_PROXY[environment variable] will be ignored if a variable REQUEST_METHOD is set; see the documentation on getproxies()"
import urllib.request

def set_http_proxy(proxy):
    if proxy is None:  # Use system default setting
        proxy_support = urllib.request.ProxyHandler()
    elif proxy == '':  # Don't use any proxy
        proxy_support = urllib.request.ProxyHandler({})
    else:  # Use the given proxy
        proxy_support = urllib.request.ProxyHandler({'http': '%s' % proxy, 'https': '%s' % proxy})
    opener = urllib.request.build_opener(proxy_support)
    urllib.request.install_opener(opener)

proxy = 'user:pass@ip:port'
set_http_proxy(proxy)

url = 'https://www.httpbin.org/ip'
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
html = response.read()
print(html)
I'm trying to create a Python script that sends an HTTPS request through a proxy (Burp, to be exact), but it keeps failing with:
ssl.CertificateError: hostname 'example.com:443' doesn't match u'example.com'
Here's an abbreviated version of my code:
proxy = urllib2.ProxyHandler({'https': '127.0.0.1:8080'})
opener = urllib2.build_opener(proxy)
opener.addheaders = [("Host", "example.com"),
                     ...
                    ]
urllib2.install_opener(opener)
try:
    req = opener.open('https://example.com/service', 'data').read()
except urllib2.URLError, e:
    print e
So it looks like Python (ssl.CertificateError is, I believe, a Python error, not an OpenSSL error) has a problem with either the port or with one of the addresses being Unicode. Neither makes sense to me. Any suggestions?
Try this code. I got it working with Burp.
test.py
import urllib2

opener = urllib2.build_opener(
    urllib2.HTTPHandler(),
    urllib2.HTTPSHandler(),
    urllib2.ProxyHandler({'https': 'localhost:8080'}))
urllib2.install_opener(opener)
print opener.open('https://example.com', 'data').read()
Burp configuration and demo: (screenshots omitted)
What's the best way to specify a proxy with username and password for an HTTP connection in Python?
This works for me:
import urllib2

proxy = urllib2.ProxyHandler({'http': 'http://username:password@proxyurl:proxyport'})
auth = urllib2.HTTPBasicAuthHandler()
opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
urllib2.install_opener(opener)
conn = urllib2.urlopen('http://python.org')
return_str = conn.read()
Use this:
import requests
proxies = {"http":"http://username:password#proxy_ip:proxy_port"}
r = requests.get("http://www.example.com/", proxies=proxies)
print(r.content)
I think it's much simpler than using urllib. I don't understand why people love using urllib so much.
Set an environment variable named http_proxy like this: http://username:password@proxy_url:port
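The same thing can be done from Python itself before the first request is made. A minimal sketch with hypothetical credentials:

import os

# urllib2 picks this variable up when no explicit ProxyHandler is installed
os.environ['http_proxy'] = 'http://username:password@proxy_url:port'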
The best way of going through a proxy that requires authentication is to use urllib2 to build a custom URL opener, then use it to make all the requests you want to go through the proxy. Note in particular that you probably don't want to embed the proxy password in the URL or the Python source code (unless it's just a quick hack).
import urllib2

def get_proxy_opener(proxyurl, proxyuser, proxypass, proxyscheme="http"):
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, proxyurl, proxyuser, proxypass)
    proxy_handler = urllib2.ProxyHandler({proxyscheme: proxyurl})
    proxy_auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)
    return urllib2.build_opener(proxy_handler, proxy_auth_handler)

if __name__ == "__main__":
    import sys
    if len(sys.argv) > 4:
        url_opener = get_proxy_opener(*sys.argv[1:4])
        for url in sys.argv[4:]:
            print url_opener.open(url).headers
    else:
        print "Usage:", sys.argv[0], "proxy user pass fetchurls..."
In a more complex program, you can separate these components out as appropriate (for instance, only using one password manager for the lifetime of the application); see the sketch below. The Python documentation has more examples of how to do complex things with urllib2 that you might also find useful.
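A minimal sketch of that separation, with one module-level password manager shared by every opener the application builds (names are hypothetical):

import urllib2

# shared for the lifetime of the application
_password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()

def add_proxy_credentials(proxyurl, user, password):
    _password_mgr.add_password(None, proxyurl, user, password)

def make_opener(proxyurl, proxyscheme="http"):
    proxy_handler = urllib2.ProxyHandler({proxyscheme: proxyurl})
    proxy_auth_handler = urllib2.ProxyBasicAuthHandler(_password_mgr)
    return urllib2.build_opener(proxy_handler, proxy_auth_handler)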
Or if you want to install it, so that it is always used with urllib2.urlopen (so you don't need to keep a reference to the opener around):
import urllib2
url = 'www.proxyurl.com'
username = 'user'
password = 'pass'
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
# None, with the "WithDefaultRealm" password manager means
# that the user/pass will be used for any realm (where
# there isn't a more specific match).
password_mgr.add_password(None, url, username, password)
auth_handler = urllib2.HTTPBasicAuthHandler(password_mgr)
opener = urllib2.build_opener(auth_handler)
urllib2.install_opener(opener)
print urllib2.urlopen("http://www.example.com/folder/page.html").read()