If I run:
urllib2.urlopen('http://google.com')
Even if I use another URL, I get the same error.
I'm pretty sure there is no firewall running on my computer or router, and the internet (from a browser) works fine.
The problem, in my case, was that some installer had at some point defined an http_proxy environment variable on my machine, even though I had no proxy.
Removing the http_proxy environment variable fixed the problem.
The site's DNS record is such that Python fails the DNS lookup in a peculiar way: it finds the entry, but zero associated IP addresses (verify with nslookup). Hence error 11004, WSANO_DATA.
Prefix the site with 'www.' and try the request again. (Use nslookup to verify that its result is different, too.)
This fails essentially the same way with the Python Requests module:
requests.exceptions.ConnectionError: HTTPConnectionPool(host='...', port=80): Max retries exceeded with url: / (Caused by : [Errno 11004] getaddrinfo failed)
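Both urllib2 and Requests end up calling socket.getaddrinfo(), so the nslookup comparison can also be made from Python to see exactly what the interpreter's resolver returns. A minimal sketch: `resolve` is a hypothetical helper, and example.com stands in for the failing site.

```python
import socket


def resolve(host, port=80):
    """Return the sorted IP addresses getaddrinfo() finds for host.

    Returns an empty list if the lookup fails; on Windows, errors such as
    [Errno 11004] getaddrinfo failed are raised at this step.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        print(f"{host}: lookup failed: {exc}")
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # Compare the bare name with the www. form of the failing site.
    for name in ("example.com", "www.example.com"):
        print(name, resolve(name))
```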
This may not help if it's a network-level issue, but you can get some debugging info by setting debuglevel on httplib. Try this:
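import urllib, urllib2, httplib
url = 'http://www.mozillazine.org/atom.xml'
# Make httplib connections print the raw request and response headers
httplib.HTTPConnection.debuglevel = 1
print "urllib"
data = urllib.urlopen(url).read()
print "urllib2"
request = urllib2.Request(url)
# urllib2's HTTPHandler resets each connection's debuglevel to its own value
# (0 by default), so pass debuglevel=1 to the handler as well
opener = urllib2.build_opener(urllib2.HTTPHandler(debuglevel=1))
feeddata = opener.open(request).read()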
Which is copied directly from here, hope that's kosher: http://bytes.com/topic/python/answers/517894-getting-debug-urllib2
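Python 3's urllib.request has the same switch: pass debuglevel=1 to the handlers, and the underlying http.client connections print each request sent ("send:"), the status line received ("reply:") and the response headers ("header:"). A sketch; `build_debug_opener` is a hypothetical name.

```python
import urllib.request


def build_debug_opener():
    """Build an opener whose HTTP and HTTPS connections print the raw
    request, the response status line and the response headers to stdout."""
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


if __name__ == "__main__":
    opener = build_debug_opener()
    with opener.open("http://www.mozillazine.org/atom.xml") as response:
        body = response.read()
    print(len(body), "bytes")
```

Passing debuglevel to the handler explicitly is the reliable route, because setting only the class attribute on the connection class may be overridden by the handler's own default.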
You probably need to use a proxy. Check your normal browser settings to find out which one. Take a look at "opening websites using urllib2 from behind corporate firewall - 11004 getaddrinfo failed" for a similar problem with a solution.
To troubleshoot the issue:
1. Let us know which OS the script is running on and which version of Python.
2. In a command prompt on that same machine, run ping google.com and see whether it works (or whether you get, say, "could not find host").
3. If (2) worked, open a browser on that machine (try IE if on Windows) and try opening "google.com" there. If there is a problem, look closely at the proxy settings in Internet Options / Connections / LAN Settings.
Let us know how it goes either way.
Adding an s to http, i.e. urllib2.urlopen('https://google.com'), worked for me.
Related
I'm trying to connect to one of my internal services at https://myservice.my-alternative-domain.com through Python Requests. I'm using Python 3.6.
I'm using a custom CA bundle to verify the request, and I'm getting the following error:
SSLError: hostname 'myservice.my-domain.com' doesn't match either of 'my-domain.com', 'my-alternative-domain.com'
The SSL certificate that the internal service uses has CN my-domain.com and SANs (Subject Alternative Names) 'my-domain.com' and 'my-alternative-domain.com'.
So I'm trying to access the service through one of the alternative names (it has to be this way, and it's not under my control).
I think the error is correct, and that for the request to work the certificate would also need this SAN:
'*.my-alternative-domain.com'
The only thing that puzzles me is that I can access the service through the browser.
Can somebody confirm the behavior of Python Requests is correct?
This is how I call the service:
response = requests.get('https://myservice.my-alternative-domain.com', params=params, headers=headers, verify=ca_bundle)
Thanks
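The reasoning in the question can be checked against a simplified sketch of the rule TLS clients apply to DNS subjectAltName entries: an exact name matches only itself, and a wildcard in the left-most label covers exactly one label. `hostname_matches` is a hypothetical helper, not Python's ssl implementation, and it omits extra checks real clients make (for example, rejecting wildcards such as *.com).

```python
def hostname_matches(hostname, san_dns_names):
    """Simplified check of hostname against a certificate's DNS SAN entries.

    Names compare case-insensitively; "*" is honoured only as the entire
    left-most label and stands for exactly one label.
    """
    host_labels = hostname.lower().rstrip(".").split(".")
    for pattern in san_dns_names:
        pat_labels = pattern.lower().rstrip(".").split(".")
        if len(pat_labels) != len(host_labels):
            continue  # a wildcard never spans more (or fewer) labels
        first_ok = pat_labels[0] == "*" or pat_labels[0] == host_labels[0]
        if first_ok and pat_labels[1:] == host_labels[1:]:
            return True
    return False


sans = ["my-domain.com", "my-alternative-domain.com"]
host = "myservice.my-alternative-domain.com"
print(hostname_matches(host, sans))                                    # False
print(hostname_matches(host, sans + ["*.my-alternative-domain.com"]))  # True
```

Under this rule, 'my-alternative-domain.com' covers only the bare domain, so the error Requests raises is consistent with the certificate as described.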
Passing verify=False might work (note that this turns off certificate verification entirely):
x = requests.get(-----, verify=False)
I've been struggling with my company proxy to make an https request.
import requests
from requests.auth import HTTPProxyAuth
proxy_string = 'http://user:password@url_proxy:port_proxy'
s = requests.Session()
s.proxies = {"http": proxy_string , "https": proxy_string}
s.auth = HTTPProxyAuth(user,password)
r = s.get('http://www.google.com') # OK
print(r.text)
r = s.get('https://www.google.com',proxies={"http": proxy_string , "https": proxy_string}) #OK
print(r.text)
r = s.get('https://www.google.com') # KO
print(r.text)
When KO, I get the following exception:
HTTPSConnectionPool(host='www.google.com', port=443): Max retries exceeded with url: / (Caused by ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',)))
I looked online but didn't find someone having this specific issue with HTTPS.
Thank you for your time
Thanks to the amazing help of Lukasa, I solved my issue.
Please see discussion on fix here
Alternatively, set:
session.trust_env = False
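A minimal sketch of that setting, assuming the credentials are placed in the proxy URL (Requests derives the Proxy-Authorization header for the HTTPS CONNECT tunnel from the URL's user:password part). The proxy address is a placeholder and `make_proxy_session` is a hypothetical helper. Note that trust_env=False also makes the session ignore .netrc credentials and the REQUESTS_CA_BUNDLE variable.

```python
import requests

# Hypothetical placeholder: substitute your own proxy credentials, host and port.
PROXY = "http://user:password@proxy.example.com:8080"


def make_proxy_session(proxy_url):
    """Return a session that always uses proxy_url and ignores *_proxy env vars."""
    session = requests.Session()
    # With trust_env left True, Requests merges proxies from the environment
    # into each request, and those can take precedence over session.proxies;
    # turning it off leaves only the proxies configured here.
    session.trust_env = False
    session.proxies = {"http": proxy_url, "https": proxy_url}
    return session


if __name__ == "__main__":
    s = make_proxy_session(PROXY)
    print(s.get("https://www.google.com", timeout=10).status_code)
```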
I personally solved the above problem on my system by updating the environment variables http_proxy,https_proxy,socks_proxy,ftp_proxy.
First, run printenv in your terminal.
This shows the environment variables on your system.
In my case, initially:
http_proxy=http://proxyserver:port/
I changed it to http_proxy=http://username:password@proxy:port/
using the command
export http_proxy="http://username:password@proxy:port/"
Similarly for https_proxy, socks_proxy and ftp_proxy.
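If the username or password contains characters such as '@', ':' or '/', they must be percent-encoded, or the proxy URL is parsed incorrectly. A minimal Python sketch that builds such a URL and sets the variables for the current process; `build_proxy_url` and the credentials are hypothetical.

```python
import os
from urllib.parse import quote


def build_proxy_url(user, password, host, port, scheme="http"):
    """Build scheme://user:password@host:port/ with the credentials percent-encoded."""
    return "%s://%s:%s@%s:%d/" % (
        scheme, quote(user, safe=""), quote(password, safe=""), host, port)


if __name__ == "__main__":
    url = build_proxy_url("username", "p@ss:word", "proxy.example.com", 8080)
    # Setting os.environ affects only this Python process and the programs it starts.
    for name in ("http_proxy", "https_proxy", "ftp_proxy"):
        os.environ[name] = url
    print(url)
```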
Another way I have resolved this: speak with your corporate IT administrator and find a direct proxy port that connects to external domains (with or without a password), for example:
pip install --proxy=http://proxyhost:proxy_port pixiedust
I found this port from colleagues who use the proxy (proxy_port direct connection) in their Eclipse network settings.
To anyone else that tried the accepted answer's "session.trust_env=False" with no success, there may be a deeper issue that produces a similar error (which is probably not the issue the OP had): There may be a corporate proxy configuration that requires specific headers to be sent upon CONNECT, and python requests doesn't send them ('User-Agent' and 'Host', for example).
I do not have a solution for that at the moment. See https://github.com/psf/requests/issues/5028 for a discussion on the subject.
I'm trying use SSH tunnels inside of Python's urllib2.
Creating the tunnel:
ssh -N user@machine.place.edu -L 1337:localhost:80
The above line should use port 80 on the remote machine and port 1337 on the local machine.
I used -N, so the bash prompt (intentionally) hangs for as long as this tunnel is running.
Using the tunnel in urllib2:
import urllib2
url = "http://ifconfig.me/ip"
headers={'User-agent' : 'Mozilla/5.0'}
proxy_support = urllib2.ProxyHandler({'http': 'http://127.0.0.1:1337'})
opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1))
urllib2.install_opener(opener)
req = urllib2.Request(url, None, headers)
html = urllib2.urlopen(req).read()
print html
When I run the above code, html = urllib2.urlopen(req).read() throws the error urllib2.HTTPError: HTTP Error 404: Not Found.
What might be going wrong, and how can we fix it?
Troubleshooting:
If I turn off the SSH tunnel, the error changes to urllib2.URLError: <urlopen error [Errno 61] Connection refused>. So, Python is clearly "seeing" the SSH tunnel.
If I comment out the proxy stuff by replacing opener = urllib2.build_opener(proxy_support, urllib2.HTTPHandler(debuglevel=1)) with opener = urllib2.build_opener(), then the ifconfig.me page downloads properly. (Of course, the project that I'm working on requires me to access documents from a few different networks, so I still need proxies to work.)
Some StackOverflow posts suggest using Requests instead of urllib2. I wouldn't mind using Requests instead -- I just used urllib2 here because I wasn't sure how to do custom headers (e.g. user-agent, referer) in Requests.
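For reference, custom headers in Requests are passed as a plain dict, alongside a proxies dict. A sketch with illustrative header values; `preview` and `fetch` are hypothetical helpers. It sends the same proxy request as the urllib2 code, so on its own it would not change the 404.

```python
import requests

# Illustrative header values; the proxy URL is the local end of the SSH tunnel.
HEADERS = {"User-Agent": "Mozilla/5.0", "Referer": "http://example.com/"}
PROXIES = {"http": "http://127.0.0.1:1337"}


def preview(url, headers=HEADERS):
    """Prepare (but do not send) a GET request, to inspect what Requests will send."""
    return requests.Request("GET", url, headers=headers).prepare()


def fetch(url):
    """Send a GET through the tunnel with the custom headers."""
    return requests.get(url, headers=HEADERS, proxies=PROXIES, timeout=10)


if __name__ == "__main__":
    print(preview("http://ifconfig.me/ip").headers)
    print(fetch("http://ifconfig.me/ip").text)
```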
Unfortunately, since you're the only one with access to machine.place.edu, it's going to be impossible for anyone else to reproduce the problem.
First of all, try something like...
$ telnet localhost 1337
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
GET http://ifconfig.me/ip HTTP/1.0
...and hit enter a couple of times after the 'GET' line, and see what you get back.
If you get a 404, there's probably something wrong with the proxy.
If you get a 200, then you should be able to recreate that fairly easily with httplib.
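The telnet test can be reproduced in Python 3 with http.client (the successor of httplib). Passing an absolute URL to request() puts it in the request line, as an HTTP proxy expects, and http.client derives the Host header from it. A sketch; `probe_proxy` is a hypothetical helper.

```python
import http.client


def probe_proxy(proxy_host, proxy_port, target_url, timeout=10):
    """Send one proxy-style GET to proxy_host:proxy_port; return (status, reason, body)."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)
    try:
        # Request line becomes e.g. "GET http://ifconfig.me/ip HTTP/1.1".
        conn.request("GET", target_url)
        response = conn.getresponse()
        return response.status, response.reason, response.read()
    finally:
        conn.close()


if __name__ == "__main__":
    print(probe_proxy("localhost", 1337, "http://ifconfig.me/ip"))
```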
Does urllib2 in Python 2.6.1 support proxy via https?
I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:
NOTE
Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.
I'm trying to automate logging in to a website and downloading a document; I have a valid username/password.
proxy_info = {
'host':"axxx", # commented out the real data
'port':"1234" # commented out the real data
}
proxy_handler = urllib2.ProxyHandler(
{"http" : "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
urllib2.HTTPHandler(debuglevel=1),urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b' # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)
I've had this working for similar pages that don't use HTTPS, but here I suspect the request does not go through the proxy: it just gets stuck the same way as when I did not specify a proxy. I need to go out through the proxy.
I need to authenticate, but not with basic authentication. Will urllib2 handle authentication when going to an HTTPS site through the proxy (I supply the username/password to the site in the URL)?
EDIT:
Nope, I tested with
proxies = {
"http" : "http://%(host)s:%(port)s" % proxy_info,
"https" : "https://%(host)s:%(port)s" % proxy_info
}
proxy_handler = urllib2.ProxyHandler(proxies)
And I get error:
urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol
Fixed in Python 2.6.3 and several other branches:
http://bugs.python.org/issue1424152
http://www.python.org/download/releases/2.6.3/NEWS.txt
Issue #1424152: Fix for httplib, urllib2 to support SSL while working through proxy. Original patch by Christopher Li, changes made by Senthil Kumaran.
I'm not sure Michael Foord's article, that you quote, is updated to Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https, too (of course you should format it into a variable just once before you call ProxyHandler and just repeatedly use that variable in the dict): that may or may not work, but you're not even trying, and that's sure not to work!-)
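A Python 3 sketch of that suggestion with urllib.request: the proxy URL is formatted once and registered for both schemes. The proxy address and `build_opener_for` are hypothetical. The https entry uses an http:// proxy URL on the assumption that, as with most proxies, the connection to the proxy itself is plain HTTP and HTTPS traffic is tunnelled through it with CONNECT.

```python
import urllib.request

# Hypothetical proxy address; substitute the real host and port.
proxy_info = {"host": "proxy.example.com", "port": 1234}


def build_opener_for(info):
    """Register the same proxy for http and https, formatting the URL only once."""
    proxy_url = "http://%(host)s:%(port)s" % info
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler, urllib.request.HTTPCookieProcessor())


if __name__ == "__main__":
    # After this, urlopen() sends both http:// and https:// URLs via the proxy.
    urllib.request.install_opener(build_opener_for(proxy_info))
```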
In case anyone else has this issue in the future, I'd like to point out that urllib2 does support https proxying now. Make sure the proxy supports it too, or you risk running into a bug that puts the Python library into an infinite loop (this happened to me).
See the unittest in the python source that is testing https proxying support for further information:
http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203
I am using Python's urllib2 with Tor as a proxy to access a website. When I open the site's main page it works fine, but when I try to view the login page (not actually log in, just view it) I get the following error...
URLError: <urlopen error (10060, 'Operation timed out')>
To counteract this I did the following:
import socket
socket.setdefaulttimeout(None)
I still get the same timeout error.
Does this mean the website is timing out on the server side? (I don't know much about HTTP, so sorry if this is a dumb question.)
Is there any way I can correct it so that Python is able to view the page?
Thanks,
Rob
According to the Python socket documentation, the default is no timeout, so specifying a value of None is redundant.
There are a number of possible reasons that your connection is dropping. One could be that your user-agent is "Python-urllib" which may very well be blocked. To change your user agent:
request = urllib2.Request('http://site.com/login')
request.add_header('User-Agent','Mozilla/5.0 (X11; U; Linux i686; it-IT; rv:1.9.0.2) Gecko/2008092313 Ubuntu/9.04 (jaunty) Firefox/3.5')
You may also want to try overriding the proxy settings before you try and open the url using something along the lines of:
proxy = urllib2.ProxyHandler({"http":"http://127.0.0.1:8118"})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
I don't know enough about Tor to be sure, but the timeout may not happen on the server side, but on one of the Tor nodes somewhere between you and the server. In that case there is nothing you can do other than to retry the connection.
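If retrying is the only remedy, a small retry-with-backoff wrapper keeps that out of the main code. A minimal Python 3 sketch; `retry` and its parameters are hypothetical, and it retries only on the exceptions urllib raises for connection failures and timeouts.

```python
import socket
import time
import urllib.error
import urllib.request


def retry(func, attempts=3, delay=2.0, backoff=2.0,
          exceptions=(urllib.error.URLError, socket.timeout)):
    """Call func() until it returns, retrying on the given exceptions.

    Waits `delay` seconds after the first failure and multiplies the wait by
    `backoff` after each further failure; re-raises the last error once
    `attempts` calls have failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            if attempt == attempts:
                raise
            time.sleep(delay)
            delay *= backoff


if __name__ == "__main__":
    page = retry(lambda: urllib.request.urlopen("http://site.com/login", timeout=60).read())
    print(len(page), "bytes")
```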
urllib2.urlopen(url[, data][, timeout])
The optional timeout parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS, FTP and FTPS connections.
http://docs.python.org/library/urllib2.html
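The same timeout parameter exists on Python 3's urllib.request.urlopen. A minimal sketch that also sets a custom User-Agent, as suggested above; `fetch_with_timeout` and its defaults are hypothetical. The two except clauses reflect that a timeout while connecting arrives wrapped in URLError, whereas a timeout while waiting for the response is raised directly.

```python
import socket
import urllib.error
import urllib.request


def fetch_with_timeout(url, timeout=30.0, user_agent="Mozilla/5.0"):
    """Return the body of url, or None if connecting or a read times out.

    timeout applies to each blocking socket operation (connecting, waiting
    for the response, each read), not to the download as a whole.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except socket.timeout:
        # The server accepted the connection but was too slow to answer.
        return None
    except urllib.error.URLError as exc:
        # A timeout while connecting is wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise
```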