Python Requests - No connection adapters

I'm using the Requests: HTTP for Humans library, and I got this weird error that I don't know the meaning of:
No connection adapters were found for '192.168.1.61:8080/api/call'
Does anybody have an idea?

You need to include the protocol scheme:
'http://192.168.1.61:8080/api/call'
Without the http:// part, requests has no idea how to connect to the remote server.
Note that the protocol scheme must be all lowercase; if your URL starts with HTTP:// for example, it won’t find the http:// connection adapter either.
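For illustration, a minimal reproduction using the address from the question; the exact exception class (InvalidSchema vs. MissingSchema) varies across requests versions, so this sketch catches their common base class:
import requests

url = "192.168.1.61:8080/api/call"  # scheme missing
try:
    requests.get(url, timeout=5)
except requests.exceptions.RequestException as exc:
    print(type(exc).__name__, exc)  # e.g. "No connection adapters were found for ..."

# requests.get("http://" + url, timeout=5)  # with the scheme, requests finds its HTTP adapter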

One more possible reason: your URL may include some hidden characters, such as '\n'.
If you define your URL like below, this exception will be raised:
url = '''
http://google.com
'''
because there are '\n' characters hidden in the string. The URL in fact becomes:
\nhttp://google.com\n
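A simple defensive fix is to strip the string before using it; a minimal sketch:
url = '''
http://google.com
'''
url = url.strip()  # removes the surrounding '\n' characters
# requests.get(url) now sees 'http://google.com'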

In my case, I received this error when I refactored a URL and left an erroneous trailing comma, thus converting my URL from a string into a tuple.
My exact error message:
741 # Nothing matches :-/
--> 742 raise InvalidSchema("No connection adapters were found for {!r}".format(url))
743
744 def close(self):
InvalidSchema: No connection adapters were found for "('https://api.foo.com/data',)"
Here's how that error came about:
# Original code:
response = requests.get("api.%s.com/data" % "foo", headers=headers)
# --------------
# Modified code (with bug!)
api_name = "foo"
url = f"api.{api_name}.com/data", # !!! Extra comma doesn't belong here!
response = requests.get(url, headers=headers)
# --------------
# Solution: Remove erroneous comma!
api_name = "foo"
url = f"api.{api_name}.com/data" # No extra comma!
response = requests.get(url, headers=headers)

As stated in a comment by christian-long:
Your url may accidentally be a tuple because of a trailing comma
url = self.base_url % endpoint,
Make sure it is a string
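A cheap way to catch this early is an explicit type check before the request; a hypothetical debugging guard, not part of the original code:
url = "https://api.foo.com/data",  # trailing comma silently builds a tuple
assert isinstance(url, str), "expected str, got %s: %r" % (type(url).__name__, url)
# AssertionError: expected str, got tuple: ('https://api.foo.com/data',)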

Derive protocol from url

I have a list of URLs in a format such as ["www.bol.com", "www.dopper.com"].
In order to feed them to Scrapy as start URLs, I need to know the correct HTTP protocol.
For example:
["https://www.bol.com/nl/nl/", "https://dopper.com/nl"]
As you see, the protocol might differ from https to http, and the domain may appear with or without www.
Not sure if there are any other variations.
is there any python tool that can determine the right protocol?
If not and I have to build the logic by myself what are the cases that I should take into account?
For option 2, this is what I have so far:
def identify_protocol(url):
    try:
        r = requests.get("https://" + url + "/", timeout=10)
        return r.url, r.status_code
    except requests.exceptions.RequestException:
        pass
    try:
        r = requests.get("http://" + url + "/", timeout=10)
        return r.url, r.status_code
    except requests.exceptions.RequestException:
        pass
    try:
        r = requests.get("https://" + url.replace("www.", "") + "/", timeout=10)
        return r.url, r.status_code
    except requests.exceptions.RequestException:
        return None, None
is there any other possibility I should take into account?
There is no way to determine the protocol/full domain from the fragment directly, the information simply isn't there. In order to find it you would either need:
a database of the correct protocol/domains, which you can lookup your domain fragment in
to make the request and see what the server tells you
If you do (2) you can of course gradually build your own database to avoid needing the request in future.
On many https servers, if you attempt an http connection you will be redirected to https. If you are not, then you can reliably use http. If the http request fails, you can try again with https and see if it works.
The same applies to the domain: if the site usually redirects, you can perform the request using the original domain and see where you are redirected.
An example using requests:
>>> import requests
>>> r = requests.get('http://bol.com')
>>> r
<Response [200]>
>>> r.url
'https://www.bol.com/nl/nl/'
As you can see the request object url parameter has the final destination URL, plus protocol.
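Putting both ideas together, a hedged sketch of the probing approach (the https-then-http order and the 10-second timeout are assumptions, not a canonical recipe):
import requests

def probe(fragment, timeout=10):
    """Try https first, then http; return the final URL after redirects."""
    for scheme in ("https://", "http://"):
        try:
            r = requests.get(scheme + fragment.strip(), timeout=timeout)
            r.raise_for_status()
            return r.url  # requests has already followed any redirects
        except requests.exceptions.RequestException:
            continue
    return None

print(probe("www.bol.com"))  # e.g. 'https://www.bol.com/nl/nl/'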
As I understand the question, you need to retrieve the final URL after all possible redirections. That can be done with the built-in urllib.request. If the provided URL has no scheme, you can use http as the default. To parse the input URL I used a combination of urlsplit() and urlunsplit().
Code:
import urllib.request as request
import urllib.parse as parse

def find_redirect_location(url, proxy=None):
    parsed_url = parse.urlsplit(url.strip())
    url = parse.urlunsplit((
        parsed_url.scheme or "http",
        parsed_url.netloc or parsed_url.path,
        parsed_url.path.rstrip("/") + "/" if parsed_url.netloc else "/",
        parsed_url.query,
        parsed_url.fragment
    ))
    if proxy:
        handler = request.ProxyHandler(dict.fromkeys(("http", "https"), proxy))
        opener = request.build_opener(handler, request.ProxyBasicAuthHandler())
    else:
        opener = request.build_opener()
    with opener.open(url) as response:
        return response.url
Then you can just call this function on every url in list:
urls = ["bol.com ","www.dopper.com", "https://google.com"]
final_urls = list(map(find_redirect_location, urls))
You can also use proxies:
from itertools import cycle
urls = ["bol.com ","www.dopper.com", "https://google.com"]
proxies = ["http://localhost:8888"]
final_urls = list(map(find_redirect_location, urls, cycle(proxies)))
To make it a bit faster you can make checks in parallel threads using ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor
urls = ["bol.com ","www.dopper.com", "https://google.com"]
final_urls = list(ThreadPoolExecutor().map(find_redirect_location, urls))

HTTP Error 307: Temporary Redirect in Python3 - INTRANET [duplicate]

I'm using Python 3.7 with urllib.
Everything works fine, but it seems not to automatically redirect when it gets an HTTP redirect response (307).
This is the error I get:
ERROR 2020-06-15 10:25:06,968 HTTP Error 307: Temporary Redirect
I have to handle it with a try-except and manually send another request to the new Location; it works fine, but I don't like it.
This is the piece of code I use to perform the request:
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type','application/json; charset=utf-8')
req.data=jdati
self.logger.debug(req.headers)
self.logger.info(req.data)
resp = urllib.request.urlopen(req)
url is an HTTPS resource, and I set a header with some Authorization info and the content type.
req.data is JSON.
From the urllib documentation I understood that redirects are performed automatically by the library itself, but it doesn't work for me. It always raises an HTTP 307 error and doesn't follow the redirect URL.
I've also tried to use an opener specifying the default redirect handler, but with the same result:
opener = urllib.request.build_opener(urllib.request.HTTPRedirectHandler)
req = urllib.request.Request(url)
req.add_header('Authorization', auth)
req.add_header('Content-Type','application/json; charset=utf-8')
req.data=jdati
resp = opener.open(req)
What could be the problem?
The reason why the redirect isn't done automatically has been correctly identified by yours truly in the discussion in the comments section. Specifically, RFC 2616, Section 10.3.8 states that:
If the 307 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
Back to the question: given that data has been assigned, get_method automatically returns POST (as per how that method is implemented), and since the request method is POST and the response code is 307, an HTTPError is raised instead, as per the above specification. In the context of Python's urllib, this specific section of the urllib.request module raises the exception.
For an experiment, try the following code:
import urllib.request
import urllib.parse

url = 'http://httpbin.org/status/307'
req = urllib.request.Request(url)
req.data = b'hello'  # comment out to not trigger manual redirect handling
try:
    resp = urllib.request.urlopen(req)
except urllib.error.HTTPError as e:
    if e.status != 307:
        raise  # not a status code that can be handled here
    redirected_url = urllib.parse.urljoin(url, e.headers['Location'])
    resp = urllib.request.urlopen(redirected_url)
    print('Redirected -> %s' % redirected_url)  # the original redirected url
print('Response URL -> %s' % resp.url)  # the final url
Running the code as is may produce the following
Redirected -> http://httpbin.org/redirect/1
Response URL -> http://httpbin.org/get
Note that the subsequent redirect to get was done automatically, as the subsequent request was a GET request. Commenting out the req.data assignment line will result in the lack of the "Redirected" output line.
Other things to note in the exception-handling block: e.read() may be called to retrieve the response body produced by the server as part of the HTTP 307 response (since data was posted, there might be a short entity in the response worth processing), and urljoin is needed because the Location header may be a relative URL (or simply have the host missing) for the subsequent resource.
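Since a 307 explicitly asks the client to repeat the same method and body, a stricter manual handler would replay the POST data against the new Location; a sketch along those lines (whether the body should really be replayed is a judgment call that depends on the server):
import urllib.error
import urllib.parse
import urllib.request

def follow_307(req):
    """Replay a request, body included, against the Location of a 307 (sketch)."""
    try:
        return urllib.request.urlopen(req)
    except urllib.error.HTTPError as e:
        if e.code != 307:
            raise
        target = urllib.parse.urljoin(req.full_url, e.headers['Location'])
        replay = urllib.request.Request(target, data=req.data,
                                        headers=dict(req.header_items()))
        return urllib.request.urlopen(replay)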
Also, as a matter of interest (and for linkage purposes), this specific question has been asked multiple times before, and I am rather surprised that those earlier instances never got any answers:
How to handle 307 redirection using urllib2 from http to https
HTTP Error 307: Temporary Redirect in Python3 - INTRANET
HTTP Error 307 - Temporary redirect in python script

Why would HTTPConnection not work? nonnumeric port

I'm trying to use httplib to check whether each URL in a list of 30k+ websites still works. Each URL is read from a .csv file into a matrix, and that matrix then goes through a for-loop, one URL at a time. Afterwards (where my problem is), I run a function, runInternet(url), which takes the URL string and returns True if the URL works and False if it doesn't.
I've used this as my baseline, and have also looked into this. While I've tried both, I don't quite understand the latter, and neither works...
def runInternet(url):
    try:
        page = httplib.HTTPConnection(url)
        page.connect()
    except httplib.HTTPException as e:
        return False
    return True
However, afterwards, all the links are reported as broken! I randomly chose a few of them, and they work when I input them into my browser... so what's happening? I've narrowed the problem down to this line:
page = httplib.HTTPConnection(url)
Edit: I tried inputting 'www.google.com' in place of the url variable and the program works; when I try printing e, it says nonnumeric port...
You could troubleshoot this by allowing the HTTPException to propagate instead of catching it. The specific exception type would likely help understand what is wrong.
I suspect though that the problem is this line:
page = httplib.HTTPConnection(url)
The first argument to the constructor is not a URL. Instead, it's a host name. For example, this code sample passing a URL to the constructor fails:
page = httplib.HTTPConnection('https://www.google.com/')
page.connect()
httplib.InvalidURL: nonnumeric port: '//www.google.com/'
Instead, if I pass the host name to the constructor and then the path to the request method, it works:
conn = httplib.HTTPConnection('www.google.com')
conn.request('GET', '/')
resp = conn.getresponse()
print resp.status, resp.reason
200 OK
For reference, here is the relevant abridged documentation of HTTPConnection:
class HTTPConnection
| Methods defined here:
|
| __init__(self, host, port=None, strict=None, timeout=<object object>, source_address=None)
...
| request(self, method, url, body=None, headers={})
| Send a complete request to the server.
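Applied to the original runInternet, a sketch that splits each URL into host and path before connecting (Python 2 to match the question; the HEAD method and the 10-second timeout are assumptions):
import httplib
from urlparse import urlparse

def runInternet(url):
    try:
        # '//' forces urlparse to treat a bare 'www.example.com' as a netloc
        parts = urlparse(url if '//' in url else '//' + url)
        conn = httplib.HTTPConnection(parts.netloc, timeout=10)
        conn.request('HEAD', parts.path or '/')
        conn.getresponse()
    except Exception:
        return False
    return True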

urllib2.urlopen does not automatically add a "/" to the end of a URL containing Chinese characters

Examples:
url_1 = "http://yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83/"
url_2 = "http://yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83"
As you see, if I don't add a / to the end of the URL, urllib2.urlopen(url_2) returns a 400 error because the effective URL should be url_1; if the URL doesn't include any Chinese characters, urllib2.urlopen and urllib.urlopen both add the / automatically.
The issue is that urllib.urlopen works well in all of these situations, but urllib2.urlopen only works well when the URL contains no Chinese characters.
So I wonder: is this a little bug in urllib2.urlopen, or is there another explanation for it?
What actually happens here is a couple of redirections initiated by the server before the actual error:
Request: http://yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83
Response: Redirect to http://www.yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83
Request: http://www.yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83
Response: Redirect to http://www.yinwang.org/blog-cn/2013/04/21/ydiff-结构化的程序比较/ (actually 'http://www.yinwang.org/blog-cn/2013/04/21/ydiff-\xe7\xbb\x93\xe6\x9e\x84\xe5\x8c\x96\xe7\x9a\x84\xe7\xa8\x8b\xe5\xba\x8f\xe6\xaf\x94\xe8\xbe\x83/', to be precise)
AFAIK, that last redirect is invalid. The address should be plain ASCII (non-ASCII characters should be percent-encoded). The correct encoded address would be: http://www.yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83/
Now, it seems that urllib is playing nice and doing the conversion itself, before requesting the final address, whereas urllib2 simply uses the address it receives.
You can see that if you try to open the final address manually:
urllib
>>> print urllib.urlopen('http://www.yinwang.org/blog-cn/2013/04/21/ydiff-\xe7\xbb\x93\xe6\x9e\x84\xe5\x8c\x96\xe7\x9a\x84\xe7\xa8\x8b\xe5\xba\x8f\xe6\xaf\x94\xe8\xbe\x83/').geturl()
http://www.yinwang.org/blog-cn/2013/04/21/ydiff-%E7%BB%93%E6%9E%84%E5%8C%96%E7%9A%84%E7%A8%8B%E5%BA%8F%E6%AF%94%E8%BE%83/
urllib2
>>> try:
... urllib2.urlopen('http://www.yinwang.org/blog-cn/2013/04/21/ydiff-\xe7\xbb\x93\xe6\x9e\x84\xe5\x8c\x96\xe7\x9a\x84\xe7\xa8\x8b\xe5\xba\x8f\xe6\xaf\x94\xe8\xbe\x83/')
... except Exception as e:
... print e.geturl()
...
http://www.yinwang.org/blog-cn/2013/04/21/ydiff-š╗ôŠ×äňîľšÜäšĘőň║ĆŠ»öŔżâ/
Solution
If it is your server, you should fix the problem there. Otherwise, I guess it should be possible to write a urllib2.HTTPRedirectHandler that encodes the redirection URLs before urllib2 follows them, as sketched below.
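A minimal sketch of such a handler (Python 2 to match the question; it assumes the server sends raw UTF-8 bytes in the Location header, which quote() then percent-encodes):
import urllib
import urllib2

class EncodingRedirectHandler(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Percent-encode any raw non-ASCII bytes in the redirect target.
        newurl = urllib.quote(newurl, safe="%/:=&?~#+!$,;'@()*[]")
        return urllib2.HTTPRedirectHandler.redirect_request(
            self, req, fp, code, msg, headers, newurl)

opener = urllib2.build_opener(EncodingRedirectHandler)
print opener.open(url_2).geturl()  # url_2 from the question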

How do I get HTTP header info without authentication using python?

I'm trying to write a small program that will simply display the header information of a website. Here is the code:
import urllib2

url = 'http://some.ip.add.ress/'
request = urllib2.Request(url)
try:
    html = urllib2.urlopen(request)
except urllib2.URLError, e:
    print e.code
else:
    print html.info()
If 'some.ip.add.ress' is google.com, the header information is returned without a problem. However, if it's an IP address that requires basic authentication before access, it returns a 401. Is there a way to get header (or any other) information without authentication?
I've worked it out.
After the try has failed due to unauthorized access, the following modification will print the header information:
print e.info()
instead of:
print e.code
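That works because urllib2.HTTPError doubles as a response object, so the headers are available even for error responses; a minimal sketch of the adjusted handler:
import urllib2

try:
    html = urllib2.urlopen(urllib2.Request('http://some.ip.add.ress/'))
except urllib2.HTTPError, e:
    print e.code    # e.g. 401
    print e.info()  # headers of the error response
else:
    print html.info()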
Thanks for looking :)
If you want just the headers, then instead of using urllib2 you should go a level lower and use httplib:
import httplib
conn = httplib.HTTPConnection(host)
conn.request("HEAD", path)
print conn.getresponse().getheaders()
If all you want are HTTP headers, then you should make a HEAD request, not a GET. You can see how to do this by reading Python - HEAD request with urllib2; a sketch of that approach follows.
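For reference, the usual urllib2 workaround from that linked question overrides get_method; a sketch:
import urllib2

class HeadRequest(urllib2.Request):
    def get_method(self):
        return 'HEAD'

resp = urllib2.urlopen(HeadRequest('http://www.google.com/'))
print resp.info()  # headers only; a HEAD response carries no body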
