I'm trying to use httplib to check whether each URL in a list of 30k+ websites still works. Each URL is read from a .csv file into a matrix, and a for-loop then goes through every URL in that matrix. For each one (this is where my problem is), I run a function, runInternet(url), which takes the URL string and returns True if the URL works and False if it doesn't.
I've used this as my baseline, and have also looked into this. While I've tried both, I don't quite understand the latter, and neither works...
def runInternet(url):
    try:
        page = httplib.HTTPConnection(url)
        page.connect()
    except httplib.HTTPException as e:
        return False
    return True
However, afterwards, all the links are reported as broken! I randomly chose a few, and they work when I enter them into my browser... so what's happening? I've narrowed down the problem spot to this line:
page = httplib.HTTPConnection(url)
Edit: I tried inputting 'www.google.com' in exchange for the url, and the program works, and when I try printing e, it says nonnumeric port...
You could troubleshoot this by allowing the HTTPException to propagate instead of catching it. The specific exception type would likely help understand what is wrong.
I suspect though that the problem is this line:
page = httplib.HTTPConnection(url)
The first argument to the constructor is not a URL. Instead, it's a host name. For example, this code sample passing a URL to the constructor fails:
page = httplib.HTTPConnection('https://www.google.com/')
page.connect()
httplib.InvalidURL: nonnumeric port: '//www.google.com/'
Instead, if I pass the host name to the constructor, and then the URL path to the request method, it works:
conn = httplib.HTTPConnection('www.google.com')
conn.request('GET', '/')
resp = conn.getresponse()
print resp.status, resp.reason
200 OK
For reference, here is the relevant abridged documentation of HTTPConnection:
class HTTPConnection
| Methods defined here:
|
| __init__(self, host, port=None, strict=None, timeout=<object object>, source_address=None)
...
| request(self, method, url, body=None, headers={})
| Send a complete request to the server.
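For a list of full URLs, the host/path split described above can be sketched as follows. This is a Python 3 illustration (where httplib is named http.client), not the original poster's code; host_and_path is a hypothetical helper name, and the URL is just the example from this answer.

```python
import http.client
from urllib.parse import urlsplit


def host_and_path(url):
    """Split a full URL into the pieces HTTPConnection expects (hypothetical helper)."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError("no host name found in %r (missing http://?)" % (url,))
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    # parts.port is None when the URL carries no explicit port.
    return parts.hostname, parts.port, path


host, port, path = host_and_path("http://www.google.com/")

# Constructing the connection sends nothing over the network yet;
# the request only goes out on conn.request(...).
conn = http.client.HTTPConnection(host, port, timeout=10)
print(conn.host, conn.port, path)  # www.google.com 80 /
```

The constructor receives only the host (and optional port); the path is kept for the later conn.request('GET', path) call.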
Related
So I want to check if a URL is reachable from Python, and I got this code from googling:
def checkUrl(url):
    p = urlparse(url)
    conn = http.client.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400
Here is my URL: https://eurotableau.nomisonline.com.
It works fine if I just pass that in to the function; resp.status is 302. However, if I add port 443 at the end, https://eurotableau.nomisonline.com:443, it returns False and resp.status is 400. I tried both URLs in Google Chrome and both work. So my question is: why is this happening? Is there any way I can include the port and still get a valid resp.status (< 400)? Thanks.
Use http.client.HTTPSConnection instead. The plain old HTTPConnection ignores the protocol that is part of the URL.
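A likely explanation for the two status codes: without a port, HTTPConnection speaks plain HTTP to port 80 and is presumably redirected (302); with :443 it speaks plain HTTP to the TLS port, which the server rejects (400). A minimal sketch of a check that picks the connection class from the URL scheme; build_connection and check_url are illustrative names, not part of http.client.

```python
import http.client
from urllib.parse import urlsplit


def build_connection(url, timeout=10):
    """Return (connection, path) using the class that matches the URL scheme."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    elif parts.scheme == "http":
        conn_class = http.client.HTTPConnection
    else:
        raise ValueError("unsupported or missing scheme in %r" % (url,))
    # An explicit port (such as :443) is passed through; None means the default.
    conn = conn_class(parts.hostname, parts.port, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return conn, path


def check_url(url):
    """Send a HEAD request and report whether the status is below 400."""
    conn, path = build_connection(url)
    try:
        conn.request("HEAD", path)
        return conn.getresponse().status < 400
    finally:
        conn.close()
```

With this, check_url("https://eurotableau.nomisonline.com:443") and the same URL without the port both go through HTTPSConnection on port 443.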
If you do not require the HEAD method but just wish to check if host is available then why not do:
from urllib2 import urlopen
try:
    u = urlopen("https://eurotableau.nomisonline.com")
    u.close()
    print "Everything fine!"
except Exception, e:
    if hasattr(e, "code"):
        print "Server is there but something is wrong with rest of URL"
    else: print "Server is on vacations or was never there!"
    print e
This will establish a connection with the server, but it won't download any data unless you read it. It'll only read a few KB to get the headers (like when using the HEAD method) and wait for you to request more. But you will close it there.
So, you can catch an exception and see what the problem is, or if there is no exception, just close the connection.
urllib2 will handle HTTPS and protocol://user#URL:PORT for you neatly.
No worries about anything.
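In Python 3, urllib2 was split into urllib.request and urllib.error. A rough Python 3 equivalent of the snippet above, as a sketch only; probe and the strings it returns are made up for illustration.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def probe(url, timeout=10):
    """Open the URL without reading the body and describe what happened."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return "ok (%d)" % response.status
    except HTTPError as exc:
        # The server answered, but with an error status (4xx/5xx).
        # HTTPError must be caught first because it subclasses URLError.
        return "server answered with HTTP %d" % exc.code
    except URLError as exc:
        # DNS failure, refused connection and similar problems.
        return "could not reach server: %s" % exc.reason
    except ValueError:
        # Raised for strings urlopen cannot interpret as a URL (e.g. no scheme).
        return "invalid URL"
```

Calling probe("https://eurotableau.nomisonline.com") would then return one of the first three strings, depending on how the server responds.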
I'm using the Requests: HTTP for Humans library and I got this weird error, and I don't know what it means.
No connection adapters were found for '192.168.1.61:8080/api/call'
Does anybody have an idea?
You need to include the protocol scheme:
'http://192.168.1.61:8080/api/call'
Without the http:// part, requests has no idea how to connect to the remote server.
Note that the protocol scheme must be all lowercase; if your URL starts with HTTP:// for example, it won’t find the http:// connection adapter either.
One more possible reason: your URL may include hidden characters, such as '\n'.
If you define your URL like below, this exception will be raised:
url = '''
http://google.com
'''
because there are '\n' characters hidden in the string. The URL in fact becomes:
\nhttp://google.com\n
In my case, I received this error when I refactored a url, leaving an erroneous comma thus converting my url from a string into a tuple.
My exact error message:
741 # Nothing matches :-/
--> 742 raise InvalidSchema("No connection adapters were found for {!r}".format(url))
743
744 def close(self):
InvalidSchema: No connection adapters were found for "('https://api.foo.com/data',)"
Here's how that error came to be born:
# Original code:
response = requests.get("https://api.%s.com/data" % "foo", headers=headers)
# --------------
# Modified code (with bug!)
api_name = "foo"
url = f"https://api.{api_name}.com/data", # !!! Extra comma doesn't belong here!
response = requests.get(url, headers=headers)
# --------------
# Solution: Remove erroneous comma!
api_name = "foo"
url = f"https://api.{api_name}.com/data" # No extra comma!
response = requests.get(url, headers=headers)
As stated in a comment by christian-long
Your url may accidentally be a tuple because of a trailing comma
url = self.base_url % endpoint,
Make sure it is a string
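The causes mentioned in the answers above (missing scheme, stray whitespace, accidental tuple) can all be caught before the URL reaches requests. A minimal sketch; clean_url is a hypothetical helper, not part of the requests library, and it uses only plain string checks.

```python
def clean_url(raw):
    """Validate a URL before passing it to requests.get() (hypothetical helper)."""
    if isinstance(raw, (tuple, list)):
        # A trailing comma after an assignment silently creates a 1-tuple.
        raise TypeError(
            "URL is a %s, not a string; check for a trailing comma"
            % type(raw).__name__
        )
    url = raw.strip()  # drops leading/trailing '\n', spaces and tabs
    # Kept case-sensitive, following the note above about lowercase schemes.
    if not url.startswith(("http://", "https://")):
        raise ValueError("URL has no http:// or https:// scheme: %r" % (url,))
    return url


print(clean_url("\nhttp://google.com\n"))  # http://google.com
```

Passing "192.168.1.61:8080/api/call" raises ValueError, and passing the tuple ('https://api.foo.com/data',) raises TypeError, each with a message that names the actual problem.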
I'm trying to learn how urllib2 works and how it encapsulates its various components before sending out an actual request or response.
So far I have:
theurl = "www.example.com"
That obviously specifies the URL to look at.
req = urllib2.Request(theurl)
Don't know what this does, hence the question.
handle = urllib2.urlopen(req)
This one gets the page and does all the requests and responses required.
So my question is, what does urllib2.Request actually do?
To try and look at it to get an idea I tried
print req
and just got
<urllib2.Request instance at 0x123456789>
I also tried
print req.read()
and got:
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "/usr/lib64/python2.4/urllib2.py", line 207, in __getattr__
    raise AttributeError, attr
AttributeError: read
So I'm obviously doing something wrong. If anyone can help with either of my questions, that would be great.
The class "Request" you're asking about:
http://docs.python.org/library/urllib2.html#urllib2.Request
class urllib2.Request(url[, data][, headers][, origin_req_host][, unverifiable])
This class is an abstraction of a URL request.
The function you actually want for making a request (it accepts a Request object, or wraps one around a URL string you provide): http://docs.python.org/library/urllib2.html#urllib2.urlopen
urllib2.urlopen(url[, data][,timeout])
Open the URL url, which can be either a string or a Request object.
Example:
theurl = "http://www.example.com"  # the scheme is required by urlopen
try:
    resp = urllib2.urlopen(theurl)
    print resp.read()
except IOError as e:
    print "Error: ", e
Example 2 (with Request):
theurl = "http://www.example.com"  # the scheme is required by urlopen
try:
    req = urllib2.Request(theurl)
    print req.get_full_url()
    print req.get_method()
    print dir(req) # list lots of other stuff in Request
    resp = urllib2.urlopen(req)
    print resp.read()
except IOError as e:
    print "Error: ", e
urllib2.Request() looks like a function call, but isn't - it's an object constructor. It creates an object of type Request from the urllib2 module, documented here.
As such, it probably doesn't do anything except initialise itself. You can verify this by looking at the source code, which should be in your Python installation's lib directory (urllib2.py, at least in Python 2.x).
If you want to see the constructed URL held by the Request object, call get_full_url() on the instance (not on the class):
print(req.get_full_url())
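To see that constructing a Request only stores data and sends nothing, here is a small sketch using Python 3's urllib.request.Request, the successor of urllib2.Request. The URL and the header value are placeholders.

```python
from urllib.request import Request

# Building the object performs no network I/O; it only records
# the URL, optional body and headers for a later urlopen() call.
req = Request("http://www.example.com", headers={"User-Agent": "example-script"})

print(req.get_full_url())  # http://www.example.com
print(req.get_method())    # GET
print(req.host)            # www.example.com

# Supplying a body switches the default method to POST.
post_req = Request("http://www.example.com", data=b"key=value")
print(post_req.get_method())  # POST
```

Only urlopen(req) opens a connection; that is also why the Request object has no read() method, while the response returned by urlopen does.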
I'm a newbie to Python. My task is to scan wifi and send the data to a server. Below is the format I have to send; it works fine when entered manually in the browser's URL box:
http://223.56.124.58:8080/ppod-web/ProcessRawData?data={"userId":"2220081127-14","timestamp":"2010-04-12 10:54:24","wifi":{"ssid":"guest","rssi":"80"}}
here is my code:
import httplib
import urllib
params = urllib.urlencode('{\"userId\":\"20081127-14\",\"timestamp\":\"2010-04-12 10:54:24\",\"wifi\":{\"ssid\":\"guest\",\"rssi\":\"80\"}}')
headers = {"Content-type":"application/x-www-form-urlencoded","Accept":"text/plain"}
conn = httplib.HTTPConnection("http://223.56.124.58:8080")
conn.request("POST","ppod-web/ProcessRawData?data=",params,headers)
response = conn.getresponse()
print response.status
print "-----"
print response.reason
data = response.read()
print data
conn.close()
thanks
Most likely, the issue with the script you posted in the question is you cannot directly do:
conn=httplib.HTTPConnection("http://223.56.124.58:8080/wireless")
The exception is triggered in getaddrinfo(), which calls the C function getaddrinfo() which returns EAI_NONAME:
"The node or service is not known; or both node and service are NULL; or AI_NUMERICSERV was specified in hints.ai_flags and service was not a numeric port-number string."
There obviously is a problem with the parameters passed to getaddrinfo, and most likely you are trying to get information for the "223.56.124.58:8080/wireless" host. Ooops!
Indeed, you cannot connect directly to a URL. As the documentation clearly states and shows, you connect to the server:
conn = httplib.HTTPConnection("223.56.124.58", 8080)
Then you can do:
conn.request("POST", "/wireless", params, headers)
What about the script you are actually using?
conn.request("POST","http://202.45.139.58:8080/ppod-web",params,headers)
Even if the connection was correctly formed, that would have you POSTing to http://202.45.139.58:8080/http://202.45.139.58:8080/ppod-web. What you really want probably is:
conn = httplib.HTTPConnection("202.45.139.58", 8080)
conn.request("POST", "/ppod-web", params, headers)
The error is shown for this line because most likely HTTPConnection is a lazy object and only attempts to actually connect to the server when you call request().
After you're done fixing the above, you'll need to fix params.
>>> urllib.urlencode({"wifi":{"ssid":"guest","rssi","80"}})
SyntaxError: invalid syntax
>>> urllib.urlencode({"wifi":{"ssid":"guest","rssi":"80"}})
'wifi=%7B%27rssi%27%3A+%2780%27%2C+%27ssid%27%3A+%27guest%27%7D'
To get what you think you want to get, you should do:
>>> urllib.urlencode({"data": {"wifi":{"ssid":"guest","rssi":"80"}}})
'data=%7B%27wifi%27%3A+%7B%27rssi%27%3A+%2780%27%2C+%27ssid%27%3A+%27guest%27%7D%7D'
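Note that the output above encodes a Python dict repr (the %27 are single quotes), which a server expecting the JSON from the question would probably not parse. A minimal Python 3 sketch, assuming the server wants the JSON text as the value of the data field; the payload values are copied from the question's URL.

```python
import json
from urllib.parse import parse_qs, urlencode

payload = {
    "userId": "2220081127-14",
    "timestamp": "2010-04-12 10:54:24",
    "wifi": {"ssid": "guest", "rssi": "80"},
}

# Serialize to JSON first, then form-encode it as the value of "data".
params = urlencode({"data": json.dumps(payload)})
print(params)

# Round trip: decoding the form field gives back the same structure.
decoded = json.loads(parse_qs(params)["data"][0])
print(decoded == payload)  # True
```

The resulting params string can be sent as the POST body with the Content-type: application/x-www-form-urlencoded header already used in the question.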
Instead of:
conn = httplib.HTTPConnection("http://223.56.124.58:8080/wireless")
conn.request("POST", "data", params, headers)
try this:
conn = httplib.HTTPConnection("223.56.124.58", port=8080)
conn.request("POST", "/wireless", params, headers)
Not sure if it will resolve all your problems, but at least your code will conform to the method/constructor signatures.
The traceback doesn't come from the same code you pasted.
On the error traceback there's a line:
conn.request("POST","http://202.45.139.58:8080/ppod-web",params,headers)
It is the line 9 of http.py however it is not on the code you pasted.
Please paste the actual code.
I've been trying to get this to work for a while now and I don't know what to try. I made a Flask app and I'm testing it now on localhost. I run a server for my client on localhost:8000 and another for the Flask app on localhost:8080, then I use flask-cors to enable cross-domain requests. All seems to work fine, the app does its job. However, when it comes to storing values in its session, it suddenly fails. Here's the key parts of the code:
def retrieve_tokens(self, session, code):
    # Get tokens from ORCID given a request code
    headers = {'Accept': 'application/json'}
    payload = dict(self._details)
    payload.update({
        'grant_type': 'authorization_code',
        'code': code,
    })
    try:
        r = post(self._url + 'oauth/token',
                 data=payload, headers=headers)
    except ConnectionError:
        return None
    # Save them (if no error has occurred)
    rjson = r.json()
    if 'error' not in rjson:
        print rjson
        session['login_details'] = rjson
        print session['login_details']
    return r.json()

def get_tokens(self, session, code=None):
    # Retrieve existing tokens, or ask for new ones
    if 'login_details' in session and code is None:
        print session['login_details']
        return session['login_details']
    elif code is not None:
        # Retrieve them
        return self.retrieve_tokens(session, code)
    else:
        # Something went wrong
        return None
As you can see, there's some print debug calls in there. When retrieve_tokens is called everything goes fine; most importantly, the second print gives the same result as the first (namely, a JSON object with all the requested tokens). However get_tokens obstinately returns None. Any clue what I may be doing wrong? A few things I ruled out:
my app_secret is set up and should be fine. I use a static one, loaded from a JSON file (so that I can put it into .gitignore for my repository)
setting SERVER_NAME doesn't help
importing session in the main file and passing it to these functions (that are inside another file) instead of importing it directly in the external file doesn't change anything
using session.modified = True after changing the value doesn't change anything
What can I try? Nothing seems to work. Is it a problem with running on localhost? A bug? I'm using Chrome on Ubuntu 16.04, if that helps.
The problem is that you are assigning the value inside an if-statement. If the condition of that if-statement is not met, the value you want to use later is never assigned, and your code fails. Try changing the if-statement to an if-else statement to ensure this field of the session always has a value.