I'm getting a weird error that I can't seem to find a solution for.
This error does not occur every time I hit this segment of code, nor does it happen on the same iteration through the loop (it happens inside a loop). If I run the program enough times, it eventually gets through without encountering the error and executes successfully. Regardless, I'd still like to figure out why this is happening.
Here is my error, versions, trace, etc: http://dpaste.com/681658/
It seems to happen with the following line in my code:
page = urllib2.urlopen(url)
Where url is.... a URL obviously.
And I do have import urllib2 in my code.
The BadStatusLine exception is raised when you call urllib2.urlopen(url) and the remote server responds with a status line that Python's httplib cannot understand (for example, an empty or malformed response).
Assuming that you don't control url, you can't prevent this from happening. All you can do is catch the exception, and manage it gracefully.
import urllib2
from httplib import BadStatusLine

try:
    page = urllib2.urlopen(url)
    # do something with page
except BadStatusLine:
    print "could not fetch %s" % url
The explanations from other users are right and good, but in practice you may find this useful:
In my experience, this usually happens when you are sending unquoted values in the URL's query parameters, such as values containing spaces or other characters that need to be quoted or URL-encoded.
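For instance, here is a minimal sketch (the endpoint and query values are made up for illustration) showing how urllib.urlencode takes care of the encoding before you build the URL:

import urllib
import urllib2

# Hypothetical query values; the raw space in 'hello world' would
# otherwise produce a malformed request and confuse some servers.
params = urllib.urlencode({'q': 'hello world', 'lang': 'en'})
url = 'http://example.com/search?' + params  # -> ...?q=hello+world&lang=en

page = urllib2.urlopen(url)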
This doesn't have anything to do with Django, it's an exception thrown by urllib2 which couldn't parse the response after fetching your url. It may be a network issue, a malformed response… Some servers / applications throw this kind of error randomly. If you don't control what this URL returns you're left with catching the exception, debugging which URLs are causing problems and trying to identify a pattern.
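If you go that route, a small sketch like this (assuming your URLs live in a list called urls) can collect the failing ones so you can look for a pattern afterwards:

import logging
import urllib2
from httplib import BadStatusLine

bad_urls = []
for url in urls:
    try:
        page = urllib2.urlopen(url)
    except BadStatusLine:
        logging.warning("BadStatusLine while fetching %s", url)
        bad_urls.append(url)

# Inspect bad_urls afterwards: same host? same query strings? same encoding?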
Related
I'm using soundcloud-python https://github.com/soundcloud/soundcloud-python for Soundcloud API on Ubuntu Server 16.04.1 (installed with pip install soundcloud).
Soundcloud API Rate Limits official page https://developers.soundcloud.com/docs/api/rate-limits#play-requests says that, in case an app exceeds the API rate limits, the body of the 429 Client Error response would be a JSON object, containing some additional info.
I'm interested in getting reset_time field, to inform the user when the block will be over.
The problem is that when, for example, the rate limit on likes is exceeded, calling response = client.put('/me/favorites/%d' % song_id) crashes the app, and response is null.
How can I get the JSON response body?
Why don't you read the package's source code and find out by yourself?
Let's see... You don't explain how you got that client object, but browsing the source code we can see there's a "client.py" module that defines a Client class. This class doesn't define a put method explicitly, but it does define the __getattr__ hook:
def __getattr__(self, name, **kwargs):
    """Translate an HTTP verb into a request method."""
    if name not in ('get', 'post', 'put', 'head', 'delete'):
        raise AttributeError
    return partial(self._request, name, **kwargs)
Ok, so Client.put(...) returns a partial object wrapping Client._request, which is a quite uselessly convoluted way to define Client.put(**kwargs) as return self._request("put", **kwargs).
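To make that concrete, here is a stripped-down toy version of the hook (not the library's actual class) showing that the partial simply pre-binds the HTTP verb as the first argument of _request:

from functools import partial

class ToyClient(object):
    def _request(self, method, url):
        print("%s %s" % (method.upper(), url))

    def __getattr__(self, name):
        # Only HTTP verbs get translated into request methods.
        if name not in ('get', 'post', 'put', 'head', 'delete'):
            raise AttributeError(name)
        return partial(self._request, name)

client = ToyClient()
client.put('/me/favorites/123')  # prints: PUT /me/favorites/123
# ...which is exactly client._request('put', '/me/favorites/123')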
Now let's look at Client._request: it basically makes a couple of sanity checks, updates **kwargs and returns wrapped_resource(make_request(method, url, kwargs)).
Looking up the imports at the beginning of the module, we can see that make_request comes from "request.py" and wrapped_resource from "resources.py".
You mention that doing an API call while over the rate limit "crashes the application" - I assume you mean "raises an exception" (BTW, please post exceptions and tracebacks when asking about such problems). So, assuming this is handled at a lower level, let's start with request.make_request. There's a lot of data formatting / massaging, and finally the interesting part: a call to response.raise_for_status(). This is a hint that we are actually delegating to the famous python-requests package, which is confirmed a few lines above and in the requirements file.
If we read python-requests' fine manual, we find out what raise_for_status does: it raises a requests.exceptions.HTTPError for client (4XX) and server (5XX) status codes.
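You can see this for yourself in a couple of lines (httpbin.org is just a convenient test service):

import requests

r = requests.get('https://httpbin.org/status/429')
try:
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    print(e)  # e.g. "429 Client Error: TOO MANY REQUESTS for url: ..."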
Ok, now we know which exception we get. Note that you already had all this information in your exception and traceback, which would have saved us a lot of pain here had you posted it.
But anyway... It looks like we won't get the response content, doesn't it? Well, wait, we're not done yet - python-requests is a fairly well designed package, so chances are we can still rescue our response. And indeed, if we look at the requests.exceptions source code, we find out that HTTPError is a subclass of RequestException, and that a RequestException is "Initialize(d)" with "request and response objects."
Hurray, we do have our response - in the exception. So all we have to do is catch the exception and check its response attribute, which should contain the "additional info".
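Putting it together, a sketch along these lines (client and song_id as in your snippet) should surface the JSON body; its exact layout is whatever the rate-limits documentation describes, so inspect it to locate reset_time:

import requests

try:
    response = client.put('/me/favorites/%d' % song_id)
except requests.exceptions.HTTPError as e:
    if e.response is not None and e.response.status_code == 429:
        body = e.response.json()  # the JSON object from the rate-limit docs
        print(body)  # look for the reset_time field in here
    else:
        raise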
Now please understand that this took me more than half an hour to write, but only about 7 minutes to sort out without the traceback - with the traceback it would have boiled down to a mere 2 minutes, the time to go to the requests.exceptions source code and make sure it kept the request and response. Ok, I'm cheating: I'm used to reading source code and I use python-requests a lot, but still, you could have solved this by yourself in less than an hour, especially with Python's interactive shell, which lets you explore and test live objects in real time.
I am parsing through various links using requests, however some links are "bad", aka they basically just don't load, which causes my program to hang and eventually crash.
Is there a way to set a time limit for getting a request, and if that time passes (fails to get a request from the url) it will just return some kind of error? Or is there some other way I can prevent bad links from breaking my program?
urllib2 is one option
import socket
import urllib2

test_url = "http://www.test.com"

try:
    # timeout (in seconds) makes urlopen give up instead of hanging forever
    urllib2.urlopen(test_url, timeout=5)
except (urllib2.URLError, socket.timeout):
    pass
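Since the question mentions using requests already, the same guard is available there; a minimal sketch:

import requests

try:
    # give up after 5 seconds instead of hanging on a bad link
    r = requests.get(test_url, timeout=5)
except requests.exceptions.RequestException as e:
    print "skipping %s: %s" % (test_url, e)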
The default message for Flask 400 exception (abort()) is:
{
    "message": "The browser (or proxy) sent a request that this server could not understand."
}
For 404:
{
    "message": "The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. You have requested this URI [/obj/] but did you mean /obj/ or /obj/<int:id>/ or /obj/<int:id>/kill/ ?"
}
I have trouble comprehending these messages when I get them as replies from my API (especially the first one - I thought there was something wrong with encryption or headers), and I think it's kinda tiresome to override the text manually for every abort() exception. So I change the mapping:
from flask import abort
from werkzeug.exceptions import HTTPException

class BadRequest(HTTPException):
    code = 400
    description = 'Bad request.'

class NotFound(HTTPException):
    code = 404
    description = 'Resource not found.'

abort.mapping.update({
    400: BadRequest,
    404: NotFound
})
For the case of 400 it works beautifully. But when it comes to 404, it is still the same message. I tested it in the same place in my code - it works for abort(400), abort(403) and some of the others, but it gets mysteriously overridden by the default message on abort(404). Debugging didn't help much. What may be the culprit here?
Update. Yes, I'm using abort imported from flask not flask_restful as the latter doesn't have the mapping and it's a function not an Aborter object. Besides, it does work for most exceptions, so it's probably not the real issue here.
Update 2. The abort.mapping seems to be perfectly fine on execution. The exceptions in question are overridden, including 404.
Update 3: I've put together a little sandbox, that I use for debugging. (removed the repo since the mystery is long solved).
It took me some time, but now I have actually found the place where it all derails on a 404 error. It's actually an undocumented feature in flask-restful. Look at the code here. Whatever message you chose persists until that very place, and then it becomes the default. What we need now is just to put ERROR_404_HELP = False in our config, and everything works as intended.
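For reference, setting it looks like this (assuming the usual Flask + flask-restful setup):

from flask import Flask
from flask_restful import Api

app = Flask(__name__)
# Stop flask-restful from appending its "did you mean ...?" help text to 404s
app.config['ERROR_404_HELP'] = False
api = Api(app)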
Why is this code even there in the first place? OK, perhaps I can live with that, but it should be all over the documentation. Even when I googled the name of the constant, I only got a couple of GitHub issues (1, 2).
Anyways, the mystery is officially solved.
By the way... I can't point to documentation for how I discovered this - I just tried it (that's how I learn most development!) - but you can simply abort with the desired response code and return a custom string with it. I think this makes sense because you're using the framework the way it's intended: you're not writing tons of code, you're returning the correct response code in the fashion the framework expects, and you're informing any human who reads it of the application's context for the error.
from flask import abort
abort(404, "And here's why.")
I have a scenario where I need to download the XML from the provided URL of an RSS feed.
I am using the following code for this:
import urllib
from xml.dom.minidom import parse  # assuming this is where parse comes from

urls = ['http://static.espncricinfo.com/rss/livescores.xml',
        'http://ibnlive.in.com/ibnrss/top.xml']

for rssUrl in urls:
    if rssUrl is not None:
        dom = parse(urllib.urlopen(rssUrl))
        tmp = dom.toprettyxml()
When I run this as a standalone application, it runs fine without any issue. But when I call this code from a websocket application, the execution is inconsistent: sometimes it works properly and sometimes it doesn't, and the behavior is random. Can anyone tell me what may be the reason behind it?
The error shown is:
<urlopen error [Errno 66] unknown>
I have tried using urllib2 instead of urllib, but the problem persists.
Edit: I have found that I made a mistake: the cause of the error wasn't urllib but nltk, which wasn't able to process a long string that came from this exact page. Sorry for this one.
I do not know why, but this happens no matter whether I use urllib2.urlopen or requests when I come across one specific URL.
import requests

r = requests.get('SomeURL')
html = r.text
print html
Here is its behavior.
1) When I go through a loop of 200 URLs, it freezes each time at exactly the same URL. It stays there for hours if I do not terminate the program.
2) When I try the example code on its own, outside of the loop, it works.
3) If I blacklist just this URL, it goes through the loop without problems.
It doesn't actually return any kind of error code, it works fine outside of the loop, and a timeout is set but doesn't do anything; the request still hangs for an indefinite time.
So is there any other way to forcefully stop the HTTP GET request after a certain time, since the timeout doesn't work? Is there any library other than urllib2 and requests that could do the job and that honors timeout limits?
for i in range(0, mincount):
    # call the request for urlist[i]
It always works but freezes only when I request this one site. If I had 200 requests to yahoo, for example, it would work, but when I try this particular URL it hangs.
Edit: It is a standard for loop, and there is not much room for error.
I think it's simply a very slow page; on my system, it takes about 9.7s to load.
If you are trying to run it in a short loop, it would indeed seem to freeze.
You could try something like
import requests

links = [
    'SomeURL',
    'http://www.google.com/'
]

for link in links:
    try:
        html = requests.get(link, timeout=2.).content
        print("Successfully loaded {}".format(link))
    except requests.Timeout:
        print("Timed out loading {}".format(link))
which gave me
Timed out loading SomeURL
Successfully loaded http://www.google.com/