I am using the urllib library in Python.
For any error in the URL, I am using a try/except block to catch it.
But sometimes I get empty data from the URL. How do I check or validate that the data returned from the URL is empty? I am also using a timeout of 25 seconds. Is it good to give 25 seconds, or should it be below 10?
You can use whatever timeout length is appropriate for your program. If you expect that it might sometimes take whatever URL you're querying up to 25 seconds to respond, then 25 is appropriate. If it should never take more than a few seconds to respond, and you can safely assume that if it's taken longer than a few seconds the URL must be dead, then you can lower the timeout. In general I think it's a good idea to be conservative with timeouts. It's better to make the error case a little slower with a timeout that's too long, rather than falsely triggering an error with a timeout that's too short.
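As a side note, here is a minimal sketch (assuming Python 2's urllib2 and a placeholder URL) of what catching the timeout looks like, since hitting a too-short timeout surfaces as an exception rather than as empty data:
import socket
import urllib2

try:
    fh = urllib2.urlopen("http://example.com/data", timeout=25)
    response = fh.read()
except socket.timeout:
    # Raised when the 25-second limit is exceeded while reading
    print("Request timed out")
except urllib2.URLError as err:
    # DNS failures, refused connections, etc.; a connect timeout can also
    # show up wrapped in URLError
    print("Request failed: %s" % err.reason)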
You can check for an empty response from urllib2 by doing something like this
fh = urllib2.urlopen(url)
response = fh.read()
if not response:
    # Do whatever error handling you want. You don't necessarily need to raise Exception.
    raise Exception("Empty response")
Is that what you're looking for?
I have a program, and in order to verify that the user doesn't download very big files via input, I need a time limit on how long each request is allowed to take.
Does anyone know a good way to put a time limit (/lifetime) on each Python requests GET request, so that if it takes 10 seconds an exception will be thrown?
Thanks
You can define your own timeout like:
requests.get('https://github.com/', timeout=0.001)
You can pass an additional timeout parameter to every request you make. This is always recommended, as it keeps your code from hanging indefinitely in case you don't receive a response from the other end.
requests.get('https://github.com/', timeout=0.001)
Read the official Python requests documentation for timeouts here.
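For completeness, a minimal sketch of catching the resulting exception; requests raises requests.exceptions.Timeout when the limit is exceeded (the URL and the deliberately tiny timeout are just placeholders):
import requests

try:
    # A 1 ms timeout will almost certainly expire before the server answers
    response = requests.get('https://github.com/', timeout=0.001)
except requests.exceptions.Timeout:
    print("The request timed out")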
import requests

while True:
    try:
        posting = requests.post(url, json=data, headers=headers, timeout=3.05)
    except requests.exceptions.ConnectionError as e:
        continue
    # If a read timeout error occurs, start from the beginning of the loop
    except requests.exceptions.ReadTimeout as e:
        continue
A link to more code: Multiple accidental POST requests in Python
This code uses the requests library to perform POST requests indefinitely. I noticed that when the try block fails several times and the while loop restarts over and over, then, when the POST request finally gets through, I find multiple entries on the server side within the same second. I was writing to a txt file at the same time, and it showed only one entry. Each entry is 5 readings. Is this an issue with the library itself? Is there a way to fix this? No matter what kind of conditions I add, it still doesn't work.
Notice that the reading at 12:11:13 has 6 parameters per second, while at 12:14:30 (after the delay; it should be every 10 seconds) there are several entries within the same second: 3 entries that make up 18 readings in one second, instead of only 6.
It looks like the server receives your requests and acts upon them but fails to respond in time (3 s is a pretty low timeout; a load spike or paging operation can easily make the server miss it unless it employs special measures). I'd suggest that you:
process requests asynchronously (e.g. spawn threads; Asynchronous Requests with Python requests discusses ways to do this with requests) and do not use timeouts (TCP has its own timeouts, let it fail instead);
reuse the connection(s) (TCP has quite a bit of overhead for establishing and tearing down connections) or use UDP instead;
include some "hints" (IDs, timestamps, etc.) to prevent the server from adding duplicate records; a client-side sketch follows after this answer. (I'd call this one a workaround, as the real problem is that you're not making sure your request was processed.)
From the server side, you may want to:
Respond ASAP and act upon the info later. Do not let pending action prevent answering further requests.
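A minimal client-side sketch of two of the suggestions above: reusing one connection via requests.Session and attaching a unique ID so the server can drop duplicates. Here url and data are the placeholders from the question's code, and the request_id field name is hypothetical; the server would need to deduplicate on it:
import uuid
import requests

session = requests.Session()  # keeps the TCP connection alive between requests

def post_reading(url, data):
    # Attach a unique ID so a re-delivered request can be recognised as a duplicate
    payload = dict(data, request_id=str(uuid.uuid4()))
    try:
        return session.post(url, json=payload, timeout=10)
    except requests.exceptions.RequestException:
        # Safe to retry with the same payload: the server can drop the
        # second copy if the first one actually arrived.
        return session.post(url, json=payload, timeout=10)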
In queries like aggregations and cardinality search there might be a timeout.
I noticed that when executing queries from python client the response sometimes contains:
{
  "took": 1200184,
  "timed_out": true,
  "_shards": {
    "total": 84,
    "successful": 84,
    "failed": 0
  }
}
And it returns fewer results than expected.
My main problem is that when a timeout occurs, the response still contains a number of results.
I could check whether timed_out is true before parsing the response results, but there is probably a better way to do that :)... like raising an exception, or somehow catching the timeout and retrying.
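A minimal sketch of the check described above (es, my_index and get_req_body are the same placeholders used in the answers below); it refuses to treat a partial, timed-out response as complete:
resp = es.search(index="my_index", body=get_req_body())
if resp["timed_out"]:
    # The results are partial: raise here, or retry with a larger timeout instead of parsing them
    raise RuntimeError("search timed out after %s ms" % resp["took"])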
You can increase the timeout for Elasticsearch using:
# es is assumed to be an existing Elasticsearch client instance
es.search(index="my_index",
          doc_type="document",
          body=get_req_body(),
          request_timeout=30)
By default the value assigned is 10 seconds. If, on the other hand, you want to catch the exception, you can use a scheduler, check the time elapsed, and catch the exception if it exceeds the time limit.
The elasticsearch-py client has a named argument you can pass that will let you set the timeout value for the search request.
But I'd suggest using scrolling to obtain results in such scenarios, it is similar to a cursor for database query. Here's a really good example of how to use scrolling. With a limited scroll size, the request is less likely to timeout and you will be able to fetch all the results instead of receiving partial results.
Example search call with timeout parameter
es.search(index="index", doc_type="doc_type", body=body, timeout=50)
I have an API manager that connects to a URL and grabs some JSON. Very simple.
Cut from the method:
import socket
from urllib2 import Request, urlopen

req = Request(url)
socket.setdefaulttimeout(timeout)
resp = urlopen(req, None, timeout)
data = resp.read()
resp.close()
It works fine most of the time, but at random intervals it takes 5 s to complete the request. Even when timeout is set to 0.5 or 1.0 or whatever.
I have logged it very closely, so I am 100% sure that the line that takes time is the urlopen call (i.e. resp = urlopen(req, None, timeout)).
I've tried all the solutions I've found on the topic of timeout decorators, Timers, etc.
(To list some of them:
Python urllib2.urlopen freezes script infinitely even though timeout is set,
How can I force urllib2 to time out?, Timing out urllib2 urlopen operation in Python 2.4, Timeout function if it takes too long to finish
)
But nothing works. My impression is that the thread freezes while urlopen does something, and when it's done it unfreezes; only then do all the timers and timeouts return with timeout errors, but the execution time is still more than 5 s.
I've found this old mailing list regarding urllib2 and handling of chunked encoding. So if the problem is still present then the solution might be to write a custom urlopen based on httplib.HTTP and not httplib.HTTPConnection.
Another possible solution is to try some multithreading magic....
Both solutions seem too aggressive. And it bugs me that the timeout does not work all the way.
It is very important that the execution time of the script does not exceed 0.5s. Anyone that knows why I am experiencing the freezes or maybe a way to help me?
Update based on accepted answer:
I changed the approach and use curl instead. Together with the unix timeout command it works just as I want. Example code follows:
from subprocess import Popen, PIPE

t_timeout = str(API_TIMEOUT_TIME)
c_timeout = str(CURL_TIMEOUT_TIME)
cmd = ['timeout', t_timeout, 'curl', '--max-time', c_timeout, url]
prc = Popen(cmd, stdout=PIPE, stderr=PIPE)
response = prc.communicate()
Since curl only accepts an integer --max-time, I added the timeout command as well; timeout accepts floats.
Looking through the source code, the timeout value is actually the maximum amount of time that Python will wait between receiving packets from the remote host.
So if you set the timeout to two seconds, and the remote host sends 60 packets at the rate of one packet per second, the timeout will never occur, although the overall process will still take 60 seconds.
Since the urlopen() function doesn't return until the remote host has finished sending all the HTTP headers, then if it sends the headers very slowly, there's not much you can do about it.
If you need an overall time limit, you'll probably have to implement your own HTTP client with non-blocking I/O.
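Short of writing a non-blocking client, one hedged workaround (not from the answer above) is to enforce an overall wall-clock deadline from the calling side with concurrent.futures (available on Python 2 via the futures backport); note that the worker thread may keep running in the background after the deadline fires:
import concurrent.futures
import urllib2

# One shared pool; don't wrap it in a `with` block here, because shutting the
# pool down waits for the stuck worker and would defeat the deadline.
pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def fetch(url, timeout):
    return urllib2.urlopen(url, timeout=timeout).read()

def fetch_with_deadline(url, deadline=0.5):
    future = pool.submit(fetch, url, deadline)
    try:
        return future.result(timeout=deadline)
    except concurrent.futures.TimeoutError:
        # The worker may still be blocked inside urlopen; we simply stop waiting.
        return None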
I'm still relatively new to Python, so if this is an obvious question, I apologize.
My question is in regard to the urllib2 library and its urlopen function. Currently I'm using it to load a large number of pages from another server (they are all on the same remote host), but the script is killed every now and then by a timeout error (I assume this is from the large requests).
Is there a way to keep the script running after a timeout? I'd like to be able to fetch all of the pages, so I want a script that will keep trying until it gets a page, and then moves on.
On a side note, would keeping the connection open to the server help?
Next time the error occurs, take note of the error message. The last line will tell you the type of exception. For example, it might be a urllib2.HTTPError. Once you know the type of exception raised, you can catch it in a try...except block. For example:
import urllib2
import time

for url in urls:
    while True:
        try:
            sock = urllib2.urlopen(url)
        except (urllib2.HTTPError, urllib2.URLError) as err:
            # You may want to count how many times you reach here and
            # do something smarter if you fail too many times.
            # If a site is down, pestering it every 10 seconds may not
            # be very fruitful or polite.
            time.sleep(10)
        else:
            # Success
            contents = sock.read()
            # process contents
            break  # break out of the while loop
The missing manual of urllib2 might help you