Python elasticsearch timeout

In queries like aggregations and cardinality searches there might be a timeout.
I noticed that when executing queries from the Python client, the response sometimes contains:
{
  "took": 1200184,
  "timed_out": true,
  "_shards": {
    "total": 84,
    "successful": 84,
    "failed": 0
  }
}
It returns fewer results than expected.
My main problem is that when a timeout occurs, the response still contains some results.
I could check whether timed_out is true before parsing the results, but there is probably a better way to do that :) ... like raising an exception, or somehow catching the timeout and retrying.

You can increase the timeout for Elasticsearch using:
es.search(index="my_index",
doc_type="document",
body=get_req_body(),
request_timeout=30)
By default the value is 10 seconds. If, on the other hand, you want to catch the exception, you can use a scheduler, check the elapsed time, and raise an exception when it exceeds the limit.
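For example, a minimal retry sketch (reusing get_req_body() and the index name from above; elasticsearch-py raises elasticsearch.exceptions.ConnectionTimeout when the client-side limit is hit, and the server flags partial results with timed_out):
from elasticsearch import Elasticsearch
from elasticsearch.exceptions import ConnectionTimeout

es = Elasticsearch()

def search_with_retry(retries=3):
    # Retry both client-side timeouts (exception) and server-side
    # partial results (the timed_out flag in the response body).
    for attempt in range(retries):
        try:
            response = es.search(index="my_index",
                                 doc_type="document",
                                 body=get_req_body(),
                                 request_timeout=30)
        except ConnectionTimeout:
            continue  # the client gave up waiting, try again
        if not response.get("timed_out"):
            return response  # complete result set
    raise RuntimeError("query kept timing out after %d attempts" % retries)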

The elasticsearch-py client has a named argument you can pass that lets you set a timeout value for the search request.
But I'd suggest using scrolling to obtain results in such scenarios; it is similar to a cursor for a database query. Here's a really good example of how to use scrolling. With a limited scroll size, the request is less likely to time out, and you will be able to fetch all the results instead of receiving partial ones.
Example search call with the timeout parameter:
es.search(index="index", doc_type="doc_type", body=body, timeout=50)

Related

Does AsyncIOMotorCommandCursor.fetch_next prevent a retryable read?

I'm trying to diagnose a read failure with motor during a Mongo Atlas cluster failover. The Retryable-Reads specification defines that aggregate calls are retryable, but Cursor.getMore is not. I have code that looks like this:
cursor = db.foo.aggregate([...])
if not await cursor.fetch_next:
    raise SomeException
doc = cursor.next_object()
This code appears not to retry during a cluster failover, I assume because it is internally calling getMore. I'm not entirely clear whether that's the case or not. Not to mention that fetch_next is deprecated anyway.
Would changing it to this make it a retryable read?
async for doc in cursor:
    break
else:
    raise SomeException
Or does this result in the same internal processing, and the problem is elsewhere?
The goal is to try to read the single result document from an aggregation pipeline (there's either one or none) in a retryable manner and raise an exception if there's none.
You need to rescue whatever exception the driver raises on a network error outside of this code and repeat the entire iteration in that case.
The driver automatically retries initial queries but not getMores, because those don't have a way of specifying the position in the result set. Thus the driver's retryable-reads logic is insufficient for a robust application; you still need to handle the possibility of read errors at the level of complete iterations.
If you are retrieving a single document, it should generally be included in the initial query response and no getMores would be needed, thus in practice this question wouldn't apply.
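A minimal sketch of that "repeat the entire iteration" advice, assuming pymongo.errors.ConnectionFailure is what the driver raises during the failover (adjust the except clause to whatever you actually observe):
from pymongo.errors import ConnectionFailure

class SomeException(Exception):   # placeholder from the question
    pass

async def first_result_or_raise(db, pipeline, attempts=3):
    # Re-run the whole aggregate + iteration if a network error interrupts
    # it, since getMores are not retried by the driver.
    for _ in range(attempts):
        try:
            cursor = db.foo.aggregate(pipeline)
            async for doc in cursor:
                return doc            # there is at most one document
            raise SomeException       # the pipeline returned nothing
        except ConnectionFailure:
            continue                  # failover in progress, start over
    raise SomeException               # every attempt hit a network error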

Put a time limit on a request

I have a program, and in order to verify that the user doesn't download very big files via the input they provide, I need a time limit on how long each request is allowed to take.
Does anyone know a good way to put a time limit (/lifetime) on each Python requests GET request, so that if it takes 10 seconds an exception will be thrown?
Thanks
You can define your own timeout like:
requests.get('https://github.com/', timeout=0.001)
You can pass an additional timeout parameter to every request you make. This is always recommended, as it makes your code more robust against hanging indefinitely when you don't receive a response from the other end.
requests.get('https://github.com/', timeout=0.001)
Read the official Python requests documentation on timeouts here.
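A minimal sketch of the 10-second limit the question asks for, with the timeout exception caught explicitly (the URL is just an example):
import requests

try:
    # Raises requests.exceptions.Timeout if no response arrives within 10 s
    response = requests.get('https://github.com/', timeout=10)
except requests.exceptions.Timeout:
    print('Request took longer than 10 seconds, giving up.')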

requests + grequests: is the "Connection pool is full, discarding connection:" warning relevant?

I'm hosting a server on localhost and I want to fire hundreds of GET requests asynchronously. For this I am using grequests. Everything appears to work fine but I repeatedly get the warning:
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: date.jsontest.com
A search shows how the full-pool issue can be avoided by creating a Session() in requests (e.g. here). However, a couple of things:
Even if I don't take any steps to avoid the warning, I appear to consistently get the expected results. If I do use the workaround, any requests over the number of the pool_maxsize will give a warning.
The linked workaround will still result in the warning if the number of requests exceeds the pool size. I assumed there would be some kind of throttling to prevent the pool size being exceeded at any one time.
I can't seem to find a way to disable the warning. requests.packages.urllib3.disable_warnings() doesn't seem to do anything.
So my questions are:
What does this warning actually mean? My interpretation is that it is simply dropping the requests from firing, but it doesn't seem to be the case.
Is this warning actually relevant for the grequests library, especially when I take steps to limit the pool size? Am I inviting unexpected behaviour and fluking my expected result in my tests?
Is there a way to disable it?
Some code to test:
import grequests
import requests

requests.packages.urllib3.disable_warnings()  # Doesn't seem to work?

session = requests.Session()
# Commenting out the adapter setup below causes 105 warnings instead of 5
adapter = requests.adapters.HTTPAdapter(pool_connections=100,
                                        pool_maxsize=100)
session.mount('http://', adapter)

# Test query
query_list = ['http://date.jsontest.com/' for x in xrange(105)]
rs = [grequests.get(item, session=session) for item in query_list]
responses = grequests.map(rs)
print len([item.json() for item in responses])
1) What does this warning actually mean? My interpretation is that it is simply dropping the requests from firing, but it doesn't seem to be the case.
This is actually still unclear to me. Even firing one request was enough to get the warning but would still give me the expected response.
2) Is this warning actually relevant for the grequests library, especially when I take steps to limit the pool size? Am I inviting unexpected behaviour and fluking my expected result in my tests?
For the last part: yes. The server I was communicating with could handle 10 queries concurrently. With the following code I could send 400 or so requests in a single list comprehension and everything worked out fine (i.e. my server never got swamped, so there must have been throttling in some way). After some tipping point in the number of requests, the code would stop firing any requests and simply give a list of None. It's not as though it even tried to get through the list; it didn't even fire the first query, it just blocked up.
sess = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=10,
                                        pool_maxsize=10)
sess.mount('http://', adapter)

# Launching ~500 or more requests will suddenly cause this to fail
rs = [grequests.get(item[0], session=sess) for item in queries]
responses = grequests.map(rs)
3) Is there a way to disable it?
Yes, if you want to be a doofus like me and comment it out in the source code. I couldn't find any other way to silence it, and it came back to bite me.
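For what it's worth, a less invasive way to silence it than editing the source is to raise the level of the logger that emits it: the message comes from the logging module rather than the warnings module, which is why disable_warnings() has no effect (a sketch; the logger name matches the vendored copy shown in the warning above, while plain "urllib3.connectionpool" would apply to the standalone package):
import logging

# Hide WARNING-level messages from the connection pool logger only
logging.getLogger("requests.packages.urllib3.connectionpool").setLevel(logging.ERROR)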
SOLUTION
The solution was a painless transition to using requests-futures instead. The following code behaves exactly as expected, gives no warnings and, thus far, scales to any number of queries that I throw at it.
from requests_futures.sessions import FuturesSession

session = FuturesSession(max_workers=10)
fire_requests = [session.get(url) for url in queries]
responses = [item.result() for item in fire_requests]

Issue with sending POST requests using the library requests

import requests

while True:
    try:
        posting = requests.post(url, json=data, headers=headers, timeout=3.05)
    except requests.exceptions.ConnectionError as e:
        continue
    # If a read_timeout error occurs, start from the beginning of the loop
    except requests.exceptions.ReadTimeout as e:
        continue
A link to more code: Multiple accidental POST requests in Python
This code uses the requests library to perform POST requests indefinitely. I noticed that when the try block fails multiple times and the while loop starts over again and again, then when I can finally send the POST request, I find multiple entries on the server side within the same second. I was writing to a txt file at the same time and it showed only one entry. Each entry is 5 readings. Is this an issue with the library itself? Is there a way to fix this?! No matter what conditions I put, it still doesn't work :/ !
You can notice that the reading at 12:11:13 has 6 parameters per second, while at 12:14:30 (after the delay; it should be every 10 seconds) there are several entries within the same second!!! 3 entries that make up 18 readings in one second, instead of only 6!
It looks like the server receives your requests and acts upon them but fails to respond in time (3 s is a pretty low timeout; a load spike or paging operation can easily make the server miss it unless it employs special measures). I'd suggest to:
process requests asynchronously (e.g. spawn threads; Asynchronous Requests with Python requests discusses ways to do this with requests) and do not use timeouts (TCP has its own timeouts, let it fail instead).
reuse the connection(s) (TCP has quite a bit of overhead for establishing and tearing down connections) or use UDP instead.
include some "hints" (IDs, timestamps etc.) to prevent the server from adding duplicate records, as sketched after this answer. (I'd call this one a workaround, since the real problem is that you're not making sure your request was processed.)
From the server side, you may want to:
Respond ASAP and act upon the info later. Do not let pending action prevent answering further requests.
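On the client side, a minimal sketch of the connection-reuse and "hints" suggestions (client_request_id is a made-up field name that the server would have to check for duplicates):
import uuid
import requests

session = requests.Session()   # one Session reuses the TCP connection

def post_reading(url, data):
    # Attach a client-generated ID so the server can discard duplicates
    # if the same payload ever arrives more than once.
    data["client_request_id"] = str(uuid.uuid4())
    return session.post(url, json=data)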

Urllib in Python, how to check whether a response was received or not?

I am using the urllib library in Python.
For any error in the URL, I am using a try/except block to catch it.
But sometimes I get empty data from the URL; how do I check or validate the empty data from the URL? I am also using the timeout, set to 25 seconds. Is it good to give 25 seconds, or should it be below 10?
You can use whatever timeout length is appropriate for your program. If you expect that it might sometimes take whatever URL you're querying up to 25 seconds to respond, then 25 is appropriate. If it should never take more than a few seconds to respond, and you can safely assume that if it's taken longer than a few seconds the URL must be dead, then you can lower the timeout. In general I think it's a good idea to be conservative with timeouts. It's better to make the error case a little slower with a timeout that's too long, rather than falsely triggering an error with a timeout that's too short.
You can check for an empty response from urllib2 by doing something like this:
import urllib2

fh = urllib2.urlopen(url)
response = fh.read()
if not response:
    # Do whatever error handling you want. You don't necessarily need to raise Exception.
    raise Exception("Empty response")
Is that what you're looking for?
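Putting the timeout and the empty-response check together, a rough sketch (url is a placeholder, and the error handling is only illustrative):
import socket
import urllib2  # urllib.request in Python 3

try:
    fh = urllib2.urlopen(url, timeout=25)  # timeout is in seconds
    response = fh.read()
except (urllib2.URLError, socket.timeout) as e:
    # Network error, or no answer within 25 seconds
    raise Exception("Request failed: %s" % e)
if not response:
    raise Exception("Empty response")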
