celery + eventlet = 100% CPU usage - python

We are using Celery to fetch flight data from different travel agencies. Every request takes ~20-30 seconds (most agencies require a request sequence: authorize, send request, poll for results). A typical Celery task looks like this:
from eventlet.green import urllib2, time
# urljoin, parseString, ExpatError, random, log, MAIN_URL, RESULT_TMP_FOLDER
# and parse_agent_results come from the surrounding module.

def get_results(attr, **kwargs):
    search, provider, minprice = attr
    data = XXX  # prepared data
    host = urljoin(MAIN_URL, "RPCService/Flights_SearchStart")
    req = urllib2.Request(host, data, {'Content-Type': 'text/xml'})
    try:
        response_stream = urllib2.urlopen(req)
    except urllib2.URLError as e:
        return [search, None]
    response = response_stream.read()
    rsp_host = urljoin(MAIN_URL, "RPCService/FlightSearchResults_Get")
    rsp_req = urllib2.Request(rsp_host, response, {'Content-Type': 'text/xml'})
    ready = False
    sleeptime = 1
    rsp_response = ''
    while not ready:
        time.sleep(sleeptime)
        try:
            rsp_response_stream = urllib2.urlopen(rsp_req)
        except urllib2.URLError as e:
            log.error('go2see: results fetch failed for %s IOError %s' % (search.id, str(e)))
        else:
            rsp_response = rsp_response_stream.read()
            try:
                rsp = parseString(rsp_response)
            except ExpatError as e:
                return [search, None]
            else:
                ready = rsp.getElementsByTagName('SearchResultEx')[0].getElementsByTagName('IsReady')[0].firstChild.data
                ready = (ready == 'true')
        sleeptime += 1
        if sleeptime > 10:
            return [search, None]
    hash = "%032x" % random.getrandbits(128)
    open(RESULT_TMP_FOLDER + hash, 'w+').write(rsp_response)
    # call to parser
    parse_agent_results.apply_async(queue='parsers', args=[__name__, search, provider, hash])
These tasks are run in an eventlet pool with concurrency 300, prefetch_multiplier = 1, broker_limit = 300.
When ~100-200 tasks have been fetched from the queue, CPU usage rises to 100% (a whole CPU core is used) and task fetching from the queue happens with delays.
Could you please point out possible issues: blocking operations (the eventlet ALARM DETECTOR raises no exceptions), wrong architecture, or anything else.

A problem occurs if you fire 200 requests at a server: responses could be delayed, and therefore urllib2.urlopen will hang.
Another thing I noticed: if a URLError is raised, the program stays in the while loop until sleeptime is greater than 10. So a URLError will make this script sleep for up to 55 seconds (1+2+3+...+10).
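A minimal sketch of one way around that, assuming the structure of the question's polling loop: count consecutive failures and give up early instead of sleeping through the whole back-off. MAX_ERRORS is an illustrative name, not from the original code; rsp_req, search and the eventlet-green urllib2/time are taken from the question.
MAX_ERRORS = 3          # illustrative threshold, not from the original code
errors = 0
sleeptime = 1
ready = False
while not ready:
    time.sleep(sleeptime)
    try:
        rsp_response_stream = urllib2.urlopen(rsp_req)
    except urllib2.URLError as e:
        errors += 1
        if errors >= MAX_ERRORS:
            return [search, None]   # bail out instead of sleeping through 55 s of back-off
    else:
        errors = 0
        rsp_response = rsp_response_stream.read()
        # ... parse the response and set `ready` as in the original task ...
    sleeptime += 1
    if sleeptime > 10:
        return [search, None]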

Sorry for the late response.
The first thing I would try in such a situation is to turn off Eventlet completely, in both Celery and your code, and use the process or OS thread model. 300 threads or even processes is not that much load for the OS scheduler (although you may lack memory to run that many processes). So I would try it and see whether CPU load drops dramatically. If it does not, then the problem is in your code and Eventlet can't magically fix it. If it does drop, however, we would need to investigate the issue more closely.
If the bug still persists, please report it via any of these channels:
https://bitbucket.org/which_linden/eventlet/issues/new
https://github.com/eventlet/eventlet/issues/new
email to eventletdev@lists.secondlife.com
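If you want to try the Eventlet-free route suggested above, here is a minimal sketch of what the worker configuration might look like with Celery 3.x-era setting names (CELERYD_*; newer Celery spells these worker_pool, worker_concurrency, worker_prefetch_multiplier). The concurrency value is only a placeholder:
# celeryconfig.py -- sketch only; adjust names/values to your Celery version
CELERYD_POOL = "prefork"             # OS processes instead of the eventlet pool
CELERYD_CONCURRENCY = 50             # placeholder; processes cost more memory than green threads
CELERYD_PREFETCH_MULTIPLIER = 1      # same prefetch behaviour as in the question

# Equivalent on the command line:
#   celery worker -P prefork -c 50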

Related

How to store concurrent.futures ProcessPoolExecutor HTTP responses and process in real time?

I have a project I am working on, and I'm looking to use concurrent.futures.ProcessPoolExecutor to send a high number of HTTP requests. While the code below works great for getting the requests, I'm struggling with how to process the information as I get it. I tried inserting it into a sqlite3 database as I got responses, but it became tricky trying to manage locks and avoid the use of global variables.
Ideally, I'd like to start the pool and, while it is executing, be able to read/store the data. Is this possible, or should I take a different route with this?
def http2_get(url):
    # s (an HTTP session) and millis() come from elsewhere in the asker's code.
    while True:
        try:
            start_time = millis()
            result = s.get(url, verify=False)
            print(url + " Total took " + str(millis() - start_time) + " ms")
            return result
        except Exception as e:
            print(e, e.__traceback__.tb_lineno)

pool = ProcessPoolExecutor(max_workers=60)
results = list(pool.map(http2_get, urls))
As you noticed, map will not return until all the processes have finished. I assume that you want to process the data in the main process.
Instead of using map, submit all the tasks and process them as they finish:
from concurrent.futures import ProcessPoolExecutor, as_completed

pool = ProcessPoolExecutor(max_workers=60)
futures_list = [pool.submit(http2_get, url) for url in urls]

for future in as_completed(futures_list):
    exception = future.exception()
    if exception is not None:
        # Handle exception in http2_get
        pass
    else:
        result = future.result()
        # process result...
Note that it is cleaner to use the ProcessPoolExecutor as a context manager:
with ProcessPoolExecutor(max_workers=60) as pool:
    futures_list = [pool.submit(http2_get, url) for url in urls]
    for future in as_completed(futures_list):
        exception = future.exception()
        if exception is not None:
            # Handle exception in http2_get
            pass
        else:
            result = future.result()
            # process result...
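Since the question mentions sqlite3, here is a minimal sketch of storing results from the main process inside that loop, assuming http2_get returns a requests Response object as in the question (the return value must be picklable to cross the process boundary). The table and column names are made up for illustration; only one process touches the database, so no locks or globals are needed.
import sqlite3
from concurrent.futures import ProcessPoolExecutor, as_completed

conn = sqlite3.connect("responses.db")
conn.execute("CREATE TABLE IF NOT EXISTS responses (url TEXT, status INTEGER, body TEXT)")

with ProcessPoolExecutor(max_workers=60) as pool:
    futures = {pool.submit(http2_get, url): url for url in urls}
    for future in as_completed(futures):
        url = futures[future]
        if future.exception() is not None:
            continue                      # or log the failure
        response = future.result()
        conn.execute("INSERT INTO responses VALUES (?, ?, ?)",
                     (url, response.status_code, response.text))
        conn.commit()
conn.close()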

Python issue with time.sleep in sleekxmpp

I am using sleekxmpp as the XMPP client for Python. Incoming requests are forwarded to other users/agents.
Now the use case is: if a user is not available, we need to check their availability every 10 seconds and transfer the chat to them when they become available. We need to send a message to the customer only 5 times, but keep checking availability for a long time.
I am using time.sleep() to check again in 10 seconds if the user is not available, but the problem is that it blocks the entire thread and no new requests reach the server.
send_msg_counter = 0
check_status = False
while not check_status:
    check_status = requests.post(transfer_chat_url, data=data)
    if send_msg_counter < 5:
        send_msg("please wait", customer)
        send_msg_counter += 1
    time.sleep(10)
It is true that time.sleep(10) will block your active thread. You may actually find Python 3's async/await to be the way to go. Sadly I don't have much experience with those keywords yet, but another route might be to use Python's threading.
https://docs.python.org/3/library/threading.html
Here might be one way to implement this feature.
import threading
import time

import requests

def poll_counter(customer, transfer_chat_url, data, send_count=5, interval=10):
    send_msg_counter = 0
    check_status = False
    while not check_status:
        check_status = requests.post(transfer_chat_url, data=data)
        if send_msg_counter < send_count:
            send_msg("please wait", customer)
            send_msg_counter += 1
        time.sleep(interval)
    # If we're here, check_status became true
    return None

# ... pre-existing code ...
threading.Thread(target=poll_counter, args=(customer, transfer_chat_url, data)).start()
# ... proceed to handle other tasks while the thread runs in the background.
Now, I won't go into detail, but there are use cases where threading is a major mistake. This shouldn't be one of them, but here is a good read for you to understand those use cases.
https://realpython.com/python-gil/
Also, for more details on asyncio (async/await) here is a good resource.
https://docs.python.org/3/library/asyncio-task.html
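For completeness, here is a rough sketch of the same polling loop written with asyncio, using asyncio.sleep so other coroutines keep running while we wait. poll_transfer is a hypothetical name; send_msg, requests, transfer_chat_url and data are taken from the question, and requests is pushed onto an executor because it is a blocking library (an async HTTP client such as aiohttp would avoid that).
import asyncio

import requests

async def poll_transfer(customer, transfer_chat_url, data, send_count=5, interval=10):
    loop = asyncio.get_event_loop()
    send_msg_counter = 0
    check_status = False
    while not check_status:
        # requests blocks, so run it in the default executor.
        check_status = await loop.run_in_executor(
            None, lambda: requests.post(transfer_chat_url, data=data))
        if send_msg_counter < send_count:
            send_msg("please wait", customer)
            send_msg_counter += 1
        await asyncio.sleep(interval)   # yields control instead of blocking the thread

# schedule it alongside whatever else the event loop is doing, e.g.:
# asyncio.ensure_future(poll_transfer(customer, transfer_chat_url, data))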
Try implementing something like this:
delay = min(self.reconnect_delay * 2, self.reconnect_max_delay)
delay = random.normalvariate(delay, delay * 0.1)
log.debug('Waiting %s seconds before connecting.', delay)
elapsed = 0
try:
    while elapsed < delay and not self.stop.is_set():
        time.sleep(0.1)
        elapsed += 0.1
except KeyboardInterrupt:
    self.set_stop()
    return False
except SystemExit:
    self.set_stop()
    return False
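The 0.1-second loop above only exists so the sleep can be interrupted when self.stop is set. threading.Event.wait(timeout) gives the same behaviour more directly, returning as soon as the event is set. A small sketch (interruptible_sleep and the module-level stop event are illustrative names, not from sleekxmpp):
import threading

stop = threading.Event()

def interruptible_sleep(delay):
    # wait() returns True if the event was set before the timeout expired,
    # so the caller can tell an interrupt from a normal wake-up.
    interrupted = stop.wait(timeout=delay)
    return not interrupted   # False means "we were told to stop"

# elsewhere, e.g. in a shutdown path:
# stop.set()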

Restarting/Rebuilding a timed out process using Pebble in Python?

I am using concurrent futures to download reports from a remote server using an API. To confirm that a report has downloaded correctly, I just have the function print out its ID.
I have an issue where, on rare occasions, a report download will hang indefinitely. I do not get a Timeout Error or a Connection Reset error; it just hangs for hours until I kill the whole process. This is a known issue with the API with no known workaround.
I did some research and switched to a Pebble-based approach to implement a timeout on the function. My aim is then to record the ID of the report that failed to download and start again.
Unfortunately, I ran into a bit of a brick wall, as I do not know how to actually retrieve the ID of the report I failed to download. I am using a layout similar to this answer:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

def sometimes_stalling_download_function(report_id):
    ...
    return report_id

with ProcessPool() as pool:
    future = pool.map(sometimes_stalling_download_function, report_id_list, timeout=10)
    iterator = future.result()
    while True:
        try:
            result = next(iterator)
        except StopIteration:
            break
        except TimeoutError as error:
            print("function took longer than %d seconds" % error.args[1])
            # Retrieve report ID here
            failed_accounts.append(result)
What I want to do is retrieve the report ID in the event of a timeout but it does not seem to be reachable from that exception. Is it possible to have the function output the ID anyway in the case of a timeout exception or will I have to re-think how I am downloading the reports entirely?
The map function returns a future object which yields the results in the same order they were submitted.
Therefore, to understand which report_id is causing the timeout you can simply check its position in the report_id_list.
index = 0
while True:
    try:
        result = next(iterator)
    except StopIteration:
        break
    except TimeoutError as error:
        print("function took longer than %d seconds" % error.args[1])
        # Retrieve report ID here
        failed_accounts.append(report_id_list[index])
    finally:
        index += 1
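If keeping track of the index feels fragile, Pebble also lets you schedule one call per report with pool.schedule (part of Pebble's ProcessPool API, with per-call timeouts) and keep your own future-to-id mapping. A rough sketch, assuming sometimes_stalling_download_function and report_id_list from the question:
from pebble import ProcessPool
from concurrent.futures import TimeoutError

failed_accounts = []

with ProcessPool() as pool:
    futures = {
        pool.schedule(sometimes_stalling_download_function,
                      args=[report_id], timeout=10): report_id
        for report_id in report_id_list
    }
    for future, report_id in futures.items():
        try:
            result = future.result()       # blocks until done or timed out
        except TimeoutError:
            failed_accounts.append(report_id)
Note that this waits on futures in submission order rather than completion order, which is fine here because every future is eventually either resolved or timed out.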

How to write python code that will work with threads or coroutines and will complete in deterministic time?

What do I mean by "deterministic time"? For example, AWS offers a service called AWS Lambda. A process started as a Lambda function has a time limit; after that, the Lambda function stops executing and the task is assumed to have finished with an error. An example task: send data to an HTTP endpoint. Depending on the network connection to the endpoint, or other factors, sending the data can take a long time. If I need to send the same data to many endpoints, the full process time is one request time multiplied by the number of endpoints, which increases the chance that the Lambda function will be stopped before all the data has been sent to all the endpoints.
To solve this, I need to send data to the different endpoints in parallel using threads.
The problem with threads is that a started thread can't be stopped. If an HTTP request takes more time than the Lambda time limit allows, the Lambda function will be aborted and return an error. So I need to use a timeout on the HTTP request to abort it if it takes longer than expected.
If the HTTP request is cancelled by the timeout, or the endpoint returns an error, I need to save the unprocessed data somewhere so it is not lost. The time needed to save unprocessed data can be predicted, because I control the storage where the data is saved.
The last part that consumes time is the procedure or loop where threads are scheduled with executor.submit(). If there is only one endpoint, or a small number of them, the consumed time is small and there is no need to control it. But if I am dealing with many endpoints, I have to take this into account.
So basically the full time consists of:
scheduling threads
http request execution
saving unprocessed data
Here is an example of how I can manage the time using threads:
import concurrent.futures
from functools import partial
import requests
import time

start = time.time()

def send_data(data):
    host = 'http://127.0.0.1:5000/endpoint'
    try:
        result = requests.post(host, json=data, timeout=(0.1, 0.5))
        # print('done')
        if result.status_code == 200:
            return {'status': 'ok'}
        if result.status_code != 200:
            return {'status': 'error', 'msg': result.text}
    except requests.exceptions.Timeout as err:
        return {'status': 'error', 'msg': 'timeout'}

def get_data(n):
    return {"wait": n}

def done_cb(a, b, future):
    pass  # save unprocessed data

def main():
    executor = concurrent.futures.ThreadPoolExecutor()
    futures = []
    max_time = 0.5
    for i in range(1):
        future = executor.submit(send_data, *[{"wait": 10}])
        future.add_done_callback(partial(done_cb, 2, 3))
        futures.append(future)
        if time.time() - start > max_time:
            print('stopping creating new threads')
            # save unprocessed data
            break
    try:
        for item in concurrent.futures.as_completed(futures, timeout=1):
            item.result()
    except concurrent.futures.TimeoutError as err:
        pass
I was thinking about how I could use the asyncio library instead of threads to do the same thing:
import asyncio
import time
from functools import partial
import requests

start = time.time()

def send_data(data):
    ...

def get_data(n):
    return {"wait": n}

def done_callback(a, b, future):
    pass  # save unprocessed data

def main(loop):
    max_time = 0.5
    futures = []
    start_appending = time.time()
    for i in range(1):
        event_data = get_data(1)
        future = loop.run_in_executor(None, send_data, event_data)
        future.add_done_callback(partial(done_callback, 2, 3))
        futures.append(future)
        if time.time() - start > max_time:
            print('stopping creating new futures')
            # save unprocessed data
            break
    finished, unfinished = loop.run_until_complete(
        asyncio.wait(futures, timeout=1)
    )

_loop = asyncio.get_event_loop()
result = main(_loop)
The send_data() function is the same as in the previous code snippet.
Because the requests library is not async, I use run_in_executor() to create the future object. The main problem I have is that done_callback() is not executed as soon as the executor finishes its job, but only when the futures are "processed" by the asyncio.wait() expression.
Basically, I am looking for a way to start executing an asyncio future the way ThreadPoolExecutor starts executing threads, without waiting for the asyncio.wait() expression before done_callback() is called. If you have other ideas for how to write Python code that works with threads or coroutines and completes in deterministic time, please share them; I will be glad to read them.
And another question: if a thread or future finishes its job, it can return a result that I can use in done_callback(), for example to remove a message from a queue by the id returned in the result. But if the thread or future was cancelled, I have no result, and I have to use functools.partial() to pass additional data into done_callback so I can tell which data the callback was called for. If the passed data is small this is not a problem; if the data is big, I have to put it in a list or dictionary and pass only an index into the callback, or pass the full data into the callback.
Can I somehow get access, from done_callback(), to the variable that was passed to a future/thread that was cancelled?
You can use asyncio.wait_for to wait for a future (or multiple futures, when combined with asyncio.gather) and cancel them in case of a timeout. Unlike threads, asyncio supports cancellation, so you can cancel a task whenever you feel like it, and it will be cancelled at the first blocking call it makes (typically a network call).
Note that for this to work, you should be using asyncio-native libraries such as aiohttp for HTTP. Trying to combine requests with asyncio using run_in_executor will appear to work for simple tasks, but it will not bring you the benefits of using asyncio, such as being able to spawn a massive number of tasks without encumbering the OS, or the possibility of cancellation.
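A rough sketch of that approach, assuming aiohttp is available; the endpoint list, the deadline value and the save_unprocessed helper are placeholders, not part of any library:
import asyncio

import aiohttp

async def send_data(session, url, data):
    try:
        async with session.post(url, json=data) as resp:
            return {'status': 'ok' if resp.status == 200 else 'error',
                    'msg': await resp.text()}
    except asyncio.CancelledError:
        save_unprocessed(url, data)   # placeholder: persist the data before giving up
        raise
    except (aiohttp.ClientError, asyncio.TimeoutError):
        return {'status': 'error', 'msg': 'request failed'}

async def main(endpoints, data, deadline=0.5):
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.ensure_future(send_data(session, url, data))
                 for url in endpoints]
        try:
            # One deadline bounds the whole fan-out instead of per-thread timeouts.
            return await asyncio.wait_for(asyncio.gather(*tasks), timeout=deadline)
        except asyncio.TimeoutError:
            # wait_for cancels the gather, which cancels every pending task,
            # so each one gets a chance to save its unprocessed data above.
            return [t.result() if t.done() and not t.cancelled() else None
                    for t in tasks]

# asyncio.get_event_loop().run_until_complete(
#     main(['http://127.0.0.1:5000/endpoint'], {'wait': 1}))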

Python and mongoDB connection pool ( pymongo )

I have a web application, and there are thousands of requests every minute.
The following is my Python code for the MongoDB connection:
Tool.py:
from pymongo import Connection, ReadPreference

globalconnection = None

def getCollection(name, safe=False, readpref=ReadPreference.PRIMARY):
    global globalconnection
    while globalconnection is None:
        try:
            if not globalconnection is None:
                globalconnection.close()
            globalconnection = Connection('mongodb://host:port', replicaSet='mysetname', safe=False,
                                          read_preference=ReadPreference.PRIMARY,
                                          network_timeout=30, max_pool_size=1024)
        except Exception as e:
            globalconnection = None
    # request_context comes from the web framework
    request_context.connection = globalconnection
    return request_context.connection["mydb"]["mycoll"]
web.py:
@app.route("/test")
def test():
    request_collection = getCollection("user")
    results = request_collection.find()
    for result in results:
        # do something...
        request_collection.save(result)
    request_collection.end_request()
Each HTTP request gets the connection through this function and calls end_request before the end of the request.
But I found that there are many AutoReconnect errors and over 20,000 connections in MongoDB as requests increase.
Do you have any suggestions?
For auto-reconnection you simply catch the exception and try to get the connection again:
http://api.mongodb.org/python/current/api/pymongo/errors.html
A 30-second timeout sounds too long; try a shorter timeout instead.
You can also increase the maximum number of connections MongoDB accepts (default: 20000):
http://www.mongodb.org/display/DOCS/Connections
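With a more recent driver the usual pattern is a single module-level MongoClient (which maintains its own connection pool) plus a small retry loop around operations that can raise AutoReconnect. A sketch, with pool size, timeout and retry values as placeholders:
import time

from pymongo import MongoClient
from pymongo.errors import AutoReconnect

# One client per process; pymongo handles connection pooling internally.
client = MongoClient('mongodb://host:port',
                     replicaSet='mysetname',
                     maxPoolSize=100,           # placeholder value
                     socketTimeoutMS=5000)      # much shorter than 30 s

collection = client['mydb']['mycoll']

def find_with_retry(query, retries=3):
    for attempt in range(retries):
        try:
            return list(collection.find(query))
        except AutoReconnect:
            if attempt == retries - 1:
                raise
            time.sleep(0.5 * (attempt + 1))   # brief backoff, then retry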
