Profiling an application that uses reactors/websockets and threads - python

Hi, I wrote a Python program that should run unattended. It fetches some data via HTTP GET requests in a couple of threads, and fetches other data via websockets using the autobahn framework. Running it for 2 days shows that its memory demand keeps growing, and it even stops without any notice.
The documentation says I have to run the reactor as last line of code in the app.
I read that yappi is capable of profiling threaded applications.
Here is some pseudo code:
from twisted.internet import reactor, ssl
from autobahn.twisted.websocket import WebSocketClientFactory, connectWS
if __name__ == "__main__":
    # setting up a thread
    # start the thread
    Consumer.start()
    xfactory = WebSocketClientFactory("wss://url")
    xfactory.protocol = socket
    # SSL client context: default
    if xfactory.isSecure:
        contextFactory = ssl.ClientContextFactory()
    else:
        contextFactory = None
    connectWS(xfactory, contextFactory)
    # blocks until the reactor is stopped
    reactor.run()
The example from the yappi project site is the following:
import yappi
def a():
    for i in range(10000000): pass
yappi.start()
a()
yappi.get_func_stats().print_all()
yappi.get_thread_stats().print_all()
So I could put yappi.start() at the beginning, and yappi.get_func_stats().print_all() plus yappi.get_thread_stats().print_all() after reactor.run(). But reactor.run() blocks, so the code after it is never reached and the statistics are never printed.
So how do I profile a program like that?
Regards

You can use the profilers built into twistd like this:
twistd -n --profile=profiling_results.txt --savestats --profiler=hotshot your_app
hotshot is the default profiler in older Twisted releases and exists only on Python 2; cprofile is also available.
Or you can run twistd from your Python script:
from twisted.scripts.twistd import run
run()
and pass the necessary parameters to the script with sys.argv[1:1] = ["--profile=profiling_results.txt", ...]
Afterwards you can convert the hotshot format to calltree:
hotshot2calltree profiling_results.txt > calltree_profiling
and open the generated calltree_profiling file:
kcachegrind calltree_profiling
There is also a project for profiling asynchronous execution time: twisted-theseus.
You can also try PyCharm's thread concurrency tool.
There is a related question here sof
You can also schedule your function to run once the reactor has started:
reactor.callWhenRunning(your_function, *parameters_list)
Or register your profiling function call with reactor.addSystemEventTrigger(), passing the phase and the event type (for example 'before' and 'shutdown').
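The addSystemEventTrigger() idea can be sketched roughly as follows. This is only an illustration under some assumptions: it uses the standard-library cProfile instead of yappi or hotshot, and install_profiler and report are hypothetical names, not part of Twisted. Note that cProfile only records the thread that enabled it (here the reactor thread); for the worker threads the same hook could call yappi.get_func_stats().print_all() instead.

```python
import cProfile
import io
import pstats


def install_profiler(reactor, report):
    """Start cProfile now and emit a report just before the reactor shuts down.

    `reactor` is any object offering Twisted's addSystemEventTrigger();
    `report` is a callable that receives the formatted statistics as a string.
    Both names are hypothetical helpers for this sketch.
    """
    profiler = cProfile.Profile()
    profiler.enable()

    def dump_stats():
        # Runs in the 'before shutdown' phase, i.e. while reactor.run() unwinds.
        profiler.disable()
        buffer = io.StringIO()
        stats = pstats.Stats(profiler, stream=buffer)
        stats.sort_stats("cumulative").print_stats(20)
        report(buffer.getvalue())

    reactor.addSystemEventTrigger("before", "shutdown", dump_stats)
    return profiler
```

With Twisted installed, this would be called as install_profiler(reactor, print) right before reactor.run(); the report is then produced when the reactor is stopped, for example by reactor.stop().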

Related

Is it possible to run code after finishing a locust instance?

Let's say this is the primary file I would run in the terminal, i.e. locust -f main.py. Is it possible to have it also run code after the locust instance is terminated or when the time limit is reached? Possibly a cleanup script, or sending the generated CSV reports somewhere.
class setup(HttpUser):
    wait_time = between(3, 5)
    host = 'www.example.com'
    tasks = [a, b, c...]
    #do something after time limit reached
There is a test_stop event (https://docs.locust.io/en/stable/extending-locust.html) as well as a quitting event (https://docs.locust.io/en/stable/api.html#locust.event.Events.quitting) you can use for this purpose.

Terminate thread, process, function in python

I'm quite new to Python and have been fighting a specific problem for a while. I have a function that listens for and prints radio frames. It uses the NRF24 lib and the whole function is quite simple. The point is that I run this function and from time to time I need to terminate it and run it again. In code it looks like this:
def recv():
    radio.openWritingPipe(pipes[0])
    radio.openReadingPipe(1, pipes[1])
    radio.startListening()
    radio.stopListening()
    radio.printDetails()
    radio.startListening()
    while True:
        pipe = [0]
        while not radio.available(pipe):
            time.sleep(10000/1000000.0)
        recv_buffer = []
        radio.read(recv_buffer)
        print(recv_buffer)
I run this function from the server side, and now I want to stop it and run it again. Is that possible? Why can't I just call recv.kill()? I read about threading and multiprocessing, but neither gave me the result I wanted.
How I run it:
from multiprocessing import Process
def request_handler(api: str, arg: dict) -> dict:
    # pass the function itself; target=recv() would call recv() right here and block
    process_radio = Process(target=recv)
    if api == 'start_radio':
        process_radio.start()
        ...
    elif api == 'stop_radio':
        process_radio.terminate()
        ...
    ...
There is no way to stop a Python thread "from the outside." If the thread goes into a wait state (e.g. not running because it's waiting for radio.recv() to complete) there's nothing you can do.
Inside a single process the threads are autonomous, and the best you can do is to set a flag that the thread acts on (by terminating) when it examines it.
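A minimal sketch of the flag approach, using threading.Event as the flag. The names listen, start_listener and stop_listener are made up for illustration, and the line that appends to frames only stands in for the real radio.read() call, which is not reproduced here.

```python
import threading
import time


def listen(stop_event, frames, poll_interval=0.01):
    """Poll for work until stop_event is set; `frames` stands in for received data."""
    while not stop_event.is_set():
        frames.append(time.monotonic())  # placeholder for reading a radio frame
        stop_event.wait(poll_interval)   # sleeps, but wakes early once the flag is set


def start_listener(frames):
    """Start the listener thread and return it together with its stop flag."""
    stop_event = threading.Event()
    thread = threading.Thread(target=listen, args=(stop_event, frames), daemon=True)
    thread.start()
    return thread, stop_event


def stop_listener(thread, stop_event, timeout=1.0):
    """Ask the thread to finish and wait for it; True if it really stopped."""
    stop_event.set()
    thread.join(timeout)
    return not thread.is_alive()
```

The thread only stops when it gets back to the is_set() check, so a call that blocks indefinitely inside the loop would still keep it alive; that is the limitation described above.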
As you have already discovered, it appears, you can terminate a subprocess, but you then have the issue of how the processes communicate with each other.
Your code and the text with it don't really give enough information (there appear to be several NRF24 implementations in Python) to debug the issues you report.

Implement delayed Slack slash response

I want to implement a Slack slash command that has to run a function pipeline which takes roughly 30 seconds to process. Since Slack slash commands only allow 3 seconds to respond, how do I go about implementing this? I referred to this but don't know how to implement it.
Please bear with me. I am doing this for the first time.
This is what I have tried. I know how to respond with an ok status within 3 seconds, but I don't understand how to call the pipeline afterwards.
import requests
import json
from bottle import route, run, request
from S3_download import s3_download
from index import main_func
@route('/action')
def action():
    pipeline()
    return "ok"
def pipeline():
    s3_download()
    p = main_func()
    print (p)
if __name__ == "__main__":
    run(host='0.0.0.0', port=8082, debug=True)
I came across this article. Is using AWS lambda the only solution?
Can't we do this completely in python?
Something like this:
from boto import sqs
@route('/action', method='POST')
def action():
    # retrieve whatever the worker needs from the request, for example:
    params = request.forms.get('response_url')
    sqs_queue = get_sqs_connection(queue_name)
    message_object = sqs.message.Message()
    # store as JSON so the worker can json.loads() it
    message_object.set_body(json.dumps(params))
    sqs_queue.write(message_object)
    return "request under process"
You can then have another process which reads the queue and calls the long-running function:
sqs_queue = get_sqs_connection(queue_name)
for sqs_msg in sqs_queue.get_messages(10, wait_time_seconds=5):
    processed_msg = json.loads(sqs_msg.get_body())
    response = pipeline(processed_msg)
    if response:
        sqs_queue.delete_message(sqs_msg)
You can run this second process from a separate standalone Python file, as a daemon process or from cron.
I've used an Amazon SQS queue here, but there are other options available.
You have an option or two for doing this in a single process, but it's fraught with peril. If you spin up a new Thread to handle the long process, you might end up deploying or crashing in the middle and losing it.
If durability is important to you, look into background-task workers like SQS, Lambda, or even a Celery task queue backed with Redis. A separate task has some interesting failure modes, and these tools will help you deal with them better than just spawning a thread.
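For completeness, the single-process option with a thread might look roughly like this. It is a sketch, not a recommendation: handle_slash_command and post_json are hypothetical helper names, and the only assumption about Slack is that the response_url from the slash command accepts a JSON POST with a "text" field. The work is lost if the process exits before the thread finishes, which is the risk described above.

```python
import json
import threading
import urllib.request


def post_json(url, payload):
    """POST a JSON payload to url (used here for Slack's response_url)."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status


def handle_slash_command(response_url, pipeline, post=post_json):
    """Acknowledge at once; run pipeline in a thread and post its result later."""
    def worker():
        try:
            text = str(pipeline())
        except Exception as exc:
            text = "Pipeline failed: {}".format(exc)
        post(response_url, {"text": text})

    # daemon=True: the thread will not keep the server alive on shutdown,
    # so an in-flight job is simply dropped.
    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return "Request received, processing...", thread
```

In the bottle handler, action() would read response_url from request.forms, call handle_slash_command(response_url, pipeline) and return the acknowledgement string, which keeps the reply well inside the 3 second limit.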

Running twisted reactor in iPython

I'm aware this is normally done with twistd, but I'm wanting to use iPython to test out code 'live' on twisted code.
How to start twisted's reactor from ipython asked basically the same thing but the first solution no longer works with current ipython/twisted, while the second is also unusable (thread raises multiple errors).
https://gist.github.com/kived/8721434 has something called TPython which purports to do this, but running that seems to work except clients never connect to the server (while running the same clients works in the python shell).
Do I have to use Conch Manhole, or is there a way to get iPython to play nice (probably with _threadedselect)?
For reference, I'm asking using ipython 5.0.0, python 2.7.12, twisted 16.4.1
Async code in general can be troublesome to run in a live interpreter. It's best just to run an async script in the background and do your iPython stuff in a separate interpreter. The two can communicate using files or TCP. If this went over your head, that's because it's not always simple, and it might be best to avoid the hassle if possible.
However, you'll be happy to know there is an awesome project called crochet for using Twisted in non-async applications. It truly is one of my favorite modules and I'm shocked that it's not more widely used (you can change that ;D though). The crochet module has a run_in_reactor decorator that runs the decorated function in a Twisted reactor, which crochet itself manages in a separate thread. Here is a quick class example that sends requests to a Star Wars RESTful API, then stores the JSON responses in a list.
from __future__ import print_function
import json
from twisted.internet import defer, task
from twisted.web.client import getPage
from crochet import run_in_reactor, setup as setup_crochet
setup_crochet()
class StarWarsPeople(object):
    people_id = [_id for _id in range(1, 89)]
    people = []
    @run_in_reactor
    def requestPeople(self):
        """
        Request Star Wars JSON data from the SWAPI site.
        This occurs in a Twisted reactor in a separate thread.
        """
        for _id in self.people_id:
            url = 'http://swapi.co/api/people/{0}'.format(_id).encode('utf-8')
            d = getPage(url)
            d.addCallback(self.appendJSON)
    def appendJSON(self, response):
        """
        A callback which will take the response from the getPage() request,
        convert it to JSON, then append it to self.people, which can be
        accessed outside of the crochet thread.
        """
        response_json = json.loads(response.decode('utf-8'))
        #print(response_json) # uncomment if you want to see output
        self.people.append(response_json)
Save this in a file (example: swapi.py), open iPython, import the newly created module, then run a quick test like so:
from swapi import StarWarsPeople
testing = StarWarsPeople()
testing.requestPeople()
from time import sleep
for x in range(5):
    print(len(testing.people))
    sleep(2)
As you can see it runs in the background and stuff can still occur in the main thread. You can continue using the iPython interpreter as you usually do. You can even have a manhole running in the background for some cool hacking too!
References
https://crochet.readthedocs.io/en/1.5.0/introduction.html#crochet-use-twisted-anywhere
While this doesn't answer the question I thought I had, it does answer (sort of) the question I posted. Embedding ipython works in the sense that you get access to business objects with the reactor running.
from twisted.internet import reactor
from twisted.internet.endpoints import serverFromString
from myfactory import MyFactory
class MyClass(object):
    def __init__(self, **kwargs):
        super(MyClass, self).__init__(**kwargs)
        server = serverFromString(reactor, 'tcp:12345')
        server.listen(MyFactory(self))
        def interact():
            import IPython
            IPython.embed()
        # run the embedded IPython shell in a thread from the reactor's pool
        reactor.callInThread(interact)
if __name__ == "__main__":
    myclass = MyClass()
    reactor.run()
Call the above with python myclass.py or similar.

Reload python flask server by function

I'm writing a python/flask application and would like to add the functionality of reloading the server.
I'm currently running the server with the following option
app.run(debug=True)
which results in the following, each time a code change happens
* Running on http://127.0.0.1:5000/
* Restarting with reloader
In a production environment however, I would rather not have debug=True set, but be able to only reload the application server whenever I need to.
I'm trying to get two things working:
if reload_needed: reload_server(), and
if a user clicks on a "Reload Server" button in the admin panel, the reload_server() function should be called.
However, despite the fact that the server gets reloaded after code changes, I couldn't find a function that lets me do exactly that.
If possible I would like to use the flask/werkzeug internal capabilities. I am aware that I could achieve something like that by adding things like gunicorn/nginx/apache, etc.
I think I've had the same problem.
There was a python/flask application (XY.py) running on clients. I wrote a build step (TeamCity) which deploys this Python code to the clients. Suppose XY.py is already running on the clients. After deploying the new/fixed/corrected XY.py I had to restart it so that the changes applied to the running code.
The problem I had was that after using the neat restarting one-liner os.execl(sys.executable, *([sys.executable]+sys.argv)), the port used by the app was still busy/established, so after restarting I couldn't reach it.
This is how I resolved the problem:
I run my app in a separate Process and give it a queue. To make it clearer, here is some code.
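some_queue = None
@app.route('/restart')
def restart():
    try:
        # any item on the queue tells the parent process to restart the app
        some_queue.put("something")
        return "Quit"
    except Exception:
        return "Failed"
def start_flaskapp(queue):
    global some_queue
    some_queue = queue
    app.run(your_parameters)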
Add this to your main:
import subprocess
import sys
import time
from multiprocessing import Process, Queue
q = Queue()
p = Process(target=start_flaskapp, args=[q,])
p.start()
while True: # watching the queue: sleep if there is no call, otherwise break
    if q.empty():
        time.sleep(1)
    else:
        break
p.terminate() # terminate the flask app, then restart the script in a subprocess
args = [sys.executable] + [sys.argv[0]]
subprocess.call(args)
I hope this was clear and short enough, and that it helps!
Add the following to your Python code in order to kill the server:
@app.route('/quit')
def _quit():
    os._exit(0)
When the process is killed, the while loop in the script below starts it again.
app_run.sh:
#!/bin/bash
while true
do
    hypercorn app_async:app -b 0.0.0.0:5000
    sleep 1
done
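The same restart loop can be written in Python if a shell script is not convenient. This is a sketch only: supervise and its max_runs parameter are hypothetical names, and max_runs exists mainly so the loop can be bounded when trying it out.

```python
import subprocess
import sys
import time


def supervise(command, max_runs=None, delay=1.0):
    """Run command and start it again each time it exits.

    Returns the number of runs once max_runs is reached; with max_runs=None
    it loops forever, like the `while true` in the shell script.
    """
    runs = 0
    while True:
        subprocess.call(command)  # blocks until the server process exits
        runs += 1
        if max_runs is not None and runs >= max_runs:
            return runs
        time.sleep(delay)  # short pause before restarting, as in `sleep 1`
```

For the server above this would be called as supervise(["hypercorn", "app_async:app", "-b", "0.0.0.0:5000"]); hitting the /quit route then ends the child process and the loop starts a fresh one.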
