Tornado code deployment

Tornado code deployment - python

Is there a canonical code deployment strategy for tornado-based web application deployment. Our current configuration is 4 tornado processes running behind NginX? (Our specific use case is behind EC2.)
We've currently got a solution that works well enough, whereby we launch the four tornado processes and save the PIDs to a file in /tmp/. Upon deploying new code, we run the following sequence via fabric:
Do a git pull from the prod branch.
Remove the machine from the load balancer.
Wait for all in flight connections to finish with a sleep.
Kill all the tornadoes in the pid file and remove all *.pyc files.
Restart the tornadoes.
Attach the machine back to the load balancer.
We've taken some inspiration from this: http://agiletesting.blogspot.com/2009/12/deploying-tornado-in-production.html
Are there any other complete solutions out there?

We run Tornado+Nginx with supervisord as the supervisor.
Sample configuration (names changed)
[program:server]
process_name = server-%(process_num)s
command=/opt/current/vrun.sh /opt/current/app.py --port=%(process_num)s
stdout_logfile=/var/log/server/server.log
stderr_logfile=/var/log/server/server.err
numprocs = 6
numprocs_start = 7000
I've yet to find the "best" way to restart things, what I'll probably finally do is have Nginx have a "active" file which is updated letting HAProxy know that we're messing with configuration then wait a bit, swap things around, then re-enable everything.
We're using Capistrano (we've got a backlog task to move to Fabric), but instead of dealing with removing *.pyc files we symlink /opt/current to the release identifier.

I haven't deployed Tornado in production, but I've been playing with Gevent + Nginx and have been using Supervisord for process management - start/stop/restart, logging, monitoring - supervisorctl is very handy for this. Like I said, not a deployment solution, but maybe a tool worth using.

Related

How do you schedule some python scripts to run regularly on a Windows PC?

I have some python scripts that I look to run daily form a Windows PC.
My current workflow is:
The desktop PC stays all every day except for a weekly restart over the weekend
After the restart I open VS Code and run a little bash script ./start.sh that kicks off the tasks.
The above works reasonably fine, but it is also fairly painful. I need to re-run start.sh if I ever close VS Code (eg. for an update). Also the processes use some local python libraries so I need to stop them if I'm going to update them.
With regards to how to do this properly, 4 tools came to mind:
Windows Scheduler
Airflow
Prefect (https://www.prefect.io/)
Rocketry (https://rocketry.readthedocs.io/en/stable/)
However, I can't quite get my head around the fundamental issue that Prefect/Airflow/Rocketry run on my PC then there is nothing that will restart them after the PC reboots. I'm also not sure they will give me the isolation I'd prefer on these tools.
Docker came to mind, I could put each task into a docker image and run them via some form of docker swarm or something like that. But not sure if I'm re-inventing the wheel.
I'm 100% sure I'm not the first person in this situation. Could anyone point me to a guide on how this could be done well?
Note:
I am not considering running the python scripts in the cloud. They interact with local tools that are only licenced for my PC.

You can definitely use Prefect for that - it's very lightweight and seems to be matching what you're looking for. You install it with pip install prefect, start Orion API server: prefect orion start and once you create a Deployment, and start an agent prefect agent start -q default you can even configure schedule from the UI
For more information about Deployments, check our FAQ section.

It sounds Rocketry could also be suitable. Rocketry can shut down itself using a task. You could do a task that:
Runs on the main thread and process (blocking starting new tasks)
Waits or terminates all the currently running tasks (use the session)
Calls session.shut_down() which sets a flag to the scheduler.
There is also a app configuration shut_cond which is simply a condition. If this condition is True, the scheduler exits so alternatively you can use this.
Then after the line app.run() you simply have a line that runs shutdown -r (restart) command on shell using a subprocess library, for example. Then you need something that starts Rocketry again when the restart is completed. For this, perhaps this could be an answer: https://superuser.com/a/954957, or use Windows scheduler to have a simple startup task that starts Rocketry.
Especially if you had Linux machines (Raspberry Pis for example), you could integrate Rocketry with FastAPI and make a small cluster in which Rocketry apps communicate with each other, just put script with Rocketry as a startup service. One machine could be a backup that calls another machine's API which runs Linux restart command. Then the backup executes tasks until the primary machine answers to requests again (is up and running).
But as the author of the library, I'm possibly biased toward my own projects. But Rocketry very capable on complex scheduling problems, that's the purpose of the project.

You can use schtasks for windows to schedule the tasks like running bash script or python script and it's pretty reliable too.

Proper architecture/pattern to use redis-rq as remote task runner

I am doing research that requires me to run multiple experiments with many permutations of parameters. My main issue is that I would like to free up my main machine for my daily tasks and offload my experiments to other machines (I have two extra laptops).
Currently I am using redis-rq to queue and run tasks on those "remote servers". Once redis is running on my main machine and my remote servers, I simply run my code which queues the tasks to the specific redis-rq port on my remote (via ssh). This seems to work fine except that I need to make sure to push my code to my remote server before doing this otherwise the task will fail. The remote will essentially have old code on it.
I have two questions:
Does this pattern make sense or is there a better way for me to offload tasks to remote servers?
Is there a way I can ensure the remote always has up-to-date code when I start queueing tasks? (currently thinking of including a function in my code that will copy the current directory to the remote via scp)
Thanks for your help with this,
NB: my code is all in python and would prefer to keep it that way

rabbitmq in production php or python

I am new to rabbitmq. In all tutorials of rabbitmq in python/php says that receiver side
php receiver.php
or
python receiver.py
but how can we do this in production?
if we have to run above command in production either we have to use & at last or we have to use nohup. Which is not a good idea?
How to implement rabbitmq receiver in production server in php/python?

Consumers/Receivers tend to be managed by a process controller. Either initd, systemd can work. What I saw used a lot more is something like http://supervisord.org/ or http://godrb.com/ or https://mmonit.com/
In production you ideally want to not only have something that will make sure a process is running, but also that the logs are separated and rolled, that you have some amount of monitoring to make sure a process is not just constantly restarting at boot or otherwise. Those tools are better adapted than running by hand.

mod_wsgi: Reload Code via Inotify - not every N seconds

Up to now I followed this advice to reload the code:
https://code.google.com/archive/p/modwsgi/wikis/ReloadingSourceCode.wiki
This has the drawback, that the code changes get detected only every N second. I could use N=0.1, but this results in useless disk IO.
AFAIK the inotify callback of the linux kernel is available via python.
Is there a faster way to detect code changes and restart the wsgi handler?
We use daemon mode on linux.
Why code reload for mod_wsgi at all
There is interest in why I want this at all. Here is my setup:
Most people use "manage.py runserver" for development and some other wsgi deployment for for production.
In my context we have automated the creation of new systems and prod and development systems are mostly identical.
One operating system (linux) can host N systems (virtual environments).
Developers can use runserver or mod_wsgi. Using runserver has the benefit that it's easy for debugging, mod_wsgi has the benefit that you don't need to start the server first.
mod_wsgi has the benefit, that you know the URL: https://dev-server/system-name/myurl/
With runserver you don't know the port. Use case: You want to link from an internal wiki to a dev-system ....
A dirty hack to get code reload for mod_wsgi, which we used in the past: maximum-requests=1 but this is slow.

Preliminaries.
Developers can use runserver or mod_wsgi. Using runserver has the
benefit that you it easy for debugging, mod_wsgi has the benefit that
you don't need to start the server first.
But you do, the server needs to be setup first and that takes a lot of effort. And the server needs to be started here as well though you can configure it to start automatically at boot.
If you are running on port 80 or 443 which is usually the case, the server can be started only by the root. If it needs to be restarted you will have to ask the super user's help again. So ./manage.py runserver scores heavily here.
mod_wsgi has the benefit, that you know the URL:
https://dev-server/system-name/myurl/
Which is no different from the dev server. By default it starts on port 8000 so you can access it as http://dev-server:8000/system-name/myurl/. If you wanted to use SSL with the development server you can use a package such as django-sslserver or you can put nginx in front of django development server.
With runserver you don't know the port. Use case: You want to link from >an internal wiki to a dev-system ....
With runserver, the port is well defined as mentioned above. And you can make it listen on a different port for exapmle with:
./manage.py runserver 0.0.0.0:9090
Note that if you put development server behind apache (as a reverse proxy) or NGINX, restarting problems etc that I have mentioned above do not apply here.
So in short, for development work, what ever you do with mod_wsgi can be done with the django development server (aka ./manage.py runserver).
Inotify
Here we are getting to the main topic at last. Assuming you have installed inotify-tools you could type this into your shell. You don't need to write a script.
while inotifywait -r -e modify .; do sudo kill -2 yourpid ; done
This will result in the code being reloaded when ...
... using daemon mode with a single process you can send a SIGINT
signal to the daemon process using the ‘kill’ command, or have the
application send the signal to itself when a specific URL is
triggered.
ref: http://modwsgi.readthedocs.io/en/develop/user-guides/frequently-asked-questions.html#application-reloading
alternatively
while inotifywait -r -e modify .; do touch wsgi.py ; done
when
... using daemon mode, with any number of processes, and the process
reload mechanism of mod_wsgi 2.0 has been enabled, then all you need
to do is touch the WSGI script file, thereby updating its modification
time, and the daemon processes will automatically shutdown and restart
the next time they receive a request.
In both situations we are using the -r flag to tell inotify to monitor subdirectories. That means each time you save a .css or .js file apache will reload. But without the -r flag changes to python code in subfolders will be undetected. To have the best of both worls, remove css, js, images etc with the --exclude directive.
What about when your IDE saves an auto backup file? or vim saves the .swp file? That too will cause a code reload. So you would have to exclude those file types too.
So in short, it's a lot of hard work to reproduce what the django development server does free of charge.

You can use inotify hooktables to run any command you want depending on a i-notify signal (here's my source link: http://terokarvinen.com/2016/when-files-change-take-action-inotify-hookable).
After looking the tables you can just reload the code of apache.
For your specific problem, it should be something like:
inotify-hookable --watch-directories sources/ --recursive --on-modify-command './code_reload.sh'
In the previous link, the command to execute was just a simple touch flask/init.wsgi
So, the whole code (adding ignored files was):
inotify-hookable --watch-directories flask/ --recursive --ignore-paths='flask/init.wsgi' --on-modify-command 'touch flask/init.wsgi'
As stated here: Flask + mod_wsgi automatic reload on source code change, if you have enabled WSGIScriptReloading, you can just touch that file. It will cause the entire code to reload (not just the config file). But, if you prefer, you can set any other script to reload the code.
After googling a bit, it seems to be a pretty standard solution for that problem and I think that you can use it for your application.

Check if Twisted Server launched with twistd was started successfully

I need a reliable way to check if a Twisted-based server, started via twistd (and a TAC-file), was started successfully. It may fail because some network options are setup wrong. Since I cannot access the twistd log (as it is logged to /dev/null, because I don't need the log-clutter twistd produces), I need to find out if the Server was started successfully within a launch-script which wraps the twistd-call.
The launch-script is a Bash script like this:
#!/usr/bin/bash
twistd \
--pidfile "myservice.pid" \
--logfile "/dev/null" \
--python \
myservice.tac
All I found on the net are some hacks using ps or stuff like that. But I don't like an approach like that, because I think it's not reliable.
So I'm thinking about if there is a way to access the internals of Twisted, and get all currently running Twisted applications? That way I could query the currently running apps for the the name of my Twisted application (as I named it in the TAC-file) to start.
I'm also thinking about not using the twistd executable but implementing a Python-based launch script which includes the twistd-content, like the answer to this question provides, but I don't know if that helps me in getting the status of the server to run.
So my question is just: is there a reliable not-ugly way to tell if a Twisted Server started with twistd was started successfully, when twistd-logging is disabled?

You're explicitly specifying a PID file. twistd will write its PID into that file. You can check the system to see if there is a process with that PID.
You could also re-enable logging with a custom log observer which only logs your startup event and discards all other log messages. Then you can watch the log for the startup event.
Another possibility is to add another server to your application which exposes the internals you mentioned. Then try connecting to that server and looking around to see what you wanted to see (just the fact that the server is running seems like a good indication that the process started up properly, though). If you make it a manhole server then you get the ability to evaluate arbitrary Python code, which lets you inspect any state in the process you want.
You could also just have your application code write out an extra state file that explicitly indicates successful startup. Make sure you delete it before starting the application and you'll have a fine indicator of success vs failure.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.