Proper architecture/pattern to use redis-rq as remote task runner - python

I am doing research that requires me to run multiple experiments with many permutations of parameters. My main issue is that I would like to free up my main machine for my daily tasks and offload my experiments to other machines (I have two extra laptops).
Currently I am using redis-rq to queue and run tasks on those "remote servers". Once Redis is running on my main machine and my remote servers, I simply run my code, which queues the tasks to the specific redis-rq port on my remote (via ssh). This seems to work fine, except that I need to make sure to push my code to the remote server first, otherwise the task will fail because the remote essentially has old code on it.
I have two questions:
Does this pattern make sense or is there a better way for me to offload tasks to remote servers?
Is there a way I can ensure the remote always has up-to-date code when I start queueing tasks? (currently thinking of including a function in my code that will copy the current directory to the remote via scp)
Thanks for your help with this,
NB: my code is all in Python and I would prefer to keep it that way
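For context, a minimal sketch of what I am doing now (host names, the queue, the task module, and the sync helper are illustrative; the remote Redis is assumed to be reached through an SSH port-forward to localhost):

import subprocess
from redis import Redis
from rq import Queue

from experiments import run_experiment  # hypothetical module; the worker imports it from its own copy

def sync_code(remote="user@laptop1", remote_dir="~/experiments"):
    # Push the current working copy to the worker machine before queueing,
    # since RQ only sends a reference to the function and the worker runs
    # whatever code it has on disk.
    subprocess.run(["scp", "-r", ".", f"{remote}:{remote_dir}"], check=True)

sync_code()
q = Queue(connection=Redis(host="localhost", port=6379))  # port forwarded over SSH
for lr in (0.01, 0.001):
    q.enqueue(run_experiment, learning_rate=lr)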

Related

How do you schedule some python scripts to run regularly on a Windows PC?

I have some Python scripts that I would like to run daily from a Windows PC.
My current workflow is:
The desktop PC stays on all day, every day, except for a weekly restart over the weekend
After the restart I open VS Code and run a little bash script ./start.sh that kicks off the tasks.
The above works reasonably well, but it is also fairly painful. I need to re-run start.sh if I ever close VS Code (e.g. for an update). Also, the processes use some local Python libraries, so I need to stop them if I'm going to update those libraries.
With regards to how to do this properly, 4 tools came to mind:
Windows Scheduler
Airflow
Prefect (https://www.prefect.io/)
Rocketry (https://rocketry.readthedocs.io/en/stable/)
However, I can't quite get my head around the fundamental issue: if Prefect/Airflow/Rocketry run on my PC, then nothing will restart them after the PC reboots. I'm also not sure these tools will give me the isolation I'd prefer.
Docker also came to mind: I could put each task into a Docker image and run them via some form of Docker Swarm or something like that. But I'm not sure whether I'm re-inventing the wheel.
I'm 100% sure I'm not the first person in this situation. Could anyone point me to a guide on how this could be done well?
Note:
I am not considering running the python scripts in the cloud. They interact with local tools that are only licenced for my PC.
You can definitely use Prefect for that - it's very lightweight and seems to match what you're looking for. You install it with pip install prefect, start the Orion API server with prefect orion start, and once you create a Deployment and start an agent with prefect agent start -q default, you can even configure the schedule from the UI.
For more information about Deployments, check our FAQ section.
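For illustration, a job in Prefect 2 is just a decorated Python function (the function bodies below are placeholders); you would then create a Deployment for it and let the agent pick up scheduled runs as described above:

from prefect import flow, task

@task
def extract():
    # placeholder for the real work done by one of the scripts
    return [1, 2, 3]

@task
def load(rows):
    print(f"loaded {len(rows)} rows")

@flow
def daily_job():
    rows = extract()
    load(rows)

if __name__ == "__main__":
    daily_job()  # run once locally; scheduling is handled by the Deployment/agent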
It sounds like Rocketry could also be suitable. Rocketry can shut itself down using a task. You could create a task that:
Runs on the main thread and process (blocking starting new tasks)
Waits or terminates all the currently running tasks (use the session)
Calls session.shut_down(), which sets a flag for the scheduler.
There is also an app configuration option, shut_cond, which is simply a condition: if the condition is true, the scheduler exits, so alternatively you can use that.
Then, after the line app.run(), you simply have a line that runs the shutdown -r (restart) shell command, for example using the subprocess library. Then you need something that starts Rocketry again when the restart is completed. For this, perhaps this could be an answer: https://superuser.com/a/954957, or use the Windows Task Scheduler to create a simple startup task that starts Rocketry.
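A minimal sketch of that pattern (the schedule string and the restart command are illustrative; on Windows the restart command differs):

import subprocess
from rocketry import Rocketry
from rocketry.args import Session

app = Rocketry()

@app.task("daily after 03:00", execution="main")
def stop_scheduler(session=Session()):
    # Runs on the main thread/process, so no new tasks are started meanwhile.
    # Asks the scheduler to exit once running tasks have been handled.
    session.shut_down()

if __name__ == "__main__":
    app.run()
    # The scheduler has exited at this point; trigger the OS restart.
    # (On Windows: "shutdown /r /t 0"; on Linux: "shutdown -r now".)
    subprocess.run(["shutdown", "-r", "now"])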
Especially if you had Linux machines (Raspberry Pis, for example), you could integrate Rocketry with FastAPI and make a small cluster in which the Rocketry apps communicate with each other; just set the Rocketry script up as a startup service. One machine could be a backup that calls another machine's API to run the Linux restart command. The backup would then execute tasks until the primary machine answers requests again (i.e. is up and running).
As the author of the library, I'm possibly biased toward my own projects, but Rocketry is very capable on complex scheduling problems; that's the purpose of the project.
You can use schtasks on Windows to schedule tasks such as running a bash or Python script, and it's pretty reliable too.
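For example, something along these lines registers a daily run at 09:00 (task name, interpreter path, and script path are placeholders):

schtasks /Create /TN "DailyScripts" /TR "C:\Python311\python.exe C:\scripts\start.py" /SC DAILY /ST 09:00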

Run python script via different server

I have two servers A and B.
On server A there are multiple Python scripts running.
I want certain operations of those scripts to be made from the IP address of server B.
Those operations are using requests and urllib.request
I don't want to rewrite the whole application so that it runs on server B.
Is it possible to continue running from server A but for some of the scripts to make the requests via server B? What technologies should I look into using?
This looks like a Celery use case. Celery is a library used to execute Python code on workers through Redis or RabbitMQ; it's pretty useful when you want to execute some function in the background.
https://github.com/celery/celery
Does this solution fit your problem?
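A rough sketch of how it could look here (broker URL, hosts, and names are placeholders): a Celery worker runs on server B, and server A calls the task, so the HTTP request is actually made from B's IP:

# tasks.py -- shared by server A and server B; the worker runs on server B:
#   celery -A tasks worker
import requests
from celery import Celery

app = Celery("tasks",
             broker="redis://broker-host:6379/0",
             backend="redis://broker-host:6379/1")

@app.task
def fetch(url):
    # Executed on server B, so the outbound request uses B's IP.
    return requests.get(url, timeout=30).text

# On server A, instead of calling requests directly:
#   from tasks import fetch
#   html = fetch.delay("https://example.com").get(timeout=60)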

How to start asyncio server on remote server with Python?

I have a virtual server available which runs Linux, with 8 cores, 32 GB RAM, and 1 TB of storage in addition. It is meant to be a development environment (the same goes for test and prod); this is what I could get from IT. The server can only be accessed via so-called jump servers, by PuTTY or direct TCP/IP ports (SSH is a must).
The application I am working on starts several processes via multiprocessing. In every process an asyncio event loop is started, and in some cases an asyncio socket server. Basically it is a low-level data streaming and processing application (unfortunately no Kafka or similar technology is available yet). The live application runs forever, with no or limited interaction with the user (it reads/processes/writes data).
I assume IPython is an option for this, but - and maybe I am wrong - I think it starts a new kernel per client request, whereas I need to start new processes from the main code without user interaction. If so, IPython could be an option for monitoring the application, gathering data from it, and sending new user commands to the main module, but I'm not sure how to run processes and asyncio servers remotely.
I would like to understand how this can be done in the given environment. I do not know where to start or what the alternatives are, and I do not understand IPython properly; its documentation is not obvious to me yet.
Please help me out! Thank you in advance!
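For reference, the shape of the application described above is roughly the following (ports and the handler are placeholders):

import asyncio
import multiprocessing

async def handle(reader, writer):
    data = await reader.read(1024)        # read a chunk from the client
    writer.write(data)                    # echo it back (placeholder processing)
    await writer.drain()
    writer.close()

async def serve(port):
    server = await asyncio.start_server(handle, "0.0.0.0", port)
    async with server:
        await server.serve_forever()

def worker(port):
    asyncio.run(serve(port))              # each process gets its own event loop

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=worker, args=(9000 + i,)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()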
After lots of research and learning I came to a possible solution in our "sandbox" environment. First, I had to split the problem into several sub-problems:
"remote" development
parallelization
scheduling and executing parallel codes
data sharing between these "engines"
controlling these "engines"
Let's look at them in detail:
Remote development means you want to write your code on your laptop, but the code must be executed on a remote server. The easy answer is Jupyter Notebook (or an equivalent solution); it has several trade-offs and other solutions are available, but this was the fastest to deploy and use and had the fewest dependencies, least maintenance, etc.
parallelization: I had several challenges with the IPython kernel when working with multiprocessing, so every piece of code that must run in parallel will be written in a separate Jupyter Notebook. Within a single notebook I can still use an event loop to get async behaviour
executing parallel code: there are several options I will use:
iPyParallel - "workaround" for multiprocessing
papermill - execute Jupyter Notebooks with parameters from the command line (optional)
using the %%writefile magic command in Jupyter Notebook to create importable modules
an OS task scheduler like cron
async with event loops
Not options yet: Docker, multiprocessing, multithreading, cloud (AWS, Azure, Google...)
data sharing: I selected ZeroMQ; it took time to learn but was simpler and easier than writing everything on pure sockets. There are alternatives, but they come with extra dependencies and some very useful benefits (will check them later): RabbitMQ, the Redis message broker, etc. The reasons for preferring ZMQ: fast, simple, elegant, and just a library. (Known risk: our IT will prefer RabbitMQ, but that problem comes later :-) )
controlling the engines: now this answer is obvious: a separate Python program (it can be tested as notebook code but is easy to turn into a pure .py and schedule). It communicates with the other modules via ZMQ sockets: healthchecks, sending new parameters, commands, etc.
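As an illustration of that last point, the control channel between the controller and an engine can be as small as a ZeroMQ REQ/REP pair (port and command names are placeholders):

# engine side: reply to healthchecks and commands
import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://*:5555")
while True:
    msg = sock.recv_json()                 # e.g. {"cmd": "ping"} or {"cmd": "set", "param": ...}
    if msg.get("cmd") == "ping":
        sock.send_json({"status": "ok"})
    else:
        sock.send_json({"status": "unknown command"})

# controller side (separate script):
#   sock = zmq.Context().socket(zmq.REQ)
#   sock.connect("tcp://engine-host:5555")
#   sock.send_json({"cmd": "ping"}); print(sock.recv_json())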

sending data from one thread on host machine to some kind of service thread running on the other machine over network

I want to send some commands from my host machine to the other machine over the network.
What I want is some kind of listener or service thread which is running all the time on the other machine and, as soon as it encounters a command from the host, acts accordingly.
How can I achieve this in python?
I think you are confusing IPC (interprocess communication) with RPC (remote procedure call). IPC is something that happens exclusively locally while RPC is something that you can use to execute stuff on a remote system.
What you want to do here is a basic client-server communication scenario, so I suggest you look at the thousands of tutorials on the Internet on how to achieve that. A properly implemented server has one or multiple service listeners, so the scenario you are referring to is a pretty standard one.
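For what it's worth, a minimal version of such a listener using only the standard library (port, address, and command handling are placeholders) could look like this:

# listener.py -- runs on the remote machine and waits for commands
import socketserver

class CommandHandler(socketserver.StreamRequestHandler):
    def handle(self):
        command = self.rfile.readline().decode().strip()   # one command per line
        # act on the command here; just acknowledge it for now
        self.wfile.write(f"got: {command}\n".encode())

if __name__ == "__main__":
    with socketserver.ThreadingTCPServer(("0.0.0.0", 9999), CommandHandler) as server:
        server.serve_forever()

# host side:
#   import socket
#   s = socket.create_connection(("remote-host", 9999))
#   s.sendall(b"do_something\n"); print(s.recv(1024))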

Tornado code deployment

Is there a canonical code deployment strategy for Tornado-based web applications? Our current configuration is 4 Tornado processes running behind Nginx. (Our specific use case is on EC2.)
We've currently got a solution that works well enough, whereby we launch the four tornado processes and save the PIDs to a file in /tmp/. Upon deploying new code, we run the following sequence via fabric:
Do a git pull from the prod branch.
Remove the machine from the load balancer.
Wait for all in flight connections to finish with a sleep.
Kill all the tornadoes in the pid file and remove all *.pyc files.
Restart the tornadoes.
Attach the machine back to the load balancer.
We've taken some inspiration from this: http://agiletesting.blogspot.com/2009/12/deploying-tornado-in-production.html
Are there any other complete solutions out there?
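For reference, the sequence above expressed with Fabric 2 looks roughly like this (host names, paths, and the load-balancer hooks are placeholders):

import time
from fabric import Connection

def detach_from_load_balancer(host):
    pass  # placeholder: remove the instance from the ELB here

def attach_to_load_balancer(host):
    pass  # placeholder: add the instance back to the ELB here

def deploy(host="app-server-1"):
    c = Connection(host)
    detach_from_load_balancer(host)
    time.sleep(30)                                   # let in-flight connections finish
    c.run("cd /opt/app && git pull origin prod")
    c.run("find /opt/app -name '*.pyc' -delete")
    c.run("kill $(cat /tmp/tornado.pids)")
    c.run("cd /opt/app && ./start_tornadoes.sh")     # placeholder restart script
    attach_to_load_balancer(host)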
We run Tornado+Nginx with supervisord as the supervisor.
Sample configuration (names changed)
[program:server]
process_name = server-%(process_num)s
command=/opt/current/vrun.sh /opt/current/app.py --port=%(process_num)s
stdout_logfile=/var/log/server/server.log
stderr_logfile=/var/log/server/server.err
numprocs = 6
numprocs_start = 7000
I've yet to find the "best" way to restart things. What I'll probably end up doing is having Nginx expose an "active" file which is updated to let HAProxy know we're messing with the configuration, then wait a bit, swap things around, and re-enable everything.
We're using Capistrano (we've got a backlog task to move to Fabric), but instead of dealing with removing *.pyc files we symlink /opt/current to the release identifier.
I haven't deployed Tornado in production, but I've been playing with Gevent + Nginx and have been using Supervisord for process management - start/stop/restart, logging, monitoring - supervisorctl is very handy for this. Like I said, not a deployment solution, but maybe a tool worth using.