I have a virtual server available which runs Linux, with 8 cores, 32 GB RAM, and 1 TB of storage. It is meant to be a development environment (the same setup applies to test and prod). This is what I could get from IT. The server can only be accessed via so-called jump servers, using PuTTY or direct TCP/IP ports (SSH is a must).
The application I am working on starts several processes via multiprocessing. In every process an asyncio event loop is started, and in some cases an asyncio socket server as well. Basically it is a low-level data streaming and processing application (unfortunately no Kafka or similar technology is available yet). The live application runs forever, with no or limited user interaction (it reads/processes/writes data).
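For reference, here is a minimal sketch of that pattern; the ports and the echo handler are just placeholders:

    import asyncio
    import multiprocessing

    async def handle(reader, writer):
        data = await reader.read(1024)   # read a chunk from a client
        writer.write(data)               # placeholder: echo it back
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    def worker(port):
        async def main():
            server = await asyncio.start_server(handle, "0.0.0.0", port)
            async with server:
                await server.serve_forever()   # runs forever, like the live app
        asyncio.run(main())

    if __name__ == "__main__":
        for port in (9001, 9002):   # one event loop + socket server per process
            multiprocessing.Process(target=worker, args=(port,)).start()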
I assume IPython is an option for this, but - and maybe I am wrong - I think it starts a new kernel per client request, whereas I need to start new processes from the main code without user interaction. If so, it could be an option for monitoring the application, gathering data from it, and sending new user commands to the main module, but I am not sure how to run processes and asyncio servers remotely.
I would like to understand how this can be done in the given environment. I do not know where to start or what alternatives there are. And I do not understand IPython properly; its documentation is not obvious to me yet.
Please help me out! Thank you in advance!
After lots of research and learning I arrived at a possible solution in our "sandbox" environment. First, I had to split the problem into several sub-problems:
"remote" development
parallelization
scheduling and executing parallel codes
data sharing between these "engines"
controlling these "engines"
Let's go through them in detail:
Remote development means you write your code on your laptop, but the code must be executed on a remote server. The easy answer is Jupyter Notebook (or an equivalent solution). It has several trade-offs and other solutions are available, but it was the fastest to deploy and use and had the fewest dependencies, maintenance burden, etc.
parallelization: I had several challenges with the IPython kernel when working with multiprocessing, so every piece of code that must run in parallel is written in a separate Jupyter Notebook. Within a single notebook I can still use an event loop to get async behaviour.
executing parallel code: there are several options I will use:
ipyparallel - a "workaround" for multiprocessing (see the sketch after this list)
papermill - execute notebooks with parameters from the command line (optional)
the %%writefile magic command in Jupyter Notebook - to create importable modules
an OS task scheduler such as cron
async with event loops
Not an option yet: Docker, multiprocessing, multithreading, cloud (AWS, Azure, Google, ...)
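A minimal ipyparallel sketch, assuming a cluster has already been started on the server (e.g. with ipcluster start -n 4); the crunch function is a placeholder:

    import ipyparallel as ipp

    rc = ipp.Client()   # connect to the running cluster
    view = rc[:]        # a direct view on all engines

    def crunch(x):
        return x * x    # placeholder for the real processing step

    results = view.map_sync(crunch, range(8))   # runs in parallel on the engines
    print(results)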
data sharing: I selected ZeroMQ. It took time to learn, but it was simpler and easier than writing everything on pure sockets. There are alternatives, but they come with extra dependencies as well as some very useful benefits (I will check them later): RabbitMQ, Redis as a message broker, etc. The reasons for preferring ZMQ: fast, simple, elegant, and just a library. (Known risk: our IT will prefer RabbitMQ, but that problem comes later :-) )
controlling the engines: now the answer is obvious: a separate Python program (it can be prototyped as a notebook but is easy to turn into a pure .py file and schedule). It communicates with the other modules via ZMQ sockets: healthchecks, sending new parameters, commands, etc. (a sketch follows below).
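A hedged sketch of that control channel over ZeroMQ REQ/REP; the port and the JSON message format are my own illustrative choices:

    import zmq

    ctx = zmq.Context()

    # --- inside an "engine" process ---
    rep = ctx.socket(zmq.REP)
    rep.bind("tcp://*:5555")
    while True:
        msg = rep.recv_json()
        if msg.get("cmd") == "healthcheck":
            rep.send_json({"status": "ok"})
        elif msg.get("cmd") == "shutdown":
            rep.send_json({"status": "bye"})
            break

    # --- in the controller (a separate process/notebook) ---
    req = ctx.socket(zmq.REQ)
    req.connect("tcp://localhost:5555")
    req.send_json({"cmd": "healthcheck"})
    print(req.recv_json())   # {'status': 'ok'}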
I have some Python scripts that I would like to run daily from a Windows PC.
My current workflow is:
The desktop PC stays on all day every day, except for a weekly restart over the weekend.
After the restart I open VS Code and run a little bash script ./start.sh that kicks off the tasks.
The above works reasonably fine, but it is also fairly painful. I need to re-run start.sh whenever I close VS Code (e.g. for an update). Also, the processes use some local Python libraries, so I need to stop them whenever I want to update those libraries.
With regard to how to do this properly, 4 tools came to mind:
Windows Scheduler
Airflow
Prefect (https://www.prefect.io/)
Rocketry (https://rocketry.readthedocs.io/en/stable/)
However, I can't quite get my head around the fundamental issue: if Prefect/Airflow/Rocketry run on my PC, then nothing will restart them after the PC reboots. I'm also not sure these tools will give me the isolation I'd prefer.
Docker came to mind: I could put each task into a Docker image and run them via some form of Docker Swarm or something like that. But I am not sure whether I'd be re-inventing the wheel.
I'm 100% sure I'm not the first person in this situation. Could anyone point me to a guide on how this could be done well?
Note:
I am not considering running the python scripts in the cloud. They interact with local tools that are only licenced for my PC.
You can definitely use Prefect for that - it's very lightweight and seems to match what you're looking for. You install it with pip install prefect, start the Orion API server with prefect orion start, and once you create a Deployment and start an agent with prefect agent start -q default, you can even configure the schedule from the UI.
For more information about Deployments, check our FAQ section.
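A minimal flow sketch (the names are placeholders, and the deployment CLI has changed between Prefect 2.x releases, so check the docs matching your installed version):

    from prefect import flow, task

    @task
    def process_data():
        print("processing...")   # placeholder for the real daily task

    @flow
    def daily_flow():
        process_data()

    if __name__ == "__main__":
        daily_flow()   # a deployment + agent runs this on a schedule instead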
It sounds like Rocketry could also be suitable. Rocketry can shut itself down using a task. You could create a task that:
Runs on the main thread and process (blocking new tasks from starting)
Waits or terminates all the currently running tasks (use the session)
Calls session.shut_down(), which sets a shutdown flag on the scheduler.
There is also an app configuration option, shut_cond, which is simply a condition. If this condition is true, the scheduler exits, so alternatively you can use that.
Then, after the line app.run(), you simply add a line that runs the shutdown -r (restart) command in a shell, for example via the subprocess library. Then you need something that starts Rocketry again once the restart has completed. For that, perhaps this could be an answer: https://superuser.com/a/954957, or use the Windows scheduler to create a simple startup task that starts Rocketry. A sketch of the whole pattern follows below.
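A hedged sketch of that pattern, assuming the Rocketry 2.x API (app.task, rocketry.args.Session); the condition strings and the restart command are illustrative:

    import subprocess
    from rocketry import Rocketry
    from rocketry.args import Session

    app = Rocketry()

    @app.task("daily")
    def regular_job():
        ...   # the normal scheduled work

    @app.task("time of day after 23:00", execution="main")
    def do_shutdown(session=Session()):
        # Runs on the main thread/process, so no new tasks start meanwhile.
        session.shut_down()   # sets the scheduler's shutdown flag

    if __name__ == "__main__":
        app.run()
        # The scheduler has exited; restart the machine.
        subprocess.run(["shutdown", "-r"])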
Especially if you had Linux machines (Raspberry Pis, for example), you could integrate Rocketry with FastAPI and make a small cluster in which the Rocketry apps communicate with each other; just set the script with Rocketry as a startup service. One machine could be a backup that calls another machine's API to run the Linux restart command. The backup then executes the tasks until the primary machine answers requests again (i.e. is up and running).
But as the author of the library, I'm possibly biased toward my own projects. Still, Rocketry is very capable on complex scheduling problems; that's the purpose of the project.
You can use schtasks on Windows to schedule tasks such as running a bash script or a Python script, and it's pretty reliable too.
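For example (task names and paths are placeholders):

    :: Run a script every day at 09:00:
    schtasks /Create /TN "DailyJob" /TR "C:\Python310\python.exe C:\scripts\job.py" /SC DAILY /ST 09:00
    :: Start it automatically after every reboot:
    schtasks /Create /TN "StartOnBoot" /TR "C:\scripts\start.bat" /SC ONSTART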
I am doing research that requires me to run multiple experiments with many permutations of parameters. My main issue is that I would like to free up my main machine for my daily tasks and offload my experiments to other machines (I have two extra laptops).
Currently I am using redis-rq to queue and run tasks on those "remote servers". Once Redis is running on my main machine and my remote servers, I simply run my code, which queues the tasks to the specific redis-rq port on the remote (via SSH). This seems to work fine, except that I need to make sure to push my code to my remote server beforehand, otherwise the task will fail: the remote will essentially have old code on it.
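For context, a minimal sketch of this pattern with RQ; the host, port, and job function are placeholders:

    from redis import Redis
    from rq import Queue

    from experiments import run_experiment   # hypothetical module; must also exist on the remote

    # Redis on the remote, reachable e.g. through an SSH tunnel.
    q = Queue(connection=Redis(host="localhost", port=6379))

    for lr in (0.01, 0.1):
        for batch_size in (32, 64):
            q.enqueue(run_experiment, lr=lr, batch_size=batch_size)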
I have two questions:
Does this pattern make sense or is there a better way for me to offload tasks to remote servers?
Is there a way I can ensure the remote always has up-to-date code when I start queueing tasks? (I am currently thinking of including a function in my code that will copy the current directory to the remote via scp; see the sketch below.)
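A minimal sketch of the idea in question 2, assuming SSH key access; the host and path are placeholders:

    import subprocess

    def push_code(host="worker1", remote_dir="~/project"):
        # scp -r copies the current directory recursively; rsync would only
        # transfer changed files and may be a better fit.
        subprocess.run(["scp", "-r", ".", f"{host}:{remote_dir}"], check=True)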
Thanks for your help with this,
NB: my code is all in Python, and I would prefer to keep it that way.
I am new to RabbitMQ. All the RabbitMQ tutorials in Python/PHP say that on the receiver side you run
php receiver.php
or
python receiver.py
but how can we do this in production?
But if we have to run the above command in production, we either have to append & at the end or use nohup, which does not seem like a good idea.
How do you implement a RabbitMQ receiver on a production server in PHP/Python?
Consumers/receivers tend to be managed by a process controller; either init.d or systemd can work. What I have seen used a lot more is something like http://supervisord.org/, http://godrb.com/ or https://mmonit.com/.
In production you ideally want not only something that makes sure a process is running, but also that the logs are separated and rotated, and some amount of monitoring to make sure a process is not constantly restarting at boot or otherwise. Those tools are better adapted for this than running things by hand.
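For example, a minimal supervisord program entry might look like this (the program name and paths are placeholders):

    ; placeholder paths; adjust to your environment
    [program:receiver]
    command=python /opt/app/receiver.py
    autostart=true
    autorestart=true
    stdout_logfile=/var/log/receiver.out.log
    stderr_logfile=/var/log/receiver.err.log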
We run many Python scripts for data-processing tasks. We have a modeling computer that has been upgraded to provide the best performance for these tasks, but it is shared by many people who all need to run different scripts on it at the same time.
Is it possible for me to run a Python script remotely on that machine from my laptop while others are either directly logged into it or also remotely running a script?
Is SSH a possibility? I haven't ever run any scripts remotely aside from logging in via Remote Desktop. Ideally, I could start the Python script on the remote machine and have all its messages visible to me on my laptop. Does this sound doable?
EDIT:
I forgot to mention all machines are running Windows 7.
SSH is definitely the way to go; also have a look at Fabric.
Regarding your edit: you can use Fabric on Windows, and I think that using SSH on Windows will be a bit easier than dancing with PowerShell's remoting capabilities.
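A hedged sketch using the Fabric 2.x API (the host and script path are placeholders); output from the remote command is streamed back to your laptop:

    from fabric import Connection

    with Connection("you@modeling-machine") as c:
        c.run("python C:/scripts/process_data.py")   # remote stdout shows up locally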
SSH does seem like it should meet your needs.
You could also consider setting up an IPython Notebook server that everyone could use.
It's got nice parallel-processing capabilities if you are doing some serious number crunching.
I have a GUI based on Python and GTK. My program manages various applications on the same machine using subprocesses, e.g. one button launches an emulated host, another button connects it to a switch... Now I want to put each component on a separate machine (some hosts will work as soft switches, others may work as routers...).
I could link the GUI to the machines via sockets, XML, spawning ssh subprocesses, or using telnet. I have also found that Fabric may be cleaner than handmade ssh sessions.
Q: What is the easiest, most robust technology to use? I don't mind installing a client on the controlled machines.
Answers based on similar experience would be great, but any suggestion would be helpful.
Thanks in advance.
Since you already have a mechanism that works to spawn the processes on the local machine, it sounds like the only thing you're lacking is a way to spawn the processes on the remote machines. I would go with simple ssh connections, as that would require the fewest changes to both your infrastructure and your code. A minimal sketch follows.
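This assumes key-based SSH auth; the host and remote command are placeholders:

    import subprocess

    proc = subprocess.Popen(
        ["ssh", "user@host1", "/usr/local/bin/start_soft_switch.sh"],  # hypothetical remote command
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        text=True,
    )
    # proc now behaves like your existing local subprocesses.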