Advice: Python Framework Server/Worker Queue management (not Website)

Advice: Python Framework Server/Worker Queue management (not Website) - python

I am looking for some advice/opinions of which Python Framework to use in an implementation of multiple 'Worker' PCs co-ordinated from a central Queue Manager.
For completeness, the 'Worker' PCs will be running Audio Conversion routines (which I do not need advice on, and have standalone code that works).
The Audio conversion takes a long time, and I need to co-ordinate an arbitrary number of the 'Workers' from a central location, handing them conversion tasks (such as where to get the source files, or where to ask for the job configuration) with them reporting back some additional info, such as the runtime of the converted audio etc.
At present, I have a script that makes a webservice call to get the 'configuration' for a conversion task, based on source files located on the worker already (we manually copy the source files to the worker, and that triggers a conversion routine). I want to change this, so that we can distribute conversion tasks ("Oy you, process this: xxx") based on availability, and in an ideal world, based on pending tasks too.
There is a chance that Workers can go offline mid-conversion (but this is not likely).
All the workers are Windows based, the co-ordinator can be WIndows or Linux.
I have (in my initial searches) come across the following - and I know that some are cross-dependent:
Celery (with RabbitMQ)
Twisted
Django
Using a framework, rather than home-brewing, seems to make more sense to me right now. I have a limited timeframe in which to develop this functional extension.
An additional consideration would be using a Framework that is compatible with PyQT/PySide so that I can write a simple UI to display Queue status etc.
I appreciate that the specifics above are a little vague, and I hope that someone can offer me a pointer or two.
Again: I am looking for general advice on which Python framework to investigate further, for developing a Server/Worker 'Queue management' solution, for non-web activities (this is why DJango didn't seem the right fit).

How about using pyro? It gives you remote object capability and you just need a client script to coordinate the work.

Related

Creating a Numerical Simulation Microservice in C++ with Docker

Greetings Stackoverflow community!
I recently learned about the power of microservices and containers, and I decided to wrap some of my numerical simulations codes in C++ and make them available as an API. Here are some requirements/details of my applications:
My simulators are coded in C++ with a few dependencies that I link via dynamic or static libraries in windows (e.g. Hypre, for solution of linear systems). They also run in parallel with MPI/OpenMP (in the future I would like to implement CUDA support as well).
The input for a simulator is a simple configuration file with some keys (.json format) and a data file (ascii but could be binary as well) with millions of entries (these are fields with one value for each simulation cells, and my models can be as large as 500x500x500 (=125000000 cells).
A typical call to the simulator in Windows is: mpiexec -n 4 mysimulator.exe "C:\path\to\config.json". Inside my configuration file I have other absolute path to the ascii file with the cellwise values.
I would like to "containerize" this whole mess and create an api available through HTTP requests or any other protocol that would allow the code to be run from outside the container. While the simulation microservice is running on a remote machine, anyone should be able to send a configuration file and the big ascii or binary file to the container, which would receive the request, perfom the simulation and somehow send back the results (which can be files and/or numerical values).
After some research, I feel this could be achieved with the following approach.
Create a docker image with the C++ code. When the container is created using the image as a blueprint, we obtain a binary executable of the C++ simulator.
Implement a python interface that handles the incoming requests using flask or django. We listen to requests at a certain port and once we get a request, we call the binary executable using python's subprocess.
The simulator somehow needs to send a "simulation status" back since these simulations can take hours to finish.
I have a few questions:
Is python "subprocess" call to a binary executable with the C++ code the way to go? Or is it easier/more recommended to implement the treatment to the API calls inside the C++ code?
How do you typically send a big binary/ascii file through HTTP to the microservice running inside a docker container?
If I have a workstation with - let's say - 16 cores...and I want to allow each user to run at most 2 processors, I could have a max of 8 parallel instances. This way, would I need 8 containers running simultaneously in the computer?
Since the simulations take hours to finish, what's the best approach to interact with the client who's requesting the simulation results? Are events typically used in this context?
Thanks,
Rafael.

Is python "subprocess" call to a binary executable with the C++ code the way to go? Or is it easier/more recommended to implement the treatment to the API calls inside the C++ code?
If you don't have performance concerns, use whatever faster to achieve and easier to scale according to your skills. Use the language that you're comfortable with. If performance is essential, then choose it wisely or refactor them later.
How do you typically send a big binary/ascii file through HTTP to the microservice running inside a docker container?
Depends on the scenario. It's possible to send a data through end point or send them part by part. You may refer to this post for restful update.
If I have a workstation with - let's say - 16 cores...and I want to allow each user to run at most 2 processors, I could have a max of 8 parallel instances. This way, would I need 8 containers running simultaneously in the computer?
Keep your service simple. If one service uses only 1 or 2 cores. Then run multiple instance. Since it's easy to scale rather than create a complex multithreading program.
Since the simulations take hours to finish, what's the best approach to interact with the client who's requesting the simulation results? Are events typically used in this context?
Event would be good enough. Use polling if simulation status is important.
Note: This is more of opinion based post, but it has general scenarios worth answering.

Is it convenient to use Airflow Architecture to orchestrate Complex Python Microservices?

I have a data product that receives realtime streams of vehicles data, process the information and then displays it through rest APIS. I am looking to rebuild the ETL side of the system in order to enhance reliability and architecture order. I am considering to use Apache Airflow, but have some doubts.
The python microservices I need to orchestrate are complex and have many different dependencies, hence a monolithic solution would be huge and tricky if implemented with python operators in Airflow. Do you see any convenient options of using Airflow for these needs? Might be a better solution to use Kafka or any other Messaging System?
Thanks

Airflow is definitely used to orchestrate ETL systems however if you require real time ETL, a message queue is probably better.
However you can Trigger DAGs Externally if it suits your use case. Now depending on how distributed you want, you can also have multiple instances running to parallelize and speed up your operations.
If you think batch processing will be helpful, you can have a pseudo realtime set up with a very fast schedule. This will ( in my opinion ) be easier to debug then a traditional message queue system.
Airflow gives you a lot of modularity which lets you break down a complex architecture and debug components easily compared to a messaging queue. It also has provisions for auto requeueing and also has named queues which you can use in case you need to send large data loads to a separate processing queue on a machine with more resources.
I have worked on a similar use case as yours where we needed to process realtime data and display it via APIs and we used a mixture of Airflow and messaging queues. In my experience it was always a lot easier to figure out where and why something went wrong with Airflow due to the Airflow UI which makes getting a high level overview incredibly efficient.

IPython parallel computing vs pyzmq for cluster computing

I am currently working on some simulation code written in C, which runs on different remote machines. While the C part is finished I want to simplify my work by extending it with a python simulation api and some kind of a job-queue system, which should do the following:
1.specifiy a set of parameters on which simulations should be performed and put them into a queue on a host computer
2.perform simulation on remote machines by workers
3.return results to host computer
I had a look at different frameworks for accomplishing this task and my first choice goes down to IPython.parallel. I had a look at the documentation and from what I tested out it seems pretty easy to use. My approach would be to use a load balanced view like explained at
http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#creating-a-loadbalancedview-instance
But what I dont see is:
what happens i.e. if the ipcontroller crashes, is my job queue gone?
what happens if a remote machine crashes? is there some kind of error handling?
Since I run relatively long simulations (1-2 weeks) I don't want my simulations to fail if some part of the system crashes. So is there maybe some way to handle this in IPython.parallel?
My Second approach would be to use pyzmq and implement the jobsystem from scratch.
In this case what would be the best zmq-pattern for this situation?
And last but not least, is there maybe a better framework for this scenario?

What lies behind the curtain is a bit more complex view on how to arrange the work-package flow alongside the ( parallelised ) number-crunching pipeline(s).
Being the work-package of a many CPU-core-week(s),
or
being the lumpsum volume of the job above a few hundred-of-thousands of CPU-core-hours, the principles are similar and follow a common sense.
Key Features
scaleability of the computing performance of all resources involved ( ideally a linear one )
ease of task submission role
fault-resilience of submitted task(s) ( ideally with an automated self-healing )
feasible TCO cost of access to / use of a sufficient pool of resources ( upfront co$ts, recurring co$ts, adaptation$ co$ts, co$ts of $peed )
Approaches to Solution
home-brew architecture for a distributed massively parallel scheduler based self-healing computation engine
re-use of available grid-based computing resources
Based on own experience to solve a need for repetitive runs of numerical intensive optimisation problem over a vast parameterSetVectorSPACE ( which could not be de-composed into any trivialised GPU parallelisation scheme ), selection of the second approach has been validated to be more productive rather than an attempt to burn dozens of man*years in just-another-trial to re-invent a wheel.
Being in academia environment, one may get a way easier to an acceptable access to resources-pool(s) for processing the work-packages, while commercial entities may acquire the same, based on their acceptable budgeting tresholds.

My gut instinct is to suggest rolling your own solution for this, because like you said otherwise you're depending on IPython not crashing.
I would run a simple python service on each node which listens for run commands. When it receives one it launches your C program. However, I suggest you ensure the C program is a true Unix daemon, so when it runs it completely disconnects itself from python. That way if your node python instance crashes you can still get data if the C program executes successfully. Have the C program write the output data to a file or database, and when the task is finished write "finished" to a "status" or something similar. The python service should monitor that file and when finished is indicated it should retrieve the data and send it back to the server.
The central idea of this design is to have as few possible points of failure as possible. As long as the C program doesn't crash, you can still get the data one way or another. As far as handling system crashes, network disconnects, etc, that's up to you.

Saltstack Manage and Query a Tally/Threshold via events and salt-call?

I have over 100 web servers instances running a php application using apc and we occasionally (order of once per week across the entire fleet) see a corruption to one of the caches which results in a distinctive error log message.
Once this occurs then the application is dead on that node any transactions routed to it will fail.
I've written a simple wrapper around tail -F which can spot the patter any time it appears in the log file and evaluate a shell command (using bash eval) to react. I have this using the salt-call command from salt-stack to trigger processing a custom module which shuts down the nginx server, warms (refreshes) the cache, and, of course, restarts the web server. (Actually I have two forms of this wrapper, bash and Python).
This is fine and the frequency of events is such that it's unlikely to be an issue. However my boss is, quite reasonably, concerned about a common mode failure pattern ... that the regular expression might appear in too many of these logs at once and take town the entire site.
My first thought would be to wrap my salt-call in a redis check (we already have a Redis infrastructure used for caching and certain other data structures). That would be implemented as an integer, with an expiration. The check would call INCR, check the result, and sleep if more than N returned (or if the Redis server were unreachable). If the result were below the threshold then salt-call would be dispatched and a decrement would be called after the server is back up and running. (Expiration of the Redis key would kill off any stale increments after perhaps a day or even a few hours ... our alerting system will already have notified us of down servers and our response time is more than adequate for such time frames).
However, I was reading about the Saltstack event handling features and wondering if it would be better to use that instead. (Advantage, the nodes don't have redis-cli command tool nor the Python Redis libraries, but, obviously, salt-call is already there with its requisite support). So using something in Salt would minimize the need to add additional packages and dependencies to these systems. (Alternatively I could just write all the Redis handling as a separate PHP command line utility and just have my shell script call that).
Is there a HOWTO for writing simple Saltstack modules? The docs seem to plunge deeply into reference details without any orientation. Even some suggestions about which terms to search on would be helpful (because their use of terms like pillars, grains, minions, and so on seems somewhat opaque).

The main doc for writing a Salt module is here: http://docs.saltstack.com/en/latest/ref/modules/index.html
There are many modules shipped with Salt that might be helpful for inspiration. You can find them here: https://github.com/saltstack/salt/tree/develop/salt/modules
One thing to keep in mind is that the Salt Minion doesn't do anything unless you tell it to do something. So you could create a module that checks for the error pattern you mention, but you'd need to add it to the Salt Scheduler or cron to make sure it gets run frequently.
If you need more help you'll find helpful people on IRC in #salt on freenode.

Is there a better way to serve the results of an expensive, blocking python process over HTTP?

We have a web service which serves small, arbitrary segments of a fixed inventory of larger MP3 files. The MP3 files are generated on-the-fly by a python application. The model is, make a GET request to a URL specifying which segments you want, get an audio/mpeg stream in response. This is an expensive process.
We're using Nginx as the front-end request handler. Nginx takes care of caching responses for common requests.
We initially tried using Tornado on the back-end to handle requests from Nginx. As you would expect, the blocking MP3 operation kept Tornado from doing its thing (asynchronous I/O). So, we went multithreaded, which solved the blocking problem, and performed quite well. However, it introduced a subtle race condition (under real world load) that we haven't been able to diagnose or reproduce yet. The race condition corrupts our MP3 output.
So we decided to set our application up as a simple WSGI handler behind Apache/mod_wsgi (still w/ Nginx up front). This eliminates the blocking issue and the race condition, but creates a cascading load (i.e. Apache creates too many processses) on the server under real world conditions. We're working on tuning Apache/mod_wsgi right now, but still at a trial-and-error phase. (Update: we've switched back to Tornado. See below.)
Finally, the question: are we missing anything? Is there a better way to serve CPU-expensive resources over HTTP?
Update: Thanks to Graham's informed article, I'm pretty sure this is an Apache tuning problem. In the mean-time, we've gone back to using Tornado and are trying to resolve the data-corruption issue.
For those who were so quick to throw more iron at the problem, Tornado and a bit of multi-threading (despite the data integrity problem introduced by threading) handles the load acceptably on a small (single core) Amazon EC2 instance.

Have you tried Spawning? It is a WSGI server with a flexible assortment of threading modes.

Are you making the mistake of using embedded mode of Apache/mod_wsgi? Read:
http://blog.dscpl.com.au/2009/03/load-spikes-and-excessive-memory-usage.html
Ensure you use daemon mode if using Apache/mod_wsgi.

You might consider a queuing system with AJAX notification methods.
Whenever there is a request for your expensive resource, and that resource needs to be generated, add that request to the queue (if it's not already there). That queuing operation should return an ID of an object that you can query to get its status.
Next you have to write a background service that spins up worker threads. These workers simply dequeue the request, generate the data, then saves the data's location in the request object.
The webpage can make AJAX calls to your server to find out the progress of the generation and to give a link to the file once it's available.
This is how LARGE media sites work - those that have to deal with video in particular. It might be overkill for your MP3 work however.
Alternatively, look into running a couple machines to distribute the load. Your threads on Apache will still block, but atleast you won't consume resources on the web server.

Please define "cascading load", as it has no common meaning.
Your most likely problem is going to be if you're running too many Apache processes.
For a load like this, make sure you're using the prefork mpm, and make sure you're limiting yourself to an appropriate number of processes (no less than one per CPU, no more than two).

It looks like you are doing things right -- just lacking CPU power: can you determine what is the CPU loading in the process of generating these MP3?
I think the next thing you have to do there is to add more hardware to render the MP3's on other machines. Or that or find a way to deliver pre-rendered MP3 (maybe you can cahce some of your media?)
BTW, scaling for the web was the theme of a Keynote lecture by Jacob Kaplan-Moss on PyCon Brasil this year, and it is far from being a closed problem. The stack of technologies one needs to handle is quite impressible - (I could not find an online copy o f the presentation, though - -sorry for that)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.