Probing a running Python process for information - python

I've created a Python (and C, but the "controlling" part is Python) program for carrying out Bayesian inversion using Markov chain Monte Carlo methods. Unfortunately, MCMC can take several days to run. Part of my research is in reducing the time, but we can only reduce so much.
I'm running it on a headless CentOS 7 machine using nohup, since maintaining a connection and receiving printed output for several days is not practical. However, I would like to be able to check in on my program occasionally to see how many iterations it has done, how many proposals have been accepted, whether it's out of burn-in, etc.
Is there something I can use to interact with the python process to get this info?

I would recommend SAWs (Scientific Application Web server). It creates a thread in your process to handle HTTP requests, and the variables of interest are returned to the client in JSON format.
For the variables managed by the Python runtime, write them into a (JSON) file on the hard disk (or any shared memory) and use SimpleHTTPServer to serve it. The SAWs web interface on the client side can still be used, as long as you follow the JSON format of SAWs.
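For the Python side, a minimal sketch of the writer (the file name and the status fields are made up for illustration):

import json, os, tempfile

def write_status(path, iteration, accepted, in_burn_in):
    # Dump the current sampler state; write to a temporary file and
    # rename it so a reader never sees a half-written JSON document.
    status = {
        "iteration": iteration,
        "accepted": accepted,
        "acceptance_rate": accepted / float(iteration) if iteration else 0.0,
        "in_burn_in": in_burn_in,
    }
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(status, f)
    os.rename(tmp, path)  # atomic on POSIX

# Inside the MCMC loop, e.g. every 1000 iterations:
#     write_status("status.json", it, n_accepted, it < burn_in)

From the directory containing status.json, python -m SimpleHTTPServer 8000 (Python 2) or python -m http.server 8000 (Python 3) then exposes it at http://yourhost:8000/status.json.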


MongoDB Python parallel operation for a long time

I am developing an automation tool that is supposed to upgrade IP network devices.
For the sake of simplicity - I am not an expert developer - I developed two totally separate scripts: one for the core and aggregation nodes, and one for the access nodes.
The tool executes a software upgrade on the routers and verifies the result by running a set of post-check commands. The device role implies the "size" of the router: bigger routers take much longer to finish the upgrade. Although the smaller ones come back up much earlier, their post-checks cannot start until the bigger ones finish upgrading, because the devices are connected to each other.
I want to implement reliable signaling between the two scripts: the slower script (core devices) flips a switch when the core devices are up, while the other script keeps checking this value and then starts the checks on the access devices.
Both scripts run 200+ concurrent sessions; moreover, each access device (session) needs individual signaling, so all the sessions keep checking the same value in the DB.
First I used the keyring library, but noticed that the keys sometimes disappear. Now I am using a txt file to hold the signal values, which looks pretty amateur, so I would like to use MongoDB.
Would it cause any performance issues or unexpected exceptions?
The script will be running for 90+ minutes. Is it OK to connect to the DB once at the beginning of the script, set the signal to False, and then, 20-30 minutes later, keep checking it for an additional 20 minutes? Or is it advisable to establish a new connection for each read in every parallel session?
The server runs on the same VM as the script. What exceptions should I expect?
Thank you!
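A minimal sketch of the flag-based signaling described above, using pymongo (the database, collection, and field names are invented for illustration):

import time
from pymongo import MongoClient

# One MongoClient per process is enough: pymongo keeps an internal
# connection pool that is safe to share across threads/sessions.
client = MongoClient("mongodb://localhost:27017")
signals = client.upgrade_db.signals

def set_core_done():
    # Core script: flip the switch once the core devices are back up.
    signals.update_one({"_id": "core_upgrade"},
                       {"$set": {"done": True}}, upsert=True)

def wait_for_core(poll_seconds=30, timeout_seconds=20 * 60):
    # Access script: poll until the flag is set or we give up.
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        doc = signals.find_one({"_id": "core_upgrade"})
        if doc and doc.get("done"):
            return True
        time.sleep(poll_seconds)
    return False

With this pattern, a single shared client (rather than a new connection per session) is the usual choice; a couple of hundred sessions polling one small document every few seconds is a trivial load for MongoDB.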

Creating a Numerical Simulation Microservice in C++ with Docker

Greetings Stack Overflow community!
I recently learned about the power of microservices and containers, and I decided to wrap some of my numerical simulation codes in C++ and make them available as an API. Here are some requirements/details of my applications:
My simulators are coded in C++ with a few dependencies that I link via dynamic or static libraries on Windows (e.g. Hypre, for the solution of linear systems). They also run in parallel with MPI/OpenMP (in the future I would like to implement CUDA support as well).
The input for a simulator is a simple configuration file with some keys (.json format) and a data file (ASCII, but it could be binary as well) with millions of entries: these are fields with one value per simulation cell, and my models can be as large as 500x500x500 = 125,000,000 cells.
A typical call to the simulator on Windows is: mpiexec -n 4 mysimulator.exe "C:\path\to\config.json". Inside my configuration file there are further absolute paths, e.g. to the ASCII file with the cell-wise values.
I would like to "containerize" this whole mess and create an API available through HTTP requests or any other protocol that would allow the code to be run from outside the container. While the simulation microservice is running on a remote machine, anyone should be able to send a configuration file and the big ASCII or binary file to the container, which would receive the request, perform the simulation, and somehow send back the results (which can be files and/or numerical values).
After some research, I feel this could be achieved with the following approach.
Create a Docker image with the C++ code. When a container is created from that image, we get a binary executable of the C++ simulator.
Implement a Python interface that handles the incoming requests using Flask or Django. We listen for requests on a certain port and, once we get one, call the binary executable using Python's subprocess.
The simulator somehow needs to send a "simulation status" back, since these simulations can take hours to finish.
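As a concrete illustration of the Flask-plus-subprocess idea, here is a minimal sketch; the endpoint names, file layout, and MPI invocation are assumptions, not a finished design:

import os, subprocess, uuid
from flask import Flask, request, jsonify

app = Flask(__name__)
jobs = {}  # job_id -> Popen handle (in-memory; enough for a sketch)

@app.route("/simulations", methods=["POST"])
def start_simulation():
    job_id = str(uuid.uuid4())
    workdir = os.path.join("/data", job_id)
    os.makedirs(workdir)
    # Save the uploaded config and data file side by side; the config
    # would need its data path rewritten to point inside workdir.
    request.files["config"].save(os.path.join(workdir, "config.json"))
    request.files["data"].save(os.path.join(workdir, "cells.bin"))
    # Launch the existing simulator unchanged, under MPI.
    jobs[job_id] = subprocess.Popen(
        ["mpiexec", "-n", "4", "/opt/sim/mysimulator",
         os.path.join(workdir, "config.json")], cwd=workdir)
    return jsonify({"job_id": job_id}), 202

@app.route("/simulations/<job_id>", methods=["GET"])
def status(job_id):
    proc = jobs.get(job_id)
    if proc is None:
        return jsonify({"error": "unknown job"}), 404
    state = "running" if proc.poll() is None else "finished"
    return jsonify({"job_id": job_id, "status": state})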
I have a few questions:
Is calling the C++ binary through Python's subprocess the way to go, or is it easier/more advisable to handle the API calls inside the C++ code itself?
How do you typically send a big binary/ASCII file over HTTP to a microservice running inside a Docker container?
If I have a workstation with, let's say, 16 cores, and I want to allow each user to use at most 2 cores, I could have a maximum of 8 parallel instances. In that case, would I need 8 containers running simultaneously on the machine?
Since the simulations take hours to finish, what's the best approach to interact with the client who's requesting the simulation results? Are events typically used in this context?
Thanks,
Rafael.
Is calling the C++ binary through Python's subprocess the way to go, or is it easier/more advisable to handle the API calls inside the C++ code itself?
If you don't have performance concerns, use whatever is faster to build and easier to scale given your skills, in the language you're comfortable with. If performance is essential, choose wisely now or refactor later.
How do you typically send a big binary/ASCII file over HTTP to a microservice running inside a Docker container?
It depends on the scenario. You can send the data to an endpoint in a single request, or upload it part by part. You may refer to this post on RESTful updates.
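For example, on the client side Python's requests library can stream a large file from disk instead of loading it all into memory (the URL is illustrative):

import requests

# Passing a file object as the body makes requests stream it in
# chunks rather than read the whole file into memory first.
with open("cells.bin", "rb") as f:
    resp = requests.post("http://simulator.example.com/upload",
                         data=f, timeout=600)
print(resp.status_code)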
If I have a workstation with, let's say, 16 cores, and I want to allow each user to use at most 2 cores, I could have a maximum of 8 parallel instances. In that case, would I need 8 containers running simultaneously on the machine?
Keep your service simple: if one service instance uses only 1 or 2 cores, run multiple instances. That is easier to scale than building a complex multithreaded program.
Since the simulations take hours to finish, what's the best approach to interact with the client who's requesting the simulation results? Are events typically used in this context?
Events would be good enough. Use polling if intermediate simulation status is important.
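A polling client can be as simple as the following sketch (the status endpoint and its JSON shape are hypothetical):

import time
import requests

def wait_for_result(job_id, poll_seconds=60):
    # Ask the service for the job status until it reports completion.
    while True:
        status = requests.get(
            "http://simulator.example.com/simulations/%s" % job_id).json()
        if status["status"] == "finished":
            return status
        time.sleep(poll_seconds)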
Note: this is more of an opinion-based post, but it raises general scenarios worth answering.

AWS EC2 increasing load handling

I have code that uses a GET request from the Python requests library to pull data from an API. I expect, for example, 10 large files to be sent to me.
Can someone explain how my code should be written so that I can analyze one file while fetching and analyzing the others in parallel, and so on? Is it possible to analyze all 10 at once?
First, this is not really a question about AWS and EC2.
Assuming that you don't want to rewrite your code too significantly, you may want to concurrently run many instances of your Python program, each with a different input file as the argument.
Assuming a typical workflow is:
python blah.py inputfile.xyz
You could now run something like:
python blah.py inputfile1.xyz &
python blah.py inputfile2.xyz &
...
python blah.py inputfileN.xyz &
wait
Note: this is the lazy way out. Optimal solutions will require rewriting code to be multithreaded, and analyzing your various resource limits.
The number of processes that you run should be limited by the number of vCPUs provided by your EC2 instance.
You may also be limited by your network bandwidth, in terms of multiple parallel downloads. Finally, some EC2 instances have burst limits after which they perform noticeably poorly.
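If you later do want the parallelism inside one Python process, concurrent.futures keeps the rewrite small; in this sketch, download_and_analyze is a stand-in for your existing per-file logic and the URLs are invented:

from concurrent.futures import ProcessPoolExecutor, as_completed
import requests

def download_and_analyze(url):
    # Stand-in for the real download-plus-analysis of one file.
    data = requests.get(url).content
    return url, len(data)

urls = ["https://api.example.com/files/%d" % i for i in range(10)]

# One worker per vCPU is a sensible starting point on EC2.
with ProcessPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(download_and_analyze, u) for u in urls]
    for fut in as_completed(futures):
        url, size = fut.result()
        print(url, size)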

IPython parallel computing vs pyzmq for cluster computing

I am currently working on some simulation code written in C, which runs on different remote machines. While the C part is finished, I want to simplify my work by extending it with a Python simulation API and some kind of job-queue system, which should do the following:
1. Specify a set of parameters for which simulations should be performed and put them into a queue on a host computer
2. Perform the simulations on the remote machines via workers
3. Return the results to the host computer
I had a look at different frameworks for accomplishing this task, and my first choice comes down to IPython.parallel. I read the documentation and, from what I tested, it seems pretty easy to use. My approach would be to use a load-balanced view as explained at
http://ipython.org/ipython-doc/dev/parallel/parallel_task.html#creating-a-loadbalancedview-instance
But what I don't see is:
what happens, for example, if the ipcontroller crashes - is my job queue gone?
what happens if a remote machine crashes - is there some kind of error handling?
Since I run relatively long simulations (1-2 weeks), I don't want them to fail if some part of the system crashes. So is there some way to handle this in IPython.parallel?
My second approach would be to use pyzmq and implement the job system from scratch.
In that case, what would be the best ZMQ pattern for this situation?
And last but not least, is there maybe a better framework for this scenario?
What lies behind the curtain is a somewhat more complex view of how to arrange the work-package flow alongside the (parallelised) number-crunching pipeline(s). Whether the work-package amounts to many CPU-core-weeks, or the lump-sum volume of the job runs above a few hundred thousand CPU-core-hours, the principles are similar and follow common sense.
Key Features
scalability of the computing performance of all resources involved (ideally a linear one)
ease of the task-submission role
fault resilience of the submitted task(s) (ideally with automated self-healing)
feasible TCO of access to / use of a sufficient pool of resources (upfront costs, recurring costs, adaptation costs, costs of speed)
Approaches to Solution
a home-brewed architecture for a distributed, massively parallel, scheduler-based, self-healing computation engine
reuse of available grid-based computing resources
Based on my own experience solving a need for repetitive runs of a numerically intensive optimisation problem over a vast parameter-set vector space (one that could not be decomposed into any trivial GPU parallelisation scheme), the second approach has proven more productive than burning dozens of man-years on yet another attempt to reinvent the wheel.
In an academic environment one may find it much easier to gain acceptable access to resource pools for processing the work-packages, while commercial entities may acquire the same, subject to their budgeting thresholds.
My gut instinct is to suggest rolling your own solution for this because, as you said, you're otherwise depending on IPython not crashing.
I would run a simple Python service on each node which listens for run commands. When it receives one, it launches your C program. However, I suggest you ensure the C program is a true Unix daemon, so that when it runs it completely disconnects itself from Python. That way, even if your node's Python instance crashes, you can still get data as long as the C program executes successfully. Have the C program write the output data to a file or database, and when the task is finished write "finished" to a "status" file or something similar. The Python service should monitor that file, and when completion is indicated it should retrieve the data and send it back to the server.
The central idea of this design is to have as few points of failure as possible. As long as the C program doesn't crash, you can still get the data one way or another. As for handling system crashes, network disconnects, and so on, that's up to you.
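A sketch of such a node service; the command protocol, paths, and status-file convention are all invented for illustration:

import json, os, socketserver, subprocess

STATUS_FILE = "/opt/sim/status"

class RunHandler(socketserver.StreamRequestHandler):
    def handle(self):
        cmd = json.loads(self.rfile.readline())
        if cmd.get("action") == "run":
            # Launch the C simulator; as a proper daemon it keeps
            # running even if this Python service later dies.
            subprocess.Popen(["/opt/sim/simulate", cmd["paramfile"]])
            self.wfile.write(b'{"launched": true}\n')
        elif cmd.get("action") == "status":
            # The C program writes "finished" to the status file when done.
            done = (os.path.exists(STATUS_FILE) and
                    open(STATUS_FILE).read().strip() == "finished")
            self.wfile.write(json.dumps({"finished": done}).encode() + b"\n")

if __name__ == "__main__":
    socketserver.TCPServer(("0.0.0.0", 9999), RunHandler).serve_forever()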

Tricks to improve the performance of a Python backend

I am using Python programs for nearly everything:
deploy scripts
nagios routines
website backend (web2py)
The reason I am doing this is that I can reuse the code to provide different kinds of services.
A while ago I noticed that these scripts were putting a high CPU load on my servers. I have taken several steps to mitigate this:
late initialization, using cached_property (see here and here), so that only the objects actually needed are initialized (including the import of the related modules) - see the sketch after this list
turning some of my scripts into HTTP services (with a simple web.py implementation wrapping my classes); the services are then triggered (by Nagios, for example) with simple curl calls
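The lazy-initialization pattern from the first point looks roughly like this; functools.cached_property is Python 3.8+ (older code used equivalent recipes), and the database class here is just an invented example of a heavy dependency:

from functools import cached_property

class Checks(object):
    @cached_property
    def db(self):
        # The heavy import and connection happen only on first access;
        # the result is then cached on the instance.
        import pymysql
        return pymysql.connect(host="localhost", user="nagios")

    def queue_depth(self):
        cur = self.db.cursor()
        cur.execute("SELECT COUNT(*) FROM queue")
        return cur.fetchone()[0]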
This has reduced the load dramatically, going from a CPU load of over 20 to well under 1. It seems Python startup is very resource-intensive for complex programs with lots of interdependencies.
I would like to know what other strategies people here are using to improve the performance of Python software.
An easy one-off improvement is to use PyPy instead of the standard CPython for long-lived scripts and daemons (for short-lived scripts it's unlikely to help and may actually have longer startup times). Other than that, it sounds like you've already hit upon one of the biggest improvements for short-lived system scripts, which is to avoid the overhead of starting the Python interpreter for frequently-invoked scripts.
For example, if you invoke one script from another and they're both in Python you should definitely consider importing the other script as a module and calling its functions directly, as opposed to using subprocess or similar.
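For instance, instead of paying interpreter startup on every call (cleanup here is a hypothetical sibling script exposing a main() function):

import subprocess

# Slow: a fresh interpreter is started for each invocation.
subprocess.call(["python", "cleanup.py", "/var/log"])

# Faster: import once and call the function directly.
import cleanup
cleanup.main("/var/log")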
I appreciate that it's not always possible to do this, since some use-cases rely on external scripts being invoked - Nagios checks, for example, are going to be tricky to keep resident at all times. Your approach of making the actual check script a simple HTTP request seems reasonable enough, but the approach I took was to use passive checks and run an external service to periodically update the status. This allows the service generating check results to be resident as a daemon rather than requiring Nagios to invoke a script for each check.
Also, watch your system to see whether the slowness really is CPU overload or an IO issue. You can use utilities like vmstat to watch your IO usage. If you're IO-bound then optimising your code won't necessarily help a lot. In this case, if you're doing something like processing lots of text files (e.g. log files), then you can store them gzipped and access them directly using Python's gzip module. This increases CPU load but reduces IO load, because you only need to transfer the compressed data from disk. You can also write output files directly in gzipped format using the same approach.
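Python's gzip module makes the compressed-logs approach nearly transparent (the file names and the filter are illustrative):

import gzip

# Read the compressed log directly; only compressed bytes cross the disk.
with gzip.open("access.log.gz", "rt") as f:
    errors = sum(1 for line in f if " 500 " in line)

# Output files can be written gzipped the same way.
with gzip.open("summary.txt.gz", "wt") as out:
    out.write("server errors: %d\n" % errors)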
I'm afraid I'm not particularly familiar with web2py specifically, but you can investigate whether it's easy to put a caching layer in front if the freshness of the data isn't totally critical. Try and make sure both your server and clients use conditional requests correctly, which will reduce request processing time. If they're using a back-end database, you could investigate whether something like memcached will help. These measures are only likely to give you real benefit if you're experiencing a reasonably high volume of requests or if each request is expensive to handle.
I should also add that generally reducing system load in other ways can occasionally give surprising benefits. I used to have a relatively small server running Apache and I found moving to nginx helped a surprising amount - I believe it was partly more efficient request handling, but primarily it freed up some memory that the filesystem cache could then use to further boost IO-bound operations.
Finally, if overhead is still a problem, then carefully profile your most expensive scripts and optimise the hotspots. This could mean improving your Python code, or it could mean pushing code out to C extensions if that's an option for you. I've had some great performance gains by pushing data-path code out into C extensions for large-scale log processing and similar tasks (hundreds of GB of logs at a time). However, this is a heavy-duty, time-consuming approach and should be reserved for the few places where you really need the speed boost. It also depends on whether you have someone available who's familiar enough with C to do it.
