How to farm possibly unstable code out to a separate process?

How to farm possibly unstable code out to a separate process? - python

I've got a web.py server that has a problem: it's supposed to take input from the user and process it. Some of this processing takes place in a .pyd that I didn't write and I can't debug or rebuild, and it has some bugs. There are certain inputs that can cause it to throw an Access Violation, which for some reason cannot be caught and causes python.exe to crash, bringing down the server.
This is unacceptable, but also unavoidable, so I need to change the rules a little.
What I'd like to do is to move the unstable functionality, the interface to which is already cleanly contained in a single .py file, out of the server and have the server launch it as a separate process. If it works successfully, it should print the output, which (conveniently enough) is a text string. If not, the server should detect that the separate process crashed, and return HTTP 500. Unfortunately, I don't know how to do multi-process work in Python.
How would I implement this?

Related

"Crash-proofing" script vs.using systemd to guarantee near-constant operation

Perhaps this is a broad question, but I haven't found an answer elsewhere, so here goes.
The Python script I'm writing needs to run constantly (in a perfect world, I recognize this may not be exactly possible) on a deployed device. I've already dedicated time to adding "try...except" statements throughout so that, should an issue arise, the script will recover and continue to work.
The issue is that I'm not sure I can (nor should) handle every single possible exception that may be thrown. As such, I've decided it may be better to allow the script to die and to use systemd to restart it.
The three options:
Making no attempt to handle any exception, and just allowing systemd to restart it whenever it dies.
Meticulously creating handlers for every possible exception to guarantee that, short of loss of power, interpreter bug, or heat death of the universe, the script will always run.
A mix of the two -- making an effort to prevent crashes in some cases while allowing them in others and letting systemd restart the script.
The third choice seems the most reasonable to me. So the question is this: What factors should be considered when optimizing between "crash-proof" code and allowing a crash and restart by systemd?
For some more application specific information: there is a small but noticeable overhead involved with starting the script, the main portion will run between 50 to 100 times per second, it is not "mission critical" in that there will be no death/damage in the event of failure (just some data loss), and I already expect intermittent issues with the network it will be on.

All known exceptional cases should be handled. Any undefined behavior is a potential security issue.
As you suggest, it is also prudent to plan for unknown exceptions. Perhaps there's also a small memory leak that will also cause the application to crash even when it's running correctly. So, it's still prudent to have systemd automatically restart it if it fails, even when all expected failure modes have been handled.

Running an infinite Python script while connected to database

I'm working on a project to learn Python, SQL, Javascript, running servers -- basically getting a grip of full-stack. Right now my basic goal is this:
I want to run a Python script infinitely, which is constantly making API calls to different services, which have different rate limits (e.g. 200/hr, 1000/hr, etc.) and storing the results (ints) in a database (PostgreSQL). I want to store these results over a period of time and then begin working with that data to display fun stuff on the front. I need this to run 24/7. I'm trying to understand the general architecture here, and searching around has proven surprisingly difficult. My basic idea in rough pseudocode is this:
database.connect()
def function1(serviceA):
while(True):
result = makeAPIcallA()
INSERT INTO tableA result;
if(hitRateLimitA):
sleep(limitTimeA)
def function2(serviceB):
//same thing, different limits, etc.
And I would ssh into my server, run python myScript.py &, shut my laptop down, and wait for the data to roll in. Here are my questions:
Does this approach make sense, or should I be doing something completely different?
Is it considered "bad" or dangerous to open a database connection indefinitely like this? If so, how else do I manage the DB?
I considered using a scheduler like cron, but the rate limits are variable. I can't run the script every hour when my limit is hit say, 5min into start time and has a wait time of 60min after that. Even running it on minute intervals seems messy: I need to sleep for persistent rate limit wait times which will keep varying. Am I correct in assuming a scheduler is not the way to go here?
How do I gracefully handle any unexpected potentially fatal errors (namely, logging and restarting)? What about manually killing the script, or editing it?
I'm interested in learning different approaches and best practices here -- any and all advice would be much appreciated!

I actually do exactly what you do for one of my personal applications and I can explain how I do it.
I use Celery instead of cron because it allows for finer adjustments in scheduling and it is Python and not bash, so it's easier to use. I have different tasks (basically a group of API calls and DB updates) to different sites running at different intervals to account for the various different rate limits.
I have the Celery app run as a service so that even if the system restarts it's trivial to restart the app.
I use the logging library in my application extensively because it is difficult to debug something when all you have is one difficult to read stack trace. I have INFO-level and DEBUG-level logs spread throughout my application, and any WARNING-level and above log gets printed to the console AND gets sent to my email.
For exception handling, the majority of what I prepare for are rate limit issues and random connectivity issues. Make sure to surround whatever HTTP request you send to your API endpoints in try-except statements and possibly just implement a retry mechanism.
As far as the DB connection, it shouldn't matter how long your connection is, but you need to make sure to surround your main application loop in a try-except statement and make sure it gracefully fails by closing the connection in the case of an exception. Otherwise you might end up with a lot of ghost connections and your application not being able to reconnect until those connections are gone.

Multiple rs.exe invocations

I've a python based SSRS report generation utility that I'm using to generate multiple reports (often 100+). The way it's setup is -
Multiple threads are invoked using threading.Thread and each of them is given a dictionary.
Each thread parses the dictionary and calls rs.exe passing in relevant arguments via python's subprocess.call
Reports get generated with the following caveats -
If there are around 20-30 reports everything works fine without much issues.
If the number of reports go beyond 40-50+ (for reasons unknown to me so far), some of the reports don't get rendered and come back with error
as obtained by subprocess.call non-zero status (Error message from subprocess.call does not point to any real error).
But there is no error in those rs.exe commands, as they get rendered when i run them from windows command prompt.
Additionally when i try to re-run all those failed reports they get rendered. There's no change in the commands or data while they're being re-run.
To work around this, I employed a retry logic for 2 iterations which seems to fix the issue at times. However when the reports go beyond 100/150+
even the retry doesn't work. Now i could extend the retry logic to keep retrying until all the reports are rendered and whatever failures happen
are genuine ones (like RDL not found, corrupted and so on). But before i do any such thing, want to know if there's any limitation on how many
rs.exe can be launched simultaneously or if there's any limitation on python's subproces.call when invoked in a multi-threaded context.
Can someone please share their expertise if they've faced this kind of issue and resolved it?
Thanks.

I suspect the limit you are hitting is not rs.exe itself but the target Report Server. This will use as much physical memory as is available but when that is exhausted, further requests will start to fail. This is described in the SSRS doco:
https://msdn.microsoft.com/en-us/library/ms159206.aspx
To avoid this issue and leave some server resources for other users, I would reduce your thread limit as low as you can stand - ideally to 1.

Infinite While Loop - Memory Use?

I am trying to figure out how a while loop determines how much memory to use.
At the basic level:
while True:
pass
If I did a similar thing in PHP, it would grind my localhost to a crawl. But in this case I would expect it to use next to nothing.
So for an infinite loop in python, should I expect that it would use up tons of memory, or does it scale according to what is being done, and where?

Your infinite loop is a no-op (it doesn't do anything), so it won't increase the memory use beyond what is being used by the rest of your program. To answer your question, you need to post the code that you suspect is causing memory problems.
In PHP however, the same loop will "hang" because the web server is expecting a response to send back to the client. Since no response is being received, the web browser will simply "freeze". Depending on how the web server is configured, it may choose to end the process an issue a timeout error.
You could do the same if you used Python and a web framework and put an infinite loop in one of your methods that returns a response to the client.
If you ran the equivalent PHP code from the shell, it will have the same effect as if it was written in Python (or any other language). That is, your console will block until you kill the process.
I'm asking because I want to create a program that runs infinitely,
but I'm not sure how to determine it's footprint or how much it will
take from system resources.
A program that runs indefinitely (I think that's what you mean) - it generally has two cases:
Its waiting to do some work on a trigger (like a web server runs indefinitely, but its just sitting there until someone visits your website)
Its doing a process that is taking a long time.
For #2, you need to determine the resource use by figuring out what is the work being done.
If its building a large list of items to do some calculations/sorting, then memory use will grow as the list grows.
If its processing a bunch of files, and during this process, it generates a lot of output stored on disk - then disk usage will grow, and then shrink when the process is done.
If its a rendering engine, then memory use and CPU use will increase, along with disk use as the memory is swapped out during rendering. However, such a system will not tax the disk too much.
The bottom line is, you can't get an answer to this unless you explain the process being run.

How do I run long term (infinite) Python processes?

I've recently started experimenting with using Python for web development. So far I've had some success using Apache with mod_wsgi and the Django web framework for Python 2.7. However I have run into some issues with having processes constantly running, updating information and such.
I have written a script I call "daemonManager.py" that can start and stop all or individual python update loops (Should I call them Daemons?). It does that by forking, then loading the module for the specific functions it should run and starting an infinite loop. It saves a PID file in /var/run to keep track of the process. So far so good. The problems I've encountered are:
Now and then one of the processes will just quit. I check ps in the morning and the process is just gone. No errors were logged (I'm using the logging module), and I'm covering every exception I can think of and logging them. Also I don't think these quitting processes has anything to do with my code, because all my processes run completely different code and exit at pretty similar intervals. I could be wrong of course. Is it normal for Python processes to just die after they've run for days/weeks? How should I tackle this problem? Should I write another daemon that periodically checks if the other daemons are still running? What if that daemon stops? I'm at a loss on how to handle this.
How can I programmatically know if a process is still running or not? I'm saving the PID files in /var/run and checking if the PID file is there to determine whether or not the process is running. But if the process just dies of unexpected causes, the PID file will remain. I therefore have to delete these files every time a process crashes (a couple of times per week), which sort of defeats the purpose. I guess I could check if a process is running at the PID in the file, but what if another process has started and was assigned the PID of the dead process? My daemon would think that the process is running fine even if it's long dead. Again I'm at a loss just how to deal with this.
Any useful answer on how to best run infinite Python processes, hopefully also shedding some light on the above problems, I will accept
I'm using Apache 2.2.14 on an Ubuntu machine.
My Python version is 2.7.2

I'll open by stating that this is one way to manage a long running process (LRP) -- not de facto by any stretch.
In my experience, the best possible product comes from concentrating on the specific problem you're dealing with, while delegating supporting tech to other libraries. In this case, I'm referring to the act of backgrounding processes (the art of the double fork), monitoring, and log redirection.
My favorite solution is http://supervisord.org/
Using a system like supervisord, you basically write a conventional python script that performs a task while stuck in an "infinite" loop.
#!/usr/bin/python
import sys
import time
def main_loop():
while 1:
# do your stuff...
time.sleep(0.1)
if __name__ == '__main__':
try:
main_loop()
except KeyboardInterrupt:
print >> sys.stderr, '\nExiting by user request.\n'
sys.exit(0)
Writing your script this way makes it simple and convenient to develop and debug (you can easily start/stop it in a terminal, watching the log output as events unfold). When it comes time to throw into production, you simply define a supervisor config that calls your script (here's the full example for defining a "program", much of which is optional: http://supervisord.org/configuration.html#program-x-section-example).
Supervisor has a bunch of configuration options so I won't enumerate them, but I will say that it specifically solves the problems you describe:
Backgrounding/Daemonizing
PID tracking (can be configured to restart a process should it terminate unexpectedly)
Log normally in your script (stream handler if using logging module rather than printing) but let supervisor redirect to a file for you.

You should consider Python processes as able to run "forever" assuming you don't have any memory leaks in your program, the Python interpreter, or any of the Python libraries / modules that you are using. (Even in the face of memory leaks, you might be able to run forever if you have sufficient swap space on a 64-bit machine. Decades, if not centuries, should be doable. I've had Python processes survive just fine for nearly two years on limited hardware -- before the hardware needed to be moved.)
Ensuring programs restart when they die used to be very simple back when Linux distributions used SysV-style init -- you just add a new line to the /etc/inittab and init(8) would spawn your program at boot and re-spawn it if it dies. (I know of no mechanism to replicate this functionality with the new upstart init-replacement that many distributions are using these days. I'm not saying it is impossible, I just don't know how to do it.)
But even the init(8) mechanism of years gone by wasn't as flexible as some would have liked. The daemontools package by DJB is one example of process control-and-monitoring tools intended to keep daemons living forever. The Linux-HA suite provides another similar tool, though it might provide too much "extra" functionality to be justified for this task. monit is another option.

I assume you are running Unix/Linux but you don't really say. I have no direct advice on your issue. So I don't expect to be the "right" answer to this question. But there is something to explore here.
First, if your daemons are crashing, you should fix that. Only programs with bugs should crash. Perhaps you should launch them under a debugger and see what happens when they crash (if that's possible). Do you have any trace logging in these processes? If not, add them. That might help diagnose your crash.
Second, are your daemons providing services (opening pipes and waiting for requests) or are they performing periodic cleanup? If they are periodic cleanup processes you should use cron to launch them periodically rather then have them run in an infinite loop. Cron processes should be preferred over daemon processes. Similarly, if they are services that open ports and service requests, have you considered making them work with INETD? Again, a single daemon (inetd) should be preferred to a bunch of daemon processes.
Third, saving a PID in a file is not very effective, as you've discovered. Perhaps a shared IPC, like a semaphore, would work better. I don't have any details here though.
Fourth, sometimes I need stuff to run in the context of the website. I use a cron process that calls wget with a maintenance URL. You set a special cookie and include the cookie info in with wget command line. If the special cookie doesn't exist, return 403 rather than performing the maintenance process. The other benefit here is login to the database and other environmental concerns of avoided since the code that serves normal web pages are serving the maintenance process.
Hope that gives you ideas. I think avoiding daemons if you can is the best place to start. If you can run your python within mod_wsgi that saves you having to support multiple "environments". Debugging a process that fails after running for days at a time is just brutal.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.