I am working on a web service that requires user-submitted Python code to be executed on my server (we have checks for code injection). I have to import a rather large module, so I would like to make sure that I am not starting up Python and importing the module from scratch each time something runs (the startup and import take about 4-6 s).
To do this I was planning to create a Python (3.2) daemon that imports the user-submitted code as a module, executes it, and then deletes/garbage-collects that module. I need to make sure that the module is completely gone from RAM, since this process will keep running until the server is restarted. I have read a number of things that say this is a very difficult thing to do in Python.
What is the best way to do this? Would it be better to use exec to define a function from the user-submitted code (for variable scoping), execute that function, and then somehow remove the function? Or is there a better approach that I have missed?
You could perhaps consider creating a pool of Python daemon processes.
Their purpose would be to serve one request and to die afterwards.
You would have to write a pool manager that ensures there are always X daemon processes waiting for an incoming request (X being the number of idle daemon processes, chosen according to the expected workload). The pool manager would have to observe the pool of daemon processes and start new instances every time a process finishes.
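A rough sketch of that idea with the multiprocessing module could look like this (big_module, the queue payload, and the pool size are placeholders, not part of your setup); each worker pre-imports the heavy module while idle, serves exactly one request, and then exits, so nothing the user code created can survive in the long-running parent:

import multiprocessing
import time

POOL_SIZE = 4  # X: how many idle workers to keep waiting

def serve_one_request(request_queue):
    import big_module                    # placeholder for your large module, imported while idle
    user_code = request_queue.get()      # block until exactly one job arrives
    exec(user_code, {'big_module': big_module})
    # the process exits here, so everything the user code allocated goes back to the OS

def pool_manager(request_queue):
    workers = []
    while True:
        workers = [w for w in workers if w.is_alive()]
        while len(workers) < POOL_SIZE:
            w = multiprocessing.Process(target=serve_one_request, args=(request_queue,))
            w.start()
            workers.append(w)
        time.sleep(0.5)                  # observe the pool without busy-waiting

if __name__ == '__main__':
    requests = multiprocessing.Queue()
    pool_manager(requests)               # some other part of your service feeds `requests`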
Related
I need to run specific manage.py commands on an EC2 instance every X minutes, for example: python manage.py some_command.
I have looked up django-chronograph. Following the instructions, I've added chronograph to my settings.py but on runserver it keeps telling me No module named chronograph.
Is there something I'm missing to get this running? And after running how do I get manage.py commands to run using chronograph?
Edit: It's installed in the EC2 instance's virtualenv.
I would suggest configuring cron to run your command at specific times/intervals.
First, install it by running pip install django-chronograph.
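If you go the plain cron route, a crontab entry could look something like the following (the paths and the five-minute interval are placeholders; pointing at the virtualenv's python makes sure the installed packages are found):

*/5 * * * * /path/to/virtualenv/bin/python /path/to/project/manage.py some_command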
I would say handle this through cron, but if you don't want to use cron then:
Make sure you installed the module in the virtualenv (with easy_install, pip, or any other way that Amazon EC2 allows). After that you might want to look up the threading module documentation:
Python 2 threading module documentation
Python 3 threading module documentation
The purpose of using threading will be to have the following structure:
A "control" thread, which will use the chronograph module and do the time measurements, and putting the new work to do in an "input queue" on each scheduled time, for the worker threads (which will be active already) to process, or just trigger each worker thread (make it active) at the time you want to trigger each execution. In the first case you'll be taking advantage of parallel threads to do a big chunk of work and minimize io wait times, but since the work is in a queue, the workers will process one at a time. Meaning if you schedule two things too close together and the previous element is still being processed, the new item will have to wait (Depending on your programming logic and amount of worker threads some workers might start processing the new item, but is a bit more complex logic).
In the second case your control thread will actually trigger the start of a new thread (or group of threads) each time you want to trigger a scheduled action. If there's big data to process you might need to spawn a new queue for each task to process and create a group of worker threads for it for each task, but if the data is not that big then you can just get away with having the worker process just one data package and be done once execution is done and you get a result. Either way this method will allow you to schedule tasks without limitation on how close they can be, since new independent worker threads will be created for them every time.
Finally, you might want to create an "output queue" and output thread, to store and process (or output, or anything else you want to do with it...) the results of each worker threads.
The control thread will be basically trying to imitate cron in its logic, triggering actions at certain times depending on how it was configured.
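For illustration, a bare-bones sketch of the first variant (queue-fed workers driven by a control thread) could look like this; INTERVAL, do_work and the task payload are made-up names standing in for your real scheduling logic and command:

import threading
import time
import Queue                      # renamed to 'queue' in Python 3

INTERVAL = 300                    # seconds between scheduled runs (placeholder)
work_queue = Queue.Queue()

def worker():
    while True:
        task = work_queue.get()   # block until the control thread schedules work
        try:
            do_work(task)         # placeholder for your actual job, e.g. a manage.py call
        finally:
            work_queue.task_done()

def control():
    while True:
        work_queue.put({'command': 'some_command'})   # enqueue the next scheduled job
        time.sleep(INTERVAL)

for _ in range(2):                # two pre-started worker threads
    t = threading.Thread(target=worker)
    t.daemon = True
    t.start()
threading.Thread(target=control).start()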
There is also a multiprocessing module in Python, which works with processes instead and takes advantage of true multi-core hardware, but I don't think you will really need it in this case unless you run into CPU-bound performance issues.
If you need any clarification, help, examples, just let me know.
I've been looking at examples from other people but I can't seem to get it to work properly.
It'll either use a single core, or basically freeze up maya if given too much to process, but I never seem to get more than one core working at once.
So for example, this is kind of what I'd like it to do, on a very basic level. Mainly just let each loop run simultaneously on a different processor with the different values (in this case, the two values would use two processors)
mylist = [50, 100, 23]
newvalue = [50, 51]

for j in range(0, len(newvalue)):
    exists = False
    for i in range(0, len(mylist)):
        # search list
        if newvalue[j] == mylist[i]:
            exists = True
    # add to list
    if exists == True:
        mylist.append(newvalue[j])
Would it be possible to pull this off? The actual code I'm wanting to use it on can take from a few seconds to like 10 minutes for each loop, but they could theoretically all run at once, so I thought multithreading would speed it up loads
Bear in mind I'm still relatively new to python so an example would be really appreciated
Cheers :)
There are really two different answers to this.
Maya scripts are really supposed to run in the main UI thread, and there are lots of ways they can trip you up if run from a separate thread. Maya includes a module called maya.utils which includes methods for deferred evaluation in the main thread. Here's a simple example:
import maya.cmds as cmds
import maya.utils as utils
import threading
def do_in_main():
    utils.executeDeferred(cmds.sphere)

for i in range(10):
    t = threading.Thread(target=do_in_main, args=())
    t.start()
That will allow you to do things with the Maya UI from a separate thread (there's another method in utils that will allow the calling thread to await a response too). Here's a link to the Maya documentation on this module
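That other method is maya.utils.executeInMainThreadWithResult; a rough, untested sketch of using it from a worker thread:

import maya.cmds as cmds
import maya.utils as utils
import threading

def make_sphere_and_report():
    # runs in the worker thread, but the sphere call itself executes in the
    # main thread; executeInMainThreadWithResult blocks until it returns
    result = utils.executeInMainThreadWithResult(cmds.sphere)
    print(result)

threading.Thread(target=make_sphere_and_report).start()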
However, this doesn't get you around the second aspect of the question. Maya Python isn't going to split the job among processors for you: threading will let you create separate threads, but they all share the same Python interpreter, and the global interpreter lock means that they end up waiting for it rather than running along independently.
You can't use the multiprocessing module, at least not AFAIK, since it spawns new Maya instances rather than pushing script execution out to other processors within the Maya you are running in. Python aside, Maya is an old program and not very multi-core oriented in any case. Try XSI :)
Any threading stuff in Maya is tricky in any case - if you touch the main application (basically, any function from the API or a maya.whatever module) without the deferred execution above, you'll probably crash Maya. Only use it if you have to.
And, BTW, you can't use executeDeferred, etc. in batch mode, since they are implemented using the main UI loop.
What theodox says is still true today, six years later. However, one may go another route by spawning a new process using the subprocess module. You'll have to communicate and share data via sockets or something similar, since the new process lives in a separate interpreter. The new interpreter runs on its own and doesn't know about Maya, but you can do any other work in it, benefiting from the multi-threaded environment your OS provides, before communicating the result back to your Maya Python script.
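As a rough sketch of that route (the worker script path and the JSON hand-off are just assumptions for illustration), you could pipe data to a plain Python interpreter through subprocess and read the result back:

import subprocess
import json

payload = json.dumps({'values': [50, 51]})

proc = subprocess.Popen(
    ['python', '/path/to/worker_script.py'],   # hypothetical standalone worker script
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    universal_newlines=True,                   # text mode, so plain strings work
)
out, _ = proc.communicate(payload)             # send the job, wait for the answer
result = json.loads(out)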
I've recently started experimenting with using Python for web development. So far I've had some success using Apache with mod_wsgi and the Django web framework for Python 2.7. However I have run into some issues with having processes constantly running, updating information and such.
I have written a script I call "daemonManager.py" that can start and stop all or individual python update loops (Should I call them Daemons?). It does that by forking, then loading the module for the specific functions it should run and starting an infinite loop. It saves a PID file in /var/run to keep track of the process. So far so good. The problems I've encountered are:
Now and then one of the processes will just quit. I check ps in the morning and the process is just gone. No errors were logged (I'm using the logging module), and I'm covering every exception I can think of and logging them. Also, I don't think these quitting processes have anything to do with my code, because all my processes run completely different code and exit at pretty similar intervals. I could be wrong of course. Is it normal for Python processes to just die after they've run for days/weeks? How should I tackle this problem? Should I write another daemon that periodically checks if the other daemons are still running? What if that daemon stops? I'm at a loss on how to handle this.
How can I programmatically know if a process is still running or not? I'm saving the PID files in /var/run and checking if the PID file is there to determine whether or not the process is running. But if the process just dies of unexpected causes, the PID file will remain. I therefore have to delete these files every time a process crashes (a couple of times per week), which sort of defeats the purpose. I guess I could check if a process is running at the PID in the file, but what if another process has started and was assigned the PID of the dead process? My daemon would think that the process is running fine even if it's long dead. Again I'm at a loss just how to deal with this.
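For reference, the "check if a process is running at the PID in the file" idea I mention would be something like the following sketch (signal 0 on POSIX performs the existence check without actually delivering a signal; it still doesn't solve the PID-reuse problem I describe):

import os
import errno

def pid_is_running(pid):
    try:
        os.kill(pid, 0)                       # signal 0: existence check only
    except OSError as err:
        return err.errno == errno.EPERM       # EPERM: exists, but owned by another user
    return True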
I will accept any useful answer on how best to run long-lived Python processes, hopefully one that also sheds some light on the problems above.
I'm using Apache 2.2.14 on an Ubuntu machine.
My Python version is 2.7.2
I'll open by stating that this is one way to manage a long running process (LRP) -- by no means the definitive way.
In my experience, the best possible product comes from concentrating on the specific problem you're dealing with, while delegating supporting tech to other libraries. In this case, I'm referring to the act of backgrounding processes (the art of the double fork), monitoring, and log redirection.
My favorite solution is http://supervisord.org/
Using a system like supervisord, you basically write a conventional python script that performs a task while stuck in an "infinite" loop.
#!/usr/bin/python
import sys
import time

def main_loop():
    while 1:
        # do your stuff...
        time.sleep(0.1)

if __name__ == '__main__':
    try:
        main_loop()
    except KeyboardInterrupt:
        print >> sys.stderr, '\nExiting by user request.\n'
        sys.exit(0)
Writing your script this way makes it simple and convenient to develop and debug (you can easily start/stop it in a terminal, watching the log output as events unfold). When it comes time to throw into production, you simply define a supervisor config that calls your script (here's the full example for defining a "program", much of which is optional: http://supervisord.org/configuration.html#program-x-section-example).
Supervisor has a bunch of configuration options so I won't enumerate them, but I will say that it specifically solves the problems you describe:
Backgrounding/Daemonizing
PID tracking (can be configured to restart a process should it terminate unexpectedly)
Log handling: log normally in your script (use a stream handler if you're using the logging module rather than printing) and let supervisor redirect the output to a file for you.
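For example, a minimal program section for a script like the one above might look roughly like this (the program name and paths are placeholders; see the configuration docs linked above for the full option list):

[program:mydaemon]
command=/usr/bin/python /path/to/my_script.py
autostart=true
autorestart=true
stdout_logfile=/var/log/mydaemon.out.log
stderr_logfile=/var/log/mydaemon.err.log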
You should consider Python processes as able to run "forever" assuming you don't have any memory leaks in your program, the Python interpreter, or any of the Python libraries / modules that you are using. (Even in the face of memory leaks, you might be able to run forever if you have sufficient swap space on a 64-bit machine. Decades, if not centuries, should be doable. I've had Python processes survive just fine for nearly two years on limited hardware -- before the hardware needed to be moved.)
Ensuring programs restart when they die used to be very simple back when Linux distributions used SysV-style init -- you just add a new line to the /etc/inittab and init(8) would spawn your program at boot and re-spawn it if it dies. (I know of no mechanism to replicate this functionality with the new upstart init-replacement that many distributions are using these days. I'm not saying it is impossible, I just don't know how to do it.)
But even the init(8) mechanism of years gone by wasn't as flexible as some would have liked. The daemontools package by DJB is one example of process control-and-monitoring tools intended to keep daemons living forever. The Linux-HA suite provides another similar tool, though it might provide too much "extra" functionality to be justified for this task. monit is another option.
I assume you are running Unix/Linux, but you don't really say. I have no direct advice on your issue, so I don't expect this to be the "right" answer to this question, but there is something to explore here.
First, if your daemons are crashing, you should fix that; only programs with bugs should crash. Perhaps you should launch them under a debugger and see what happens when they crash (if that's possible). Do you have any trace logging in these processes? If not, add it. That might help diagnose your crashes.
Second, are your daemons providing services (opening pipes and waiting for requests) or are they performing periodic cleanup? If they are periodic cleanup processes, you should use cron to launch them periodically rather than have them run in an infinite loop; cron processes should be preferred over daemon processes. Similarly, if they are services that open ports and service requests, have you considered making them work with inetd? Again, a single daemon (inetd) should be preferred to a bunch of daemon processes.
Third, saving a PID in a file is not very effective, as you've discovered. Perhaps a shared IPC, like a semaphore, would work better. I don't have any details here though.
Fourth, sometimes I need stuff to run in the context of the website. I use a cron job that calls wget on a maintenance URL. You set a special cookie and include the cookie info on the wget command line; if the special cookie isn't present, the view returns 403 rather than performing the maintenance work. The other benefit here is that database logins and other environmental concerns are avoided, since the same code that serves normal web pages is serving the maintenance process.
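As a sketch of that pattern (the cookie name, its value, and run_maintenance_tasks are made-up names for illustration), the Django side could look like:

from django.http import HttpResponse, HttpResponseForbidden

MAINTENANCE_COOKIE = 'maint_secret'            # hypothetical cookie name
MAINTENANCE_VALUE = 'some-long-random-string'  # shared secret known only to the cron job

def maintenance(request):
    if request.COOKIES.get(MAINTENANCE_COOKIE) != MAINTENANCE_VALUE:
        return HttpResponseForbidden()         # 403 for anyone without the cookie
    run_maintenance_tasks()                    # placeholder for the actual periodic work
    return HttpResponse('done')

The cron side then just calls wget with a matching --header="Cookie: ..." argument pointed at that URL.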
Hope that gives you ideas. I think avoiding daemons if you can is the best place to start. If you can run your python within mod_wsgi that saves you having to support multiple "environments". Debugging a process that fails after running for days at a time is just brutal.
I'm writing some code that needs to run on different OS platforms and interact with separate processes. To write tests for it, I need to be able to create processes from python that do nothing but wait to be signaled to stop. I would like to be able to create some processes that recursively create more.
Also (this part might be a little strange), it would be best for my testing if I were able to create processes that weren't children of the creating process, so I could emulate conditions where, e.g., os.waitpid won't have permission to interact with the process, or where one process signals a factory to create a process rather than creating it directly.
If you're using Python 2.6 the multiprocessing package has some stuff you might find useful.
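For example, a do-nothing worker that just waits until it's told to stop could be sketched like this (the names are illustrative):

import multiprocessing

def wait_until_stopped(stop_event):
    stop_event.wait()                 # block until the parent signals us to exit

if __name__ == '__main__':
    stop = multiprocessing.Event()
    p = multiprocessing.Process(target=wait_until_stopped, args=(stop,))
    p.start()
    # ... exercise your code against the live process here ...
    stop.set()                        # tell the child to finish
    p.join()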
There's a very simple example on my github. If you run spawner it will create 3 processes that run separately, but use a channel to talk back to the spawner, so if you kill the spawner process the others you have started will die. I'm afraid there's a lot of redundant code in there, as I'm in the middle of a refactoring, but I hope it gives a basic idea.
Hi, let's assume I have a simple program in Python. This program is run every five minutes through cron, but I don't know how to write it so that the program allows multiple processes of itself to run simultaneously. I want to speed things up...
I'd handle the forking and process control inside your main Python program. Let cron spawn only a single process, and let that process be a master for (possibly multiple) worker processes.
As for how you can create multiple workers, there's the threading module for multithreading and the multiprocessing module for multiprocessing. You can also keep your actual worker code in separate files and use the subprocess module.
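A minimal sketch of that master/worker layout with multiprocessing (do_work and the job list stand in for your real code):

from multiprocessing import Pool

def do_work(item):
    # ... whatever one cron invocation used to do for a single item ...
    return item

if __name__ == '__main__':
    jobs = range(10)                   # placeholder list of things to process
    pool = Pool(processes=4)           # number of parallel worker processes
    results = pool.map(do_work, jobs)  # blocks until every worker has finished
    pool.close()
    pool.join()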
Now that I think about it, maybe you should use supervisord to do the actual process control and simply write the actual work code.