How do I keep track of time constantly without the use of threads? I'm asking this because threads in Python are generally looked down on, especially since they "add complexity" to any program. My program needs to receive/send WiFi commands, receive/send XBee (serial) commands, and keep track of time constantly.
What is the best solution to this? Should I go ahead and use threads or is there an alternative solution?
Outside of threading/multiprocessing, you can also use timer signals, which might be a little easier than "instrumenting the program" to make the timekeeping calls itself, or otherwise creating your own main loop.
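For instance, on Unix you can ask for a recurring SIGALRM and refresh your clock in the handler. A minimal sketch (the one-second interval is arbitrary, and this won't work on Windows):

    import signal
    import time

    def on_tick(signum, frame):
        # refresh our notion of "now" roughly once per second
        on_tick.now = time.monotonic()

    on_tick.now = time.monotonic()
    signal.signal(signal.SIGALRM, on_tick)
    signal.setitimer(signal.ITIMER_REAL, 1.0, 1.0)  # fire after 1 s, then every 1 s

    # ... main work goes here; on_tick.now is refreshed behind the scenes ...
    time.sleep(5)
    signal.setitimer(signal.ITIMER_REAL, 0)         # cancel the timer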
You can either instrument the program so that it makes regular calls to a timekeeping function, or you can use threads.
If the timekeeping could be done by a second process that'd be better in terms of management complexity, but you don't say why you need the time, so I can't judge whether that's a possibility.
Threads are not evil, they just shouldn't be your first resort.
Depending on what else you’re doing: if the other operations you’re performing just involve I/O through file descriptors, then you can use one of the select calls. Specify a timeout, so that if nothing happens within that interval, the call returns so you can update your clock before making the call again.
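As a rough sketch of that loop (a connected socket pair stands in here for the real WiFi socket and XBee serial port, since select just needs objects with a fileno()):

    import select
    import socket
    import time

    # socketpair() is only a stand-in for the real WiFi socket and serial port
    rx, tx = socket.socketpair()

    TICK = 0.1                           # wake up at least every 100 ms
    stop_at = time.monotonic() + 5       # run this demo for 5 seconds

    while time.monotonic() < stop_at:
        readable, _, _ = select.select([rx], [], [], TICK)
        now = time.monotonic()           # the clock gets updated every pass
        for fd in readable:
            data = fd.recv(4096)         # handle whatever arrived
            print(now, data)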
Related
I have some functions which may take a long time to execute, in which case I would like to cut them short (in such a case I do not care what happens in the function, or about the consequences of stopping it short).
Since these functions are not all mine, I would like to exert this control from the calling program and not implement a check within the function itself (that would be a solution if there were a loop in the function where I could check the time spent, or if I could use timeouts on calls that support them, etc.).
In other words I do not want to change the function.
Is such a mechanism available?
My immediate idea was to start a thread with the function as the worker and periodically check whether the thread is still alive, killing it if it has not come back within the time limit. Unfortunately I learned that killing a thread in a non-cooperative way is not possible (that would be a solution with a process, but using a process is not practical because it would complicate access to the existing shared objects).
EDIT: please note that this is not a duplicate of How to limit execution time of a function call in Python; the solutions there either rely on a cooperative shutdown or on using a process instead of a thread. Both are addressed in my question. I slightly modified the title to emphasize where the control is.
I'm wanting to take photos from 2 different cameras at exactly the same time (or as close as possible).
If I use multithreading or multiprocessing, the threads/processes are still started consecutively. For instance, if I start the following processes:
Take_photo_1.start()
Take_photo_2.start()
While those processes would run in parallel, the commands to start the processes are still executed sequentially. Is there any way to execute both those processes at exactly the same time?
There's no way to make this exact even if you're writing directly in machine code. Even if you have all the threads wait on a kernel barrier, that wait can take different times on different cores, and there are opcodes to process between the barrier wait and the camera get that have to get fetched and run on a system where the caches may be in different states, and there's nothing stopping the OS from stealing the CPU from one of the threads to run some completely unrelated code, and the I/O to the camera (even if it isn't serialized, which it may be) probably isn't a guaranteed static time, and so on.
When you throw an interpreted language on top of it (especially one with a GIL, like Python, which means the bytecodes between the barrier wait and the camera get can't be run in parallel)… well, you're not really changing anything; "impossible * 7" is still "impossible". But you are making it even more obvious.
Fortunately, very few real-life problems have a true hard real-time requirement like that. Instead, you have a requirement like "99.9% of the time, all camera gets should happen within +/-4ms of the desired exact 30fps". Or, maybe, "90% of the time it's within +/-1ms, 99.9% of the time it's within +/-4ms, 99.999% of the time it's within +/-20ms, as long as you don't do anything stupid like change the wall-power state of the laptop while running the code".
Or… well, only you know why you wanted "exact", and can figure out what the actual requirements are that would satisfy you.
And for that case, often the simplest thing to do is write the code the obvious way, stress test the hell out of it, see if it meets your requirements, and figure out how to optimize things only if it doesn't.
So, your existing code may well be fine.
If not, adding a shared barrier = threading.Barrier() and doing a barrier.wait() right before the camera.get() may be all you need.
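A minimal sketch of that idea, with a hypothetical take_photo() standing in for your actual capture code:

    import threading
    import time

    NUM_CAMERAS = 2
    barrier = threading.Barrier(NUM_CAMERAS)

    def take_photo(camera_id):
        # ... open/configure the (hypothetical) camera here ...
        barrier.wait()                   # both threads are released together
        print("camera %d fired at %.6f" % (camera_id, time.monotonic()))
        # the real camera.get()/capture call would go right here

    threads = [threading.Thread(target=take_photo, args=(i,)) for i in range(NUM_CAMERAS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()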
You may need to add logic to detect timer lag and re-synchronize (which you might do independently in each thread, or have whichever thread gets there first compute it and just make everyone else wait at the barrier).
You may need to rewrite the core loop in C. Or dump whichever OS you're using for one with better real-time guarantees like QNX. Or throw out the OS entirely so there's no scheduler to get in the way. Or throw out the complex superscalar CPUs and implement the whole thing as a hardware state machine. Or…
But, assuming you have reasonable requirements in the first place, you usually don't have to go very far.
I'm using CCKeyDerivationPBKDF to generate and verify password hashes in a concurrent environment and I'd like to know whether it is thread safe. The documentation of the function doesn't mention thread safety at all, so I'm currently using a lock to be on the safe side, but I'd prefer not to use a lock if I don't have to.
After going through the source code of CCKeyDerivationPBKDF(), I find it to be thread-unsafe. While CCKeyDerivationPBKDF() uses many library functions that are thread-safe (e.g. bzero), most of the user-defined functions (e.g. PRF), and the underlying functions they call, are potentially thread-unsafe (for example, due to the use of several pointers and unsafe casting of memory, e.g. in CCHMac). Unless they make all the underlying functions thread-safe, or at least add some mechanism to make it conditionally thread-safe, I would suggest sticking with your approach, or modifying the CommonCrypto code to make it thread-safe and using that.
Hope it helps.
Lacking documentation or source code, one option is to build a test app with say 10 threads looping on calls to CCKeyDerivationPBKDF with a random selection from say 10 different sets of arguments with 10 known results.
Each thread checks the result of a call to make sure it is what is expected. Each thread should also have a usleep() call for some random amount of time (bell curve sitting on say 10% of the time each call to CCKeyDerivationPBKDF takes) in this loop in order to attempt to interleave operations as much as possible.
You'll probably want to instrument it with debugging that keeps track of how much concurrency you are able to generate. With a 10% sleep time and 10 threads, you should be able to keep 9 threads concurrent.
If it makes it through an aggregate of say 100,000,000 calls without an error, I'd assume it was thread safe. Of course you could run it for much longer than that to get greater assurances.
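If it helps to see the shape of such a harness, here is a rough Python sketch of the same pattern, with hashlib.pbkdf2_hmac standing in for the real CCKeyDerivationPBKDF call (the thread count, iteration counts, and sleep jitter are all arbitrary):

    import hashlib
    import random
    import threading
    import time

    CASES = [(b"pw%d" % i, b"salt%d" % i) for i in range(10)]
    EXPECTED = {c: hashlib.pbkdf2_hmac("sha256", c[0], c[1], 1000) for c in CASES}
    errors = []

    def hammer(n_calls):
        for _ in range(n_calls):
            case = random.choice(CASES)
            out = hashlib.pbkdf2_hmac("sha256", case[0], case[1], 1000)
            if out != EXPECTED[case]:
                errors.append(case)               # a mismatch hints at a race
            time.sleep(random.uniform(0, 0.001))  # jitter to interleave the threads

    threads = [threading.Thread(target=hammer, args=(1000,)) for _ in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("errors:", len(errors))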
My script accepts arbitrary-length and -content strings of Python code, then runs them inside exec() statements. If the time to run the arbitrary code passes over some predetermined limit, then the exec() statement needs to exit and a boolean flag needs to be set to indicate that a premature exit has occurred.
How can this be accomplished?
Additional information
These pieces of code will be running in parallel in numerous threads (or at least as parallel as you can get with the GIL).
If there is an alternative method in another language, I am willing to try it out.
I plan on cleaning the code to prevent access to anything that might accidentally damage my system (file and system access, import statements, nested calls to exec() or eval(), etc.).
Options I've considered
Since the exec() statements are running in threads, use a poison pill to kill the thread. Unfortunately, I've read that poison pills do not work for all cases.
Running the exec() statements inside processes, then using process.terminate() to kill everything. But I'm running on Windows and I've read that process creation can be expensive. It also complicates communication with the code that's managing all of this.
Allowing only pre-written functions inside the exec() statements and having those functions periodically check for an exit flag then perform clean-up as necessary. This is complicated, time-consuming, and there are too many corner-cases to consider; I am looking for a simpler solution.
I know this is a bit of an oddball question that deserves a "Why would you ever want to allow arbitrary code to run in an exec() statement?" type of response. I'm trying my hand at a bit of self-evolving code. This is my major stumbling block at the moment: if you allow your code to do almost anything, then it can potentially hang forever. How do you regain control and stop it when it does?
This isn't a very detailed answer, but it's more than I wanted to put into a comment.
You may want to consider something like this other question for creating functions with timeouts, using multiprocessing as a start.
The problem with threads is that you probably can't use your poison-pill approach, as they are not workers picking up many small bits of work. A thread would be sitting there blocking on a single statement and would never get around to checking the value that tells it to exit.
You mentioned that your concern about using processes on Windows is that they are expensive. So what you might do is create your own kind of process pool (a list of processes). They all pull from a queue, and you submit new tasks to the queue. If any process exceeds the timeout, you kill it and replace it in the pool with a new one. That way you only pay the cost of creating a new process when one times out, instead of creating a new one for every task.
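A very rough sketch of that pool idea (the task, pool size, and timeout are placeholders; a real version would also need a policy for results from killed workers):

    import multiprocessing as mp
    import queue

    def slow_square(x):
        return x * x                           # placeholder task

    def worker(inbox, results):
        while True:
            func, arg = inbox.get()            # wait for the next job
            results.put(func(arg))

    if __name__ == "__main__":
        results = mp.Queue()
        # give each worker its own inbox so we know who is running which task
        pool = []
        for _ in range(4):
            inbox = mp.Queue()
            proc = mp.Process(target=worker, args=(inbox, results), daemon=True)
            proc.start()
            pool.append((proc, inbox))

        proc, inbox = pool[0]                  # hand a task to one of the workers
        inbox.put((slow_square, 7))
        try:
            print(results.get(timeout=2))      # per-task timeout
        except queue.Empty:
            proc.terminate()                   # the task hung: kill that worker...
            inbox = mp.Queue()
            proc = mp.Process(target=worker, args=(inbox, results), daemon=True)
            proc.start()
            pool[0] = (proc, inbox)            # ...and replace it in the pool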
There are a few different options here.
First, start with jdi's suggestion of using multiprocessing. It may be that Windows process creation isn't actually expensive enough to break your use case.
If it actually is a problem, what I'd personally do is use Virtual PC, or even User Mode Linux, to just run the same code in another OS, where process creation is cheap. You get a free sandbox out of that, as well.
If you don't want to do that, jdi's suggestion of process pools is a bit more work, but should work well as long as you don't have to kill processes very often.
If you really do want everything to be threads, you can do so, as long as you can restrict the way the jobs are written. If the jobs can always be cleanly unwound, you can kill them just by raising an exception. Of course they also have to not catch the specific exception you choose to raise. Obviously neither of these conditions is realistic as a general-purpose solution, but for your use case, it may be fine. The key is to make sure your code evolver never inserts any manual resource-management statements (like opening and closing a file); only with statements. (Alternatively, insert the open and close, but inside a try/finally.) And that's probably a good idea even if you're not doing things this way, because spinning off hundreds of processes that, e.g., each leak as many file handles as they can until they either time out or hit the file limit would slow your machine to a crawl.
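If you do go that route, one CPython-specific way to raise an exception in another thread is the PyThreadState_SetAsyncExc C API via ctypes; it only takes effect at a bytecode boundary, so a thread blocked inside C code won't see it. A rough sketch, with a stand-in for the generated job:

    import ctypes
    import threading
    import time

    class JobTimeout(Exception):
        pass

    def async_raise(thread, exc_type):
        # ask the interpreter to raise exc_type in `thread` at its next bytecode
        n = ctypes.pythonapi.PyThreadState_SetAsyncExc(
            ctypes.c_ulong(thread.ident), ctypes.py_object(exc_type))
        if n > 1:
            # shouldn't happen; clear the pending exception to be safe
            ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_ulong(thread.ident), None)

    def generated_job():
        # stand-in for evolved code: loops forever, cleans up only via finally
        try:
            while True:
                time.sleep(0.05)
        finally:
            print("job's finally block ran")

    def run_job():
        try:
            generated_job()
        except JobTimeout:
            pass                               # the framework, not the job, absorbs it

    t = threading.Thread(target=run_job)
    t.start()
    time.sleep(1)
    async_raise(t, JobTimeout)                 # "kill" the job after one second
    t.join()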
If you can restrict the code generator/evolver even further, you could use some form of cooperative threading (e.g., greenlets), which makes things even nicer.
Finally, you could switch from CPython to a different Python implementation that can run multiple interpreter instances in a single process. I don't know whether Jython or IronPython can do so. PyPy can, and it also has a restricted-environment sandbox, but unfortunately I think both of those (and Python 3.x support) are not-ready-for-prime-time features, which means you either have to get a special build of PyPy (probably without the JIT optimizer) or build it yourself. This might be the best long-term solution, but it's probably not what you want today.
I need to dynamically load code (comes as source), run it and get the results. The code that I load always includes a run method, which returns the needed results. Everything looks ridiculously easy, as usual in Python, since I can do
exec(source) #source includes run() definition
result = run(params)
#do stuff with result
The only problem is, the run() method in the dynamically generated code can potentially not terminate, so I need to run it for at most x seconds. I could spawn a new thread for this and specify a timeout for the .join() method, but then I cannot easily get the result out of it (or can I?). Performance is also an issue to consider, since all of this is happening inside a long while loop.
Any suggestions on how to proceed?
Edit: to clear things up per dcrosta's request: the loaded code is not untrusted, but generated automatically on the machine. The purpose for this is genetic programming.
The only "really good" solutions -- imposing essentially no overhead -- are going to be based on SIGALRM, either directly or through a nice abstraction layer; but as already remarked Windows does not support this. Threads are no use, not because it's hard to get results out (that would be trivial, with a Queue!), but because forcibly terminating a runaway thread in a nice cross-platform way is unfeasible.
This leaves high-overhead multiprocessing as the only viable cross-platform solution. You'll want a process pool to reduce process-spawning overhead (since presumably the need to kill a runaway function is only occasional, most of the time you'll be able to reuse an existing process by sending it new functions to execute). Again, Queue (the multiprocessing kind) makes getting results back easy (albeit with a modicum more caution than for the threading case, since in the multiprocessing case deadlocks are possible).
If you don't need to strictly serialize the executions of your functions, but rather can arrange your architecture to try two or more of them in parallel, AND are running on a multi-core machine (or multiple machines on a fast LAN), then suddenly multiprocessing becomes a high-performance solution, easily paying back for the spawning and IPC overhead and more, exactly because you can exploit as many processors (or nodes in a cluster) as you can use.
You could use the multiprocessing library to run the code in a separate process, and call .join() on the process to wait for it to finish, with the timeout parameter set to whatever you want. The library provides several ways of getting data back from another process - using a Value object (seen in the Shared Memory example on that page) is probably sufficient. You can use the terminate() call on the process if you really need to, though it's not recommended.
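A minimal sketch of that approach, assuming the generated run() returns a single float (the SOURCE string and the 5-second timeout are placeholders for your real generated code and limit):

    import multiprocessing as mp

    SOURCE = "def run():\n    return 42.0\n"   # stand-in for the generated code

    def execute(source, result):
        scope = {}
        exec(source, scope)                    # the generated source defines run()
        result.value = scope["run"]()

    if __name__ == "__main__":
        result = mp.Value("d", 0.0)            # shared double, as in the Shared Memory example
        p = mp.Process(target=execute, args=(SOURCE, result))
        p.start()
        p.join(timeout=5)                      # give run() at most 5 seconds
        if p.is_alive():
            p.terminate()                      # run() didn't finish in time
            p.join()
            print("timed out")
        else:
            print("result:", result.value)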
You could also use Stackless Python, as it allows for cooperative scheduling of microthreads. Here you can specify a maximum number of instructions to execute before returning. Setting up the routines and getting the return value out is a little more tricky though.
I could spawn a new thread for this, and specify a time for .join() method, but then I cannot easily get the result out of it
If the timeout expires, that means the method didn't finish, so there's no result to get. If you have incremental results, you can store them somewhere and read them out however you like (keeping threadsafety in mind).
Using SIGALRM-based approaches is dicey, because the signal can be delivered asynchronously at any time, even during an except or finally handler where you're not expecting one. (Other languages deal with this better, unfortunately.) For example:
    try:
        # code
    finally:
        cleanup1()
        cleanup2()
        cleanup3()
The exception raised by the SIGALRM handler might arrive during cleanup2(), which would cause cleanup3() to never be executed. Python simply does not have a way to terminate a running thread in a way that's both uncooperative and safe.
You should just have the code check the timeout on its own.
    import threading
    from datetime import datetime, timedelta

    local = threading.local()

    class ExecutionTimeout(Exception):
        pass

    def start(max_duration=timedelta(seconds=1)):
        # record when this thread's work began and how long it may run
        local.start_time = datetime.now()
        local.max_duration = max_duration

    def check():
        # raise if this thread has been running longer than its allowance
        if datetime.now() - local.start_time > local.max_duration:
            raise ExecutionTimeout()

    def do_work():
        start()
        while True:
            check()
            # do stuff here
        return 10

    try:
        print(do_work())
    except ExecutionTimeout:
        print("Timed out")
(Of course, this belongs in a module, so the code would actually look like "timeout.start()"; "timeout.check()".)
If you're generating code dynamically, then generate a timeout.check() call at the start of each loop.
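For instance, a generated snippet could come out looking roughly like this, assuming the helpers above are saved as a timeout module:

    from datetime import timedelta
    import timeout                      # hypothetical module holding start()/check() from above

    timeout.start(timedelta(seconds=2))
    total = 0
    for i in range(10 ** 9):
        timeout.check()                 # injected by the code generator at each loop head
        total += i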
Consider using the stopit package, which can be useful in some cases where you need timeout control. Its documentation is upfront about the limitations.
https://pypi.python.org/pypi/stopit
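From memory of the package's README (double-check the actual docs, which also spell out the limitations), usage looks roughly like this:

    import stopit

    with stopit.ThreadingTimeout(2) as ctx:   # give the block ~2 seconds
        while True:                           # stand-in for a long-running call
            pass

    if ctx.state == ctx.TIMED_OUT:
        print("timed out")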
A quick Google search for "python timeout" reveals a TimeoutFunction class.
Executing untrusted code is dangerous, and should usually be avoided unless it's impossible to do so. I think you're right to be worried about the time of the run() method, but the run() method could do other things as well: delete all your files, open sockets and make network connections, begin cracking your password and email the result back to an attacker, etc.
Perhaps if you can give some more detail on what the dynamically loaded code does, the SO community can help suggest alternatives.