I'm wanting to take photos from 2 different cameras at exactly the same time (or as close as possible).
If I use multithreading or multiprocessing, it still runs the threads/processes consecutively.. For instance if I start the following processes:
Take_photo_1.start()
Take_photo_2.start()
While those processes would run in parallel, the commands to start the processes are still executed sequentially. Is there any way to execute both those processes at exactly the same time?
There's no way to make this exact even if you're writing directly in machine code. Even if you have all the threads wait on a kernel barrier, that wait can take different times on different cores, and there are opcodes to process between the barrier wait and the camera get that have to get fetched and run on a system where the caches may be in different states, and there's nothing stopping the OS from stealing the CPU from one of the threads to run some completely unrelated code, and the I/O to the camera (even if it isn't serialized, which it may be) probably isn't a guaranteed static time, and so on.
When you throw an interpreted language on top of it (especially one with a GIL, like Python, which means the bytecodes between the barrier wait and the camera get can't be run in parallel)… well, you're not really changing anything; "impossible * 7" is still "impossible". But you are making it even more obvious.
Fortunately, very few real-life problems have a true hard real-time requirement like that. Instead, you have a requirement like "99.9% of the time, all camera gets should happen within +/-4ms of the desired exact 30fps". Or, maybe, "90% of the time it's within +/-1ms, 99.9% of the time it's within +/-4ms, 99.999% of the time it's within +/-20ms, as long as you don't do anything stupid like change the wall-power state of the laptop while running the code".
Or… well, only you know why you wanted "exact", and can figure out what the actual requirements are that would satisfy you.
And for that case, often the simplest thing to do is write the code the obvious way, stress test the hell out of it, see if it meets your requirements, and figure out how to optimize things only if it doesn't.
So, your existing code may well be fine.
If not, adding a shared barrier = threading.Barrier() and doing a barrier.wait() right before the camera.get() may be all you need.
You may need to add logic to detect timer lag and re-synchronize (which you might do independently in each thread, or have whichever thread gets there first compute it and just make everyone else wait at the barrier).
You may need to rewrite the core loop in C. Or dump whichever OS you're using for one with better real-time guarantees like QNX. Or throw out the OS entirely so there's no scheduler to get in the way. Or throw out the complex superscalar CPUs and implement the whole thing as a hardware state machine. Or…
But, assuming you have reasonable requirements in the first place, you usually don't have to go very far.
Related
In Python, while I was testing a bruteforce script I saw that not printing something like Trying Password: *password* with every attempt significantly decreases the time it takes in order to find the password. I just let it run on a blank screen but if I put something as simple as a loading animation (Running . . .)in the beginning to let me know it's working fine, would that slow down my program too?
(Excuse me if any of what I said was hard to understand. I'm confused as well)
When attempting a bruteforce, it's best to have as much processing power available. A constant call from Python to update the screen (with a loading status, in this case) takes up some processing power and would indeed slow down the bruteforce.
By how much it slows down depends on how your script is written and the hardware it's running on. Better hardware - faster. Better threading for the script - faster. You might be able to avoid a noticeable impact if you offload the "animation" to a thread which isn't fully utilized (if your script leaves any such threads in the first place).
Though unless you are on a very slow PC, the main slow down probably doesn't come from the CPU, but from the data bus. Sending information between components at a very rapid pace could cause a bottleneck. So if your script waits for that bottleneck to pass before it continues cycling passwords - it gets slowed down. Try to separate the "loading" status from the rest of the logic, so that the CPU can keep cycling passwords without waiting for each screen refresh to pass.
I hope this helped.
I/O bound operations like printing are very slow compared to CPU bound ops like calculations.
So, everytime you printed, trying password, your program could have tried 1000 more combinations.
But if you want to print once in the beginning, it wont slow down, printing repetitively will.
I'm not a Python programmer and i rarely work with linux but im forced to use it for a project. The project is fairly straightforward, one task constantly gathers information as a single often updating numpy float32 value in a class, the other task that is also running constantly needs to occasionally grab the data in that variable asynchronously but not in very time critical way, and entirely on linux. My default for such tasks is the creation of a thread but after doing some research it appears as if python threading might not be the best solution for this from what i'm reading.
So my question is this, do I use multithreading, multiprocessing, concurrent.futures, asyncio, or (just thinking out loud here) some stdin triggering / stdout reading trickery or something similar to DDS on linux that I don't know about, on 2 seperate running scripts?
Just to append, both tasks do a lot of IO, task 1 does a lot of USB IO, the other task does a bit serial and file IO. I'm not sure if this is useful. I also think the importance of resetting the data once pulled and having as little downtime in task 1 as possible should be stated. Having 2 programs talk via a file probably won't satisfy this.
Any help would be appreciated, this has proven to be a difficult use case to google for.
Threading will probably work fine, a lot of the problems with the BKL (big kernel lock) are overhyped. You just need to make sure both threads provide active opportunities for the scheduler to switch contexts on a regular basis, typically by calling sleep(0). Most threads run in a loop and if the loop body is fairly short then calling sleep(0) at the top or bottom of it on every iteration is usually enough. If the loop body is long you might want to put a few more in along the way. It’s just a hint to the scheduler that this would be a good time to switch if other threads want to run.
Have you considered a double ended queue? You write to one end and grab data from the other end. Then with your multi-threading you could write with one thread and read with the other:
https://docs.python.org/3/library/collections.html#collections.deque
Quoting the documentation: "Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction."
Does the presence of python GIL imply that in python multi threading the same operation is not so different from repeating it in a single thread?.
For example, If I need to upload two files, what is the advantage of doing them in two threads instead of uploading them one after another?.
I tried a big math operation in both ways. But they seem to take almost equal time to complete.
This seems to be unclear to me. Can someone help me on this?.
Thanks.
Python's threads get a slightly worse rap than they deserve. There are three (well, 2.5) cases where they actually get you benefits:
If non-Python code (e.g. a C library, the kernel, etc.) is running, other Python threads can continue executing. It's only pure Python code that can't run in two threads at once. So if you're doing disk or network I/O, threads can indeed buy you something, as most of the time is spent outside of Python itself.
The GIL is not actually part of Python, it's an implementation detail of CPython (the "reference" implementation that the core Python devs work on, and that you usually get if you just run "python" on your Linux box or something.
Jython, IronPython, and any other reimplementations of Python generally do not have a GIL, and multiple pure-Python threads can execute simultaneously.
The 0.5 case: Even if you're entirely pure-Python and see little or no performance benefit from threading, some problems are really convenient in terms of developer time and difficulty to solve with threads. This depends in part on the developer, too, of course.
It really depends on the library you're using. The GIL is meant to prevent Python objects and its internal data structures to be changed at the same time. If you're doing an upload, the library you use to do the actual upload might release the GIL while it's waiting for the actual HTTP request to complete (I would assume that is the case with the HTTP modules in the standard library, but I didn't check).
As a side note, if you really want to have things running in parallel, just use multiple processes. It will save you a lot of trouble and you'll end up with better code (more robust, more scalable, and most probably better structured).
It depends on the native code module that's executing. Native modules can release the GIL and then go off and do their own thing allowing another thread to lock the GIL. The GIL is normally held while code, both python and native, are operating on python objects. If you want more detail you'll probably need to go and read quite a bit about it. :)
See:
What is a global interpreter lock (GIL)? and Thread State and the Global Interpreter Lock
Multithreading is a concept where two are more tasks need be completed simultaneously, for example, I have word processor in this application there are N numbers of a parallel task have to work. Like listening to keyboard, formatting input text, sending a formatted text to display unit. In this context with sequential processing, it is time-consuming and one task has to wait till the next task completion. So we put these tasks in threads and simultaneously complete the task. Three threads are always up and waiting for the inputs to arrive, then take that input and produce the output simultaneously.
So multi-threading works faster if we have multi-core and processors. But in reality with single processors, threads will work one after the other, but we feel it's executing with greater speed, Actually, one instruction executes at a time and a processor can execute billions of instructions at a time. So the computer creates illusion that multi-task or thread working parallel. It just an illusion.
I'm working on simulating a mesh network with a large number of nodes. The nodes pass data between different master nodes throughout the network.
Each master comes live once a second to receive the information, but the slave nodes don't know when the master is up or not, so when they have information to send, they try and do so every 5 ms for 1 second to make sure they can find the master.
Running this on a regular computer with 1600 nodes results in 1600 threads and the performance is extremely bad.
What is a good approach to handling the threading so each node acts as if it is running on its own thread?
In case it matters, I'm building the simulation in python 2.7, but I'm open to changing to something else if that makes sense.
For one, are you really using regular, default Python threads available in the default Python 2.7 interpreter (CPython), and is all of your code in Python? If so, you are probably not actually using multiple CPU cores because of the global interpreter lock CPython has (see https://wiki.python.org/moin/GlobalInterpreterLock). You could maybe try running your code under Jython, just to check if performance will be better.
You should probably rethink your application architecture and switch to manually scheduling events instead of using threads, or maybe try using something like greenlets (https://stackoverflow.com/a/15596277/1488821), but that would probably mean less precise timings because of lack of parallelism.
To me, 1600 threads sounds like a lot but not excessive given that it's a simulation. If this were a production application it would probably not be production-worthy.
A standard machine should have no trouble handling 1600 threads. As to the OS this article could provide you with some insights.
When it comes to your code a Python script is not a native application but an interpreted script and as such will require more CPU resources to execute.
I suggest you try implementing the simulation in C or C++ instead which will produce a native application which should execute more efficiently.
Do not use threading for that. If sticking to Python, let the nodes perform their actions one by one. If the performance you get doing so is OK, you will not have to use C/C++. If the actions each node perform are simple, that may work. Anyway, there is no reason to use threads in Python at all. Python threads are usable mostly for making blocking I/O not to block your program, not for multiple CPU kernels utilization.
If you want to really use parallel processing and to write your nodes as if they were really separated and exchanging only using messages, you may use Erlang (http://www.erlang.org/). It is a functional language very well suited for executing parallel processes and making them exchange messages. Erlang processes do not map to OS threads, and you may create thousands of them. However, Erlang is a purely functional language and may seem extremely strange if you have never used such languages. And it also is not very fast, so, like Python, it is unlikely to handle 1600 actions every 5ms unless the actions are rather simple.
Finally, if you do not get desired performance using Python or Erlang, you may move to C or C++. However, still do not use 1600 threads. In fact, using threads to gain performance is reasonable only if the number of threads does not dramatically exceed number of CPU kernels. A reactor pattern (with several reactor threads) is what you may need in that case (http://en.wikipedia.org/wiki/Reactor_pattern). There is an excellent implementation of the reactor pattern in boost.asio library. It is explained here: http://www.gamedev.net/blog/950/entry-2249317-a-guide-to-getting-started-with-boostasio/
Some random thoughts here:
I did rather well with several hundred threads working like this in Java; it can be done with the right language. (But I haven't tried this in Python.)
In any language, you could run the master node code in one thread; just have it loop continuously, running the code for each master in each cycle. You'll lose the benefits of multiple cores that way, though. On the other hand, you'll lose the problems of multithreading, too. (You could have, say, 4 such threads, utilizing the cores but getting the multithreading headaches back. It'll keep the thread-overhead down, too, but then there's blocking...)
One big problem I had was threads blocking each other. Enabling 100 threads to call the same method on the same object at the same time without waiting for each other requires a bit of thought and even research. I found my multithreading program at first often used only 25% of a 4-core CPU even when running flat out. This might be one reason you're running slow.
Don't have your slave nodes repeat sending data. The master nodes should come alive in response to data coming in, or have some way of storing it until they do come alive, or some combination.
It does pay to have more threads than cores. Once you have two threads, they can block each other (and will if they share any data). If you have code to run that won't block, you want to run it in its own thread so it won't be waiting for code that does block to unblock and finish. I found once I had a few threads, they started to multiply like crazy--hence my hundreds-of-threads program. Even when 100 threads block at one spot despite all my brilliance, there's plenty of other threads to keep the cores busy!
My script accepts arbitrary-length and -content strings of Python code, then runs them inside exec() statements. If the time to run the arbitrary code passes over some predetermined limit, then the exec() statement needs to exit and a boolean flag needs to be set to indicate that a premature exit has occurred.
How can this be accomplished?
Additional information
These pieces of code will be running in parallel in numerous threads (or at least as parallel as you can get with the GIL).
If there is an alternative method in another language, I am willing to try it out.
I plan on cleaning the code to prevent access to anything that might accidentally damage my system (file and system access, import statements, nested calls to exec() or eval(), etc.).
Options I've considered
Since the exec() statements are running in threads, use a poison pill to kill the thread. Unfortunately, I've read that poison pills do not work for all cases.
Running the exec() statements inside processes, then using process.terminate() to kill everything. But I'm running on Windows and I've read that process creation can be expensive. It also complicates communication with the code that's managing all of this.
Allowing only pre-written functions inside the exec() statements and having those functions periodically check for an exit flag then perform clean-up as necessary. This is complicated, time-consuming, and there are too many corner-cases to consider; I am looking for a simpler solution.
I know this is a bit of an oddball question that deserves a "Why would you ever want to allow arbitrary code to run in an exec() statement?" type of response. I'm trying my hand at a bit of self-evolving code. This is my major stumbling block at the moment: if you allow your code to do almost anything, then it can potentially hang forever. How do you regain control and stop it when it does?
This isn't a very detailed answer, but its more than I wanted to put into a comment.
You may want to consider something like this other question for creating functions with timeouts, using multiprocessing as a start.
The problem with threads is that you probably can't use your poison pill approach, as they are not workers taking many small bits of tasks. They would be sitting there blocking on a statement. It would never get the value to exit.
You mentioned that your concern about using processes on Windows is that they are expensive. So what you might do is create your own kind of process pool (a list of processes). They are all pulling from a queue, and you submit new tasks to the queue. If any process exceeds the timeout, you kill it, and replace it in the pool with a new one. That way you limit the overhead of creating new processes only to when they are timing out, instead of creating a new one for every task.
There are a few different options here.
First, start with jdi's suggestion of using multiprocessing. It may be that Windows process creation isn't actually expensive enough to break your use case.
If it actually is a problem, what I'd personally do is use Virtual PC, or even User Mode Linux, to just run the same code in another OS, where process creation is cheap. You get a free sandbox out of that, as well.
If you don't want to do that, jdi's suggestion of processes pools is a bit more work, but should work well as long as you don't have to kill processes very often.
If you really do want everything to be threads, you can do so, as long as you can restrict the way the jobs are written. If the jobs can always be cleanly unwound, you can kill them just by raising an exception. Of course they also have to not catch the specific exception you choose to raise. Obviously neither of these conditions is realistic as a general-purpose solution, but for your use case, it may be fine. The key is to make sure your code evolver never inserts any manual resource-management statements (like opening and closing a file); only with statements. (Alternatively, insert the open and close, but inside a try/finally.) And that's probably a good idea even if you're not doing things this way, because spinning off hundreds of processes that, e.g., each leak as many file handles as they can until they either time out or hit the file limit would slow your machine to a crawl.
If you can restrict the code generator/evolver even further, you could use some form of cooperative threading (e.g., greenlets), which makes things even nicer.
Finally, you could switch from CPython to a different Python implementation that can run multiple interpreter instances in a single process. I don't know whether jython or IronPython can do so. PyPy can do that, and also has a restricted-environment sandbox, but unfortunately I think both of those—and Python 3.x support—are not-ready-for-prime-time features, which means you either have to get a special build of PyPy (probably without the JIT optimizer), or build it yourself. This might be the best long-term solution, but it's probably not what you want today.