I've got a program that combines multiple color-coded images (up to 48 of them). Running through just 32 of the 1024x768 images takes about 7.5 minutes to process and produce the output image. I'm handling each image separately using GetData (wxPython) and then going through each image pixel by pixel to get the underlying value, adding it to the final image list before creating/displaying the final image. The entire program works just like I want it to, with one SLOW exception: it takes 7.5 minutes to run. If I want to do anything slightly different, I have to start all over.
What I'm wondering is whether going through the images with threading would save time, but I'm concerned about one possible problem: different threads calling the same exact code at the same time. Is that something to be worried about or not? And is there anything else I'm not thinking of that I should be worried about?
Read about the GIL: threading won't do the trick for your problem. It helps when you can represent your computation as some kind of pipeline, or when it does a lot of IO (because only then is the synchronization useful). If you just run the same CPU-bound code on several threads instead of in a single-threaded loop, you'll end up with the computations in a random order, still executing one thread at a time.
You probably want to map your data (maybe parts of a picture, maybe whole pictures) across many processes, since they can work in parallel without worrying about the GIL: each one is a separate interpreter process. You can use multiprocessing.Pool here.
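A minimal sketch of that approach, assuming the per-image work can be expressed as a standalone function; process_image, the combining step, and the file names are hypothetical stand-ins for what the question describes (I've used PIL for loading, since wxPython objects don't pickle well across process boundaries):

from multiprocessing import Pool
from PIL import Image  # stand-in for the wxPython loading code
import numpy as np

def process_image(path):
    # Hypothetical per-image step: load one image and pull out its
    # underlying color-coded values as an array.
    return np.asarray(Image.open(path))

def combine(paths):
    with Pool() as pool:  # defaults to one worker per CPU core
        arrays = pool.map(process_image, paths)
    # Hypothetical combining step; swap in your own merge logic.
    return np.maximum.reduce(arrays)

if __name__ == "__main__":  # guard is required for multiprocessing on Windows
    final = combine(["image_{:02d}.png".format(i) for i in range(32)])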
I noticed a lack of good soundfont-compatible synthesizers written in Python. So, a month or so ago, I started some work on my own (for reference, it's here). Making this was also a challenge that I set for myself.
I keep coming up against the same problem again and again and again, summarized by this:
- To play sound, a stream of data with a more-or-less constant rate of flow must be sent to the audio device.
- To synthesize sound in real time based on user input, little to no buffering can be used.
- Thus, there is a cap on the amount of time one 'buffer generation loop' can take (for example, at a 44.1 kHz sample rate with a 256-sample buffer, each buffer must be generated in roughly 256/44100 ≈ 5.8 ms).
- Python, as a language, simply cannot run fast enough to synthesize sound within this time limit.
The problem is not my code, or at least I've tried to optimize it to extreme levels: caching attributes in local variables in time-sensitive parts of the code, avoiding dotted attribute lookups in loops, using itertools for iteration, using built-in C-level functions like max, changing thread switching parameters, doing as few calculations as possible, making approximations; the list goes on.
Using PyPy helps, but even that starts to struggle before too long.
It's worth noting that, at best, my synth can currently play about 25 notes simultaneously. But this isn't enough. FluidSynth, a synth written in C, caps the number of notes per instrument at 128, and it also supports multiple instruments at a time.
Is my assertion that Python simply cannot be used to write a synthesizer correct? Or am I missing something very important?
I am totally new to Ray and have a question regarding it being a potential solution.
I am optimising an image modelling code and have successfully optimised it to run on a single machine, using multi-threaded numpy operations.
Each image generation is a serial operation, which scales across a single node.
What I’d like to do is scale each of these locally parallel jobs across multiple nodes.
Before refactoring, the code was parallelised at a high level over serial image computations, calculating single images in parallel. I would like to replicate this parallel behaviour again, across multiple nodes. Essentially this would be batch running a number of independent jobs, each of which computes a single image in parallel on its own node; the computations themselves are independent of each other, and the only communication requirement is sending parameters at the beginning (small) and image arrays at the end (large).
As mentioned, the original parallel implementation used joblib to parallelise the serial image computation over CPUs locally, with each image calculation on a separate CPU. Now I want to replicate this, except with one image-calculation process per node, each of which will then multithread across its compute node.
So my idea is to try the Ray backend for joblib to control this process. This is the previous high-level joblib call for running multiple serial image computations in parallel:
[code screenshot of the joblib Parallel call, not reproduced]
I believe I can just encapsulate the above call with:
with parallel_backend('ray'):
The above loop is actually being called inside a method of a class, and the image computation uses the class's self attributes to pass around variables and arrays. Is there anything I have to do with actors to preserve this state?
Any thoughts or pointers would be greatly appreciated.
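For reference, here is a minimal sketch of what the Ray-backed joblib call might look like; compute_image and param_list are hypothetical stand-ins for your actual method and inputs:

import joblib
from ray.util.joblib import register_ray

register_ray()  # registers "ray" as an available joblib backend

with joblib.parallel_backend("ray"):
    # Each delayed call is shipped to a Ray worker, which may run on
    # another node; results come back as ordinary Python objects.
    images = joblib.Parallel(n_jobs=-1)(
        joblib.delayed(compute_image)(params) for params in param_list
    )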
Right now I'm in need of a really fast screenshotter to feed screenshots into a CNN that will update mouse movement based on the screenshot. I'm looking to model the same kind of behavior presented in this paper, and similarly do the steps featured in Figure 6 (without the polar conversion). Since I need really fast input, I've searched around a bit and got this script (from here, slightly modified), which outputs 10 fps:
from PIL import ImageGrab
from datetime import datetime

while True:
    im = ImageGrab.grab([320, 180, 1600, 900])  # bbox: left, top, right, bottom
    dt = datetime.now()
    fname = "pic_{}.{}.png".format(dt.strftime("%H%M_%S"), dt.microsecond // 100000)
    im.save(fname, 'png')
Can I expect anything faster? I'd be fine with using a different program if it's available.
Writing to disk is very slow, and is probably a big part of what's making your loop take so long. Try commenting out the im.save() line and seeing how many screenshots can be captured (add a counter variable or something similar to count how many screenshots are being captured).
Assuming the disk I/O is the bottleneck, you'll want to split these two tasks up. Have one loop that just captures the screenshots and stores them in memory (e.g. to a dictionary with the timestamp as the key), then in a separate thread pull elements out of the dictionary and write them to disk.
See this question for pointers on threading in Python if you haven't done much of that before.
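A minimal sketch of that split, using a queue.Queue rather than a dictionary (same idea, and the queue handles the locking between threads for you):

import queue
import threading
from PIL import ImageGrab
from datetime import datetime

shots = queue.Queue()

def writer():
    # Background thread: pulls screenshots off the queue and does the
    # slow disk I/O without stalling the capture loop.
    while True:
        fname, im = shots.get()
        im.save(fname, 'png')
        shots.task_done()

threading.Thread(target=writer, daemon=True).start()

while True:
    im = ImageGrab.grab([320, 180, 1600, 900])
    dt = datetime.now()
    fname = "pic_{}.{}.png".format(dt.strftime("%H%M_%S"), dt.microsecond // 100000)
    shots.put((fname, im))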
I'm a beginner with Python, but I'm at the final stages of a project I've been working on for the past year and I need help at the final step.
If needed I'll post my code though it's not really relevant.
Here is my problem:
I have a database of images, say, for example, 100 images. On each of those images, I run an algorithm called ICA. This algorithm is computationally heavy: each picture usually takes 7-10 seconds, so 100 pictures can take 700-1000 seconds, and that's way too long to wait.
Thing is, my database of images never changes. I never add or delete pictures, so the output of the ICA algorithm is always the same. In reality, every time I run my code, I wait forever and get the same output every time.
Is there a way to save the data to the hard disk, and extract it at a later time?
Say, I compute the ICA of the 100 images, it takes forever, and I save it and close my computer. Now when I run the program, I don't want it to recompute ICA, I want it to use the values I stored previously.
Would such a thing be possible in Python? If so, how?
Since you're running computation-heavy algorithms, I'm going to assume you're using NumPy. If not, you should be.
NumPy has a numpy.save() function that lets you save arrays in binary format. You can then load them using numpy.load().
EDIT: Docs for the aforementioned functions can be found here under the "NPZ Files" section.
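A minimal sketch of the resulting caching pattern, where run_ica_on_all_images is a hypothetical stand-in for your existing computation:

import os
import numpy as np

CACHE = "ica_results.npy"

if os.path.exists(CACHE):
    results = np.load(CACHE)           # fast: reuse the saved output
else:
    results = run_ica_on_all_images()  # slow: only ever happens once
    np.save(CACHE, results)            # writes ica_results.npy to disk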
I want to apply a 3x3 or larger image filter (gaussian or median) on a 2-d array.
Though there are several ways of doing that, such as scipy.ndimage.gaussian_filter or applying a loop, I want to know if there is a way to apply a 3x3 or larger filter to each pixel of an mxn array simultaneously, because it would save a lot of time by bypassing loops. Could functional programming be used for this purpose?
There is a function called scipy.ndimage.filters.convolve; please tell me whether it is able to perform simultaneous operations.
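For reference, a minimal sketch of applying such filters without an explicit Python loop (in recent SciPy these functions live directly under scipy.ndimage; older versions expose them via scipy.ndimage.filters):

import numpy as np
from scipy import ndimage

img = np.random.rand(480, 640)          # example m x n array

smoothed = ndimage.gaussian_filter(img, sigma=1.0)  # Gaussian filter
medianed = ndimage.median_filter(img, size=3)       # 3x3 median filter

kernel = np.ones((3, 3)) / 9.0          # simple 3x3 mean filter
averaged = ndimage.convolve(img, kernel, mode='reflect')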
You may want to learn about parallel processing in Python:
http://wiki.python.org/moin/ParallelProcessing
or the multiprocessing package in particular:
http://docs.python.org/library/multiprocessing.html
Check out using the Python Imaging Library (PIL) on multiprocessors.
Using multiprocessing with the PIL
and similar questions.
You could create four workers, divide your image into four, and assign each quadrant to a worker. You will likely lose some time to overhead, however. If, on the other hand, you have several images to process, then this approach may work (letting each worker open its own image).
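A minimal sketch of that quadrant idea, using a hypothetical 3x3 median filter as the per-quadrant work (note the caveat in the comments about seams):

import numpy as np
from multiprocessing import Pool
from scipy import ndimage

def filter_quadrant(quad):
    return ndimage.median_filter(quad, size=3)

def parallel_filter(img):
    h, w = img.shape
    quads = [img[:h//2, :w//2], img[:h//2, w//2:],
             img[h//2:, :w//2], img[h//2:, w//2:]]
    with Pool(4) as pool:  # one worker per quadrant
        out = pool.map(filter_quadrant, quads)
    # Stitch the quadrants back together. Pixels along the seams are
    # approximate unless each quadrant is given a one-pixel halo of
    # overlap with its neighbours.
    return np.vstack([np.hstack(out[:2]), np.hstack(out[2:])])

if __name__ == "__main__":
    result = parallel_filter(np.random.rand(480, 640))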
Even if Python did provide functionality to apply an operation to an NxM array without looping over it, the operation would still not be executed simultaneously in the background, since the number of instructions a CPU can handle per cycle is limited, and thus no time could be saved. For your use case this might even be counterproductive, since the fields in your arrays probably have dependencies, and if you don't know in what order they are accessed, this will most likely end up in a mess.
Hugues provided some useful links about parallel processing in Python, but be careful when accessing the same data structure, such as an array, with multiple threads at the same time. If you don't synchronize the threads, they might access the same part of the array at the same time and mess things up.
And be aware that the number of threads that can effectively run in parallel is limited by the number of processor cores.