How to iterate over split-up images efficiently with multiprocessing?

How to iterate over split-up images efficiently with multiprocessing? - python

I am studying information system engineering and need to suddenly use multi-threading/processing in my side job but we were never taught this at the university so I dont have any idea about it. I only read some introduction over the last couple of days. With that help I even got some simply test programs to perform better, but I wasnt able to do it for this task.
I have a picture with edges of objects as white dots (value 255 in an array).
Now I need to know the position in the array of these dots.
Because images are pretty large arrays and I need to optimize this for a raspberry pi I need to program this with multiprocessing in mind because in a test I only got exactly 25% CPU usage on the pi over 1 core and that was not enough for real time video processing.
def test_multi(self):
pool = mp.Pool(processes=4)
test_prep = self.test_prep.copy()
y, x = test_prep.shape
columns = []
for i in range(4):
c = test_prep[0:y, 0:int(x/4)*(i+1)]
c = copy(c)
columns.append(c)
results = [pool.apply_async(search, args=(c,)) for c in columns]
for r in results:
print(r.get())
def search(picture):
ys, xs = picture.shape
dots = []
for x in range(xs):
for y in range(ys):
if picture[y][x] == 255:
dots.append((x, y))
return dots
But in the result list I can see that it is not really done in parallel because there are 4 entries and each one is just the last one plus some new dots from the next image part. So I think the processes wait for the one before to finish and only than add to the same list but I would like 4 separate lists that I will join afterwards. Also it takes a lot longer than just iterating over the picture the old fashioned way. So what is the problem ?
I tried switching around with multithreading but some people were saying that GIL would not allow the distribution over separate cores.
I tried reading the multiprocessing doc and found a lot of possibilities like queues, managers, pools and pipes but I don't really now what the differences are and when to use what.
I want 4 lists with coordinates of dots in the 4 different columns of the image and I want the search function to be performed in parallel distributed over 4 cores. Or another process that allows a 1920*1080 image to be search faster than in half a second.
UPDATE:
As Mark Setchell suggested I am using cv2.findContour() now but as I said sometimes it detects one contour as multiple ones so I added a routine that combines the middle of the contour that are very similar like this:
def search(image):
ret, thresh = threshold(image, 0, 255, 0)
contours, hierarchy = findContours(thresh, RETR_TREE, CHAIN_APPROX_SIMPLE)
objects = []
for c in contours:
objects.append(middle(c))
any_neighbors = False
while not any_neighbors:
objects, any_neighbors = combine(objects)
return objects
def combine(objects):
ret = objects
any_neighbors = False
for o1 in objects:
for o2 in objects:
if is_neighbor(o1, o2):
ret.remove(o1)
any_neighbors = True
return ret, any_neighbors
def middle(contour):
xs = 0
ys = 0
size = len(contour)
for c in contour:
xs += c[0][0]
ys += c[0][1]
xs = int(xs/size)
ys = int(ys/size)
return xs, ys
def is_neighbor(p1, p2):
return p1[0] - th <= p2[0] <= p1[0] + th and p1[1] - th <= p2[1] <= p1[1] + th
This might still not be perfect but im done for today. I'll be back tomorrow and check if the performance improvement is enough or if I still need to

The process has become as fast as other parts of the program. Thats why multiprocessing wont be necessary anymore.

Related

How can I match an audio clip inside an audio clip with Python? [duplicate]

I have a load of 3 hour MP3 files, and every ~15 minutes a distinct 1 second sound effect is played, which signals the beginning of a new chapter.
Is it possible to identify each time this sound effect is played, so I can note the time offsets?
The sound effect is similar every time, but because it's been encoded in a lossy file format, there will be a small amount of variation.
The time offsets will be stored in the ID3 Chapter Frame MetaData.
Example Source, where the sound effect plays twice.
ffmpeg -ss 0.9 -i source.mp3 -t 0.95 sample1.mp3 -acodec copy -y
Sample 1 (Spectrogram)
ffmpeg -ss 4.5 -i source.mp3 -t 0.95 sample2.mp3 -acodec copy -y
Sample 2 (Spectrogram)
I'm very new to audio processing, but my initial thought was to extract a sample of the 1 second sound effect, then use librosa in python to extract a floating point time series for both files, round the floating point numbers, and try to get a match.
import numpy
import librosa
print("Load files")
source_series, source_rate = librosa.load('source.mp3') # 3 hour file
sample_series, sample_rate = librosa.load('sample.mp3') # 1 second file
print("Round series")
source_series = numpy.around(source_series, decimals=5);
sample_series = numpy.around(sample_series, decimals=5);
print("Process series")
source_start = 0
sample_matching = 0
sample_length = len(sample_series)
for source_id, source_sample in enumerate(source_series):
if source_sample == sample_series[sample_matching]:
sample_matching += 1
if sample_matching >= sample_length:
print(float(source_start) / source_rate)
sample_matching = 0
elif sample_matching == 1:
source_start = source_id;
else:
sample_matching = 0
This does not work with the MP3 files above, but did with an MP4 version - where it was able to find the sample I extracted, but it was only that one sample (not all 12).
I should also note this script takes just over 1 minute to process the 3 hour file (which includes 237,426,624 samples). So I can imagine that some kind of averaging on every loop would cause this to take considerably longer.

Trying to directly match waveforms samples in the time domain is not a good idea. The mp3 signal will preserve the perceptual properties but it is quite likely the phases of the frequency components will be shifted so the sample values will not match.
You could try trying to match the volume envelopes of your effect and your sample.
This is less likely to be affected by the mp3 process.
First, normalise your sample so the embedded effects are the same level as your reference effect. Constructing new waveforms from the effect and the sample by using the average of the peak values over time frames that are just short enough to capture the relevant features. Better still use overlapping frames. Then use cross-correlation in the time domain.
If this does not work then you could analyze each frame using an FFT this gives you a feature vector for each frame. You then try to find matches of the sequence of features in your effect with the sample. Similar to https://stackoverflow.com/users/1967571/jonnor suggestion. MFCC is used in speech recognition but since you are not detecting speech FFT is probably OK.
I am assuming the effect playing by itself (no background noise) and it is added to the recording electronically (as opposed to being recorded via a microphone). If this is not the case the problem becomes more difficult.

This is an Audio Event Detection problem. If the sound is always the same and there are no other sounds at the same time, it can probably be solved with a Template Matching approach. At least if there is no other sounds with other meanings that sound similar.
The simplest kind of template matching is to compute the cross-correlation between your input signal and the template.
Cut out an example of the sound to detect (using Audacity). Take as much as possible, but avoid the start and end. Store this as .wav file
Load the .wav template using librosa.load()
Chop up the input file into a series of overlapping frames. Length should be same as your template. Can be done with librosa.util.frame
Iterate over the frames, and compute cross-correlation between frame and template using numpy.correlate.
High values of cross-correlation indicate a good match. A threshold can be applied in order to decide what is an event or not. And the frame number can be used to calculate the time of the event.
You should probably prepare some shorter test files which have both some examples of the sound to detect as well as other typical sounds.
If the volume of the recordings is inconsistent you'll want to normalize that before running detection.
If cross-correlation in the time-domain does not work, you can compute the melspectrogram or MFCC features and cross-correlate that. If this does not yield OK results either, a machine learning model can be trained using supervised learning, but this requires labeling a bunch of data as event/not-event.

To follow up on the answers by #jonnor and #paul-john-leonard, they are both correct, by using frames (FFT) I was able to do Audio Event Detection.
I've written up the full source code at:
https://github.com/craigfrancis/audio-detect
Some notes though:
To create the templates, I used ffmpeg:
ffmpeg -ss 13.15 -i source.mp4 -t 0.8 -acodec copy -y templates/01.mp4;
I decided to use librosa.core.stft, but I needed to make my own implementation of this stft function for the 3 hour file I'm analysing, as it's far too big to keep in memory.
When using stft I tried using a hop_length of 64 at first, rather than the default (512), as I assumed that would give me more data to work with... the theory might be true, but 64 was far too detailed, and caused it to fail most of the time.
I still have no idea how to get cross-correlation between frame and template to work (via numpy.correlate)... instead I took the results per frame (the 1025 buckets, not 1024, which I believe relate to the Hz frequencies found) and did a very simple average difference check, then ensured that average was above a certain value (my test case worked at 0.15, the main files I'm using this on required 0.55 - presumably because the main files had been compressed quite a bit more):
hz_score = abs(source[0:1025,x] - template[2][0:1025,y])
hz_score = sum(hz_score)/float(len(hz_score))
When checking these scores, it's really useful to show them on a graph. I often used something like the following:
import matplotlib.pyplot as plt
plt.figure(figsize=(30, 5))
plt.axhline(y=hz_match_required_start, color='y')
while x < source_length:
debug.append(hz_score)
if x == mark_frame:
plt.axvline(x=len(debug), ymin=0.1, ymax=1, color='r')
plt.plot(debug)
plt.show()
When you create the template, you need to trim off any leading silence (to avoid bad matching), and an extra ~5 frames (it seems that the compression / re-encoding process alters this)... likewise, remove the last 2 frames (I think the frames include a bit of data from their surroundings, where the last one in particular can be a bit off).
When you start finding a match, you might find it's ok for the first few frames, then it fails... you will probably need to try again a frame or two later. I found it easier having a process that supported multiple templates (slight variations on the sound), and would check their first testable (e.g. 6th) frame and if that matched, put them in a list of potential matches. Then, as it progressed on to the next frames of the source, it could compare it to the next frames of the template, until all frames in the template had been matched (or failed).

This might not be an answer, it's just where I got to before I start researching the answers by #jonnor and #paul-john-leonard.
I was looking at the Spectrograms you can get by using librosa stft and amplitude_to_db, and thinking that if I take the data that goes in to the graphs, with a bit of rounding, I could potentially find the 1 sound effect being played:
https://librosa.github.io/librosa/generated/librosa.display.specshow.html
The code I've written below kind of works; although it:
Does return quite a few false positives, which might be fixed by tweaking the parameters of what is considered a match.
I would need to replace the librosa functions with something that can parse, round, and do the match checks in one pass; as a 3 hour audio file causes python to run out of memory on a computer with 16GB of RAM after ~30 minutes before it even got to the rounding bit.
import sys
import numpy
import librosa
#--------------------------------------------------
if len(sys.argv) == 3:
source_path = sys.argv[1]
sample_path = sys.argv[2]
else:
print('Missing source and sample files as arguments');
sys.exit()
#--------------------------------------------------
print('Load files')
source_series, source_rate = librosa.load(source_path) # The 3 hour file
sample_series, sample_rate = librosa.load(sample_path) # The 1 second file
source_time_total = float(len(source_series) / source_rate);
#--------------------------------------------------
print('Parse Data')
source_data_raw = librosa.amplitude_to_db(abs(librosa.stft(source_series, hop_length=64)))
sample_data_raw = librosa.amplitude_to_db(abs(librosa.stft(sample_series, hop_length=64)))
sample_height = sample_data_raw.shape[0]
#--------------------------------------------------
print('Round Data') # Also switches X and Y indexes, so X becomes time.
def round_data(raw, height):
length = raw.shape[1]
data = [];
range_length = range(1, (length - 1))
range_height = range(1, (height - 1))
for x in range_length:
x_data = []
for y in range_height:
# neighbours = []
# for a in [(x - 1), x, (x + 1)]:
# for b in [(y - 1), y, (y + 1)]:
# neighbours.append(raw[b][a])
#
# neighbours = (sum(neighbours) / len(neighbours));
#
# x_data.append(round(((raw[y][x] + raw[y][x] + neighbours) / 3), 2))
x_data.append(round(raw[y][x], 2))
data.append(x_data)
return data
source_data = round_data(source_data_raw, sample_height)
sample_data = round_data(sample_data_raw, sample_height)
#--------------------------------------------------
sample_data = sample_data[50:268] # Temp: Crop the sample_data (318 to 218)
#--------------------------------------------------
source_length = len(source_data)
sample_length = len(sample_data)
sample_height -= 2;
source_timing = float(source_time_total / source_length);
#--------------------------------------------------
print('Process series')
hz_diff_match = 18 # For every comparison, how much of a difference is still considered a match - With the Source, using Sample 2, the maximum diff was 66.06, with an average of ~9.9
hz_match_required_switch = 30 # After matching "start" for X, drop to the lower "end" requirement
hz_match_required_start = 850 # Out of a maximum match value of 1023
hz_match_required_end = 650
hz_match_required = hz_match_required_start
source_start = 0
sample_matched = 0
x = 0;
while x < source_length:
hz_matched = 0
for y in range(0, sample_height):
diff = source_data[x][y] - sample_data[sample_matched][y];
if diff < 0:
diff = 0 - diff
if diff < hz_diff_match:
hz_matched += 1
# print(' {} Matches - {} # {}'.format(sample_matched, hz_matched, (x * source_timing)))
if hz_matched >= hz_match_required:
sample_matched += 1
if sample_matched >= sample_length:
print(' Found # {}'.format(source_start * source_timing))
sample_matched = 0 # Prep for next match
hz_match_required = hz_match_required_start
elif sample_matched == 1: # First match, record where we started
source_start = x;
if sample_matched > hz_match_required_switch:
hz_match_required = hz_match_required_end # Go to a weaker match requirement
elif sample_matched > 0:
# print(' Reset {} / {} # {}'.format(sample_matched, hz_matched, (source_start * source_timing)))
x = source_start # Matched something, so try again with x+1
sample_matched = 0 # Prep for next match
hz_match_required = hz_match_required_start
x += 1
#--------------------------------------------------

Vectorizing a monte carlo simulation in python

I've recently been working on some code in python to simulate a 2 dimensional U(1) gauge theory using monte carlo methods. Essentially I have an n by n by 2 array (call it Link) of unitary complex numbers (their magnitude is one). I randomly select element of my Link array and propose a random change to the number at that site. I then compute the resulting change in the action that would occur due to that change. I then accept the change with a probability equal to min(1,exp(-dS)), where dS is the change in the action. The code for the iterator is as follows
def iteration(j1,B0):
global Link
Staple = np.zeros((2),dtype=complex)
for i0 in range(0,j1):
x1 = np.random.randint(0,n)
y1 = np.random.randint(0,n)
u1 = np.random.randint(0,1)
Linkrxp1 = np.roll(Link,-1, axis = 0)
Linkrxn1 = np.roll(Link, 1, axis = 0)
Linkrtp1 = np.roll(Link, -1, axis = 1)
Linkrtn1 = np.roll(Link, 1, axis = 1)
Linkrxp1tn1 = np.roll(np.roll(Link, -1, axis = 0),1, axis = 1)
Linkrxn1tp1 = np.roll(np.roll(Link, 1, axis = 0),-1, axis = 1)
Staple[0] = Linkrxp1[x1,y1,1]*Linkrtp1[x1,y1,0].conj()*Link[x1,y1,1].conj() + Linkrxp1tn1[x1,y1,1].conj()*Linkrtn1[x1,y1,0].conj()*Linkrtn1[x1,y1,1]
Staple[1] = Linkrtp1[x1,y1,0]*Linkrxp1[x1,y1,1].conj()*Link[x1,y1,0].conj() + Linkrxn1tp1[x1,y1,0].conj()*Linkrxn1[x1,y1,1].conj()*Linkrxn1[x1,y1,0]
uni = unitary()
Linkprop = uni*Link[x1,y1,u1]
dE3 = (Linkprop - Link[x1,y1,u1])*Staple[u1]
dE1 = B0*np.real(dE3)
d1 = np.random.binomial(1, np.minimum(np.exp(dE1),1))
d = np.random.uniform(low=0,high=1)
if d1 >= d:
Link[x1,y1,u1] = Linkprop
else:
Link[x1,y1,u1] = Link[x1,y1,u1]
At the beginning of program I call a routine called "randomize" to generate K random unitary complex numbers which have small imaginary parts and store them in an array called Cnum of length K. In the same routine I also go through my Link array and set each element to a random unitary complex number. The code is listed below.
def randommatrix():
global Cnum
global Link
for i1 in range(0,K):
C1 = np.random.normal(0,1)
Cnum[i1] = np.cos(C1) + 1j*np.sin(C1)
Cnum[i1+K] = np.cos(C1) - 1j*np.sin(C1)
for i3,i4 in itertools.product(range(0,n),range(0,n)):
C2 = np.random.uniform(low=0, high = 2*np.pi)
Link[i3,i4,0] = np.cos(C2) + 1j*np.sin(C2)
C2 = np.random.uniform(low=0, high = 2*np.pi)
Link[i3,i4,1] = np.cos(C2) + 1j*np.sin(C2)
The following routine is used during the iteration routine to get a random complex number with a small imaginary part (by retrieving a random element of the Cnum array we generated earlier).
def unitary():
I1 = np.random.randint((0),(2*K-1))
mat = Cnum[I1]
return mat
Here is an example of what the iteration routine would be used for. I've written a routine called plaquette, which calculates the mean plaquette (real part of a 1 by 1 closed loop of link variables) for a given B0. The iteration routine is being used to generate new field configurations which are independent of previous configurations. After we get a new field configuration we calculate the plaquette for said configuration. We then repeat this process j1 times using a while loop, and at the end we end up with the mean plaquette.
def Plq(j1,B0):
i5 = 0
Lboot = np.zeros(j1)
while i5<j1:
iteration(25000,B0)
Linkrxp1 = np.roll(Link,-1, axis = 0)
Linkrtp1 = np.roll(Link, -1, axis = 1)
c0 = np.real(Link[:,:,0]*Linkrxp1[:,:,1]*Linkrtp1[:,:,0].conj()*Link[:,:,1].conj())
i5 = i5 + 1
We need to define some variables before we run anything, so here's the initial variables which I define before defining any routines
K = 20000
n = 50
a = 1.0
Link = np.zeros((n,n,2),dtype = complex)
Cnum = np.zeros((2*K), dtype = complex)
This code works, but it is painfully slow. Is there a way that I can use multiprocessing or something to speed this up?

You should use cython and c data types. Another cython link. It's built for fast computation.
You could use multiprocessing, potentially, in one of two cases.
If you have one object that multiple process would need to share you would need to use Manager (see multiprocessing link), Lock, and Array to share the object between processes. However, there is no guarantee this will result in an increased speed since each process needs to lock the link to guarantee your prediction, assuming the predictions are affected by all elements in the link (if a process modifies an element at the same time another process is making a prediction for an element, the prediction wouldn't be based on the most current information).
If your predictions do not take into account the state of the other elements, i.e. it only cares about the one element, then you could break your Link array into segments and divvy chunks out to several processes in a process pool, and when done combine the segments back to one array. This would certainly save time, and you wouldn't have to use any additional multiprocessing mechanisms.

How to get return value from thread in Python?

I do some computationally expensive tasks in python and found the thread module for parallelization. I have a function which does the computation and returns a ndarray as result. Now I want to know how I can parallize my function and get back the calculated Arrays from each thread.
The followed example is strongly simplified with light functions and calculations.
import numpy as np
def calculate_result(input):
a=np.linspace(1.0, 1000.0, num=10000) # just an example
result = input*a
return(result)
input =[1,2,3,4]
for i in range(0,len(input(i))):
t.Thread(target=calculate_result, args=(input))
t. start()
#Here I want to receive the return value from the thread
I am looking for a way to get the return value from the thread / function for each thread, because in my task each thread calculates different values.
I found an other Question (how to get the return value from a thread in python?) where someone is looking for a similar problem (no ndarrays) and which is handled with ThreadPool and async...
-------------------------------------------------------------------------------
Thanks for your answers !
Due to your help now I am looking for a way to solve my problem with the multiprocessing modul. To give you a better understanding what I do, see my following Explanation.
Explanation:
My 'input_data' is an ndarray with 282240 elements of type uint32
In the 'calculation_function()'I use a for loop to calculate from
every 12 bit a result and put it into the 'output_data'
Because this is very slow, I split my input_data into e.g. 4 or 8
parts and calculate each part in the calculation_function().
Now I am looking for a way, how to parallize the 4 or 8 function
calls
The order of the data is elementary, because the data is in image and
each pixel have to be at the correct Position. So function call no. 1
calculates the first and the last function call the last pixel of the
image.
The calculations work fine and the image can be completly rebuilt
from my algo but I need the parallelization to speed up for time
critical aspects.
Summary:
One input ndarray is devided into 4 or 8 parts. In each part are 70560 or 35280 uint32 values. From each 12 bit I calculate one Pixel with 4 or 8 function calls. Each function returns one ndarray with 188160 or 94080 pixel. All return values will be put together in a row and reshaped into an image.
What allready works:
Calculations are allready working and I can reconstruct my image
Problem:
Function calls are done seriall and in a row but each image reconstruction is very slow
Main Goal:
Speed up the function calls by parallize the function calls.
Code:
def decompress(payload,WIDTH,HEIGHT):
# INPUTS / OUTPUTS
n_threads = 4
img_input = np.fromstring(payload, dtype='uint32')
img_output = np.zeros((WIDTH * HEIGHT), dtype=np.uint32)
n_elements_part = np.int(len(img_input) / n_threads)
input_part=np.zeros((n_threads,n_elements_part)).astype(np.uint32)
output_part =np.zeros((n_threads,np.int(n_elements_part/3*8))).astype(np.uint32)
# DEFINE PARTS (here 4 different ones)
start = np.zeros(n_threads).astype(np.int)
end = np.zeros(n_threads).astype(np.int)
for i in range(0,n_threads):
start[i] = i * n_elements_part
end[i] = (i+1) * n_elements_part -1
# COPY IMAGE DATA
for idx in range(0,n_threads):
input_part [idx,:] = img_input[start[idx]:end[idx]+1]
for idx in range(0,n_threads): # following line is the function_call that should be parallized
output_part[idx,:] = decompress_part2(input_part[idx],output_part[idx])
# COPY PARTS INTO THE IMAGE
img_output[0 : 188160] = output_part[0,:]
img_output[188160: 376320] = output_part[1,:]
img_output[376320: 564480] = output_part[2,:]
img_output[564480: 752640] = output_part[3,:]
# RESHAPE IMAGE
img_output = np.reshape(img_output,(HEIGHT, WIDTH))
return img_output
Please dont take care of my beginner programming style :)
Just looking for a solution how to parallize the function calls with the multiprocessing module and get back the return ndarrays.
Thank you so much for your help !

You can use process pool from the multiprocessing module
def test(a):
return a
from multiprocessing.dummy import Pool
p = Pool(3)
a=p.starmap(test, zip([1,2,3]))
print(a)
p.close()
p.join()

kar's answer works, however keep in mind that he's using the .dummy module which might be limited by the GIL. Heres more info on it:
multiprocessing.dummy in Python is not utilising 100% cpu

How to reconstruct an image that has been divided into several pieces?

I'm trying to write a code which gets the pieces of an image, that are represented by an m X m matrix. two pieces belong together if the right side of the matrix of one piece is the same as the left side of the other piece, these two pieces should be "glued".
I wrote the function join which joins two pictures together, I started writing the puzzle solving function but I got stuck and I don't know how to continue.
def join_h(mat1, mat2):
""" joins two matrices, side by side with some separation """
n1,m1 = mat1.dim()
n2,m2 = mat2.dim()
m = m1+m2+10
n = max(n1,n2)
new = Matrix(n, m, val=255) # fill new matrix with white pixels
new[:n1,:m1] = mat1
new[:n2,m1+10:m] = mat2
'''
#without slicing:
for i in range(n1):
for j in range(m1):
new[i,j] = mat1[i,j]
for i in range(n2):
for j in range(m2):
new[i,j+m1+10] = mat2[i,j]
'''
return new
def reconstruct_image(m):
matlst=getlistmatrix()
for i in range(len(matlst)):
if matlst[i][i]==matls[i+1][i+1]
def getlistmatrix():
matrixlst=[]
for i in range(1,1601):
matrixlst.append(Matrix.load('./im'+str(i) +'.bitmap'))
return matrixlst

scikit-learn library provides you with the function 'reconstruct_from_patches_2d' - sklearn.feature_extraction.image.reconstruct_from_patches_2d
If you know the order of the overlapping blocks, then you should be able to use this.
Refer: http://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.image.reconstruct_from_patches_2d.html#sklearn.feature_extraction.image.reconstruct_from_patches_2d
An example provided by them for the same - http://scikit-learn.org/stable/auto_examples/decomposition/plot_image_denoising.html#example-decomposition-plot-image-denoising-py

Improve the speed of the script with threads

I am trying this code, and it works well, however is really slow, because the number of iterations is high.
I am thinking about threads, that should increase the performance of this script, right? Well, the question is how can I change this code to works with synchronized threads.
def get_duplicated(self):
db_pais_origuem = self.country_assoc(int(self.Pais_origem))
db_pais_destino = self.country_assoc(int(self.Pais_destino))
condicao = self.condition_assoc(int(self.Condicoes))
origem = db_pais_origuem.query("xxx")
destino = db_pais_destino.query("xxx")
origem_result = origem.getresult()
destino_result = destino.getresult()
for i in origem_result:
for a in destino_result:
text1 = i[2]
text2 = a[2]
vector1 = self.text_to_vector(text1)
vector2 = self.text_to_vector(text2)
cosine = self.get_cosine(vector1, vector2)
origem_result and destino_result structure:
[(382360, 'name abcd', 'some data'), (361052, 'name abcd', 'some data'), (361088, 'name abcd', 'some data')]

From what I can see you are computing a distance function between pairs of vectors. Given a list of vectors, v1, ..., vn, and a second list w1,...wn you want the distance/similarity between all pairs from v and w. This is usually highly amenable to parallel computations, and is sometimes referred to as an embarassingly parallel computation. IPython works very well for this.
If your distance function distance(a,b) is independent and does not depend on results from other distance function values (this is usually the case that I have seen), then you can easily use ipython parallel computing toolbox. I would recommend it over threads, queues, etc... for a wide variety of tasks, especially exploratory. However, the same principles could be extended to threads or queue module in python.
I recommend following along with http://ipython.org/ipython-doc/stable/parallel/parallel_intro.html#parallel-overview and http://ipython.org/ipython-doc/stable/parallel/parallel_task.html#quick-and-easy-parallelism It provides a very easy, gentle introduction to parallelization.
In the simple case, you simply will use the threads on your computer (or network if you want a bigger speed up), and let each thread compute as many of the distance(a,b) as it can.
Assuming a command prompt that can see the ipcluster executable command type
ipcluster start -n 3
This starts the cluster. You will want to adjust the number of cores/threads depending on your specific circumstances. Consider using n-1 cores, to allow one core to handle the scheduling.
The hello world examples goes as follows:
serial_result = map(lambda z:z**10, range(32))
from IPython.parallel import Client
rc = Client()
rc
rc.ids
dview = rc[:] # use all engines
parallel_result = dview.map_sync(lambda z: z**10, range(32))
#a couple of caveats, are this template will not work directly
#for our use case of computing distance between a matrix (observations x variables)
#because the allV data matrix and the distance function are not visible to the nodes
serial_result == parallel_result
For the sake of simplicity I will show how to compute the distance between all pairs of vectors specified in allV. Assume that each row represents a data point (observation) that has three dimensions.
Also I am not going to present this the "pedagoically corret" way, but the way that I stumbled through it wrestling with the visiblity of my functions and data on the remote nodes. I found that to be the biggest hurdle to entry
dataPoints = 10
allV = numpy.random.rand(dataPoints,3)
mesh = list(itertools.product(arange(dataPoints),arange(dataPoints)))
#given the following distance function we can evaluate locally
def DisALocal(a,b):
return numpy.linalg.norm(a-b)
serial_result = map(lambda z: DisALocal(allV[z[0]],allV[z[1]]),mesh)
parallel_result = dview.map_sync(lambda z: DisALocal(allV[z[0]],allV[z[1]]),mesh)
#will not work as DisALocal is not visible to the nodes
#also will not work as allV is not visible to the nodes
There are a few ways to define remote functions.
Depending on whether we want to send our data matrix to the nodes or not.
There are tradeoffs as to how big the matrix is, whether you want to
send lots of vectors individually to the nodes or send the entire matrix
upfront...
#in first case we send the function def to the nodes via autopx magic
%autopx
def DisARemote(a,b):
import numpy
return numpy.linalg.norm(a-b)
%autopx
#It requires us to push allV. Also note the import numpy in the function
dview.push(dict(allV=allV))
parallel_result = dview.map_sync(lambda z: DisARemote(allV[z[0]],allV[z[1]]),mesh)
serial_result == parallel_result
#here we will generate the vectors to compute differences between
#and pass the vectors only, so we do not need to load allV across the
#nodes. We must pre compute the vectors, but this could, perhaps, be
#done more cleverly
z1,z2 = zip(*mesh)
z1 = array(z1)
z2 = array(z2)
allVectorsA = allV[z1]
allVectorsB = allV[z2]
#dview.parallel(block=True)
def DisB(a,b):
return numpy.linalg.norm(a-b)
parallel_result = DisB.map(allVectorsA,allVectorsB)
serial_result == parallel_result
In the final case we will do the following
#this relies on the allV data matrix being pre loaded on the nodes.
#note with DisC we do not import numpy in the function, but
#import it via sync_imports command
with dview.sync_imports():
import numpy
#dview.parallel(block=True)
def DisC(a):
return numpy.linalg.norm(allV[a[0]]-allV[a[1]])
#the data structure must be passed to all threads
dview.push(dict(allV=allV))
parallel_result = DisC.map(mesh)
serial_result == parallel_result
All the above can be easily extended to work in a load balanced fashion
Of course, the easiest speedup (assuming if distance(a,b) = distance(b,a)) would be the following. It will only cut run time in half, but can be used with the above parallelization ideas to compute only the upper triangle of the distance matrix.
for vIndex,currentV in enumerate(v):
for wIndex,currentW in enumerate(w):
if vIndex > wIndex:
continue#we can skip the other half of the computations
distance[vIndex,wIndex] = get_cosine(currentV, currentW)
#if distance(a,b) = distance(b,a) then use this trick
distance[wIndex,vIndex] = distance[vIndex,wIndex]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.