Improve the speed of the script with threads - python

I am trying this code and it works well, but it is really slow because the number of iterations is high.
I am thinking about threads; that should increase the performance of this script, right? Well, the question is: how can I change this code so it works with synchronized threads?
def get_duplicated(self):
    db_pais_origuem = self.country_assoc(int(self.Pais_origem))
    db_pais_destino = self.country_assoc(int(self.Pais_destino))
    condicao = self.condition_assoc(int(self.Condicoes))

    origem = db_pais_origuem.query("xxx")
    destino = db_pais_destino.query("xxx")

    origem_result = origem.getresult()
    destino_result = destino.getresult()

    for i in origem_result:
        for a in destino_result:
            text1 = i[2]
            text2 = a[2]
            vector1 = self.text_to_vector(text1)
            vector2 = self.text_to_vector(text2)
            cosine = self.get_cosine(vector1, vector2)
origem_result and destino_result structure:
[(382360, 'name abcd', 'some data'), (361052, 'name abcd', 'some data'), (361088, 'name abcd', 'some data')]

From what I can see, you are computing a distance function between pairs of vectors. Given a list of vectors v1, ..., vn and a second list w1, ..., wn, you want the distance/similarity between all pairs from v and w. This is usually highly amenable to parallel computation and is sometimes referred to as an embarrassingly parallel computation. IPython works very well for this.
If your distance function distance(a,b) is independent and does not depend on results from other distance function values (this is usually the case that I have seen), then you can easily use the IPython parallel computing toolbox. I would recommend it over threads, queues, etc. for a wide variety of tasks, especially exploratory work. However, the same principles can be extended to the threading or queue modules in Python.
I recommend following along with http://ipython.org/ipython-doc/stable/parallel/parallel_intro.html#parallel-overview and http://ipython.org/ipython-doc/stable/parallel/parallel_task.html#quick-and-easy-parallelism They provide a very easy, gentle introduction to parallelization.
In the simple case, you will simply use the threads on your computer (or a network if you want a bigger speedup), and let each thread compute as many of the distance(a,b) calls as it can.
Assuming a command prompt that can see the ipcluster executable, type:
ipcluster start -n 3
This starts the cluster. You will want to adjust the number of cores/threads depending on your specific circumstances. Consider using n-1 cores, to allow one core to handle the scheduling.
The hello world example goes as follows:
serial_result = map(lambda z:z**10, range(32))
from IPython.parallel import Client
rc = Client()
rc
rc.ids
dview = rc[:] # use all engines
parallel_result = dview.map_sync(lambda z: z**10, range(32))
# A couple of caveats: this template will not work directly
# for our use case of computing distances over a matrix (observations x variables),
# because the allV data matrix and the distance function are not visible to the nodes
serial_result == parallel_result
For the sake of simplicity I will show how to compute the distance between all pairs of vectors specified in allV. Assume that each row represents a data point (observation) that has three dimensions.
Also, I am not going to present this the "pedagogically correct" way, but the way that I stumbled through it while wrestling with the visibility of my functions and data on the remote nodes. I found that to be the biggest hurdle to entry.
import itertools
import numpy

dataPoints = 10
allV = numpy.random.rand(dataPoints, 3)
mesh = list(itertools.product(numpy.arange(dataPoints), numpy.arange(dataPoints)))

# given the following distance function we can evaluate locally
def DisALocal(a, b):
    return numpy.linalg.norm(a - b)

serial_result = map(lambda z: DisALocal(allV[z[0]], allV[z[1]]), mesh)

parallel_result = dview.map_sync(lambda z: DisALocal(allV[z[0]], allV[z[1]]), mesh)
# will not work as DisALocal is not visible to the nodes
# also will not work as allV is not visible to the nodes
There are a few ways to define remote functions, depending on whether we want to send our data matrix to the nodes or not. There are tradeoffs involving how big the matrix is and whether you want to send lots of vectors individually to the nodes or send the entire matrix upfront.
#in first case we send the function def to the nodes via autopx magic
%autopx
def DisARemote(a, b):
    import numpy
    return numpy.linalg.norm(a - b)
%autopx
#It requires us to push allV. Also note the import numpy in the function
dview.push(dict(allV=allV))

parallel_result = dview.map_sync(lambda z: DisARemote(allV[z[0]], allV[z[1]]), mesh)

serial_result == parallel_result
#here we will generate the vectors to compute differences between
#and pass the vectors only, so we do not need to load allV across the
#nodes. We must pre-compute the vectors, but this could, perhaps, be
#done more cleverly
z1, z2 = zip(*mesh)
z1 = numpy.array(z1)
z2 = numpy.array(z2)
allVectorsA = allV[z1]
allVectorsB = allV[z2]

@dview.parallel(block=True)
def DisB(a, b):
    return numpy.linalg.norm(a - b)

parallel_result = DisB.map(allVectorsA, allVectorsB)

serial_result == parallel_result
In the final case we will do the following:
#this relies on the allV data matrix being preloaded on the nodes.
#note with DisC we do not import numpy in the function, but
#import it via the sync_imports command
with dview.sync_imports():
    import numpy

@dview.parallel(block=True)
def DisC(a):
    return numpy.linalg.norm(allV[a[0]] - allV[a[1]])

#the data structure must be pushed to all engines
dview.push(dict(allV=allV))

parallel_result = DisC.map(mesh)

serial_result == parallel_result
All of the above can be easily extended to work in a load-balanced fashion.
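For example, a minimal load-balanced sketch (assuming the cluster from above is running, numpy has been imported on the engines via sync_imports, and allV has already been pushed to them):

lview = rc.load_balanced_view()  # tasks go to whichever engine is free

parallel_result = lview.map_sync(lambda z: numpy.linalg.norm(allV[z[0]] - allV[z[1]]), mesh)

serial_result == parallel_result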
Of course, the easiest speedup (assuming distance(a,b) == distance(b,a)) would be the following. It will only cut the run time in half, but it can be combined with the above parallelization ideas to compute only the upper triangle of the distance matrix.
for vIndex, currentV in enumerate(v):
    for wIndex, currentW in enumerate(w):
        if vIndex > wIndex:
            continue  # we can skip the other half of the computations
        distance[vIndex, wIndex] = get_cosine(currentV, currentW)
        # if distance(a,b) = distance(b,a) then use this trick
        distance[wIndex, vIndex] = distance[vIndex, wIndex]
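As noted earlier, the same principle carries over to the standard library. A minimal sketch using concurrent.futures, assuming the two result lists have already been fetched and that get_cosine and text_to_vector exist as module-level functions equivalent to the methods in the question:

from concurrent.futures import ProcessPoolExecutor
from itertools import product

def pair_cosine(pair):
    # pair is (row_from_origem, row_from_destino); column 0 is the id, column 2 the text
    i, a = pair
    return i[0], a[0], get_cosine(text_to_vector(i[2]), text_to_vector(a[2]))

with ProcessPoolExecutor() as executor:
    # chunksize keeps the per-task overhead low for a large cross product
    for id1, id2, cosine in executor.map(pair_cosine,
                                         product(origem_result, destino_result),
                                         chunksize=1000):
        pass  # collect or filter the similarities here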

Related

Multicore programming

At the company where I am interning I was told about the use of multi-core programming, and I would like to apply it to a project I am developing for my thesis (I'm not from the area, but I'm working on something that involves coding).
I want to know if this is possible: I have a function that will be run 3 times for 3 different variables. Is it possible to run the 3 calls at the same time on different cores (they don't need each other's information)? The calculation process is the same for all of them, so instead of running 1 variable at a time I would like to run all 3 at once (performing all the calculations at the same time) and return the results at the end.
Some part of what I would like to optimize:
for v in [obj2_v1, obj2_v2, obj2_v3]:
    distancia_final_v, pontos_intersecao_final_v = calculo_vertice(
        obj1_normal, obj1_v1, obj1_v2, obj1_v3, obj2_normal, v, criterio)
def calculo_vertice(obj1_normal, obj1_v1, obj1_v2, obj1_v3, obj2_normal, obj2_v, criterio):
    i = 0
    distancia_final_v = []
    pontos_intersecao_final_v = []
    while i < len(obj2_v):
        distancia_relevante_v = []
        pontos_intersecao_v = []
        distancia_inicial = 1000
        for x in range(len(obj1_v1)):
            planeNormal = np.array([obj1_normal[x][0], obj1_normal[x][1], obj1_normal[x][2]])
            planePoint = np.array([obj1_v1[x][0], obj1_v1[x][1], obj1_v1[x][2]])  # Any point on the plane
            rayDirection = np.array([obj2_normal[i][0], obj2_normal[i][1], obj2_normal[i][2]])  # Define a ray
            rayPoint = np.array([obj2_v[i][0], obj2_v[i][1], obj2_v[i][2]])  # Any point along the ray
            Psi = Calculos.line_plane_collision(planeNormal, planePoint, rayDirection, rayPoint)
            a = Calculos.area_trianglo_3d(obj1_v1[x][0], obj1_v1[x][1], obj1_v1[x][2],
                                          obj1_v2[x][0], obj1_v2[x][1], obj1_v2[x][2],
                                          obj1_v3[x][0], obj1_v3[x][1], obj1_v3[x][2])
            b = Calculos.area_trianglo_3d(obj1_v1[x][0], obj1_v1[x][1], obj1_v1[x][2],
                                          obj1_v2[x][0], obj1_v2[x][1], obj1_v2[x][2],
                                          Psi[0][0], Psi[0][1], Psi[0][2])
            c = Calculos.area_trianglo_3d(obj1_v1[x][0], obj1_v1[x][1], obj1_v1[x][2],
                                          obj1_v3[x][0], obj1_v3[x][1], obj1_v3[x][2],
                                          Psi[0][0], Psi[0][1], Psi[0][2])
            d = Calculos.area_trianglo_3d(obj1_v2[x][0], obj1_v2[x][1], obj1_v2[x][2],
                                          obj1_v3[x][0], obj1_v3[x][1], obj1_v3[x][2],
                                          Psi[0][0], Psi[0][1], Psi[0][2])
            if float("{:.5f}".format(a)) == float("{:.5f}".format(b + c + d)):
                P1 = Ponto(Psi[0][0], Psi[0][1], Psi[0][2])
                P2 = Ponto(obj2_v[i][0], obj2_v[i][1], obj2_v[i][2])
                distancia = Calculos.distancia_pontos(P1, P2) * 10
                if distancia < distancia_inicial and distancia < criterio:
                    distancia_inicial = distancia
                    distancia_relevante_v = []
                    distancia_relevante_v.append(distancia_inicial)
                    pontos_intersecao_v = []
                    pontos_intersecao_v.append(Psi)
            x += 1
        distancia_final_v.append(distancia_relevante_v)
        pontos_intersecao_final_v.append(pontos_intersecao_v)
        i += 1
    return distancia_final_v, pontos_intersecao_final_v
In this example of my code, I want to make the same process happen for obj2_v1, obj2_v2, obj2_v3.
Is it possible to make them happen at the same time?
Because I will be using a considerable amount of data, it would probably save me some processing time.
multiprocessing (using processes to avoid the GIL) is the easiest option, but you're limited to relatively small performance improvements: the number of cores is the upper bound on the speedup, see Amdahl's law. There is also a bit of latency involved in starting/stopping work, which means it's much better suited to tasks that take more than ~10 ms.
In numeric-heavy code (like this seems to be) you really want to move as much of the work as possible "inside numpy"; look at vectorisation and broadcasting. This can give speedups of >50x (on a single core) while staying easier to understand and reason about.
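As an illustration of the broadcasting idea (the arrays below are random placeholders, not the data from the question), all pairwise point-to-point distances can be computed in one vectorised expression instead of a double Python loop:

import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1000, 3))  # e.g. 1000 vertices from one object
b = rng.random((2000, 3))  # e.g. 2000 vertices from another

# double Python loop (slow): [[np.linalg.norm(x - y) for y in b] for x in a]
# broadcasting (fast): all 1000 x 2000 distances in one call
dists = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
print(dists.shape)  # (1000, 2000)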
If your algorithm is difficult to express using numpy intrinsics, you could also look at using Cython. It allows you to write Python code that gets compiled down to C and hence runs a lot faster; 50x is also a reasonable speedup to expect, still on a single core.
The numpy and Cython techniques can be combined with multiprocessing (i.e. using multiple cores) to give code that runs hundreds of times faster than naive implementations.
Jupyter notebooks have friendly extensions (known affectionately as "magics") that make it easier to get started with this sort of performance work. The %timeit magic lets you easily time parts of the code, and the Cython extension means you can keep everything in the same notebook.
It's possible, but use the Python multiprocessing lib, because the threading lib doesn't deliver parallel execution for CPU-bound code.
UPDATE
DON'T do something like this (thanks to @user3666197 for pointing out the error):
from multiprocessing.pool import ThreadPool

def calculo_vertice(obj1_normal, obj1_v1, obj1_v2, obj1_v3, obj2_normal, obj2_v, criterio):
    # your code
    return distancia_final_v, pontos_intersecao_final_v

pool = ThreadPool(processes=3)

async_result1 = pool.apply_async(calculo_vertice, (#your args here))
async_result2 = pool.apply_async(calculo_vertice, (#your args here))
async_result3 = pool.apply_async(calculo_vertice, (#your args here))

result1 = async_result1.get()  # result1
result2 = async_result2.get()  # result2
result3 = async_result3.get()  # result3
Instead, something like this should do the job:
from multiprocessing import Process, Pipe

def calculo_vertice(obj1_normal, obj1_v1, obj1_v2, obj1_v3, obj2_normal, obj2_v, criterio, send_end):
    # your code
    send_end.send((distancia_final_v, pontos_intersecao_final_v))

numberOfWorkers = 3
jobs = []
pipeList = []

# Start processes and build the job list
for i in range(numberOfWorkers):
    recv_end, send_end = Pipe(False)
    process = Process(target=calculo_vertice, args=(#<... your args...>#, send_end))
    jobs.append(process)
    pipeList.append(recv_end)
    process.start()

# Show the results
for job in jobs:
    job.join()
resultList = [x.recv() for x in pipeList]
print(resultList)
REF.
https://docs.python.org/3/library/multiprocessing.html
https://stackoverflow.com/a/37737985/8738174
This code will create 3 worker processes, and each of them will run the function asynchronously. It's important to point out that in this case you should have 3+ CPU cores; otherwise the system kernel will just switch between processes and things won't really run in parallel.
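For reference, a shorter equivalent sketch using the standard concurrent.futures module; args1, args2, args3 are hypothetical tuples holding the arguments of each call:

from concurrent.futures import ProcessPoolExecutor

# args1, args2, args3 are hypothetical argument tuples for calculo_vertice
with ProcessPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(calculo_vertice, *args) for args in (args1, args2, args3)]
    results = [f.result() for f in futures]  # one (distancia_final_v, pontos_intersecao_final_v) pair per call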
Q: "Is it possible to make them happen at the same time?"
Yes.
The best results will be achieved by not adding any Python at all (the multiprocessing module is not necessary for making 3 full copies (yes, top-down fully replicated copies) of the __main__ Python process for such embarrassingly independent processing).
The reasons for this are explained in detail here and here.
A just-enough tool is GNU parallel:
$ parallel --jobs 3 python job-script.py {} ::: "v1" "v2" "v3"
For all performance-tweaking, read about more configuration details in man parallel.
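A sketch of what the job-script.py from the command above could look like; its contents are an assumption (the command only fixes the script name and the "v1"/"v2"/"v3" argument), with numpy files standing in for whatever storage the real project uses:

# job-script.py -- illustrative sketch; file layout and helper calls are assumptions
import sys
import numpy as np

def main():
    which = sys.argv[1]                        # "v1", "v2" or "v3", injected by GNU parallel
    obj2_v = np.load("obj2_%s.npy" % which)    # assumed: one vertex array stored per object
    # ... load the shared obj1_* arrays, obj2_normal and criterio the same way,
    # then call calculo_vertice(...) unchanged ...
    np.save("result_%s.npy" % which, obj2_v)   # assumed: one result file per job

if __name__ == "__main__":
    main()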
"Because I will be using a considerable amount of data..."
The Devil is hidden in the details:
The O/P code may be syntactically driving the Python interpreter to results that are precise (approximate) to some 5 decimal places, yet its core sin is that it has ultimately bad chances of demonstrating any reasonably achievable performance in doing so, and it only gets worse with a "considerable amount of data".
If they, "at the company", expect some "considerable amount of data", you should do at least some elementary research on what is the processing aimed at.
The worst part ( not mentioning the decomposition of once vectorised-ready numpy-arrays back into atomic "float" coordinate values ) is the point-inside-triangle test.
For a brief analysis on how to speed-up this part ( the more if going to pour "considerable amount of data" on doing this ), get inspired from this post and get the job done in fraction of the time it was drafted in the O/P above.
Indirect testing of a point-inside-triangle by comparing an in-equality of a pair of re-float()-ed-strings, received from sums of triangle-areas ( b + c + d ) is just one of the performance blockers, you will find to get removed.
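As an illustration (not the linked post's code), a numerically tolerant, fully vectorised barycentric test could replace the string comparison; A, B, C are the triangle vertices and P the intersection points, all assumed to be (N, 3) arrays with P already lying in each triangle's plane:

import numpy as np

def points_in_triangles(A, B, C, P, eps=1e-9):
    # Barycentric point-in-triangle test, vectorised over N triangles at once.
    v0, v1, v2 = B - A, C - A, P - A
    d00 = np.einsum('ij,ij->i', v0, v0)
    d01 = np.einsum('ij,ij->i', v0, v1)
    d11 = np.einsum('ij,ij->i', v1, v1)
    d20 = np.einsum('ij,ij->i', v2, v0)
    d21 = np.einsum('ij,ij->i', v2, v1)
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    # inside (with tolerance) when both barycentric coordinates are >= 0 and their sum <= 1
    return (v >= -eps) & (w >= -eps) & (v + w <= 1.0 + eps)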

Using shared array in multiprocessing

I am trying to run a parallel process in python, wherein I have to extract certain polygons from a large array based on some conditions. The large array has 10k+ polygons that are indexed.
To an extract_polygon function I pass (array, index). Based on the index, the function either returns the polygon corresponding to that index or not, depending on the conditions defined. The array is never changed and is only read to extract the polygon at the given index.
Since the array is very large, I am running into an out-of-memory error during parallel processing. How can I avoid that? (In other words, how do I effectively use a shared array in multiprocessing?)
Below is my sample code:
import time
import numpy as np
import multiprocessing as mp
from scipy import ndimage

def extract_polygon(array, index):
    try:
        islays = ndimage.find_objects(array == index)
        poly = array[islays[0][0], islays[0][1]]
        area = np.count_nonzero(poly)
        minArea = 100
        maxArea = 10000
        if (area > minArea) and (area < maxArea):
            return poly
        else:
            return None
    except:
        return None

start = time.time()
pool = mp.Pool(10)
results = pool.starmap(extract_polygon, [(array, index) for index in indices])
pool.close()
pool.join()
# indices here is a list of all the indexes we have.
Can I use any other library like ray in this case?
You can absolutely use a library like Ray.
The structure would look something like this (simplified to remove your application logic).
import numpy as np
import ray

ray.init()

# Create the array and store it in shared memory once.
array = np.ones(10**6)
array_id = ray.put(array)

@ray.remote
def extract_polygon(array, index):
    # Change this to actually extract the polygon.
    return index

# Start 10 tasks that each take in the ID of the array in shared memory.
# These tasks execute in parallel (assuming there are enough CPU resources).
result_ids = [extract_polygon.remote(array_id, i) for i in range(10)]

# Fetch the results.
results = ray.get(result_ids)
You can read more about Ray in the documentation.
See some related answers below:
Shared-memory objects in multiprocessing
python3 multiprocess shared numpy array(read-only)
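For completeness, a minimal sketch of the pure-multiprocessing route the linked answers describe, using multiprocessing.shared_memory (Python 3.8+) so each worker reads the same block instead of receiving a pickled copy; the array contents and the per-index logic here are placeholders, not the question's real extraction code:

import numpy as np
from multiprocessing import Pool, shared_memory

def extract_polygon(shm_name, shape, dtype, index):
    # Re-attach to the shared block inside the worker; no copy of the big array is made.
    shm = shared_memory.SharedMemory(name=shm_name)
    array = np.ndarray(shape, dtype=dtype, buffer=shm.buf)
    area = int(np.count_nonzero(array == index))   # placeholder for the real extraction logic
    shm.close()
    return area

if __name__ == "__main__":
    big = np.random.randint(0, 100, size=(2000, 2000), dtype=np.int32)  # placeholder data
    shm = shared_memory.SharedMemory(create=True, size=big.nbytes)
    shared = np.ndarray(big.shape, dtype=big.dtype, buffer=shm.buf)
    shared[:] = big                                 # copy the data into shared memory once

    with Pool(10) as pool:
        args = [(shm.name, big.shape, big.dtype, i) for i in range(50)]
        results = pool.starmap(extract_polygon, args)

    shm.close()
    shm.unlink()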

Massively parallel search operation with Dask, Distributed

I created a demo problem when testing out an auto-scaling Dask Distributed implementation on Kubernetes and AWS, and I'm not sure I'm tackling the problem correctly.
My scenario is: given the md5 hash of a string (representing a password), find the original string. I hit three main problems.
A) The parameter space is massive, and trying to create a dask bag with 2.8211099e+12 members caused memory issues (hence the 'explode' function you'll see in the sample code below).
B) Clean exit on early find. I think using take(1, npartitions=-1) will achieve this, but I wasn't sure. Originally I raised an exception, raise Exception("%s is your answer" % test_str), which worked but felt "dirty".
C) Given this is long running and sometimes workers or AWS boxes die, how would it be best to store progress?
Example code:
import distributed
import math
import dask.bag as db
import hashlib
import dask
import os

if os.environ.get('SCHED_URL', False):
    sched_url = os.environ['SCHED_URL']
    client = distributed.Client(sched_url)
    versions = client.get_versions(True)
    dask.set_options(get=client.get)

difficulty = 'easy'

settings = {
    'hard': (hashlib.md5('welcome1'.encode('utf-8')).hexdigest(), 'abcdefghijklmnopqrstuvwxyz1234567890', 8),
    'mid-hard': (hashlib.md5('032abgh'.encode('utf-8')).hexdigest(), 'abcdefghijklmnop1234567890', 7),
    'mid': (hashlib.md5('b08acd'.encode('utf-8')).hexdigest(), '0123456789abcdef', 6),
    'easy': (hashlib.md5('0812'.encode('utf-8')).hexdigest(), '0123456789', 4)
}

hashed_pw, keyspace, max_guess_length = settings[difficulty]

def is_pw(guess):
    return hashlib.md5(guess.encode('utf-8')).hexdigest() == hashed_pw

def guess(n):
    guess = ''
    size = len(keyspace)
    while n > 0:
        n -= 1
        guess += keyspace[n % size]
        n = math.floor(n / size)
    return guess

def make_exploder(num_partitions, max_val):
    """Creates a function that maps an int to a range, based on the maximum value aimed for
    and the number of partitions that are expected.
    Used in this code with map and flatten to take a short list,
    i.e. 1->1e6, to a large one, 1->1e20, in dask rather than on the host machine."""
    steps = math.ceil(max_val / num_partitions)
    def explode(partition):
        return range(partition * steps, partition * steps + steps)
    return explode

max_val = len(keyspace) ** max_guess_length  # How many possible password permutations
partitions = math.floor(max_val / 100)
partitions = partitions if partitions < 100000 else 100000  # Split into a maximum of 100,000 partitions. Too many partitions caused issues, memory I think.
exploder = make_exploder(partitions, max_val)  # Sort of the opposite of a reduce. make_exploder(10, 100)(3) => [30, 31, ..., 39]. Expands the problem back into the full problem space.

print("max val: %s, partitions:%s" % (max_val, partitions))

search = db.from_sequence(range(partitions), npartitions=partitions).map(exploder).flatten().filter(lambda i: i <= max_val).map(guess).filter(is_pw)

search.take(1, npartitions=-1)
I find 'easy' works well locally, and 'mid-hard' works well on our 6 to 8 * m4.2xlarge AWS cluster. But so far I haven't got 'hard' to work.
A) the parameter space is massive and trying to create a dask bag with 2.8211099e+12 members caused memory issues (hence the 'explode' function you'll see in the sample code below).
This depends strongly on how you arrange your elements into a bag. If each element is in its own partition then yes, this will certainly kill everything. 1e12 partitions is very expensive. I recommend keeping the number of partitions in the thousands or tens of thousands.
B) Clean exit on early find. I think using take(1, npartitions=-1) will achieve this but I wasn't sure. Originally I raised an exception raise Exception("%s is your answer' % test_str) which worked but felt "dirty"
If you want this then I recommend not using dask.bag, but instead using the concurrent.futures interface and in particular the as_completed iterator.
C) Given this is long running and sometimes workers or AWS boxes die, how would it be best to store progress?
Dask should be resilient to this as long as you can guarantee that the scheduler survives. If you use the concurrent futures interface rather than dask bag then you can also track intermediate results on the client process.
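A minimal sketch of that futures-based approach, reusing guess and is_pw from the question; the block size, helper name, and scheduler address are illustrative assumptions. It allows both an early clean exit and client-side progress tracking:

from dask.distributed import Client, as_completed

client = Client(sched_url)  # or Client() for a local test cluster

def crack_range(start, stop):
    # Check one contiguous block of candidate indices; return the match or None.
    for n in range(start, stop):
        candidate = guess(n)
        if is_pw(candidate):
            return candidate
    return None

block = 10**6  # candidates per task; tune so each task runs for seconds, not milliseconds
futures = [client.submit(crack_range, s, min(s + block, max_val))
           for s in range(0, max_val, block)]

for blocks_done, done in enumerate(as_completed(futures)):
    answer = done.result()
    if answer is not None:
        print("found:", answer)
        client.cancel(futures)   # clean early exit: stop the remaining work
        break
    # blocks_done could be checkpointed here to record progress across restarts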

Python: Using multiprocessing module as possible solution to increase the speed of my function

I wrote a function in Python 2.7 (on Windows 64-bit) to calculate the mean value of the intersection area between a reference polygon (Ref) and one or more segmented (Seg) polygons in ESRI shapefile format. The code is quite slow because I have more than 2000 reference polygons, and for each Ref polygon the function runs over all Seg polygons (more than 7000). I am sorry, but the function is a prototype.
I wish to know if multiprocessing can help me increase the speed of my loop, or whether there are other performance solutions. If multiprocessing is a possible solution, I wish to know the best way to optimize my function below:
import os
import numpy as np
import osgeo.ogr
from shapely.geometry import Polygon

def AreaInter(reference, segmented, outFile):
    # open shapefiles
    ref = osgeo.ogr.Open(reference)
    if ref is None:
        raise SystemExit('Unable to open %s' % reference)
    seg = osgeo.ogr.Open(segmented)
    if seg is None:
        raise SystemExit('Unable to open %s' % segmented)
    ref_layer = ref.GetLayer()
    seg_layer = seg.GetLayer()
    # create outfile
    if not os.path.split(outFile)[0]:
        file_path, file_name_ext = os.path.split(os.path.abspath(reference))
        outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
        file_out = open(os.path.abspath("{0}\\{1}.txt".format(file_path, outFile_filename)), "w")
    else:
        file_path_name, file_ext = os.path.splitext(outFile)
        file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
    # For each reference object-i
    for index in xrange(ref_layer.GetFeatureCount()):
        ref_feature = ref_layer.GetFeature(index)
        # get FID (=Feature ID)
        FID = str(ref_feature.GetFID())
        ref_geometry = ref_feature.GetGeometryRef()
        pts = ref_geometry.GetGeometryRef(0)
        points = []
        for p in xrange(pts.GetPointCount()):
            points.append((pts.GetX(p), pts.GetY(p)))
        # convert into a shapely polygon
        ref_polygon = Polygon(points)
        # get the area
        ref_Area = ref_polygon.area
        # create empty lists
        seg_Area, intersect_Area = ([] for _ in range(2))
        # For each segmented object-j
        for segment in xrange(seg_layer.GetFeatureCount()):
            seg_feature = seg_layer.GetFeature(segment)
            seg_geometry = seg_feature.GetGeometryRef()
            pts = seg_geometry.GetGeometryRef(0)
            points = []
            for p in xrange(pts.GetPointCount()):
                points.append((pts.GetX(p), pts.GetY(p)))
            seg_polygon = Polygon(points)
            seg_Area.append(seg_polygon.area)
            # intersection (overlap) of the reference object with the segmented object
            intersect_polygon = ref_polygon.intersection(seg_polygon)
            # area of intersection (= 0, no intersection)
            intersect_Area.append(intersect_polygon.area)
        # Average over all segmented objects (because 1 or more segmented polygons can intersect the reference polygon)
        seg_Area_average = np.average(seg_Area)
        intersect_Area_average = np.average(intersect_Area)
        file_out.write(" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")
    file_out.close()
You can use the multiprocessing package, and especially the Pool class. First create a function that does all the stuff you want to do within the for loop, and that takes as an argument only the index:
def process_reference_object(index):
    ref_feature = ref_layer.GetFeature(index)
    # all your code goes here
    return (" ".join(["%s" % i for i in [FID, ref_Area, seg_Area_average, intersect_Area_average]]) + "\n")
Note that this doesn't write to a file itself; that would be messy because you'd have multiple processes writing to the same file at the same time. Instead, it returns the string that needs to be written. Also note that there are objects in this function like ref_layer or ref_geometry that will need to reach it somehow; that's up to you how to do it (you could make process_reference_object a method of a class initialized with them, or it could be as ugly as just defining them globally).
Then, you create a pool of process resources, and run all of your indices using Pool.imap_unordered (which will itself allocate each index to a different process as necessary):
from multiprocessing import Pool

p = Pool()  # run multiple processes
for l in p.imap_unordered(process_reference_object, range(ref_layer.GetFeatureCount())):
    file_out.write(l)
This will parallelize the independent processing of your reference objects across multiple processes, and write them to the file (in an arbitrary order, note).
Threading can help to a degree, but first you should make sure you can't simplify the algorithm. If you're checking each of 2000 reference polygons against 7000 segmented polygons (perhaps I misunderstood), then you should start there. Stuff that runs at O(n^2) is going to be slow, so maybe you can prune away things that will definitely not intersect, or find some other way to speed things up. Otherwise, running multiple processes or threads will only improve things linearly while your data grows geometrically.
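For example, a hedged sketch of that pruning idea using a spatial index, so each reference polygon is only compared against candidates whose bounding boxes intersect it (this assumes Shapely 2.x, where STRtree.query returns integer indices, and that ref_polygons and seg_polygons are plain lists of shapely Polygons built as in the question):

from shapely.strtree import STRtree

def mean_intersection_areas(ref_polygons, seg_polygons):
    # For each reference polygon, average the intersection area over nearby segmented polygons only.
    tree = STRtree(seg_polygons)          # build the spatial index once
    averages = []
    for ref in ref_polygons:
        idx = tree.query(ref)             # only candidates whose bounding boxes overlap ref
        areas = [ref.intersection(seg_polygons[i]).area for i in idx]
        averages.append(sum(areas) / len(areas) if areas else 0.0)
    return averages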

Parallelize resolution of differential equation in Python

I am solving a system of ordinary differential equations using the odeint function. Is it possible (and if yes, how) to easily parallelize this kind of problem?
The other answer here is wrong: solving an ODE numerically needs to evaluate the function f(t, y) = y' several times per step, e.g. four times for Runge-Kutta. But I don't know of any Python package that parallelizes this.
Numerically integrating an ODE is an intrinsically sequential operation, since you need each result to compute the following one (well, except if you're integrating from multiple starting points). So I guess the answer is no.
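That exception is worth spelling out. A minimal sketch (not from the answer above) of integrating the same system from many starting points in parallel, using multiprocessing with scipy's odeint; the right-hand side here is a stand-in system:

import numpy as np
from multiprocessing import Pool
from scipy.integrate import odeint

def rhs(y, t):
    # a simple damped oscillator as a stand-in for the real system
    return [y[1], -0.1 * y[1] - y[0]]

def solve_from(y0):
    t = np.linspace(0.0, 50.0, 2001)
    return odeint(rhs, y0, t)   # each initial condition is an independent integration

if __name__ == "__main__":
    initial_conditions = [[x0, 0.0] for x0 in np.linspace(0.1, 1.0, 16)]
    with Pool() as pool:
        solutions = pool.map(solve_from, initial_conditions)  # one trajectory per worker task
    print(len(solutions), solutions[0].shape)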
EDIT: Wow, I've just realised this question is more than 3 years old. I'll still leave my answer hoping it finds its way to someone in the same predicament. Sorry for that.
I had the same problem. I was able to parallelise such a process as follows.
First you need dispy. In there you'll find some programs that will do the parallelization for you. I am not an expert on dispy, but I had no problems using it, and I didn't need to configure anything.
So, how to use it?
Run python dispynode.py -d. If you do not run this script before running your main program, the parallel jobs won't be performed.
Run your main program. Here I post the one I used (sorry for the mess). You'll need to change the function sim, and adapt what is done with the results to your needs. I hope, however, that my program works as a reference for you.
import os, sys, inspect

# Add dispy to your path
cmd_folder = os.path.realpath(os.path.abspath(os.path.split(inspect.getfile(inspect.currentframe()))[0]))
if cmd_folder not in sys.path:
    sys.path.insert(0, cmd_folder)
# use this if you want to include modules from a subfolder
cmd_subfolder = os.path.realpath(os.path.abspath(os.path.join(os.path.split(inspect.getfile(inspect.currentframe()))[0], cmd_folder + "/dispy-3.10/")))
if cmd_subfolder not in sys.path:
    sys.path.insert(0, cmd_subfolder)
#----------------------------------------#

# This function contains the differential equation to be simulated.
def sim(ic, e, O):  # ic=initial conditions; e=Epsilon; O=Omega
    from scipy.integrate import ode
    import numpy as np

    # Diff. Eq.
    def sys(t, x, e, O, z, b, l):
        p = 2. * e * O * np.sin(O * t) * (1 - e * np.cos(O * t)) / (z + (1 - e * np.cos(O * t)) ** 2)
        q = (1 + 4. * b / l * np.cos(O * t)) * (z + (1 - e * np.cos(O * t))) / (z + (1 - e * np.cos(O * t)) ** 2)
        dx = np.zeros(2)
        dx[0] = x[1]
        dx[1] = -q * x[0] - p * x[1]
        return dx

    # Simulation.
    t0 = 0; tEnd = 10000.; dt = 0.1
    r = ode(sys).set_integrator('dop853', nsteps=10, max_step=dt)  # Definition of the integrator
    Y = []; S = []; T = []
    # - parameters - #
    z = 0.5; l = 1.0; b = 0.06
    # -------------- #
    color = 1
    r.set_initial_value(ic, t0).set_f_params(e, O, z, b, l)  # Set the parameters, the initial condition and the initial time
    # Loop to integrate.
    while r.successful() and r.t + dt < tEnd:
        r.integrate(r.t + dt)
        Y.append(r.y)
        T.append(r.t)
        if r.y[0] > 1.25 * ic[0]:  # Bound. This is due to my own requirements.
            color = 0
            break
    # r.y contains the solutions and r.t contains the time vector.
    return e, O, color  # For each pair (e, O) return e, O and a color (0 or 1), which corresponds to the color of the point in the stability chart (0=unstable, 1=stable)

# ------------------------------------ #
# MAIN PROGRAM where the parallel magic happens
import matplotlib.pyplot as plt
import dispy
import numpy as np

F = 100  # Total files
# Range of the values of Epsilon and Omega
Epsilon = np.linspace(0, 1, 100)
Omega_intervals = np.linspace(0, 4, F)

ic = [0.1, 0]

cluster = dispy.JobCluster(sim)  # This sets that the cluster (array of processors) will be assigned the job sim.
jobs = []  # Initialize the array of jobs

for i in range(F - 1):
    Data_Array = []
    jobs = []
    Omega = np.linspace(Omega_intervals[i], Omega_intervals[i + 1], 10)
    print Omega
    for e in Epsilon:
        for O in Omega:
            job = cluster.submit(ic, e, O)  # Send to the cluster a job with the specified parameters
            jobs.append(job)  # Join all the jobs specified above
    cluster.wait()
    # Do the jobs
    for job in jobs:
        e, O, color = job()
        Data_Array.append([e, O, color])
    # Save the results of the simulation.
    file_name = 'Data' + str(i) + '.txt'
    f = open(file_name, 'a')
    f.write(str(Data_Array))
    f.close()
