i am solving a system of ordinary differential equations using the odeint function. Is it possible (and if yes how) to parallelize easily this kind of problem?
The answer above is wrong, solving a ODE nummerically needs to calculate the function f(t,y)=y' several times per iteration, e.g. four times for Runge-Kutta. But i dont know any package for python doing this.
Numerically integrating an ODE is an intrinsically sequential operation, since you need each result to compute the following one (well, except if you're integrating from multiple starting points). So I guess the answer is no.
EDIT: Wow, I've just realised this question is more than 3 years old. I'll still leave my answer hoping it finds its way to someone in the same predicament. Sorry for that.
I had the same problem. I was able to parallelise such process as follows.
First you need dispy. In there you'll find some programs that will do the paralelization process for you. I am not an expert on dispybut I had no problems using it, and I didn't need to configure anything.
So, how to use it?
Run python dispynode.py -d. If you do not run this script before running your main program, the parallel jobs won't be performed.
Run your main program. Here I post the one I used (sorry for the mess). You'll need to change the function sim, and change accordingly to what you want to do with the results. I hope however that my program works as a reference for you.
import os, sys, inspect
#Add dispy to your path
cmd_folder = os.path.realpath(os.path.abspath(os.path.split(inspect.getfile( inspect.currentframe() ))[0]))
if cmd_folder not in sys.path:
sys.path.insert(0, cmd_folder)
# use this if you want to include modules from a subforder
cmd_subfolder = os.path.realpath(os.path.abspath(os.path.join(os.path.split(inspect.getfile( inspect.currentframe() ))[0],cmd_folder+"/dispy-3.10/")))
if cmd_subfolder not in sys.path:
sys.path.insert(0, cmd_subfolder)
#----------------------------------------#
#This function contains the differential equation to be simulated.
def sim(ic,e,O): #ic=initial conditions; e=Epsiolon; O=Omega
from scipy.integrate import ode
import numpy as np
#Diff Eq.
def sys(t,x,e,O,z,b,l):
p = 2.*e*O*np.sin(O*t)*(1-e*np.cos(O*t))/(z+(1-e*np.cos(O*t))**2)
q = (1+4.*b/l*np.cos(O*t))*(z+(1-e*np.cos(O*t)))/( z+(1-e*np.cos(O*t))**2 )
dx=np.zeros(2)
dx[0] = x[1]
dx[1] = -q*x[0]-p*x[1]
return dx
#Simulation.
t0=0; tEnd=10000.; dt=0.1
r = ode(sys).set_integrator('dop853', nsteps=10,max_step=dt) #Definition of the integrator
Y=[];S=[];T=[]
# - parameters - #
z=0.5; l=1.0; b=0.06;
# -------------- #
color=1
r.set_initial_value(ic, t0).set_f_params(e,O,z,b,l) #Set the parameters, the initial condition and the initial time
#Loop to integrate.
while r.successful() and r.t +dt < tEnd:
r.integrate(r.t+dt)
Y.append(r.y)
T.append(r.t)
if r.y[0]>1.25*ic[0]: #Bound. This is due to my own requirements.
color=0
break
#r.y contains the solutions and r.t contains the time vector.
return e,O,color #For each pair e,O return e,O and a color (0,1) which correspond to the color of the point in the stability chart (0=unstable) (1=stable)
# ------------------------------------ #
#MAIN PROGRAM where the parallel magic happens
import matplotlib.pyplot as plt
import dispy
import numpy as np
F=100 #Total files
#Range of the values of Epsilon and Omega
Epsilon = np.linspace(0,1,100)
Omega_intervals = np.linspace(0,4,F)
ic=[0.1,0]
cluster = dispy.JobCluster(sim) #This function sets that the cluster (array of processors) will be assigned the job sim.
jobs = [] #Initialize the array of jobs
for i in range(F-1):
Data_Array=[]
jobs = []
Omega=np.linspace(Omega_intervals[i], Omega_intervals[i+1],10)
print Omega
for e in Epsilon:
for O in Omega:
job = cluster.submit(ic,e,O) #Send to the cluster a job with the specified parameters
jobs.append(job) #Join all the jobs specified above
cluster.wait()
#Do the jobs
for job in jobs:
e,O,color = job()
Data_Array.append([e,O,color])
#Save the results of the simulation.
file_name='Data'+str(i)+'.txt'
f=open(file_name, 'a')
f.write(str(Data_Array))
f.close()
Related
SciPy can solve ode equations by scipy.integrate.odeint or other packages, but it gives result after the function has been solved completely. However, if the ode function is very complex, the program will take a lot of time(one or two days) to give the whole result. So how can I mointor the step it solve the equations(print out result when the equation hasn't been solved completely)?
When I was googling for an answer, I couldn't find a satisfactory one. So I made a simple gist with a proof-of-concept solution using the tqdm project. Hope that helps you.
Edit: Moderators asked me to give an explanation of what is going on in the link above.
First of all, I am using scipy's OOP version of odeint (solve_ivp) but you could adapt it back to odeint. Say you want to integrate from time T0 to T1 and you want to show progress for every 0.1% of progress. You can modify your ode function to take two extra parameters, a pbar (progress bar) and a state (current state of integration). Like so:
def fun(t, y, omega, pbar, state):
# state is a list containing last updated time t:
# state = [last_t, dt]
# I used a list because its values can be carried between function
# calls throughout the ODE integration
last_t, dt = state
# let's subdivide t_span into 1000 parts
# call update(n) here where n = (t - last_t) / dt
time.sleep(0.1)
n = int((t - last_t)/dt)
pbar.update(n)
# we need this to take into account that n is a rounded number.
state[0] = last_t + dt * n
# YOUR CODE HERE
dydt = 1j * y * omega
return dydt
This is necessary because the function itself must know where it is located, but scipy's odeint doesn't really give this context to the function. Then, you can integrate fun with the following code:
T0 = 0
T1 = 1
t_span = (T0, T1)
omega = 20
y0 = np.array([1], dtype=np.complex)
t_eval = np.arange(*t_span, 0.25/omega)
with tqdm(total=1000, unit="‰") as pbar:
sol = solve_ivp(
fun,
t_span,
y0,
t_eval=t_eval,
args=[omega, pbar, [T0, (T1-T0)/1000]],
)
Note that anything mutable (like a list) in the args is instantiated once and can be changed from within the function. I recommend doing this rather than using a global variable.
This will show a progress bar that looks like this:
100%|█████████▉| 999/1000 [00:13<00:00, 71.69‰/s]
You could split the integration domain and integrate the segments, taking the last value of the previous as initial condition of the next segment. In-between, print out whatever you want. Use numpy.concatenate to assemble the pieces if necessary.
In a standard example of a 3-body solar system simulation, replacing the code
u0 = solsys.getState0();
t = np.arange(0, 100*365.242*day, 0.5*day);
%timeit u_res = odeint(lambda u,t: solsys.getDerivs(u), u0, t, atol = 1e11*1e-8, rtol = 1e-9)
output: 1 loop, best of 3: 5.53 s per loop
with a progress-reporting code
def progressive(t,N):
nk = [ int(n+0.5) for n in np.linspace(0,len(t),N+1) ]
u0 = solsys.getState0();
u_seg = [ np.array([u0]) ];
for k in range(N):
u_seg.append( odeint(lambda u,t: solsys.getDerivs(u), u0, t[nk[k]:nk[k+1]], atol = 1e11*1e-8, rtol = 1e-9)[1:] )
print t[nk[k]]/day
for b in solsys.bodies: print("%10s %s"%(b.name,b.x))
return np.concatenate(u_seg)
%timeit u_res = progressive(t,20)
output: 1 loop, best of 3: 5.96 s per loop
shows only a slight 8% overhead for console printing. With a more substantive ODE function, the fraction of the reporting overhead will reduce significantly.
That said, python, at least with its standard packages, is not the tool for industrial-scale number-crunching. Always use compiled versions with strong typing of variables to reduce interpretative overhead as much as possible.
Use some heavily developed and tested package like Sundials or the julia-lang framework differentialequations.jl directly coding the ODE function in an appropriate compiled language. Use the higher-order methods for larger step sizes, thus smaller steps. Test if using implicit or exponential/Rosenbrock methods reduces the number of steps or ODE function evaluations per fixed interval further. The difference can be a factor of 10 to 100 in speedup.
Use a python wrapper of the above with some acceleration-friendly implementation of your ODE function.
Use the quasi-source-translating tool JITcode to translate your python ODE function to a spaghetti list of C instruction that then give a compiled function that can be (almost) directly called from the compiled FORTRAN kernel of odeint.
Simple and Clear.
If you want to integrate an ODE from T0 to T1:
In the last line of the code, before return, you can use print((t/T1)*100,end='')
Then use a sys.stdout.flush() to keep the same line of printing.
Here is an example. My integrating time [0 0.2]
ddt[-2]=(beta/(Ap2*(L-x)))*(-Qgap+Ap*u)
ddt[-1]=(beta/(Ap2*(L+x)))*(Qgap-Ap*u)
print("\rCompletion percentage "+str(format(((t/0.2)*100),".4f")),end='')
sys.stdout.flush()
return ddt
It slows a bit the solving process by fraction of seconds, but it serves perfectly the purpose rather than creating new functions.
I do some computationally expensive tasks in python and found the thread module for parallelization. I have a function which does the computation and returns a ndarray as result. Now I want to know how I can parallize my function and get back the calculated Arrays from each thread.
The followed example is strongly simplified with light functions and calculations.
import numpy as np
def calculate_result(input):
a=np.linspace(1.0, 1000.0, num=10000) # just an example
result = input*a
return(result)
input =[1,2,3,4]
for i in range(0,len(input(i))):
t.Thread(target=calculate_result, args=(input))
t. start()
#Here I want to receive the return value from the thread
I am looking for a way to get the return value from the thread / function for each thread, because in my task each thread calculates different values.
I found an other Question (how to get the return value from a thread in python?) where someone is looking for a similar problem (no ndarrays) and which is handled with ThreadPool and async...
-------------------------------------------------------------------------------
Thanks for your answers !
Due to your help now I am looking for a way to solve my problem with the multiprocessing modul. To give you a better understanding what I do, see my following Explanation.
Explanation:
My 'input_data' is an ndarray with 282240 elements of type uint32
In the 'calculation_function()'I use a for loop to calculate from
every 12 bit a result and put it into the 'output_data'
Because this is very slow, I split my input_data into e.g. 4 or 8
parts and calculate each part in the calculation_function().
Now I am looking for a way, how to parallize the 4 or 8 function
calls
The order of the data is elementary, because the data is in image and
each pixel have to be at the correct Position. So function call no. 1
calculates the first and the last function call the last pixel of the
image.
The calculations work fine and the image can be completly rebuilt
from my algo but I need the parallelization to speed up for time
critical aspects.
Summary:
One input ndarray is devided into 4 or 8 parts. In each part are 70560 or 35280 uint32 values. From each 12 bit I calculate one Pixel with 4 or 8 function calls. Each function returns one ndarray with 188160 or 94080 pixel. All return values will be put together in a row and reshaped into an image.
What allready works:
Calculations are allready working and I can reconstruct my image
Problem:
Function calls are done seriall and in a row but each image reconstruction is very slow
Main Goal:
Speed up the function calls by parallize the function calls.
Code:
def decompress(payload,WIDTH,HEIGHT):
# INPUTS / OUTPUTS
n_threads = 4
img_input = np.fromstring(payload, dtype='uint32')
img_output = np.zeros((WIDTH * HEIGHT), dtype=np.uint32)
n_elements_part = np.int(len(img_input) / n_threads)
input_part=np.zeros((n_threads,n_elements_part)).astype(np.uint32)
output_part =np.zeros((n_threads,np.int(n_elements_part/3*8))).astype(np.uint32)
# DEFINE PARTS (here 4 different ones)
start = np.zeros(n_threads).astype(np.int)
end = np.zeros(n_threads).astype(np.int)
for i in range(0,n_threads):
start[i] = i * n_elements_part
end[i] = (i+1) * n_elements_part -1
# COPY IMAGE DATA
for idx in range(0,n_threads):
input_part [idx,:] = img_input[start[idx]:end[idx]+1]
for idx in range(0,n_threads): # following line is the function_call that should be parallized
output_part[idx,:] = decompress_part2(input_part[idx],output_part[idx])
# COPY PARTS INTO THE IMAGE
img_output[0 : 188160] = output_part[0,:]
img_output[188160: 376320] = output_part[1,:]
img_output[376320: 564480] = output_part[2,:]
img_output[564480: 752640] = output_part[3,:]
# RESHAPE IMAGE
img_output = np.reshape(img_output,(HEIGHT, WIDTH))
return img_output
Please dont take care of my beginner programming style :)
Just looking for a solution how to parallize the function calls with the multiprocessing module and get back the return ndarrays.
Thank you so much for your help !
You can use process pool from the multiprocessing module
def test(a):
return a
from multiprocessing.dummy import Pool
p = Pool(3)
a=p.starmap(test, zip([1,2,3]))
print(a)
p.close()
p.join()
kar's answer works, however keep in mind that he's using the .dummy module which might be limited by the GIL. Heres more info on it:
multiprocessing.dummy in Python is not utilising 100% cpu
In Python, I'm trying to write an algorithm alias_freq(f_signal,f_sample,n), which behaves as follows:
def alias_freq(f_signal,f_sample,n):
f_Nyquist=f_sample/2.0
if f_signal<=f_Nyquist:
return n'th frequency higher than f_signal that will alias to f_signal
else:
return frequency (lower than f_Nyquist) that f_signal will alias to
The following is code that I have been using to test the above function (f_signal, f_sample, and n below are chosen arbitrarily just to fill out the code)
import numpy as np
import matplotlib.pyplot as plt
t=np.linspace(0,2*np.pi,500)
f_signal=10.0
y1=np.sin(f_signal*t)
plt.plot(t,y1)
f_sample=13.0
t_sample=np.linspace(0,int(f_sample)*(2*np.pi/f_sample),f_sample)
y_sample=np.sin(f_signal*t_sample)
plt.scatter(t_sample,y_sample)
n=2
f_alias=alias_freq(f_signal,f_sample,n)
y_alias=np.sin(f_alias*t)
plt.plot(t,y_alias)
plt.xlim(xmin=-.1,xmax=2*np.pi+.1)
plt.show()
My thinking is that if the function works properly, the plots of both y1 and y_alias will hit every scattered point from y_sample. So far I have been completely unsuccessful in getting either the if statement or the else statement in the function to do what I think it should, which makes me believe that either I don't understand aliasing nearly as well as I want to, or my test code is no good.
My questions are: Prelimarily, is the test code I'm using sound for what I'm trying to do? And primarily, what is the alias_freq function that I am looking for?
Also please note: If some Python package has a function just like this already built in, I'd love to hear about it - however, part of the reason I'm doing this is to give myself a device to understand phenomena like aliasing better, so I'd still like to see what my function should look like.
As far as I understood the question correctly, the frequency of the aliased signal is abs(sampling_rate * n - f_signal), where n is the closest integer multiple to f_signal.
Thus:
n = round(f_signal / float(f_sample))
f_alias = abs(f_sample * n - f_signal)
This should work for frequencies under and over Nyquist.
I figured out the answer to my and just realized that I forgot to post it here, sorry. Turns out it was something silly - Antii's answer is basically right, but the way I wrote the code I need a f_sample-1 in the alias_freq function, where I just had an f_sample. There's still a phase shift thing that happens sometimes, but just plugging in either 0 or pi for the phase shift has worked for me every time, I think it's just due to even or odd folding. The working function and test code is below.
import numpy as np
import matplotlib.pyplot as plt
#Given a sample frequency and a signal frequency, return frequency that signal frequency will be aliased to.
def alias_freq(f_signal,f_sample,n):
f_alias = np.abs((f_sample-1)*n - f_signal)
return f_alias
t=np.linspace(0,2*np.pi,500)
f_signal=13
y1=np.sin(f_signal*t)
plt.plot(t,y1)
f_sample=7
t_sample=np.linspace(0,int(f_sample)*(2*np.pi/f_sample),f_sample)
y_sample=np.sin((f_signal)*t_sample)
plt.scatter(t_sample,y_sample)
f_alias=alias_freq(f_signal,f_sample,3)
y_alias=np.sin(f_alias*t+np.pi)#Sometimes with phase shift, usually np.pi for integer f_signal and f_sample, sometimes without.
plt.plot(t,y_alias)
plt.xlim(xmin=-.1,xmax=2*np.pi+.1)
plt.show()
Here is a Python aliased frequency calculator based on numpy
def get_aliased_freq(f, fs):
"""
return aliased frequency of f sampled at fs
"""
import numpy as np
fn = fs / 2
if np.int(f / fn) % 2 == 0:
return f % fn
else:
return fn - (f % fn)
I am trying this code, and it works well, however is really slow, because the number of iterations is high.
I am thinking about threads, that should increase the performance of this script, right? Well, the question is how can I change this code to works with synchronized threads.
def get_duplicated(self):
db_pais_origuem = self.country_assoc(int(self.Pais_origem))
db_pais_destino = self.country_assoc(int(self.Pais_destino))
condicao = self.condition_assoc(int(self.Condicoes))
origem = db_pais_origuem.query("xxx")
destino = db_pais_destino.query("xxx")
origem_result = origem.getresult()
destino_result = destino.getresult()
for i in origem_result:
for a in destino_result:
text1 = i[2]
text2 = a[2]
vector1 = self.text_to_vector(text1)
vector2 = self.text_to_vector(text2)
cosine = self.get_cosine(vector1, vector2)
origem_result and destino_result structure:
[(382360, 'name abcd', 'some data'), (361052, 'name abcd', 'some data'), (361088, 'name abcd', 'some data')]
From what I can see you are computing a distance function between pairs of vectors. Given a list of vectors, v1, ..., vn, and a second list w1,...wn you want the distance/similarity between all pairs from v and w. This is usually highly amenable to parallel computations, and is sometimes referred to as an embarassingly parallel computation. IPython works very well for this.
If your distance function distance(a,b) is independent and does not depend on results from other distance function values (this is usually the case that I have seen), then you can easily use ipython parallel computing toolbox. I would recommend it over threads, queues, etc... for a wide variety of tasks, especially exploratory. However, the same principles could be extended to threads or queue module in python.
I recommend following along with http://ipython.org/ipython-doc/stable/parallel/parallel_intro.html#parallel-overview and http://ipython.org/ipython-doc/stable/parallel/parallel_task.html#quick-and-easy-parallelism It provides a very easy, gentle introduction to parallelization.
In the simple case, you simply will use the threads on your computer (or network if you want a bigger speed up), and let each thread compute as many of the distance(a,b) as it can.
Assuming a command prompt that can see the ipcluster executable command type
ipcluster start -n 3
This starts the cluster. You will want to adjust the number of cores/threads depending on your specific circumstances. Consider using n-1 cores, to allow one core to handle the scheduling.
The hello world examples goes as follows:
serial_result = map(lambda z:z**10, range(32))
from IPython.parallel import Client
rc = Client()
rc
rc.ids
dview = rc[:] # use all engines
parallel_result = dview.map_sync(lambda z: z**10, range(32))
#a couple of caveats, are this template will not work directly
#for our use case of computing distance between a matrix (observations x variables)
#because the allV data matrix and the distance function are not visible to the nodes
serial_result == parallel_result
For the sake of simplicity I will show how to compute the distance between all pairs of vectors specified in allV. Assume that each row represents a data point (observation) that has three dimensions.
Also I am not going to present this the "pedagoically corret" way, but the way that I stumbled through it wrestling with the visiblity of my functions and data on the remote nodes. I found that to be the biggest hurdle to entry
dataPoints = 10
allV = numpy.random.rand(dataPoints,3)
mesh = list(itertools.product(arange(dataPoints),arange(dataPoints)))
#given the following distance function we can evaluate locally
def DisALocal(a,b):
return numpy.linalg.norm(a-b)
serial_result = map(lambda z: DisALocal(allV[z[0]],allV[z[1]]),mesh)
parallel_result = dview.map_sync(lambda z: DisALocal(allV[z[0]],allV[z[1]]),mesh)
#will not work as DisALocal is not visible to the nodes
#also will not work as allV is not visible to the nodes
There are a few ways to define remote functions.
Depending on whether we want to send our data matrix to the nodes or not.
There are tradeoffs as to how big the matrix is, whether you want to
send lots of vectors individually to the nodes or send the entire matrix
upfront...
#in first case we send the function def to the nodes via autopx magic
%autopx
def DisARemote(a,b):
import numpy
return numpy.linalg.norm(a-b)
%autopx
#It requires us to push allV. Also note the import numpy in the function
dview.push(dict(allV=allV))
parallel_result = dview.map_sync(lambda z: DisARemote(allV[z[0]],allV[z[1]]),mesh)
serial_result == parallel_result
#here we will generate the vectors to compute differences between
#and pass the vectors only, so we do not need to load allV across the
#nodes. We must pre compute the vectors, but this could, perhaps, be
#done more cleverly
z1,z2 = zip(*mesh)
z1 = array(z1)
z2 = array(z2)
allVectorsA = allV[z1]
allVectorsB = allV[z2]
#dview.parallel(block=True)
def DisB(a,b):
return numpy.linalg.norm(a-b)
parallel_result = DisB.map(allVectorsA,allVectorsB)
serial_result == parallel_result
In the final case we will do the following
#this relies on the allV data matrix being pre loaded on the nodes.
#note with DisC we do not import numpy in the function, but
#import it via sync_imports command
with dview.sync_imports():
import numpy
#dview.parallel(block=True)
def DisC(a):
return numpy.linalg.norm(allV[a[0]]-allV[a[1]])
#the data structure must be passed to all threads
dview.push(dict(allV=allV))
parallel_result = DisC.map(mesh)
serial_result == parallel_result
All the above can be easily extended to work in a load balanced fashion
Of course, the easiest speedup (assuming if distance(a,b) = distance(b,a)) would be the following. It will only cut run time in half, but can be used with the above parallelization ideas to compute only the upper triangle of the distance matrix.
for vIndex,currentV in enumerate(v):
for wIndex,currentW in enumerate(w):
if vIndex > wIndex:
continue#we can skip the other half of the computations
distance[vIndex,wIndex] = get_cosine(currentV, currentW)
#if distance(a,b) = distance(b,a) then use this trick
distance[wIndex,vIndex] = distance[vIndex,wIndex]
I wrote a function in Python 2.7 (on Window OS 64bit) in order to calculate the mean value of of the intersection area from a reference polygon (Ref) and one or more segmented (Seg) polygon(s) in ESRI shapefile format. The code is quite slow because i have more that 2000 reference polygon (s) and for each Ref_polygon the function run for every time for all Seg polygons(s) (more than 7000). I am sorry but the function is a prototype.
I wish to know if multiprocessing can help me to increase the speed of my loop or there are more performance solutions. if multiprocessing can be a possible solution i wish to know the best way to optimize my following function
import numpy as np
import ogr
import osr,gdal
from shapely.geometry import Polygon
from shapely.geometry import Point
import osgeo.gdal
import osgeo.gdal as gdal
def AreaInter(reference,segmented,outFile):
# open shapefile
ref = osgeo.ogr.Open(reference)
if ref is None:
raise SystemExit('Unable to open %s' % reference)
seg = osgeo.ogr.Open(segmented)
if seg is None:
raise SystemExit('Unable to open %s' % segmented)
ref_layer = ref.GetLayer()
seg_layer = seg.GetLayer()
# create outfile
if not os.path.split(outFile)[0]:
file_path, file_name_ext = os.path.split(os.path.abspath(reference))
outFile_filename = os.path.splitext(os.path.basename(outFile))[0]
file_out = open(os.path.abspath("{0}\\{1}.txt".format(file_path, outFile_filename)), "w")
else:
file_path_name, file_ext = os.path.splitext(outFile)
file_out = open(os.path.abspath("{0}.txt".format(file_path_name)), "w")
# For each reference objects-i
for index in xrange(ref_layer.GetFeatureCount()):
ref_feature = ref_layer.GetFeature(index)
# get FID (=Feature ID)
FID = str(ref_feature.GetFID())
ref_geometry = ref_feature.GetGeometryRef()
pts = ref_geometry.GetGeometryRef(0)
points = []
for p in xrange(pts.GetPointCount()):
points.append((pts.GetX(p), pts.GetY(p)))
# convert in a shapely polygon
ref_polygon = Polygon(points)
# get the area
ref_Area = ref_polygon.area
# create an empty list
Area_seg, Area_intersect = ([] for _ in range(2))
# For each segmented objects-j
for segment in xrange(seg_layer.GetFeatureCount()):
seg_feature = seg_layer.GetFeature(segment)
seg_geometry = seg_feature.GetGeometryRef()
pts = seg_geometry.GetGeometryRef(0)
points = []
for p in xrange(pts.GetPointCount()):
points.append((pts.GetX(p), pts.GetY(p)))
seg_polygon = Polygon(points)
seg_Area.append = seg_polygon.area
# intersection (overlap) of reference object with the segmented object
intersect_polygon = ref_polygon.intersection(seg_polygon)
# area of intersection (= 0, No intersection)
intersect_Area.append = intersect_polygon.area
# Avarage for all segmented objects (because 1 or more segmented polygons can intersect with reference polygon)
seg_Area_average = numpy.average(seg_Area)
intersect_Area_average = numpy.average(intersect_Area)
file_out.write(" ".join(["%s" %i for i in [FID, ref_Area,seg_Area_average,intersect_Area_average]])+ "\n")
file_out.close()
You can use the multiprocessing package, and especially the Pool class. First create a function that does all the stuff you want to do within the for loop, and that takes as an argument only the index:
def process_reference_object(index):
ref_feature = ref_layer.GetFeature(index)
# all your code goes here
return (" ".join(["%s" %i for i in [FID, ref_Area,seg_Area_average,intersect_Area_average]])+ "\n")
Note that this doesn't write to a file itself- that would be messy because you'd have multiple processes writing to the same file at the same time. Instead, it returns the string that needs to be written. Also note that there are objects in this function like ref_layer or ref_geometry that will need to reach it somehow- that's up to you how to do it (you could put process_reference_object as the method in a class initialized with them, or it could be as ugly as just defining them globally).
Then, you create a pool of process resources, and run all of your indices using Pool.imap_unordered (which will itself allocate each index to a different process as necessary):
from multiprocessing import Pool
p = Pool() # run multiple processes
for l in p.imap_unordered(process_reference_object, range(ref_layer.GetFeatureCount())):
file_out.write(l)
This will parallelize the independent processing of your reference objects across multiple processes, and write them to the file (in an arbitrary order, note).
Threading can help to a degree, but first you should make sure you can't simplify the algorithm. If you're checking each of 2000 reference polygons against 7000 segmented polygons (perhaps I misunderstood), then you should start there. Stuff that runs at O(n2) is going to be slow, so maybe you can prune away things that will definitely not intersect or find some other way to speed things up. Otherwise, running multiple processes or threads will only improve things linearly when your data grows geometrically.