I have to iterate over two 1000x1000 arrays. I already reduced the resolution to 100x100 to make the iteration faster, but it still takes about 15 minutes for ONE array!
So I tried to iterate over both at the same time, for which I found this:
for index, (x,y) in ndenumerate(izip(x_array,y_array)):
but then I get the error:
ValueError: too many values to unpack
Here is my full Python code. I hope you can help me make this a lot faster, because this is for my master's thesis and in the end I have to run it about 100 times...
area_length=11
d_circle=(area_length-1)/2
xdis_new=xdis.copy()
ydis_new=ydis.copy()
ie,je=xdis_new.shape
while (np.isnan(np.sum(xdis_new))) and (np.isnan(np.sum(ydis_new))):
    xdis_interpolated=xdis_new.copy()
    ydis_interpolated=ydis_new.copy()
    # itx=np.nditer(xdis_new,flags=['multi_index'])
    # for x in itx:
    # print 'next x and y'
    for index, (x,y) in ndenumerate(izip(xdis_new,ydis_new)):
        if np.isnan(x):
            print 'index',index[0],index[1]
            print 'interpolate'
            # define indizes of interpolation area
            i1=index[0]-(area_length-1)/2
            if i1<0:
                i1=0
            i2=index[0]+((area_length+1)/2)
            if i2>ie:
                i2=ie
            j1=index[1]-(area_length-1)/2
            if j1<0:
                j1=0
            j2=index[1]+((area_length+1)/2)
            if j2>je:
                j2=je
            # -->
            print 'i1',i1,'','i2',i2
            print 'j1',j1,'','j2',j2
            area_values=xdis_new[i1:i2,j1:j2]
            print area_values
            b=area_values[~np.isnan(area_values)]
            if len(b)>=((area_length-1)/2)*4:
                xi,yi=meshgrid(arange(len(area_values[0,:])),arange(len(area_values[:,0])))
                weight=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weight_fac=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weighted_area=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=sqrt((xi-xi[(area_length-1)/2,(area_length-1)/2])*(xi-xi[(area_length-1)/2,(area_length-1)/2])+(yi-yi[(area_length-1)/2,(area_length-1)/2])*(yi-yi[(area_length-1)/2,(area_length-1)/2]))
                weight=1/d
                weight[where(d==0)]=0
                weight[where(d>d_circle)]=0
                weight[where(np.isnan(area_values))]=0
                weight_sum=np.sum(weight.flatten())
                weight_fac=weight/weight_sum
                weighted_area=area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m=nansum(weighted_area)
                xdis_interpolated[index]=m
                print 'm',m
            else:
                print 'insufficient elements'
        if np.isnan(y):
            print 'index',index[0],index[1]
            print 'interpolate'
            # define indizes of interpolation area
            i1=index[0]-(area_length-1)/2
            if i1<0:
                i1=0
            i2=index[0]+((area_length+1)/2)
            if i2>ie:
                i2=ie
            j1=index[1]-(area_length-1)/2
            if j1<0:
                j1=0
            j2=index[1]+((area_length+1)/2)
            if j2>je:
                j2=je
            # -->
            print 'i1',i1,'','i2',i2
            print 'j1',j1,'','j2',j2
            area_values=ydis_new[i1:i2,j1:j2]
            print area_values
            b=area_values[~np.isnan(area_values)]
            if len(b)>=((area_length-1)/2)*4:
                xi,yi=meshgrid(arange(len(area_values[0,:])),arange(len(area_values[:,0])))
                weight=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weight_fac=zeros((len(area_values[0,:]),len(area_values[:,0])))
                weighted_area=zeros((len(area_values[0,:]),len(area_values[:,0])))
                d=sqrt((xi-xi[(area_length-1)/2,(area_length-1)/2])*(xi-xi[(area_length-1)/2,(area_length-1)/2])+(yi-yi[(area_length-1)/2,(area_length-1)/2])*(yi-yi[(area_length-1)/2,(area_length-1)/2]))
                weight=1/d
                weight[where(d==0)]=0
                weight[where(d>d_circle)]=0
                weight[where(np.isnan(area_values))]=0
                weight_sum=np.sum(weight.flatten())
                weight_fac=weight/weight_sum
                weighted_area=area_values*weight_fac
                print 'weight'
                print weight_fac
                print 'values'
                print area_values
                print 'weighted'
                print weighted_area
                m=nansum(weighted_area)
                ydis_interpolated[index]=m
                print 'm',m
            else:
                print 'insufficient elements'
        else:
            print 'no need to interpolate'
    xdis_new=xdis_interpolated
    ydis_new=ydis_interpolated
Some advice:
Profile your code to see which part is the slowest. It may not be the iteration but the computations that need to be done each time.
Reduce function calls as much as possible. Function calls are not for free in Python.
Rewrite the slowest part as a C extension and then call that C function in your Python code (see Extending and Embedding the Python interpreter).
This page has some good advice as well.
You specifically asked about iterating over two arrays in a single loop. Here is a way to do that:
l1 = ["abc", "def", "hi"]
l2 = ["ghi", "jkl", "lst"]
for f, s in zip(l1, l2):
    print "%s : %s" % (f, s)
zip builds a full list in Python 2; for large inputs you can use itertools.izip, which returns an iterator instead (in Python 3, zip is already lazy).
You may use this as your for loop:
for index, x in ndenumerate((x_array,y_array)):
But it won't help you much, because your computer can't do two things at the same time.
Profiling is definitely a good start to identify where all the time spent actually goes.
I usually use the cProfile module, as it requires minimal overhead and gives me more than enough information.
import cProfile
import pstats
cProfile.run('main()', "ProfileData.txt", 'tottime')
p = pstats.Stats('ProfileData.txt')
p.sort_stats('cumulative').print_stats(100)
In your example you would have to wrap your code into a main() function to be able to use this code snippet at the very end of your file.
Comment #1: You don't want to use ndenumerate on the izip iterator: ndenumerate expects an array, so you end up trying to unpack the izip object itself into (x, y), which is exactly what raises the ValueError.
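As a minimal sketch (reusing the question's variable names, and assuming both arrays share the same shape), one alternative is to enumerate one array and index the other with the same index:
import numpy as np

for index, x in np.ndenumerate(xdis_new):
    y = ydis_new[index]          # the same index works for the second array
    if np.isnan(x) or np.isnan(y):
        pass                     # interpolate here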
Comment #2:
i1=index[0]-(area_length-1)/2
if i1<0:
    i1=0
could be simplified to i1 = max(index[0]-(area_length-1)/2, 0), and you could store (area_length-1)/2 and (area_length+1)/2 in dedicated variables.
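For example, a small sketch of that simplification (half and half_up are made-up names for the two precomputed offsets):
half = (area_length - 1) // 2        # lower half-width of the window
half_up = (area_length + 1) // 2     # upper half-width of the window

i1 = max(index[0] - half, 0)         # clamp at the lower edge
i2 = min(index[0] + half_up, ie)     # clamp at the upper edge
j1 = max(index[1] - half, 0)
j2 = min(index[1] + half_up, je)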
Idea #1 : try to iterate on flat versions of the arrays, i.e. with something like
for (i, (x, y)) in enumerate(izip(xdis_new.flat,ydis_new.flat)):
You could get the original indices via divmod(i, xdis_new.shape[-1]), as you should be iterating by rows first.
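A sketch of Idea #1, assuming both arrays have the same shape:
from itertools import izip   # on Python 3, plain zip does the same job

ncols = xdis_new.shape[-1]
for i, (x, y) in enumerate(izip(xdis_new.flat, ydis_new.flat)):
    row, col = divmod(i, ncols)      # recover the 2-D index
    # ... interpolate using (row, col) when x or y is NaN ...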
Idea #2 : Iterate only on the nans, i.e. indexing your arrays with np.isnan(xdis_new)|np.isnan(ydis_new), that could save you some iterations
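A sketch of Idea #2 using np.argwhere (just illustrative; the interpolation body stays as in the question):
import numpy as np

nan_cells = np.argwhere(np.isnan(xdis_new) | np.isnan(ydis_new))
for i, j in nan_cells:
    pass   # interpolate xdis_new[i, j] and/or ydis_new[i, j] here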
EDIT #1
You probably don't need to initialize d, weight_fac and weighted_area in your loop, as you compute them separately.
Your weight[where(d>d_circle)]=0 (and the other where(...) assignments) can be simplified to boolean indexing, e.g. weight[d>d_circle]=0
Do you need weight_fac ? Can't you just compute weight then normalize it in place ? That should save you some temporary arrays.
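A hedged sketch of that, reusing the names from the question's loop body (no weight_fac and no pre-allocated temporaries):
weight = 1.0 / d
weight[d == 0] = 0                      # the inf at the centre cell is overwritten
weight[d > d_circle] = 0
weight[np.isnan(area_values)] = 0
weight /= weight.sum()                  # normalise in place
m = np.nansum(area_values * weight)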
Related
I have a nested list of tuples that acts as a cache, for a script to decide which Queue to put packet data into, for its respective multiprocessing.Process() to process.
For example:
queueCache = [[('100.0.1.19035291111.0.0.255321TCP', '111.0.0.255321100.0.1.19035291TCP'), ('100.0.0.41842111.0.0.280TCP', '111.0.0.280100.0.0.41842TCP')], [('100.0.1.18722506111.0.0.345968TCP', '111.0.0.345968100.0.1.18722506TCP'), ('100.0.0.11710499111.0.0.328881TCP', '111.0.0.328881100.0.0.11710499TCP')], [('100.0.0.14950710111.0.0.339767TCP', '111.0.0.339767100.0.0.14950710TCP'), ('100.0.0.8663063111.0.0.280TCP', '111.0.0.280100.0.0.8663063TCP')]]
If the tuple ('100.0.1.19035291111.0.0.255321TCP', '111.0.0.255321100.0.1.19035291TCP') or ('111.0.0.255321100.0.1.19035291TCP', '100.0.1.19035291111.0.0.255321TCP') (the reverse) is given, 0 should be returned, indicating index 0, and that its respective data should be put into the "first" queue. If it is not in the cache, it will be sent to the queue with the shortest queue record. (This is irrelevant to the question but relevant to my example code below).
This is what I have so far:
for i in range(len(queueCache)):
    if any(x in queueCache[i] for x in ((fwd_tup, bwd_tup), (bwd_tup, fwd_tup))):
        index = i
        break
else:
    index = queueCache.index(min(queueCache, key=len))
queueCache[index].append((fwd_tup, bwd_tup))
Running some benchmarks with much larger sample data, if the tuple is at the end of the cache (which should be common, since the newest tuples are appended), 100,000 runs take about 9.5 seconds, which is probably longer than it should.
# using timeit:
print("Normal search, target at start: %f" % timeit(f"search('{fwd_tup1}', '{bwd_tup1}')", "from __main__ import search", number=100000))
print("Normal search, target at middle: %f" % timeit(f"search('{fwd_tup2}', '{bwd_tup2}')", "from __main__ import search", number=100000))
print("Normal search, target at end: %f" % timeit(f"search('{fwd_tup3}', '{bwd_tup3}')", "from __main__ import search", number=100000))
Normal search, target at start: 0.155531
Normal search, target at middle: 4.507907
Normal search, target at end: 9.470369
Perhaps there is a way to start the search from the end?
Note: The script deletes records when possible, but for specific reasons most records need to stay in the cache, thus it ends up being significantly sized.
Edit: Using a dictionary yields much better results; thank you to @SilvioMayolo and @MisterMiyagi for the help.
Transforming the sample data to a dictionary (just for this specific testing purpose):
d = {}
for i, sublist in enumerate(queueCache):
    for tup in sublist:
        d[tup] = i
Then simply:
def dictSearch(fwd_tup, bwd_tup):
    global d
    return d[(fwd_tup, bwd_tup)]
Results:
Dictionary search, target at start: 0.044860
Dictionary search, target at middle: 0.042514
Dictionary search, target at end: 0.044760
Amazing. Faster than the list search where the target is at the start!
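For completeness, a hedged sketch (dict_search is a made-up name) of how the full lookup could handle the reversed key and the shortest-queue fallback once the cache is a dictionary:
def dict_search(fwd_tup, bwd_tup):
    index = d.get((fwd_tup, bwd_tup))
    if index is None:
        index = d.get((bwd_tup, fwd_tup))       # try the reversed key
    if index is None:
        # not cached: fall back to the queue with the fewest records
        index = min(range(len(queueCache)), key=lambda i: len(queueCache[i]))
        queueCache[index].append((fwd_tup, bwd_tup))
        d[(fwd_tup, bwd_tup)] = index
    return index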
I have a video into which I want to insert a dynamic number of TextClips. I have a while loop that handles the logic for actually creating the different TextClips and giving them individual durations & start_times (this works). I do, however, have a problem with actually "compiling" the video itself with these texts inserted.
Code for creating a TextClip (that works).
text = (mpy.TextClip(str(contents), color='white', size=[1700, 395], method='caption')
        .set_duration(int(list[i - 1]))
        .set_start(currentTime)
        .set_position(("center", 85)))
print(str(i) + " written")
textList.append(text)
Code to "compile" the video. (that doesn't work)
final_clip = CompositeVideoClip([clip, len(textList)])
final_clip.write_videofile("files/final/TEST.mp4")
I tried several approaches, but now I'm stuck and can't figure out a way to continue. Before I get a lot of "answers" telling me to compile in a loop, let me just say that a single compile takes about 5 minutes and I have 100-500 different texts to insert, so compiling once per text would take days. Instead I want to add them one by one and then do one big final compile, which I know will take slightly longer than 5 minutes, but still a lot quicker than 2-3 days.
For those of you who may not have used moviepy before, here is a snippet of "my code" that actually works, though not in the way I need it to.
final_clip = CompositeVideoClip([clip, textList[0], textList[1], textList[2]])
final_clip.write_videofile("files/final/TEST.mp4")
This works exactly as intended (adding 3 texts). However, I don't know beforehand how many texts there will be in each video, so I need to somehow pass a dynamic number of textList[] entries to the function.
Kind regards,
Unsure what the arguments after clip do (you could clarify), but if the problem is solved by inserting a variable number of textList[i] arguments, the solution is simple:
CompositeVideoClip([clip, *textList])
The star unpacks the list (or any iterable); for example, with arg = (4, 5) and def f(a, b): return a + b, calling f(*arg) gives 9. If you have many textLists, you can manage them via a nested list or a dictionary:
textListDict = {'1':textList1, '2':textList2, ...}
textListNest = [textList1, textList2, ...] # probably preferable - using for below
# now iterate:
for textList in textListNest:
    final_clip = CompositeVideoClip([clip, *textList])
Unpacking demo:
def show_then_join(a, b, c):
    print("a={}, b={}, c={}".format(a, b, c))
    print(''.join("{}{}{}".format(a, b, c)))

some_list = [1, 2, 'dog']
show_then_join(*some_list)  # only one arg is passed in, but it is unpacked into three
# >> a=1, b=2, c=dog
# >> 12dog
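Tying it back to the question, a hedged sketch of the whole flow (text_specs is a made-up placeholder for whatever the existing while loop iterates over; the TextClip arguments are copied from the question):
import moviepy.editor as mpy

textList = []
for contents, duration, start in text_specs:
    textList.append(
        mpy.TextClip(str(contents), color='white', size=[1700, 395], method='caption')
           .set_duration(duration)
           .set_start(start)
           .set_position(("center", 85))
    )

final_clip = mpy.CompositeVideoClip([clip, *textList])   # clip is the base video
final_clip.write_videofile("files/final/TEST.mp4")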
Can't get my mind around this...
I read a bunch of spreadsheets, do a bunch of calculations and then want to create a summary DF from each set of calculations. I can create the initial DF but don't know how to control my loops so that I
create the initial DF (1st time through the loop)
append the next DF (last two rows) to it for each additional tab, once it has been created.
I just can't wrap my head around how to write the right loop so that once the 1st one is done the subsequent ones get appended.
My current code looks like this (it just dumbly prints each tab's results separately rather than building a consolidated sumdf with just the last 2 rows of each tab's results):
#make summary
area_tabs=['5','12']
for area_tabs in area_tabs:
    actdf,aname = get_data(area_tabs)
    lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst=do_projections(actdf)
    sumdf=merged2[-2:]
    sumdf['name']= aname #<<< I'll be doing a few more calculations here as well
    print sumdf
Still a newb learning basic python loop techniques :-(
Often a neater way than writing for loops, especially if you are planning on using the result, is to use a list comprehension over a function:
def get_sumdf(area_tab): # perhaps you can name better?
    actdf,aname = get_data(area_tab)
    lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst=do_projections(actdf)
    sumdf=merged2[-2:]
    sumdf['name']= aname #<<< I'll be doing a few more calculations here as well
    return sumdf
[get_sumdf(area_tab) for area_tab in area_tabs]
and concat:
pd.concat([get_sumdf(area_tab) for area_tab in area_tabs])
or you can also use a generator expression:
pd.concat(get_sumdf(area_tab) for area_tab in area_tabs)
To explain my comment re named tuples and dictionaries, I think this line is difficult to read and ripe for bugs:
lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst=do_projections(actdf)
A trick is to have do_projections return a named tuple, rather than a tuple:
from collections import namedtuple
Projection = namedtuple('Projection', ['lastq', 'fcast_yr', 'projections', 'yrahead', 'aname', 'actdf', 'merged2', 'mergederrs', 'montdist', 'ols_test', 'mergedfcst'])
then inside do_projections:
return (1, 2, 3, 4, ...) # don't do this
return Projection(1, 2, 3, 4, ...) # do this
return Projection(lastq=lastq, fcast_yr=fcast_yr, ...) # or this
I think this avoids bugs and is a lot cleaner, especially to access the results later.
projections = do_projections(actdf)
projections.aname
Do the initialisation outside the for loop. Something like this:
#make summary
area_tabs=['5','12']
if not area_tabs:
    return # nothing to do
# init the first frame
actdf,aname = get_data(area_tabs[0])
lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst =do_projections(actdf)
sumdf=merged2[-2:]
sumdf['name']= aname
for area_tabs in area_tabs[1:]:
    actdf,aname = get_data(area_tabs)
    lastq,fcast_yr,projections,yrahead,aname,actdf,merged2,mergederrs,montdist,ols_test,mergedfcst =do_projections(actdf)
    sumdf=merged2[-2:]
    sumdf['name']= aname #<<< I'll be doing a few more calculations here as well
    print sumdf
You can further improve the code by putting the common steps into a function.
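For instance, a hedged sketch of that refactor (build_sumdf is a made-up name, and it ends up very close to get_sumdf from the first answer; it accumulates with pd.concat):
import pandas as pd

def build_sumdf(area_tab):
    # the steps shared by the first tab and every later tab
    actdf, aname = get_data(area_tab)
    lastq, fcast_yr, projections, yrahead, aname, actdf, merged2, mergederrs, \
        montdist, ols_test, mergedfcst = do_projections(actdf)
    sumdf = merged2[-2:]
    sumdf['name'] = aname
    return sumdf

sumdf = build_sumdf(area_tabs[0])
for area_tab in area_tabs[1:]:
    sumdf = pd.concat([sumdf, build_sumdf(area_tab)])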
I have this task that I've been working on, but am having extreme misgivings about my methodology.
So the problem is that I have a ton of excel files that are formatted strangely (and not consistently) and I need to extract certain fields for each entry. An example data set is
My original approach was this:
Export to csv
Separate into counties
Separate into districts
Analyze each district individually, pull out values
write to output.csv
The problem I've run into is that the format (seemingly well organized) is almost random across files. Each line contains the same fields, but in a different order, spacing, and wording. I wrote a script to correctly process one file, but it doesn't work on any other files.
So my question is, is there a more robust method of approaching this problem rather than simple string processing? What I had in mind was more of a fuzzy logic approach for trying to pin which field an item was, which could handle the inputs being a little arbitrary. How would you approach this problem?
If it helps clear up the problem, here is the script I wrote:
# This file takes a tax CSV file as input
# and separates it into counties
# then appends each county's entries onto
# the end of the master out.csv
# which will contain everything including
# taxes, bonds, etc from all years

#import the data csv
import sys
import re
import csv

def cleancommas(x):
    toggle=False
    for i,j in enumerate(x):
        if j=="\"":
            toggle=not toggle
        if toggle==True:
            if j==",":
                x=x[:i]+" "+x[i+1:]
    return x

def districtatize(x):
    #list indexes of entries starting with "for" or "to" of length >5
    indices=[1]
    for i,j in enumerate(x):
        if len(j)>2:
            if j[:2]=="to":
                indices.append(i)
        if len(j)>3:
            if j[:3]==" to" or j[:3]=="for":
                indices.append(i)
        if len(j)>5:
            if j[:5]==" \"for" or j[:5]==" \'for":
                indices.append(i)
        if len(j)>4:
            if j[:4]==" \"to" or j[:4]==" \'to" or j[:4]==" for":
                indices.append(i)
    if len(indices)==1:
        return [x[0],x[1:len(x)-1]]
    new=[x[0],x[1:indices[1]+1]]
    z=1
    while z<len(indices)-1:
        new.append(x[indices[z]+1:indices[z+1]+1])
        z+=1
    return new
    #should return a list of lists. First entry will be county
    #each successive element in list will be list by district

def splitforstos(string):
    for itemind,item in enumerate(string): # take all exception cases that didn't get processed
        splitfor=re.split('(?<=\d)\s\s(?=for)',item) # correctly and split them up so that the for begins
        splitto=re.split('(?<=\d)\s\s(?=to)',item) # a cell
        if len(splitfor)>1:
            print "\n\n\nfor detected\n\n"
            string.remove(item)
            string.insert(itemind,splitfor[0])
            string.insert(itemind+1,splitfor[1])
        elif len(splitto)>1:
            print "\n\n\nto detected\n\n"
            string.remove(item)
            string.insert(itemind,splitto[0])
            string.insert(itemind+1,splitto[1])

def analyze(x):
    #input should be a string of content
    #target values are nomills,levytype,term,yearcom,yeardue
    clean=cleancommas(x)
    countylist=clean.split(',')
    emptystrip=filter(lambda a: a != '',countylist)
    empt2strip=filter(lambda a: a != ' ', emptystrip)
    singstrip=filter(lambda a: a != '\' \'',empt2strip)
    quotestrip=filter(lambda a: a !='\" \"',singstrip)
    splitforstos(quotestrip)
    distd=districtatize(quotestrip)
    print '\n\ndistrictized\n\n',distd
    county = distd[0]
    for x in distd[1:]:
        if len(x)>8:
            district=x[0]
            vote1=x[1]
            votemil=x[2]
            spaceindex=[m.start() for m in re.finditer(' ', votemil)][-1]
            vote2=votemil[:spaceindex]
            mills=votemil[spaceindex+1:]
            votetype=x[4]
            numyears=x[6]
            yearcom=x[8]
            yeardue=x[10]
            reason=x[11]
            data = [filename,county,district, vote1, vote2, mills, votetype, numyears, yearcom, yeardue, reason]
            print "data",data
        else:
            print "x\n\n",x
            district=x[0]
            vote1=x[1]
            votemil=x[2]
            spaceindex=[m.start() for m in re.finditer(' ', votemil)][-1]
            vote2=votemil[:spaceindex]
            mills=votemil[spaceindex+1:]
            votetype=x[4]
            special=x[5]
            splitspec=special.split(' ')
            try:
                forind=[i for i,j in enumerate(splitspec) if j=='for'][0]
                numyears=splitspec[forind+1]
                yearcom=splitspec[forind+6]
            except:
                forind=[i for i,j in enumerate(splitspec) if j=='commencing'][0]
                numyears=None
                yearcom=splitspec[forind+2]
            yeardue=str(x[6])[-4:]
            reason=x[7]
            data = [filename,county,district,vote1,vote2,mills,votetype,numyears,yearcom,yeardue,reason]
            print "data other", data
        openfile=csv.writer(open('out.csv','a'),delimiter=',', quotechar='|',quoting=csv.QUOTE_MINIMAL)
        openfile.writerow(data)

# call the file like so: python tax.py 2007May8Tax.csv
filename = sys.argv[1] #the file is the first argument
f=open(filename,'r')
contents=f.read() #entire csv as string
#find index of every instance of the word county
separators=[m.start() for m in re.finditer('\w+\sCOUNTY',contents)] #alternative implementation in regex
# split contents into sections by county
# analyze each section and append to out.csv
for x,y in enumerate(separators):
    try:
        data = contents[y:separators[x+1]]
    except:
        data = contents[y:]
    analyze(data)
is there a more robust method of approaching this problem rather than simple string processing?
Not really.
What I had in mind was more of a fuzzy logic approach for trying to pin which field an item was, which could handle the inputs being a little arbitrary. How would you approach this problem?
After a ton of analysis and programming, it won't be significantly better than what you've got.
Reading stuff prepared by people requires -- sadly -- people-like brains.
You can mess with NLTK to try and do a better job, but it doesn't work out terribly well either.
You don't need a radically new approach. You need to streamline the approach you have.
For example.
district=x[0]
vote1=x[1]
votemil=x[2]
spaceindex=[m.start() for m in re.finditer(' ', votemil)][-1]
vote2=votemil[:spaceindex]
mills=votemil[spaceindex+1:]
votetype=x[4]
numyears=x[6]
yearcom=x[8]
yeardue=x[10]
reason=x[11]
data = [filename,county,district, vote1, vote2, mills, votetype, numyears, yearcom, yeardue, reason]
print "data",data
Might be improved by using a named tuple.
Then build something like this.
data = SomeSensibleName(
    district=x[0],
    vote1=x[1], ... etc.
)
So that you're not creating a lot of intermediate (and largely uninformative) loose variables.
Also, keep looking at your analyze function (and any other function) to pull out the various "pattern matching" rules. The idea is that you'll examine a county's data, step through a bunch of functions until one matches the pattern; this will also create the named tuple. You want something like this.
for p in ( some, list, of, functions ):
    match = p(data)
    if match:
        return match
Each function either returns a named tuple (because it liked the row) or None (because it didn't like the row).
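A hedged sketch of what one of those pattern functions might look like (Row and match_long_form are made-up names; the fields and the len(x)>8 test are borrowed from the question's analyze function):
from collections import namedtuple

Row = namedtuple('Row', ['district', 'vote1', 'vote2', 'mills', 'votetype',
                         'numyears', 'yearcom', 'yeardue', 'reason'])

def match_long_form(x):
    # handles the 12-field layout covered by the len(x)>8 branch; returns None otherwise
    if len(x) <= 8:
        return None
    votemil = x[2]
    spaceindex = votemil.rfind(' ')
    return Row(district=x[0], vote1=x[1],
               vote2=votemil[:spaceindex], mills=votemil[spaceindex+1:],
               votetype=x[4], numyears=x[6], yearcom=x[8],
               yeardue=x[10], reason=x[11])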
Novice programmer here. I'm writing a program that analyzes the relative spatial locations of points (cells). The program gets boundaries and cell type off an array with the x coordinate in column 1, y coordinate in column 2, and cell type in column 3. It then checks each cell for cell type and appropriate distance from the bounds. If it passes, it then calculates its distance from each other cell in the array and if the distance is within a specified analysis range it adds it to an output array at that distance.
My cell marking program is in wxpython so I was hoping to develop this program in python as well and eventually stick it into the GUI. Unfortunately right now python takes ~20 seconds to run the core loop on my machine while MATLAB can do ~15 loops/second. Since I'm planning on doing 1000 loops (with a randomized comparison condition) on ~30 cases times several exploratory analysis types this is not a trivial difference.
I tried running a profiler and array calls are about 1/4 of the time; almost all of the rest is unspecified loop time.
Here is the python code for the main loop:
for basecell in range (0, cellnumber-1):
    if firstcelltype == np.array((cellrecord[basecell,2])):
        xloc=np.array((cellrecord[basecell,0]))
        yloc=np.array((cellrecord[basecell,1]))
        xedgedist=(xbound-xloc)
        yedgedist=(ybound-yloc)
        if xloc>excludedist and xedgedist>excludedist and yloc>excludedist and yedgedist>excludedist:
            for comparecell in range (0, cellnumber-1):
                if secondcelltype==np.array((cellrecord[comparecell,2])):
                    xcomploc=np.array((cellrecord[comparecell,0]))
                    ycomploc=np.array((cellrecord[comparecell,1]))
                    dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
                    dist=round(dist)
                    if dist>=1 and dist<=analysisdist:
                        arraytarget=round(dist*analysisdist/intervalnumber)
                        addone=np.array((spatialraw[arraytarget-1]))
                        addone=addone+1
                        targetcell=arraytarget-1
                        np.put(spatialraw,[targetcell,targetcell],addone)
Here is the matlab code for the main loop:
for basecell = 1:cellnumber;
if firstcelltype==cellrecord(basecell,3);
xloc=cellrecord(basecell,1);
yloc=cellrecord(basecell,2);
xedgedist=(xbound-xloc);
yedgedist=(ybound-yloc);
if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
for comparecell = 1:cellnumber;
if secondcelltype==cellrecord(comparecell,3);
xcomploc=cellrecord(comparecell,1);
ycomploc=cellrecord(comparecell,2);
dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
if (dist>=1) && (dist<=100.4999);
arraytarget=round(dist*analysisdist/intervalnumber);
spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
end
end
end
end
end
end
Thanks!
Here are some ways to speed up your python code.
First: Don't make np arrays when you are only storing one value. You do this many times over in your code. For instance,
if firstcelltype == np.array((cellrecord[basecell,2])):
can just be
if firstcelltype == cellrecord[basecell,2]:
I'll show you why with some timeit statements:
>>> timeit.Timer('x = 111.1').timeit()
0.045882196294822819
>>> timeit.Timer('x = np.array(111.1)','import numpy as np').timeit()
0.55774970267830071
That's an order of magnitude in difference between those calls.
Second: The following code:
arraytarget=round(dist*analysisdist/intervalnumber)
addone=np.array((spatialraw[arraytarget-1]))
addone=addone+1
targetcell=arraytarget-1
np.put(spatialraw,[targetcell,targetcell],addone)
can be replaced with
arraytarget=round(dist*analysisdist/intervalnumber)-1
spatialraw[arraytarget] += 1
Third: You can get rid of the sqrt as Philip mentioned by squaring analysisdist beforehand. However, since you use analysisdist to get arraytarget, you might want to create a separate variable, analysisdist2 that is the square of analysisdist and use that for your comparison.
Fourth: You are looking for cells that match secondcelltype every time you get to that point rather than finding those one time and using the list over and over again. You could define an array:
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
and then replace
for comparecell in range (0, cellnumber-1):
    if secondcelltype==np.array((cellrecord[comparecell,2])):
with
for comparecell in comparecells:
Fifth: Use psyco. It is a JIT compiler. Matlab has a built-in JIT compiler if you're using a somewhat recent version. This should speed-up your code a bit.
Sixth: If the code still isn't fast enough after all previous steps, then you should try vectorizing your code. It shouldn't be too difficult. Basically, the more stuff you can have in numpy arrays the better. Here's my try at vectorizing:
basecells = np.where(cellrecord[:,2]==firstcelltype)[0]
xlocs = cellrecord[basecells, 0]
ylocs = cellrecord[basecells, 1]
xedgedists = xbound - xlocs
yedgedists = ybound - ylocs
whichcells = np.where((xlocs>excludedist) & (xedgedists>excludedist) & (ylocs>excludedist) & (yedgedists>excludedist))[0]
selectedcells = basecells[whichcells]   # row indices into cellrecord that pass the edge test
comparecells = np.where(cellrecord[:,2]==secondcelltype)[0]
xcomplocs = cellrecord[comparecells,0]
ycomplocs = cellrecord[comparecells,1]
analysisdist2 = analysisdist**2
for basecell in selectedcells:
    # squared distances from this base cell to every compare cell
    dists = np.round((xcomplocs-cellrecord[basecell,0])**2 + (ycomplocs-cellrecord[basecell,1])**2)
    whichcells = np.where((dists >= 1) & (dists <= analysisdist2))[0]
    arraytargets = (np.round(np.sqrt(dists[whichcells])*analysisdist/intervalnumber) - 1).astype(int)
    for target in arraytargets:
        spatialraw[target] += 1
You can probably take out that inner for loop, but you have to be careful because some of the elements of arraytargets could be the same. Also, I didn't actually try out all of the code, so there could be a bug or typo in there. Hopefully, it gives you a good idea of how to do this. Oh, one more thing: you can make analysisdist/intervalnumber a separate variable to avoid doing that division over and over again.
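If you do want to drop that inner loop, one option (not part of the original answer, just a sketch) is np.add.at, which applies the increments unbuffered and therefore counts repeated indices correctly:
# a plain spatialraw[arraytargets] += 1 would count duplicate targets only once;
# np.add.at increments once per occurrence
np.add.at(spatialraw, arraytargets, 1)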
Not too sure about the slowness of the python, but your Matlab code can be HIGHLY optimized. Nested for-loops tend to have horrible performance issues. You can replace the inner loop with a vectorized function ... as below:
for basecell = 1:cellnumber;
    if firstcelltype==cellrecord(basecell,3);
        xloc=cellrecord(basecell,1);
        yloc=cellrecord(basecell,2);
        xedgedist=(xbound-xloc);
        yedgedist=(ybound-yloc);
        if (xloc>excludedist) && (yloc>excludedist) && (xedgedist>excludedist) && (yedgedist>excludedist);
            % for comparecell = 1:cellnumber;
            %     if secondcelltype==cellrecord(comparecell,3);
            %         xcomploc=cellrecord(comparecell,1);
            %         ycomploc=cellrecord(comparecell,2);
            %         dist=sqrt((xcomploc-xloc)^2+(ycomploc-yloc)^2);
            %         if (dist>=1) && (dist<=100.4999);
            %             arraytarget=round(dist*analysisdist/intervalnumber);
            %             spatialsum(1,arraytarget)=spatialsum(1,arraytarget)+1;
            %         end
            %     end
            % end
            % replace with:
            secondcelltype_mask = secondcelltype == cellrecord(:,3);
            xcomploc_vec = cellrecord(secondcelltype_mask,1);
            ycomploc_vec = cellrecord(secondcelltype_mask,2);
            dist_vec = sqrt((xcomploc_vec-xloc).^2+(ycomploc_vec-yloc).^2);
            dist_mask = dist_vec>=1 & dist_vec<=100.4999;
            arraytarget_vec = round(dist_vec(dist_mask)*analysisdist/intervalnumber);
            count = accumarray(arraytarget_vec(:),1,[size(spatialsum,2),1]);
            spatialsum(1,:) = spatialsum(1,:) + count';
        end
    end
end
There may be some small errors in there since I don't have any data to test the code with, but it should give a ~10X speed-up over the original Matlab code.
From my experience with numpy I've noticed that swapping out for-loops for vectorized/matrix-based arithmetic gives noticeable speed-ups as well. However, without the shapes of all of your variables it's hard to vectorize things.
You can avoid some of the math.sqrt calls by replacing the lines
dist=math.sqrt((xcomploc-xloc)**2+(ycomploc-yloc)**2)
dist=round(dist)
if dist>=1 and dist<=analysisdist:
    arraytarget=round(dist*analysisdist/intervalnumber)
with
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
dist=round(dist)
if dist>=1 and dist<=analysisdist_squared:
    arraytarget=round(math.sqrt(dist)*analysisdist/intervalnumber)
where you have the line
analysisdist_squared = analysisdist * analysisdist
outside of the main loop of your function.
Since math.sqrt is called in the innermost loop, you should have from math import sqrt at the top of the module and just call the function as sqrt.
I would also try replacing
dist=(xcomploc-xloc)**2+(ycomploc-yloc)**2
with
dist=(xcomploc-xloc)*(xcomploc-xloc)+(ycomploc-yloc)*(ycomploc-yloc)
There's a chance it will produce faster byte code to do multiplication rather than exponentiation.
I doubt these will get you all the way to MATLABs performance, but they should help reduce some overhead.
If you have a multicore machine, you could maybe give the multiprocessing module a try and use multiple processes to make use of all the cores.
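A very rough sketch of that idea (histogram_for_cell is a made-up name; it assumes cellrecord, spatialraw, cellnumber, etc. are defined at module level so the worker processes can see them):
from multiprocessing import Pool
import numpy as np

def histogram_for_cell(basecell):
    # run the existing inner loop for one base cell, writing into a local
    # array instead of the shared spatialraw
    local = np.zeros_like(spatialraw)
    # ... inner-loop logic from the question goes here ...
    return local

if __name__ == '__main__':
    pool = Pool()                                   # one worker per CPU core by default
    partials = pool.map(histogram_for_cell, range(cellnumber))
    spatialraw = sum(partials)                      # merge the per-cell histograms
    pool.close()
    pool.join()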
Instead of sqrt you could use x**0.5, which is, if I remember correctly, slightly faster.