import os
import shutil
import time

import cv2
from skimage.metrics import structural_similarity as ssim

def SSIM_compute(files, WorkingFolder, DestinationAlikeFolder, DestinationUniqueFolder, results, start_time):
    NumberAlike = 0
    loop = 1
    while True:
        files = os.listdir(WorkingFolder)
        if files == []:
            break
        IsAlike = False
        CountAlike = 1
        print("Loop : " + str(loop) + " --- elapsed : " + str(time.time() - start_time))
        for i in range(1, len(files)):
            img1 = cv2.imread(WorkingFolder + "/" + files[0])
            img2 = cv2.imread(WorkingFolder + "/" + files[i])
            x1, y1 = img1.shape[:2]
            x2, y2 = img2.shape[:2]
            x = min(x1, x2)
            y = min(y1, y2)
            # cv2.resize expects (width, height)
            img1 = cv2.resize(img1, (y, x))
            img2 = cv2.resize(img2, (y, x))
            threshold = ssim(img1, img2, multichannel=True)
            if threshold > 0.8:
                IsAlike = True
                if os.path.exists(WorkingFolder + "/" + files[i]):
                    shutil.move(WorkingFolder + "/" + files[i],
                                DestinationAlikeFolder + "/alike" + str(NumberAlike) + "_" + str(CountAlike) + ".jpg")
                    results.write("ALIKE : /alike" + str(NumberAlike) + "_0.jpg --- /alike"
                                  + str(NumberAlike) + "_" + str(CountAlike) + ".jpg -> " + str(threshold) + "\n")
                    CountAlike += 1
        if IsAlike:
            if os.path.exists(WorkingFolder + "/" + files[0]):
                shutil.move(WorkingFolder + "/" + files[0],
                            DestinationAlikeFolder + "/alike" + str(NumberAlike) + "_0.jpg")
            NumberAlike += 1
        else:
            if os.path.exists(WorkingFolder + "/" + files[0]):
                shutil.move(WorkingFolder + "/" + files[0], DestinationUniqueFolder)
        loop += 1
I have this code that must compare images to determine whether they are identical or whether some of them were modified (compression, artefacts, etc.).
To check whether two images are strictly identical, I just compute and compare their respective hashes (in another function, not shown here); to check whether they are merely similar, I compute the SSIM of the two files.
This is where the trouble begins: when I test this code on a fairly small set of pictures (about 50), the execution time is decent, but on a bigger set (around 200 pictures) it becomes far too long (several hours), as expected given the two nested loops.
As I'm not very creative, does anybody have ideas for reducing the execution time on a larger dataset? Maybe a method that avoids those nested loops?
Thank you for any help :)
You're comparing each image with every other image. You could pull the reading of the first image (img1) out of the for loop and do it just once per file.
But since you compare each file with every other file, the work grows as O(N^2/2), i.e. 200 pictures will be roughly 8x slower than 50. You could resize to a much smaller size, like 64x64, which would be much quicker to compare with ssim(), and run a full-size comparison only when the images are similar at that small size.
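For illustration, a rough sketch of that two-stage idea (the 64x64 thumbnail size and the 0.7 coarse threshold are assumptions to tune, not values from the question):

import cv2
from skimage.metrics import structural_similarity as ssim

def probably_alike(path1, path2, coarse=0.7, fine=0.8):
    img1, img2 = cv2.imread(path1), cv2.imread(path2)
    # cheap first pass on small thumbnails
    t1 = cv2.resize(img1, (64, 64))
    t2 = cv2.resize(img2, (64, 64))
    if ssim(t1, t2, multichannel=True) < coarse:
        return False
    # full-size comparison only for pairs that survive the cheap pass
    h = min(img1.shape[0], img2.shape[0])
    w = min(img1.shape[1], img2.shape[1])
    img1 = cv2.resize(img1, (w, h))
    img2 = cv2.resize(img2, (w, h))
    return ssim(img1, img2, multichannel=True) > fine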
I wrote a function (in Python) that saves an image with a certain quality level (0 to 100, higher is better). Depending on the quality level, the final file is bigger or smaller (in bytes).
I need to write a function that selects the best possible quality level for an image while keeping it under a maximum file size (in bytes). The only way I can do it is by trial and error.
The simplest approach is: save the file with a certain quality, and if it is bigger than expected, reduce the quality and try again.
Unfortunately this approach is very time consuming. Even if I reduce the quality by five points at each iteration, I risk having to save the same file 21 times before I find the right quality.
Another solution is a binary search: halve the quality range at each step and focus on the lower or the higher half according to the result.
Let me clarify:

1. assuming that quality can be between 0 and 100, try quality = 50;
2. if the file size is higher than expected, focus on the lower quality range (e.g. 0-49), otherwise on the higher one (e.g. 51-100);
3. set the quality to the middle of the considered range and save the file (e.g. 25 for the lower range, 75 for the higher one); return to step 2;
4. exit when the range is smaller than 5.

This second solution always requires exactly 6 iterations: the 100-point range shrinks to 50, 25, 12, 6 and finally 3, at which point the loop exits.
Here is a Python implementation:
limit_file_size = 512  # maximum file size, change this for different results
q_max = 100
q_min = 0
quality = q_min + (q_max - q_min) // 2
while True:
    file_size = save_file(quality)
    if (q_max - q_min) <= 5:
        break
    if file_size > limit_file_size:
        q_max = quality
    else:
        q_min = quality
    quality = q_min + (q_max - q_min) // 2
Please note that the function save_file is not shown for brevity; a fake implementation could be the following:
import math

def save_file(quality):
    return int(math.sqrt(quality)) * 100
How can I reduce the number of cycles the above function needs to converge to a valid solution?
You can do the binary search you mentioned and cache the result in a dictionary, so that on the next iteration it first checks whether the quality for that file size has already been calculated, like this:

# { file_size : quality }
{ 512 : 100, 256 : 100, 1024 : 27 }

Note that each image has different dimensions, color depth, format, etc., so you may get a range of results. I suggest playing with it and creating sub-keys on the properties that have the most impact, for example:

{ format : { file_size : quality } }
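For illustration, a minimal sketch of this caching idea (the helper names are made up; save_file(quality) is assumed to behave as in the question):

# cache: { (format, limit_file_size) : quality }
quality_cache = {}

def best_quality(img_format, limit_file_size, save_file):
    key = (img_format, limit_file_size)
    if key in quality_cache:
        return quality_cache[key]          # skip the search entirely
    q_min, q_max = 0, 100
    quality = q_min + (q_max - q_min) // 2
    while (q_max - q_min) > 5:             # same binary search as above
        if save_file(quality) > limit_file_size:
            q_max = quality
        else:
            q_min = quality
        quality = q_min + (q_max - q_min) // 2
    quality_cache[key] = quality
    return quality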
You could take a simple ML (machine learning) approach. Train a model that, given

image size at quality 50
limit_file_size

as input, will produce the best quality you seek, or at least narrow down your search so that you need 2-3 iterations instead of 6.
So you would have to gather training (and validation) data, and train a model that is simple and fast (it should be faster than save_file).
I think this is the best approach in terms of determining the quality as fast as possible, but it requires a lot of work (especially if you have no experience in ML).
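For illustration, a hedged sketch of that idea using scikit-learn's LinearRegression (the two features, the toy training data, and the model choice are all assumptions, not part of the answer above):

import numpy as np
from sklearn.linear_model import LinearRegression

# training data gathered offline: for each image, its file size at
# quality 50, the size limit in force, and the quality that the full
# binary search eventually found
X_train = np.array([[41000, 512], [12500, 512], [88000, 1024]])
y_train = np.array([35, 80, 20])

model = LinearRegression().fit(X_train, y_train)

def predict_quality(size_at_q50, limit_file_size):
    # the prediction seeds the search; one or two save_file() calls
    # around it can then confirm or adjust the result
    q = model.predict([[size_at_q50, limit_file_size]])[0]
    return int(np.clip(q, 0, 100))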
I would like to find the fastest way to generate ~10^9 Poisson random numbers in Python/NumPy. For instance, say I have a mean Poisson parameter (calculated elsewhere) of shape (1000, 2000), and I need 500 independent samples. This is a bottleneck in my code, taking several minutes to complete. I have tried three methods, but am looking for something faster:
import numpy as np
# example parameters
nsamples = 500
nmeas = 2000
ninputs = 1000
lambdax = np.ones([ninputs, nmeas]) * 20
# numpy, one big array
sample0 = np.random.poisson(lam=lambdax, size=(nsamples, ninputs, nmeas))
# numpy, current version where other code happens in the loop
sample1 = np.zeros([nsamples, ninputs, nmeas])
for i in range(nsamples):
    sample1[i, :, :] = np.random.poisson(lam=lambdax)
# scipy
from scipy.stats import poisson
sample2 = poisson.rvs(lambdax, size=(nsamples, ninputs, nmeas))
Results:
sample0: 1 m 16 s
sample1: 1 m 20 s
sample2: 1 m 50 s
Not shown here, I am also parallelizing the independent samples via multiprocessing, but the calculations are still pretty expensive for such large parameters. Is there a better way?
I have been in your shoes and here are my suggestions:
For large mean values, the Poisson distribution is well approximated by the normal distribution; check out this post (and probably more if you search).
~1 m of runtime seems reasonable for generating such a large quantity of random numbers. I don't think you can beat the sample0 method by much through coding alone. Now, depending on what you want to do with the random numbers:
if your issue is rerunning the program multiple times, try saving sample0 to a file and reloading it on the next runs;
if not, I suggest creating a smaller number of randoms and reusing them. Many of the values in sample0 will be repeated anyway, depending on your mean value, so you can create a smaller pool and randomly choose from it; a given value would appear in sample0 well over 100 times anyway, so drawing it from a pool that many times changes little. A sketch of both ideas follows below.
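For illustration, a rough sketch of both ideas at the question's scale (the pool size of 100000 is an arbitrary knob, and the pool trick is only valid where entries share the same mean):

import numpy as np

nsamples, ninputs, nmeas = 500, 1000, 2000
lam = 20.0

# (a) normal approximation: for large means, Poisson(lam) is close to
#     Normal(lam, sqrt(lam)); rounding recovers integer counts
approx = np.rint(np.random.normal(lam, np.sqrt(lam),
                                  size=(nsamples, ninputs, nmeas)))

# (b) reuse a small pool: draw a modest number of true Poisson variates
#     once, then sample indices into the pool instead of fresh variates
pool = np.random.poisson(lam=lam, size=100000)
idx = np.random.randint(0, pool.size, size=(nsamples, ninputs, nmeas))
sample = pool[idx]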
If you provide more information on what you intend to do with the random numbers, we might be able to help more. Otherwise, coding-wise I am not sure you can do much further.
I'm looping through a very large dataframe (11361 x 22679) and converting the values of each row to a pixel image using pyplot. In the end I should have 11361 images of 151 x 151 pixels (I pad each row with zeros to make it square).
allDF is a list of 33 DataFrames that correspond to the 33 subdirectories in newFileNames that the images need to be saved to.
I've tried deleting each DataFrame and image at the end of each iteration.
I've tried converting the float values to int.
I've tried gc.collect() at the end of each iteration (even though I know it's redundant).
I've taken measures not to store any additional values by always referencing the original data.
The only thing that helps is processing one frame at a time. It still slows down, but because there are fewer iterations it's not as slow. So I think the inner loop or one of the functions is the issue.
def shape_pixels(imglist):
    # pad the row with zeros so it reshapes to a 151 x 151 square
    for i in range(122):
        imglist.append(0.0)
    return np.array(imglist).reshape((151, 151))

def create_rbg_image(subpath, imgarr, imgname):
    # create/save image
    img = plt.imshow(imgarr, cmap=rgbmap)
    plt.axis('off')
    plt.savefig(dirpath + subpath + imgname,
                transparent=True,
                bbox_inches=0, pad_inches=0)

for i in range(len(allDF)):
    for j in range(len(allDF[i])):
        fname = allDF[i]['File Name'].iloc[j][0:36]
        imgarr = shape_pixels(allDF[i].iloc[j][1:].tolist())
        create_rbg_image(newFileNames[i] + '\\', imgarr, fname)
I'd like to be able to run the code for the entire dataset and just come back to it when it's done, but I ran it overnight and got less than a third of the way through. If it continues to slow down, I'll never finish.
The first minute generates over 150 images, the second generates 80, then 48, 32, 27, and so on; eventually it takes several minutes to create just one.
plt.close('all') helped significantly, but I switched to using PIL and hexadecimal values instead. That was significantly more efficient, and I was able to generate all 11k+ images in under 20 minutes.
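For reference, a hedged sketch of that PIL route (the grayscale mapping is an assumption, since the exact hexadecimal color encoding isn't shown); unlike repeated plt.savefig() calls, no figure state accumulates between images:

import numpy as np
from PIL import Image

def save_image(imgarr, path):
    # scale the 151 x 151 float array into 0-255 and write it directly;
    # no matplotlib figure is created or kept alive
    span = imgarr.max() - imgarr.min()
    scaled = np.uint8(255 * (imgarr - imgarr.min()) / (span if span else 1))
    Image.fromarray(scaled, mode="L").save(path)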
I need to calculate the distance between pairs of xyz points in massive data (100 GB, about 20 trillion points). I am trying to speed up this loop. I created a KD-tree, added parallel calculation, and split my array into smaller parts, so I guess all that's left to speed up is this loop. My pure Python version took about 10 hours 42 minutes. NumPy reduced that to 5 hours 34 minutes, and numba brought it down to 4 hours 15 minutes, but it is still not fast enough. I have heard that Cython is the fastest way to do calculations in Python, but I have no experience in C and don't know how to translate my function to Cython code. How can I get this loop to run faster, using Cython or any other way?
def controller(point_array, las_point_array):
    empty = []
    tree = spatial.cKDTree(point_array, leafsize=1000, copy_data=True)
    empty = __pure_calc(las_point_array, point_array, empty, tree)
    return ptList

#############################################################################################
#autojit
def __pure_calc(las_point_array, point_array, empty, tree):
    for i in las_point_array:
        p = tree.query(i)
        euc_dist = math.sqrt(np.sum((point_array[p[1]] - i) ** 2))
        # add one row at a time to the empty list
        empty.append([i[0], i[1], i[2], euc_dist,
                      point_array[p[1]][0], point_array[p[1]][1], point_array[p[1]][2]])
    return empty
I attach sample data for testing:
Sample
Your function builds a list (closestPt) that ends up looking like this:
[
[i0[0], i0[1], i0[2], distM0],
[i1[0], i1[1], i1[2], distM1],
...
]
The first thing you should do is preallocate the entire result as a NumPy array (np.empty()) and write into it one row at a time. This avoids a ton of memory allocations. You will then notice that the sqrt() can be deferred to the very end and run once on the whole distance column after your loops are done.
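A minimal sketch of that preallocation, adapted to the seven-column rows in the question's code (point_array, las_point_array and tree are assumed to be set up as in the question):

import numpy as np

def pure_calc_prealloc(las_point_array, point_array, tree):
    n = len(las_point_array)
    out = np.empty((n, 7))                 # x, y, z, dist, nx, ny, nz
    for row, pt in enumerate(las_point_array):
        _, idx = tree.query(pt)
        nearest = point_array[idx]
        out[row, :3] = pt
        out[row, 3] = np.sum((nearest - pt) ** 2)   # squared distance for now
        out[row, 4:] = nearest
    out[:, 3] = np.sqrt(out[:, 3])         # single vectorized sqrt at the end
    return out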
There may be more optimization opportunities if you post a full working test harness with random/sample input data.
The key is to use vectorized functions as much as possible, since any call to a pure Python function inside the loop will more or less defeat the autojit (the pure function call becomes the bottleneck).
I noticed that the query function is vectorizable, and so is the Euclidean distance calculation.
I'm not sure what your ptList variable in the controller function is (the example is a bit faulty), but assuming it is the output of your jit function, or close enough to it, you should be able to do something like this:
def controller(point_array, las_point_array):
    tree = spatial.cKDTree(point_array, leafsize=1000, copy_data=True)
    distances, pt_idx = tree.query(las_point_array)
    nearest_pts = point_array[pt_idx]
    # square the differences first, then sum, then take the root
    euc_distances = np.sqrt(((nearest_pts - las_point_array) ** 2).sum(axis=1))
    result = np.vstack((las_point_array.T, euc_distances, nearest_pts.T)).T
    return result
I have implemented a naive merge sort algorithm in Python. The algorithm and test code are below:
import time
import random
import matplotlib.pyplot as plt
import math
from collections import deque
def sort(unsorted):
    if len(unsorted) <= 1:
        return unsorted
    to_merge = deque(deque([elem]) for elem in unsorted)
    while len(to_merge) > 1:
        left = to_merge.popleft()
        right = to_merge.popleft()
        to_merge.append(merge(left, right))
    return to_merge.pop()

def merge(left, right):
    result = deque()
    while left or right:
        if left and right:
            elem = left.popleft() if left[0] > right[0] else right.popleft()
        elif not left and right:
            elem = right.popleft()
        elif not right and left:
            elem = left.popleft()
        result.append(elem)
    return result
LOOP_COUNT = 100
START_N = 1
END_N = 1000
def test(fun, test_data):
    start = time.clock()
    for _ in xrange(LOOP_COUNT):
        fun(test_data)
    return time.clock() - start

def run_test():
    timings, elem_nums = [], []
    test_data = random.sample(xrange(100000), END_N)
    for i in xrange(START_N, END_N):
        loop_test_data = test_data[:i]
        elapsed = test(sort, loop_test_data)
        timings.append(elapsed)
        elem_nums.append(len(loop_test_data))
        print "%f s --- %d elems" % (elapsed, len(loop_test_data))
    plt.plot(elem_nums, timings)
    plt.show()
run_test()
As far as I can see, everything is OK and I should get a nice N*log(N) curve as a result. But the picture differs a bit:
Things I've tried to investigate the issue:
PyPy. The curve is ok.
Disabled the GC using the gc module. Wrong guess. Debug output showed that it doesn't even run until the end of the test.
Memory profiling using meliae - nothing special or suspicious.
I had another implementation (a recursive one using the same merge function), and it behaves in a similar way. The more full test cycles I run, the more "jumps" there are in the curve.
So how can this behaviour be explained and - hopefully - fixed?
UPD: changed lists to collections.deque
UPD2: added the full test code
UPD3: I use Python 2.7.1 on Ubuntu 11.04, on a quad-core 2 GHz notebook. I tried to turn off most other processes: the number of spikes went down, but at least one of them was still there.
You are simply picking up the impact of other processes on your machine.
You run your sort function 100 times for input size 1 and record the total time spent on this. Then you run it 100 times for input size 2, and record the total time spent. You continue doing so until you reach input size 1000.
Let's say once in a while your OS (or you yourself) start doing something CPU-intensive. Let's say this "spike" lasts as long as it takes you to run your sort function 5000 times. This means that the execution times would look slow for 5000 / 100 = 50 consecutive input sizes. A while later, another spike happens, and another range of input sizes look slow. This is precisely what you see in your chart.
I can think of one way to avoid this problem. Run your sort function just once for each input size: 1, 2, 3, ..., 1000. Repeat this process 100 times, using the same 1000 inputs (it's important, see explanation at the end). Now take the minimum time spent for each input size as your final data point for the chart.
That way, your spikes should affect each input size only a few times out of the 100 runs; and since you're taking the minimum, they will likely have no impact on the final chart at all.
If your spikes are really really long and frequent, you of course might want to increase the number of repetitions beyond the current 100 per input size.
Looking at your spikes, I notice the execution slows down exactly 3 times during a spike. I'm guessing the OS gives your python process one slot out of three during high load. Whether my guess is correct or not, the approach I recommend should resolve the issue.
EDIT:
I realized that I didn't clarify one point in my proposed solution to your problem.
Should you use the same input in each of your 100 runs for a given input size, or 100 different (random) inputs?
Since I recommended taking the minimum of the execution times, the inputs should be the same (otherwise you'll get incorrect output, as you'll be measuring the best-case algorithm complexity instead of the average complexity!).
But when you use the same inputs, you create some noise in your chart, since some inputs are simply faster than others.
So a better solution is to resolve the system-load problem without creating the problem of only one input per input size (this is obviously pseudocode):
import random
from collections import defaultdict

seed = 'choose whatever you like'
repeats = 4
inputs_per_size = 25
runtimes = defaultdict(lambda: float('inf'))

for r in range(repeats):
    random.seed(seed)
    for i in range(inputs_per_size):
        for n in range(1000):
            input = generate_random_input(size=n)
            execution_time = get_execution_time(input)
            if runtimes[(n, i)] > execution_time:
                runtimes[(n, i)] = execution_time

for n in range(1000):
    runtimes[n] = sum(runtimes[(n, i)] for i in range(inputs_per_size)) / inputs_per_size
Now you can use runtimes[n] to build your plot.
Of course, depending on how noisy your system is, you might change (repeats, inputs_per_size) from (4, 25) to, say, (10, 10), or even (25, 4).
I can reproduce the spikes using your code:
You should choose an appropriate timing function (time.time() vs. time.clock(); cf. from timeit import default_timer), the number of repetitions in a test (how long each test takes), and the number of tests to take the minimal time from. This gives you better precision and less external influence on the results. Read this note from the timeit.Timer.repeat() docs:
It’s tempting to calculate mean and standard deviation from the result
vector and report these. However, this is not very useful. In a
typical case, the lowest value gives a lower bound for how fast your
machine can run the given code snippet; higher values in the result
vector are typically not caused by variability in Python’s speed, but
by other processes interfering with your timing accuracy. So the min()
of the result is probably the only number you should be interested in.
After that, you should look at the entire vector and apply common
sense rather than statistics.
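Following that advice, a short sketch (reusing the hypothetical module m from the command below, assumed to expose testdata and sort):

from timeit import Timer

t = Timer("sort(a)", setup="from m import testdata, sort; a = testdata[:500]")
# 5 repetitions of 100 calls each; keep only the best one
best = min(t.repeat(repeat=5, number=100)) / 100
print "%f s per sort() call" % best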
The timeit module can also choose appropriate parameters for you:
$ python -mtimeit -s 'from m import testdata, sort; a = testdata[:500]' 'sort(a)'
Here's the timeit-based performance curve:
The figure shows that the sort() behaviour is consistent with O(n*log(n)):
|------------------------------+-------------------|
| Fitting polynomial           | Function          |
|------------------------------+-------------------|
| 1.00 log2(N) + 1.25e-015     | N                 |
| 2.00 log2(N) + 5.31e-018     | N*N               |
| 1.19 log2(N) + 1.116         | N*log2(N)         |
| 1.37 log2(N) + 2.232         | N*log2(N)*log2(N) |
To generate the figure I've used make-figures.py:
$ python make-figures.py --nsublists 1 --maxn=0x100000 -s vkazanov.msort -s vkazanov.msort_builtin
where:
# adapt sorting functions for make-figures.py
def msort(lists):
    assert len(lists) == 1
    return sort(lists[0])  # `sort()` from the question

def msort_builtin(lists):
    assert len(lists) == 1
    return sorted(lists[0])  # builtin
Input lists are described here (note: the input is sorted, so the builtin sorted() function shows the expected O(N) performance).