Speed up time for loop iterations in Python

I have "random" points and would like to check which points can be connected by straight lines. Therefore I iterate through a list of points and draw a line at different angles. After all lines at all angles for every single point is drawn, I iterate over each line checking whether they are connecting 3 or more points. If the line connects 3 or more points, it is saved by appending it to a new list (newLines), if not the next line gets tested.
The problem which the following code is that it is way to slow... My testing image took about 30 min and my actual image was not done after about 14 hours. I read about speeding up for loops by using numpy (like in this article). I found plenty of examples for replacing for loops with numpy but in these example it was just simple iterating over a list without declaring the values as variables for usage.
Any hint for speeding up the following code is appreciated, it does not necessarily need to be numpy.
from math import sqrt

import numpy as np
from shapely.geometry import Point, LineString, MultiLineString
from shapely.affinity import rotate

# list for saving rotated lines
lines = []
for point in points:
    # length of line is the diagonal of the point image so it still covers the whole image after rotation
    length = sqrt(image.shape[0]**2 + image.shape[1]**2)
    start = Point(point)
    end = Point(start.x + length, start.y)
    line = LineString([start, end])
    # rotate the generated line in 5 degree steps (always from the original line,
    # so the angles are 0, 5, 10, ... degrees) and append each rotated copy to the list
    for a in range(0, 360, 5):
        angle = np.deg2rad(a)
        rotated = rotate(line, angle, origin=start, use_radians=True)
        lines.append(rotated)

multiLines = MultiLineString(lines)

# list for rotated lines which connect 3 or more points
newLines = []
start = ()
for multiLine in multiLines.geoms:
    lst = list(multiLine.coords)
    # a: starting point of line | b: ending point of line
    a = np.asarray(lst[0])
    b = np.asarray(lst[1])
    count = 0
    # again iterating over the point array to check which points are on the line
    for point in points:
        p = np.asarray(point)
        # check if point (p) is on line (a - b)
        if np.cross(p - a, b - a) == 0:
            if count == 0:
                start = point
                count += 1
            else:
                end = point
                count += 1
    if count >= 3:
        line = (start, end)
        newLines.append(line)
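For what it's worth, the inner point-on-line test can also be vectorized with numpy broadcasting so that every point is checked against a line in a single call. A minimal sketch, assuming points can be converted to an (N, 2) array and allowing a small tolerance instead of an exact zero cross product:

import numpy as np

def points_on_line(a, b, pts, tol=1e-9):
    # boolean mask of the points lying on the infinite line through a and b
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    pts = np.asarray(pts, dtype=float)
    # 2D cross product of (p - a) with (b - a) for every point at once
    cross = (pts[:, 0] - a[0]) * (b[1] - a[1]) - (pts[:, 1] - a[1]) * (b[0] - a[0])
    return np.abs(cross) <= tol

# hypothetical usage inside the line loop above:
# mask = points_on_line(lst[0], lst[1], points)
# if mask.sum() >= 3:
#     connected = np.asarray(points)[mask]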

I'm not sure what your current benchmarks are, but if you want to try numpy you can do something like this. I'm using pandas, which is a numpy wrapper, but it's effectively doing the same thing.
I think this is doing the same thing as you want: I'm looking at each pair of points, calculating the m and c coefficients of the equation y = mx + c through the two points, then checking for cases where these match. I expect you might want some accepted error depending on your input data.
Sorry if I'm way off piste.
import pandas as pd
import numpy as np
import random
import itertools
import time

def get_matches(points):
    # get all combinations of two points
    combinations_of_points = ([(a[0], a[1], b[0], b[1]) for a, b in itertools.combinations(points, 2) if a != b])
    data = pd.DataFrame(combinations_of_points, columns=['x1', 'y1', 'x2', 'y2'])
    data['m'] = (data.y1 - data.y2) / (data.x1 - data.x2)
    # swap negative gradients so all lines are in same direction
    data.loc[np.isfinite(data.m) & data.m < 0, 'm'] = -(1 / data.m)
    data.loc[np.isneginf(data.m), 'm'] = -data.m
    # y = mx + c
    data['c'] = data.y1 - (data.m * data.x1)
    data = data.sort_values(['m', 'c', 'x1']).reset_index(drop=True)
    # filter to items which are duplicated
    filtered = data[
        # matching m and c values
        (np.isfinite(data.m) & data.duplicated(['m', 'c'], keep=False)) |
        # infinite m and x equal (straight line up)
        (np.isposinf(data.m) & data.duplicated(['m', 'x1'], keep=False))
    ]
    return filtered

points = [(0, 0), (1, 1), (2, 2)]
print(get_matches(points))

random.seed(1)
count = 500
random_points = [(round(random.random(), 3), round(random.random(), 3)) for i in range(count)]
results = get_matches(random_points)
print(results)

print('\nPerformance with increasing points')
for i in [i ** 2 for i in range(5, 101, 5)]:
    random.seed(1)
    random_points = [(round(random.random(), 3), round(random.random(), 3)) for i in range(i)]
    start = time.perf_counter()
    results = get_matches(random_points)
    stop = time.perf_counter()
    print(f'{i:<9}{stop - start:03f}')
returns:
x1 y1 x2 y2 m c
0 0 0 1 1 1.0 0.0
1 0 0 2 2 1.0 0.0
2 1 1 2 2 1.0 0.0
x1 y1 x2 y2 m c
12243 0.606 0.262 0.400 0.880 -3.0 2.080
12244 0.606 0.262 0.440 0.760 -3.0 2.080
12251 0.378 0.970 0.506 0.586 -3.0 2.104
12252 0.505 0.589 0.378 0.970 -3.0 2.104
12253 0.505 0.589 0.506 0.586 -3.0 2.104
... ... ... ... ... ... ...
124741 0.971 0.382 0.971 0.716 inf -inf
124742 0.971 0.543 0.971 0.716 inf -inf
124744 0.983 0.593 0.983 0.296 inf -inf
124745 0.983 0.593 0.983 0.448 inf -inf
124746 0.983 0.296 0.983 0.448 inf -inf
[237 rows x 6 columns]
Performance with increasing points
25 0.010577
100 0.016897
225 0.045443
400 0.136834
625 0.338148
900 0.765913
1225 1.525819
1600 2.645753
2025 4.834811
2500 8.112012
3025 12.960043
3600 18.262522
4225 27.221498
4900 37.329662
5625 53.064736
6400 67.325213
7225 84.843119
8100 116.864120
9025 140.131420
10000 171.630961
As one of your comments pointed out earlier, the order of growth of the problem is approximately N^2 because it looks at all combinations of points, so the performance degrades very quickly with increasing numbers of points. Note that you could use this relationship to estimate how long your program would take to run if you know the number of points.
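As a rough sketch of that estimate, assuming the quadratic growth holds, you can fit t ≈ k * N^2 to one of the measured timings above and extrapolate (the target point count here is made up):

# extrapolate from the last measured timing above, assuming t grows as k * N**2
measured_n, measured_t = 10000, 171.6
k = measured_t / measured_n ** 2
target_n = 50000                      # hypothetical: however many points your real image contains
print(f'estimated time for {target_n} points: {k * target_n ** 2:.0f} s')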

Related

How to find clusters in a matrix of values

I have a 640x480 matrix that contains temperature values (coming from thermal images);
each element of the matrix represents the temperature of a single pixel like so:
[[31.2 30.4 32.5 ... 31.3 31.6 31.7]
[30.0 37.4 40.5 ... 51.5 52.6 52.7]
...
[28.9 28.8 28.1 ... 31.2 32.4 32.3]]
I want to find clusters in this matrix taking into consideration:
temperature difference between two elements;
positional distance between two elements;
I tried to do this by using DBSCAN clustering algorithm on an array containing coordinates and values of the elements like so:
coord = [[0 0 31.2]
[1 0 30.4]
[2 0 32.5]
...
[638 479 32.4]
[639 479 32.3]]
This is the code:
import numpy as np
from sklearn.cluster import DBSCAN

X = np.array(coord)
db = DBSCAN(eps=2, min_samples=2, metric='manhattan').fit(X)
core_samples_mask = np.zeros_like(db.labels_, dtype=bool)
core_samples_mask[db.core_sample_indices_] = True
labels = db.labels_
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)
cluster_dict = {i: X[labels == i] for i in range(n_clusters_)}
The problem is that I get a large number of clusters and the clustering is not precise (also with different values of eps and min_samples).
I would like to know if there is a more efficient method of doing this.
Thank you all
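One detail worth noting here is that DBSCAN's eps applies to the combined feature vector, so pixel coordinates (running into the hundreds) and temperatures (tens of degrees) contribute very unevenly to the Manhattan distance. A minimal sketch of rescaling the temperature column before clustering; the weight value and the small random stand-in matrix are assumptions for illustration only:

import numpy as np
from sklearn.cluster import DBSCAN

temps = np.random.uniform(28, 53, size=(48, 64))   # small stand-in for the 640x480 thermal matrix
ys, xs = np.mgrid[0:temps.shape[0], 0:temps.shape[1]]
temp_weight = 2.0                                  # assumed weight: how strongly temperature counts vs. pixel distance
features = np.column_stack([xs.ravel(), ys.ravel(), temp_weight * temps.ravel()])
labels = DBSCAN(eps=2, min_samples=2, metric='manhattan').fit_predict(features)
print(len(set(labels)) - (1 if -1 in labels else 0), 'clusters')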

Need help dividing the ratio of elements in a Python list

I am working on a problem where I have been asked to a) output Fibonacci numbers in a sequence based on user input, as I have done below, and b) divide and print the ratio of the two most recent terms.
fixed_start = [0, 1]

def fib(fixed_start, n):
    if n == 0:
        return fixed_start
    else:
        fixed_start.append(fixed_start[-1] + fixed_start[-2])
        return fib(fixed_start, n - 1)

numb = int(input('How many terms: '))
fibonacci_list = fib(fixed_start, numb)
print(fibonacci_list[:-1])
I would like for my output to look something like the below:
"How many terms:" 3
1 1
the ratio is 1.0
1 2
the ratio is 2.0
2 3
the ratio is 1.5
Are you looking for the ratio of the last 2 items in the list? If so, this should work.
print(fibonacci_list[-2:])
print(float(fibonacci_list[-1] / fibonacci_list[-2]))
Or, if you are looking for the ratio between every 2 numbers (except the 0 & 1 right at the start), the code below should do the trick:
for x, y in zip(fibonacci_list[1:], fibonacci_list[2:]):
    print(x, y)
    print('the ratio is ' + str(round((y / x), 3)))
The output is something like the below for a Fibonacci list of 15 terms:
1 1
the ratio is 1.0
1 2
the ratio is 2.0
2 3
the ratio is 1.5
3 5
the ratio is 1.667
5 8
the ratio is 1.6
8 13
the ratio is 1.625
13 21
the ratio is 1.615
21 34
the ratio is 1.619
34 55
the ratio is 1.618
55 89
the ratio is 1.618
89 144
the ratio is 1.618
144 233
the ratio is 1.618
233 377
the ratio is 1.618
377 610
the ratio is 1.618
610 987
the ratio is 1.618
As you have already solved part one by generating the Fibonacci series in the form of a list, you can access the last two (most recent) elements from it and take their ratio. Python allows us to access the elements of a list from the back using negative indexing.
def fibonacci_ratio(fibonacci_list):
    last_element = fibonacci_list[-1]
    second_last_element = fibonacci_list[-2]
    ratio = last_element / second_last_element
    return ratio
The single / in Python performs floating point division (a double // would give floor division instead).
Hope this helps!
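A small usage sketch, assuming the list was built with the fib function from the question:

fibonacci_list = fib([0, 1], 5)                 # -> [0, 1, 1, 2, 3, 5, 8]
print(fibonacci_list[-2:])                      # the two most recent terms
print('the ratio is', round(fibonacci_ratio(fibonacci_list), 3))   # -> the ratio is 1.6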

Algorithm for generation of number matrix with specified boundaries

I am trying to generate a matrix of numbers with 7 rows and 4 columns. Each row must sum to 100 and each column must have an even spread (if permitted) between a min and max range (specified below).
Goal:
     C1   C2   C3   C4   sum    range
1     .    .    .    .   100    low   ^
2     .    .    .    .   100     .    |
3     .    .    .    .   100     .    |
4     .    .    .    .   100     .    |
5     .    .    .    .   100     .    |
6     .    .    .    .   100     .    |
7     .    .    .    .   100    high  v
c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2
c4_low = 0.05
c4_high =0.5
In addition to this, I need the spread of each row to be as linear as possible, though a line fitted to the data with a second order polynomial would suffice (with an r^2 value of >0.98).
I am currently trying to do this using the following pseudocode:
1. Generate a random number between the ranges for c1, c2, c3 and c4.
2. Repeat this 7 times.
3. Check the correlation between the generated c1 values and a range of numbers from 1-7. For example:
4. Repeat step 3 for c2, c3 and c4.
5. Break the loop when steps 3 and 4 are successful.
This has proven to be too burdensome in terms of the number of iterations required, and as a result the solution is never reached.
Is there a more efficient way of achieving this solution?
So far:
import pandas as pd
import numpy as np
from sklearn.utils import shuffle

c1_high = 98
c1_low = 75
c2_high = 15
c2_low = 6
c3_high = 8
c3_low = 2
c4_low = 0.05
c4_high = 0.5

def matrix_gen():  # generates matrix within min and max values
    container = []
    d = {}
    offset = np.linspace(0.05, 1, 9)
    c1 = np.linspace(c1_low, c1_high, 7)
    c2 = np.linspace(c2_low, c2_high, 7)
    c3 = np.linspace(c3_low, c3_high, 7)
    c4 = np.linspace(c4_low, c4_high, 7)
    for i in np.arange(7):
        d["row{0}".format(i)] = [item[i] for item in [c1, c2, c3, c4]]
    df = pd.DataFrame(d)
    df.loc[4, :] = df.iloc[0, :][::-1].values
    df1 = df.drop(0)
    df1.loc[5, :] = df1.sum(axis=0)
    new_name = df1.index[-1]
    df1 = df1.rename(index={new_name: 'sum'})
    return df1
m = matrix_gen()
print(m)
out:
row0 row1 row2 row3 row4 row5 row6
1 6.00 7.500000 9.000000 10.500 12.000000 13.500000 15.0
2 2.00 3.000000 4.000000 5.000 6.000000 7.000000 8.0
3 0.05 0.125000 0.200000 0.275 0.350000 0.425000 0.5
4 98.00 94.166667 90.333333 86.500 82.666667 78.833333 75.0
sum 106.05 104.791667 103.533333 102.275 101.016667 99.758333 98.5
next function:
def shuf():  # attempts to shuffle the values around such that the 'sum' row is as close to 100 as possible
    df = matrix_gen()
    df1 = df[1:4]
    count = 0
    while True:
        df1 = shuffle(df1)
        df1.loc[5, :] = df1.sum(axis=0)
        for i in df1.loc[5].values:
            if 98 <= i <= 100:
                print('solution')
                return df1
            else:
                count += 1
                print(count)
                continue
opt = shuf()
print(opt)
The next function will need to apply a deviation to each number so that each row sums to 100. The optimization should include minimizing the deviations.
I think an interesting approach would be to use an optimization model.
Ordered values
Let x(i,j) be the matrix your want to fill. Then we have:
sum(j, x(i,j)) = 100 ∀i
L(j) ≤ x(i,j) ≤ U(j) ∀i,j
x(i,j) = x(i-1,j) + step(j) + deviation(i,j)
special cases:
x(1,j) = L(j) + deviation(1,j)
and x(m,j) = U(j) + deviation(m,j)
step(j) ≥ 0
minimize sum((i,j), deviation(i,j)^2 )
This is a quadratic programming problem. It is possible to use absolute deviations instead of squared ones; in that case you have an LP.
The model can be refined to minimize squared relative errors.
This is a little bit related to what is called matrix balancing (a statistical technique often used in economic modeling).
Unordered values
In the above I assumed the values had to be ordered. Now I understand this is not the case. I adapted the model to handle this as follows. First an overview of the results.
The input data is:
---- 17 PARAMETER LO
c1 80.000, c2 5.000, c3 0.500, c4 0.050
---- 17 PARAMETER UP
c1 94.000, c2 14.000, c3 5.000, c4 0.500
Warning: Note that this data has been changed by the poster. My answer is using the original LO and UP values before they were changed.
The model operates in three steps:
(1) populate a perfectly organized matrix without obeying the row sum constraints. This can be done outside the model. I generated simply:
---- 53 PARAMETER init initial matrix
c1 c2 c3 c4 rowsum
r1 80.000 5.000 0.500 0.050 85.550
r2 82.333 6.500 1.250 0.125 90.208
r3 84.667 8.000 2.000 0.200 94.867
r4 87.000 9.500 2.750 0.275 99.525
r5 89.333 11.000 3.500 0.350 104.183
r6 91.667 12.500 4.250 0.425 108.842
r7 94.000 14.000 5.000 0.500 113.500
I.e. from lo(j) to up(j) with equal steps.
(2) The second step is to permute the values within a column to achieve a solution that has a close match to the row sums. This gives:
---- 53 VARIABLE y.L after permutation
c1 c2 c3 c4 rowsum
r1 94.000 5.000 0.500 0.125 99.625
r2 82.333 12.500 4.250 0.500 99.583
r3 89.333 8.000 2.000 0.200 99.533
r4 87.000 9.500 2.750 0.275 99.525
r5 84.667 11.000 3.500 0.350 99.517
r6 91.667 6.500 1.250 0.050 99.467
r7 80.000 14.000 5.000 0.425 99.425
This is already very close and maintains "perfect" spread.
(3) Change the values a little bit by adding a deviation such that the row sums are exactly 100. Minimize the sum of the squared relative deviations. This gives:
---- 53 VARIABLE x.L final values
c1 c2 c3 c4 rowsum
r1 94.374 5.001 0.500 0.125 100.000
r2 82.747 12.503 4.250 0.500 100.000
r3 89.796 8.004 2.000 0.200 100.000
r4 87.469 9.506 2.750 0.275 100.000
r5 85.142 11.007 3.501 0.350 100.000
r6 92.189 6.510 1.251 0.050 100.000
r7 80.561 14.012 5.002 0.425 100.000
---- 53 VARIABLE d.L deviations
c1 c2 c3 c4
r1 0.374 0.001 1.459087E-5 1.459087E-7
r2 0.414 0.003 9.542419E-5 9.542419E-7
r3 0.462 0.004 2.579521E-4 2.579521E-6
r4 0.469 0.006 4.685327E-4 4.685327E-6
r5 0.475 0.007 7.297223E-4 7.297223E-6
r6 0.522 0.010 0.001 1.123123E-5
r7 0.561 0.012 0.002 1.587126E-5
Steps (2) and (3) have to be inside the optimization model: they have to be executed simultaneously to achieve proven optimal solutions.
The mathematical model can look like:
The model solves within a few seconds to proven global optimality using a solver like Cplex or Gurobi.
I think this is pretty cute model (ok, that is really nerdy, I know). The permutation is modeled with a permutation matrix P (binary values). This makes the model a MIQP (Mixed Integer Quadratic Programming) model. It can be linearized fairly easily: use absolute values instead of squares in the objective. After proper reformulation, we end up with a linear MIP model. There is lots of software available to handle this. This includes libraries and packages callable from Python.
Note: I probably should not divide by init(i,j) in the objective, but rather by the column means in the init matrix. Dividing by y(i,j) would be the best, but that leads to another non-linearity.
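As a rough, hedged illustration of step (3) on its own (the full model couples it with the permutation step, and weights the deviations slightly differently, as noted above): minimizing the sum of squared relative deviations for a fixed row sum has a closed-form answer, which a short sketch can show:

import numpy as np

def fix_row_sums(y, target=100.0):
    # Minimize sum((d_j / y_j)**2) subject to sum(d_j) = r, the row-sum shortfall.
    # Lagrange multipliers give d_j = r * y_j**2 / sum(y**2).
    y = np.asarray(y, dtype=float)
    r = target - y.sum(axis=1, keepdims=True)
    d = r * y**2 / (y**2).sum(axis=1, keepdims=True)
    return y + d

# first two rows of the permuted matrix from step (2)
y = np.array([[94.000,  5.000, 0.500, 0.125],
              [82.333, 12.500, 4.250, 0.500]])
print(fix_row_sums(y).sum(axis=1))   # -> [100. 100.]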
Your numbers are small enough for a smart brute force approach.
I use two methods to quantify and minimize deviations from the "clean" equidistant values (linspace(low, high, 7)): "abserr" for the squared difference and "relerr" for the squared error divided by the squared clean value. I also check corrcoefs at the very end, but I've never seen anything below 99.8%.
The following code first finds the shuffle of the clean values with the smallest error. This takes just a few seconds, because we use the following tricks:
- split the 4 columns into two pairs
- each pair has 7! relative arrangements, a manageable number even when squared (one factor for each pair)
- compute these (7!)^2 shuffles and sum over pairs
- to avoid iterating over all relative shuffles between the pairs, we observe that the total error is minimized if the two sets of pair sums are arranged in opposite order; this is true for both "abserr" and "relerr"
In the end the values are corrected to make the rows sum to 100. Here again we use the fact that the summed error is minimized when it is evenly spread.
The code below contains two variants: a legacy one, solve, which contains a small inaccuracy when minimizing relerr, and a corrected version, improved_solve. They frequently find different solutions, but in more than 100 random problems only one led to a very slightly smaller error with improved_solve.
Answers to a few examples:
OP's example:
((75, 98), (6, 15), (2, 8), (0.05, 0.5))
solve relerr improved_solve relerr
table: table:
76.14213 15.22843 8.12183 0.50761 76.14213 15.22843 8.12183 0.50761
79.02431 13.53270 7.01696 0.42603 79.02431 13.53270 7.01696 0.42603
81.83468 11.87923 5.93961 0.34648 81.83468 11.87923 5.93961 0.34648
84.57590 10.26644 4.88878 0.26888 84.57590 10.26644 4.88878 0.26888
87.25048 8.69285 3.86349 0.19317 87.25048 8.69285 3.86349 0.19317
89.86083 7.15706 2.86282 0.11928 89.86083 7.15706 2.86282 0.11928
92.40924 5.65771 1.88590 0.04715 92.40924 5.65771 1.88590 0.04715
avgerr: avgerr:
0.03239 0.03239
corrcoefs: corrcoefs:
0.99977 0.99977 0.99977 0.99977 0.99977 0.99977 0.99977 0.99977
An example where sorting some columns ascending and some descending is not optimal:
((11, 41), (4, 34), (37, 49), (0.01, 23.99))
Note that the solvers find different solutions, but the error is the same.
solve relerr improved_solve relerr
table: table:
10.89217 18.81374 46.53926 23.75483 11.00037 24.00080 49.00163 15.99720
26.00087 9.00030 49.00163 15.99720 16.00107 19.00127 45.00300 19.99467
31.00207 4.00027 45.00300 19.99467 25.74512 13.86276 36.63729 23.75483
16.00000 29.00000 43.00000 12.00000 35.99880 8.99970 46.99843 8.00307
20.99860 33.99773 40.99727 4.00640 41.00000 4.00000 43.00000 12.00000
40.99863 13.99953 36.99877 8.00307 20.99860 33.99773 40.99727 4.00640
36.35996 24.23998 39.38996 0.01010 31.30997 29.28997 39.38996 0.01010
avgerr: avgerr:
0.00529 0.00529
corrcoefs: corrcoefs:
0.99993 0.99994 0.99876 0.99997 0.99989 0.99994 0.99877 0.99997
This is the problem where improved_solve actually beats legacy solve:
((36.787862883725872, 43.967159949544317),
(40.522239654303483, 47.625869880574164),
(19.760537036548321, 49.183056694462799),
(45.701873101046154, 48.051424087501672))
solve relerr improved_solve relerr
table: table:
21.36407 23.53276 28.56241 26.54076 20.25226 26.21874 27.07599 26.45301
22.33545 24.52391 26.03695 27.10370 21.53733 26.33278 25.10656 27.02333
23.33149 25.54022 23.44736 27.68093 22.90176 26.45386 23.01550 27.62888
24.35314 26.58266 20.79119 28.27301 24.35314 26.58266 20.79119 28.27301
25.40141 27.65226 18.06583 28.88050 25.90005 26.71994 18.42047 28.95953
26.47734 28.75009 15.26854 29.50403 27.55225 26.86656 15.88840 29.69279
27.58205 29.87728 12.39644 30.14424 29.32086 27.02351 13.17793 30.47771
avgerr: avgerr:
0.39677 0.39630
corrcoefs: corrcoefs:
0.99975 0.99975 0.99975 0.99975 0.99847 0.99847 0.99847 0.99847
Code:
import numpy as np
import itertools
import math

N_CHUNKS = 3

def improved_solve(LH, errtype='relerr'):
    N = math.factorial(7)
    # accept anything that looks like a 2d array
    LH = np.asanyarray(LH)
    # build equidistant columns
    C = np.array([np.linspace(l, h, 7) for l, h in LH])
    # subtract offset; it's cheaper now than later
    c0, c1, c2, c3 = C - 25
    # list all permutations of a single column
    p = np.array(list(itertools.permutations(range(7))))
    # split into left and right halves, compute all relative permutations
    # and sort them by their sums of corresponding elements.
    # Left pairs in ascending, right pairs in descending order.
    L = np.sort(c0 + c1[p], axis=1)
    R = np.sort(c2 + c3[p], axis=1)[:, ::-1]
    # For each pair of permutations l in L, r in R compute the smallest
    # possible error (sum of squared deviations.)
    if errtype == 'relerr':
        err = np.empty((N, N))
        split = np.linspace(0, N, N_CHUNKS+1, dtype=int)[1:-1]
        for LCH, ECH in zip(np.split(L, split, axis=0),
                            np.split(err, split, axis=0)):
            dev = LCH[:, None] + R[None, :]
            ((dev / (100+dev))**2).sum(axis=-1, out=ECH)
            del dev
    elif errtype == 'abserr':
        err = (np.add.outer(np.einsum('ij,ij->i', L, L),
                            np.einsum('ij,ij->i', R, R))
               + np.einsum('ik, jk->ij', 2*L, R))
    else:
        raise ValueError
    # find pair of pairs with smallest error
    i = np.argmin(err.ravel())
    i1, i3 = np.unravel_index(i, (N, N))
    # recreate shuffled table
    c0, c1, c2, c3 = C
    lidx = np.argsort(c0 + c1[p[i1]])
    ridx = np.argsort(c2 + c3[p[i3]])[::-1]
    C = np.array([c0[lidx], c1[p[i1]][lidx], c2[ridx], c3[p[i3]][ridx]])
    # correct rowsums, calculate error and corrcoef and return
    if errtype == 'relerr':
        result = C * (100.0 / C.sum(axis=0, keepdims=True))
        err = math.sqrt((((result-C)/C)**2).mean())
    else:
        result = C + (25 - C.mean(axis=0, keepdims=True))
        err = math.sqrt(((result-C)**2).mean())
    rs = np.sort(result, axis=1)
    cc = tuple(np.corrcoef(ri, range(7))[0, 1] for ri in rs)
    return dict(table=result.T, avgerr=err, corrcoefs=cc)
def solve(LH, errtype='relerr'):
    LH = np.asanyarray(LH)
    if errtype == 'relerr':
        err1 = 200 / LH.sum()
        diff = np.diff(LH * err1, axis=1).ravel()
    elif errtype == 'abserr':
        err1 = 25 - LH.mean()
        diff = np.diff(LH, axis=1).ravel()
    else:
        raise ValueError
    C = np.array([np.linspace(-d/2, d/2, 7) for d in diff])
    c0, c1, c2, c3 = C
    p = np.array(list(itertools.permutations(range(7))))
    L = np.sort(c0 + c1[p], axis=1)
    R = np.sort(c2 + c3[p], axis=1)[:, ::-1]
    err = (np.add.outer(np.einsum('ij,ij->i', L, L),
                        np.einsum('ij,ij->i', R, R))
           + np.einsum('ik, jk->ij', 2*L, R)).ravel()
    i = np.argmin(err)
    i1, i3 = np.unravel_index(i, (math.factorial(7), math.factorial(7)))
    L = np.argsort(c0 + c1[p[i1]])
    R = np.argsort(c2 + c3[p[i3]])[::-1]
    ref = [np.linspace(l, h, 7) for l, h in LH]
    if errtype == 'relerr':
        c0, c1, c2, c3 = [np.linspace(l, h, 7) for l, h in LH * err1]
        C = np.array([c0[L], c1[p[i1]][L], c2[R], c3[p[i3]][R]])
        err2 = 100 / np.sum(C, axis=0)
        C *= err2
        cs = list(map(sorted, C))
        err = math.sqrt(sum((c/r-1)**2 for ci, ri in zip(cs, ref) for c, r in zip(ci, ri)) / 28)
    elif errtype == 'abserr':
        c0, c1, c2, c3 = [np.linspace(l, h, 7) for l, h in LH + err1]
        C = np.array([c0[L], c1[p[i1]][L], c2[R], c3[p[i3]][R]])
        err2 = 25 - np.mean(C, axis=0)
        C += err2
        cs = list(map(sorted, C))
        err = math.sqrt(sum((c-r)**2 for ci, ri in zip(cs, ref) for c, r in zip(ci, ri)) / 28)
    else:
        raise ValueError
    cc = tuple(np.corrcoef(ci, range(7))[0, 1] for ci in cs)
    return dict(table=C.T, avgerr=err, corrcoefs=cc)
for problem in [((75, 98), (6, 15), (2, 8), (0.05, 0.5)),
                ((11, 41), (4, 34), (37, 49), (0.01, 23.99)),
                ((80, 94), (5, 14), (0.5, 5), (0.05, 0.5)),
                ((36.787862883725872, 43.967159949544317),
                 (40.522239654303483, 47.625869880574164),
                 (19.760537036548321, 49.183056694462799),
                 (45.701873101046154, 48.051424087501672))]:
    for errtype in ('relerr', 'abserr'):
        print()
        columns = []
        for solver in (solve, improved_solve):
            sol = solver(problem, errtype)
            column = [[' '.join((solver.__name__, errtype))]] + \
                     [[k + ':'] + [' '.join([f'{e:8.5f}' for e in r])
                                   for r in np.atleast_2d(v)]
                      for k, v in sol.items()]
            column = (line for block in column for line in block)
            columns.append(column)
        for l, r in zip(*columns):
            print(f"{l:39s} {r:39s}")

problems = []
for i in range(0):
    problem = np.sort(np.random.random((4, 2)), axis=1) * 50
    for errtype in ('relerr', 'abserr'):
        sol0 = solve(problem, errtype)
        sol1 = improved_solve(problem, errtype)
        if not np.allclose(sol0['table'], sol1['table']):
            print(i, end=" ")
            if np.abs((sol0['avgerr']-sol1['avgerr'])
                      / (sol0['avgerr']+sol1['avgerr'])) > 1e-6:
                print(problem)
                problems.append(problem)
                columns = []
                for sol, name in [(sol0, 'old '), (sol1, 'improved ')]:
                    column = [[name + errtype]] + \
                             [[k + ':'] + [' '.join([f'{e:8.5f}' for e in r])
                                           for r in np.atleast_2d(v)]
                              for k, v in sol.items()]
                    column = (line for block in column for line in block)
                    columns.append(column)
                for l, r in zip(*columns):
                    print(f"{l:39s} {r:39s}")

Troublesome Frog AIO 2013 Intermediate Python 3.x

Recently I was trying out this problem and my code got 60% of the marks, with the remaining cases returning TLEs.
Bazza and Shazza do not like bugs. They wish to clear out all the bugs
on their garden fence. They come up with a brilliant idea: they buy
some sugar frogs and release them near the fence, letting them eat up
all the bugs.
The plan is a great success and the bug infestation is gone. But
strangely, they now have a sugar frog infestation. Instead of getting
rid of the frogs, Bazza and Shazza decide to set up an obstacle course
and watch the frogs jump along it for their enjoyment.
The fence is a series of N fence posts of varying heights. Bazza and
Shazza will select three fence posts to create the obstacle course,
where the middle post is strictly higher than the other two. The frogs
are to jump up from the left post to the middle post, then jump down
from the middle post to the right post. The three posts do not have to
be next to each other as frogs can jump over other fence posts,
regardless of the height of those other posts.
The difficulty of an obstacle course is the height of the first jump
plus the height of the second jump. The height of a jump is equal to
the difference in height between its two fence posts. Your task is to
help Bazza and Shazza find the most difficult obstacle course for the
frogs to jump.
Input
Your program should read from the file. The file will describe
a single fence.
The first line of input will contain one integer N: the number of
fence posts. The next N lines will each contain one integer h_i: the
height of the ith fence post. You are guaranteed that there will be at
least one valid obstacle course: that is, there will be at least one
combination of three fence posts where the middle post is strictly
higher than the other two.
Output
Your program should write to the file. Your output file should
contain one line with one integer: the greatest difficulty of any
possible obstacle course.
Constraints
To evaluate your solution, the judges will run your
program against several different input files. All of these files will
adhere to the following bounds:
3 ≤ N ≤ 100,000 (the number of fence posts)
1 ≤ h_i ≤ 100,000 (the height of each post)
As some of the test cases will be quite large,
you may need to think about how well your solution scales for larger
input values. However, not all the cases will be large. In particular:
For 30% of the marks, N ≤ 300. For an additional 30% of the
marks, N ≤ 3,000. For the remaining 40% of the marks, no special constraints apply.
Hence, I was wondering if anyone could think of a way to optimize my code (below), or perhaps provide a more elegant, efficient algorithm than the one I am currently using.
Here is my code:
infile = open('frogin.txt', 'r')
outfile = open('frogout.txt', 'w')
N = int(infile.readline())
l = []
for i in range(N):
    l.append(int(infile.readline()))
m = 0
# find maximum z-x+z-y such that the middle number z is the largest of x, y, z
for j in range(1, N - 1):
    x = min(l[0: j])
    y = min(l[j + 1:])
    z = l[j]
    if x < z and y < z:
        n = z - x + z - y
        m = n if n > m else m
outfile.write(str(m))
infile.close()
outfile.close()
exit()
If you require additional information regarding my solution or the problem, please do comment below.
Ok, first let's evaluate your program. I created a test file like
from random import randint

n = 100000
max_ = 100000
with open("frogin.txt", "w") as outf:
    outf.write(str(n) + "\n")
    outf.write("\n".join(str(randint(1, max_)) for _ in range(n)))
then ran your code in IPython like
%load_ext line_profiler

def test():
    infile = open('frogin.txt', 'r')
    outfile = open('frogout.txt', 'w')
    N = int(infile.readline())
    l = []
    for i in range(N):
        l.append(int(infile.readline()))
    m = 0
    for j in range(1, N - 1):
        pre_l = l[0: j]     # I split these lines
        x = min(pre_l)      # for a bit more detail
        post_l = l[j + 1:]  # on exactly which operations
        y = min(post_l)     # are taking the most time
        z = l[j]
        if x < z and y < z:
            n = z - x + z - y
            m = n if n > m else m
    outfile.write(str(m))
    infile.close()
    outfile.close()

%lprun -f test test()  # instrument the `test` function, then run `test()`
which gave
Total time: 197.565 s
File: <ipython-input-37-afa35ce6607a>
Function: test at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def test():
2 1 479 479.0 0.0 infile = open('frogin.txt', 'r')
3 1 984 984.0 0.0 outfile = open('frogout.txt', 'w')
4 1 195 195.0 0.0 N = int(infile.readline())
5 1 2 2.0 0.0 l = []
6 100001 117005 1.2 0.0 for i in range(N):
7 100000 269917 2.7 0.0 l.append(int(infile.readline()))
8 1 2 2.0 0.0 m = 0
9 99999 226984 2.3 0.0 for j in range(1, N - 1):
10 99998 94137525 941.4 12.2 pre_l = l[0: j]
11 99998 300309109 3003.2 38.8 x = min(pre_l)
12 99998 85915575 859.2 11.1 post_l = l[j + 1:]
13 99998 291183808 2911.9 37.7 y = min(post_l)
14 99998 441185 4.4 0.1 z = l[j]
15 99998 212870 2.1 0.0 if x < z and y < z:
16 99978 284920 2.8 0.0 n = z - x + z - y
17 99978 181296 1.8 0.0 m = n if n > m else m
18 1 114 114.0 0.0 outfile.write(str(m))
19 1 170 170.0 0.0 infile.close()
20 1 511 511.0 0.0 outfile.close()
which shows that 23.3% of your time (46 s) is spent repeatedly slicing your array, and 76.5% (151 s) is spent running min() on the slices 200k times.
So - how can we speed this up? Consider
a = min(l[0:50001]) # 50000 comparisons
b = min(l[0:50002]) # 50001 comparisons
c = min(a, l[50001]) # 1 comparison
Here's the magic: b and c are exactly equivalent, but b takes something like 10k times longer to run. You have to have a calculated first - but you can repeat the same trick, shifted back by 1, to get a cheaply, and the same for a's predecessor, and so on.
In one pass from start to end you can keep a running tally of 'minimum value seen previous to this index'. You can then do the same thing from end to start, keeping a running tally of 'minimum value seen after this index'. You can then zip all three arrays together and find the maximum achievable values.
I wrote a quick version,
def test():
    ERROR_VAL = 1000000  # too big to be part of any valid solution
    # read input file
    with open("frogin.txt") as inf:
        nums = [int(i) for i in inf.read().split()]
    # check contents
    n = nums.pop(0)
    if len(nums) < n:
        raise ValueError("Input file is too short!")
    elif len(nums) > n:
        raise ValueError("Input file is too long!")
    # min_pre[i] == min(nums[:i])
    min_pre = [0] * n
    min_pre[0] = ERROR_VAL
    for i in range(1, n):
        min_pre[i] = min(nums[i - 1], min_pre[i - 1])
    # min_post[i] == min(nums[i+1:])
    min_post = [0] * n
    min_post[n - 1] = ERROR_VAL
    for i in range(n - 2, -1, -1):
        min_post[i] = min(nums[i + 1], min_post[i + 1])
    return max((nums[i] - min_pre[i]) + (nums[i] - min_post[i]) for i in range(1, n - 1) if min_pre[i] < nums[i] > min_post[i])
and profiled it,
Total time: 0.300842 s
File: <ipython-input-99-2097216e4420>
Function: test at line 1
Line # Hits Time Per Hit % Time Line Contents
==============================================================
1 def test():
2 1 5 5.0 0.0 ERROR_VAL = 1000000 # too big to be part of any valid solution
3 # read input file
4 1 503 503.0 0.0 with open("frogin.txt") as inf:
5 1 99903 99903.0 8.5 nums = [int(i) for i in inf.read().split()]
6 # check contents
7 1 212 212.0 0.0 n = nums.pop(0)
8 1 7 7.0 0.0 if len(nums) < n:
9 raise ValueError("Input file is too short!")
10 1 2 2.0 0.0 elif len(nums) > n:
11 raise ValueError("Input file is too long!")
12 # min_pre[i] == min(nums[:i])
13 1 994 994.0 0.1 min_pre = [0] * n
14 1 3 3.0 0.0 min_pre[0] = ERROR_VAL
15 100000 162915 1.6 13.8 for i in range(1, n):
16 99999 267593 2.7 22.7 min_pre[i] = min(nums[i - 1], min_pre[i - 1])
17 # min_post[i] == min(nums[i+1:])
18 1 1050 1050.0 0.1 min_post = [0] * n
19 1 3 3.0 0.0 min_post[n - 1] = ERROR_VAL
20 100000 167021 1.7 14.2 for i in range(n - 2, -1, -1):
21 99999 272080 2.7 23.1 min_post[i] = min(nums[i + 1], min_post[i + 1])
22 1 205222 205222.0 17.4 return max((nums[i] - min_pre[i]) + (nums[i] - min_post[i]) for i in range(1, n - 1) if min_pre[i] < nums[i] > min_post[i])
and you can see the run-time for processing 100k values has dropped from 197 s to 0.3 s.

Speed up numpy.where for extracting integer segments?

I'm trying to work out how to speed up a Python function which uses numpy. The output I have received from line_profiler is below, and this shows that the vast majority of the time is spent on the line ind_y, ind_x = np.where(seg_image == i).
seg_image is an integer array which is the result of segmenting an image, thus finding the pixels where seg_image == i extracts a specific segmented object. I am looping through lots of these objects (in the code below I'm just looping through 5 for testing, but I'll actually be looping through over 20,000), and it takes a long time to run!
Is there any way in which the np.where call can be sped up? Or, alternatively, can the penultimate line (which also takes a good proportion of the time) be sped up?
The ideal solution would be to run the code on the whole array at once, rather than looping, but I don't think this is possible as there are side-effects to some of the functions I need to run (for example, dilating a segmented object can make it 'collide' with the next region and thus give incorrect results later on).
Does anyone have any ideas?
Line # Hits Time Per Hit % Time Line Contents
==============================================================
5 def correct_hot(hot_image, seg_image):
6 1 239810 239810.0 2.3 new_hot = hot_image.copy()
7 1 572966 572966.0 5.5 sign = np.zeros_like(hot_image) + 1
8 1 67565 67565.0 0.6 sign[:,:] = 1
9 1 1257867 1257867.0 12.1 sign[hot_image > 0] = -1
10
11 1 150 150.0 0.0 s_elem = np.ones((3, 3))
12
13 #for i in xrange(1,seg_image.max()+1):
14 6 57 9.5 0.0 for i in range(1,6):
15 5 6092775 1218555.0 58.5 ind_y, ind_x = np.where(seg_image == i)
16
17 # Get the average HOT value of the object (really simple!)
18 5 2408 481.6 0.0 obj_avg = hot_image[ind_y, ind_x].mean()
19
20 5 333 66.6 0.0 miny = np.min(ind_y)
21
22 5 162 32.4 0.0 minx = np.min(ind_x)
23
24
25 5 369 73.8 0.0 new_ind_x = ind_x - minx + 3
26 5 113 22.6 0.0 new_ind_y = ind_y - miny + 3
27
28 5 211 42.2 0.0 maxy = np.max(new_ind_y)
29 5 143 28.6 0.0 maxx = np.max(new_ind_x)
30
31 # 7 is + 1 to deal with the zero-based indexing, + 2 * 3 to deal with the 3 cell padding above
32 5 217 43.4 0.0 obj = np.zeros( (maxy+7, maxx+7) )
33
34 5 158 31.6 0.0 obj[new_ind_y, new_ind_x] = 1
35
36 5 2482 496.4 0.0 dilated = ndimage.binary_dilation(obj, s_elem)
37 5 1370 274.0 0.0 border = mahotas.borders(dilated)
38
39 5 122 24.4 0.0 border = np.logical_and(border, dilated)
40
41 5 355 71.0 0.0 border_ind_y, border_ind_x = np.where(border == 1)
42 5 136 27.2 0.0 border_ind_y = border_ind_y + miny - 3
43 5 123 24.6 0.0 border_ind_x = border_ind_x + minx - 3
44
45 5 645 129.0 0.0 border_avg = hot_image[border_ind_y, border_ind_x].mean()
46
47 5 2167729 433545.8 20.8 new_hot[seg_image == i] = (new_hot[ind_y, ind_x] + (sign[ind_y, ind_x] * np.abs(obj_avg - border_avg)))
48 5 10179 2035.8 0.1 print obj_avg, border_avg
49
50 1 4 4.0 0.0 return new_hot
EDIT I have left my original answer at the bottom for the record, but I have actually looked into your code in more detail over lunch, and I think that using np.where is a big mistake:
In [63]: a = np.random.randint(100, size=(1000, 1000))
In [64]: %timeit a == 42
1000 loops, best of 3: 950 us per loop
In [65]: %timeit np.where(a == 42)
100 loops, best of 3: 7.55 ms per loop
You could get a boolean array (that you can use for indexing) in 1/8 of the time you need to get the actual coordinates of the points!!!
There is of course the cropping of the features that you do, but ndimage has a find_objects function that returns enclosing slices, and appears to be very fast:
In [66]: %timeit ndimage.find_objects(a)
100 loops, best of 3: 11.5 ms per loop
This returns a list of tuples of slices enclosing all of your objects, in 50% more time than it takes to find the indices of one single object.
It may not work out of the box as I cannot test it right now, but I would restructure your code into something like the following:
def correct_hot_bis(hot_image, seg_image):
    # Need this to not index out of bounds when computing border_avg
    hot_image_padded = np.pad(hot_image, 3, mode='constant',
                              constant_values=0)
    new_hot = hot_image.copy()
    sign = np.ones_like(hot_image, dtype=np.int8)
    sign[hot_image > 0] = -1
    s_elem = np.ones((3, 3))

    for j, slice_ in enumerate(ndimage.find_objects(seg_image)):
        hot_image_view = hot_image[slice_]
        seg_image_view = seg_image[slice_]
        new_shape = tuple(dim+6 for dim in hot_image_view.shape)
        new_slice = tuple(slice(dim.start,
                                dim.stop+6,
                                None) for dim in slice_)
        indices = seg_image_view == j+1
        obj_avg = hot_image_view[indices].mean()

        obj = np.zeros(new_shape)
        obj[3:-3, 3:-3][indices] = True
        dilated = ndimage.binary_dilation(obj, s_elem)
        border = mahotas.borders(dilated)
        border &= dilated

        border_avg = hot_image_padded[new_slice][border == 1].mean()

        new_hot[slice_][indices] += (sign[slice_][indices] *
                                     np.abs(obj_avg - border_avg))
    return new_hot
You would still need to figure out the collisions, but you could get about a 2x speed-up by computing all the indices simultaneously using a np.unique based approach:
a = np.random.randint(100, size=(1000, 1000))

def get_pos(arr):
    pos = []
    for j in xrange(100):
        pos.append(np.where(arr == j))
    return pos

def get_pos_bis(arr):
    unq, flat_idx = np.unique(arr, return_inverse=True)
    pos = np.argsort(flat_idx)
    counts = np.bincount(flat_idx)
    cum_counts = np.cumsum(counts)
    multi_dim_idx = np.unravel_index(pos, arr.shape)
    return zip(*(np.split(coords, cum_counts) for coords in multi_dim_idx))
In [33]: %timeit get_pos(a)
1 loops, best of 3: 766 ms per loop
In [34]: %timeit get_pos_bis(a)
1 loops, best of 3: 388 ms per loop
Note that the pixels for each object are returned in a different order, so you can't simply compare the returns of both functions to assess equality. But they should both return the same.
One thing you could do to save a little bit of time is to store the result of seg_image == i so that you don't need to compute it twice. You're computing it on lines 15 & 47; you could add seg_mask = seg_image == i and then reuse that result (it might also be good to separate out that piece for profiling purposes).
While there are some other minor things that you could do to eke out a little bit of performance, the root issue is that you're using an O(M * N) algorithm, where M is the number of segments and N is the size of your image. It's not obvious to me from your code whether there is a faster algorithm to accomplish the same thing, but that's the first place I'd try and look for a speedup.
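A minimal sketch of that reuse, with random stand-in arrays and a placeholder value instead of the per-object processing from the original function:

import numpy as np

seg_image = np.random.randint(0, 6, size=(100, 100))
hot_image = np.random.random((100, 100))
new_hot = hot_image.copy()

for i in range(1, seg_image.max() + 1):
    seg_mask = seg_image == i              # computed once per object...
    ind_y, ind_x = np.where(seg_mask)      # ...reused where coordinates are needed (line 15)
    correction = 0.1                       # stand-in for sign * abs(obj_avg - border_avg)
    new_hot[seg_mask] = new_hot[ind_y, ind_x] + correction   # ...and where a boolean mask suffices (line 47)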
