Recently, in my homework, I was assigned to solve the following problem:
Given an n x n matrix of zeros and ones, find the number of paths from [0,0] to [n-1,n-1] that pass only through zeros (the paths are not necessarily disjoint), where you can only move down or to the right, never up or left. Return a matrix of the same order in which the [i,j] entry is the number of such paths in the original matrix that go through [i,j]. The solution has to be recursive.
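For example, for the all-zero 2x2 matrix there are two such paths in total, so the expected output is [[2, 1], [1, 2]]: both paths pass through each corner, but only one path passes through each off-diagonal cell.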
My solution in python:
def find_zero_paths(M):
    n, m = len(M), len(M[0])
    dict = {}
    for i in range(n):
        for j in range(m):
            M_top, M_bot = blocks(M, i, j)
            X, Y = find_num_paths(M_top), find_num_paths(M_bot)
            dict[(i, j)] = X * Y
    L = [[dict[(i, j)] for j in range(m)] for i in range(n)]
    return L[0][0], L
def blocks(M, k, l):
    n, m = len(M), len(M[0])
    assert k < n and l < m
    M_top = [[M[i][j] for i in range(k+1)] for j in range(l+1)]
    M_bot = [[M[i][j] for i in range(k, n)] for j in range(l, m)]
    return [M_top, M_bot]
def find_num_paths(M):
    dict = {(1, 1): 1}
    X = find_num_mem(M, dict)
    return X
def find_num_mem(M, dict):
    n, m = len(M), len(M[0])
    if M[n-1][m-1] != 0:
        return 0
    elif (n, m) in dict:
        return dict[(n, m)]
    elif n == 1 and m > 1:
        new_M = [M[0][:m-1]]
        X = find_num_mem(new_M, dict)
        dict[(n, m-1)] = X
        return X
    elif m == 1 and n > 1:
        new_M = M[:n-1]
        X = find_num_mem(new_M, dict)
        dict[(n-1, m)] = X
        return X
    new_M1 = M[:n-1]
    new_M2 = [M[i][:m-1] for i in range(n)]
    X, Y = find_num_mem(new_M1, dict), find_num_mem(new_M2, dict)
    dict[(n-1, m)], dict[(n, m-1)] = X, Y
    return X + Y
My code is based on the idea that the number of paths that go through [i,j] in the original matrix equals the product of the number of paths from [0,0] to [i,j] and the number of paths from [i,j] to [n-1,n-1]. Another idea is that the number of paths from [0,0] to [i,j] is the sum of the number of paths from [0,0] to [i-1,j] and from [0,0] to [i,j-1]. Hence I decided to use a dictionary whose keys correspond to submatrices of the form [[M[i][j] for j in range(k)] for i in range(l)] or [[M[i][j] for j in range(k+1,n)] for i in range(l+1,n)] for some 0<=k,l<=n-1, where M is the original matrix, and whose values are the number of paths from the top of the submatrix to its bottom. After analyzing the complexity of my code, I arrived at the conclusion that it is O(n^6).
Now, my instructor says this code (specifically find_zero_paths) is exponential; however, I disagree.
The size of the recursion tree (for find_num_paths) is bounded by the number of submatrices of the form above, which is O(n^2). Also, each time we add a new matrix to the dictionary we do it in polynomial time (only slicing lists), so the total complexity is polynomial (poly * poly = poly). The function blocks also runs in polynomial time, and hence find_zero_paths runs in polynomial time (two lists of polynomial size times a function that runs in polynomial time), so all in all the code runs in polynomial time.
My question: Is the code polynomial and my O(n^6) bound is wrong or is it exponential and I am missing something?
Unfortunately, your instructor is right.
There is a lot to unpack here:
Before we start, a quick note: please don't use dict as a variable name. It hurts ^^. dict is the name of Python's built-in dictionary constructor, and shadowing it with your own variable is bad practice.
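To see why it hurts:

dict = {}           # shadows the built-in constructor
table = dict(a=1)   # TypeError: 'dict' object is not callable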
First, your approach of counting M_top * M_bot is good if you were computing only one cell of the result. The way you go about it, though, you are unnecessarily computing some blocks over and over again. That is why I asked about the recursion requirement: I would use dynamic programming for this one - once from start to end, once from end to start - then compute the products and be done with it. No need for O(n^6) worth of separate computations. Since you have to use recursion, I would recommend caching the partial results and reusing them wherever possible.
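For illustration, a minimal sketch of that two-pass idea (my code, not the recursive form your assignment requires; it assumes a non-empty rectangular 0/1 grid):

def paths_through(M):
    # One DP pass from the start, one from the end, then a pointwise product.
    n, m = len(M), len(M[0])
    fwd = [[0] * m for _ in range(n)]  # paths from (0, 0) to (i, j)
    bwd = [[0] * m for _ in range(n)]  # paths from (i, j) to (n-1, m-1)
    for i in range(n):
        for j in range(m):
            if M[i][j] == 0:
                if i == 0 and j == 0:
                    fwd[i][j] = 1
                else:
                    fwd[i][j] = (fwd[i-1][j] if i > 0 else 0) + (fwd[i][j-1] if j > 0 else 0)
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if M[i][j] == 0:
                if i == n - 1 and j == m - 1:
                    bwd[i][j] = 1
                else:
                    bwd[i][j] = (bwd[i+1][j] if i < n - 1 else 0) + (bwd[i][j+1] if j < m - 1 else 0)
    # paths through (i, j) = paths into it times paths out of it
    return [[fwd[i][j] * bwd[i][j] for j in range(m)] for i in range(n)]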
Second, the root of the issue and the cause of your somewhat invisible exponential factor: it is hidden in the find_num_mem function. Say you compute the last element in the matrix - the result[N][N] field - and let us consider the simplest case, where the matrix is full of zeroes, so every possible path exists.
In the first step, your recursion creates the branches [N][N-1] and [N-1][N].
In the second step, those create [N-1][N-1], [N][N-2], [N-2][N], and [N-1][N-1] again - the same block twice.
In the third step, you once again create two branches from every branch of the previous step - a beautiful example of an exponential explosion.
Now how to go about it: You will quickly notice that some of the branches are being duplicated over and over. Cache the results.
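For instance, here is a sketch of caching keyed by indices instead of by sliced submatrices (my illustration of the principle, not your assignment's exact interface):

def count_paths(M):
    n, m = len(M), len(M[0])
    memo = {}

    def paths_to(i, j):
        # number of zero-paths from (0, 0) to (i, j); each (i, j) is
        # computed once and cached, so there is no repeated branching
        if M[i][j] != 0:
            return 0
        if i == 0 and j == 0:
            return 1
        if (i, j) not in memo:
            from_above = paths_to(i - 1, j) if i > 0 else 0
            from_left = paths_to(i, j - 1) if j > 0 else 0
            memo[(i, j)] = from_above + from_left
        return memo[(i, j)]

    return paths_to(n - 1, m - 1)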
So I'm trying to solve a challenge and have come across a dead end. My solution works when the list is small or medium, but when it has over 50,000 elements it just times out:
a = int(input().strip())
b = list(map(int, input().split()))
result = []
flag = []
for i in range(len(b)):
    temp = a - b[i]
    if temp >= 0 and temp in flag:
        if temp < b[i]:
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flag.remove(temp)
    else:
        flag.append(b[i])
result.sort()
for i in result:
    print(i[0], i[1])
where

a = 10
b = [2, 4, 6, 8, 5]

The task is to find every pair of elements in b whose sum equals a.

**Edit:** updated with the full code.
flag is a list, potentially of the same order of magnitude as b. So, when you do temp in flag, that's a linear search: it has to check every value in flag to see if that value is == temp. That's up to 50,000 comparisons. And you're doing that once per iteration of a linear walk over b, so your total time is quadratic: 50,000 * 50,000 = 2,500,000,000 steps. (And flag.remove is also linear time.)
If you replace flag with a set, you can test it for membership (and remove from it) in constant time. So your total time drops from quadratic to linear, i.e. 50,000 steps, which is a lot faster than 2.5 billion:
flagset = set(flag)
for i in range(len(b)):
    temp = a - b[i]
    if temp >= 0 and temp in flagset:
        if temp < b[i]:
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flagset.remove(temp)
    else:
        flagset.add(b[i])
flag = list(flagset)
If flag needs to retain duplicate values, then it's a multiset, not a set, which means you can implement it with collections.Counter:
import collections

flagset = collections.Counter(flag)
for i in range(len(b)):
    temp = a - b[i]
    if temp >= 0 and flagset[temp]:
        if temp < b[i]:
            result.append((temp, b[i]))
        else:
            result.append((b[i], temp))
        flagset[temp] -= 1
    else:
        flagset[b[i]] += 1  # count the value itself, not its complement
flag = list(flagset.elements())
In your edited code, you’ve got another list that’s potentially of the same size, result, and you’re sorting that list every time through the loop.
Sorting takes log-linear time. Since you do it up to 50,000 times, that's around log(50,000) * 50,000 * 50,000, or around 30 billion steps.
If you needed to keep result in order throughout the operation, you'd want a logarithmic data structure, like a binary search tree or a skiplist, so you could insert each new element in the right place in logarithmic time, which would mean only around 800,000 steps in total.
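(Python's standard library has no balanced tree or skiplist; the nearest built-in tool is bisect.insort, sketched below, though note that only the search is logarithmic - the list insertion itself still shifts elements:)

import bisect

ordered = []
for pair in [(2, 8), (1, 9), (4, 6)]:
    bisect.insort(ordered, pair)  # O(log n) search, O(n) shift on insert
print(ordered)                    # [(1, 9), (2, 8), (4, 6)]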
But you don’t need it in order until the end. So, much more simply, just move the result.sort out of the loop and do it at the end.
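Putting all the pieces together, a sketch of the whole fix (using the Counter variant, with the one sort done at the very end):

from collections import Counter

a = int(input().strip())
b = list(map(int, input().split()))

result = []
waiting = Counter()                 # multiset of values still unpaired
for x in b:
    temp = a - x
    if temp >= 0 and waiting[temp]:
        result.append((min(temp, x), max(temp, x)))
        waiting[temp] -= 1
    else:
        waiting[x] += 1

result.sort()                       # one O(n log n) sort, outside the loop
for lo, hi in result:
    print(lo, hi)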
I am just getting started with competitive programming, and after writing the solution to a certain problem I got a time-limit-exceeded error. I need to compute the maximum of
max(|a[i] - a[j]| + |i - j|)
where a is a list of elements and i, j are indices.
Here is a short but complete code snippet.
t = int(input())  # number of test cases
for i in range(t):
    n = int(input())  # size of list
    a = list(map(int, input().split()))  # space-separated input
    res = []
    for s in range(n):  # these two loops are what blows up the run time
        for d in range(n):
            res.append(abs(a[s] - a[d]) + abs(s - d))
    print(max(res))
The constraints are:
1<=t<=100
1<=n<=10^5
0<=a[i]<=10^5
The run-time limit on the leaderboard is 5 seconds for C and 35 seconds for Python, while this code takes 80 seconds. It is an online judge, so the timing is machine-independent; numpy is not available.
Please keep it simple, I am new to Python.
Thanks for reading.
For a given j<=i, |a[i]-a[j]|+|i-j| = max(a[i]-a[j]+i-j, a[j]-a[i]+i-j).
Thus for a given i, the value of j<=i that maximizes |a[i]-a[j]|+|i-j| is either the j that maximizes a[j]-j or the j that minimizes a[j]+j.
Both these values can be computed as you run along the array, giving a simple O(n) algorithm:
def maxdiff(xs):
    # mp tracks the maximum of a[j] - j, and mn the minimum of a[j] + j,
    # over all j <= i seen so far
    mp = mn = xs[0]
    best = 0
    for i, x in enumerate(xs):
        mp = max(mp, x - i)
        mn = min(mn, x + i)
        best = max(best, x + i - mn, -x + i + mp)
    return best
And here's some simple testing against a naive but obviously correct algorithm:
def maxdiff_naive(xs):
    best = 0
    for i in range(len(xs)):
        for j in range(i + 1):
            best = max(best, abs(xs[i] - xs[j]) + abs(i - j))
    return best

import random
for _ in range(500):
    r = [random.randrange(1000) for _ in range(50)]
    md1 = maxdiff(r)
    md2 = maxdiff_naive(r)
    if md1 != md2:
        print("%d != %d\n%s" % (md1, md2, r))
        break
It takes a fraction of a second to run maxdiff on an array of size 10^5, which is significantly better than your reported leaderboard scores.
"Competitive programming" is not about saving a few milliseconds by using a different kind of loop; it's about being smart about how you approach a problem, and then implementing the solution efficiently.
Still, one thing that jumps out is that you are wasting time building a list only to scan it to find the max. Your double loop can be transformed to the following (ignoring other possible improvements):
print(max(abs(a[s] - a[d]) + abs(s - d) for s in range(n) for d in range(n)))
But that's small fry. Worry about your algorithm first, and only then turn to even obvious time-wasters like this. You can cut the number of comparisons in half, as @Brett showed you, but I would first study the problem and ask myself: do I really need to calculate this quantity n^2 times, or even 0.5*n^2 times? That's how you get the times down, not by shaving off milliseconds.
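(For reference, halving the comparisons would look something like this - my sketch, not @Brett's exact code, with default=0 guarding the single-element case where the generator is empty:)

print(max((abs(a[s] - a[d]) + abs(s - d)
           for s in range(n) for d in range(s + 1, n)), default=0))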
What costs more in Python, making a list with a comprehension or with a standalone function?
It appears I failed to find previous posts asking the same question. While other answers go into detail about bytes and the internal workings of Python, and that is indeed helpful, I felt like visual graphs help to show that there is a continuous trend.
I don't yet have a good enough understanding of the low-level workings of Python, so those answers are a bit foreign for me to try and comprehend.
I am currently a CS undergrad, and I am continually amazed by how powerful Python is. I recently did a small experiment to test the cost of forming lists with a comprehension versus a standalone function. For example:
def make_list_of_size(n):
    retList = []
    for i in range(n):
        retList.append(0)
    return retList
creates a list of size n containing zeros.
It is well known that this function is O(n). I wanted to explore the growth of the following:
def comprehension(n):
    return [0 for i in range(n)]
Which makes the same list.
let us explore!
This is the code I used for timing; note the order of the function calls (i.e. which way I made the list first): I made the list with the standalone function first, and then with the comprehension. I have yet to learn how to turn off garbage collection for this experiment, so there is some inherent measurement error, brought about when garbage collection kicks in.
'''
file: listComp.py
purpose: to test the cost of making a list with comprehension
         versus a standalone function
'''
import time as T

def get_overhead(n):
    tic = T.time()
    for i in range(n):
        pass
    toc = T.time()
    return toc - tic

def make_list_of_size(n):
    aList = []                        # <-- O(1)
    for i in range(n):                # <-- O(n)
        aList.append(n)               # <-- O(1)
    return aList                      # <-- O(1)

def comprehension(n):
    return [n for i in range(n)]      # <-- O(?)

def do_test(size_i, size_f, niter, file):
    delta = 100
    size = size_i
    while size <= size_f:
        overhead = get_overhead(niter)

        reg_tic = T.time()
        for i in range(niter):
            reg_list = make_list_of_size(size)
        reg_toc = T.time()

        comp_tic = T.time()
        for i in range(niter):
            comp_list = comprehension(size)
        comp_toc = T.time()
        #--------------------
        reg_cost_per_iter = (reg_toc - reg_tic - overhead) / niter
        comp_cost_per_iter = (comp_toc - comp_tic - overhead) / niter
        file.write(str(size) + "," + str(reg_cost_per_iter) +
                   "," + str(comp_cost_per_iter) + "\n")
        print("SIZE: " + str(size) + " REG_COST = " + str(reg_cost_per_iter) +
              " COMP_COST = " + str(comp_cost_per_iter))
        if size == 10 * delta:
            delta *= 10
        size += delta

def main():
    fname = input()
    file = open(fname, 'w')
    do_test(100, 1000000, 2500, file)
    file.close()

main()
I did three tests. Two of them went up to list size 100,000; the third went up to 10^6. See the plots:
[plot: overlay of the two timing curves, no zoom]
I found these results intriguing. Although both methods are O(n), the comprehension was consistently cheaper in time for making the same list.
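Out of curiosity, I also peeked at the generated bytecode, which seems to show where the constant factor goes: the explicit loop looks up and calls retList.append on every iteration, while the comprehension uses the dedicated LIST_APPEND opcode.

import dis

dis.dis(make_list_of_size)  # attribute lookup + method call per element
dis.dis(comprehension)      # specialized LIST_APPEND opcode per element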
I have more information to share, including the same test done with the list made with comprehension first, and then with the standalone function.
I have yet to run a test without garbage collection.
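For what it's worth, the standard library's timeit module turns garbage collection off while timing by default, so a follow-up test could look like this sketch:

import timeit

# timeit disables the garbage collector during the timed run by default
reg = timeit.timeit('make_list_of_size(100000)', globals=globals(), number=100)
comp = timeit.timeit('comprehension(100000)', globals=globals(), number=100)
print("REG_COST =", reg / 100, "COMP_COST =", comp / 100)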
Note: updates/solutions at the bottom of this question
As part of a product recommendation engine, I'm trying to segment my users based on their product preferences starting with using the k-means clustering algorithm.
My data is a dictionary of the form:
prefs = {
    'user_id_1': { 1L: 3.0, 2L: 1.0, },
    'user_id_2': { 4L: 1.0, 8L: 1.5, },
}
where the product IDs are the longs and the ratings are floats. The data is sparse: I currently have about 60,000 users, most of whom have rated only a handful of products. The dictionary of values for each user is implemented as a defaultdict(float) to simplify the code.
My implementation of k-means clustering is as follows:
def kcluster(prefs, sim_func=pearson, k=100, max_iterations=100):
    from collections import defaultdict

    users = prefs.keys()
    centroids = [prefs[random.choice(users)] for i in range(k)]
    lastmatches = None

    for t in range(max_iterations):
        print 'Iteration %d' % t
        bestmatches = [[] for i in range(k)]

        # Find which centroid is closest for each row
        for j in users:
            row = prefs[j]
            bestmatch = (0, 0)
            for i in range(k):
                d = simple_pearson(row, centroids[i])
                if d < bestmatch[1]: bestmatch = (i, d)
            bestmatches[bestmatch[0]].append(j)

        # If the results are the same as last time, this is complete
        if bestmatches == lastmatches: break
        lastmatches = bestmatches

        centroids = [defaultdict(float) for i in range(k)]

        # Move the centroids to the average of their members
        for i in range(k):
            len_best = len(bestmatches[i])
            if len_best > 0:
                items = set.union(*[set(prefs[u].keys()) for u in bestmatches[i]])
                for user_id in bestmatches[i]:
                    row = prefs[user_id]
                    for m in items:
                        if row[m] > 0.0: centroids[i][m] += (row[m] / len_best)

    return bestmatches
As far as I can tell, the algorithm is handling the first part (assigning each user to its nearest centroid) fine.
The problem is when doing the next part, taking the average rating for each product in each cluster and using these average ratings as the centroids for the next pass.
Basically, before it's even managed to do the calculations for the first cluster (i=0), the algorithm bombs out with a MemoryError at this line:
if row[m] > 0.0: centroids[i][m]+=(row[m]/len_best)
Originally the division step was in a separate loop, but it fared no better.
This is the exception I get:
malloc: *** mmap(size=16777216) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Any help would be greatly appreciated.
Update: Final algorithms
Thanks to the help received here, this is my fixed algorithm. If you spot anything blatantly wrong, please add a comment.
First, the simple_pearson implementation
from math import sqrt

def simple_pearson(v1, v2):
    si = [val for val in v1 if val in v2]
    n = len(si)
    if n == 0: return 0.0
    sum1 = 0.0
    sum2 = 0.0
    sum1_sq = 0.0
    sum2_sq = 0.0
    p_sum = 0.0
    for v in si:
        sum1 += v1[v]
        sum2 += v2[v]
        sum1_sq += pow(v1[v], 2)
        sum2_sq += pow(v2[v], 2)
        p_sum += (v1[v] * v2[v])
    # Calculate Pearson score
    num = p_sum - (sum1 * sum2 / n)
    temp = (sum1_sq - pow(sum1, 2) / n) * (sum2_sq - pow(sum2, 2) / n)
    if temp < 0.0:
        temp = -temp
    den = sqrt(temp)
    if den == 0: return 1.0
    r = num / den
    return r
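A quick sanity check on toy data (my own, not part of the recommendation engine): two perfectly correlated rating vectors should score 1.0.

v1 = {1: 2.0, 2: 4.0, 3: 6.0}
v2 = {1: 1.0, 2: 2.0, 3: 3.0}
print simple_pearson(v1, v2)  # perfectly correlated -> 1.0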
A simple method to turn simple_pearson into a distance value:
def distance(v1, v2):
    return 1.0 - simple_pearson(v1, v2)
And finally, the k-means clustering implementation:
def kcluster(prefs, k=21, max_iterations=50):
    from collections import defaultdict

    users = prefs.keys()
    centroids = [prefs[u] for u in random.sample(users, k)]
    lastmatches = None

    for t in range(max_iterations):
        print 'Iteration %d' % t
        bestmatches = [[] for i in range(k)]

        # Find which centroid is closest for each row
        for j in users:
            row = prefs[j]
            bestmatch = (0, 2.0)
            for i in range(k):
                d = distance(row, centroids[i])
                if d <= bestmatch[1]: bestmatch = (i, d)
            bestmatches[bestmatch[0]].append(j)

        # If the results are the same as last time, this is complete
        if bestmatches == lastmatches: break
        lastmatches = bestmatches

        centroids = [defaultdict(float) for i in range(k)]

        # Move the centroids to the average of their members
        for i in range(k):
            len_best = len(bestmatches[i])
            if len_best > 0:
                for user_id in bestmatches[i]:
                    row = prefs[user_id]
                    for m in row:
                        centroids[i][m] += row[m]
                for key in centroids[i].keys():
                    centroids[i][key] /= len_best
                    # We may have made the centroids quite dense, which
                    # significantly slows down subsequent iterations, so we
                    # delete values below a threshold to speed things up
                    if centroids[i][key] < 0.001:
                        del centroids[i][key]

    return centroids, bestmatches
Not all of these observations are directly relevant to your issues as expressed, but...:
a. why are the keys in prefs, as shown, longs? Unless you have billions of users, simple ints will be fine and save you a little memory.
b. your code:
centroids = [prefs[random.choice(users)] for i in range(k)]
can give you repeats (two identical centroids), which in turn would not make the K-means algorithm happy. Just use the faster and more solid
centroids = [prefs[u] for u in random.sample(users, k)]
c. in the code you posted you're calling a function simple_pearson which you never define anywhere; I assume you mean to call sim_func, but it's really hard to help on one set of issues while at the same time having to guess how the posted code differs from any code that might actually be working
d. one more indication that this posted code may differ from anything that actually works: you set bestmatch=(0,0) but then test with if d < bestmatch[1] -- how is that test ever going to succeed, unless the distance function returns negative values?
e. the point of a defaultdict is that merely accessing row[m] magically adds an item to row at key m (with the value obtained by calling the defaultdict's factory, here 0.0). That item will then take up memory forevermore. You absolutely DON'T need this behavior, and therefore your code:
row = prefs[user_id]
for m in items:
    if row[m] > 0.0: centroids[i][m] += (row[m] / len_best)
is wasting a huge amount of memory, turning prefs into a dense matrix (mostly full of 0.0 values) from the sparse one it used to be. If instead you code
row = prefs[user_id]
for m in row:
    centroids[i][m] += (row[m] / len_best)
there will be no growth in row and therefore in prefs because you're looping over the keys that row already has.
There may be many other such issues, major like the last one or minor ones -- as an example of the latter,
f. don't divide a bazillion times by len_best: compute its inverse once outside the loop and multiply by that inverse -- and in fact you don't need that multiplication inside the loop at all, since you can do it at the end in a separate pass (it's the same value multiplying every item). This saves no memory but avoids wantonly wasting CPU time ;-). OK, these are two minor issues, I guess, not just one ;-).
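A sketch of what (f) means in code (my rendering, not code from the post):

inv = 1.0 / len_best          # one division, outside the loops
for user_id in bestmatches[i]:
    row = prefs[user_id]
    for m in row:
        centroids[i][m] += row[m]
for m in centroids[i]:        # scale once per key, in a separate pass
    centroids[i][m] *= inv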
As I mentioned, there may be many others, major like the last one or minor, but with the density of issues already shown by these six (or seven), plus the separate suggestion already advanced by S.Lott (which I think would not fix your main out-of-memory problem, since his code still addresses the row defaultdict with many keys it doesn't contain), I think it wouldn't be very productive to keep looking for even more -- maybe start by fixing these and, if problems persist, post a separate question about them...?
Your centroids need not be an actual list.
You never appear to reference anything other than centroids[i][m]. If you only ever use it that way, perhaps it doesn't need to be a list; a simple dictionary would probably do:
centroids = defaultdict(float)

# Move the centroids to the average of their members
for i in range(k):
    len_best = len(bestmatches[i])
    if len_best > 0:
        items = set.union(*[set(prefs[u].keys()) for u in bestmatches[i]])
        for user_id in bestmatches[i]:
            row = prefs[user_id]
            for m in items:
                if row[m] > 0.0: centroids[m] += (row[m] / len_best)
May work better.