I apologize if my question isn't appropriate for this website, but this is the only place I know that can answer computer science questions.
For my quiz, we were told to calculate and simplify the complexity class of a function. I understand most of the concepts, but I cannot understand why O(1) is incorrect for the line aset = set(alist). The correct answer is supposed to be O(N), but I don't see why.
Here is the complete function:
def sum_to_b(alist, asum):
    aset = set(alist)
    for v in alist:
        if asum - v in aset:
            return (v, asum - v)
    return None
You need to iterate over each element of alist exactly once (assuming it is a regular iterable) to build the set aset, which is why that line is O(N).
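A quick way to convince yourself is to time the construction for growing inputs; a rough, machine-dependent sketch with timeit shows roughly linear growth:

import timeit

# Time set(alist) for increasing sizes; roughly linear growth in n
# is what an O(N) operation should show.
for n in (10_000, 100_000, 1_000_000):
    alist = list(range(n))
    t = timeit.timeit(lambda: set(alist), number=10)
    print(f"n={n:>9}: {t:.4f}s")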
My aim is to improve the speed of my Python code, which was accepted for the LeetCode problem Course Schedule.
I am aware of the algorithm, but even though I am using data structures with O(1) lookups, my runtime is still poor: around 200 ms.
My code uses dictionaries and sets:
from collections import defaultdict
from typing import List  # provided implicitly by the LeetCode environment

class Solution:
    def canFinish(self, numCourses: int, prerequisites: List[List[int]]) -> bool:
        course_list = []
        pre_req_mapping = defaultdict(list)
        visited = set()
        stack = set()

        def dfs(course):
            if course in stack:
                return False
            stack.add(course)
            visited.add(course)
            for neighbor in pre_req_mapping.get(course, []):
                if neighbor in visited:
                    no_cycle = dfs(neighbor)
                    if not no_cycle:
                        return False
            stack.remove(course)
            return True

        # for course in range(numCourses):
        #     course_list.append(course)
        for pair in prerequisites:
            pre_req_mapping[pair[1]].append(pair[0])
        for course in range(numCourses):
            if course in visited:
                continue
            no_cycle = dfs(course)
            if not no_cycle:
                return False
        return True
What else can I do to improve the speed?
You are calling dfs() for a given course multiple times.
But its return value won't change.
So we have an opportunity to memoize it.
Change your algorithmic approach (here, to dynamic programming)
for the big win.
It's a space vs time tradeoff.
EDIT:
Hmmm, you are already memoizing most of the computation
with visited, so lru_cache would mostly improve clarity
rather than runtime.
It's just a familiar idiom for caching a result.
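For reference, the lru_cache idiom looks like this on a self-contained recursive function (a generic sketch of the idiom, not a drop-in change to your dfs, which closes over mutable sets):

from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    # Each distinct n is computed once; repeated calls hit the cache.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(100))  # returns instantly thanks to the cache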
It would be helpful to add a # comment citing a reference
for the algorithm you implemented.
This is a very nice expression, with defaulting:
pre_req_mapping.get(course, [])
If you use timeit you may find that the generated bytecode
for an empty tuple () is a tiny bit more efficient than that
for an empty list [], as it involves fewer allocations.
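You can check both claims yourself; in CPython, () is loaded as a cached constant while [] builds a fresh list on every evaluation (a quick, machine-dependent sketch):

import dis
import timeit

dis.dis(compile("()", "<s>", "eval"))  # LOAD_CONST -- the shared empty tuple
dis.dis(compile("[]", "<s>", "eval"))  # BUILD_LIST -- a new allocation each time
print(timeit.timeit("()", number=10_000_000))
print(timeit.timeit("[]", number=10_000_000))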
Ok, some style nits follow, unrelated to runtime.
As an aside, youAreMixingCamelCase and_snake_case.
PEP-8 asks you to please stick with just snake_case.
This is a fine choice of identifier name:
for pair in prerequisites:
But instead of the cryptic [0], [1] dereferences,
it would be easier to read a tuple unpack:
for course, prereq in prerequisites:
if not no_cycle: is clumsy.
Consider inverting the meaning of dfs' return value,
or rephrasing the assignment as:
cycle = not dfs(course)
I think you are doing it the right way, but since Python is an interpreted language, it is normal to have slower runtimes compared with compiled languages like C/C++ and Java, especially for large inputs.
Try writing the same code in C/C++, for example, and compare the speeds.
I had the following assignment: given a list of n integers, where each integer in the list is unique and larger than 0, and a number K, which is an integer larger than 0.
List slicing of any kind is not allowed.
I need to check whether there is a subset that sums to K.
E.g., for the list [1,4,8] and k=5, I return True, because we have the subset {1,4}.
I needed to implement this using recursion, which I did; I then also needed to add memoization.
I wonder what the difference is between the two functions below: both seem to implement memoization, and the second should work better, but it doesn't. I'd really appreciate some help :)
def subsetsum_mem(L, k):
    '''
    fill-in your code below here according to the instructions
    '''
    sum_dict = {}
    return s_rec_mem(L, 0, k, sum_dict)

def s_rec_mem(L, i, k, d):
    '''
    fill-in your code below here according to the instructions
    '''
    if k == 0:
        return True
    elif k < 0 or i == len(L):
        return False
    else:
        if k not in d:
            res_k = s_rec_mem(L, i + 1, k - L[i], d) or s_rec_mem(L, i + 1, k, d)
            d[k] = res_k
            return res_k
def subsetsum_mem2(L, k):
    '''
    fill-in your code below here according to the instructions
    '''
    sum_dict = {}
    return s_rec_mem2(L, 0, k, sum_dict)

def s_rec_mem2(L, i, k, d):
    '''
    fill-in your code below here according to the instructions
    '''
    if k == 0:
        return True
    elif k < 0 or i == len(L):
        return False
    else:
        if k not in d:
            res_k = s_rec_mem2(L, i + 1, k - L[i], d) or s_rec_mem2(L, i + 1, k, d)
            d[k] = res_k
            return res_k
        else:
            return d[k]
You have two problems with your memoization.
First, you're using just k as the cache key. But the function does different things for different values of i, and since you're ignoring i in the cache, you're going to end up returning the value computed for (L, 9, 1, d) on a later call with (L, 1, 1, d).
Second, in s_rec_mem only, you never return d[k]; if the key is already present, you just fall off the end of the function and return None (which is falsey).
So, the second one does come closer to working—but it still doesn't actually work.
You could fix it like this:
def s_rec_mem2(L, i, k, d):
    '''
    fill-in your code below here according to the instructions
    '''
    if k == 0:
        return True
    elif k < 0 or i == len(L):
        return False
    else:
        if (i, k) not in d:
            res_k = s_rec_mem2(L, i + 1, k - L[i], d) or s_rec_mem2(L, i + 1, k, d)
            d[i, k] = res_k
            return res_k
        else:
            return d[i, k]
… or by just using lru_cache, either by passing down tuple(L) instead of L (because tuples, unlike lists, can be hashed as keys, and your recursive function doesn't care what kind of sequence it gets), or by making it a local function that sees L via closure instead of getting passed it as a parameter.
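A sketch of that closure approach with lru_cache (my own restructuring, assuming the recursion itself stays the same):

from functools import lru_cache

def subsetsum_mem3(L, k):
    @lru_cache(maxsize=None)
    def rec(i, k):
        # L is visible via closure, so the cache key is just (i, k).
        if k == 0:
            return True
        if k < 0 or i == len(L):
            return False
        return rec(i + 1, k - L[i]) or rec(i + 1, k)
    return rec(0, k)

print(subsetsum_mem3([1, 4, 8], 5))  # True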
Finally, from a quick glance at your code:
It looks like you're only ever going to evaluate s_rec_mem at most twice on each set of arguments (assuming you correctly cache on i, k rather than just k), which means memoization can only provide a 2x constant speedup at best. To get any more than that, you need to rethink your caching or your algorithm.
You're only memoizing within each separate top-level run, but switching to lru_cache on tuple(L), i, k means you're memoizing across all runs until you restart the program (or REPL session)—so the first test may take a long time, but subsequent runs on the same L (even with a different k) could benefit from previous caching.
You seem to be trying to solve a minor variation of the subset sum problem. That problem in the general case is provably NP-complete, which means it's guaranteed to take exponential time. And your variation seems to be if anything harder than the standard problem, not easier. If so, not only is your constant-factor speedup not going to have much benefit, there's really nothing you can do that will do better. In real life, people who solve equivalent problems usually use either optimization (e.g., via dynamic programming) or approximation to within an arbitrary specified limit, both of which allow for polynomial (or at least pseudo-polynomial) solutions in most cases, but can't solve all cases. Or, alternatively, there are subclasses of inputs for which you can solve the problem in polynomial time, and if you're lucky, you can prove your inputs fall within one of those subclasses. If I've guessed right on what you're doing, and you want to pursue any of those three options, you'll need to do a bit of research. (Maybe Wikipedia is good enough, or maybe there are good answers here or on the compsci or math SE sites to get you started.)
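For the record, since your integers are all positive, the standard pseudo-polynomial dynamic-programming approach is quite short (a sketch; it runs in O(n*k) time rather than exponential time):

def subsetsum_dp(L, k):
    # reachable holds every subset sum attainable from the items seen so far.
    reachable = {0}
    for x in L:
        # Extend each known sum by x; cap at k to bound the work.
        reachable |= {s + x for s in reachable if s + x <= k}
    return k in reachable

print(subsetsum_dp([1, 4, 8], 5))   # True
print(subsetsum_dp([1, 4, 8], 10))  # False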
I am relatively new to python and have recently learned about recursion. When tasked to find the factorial of a number, I used this:
def factorial(n):
    product = 1
    for z in xrange(1, n + 1):
        product *= z
    return product

if __name__ == "__main__":
    n = int(raw_input().strip())
    result = factorial(n)
    print result
Then, because the task was to use recursion, I created a solution that used recursion:
def factorial(n):
    if n == 1:
        current = 1
    else:
        current = n * factorial(n - 1)
    return current

if __name__ == "__main__":
    n = int(raw_input().strip())
    result = factorial(n)
    print result
Both seem to produce the same result. My question is why would I ever use recursion, if I could use a for loop instead? Are there situations where I cannot just create for loops instead of using recursion?
Every solution you can find with recursion has an iterative counterpart, because you can, for example, simulate the recursion using an explicit stack, as the sketch below shows.
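For instance, a recursive factorial can be rewritten with an explicit stack (a small sketch of the idea):

def factorial_stack(n):
    # Push the pending multiplications that recursion would have stacked up.
    stack = []
    while n > 1:
        stack.append(n)
        n -= 1
    result = 1
    while stack:
        result *= stack.pop()  # unwind the simulated call stack
    return result

print(factorial_stack(5))  # 120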
The factorial example uses a type of recursion called tail recursion, and such cases have a straightforward iterative implementation, but here the recursive solution is closer to the mathematical definition. However, there are other problems for which finding an iterative solution is difficult and the recursive solution is more powerful and expressive, for example the Tower of Hanoi (see this question for more information: Tower of Hanoi: Recursive Algorithm); the iterative solution to that problem is very tedious and ultimately has to simulate the recursion anyway.
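For comparison, the recursive Tower of Hanoi solution is only a few lines (a sketch):

def hanoi(n, source, target, spare):
    # Move n discs from source to target, using spare as scratch space.
    if n == 0:
        return
    hanoi(n - 1, source, spare, target)
    print("move disc %d from %s to %s" % (n, source, target))
    hanoi(n - 1, spare, target, source)

hanoi(3, 'A', 'C', 'B')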
There are problems, like the Fibonacci sequence, whose definition is recursive, so it is easy to write a recursive solution:
def fibonacci(n):
    if n == 1 or n == 2:
        return 1
    else:  # n > 2
        return fibonacci(n - 2) + fibonacci(n - 1)
This solution is straightforward, but it unnecessarily recalculates many values (fibonacci(n-2) is recomputed inside fibonacci(n-1), and so on); tracing the call tree of fibonacci(7) makes the repeated work obvious.
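One common fix is to memoize the recursion so each value is computed only once (a sketch using a dictionary cache):

def fibonacci_memo(n, cache=None):
    if cache is None:
        cache = {1: 1, 2: 1}
    if n not in cache:
        # Each fibonacci(n) is computed once and then reused.
        cache[n] = fibonacci_memo(n - 2, cache) + fibonacci_memo(n - 1, cache)
    return cache[n]

print(fibonacci_memo(50))  # 12586269025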
So you can sometimes view recursion as syntactic sugar; whether to use it depends on what you want. In low-level programming, recursion tends to be avoided, and on a microcontroller it can be a serious error, but in other cases a recursive solution makes your code easier to understand.
I hope this helps, but to go deeper you will need to read some books.
In a Python tutorial, I've learned that
Like functions, generators can be recursively programmed. The following
example is a generator to create all the permutations of a given list of items.
def permutations(items):
    n = len(items)
    if n == 0:
        yield []
    else:
        for i in range(len(items)):
            for cc in permutations(items[:i] + items[i+1:]):
                yield [items[i]] + cc

for p in permutations(['r', 'e', 'd']): print(''.join(p))
for p in permutations(list("game")): print(''.join(p) + ", ", end="")
I cannot figure out how it generates the results. The recursion and 'yield' really confuse me. Could someone explain the whole process clearly?
There are two parts to this --- recursion and generators. Here's the non-generator version that just uses recursion:
def permutations2(items):
    n = len(items)
    if n == 0:
        return [[]]
    else:
        l = []
        for i in range(len(items)):
            for cc in permutations2(items[:i] + items[i+1:]):
                l.append([items[i]] + cc)
        return l
l.append([items[i]] + cc) roughly translates to: the permutations of these items include an entry where items[i] is the first item, followed by a permutation of the rest of the items.
The generator version yields the permutations one at a time instead of returning the entire list.
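You can see the difference by pulling results one at a time; the generator computes lazily, while the list version builds everything up front (a small demonstration using the two functions above):

gen = permutations(['r', 'e', 'd'])   # nothing has been computed yet
print(next(gen))   # ['r', 'e', 'd'] -- only the first permutation
print(next(gen))   # ['r', 'd', 'e'] -- resumes and produces the next one

lst = permutations2(['r', 'e', 'd'])  # all six permutations built immediately
print(len(lst))    # 6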
When you call a function that returns, it disappears after having produced its result.
When you ask a generator for its next element, it produces it (yields it), and pauses -- yields (the control back) to you. When asked again for the next element, it will resume its operations, and run normally until hitting a yield statement. Then it will again produce a value and pause.
Thus calling a generator with some argument creates an actual entity in memory, an object, capable of running, remembering its state and arguments, and producing values when asked.
Different calls to the same generator function produce different objects in memory. The definition is a recipe for the creation of such objects. Once defined, the recipe can invoke any other recipe it needs, including itself, to create the new objects it needs to produce its values.
This is a general answer, not Python-specific.
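In Python terms, a minimal illustration of that pausing behaviour (my own example):

def counter(n):
    for i in range(n):
        yield i  # produce a value, then pause here

c = counter(3)   # creates the generator object; no body code has run yet
print(next(c))   # 0 -- runs until the first yield, then pauses
print(next(c))   # 1 -- resumes exactly where it left off
print(list(c))   # [2] -- consumes the rest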
Thanks for the answers. They really helped clear my mind, and now I want to share some useful resources about recursion and generators that I found on the internet, which are also very friendly to beginners.
To understand generators in Python, the link below is very readable and easy to follow:
What does the "yield" keyword do in Python?
To understand recursion: https://www.youtube.com/watch?v=MyzFdthuUcA. This YouTube video gives a "patented" four-step method for writing any recursive method/function. It is very clear and practical. The channel also has several videos showing how recursion works and how to trace it.
I hope it can help someone like me.
So, I'm busy with some code and have a function which basically takes a dictionary where each value is a list, and returns the key with the largest list.
I wrote the following:
def max_list(dic):
    if dic:
        l1 = dic.values()
        l1 = map(len, l1)
        l2 = dic.keys()
        return l2[l1.index(max(l1))]
    else:
        return None
Someone else wrote the following:
def max_list(dic):
    result = None
    maxValue = 0
    for key in dic.keys():
        if len(dic[key]) >= maxValue:
            result = key
            maxValue = len(dic[key])
    return result
Which would be the 'correct' way to do this, if there is one? I hope this is not regarded as community wiki (even though the code works); I'm trying to figure out which would be the best pattern for this problem.
Another valid option:
maxkey,maxvalue = max(d.items(),key=lambda x: len(x[1]))
Of the two above, I would probably prefer the explicit for loop as you don't generate all sorts of intermediate objects just to throw them away.
As a side note, this solution doesn't work particularly well for empty dicts (it raises a ValueError). Since I expect that to be an unusual case rather than the norm, it shouldn't hurt to enclose it in a try-except ValueError block, as sketched below.
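Combining the two suggestions, a guarded version might look like this (a sketch):

def max_list(dic):
    # max() raises ValueError on an empty dict, so fall back to None.
    try:
        return max(dic, key=lambda k: len(dic[k]))
    except ValueError:
        return None

print(max_list({'a': [1], 'b': [1, 2, 3]}))  # 'b'
print(max_list({}))                          # None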
the most pythonic would be max(dic,key=lambda x:len(dic[x])) ... at least I would think ...
maximizing readability and minimizing lines of code is pythonic ... usually
I think the question you should ask yourself is: what do you consider more important, code maintainability or computation speed?
As the other answers point out, this problem has a very concise solution using a map. For most people that implementation would probably be easier to read than the one with a loop.
In terms of computational speed, the map solution would be less efficient, but still within the same order of magnitude.
Therefore, I think it is unlikely that the map method would ever have noticeably worse performance. I would suggest running a profiler once your program is finished, so you can be sure where the real problem lies if it turns out to run slower than desired.