My heap is working too slow - python

So here is my code for a min-heap. It's part of my homework:
def heapify(i):
    global end, a
    l = 2*i + 1
    if l > end:
        return None
    r = 2*i + 2
    minarg = i
    if a[i] > a[l]:
        minarg = l
    if r <= end:
        if a[minarg] > a[r]:
            minarg = r
    if a[i] == a[minarg]:
        return None
    else:
        a[i], a[minarg] = a[minarg], a[i]
        heapify(minarg)

def buildHeap(start):
    global end, a
    if start*2 + 1 > end:
        return None
    buildHeap(start*2 + 1)
    buildHeap(start*2 + 2)
    heapify(start)
It should be working, but I get time limit exceeded on large test cases. Am I doing something wrong?

Function calls in Python take time, and recursion takes space.
To save the time of the recursive calls one usually transforms the recursion into a loop. This usually requires specialized "memory management" of the data being worked on in order to save space. You did that already (with... ahem... global variables) using an array/list.
If that is your homework, go ahead -- doable, but non-trivial.
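For illustration, a minimal sketch of that transformation for the asker's sift-down, keeping the same global a/end convention (the name heapify_iter is my own, not the asker's code):

def heapify_iter(i):
    # Iterative sift-down: the recursive tail call is replaced by a loop.
    global end, a
    while True:
        l = 2*i + 1
        if l > end:
            return
        r = 2*i + 2
        minarg = i
        if a[l] < a[minarg]:
            minarg = l
        if r <= end and a[r] < a[minarg]:
            minarg = r
        if minarg == i:
            return
        a[i], a[minarg] = a[minarg], a[i]
        i = minarg

This removes the per-call overhead and the recursion depth; buildHeap itself can also be replaced by a single backward loop of heapify calls over the non-leaf indices.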

Related

Converting recursive function to completely iterative function without using extra space

Is it possible to convert a recursive function like the one below to a completely iterative function?
def fact(n):
    if n <= 1:
        return
    for i in range(n):
        fact(n-1)
        doSomethingFunc()
It seems pretty easy to do given extra space like a stack or a queue, but I was wondering if we can do this in O(1) space complexity?
Note, we cannot do something like:
def fact(n):
    for i in range(factorial(n)):
        doSomethingFunc()
since it takes a non-constant amount of memory to store the result of factorial(n).
Well, generally speaking, no.
I mean, the space taken on the stack by recursive functions is not just an inconvenience of this programming style. It is the memory needed for the computation.
So, sure, for a lot of algorithms that space is unnecessary and could be spared. For a classical factorial, for example,
def fact(n):
    if n <= 1:
        return 1
    else:
        return n*fact(n-1)
the stacking of all the n, n-1, n-2, ..., 1 arguments is not really necessary.
So, sure, you can find an implementation that gets rid of it. But that is an optimization (for example, in the specific case of tail recursion; but I am pretty sure you added that "doSomethingFunc" to make clear that you don't want to focus on that specific case).
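For instance, the factorial above can be rewritten with a simple accumulator so that nothing is stacked (a minimal sketch; constant stack depth, although the product itself still grows):

def fact_iter(n):
    # Accumulate the product in one variable instead of stacking
    # n, n-1, ..., 1 on the call stack.
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result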
You cannot assume in general that an algorithm that doesn't need all those values exists, recursive or iterative. Otherwise, that would be saying that every algorithm exists in an O(1) space complexity version.
Example: base representation of a positive integer
def baseRepr(num, base):
    if num >= base:
        s = baseRepr(num//base, base)
    else:
        s = ''
    return s + chr(48 + num % base)
I'm not claiming it is optimal, or even well written.
But the stacking of the arguments is needed: it is how you implicitly store the digits, which you compute in reverse order.
An iterative function would also need some memory to store those digits, since you have to compute the last one first.
Well, I am pretty sure that for this simple example you could find a way to compute from left to right, for example using a log computation to know the number of digits in advance, or something like that. But that's not the point. Just imagine that there is no algorithm known other than the one computing digits from right to left. Then you need to store them: either implicitly on the stack using recursion, or explicitly in allocated memory. So again, memory used on the stack is not just an inconvenience of recursion. It is how recursive algorithms store things that would otherwise be stored explicitly in an iterative algorithm.
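To make that concrete, here is a sketch of an iterative counterpart (the name baseRepr_iter is mine, and it keeps the same digit-encoding limitation as the original): the digits still have to be stored, just in an explicit list rather than on the call stack.

def baseRepr_iter(num, base):
    # The digits come out last-first, so store them explicitly
    # and reverse at the end -- the memory recursion kept on the stack.
    digits = []
    while True:
        digits.append(chr(48 + num % base))
        num //= base
        if num == 0:
            return ''.join(reversed(digits))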
Note, we cannot do something like:
def fact(n):
    for i in range(factorial(n)):
        doSomethingFunc()
since it takes a non-constant amount of memory to store the result of factorial(n).
Yes.
I was wondering if we can do this in O(1) space complexity?
So, no.

Python recursive algorithm segmentation fault

I'm pretty bad with recursion as it is, but this algorithm naturally seems like it's best done recursively. Basically, I have a list of all the function calls made in a C program, throughout multiple files. This list is unordered. My recursive algorithm attempts to make a tree of all the functions called, starting from the main method.
This works perfectly fine for smaller programs, but when I tried it out with larger ones I got this error. I read that the issue might be due to exceeding the C stack limit, since I already tried raising the recursion limit in Python.
Would appreciate some help here, thanks.
functions = a set containing the function calls and their info, of type Function. The data in node is of type Function.
@dataclass
class Function:
    name: str
    file: str
    id: int
    calls: set
    ....
Here's the algorithm.
def order_functions(node, functions, defines):
    calls = set()
    # Checking if the called function is user-defined
    for call in node.data.calls:
        if call in defines:
            calls.add(call)
    node.data.calls = calls
    if len(calls) == 0:
        return node
    for call in node.data.calls:
        child = Node(next((f for f in functions if f.name == call), None))
        node.add_child(child)
        Parser.order_functions(child, functions, defines)
    return node
If you exceed the predefined limit on the call stack size, the best idea is probably to rewrite an iterative version of your program. If you have no idea how deep your recursion will go, then don't use recursion.
More information here, and maybe if you need to implement an iterative version you can get inspiration from this post.
The main information here is that Python doesn't perform any tail recursion elimination. Therefore recursive functions will never work reliably on inputs with an unknown/unbounded hierarchical structure.
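As a sketch of what such an iterative version could look like (the Node class below is a minimal stand-in I'm assuming for the question's type, not the asker's actual code):

class Node:
    # Minimal stand-in for the question's Node type.
    def __init__(self, data):
        self.data = data
        self.children = []

    def add_child(self, child):
        self.children.append(child)

def order_functions_iter(root, functions, defines):
    # An explicit stack replaces the call stack, so the depth is
    # bounded by heap memory, not by the C stack or recursion limit.
    stack = [root]
    while stack:
        node = stack.pop()
        # Keep only the user-defined calls, as in the original.
        node.data.calls = {c for c in node.data.calls if c in defines}
        for call in node.data.calls:
            child = Node(next((f for f in functions if f.name == call), None))
            node.add_child(child)
            stack.append(child)
    return root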

Python memory issues - memory doesn't get released after finishing a method

I have some quite complex Python (2.7 on Ubuntu) code which is leaking memory unexpectedly. To break it down: it is a method which is called repeatedly (and itself calls different methods) and returns a very small object. After the method finishes, the used memory is not released. As far as I know it is not unusual to reserve some memory for later use, but if I use big enough input my machine eventually consumes all memory and freezes. This is not the case if I run it in a subprocess with concurrent.futures' ProcessPoolExecutor, so I have to assume it is not my code but some underlying problem?!
Is this a known issue? Might it be a problem in third-party libraries I am using (e.g. PyQgis)? Where should I start searching for the problem?
Some more background to eliminate silly reasons (because I am still somewhat of a beginner):
The method uses some global variables, but in my understanding these should only be active in the file where they are declared, and anyway should be overwritten in the next call of the method?!
To clarify in pseudocode:
def main():
    load input from file
    for x in input:
        result = extra_file.initialization(x)
        # here is the point where memory should get released in my opinion

# extra file
def initialization(x):
    global input
    input = x
    result_container = []
    while not result do:
        part_of_result = method1()
        result_container.append(part_of_result)
        if result_container fulfills condition to be the final result:
            result = result_container
    del input
    return result

def method1():
    # do stuff
    method2()
    # do stuff
    return part_of_result

def method2():
    # do stuff with input, not altering it
Numerous different methods and global variables are involved, and the global declaration is used to avoid passing five or so input variables through multiple methods that don't even use them.
Should I try using garbage collection? All references should be deleted after the method finishes, and Python itself should take care of them?
Definitely try using garbage collection. I don't believe it's a known problem.
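For example, forcing a collection after each iteration of the driver loop from the pseudocode might look like this (a sketch only; input_data and extra_file stand in for the question's names, and gc.collect() helps only if reference cycles are what is holding the memory):

import gc

def main():
    # load input from file into input_data (placeholder)
    for x in input_data:
        result = extra_file.initialization(x)
        # Explicitly run the cycle collector so that unreachable
        # reference cycles are reclaimed after each iteration.
        gc.collect()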

Is it possible to code this formula using recursion

I am trying to code the following formula using recursion.
I was thinking of doing it in different ways, but since the expression is recursive, in my opinion recursion is the way to go.
I know how to apply recursion to simple problems, but in this particular case my understanding seems to be wrong. I tried to code it in Python, but the code failed with the message
RuntimeError: maximum recursion depth exceeded
Therefore, I would like to ask what is the best way to code this expression and whether recursion is possible at all.
The python code I tried is:
def coeff(l,m,m0,m1):
if l==0 and m==0:
return 1.
elif l==1 and m==0:
return -(m1+m0)/(m1-m0)
elif l==1 and m==1 :
return 1./(m1-m0)
elif m<0 or m>l:
return 0.
else:
return -((l+1.)*(m1-m0))/((2.*l+1.)*(m1+m0))* \
((2.*l+1.)*(m+1.)/((l+1.)*(2.*m+3.)*(m1-m0))*coeff(l,m+1,m0,m1) +\
(2.*l+1.)*m/((l+1.)*(2.*m-1.)*(m1-m0))*coeff(l,m-1,m0,m1) -\
l/(l+1.)*coeff(l-1,m,m0,m1))
where x=m1-m0 and y=m1+m0. In my code I tried to express the a(l,m) coefficient as a function of the others and code the recursion on the basis of it.
A naive recursive implementation here obviously recalculates the same things over and over, so it probably pays to store previously calculated values. This can be done either by explicitly filling out a table, or implicitly by memoization (I therefore don't really agree with the comments talking about "recursion vs. dynamic programming").
E.g., using this decorator,
class memoize(dict):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args):
        return self[args]

    def __missing__(self, key):
        result = self[key] = self.func(*key)
        return result
You could write it as
@memoize
def calc_a(l, m, x, y):
    if l == 0 and m == 0:
        return 1
    # Rest of formula goes here.
Note that the same link contains a version that caches between invocations.
There are a number of tradeoffs between (memoized) recursion and explicit table building:
Recursion is typically more limited in the number of invocations (which may or may not be an issue in your case: you seem to have an infinite recursion problem in your original code).
Memoized recursion is (arguably) simpler to implement than explicit table building with a loop.
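As an aside (not part of the original answer): in Python 3, the standard library's functools.lru_cache provides the same memoization without a hand-rolled decorator:

from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache, like the dict-based memoize
def calc_a(l, m, x, y):
    if l == 0 and m == 0:
        return 1
    # Rest of formula goes here.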

Very simple Python function spends a long time in the function itself and not in subfunctions

I have spent many hours trying to figure out what is going on here.
The function grad_logp in the code below is called many times in my program. cProfile, with RunSnakeRun to visualize the results, reveals that grad_logp spends about .00004s 'locally' on every call, not in any functions it calls, and that the function n spends about .00006s locally on every call. Together these two times make up about 30% of the program time that I care about. It doesn't seem to be function-call overhead, since other Python functions spend far less time 'locally', and merging grad_logp and n does not make my program faster; yet the operations these two functions perform seem rather trivial. Does anyone have any suggestions about what might be happening?
Have I done something obviously inefficient? Am I misunderstanding how cProfile works?
def grad_logp(self, variable, calculation_set):
    p = params(self.p, self.parents)
    return self.n(variable, self.p)

def n(self, variable, p):
    gradient = self.gg(variable, p)
    return np.reshape(gradient, np.shape(variable.value))

def gg(self, variable, p):
    if variable is self:
        gradient = self._grad_logps['x'](x=self.value, **p)
    else:
        gradient = __builtin__.sum([self._pgradient(variable, parameter, value, p)
                                    for parameter, value in self.parents.iteritems()])
    return gradient
Functions coded in C are not instrumented by the profiler; so, for example, any time spent in sum (which you're spelling __builtin__.sum) will be charged to its caller. Not sure what np.reshape is, but if it's numpy.reshape, the same applies there.
Your "many hours" might be better spent making your code less like a maze of twisty little passages and also documenting it.
The first method's arg calculation_set is NOT USED.
Then it does p = params(self.p,self.parents) but that p is NOT USED.
variable is self???
__builtin__.sum???
Get it firstly understandable, secondly correct. Then and only then, worry about the speed.
