Understanding how to get the mean of an integer stream using heaps - python

I am following the blog below and understood its rather subtle way of getting the median. The blog is here
Now I have added the getMean function below to the streamMedian class to get the mean of the inserted numbers, but I am not getting the desired output:
import heapq
class streamMedian:
    def __init__(self):
        self.minHeap, self.maxHeap = [], []
        self.N=0
    def insert(self, num):
        if self.N%2==0:
            heapq.heappush(self.maxHeap, -1*num)
            self.N+=1
            if len(self.minHeap)==0:
                return
            if -1*self.maxHeap[0]>self.minHeap[0]:
                toMin=-1*heapq.heappop(self.maxHeap)
                toMax=heapq.heappop(self.minHeap)
                heapq.heappush(self.maxHeap, -1*toMax)
                heapq.heappush(self.minHeap, toMin)
        else:
            toMin=-1*heapq.heappushpop(self.maxHeap, -1*num)
            heapq.heappush(self.minHeap, toMin)
            self.N+=1
    def getMedian(self):
        if self.N%2==0:
            return (-1*self.maxHeap[0]+self.minHeap[0])/2.0
        else:
            return -1*self.maxHeap[0]
    def getMean(self):
        sum = 0
        for num in self.maxHeap:
            sum += num
        for num in self.minHeap:
            sum += num
        return sum/self.N
This is the function call to the streamMedian class.
test = streamMedian()
test.insert(1)
test.insert(2)
test.insert(3)
print test.getMedian()
print test.getMean()
The median here should be 2 and the mean should be 2, but getMean() returns 0 instead. Thanks in advance.

You are pushing negative numbers to your maxHeap (-1*num).
You need to reverse that in your getMean(), e.g.:
def getMean(self):
    total = 0
    for num in self.maxHeap:
        total -= num
    for num in self.minHeap:
        total += num
    return total/self.N
Or alternatively:
def getMean(self):
    # maxHeap stores negated values, so subtract its sum (abs() would break for negative inputs)
    return (sum(self.minHeap) - sum(self.maxHeap))/self.N
Note: don't use sum as a variable name; it hides Python's built-in sum() function.

AChampion's answer correctly identifies the issue with your current code and offers a reasonable fix while still using your current algorithm. However, that algorithm is not very efficient (it takes O(N) time) and you can do better.
Specifically, you should add the value you're inserting to a cumulative sum in addition to pushing it onto one of your heaps. That way, when you need to get a mean, you can compute it in constant time (with just a single division):
class streamMedian:
    def __init__(self):
        self.minHeap, self.maxHeap = [], []
        self.cumulative_sum = 0.0 # new instance variable
        self.N=0
    def insert(self, num):
        self.cumulative_sum += num # add each value to it
        # rest of insert code...
    # median code...
    def getMean(self):
        return self.cumulative_sum / self.N # compute the mean in constant time
Note that if you're using Python 2 (which it appears you are) it's important that cumulative_sum is initialized with the float value 0.0 instead of the integer 0 (which would otherwise be natural). When you divide two integers in Python 2, you'll get another integer, rounding down. That may not be desirable if you're computing, say, the mean of 1 and 2 (you'd expect 1.5, but you'd get 1 if you just do (1 + 2) / 2). Python 3 does this better (you always get a float from regular division and can use the // operator to explicitly request "floor" division). If you want to get the same semantics in Python 2, you can put from __future__ import division at the top of your module.
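For illustration, here is a minimal self-contained sketch of the running-sum idea combined with a two-heap median, written for Python 3 (so / is true division). The class name StreamStats and its attribute names are hypothetical, and the balancing logic is a common simplification rather than the blog's exact insert code.

```python
import heapq


class StreamStats:
    """Track the median (two heaps) and the mean (running sum) of a stream."""

    def __init__(self):
        self.low = []    # max-heap of the smaller half, stored as negated values
        self.high = []   # min-heap of the larger half
        self.total = 0   # running sum of everything inserted
        self.count = 0

    def insert(self, num):
        self.total += num
        self.count += 1
        # Route the value through the lower half so that every value
        # in `high` is >= every value in `low`.
        moved = -heapq.heappushpop(self.low, -num)
        heapq.heappush(self.high, moved)
        # Rebalance so that `low` holds the extra element when count is odd.
        if len(self.high) > len(self.low):
            heapq.heappush(self.low, -heapq.heappop(self.high))

    def median(self):
        if self.count == 0:
            raise ValueError("no data")
        if self.count % 2:
            return -self.low[0]
        return (-self.low[0] + self.high[0]) / 2

    def mean(self):
        if self.count == 0:
            raise ValueError("no data")
        return self.total / self.count   # constant time: one division


stats = StreamStats()
for value in (1, 2, 3):
    stats.insert(value)
print(stats.median(), stats.mean())
```

Because the sum is updated on every insert, the stored sign convention of the heaps never affects the mean.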


Finding the Maximum Pyramidal Number by recursion in Python

I'm given the task to define a function to find the largest pyramidal number. For context, this is what pyramidal numbers are:
1 = 1^2
5 = 1^2 + 2^2
14 = 1^2 + 2^2 + 3^2
And so on.
The first part of the question requires me to find, iteratively, the largest pyramidal number that does not exceed the argument n, which I did successfully:
def largest_square_pyramidal_num(n):
    total = 0
    i = 0
    while total <= n:
        total += i**2
        i += 1
    if total > n:
        return total - (i-1)**2
    else:
        return total
So far, I can catch on.
The next part of the question then requires me to define the same function, but this time recursively. That's where I was instantly stunned. For the usual recursive functions that I have worked on before, I had always operated ON the argument, but had never come across a function where the argument was the condition instead. I struggled for quite a while and ended up with a function I knew clearly would not work. But I simply could not wrap my head around how to "recurse" such function. Here's my obviously-wrong code:
def largest_square_pyramidal_num_rec(n):
    m = 0
    pyr_number = 0
    pyr_number += m**2
    def pyr_num(m):
        if pyr_number >= n:
            return pyr_number
        else:
            return pyr_num(m+1)
    return pyr_number
I know this is erroneous, and I can say why, but I don't know how to correct it. Does anyone have any advice?
Edit: At the kind request of a fellow programmer, here is my logic and what I know is wrong:
Here's my logic: The process that repeats itself is the addition of square numbers to give the pyr num. Hence this is the recursive process. But this isn't what the argument is about, hence I need to redefine the recursive argument. In this case, m, and build up to a pyr num of pyr_number, to which I will compare with the condition of n. I'm used to recursion in decrements, but it doesn't make sense to me (I mean, where to start?) so I attempted to recall the function upwards.
BUT this clearly isn't right. First of all, I'm sceptical of defining the element m and pyr_num outside of the pyr_num subfunction. Next, m isn't pre-defined. Which is wrong. Lastly and most importantly, the calling of pyr_num will always call back pyr_num = 0. But I cannot figure out another way to write this logic out
Here's a recursive function to calculate the pyramid number, based on how many terms you give it.
def pyramid(terms: int) -> int:
    if terms <=1:
        return 1
    return terms * terms + pyramid(terms - 1)
pyramid(3) # 14
If you can understand what the function does and how it works, you should be able to figure out another function that gives you the greatest pyramid less than n.
def base(n):
    return rec(n, 0, 0)
def rec(n, i, tot):
    if tot > n:
        return tot - (i-1)**2
    else:
        return rec(n, i+1, tot+i**2)
print(base(NUMBER))
This outputs the same result as your non-recursive function.
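The same accumulator idea can be sketched in a single function by carrying the index and the running total as default arguments. The name largest_pyramidal is hypothetical, and the sketch assumes n is a non-negative integer.

```python
def largest_pyramidal(n, i=1, total=0):
    """Return the largest square pyramidal number <= n (0 if n < 1)."""
    # Stop as soon as adding the next square would overshoot n.
    if total + i * i > n:
        return total
    # Otherwise add i**2 and move on to the next term.
    return largest_pyramidal(n, i + 1, total + i * i)


print(largest_pyramidal(14))   # 14 = 1 + 4 + 9
print(largest_pyramidal(13))   # 5  = 1 + 4
```

Checking the next square before adding it avoids having to subtract the last term after overshooting.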

Find no of occurrences of a function from another function in python

Question: " If all digits of a number n are multiplied by each other repeating with the product, the one-digit number obtained at last is called the multiplicative digital root of n. The number of times digits need to be multiplied to reach one digit is called the multiplicative persistence of n.
Example: 86 -> 48 -> 32 -> 6 (MDR 6, MPersistence 3)
341 -> 12->2 (MDR 2, MPersistence 2)
Using the function prodDigits() of previous exercise write functions MDR() and MPersistence() that input a number and return its multiplicative digital root and
multiplicative persistence respectively"
no=int(input())
def prodDigits(n):
    prod=1
    while n>0:
        prod=prod*(n%10)
        n=n//10
    return prod
def mdr(n):
    if n<10:
        return n
    else:
        return mdr(prodDigits(n))
def MPersistence(n):
    count=0
    while():
        count=count+1
    return count
print(mdr(no))
print(MPersistence(no))
I cannot work out how to count the mdr() calls from the MPersistence() function. I have tried a few conditions inside the while loop, but they all result in an infinite loop.
In general, this is not possible: MPersistence only ever runs before or after MDR, never during it. At a minimum, MDR would need to be modified to track iterations itself, which raises the problem of how to safely pass the result between the two functions. Ideally, MPersistence should work by independently computing the chain of digit multiplications.
To avoid code duplication, use one function that computes both digital root and persistence; as needed, provide convenience functions to return each component directly.
def mdrp(n: int):
    """
    Compute the multiplicative persistence and digital root of ``n``
    """
    if n < 10:
        return 0, n
    else:
        persistence, digital_root = mdrp(prodDigits(n))
        return persistence + 1, digital_root
def MDR(n):
    return mdrp(n)[1]
def MPersistence(n):
    return mdrp(n)[0]
You don't need another function to count the function calls. MisterMiyagi's idea is to have a counter inside the function itself, and return both the persistence and the digital root at every function call, unpacking the result at the end.
I think a simpler way would be to have a counter as a global variable and increase its value inside the function on every call:
def mdr(n):
    if n<10:
        return n
    else:
        # count this multiplication step before recursing
        globals()['persistence'] += 1
        return mdr(prodDigits(n))
persistence = 0
print(mdr(86), persistence) # 6 3
This approach looks simpler but there's one thing to keep in mind: the counter has to be initialized every time before calling the function:
print(mdr(341), persistence) # 2 5 Wrong result
persistence = 0
print(mdr(341), persistence) # 2 2 Right result
This would not be a pure function, because it depends on and alters non-local variables.
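A pure alternative that needs neither recursion nor shared state is a plain loop that counts its own steps. This is only a sketch: prod_digits mirrors the prodDigits helper from the question, and mdr_and_persistence is a hypothetical name.

```python
def prod_digits(n):
    """Multiply the decimal digits of a positive integer."""
    prod = 1
    while n > 0:
        prod *= n % 10
        n //= 10
    return prod


def mdr_and_persistence(n):
    """Return (multiplicative digital root, multiplicative persistence)."""
    steps = 0
    while n >= 10:           # keep going until a single digit remains
        n = prod_digits(n)
        steps += 1           # one digit multiplication performed
    return n, steps


print(mdr_and_persistence(86))    # (6, 3)
print(mdr_and_persistence(341))   # (2, 2)
```

Since the counter is a local variable, repeated calls cannot interfere with each other.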

Adding through iteration

I want to take user input and add up every number from it down to 0. For example, if the user inputs 9, I want to add 9+8+7+6+...+1 and output the total. My code
def main(*args):
    sum = 0
    for i in args:
        sum = i + (i - 1)
    return sum
result = main(9)
print(result)
comes close, but I can't get it to iterate through until 0. I've tried adding ranges in a few ways but no luck there. I'm stuck.
Let's say the user input is assigned to x, then the most simplistic answer is:
sum(range(int(x)+1))
Note that range(n) produces the integers from 0 up to, but not including, n (as a list in Python 2, as a lazy immutable sequence in Python 3), hence the +1.
In terms of your original code, there are a few issues. First, you should avoid naming variables the same as Python built-ins, such as sum. Second, you are attempting to iterate through a tuple of input arguments (e.g. args = (9,) in your case), which will perform 9 + (9-1), or otherwise 17 and then return that sum as an output.
Instead, you could do something like:
def main(*args):
    mysum = 0
    for i in range(args[0]+1):
        mysum = mysum + i
    return mysum
result = main(9)
print(result)
Both solutions here will return 45.
This is the nth triangular number, so no iteration is needed:
def calculate_nth_triangle_number(value):
    # value * (value + 1) is always even, so integer division is exact
    return value * (value + 1) // 2
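As a quick sanity check, the closed form can be compared against the explicit sum for a range of inputs. The function name triangle_number is hypothetical, and the sketch assumes non-negative integer input.

```python
def triangle_number(n):
    """Closed form for 1 + 2 + ... + n."""
    return n * (n + 1) // 2


# Compare the formula with the brute-force sum for small n.
matches = all(triangle_number(n) == sum(range(n + 1)) for n in range(100))
print(matches)
print(triangle_number(9))   # 45
```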
Your code misuses a relatively advanced feature of Python, that is argument packing, where all the arguments supplied to a function are packed in a tuple.
What happens when you call main(9)? The loop is entered once, because calling the function with a single argument is equivalent to args = (9, ) in the body of the function. So i takes only one value, i = 9, and you get sum = 9+8 = 17.
For your case I would prefer a while loop over a for loop, because with while the function follows the definition of your task exactly:
def my_sum(n):
    result = 0
    while n>0:
        result = result + n
        n = n - 1
    return result
Note that the order of summation and decrease is paramount to a correct result... note also that sum is the name of a built-in function and it is considered bad taste to overload a built-in name with an expression of yours.

Using python how do I repeatedly divide a number by 2 until it is less than 1.0?

I am unsure how to create a loop that keeps dividing the number by two. Please help. I know how to divide a number by 2, but I don't know how to create a loop that keeps dividing until the result is less than 1.0.
It depends on what exactly you're after, as it isn't clear from the question. A function that just divides a number by 2 until it is less than 1.0 would look like this:
def dividingBy2(x):
    while x >= 1.0:
        x = x/2
But this serves no purpose other than understanding while loops, as it gives you no information. If you wanted to see how many times you can divide by 2 before a number is less than 1.0, then you could always add a counter:
def dividingBy2Counter(x):
    count = 0
    while x >= 1.0:
        x = x/2
        count = count + 1
    return count
Or if you wanted to see each number as x becomes increasingly small:
def dividingBy2Printer(x):
    while x >= 1.0:
        x = x/2
        print(x)
b=[] #initiate a list to store the result of each division
#creating a recursive function replaces the while loop
#this enables the non-technical user to call the function easily
def recursive_func(a=0): #recursive since it will call itself later
    if a>=1: #specify the condition that will make the function run again
        a = a/2 #perform the desired calculation(s)
        recursive_func(a) #function calls itself
        b.append(a) #records the result of each division in a list
#this is how the user calls the function as an example
recursive_func(1024)
print (b)
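If the intermediate values are what you are after, a variant that returns them avoids the module-level list. This is a sketch only: the name halvings is hypothetical, and it assumes Python 3 division and a finite numeric input.

```python
def halvings(x):
    """Return each value produced while halving x until it drops below 1.0."""
    values = []
    while x >= 1.0:
        x = x / 2
        values.append(x)
    return values


print(halvings(8))        # [4.0, 2.0, 1.0, 0.5]
print(len(halvings(8)))   # 4 divisions were needed
```

The length of the returned list is the number of divisions, so one function covers both the counting and the printing use cases.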

compute mean in python for a generator

I'm doing some statistics work, I have a (large) collection of random numbers to compute the mean of, I'd like to work with generators, because I just need to compute the mean, so I don't need to store the numbers.
The problem is that numpy.mean breaks if you pass it a generator. I can write a simple function to do what I want, but I'm wondering if there's a proper, built-in way to do this?
It would be nice if I could say "sum(values)/len(values)", but len doesn't work for generators, and sum has already consumed values by then.
here's an example:
import numpy
def my_mean(values):
    n = 0
    Sum = 0.0
    try:
        while True:
            Sum += next(values)
            n += 1
    except StopIteration: pass
    return float(Sum)/n
X = [k for k in range(1,7)]
Y = (k for k in range(1,7))
print numpy.mean(X)
print my_mean(Y)
These both give the same, correct answer, but my_mean doesn't work for lists, and numpy.mean doesn't work for generators.
I really like the idea of working with generators, but details like this seem to spoil things.
In general if you're doing a streaming mean calculation of floating point numbers, you're probably better off using a more numerically stable algorithm than simply summing the generator and dividing by the length.
The simplest of these (that I know) is usually credited to Knuth, and also calculates variance. The link contains a python implementation, but just the mean portion is copied here for completeness.
def mean(data):
    n = 0
    mean = 0.0
    for x in data:
        n += 1
        mean += (x - mean)/n
    if n < 1:
        return float('nan')
    else:
        return mean
I know this question is super old, but it's still the first hit on google, so it seemed appropriate to post. I'm still sad that the python standard library doesn't contain this simple piece of code.
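Since the answer mentions that the same algorithm also yields the variance, here is a sketch of that extension (often called Welford's method) in Python 3. The function name running_mean_variance is hypothetical, and returning the sample variance (dividing by n - 1) is a choice made for this example.

```python
import math


def running_mean_variance(data):
    """Single-pass mean and sample variance of any iterable of numbers."""
    n = 0
    mean = 0.0
    m2 = 0.0                        # running sum of squared deviations
    for x in data:
        n += 1
        delta = x - mean            # deviation from the old mean
        mean += delta / n
        m2 += delta * (x - mean)    # uses both the old and the new mean
    if n < 1:
        return math.nan, math.nan
    variance = m2 / (n - 1) if n > 1 else math.nan
    return mean, variance


mean, variance = running_mean_variance(k for k in range(1, 7))
print(mean, variance)
```

Only three numbers are kept in memory regardless of the stream length, which suits the generator use case from the question.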
Just one simple change to your code would let you use both. Generators are meant to be interchangeable with lists in a for-loop.
def my_mean(values):
    n = 0
    Sum = 0.0
    for v in values:
        Sum += v
        n += 1
    return Sum / n
def my_mean(values):
    total = 0
    for n, v in enumerate(values, 1):
        total += v
    return total / n
print my_mean(X)
print my_mean(Y)
There is statistics.mean() in Python 3.4 but it calls list() on the input:
def mean(data):
    if iter(data) is data:
        data = list(data)
    n = len(data)
    if n < 1:
        raise StatisticsError('mean requires at least one data point')
    return _sum(data)/n
where _sum() returns an accurate sum (math.fsum()-like function that in addition to float also supports Fraction, Decimal).
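For reference, a short usage sketch in Python 3: statistics.mean accepts any iterable, including a generator, at the cost of materialising it as a list internally as the excerpt above shows. The variable names are illustrative only.

```python
import statistics

values = (k for k in range(1, 7))   # a generator, consumed once
result = statistics.mean(values)
print(result)                       # 3.5
```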
The old-fashioned way to do it:
def my_mean(values):
    sum, n = 0, 0
    for x in values:
        sum += x
        n += 1
    return float(sum)/n
One way would be
numpy.fromiter(Y, int).mean()
but this actually temporarily stores the numbers.
Your approach is a good one, but you should use the for x in y idiom instead of repeatedly calling next until you get a StopIteration. This works for both lists and generators:
def my_mean(values):
    n = 0
    Sum = 0.0
    for value in values:
        Sum += value
        n += 1
    return float(Sum)/n
You can use reduce without knowing the size of the array:
from itertools import izip, count
reduce(lambda c,i: (c*(i[1]-1) + float(i[0]))/i[1], izip(values,count(1)),0)
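The one-liner above relies on Python 2 names (izip and the built-in reduce). A Python 3 sketch of the same incremental-mean fold could look like this; the name mean_via_reduce is hypothetical, and returning 0.0 for an empty input is a simplification of this example.

```python
from functools import reduce


def mean_via_reduce(values):
    """Fold an iterable into its mean without knowing its length in advance."""
    return reduce(
        # pair is (1-based index, value); nudge the running mean toward value
        lambda acc, pair: acc + (pair[1] - acc) / pair[0],
        enumerate(values, 1),
        0.0,
    )


print(mean_via_reduce(k for k in range(1, 7)))   # 3.5
```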
def my_mean(values):
    n = 0
    sum = 0
    for v in values:
        sum += v
        n += 1
    return sum/n
The above is very similar to your code, except that by using for to iterate over values it works whether you get a list or an iterator.
Python's built-in sum is, however, highly optimized, so unless the list is really, really long, you might be happier temporarily storing the data.
(Also notice that if you are using Python 3, you don't need float(sum)/n, because / always performs true division there.)
If you know the length of the generator in advance and you want to avoid storing the full list in memory, you can use:
reduce(np.add, generator)/length
Try:
import itertools
def mean(i):
    (i1, i2) = itertools.tee(i, 2)
    return sum(i1) / sum(1 for _ in i2)
print mean([1,2,3,4,5])
tee will duplicate your iterator for any iterable i (e.g. a generator, a list, etc.), allowing you to use one duplicate for summing and the other for counting.
(Note that 'tee' will still use intermediate storage).
