Median code explanation

Median code explanation - python

My professor wrote this median function and I don't understand it very well. Can someone please explain the part about i = len(list)/2 and median = avg() and the else statement?
def avg_list(numbers):
sum = 0
for num in numbers:
sum += num
avg = float(sum)/len(numbers)
print avg
def median(list):
list.sort()
if len(list)%2 == 0:
#have to take avg of middle two
i = len(list)/2
median = avg()
else:
#find the middle (remembering that lists start at 0)
i = len(list)/2
median = list
return median
To add from an example I saw, for even list length:
def median(s):
i = len(s)
if not i%2:
return (s[(i/2)-1]+s[i/2])/2.0
return s[i/2]
This works very well but I don't understand the last return s[i/2]?
For odd list length:
x = [1,2,5,2,3,763,234,23,1,234,21,3,2134,23,54]
median = sorted(x)[len(x)/2]
Since x has a list length of odd, wouldn't the [len(x)/2] be a floating number index? I'm not getting this all the way? Any explanation better than mine is much appreciated.

Why this is is very wrong, line by line:
def median(list): # 1
list.sort() # 2
if len(list)%2 == 0:
#have to take avg of middle two
i = len(list)/2 # 3
median = avg() # 4
else:
#find the middle (remembering that lists start at 0)
i = len(list)/2 # 5
median = list # 6
return median
#1: It's a bad idea to give your variables the same name as data types, namely list.
#2: list.sort() will modify the list that is being passed. One would expect a getter like median() not to do that.
#4 It calls a function avg() with no arguments, which is completely meaningless, even if such a function was defined.
#3 and #5 are calculated the same way regardless of the if branch taken. Regardless, i is never used.
#6 It sets median to the original list, which makes zero sense.
Here's how I would rewrite this (while maintaining clarity):
def median(alist):
srtd = sorted(alist) # returns a sorted copy
mid = len(alist)/2 # remember that integer division truncates
if len(alist) % 2 == 0: # take the avg of middle two
return (srtd[mid-1] + srtd[mid]) / 2.0
else:
return srtd[mid]
Also, the avg_list() function (which is not used nor could be used in median()) could be rewritten as:
def avg_list(numbers):
return float(sum(numbers))/len(numbers)
sum() is a function that returns the sum of all elements in an iterable.

We're missing some code here, but we can puzzle it out.
The comments here are instructive. When we check:
if len(list)%2 == 0:
Then we're checking to see if the list is of even length. If a list has an even number of members, then there is no true "middle" element, and so:
#have to take avg of middle two
i = len(list)/2
median = avg()
We assume that the avg() function is going to return the average of the two middle elements. Since you didn't include a definition of an avg function, it's possible that this is really supposed to be an avg_list function taking the middle two elements of the list.
Now, if the list is of odd length, there is a middle element, and so:
else:
#find the middle (remembering that lists start at 0)
i = len(list)/2
median = list
Now this looks kinda wrong to me too, but my guess is that the intention is that this should read:
median = list[i]
That would be us returning the middle element of the list. Since the list has been sorted, that middle element is the true median of the list.
Hope this helps!

I'm sure it's trying to say, "If the list is of odd size, just take the central element; otherwise take the mean of the central two elements" - but I can't see that that's what the code is actually doing at all.
In particular:
It's calling an avg() function (not avg_list, note) but without any arguments
It's ignoring the value of i after computing it in the same way in both branches
Are you sure that's the complete code which is meant to work?

You can also decide to always return the average of the middle sub-array of the ordered list:
For instance return the average of [4,5] out of [1,2,3,4,5,6,7,8], and that of [5] out of [1,2,3,4,5,6,7,8,9].
A python implementation would be:
def median(a):
ordered = sorted(a)
length = len(a)
return float((ordered[length/2] + ordered[-(length+1)/2]))/2

Related

A recursive function to check if element is in SORTED list

I am not really into recursive function and I am asking to change my function in recursive and more optimal func. I know that in this task i should take advantage of the fact that this list is sorted, but I dont know how, maybe some bubble sort? This is my code, pretty simple but not even recursive and it doesn't take into account whether the list is sorted or not.
list1 = [1, 2, 3, 4, 5, 6]
def is_in_List(x):
if x in list1:
return True
else:
return False
x = 3
if is_in_List(x):
print(f"{x} is in list")
else:
print(f"{x} is not in list")

what you could do, is use devide and conquer, which means :
The algo goes like this :
You have a sorted list of n total elements. Checkin array if the element at n/2 is the one you're looking for
If it isn't, being a sorted list, you know that all the elements from n/2 -> n are bigger, and all the elements from 0 -> n/2 are smaller. Check if the number at n/2 is less or more than the one you're searching for. If it's less, you run the same function again, but now, you give it only a subset of the list, meaning, if it's smaller you give 0 -> n/2, if it's bigger, you give n/2 -> n. Of course you'll need some stoping condtitions but hey, this is the algo.
That's the theory, here's the code.
Not the best implementation of it, just from the top of my mind.
my_list = [1,2,3,4,5,6,7,8,9];
def binary_search(a_list, search_term):
#get the middle position of the array and convert it to int
middle_pos = int((len(a_list)-1)/2)
#check if the array has only one element, and if so it it is not equal to what we're searching for, than nothing is in the aray
if len(a_list) == 1 and search_term != a_list[middle_pos] :
#means there are no more elements to search through
return False
#get the middle term of the list
middle_term = a_list[middle_pos]
#check if they are equal, if so, the number is in the array
if search_term == middle_term:
return True
#if the middle is less than search, it means we need to search in the list from middle to top
if middle_term < search_term :
#run the same algo, but now on a subset of the given list
return binary_search(a_list[middle_pos:len(a_list)], search_term)
else :
#on else, it means its less, we need to search from 0 to middle
#run the same algo, but now on a subset of the given list
return binary_search(a_list[0:middle_pos], search_term)
print(binary_search(my_list, 1)

Binary search: weird middle point calculation

Regarding calculation of the list mid-point: why is there
i = (first +last) //2
and last is initialized to len(a_list) - 1? From my quick tests, this algorithm without -1 works correctly.
def binary_search(a_list, item):
"""Performs iterative binary search to find the position of an integer in a given, sorted, list.
a_list -- sorted list of integers
item -- integer you are searching for the position of
"""
first = 0
last = len(a_list) - 1
while first <= last:
i = (first + last) / 2
if a_list[i] == item:
return '{item} found at position {i}'.format(item=item, i=i)
elif a_list[i] > item:
last = i - 1
elif a_list[i] < item:
first = i + 1
else:
return '{item} not found in the list'.format(item=item)

The last legal index is len(a_list) - 1. The algorithm will work correctly, as first will always be no more than this, so that the truncated mean will never go out of bounds. However, without the -1, the midpoint computation will be one larger than optimum about half the time, resulting in a slight loss of speed.

Consider the case where the item you're searching for is greater than all the elements of the list. In that case the statement first = i + 1 gets executed repeatedly. Finally you get to the last iteration of the loop, where first == last. In that case i is also equal to last, but if last=len() then i is off the end of the list! The first if statement will fail with an index out of range.
See for yourself: https://ideone.com/yvdTzo
You have another error in that code too, but I'll let you find it for yourself.

Pythonic way of checking if indefinite # of consec elements in list sum to given value

Having trouble figuring out a nice way to get this task done.
Say i have a list of triangular numbers up to 1000 -> [0,1,3,6,10,15,..]etc
Given a number, I want to return the consecutive elements in that list that sum to that number.
i.e.
64 --> [15,21,28]
225 --> [105,120]
371 --> [36, 45, 55, 66, 78, 91]
if there's no consecutive numbers that add up to it, return an empty list.
882 --> [ ]
Note that the length of consecutive elements can be any number - 3,2,6 in the examples above.
The brute force way would iteratively check every possible consecutive pairing possibility for each element. (start at 0, look at the sum of [0,1], look at the sum of [0,1,3], etc until the sum is greater than the target number). But that's probably O(n*2) or maybe worse. Any way to do it better?
UPDATE:
Ok, so a friend of mine figured out a solution that works at O(n) (I think) and is pretty intuitively easy to follow. This might be similar (or the same) to Gabriel's answer, but it was just difficult for me to follow and I like that this solution is understandable even from a basic perspective. this is an interesting question, so I'll share her answer:
def findConsec(input1 = 7735):
list1 = range(1, 1001)
newlist = [reduce(lambda x,y: x+y,list1[0:i]) for i in list1]
curr = 0
end = 2
num = sum(newlist[curr:end])
while num != input1:
if num < input1:
num += newlist[end]
end += 1
elif num > input1:
num -= newlist[curr]
curr += 1
if curr == end:
return []
if num == input1:
return newlist[curr:end]

A 3-iteration max solution
Another solution would be to start from close where your number would be and walk forward from one position behind. For any number in the triangular list vec, their value can be defined by their index as:
vec[i] = sum(range(0,i+1))
The division between the looking-for sum value and the length of the group is the average of the group and, hence, lies within it, but may as well not exist in it.
Therefore, you can set the starting point for finding a group of n numbers whose sum matches a value val as the integer part of the division between them. As it may not be in the list, the position would be that which minimizes their difference.
# vec as np.ndarray -> the triangular or whatever-type series
# val as int -> sum of n elements you are looking after
# n as int -> number of elements to be summed
import numpy as np
def seq_index(vec,n,val):
index0 = np.argmin(abs(vec-(val/n)))-n/2-1 # covers odd and even n values
intsum = 0 # sum of which to keep track
count = 0 # counter
seq = [] # indices of vec that sum up to val
while count<=2: # walking forward from the initial guess of where the group begins or prior to it
intsum = sum(vec[(index0+count):(index0+count+n)])
if intsum == val:
seq.append(range(index0+count,index0+count+n))
count += 1
return seq
# Example
vec = []
for i in range(0,100):
vec.append(sum(range(0,i))) # build your triangular series from i = 0 (0) to i = 99 (whose sum equals 4950)
vec = np.array(vec) # convert to numpy to make it easier to query ranges
# looking for a value that belong to the interval 0-4590
indices = seq_index(vec,3,4)
# print indices
print indices[0]
print vec[indices]
print sum(vec[indices])
Returns
print indices[0] -> [1, 2, 3]
print vec[indices] -> [0 1 3]
print sum(vec[indices]) -> 4 (which we were looking for)

This seems like an algorithm question rather than a question on how to do it in python.
Thinking backwards I would copy the list and use it in a similar way to the Sieve of Eratosthenes. I would not consider the numbers that are greater than x. Then start from the greatest number and sum backwards. Then if I get greater than x, subtract the greatest number (exclude it from the solution) and continue to sum backward.
This seems the most efficient way to me and actually is O(n) - you never go back (or forward in this backward algorithm), except when you subtract or remove the biggest element, which doesn't need accessing the list again - just a temp var.
To answer Dunes question:
Yes, there is a reason - to subtracts the next largest in case of no-solution that sums larger. Going from the first element, hit a no-solution would require access to the list again or to the temporary solution list to subtract a set of elements that sum greater than the next element to sum. You risk to increase the complexity by accessing more elements.
To improve efficiency in the cases where an eventual solution is at the beginning of the sequence you can search for the smaller and larger pair using binary search. Once a pair of 2 elements, smaller than x is found then you can sum the pair and if it sums larger than x you go left, otherwise you go right. This search has logarithmic complexity in theory. In practice complexity is not what it is in theory and you can do whatever you like :)

You should pick the first three elements, sum them and do and then you keep subtracting the first of the three and add the next element in the list and see if the sum add up to whatever number you want. That would be O(n).
# vec as np.ndarray
import numpy as np
itsum = sum(list[0:2]) # the sum you want to iterate and check its value
sequence = [[] if itsum == whatever else [range(0,3)]] # indices of the list that add up to whatever (creation)
for i in range(3,len(vec)):
itsum -= vec[i-3]
itsum += vec[i]
if itsum == whatever:
sequence.append(range(i-2,i+1)) # list of sequences that add up to whatever

The solution you provide in the question isn't truly O(n) time complexity -- the way you compute your triangle numbers makes the computation O(n2). The list comprehension throws away the previous work that want into calculating the last triangle number. That is: tni = tni-1 + i (where tn is a triangle number). Since you also, store the triangle numbers in a list, your space complexity is not constant, but related to the size of the number you are looking for. Below is an identical algorithm, but is O(n) time complexity and O(1) space complexity (written for python 3).
# for python 2, replace things like `highest = next(high)` with `highest = high.next()`
from itertools import count, takewhile, accumulate
def find(to_find):
# next(low) == lowest number in total
# next(high) == highest number not in total
low = accumulate(count(1)) # generator of triangle numbers
high = accumulate(count(1))
total = highest = next(high)
# highest = highest number in the sequence that sums to total
# definitely can't find solution if the highest number in the sum is greater than to_find
while highest <= to_find:
# found a solution
if total == to_find:
# keep taking numbers from the low iterator until we find the highest number in the sum
return list(takewhile(lambda x: x <= highest, low))
elif total < to_find:
# add the next highest triangle number not in the sum
highest = next(high)
total += highest
else: # if total > to_find
# subtract the lowest triangle number in the sum
total -= next(low)
return []

using recursion to find the maximum in a list

I'm trying to find the maximum element in a list using recursion.
the input needs to be the actual list, the left index, and the right index.
I have written a function and can't understand why it won't work. I drew the recursion tree, ran examples of lists in my head, and it makes sense , that's why it's even harder to find a solution now ! (it's fighting myself basically).
I know it's ugly, but try to ignore that. my idea is to split the list in half at each recursive call (it's required), and while the left index will remain 0, the right will be the length of the new halved list, minus 1.
the first call to the function will be from the tail function.
thanks for any help, and I hope I'm not missing something really stupid, or even worse- not even close !
by the way- didn't use slicing to cut list because I'm not allowed.
def max22(L,left,right):
if len(L)==1:
return L[0]
a = max22([L[i] for i in range(left, (left+right)//2)], 0 , len([L[i] for i in range(left, (left+right)//2)])-1)
b = max22([L[i] for i in range(((left+right)//2)+1, right)], 0 ,len([L[i] for i in range(left, (left+right)//2)])-1)
return max(a,b)
def max_list22(L):
return max22(L,0,len(L)-1)
input example -
for max_list22([1,20,3]) the output will be 20.

First off, for the sake of clarity I suggest assigning your list comprehensions to variables so you don't have to write each one twice. This should make the code easier to debug. You can also do the same for the (left+right)//2 value.
def max22(L,left,right):
if len(L)==1:
return L[0]
mid = (left+right)//2
left_L = [L[i] for i in range(left, mid)]
right_L = [L[i] for i in range(mid+1, right)]
a = max22(left_L, 0 , len(left_L)-1)
b = max22(right_L, 0 , len(left_L)-1)
return max(a,b)
def max_list22(L):
return max22(L,0,len(L)-1)
print max_list22([4,8,15,16,23,42])
I see four problems with this code.
On your b = line, the second argument is using len(left_L) instead of len(right_L).
You're missing an element between left_L and right_L. You should not be adding one to mid in the right_L list comprehension.
You're missing the last element of the list. You should be going up to right+1 in right_L, not just right.
Your mid value is off by one in the case of even sized lists. Ex. [1,2,3,4] should split into [1,2] and [3,4], but with your mid value you're getting [1] and [2,3,4]. (assuming you've already fixed the missing element problems in the previous bullet points).
Fixing these problems looks like:
def max22(L,left,right):
if len(L)==1:
return L[0]
mid = (left+right+1)//2
left_L = [L[i] for i in range(left, mid)]
right_L = [L[i] for i in range(mid, right+1)]
a = max22(left_L, 0 , len(left_L)-1)
b = max22(right_L, 0 , len(right_L)-1)
return max(a,b)
def max_list22(L):
return max22(L,0,len(L)-1)
print max_list22([4,8,15,16,23,42])
And if you insist on not using temporary variables, it looks like:
def max22(L,left,right):
if len(L)==1:
return L[0]
a = max22([L[i] for i in range(left, (left+right+1)//2)], 0 , len([L[i] for i in range(left, (left+right+1)//2)])-1)
b = max22([L[i] for i in range((left+right+1)//2, right+1)], 0 , len([L[i] for i in range((left+right+1)//2, right+1)])-1)
return max(a,b)
def max_list22(L):
return max22(L,0,len(L)-1)
print max_list22([4,8,15,16,23,42])
Bonus style tip: you don't necessarily need three arguments for max22, since left is always zero and right is always the length of the list minus one.
def max22(L):
if len(L)==1:
return L[0]
mid = (len(L))//2
left_L = [L[i] for i in range(0, mid)]
right_L = [L[i] for i in range(mid, len(L))]
a = max22(left_L)
b = max22(right_L)
return max(a,b)
print max22([4,8,15,16,23,42])

The problem is that you aren't handling empty lists at all. max_list22([]) recurses infinitely, and [L[i] for i in range(((left+right)//2)+1, right)] eventually produces an empty list.

Your problem is that you don't handle uneven splits. Lists could become empty using your code, but you can also stop on sizes 1 and 2 instead of 0 and 1 whichi s more natural (because you return a max, zero size lists don't have a max).
def max22(L,left,right):
if left == right:
# handle size 1
return L[left]
if left + 1 == right:
# handle size 2
return max(L[left], L[right])
# split the lists (could be uneven lists!)
split_index = (left + right) / 2
# solve two easier problems
return max (max22(L, left, split_index), max22(L, split_index, right))
print max22([1,20, 3], 0, 2)
Notes:
Lose the list comprehension, you don't have to create new lists since you have indices within the list.
When dealing with recursion, you have to think of:
1 - the stop condition(s), in this case there are two because list splits can be uneven, making the recursion stop at uneven conditions.
2 - the easier problem step . Assuming I can solve an easier problem, how can I solve this problem? This is usually what's in the end of the recursive function. In this case a call to the same function on two smaller (index-wise) lists. Recursion looks a lot like proof by induction if you're familiar with it.
Python prefers things to be done explicitly. While Python has some functional features it's best to let readers of the code know what you're up to ratehr than having a big one-liner that makes people scratch their head.
Good luck!

(Binary) Summing the elements of a list

I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
def binary_search(l, low=0,high=-1):
if not l: return -1
if(high == -1): high = len(l)-1
if low == high:
if l[low] == 1: return low
else: return -1
mid = (low + high)//2
upper = [l[mid:high]]
lower = [l[0:mid-1]]
u = sum(int(x) for x in upper)
lo = sum(int(x) for x in lower)
if u == 1: return binary_search(upper, mid, high)
elif lo == 1: return binary_search(lower, low, mid-1)
return -1
l = [0 for x in range(255)]
l[123] = 1
binary_search(l)
The code I'm using to test
u = sum(int(x) for x in upper)
works fine in the interpreter, but gives me the error
TypeError: int() argument must be a string or a number, not 'list'
I've just started to use python, and can't figure out what's going wrong (the version I've written in c++ doesn't work either).
Does anyone have any pointers?
Also, how would I do the sum so that it is a binary xor, not simply decimal addition?

You don't actually want a sum; you want to know whether upper or lower contains a 1 value. Just take advantage of Python's basic container-type syntax:
if 1 in upper:
# etc
if 1 in lower:
# etc
The reason you're getting the error, by the way, is because you're wrapping upper and lower with an extra nested list when you're trying to split l (rename this variable, by the way!!). You just want to split it like this:
upper = the_list[mid:high]
lower = the_list[:mid-1]
Finally, it's worth noting that your logic is pretty weird. This is not a binary search in the classic sense of the term. It looks like you're implementing "find the index of the first occurrence of 1 in this list". Even ignoring the fact that there's a built-in function to do this already, you would be much better served by just iterating through the whole list until you find a 1. Right now, you've got O(nlogn) time complexity (plus a bunch of extra one-off loops), which is pretty silly considering the output can be replicated in O(n) time by:
def first_one(the_list):
for i in range(len(the_list)):
if the_list[i] == 1:
return i
return -1
Or of course even more simply by using the built-in function index:
def first_one(the_list):
try:
return the_list.index(1)
except ValueError:
return -1

I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
What's wrong with
int(1 in l)

I need to sum the elements of a list, containing all zeros or ones, so that the result is 1 if there is a 1 in the list, but 0 otherwise.
No need to sum the whole list; you can stop at the first 1. Simply use any(). It will return True if there is at least one truthy value in the container and False otherwise, and it short-circuits (i.e. if a truthy value is found early in the list, it doesn't scan the rest). Conveniently, 1 is truthy and 0 is not.
True and False work as 1 and 0 in an arithmetic context (Booleans are a subclass of integers), but if you want specifically 1 and 0, just wrap any() in int().

Stop making nested lists.
upper = l[mid:high]
lower = l[0:mid-1]

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Median code explanation - python

Related

A recursive function to check if element is in SORTED list

Binary search: weird middle point calculation

Pythonic way of checking if indefinite # of consec elements in list sum to given value

using recursion to find the maximum in a list

(Binary) Summing the elements of a list

Categories

Resources