Python quickselect sorting [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
The program is supposed to use quick select and return the median of a set of integer values.
Question: When I run the program, it tells me that k is not defined. How should I define k to get the median?
def quickSelect(lines,k):
    if len(lines)!=0:
        pivot=lines[(len(lines)//2)]
        smallerlist=[]
        for i in lines:
            if i<pivot:
                smallerlist.append(i)
        largerlist=[]
        for i in lines:
            if i>pivot:
                largerlist.append(i)
        m=len(smallerlist)
        count=len(lines)-len(smallerlist)-len(largerlist)
        if k >= m and k<m + count:
            return pivot
        elif m > k:
            return quickSelect(smallerList,k)
        else:
            return quickSelect(largerList, k - m - count)

The code works after a minor correction: the recursive calls misspell smallerlist and largerlist as smallerList and largerList.
        elif m > k:
            return quickSelect(smallerList,k)
        else:
            return quickSelect(largerList, k - m - count)
should be changed to
        elif m > k:
            return quickSelect(smallerlist,k)
        else:
            return quickSelect(largerlist, k - m - count)
This is the final corrected code, which runs just fine.
def quickSelect(lines,k):
    if len(lines)!=0:
        pivot=lines[(len(lines)//2)]
        smallerlist=[]
        for i in lines:
            if i<pivot:
                smallerlist.append(i)
        largerlist=[]
        for i in lines:
            if i>pivot:
                largerlist.append(i)
        m=len(smallerlist)
        count=len(lines)-len(smallerlist)-len(largerlist)
        if k >= m and k<m + count:
            return pivot
        elif m > k:
            return quickSelect(smallerlist,k)
        else:
            return quickSelect(largerlist, k - m - count)
Hope it helps

You misreported the error; it doesn't complain about k, it complains about smallerList, because you defined smallerlist (lower-case l) and then tried to use it with an upper-case L. Python names are case-sensitive, i.e. smallerlist and smallerList are two different names, and the second one was never defined.
Looking at your code:
def quickSelect(lines, k):
    if len(lines) != 0:
len(lines) != 0 is non-idiomatic; PEP 8 recommends testing the sequence itself, as in if lines:. Also, camelCase is unPythonic; your function name should be quick_select. lines implies you can only operate on text, but your function should work just as well on any orderable data type, so items would be more accurate. You should have a docstring so the next person to come along has some idea what it does, and we're going to call len(items) again, so we may as well do it once and store the result. Finally, what if k >= len(items)?
def quick_select(items, k):
    """
    Return the k-from-smallest item in items
    Assumes 0 <= k < len(items)
    """
    num_items = len(items)
    if 0 <= k < num_items:
        pivot = items[num_items // 2]
continuing:
        smallerlist = []
        for i in lines:
            if i<pivot:
                smallerlist.append(i)
        largerlist=[]
        for i in lines:
            if i>pivot:
                largerlist.append(i)
You've iterated through lines twice; you could combine this into a single pass. Also, better variable names:
        smaller, larger = [], []
        for item in items:
            if item < pivot:
                smaller.append(item)
            elif item > pivot:
                larger.append(item)
continuing with better variable names,
        num_smaller = len(smaller)
        num_pivot = num_items - num_smaller - len(larger)
then your ifs are out of order; they are easier to read in order, so
        if k < num_smaller:
            return quick_select(smaller, k)
        elif k < num_smaller + num_pivot:
            return pivot
        else:
            return quick_select(larger, k - num_smaller - num_pivot)
then what if k < 0 or k >= num_items?:
    else:
        raise ValueError("k={} is out of range".format(k))
Finally, because this function is tail-recursive, it is easy to convert to an iterative function instead:
    while True:
        pivot = items[num_items // 2]
        # ...
        if k < num_smaller:
            items = smaller
            num_items = num_smaller
        elif k < num_smaller + num_pivot:
            return pivot
        else:
            items = larger
            num_items = len(larger)
            k -= num_smaller + num_pivot
... some assembly required, hope that helps!
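For reference, the fragments above could be assembled roughly as follows. This is only a sketch of one way to put the pieces together, not necessarily the exact code the answer had in mind; the range check is moved to the top so it runs once, which is my own arrangement.

```python
def quick_select(items, k):
    """Return the k-th smallest item of items (0-based), without sorting them all."""
    if not 0 <= k < len(items):
        raise ValueError("k={} is out of range".format(k))
    while True:
        pivot = items[len(items) // 2]
        # Partition in a single pass; items equal to the pivot are only counted.
        smaller, larger = [], []
        for item in items:
            if item < pivot:
                smaller.append(item)
            elif item > pivot:
                larger.append(item)
        num_smaller = len(smaller)
        num_pivot = len(items) - num_smaller - len(larger)
        if k < num_smaller:
            items = smaller
        elif k < num_smaller + num_pivot:
            return pivot
        else:
            items = larger
            k -= num_smaller + num_pivot
```

Because k is validated once and each branch keeps it inside the list that is retained, the loop always ends at the return statement.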

You need to pass a value for k the first time you call the function. It should be the position the item you are looking for would have if the list were sorted. For the median, use half the list length. Call it like so:
k = len(lines) // 2
x = quickSelect(lines, k)
or if you only ever want the median, you could change the function so you don't have to provide the index of the item you want
def quickSelect(lines, k=None):
    if k is None:
        k = len(lines)//2
As Hugh pointed out, this function will only select an element of the list. For a median of an even number of elements, the median should actually be the mean of the middle two elements.
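A minimal sketch of that even-length case, assuming a quick_select like the one discussed in this thread (a compact recursive version is repeated here so the example runs on its own; the median name is mine):

```python
def quick_select(items, k):
    """Return the k-th smallest item of items (0-based); assumes 0 <= k < len(items)."""
    pivot = items[len(items) // 2]
    smaller = [x for x in items if x < pivot]
    larger = [x for x in items if x > pivot]
    num_pivot = len(items) - len(smaller) - len(larger)
    if k < len(smaller):
        return quick_select(smaller, k)
    if k < len(smaller) + num_pivot:
        return pivot
    return quick_select(larger, k - len(smaller) - num_pivot)


def median(values):
    """Median of a non-empty list of numbers: mean of the two middle items if the length is even."""
    n = len(values)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        return quick_select(values, mid)
    return (quick_select(values, mid - 1) + quick_select(values, mid)) / 2
```

For example, median([5, 1, 3]) selects the single middle element, while median([4, 1, 3, 2]) averages the elements at sorted positions 1 and 2.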

Related

Leetcode 5: Longest Palindromic Substring

I have been working on the LeetCode problem 5. Longest Palindromic Substring:
Given a string s, return the longest palindromic substring in s.
But I kept getting time limit exceeded on large test cases.
I used dynamic programming as follows:
dp[(i, j)] = True implies that s[i] to s[j] is a palindrome. So if s[i] == s[j] and dp[(i+1, j-1)] is set to True, that means s[i] to s[j] is also a palindrome.
How can I improve the performance of this implementation?
class Solution:
    def longestPalindrome(self, s: str) -> str:
        dp = {}
        res = ""
        for i in range(len(s)):
            # single character is always a palindrome
            dp[(i, i)] = True
            res = s[i]
        #fill in the table diagonally
        for x in range(len(s) - 1):
            i = 0
            j = x + 1
            while j <= len(s)-1:
                if s[i] == s[j] and (j - i == 1 or dp[(i+1, j-1)] == True):
                    dp[(i, j)] = True
                    if(j-i+1) > len(res):
                        res = s[i:j+1]
                else:
                    dp[(i, j)] = False
                i += 1
                j += 1
        return res

I think the time limit for this problem is rather tight; it took some work to get a version that passes. Improved version:
class Solution:
    def longestPalindrome(self, s: str) -> str:
        dp = {}
        res = ""
        for i in range(len(s)):
            dp[(i, i)] = True
            res = s[i]
        for x in range(len(s)):  # iterate till the end of the string
            for i in range(x):  # iterate up to the current state (less work) and for loop looks better here
                if s[i] == s[x] and (dp.get((i + 1, x - 1), False) or x - i == 1):
                    dp[(i, x)] = True
                    if x - i + 1 > len(res):
                        res = s[i:x + 1]
        return res

Here is another idea to improve the performance:
The nested loop will check over many cases where the DP value is already False for smaller ranges. We can avoid looking at large spans, by looking for palindromes from inside-out and stop extending the span as soon as it no longer is a palindrome. This process should be repeated at every offset in the source string, but this could still save some processing.
The inputs on which the most time is wasted are those with long runs of the same letter, like "aaaaaaabcaaaaaaa". These lead to many iterations: each "a" or "aa" could be the center of a palindrome, but "growing" each of them is a waste of time. We should just consider all consecutive "a" together from the start and expand from there onwards.
You can specifically deal with these cases by first grouping consecutive letters which are the same. So the above example would be turned into 4 groups: a(7)b(1)c(1)a(7)
Then let each group in turn be taken as the center of a palindrome. For each group, "fan out" to potentially include one or more neighboring groups at both sides in "tandem". Continue fanning out until either the outside groups are not about the same letter, or they have a different group size. From that result you can derive what the largest palindrome is around that center. In particular, when the case is that the letters of the outer groups are the same, but not their sizes, you still include that letter at the outside of the palindrome, but with a repetition that corresponds to the least of these two mismatching group sizes.
Here is an implementation. I used named tuples to make it more readable:
from itertools import groupby
from collections import namedtuple

Group = namedtuple("Group", "letter,size,end")

class Solution:
    def longestPalindrome(self, s: str) -> str:
        longest = ""
        x = 0
        groups = [Group(group[0], len(group), x := x + len(group)) for group in
                  ("".join(group[1]) for group in groupby(s))]
        for i in range(len(groups)):
            for j in range(0, min(i+1, len(groups) - i)):
                if groups[i - j].letter != groups[i + j].letter:
                    break
                left = groups[i - j]
                right = groups[i + j]
                if left.size != right.size:
                    break
            size = right.end - (left.end - left.size) - abs(left.size - right.size)
            if size > len(longest):
                x = left.end - left.size + max(0, left.size - right.size)
                longest = s[x:x+size]
        return longest
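For comparison, the plain inside-out idea described at the start of this answer (without the grouping optimization) could be sketched like this. It is a simplified illustration, not the answer's implementation; the function name longest_palindrome is my own.

```python
def longest_palindrome(s: str) -> str:
    """Expand around every possible center and keep the longest palindrome found."""
    best_start, best_len = 0, 0
    for center in range(len(s)):
        # Try an odd-length center (one letter) and an even-length center (two letters).
        for lo, hi in ((center, center), (center, center + 1)):
            # Stop extending as soon as the span is no longer a palindrome.
            while lo >= 0 and hi < len(s) and s[lo] == s[hi]:
                lo -= 1
                hi += 1
            # The loop overshoots by one position on each side.
            length = hi - lo - 1
            if length > best_len:
                best_start, best_len = lo + 1, length
    return s[best_start:best_start + best_len]
```

This needs no DP table, but it still does redundant work on long runs of one letter, which is the case the grouping version above targets.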

Alternatively, you can try this approach; it seems to be faster than 96% of Python submissions.
def longestPalindrome(self, s: str) -> str:
    N = len(s)
    if N == 0:
        return ""
    max_len, start = 1, 0
    for i in range(N):
        df = i - max_len
        if df >= 1 and s[df-1: i+1] == s[df-1: i+1][::-1]:
            start = df - 1
            max_len += 2
            continue
        if df >= 0 and s[df: i+1] == s[df: i+1][::-1]:
            start = df
            max_len += 1
    return s[start: start + max_len]

If you want to improve the performance, you should create a variable for len(s) at the beginning of the function and use it. That way instead of calling len(s) 3 times, you would do it just once.
Also, I see no reason to create a class for this function. A simple function will outrun a class method, albeit very slightly.

Leetcode question '3Sum' algorithm exceeds time limit, looking for improvement

Given an array nums of n integers, are there elements a, b, c in nums such that a + b + c = 0? Find all unique triplets in the array which gives the sum of zero.
class Solution:
    def threeSum(self, nums):
        data = []
        i = j = k =0
        length = len(nums)
        for i in range(length):
            for j in range(length):
                if j == i:
                    continue
                for k in range(length):
                    if k == j or k == i:
                        continue
                    sorted_num = sorted([nums[i],nums[j],nums[k]])
                    if nums[i]+nums[j]+nums[k] == 0 and sorted_num not in data:
                        data.append(sorted_num)
        return data
My solution works, but it appears to be too slow.
Is there a way to improve my code without changing it significantly?

This is an O(n^2) solution with some optimization tricks:
from typing import List

class Solution:
    def findsum(self, lookup: dict, target: int):
        for u in lookup:
            v = target - u
            # reduce duplication, we may enforce v <= u
            try:
                m = lookup[v]
                if u != v or m > 1:
                    yield u, v
            except KeyError:
                pass

    def threeSum(self, nums: List[int]) -> List[List[int]]:
        lookup = {}
        triplets = set()
        for x in nums:
            for y, z in self.findsum(lookup, -x):
                triplets.add(tuple(sorted([x, y, z])))
            lookup[x] = lookup.get(x, 0) + 1
        return [list(triplet) for triplet in triplets]
First, you need a hash lookup to reduce your O(n^3) algorithm to O(n^2). This is the whole idea, and the rest are micro-optimizations:
- the lookup table is built during the scan of the array, so it is one-pass
- the lookup table is keyed on the unique items seen so far, so it handles duplicates efficiently and keeps the iteration count of the second-level loop to a minimum

This optimized version passes:
from typing import List

class Solution:
    def threeSum(self, nums: List[int]) -> List[List[int]]:
        unique_triplets = []
        nums.sort()
        for i in range(len(nums) - 2):
            if i > 0 and nums[i] == nums[i - 1]:
                continue
            lo = i + 1
            hi = len(nums) - 1
            while lo < hi:
                target_sum = nums[i] + nums[lo] + nums[hi]
                if target_sum < 0:
                    lo += 1
                elif target_sum > 0:
                    hi -= 1
                else:
                    unique_triplets.append((nums[i], nums[lo], nums[hi]))
                    while lo < hi and nums[lo] == nums[lo + 1]:
                        lo += 1
                    while lo < hi and nums[hi] == nums[hi - 1]:
                        hi -= 1
                    lo += 1
                    hi -= 1
        return unique_triplets
The TLE is most likely for those instances that fall into these two whiles:
while lo < hi and nums[lo] == nums[lo + 1]:
while lo < hi and nums[hi] == nums[hi - 1]:
References
For additional details, please see the Discussion Board, where you can find plenty of well-explained accepted solutions in a variety of languages, including low-complexity algorithms and asymptotic runtime/memory analysis.

I'd suggest:
for j in range(i+1, length):
This will save you roughly len(nums)^2/2 steps, and the first if statement becomes redundant.
Also, instead of
sorted_num = sorted([nums[i],nums[j],nums[k]])
if nums[i]+nums[j]+nums[k] == 0 and sorted_num not in data:
    data.append(sorted_num)
sort only after the sum check has passed:
if nums[i]+nums[j]+nums[k] == 0:
    sorted_num = sorted([nums[i],nums[j],nums[k]])
    if sorted_num not in data:
        data.append(sorted_num)
This avoids unneeded calls to sorted in the innermost loop.

Your solution is the brute-force one, and the slowest.
Better solutions:
Fix one element of the array, then use a set to find the other two numbers in the remainder of the array.
There is a third, better solution as well. See https://www.gyanblog.com/gyan/coding-interview/leetcode-three-sum/
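The set-based idea could look roughly like the sketch below. The function name three_sum and the way duplicates are removed (a set of sorted tuples) are my own choices for illustration, not taken from the linked article.

```python
from typing import List


def three_sum(nums: List[int]) -> List[List[int]]:
    """Return all unique triplets that sum to zero, in O(n^2) time."""
    triplets = set()
    for i, first in enumerate(nums):
        seen = set()  # values already visited to the right of position i
        for second in nums[i + 1:]:
            third = -first - second
            if third in seen:
                # Sorting the triplet makes duplicates collapse in the set.
                triplets.add(tuple(sorted((first, second, third))))
            seen.add(second)
    return [list(t) for t in triplets]
```

The third value is always one that appeared between position i and the current element, so the three numbers come from three distinct positions.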

Why is this code not running fully? It doesn't run line 53

I made myself an exercise with Python since I am new. I wanted to make a reverse LCM (least common multiple) calculator, but for some reason something as simple as a print in a loop doesn't seem to work for me. I would appreciate some help since I have been stuck on this weird issue for 20 minutes now. Here is the code:
import random
import sys

def print_list():
    count_4_print = 0
    while count_4_print < len(values):
        print(values[count_4_print])
        count_4_print += 1

def lcm(x, y):
    if x > y:
        greater = x
    else:
        greater = y
    while True:
        if (greater % x == 0) and (greater % y == 0):
            lcm1 = greater
            break
        greater += 1
    return lcm1

def guess(index, first_guess, second_guess):
num = 1
while lcm(first_guess, second_guess) != values[num - 1]:
first_guess = random.randrange(1, 1000000)
second_guess = random.randrange(1, 1000000)
num += 1
num = 1
if lcm(first_guess, second_guess) == values[num - 1]:
return first_guess, second_guess
num += 1

lineN = int(input())
values = []
count_4_add = 0
count_4_guess = 0
for x in range(lineN):
    values.append(int(input()))
    count_4_add += 1
    if count_4_add >= lineN:
        break
print_list()
for x in range(lineN + 1):
    first, second = guess(count_4_guess, 1, 1)
    count_4_guess += 1
    print(first + second)
    # this ^^^ doesn't work for some reason
Line 57 is in the while loop with count_4_guess. Right above this text, it says print(first_guess + second_guess)
Edit: The code is supposed to take in an int x and then prompt for x values. The outputs are the inputs without x and LMC(output1, output2) where the "LMC" is one of the values. This is done for each of the values, x times. What it actually does is just the first part. It takes the x and prompts for x outputs and then prints them but doesn't process the data (or it just doesn't print it)

Note: From looking at your comments and edits it seems that you are lacking some basic knowledge and/or understanding of things. I strongly encourage you to study more programming, computer science and python before attempting to create entire programs like this.
It is tough to answer your question properly since many aspects are unclear, so I will update my answer to reflect any relevant changes in your post.
Now, onto my answer. First, I will go over some of your code and attempt to give feedback on what could be improved. Then, I will present two ways to compute the least common multiple (LCM) in Python.
Code review
Code:
def print_list():
    count_4_print = 0
    while count_4_print < len(values):
        print(values[count_4_print])
        count_4_print += 1
Notes:
Where are the parameters? It was already mentioned in a few comments, but the importance of this cannot be stressed enough! (see the note at the beginning of my comment)
It appears that you are trying to print each element of a list on a new line. You can do that with print(*my_list, sep='\n').
That while loop is not how you should iterate over the elements of a list. Instead, use a for loop: for element in (my_list):.
Code:
def lcm(x, y):
    if x > y:
        greater = x
    else:
        greater = y
    while True:
        if (greater % x == 0) and (greater % y == 0):
            lcm1 = greater
            break
        greater += 1
    return lcm1
Notes:
This is not a robust algorithm for the LCM, since it raises ZeroDivisionError when either number is 0.
The comparison of x and y can be replaced with greater = max(x, y).
See the solution I posted below for a different way of writing this same algorithm.
Code:
def guess(index, first_guess, second_guess):
num = 1
while lcm(first_guess, second_guess) != values[num - 1]:
first_guess = random.randrange(1, 1000000)
second_guess = random.randrange(1, 1000000)
num += 1
num = 1
if lcm(first_guess, second_guess) == values[num - 1]:
return first_guess, second_guess
num += 1
Notes:
The line num += 1 comes immediately after return first_guess, second_guess, which means it is never executed. Somehow the mistakes cancel each other out since, as far as I can tell, it wouldn't do anything anyway if it were executed.
if lcm(first_guess, second_guess) == values[num - 1]: is completely redundant, since the while loop above checks the exact same condition.
In fact, not only is it redundant it is also fundamentally broken, as mentioned in this comment by user b_c.
Unfortunately I cannot say much more on this function since it is too difficult for me to understand its purpose.
Code:
lineN = int(input())
values = []
count_4_add = 0
count_4_guess = 0
for x in range(lineN):
    values.append(int(input()))
    count_4_add += 1
    if count_4_add >= lineN:
        break
print_list()
Notes:
As explained previously, print_list() should not be a thing.
lineN should be changed to line_n, or even better, something like num_in_vals.
count_4_add will always be equal to lineN at the end of your for loop.
Building on the previous point, the check if count_4_add >= lineN is useless.
In conclusion, count_4_add and count_4_guess are completely unnecessary and detrimental to the program.
The for loop produces values in the variable x which is never used. You can replace an unused variable with _: for _ in range(10):.
Since your input code is simple you could probably get away with something like in_vals = [int(input(f'Enter value number {i}: ')) for i in range(1, num_in_vals+1)]. Again, this depends on what it is you're actually trying to do.
LCM Implementations
According to the Wikipedia article referenced earlier, the best way to calculate the LCM is using the greatest common divisor.
import math

def lcm(a: int, b: int) -> int:
    if a == b:
        res = a
    else:
        res = abs(a * b) // math.gcd(a, b)
    return res
This second method is one possible brute force solution, which is similar to how the one you are currently using should be written.
def lcm(a, b):
    if a == b:
        res = a
    else:
        max_mult = a * b
        res = max_mult
        great = max(a, b)
        small = min(a, b)
        for i in range(great, max_mult, great):
            if i % small == 0:
                res = i
                break
    return res
This final method works for any number of inputs.
import math
import functools

def lcm_simp(a: int, b: int) -> int:
    if a == b:
        res = a
    else:
        res = abs(a * b) // math.gcd(a, b)
    return res

def lcm(*args: int) -> int:
    return functools.reduce(lcm_simp, args)
Oof, that ended up being way longer than I expected. Anyway, let me know if anything is unclear, if I've made a mistake, or if you have any further questions! :)

Divide and Conquer. Find the majority of element in array

I am working on a python algorithm to find the most frequent element in the list.
def GetFrequency(a, element):
    return sum([1 for x in a if x == element])

def GetMajorityElement(a):
    n = len(a)
    if n == 1:
        return a[0]
    k = n // 2
    elemlsub = GetMajorityElement(a[:k])
    elemrsub = GetMajorityElement(a[k:])
    if elemlsub == elemrsub:
        return elemlsub
    lcount = GetFrequency(a, elemlsub)
    rcount = GetFrequency(a, elemrsub)
    if lcount > k:
        return elemlsub
    elif rcount > k:
        return elemrsub
    else:
        return None
I tried some test cases. Some of them pass, but some of them fail.
For example, [1,2,1,3,4] should return 1, but I get None.
The implementation follows the pseudocode here:
http://users.eecs.northwestern.edu/~dda902/336/hw4-sol.pdf
The pseudocode finds the majority item, which has to make up more than half of the array. I only want to find the most frequent item.
Can I get some help?
Thanks!

I wrote an iterative version instead of the recursive one you're using in case you wanted something similar.
def GetFrequency(array):
    majority = int(len(array)/2)
    result_dict = {}
    while array:
        array_item = array.pop()
        if result_dict.get(array_item):
            result_dict[array_item] += 1
        else:
            result_dict[array_item] = 1
        if result_dict[array_item] > majority:
            return array_item
    return max(result_dict, key=result_dict.get)
This will iterate through the array and return the value as soon as one hits more than 50% of the total (being a majority). Otherwise it goes through the entire array and returns the value with the greatest frequency.

def majority_element(a):
    return max([(a.count(elem), elem) for elem in set(a)])[1]
EDIT
If there is a tie, the biggest value is returned. E.g: a = [1,1,2,2] returns 2. Might not be what you want but that could be changed.
EDIT 2
The pseudocode you gave divides the array into elements 1 to k inclusive and k + 1 to n. Your code splits it into indices 0 to k - 1 and k to the end; I'm not sure it changes much, though. If you want to follow the algorithm you gave exactly, you should do:
    elemlsub = GetMajorityElement(a[:k+1])  # this slice is indices 0 to k
    elemrsub = GetMajorityElement(a[k+1:])  # this one is k + 1 to n.
Also, still according to your provided pseudocode, lcount and rcount should be compared to k + 1, not k:
    if lcount > k + 1:
        return elemlsub
    elif rcount > k + 1:
        return elemrsub
    else:
        return None
EDIT 3
Some people in the comments highlighted that the provided pseudocode solves not for the most frequent item, but for the item that accounts for more than 50% of the occurrences. So your output for your example is indeed correct. There is a good chance that your code already works as is.
EDIT 4
If you want to return None when there is a tie, I suggest this:
def majority_element(a):
    n = len(a)
    if n == 1:
        return a[0]
    if n == 0:
        return None
    sorted_counts = sorted([(a.count(elem), elem) for elem in set(a)], key=lambda x: x[0])
    if len(sorted_counts) > 1 and sorted_counts[-1][0] == sorted_counts[-2][0]:
        return None
    return sorted_counts[-1][1]
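The difference between "most frequent" and "majority" that comes up in this thread can be made concrete with collections.Counter. This is a minimal sketch; the names most_frequent and majority are mine, and ties in most_frequent are resolved by whichever item Counter lists first.

```python
from collections import Counter


def most_frequent(items):
    """Return an item with the highest count, or None for an empty list."""
    if not items:
        return None
    return Counter(items).most_common(1)[0][0]


def majority(items):
    """Return the item that fills more than half of the list, or None if there is none."""
    if not items:
        return None
    element, count = Counter(items).most_common(1)[0]
    return element if count > len(items) // 2 else None
```

For [1, 2, 1, 3, 4], most_frequent gives 1, while majority gives None because 1 occurs only twice out of five.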

python recursion combination [duplicate]

How can I write a function that computes:
C(n,k) = 1                         if k = 0
C(n,k) = 0                         if n < k
C(n,k) = C(n-1,k-1) + C(n-1,k)     otherwise
So far I have:
def choose(n,k):
    if k==0:
        return 1
    elif n<k:
        return 0
    else:

Assuming the missing operands in your question are subtraction operators (thanks lejlot), this should be the answer:
def choose(n,k):
    if k==0:
        return 1
    elif n<k:
        return 0
    else:
        return choose(n-1, k-1) + choose(n-1, k)
Note that CPython's default recursion limit is 1000 frames. Beyond that it raises a RecursionError. You may need to get around that by converting this recursive function to an iterative one instead.
Here's an example iterative function that uses a stack to mimic recursion, while avoiding Python's maximum recursion limit:
def choose_iterative(n, k):
    stack = []
    stack.append((n, k))
    combinations = 0
    while len(stack):
        n, k = stack.pop()
        if k == 0:
            combinations += 1
        elif n<k:
            combinations += 0 #or just replace this line with `pass`
        else:
            stack.append((n-1, k))
            stack.append((n-1, k-1))
    return combinations

Improving from Exponential to Linear time
All of the answers given so far run in exponential time, O(2^n). However, it's possible to make this run in O(k) by changing a single line of code.
Explanation:
The reason for the exponential running time is that each recursion separates the problem into overlapping subproblems with this line of code:
def choose(n, k):
    ...
    return choose(n-1, k-1) + choose(n-1, k)
To see why this is so bad, consider the example of choose(500, 2). The numeric value of 500 choose 2 is 500*499/2; however, using the recursion above it takes 250499 recursive calls to compute that. Obviously this is overkill since only 3 operations are needed.
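The call count quoted above can be checked with a small counter that mirrors the shape of the naive recursion. This is an illustrative sketch; count_calls is a hypothetical helper, not part of any answer here.

```python
def count_calls(n, k):
    """Number of calls the naive choose(n, k) recursion makes, counting the first one."""
    if k == 0 or n < k:
        return 1  # base cases return immediately: one call
    # One call for this frame plus everything its two subproblems trigger.
    return 1 + count_calls(n - 1, k - 1) + count_calls(n - 1, k)


print(count_calls(500, 2))  # 250499
```

For k = 2 the count works out to n^2 + n - 1, which gives 250499 for n = 500.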
To improve this to linear time, all you need to do is choose a different recursion which does not split into two subproblems (there are many on Wikipedia).
For example, the following recursion is equivalent, but only uses 3 recursive calls to compute choose(500,2):
def choose(n,k):
    ...
    # multiply first, then floor-divide, so the result stays an exact integer
    return choose(n, k-1) * (n + 1 - k) // k
The reason for the improvement is that each recursion has only one subproblem that reduces k by 1 with each call. This means that we are guaranteed that this recursion will only take k + 1 recursions or O(k). That's a vast improvement for changing a single line of code!
If you want to take this a step further, you could take advantage of the symmetry in "n choose k" to ensure that k <= n/2:
def choose(n,k):
    ...
    k = k if k <= n/2 else n - k  # if k > n/2 it's equivalent to n - k
    return choose(n, k-1) * (n + 1 - k) // k
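Since the snippets above elide the base cases, here is one way the complete O(k) computation might be written. It is a sketch that uses a loop instead of recursion and integer arithmetic throughout; the handling of k < 0 and k > n is my assumption about the intended behavior.

```python
def choose(n: int, k: int) -> int:
    """Binomial coefficient C(n, k), computed in O(k) multiplications."""
    if k < 0 or k > n:
        return 0
    k = min(k, n - k)  # symmetry: C(n, k) == C(n, n - k)
    result = 1
    for i in range(1, k + 1):
        # After this step result == C(n, i); the floor division is always exact.
        result = result * (n + 1 - i) // i
    return result


print(choose(500, 2))  # 124750
```

Multiplying before dividing keeps every intermediate value an integer, so large results are not rounded the way floating-point division would round them.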

Solution from wikipedia (http://en.wikipedia.org/wiki/Binomial_coefficient)
def choose(n, k):
    if k < 0 or k > n:
        return 0
    if k > n - k:  # take advantage of symmetry
        k = n - k
    if k == 0 or n <= 1:
        return 1
    return choose(n-1, k) + choose(n-1, k-1)

You're trying to calculate the number of options to choose k out of n elements:
def choose(n,k):
    if k == 0:
        return 1  # there's only one option to choose zero items out of n
    elif n < k:
        return 0  # there's no way to choose k of n when k > n
    else:
        # The recursion: you can do either
        # 1. choose the n-th element and then the rest k-1 out of n-1
        # 2. or choose all the k elements out of n-1 (not choose the n-th element)
        return choose(n-1, k-1) + choose(n-1, k)

just like this
def choose(n,k):
    if k==0:
        return 1
    elif n<k:
        return 0
    else:
        return choose(n-1,k-1)+choose(n-1,k)
EDIT
This is the easy way; for an efficient one, take a look at Wikipedia and spencerlyon2's answer.
