Fair partitioning of elements of a list - python

Given a list of ratings of players, I am required to partition the players (i.e. their ratings) into two groups as fairly as possible. The goal is to minimize the difference between the teams' cumulative rating. There are no constraints on how I can split the players into the teams (one team can have 2 players and the other team can have 10 players).
For example: [5, 6, 2, 10, 2, 3, 4] should return ([6, 5, 3, 2], [10, 4, 2])
I would like to know the algorithm to solve this problem. Please note I am taking an online programming introductory course, so simple algorithms would be appreciated.
I am using the following code, but for some reason, the online code checker says it is incorrect.
def partition(ratings):
    set1 = []
    set2 = []
    sum_1 = 0
    sum_2 = 0
    for n in sorted(ratings, reverse=True):
        if sum_1 < sum_2:
            set1.append(n)
            sum_1 = sum_1 + n
        else:
            set2.append(n)
            sum_2 = sum_2 + n
    return (set1, set2)
Update: I contacted the instructors and was told I should define another "helper" function inside the function to check all the different combinations, and then check for the minimum difference.

Note: Edited to better handle the case when the sum of all numbers is odd.
Backtracking is a possibility for this problem.
It examines all the possibilities recursively, without needing a large amount of memory.
It stops as soon as an optimal solution is found: sum = 0, where sum is the difference between the sum of the elements of set A and the sum of the elements of set B. EDIT: it stops as soon as sum < 2, to handle the case when the sum of all numbers is odd, i.e. corresponding to a minimum difference of 1. If the global sum is even, the minimum difference cannot be equal to 1.
It allows implementing a simple premature-abandon procedure:
at a given time, if the absolute value of sum is higher than the sum of all remaining elements (i.e. those not yet placed in A or B) plus the current minimum difference obtained, then we can give up examining the current path without looking at the remaining elements. This procedure is made more effective by:
sorting the input data in decreasing order
at each step, examining the most probable choice first: this allows reaching a near-optimal solution quickly
Here is the pseudo-code:
Initialization:
sort the elements a[] in decreasing order
calculate the sums of the remaining elements: sum_back[i] = sum_back[i+1] + a[i];
set the minimum "difference" to its maximum value: min_diff = sum_back[0];
put a[0] in A -> the index i of the examined element is set to 1
set up_down = true; this boolean indicates whether we are currently going forward (true) or backward (false)
While loop:
If (up_down): forward
Test premature abandon, with the help of sum_back
Select the most probable value and adjust sum according to this choice
If (i == n-1): LEAF -> test whether the optimum value is improved, and return if the new value is less than 2 (EDIT: if (... < 2) instead of (... == 0)); then go backward
If not at a leaf: continue going forward
If (!up_down): backward
If we arrive at i == 0: return
If it is the second visit to this node: select the second value, go up
else: go down
In both cases: recalculate the new sum value
Here is the code, in C++ (sorry, I don't know Python):
#include <iostream>
#include <vector>
#include <algorithm>
#include <tuple>

std::tuple<int, std::vector<int>> partition(std::vector<int> &a) {
    int n = a.size();
    std::vector<int> parti (n, -1);     // current partition studied
    std::vector<int> parti_opt (n, 0);  // optimal partition
    std::vector<int> sum_back (n, 0);   // sum of remaining elements
    std::vector<int> n_poss (n, 0);     // number of possibilities already examined at position i

    sum_back[n-1] = a[n-1];
    for (int i = n-2; i >= 0; --i) {
        sum_back[i] = sum_back[i+1] + a[i];
    }

    std::sort(a.begin(), a.end(), std::greater<int>());
    parti[0] = 0;               // a[0] in A always !
    int sum = a[0];             // current sum
    int i = 1;                  // index of the element being examined (we force a[0] to be in A !)
    int min_diff = sum_back[0];
    bool up_down = true;

    while (true) {
        if (up_down) {          // UP
            if (std::abs(sum) > sum_back[i] + min_diff) { // premature abandon
                i--;
                up_down = false;
                continue;
            }
            n_poss[i] = 1;
            if (sum > 0) {
                sum -= a[i];
                parti[i] = 1;
            } else {
                sum += a[i];
                parti[i] = 0;
            }

            if (i == (n-1)) {   // leaf
                if (std::abs(sum) < min_diff) {
                    min_diff = std::abs(sum);
                    parti_opt = parti;
                    if (min_diff < 2) return std::make_tuple (min_diff, parti_opt); // EDIT: if (... < 2) instead of (... == 0)
                }
                up_down = false;
                i--;
            } else {
                i++;
            }

        } else {                // DOWN
            if (i == 0) break;
            if (n_poss[i] == 2) {
                if (parti[i]) sum += a[i];
                else sum -= a[i];
                //parti[i] = 0;
                i--;
            } else {
                n_poss[i] = 2;
                parti[i] = 1 - parti[i];
                if (parti[i]) sum -= 2*a[i];
                else sum += 2*a[i];
                i++;
                up_down = true;
            }
        }
    }
    return std::make_tuple (min_diff, parti_opt);
}

int main () {
    std::vector<int> a = {5, 6, 2, 10, 2, 3, 4, 13, 17, 38, 42};
    int diff;
    std::vector<int> parti;
    std::tie (diff, parti) = partition (a);

    std::cout << "Difference = " << diff << "\n";
    std::cout << "set A: ";
    for (int i = 0; i < a.size(); ++i) {
        if (parti[i] == 0) std::cout << a[i] << " ";
    }
    std::cout << "\n";
    std::cout << "set B: ";
    for (int i = 0; i < a.size(); ++i) {
        if (parti[i] == 1) std::cout << a[i] << " ";
    }
    std::cout << "\n";
}
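For readers who want the same idea in Python, here is a minimal recursive sketch (names like partition_backtrack are illustrative, and it shows the technique rather than translating the C++ line by line): sort in decreasing order, try the more promising branch first, and abandon a branch as soon as the remaining elements cannot beat the best difference found so far.
def partition_backtrack(ratings):
    a = sorted(ratings, reverse=True)
    n = len(a)
    # tail[i] = sum of a[i:], used for the premature-abandon test
    tail = [0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + a[i]

    best = {'diff': tail[0], 'mask': [0] * n}

    def explore(i, diff, chosen):
        # premature abandon: even putting every remaining element on the
        # lighter side cannot beat the best difference found so far
        if abs(diff) - tail[i] >= best['diff']:
            return
        if i == n:                            # leaf: necessarily an improvement
            best['diff'] = abs(diff)
            best['mask'] = chosen[:]
            return
        first = 1 if diff > 0 else 0          # most probable choice first (0 -> A, 1 -> B)
        for side in (first, 1 - first):
            chosen.append(side)
            explore(i + 1, diff + a[i] if side == 0 else diff - a[i], chosen)
            chosen.pop()
            if best['diff'] < 2:              # 0, or 1 when the total is odd
                return

    if n:
        explore(1, a[0], [0])                 # a[0] always goes to set A
    set_a = [a[i] for i in range(n) if best['mask'][i] == 0]
    set_b = [a[i] for i in range(n) if best['mask'][i] == 1]
    return set_a, set_b

print(partition_backtrack([5, 6, 2, 10, 2, 3, 4]))   # ([10, 4, 2], [6, 5, 3, 2])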

I think you should do the next exercise by yourself, otherwise you won't learn much. As for this one, here is a solution that tries to implement your instructor's advice:
def partition(ratings):
    def split(lst, bits):
        ret = ([], [])
        for i, item in enumerate(lst):
            ret[(bits >> i) & 1].append(item)
        return ret

    target = sum(ratings) // 2
    best_distance = target
    best_split = ([], [])
    for bits in range(0, 1 << len(ratings)):
        parts = split(ratings, bits)
        distance = abs(sum(parts[0]) - target)
        if best_distance > distance:
            best_distance = distance
            best_split = parts
    return best_split

ratings = [5, 6, 2, 10, 2, 3, 4]
print(ratings)
print(partition(ratings))
Output:
[5, 6, 2, 10, 2, 3, 4]
([5, 2, 2, 3, 4], [6, 10])
Note that this output is different from your desired one, but both are correct.
This algorithm is based on the fact that, to pick all possible subsets of a given set with N elements, you can generate all integers with N bits and select the I-th item depending on the value of the I-th bit. I leave it to you to add a couple of lines in order to stop as soon as best_distance is zero (because it can't get any better, of course).
A bit on bits (note that 0b is the prefix for a binary number in Python):
A binary number: 0b0111001 == 0·2⁶+1·2⁵+1·2⁴+1·2³+0·2²+0·2¹+1·2⁰ == 57
Right shifted by 1: 0b0111001 >> 1 == 0b011100 == 28
Left shifted by 1: 0b0111001 << 1 == 0b01110010 == 114
Right shifted by 4: 0b0111001 >> 4 == 0b011 == 3
Bitwise & (and): 0b00110 & 0b10101 == 0b00100
To check whether the 5th bit (index 4) is 1: (0b0111001 >> 4) & 1 == 0b011 & 1 == 1
A one followed by 7 zeros: 1 << 7 == 0b10000000
7 ones: (1 << 7) - 1 == 0b10000000 - 1 == 0b1111111
All 3-bit combinations: 0b000==0, 0b001==1, 0b010==2, 0b011==3, 0b100==4, 0b101==5, 0b110==6, 0b111==7 (note that 0b111 + 1 == 0b1000 == 1 << 3)
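As a tiny standalone illustration of selecting list items by the bits of a mask (the list and mask below are made-up example values):
items = ['a', 'b', 'c', 'd']
mask = 0b1010                    # bits 1 and 3 are set
chosen = [x for i, x in enumerate(items) if (mask >> i) & 1]
rest = [x for i, x in enumerate(items) if not (mask >> i) & 1]
print(chosen, rest)              # ['b', 'd'] ['a', 'c']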

The following algorithm does this:
sorts the items
puts the even-indexed members in list a and the odd-indexed ones in list b to start
randomly moves and swaps items between a and b if the change is for the better
I have added print statements to show the progress on your example list:
# -*- coding: utf-8 -*-
"""
Created on Fri Dec 6 18:10:07 2019

@author: Paddy3118
"""
from random import shuffle, random, randint

#%%

items = [5, 6, 2, 10, 2, 3, 4]

def eq(a, b):
    "Equal enough"
    return int(abs(a - b)) == 0

def fair_partition(items, jiggles=100):
    target = sum(items) / 2
    print(f" Target sum: {target}")
    srt = sorted(items)
    a = srt[::2]    # every even
    b = srt[1::2]   # every odd
    asum = sum(a)
    bsum = sum(b)
    n = 0
    while n < jiggles and not eq(asum, target):
        n += 1
        if random() < 0.5:
            # move from a to b?
            if random() < 0.5:
                a, b, asum, bsum = b, a, bsum, asum  # Switch
            shuffle(a)
            trial = a[0]
            if abs(target - (bsum + trial)) < abs(target - bsum):  # closer
                b.append(a.pop(0))
                asum -= trial
                bsum += trial
                print(f" Jiggle {n:2}: Delta after Move: {abs(target - asum)}")
        else:
            # swap between a and b?
            apos = randint(0, len(a) - 1)
            bpos = randint(0, len(b) - 1)
            trya, tryb = a[apos], b[bpos]
            if abs(target - (bsum + trya - tryb)) < abs(target - bsum):  # closer
                b.append(trya)   # adds to end
                b.pop(bpos)      # remove what is swapped
                a.append(tryb)
                a.pop(apos)
                asum += tryb - trya
                bsum += trya - tryb
                print(f" Jiggle {n:2}: Delta after Swap: {abs(target - asum)}")
    return sorted(a), sorted(b)

if __name__ == '__main__':
    for _ in range(5):
        print('\nFinal:', fair_partition(items), '\n')
Output:
Target sum: 16.0
Jiggle 1: Delta after Swap: 2.0
Jiggle 7: Delta after Swap: 0.0
Final: ([2, 3, 5, 6], [2, 4, 10])
Target sum: 16.0
Jiggle 4: Delta after Swap: 0.0
Final: ([2, 4, 10], [2, 3, 5, 6])
Target sum: 16.0
Jiggle 9: Delta after Swap: 3.0
Jiggle 13: Delta after Move: 2.0
Jiggle 14: Delta after Swap: 1.0
Jiggle 21: Delta after Swap: 0.0
Final: ([2, 3, 5, 6], [2, 4, 10])
Target sum: 16.0
Jiggle 7: Delta after Swap: 3.0
Jiggle 8: Delta after Move: 1.0
Jiggle 13: Delta after Swap: 0.0
Final: ([2, 3, 5, 6], [2, 4, 10])
Target sum: 16.0
Jiggle 5: Delta after Swap: 0.0
Final: ([2, 4, 10], [2, 3, 5, 6])

Since I know I have to generate all possible lists, I need to write a "helper" function to generate all the possibilities. After doing that, I try to find the minimum difference, and the combination of lists with that minimum difference is the desired solution.
The helper function is recursive and checks all possible combinations of lists.
def partition(ratings):
    def helper(ratings, left, right, aux_list, current_index):
        if current_index >= len(ratings):
            aux_list.append((left, right))
            return
        first = ratings[current_index]
        helper(ratings, left + [first], right, aux_list, current_index + 1)
        helper(ratings, left, right + [first], aux_list, current_index + 1)

    # l contains all possible sublists
    l = []
    helper(ratings, [], [], l, 0)

    set1 = []
    set2 = []
    # set mindiff to a large number
    mindiff = 1000
    for sets in l:
        diff = abs(sum(sets[0]) - sum(sets[1]))
        if diff < mindiff:
            mindiff = diff
            set1 = sets[0]
            set2 = sets[1]
    return (set1, set2)
Examples:
r = [1, 2, 2, 3, 5, 4, 2, 4, 5, 5, 2], the optimal partition would be: ([1, 2, 2, 3, 5, 4], [2, 4, 5, 5, 2]) with a difference of 1.
r = [73, 7, 44, 21, 43, 42, 92, 88, 82, 70], the optimal partition would be: ([73, 7, 21, 92, 88], [44, 43, 42, 82, 70]) with a difference of 0.

Here is a fairly elaborate example, intended for educational purposes rather than performance. It introduces some interesting Python concepts such as list comprehensions and generators, as well as a good example of recursion in which fringe cases need to be checked appropriately. Extensions, e.g. only teams with an equal number of players are valid, are easy to implement in the appropriate individual functions.
def listFairestWeakTeams(ratings):
current_best_weak_team_rating = -1
fairest_weak_teams = []
for weak_team in recursiveWeakTeamGenerator(ratings):
weak_team_rating = teamRating(weak_team, ratings)
if weak_team_rating > current_best_weak_team_rating:
fairest_weak_teams = []
current_best_weak_team_rating = weak_team_rating
if weak_team_rating == current_best_weak_team_rating:
fairest_weak_teams.append(weak_team)
return fairest_weak_teams
def recursiveWeakTeamGenerator(
ratings,
weak_team=[],
current_applicant_index=0
):
if not isValidWeakTeam(weak_team, ratings):
return
if current_applicant_index == len(ratings):
yield weak_team
return
for new_team in recursiveWeakTeamGenerator(
ratings,
weak_team + [current_applicant_index],
current_applicant_index + 1
):
yield new_team
for new_team in recursiveWeakTeamGenerator(
ratings,
weak_team,
current_applicant_index + 1
):
yield new_team
def isValidWeakTeam(weak_team, ratings):
total_rating = sum(ratings)
weak_team_rating = teamRating(weak_team, ratings)
optimal_weak_team_rating = total_rating // 2
if weak_team_rating > optimal_weak_team_rating:
return False
elif weak_team_rating * 2 == total_rating:
# In case of equal strengths, player 0 is assumed
# to be in the "weak" team
return 0 in weak_team
else:
return True
def teamRating(team_members, ratings):
return sum(memberRatings(team_members, ratings))
def memberRatings(team_members, ratings):
return [ratings[i] for i in team_members]
def getOpposingTeam(team, ratings):
return [i for i in range(len(ratings)) if i not in team]
ratings = [5, 6, 2, 10, 2, 3, 4]
print("Player ratings: ", ratings)
print("*" * 40)
for option, weak_team in enumerate(listFairestWeakTeams(ratings)):
strong_team = getOpposingTeam(weak_team, ratings)
print("Possible partition", option + 1)
print("Weak team members: ", weak_team)
print("Weak team ratings: ", memberRatings(weak_team, ratings))
print("Strong team members:", strong_team)
print("Strong team ratings:", memberRatings(strong_team, ratings))
print("*" * 40)
Output:
Player ratings: [5, 6, 2, 10, 2, 3, 4]
****************************************
Possible partition 1
Weak team members: [0, 1, 2, 5]
Weak team ratings: [5, 6, 2, 3]
Strong team members: [3, 4, 6]
Strong team ratings: [10, 2, 4]
****************************************
Possible partition 2
Weak team members: [0, 1, 4, 5]
Weak team ratings: [5, 6, 2, 3]
Strong team members: [2, 3, 6]
Strong team ratings: [2, 10, 4]
****************************************
Possible partition 3
Weak team members: [0, 2, 4, 5, 6]
Weak team ratings: [5, 2, 2, 3, 4]
Strong team members: [1, 3]
Strong team ratings: [6, 10]
****************************************

Given that you want even teams, you know the target score for each team's ratings: the sum of all ratings divided by 2.
So the following code should do what you want.
from itertools import combinations

ratings = [5, 6, 2, 10, 2, 3, 4]
target = sum(ratings)/2

difference_dictionary = {}
for i in range(1, len(ratings)):
    for combination in combinations(ratings, i):
        diff = sum(combination) - target
        if diff >= 0:
            difference_dictionary[diff] = difference_dictionary.get(diff, []) + [combination]

# get min difference to target score
min_difference_to_target = min(difference_dictionary.keys())
strong_ratings = difference_dictionary[min_difference_to_target]
first_strong_ratings = [x for x in strong_ratings[0]]

weak_ratings = ratings.copy()
for strong_rating in first_strong_ratings:
    weak_ratings.remove(strong_rating)
Output
first_strong_ratings
[6, 10]
weak_ratings
[5, 2, 2, 3, 4]
There are other splits with the same fairness; they are all available inside strong_ratings. I just chose the first one, as it will always exist for any ratings list that you pass in (provided len(ratings) > 1).
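If you also want to see every equally fair split rather than just the first one, a small follow-up loop reusing the variables from the snippet above could look like this (illustrative sketch):
for strong in strong_ratings:
    weak = ratings.copy()
    for r in strong:          # remove the strong team's ratings from a copy
        weak.remove(r)
    print(list(strong), weak)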

A greedy solution might yield a sub-optimal partition. Here is a fairly simple greedy approach: sort the list in descending order to lessen the impact each added rating has on the balance, and add each rating to the bucket whose total is currently smaller.
lis = [5, 6, 2, 10, 2, 3, 4]
lis.sort()
lis.reverse()

bucket_1 = []
bucket_2 = []
for item in lis:
    if sum(bucket_1) <= sum(bucket_2):
        bucket_1.append(item)
    else:
        bucket_2.append(item)

print("Bucket 1 : {}".format(bucket_1))
print("Bucket 2 : {}".format(bucket_2))
Output :
Bucket 1 : [10, 4, 2]
Bucket 2 : [6, 5, 3, 2]
Edit:
Another approach is to generate all possible subsets of the list. Say l1 is one of the subsets; then you can easily get the list l2 such that l2 = list(original) - l1. The number of possible subsets of a list of size n is 2^n. We can denote them by the integers from 0 to 2^n - 1. For example, if you have list = [1, 3, 5], then the number of possible combinations is 2^3, i.e. 8. We can write all the combinations as follows:
000 - [] - 0
001 - [1] - 1
010 - [3] - 2
011 - [1,3] - 3
100 - [5] - 4
101 - [1,5] - 5
110 - [3,5]- 6
111 - [1,3,5] - 7
and l2, in this case, can easily be obtained by taking the XOR with 2^n - 1.
Solution:
def sum_list(lis, n, X):
    """
    This function returns the sum of all elements whose bit is set to 1 in X
    """
    sum_ = 0
    # print(X)
    for i in range(n):
        if (X & 1 << i) != 0:
            # print(lis[i], end=" ")
            sum_ += lis[i]
    # print()
    return sum_

def return_list(lis, n, X):
    """
    This function returns the list of all elements whose bit is set to 1 in X
    """
    new_lis = []
    for i in range(n):
        if (X & 1 << i) != 0:
            new_lis.append(lis[i])
    return new_lis

lis = [5, 6, 2, 10, 2, 3, 4]
n = len(lis)
total = 2**n - 1

result_1 = 0
result_2 = total
result_1_sum = 0
result_2_sum = sum_list(lis, n, result_2)
ans = total

for i in range(total):
    x = (total ^ i)
    sum_x = sum_list(lis, n, x)
    sum_y = sum_list(lis, n, i)
    if abs(sum_x - sum_y) < ans:
        result_1 = x
        result_2 = i
        result_1_sum = sum_x
        result_2_sum = sum_y
        ans = abs(result_1_sum - result_2_sum)

"""
Produce resultant list
"""
bucket_1 = return_list(lis, n, result_1)
bucket_2 = return_list(lis, n, result_2)

print("Bucket 1 : {}".format(bucket_1))
print("Bucket 2 : {}".format(bucket_2))
Output :
Bucket 1 : [5, 2, 2, 3, 4]
Bucket 2 : [6, 10]

Related

Using recursion to find the integer appearing an odd number of times

I am looking for some guidance with the following code, please. I am learning Python, coming from Java and C# where I was a beginner. I want to write a function that returns the number that appears an odd number of times. The assumption is that the array always has more than one element and there is always exactly one integer appearing an odd number of times. I want to use recursion.
The function does not return a value: when I store the result I get None. I am not looking for a solution, just some advice on where to look and how to think when debugging.
def find_it(seq):
    seqSort = seq
    seqSort.sort()

    def recurfinder(arg, start, end):
        seqSort = arg
        start = 0
        end = seqSort.length() - 1
        for i in range(start, end):
            counter = 1
            pos = 0
            if seqSort[i+1] == seqSort[i]:
                counter += 1
                pos = counter - 1
            else:
                if (counter % 2 == 0):
                    recurfinder(seqSort, pos + 1, end)
                else:
                    return seqSort[i]
        return -1
You need to actually call recurFinder from somewhere outside of recurFinder to get the ball rolling.
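In other words, find_it needs something like return recurfinder(...) at the end so the recursion actually runs and its result is returned. A minimal working sketch in that spirit (my own simplified recursion, not the asker's exact logic) might be:
def find_it(seq):
    seqSort = sorted(seq)

    def recurfinder(start):
        if start >= len(seqSort):          # past the end: nothing found
            return -1
        end = start
        # extend over the run of equal values starting at `start`
        while end + 1 < len(seqSort) and seqSort[end + 1] == seqSort[end]:
            end += 1
        run_length = end - start + 1
        if run_length % 2 == 1:            # odd-length run: this is the answer
            return seqSort[start]
        return recurfinder(end + 1)        # skip the even run and recurse

    return recurfinder(0)                  # actually kick off the recursion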
def getOddOccurrence(arr, arr_size):
    for i in range(0, arr_size):
        count = 0
        for j in range(0, arr_size):
            if arr[i] == arr[j]:
                count += 1
        if (count % 2 != 0):
            return arr[i]
    return -1

arr = [2, 3, 5, 4, 5, 2, 4, 3, 5, 2, 4, 4, 2]
n = len(arr)
print(getOddOccurrence(arr, n))
This answer uses recursion and a dict for fast counter lookups -
def find_it(a = [], i = 0, d = {}):
    if i >= len(a):
        return [ n for (n, count) in d.items() if count % 2 == 1 ]
    else:
        d = d.copy()
        d[a[i]] = d.get(a[i], 0) + 1
        return find_it(a, i + 1, d)
It works like this -
print(find_it([ 1, 2, 2, 2, 3, 3, 4, 5, 5, 5, 5 ]))
# [ 1, 2, 4 ]
print(find_it([ 1, 2, 3 ]))
# [ 1, 2, 3 ]
print(find_it([ 1, 1, 2, 2, 3, 3 ]))
# []
print(find_it([]))
# []
Above i and d are exposed at the call-site. Additionally, because we're relying on Python's default arguments, we have to call d.copy() to avoid mutating d. Using an inner loop mitigates both issues -
def find_it(a = []):
    def loop(i, d):
        if i >= len(a):
            return [ n for (n, count) in d.items() if count % 2 == 1 ]
        else:
            d = d.copy()
            d[a[i]] = d.get(a[i], 0) + 1
            return loop(i + 1, d)
    return loop(0, {})
It works the same as above.

How do I improve my remove-duplicates algorithm?

My interview question was to return the length of an array with duplicates removed, but where each value may be left at most twice.
For example, for [1, 1, 1, 2, 2, 3] the new array would be [1, 1, 2, 2, 3], so the new length would be 5. I came up with an algorithm that I believe is O(2n). How can I make it as fast as possible?
def removeDuplicates(nums):
    if nums is None:
        return 0
    if len(nums) == 0:
        return 0
    if len(nums) == 1:
        return 1
    new_array = {}
    for num in nums:
        new_array[num] = new_array.get(num, 0) + 1
    new_length = 0
    for key in new_array:
        if new_array[key] > 2:
            new_length = new_length + 2
        else:
            new_length = new_length + new_array[key]
    return new_length

new_length = removeDuplicates([1, 1, 1, 2, 2, 3])
assert new_length == 5
My first question would be: is my algorithm even correct?
Your logic is correct; however, here is a simpler method to reach the goal you mentioned in your question.
Here is my logic.
myl = [1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 2, 3, 1, 1, 1, 2, 2, 3]
newl = []
for i in myl:
    if newl.count(i) != 2:
        newl.append(i)

print newl
[1, 1, 2, 2, 3, 3]
Hope this helps.
If your original array size is n, count the distinct numbers in your array.
If you have d distinct numbers, then your answer will be:
d (when n == d)
d+1 (when n == d+1)
d+2 (when n >= d+2)
If all the numbers in your array are less than n-1, you can even solve this without using any extra space: in that case you can count the distinct numbers very easily without extra storage.
I'd forget about generating the new array and just focus on counting:
from collections import Counter

def count_non_2dups(nums):
    new_len = 0
    for num, count in Counter(nums).items():
        new_len += min(2, count)
    return new_len
int removeDuplicates(vector<int>& nums) {
    if (nums.size() == 0) return nums.size();
    int state = 1;
    int idx = 1;
    for (int i = 1; i < nums.size(); ++i) {
        if (nums[i] != nums[i-1]) {
            state = 1;
            nums[idx++] = nums[i];
        }
        else if (state == 1) {
            state++;
            nums[idx++] = nums[i];
        }
        else {
            state++;
        }
    }
    return idx;
}
Idea: maintain a variable (state) recording the current repeat count (more precisely, state records the repeat count of the element adjacent to the left of the current element). This algorithm is O(n), with a single scan of the array.
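A rough Python rendering of that same single-scan, in-place idea (a sketch with my own names, assuming duplicates are adjacent, i.e. the array is sorted, just like the C++ version above):
def remove_duplicates_keep_two(nums):
    """Keep each value at most twice, in place; return the new length."""
    if not nums:
        return 0
    state = 1      # repeat count of the current run so far
    idx = 1        # next write position
    for i in range(1, len(nums)):
        if nums[i] != nums[i - 1]:
            state = 1
            nums[idx] = nums[i]
            idx += 1
        elif state == 1:
            state += 1
            nums[idx] = nums[i]
            idx += 1
        else:
            state += 1
    return idx

print(remove_duplicates_keep_two([1, 1, 1, 2, 2, 3]))   # 5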
def removeDuplicates(nums):
    if nums is None:
        return 0
    if len(nums) == 0:
        return 0
    if len(nums) == 1:
        return 1
    new_array_a = set()
    new_array_b = set()
    while nums:
        i = nums.pop()
        if i not in new_array_a:
            new_array_a.add(i)
        elif i not in new_array_b:
            new_array_b.add(i)
    return len(new_array_a) + len(new_array_b)

Find the number of consecutively increasing elements in a list

I got a problem in TalentBuddy, which sounds like this
A student's performance in lab activities should always improve, but that is not always the case.
Since progress is one of the most important metrics for a student, let’s write a program that computes the longest period of increasing performance for any given student.
For example, if his grades for all lab activities in a course are: 9, 7, 8, 2, 5, 5, 8, 7 then the longest period would be 4 consecutive labs (2, 5, 5, 8).
So far, I seem too confused to work out the code. The only thing I have is:
def longest_improvement(grades):
    res = 0
    for i in xrange(len(grades) - 2):
        while grades[i] <= grades[i + 1]:
            res += 1
            i += 1
    print res
But that prints 17, rather than 6 when grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1].
How to work out the rest of the code? Thanks
Solved with some old-fashioned tail-recursion:
grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]

def streak(grades):
    def streak_rec(longest, challenger, previous, rest):
        if rest == []:                    # Base case
            return max(longest, challenger)
        elif previous <= rest[0]:         # Streak continues
            return streak_rec(longest, challenger + 1, rest[0], rest[1:])
        else:                             # Streak is reset
            return streak_rec(max(longest, challenger), 1, rest[0], rest[1:])
    return streak_rec(0, 0, 0, grades)

print streak(grades)  # => 6
print streak([2])     # => 1
Since the current solution involves yield and maps and additional memory overhead, it's probably a good idea to at least mention the simple solution:
def length_of_longest_sublist(lst):
    max_length, cur_length = 1, 1
    prev_val = lst[0]
    for val in lst[1:]:
        if val >= prev_val:
            cur_length += 1
        else:
            max_length = max(max_length, cur_length)
            cur_length = 1
        prev_val = val
    return max(max_length, cur_length)
We could reduce that code by getting the previous value directly:
def length_of_longest_sublist2(lst):
    max_length, cur_length = int(bool(lst)), int(bool(lst))
    for prev_val, val in zip(lst, lst[1:]):
        if val >= prev_val:
            cur_length += 1
        else:
            max_length = max(max_length, cur_length)
            cur_length = 1
    return max(max_length, cur_length)
which is a nice trick to know (and allows it to easily return the right result for an empty list), but confusing to people who don't know the idiom.
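For reference, the zip(lst, lst[1:]) idiom simply pairs each element with its successor (example values made up):
lst = [9, 7, 8, 2]
print(list(zip(lst, lst[1:])))   # [(9, 7), (7, 8), (8, 2)]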
This method uses fairly basic python and the return statement can be quickly modified so that you have a list of all the streak lengths.
def longest_streak(grades):
    if len(grades) < 2:
        return len(grades)
    else:
        start, streaks = -1, []
        for idx, (x, y) in enumerate(zip(grades, grades[1:])):
            if x > y:
                streaks.append(idx - start)
                start = idx
            else:
                streaks.append(idx - start + 1)
        return max(streaks)
I would solve it this way:
from itertools import groupby
from funcy import pairwise, ilen

def streak(grades):
    if len(grades) <= 1:
        return len(grades)
    orders = (x <= y for x, y in pairwise(grades))
    return max(ilen(l) for asc, l in groupby(orders) if asc) + 1
Very explicit: orders is an iterator of Trues for ascending pairs and Falses for descending ones. Then we just need to find the longest run of True values and add 1.
You're using the same res variable in each iteration of the inner while loop. You probably want to reset it, and keep the highest intermediate result in a different variable.
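A sketch of that advice applied to the asker's loop structure (variable names are mine):
def longest_improvement(grades):
    best = 0
    i = 0
    while i < len(grades):
        res = 1                     # reset the run length at each new start
        while i + 1 < len(grades) and grades[i] <= grades[i + 1]:
            res += 1
            i += 1
        best = max(best, res)       # keep the highest intermediate result
        i += 1
    return best

print(longest_improvement([1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]))   # 6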
A little bit late, but here's my updated version:
from funcy import ilen, ireductions

def streak(last, x):
    if last and x >= last[-1]:
        last.append(x)
        return last
    return [x]

def longest_streak(grades):
    xs = map(ilen, ireductions(streak, grades, None))
    return xs and max(xs) or 1

grades = [1, 7, 2, 5, 6, 9, 11, 11, 1, 6, 1]
print longest_streak(grades)
print longest_streak([2])
I decided in the end not only to produce a correct version without bugs, but also to use a library I quite like, funcy :)
Output:
6
1
Maybe not as efficient as previous answers, but it's short :P
import numpy as np
from itertools import groupby

diffgrades = np.diff(grades)
maxlen = max([len(list(g)) for k, g in groupby(diffgrades, lambda x: x >= 0) if k]) + 1
Building on the idea of @M4rtini to use itertools.groupby.
def longest_streak(grades):
    from itertools import groupby
    if len(grades) > 1:
        streak = [x <= y for x, y in zip(grades, grades[1:])]
        return max([sum(g, 1) for k, g in groupby(streak) if k])
    else:
        return len(grades)

Python - Memoization and Collatz Sequence

When I was struggling to do Problem 14 in Project Euler, I discovered that I could use a thing called memoization to speed up my process (I let it run for a good 15 minutes, and it still hadn't returned an answer). The thing is, how do I implement it? I've tried, but I get a KeyError (the value being returned is invalid). This bugs me because I am positive I can apply memoization to this and make it faster.
lookup = {}

def countTerms(n):
    arg = n
    count = 1
    while n is not 1:
        count += 1
        if not n % 2:
            n /= 2
        else:
            n = (n*3 + 1)
    if n not in lookup:
        lookup[n] = count
    return lookup[n], arg

print max(countTerms(i) for i in range(500001, 1000000, 2))
Thanks.
There is also a nice recursive way to do this, which probably will be slower than poorsod's solution, but it is more similar to your initial code, so it may be easier for you to understand.
lookup = {}

def countTerms(n):
    if n not in lookup:
        if n == 1:
            lookup[n] = 1
        elif not n % 2:
            lookup[n] = countTerms(n / 2)[0] + 1
        else:
            lookup[n] = countTerms(n*3 + 1)[0] + 1
    return lookup[n], n

print max(countTerms(i) for i in range(500001, 1000000, 2))
The point of memoising, for the Collatz sequence, is to avoid calculating parts of the list that you've already done. The remainder of a sequence is fully determined by the current value. So we want to check the table as often as possible, and bail out of the rest of the calculation as soon as we can.
def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start
    while n not in l:  # break if we find ourself in a cycle
                       # (don't assume the Collatz conjecture!)
        if n in table:
            l += table[n]
            break
        elif n % 2 == 0:
            l.append(n)
            n = n//2
        else:
            l.append(n)
            n = (3*n) + 1
    table.update({n: l[i:] for i, n in enumerate(l) if n not in table})
    return l
Is it working? Let's spy on it to make sure the memoised elements are being used:
class NoisyDict(dict):
    def __getitem__(self, item):
        print("getting", item)
        return dict.__getitem__(self, item)

def collatz_sequence(start, table=NoisyDict()):
    # etc

In [26]: collatz_sequence(5)
Out[26]: [5, 16, 8, 4, 2, 1]

In [27]: collatz_sequence(5)
getting 5
Out[27]: [5, 16, 8, 4, 2, 1]

In [28]: collatz_sequence(32)
getting 16
Out[28]: [32, 16, 8, 4, 2, 1]

In [29]: collatz_sequence.__defaults__[0]
Out[29]:
{1: [1],
 2: [2, 1],
 4: [4, 2, 1],
 5: [5, 16, 8, 4, 2, 1],
 8: [8, 4, 2, 1],
 16: [16, 8, 4, 2, 1],
 32: [32, 16, 8, 4, 2, 1]}
Edit: I knew it could be optimised! The secret is that there are two places in the function (the two return points) where we know l and table share no elements. While previously I avoided calling table.update with elements already in table by testing them, this version of the function instead exploits our knowledge of the control flow, saving lots of time.
[collatz_sequence(x) for x in range(500001, 1000000)] now times at around 2 seconds on my computer, while a similar expression with @welter's version clocks in at 400 ms. I think this is because the functions don't actually compute the same thing - my version generates the whole sequence, while @welter's just finds its length. So I don't think I can get my implementation down to the same speed.
def collatz_sequence(start, table={}):  # cheeky trick: store the (mutable) table as a default argument
    """Returns the Collatz sequence for a given starting number"""
    l = []
    n = start
    while n not in l:  # break if we find ourself in a cycle
                       # (don't assume the Collatz conjecture!)
        if n in table:
            table.update({x: l[i:] for i, x in enumerate(l)})
            return l + table[n]
        elif n % 2 == 0:
            l.append(n)
            n = n//2
        else:
            l.append(n)
            n = (3*n) + 1
    table.update({x: l[i:] for i, x in enumerate(l)})
    return l
PS - spot the bug!
This is my solution to PE14:
memo = {1: 1}

def get_collatz(n):
    if n in memo:
        return memo[n]
    if n % 2 == 0:
        terms = get_collatz(n/2) + 1
    else:
        terms = get_collatz(3*n + 1) + 1
    memo[n] = terms
    return terms

compare = 0
for x in xrange(1, 999999):
    if x not in memo:
        ctz = get_collatz(x)
        if ctz > compare:
            compare = ctz
            culprit = x

print culprit

python - 2 lists, and finding maximum product from 2 lists

I have two lists of numbers (integers); both have 2 million unique elements.
I want to find a number a from list 1 and b from list 2, such that:
1) a*b is maximized;
2) a*b is smaller than a certain limit.
here's what I came up with:
from itertools import dropwhile

maxpq = 0
nums = sorted(nums, reverse=True)
nums2 = sorted(nums2, reverse=True)
for p in nums:
    n = p * dropwhile(lambda q: p*q > sqr, nums2).next()
    if n > maxpq:
        maxpq = n
print maxpq
any suggestions?
Edit: my method is too slow. It would take more than one day.
Here's a linear-time solution (after sorting):
def maximize(a, b, lim):
    a.sort(reverse=True)
    b.sort()
    found = False
    best = 0
    j = 0
    for i in xrange(len(a)):
        while j < len(b) and a[i] * b[j] < lim:
            found = True
            if a[i]*b[j] > best:
                best, n1, n2 = a[i] * b[j], a[i], b[j]
            j += 1
    return found and (best, n1, n2)
Simply put:
start from the highest and lowest from each list
while their product is less than the target, advance the small-item
once the product becomes bigger than your goal, advance the big-item until it goes below again
This way, you're guaranteed to go through each list only once. It'll return False if it couldn't find anything small enough, otherwise it'll return the product and the pair that produced it.
Sample output:
a = [2, 5, 4, 3, 6]
b = [8, 1, 5, 4]
maximize(a, b, 2) # False
maximize(a, b, 3) # (2, 2, 1)
maximize(a, b, 10) # (8, 2, 4)
maximize(a, b, 100) # (48, 6, 8)
Thanks for everyone's advice and ideas. I finally came up with a useful solution. Mr inspectorG4dget shone a light on this one.
It uses the bisect module from Python's standard library.
Edit: the bisect module does a binary search to find the insert position of a value in a sorted list, so it reduces the number of comparisons, unlike my previous solution.
http://www.sparknotes.com/cs/searching/binarysearch/section1.rhtml
import bisect

def bisect_find(num1, num2, limit):
    num1.sort()
    max_ab = 0
    for a in num2:
        complement = limit / float(a)
        b = num1[bisect.bisect(num1, complement) - 1]
        if limit > a*b > max_ab:
            max_ab = b*a
    return max_ab
This might be faster.
def doer(L1, L2, ceil):
    max_c = ceil - 1
    L1.sort(reverse=True)
    L2.sort(reverse=True)
    big_a = big_b = big_c = 0
    for a in L1:
        for b in L2:
            c = a * b
            if c == max_c:
                return a, b
            elif max_c > c > big_c:
                big_a = a
                big_b = b
                big_c = c
    return big_a, big_b

print doer([1, 3, 5, 10], [8, 7, 3, 6], 60)
Note that it sorts the lists in-place; this is faster, but may or may not be appropriate in your scenario.
