I have a list such as
[0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
I want to find the start and end positions of each run where the elements are positive:
[[3,5],[9,12],[16,18]]
What is the best way to do this in Python?
(Using built-in functions in Python such as find, lambda, itemgetter and so on.)
And lastly, regex version. ;)
input = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
input3 = str(list(
map(lambda i_x: i_x[0] * (i_x[1] and (1, -1)[i_x[1] < 0]), enumerate(input))
))
import re
s = re.sub(r'([\[ ]0[\],])+', ' ', input3)
s = s.replace(', ', '], [')
if s[-1:] != ']':
s = s[:-2] + ']'
s = '[' + s[2:]
s = re.sub(r' [0-9]+,', '', s)
output = list(eval(s))
print(output) # [[3, 5], [9, 12], [16, 18]]
Crude for-loop solution. :(
input = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
output = []
pair = []
for i in range(len(input)):
if input[i] > 0:
if len(pair) > 1:
pair.pop()
pair.append(i)
else:
if pair:
output.append(pair)
pair = []
print(output) # [[3, 5], [9, 12], [16, 18]]
Not sure if ranges can go off the ends of the array or not.
def get_positive_ranges(a):
in_range = False
result = []
for i in range(len(a)):
if not in_range:
if a[i] > 0:
in_range = True
first = i
else: # Inside a range
if a[i] <= 0: # End of range
in_range = False
result.append([first, i - 1])
if in_range: # Tidy
result.append([first, i])
return result
print(get_positive_ranges([0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]))
print(get_positive_ranges([]))
print(get_positive_ranges([1]))
print(get_positive_ranges([0, 1]))
print(get_positive_ranges([0, 1, 0]))
Here is a numpy solution, not sure if this is better than a naive for-loop though; see inline comments for explanation.
import numpy as np
a = np.array([0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0])
# get indices of non-zero elements in a
nze = a.nonzero()[0]
# find where consecutive non-zero indices differ by more than one; a new run starts there
nze_diff = np.where(np.diff(nze) > 1)[0] + 1
# the first run starts at position 0 of nze, so add the index 0
# (checking the size first avoids an IndexError when there is only a single run)
if nze_diff.size == 0 or nze_diff[0] != 0:
    nze_diff = np.insert(nze_diff, 0, 0)
# store output
res = []
# loop through the indices and add the desired slices
for ix, i in enumerate(nze_diff):
    try:
        sl = nze[i:nze_diff[ix + 1]]
        res.append([sl[0], sl[-1]])
    # means we reached the end of nze_diff
    except IndexError:
        sl = nze[i:]
        res.append([sl[0], sl[-1]])
If you run it for your a, you receive the desired output:
[[3, 5], [9, 12], [16, 18]]
There are probably smarter solutions than this, but this might get you started.
If you want to get the entire range, it simplifies a bit:
res2 = []
for ix, i in enumerate(nze_diff):
try:
res2.append(nze[i:nze_diff[ix + 1]])
except IndexError:
res2.append(nze[i:])
Then res2 would be:
[array([3, 4, 5]), array([ 9, 10, 11, 12]), array([16, 17, 18])]
This works
lst = [0, 0, 0, 12, 34, 86, 0, 0, 0, 95, 20, 1, 6, 0, 0, 0, 11, 24, 67, 0, 0, 0]
n = len(lst)
starting_points = [i for i in range(n) if lst[i] > 0 and (lst[i - 1] == 0 or i == 0)]
end_points = [next((i for i in range(j + 1, n) if lst[i] == 0), n) - 1 for j in starting_points]
print(list(zip(starting_points, end_points)))
output
[(3, 5), (9, 12), (16, 18)]
If performance is key, you should test which implementation is fastest with your very long list. Anyway, this is a 'no array access by index' version, hopefully boosting speed. And it uses map, lambda, and index (find), if it pleases you. Though, of course, it uses while.
input = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
output = []
input2 = list(map(lambda x: x and (1, -1)[x < 0], input)) # mapping by 'math.sign'-like func
start = end = 0
while end < len(input2):
try:
start = input2.index(1, end + 1)
end = input2.index(0, start) - 1
output.append([start, end])
except ValueError:
break
if start >= end:
output.append([start, len(input2) - 1])
print(output) # [[3, 5], [9, 12], [16, 18]]
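Since the question asks about built-ins, here is one more sketch using itertools.groupby. The function name positive_runs is made up for illustration, and I haven't timed it against the loops above, so treat it as a readable alternative rather than the fastest one.

```python
from itertools import groupby

def positive_runs(values):
    """Return [start, end] index pairs for each maximal run of positive values."""
    runs = []
    # Group (index, value) pairs by whether the value is positive.
    for is_positive, group in groupby(enumerate(values), key=lambda pair: pair[1] > 0):
        if is_positive:
            indices = [index for index, _ in group]
            runs.append([indices[0], indices[-1]])
    return runs

data = [0,0,0,12,34,86,0,0,0,95,20,1,6,0,0,0,11,24,67,0,0,0]
print(positive_runs(data))  # [[3, 5], [9, 12], [16, 18]]
```

Runs that touch either end of the list need no special casing here, because groupby closes the last group when the input ends.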
Related
For each element in a list I want to add the value before and after the element and append the result to an empty list. The problem is that at index 0 there is no index before and at the end there is no index next. At index 0 I want to add the value of index 0 with value of index 1, and in the last index I want to add the value of the last index with the same index value. As following:
vec = [1,2,3,4,5]
newVec = []
for i in range(len(vec)):
newValue = vec[i] + vec[i+1] + vec[i-1]
# if i + 1 or i - 1 does not exist, pass
newVec.append(newValue)
Expected output: newVec = [1+2, 2+1+3, 3+2+4,4+3+5,5+4]
# newVec = [3, 6, 9, 12, 9]
You have possible exceptions here, I think this code will do the trick and manage the exceptions.
vec = [1, 2, 3, 4, 5]
new_vec = []
for index, number in enumerate(vec):
new_value = number
if index != 0:
new_value += vec[index - 1]
try:
new_value += vec[index + 1]
except IndexError:
pass
new_vec.append(new_value)
Your output will look like this:
[3, 6, 9, 12, 9]
Good luck !
You can make the conditions inside the for loop
for i in range(len(vec)):
if i == 0 :
newValue = vec[i] + vec[i+1]
elif i == len(vec)-1:
newValue = vec[i] + vec[i-1]
else:
newValue = vec[i] + vec[i+1] + vec[i-1]
newVec.append(newValue)
print(newVec)
output:
[3, 6, 9, 12, 9]
You can just add 0 to either side of vec so that it's adding nothing to create an accurate result. Then just use a for i in range(1, ...) loop, starting at value 1 to add the values before and after i. This is what I got for my code:
vec = [1,2,3,4,5]
newVec = []
vec.insert(0, 0)
vec.insert(len(vec) + 1, 0)
for i in range(1, len(vec) - 1):
newVec.append(vec[i-1] + vec[i] + vec[i+1])
print(newVec)
Which creates the output of:
[3, 6, 9, 12, 9]
Hope this helps.
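A variation on the padding idea that leaves the original list untouched, as a rough sketch: build a padded copy and zip three shifted views of it. The name neighbour_sums is just for illustration.

```python
def neighbour_sums(vec):
    """Sum each element with its left and right neighbours; missing neighbours count as 0."""
    padded = [0] + list(vec) + [0]  # a copy, so vec itself is not modified
    # zip stops at the shortest slice, which yields exactly len(vec) triples
    return [left + mid + right
            for left, mid, right in zip(padded, padded[1:], padded[2:])]

print(neighbour_sums([1, 2, 3, 4, 5]))  # [3, 6, 9, 12, 9]
```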
I'm new to dynamic programming in python and this is the code I have done so far to get the numbers that give the max sum from the array. However, my code doesn't work for input array A
Here are cases:
Test cases:
A = [7,2,-3,5,-4,8,6,3,1]
B = [7,2,5,8,6]
C = [-2,3,1,10,3,-7]
Output:
A = [7,5,8,3]
B = [7,5,6]
C = [3,10]
My output works for B and C but not for array A. The output I get is this:
[7,6,1]
And here is my code:
def max_sum(nums):
#Get the size of the array
size = len(nums)
list = []
cache = [[0 for i in range(3)] for j in range(size)]
if(size == 0):
return 0
if (size == 1):
return nums[0]
for i in range(0, size):
if(nums[i] < 0):
validate = i
if(size == validate + 1):
return []
#Create array 'cache' to store non-consecutive maximum values
#cache = [0]*(size + 1)
#base case
cache[0][2] = nums[0]
#temp = nums[0]
cache[0][1] = nums[0]
for i in range(1, size):
#temp1 = temp
cache[i][2] = nums[i] #I store the array numbers at index [i][2]
cache[i][1] = cache[i - 1][0] + nums[i] #the max sum is stored here
cache[i][0] = max(cache[i - 1][1], cache[i -1][0]) #current sum is store there
maxset = 0
for i in range(0, size): #I get the max sum
if(cache[i][1] > maxset):
maxset = cache[i][1]
for i in range(0, size): #I get the first element here
if(cache[i][1] == maxset):
temp = cache[i][2]
count = 0
for i in range(0, size): # I check at what index in the nums array the index 'temp' is store
if(nums[i] != temp):
count += 1
if(size - 1 == count): #iterate through the nums array to apend the non-adjacent elements
if(count % 2 == 0):
for i in range(0, size):
if i % 2 == 0 and i < size:
list.append(nums[i])
else:
for i in range(0, size):
if i % 2 != 0 and i < size:
list.append(nums[i])
list[:]= [item for item in list if item >= 0]
return list
if __name__ == '__main__':
A = [7,2,-3,5,-4,8,6,3,1]
B = [7,2,5,8,6]
C = [-2,3,1,10,3,-7]
'''
Also, I came up with the idea to create another array to store the elements that added to the max sum, but I don't know how to do that.
Any guidance would be appreciated and thanks beforehand!
Probably not the best solution, but what about trying recursion?
tests = [([7, 2, -3, 5, -4, 8, 6, 3, 1], [7, 5, 8, 3]),
([7, 2, 5, 8, 6], [7, 5, 6]),
([-2, 3, 1, 10, 3, -7], [3, 10]),
([7, 2, 9, 10, 1], [7, 9, 1]),
([7, 2, 5, 18, 6], [7, 18]),
([7, 20, -3, -5, -4, 8, 60, 3, 1], [20, 60, 1]),
([-7, -20, -3, 5, -4, 8, 60, 3, 1], [5, 60, 1])]
def bigest(arr, cache, final=[0]):
if len(arr) == 0:
return cache
for i in range(len(arr)):
result = bigest(arr[i + 2:], [*cache, arr[i]], final)
if sum(cache) > sum(final):
final[:] = cache[:]
if sum(result) > sum(final):
final[:] = result[:]
return result
if __name__ == "__main__":
print("has started")
for test, answer in tests:
final = [0]
bigest(test, [], final)
assert final == answer, "not matching"
print(f"for {test} , answer: {final} ")
Here is a dynamic programming approach.
def best_skips (data):
answers = []
for i in range(len(data)):
x = data[i]
best = [0, None]
for prev in answers[0:i-1]:
if best[0] < prev[0]:
best = prev
max_sum, path = best
answers.append([max_sum + x, [x, path]])
answers.append([0, None]) # Add empty set as an answer.
path = max(answers)[1]
final = []
while path is not None:
final.append(path[0])
path = path[1]
return final
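For comparison, here is a sketch of the usual linear-time formulation: a suffix table of best sums, then a forward walk to recover which elements were taken. The names max_non_adjacent and best are made up, and it assumes that an empty selection (sum 0) is allowed when every number is negative.

```python
def max_non_adjacent(nums):
    """Return the chosen elements, in order, of a maximum-sum selection with no two adjacent."""
    n = len(nums)
    # best[i] = largest sum obtainable from nums[i:]; the empty selection counts as 0
    best = [0] * (n + 2)
    for i in range(n - 1, -1, -1):
        best[i] = max(best[i + 1], nums[i] + best[i + 2])
    chosen = []
    i = 0
    while i < n:
        # Take nums[i] only if doing so is strictly better than skipping it.
        if nums[i] + best[i + 2] > best[i + 1]:
            chosen.append(nums[i])
            i += 2
        else:
            i += 1
    return chosen

print(max_non_adjacent([7, 2, -3, 5, -4, 8, 6, 3, 1]))  # [7, 5, 8, 3]
print(max_non_adjacent([7, 2, 5, 8, 6]))                # [7, 5, 6]
print(max_non_adjacent([-2, 3, 1, 10, 3, -7]))          # [3, 10]
```

When several selections tie for the best sum, this walk returns only one of them.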
I have a list of lists and I want to remove zero values that are between numbers in each list. All the lists inside my list have the same length.
For example:
List1=[[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
desired output:
list2=[[0,1,2,3,0,0],[0,5,6,9,0]]
I was thinking about using indices to identify the first non zero value and last non zero value, but then I don't know how I can remove zeros between them.
You have the right idea, I think, with finding the first and last indices of nonzeroes and removing zeroes between them. Here's a function that does that:
def remove_enclosed_zeroes(lst):
try:
first_nonzero = next(
i
for (i, e) in enumerate(lst)
if e != 0
)
last_nonzero = next(
len(lst) - i - 1
for (i, e) in enumerate(reversed(lst))
if e != 0
)
except StopIteration:
return lst[:]
return lst[:first_nonzero] + \
[e for e in lst[first_nonzero:last_nonzero] if e != 0] + \
lst[last_nonzero:]
list1 = [[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
list2 = [remove_enclosed_zeroes(sublist) for sublist in list1]
# [[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
Inspired by @python_user I thought about this a bit more and came up with this simpler solution:
def remove_internal_zeros(lst):
return [v for i, v in enumerate(lst) if v or not any(lst[i+1:]) or not any(lst[:i])]
This works by passing any value from the original list which is either
not zero (v); or
zero and not preceded by a non-zero value (not any(lst[:i])); or
zero and not followed by a non-zero value (not any(lst[i+1:]))
It can also be applied to every sublist at once with a nested list comprehension:
list2 = [[v for i, v in enumerate(lst) if v or not any(lst[:i]) or not any(lst[i+1:])] for lst in list1]
Original Answer
Here's another brute force approach, this pops all the zeros off either end of the list into start and end lists, then filters the balance of the list for non-zero values:
def remove_internal_zeros(l):
start_zeros = []
# get starting zeros
v = l.pop(0)
while v == 0 and len(l) > 0:
start_zeros.append(0)
v = l.pop(0)
if len(l) == 0:
return start_zeros + [v]
l = [v] + l
# get ending zeros
end_zeros = []
v = l.pop()
while v == 0 and len(l) > 0:
end_zeros.append(0)
v = l.pop()
# filter balance of list
if len(l) == 0:
return start_zeros + [v] + end_zeros
return start_zeros + list(filter(bool, l)) + [v] + end_zeros
print(remove_internal_zeros([0,1,0,2,3,0,0]))
print(remove_internal_zeros([0,5,6,0,0,9,0]))
print(remove_internal_zeros([0,0]))
print(remove_internal_zeros([0,5,0]))
Output:
[0, 1, 2, 3, 0, 0]
[0, 5, 6, 9, 0]
[0, 0]
[0, 5, 0]
I think this has to be done with brute force.
new = []
for sub in List1:
# Find last non-zero.
for j in range(len(sub)):
if sub[-1-j]:
lastnonzero = len(sub)-j
break
print(j)
newsub = []
firstnonzero = False
for i,j in enumerate(sub):
if j:
firstnonzero = True
newsub.append(j)
elif i >= lastnonzero or not firstnonzero:
newsub.append(j)
new.append(newsub)
print(new)
Please try this; it removes all zeros between numbers in each list:
list1=[[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
rowIndex=len(list1) # count of rows
colIndex=len(list1[0]) # count of columns
for i in range(0, rowIndex):
noZeroFirstIndex = 1
noZeroLastIndex = colIndex - 2
for j in range(1, colIndex - 1):
if(list1[i][j] != 0):
noZeroFirstIndex = j
break
for j in range(colIndex -2, 0, -1):
if(list1[i][j] != 0):
noZeroLastIndex = j
break
for j in range(noZeroLastIndex, noZeroFirstIndex, -1):
if(list1[i][j] == 0 ):
del list1[i][j]
print(list1)
Result:
[[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
I wrote a pretty straight-forward approach. Try this.
def removeInnerZeroes(list):
listHold=[]
listNew = []
firstNonZeroFound = False
for item in list:
if item==0:
if firstNonZeroFound:
listHold.append(item)
else:
listNew.append(item)
else:
firstNonZeroFound=True
listHold.clear()
listNew.append(item)
listNew.extend(listHold)
return listNew
complexList = [[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
print(complexList)
complexListNew = []
for listi in complexList:
complexListNew.append(removeInnerZeroes(listi))
print(complexListNew)
One way to do it is to treat each sublist as 3 sections:
Zeros at the front, if any
Zeros at the end, if any
Numbers in the middle from which zeros are to be purged
itertools.takewhile is handy for the front and end bits.
from itertools import takewhile
List1=[[0,1,0,2,3,0,0],[0,5,6,0,0,9,0]]
def purge_middle_zeros(numbers):
is_zero = lambda x: x==0
leading_zeros = list(takewhile(is_zero, numbers))
n_lead = len(leading_zeros)
trailing_zeros = list(takewhile(is_zero, reversed(numbers[n_lead:])))
n_trail = len(trailing_zeros)
mid_numbers = numbers[n_lead:-n_trail] if n_trail else numbers[n_lead:]
mid_non_zeros = [x for x in mid_numbers if x]
return leading_zeros + mid_non_zeros + trailing_zeros
list2 = [purge_middle_zeros(sub_list) for sub_list in List1]
list2
[[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
Other notes:
the lambda function is_zero tells takewhile what the criteria are for continuing, in this case "keep taking while it's a zero"
for the mid_non_zeros section the list comprehension [x for.... ] takes all the numbers except for the zeros (the if x at the end applies the filter)
slicing notation to pick out the middle of the list, numbers[from_start:-from_end] with the negative -from_end meaning 'except for this many elements at the end'. The case where there are no trailing zeros requires a different slice expression, i.e. numbers[from_start:]
I wrote the following code out of curiosity, that is somewhat intuitive and mimics a "looking item-by-item" approach.
def remove_zeros_inbetween(list_):
new_list = list_.copy()
for j, l in enumerate(list_): # loop through the inner lists
checking = False
start = end = None
i = 0
deleted = 0
while i < len(l): # loop through the values of an inner list
if l[i] == 0: # ignore
i += 1
continue
if l[i] != 0 and not checking: # non-zero value found
checking = True # start checking for zeros
start = i
elif l[i] != 0 and checking: # if we got here while checking, then finish checking
checking = False
end = i
if start and end: # if both values have been set, i.e, different to None
# delete values in-between
new_list[j] = new_list[j][:(start+1-deleted)] + new_list[j][(end-deleted):]
deleted += end - start - 1
if l[i] != 0: # for the case of two non-zero values
start = i
checking = True
else:
i = end # ignore everything up to end
end = None # restart check
i += 1
return new_list
>>> remove_zeros_inbetween([[0, 1, 0, 2, 3, 0, 5], [0, 5, 6, 0, 0, 9, 4]])
[[0, 1, 2, 3, 5], [0, 5, 6, 9, 4]]
>>> remove_zeros_inbetween([[0, 0], [0, 3, 0], [0]])
[[0, 0], [0, 3, 0], [0]]
>>> remove_zeros_inbetween([[0, 0, 0, 0]])
[[0, 0, 0, 0]]
You start by replacing 0 by "0" - which is not necessary. Secondly your filter call does not save the resulting list; try:
list1[i] = list(filter(lambda a: a !=0, list1[1:-1])) # changed indexing , I suppose this could work
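To tie the index idea from the question back together, here is a compact sketch: locate the first and last non-zero positions, then filter only the slice between them. The name strip_interior_zeros is hypothetical, and it returns a new list instead of modifying the input.

```python
def strip_interior_zeros(row):
    """Drop zeros that lie between the first and last non-zero values of row."""
    nonzero = [i for i, v in enumerate(row) if v != 0]
    if not nonzero:
        return list(row)  # all zeros (or empty): nothing to remove
    first, last = nonzero[0], nonzero[-1]
    middle = [v for v in row[first:last + 1] if v != 0]
    return list(row[:first]) + middle + list(row[last + 1:])

list1 = [[0, 1, 0, 2, 3, 0, 0], [0, 5, 6, 0, 0, 9, 0]]
print([strip_interior_zeros(row) for row in list1])
# [[0, 1, 2, 3, 0, 0], [0, 5, 6, 9, 0]]
```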
Let's say that I have an array of number S = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1]. I want to divide into three arrays. The order of the number and the number of item in those array does not matter.
Let's say A1, A2, and A3 are the sub arrays. I want to minimize the function
f(x) = ( SUM(A1) - SUM(S) / 3 )^2 / 3 +
( SUM(A2) - SUM(S) / 3 )^2 / 3 +
( SUM(A3) - SUM(S) / 3 )^2 / 3
I don't need an optimal solution; I just need the solution that is good enough.
I don't want an algorithm that is too slow. I can trade some speed for a better result, but I cannot trade too much.
The length of S is around 10 to 30.
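To make the objective concrete, here is a small sketch that evaluates f for a candidate split. The helper name imbalance is made up for illustration, and it assumes the three sub-arrays together contain exactly the elements of S.

```python
def imbalance(parts):
    """Mean squared deviation of the three part sums from one third of the total."""
    target = sum(sum(part) for part in parts) / 3  # SUM(S) / 3
    return sum((sum(part) - target) ** 2 for part in parts) / 3

# One possible split of S = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1] with part sums 14, 14, 13
print(round(imbalance([[9, 3, 1, 1], [7, 4, 3], [6, 5, 2]]), 4))  # 0.2222
```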
Why
Why do I need to solve this problem? I want to nicely arrange the box into three columns such that the total height of each columns is not too different from each other.
What have I tried
My first instinct is to use greedy. The result is not that bad, but it does not ensure an optimal solution. Is there a better way?
s = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1]
s = sorted(s, reverse=True)
a = [[], [], []]
sum_a = [0, 0, 0]
for x in s:
i = sum_a.index(min(sum_a))
sum_a[i] += x
a[i].append(x)
print(a)
As you said you don't mind a non-optimal solution, I thought I would re-use your initial function and add a way to find a good starting arrangement for your initial list s.
Your initial function:
def pigeon_hole(s):
a = [[], [], []]
sum_a = [0, 0, 0]
for x in s:
i = sum_a.index(min(sum_a))
sum_a[i] += x
a[i].append(x)
return map(sum, a)
This is a way to find a sensible initial ordering for your list, it works by creating rotations of your list in sorted and reverse sorted order. The best rotation is found by minimizing the standard deviation, once the list has been pigeon holed:
def rotate(l):
l = sorted(l)
lr = l[::-1]
rotation = [np.roll(l, i) for i in range(len(l))] + [np.roll(lr, i) for i in range(len(l))]
blocks = [pigeon_hole(i) for i in rotation]
return rotation[np.argmin(np.std(blocks, axis=1))] # the best rotation
import random
print pigeon_hole(rotate([random.randint(0, 20) for i in range(20)]))
# Testing with some random numbers, these are the sums of the three sub lists
>>> [64, 63, 63]
Although this could be optimized further, it is quite quick, taking 0.0013s for 20 numbers. Doing a quick comparison with @Mo Tao's answer, using a = rotate(range(1, 30)):
# This method
a = rotate(range(1, 30))
>>> [[29, 24, 23, 18, 17, 12, 11, 6, 5], [28, 25, 22, 19, 16, 13, 10, 7, 4, 1], [27, 26, 21, 20, 15, 14, 9, 8, 3, 2]]
map(sum, a)
# Sum's to [145, 145, 145] in 0.002s
# Mo Tao's method
>>> [[25, 15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [29, 26, 20, 19, 18, 17, 16], [28, 27, 24, 23, 22, 21]]
# Sum's to [145, 145, 145] in 1.095s
This method also seems to find the optimal solution in many cases, although this probably won't hold for all cases. Testing this implementation 500 times using a list of 30 numbers against Mo Tao's answer, and comparing whether the sub-lists sum to the same quantities:
c = 0
for i in range(500):
r = [random.randint(1, 10) for j in range(30)]
res = pigeon_hole(rotate(r))
d, e = sorted(res), sorted(tao(r)) # Comparing this to the optimal solution by Mo Tao
if all(k == kk for k, kk in zip(d, e)): # element-wise comparison of the sorted sums
c += 1
>>> 500 # (they do)
I thought I would provide an update with a more optimized version of my function here:
def rotate2(l):
# Calculate an acceptable minimum stdev of the pigeon holed list
if sum(l) % 3 == 0:
std = 0
else:
std = np.std([0, 0, 1])
l = sorted(l, reverse=True)
best_rotation = None
best_std = 100
for i in range(len(l)):
rotation = np.roll(l, i)
sd = np.std(pigeon_hole(rotation))
if sd == std:
return rotation # If a min stdev is found
elif sd < best_std:
best_std = sd
best_rotation = rotation
return best_rotation
The main change is that the search for a good ordering stops once a suitable rotation has been found. Also, only the reverse sorted list is searched, which doesn't appear to alter the result. Timing this with
print timeit.timeit("rotate2([random.randint(1, 10) for i in range(30)])", "from __main__ import rotate2, random", number=1000) / 1000.
results in a large speed up. On my current computer rotate takes about 1.84ms and rotate2 takes about 0.13ms, so about a 14x speed-up. For comparison גלעד ברקן 's implementation took about 0.99ms on my machine.
As I mentioned in a comment on the question, this is the straightforward dynamic programming method. It takes less than 1 second for s = range(1, 30) and gives an optimal solution.
I think the code is self-explanatory if you know memoization.
s = range(1, 30)
# s = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1]
n = len(s)
memory = {}
best_f = pow(sum(s), 3)
best_state = None
def search(state, pre_state):
global memory, best_f, best_state
s1, s2, s3, i = state
f = s1 * s1 + s2 * s2 + s3 * s3
if state in memory or f >= best_f:
return
memory[state] = pre_state
if i == n:
best_f = f
best_state = state
else:
search((s1 + s[i], s2, s3, i + 1), state)
search((s1, s2 + s[i], s3, i + 1), state)
search((s1, s2, s3 + s[i], i + 1), state)
search((0, 0, 0, 0), None)
a = [[], [], []]
state = best_state
while state[3] > 0:
pre_state = memory[state]
for j in range(3):
if state[j] != pre_state[j]:
a[j].append(s[pre_state[3]])
state = pre_state
print a
print best_f, best_state, map(sum, a)
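For short lists, a result like this can be cross-checked by plain enumeration. This is only a sanity-check sketch (the name brute_force_split is made up); it tries on the order of 3^n assignments, so it is usable for about 10 items and hopeless for 30.

```python
from itertools import product

def brute_force_split(values):
    """Try every assignment of values to three buckets; return (cost, buckets)."""
    values = list(values)
    if not values:
        return 0.0, [[], [], []]
    target = sum(values) / 3
    best_cost, best_assignment = None, None
    # Fixing the first value in bucket 0 skips assignments that only relabel the buckets.
    for rest in product(range(3), repeat=len(values) - 1):
        assignment = (0,) + rest
        sums = [0, 0, 0]
        for value, bucket in zip(values, assignment):
            sums[bucket] += value
        cost = sum((s - target) ** 2 for s in sums) / 3
        if best_cost is None or cost < best_cost:
            best_cost, best_assignment = cost, assignment
    buckets = [[], [], []]
    for value, bucket in zip(values, best_assignment):
        buckets[bucket].append(value)
    return best_cost, buckets

cost, buckets = brute_force_split([6, 2, 1, 7, 4, 3, 9, 5, 3, 1])
print(sorted(map(sum, buckets)), round(cost, 4))  # [13, 14, 14] 0.2222
```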
We can examine how stable the solution you found is with respect to moving or swapping elements between the lists. My code is below. Whenever a move or swap improves the target function, we keep the new lists and continue, hoping that another move or swap will improve the function again. As the starting point we can take your solution. The final result will be something like a local minimum.
from copy import deepcopy
s = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1]
s = sorted(s, reverse=True)
a = [[], [], []]
sum_a = [0, 0, 0]
for x in s:
i = sum_a.index(min(sum_a))
sum_a[i] += x
a[i].append(x)
def f(a):
return ((sum(a[0]) - sum(s) / 3.0)**2 + (sum(a[1]) - sum(s) / 3.0)**2 + (sum(a[2]) - sum(s) / 3.0)**2) / 3
fa = f(a)
while True:
modified = False
# placing
for i_from, i_to in [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]:
for j in range(len(a[i_from])):
a_new = deepcopy(a)
a_new[i_to].append(a_new[i_from][j])
del a_new[i_from][j]
fa_new = f(a_new)
if fa_new < fa:
a = a_new
fa = fa_new
modified = True
break
if modified:
break
# replacing
for i_from, i_to in [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]:
for j_from in range(len(a[i_from])):
for j_to in range(len(a[i_to])):
a_new = deepcopy(a)
a_new[i_to].append(a_new[i_from][j_from])
a_new[i_from].append(a_new[i_to][j_to])
del a_new[i_from][j_from]
del a_new[i_to][j_to]
fa_new = f(a_new)
if fa_new < fa:
a = a_new
fa = fa_new
modified = True
break
if modified:
break
if modified:
break
if not modified:
break
print(a, f(a)) # [[9, 3, 1, 1], [7, 4, 3], [6, 5, 2]] 0.2222222222222222222
It's interesting that this approach works well even if we start with arbitrary a:
from copy import deepcopy
s = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1]
def f(a):
return ((sum(a[0]) - sum(s) / 3.0)**2 + (sum(a[1]) - sum(s) / 3.0)**2 + (sum(a[2]) - sum(s) / 3.0)**2) / 3
a = [s, [], []]
fa = f(a)
while True:
modified = False
# placing
for i_from, i_to in [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]:
for j in range(len(a[i_from])):
a_new = deepcopy(a)
a_new[i_to].append(a_new[i_from][j])
del a_new[i_from][j]
fa_new = f(a_new)
if fa_new < fa:
a = a_new
fa = fa_new
modified = True
break
if modified:
break
# replacing
for i_from, i_to in [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]:
for j_from in range(len(a[i_from])):
for j_to in range(len(a[i_to])):
a_new = deepcopy(a)
a_new[i_to].append(a_new[i_from][j_from])
a_new[i_from].append(a_new[i_to][j_to])
del a_new[i_from][j_from]
del a_new[i_to][j_to]
fa_new = f(a_new)
if fa_new < fa:
a = a_new
fa = fa_new
modified = True
break
if modified:
break
if modified:
break
if not modified:
break
print(a, f(a)) # [[3, 9, 2], [6, 7], [4, 3, 1, 1, 5]] 0.2222222222222222222
It provides a different result but the same value of the function.
I would say that your greedy function does produce good results, but it tends to become very slow if the input size is large, say more than 100.
But you've said that your input size is fixed in the range 10 to 30, so the greedy solution is actually quite good. Instead of being greedy from the very beginning, I propose being a bit lazy at first and becoming greedy at the end.
Here is an altered function, lazy:
def lazy(s):
k = (len(s)//3-2)*3 #slice limit
s.sort(reverse=True)
#Perform limited extended slicing
a = [s[1:k:3],s[2:k:3],s[:k:3]]
sum_a = list(map(sum,a))
for x in s[k:]:
i = sum_a.index(min(sum_a))
sum_a[i] += x
a[i].append(x)
return a
What it does is first sort the input in descending order and fill the three sub-lists one item at a time until about 6 items are left. (You can change this limit and test, but for sizes 10-30 I think this is the best.)
When that is done, it simply continues with the greedy approach. This method takes very little time and is more accurate than the greedy solution on average.
Here is a line plot of size versus time -
and size versus accuracy -
Accuracy is the standard deviation from the mean of final sub-lists and the original list. Because you want the columns to stack up at almost similar height and not at the (mean of the original list) height.
Also, the range of item value is between 3-15 so that sum is around 100-150 as you mentioned.
These are the test functions -
def test_accuracy():
rsd = lambda s:round(math.sqrt(sum([(sum(s)//3-y)**2 for y in s])/3),4)
sm = lambda s:list(map(sum,s))
N=[i for i in range(10,30)]
ST=[]
MT=[]
for n in N:
case = [r(3,15) for x in range(n)]
ST.append(rsd(sm(lazy(case))))
MT.append(rsd(sm(pigeon(case))))
strace = go.Scatter(x=N,y=ST,name='Lazy pigeon')
mtrace = go.Scatter(x=N,y=MT,name='Pigeon')
data = [strace,mtrace]
layout = go.Layout(
title='Uniform distribution in 3 sublists',
xaxis=dict(title='List size',),
yaxis=dict(title='Accuracy - Standard deviation',))
fig = go.Figure(data=data, layout=layout)
plotly.offline.plot(fig,filename='N vs A2.html')
def test_timings():
N=[i for i in range(10,30)]
ST=[]
MT=[]
for n in N:
case = [r(3,15) for x in range(n)]
start=time.clock()
lazy(case)
ST.append(time.clock()-start)
start=time.clock()
pigeon(case)
MT.append(time.clock()-start)
strace = go.Scatter(x=N,y=ST,name='Lazy pigeon')
mtrace = go.Scatter(x=N,y=MT,name='Pigeon')
data = [strace,mtrace]
layout = go.Layout(
title='Uniform distribution in 3 sublists',
xaxis=dict(title='List size',),
yaxis=dict(title='Time (seconds)',))
fig = go.Figure(data=data, layout=layout)
plotly.offline.plot(fig,filename='N vs T2.html')
Here is the complete file.
Edit -
I tested kezzos's answer for accuracy and it performed really well. The deviation stayed below 0.8 all the time.
Average standard deviation in 100 runs.
Lazy Pigeon    Pigeon       Rotation
1.10668795     1.1573573    0.54776425
In terms of speed, the rotation function is about two orders of magnitude slower than the other two. But a running time on the order of 10^-3 seconds is fine unless you want to run that function repeatedly.
Lazy Pigeon     Pigeon          Rotation
5.384013e-05    5.930269e-05    0.004980
Here is bar chart comparing accuracy of all three functions. -
All in all, kezzos's solution is the best if you are fine with the speed.
Html files of plotly - versus time,versus accuracy and the bar chart.
Here's my nutty implementation of Korf's [1] Sequential Number Partitioning (SNP), but it only uses Karmarkar–Karp rather than Complete Karmarkar–Karp for the two-way partition (I've included an unused, somewhat unsatisfying version of CKK - perhaps someone has a suggestion to make it more efficient?).
On the first subset, it places lower and upper bounds. See the referenced article. I'm sure more efficient implementations can be made. Edit MAX_ITERATIONS for better results versus longer wait :)
By the way, the function, KK3 (extension of Karmarkar–Karp to three-way partition, used to compute the first lower bound), seems pretty good by itself.
from random import randint
from collections import Counter
from bisect import insort
from time import time
def KK3(s):
s = list(map(lambda x: (x,0,0,[],[],[x]),sorted(s)))
while len(s) > 1:
large = s.pop()
small = s.pop()
combined = sorted([large[0] + small[2], large[1] + small[1],
large[2] + small[0]],reverse=True)
combined = list(map(lambda x: x - combined[2],combined))
combined = combined + sorted((large[3] + small[5], large[4] +
small[4], large[5] + small[3]),key = sum)
insort(s,tuple(combined))
return s
#s = [6, 2, 1, 7, 4, 3, 9, 5, 3, 1]
s = [randint(0,100) for r in range(0,30)]
# global variables
s = sorted(s,reverse=True)
sum_s = sum(s)
upper_bound = sum_s // 3
lower_bound = sum(KK3(s)[0][3])
best = (sum_s,([],[],[]))
iterations = 0
MAX_ITERATIONS = 10000
def partition(i, accum):
global lower_bound, best, iterations
sum_accum = sum(accum)
if sum_accum > upper_bound or iterations > MAX_ITERATIONS:
return
iterations = iterations + 1
if sum_accum >= lower_bound:
rest = KK(diff(s,accum))[0]
new_diff = sum(rest[1]) - sum_accum
if new_diff < best[0]:
best = (new_diff,(accum,rest[1],rest[2]))
lower_bound = (sum_s - 2 * new_diff) // 3
print("lower_bound: " + str(lower_bound))
if not best[0] in [0,1] and i < len(s) - 1 and sum(accum) + sum(s[i
+ 1:]) > lower_bound:
_accum = accum[:]
partition(i + 1, _accum + [s[i]])
partition(i + 1, accum)
def diff(l1,l2):
return list((Counter(l1) - Counter(l2)).elements())
def KK(s):
s = list(map(lambda x: (x,[x],[]),sorted(s)))
while len(s) > 1:
large = s.pop()
small = s.pop()
insort(s,(large[0] - small[0],large[1] + small[2],large[2] + small[1]))
return s
print(s)
start_time = time()
partition(0,[])
print(best)
print("iterations: " + str(iterations))
print("--- %s seconds ---" % (time() - start_time))
[1] Richard E. Korf, Multi-Way Number Partitioning, Computer Science Department, University of California, Los Angeles; aaai.org/ocs/index.php/IJCAI/IJCAI-09/paper/viewFile/625/705
For instance, if I have a list
[1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
This algorithm should return [1,2,3,4,5,6,7,8,9,10,11].
To clarify, the longest list should run forwards. I was wondering what is an algorithmically efficient way to do this (preferably not O(n^2))?
Also, I'm open to a solution not in python since the algorithm is what matters.
Thank you.
Here is a simple one-pass O(n) solution:
s = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11,42]
maxrun = -1
rl = {}
for x in s:
run = rl[x] = rl.get(x-1, 0) + 1
print x-run+1, 'to', x
if run > maxrun:
maxend, maxrun = x, run
print range(maxend-maxrun+1, maxend+1)
The logic may be a little more self-evident if you think in terms of ranges instead of individual variables for the endpoint and run length:
rl = {}
best_range = xrange(0)
for x in s:
run = rl[x] = rl.get(x-1, 0) + 1
r = xrange(x-run+1, x+1)
if len(r) > len(best_range):
best_range = r
print list(best_range)
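The snippets above are written for Python 2 (print statement, xrange). Here is a sketch of the same one-pass idea for Python 3, wrapped in a function; the name longest_consecutive_run is made up for illustration.

```python
def longest_consecutive_run(seq):
    """Longest run x, x+1, x+2, ... that appears in seq in order (as a subsequence)."""
    run_length = {}  # run_length[x] = length of the latest run ending in value x
    best_end, best_len = None, 0
    for x in seq:
        length = run_length[x] = run_length.get(x - 1, 0) + 1
        if length > best_len:
            best_end, best_len = x, length
    if best_end is None:
        return []
    return list(range(best_end - best_len + 1, best_end + 1))

s = [1, 4, 2, 3, 5, 4, 5, 6, 7, 8, 1, 3, 4, 5, 9, 10, 11]
print(longest_consecutive_run(s))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Each element costs one dictionary lookup and one store, so the pass is O(n) on average.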
Not that clever, not O(n), could use a bit of optimization. But it works.
def longest(seq):
    result = []
    for v in seq:
        for l in result:
            if v == l[-1] + 1:
                l.append(v)
        else:
            result.append([v])
    return max(result, key=len)
You can use The Patience Sort implementation of the Largest Ascending Sub-sequence Algorithm
import bisect
def LargAscSub(seq):
deck = []
for x in seq:
newDeck = [x]
i = bisect.bisect_left(deck, newDeck)
deck[i].insert(0, x) if i != len(deck) else deck.append(newDeck)
return [p[0] for p in deck]
And here is the Test results
>>> LargAscSub([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11])
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>> LargAscSub([1, 2, 3, 11, 12, 13, 14])
[1, 2, 3, 11, 12, 13, 14]
>>> LargAscSub([11,12,13,14])
[11, 12, 13, 14]
The Order of Complexity is O(nlogn)
A note in the linked wiki article claims that you can achieve O(n log log n) by relying on a van Emde Boas tree.
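If you need the actual subsequence, not just its length, the tops of the piles are not guaranteed to form one. For [2, 3, 1] the pile tops are 1 and 3, but 1 comes after 3 in the input. Here is a sketch of the usual O(n log n) variant that keeps back-pointers; the function name is made up. It finds a longest strictly increasing subsequence, which is not necessarily made of consecutive integers.

```python
from bisect import bisect_left

def longest_increasing_subsequence(seq):
    """Return one longest strictly increasing subsequence of seq."""
    tails = []      # tails[k] = smallest tail value of an increasing subsequence of length k + 1
    tails_idx = []  # position in seq of that tail
    prev = [None] * len(seq)  # back-pointer to the previous element in the subsequence
    for i, x in enumerate(seq):
        k = bisect_left(tails, x)
        if k == len(tails):
            tails.append(x)
            tails_idx.append(i)
        else:
            tails[k] = x
            tails_idx[k] = i
        prev[i] = tails_idx[k - 1] if k > 0 else None
    # Walk the back-pointers from the tail of the longest subsequence.
    result = []
    i = tails_idx[-1] if tails_idx else None
    while i is not None:
        result.append(seq[i])
        i = prev[i]
    return result[::-1]

print(longest_increasing_subsequence([2, 3, 1]))  # [2, 3]
```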
How about using a modified radix sort? As JanneKarila pointed out, the solution is not O(n). It uses radix sort, whose efficiency, according to Wikipedia, is O(k·n) for n keys which have k or fewer digits.
This will only work if you know the range of numbers that we're dealing with so that will be the first step.
Look at each element in starting list to find lowest, l and highest, h number. In this case l is 1 and h is 11. Note, if you already know the range for some reason, you can skip this step.
Create a result list the size of our range and set each element to null.
Look at each element in the list and add it to the result list at the appropriate place if needed, i.e., if the element is a 4, add a 4 to the result list at position 4: result[element] = element. You can throw out duplicates if you want; they'll just be overwritten.
Go through the result list to find the longest sequence without any null values. Keep an element_counter to know what element in the result list we're looking at. Keep a curr_start_element set to the beginning element of the current sequence and keep a curr_len of how long the current sequence is. Also keep a longest_start_element and a longest_len, which will start out as zero and be updated as we move through the list.
Return the result list starting at longest_start_element and taking longest_len
EDIT: Code added. Tested and working
# note: this doesn't work with negative numbers
# it's certainly possible to write this to work with negatives,
# but the code is a bit hairier
def findLongestSequence(lst):
    # step 1: find the highest value (-1 for an empty list)
    high = max(lst) if lst else -1
    # step 2: one slot per value in 0..high
    result = [None] * (high + 1)
    # step 3: mark every value that occurs
    for num in lst:
        result[num] = num
    # step 4: scan for the longest stretch without a None
    curr_start_element = -1
    curr_len = 0
    longest_start_element = 0
    longest_len = 0
    for element_counter in range(len(result)):
        if result[element_counter] is None:
            if curr_len > longest_len:
                longest_start_element = curr_start_element
                longest_len = curr_len
            curr_len = 0
            curr_start_element = -1
        else:
            if curr_start_element == -1:
                curr_start_element = element_counter
            curr_len += 1
    # just in case the last element makes the longest
    if curr_len > longest_len:
        longest_start_element = curr_start_element
        longest_len = curr_len
    # step 5
    return result[longest_start_element:longest_start_element + longest_len]
If the result really does have to be a sub-sequence of consecutive ascending integers, rather than merely ascending integers, then there's no need to remember each entire consecutive sub-sequence until you determine which is the longest; you need only remember the starting and ending values of each sub-sequence. So you could do something like this:
import collections

def longestConsecutiveSequence(sequence):
    # map starting values to largest ending value so far
    map = collections.OrderedDict()
    for i in sequence:
        found = False
        for k, v in map.iteritems():
            if i == v:
                map[k] += 1
                found = True
        if not found and i not in map:
            map[i] = i + 1
    return xrange(*max(map.iteritems(), key=lambda i: i[1] - i[0]))
If I run this on the original sample data (i.e. [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]) I get:
>>> print list(longestConsecutiveSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
If I run it on one of Abhijit's samples [1,2,3,11,12,13,14], I get:
>>> print list(longestConsecutiveSequence([1,2,3,11,12,13,14]))
[11, 12, 13, 14]
Regrettably, this algorithm is O(n*n) in the worst case.
Warning: This is the cheaty way to do it (aka I use python...)
import operator as op
import itertools as it

def longestSequence(data):
    longest = []
    # sort the distinct values: set iteration order is not guaranteed,
    # and value - index is only constant within a run when the values are sorted
    for k, g in it.groupby(enumerate(sorted(set(data))), lambda(i, y):i-y):
        thisGroup = map(op.itemgetter(1), g)
        if len(thisGroup) > len(longest):
            longest = thisGroup
    return longest

longestSequence([1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11, 15,15,16,17,25])
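The `lambda(i, y)` tuple-parameter syntax only exists in Python 2. A rough Python 3 sketch of the same grouping trick follows; the name `longest_sequence` is made up here, and the sketch assumes the distinct values should be sorted before grouping. Like the answer above, it ignores the order in which values appear in the input.

```python
from itertools import groupby


def longest_sequence(data):
    """Longest run of consecutive integers among the distinct values of data."""
    longest = []
    indexed = enumerate(sorted(set(data)))
    # In sorted, de-duplicated data, value - index is constant within a run.
    for _, group in groupby(indexed, key=lambda pair: pair[1] - pair[0]):
        run = [value for _, value in group]
        if len(run) > len(longest):
            longest = run
    return longest


print(longest_sequence([1, 4, 2, 3, 5, 4, 5, 6, 7, 8, 1, 3, 4, 5,
                        9, 10, 11, 15, 15, 16, 17, 25]))
```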
You need the maximum contiguous sum (optimal substructure):
def msum2(a):
    bounds, s, t, j = (0,0), -float('infinity'), 0, 0
    for i in range(len(a)):
        t = t + a[i]
        if t > s: bounds, s = (j, i+1), t
        if t < 0: t, j = 0, i+1
    return (s, bounds)
This is an example of dynamic programming and is O(N)
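A small usage sketch may make the return value clearer: the bounds come back as a half-open (start, end) pair, so they can be used directly as a slice. The function is repeated so the example runs on its own, and the sample list `values` is made up for illustration.

```python
def msum2(a):
    # bounds: half-open (start, end) of the best slice; s: best sum so far
    # t: sum of the current candidate slice; j: where that candidate starts
    bounds, s, t, j = (0, 0), -float('infinity'), 0, 0
    for i in range(len(a)):
        t = t + a[i]
        if t > s:
            bounds, s = (j, i + 1), t
        if t < 0:
            # a negative running sum can never help; restart after i
            t, j = 0, i + 1
    return (s, bounds)


values = [-2, 1, -3, 4, -1, 2, 1, -5, 4]  # made-up sample data
total, (start, end) = msum2(values)
print(total, values[start:end])
```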
This O(n) solution works even if the sequence does not start from the first element.
Warning: it does not work if len(A) == 0.
A = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]

def pre_process(A):
    Last = {}
    Arrow = []
    Length = []
    ArgMax = 0
    Max = 0
    for i in xrange(len(A)):
        Arrow.append(i)
        Length.append(0)
        if A[i] - 1 in Last:
            Aux = Last[A[i] - 1]
            Arrow[i] = Aux
            Length[i] = Length[Aux] + 1
        Last[A[i]] = i
        if Length[i] > Max:
            ArgMax = i
            Max = Length[i]
    return (Arrow,ArgMax)

(Arr,Start) = pre_process(A)
Old = Arr[Start]
ToRev = []
while 1:
    ToRev.append(A[Start])
    if Old == Start:
        break
    Start = Old
    New = Arr[Start]
    Old = New
ToRev.reverse()
print ToRev
Pythonizations are welcome!!
Ok, here's yet another attempt in python:
def popper(l):
    listHolders = []
    pos = 0
    # consume the list from the end, building descending runs
    while l:
        appended = False
        item = l.pop()
        for holder in listHolders:
            if item == holder[-1][0]-1:
                appended = True
                holder.append((item, pos))
        if not appended:
            pos += 1
            listHolders.append([(item, pos)])
    longest = []
    for holder in listHolders:
        try:
            if (holder[0][0] < longest[-1][0]) and (holder[0][1] > longest[-1][1]):
                longest.extend(holder)
        except IndexError:
            # longest is still empty
            pass
        if len(holder) > len(longest):
            longest = holder
    longest.reverse()
    return [x[0] for x in longest]
Sample inputs and outputs:
>>> from random import shuffle
>>> demo = list(range(50))
>>> shuffle(demo)
>>> demo
[40, 19, 24, 5, 48, 36, 23, 43, 14, 35, 18, 21, 11, 7, 34, 16, 38, 25, 46, 27, 26, 29, 41, 8, 31, 1, 33, 2, 13, 6, 44, 22, 17,
12, 39, 9, 49, 3, 42, 37, 30, 10, 47, 20, 4, 0, 28, 32, 45, 15]
>>> popper(demo)
[1, 2, 3, 4]
>>> demo = [1,4,2,3,5,4,5,6,7,8,1,3,4,5,9,10,11]
>>> popper(demo)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
>>>
This should do the trick (and is O(n)):
target = 1
result = []
for x in mylist:
    # keep x only if it is the next value we are waiting for
    if x == target:
        result.append(x)
        target += 1
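For any starting number, this works:
def longest_run(mylist):
    # each entry is [next expected value, run elements...]
    result = []
    for x in mylist:
        matched = False
        for y in result:
            if y[0] == x:
                matched = True
                y[0] += 1
                y.append(x)
        if not matched:
            result.append([x+1, x])
    # drop the leading "next expected value" from the longest entry
    return max(result, key=len)[1:]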