Simple Genetic Algorithm meeting local optimum for "Hello World" - python

My goal is simple: use a genetic algorithm to reproduce the classic "Hello, World" string.
My code is based on this post. It has four main parts:
Generate a population of several different individuals.
Define fitness and grade functions that score each individual by comparing it with the target.
Filter the population and keep len(pop)*retain individuals.
Add some other individuals and mutate randomly.
The parents' DNA is then passed on to their children to fill out the rest of the population.
I modified the code, and it now looks like this:
import numpy as np
import string
from operator import add
from random import random, randint
def population(GENSIZE,target):
    p = []
    for i in range(0,GENSIZE):
        individual = [np.random.choice(list(string.printable[:-5])) for j in range(0,len(target))]
        p.append(individual)
    return p
def fitness(source, target):
    fitval = 0
    for i in range(0,len(source)-1):
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)
def grade(pop, target):
    'Find average fitness for a population.'
    summed = reduce(add, (fitness(x, target) for x in pop))
    return summed / (len(pop) * 1.0)
def evolve(pop, target, retain=0.2, random_select=0.05, mutate=0.01):
    graded = [ (fitness(x, target), x) for x in p]
    graded = [ x[1] for x in sorted(graded)]
    retain_length = int(len(graded)*retain)
    parents = graded[:retain_length]
    # randomly add other individuals to
    # promote genetic diversity
    for individual in graded[retain_length:]:
        if random_select > random():
            parents.append(individual)
    # mutate some individuals
    for individual in parents:
        if mutate > random():
            pos_to_mutate = randint(0, len(individual)-1)
            individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
    #
    parents_length = len(parents)
    desired_length = len(pop) - parents_length
    children = []
    while len(children) < desired_length:
        male = randint(0, parents_length-1)
        female = randint(0, parents_length-1)
        if male != female:
            male = parents[male]
            female = parents[female]
            half = len(male) / 2
            child = male[:half] + female[half:]
            children.append(child)
    parents.extend(children)
    return parents
GENSIZE = 40
target = "Hello, World"
p = population(GENSIZE,target)
fitness_history = [grade(p, target),]
for i in xrange(20):
    p = evolve(p, target)
    fitness_history.append(grade(p, target))
    # print p
for datum in fitness_history:
    print datum
But the result doesn't fit the target well.
I tried changing GENSIZE and the number of loop iterations (more generations).
But the result always gets stuck. Sometimes increasing the number of iterations helps find an optimal solution. But when I change it to a much larger number, such as for i in xrange(10000), I get this error:
individual[pos_to_mutate] = chr(ord(individual[pos_to_mutate]) + np.random.randint(-1,1))
ValueError: chr() arg not in range(256)
How can I modify my code to get a good result?
Any advice would be appreciated.

The chr function in Python2 only accepts values in the range 0 <= i < 256.
You are passing:
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
So you need to check that the result of
ord(individual[pos_to_mutate]) + np.random.randint(-1,1)
is not going to be outside that range, and take corrective action before passing to chr if it is outside that range.
EDIT
A reasonable fix for the ValueError might be to take the amended value modulo 256 before passing to chr:
chr((ord(individual[pos_to_mutate]) + np.random.randint(-1, 1)) % 256)
There is another bug: the fitness calculation doesn't take the final element of the candidate list into account: it should be:
def fitness(source, target):
    fitval = 0
    for i in range(0,len(source)): # <- len(source), not len(source) -1
        fitval += (ord(target[i]) - ord(source[i])) ** 2
    return (fitval)
Given that source and target must be of equal length, the function can be written as:
def fitness(source, target):
    return sum((ord(t) - ord(s)) ** 2 for (t, s) in zip(target, source))
The real question was: why doesn't the code provided evolve random strings until the target string is reached?
The answer, I believe, is that it may, but it will take a lot of iterations to do so.
Consider that in the blog post referenced in the question, each iteration generates a child which replaces the least fit member of the gene pool if the child is fitter. The selection of the child's parent is biased towards fitter parents, increasing the likelihood that the child will enter the gene pool and increase the overall "fitness" of the pool. Consequently the members of the gene pool converge on the desired result within a few thousand iterations.
In the code in the question, the probability of mutation is much lower under the initial conditions, that is, the defaults for the evolve function.
Parents that are retained have only a 1% chance of mutating, and half of the time the "mutation" changes nothing: np.random.randint(-1, 1) excludes its upper bound, so it returns only -1 or 0. This also means a character code can only ever move downwards, which is how it eventually falls outside the range chr accepts.
Discarded parents are replaced by individuals created by merging two retained individuals. Since only 20% of parents are retained, the population can converge on a local minimum where each new child is effectively a copy of an existing parent, and so no diversity is introduced.
So apart from fixing the two bugs, the way to converge more quickly on the target is to experiment with the initial conditions and to consider changing the code in the question to inject more diversity, for example by mutating children as in the original blog post, or by extending the range of possible mutations.
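As a rough illustration of "extending the range of possible mutations", here is a minimal Python 3 sketch. The function name mutate, the parameters rate and max_step, and the choice to clamp to printable ASCII (32 to 126) are assumptions made for this example; they are not part of the question's code or of the blog post.

```python
import random

# Assumed bounds: printable ASCII, which covers string.printable[:-5]
LOW, HIGH = 32, 126

def mutate(individual, rate=0.2, max_step=3, rng=random):
    """Return a mutated copy of a list of single characters."""
    child = list(individual)
    for pos, ch in enumerate(child):
        if rng.random() < rate:
            # random.randint includes both ends, so steps go up as well as down
            step = rng.randint(-max_step, max_step)
            # clamp so that chr() always receives a valid, printable code
            code = max(LOW, min(HIGH, ord(ch) + step))
            child[pos] = chr(code)
    return child
```

Applying something like this to each child after crossover, rather than to the retained parents only, is one way to keep diversity in the pool; the rate and step size would need tuning.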


My code suddenly stops writing at beyond 15500 iterations

I'm studying how to code in Python and I'm trying to recreate a code I did in college.
The code is based on a 2D Ising model applied to epidemiology. What it does is:
it constructs a 2D 100x100 array using numpy, and assigns a value of -1 to each element.
The energy is calculated based on the function calc_h in the script below.
Then, the code randomly selects a cell from the lattice, changes the value to 1, then calculates the energy of the system again.
Then the code checks whether the new energy of the system is less than or equal to that of the previous configuration. If it is, the change is "accepted". If it isn't, a probability is compared with a random number to decide whether the change is "accepted". This part is done in the metropolis function.
The code repeats the process using a while loop until the maximum specified iteration, max_iterations.
The code tallies the number of elements with a -1 value (the s variable) and the number of elements with a 1 value (the i variable) in the countSI function. The script appends to a text file every 500 iterations.
THE PROBLEM
I ran the script and, besides taking too long to execute, the tallying stops at 15500. The code doesn't throw any error; it just keeps running. I waited around 3 hours for it to finish, but it still only gets up to 15500 iterations.
I've tried commenting out the block that writes to csv and printing the values instead to watch them as they are produced, and there I see it stops at 15500 again.
I have no idea what's wrong, as it doesn't throw any error and the code doesn't stop.
Here's the whole script. I put a description of what the part does below each block:
import numpy as np
import random as r
import math as m
import csv
init_size = input("Input array size: ")
size = int(init_size)
This part reads the size of the 2D array. For observation purposes, I selected a 100 by 100 lattice.
def check_overflow(indx, size):
    if indx == size - 1:
        return -indx
    else:
        return 1
I use this function for the calc_h function, to initialize a circular boundary condition. Simply put, the edges of the lattice are connected to one another.
def calc_h(pop, J1, size):
    h_sum = 0
    r = 0
    c = 0
    while r < size:
        buffr = check_overflow(r, size)
        while c < size:
            buffc = check_overflow(c, size)
            h_sum = h_sum + J1*pop[r,c] * pop[r,c-1] * pop[r-1,c] * pop[r+buffr,c] * pop[r,c+buffc]
            c = c + 1
        c = 0
        r = r + 1
    return h_sum
This function calculates the energy of the system: for every cell it takes the product of the cell's value and the values of its top, bottom, left and right neighbors, multiplies that by a constant J, and sums the results.
def metropolis(h, h0, T_):
    if h <= h0:
        return 1
    else:
        rand_h = r.random()
        p = m.exp(-(h - h0)/T_)
        if rand_h <= p:
            return 1
        else:
            return 0
This determines whether the change from -1 to 1 is accepted depending on what calc_h gets.
def countSI(pop, sz, iter):
    s = np.count_nonzero(pop == -1)
    i = np.count_nonzero(pop == 1)
    row = [iter, s, i]
    with open('data.txt', 'a') as si_csv:
        tally_data = csv.writer(si_csv)
        tally_data.writerow(row)
        si_csv.seek(0)
This part tallies the number of -1's and 1's in the lattice.
def main():
    J = 1
    T = 4.0
    max_iterations = 150000
    population = np.full((size, size), -1, np.int8) #initialize population array
The 2D array is initialized in population.
    h_0 = calc_h(population, J, size)
    turn = 1
    while turn <= max_iterations:
        inf_x = r.randint(1,size) - 1
        inf_y = r.randint(1,size) - 1
        while population[inf_x,inf_y] == 1:
            inf_x = r.randint(1,size) - 1
            inf_y = r.randint(1,size) - 1
        population[inf_x, inf_y] = 1
        h = calc_h(population, J, size)
        accept_i = metropolis(h,h_0,T)
This is the main loop, where a random cell is selected, and whether the change is accepted or not is determined by the function metropolis.
        if (accept_i == 0):
            population[inf_x, inf_y] = -1
        if turn%500 == 0 :
            countSI(population, size, turn)
The script tallies every 500th iteration.
        turn = turn + 1
        h_0 = h
main()
The expected output is a text file with the tallies of the number of the s and i every 500th iteration. something that looks like this:
500,9736,264
1000,9472,528
1500,9197,803
2000,8913,1087
2500,8611,1389
3000,8292,1708
3500,7968,2032
4000,7643,2357
4500,7312,2688
5000,6960,3040
5500,6613,3387
6000,6257,3743
6500,5913,4087
7000,5570,4430
7500,5212,4788
I have no idea where to start at a solution. At first, I thought it was the writing to csv that's causing the problem, but probing through the print function proves otherwise. I tried to make it as concise as I can.
I hope you guys can help! I really wanna learn this language and start simulating a lot of stuff, and I think this mini project is a great starting step for me.
Thanks a lot!
Answer provided by #randomir in the comments:
Your code is probably wrong. It will block in that nested while loop whenever the number of spins available to flip is smaller than the number of iterations. In your example from the previous comment, the population has 10000 cells and you want to flip 15500 spins. Note that once a spin is flipped up (with 100% probability), it is flipped back down only with a smaller probability, due to Metropolis sampling.
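One way to avoid the blocking loop described in that comment is to choose only among cells that are still -1 and to stop once none are left. The following is a minimal sketch in plain Python 3 rather than numpy; pick_susceptible is a hypothetical helper, the lattice is a small list of lists, and the Metropolis acceptance step is deliberately left out so that the termination logic is easy to see.

```python
import random

def pick_susceptible(lattice, rng=random):
    """Return (row, col) of a random cell equal to -1, or None if none is left."""
    candidates = [(r, c)
                  for r, row in enumerate(lattice)
                  for c, value in enumerate(row)
                  if value == -1]
    if not candidates:
        return None
    return rng.choice(candidates)

size = 3
lattice = [[-1] * size for _ in range(size)]
flips = 0
for turn in range(1, 21):          # more iterations than cells, on purpose
    cell = pick_susceptible(lattice)
    if cell is None:               # every cell is already 1: stop instead of spinning forever
        break
    r, c = cell
    lattice[r][c] = 1              # always accepted here; the real code would apply metropolis()
    flips += 1
```

With a 3 by 3 lattice the loop ends after 9 flips even though 20 iterations were requested, which is the situation that makes the original inner while loop hang.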

genetic algorithm, implementation of generation of new population

I have a problem with my implementation of the generation of a new population. I have a population of matrices, I defined a fitness function, and I need the fitness value to be as low as possible. So I implemented a function that re-creates the population, keeping only the best individuals from the previous one, as follows:
def new_generation(old_generation, real_coords, elit_rate = ELIT, mutation_rate = MUTATION_RATE, half = HALF, mtype = "strong"):
    fit = [fitness(individual, real_coords) for individual in old_generation]
    idx = np.argsort(fit)
    print(idx)
    new_gen = []
    for i in range(n_population):
        if i < ELIT:
            new_gen.append(old_generation[idx[i]])
        else:
            new_gen.append(crossover(old_generation[idx[np.random.randint(0,n_population)]], old_generation[idx[np.random.randint(0,n_population)]]))
    for i in range(n_population):
        if np.random.rand() < mutation_rate:
            if i > ELIT:
                new_gen[i] = mutate(new_gen[i],mtype)
    print("new gen")
    print([fitness(individual, real_coords) for individual in new_gen])
    return new_gen
My problem is that the new generation I get is not really ordered with the first ones with the lowest fitness possible!
(ELIT is 10 and n_population is 100)
I think my problem is in the part where I do:
for i in range(n_population):
    if i < ELIT:
        new_gen.append(old_generation[idx[i]])
because in my head this should guarantee me to have the first individual in new_gen as I desire.
Where am I wrong?
Thank you!
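The question does not show crossover or mutate, so the cause cannot be determined from the code alone. One possibility, stated here purely as an assumption, is that those functions modify their arguments in place, so the elite entries (which are references to the old individuals) get changed after they are appended. The Python 3 sketch below copies every individual before it is reused; next_generation and all of its parameters are hypothetical names for this illustration.

```python
import copy
import random

def next_generation(old_generation, fitness, crossover, mutate,
                    n_elite, mutation_rate, rng=random):
    """Elitist replacement: the first n_elite entries are the best, lowest fitness first."""
    ranked = sorted(old_generation, key=fitness)
    # deep copies, so later in-place changes cannot touch the elites
    new_gen = [copy.deepcopy(ind) for ind in ranked[:n_elite]]
    while len(new_gen) < len(old_generation):
        a, b = rng.choice(ranked), rng.choice(ranked)
        child = crossover(copy.deepcopy(a), copy.deepcopy(b))
        if rng.random() < mutation_rate:
            child = mutate(child)
        new_gen.append(child)
    return new_gen
```

Only the first n_elite entries are guaranteed to be in fitness order; the children that follow are in arbitrary order, so the whole list would have to be sorted again if a fully ordered population is expected.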

Calculate height of an arbitrary (non-binary) tree

I'm currently taking an online data structures course and this is one of the homework assignments; please guide me towards the answer rather than giving me the answer.
The prompt is as follows:
Task. You are given a description of a rooted tree. Your task is to compute and output its height. Recall that the height of a (rooted) tree is the maximum depth of a node, or the maximum distance from a leaf to the root. You are given an arbitrary tree, not necessarily a binary tree.
Input Format. The first line contains the number of nodes n. The second line contains integer numbers from −1 to n−1, the parents of the nodes. If the i-th one of them (0 ≤ i ≤ n−1) is −1, node i is the root; otherwise it is the 0-based index of the parent of the i-th node. It is guaranteed that there is exactly one root. It is guaranteed that the input represents a tree.
Constraints. 1 ≤ n ≤ 10^5.
My current solution works, but is very slow when n > 10^2. Here is my code:
# python3
import sys
import threading
# In Python, the default limit on recursion depth is rather low,
# so raise it here for this problem. Note that to take advantage
# of bigger stack, we have to launch the computation in a new thread.
sys.setrecursionlimit(10**7) # max depth of recursion
threading.stack_size(2**27) # new thread will get stack of such size
# returns all indices of item in seq
def listOfDupes(seq, item):
    start = -1
    locs = []
    while True:
        try:
            loc = seq.index(item, start+1)
        except:
            break
        else:
            locs.append(loc)
            start = loc
    return locs
def compute_height(node, parents):
    if node not in parents:
        return 1
    else:
        return 1 + max(compute_height(i, parents) for i in listOfDupes(parents, node))
def main():
    n = int(input())
    parents = list(map(int, input().split()))
    print(compute_height(parents.index(-1), parents))
# start the thread only after main is defined
threading.Thread(target=main).start()
Example input:
>>> 5
>>> 4 -1 4 1 1
This will yield a solution of 3, because the root is 1, 3 and 4 branch off of 1, then 0 and 2 branch off of 4 which gives this tree a height of 3.
How can I improve this code to get it under the time benchmark of 3 seconds? Also, would this have been easier in another language?
Python will be fine as long as you get the algorithm right. Since you're only looking for guidance, consider:
1) We know the depth of a node as soon as the depth of its parent is known; and
2) We're not interested in the tree's structure, so we can throw irrelevant information away.
The root node pointer has the value -1. Suppose that we replaced its children's pointers to the root node with the value -2, their children's pointers with -3, and so forth. The greatest absolute value of these is the height of the tree.
If we traverse the tree from an arbitrary node N(0), we can stop as soon as we encounter a negative value at node N(k), at which point we can replace each node's value with the value of its parent, less one. That is, N(k-1) = N(k) - 1, N(k-2) = N(k-1) - 1, ..., N(0) = N(1) - 1. As more and more pointers are replaced by their depth, each traversal is more likely to terminate by encountering a node whose depth is already known. In fact, this algorithm takes basically linear time.
So: load your data into an array, start with the first element and traverse the pointers until you encounter a negative value. Build another array of the nodes traversed as you go. When you encounter a negative value, use the second array to replace the original values in the first array with their depth. Do the same with the second element and so forth. Keep track of the greatest depth you've encountered: that's your answer.
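The procedure just described can be sketched roughly as follows in Python 3. This is a variation rather than a literal transcription: instead of overwriting the parent array with negative depths, it keeps depths in a separate list (the names tree_height and depth are made up for this example), which leaves the input untouched.

```python
def tree_height(parents):
    """Height of a tree given as a parent array, where -1 marks the root."""
    depth = [0] * len(parents)      # 0 means "depth not known yet"
    best = 0
    for start in range(len(parents)):
        path = []
        node = start
        # walk up until we pass the root or reach a node whose depth is known
        while node != -1 and depth[node] == 0:
            path.append(node)
            node = parents[node]
        d = 0 if node == -1 else depth[node]
        # unwind the path, assigning depths from the top down
        for visited in reversed(path):
            d += 1
            depth[visited] = d
        best = max(best, depth[start])
    return best
```

Each node is pushed onto a path at most once over the whole run, which is why the total work stays close to linear in n.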
The structure of this question looks like it would be better solved bottom up rather than top down. Your top-down approach spends time seeking, which is unnecessary, e.g.:
def height(tree):
    for n in tree:
        l = 1
        while n != -1:
            l += 1
            n = tree[n]
        yield l
In []:
tree = '4 -1 4 1 1'
max(height(list(map(int, tree.split()))))
Out[]:
3
Or if you don't like a generator:
def height(tree):
    d = [1]*len(tree)
    for i, n in enumerate(tree):
        while n != -1:
            d[i] += 1
            n = tree[n]
    return max(d)
In []:
tree = '4 -1 4 1 1'
height(list(map(int, tree.split())))
Out[]:
3
The above is brute force because it doesn't reuse the parts of the tree you've already visited; that shouldn't be too hard to add.
Your algorithm spends a lot of time searching the input for the locations of numbers. If you just iterate over the input once, you can record the locations of each number as you come across them, so you don't have to keep searching over and over later. Consider what data structure would be effective for recording this information.
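One data structure that fits this hint is a list of child lists, filled in a single pass over the input, followed by a level-by-level walk from the root. This is only one possible reading of the hint; the name tree_height_bfs is made up for this Python 3 sketch, and it relies on the guarantee from the task statement that there is exactly one root.

```python
from collections import deque

def tree_height_bfs(parents):
    """Height of a tree given as a parent array, using children lists and BFS."""
    children = [[] for _ in parents]
    root = -1
    for node, parent in enumerate(parents):   # one pass records every location
        if parent == -1:
            root = node
        else:
            children[parent].append(node)
    height = 0
    queue = deque([(root, 1)])                # (node, level), root is level 1
    while queue:
        node, level = queue.popleft()
        height = max(height, level)
        for child in children[node]:
            queue.append((child, level + 1))
    return height
```

Because the walk is iterative, it needs neither a raised recursion limit nor a larger thread stack.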

Please suggest a better solution to an optimization puzzle

Please find below a problem, its solution and its working implementation. The solution below has a time complexity of O(n!) (Please correct me if I am wrong).
My Question:
1)Please suggest a solution with better time complexity. Given it's an optimization problem, Dynamic programming or memoization seems like a better option. Also please provide an analysis justifying the time complexity of your solution. Thanks!
Problem:
A pipe company produces pipes of fixed length n. It gets orders of k number of pipes each with length between (0,n] every day. Write an algorithm that will help the company fulfill the orders using minimum number of fixed length of pipes.
Solution1:
For k orders, consider all permutations. For each permutation, greedily compute cost. Pick permutation with minimum cost.
We need two data structures: 1) Order: use list 2) Cost: list containing all pipes used where value is the remaining length of pipe.
If we used a single pipe of length n fully, Then the data structure representing cost is [0].
#IMPLEMENTATION OF Solution1
import itertools
n = 10
def fulfill_element_greedily(pipes_used, order):
    eligible_pipes = filter(lambda x : x - order >= 0, pipes_used)
    if len(eligible_pipes) == 0:
        new_pipe_used = n-order
    else:
        eligible_pipes.sort(reverse=True)
        new_pipe_used = eligible_pipes[-1] - order
        pipes_used.remove(eligible_pipes[-1])
    return pipes_used + [new_pipe_used]
def cost_for_greedy_fulfill(orders):
    pipes_used = []
    for order in orders:
        pipes_used = fulfill_element_greedily(pipes_used, order)
    return len(pipes_used)
def min_cost(orders):
    if(any(map(lambda x : x > n,orders))):
        print "Orders %s" % str(orders)
        raise ValueError("Invalid orders")
    return min(map(cost_for_greedy_fulfill,itertools.permutations(orders))) if len(orders)!=0 else 0
def test():
    assert 0 == min_cost([])
    assert 1 == min_cost([1])
    assert 1 == min_cost([5])
    assert 1 == min_cost([10])
    assert 2 == min_cost([10,2])
    assert 2 == min_cost([2,9,7])
    assert 2 == min_cost([1,7,9,3])
    return "tests passed"
print test()
There is a dynamic programming algorithm with complexity O(k * 2^k), as follows.
Define a state that contains the following 2 members:
struct State {
    int minPipe; // minimum number of pipes
    int remainingLen; // remaining length of last pipe
}
We say state a is better than state b if and only if
(a.minPipe < b.minPipe) or (a.minPipe == b.minPipe && a.remainingLen > b.remainingLen)
The problem itself can be divided into 2^k states:
State states[2^k]
where states[i] represents the optimal state (minimum number of pipes, with maximum remaining length of the last pipe) after fulfilling every order x (1<=x<=k) whose x-th bit is set in the binary representation of i.
For example,
states[0]: initial state, no order fulfilled
states[1]: optimal state with only the 1st order fulfilled
states[2]: optimal state with only the 2nd order fulfilled
states[3]: optimal state with the 1st and 2nd orders fulfilled
states[4]: optimal state with only the 3rd order fulfilled
...
Process all the previous states for each state x:
states[x] = the best state reachable from any previous state y, where x > y and the binary representations of x and y differ in exactly 1 bit.
The final answer is states[2^k - 1].minPipe.
Complexity: each state has at most k previous states and there are 2^k states, so the final complexity is O(k * 2^k), which is less than O(k!).
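The state definition above can be turned into a short Python 3 sketch. The function name min_pipes and the tuple encoding are choices made for this example: each state is stored as (pipes, -remaining), so that ordinary tuple comparison matches the "better than" rule (fewer pipes first, then more remaining length).

```python
def min_pipes(orders, n):
    """Minimum number of length-n pipes needed to cut all orders (bitmask DP)."""
    if any(o <= 0 or o > n for o in orders):
        raise ValueError("each order must be in (0, n]")
    k = len(orders)
    worst = (k + 1, 0)                       # worse than any reachable state
    states = [worst] * (1 << k)
    states[0] = (0, 0)                       # no pipes used, nothing remaining
    for mask in range(1 << k):
        pipes, neg_rem = states[mask]
        rem = -neg_rem
        for i in range(k):
            bit = 1 << i
            if mask & bit:
                continue                     # order i already fulfilled
            if orders[i] <= rem:
                cand = (pipes, -(rem - orders[i]))    # cut from the last pipe
            else:
                cand = (pipes + 1, -(n - orders[i]))  # open a new pipe
            if cand < states[mask | bit]:
                states[mask | bit] = cand
    return states[(1 << k) - 1][0]
```

The outer loop visits 2^k masks and the inner loop tries k orders, which gives the O(k * 2^k) bound; memory is O(2^k), so this is only practical for small k.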

Subset sum for large sums

The subset sum problem is well-known for being NP-complete, but there are various tricks to solve versions of the problem somewhat quickly.
The usual dynamic programming algorithm requires space that grows with the target sum. My question is: can we reduce this space requirement?
I am trying to solve a subset sum problem with a modest number of elements but a very large target sum. The number of elements is too large for the exponential time algorithm (and shortcut method) and the target sum is too large for the usual dynamic programming method.
Consider this toy problem that illustrates the issue. Given the set A = [2, 3, 6, 8] find the number of subsets that sum to target = 11 . Enumerating all subsets we see the answer is 2: (3, 8) and (2, 3, 6).
The dynamic programming solution gives the same result, of course - ways[11] returns 2:
def subset_sum(A, target):
    ways = [0] * (target + 1)
    ways[0] = 1
    ways_next = ways[:]
    for x in A:
        for j in range(x, target + 1):
            ways_next[j] += ways[j - x]
        ways = ways_next[:]
    return ways[target]
Now consider the target sum target = 1100 with the set A = [200, 300, 600, 800]. Clearly there are still 2 solutions: (300, 800) and (200, 300, 600). However, the ways array has grown by a factor of 100.
Is it possible to skip over certain weights when filling out the dynamic programming storage array? For my example problem I could compute the greatest common divisor of the input set and then divide all items by that constant, but this won't work for my real application.
This SO question is related, but those answers don't use the approach I have in mind. The second comment by Akshay on this page says:
...in the cases where n is very small (eg. 6) and sum is very large
(eg. 1 million) then the space complexity will be too large. To avoid
large space complexity n HASHTABLES can be used.
This seems closer to what I'm looking for, but I can't seem to actually implement the idea. Is this really possible?
Edited to add: A smaller example of a problem to solve. There is 1 solution.
target = 5213096522073683233230240000
A = [2316931787588303659213440000,
1303274130518420808307560000,
834095443531789317316838400,
579232946897075914803360000,
425558899761116998631040000,
325818532629605202076890000,
257436865287589295468160000,
208523860882947329329209600,
172333769324749858949760000,
144808236724268978700840000,
123386899930738064691840000,
106389724940279249657760000,
92677271503532146368537600,
81454633157401300519222500,
72153585080604612224640000,
64359216321897323867040000,
57762842349846905631360000,
52130965220736832332302400,
47284322195679666514560000,
43083442331187464737440000,
39418499221729173786240000,
36202059181067244675210000,
33363817741271572692673536,
30846724982684516172960000,
28604096143065477274240000,
26597431235069812414440000,
24794751591313594450560000,
23169317875883036592134400,
21698632766175580575360000,
20363658289350325129805625,
19148196591638873216640000,
18038396270151153056160000,
17022355990444679945241600]
A real problem is:
target = 262988806539946324131984661067039976436265064677212251086885351040000
A = [116883914017753921836437627140906656193895584300983222705282378240000,
65747201634986581032996165266759994109066266169303062771721337760000,
42078209046391411861117545770726396229802410348353960173901656166400,
29220978504438480459109406785226664048473896075245805676320594560000,
21468474003260924418937523352411426647858372626711204170357987840000,
16436800408746645258249041316689998527266566542325765692930334440000,
12987101557528213537381958571211850688210620477887024745031375360000,
10519552261597852965279386442681599057450602587088490043475414041600,
8693844844295746252297013588993057072273225278585528961549928960000,
7305244626109620114777351696306666012118474018811451419080148640000,
6224587137040149683597270084426981690799173128454727836375984640000,
5367118500815231104734380838102856661964593156677801042589496960000,
4675356560710156873457505085636266247755823372039328908211295129600,
4109200102186661314562260329172499631816641635581441423232583610000,
3639983481521748430892521260443459881470796742937193786669693440000,
3246775389382053384345489642802962672052655119471756186257843840000,
2914003396564502206448583502127866774917064428556368433095682560000,
2629888065399463241319846610670399764362650646772122510868853510400,
2385386000362324935437502594712380738650930291856800463373109760000,
2173461211073936563074253397248264268068306319646382240387482240000,
1988573206351200938616141104476672789688204647842814753019927040000,
1826311156527405028694337924076666503029618504702862854770037160000,
1683128361855656474444701830829055849192096413934158406956066246656,
1556146784260037420899317521106745422699793282113681959093996160000,
1443011284169801504153550952356872298690068941987447193892375040000,
1341779625203807776183595209525714165491148289169450260647374240000,
1250838556670374906691960338012080744048823137584838292922165760000,
1168839140177539218364376271409066561938955843009832227052823782400,
1094646437211014876720019400903392201607763016346356924399106560000,
1027300025546665328640565082293124907954160408895360355808145902500,
965982760477305139144112620999228563585913919842836551283325440000,
909995870380437107723130315110864970367699185734298446667423360000,
858738960130436976757500934096457065914334905068448166814319513600,
811693847345513346086372410700740668013163779867939046564460960000,
768411414287644482489363509326632509674989232073666182868912640000,
728500849141125551612145875531966693729266107139092108273920640000,
691620793004461075955252231602997965644352569828303092930664960000,
657472016349865810329961652667599941090662661693030627717213377600,
625791330255672395317036671188673352614551016483550865168079360000,
596346500090581233859375648678095184662732572964200115843277440000,
568931977371436071675467087219123799753953628290345594563299840000,
543365302768484140768563349312066067017076579911595560096870560000,
519484062301128541495278342848474027528424819115480989801255014400,
497143301587800234654035276119168197422051161960703688254981760000,
476213321032044045508347054897310957784092466595223632570186240000,
456577789131851257173584481019166625757404626175715713692509290000,
438132122515529069774235170457376054037925971973698044293020160000,
420782090463914118611175457707263962298024103483539601739016561664,
404442609057972047876946806715939986830088526993021531852188160000,
389036696065009355224829380276686355674948320528420489773499040000,
374494562534633427030238036407319297168052779889230688624970240000,
360752821042450376038387738089218074672517235496861798473093760000,
347753793771829850091880543559722282890929011143421158461997158400,
335444906300951944045898802381428541372787072292362565161843560000,
323778155173833578494287055791985197213007158728485381455075840000,
312709639167593726672990084503020186012205784396209573230541440000,
302199145693704480473409550206308504954053507241841138853071360000,
292209785044384804591094067852266640484738960752458056763205945600,
282707666261699891568916593460940582033071824431295083135592960000,
273661609302753719180004850225848050401940754086589231099776640000,
265042888929147215048611399412486748738992254650755607041456640000,
256825006386666332160141270573281226988540102223840088952036475625,
248983485481605987343890803377079267631966925138189113455039385600,
241495690119326284786028155249807140896478479960709137820831360000,
234340660761814501342824380545368657996226388663143017230461440000,
227498967595109276930782578777716242591924796433574611666855840000,
220952578483466770957349011608519198854244960871423861446658560000,
214684740032609244189375233524114266478583726267112041703579878400,
208679870295533683104133831435857945991878646837700655494453760000,
202923461836378336521593102675185167003290944966984761641115240000,
197401994025105141026072179446079922264038329650750423033879040000,
192102853571911120622340877331658127418747308018416545717228160000,
187014262428406274938300203425450649910232934881573156328451805184,
182125212285281387903036468882991673432316526784773027068480160000,
177425404985627474536673746714144021883127046501745489011223040000,
172905198251115268988813057900749491411088142457075773232666240000,
168555556186474170249629649778586749838977769381324948621621760000,
164368004087466452582490413166899985272665665423257656929303344400]
In the particular comment you linked to, the suggestion is to use a hashtable to only store values which actually arise as a sum of some subset. In the worst case, this is exponential in the number of elements, so it is basically equivalent to the brute force approach you already mentioned and ruled out.
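For reference, the hashtable idea from that comment might look like the following Python 3 sketch, which stores only the sums that some subset actually reaches. The name count_subsets is made up here, and the sketch assumes all elements are positive integers.

```python
def count_subsets(A, target):
    """Count subsets of A summing to target, storing only reachable sums."""
    ways = {0: 1}                        # reachable sum -> number of subsets
    for x in A:
        updated = dict(ways)             # subsets that skip x
        for s, count in ways.items():
            t = s + x
            if t <= target:              # sums above the target can never help
                updated[t] = updated.get(t, 0) + count
        ways = updated
    return ways.get(target, 0)
```

On the toy inputs the dictionary holds at most 2^4 entries whether the target is 11 or 1100, so the space no longer scales with the target. The number of entries can still double with every element, which is the exponential worst case mentioned above.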
In general, there are two parameters to the problem: the number of elements in the set and the size of the target sum. Naive brute force is exponential in the first, while the standard dynamic programming solution is linear in the value of the target and therefore exponential in the number of bits needed to write it. This works well when one of the parameters is small, but you already indicated that both parameters are too big for an exponential solution. Therefore, you are stuck with the "hard" general case of the problem.
Most NP-Complete problems have some underlying graph whether implicit or explicit. Using graph partitioning and DP, it can be solved exponential in the treewidth of the graph but only polynomial in the size of the graph with treewidth held constant. Of course, without access to your data, it is impossible to say what the underlying graph might look like or whether it is in one of the classes of graphs that have bounded treewidths and hence can be solved efficiently.
Edit: I just wrote the following code to show what I meant by reducing it mod small numbers. The following code solves your first problem in less than a second, but it doesn't work on the larger problem (though it does reduce it to n=57, log(t)=68).
target = 5213096522073683233230240000
A = [2316931787588303659213440000,
1303274130518420808307560000,
834095443531789317316838400,
579232946897075914803360000,
425558899761116998631040000,
325818532629605202076890000,
257436865287589295468160000,
208523860882947329329209600,
172333769324749858949760000,
144808236724268978700840000,
123386899930738064691840000,
106389724940279249657760000,
92677271503532146368537600,
81454633157401300519222500,
72153585080604612224640000,
64359216321897323867040000,
57762842349846905631360000,
52130965220736832332302400,
47284322195679666514560000,
43083442331187464737440000,
39418499221729173786240000,
36202059181067244675210000,
33363817741271572692673536,
30846724982684516172960000,
28604096143065477274240000,
26597431235069812414440000,
24794751591313594450560000,
23169317875883036592134400,
21698632766175580575360000,
20363658289350325129805625,
19148196591638873216640000,
18038396270151153056160000,
17022355990444679945241600]
import itertools, time
from fractions import gcd
def gcd_r(seq):
    return reduce(gcd, seq)
def miniSolve(t, vals):
    vals = [x for x in vals if x and x <= t]
    for k in range(len(vals)):
        for sub in itertools.combinations(vals, k):
            if sum(sub) == t:
                return sub
    return None
def tryMod(n, state, answer):
    t, vals, mult = state
    mods = [x%n for x in vals if x%n]
    if (t%n or mods) and sum(mods) < n:
        print 'Filtering with', n
        print t.bit_length(), len(vals)
    else:
        return state
    newvals = list(vals)
    tmod = t%n
    if not tmod:
        for x in vals:
            if x%n:
                newvals.remove(x)
    else:
        if len(set(mods)) != len(mods):
            #don't want to deal with the complexity of multisets for now
            print 'skipping', n
        else:
            mini = miniSolve(tmod, mods)
            if mini is None:
                return None
            mini = set(mini)
            for x in vals:
                mod = x%n
                if mod:
                    if mod in mini:
                        t -= x
                        answer.add(x*mult)
                    newvals.remove(x)
    g = gcd_r(newvals + [t])
    t = t//g
    newvals = [x//g for x in newvals]
    mult *= g
    return (t, newvals, mult)
def solve(t, vals):
    answer = set()
    mult = 1
    for d in itertools.count(2):
        if not t:
            return answer
        elif not vals or t < min(vals):
            return None #no solution'
        res = tryMod(d, (t, vals, mult), answer)
        if res is None:
            return None
        t, vals, mult = res
        if len(vals) < 23:
            break
        if (d % 10000) == 0:
            print 'd', d
    #don't want to deal with the complexity of multisets for now
    assert(len(set(vals)) == len(vals))
    rest = miniSolve(t, vals)
    if rest is None:
        return None
    answer.update(x*mult for x in rest)
    return answer
start_t = time.time()
answer = solve(target, A)
assert(answer <= set(A) and sum(answer) == target)
print answer
