I'm writing a program whose goal is to take a list of numbers and return all the six-number combinations of it using a recursive function (without importing a function to do it for me). Say, for example, my numbers are "1 2 3 4 5 6 7 8 9"; the output would be:
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
1 2 3 4 5 9
1 2 3 4 6 7
1 2 3 4 6 8
1 2 3 4 6 9
1 2 3 4 7 8
... etcetera, all the way down to
4 5 6 7 8 9
I'm not looking for code, per se, just a push in the right direction conceptually. What I've attempted thus far has failed and I've driven myself into a logical rut.
I've included the code I used before below, but it isn't really a recursive function and only seems to work for 6-8-digit values. It's very messy, and I'd be fine with scrapping it entirely:
# Function prints all the possible 6-number combinations for a group of numbers
def lotto(constantnumbers, variablenumbers):
    # Base case: No more constant variables, or only 6 numbers to begin with
    if len(constantnumbers) == 0 or len(variablenumbers) == 0:
        if len(constantnumbers) == 0:
            print(" ".join(variablenumbers[1:7]))
        else:
            print(" ".join(constantnumbers[0:6]))
        i = 6 - len(constantnumbers)
        outvars = variablenumbers[1:i + 1]
        if len(variablenumbers) > len(outvars) + 1:
            print(" ".join(constantnumbers + outvars))
            for index in range(len(outvars), 0, -1):
                outvars[index - 1] = variablenumbers[index + 1]
                print(" ".join(constantnumbers + outvars))
    else:
        i = 6 - len(constantnumbers)
        outvars = variablenumbers[1:i + 1]
        print(" ".join(constantnumbers + outvars))
        if len(variablenumbers) > len(outvars) + 1:
            for index in range(len(outvars), 0, -1):
                outvars[index - 1] = variablenumbers[index + 1]
                print(" ".join(constantnumbers + outvars))
        #Reiterates the function until there are no more constant numbers
        lotto(constantnumbers[0:-1], constantnumbers[-1:] + variablenumbers)
import itertools
for combo in itertools.combinations(range(1,10), 6):
    print(" ".join(str(c) for c in combo))
which gives
1 2 3 4 5 6
1 2 3 4 5 7
1 2 3 4 5 8
...
3 4 6 7 8 9
3 5 6 7 8 9
4 5 6 7 8 9
Edit: ok, here is a recursive definition:
def combinations(basis, howmany):
    for index in range(0, len(basis) - howmany + 1):
        if howmany == 1:
            yield [basis[index]]
        else:
            this, remainder = basis[index], basis[index+1:]
            for rest in combinations(remainder, howmany - 1):
                yield [this] + rest
Edit2:
Base case: A 1-item combination is any basis item.
Induction: An N-item combination is any basis item plus an (N-1)-item combination from the remaining basis.
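As a rough sketch of how the base case and induction step map onto code, here is the same recursion with the two cases commented, applied to the question's input (a space-separated string of numbers). It assumes howmany is at least 1; the variable names numbers and lines are only for illustration.

```python
def combinations(basis, howmany):
    """Yield every howmany-item combination of basis, keeping input order."""
    if howmany == 1:
        # Base case: a 1-item combination is any single basis item.
        for item in basis:
            yield [item]
        return
    # Induction: pick one item, then append every (howmany-1)-item
    # combination drawn from the items that come after it.
    for index in range(len(basis) - howmany + 1):
        this, remainder = basis[index], basis[index + 1:]
        for rest in combinations(remainder, howmany - 1):
            yield [this] + rest


numbers = "1 2 3 4 5 6 7 8 9".split()
lines = [" ".join(combo) for combo in combinations(numbers, 6)]
print(lines[0])    # 1 2 3 4 5 6
print(lines[-1])   # 4 5 6 7 8 9
print(len(lines))  # 84, i.e. 9 choose 6
```

Because each recursive call only looks at the items after the one just picked, no combination is produced twice and the output comes out in the same order as in the question.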
If anyone can help with improving the runtime that would be great!
I have a truck with a maximum capacity of C and an initial stock of S1. The truck follows a fixed route: Depot --> 1 --> 2 --> ... --> N-1 --> N --> Depot.
Each station i=1…n has a current stock of Xi items and a target stock of Xi*. At each station the truck can drop off or pick up as many items as the situation allows. Let Yi be the number of items left at station i after the truck has visited it. The total cost is TC (as written in the code).
I implemented a dynamic program in which xd is the number of units dropped off or picked up at each station and s is the number of items on the truck:
For -min(c-s,xi) <= xd <= s: f(i,s) = min over xd of [tc(i,xd) + f(i+1, s-xd)], so a negative xd means the truck took items from the station.
This is the code. The problem is that it runs for days without returning an answer.
Does anyone know a better way to implement it?
n = 50
c=10
s1 = 6
xi = [59,33,14,17,26,31,91,68,3,53,53,73,86,24,98,37,55,14,97,61,57,23,65,24,50,31,39,31,24,60,92,80,48,28,47,81,19,82,3,74,50,89,86,37,98,11,12,94,6,61]
x_star = [35,85,51,88,44,20,79,68,97,7,68,19,50,19,42,45,8,9,61,60,80,4,96,57,100,22,2,51,56,100,6,84,96,69,18,31,86,6,39,6,78,73,14,45,100,43,89,4,76,70]
c_plus = [4.6,1.3,2.7,0.5,2.7,5,2.7,2.6,4.1,4,3.2,3.1,4.8,3.1,0.8,1,0.5,5,5,4.6,2.5,4.1,2.1,2.9,1.4,3.9,0.5,1.7,4.9,0.6,2.8,4.9,3.3,4.7,3.6,2.4,3.4,1.5,1.2,0.5,4.3,4.3,3.9,4.8,1.2,4.8,2,2.2,5,4.5]
c_minus = [8.7,7.5,11.7,6.9,11.7,14.4,7.5,11.1,1.2,1.5,12,8.1,2.7,8.7,9.3,1.5,0.3,1.5,1.2,12.3,5.7,0.6,8.7,8.1,0.6,3.9,0.3,5.4,14.7,0,10.8,6.6,8.4,9.9,14.7,2.7,1.2,10.5,9.3,14.7,11.4,5.4,6,13.2,3.6,7.2,3,4.8,9,8.1]
dict={}
values={}
# Cost at station i after the truck moves xd items (negative xd = pick up)
def tc(i,xd):
    yi = xi[i-1] + xd
    if yi>=x_star[i-1]:
        tc = c_plus[i-1]*(yi-x_star[i-1])
    else:
        tc = c_minus[i-1]*(x_star[i-1]-yi)
    return tc

# Minimum total cost from station i onward with s items on the truck
def func(i,s):
    if i==n+1:
        return 0
    else:
        a=[]
        b=[]
        start = min(c-s,xi[i-1])*-1
        for xd in range(start,s+1):
            cost = tc(i,xd)
            f= func(i+1,s-xd)
            a.append(cost+f)
            b.append(xd)
        min_cost = min(a)
        index = a.index(min_cost)
        xd_optimal = b[index]
        if i in values:
            if values[i]>min_cost:
                dict[i] = xd_optimal
                values[i] = min_cost
        else:
            values[i] = min_cost
            dict[i] = xd_optimal
        return min_cost

best_cost = func(1,s1)
print(best_cost)
print(dict)
First, the solution:
The function is called very often with exactly the same parameters. Thus, I added a cache that avoids repeating the calculations for recurring parameter sets. This returns the answer almost instantly on my computer.
cache = {}
def func(i,s):
    if i==n+1:
        return 0
    else:
        try:
            return cache[(i,s)]
        except KeyError:
            pass
        a=[]
        ...
        cache[(i,s)] = min_cost
        return min_cost
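For reference, a minimal self-contained sketch of the same caching idea, using functools.lru_cache from the standard library in place of the hand-written dictionary. This is a swapped-in technique, not the code from the answer. It uses only the first five stations of the question's data so that it runs quickly, and it returns only the cost, leaving out the bookkeeping of the chosen xd values.

```python
from functools import lru_cache

# Illustrative instance: the first five stations from the question's data.
c = 10    # truck capacity
s1 = 6    # initial stock on the truck
xi = [59, 33, 14, 17, 26]
x_star = [35, 85, 51, 88, 44]
c_plus = [4.6, 1.3, 2.7, 0.5, 2.7]
c_minus = [8.7, 7.5, 11.7, 6.9, 11.7]
n = len(xi)


def tc(i, xd):
    """Cost at station i after moving xd items (negative xd = pick up)."""
    yi = xi[i - 1] + xd
    if yi >= x_star[i - 1]:
        return c_plus[i - 1] * (yi - x_star[i - 1])
    return c_minus[i - 1] * (x_star[i - 1] - yi)


@lru_cache(maxsize=None)  # remembers results keyed by the arguments (i, s)
def func(i, s):
    if i == n + 1:
        return 0
    start = -min(c - s, xi[i - 1])
    return min(tc(i, xd) + func(i + 1, s - xd) for xd in range(start, s + 1))


best_cost = func(1, s1)
print(best_cost)
print(func.cache_info())  # shows how many distinct (i, s) pairs were computed
```

The cache can never hold more than (n+1)*(c+1) entries, because i ranges over n+1 values and s over c+1 values, which is why the cached version finishes quickly even for n = 50.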
And here is how I found out what to do...
I modified your code to produce some debug output:
...
count = 0
def func(i,s):
    global count
    count += 1
    print('%d : %d %d' % (count, i, s))
    ...
Setting n to 2 results in the following output:
1 : 1 6
2 : 2 10
3 : 3 10
4 : 3 9
5 : 3 8
6 : 3 7
7 : 3 6
8 : 3 5
9 : 3 4
10 : 3 3
11 : 3 2
12 : 3 1
13 : 3 0
14 : 2 9
15 : 3 10
16 : 3 9
17 : 3 8
18 : 3 7
19 : 3 6
20 : 3 5
21 : 3 4
22 : 3 3
23 : 3 2
24 : 3 1
25 : 3 0
26 : 2 8
27 : 3 10
28 : 3 9
29 : 3 8
30 : 3 7
31 : 3 6
32 : 3 5
...
You will notice that the function is called very often with the same set of parameters.
After (i=2, s=10) it runs through all combinations of (i=3, s=x). It does that again after (i=2, s=9). The whole thing finishes after 133 recursions. Setting n=3 takes 1464 recursions, and setting n=4 takes 16105 recursions. You can see where that leads to...
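To make the growth concrete, here is a small sketch that only counts the calls, with and without skipping (i, s) pairs that were already seen. It uses the first four station stocks from the question; the cost arrays are not needed for counting. The helper name count_calls and its memoize flag are made up for this illustration.

```python
c = 10                  # truck capacity, as in the question
s1 = 6                  # initial stock on the truck
xi = [59, 33, 14, 17]   # first four station stocks from the question


def count_calls(n, memoize):
    """Count how many (i, s) calls are evaluated for an n-station route."""
    calls = 0
    seen = set()

    def visit(i, s):
        nonlocal calls
        if memoize and (i, s) in seen:
            return      # already evaluated: a cache would answer this call
        seen.add((i, s))
        calls += 1
        if i == n + 1:
            return
        start = -min(c - s, xi[i - 1])
        for xd in range(start, s + 1):
            visit(i + 1, s - xd)

    visit(1, s1)
    return calls


for n in (2, 3, 4):
    print(n, count_calls(n, memoize=False), count_calls(n, memoize=True))
# 2 133 23
# 3 1464 34
# 4 16105 45
```

Without the cache every call fans out into 11 further calls, so the count multiplies by roughly 11 per station; with the cache it grows by only 11 per station.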
Remark: I have absolutely no idea how your optimization works. Instead I simply treated the symptoms :)
I have two arrays (a and b) with n integer elements in the range (0,N).
typo: arrays with 2^n integers where the largest integer takes the value N = 3^n
I want to calculate the sum of every combination of elements in a and b (sum_ij = a_i + b_j for all i,j), then take it modulo N (sum_ij = sum_ij % N), and finally calculate the frequency of the different sums.
In order to do this fast with numpy, without any loops, I tried to use the meshgrid and the bincount function.
A,B = numpy.meshgrid(a,b)
A = A + B
A = A % N
A = numpy.reshape(A,A.size)
result = numpy.bincount(A)
Now, the problem is that my input arrays are long. And meshgrid gives me MemoryError when I use inputs with 2^13 elements. I would like to calculate this for arrays with 2^15-2^20 elements.
that is n in the range 15 to 20
Are there any clever tricks to do this with numpy?
Any help will be highly appreciated.
--
jon
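Try chunking it. Your meshgrid is an n x n matrix (n being the number of elements in each array). Split it into a 10x10 grid of (n/10) x (n/10) blocks, compute the bincount of each of the 100 blocks, and add them up at the end. This uses only about 1% as much memory as doing the whole thing at once.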
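A minimal sketch of that chunking idea, assuming the values of a and b are non-negative integers and that N is small enough for an array of N counters to fit in memory. The function name chunked_sum_counts and the chunk parameter are hypothetical, chosen only for this example.

```python
import numpy as np


def chunked_sum_counts(a, b, N, chunk=1024):
    """Histogram of (a_i + b_j) % N over all i, j, built block by block."""
    counts = np.zeros(N, dtype=np.int64)
    for i in range(0, len(a), chunk):
        for j in range(0, len(b), chunk):
            # Only a (chunk x chunk) block of sums is in memory at a time.
            block = (a[i:i + chunk, np.newaxis] + b[np.newaxis, j:j + chunk]) % N
            counts += np.bincount(block.ravel(), minlength=N)
    return counts


rng = np.random.RandomState(0)
N = 50
a = rng.randint(N, size=300)
b = rng.randint(N, size=300)
print(chunked_sum_counts(a, b, N, chunk=64))
```

Passing minlength=N to np.bincount keeps every per-block histogram the same length, so the blocks can be added together even when a block happens not to contain the largest residues.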
Edit in response to jonalm's comment:
jonalm: N~3^n not n~3^N. N is max element in a and n is number of
elements in a.
n is ~ 2^20. If N is ~ 3^n then N is ~ 3^(2^20) > 10^(500207).
Scientists estimate (http://www.stormloader.com/ajy/reallife.html) that there are only around 10^87 particles in the universe. So there is no (naive) way a computer can handle an int of size 10^(500207).
jonalm: I am however a bit curious about the pv() function you define. (I
do not manage to run it as text.find() is not defined (guess it's in another
module)). How does this function work and what is its advantage?
pv is a little helper function I wrote to debug the value of variables. It works like
print(), except that pv(x) prints the literal variable name (or expression string), a colon, and then the variable's value.
If you put
#!/usr/bin/env python
import traceback
def pv(var):
    (filename,line_number,function_name,text)=traceback.extract_stack()[-2]
    print('%s: %s'%(text[text.find('(')+1:-1],var))
x=1
pv(x)
in a script you should get
x: 1
The modest advantage of using pv over print is that it saves you typing. Instead of having to
write
print('x: %s'%x)
you can just slap down
pv(x)
When there are multiple variables to track, it's helpful to label the variables.
I just got tired of writing it all out.
The pv function works by using the traceback module to peek at the line of code
used to call the pv function itself. (See http://docs.python.org/library/traceback.html#module-traceback) That line of code is stored as a string in the variable text.
text.find() is a call to the usual string method find(). For instance, if
text='pv(x)'
then
text.find('(') == 2 # The index of the '(' in string text
text[text.find('(')+1:-1] == 'x' # Everything in between the parentheses
I'm assuming n ~ 3^N, and n~2**20
The idea is to work modulo N. This cuts down on the size of the arrays.
The second idea (important when n is huge) is to use numpy ndarrays of 'object' type because if you use an integer dtype you run the risk of overflowing the size of the maximum integer allowed.
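A small illustration of the overflow concern, assuming a 64-bit integer dtype: multiplying past the int64 range wraps around silently in a numpy integer array, while an 'object' array keeps exact Python integers. The variable names are only for this example.

```python
import numpy as np

big = 2**62
as_int64 = np.array([big], dtype=np.int64)
as_object = np.array([big], dtype=object)

wrapped = as_int64 * 4   # 2**64 does not fit in int64, so the value wraps
exact = as_object * 4    # elements are Python ints, so the result is exact

print(wrapped[0])        # 0
print(exact[0])          # 18446744073709551616
```

The price of the 'object' dtype is speed, since each element operation goes through Python integer arithmetic instead of a native machine instruction.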
#!/usr/bin/env python
import traceback
import numpy as np
def pv(var):
    (filename,line_number,function_name,text)=traceback.extract_stack()[-2]
    print('%s: %s'%(text[text.find('(')+1:-1],var))
You can change n to be 2**20, but below I show what happens with small n
so the output is easier to read.
n=100
N=int(np.log(n)/np.log(3))  # N = log_3(n), matching the assumption n ~ 3^N
pv(N)
# N: 4
a=np.random.randint(N,size=n)
b=np.random.randint(N,size=n)
pv(a)
pv(b)
# a: [1 0 3 0 1 0 1 2 0 2 1 3 1 0 1 2 2 0 2 3 3 3 1 0 1 1 2 0 1 2 3 1 2 1 0 0 3
# 1 3 2 3 2 1 1 2 2 0 3 0 2 0 0 2 2 1 3 0 2 1 0 2 3 1 0 1 1 0 1 3 0 2 2 0 2
# 0 2 3 0 2 0 1 1 3 2 2 3 2 0 3 1 1 1 1 2 3 3 2 2 3 1]
# b: [1 3 2 1 1 2 1 1 1 3 0 3 0 2 2 3 2 0 1 3 1 0 0 3 3 2 1 1 2 0 1 2 0 3 3 1 0
# 3 3 3 1 1 3 3 3 1 1 0 2 1 0 0 3 0 2 1 0 2 2 0 0 0 1 1 3 1 1 1 2 1 1 3 2 3
# 3 1 2 1 0 0 2 3 1 0 2 1 1 1 1 3 3 0 2 2 3 2 0 1 3 1]
wa holds the number of 0s, 1s, 2s, 3s in a
wb holds the number of 0s, 1s, 2s, 3s in b
wa=np.bincount(a,minlength=N)  # minlength keeps N bins even if a value never occurs
wb=np.bincount(b,minlength=N)
pv(wa)
pv(wb)
# wa: [24 28 28 20]
# wb: [21 34 20 25]
result=np.zeros(N,dtype='object')
Think of a 0 as a token or chip. Similarly for 1,2,3.
Think of wa=[24 28 28 20] as meaning there is a bag with 24 0-chips, 28 1-chips, 28 2-chips, 20 3-chips.
You have a wa-bag and a wb-bag. When you draw a chip from each bag, you "add" them together and form a new chip. You "mod" the answer (modulo N).
Imagine taking a 1-chip from the wb-bag and adding it with each chip in the wa-bag.
1-chip + 0-chip = 1-chip
1-chip + 1-chip = 2-chip
1-chip + 2-chip = 3-chip
1-chip + 3-chip = 4-chip = 0-chip (we are mod'ing by N=4)
Since there are 34 1-chips in the wb bag, when you add them against all the chips in the wa=[24 28 28 20] bag, you get
34*24 1-chips
34*28 2-chips
34*28 3-chips
34*20 0-chips
This is just the partial count due to the 34 1-chips. You also have to handle the other
types of chips in the wb-bag, but this shows you the method used below:
for i,count in enumerate(wb):
    partial_count=count*wa
    pv(partial_count)
    shifted_partial_count=np.roll(partial_count,i)
    pv(shifted_partial_count)
    result+=shifted_partial_count
# partial_count: [504 588 588 420]
# shifted_partial_count: [504 588 588 420]
# partial_count: [816 952 952 680]
# shifted_partial_count: [680 816 952 952]
# partial_count: [480 560 560 400]
# shifted_partial_count: [560 400 480 560]
# partial_count: [600 700 700 500]
# shifted_partial_count: [700 700 500 600]
pv(result)
# result: [2444 2504 2520 2532]
This is the final result: 2444 0s, 2504 1s, 2520 2s, 2532 3s.
# This is a test to make sure the result is correct.
# This uses a very memory intensive method.
# c is too huge when n is large.
if n>1000:
    print('n is too large to run the check')
else:
    c=(a[:]+b[:,np.newaxis])
    c=c.ravel()
    c=c%N
    result2=np.bincount(c)
    pv(result2)
    assert(all(r1==r2 for r1,r2 in zip(result,result2)))
# result2: [2444 2504 2520 2532]
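The chip-counting method can be packaged as one function, sketched here under the same assumptions as the script (non-negative integer inputs, N small enough to hold N counters). The name sum_mod_counts is made up for this example. The inputs are built with np.repeat to reproduce the wa and wb counts printed in the script's output.

```python
import numpy as np


def sum_mod_counts(a, b, N):
    """Histogram of (a_i + b_j) % N without forming the n x n table."""
    wa = np.bincount(a % N, minlength=N).astype(object)
    wb = np.bincount(b % N, minlength=N).astype(object)
    result = np.zeros(N, dtype=object)
    for i, count in enumerate(wb):
        # Adding an i-chip shifts every residue of a by i (mod N).
        result += np.roll(count * wa, i)
    return result


# Arrays with the same value counts as wa and wb in the script's output.
a = np.repeat(np.arange(4), [24, 28, 28, 20])
b = np.repeat(np.arange(4), [21, 34, 20, 25])
print(sum_mod_counts(a, b, 4))   # [2444 2504 2520 2532]
```

The work is proportional to N*N roll-and-add steps plus one pass over each input, so memory use no longer depends on the product of the array lengths.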
Check your math, that's a lot of space you're asking for:
2^20*2^20 = 2^40 = 1 099 511 627 776
If each of your elements was just one byte, that's already one terabyte of memory.
Add a loop or two. This problem is not suited to maxing out your memory and minimizing your computation.