I want to systematically generate permutations of the alphabet.
I cannot don't want to use python itertools.permutation, because pregenerating a list of every permutation causes my computer to crash (first time i actually got it to force a shutdown, it was pretty great).
Therefore, my new approach is to generate and test each key on the fly. Currently, I am trying to handle this with recursion.
My idea is to start with the largest list (i'll use a 3 element list as an example), recurse in to smaller list until the list is two elements long. Then, it will print the list, swap the last two, print the list again, and return up one level and repeat.
For example, for 123
123 (swap position 0 with position 0)
23 --> 123 (swap position 1 with position 1)
32 --> 132 (swap position 1 with position 2)
213 (swap position 0 with position 1)
13 --> 213 (swap position 1 with position 1)
31 --> 231 (swap position 1 with position 2)
321 (swap position 0 with position 2)
21 --> 321 (swap position 1 with position 1)
12 --> 312 (swap position 1 with position 2)
for a four letter number (1234)
1234 (swap position 0 with position 0)
234 (swap position 1 with position 1)
34 --> 1234
43 --> 1243
324 (swap position 1 with position 2)
24 --> 1324
42 --> 1342
432 (swap position 1 with position 3)
32 --> 1432
23 --> 1423
2134 (swap position 0 for position 1)
134 (swap position 1 with position 1)
34 --> 2134
43 --> 2143
314 (swap position 1 with position 2)
14--> 2314
41--> 2341
431 (swap position 1 with position 3)
31--> 2431
13 -->2413
This is the code i currently have for the recursion, but its causing me a lot of grief, recursion not being my strong suit. Here's what i have.
def perm(x, y, key):
print "Perm called: X=",x,", Y=",y,", key=",key
while (x<y):
print "\tLooping Inward"
print "\t", x," ",y," ", key
x=x+1
key=perm(x, y, key)
swap(x,y,key)
print "\tAfter 'swap':",x," ",y," ", key, "\n"
print "\nFull Depth Reached"
#print key, " SWAPPED:? ",swap(x,y,key)
print swap(x, y, key)
print " X=",x,", Y=",y,", key=",key
return key
def swap(x, y, key):
v=key[x]
key[x]=key[y]
key[y]=v
return key
Any help would be greatly appreciated, this is a really cool project and I don't want to abandon it.
Thanks to all! Comments on my method or anything are welcome.
Happened upon my old question later in my career
To efficiently do this, you want to write a generator.
Instead of returning a list of all of the permutations, which requires that you store them (all of them) in memory, a generator returns one permutation (one element of this list), then pauses, and then computes the next one when you ask for it.
The advantages to generators are:
Take up much less space.
Generators take up between 40 and 80 bytes of space. One generators can have generate millions of items.
A list with one item takes up 40 bytes. A list with 1000 items takes up 4560 bytes
More efficient
Only computes as many values as you need. In permuting the alphabet, if the correct permutation was found before the end of the list, the time spend generating all of the other permutations was wasted.
(Itertools.permutation is an example of a generator)
How do I Write a Generator?
Writing a generator in python is actually very easy.
Basically, write code that would work for generating a list of permutations. Now, instead of writing resultList+=[ resultItem ], write yield(resultItem).
Now you've made a generator. If I wanted to loop over my generator, I could write
for i in myGenerator:
It's that easy.
Below is a generator for the code that I tried to write long ago:
def permutations(iterable, r=None):
# permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
# permutations(range(3)) --> 012 021 102 120 201 210
pool = tuple(iterable)
n = len(pool)
r = n if r is None else r
if r > n:
return
indices = range(n)
cycles = range(n, n-r, -1)
yield tuple(pool[i] for i in indices[:r])
while n:
for i in reversed(range(r)):
cycles[i] -= 1
if cycles[i] == 0:
indices[i:] = indices[i+1:] + indices[i:i+1]
cycles[i] = n - i
else:
j = cycles[i]
indices[i], indices[-j] = indices[-j], indices[i]
yield tuple(pool[i] for i in indices[:r])
break
else:
return
I think you have a really good idea, but keeping track of the positions might get a bit difficult to deal with. The general way I've seen for generating permutations recursively is a function which takes two string arguments: one to strip characters from (str) and one to add characters to (soFar).
When generating a permutation then we can think of taking characters from str and adding them to soFar. Assume we have a function perm that takes these two arguments and finds all permutations of str. We can then consider the current string str. We'll have permutations beginning with each character in str so we just need to loop over str, using each of these characters as the initial character and calling perm on the characters remaining in the string:
// half python half pseudocode
def perm(str, soFar):
if(str == ""):
print soFar // here we have a valid permutation
return
for i = 0 to str.length:
next = soFar + str[i]
remaining = str.substr(0, i) + str.substr(i+1)
perm(remaining, next)
Related
I have a scenario in which I have a peptide frame having 9 AA. I want to generate all possible peptides by replacing a maximum of 3 AA on this frame ie by replacing only 1 or 2 or 3 AA.
The frame is CKASGFTFS and I want to see all the mutants by replacing a maximum of 3 AA from the pool of 20 AA.
we have a pool of 20 different AA (A,R,N,D,E,G,C,Q,H,I,L,K,M,F,P,S,T,W,Y,V).
I am new to coding so Can someone help me out with how to code for this in Python or Biopython.
output is supposed to be a list of unique sequences like below:
CKASGFTFT, CTTSGFTFS, CTASGKTFS, CTASAFTWS, CTRSGFTFS, CKASEFTFS ....so on so forth getting 1, 2, or 3 substitutions from the pool of AA without changing the existing frame.
Ok, so after my code finished, I worked the calculations backwards,
Case1, is 9c1 x 19 = 171
Case2, is 9c2 x 19 x 19 = 12,996
Case3, is 9c3 x 19 x 19 x 19 = 576,156
That's a total of 589,323 combinations.
Here is the code for all 3 cases, you can run them sequentially.
You also requested to join the array into a single string, I have updated my code to reflect that.
import copy
original = ['C','K','A','S','G','F','T','F','S']
possibilities = ['A','R','N','D','E','G','C','Q','H','I','L','K','M','F','P','S','T','W','Y','V']
storage=[]
counter=1
# case 1
for i in range(len(original)):
for x in range(20):
temp = copy.deepcopy(original)
if temp[i] == possibilities[x]:
pass
else:
temp[i] = possibilities[x]
storage.append(''.join(temp))
print(counter,''.join(temp))
counter += 1
# case 2
for i in range(len(original)):
for j in range(i+1,len(original)):
for x in range(len(possibilities)):
for y in range(len(possibilities)):
temp = copy.deepcopy(original)
if temp[i] == possibilities[x] or temp[j] == possibilities[y]:
pass
else:
temp[i] = possibilities[x]
temp[j] = possibilities[y]
storage.append(''.join(temp))
print(counter,''.join(temp))
counter += 1
# case 3
for i in range(len(original)):
for j in range(i+1,len(original)):
for k in range(j+1,len(original)):
for x in range(len(possibilities)):
for y in range(len(possibilities)):
for z in range(len(possibilities)):
temp = copy.deepcopy(original)
if temp[i] == possibilities[x] or temp[j] == possibilities[y] or temp[k] == possibilities[z]:
pass
else:
temp[i] = possibilities[x]
temp[j] = possibilities[y]
temp[k] = possibilities[z]
storage.append(''.join(temp))
print(counter,''.join(temp))
counter += 1
The outputs look like this, (just the beginning and the end).
The results will also be saved to a variable named storage which is a native python list.
1 AKASGFTFS
2 RKASGFTFS
3 NKASGFTFS
4 DKASGFTFS
5 EKASGFTFS
6 GKASGFTFS
...
...
...
589318 CKASGFVVF
589319 CKASGFVVP
589320 CKASGFVVT
589321 CKASGFVVW
589322 CKASGFVVY
589323 CKASGFVVV
It takes around 10 - 20 minutes to run depending on your computer.
It will display all the combinations, skipping over changing AAs if any one is same as the original in case1 or 2 in case2 or 3 in case 3.
This code both prints them and stores them to a list variable so it can be storage or memory intensive and CPU intensive.
You could reduce the memory foot print if you want to store the string by replacing the letters with numbers cause they might take less space, you could even consider using something like pandas or appending to a csv file in storage.
You can iterate over the storage variable to go through the strings if you wish, like this.
for i in storage:
print(i)
Or you can convert it to a pandas series, dataframe or write line by line directly to a csv file in storage.
Let's compute the total number of mutations that you are looking for.
Say you want to replace a single AA. Firstly, there are 9 AAs in your frame, each of which can be changed into one of 19 other AA. That's 9 * 19 = 171
If you want to change two AA, there are 9c2 = 36 combinations of AA in your frame, and 19^2 permutations of two of the pool. That gives us 36 * 19^2 = 12996
Finally, if you want to change three, there are 9c3 = 84 combinations and 19^3 permutations of three of the pool. That gives us 84 * 19^3 = 576156
Put it all together and you get 171 + 12996 + 576156 = 589323 possible mutations. Hopefully, this helps illustrate the scale of the task you are trying to accomplish!
Problem Statement:
Problem
Apollo is playing a game involving polyominos. A polyomino is a shape made by joining together one or more squares edge to edge to form a single connected shape. The game involves combining N polyominos into a single rectangular shape without any holes. Each polyomino is labeled with a unique character from A to Z.
Apollo has finished the game and created a rectangular wall containing R rows and C columns. He took a picture and sent it to his friend Selene. Selene likes pictures of walls, but she likes them even more if they are stable walls. A wall is stable if it can be created by adding polyominos one at a time to the wall so that each polyomino is always supported. A polyomino is supported if each of its squares is either on the ground, or has another square below it.
Apollo would like to check if his wall is stable and if it is, prove that fact to Selene by telling her the order in which he added the polyominos.
Input
The first line of the input gives the number of test cases, T. T test cases follow. Each test case begins with a line containing the two integers R and C. Then, R lines follow, describing the wall from top to bottom. Each line contains a string of C uppercase characters from A to Z, describing that row of the wall.
Output
For each test case, output one line containing Case #x: y, where x is the test case number (starting from 1) and y is a string of N uppercase characters, describing the order in which he built them. If there is more than one such order, output any of them. If the wall is not stable, output -1 instead.
Limits
Time limit: 20 seconds per test set.
Memory limit: 1GB.
1 ≤ T ≤ 100.
1 ≤ R ≤ 30.
1 ≤ C ≤ 30.
No two polyominos will be labeled with the same letter.
The input is guaranteed to be valid according to the rules described in the statement.
Test set 1
1 ≤ N ≤ 5.
Test set 2
1 ≤ N ≤ 26.
Sample
Input
Output
4
4 6
ZOAAMM
ZOAOMM
ZOOOOM
ZZZZOM
4 4
XXOO
XFFO
XFXO
XXXO
5 3
XXX
XPX
XXX
XJX
XXX
3 10
AAABBCCDDE
AABBCCDDEE
AABBCCDDEE
Case #1: ZOAM
Case #2: -1
Case #3: -1
Case #4: EDCBA
In sample case #1, note that ZOMA is another possible answer.
In sample case #2 and sample case #3, the wall is not stable, so the answer is -1.
In sample case #4, the only possible answer is EDCBA.
Syntax pre-check
Show Test Input
My Code:
class Case:
def __init__(self, arr):
self.arr = arr
def solve(self):
n = len(self.arr)
if n == 1:
return ''.join(self.arr[0])
m = len(self.arr[0])
dep = {}
used = set() # to save letters already used
res = []
for i in range(n-1):
for j in range(m):
# each letter depends on the letter below it
if self.arr[i][j] not in dep:
dep[self.arr[i][j]] = set()
# only add dependency besides itself
if self.arr[i+1][j] != self.arr[i][j]:
dep[self.arr[i][j]].add(self.arr[i+1][j])
for j in range(m):
if self.arr[n-1][j] not in dep:
dep[self.arr[n-1][j]] = set()
# always find and desert the letters with all dependencies met
while len(dep) > 0:
# count how many letters are used in this round, if none is used, return -1
count = 0
next_dep = {}
for letter in dep:
if len(dep[letter]) == 0:
used.add(letter)
count += 1
res.append(letter)
else:
all_used = True
for neigh in dep[letter]:
if neigh not in used:
all_used = False
break
if all_used:
used.add(letter)
count += 1
res.append(letter)
else:
next_dep[letter] = dep[letter]
dep = next_dep
if count == 0:
return -1
if count == 0:
return -1
return ''.join(res)
t = int(input())
for i in range(1, t + 1):
R, C = [int(j) for j in input().split()]
arr = []
for j in range(R):
arr.append([c for c in input()])
case = Case(arr)
print("Case #{}: {}".format(i,case.solve()))
My code successfully passes all sample cases I can think of, but still keeps getting WA when submitted. Can anyone spot what is wrong with my solution? Thanks
I have to write a python program that given a large 50 MB DNA sequence and a smaller one, of around 15 characters, returns a list of all sequences of 15 characters ordered by how close they are to the one given as well as where they are in the larger one.
My current approach is to first get all the subsequences:
def get_subsequences_of_size(size, data):
sequences = {}
i = 0
while(i+size <= len(data)):
sequence = data[i:i+size]
if sequence not in sequences:
sequences[sequence] = data.count(sequence)
i += 1
return sequences
and then pack them in a list of dictionaries according to what the problem asked (I forgot to get the position):
def find_similar_sequences(seq, data):
similar_sequences = {}
sequences = get_subsequences_of_size(len(seq), data)
for sequence in sequences.keys():
diffs, muts = calculate_similarity(seq,sequence)
if diffs not in similar_sequences:
similar_sequences[diffs] = [{"Sequence": sequence, "Mutations": muts}]
else:
similar_sequences[diffs].append({"Sequence": sequence, "Mutations": muts})
#similar_sequences[sequence] = {"Similarity": (len(sequence)-diffs), "Differences": diffs, "Mutatations": muts}
return similar_sequences
My problem is that this running way too slow. With the 50MB input, it takes over 30 minutes to finish processing.
What about the following approach:
Go with a sliding window of length 15 over your long sequence and for every subsequence:
store the start location on the long sequence
calculate and store the similarity
import re
from itertools import islice
from collections import defaultdict
short_seq = 'TGGCGACGGACTTCA'
long_seq = 'AGAACGTTTCGCGTCAGCCCGGAAGTGGTCAGTCGCCTGAGTCCGAACAAAAATGACAACAACGTTTATGACAGAACATT' +\
'CCTTGCTGGCAACTACCTGAAAATCGGCTGGCCGTCAGTCAATATCATGTCCTCATCAGATTATAAATGCGTGGCGCTGA' +\
'CGGATTATGACCGTTTTCCGGAAGATATTGATGGCGAGGGGGATGCCTTCTCTCTTGCCTCAAAACGTACCACCACATTT' +\
'ATGTCCAGTGGTATGACGCTGGTGGAGAGTTCCCCCGGCAGGGATGTGAAGGATGTGAAATGGCGACGGACTTCACCGCA' +\
'TGAGGCTCCACCAACCACGGGGATACTGTCGCTCTATAACCGTGGCGATCGCCGTCGCTGGTACTGGCCCTGTCCACACT' +\
'GTGGTGAGTATTTTCAGCCCTGCGGCGATGTGGTTGCTGGTTTCCGTGATATTGCCGATCCCGTGCTGGCAAGTGAGGCG' +\
'GCTTATATTCAGTGTCCTTCTGGCGACGGACTTCACGCGTCAGCCCGGAAGTGGTCAGTCGCCTGAGTCCGAACAAAAAT'
def window(seq, n=2):
"Returns a sliding window (of width n) over data from the iterable"
" s -> (s0,s1,...s[n-1]), (s1,s2,...,sn), ... "
# from https://docs.python.org/release/2.3.5/lib/itertools-example.html
it = iter(seq)
result = tuple(islice(it, n))
if len(result) == n:
yield ''.join(result)
for elem in it:
result = result[1:] + (elem,)
yield ''.join(result)
def hamming_distance(s1, s2):
if len(s1) != len(s2):
raise ValueError("Undefined for sequences of unequal length")
return sum(ch1 != ch2 for ch1, ch2 in zip(s1, s2))
k = len(short_seq)
locations = defaultdict(list)
similarities = defaultdict(set)
for start, subseq in enumerate(window(long_seq, k)):
locations[subseq].append(start)
similarity = hamming_distance(subseq, short_seq) # substitute with your own similarity function
similarities[similarity].add(subseq)
with open(r'stack46268997.txt', 'w') as f:
for similarity in sorted(similarities.keys()):
f.write("Sequence(s) which differ in {} base(s) from the short sequence:\n".format(similarity))
for subseq in similarities[similarity]:
f.write("{} at location(s) {}\n".format(subseq, ', '.join(map(str, locations[subseq]))))
f.write('\n')
This outputs the list of subsequences ordered by how close they are to the given sequence.
Sequence(s) which differ in 0 base(s) from the short sequence:
TGGCGACGGACTTCA at location(s) 300, 500
Sequence(s) which differ in 5 base(s) from the short sequence:
TGGCGATCGCCGTCG at location(s) 362
Sequence(s) which differ in 6 base(s) from the short sequence:
TGGCAACTACCTGAA at location(s) 86
TGGTGAGTATTTTCA at location(s) 401
TGGCGAGGGGGATGC at location(s) 191
Sequence(s) which differ in 7 base(s) from the short sequence:
ATGTGAAGGATGTGA at location(s) 283
AGGGGGATGCCTTCT at location(s) 196
TGACAACAACGTTTA at location(s) 53
CGCTGACGGATTATG at location(s) 154
TTATGACCGTTTTCC at location(s) 164
TGGTTGCTGGTTTCC at location(s) 430
TCGCGTCAGCCCGGA at location(s) 8
AGTCGCCTGAGTCCG at location(s) 30, 536
CGGCGATGTGGTTGC at location(s) 422
[... and so on...]
I also ran the script on a 50 MB FASTA file. On my machine, this took 42 seconds to compute the results and another 30 seconds to write out the results to a file (printing them out would have taken much longer!)
Some assumptions:
One deck of 52 cards is used
Picture cards count as 10
Aces count as 1 or 11
The order is not important (ie. Ace + Queen is the same as Queen + Ace)
I thought I would then just sequentially try all the possible combinations and see which ones add up to 21, but there are way too many ways to mix the cards (52! ways). This approach also does not take into account that order is not important nor does it account for the fact that there are only 4 maximum types of any one card (Spade, Club, Diamond, Heart).
Now I am thinking of the problem like this:
We have 11 "slots". Each of these slots can have 53 possible things inside them: 1 of 52 cards or no card at all. The reason it is 11 slots is because 11 cards is the maximum amount of cards that can be dealt and still add up to 21; more than 11 cards would have to add up to more than 21.
Then the "leftmost" slot would be incremented up by one and all 11 slots would be checked to see if they add up to 21 (0 would represent no card in the slot). If not, the next slot to the right would be incremented, and the next, and so on.
Once the first 4 slots contain the same "card" (after four increments, the first 4 slots would all be 1), the fifth slot could not be that number as well since there are 4 numbers of any type. The fifth slot would then become the next lowest number in the remaining available cards; in the case of four 1s, the fifth slot would become a 2 and so on.
How would you do approach this?
divide and conquer by leveraging the knowledge that if you have 13 and pick a 10 you only have to pick cards to sum to 3 left to look at ... be forwarned this solution might be slow(took about 180 seconds on my box... it is definately non-optimal) ..
def sum_to(x,cards):
if x == 0: # if there is nothing left to sum to
yield []
for i in range(1,12): # for each point value 1..11 (inclusive)
if i > x: break # if i is bigger than whats left we are done
card_v = 11 if i == 1 else i
if card_v not in cards: continue # if there is no more of this card
new_deck = cards[:] # create a copy of hte deck (we do not want to modify the original)
if i == 1: # one is clearly an ace...
new_deck.remove(11)
else: # remove the value
new_deck.remove(i)
# on the recursive call we need to subtract our recent pick
for result in sum_to(x-i,new_deck):
yield [i] + result # append each further combination to our solutions
set up your cards as follows
deck = []
for i in range(2,11): # two through ten (with 4 of each)
deck.extend([i]*4)
deck.extend([10]*4) #jacks
deck.extend([10]*4) #queens
deck.extend([10]*4) #kings
deck.extend([11]*4) # Aces
then just call your function
for combination in sum_to(21,deck):
print combination
unfortunately this does allow some duplicates to sneak in ...
in order to get unique entries you need to change it a little bit
in sum_to on the last line change it to
# sort our solutions so we can later eliminate duplicates
yield sorted([i] + result) # append each further combination to our solutions
then when you get your combinations you gotta do some deep dark voodoo style python
unique_combinations = sorted(set(map(tuple,sum_to(21,deck))),key=len,reverse=0)
for combo in unique_combinations: print combo
from this cool question i have learned the following (keep in mind in real play you would have the dealer and other players also removing from the same deck)
there are 416 unique combinations of a deck of cards that make 21
there are 300433 non-unique combinations!!!
the longest number of ways to make 21 are as follows
with 11 cards there are 1 ways
[(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3)]
with 10 cards there are 7 ways
with 9 cards there are 26 ways
with 8 cards there are 54 ways
with 7 cards there are 84 ways
with 6 cards there are 94 ways
with 5 cards there are 83 ways
with 4 cards there are 49 ways
with 3 cards there are 17 ways
with 2 cards there are 1 ways
[(10, 11)]
there are 54 ways in which all 4 aces are used in making 21!!
there are 106 ways of making 21 in which NO aces are used !!!
keep in mind these are often suboptimal plays (ie considering A,10 -> 1,10 and hitting )
Before worrying about the suits and different cards with value 10 lets figure out how many different value combinations resulting to 21 there are. For example 5, 5, 10, 1 is one such combination. The following function takes in limit which is the target value, start which indicates the lowest value that can be picked and used which is the list of picked values:
def combinations(limit, start, used):
# Base case
if limit == 0:
return 1
# Start iteration from lowest card to picked so far
# so that we're only going to pick cards 3 & 7 in order 3,7
res = 0
for i in range(start, min(12, limit + 1)):
# Aces are at index 1 no matter if value 11 or 1 is used
index = i if i != 11 else 1
# There are 16 cards with value of 10 (T, J, Q, K) and 4 with every
# other value
available = 16 if index == 10 else 4
if used[index] < available:
# Mark the card used and go through combinations starting from
# current card and limit lowered by the value
used[index] += 1
res += combinations(limit - i, i, used)
used[index] -= 1
return res
print combinations(21, 1, [0] * 11) # 416
Since we're interested about different card combinations instead of different value combinations the base case in above should be modified to return number of different card combinations that can be used to generate a value combination. Luckily that's quite easy task, Binomial coefficient can be used to figure out how many different combinations of k items can be picked from n items.
Once the number of different card combinations for each value in used is known they can be just multiplied with each other for the final result. So for the example of 5, 5, 10, 1 value 5 results to bcoef(4, 2) == 6, value 10 to bcoef(16, 1) == 16 and value 1 to bcoef(4, 1) == 4. For all the other values bcoef(x, 0) results to 1. Multiplying those values results to 6 * 16 * 4 == 384 which is then returned:
import operator
from math import factorial
def bcoef(n, k):
return factorial(n) / (factorial(k) * factorial(n - k))
def combinations(limit, start, used):
if limit == 0:
combs = (bcoef(4 if i != 10 else 16, x) for i, x in enumerate(used))
res = reduce(operator.mul, combs, 1)
return res
res = 0
for i in range(start, min(12, limit + 1)):
index = i if i != 11 else 1
available = 16 if index == 10 else 4
if used[index] < available:
used[index] += 1
res += combinations(limit - i, i, used)
used[index] -= 1
return res
print combinations(21, 1, [0] * 11) # 186184
So I decided to write the script that every possible viable hand can be checked. The total number comes out to be 188052. Since I checked every possible combination, this is the exact number (as opposed to an estimate):
import itertools as it
big_list = []
def deck_set_up(m):
special = {8:'a23456789TJQK', 9:'a23456789', 10:'a2345678', 11:'a23'}
if m in special:
return [x+y for x,y in list(it.product(special[m], 'shdc'))]
else:
return [x+y for x,y in list(it.product('a23456789TJQKA', 'shdc'))]
deck_dict = {'as':1,'ah':1,'ad':1,'ac':1,
'2s':2,'2h':2,'2d':2,'2c':2,
'3s':3,'3h':3,'3d':3,'3c':3,
'4s':4,'4h':4,'4d':4,'4c':4,
'5s':5,'5h':5,'5d':5,'5c':5,
'6s':6,'6h':6,'6d':6,'6c':6,
'7s':7,'7h':7,'7d':7,'7c':7,
'8s':8,'8h':8,'8d':8,'8c':8,
'9s':9,'9h':9,'9d':9,'9c':9,
'Ts':10,'Th':10,'Td':10,'Tc':10,
'Js':10,'Jh':10,'Jd':10,'Jc':10,
'Qs':10,'Qh':10,'Qd':10,'Qc':10,
'Ks':10,'Kh':10,'Kd':10,'Kc':10,
'As':11,'Ah':11,'Ad':11,'Ac':11}
stop_here = {2:'As', 3:'8s', 4:'6s', 5:'4h', 6:'3c', 7:'3s', 8:'2h', 9:'2s', 10:'2s', 11:'2s'}
for n in range(2,12): # n is number of cards in the draw
combos = it.combinations(deck_set_up(n), n)
stop_point = stop_here[n]
while True:
try:
pick = combos.next()
except:
break
if pick[0] == stop_point:
break
if n < 8:
if len(set([item.upper() for item in pick])) != n:
continue
if sum([deck_dict[card] for card in pick]) == 21:
big_list.append(pick)
print n, len(big_list) # Total number hands that can equal 21 is 188052
In the output, the the first column is the number of cards in the draw, and the second number is the cumulative count. So the number after "3" in the output is the total count of hands that equal 21 for a 2-card draw, and a 3-card draw. The lower case a is a low ace (1 point), and uppercase A is high ace. I have a line (the one with the set command), to make sure it throws out any hand that has a duplicate card.
The script takes 36 minutes to run. So there is definitely a trade-off between execution time, and accuracy. The "big_list" contains the solutions (i.e. every hand where the sum is 21)
>>>
================== RESTART: C:\Users\JBM\Desktop\bj3.py ==================
2 64
3 2100
4 14804
5 53296
6 111776
7 160132
8 182452
9 187616
10 188048
11 188052 # <-- This is the total count, as these numbers are cumulative
>>>
I have a min-heap code for Huffman coding which you can see here: http://rosettacode.org/wiki/Huffman_coding#Python
I'm trying to make a max-heap Shannon-Fano code which is similar to min-heap.
Here is a code:
from collections import defaultdict, Counter
import heapq, math
def _heappop_max(heap):
"""Maxheap version of a heappop."""
lastelt = heap.pop() # raises appropriate IndexError if heap is empty
if heap:
returnitem = heap[0]
heap[0] = lastelt
heapq._siftup_max(heap, 0)
return returnitem
return lastelt
def _heappush_max(heap, item):
"""Push item onto heap, maintaining the heap invariant."""
heap.append(item)
heapq._siftdown_max(heap, 0, len(heap)-1)
def sf_encode(symb2freq):
heap = [[wt, [sym, ""]] for sym, wt in symb2freq.items()]
heapq._heapify_max(heap)
while len(heap) > 1:
lo = _heappop_max(heap)
hi = _heappop_max(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
_heappush_max(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
print heap
return sorted(_heappop_max(heap)[1:], key=lambda p: (len(p[1]), p))
But i've got output like this:
Symbol Weight Shannon-Fano Code
! 1 1
3 1 01
: 1 001
J 1 0001
V 1 00001
z 1 000001
E 3 0000001
L 3 00000001
P 3 000000001
N 4 0000000001
O 4 00000000001
Am I right using heapq to implement Shannon-Fano coding? The problem in this string:
_heappush_max(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
and I don't understand how to fix it.
Expect output similar to Huffman encoding
Symbol Weight Huffman Code
2875 01
a 744 1001
e 1129 1110
h 606 0000
i 610 0001
n 617 0010
o 668 1000
t 842 1100
d 358 10100
l 326 00110
Added:
Well, I've tried to do this without heapq, but have unstopable recursion:
def sf_encode(iA, iB, maxP):
global tupleList, total_sf
global mid
maxP = maxP/float(2)
sumP = 0
for i in range(iA, iB):
tup = tupleList[i]
if sumP < maxP or i == iA: # top group
sumP += tup[1]/float(total_sf)
tupleList[i] = (tup[0], tup[1], tup[2] + '0')
mid = i
else: # bottom group
tupleList[i] = (tup[0], tup[1], tup[2] + '1')
print tupleList
if mid - 1 > iA:
sf_encode(iA, mid - 1, maxP)
if iB - mid > 0:
sf_encode(mid, iB, maxP)
return tupleList
In Shannon-Fano coding you need the following steps:
A Shannon–Fano tree is built according to a specification designed to
define an effective code table. The actual algorithm is simple:
For a given list of symbols, develop a corresponding list of
probabilities or frequency counts so that each symbol’s relative
frequency of occurrence is known.
Sort the lists of symbols according
to frequency, with the most frequently occurring symbols at the left
and the least common at the right.
Divide the list into two parts,
with the total frequency counts of the left part being as close to the
total of the right as possible.
The left part of the list is assigned
the binary digit 0, and the right part is assigned the digit 1. This
means that the codes for the symbols in the first part will all start
with 0, and the codes in the second part will all start with 1.
Recursively apply the steps 3 and 4 to each of the two halves,
subdividing groups and adding bits to the codes until each symbol has
become a corresponding code leaf on the tree.
So you will need code to sort (your input appears already sorted so you may be able to skip this), plus a recursive function that chooses the best partition and then recurses on the first and second halves of the list.
Once the list is sorted, the order of the elements never changes so there is no need to use heapq to do this style of encoding.