Polyalphabetic cipher in python, decode method doesn't work - python

I am trying to implement some basic ciphers in Python. The one I'm stuck on is a polyalphabetic cipher. I've seen other posts on here that use the same term for what appears to be a different cipher, so I'll specify I'm trying to implement what this textbook calls a polyalphabetic cipher in chapter 7, section 1. This involves digitizing the message, breaking it up into vectors of equal length, multiplying each one by a set matrix (that is invertible, mod 26), adding a shift vector, and reversing the digitization to get the encoded message out.
Using the example in the textbook, I'm trying to encode "HELP" with matrix [[3, 5], [1, 2]] and shift vector [2, 2], and it is encoding to "RRGR" as the book says it should. However, when I apply my decode method to "RRGR" to "HELO". In case more data is helpful, I'll add that when I encode and then decode the whole alphabet with the same matrix and shift vector I get "ABCDEFGGHJKLMNOPQRSTUVWWYZ".
My code is below (apologies for the lack of comments, this code isn't for anything important so I didn't bother):
import numpy as np
class Polyalphabetic:
def __init__(self, alphabet, vec_len, shift, mult):
self.alphabet = alphabet
self.vec_len = vec_len
self.mod = len(alphabet)
self.shift = np.array(shift)
self.mult = np.array(mult)
self.inv = np.linalg.inv(mult) % self.mod
def digitize(self, string):
return [alphabet.index(letter) for letter in string]
def undigitize(self, int_list):
return ''.join([alphabet[i] for i in int_list])
def encode(self, message):
digits = self.digitize(message)
output_vectors = []
for i in range(len(digits) // self.vec_len):
in_vec = digits[self.vec_len * i : self.vec_len * (i + 1)]
multed_vec = np.matmul(self.mult, in_vec)
shifted_vec = (multed_vec + self.shift)
out_vec = shifted_vec.astype(int) % self.mod
output_vectors.append(out_vec)
encoded = np.concatenate(output_vectors)
return self.undigitize(encoded)
def decode(self, encoded_message):
digits = self.digitize(encoded_message)
output_vectors = []
for i in range(len(digits) // self.vec_len):
in_vec = digits[self.vec_len * i : self.vec_len * (i + 1)]
shifted_vec = (in_vec - self.shift)
multed_vec = np.matmul(self.inv, shifted_vec)
out_vec = multed_vec.astype(int) % self.mod
output_vectors.append(out_vec)
decoded = np.concatenate(output_vectors)
return self.undigitize(decoded)
alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
b = [2, 2]
A = [
[3, 5],
[1, 2]
]
Poly = Polyalphabetic(alphabet, 2, b, A)
print(Poly.encode(alphabet))
# Output is "HEXKNQDWTCJIZOPUFAVGLMBSRY"
print(Poly.decode(Poly.encode(alphabet)))
# Output is "ABCDEFGGHJKLMNOPQRSTUVWWYZ"

I've figured out a solution, though I'm still not quite sure why it was broken in the first place. I replaced the line out_vec = multed_vec.astype(int) % self.mod with out_vec = np.rint(multed_vec).astype(int) % self.mod because for some reason numpy's array.astype(int) was sometimes rounding down what should have been integers. My guess is that it was a floating point error and numpy was displaying something slightly under 15 (the P in HELP) as 15. but when that was cast to an integer it dropped the fractional bits and got 14 (the O in HELO).
Follow up with a solution to another problem I faced. My code as it stood required that the multiplier matrix have determinant 1 or -1 in order to have an inverse with all integer entries, because getting the inverse involves dividing by the determinant. To fix this, my first approach was to multiply the inverse by the determinant of the multiplier, but then it isn't an inverse of the multiplier at all because the entries are too big. To fix that, I had to multiply it by the modular multiplicative inverse of the determinant (the number i such that det(A) * i = 1 (mod 26)). This gets the inverse back to actually being an inverse of the multiplier, but without reintroducing the fractions involved in the initial calculation of the inverse.

Related

Could I get a clarification to this Python code below?

I'm a beginner to Python and I'm trying to calculate the angles (-26.6 &18.4) for this figure below and so on for the rest of the squares by using Python code.
I have found the code below and I'm trying to understand very well. How could it work here? Any clarification, please?
Python Code:
def computeDegree(a,b,c):
babc = (a[0]-b[0])*(c[0]-b[0])+(a[1]-b[1])*(c[1]-b[1])
norm_ba = math.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)
norm_bc = math.sqrt((c[0]-b[0])**2 + (c[1]-b[1])**2)
norm_babc = norm_ba * norm_bc
radian = math.acos(babc/norm_babc)
degree = math.degrees(radian)
return round(degree, 1)
def funcAngle(p, s, sn):
a = (s[0]-p[0], s[1]-p[1])
b = (sn[0]-p[0], sn[1]-p[1])
c = a[0] * b[1] - a[1] * b[0]
if p != sn:
d = computeDegree(s, p, sn)
else:
d = 0
if c > 0:
result = d
elif c < 0:
result = -d
elif c == 0:
result = 0
return result
p = (1,4)
s = (2,2)
listSn= ((1,2),(2,3),(3,2),(2,1))
for sn in listSn:
func(p,s,sn)
The results
I expected to get the angles in the picture such as -26.6, 18.4 ...
Essentially, this uses the definition of dot products to solve for the angle. You can read more it at this link (also where I found these images).
To solve for the angle you first need to convert your 3 input points into two vectors.
# Vector from b to a
# BA = (a[0] - b[0], a[1] - b[1])
BA = a - b
# Vector from b to c
# BC = (a[0] - c[0], a[1] - c[1])
BC = c - b
Using the two vectors you can then find the angle between them by first finding the value of the dot product with the second formula.
# babc = (a[0]-b[0])*(c[0]-b[0])+(a[1]-b[1])*(c[1]-b[1])
dot_product = BA[0] * BC[0] + BA[1] * BC[1]
Then by going back to the first definition, you can divide off the lengths of the two input vectors and the resulting value should be the cosine of the angle between the vectors. It may be hard to read with the array notation but its just using the Pythagoras theorem.
# Length/magnitude of vector BA
# norm_ba = math.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)
length_ba = math.sqrt(BA[0]**2 + BA[1]**2)
# Length/magnitude of vector BC
# norm_bc = math.sqrt((c[0]-b[0])**2 + (c[1]-b[1])**2)
length_bc = math.sqrt(BC[0]**2 + BC[1]**2)
# Then using acos (essentially inverse of cosine), you can get the angle
# radian = math.acos(babc/norm_babc)
angle = Math.acos(dot_product / (length_ba * length_bc))
Most of the other stuff is just there to catch cases where the program might accidentally try to divide by zero. Hopefully this helps to explain why it looks the way it does.
Edit: I answered this question because I was bored and didn't see harm in explaining the math behind that code, however in the future try to avoid asking questions like 'how does this code work' in the future.
Let's start with funcAngle since it calls computeDegree later.
The first thing it does is define a as a two item tuple. A lot of this code seems to use two item tuples, with the two parts referenced by v[0] and v[1] or similar. These are almost certainly two dimensional vectors of some sort.
I'm going to write these as 𝐯 for the vector and vβ‚“ and vᡧ since they're probably the two components.
[don't look too closely at that second subscript, it's totally a y and not a gamma...]
a is the vector difference between s and p: i.e.
a = (s[0]-p[0], s[1]-p[1])
is aβ‚“=sβ‚“-pβ‚“ and aᡧ=sᡧ-pᡧ; or just 𝐚=𝐬-𝐩 in vector.
b = (sn[0]-p[0], sn[1]-p[1])
again; 𝐛=𝐬𝐧-𝐩
c = a[0] * b[1] - a[1] * b[0]
c=aβ‚“bᡧ-aᡧbβ‚“; c is the cross product of 𝐚 and 𝐛 (and is just a number)
if p != sn:
d = computeDegree(s, p, sn)
else:
d = 0
I'd take the above in reverse: if 𝐩 and 𝐬𝐧 are the same, then we already know the angle between them is zero (and it's possible the algorithm fails badly) so don't compute it. Otherwise, compute the angle (we'll look at that later).
if c > 0:
result = d
elif c < 0:
result = -d
elif c == 0:
result = 0
If c is pointing in the normal direction (via the left hand rule? right hand rule? can't remember) that's fine: if it isn't, we need to negate the angle, apparently.
return result
Pass the number we've just worked out to some other code.
You can probably invoke this code by adding something like:
print (funcangle((1,0),(0,1),(2,2))
at the end and running it. (Haven't actually tested these numbers)
So this function works out a and b to get c; all just to negate the angle if it's pointing the wrong way. None of these variables are actually passed to computeDegree.
so, computeDegree():
def computeDegree(a,b,c):
First thing to note is that the variables from before have been renamed. funcAngle passed s, p and sn, but now they're called a, b and c. And the note the order they're passed in isn't the same as they're passed to funcAngle, which is nasty and confusing.
babc = (a[0]-b[0])*(c[0]-b[0])+(a[1]-b[1])*(c[1]-b[1])
babc = (aβ‚“-bβ‚“)(cβ‚“-bβ‚“)+(aᡧ-bᡧ)(cᡧ-bᡧ)
If 𝐚' and 𝐜' are 𝐚-𝐛 and 𝐜-𝐛 respectively, this is just
a'β‚“c'β‚“+a'ᡧc'ᡧ, or the dot product of 𝐚' and 𝐜'.
norm_ba = math.sqrt((a[0]-b[0])**2 + (a[1]-b[1])**2)
norm_bc = math.sqrt((c[0]-b[0])**2 + (c[1]-b[1])**2)
norm_ba = √[(aβ‚“-bβ‚“)Β² + (aᡧ-bᡧ)Β²] (and norm_bc likewise).
This looks like the length of the hypotenuse of 𝐚' (and 𝐜' respectively)
norm_babc = norm_ba * norm_bc
which we then multiply together
radian = math.acos(babc/norm_babc)
We use the arccosine (inverse cosine, cos^-1) function, with the length of those multiplied hypotenuses as the hypotenuse and that dot product as the adjacent length...
degree = math.degrees(radian)
return round(degree, 1)
but that's in radians, so we convert to degrees and round it for nice formatting.
Ok, so now it's in maths, rather than Python, but that's still not very easy to understand.
(sidenote: this is why descriptive variable names and documentation is everyone's friend!)

Exact math with big numbers in Python 3

I'm trying to implement a system for encryption similar to Shamir's Secret Sharing using Python. Essentially, I have code that will generate a list of points that can be used to find a password at the y-intercept of the gradient formed by these points. The password is a number in ASCII (using two digits per ASCII character), thus is gets to be a pretty big number with larger passwords. For example, the password ThisIsAPassword will generate a list of points that looks like this:
x y
9556 66707086867915126140753213946756441607861037300900
4083 28502040182447127964404994111341362715565457349000
9684 67600608880657662915204624898507424633297513499300
9197 64201036847801292531159022293017356403707170463200
To be clear, these points are generated upon a randomly chosen slope (this is fine since it's the y-intercept that matters).
The problem arises in trying to make a program to decode a password. Using normal math, Python is unable to accurately find the password because of the size of the numbers. Here's the code I have:
def findYint(x,y):
slope = (y[1] - y[0]) / (x[1] - x[0])
yint = int(y[0] - slope * x[0])
return yint
def asciiToString(num):
chars = [num[i:i+3] for i in range(0, len(num), 3)]
return ''.join(chr(int(i)) for i in chars)
def main():
fi = open('pass.txt','r')
x,y = [], []
for i in fi:
row = i.split()
x.append(int(row[0]))
y.append(int(row[1]))
fi.close()
yint = findYint(x,y)
pword = asciiToString(str(yint))
print(pword)
main()
Output (with the password "ThisIsAPassword"):
Ν‰)3 Η’ΞœΔ©Ε©Δ‡Β»Β’Η”ΒΌ
Typically my code will work with shorter passwords such as "pass" or "word", but the bigger numbers presumably aren't computed with the exact accuracy needed to convert them into ASCII. Any solutions for using either precise math or something else?
Also here's the code for generating points in case it's important:
import random
def encryptWord(word):
numlist = []
for i in range(len(word)):
numlist.append(str(ord(word[i])).zfill(3))
num = int("".join(numlist))
return num
def createPoints(pwd, pts):
yint = pwd
gradient = pwd*random.randint(10,100)
xvals = []
yvals = []
for i in range(pts):
n = random.randint(1000,10000)
xvals.append(n)
yvals.append(((n) * gradient) + pwd)
return xvals, yvals
def main():
pword = input("Enter a password to encrypt: ")
pword = encryptWord(pword)
numpoints = int(input("How many points to generate? "))
if numpoints < 2:
numpoints = 2
xpts, ypts = createPoints(pword, numpoints)
fi = open("pass.txt","w")
for i in range(len(xpts)):
fi.write(str(xpts[i]))
fi.write(' ')
fi.write(str(ypts[i]))
fi.write('\n')
fi.close()
print("Sent to file (pass.txt)")
main()
As you may know, Python's built-in int type can handle arbitrarily large integers, but the float type which has limited precision. The only part of your code which deals with numbers that aren't ints seems to be this function:
def findYint(x,y):
slope = (y[1] - y[0]) / (x[1] - x[0])
yint = int(y[0] - slope * x[0])
return yint
Here the division results in a float, even if the result would be exact as an int. Moreover, we can't safely do integer division here with the // operator, because slope will get multiplied by x[0] before the truncation is supposed to happen.
So either you need to do some algebra in order to get the same result using only ints, or you need to represent the fraction (y1 - y0) / (x1 - x0) with an exact non-integer number type instead of float. Fortunately, Python's standard library has a class named Fraction which will do what you want:
from fractions import Fraction
def findYint(x,y):
slope = Fraction(y[1] - y[0], x[1] - x[0])
yint = int(y[0] - slope * x[0])
return yint
It should be possible to do this only with integer-based math:
def findYint(x,y):
return (y[0] * (x[1] - x[0]) - (y[1] - y[0]) * x[0]) // (x[1] - x[0])
This way you avoid the floating point arithmetic and the precision constraints it has.
Fractions, and rewriting for all integer math are good.
For truly large integers, you may find yourself wanting https://pypi.org/project/gmpy/ instead of the builtin int type. I've successfully used it for testing for large primes.
Or if you really do want numbers with a decimal point, maybe try decimal.Decimal("1") - just for example.

Binary mask with shift operation without cycle

We have some large binary number N (large means millions of digits). We also have binary mask M where 1 means that we must remove digit in this position in number N and move all higher bits one position right.
Example:
N = 100011101110
M = 000010001000
Res 1000110110
Is it possible to solve this problem without cycle with some set of logical or arithmetical operations? We can assume that we have access to bignum arithmetic in Python.
Feels like it should be something like this:
Res = N - (N xor M)
But it doesn't work
UPD: My current solution with cycle is following:
def prepare_reduced_arrays(dict_of_N, mask):
'''
mask: string '0000011000'
each element of dict_of_N - big python integer
'''
capacity = len(mask)
answer = dict()
for el in dict_of_N:
answer[el] = 0
new_capacity = 0
for i in range(capacity - 1, -1, -1):
if mask[i] == '1':
continue
cap2 = (1 << new_capacity)
pos = (capacity - i - 1)
for el in dict_of_N:
current_bit = (dict_of_N[el] >> pos) & 1
if current_bit:
answer[el] |= cap2
new_capacity += 1
return answer, new_capacity
While this may not be possible without a loop in python, it can be made extremely fast with numba and just in time compilation. I went on the assumption that your inputs could be easily represented as boolean arrays, which would be very simple to construct from a binary file using struct. The method I have implemented involves iterating a few different objects, however these iterations were chosen carefully to make sure they were compiler optimized, and never doing the same work twice. The first iteration is using np.where to locate the indices of all the bits to delete. This specific function (among many others) is optimized by the numba compiler. I then use this list of bit indices to build the slice indices for slices of bits to keep. The final loop copies these slices to an empty output array.
import numpy as np
from numba import jit
from time import time
def binary_mask(num, mask):
num_nbits = num.shape[0] #how many bits are in our big num
mask_bits = np.where(mask)[0] #which bits are we deleting
mask_n_bits = mask_bits.shape[0] #how many bits are we deleting
start = np.empty(mask_n_bits + 1, dtype=int) #preallocate array for slice start indexes
start[0] = 0 #first slice starts at 0
start[1:] = mask_bits + 1 #subsequent slices start 1 after each True bit in mask
end = np.empty(mask_n_bits + 1, dtype=int) #preallocate array for slice end indexes
end[:mask_n_bits] = mask_bits #each slice ends on (but does not include) True bits in the mask
end[mask_n_bits] = num_nbits + 1 #last slice goes all the way to the end
out = np.empty(num_nbits - mask_n_bits, dtype=np.uint8) #preallocate return array
for i in range(mask_n_bits + 1): #for each slice
a = start[i] #use local variables to reduce number of lookups
b = end[i]
c = a - i
d = b - i
out[c:d] = num[a:b] #copy slices
return out
jit_binary_mask = jit("b1[:](b1[:], b1[:])")(binary_mask) #decorator without syntax sugar
###################### Benchmark ########################
bignum = np.random.randint(0,2,1000000, dtype=bool) # 1 million random bits
bigmask = np.random.randint(0,10,1000000, dtype=np.uint8)==9 #delete about 1 in 10 bits
t = time()
for _ in range(10): #10 cycles of just numpy implementation
out = binary_mask(bignum, bigmask)
print(f"non-jit: {time()-t} seconds")
t = time()
out = jit_binary_mask(bignum, bigmask) #once ahead of time to compile
compile_and_run = time() - t
t = time()
for _ in range(10): #10 cycles of compiled numpy implementation
out = jit_binary_mask(bignum, bigmask)
jit_runtime = time()-t
print(f"jit: {jit_runtime} seconds")
print(f"estimated compile_time: {compile_and_run - jit_runtime/10}")
In this example, I execute the benchmark on a boolean array of length 1,000,000 a total of 10 times for both the compiled and un-compiled version. On my laptop, the output is:
non-jit: 1.865583896636963 seconds
jit: 0.06370806694030762 seconds
estimated compile_time: 0.1652850866317749
As you can see with a simple algorithm like this, very significant performance gains can be seen from compilation. (in my case about 20-30x speedup)
As far as I know, this can be done without the use of loops if and only if M is a power of 2.
Let's take your example, and modify M so that it is a power of 2:
N = 0b100011101110 = 2286
M = 0b000000001000 = 8
Removing the fourth lowest bit from N and shifting the higher bits to the right would result in:
N = 0b10001110110 = 1142
We achieved this using the following algorithm:
Begin with N = 0b100011101110 = 2286
Iterate from the most-significant bit to the least-significant bit in M.
If the current bit in M is set to 1, then store the lower bits in some variable, x:
x = 0b1101110
Then, subtract every bit up to and including the current bit in M from N, so that we end up with the following:
N - (0b10000000 + x) = N - (0b10000000 + 0b1101110) = 0b100011101110 - 0b11101110 = 0b100000000000
This step can also be achieved by and-ing the bits with 0, which may be more efficient.
Next, we shift the result once to the right:
0b100000000000 >> 1 = 0b10000000000
Finally, we add back x to the shifted result:
0b10000000000 + x = 0b10000000000 + 0b1101110 = 0b10001101110 = 1142
There may be a possibility that this can somehow be done without loops, but it would actually be efficient if you were to simply iterate over M (from the most-significant bit to the least-significant bit) and performed this process on every set bit, as the time complexity would be O(M.bit_length()).
I wrote up the code for this algorithm as well, and I believe it's relatively efficient, but I don't have any big binary numbers to test it with:
def remove_bits(N, M):
bit = 2 ** (M.bit_length() - 1)
while bit != 0:
if M & bit:
ones = bit - 1
# Store lower `bit` bits.
temp = N & ones
# Clear lower `bit` bits.
N &= ~ones
# Shift once to the right.
N >>= 1
# Set stored lower `bit` bits.
N |= temp
bit >>= 1
return N
if __name__ == '__main__':
N = 0b100011101110
M = 0b000010001000
print(bin(remove_bits(N, M)))
Using your example, this returns your result: 0b1000110110
I don't think there's any way to do this in a constant number of calls to the built-in bitwise operators. Python would have to provide something like PEXT for that to be possible.
For literally millions of digits, you may actually get best performance by working in terms of sequences of bits, sacrificing the space advantages of Python ints and the time advantages of bitwise operations in favor of more flexibility in the operations you can perform. I don't know where the break-even point would be:
import itertools
bits = bin(N)[2:]
maskbits = bin(M)[2:].zfill(len(bits))
bits = bits.zfill(len(maskbits))
chosenbits = itertools.compress(bits, map('0'.__eq__, maskbits))
result = int(''.join(chosenbits), 2)

Generate equation with the result value closest to the requested one, have speed problems

I am writing some quiz game and need computer to solve 1 game in the quiz if players fail to solve it.
Given data :
List of 6 numbers to use, for example 4, 8, 6, 2, 15, 50.
Targeted value, where 0 < value < 1000, for example 590.
Available operations are division, addition, multiplication and division.
Parentheses can be used.
Generate mathematical expression which evaluation is equal, or as close as possible, to the target value. For example for numbers given above, expression could be : (6 + 4) * 50 + 15 * (8 - 2) = 590
My algorithm is as follows :
Generate all permutations of all the subsets of the given numbers from (1) above
For each permutation generate all parenthesis and operator combinations
Track the closest value as algorithm runs
I can not think of any smart optimization to the brute-force algorithm above, which will speed it up by the order of magnitude. Also I must optimize for the worst case, because many quiz games will be run simultaneously on the server.
Code written today to solve this problem is (relevant stuff extracted from the project) :
from operator import add, sub, mul, div
import itertools
ops = ['+', '-', '/', '*']
op_map = {'+': add, '-': sub, '/': div, '*': mul}
# iterate over 1 permutation and generates parentheses and operator combinations
def iter_combinations(seq):
if len(seq) == 1:
yield seq[0], str(seq[0])
else:
for i in range(len(seq)):
left, right = seq[:i], seq[i:] # split input list at i`th place
# generate cartesian product
for l, l_str in iter_combinations(left):
for r, r_str in iter_combinations(right):
for op in ops:
if op_map[op] is div and r == 0: # cant divide by zero
continue
else:
yield op_map[op](float(l), r), \
('(' + l_str + op + r_str + ')')
numbers = [4, 8, 6, 2, 15, 50]
target = best_value = 590
best_item = None
for i in range(len(numbers)):
for current in itertools.permutations(numbers, i+1): # generate perms
for value, item in iter_combinations(list(current)):
if value < 0:
continue
if abs(target - value) < best_value:
best_value = abs(target - value)
best_item = item
print best_item
It prints : ((((4*6)+50)*8)-2). Tested it a little with different values and it seems to work correctly. Also I have a function to remove unnecessary parenthesis but it is not relevant to the question so it is not posted.
Problem is that this runs very slowly because of all this permutations, combinations and evaluations. On my mac book air it runs for a few minutes for 1 example. I would like to make it run in a few seconds tops on the same machine, because many quiz game instances will be run at the same time on the server. So the questions are :
Can I speed up current algorithm somehow (by orders of magnitude)?
Am I missing on some other algorithm for this problem which would run much faster?
You can build all the possible expression trees with the given numbers and evalate them. You don't need to keep them all in memory, just print them when the target number is found:
First we need a class to hold the expression. It is better to design it to be immutable, so its value can be precomputed. Something like this:
class Expr:
'''An Expr can be built with two different calls:
-Expr(number) to build a literal expression
-Expr(a, op, b) to build a complex expression.
There a and b will be of type Expr,
and op will be one of ('+','-', '*', '/').
'''
def __init__(self, *args):
if len(args) == 1:
self.left = self.right = self.op = None
self.value = args[0]
else:
self.left = args[0]
self.right = args[2]
self.op = args[1]
if self.op == '+':
self.value = self.left.value + self.right.value
elif self.op == '-':
self.value = self.left.value - self.right.value
elif self.op == '*':
self.value = self.left.value * self.right.value
elif self.op == '/':
self.value = self.left.value // self.right.value
def __str__(self):
'''It can be done smarter not to print redundant parentheses,
but that is out of the scope of this problem.
'''
if self.op:
return "({0}{1}{2})".format(self.left, self.op, self.right)
else:
return "{0}".format(self.value)
Now we can write a recursive function that builds all the possible expression trees with a given set of expressions, and prints the ones that equals our target value. We will use the itertools module, that's always fun.
We can use itertools.combinations() or itertools.permutations(), the difference is in the order. Some of our operations are commutative and some are not, so we can use permutations() and assume we will get many very simmilar solutions. Or we can use combinations() and manually reorder the values when the operation is not commutative.
import itertools
OPS = ('+', '-', '*', '/')
def SearchTrees(current, target):
''' current is the current set of expressions.
target is the target number.
'''
for a,b in itertools.combinations(current, 2):
current.remove(a)
current.remove(b)
for o in OPS:
# This checks whether this operation is commutative
if o == '-' or o == '/':
conmut = ((a,b), (b,a))
else:
conmut = ((a,b),)
for aa, bb in conmut:
# You do not specify what to do with the division.
# I'm assuming that only integer divisions are allowed.
if o == '/' and (bb.value == 0 or aa.value % bb.value != 0):
continue
e = Expr(aa, o, bb)
# If a solution is found, print it
if e.value == target:
print(e.value, '=', e)
current.add(e)
# Recursive call!
SearchTrees(current, target)
# Do not forget to leave the set as it were before
current.remove(e)
# Ditto
current.add(b)
current.add(a)
And then the main call:
NUMBERS = [4, 8, 6, 2, 15, 50]
TARGET = 590
initial = set(map(Expr, NUMBERS))
SearchTrees(initial, TARGET)
And done! With these data I'm getting 719 different solutions in just over 21 seconds! Of course many of them are trivial variations of the same expression.
24 game is 4 numbers to target 24, your game is 6 numbers to target x (0 < x < 1000).
That's much similar.
Here is the quick solution, get all results and print just one in my rMBP in about 1-3s, I think one solution print is ok in this game :), I will explain it later:
def mrange(mask):
#twice faster from Evgeny Kluev
x = 0
while x != mask:
x = (x - mask) & mask
yield x
def f( i ) :
global s
if s[i] :
#get cached group
return s[i]
for x in mrange(i & (i - 1)) :
#when x & i == x
#x is a child group in group i
#i-x is also a child group in group i
fk = fork( f(x), f(i-x) )
s[i] = merge( s[i], fk )
return s[i]
def merge( s1, s2 ) :
if not s1 :
return s2
if not s2 :
return s1
for i in s2 :
#print just one way quickly
s1[i] = s2[i]
#combine all ways, slowly
# if i in s1 :
# s1[i].update(s2[i])
# else :
# s1[i] = s2[i]
return s1
def fork( s1, s2 ) :
d = {}
#fork s1 s2
for i in s1 :
for j in s2 :
if not i + j in d :
d[i + j] = getExp( s1[i], s2[j], "+" )
if not i - j in d :
d[i - j] = getExp( s1[i], s2[j], "-" )
if not j - i in d :
d[j - i] = getExp( s2[j], s1[i], "-" )
if not i * j in d :
d[i * j] = getExp( s1[i], s2[j], "*" )
if j != 0 and not i / j in d :
d[i / j] = getExp( s1[i], s2[j], "/" )
if i != 0 and not j / i in d :
d[j / i] = getExp( s2[j], s1[i], "/" )
return d
def getExp( s1, s2, op ) :
exp = {}
for i in s1 :
for j in s2 :
exp['('+i+op+j+')'] = 1
#just print one way
break
#just print one way
break
return exp
def check( s ) :
num = 0
for i in xrange(target,0,-1):
if i in s :
if i == target :
print numbers, target, "\nFind ", len(s[i]), 'ways'
for exp in s[i]:
print exp, ' = ', i
else :
print numbers, target, "\nFind nearest ", i, 'in', len(s[i]), 'ways'
for exp in s[i]:
print exp, ' = ', i
break
print '\n'
def game( numbers, target ) :
global s
s = [None]*(2**len(numbers))
for i in xrange(0,len(numbers)) :
numbers[i] = float(numbers[i])
n = len(numbers)
for i in xrange(0,n) :
s[2**i] = { numbers[i]: {str(numbers[i]):1} }
for i in xrange(1,2**n) :
#we will get the f(numbers) in s[2**n-1]
s[i] = f(i)
check(s[2**n-1])
numbers = [4, 8, 6, 2, 2, 5]
s = [None]*(2**len(numbers))
target = 590
game( numbers, target )
numbers = [1,2,3,4,5,6]
target = 590
game( numbers, target )
Assume A is your 6 numbers list.
We define f(A) is all result that can calculate by all A numbers, if we search f(A), we will find if target is in it and get answer or the closest answer.
We can split A to two real child groups: A1 and A-A1 (A1 is not empty and not equal A) , which cut the problem from f(A) to f(A1) and f(A-A1). Because we know f(A) = Union( a+b, a-b, b-a, a*b, a/b(b!=0), b/a(a!=0) ), which a in A, b in A-A1.
We use fork f(A) = Union( fork(A1,A-A1) ) stands for such process. We can remove all duplicate value in fork(), so we can cut the range and make program faster.
So, if A = [1,2,3,4,5,6], then f(A) = fork( f([1]),f([2,3,4,5,6]) ) U ... U fork( f([1,2,3]), f([4,5,6]) ) U ... U stands for Union.
We will see f([2,3,4,5,6]) = fork( f([2,3]), f([4,5,6]) ) U ... , f([3,4,5,6]) = fork( f([3]), f([4,5,6]) ) U ..., the f([4,5,6]) used in both.
So if we can cache every f([...]) the program can be faster.
We can get 2^len(A) - 2 (A1,A-A1) in A. We can use binary to stands for that.
For example: A = [1,2,3,4,5,6], A1 = [1,2,3], then binary 000111(7) stands for A1. A2 = [1,3,5], binary 010101(21) stands for A2. A3 = [1], then binary 000001(1) stands for A3...
So we get a way stands for all groups in A, we can cache them and make all process faster!
All combinations for six number, four operations and parenthesis are up to 5 * 9! at least. So I think you should use some AI algorithm. Using genetic programming or optimization seems to be the path to follow.
In the book Programming Collective Intelligence in the chapter 11 Evolving Intelligence you will find exactly what you want and much more. That chapter explains how to find a mathematical function combining operations and numbers (as you want) to match a result. You will be surprised how easy is such task.
PD: The examples are written using Python.
I would try using an AST at least it will
make your expression generation part easier
(no need to mess with brackets).
http://en.wikipedia.org/wiki/Abstract_syntax_tree
1) Generate some tree with N nodes
(N = the count of numbers you have).
I've read before how many of those you
have, their size is serious as N grows.
By serious I mean more than polynomial to say the least.
2) Now just start changing the operations
in the non-leaf nodes and keep evaluating
the result.
But this is again backtracking and too much degree of freedom.
This is a computationally complex task you're posing. I believe if you
ask the question as you did: "let's generate a number K on the output
such that |K-V| is minimal" (here V is the pre-defined desired result,
i.e. 590 in your example) , then I guess this problem is even NP-complete.
Somebody please correct me if my intuition is lying to me.
So I think even the generation of all possible ASTs (assuming only 1 operation
is allowed) is NP complete as their count is not polynomial. Not to talk that more
than 1 operation is allowed here and not to talk of the minimal difference requirement (between result and desired result).
1. Fast entirely online algorithm
The idea is to search not for a single expression for target value,
but for an equation where target value is included in one part of the equation and
both parts have almost equal number of operations (2 and 3).
Since each part of the equation is relatively small, it does not take much time to
generate all possible expressions for given input values.
After both parts of equation are generated it is possible to scan a pair of sorted arrays
containing values of these expressions and find a pair of equal (or at least best matching)
values in them. After two matching values are found we could get corresponding expressions and
join them into a single expression (in other words, solve the equation).
To join two expression trees together we could descend from the root of one tree
to "target" leaf, for each node on this path invert corresponding operation
('*' to '/', '/' to '*' or '/', '+' to '-', '-' to '+' or '-'), and move "inverted"
root node to other tree (also as root node).
This algorithm is faster and easier to implement when all operations are invertible.
So it is best to use with floating point division (as in my implementation) or with
rational division. Truncating integer division is most difficult case because it produces same result for different inputs (42/25=1 and 25/25 is also 1). With zero-remainder integer division this algorithm gives result almost instantly when exact result is available, but needs some modifications to work correctly when approximate result is needed.
See implementation on Ideone.
2. Even faster approach with off-line pre-processing
As noticed by #WolframH, there are not so many possible input number combinations.
Only 3*3*(49+4-1) = 4455 if repetitions are possible.
Or 3*3*(49) = 1134 without duplicates. Which allows us to pre-process
all possible inputs off-line, store results in compact form, and when some particular result
is needed quickly unpack one of pre-processed values.
Pre-processing program should take array of 6 numbers and generate values for all possible
expressions. Then it should drop out-of-range values and find nearest result for all cases
where there is no exact match. All this could be performed by algorithm proposed by #Tim.
His code needs minimal modifications to do it. Also it is the fastest alternative (yet).
Since pre-processing is offline, we could use something better than interpreted Python.
One alternative is PyPy, other one is to use some fast interpreted language. Pre-processing
all possible inputs should not take more than several minutes.
Speaking about memory needed to store all pre-processed values, the only problem are the
resulting expressions. If stored in string form they will take up to 4455*999*30 bytes or 120Mb.
But each expression could be compressed. It may be represented in postfix notation like this:
arg1 arg2 + arg3 arg4 + *. To store this we need 10 bits to store all arguments' permutations,
10 bits to store 5 operations, and 8 bits to specify how arguments and operations are
interleaved (6 arguments + 5 operations - 3 pre-defined positions: first two are always
arguments, last one is always operation). 28 bits per tree or 4 bytes, which means it is only
20Mb for entire data set with duplicates or 5Mb without them.
3. Slow entirely online algorithm
There are some ways to speed up algorithm in OP:
Greatest speed improvement may be achieved if we avoid trying each commutative operation twice and make recursion tree less branchy.
Some optimization is possible by removing all branches where the result of division operation is zero.
Memorization (dynamic programming) cannot give significant speed boost here, still it may be useful.
After enhancing OP's approach with these ideas, approximately 30x speedup is achieved:
from itertools import combinations
numbers = [4, 8, 6, 2, 15, 50]
target = best_value = 590
best_item = None
subsets = {}
def get_best(value, item):
global best_value, target, best_item
if value >= 0 and abs(target - value) < best_value:
best_value = abs(target - value)
best_item = item
return value, item
def compare_one(value, op, left, right):
item = ('(' + left + op + right + ')')
return get_best(value, item)
def apply_one(left, right):
yield compare_one(left[0] + right[0], '+', left[1], right[1])
yield compare_one(left[0] * right[0], '*', left[1], right[1])
yield compare_one(left[0] - right[0], '-', left[1], right[1])
yield compare_one(right[0] - left[0], '-', right[1], left[1])
if right[0] != 0 and left[0] >= right[0]:
yield compare_one(left[0] / right[0], '/', left[1], right[1])
if left[0] != 0 and right[0] >= left[0]:
yield compare_one(right[0] / left[0], '/', right[1], left[1])
def memorize(seq):
fs = frozenset(seq)
if fs in subsets:
for x in subsets[fs].items():
yield x
else:
subsets[fs] = {}
for value, item in try_all(seq):
subsets[fs][value] = item
yield value, item
def apply_all(left, right):
for l in memorize(left):
for r in memorize(right):
for x in apply_one(l, r):
yield x;
def try_all(seq):
if len(seq) == 1:
yield get_best(numbers[seq[0]], str(numbers[seq[0]]))
for length in range(1, len(seq)):
for x in combinations(seq[1:], length):
for value, item in apply_all(list(x), list(set(seq) - set(x))):
yield value, item
for x, y in try_all([0, 1, 2, 3, 4, 5]): pass
print best_item
More speed improvements are possible if you add some constraints to the problem:
If integer division is only possible when the remainder is zero.
If all intermediate results are to be non-negative and/or below 1000.
Well I don't will give up. Following the line of all the answers to your question I come up with another algorithm. This algorithm gives the solution with a time average of 3 milliseconds.
#! -*- coding: utf-8 -*-
import copy
numbers = [4, 8, 6, 2, 15, 50]
target = 590
operations = {
'+': lambda x, y: x + y,
'-': lambda x, y: x - y,
'*': lambda x, y: x * y,
'/': lambda x, y: y == 0 and 1e30 or x / y # Handle zero division
}
def chain_op(target, numbers, result=None, expression=""):
if len(numbers) == 0:
return (expression, result)
else:
for choosen_number in numbers:
remaining_numbers = copy.copy(numbers)
remaining_numbers.remove(choosen_number)
if result is None:
return chain_op(target, remaining_numbers, choosen_number, str(choosen_number))
else:
incomming_results = []
for key, op in operations.items():
new_result = op(result, choosen_number)
new_expression = "%s%s%d" % (expression, key, choosen_number)
incomming_results.append(chain_op(target, remaining_numbers, new_result, new_expression))
diff = 1e30
selected = None
for exp_result in incomming_results:
exp, res = exp_result
if abs(res - target) < diff:
diff = abs(res - target)
selected = exp_result
if diff == 0:
break
return selected
if __name__ == '__main__':
print chain_op(target, numbers)
Erratum: This algorithm do not include the solutions containing parenthesis. It always hits the target or the closest result, my bad. Still is pretty fast. It can be adapted to support parenthesis without much work.
Actually there are two things that you can do to speed up the time to milliseconds.
You are trying to find a solution for given quiz, by generating the numbers and the target number. Instead you can generate the solution and just remove the operations. You can build some thing smart that will generate several quizzes and choose the most interesting one, how ever in this case you loose the as close as possible option.
Another way to go, is pre-calculation. Solve 100 quizes, use them as build-in in your application, and generate new one on the fly, try to keep your quiz stack at 100, also try to give the user only the new quizes. I had the same problem in my bible games, and I used this method to speed thing up. Instead of 10 sec for question it takes me milliseconds as I am generating new question in background and always keeping my stack to 100.
What about Dynamic programming, because you need same results to calculate other options?

How to make a random but partial shuffle in Python?

Instead of a complete shuffle, I am looking for a partial shuffle function in python.
Example : "string" must give rise to "stnrig", but not "nrsgit"
It would be better if I can define a specific "percentage" of characters that have to be rearranged.
Purpose is to test string comparison algorithms. I want to determine the "percentage of shuffle" beyond which an(my) algorithm will mark two (shuffled) strings as completely different.
Update :
Here is my code. Improvements are welcome !
import random
percent_to_shuffle = int(raw_input("Give the percent value to shuffle : "))
to_shuffle = list(raw_input("Give the string to be shuffled : "))
num_of_chars_to_shuffle = int((len(to_shuffle)*percent_to_shuffle)/100)
for i in range(0,num_of_chars_to_shuffle):
x=random.randint(0,(len(to_shuffle)-1))
y=random.randint(0,(len(to_shuffle)-1))
z=to_shuffle[x]
to_shuffle[x]=to_shuffle[y]
to_shuffle[y]=z
print ''.join(to_shuffle)
This is a problem simpler than it looks. And the language has the right tools not to stay between you and the idea,as usual:
import random
def pashuffle(string, perc=10):
data = list(string)
for index, letter in enumerate(data):
if random.randrange(0, 100) < perc/2:
new_index = random.randrange(0, len(data))
data[index], data[new_index] = data[new_index], data[index]
return "".join(data)
Your problem is tricky, because there are some edge cases to think about:
Strings with repeated characters (i.e. how would you shuffle "aaaab"?)
How do you measure chained character swaps or re arranging blocks?
In any case, the metric defined to shuffle strings up to a certain percentage is likely to be the same you are using in your algorithm to see how close they are.
My code to shuffle n characters:
import random
def shuffle_n(s, n):
idx = range(len(s))
random.shuffle(idx)
idx = idx[:n]
mapping = dict((idx[i], idx[i-1]) for i in range(n))
return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
Basically chooses n positions to swap at random, and then exchanges each of them with the next in the list... This way it ensures that no inverse swaps are generated and exactly n characters are swapped (if there are characters repeated, bad luck).
Explained run with 'string', 3 as input:
idx is [0, 1, 2, 3, 4, 5]
we shuffle it, now it is [5, 3, 1, 4, 0, 2]
we take just the first 3 elements, now it is [5, 3, 1]
those are the characters that we are going to swap
s t r i n g
^ ^ ^
t (1) will be i (3)
i (3) will be g (5)
g (5) will be t (1)
the rest will remain unchanged
so we get 'sirgnt'
The bad thing about this method is that it does not generate all the possible variations, for example, it could not make 'gnrits' from 'string'. This could be fixed by making partitions of the indices to be shuffled, like this:
import random
def randparts(l):
n = len(l)
s = random.randint(0, n-1) + 1
if s >= 2 and n - s >= 2: # the split makes two valid parts
yield l[:s]
for p in randparts(l[s:]):
yield p
else: # the split would make a single cycle
yield l
def shuffle_n(s, n):
idx = range(len(s))
random.shuffle(idx)
mapping = dict((x[i], x[i-1])
for i in range(len(x))
for x in randparts(idx[:n]))
return ''.join(s[mapping.get(x,x)] for x in range(len(s)))
import random
def partial_shuffle(a, part=0.5):
# which characters are to be shuffled:
idx_todo = random.sample(xrange(len(a)), int(len(a) * part))
# what are the new positions of these to-be-shuffled characters:
idx_target = idx_todo[:]
random.shuffle(idx_target)
# map all "normal" character positions {0:0, 1:1, 2:2, ...}
mapper = dict((i, i) for i in xrange(len(a)))
# update with all shuffles in the string: {old_pos:new_pos, old_pos:new_pos, ...}
mapper.update(zip(idx_todo, idx_target))
# use mapper to modify the string:
return ''.join(a[mapper[i]] for i in xrange(len(a)))
for i in xrange(5):
print partial_shuffle('abcdefghijklmnopqrstuvwxyz', 0.2)
prints
abcdefghljkvmnopqrstuxwiyz
ajcdefghitklmnopqrsbuvwxyz
abcdefhwijklmnopqrsguvtxyz
aecdubghijklmnopqrstwvfxyz
abjdefgcitklmnopqrshuvwxyz
Evil and using a deprecated API:
import random
# adjust constant to taste
# 0 -> no effect, 0.5 -> completely shuffled, 1.0 -> reversed
# Of course this assumes your input is already sorted ;)
''.join(sorted(
'abcdefghijklmnopqrstuvwxyz',
cmp = lambda a, b: cmp(a, b) * (-1 if random.random() < 0.2 else 1)
))
maybe like so:
>>> s = 'string'
>>> shufflethis = list(s[2:])
>>> random.shuffle(shufflethis)
>>> s[:2]+''.join(shufflethis)
'stingr'
Taking from fortran's idea, i'm adding this to collection. It's pretty fast:
def partial_shuffle(st, p=20):
p = int(round(p/100.0*len(st)))
idx = range(len(s))
sample = random.sample(idx, p)
res=str()
samptrav = 1
for i in range(len(st)):
if i in sample:
res += st[sample[-samptrav]]
samptrav += 1
continue
res += st[i]
return res

Categories