Decoding an encrypted file using Caesar Cipher - python

I want to decrypt an encrypted file. I'm having trouble all the way at the bottom when converting it and comparing it to a dictionary (which is full of words). Can someone guide me in the right direction? I'm struggling comparing the two.
#this function takes a string and encrypts ONLY letters by k shifts
def CaeserCipher(string, k):
    #setting up variables to move through
    upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'*10000
    lower = 'abcdefghijklmnopqrstuvwxyz'*10000
    newCipher = ''
    #looping each letter and moving it k times
    for letter in string:
        if letter in upper:
            if upper.index(letter) + k > 25:
                indexPosition = (upper.index(letter) + k)
                newCipher = newCipher + upper[indexPosition]
            else:
                indexPosition = upper.index(letter) + k
                newCipher = newCipher + upper[indexPosition]
        elif letter in lower:
            if lower.index(letter) + k > 25:
                indexPosition = (lower.index(letter) + k)
                newCipher = newCipher + lower[indexPosition]
            else:
                indexPosition = lower.index(letter) + k
                newCipher = newCipher + lower[indexPosition]
        else:
            newCipher = newCipher + letter
    return newCipher
f = open('dictionary.txt', "r")
dictionary = set()
for line in f:
    word = line.strip()
    dictionary.add(word)
print dictionary
#main file
#reading file and encrypting text
f = open('encryptMystery1.txt')
string = ''
out = open("plain1.txt", "w")
myList = []
for line in f:
    myList.append(line)
for sentence in myList:
    for k in range(26):
        updatedSentence = CaeserCipher(sentence, k)
        for word in updatedSentence.split():
            if word in dictionary:
                out.write(updatedSentence)
                break
print myList
f.close()
out.close()

Let's tackle this in steps, and the first step is entitled
WHY DO YOU HAVE 260,000 CHARACTER LONG STRINGS IN A CAESAR CIPHER
Sorry, I don't mean to be overly dramatic, but you realize that's going to take up more space than, well, Space, don't you? And it's completely unnecessary. It's an ugly and slow hack to avoid understanding the % (modulo) operator. Don't do that.
Now, to the modulo:
Step two, of course, is understanding the modulo. It's not actually hard; it's just the remainder of a division problem. Remember when you were in school and just LEARNING division? 7/4 was 1r3, not 1.75, remember? Well, Python has operators for all that: in Python 3, 7/4 == 1.75, 7//4 == 1 and 7 % 4 == 3. This is useful because it can serve to "wrap" a number around a fixed length.
Let's say, for example, you have some string with 26 indexes (like, I don't know, an alphabet?). You're trying to add some number to a starting index and return the result, but UGH, YOU'RE ADDING 2 TO Y AND IT DOESN'T WORK! Well, with modulo it can. Y is at index 24 (remember zero is its own index), 24+2 is 26, and there IS no index 26. However, if you know there are only ever going to be 26 elements in your string, you can take the result modulo 26 and use THAT instead: 26 % 26 == 0, which is A.
By that logic, (index + CONSTANT) % len(alphabet) will ALWAYS return the right index using simple math (the parentheses matter, because % binds tighter than +), and not, sweet baby jesus, the quarter-million-element string you just butchered.
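A minimal sketch of the modulo approach. It assumes that only the plain ASCII letters A-Z and a-z should move and that everything else passes through unchanged. `caesar_shift` is a hypothetical name, not the asker's `CaeserCipher`:

```python
import string

def caesar_shift(text, k):
    """Shift ASCII letters by k places, wrapping around with % 26."""
    out = []
    for ch in text:
        if ch in string.ascii_uppercase:
            alphabet = string.ascii_uppercase
        elif ch in string.ascii_lowercase:
            alphabet = string.ascii_lowercase
        else:
            out.append(ch)  # spaces, digits and punctuation are left alone
            continue
        # (index + k) % 26 always lands in 0..25, even for a negative k
        out.append(alphabet[(alphabet.index(ch) + k) % 26])
    return ''.join(out)

print(caesar_shift("Hello, World!", 3))   # Khoor, Zruog!
print(caesar_shift("Khoor, Zruog!", -3))  # Hello, World!
```

The alphabets stay 26 characters long, so nothing needs to be multiplied by 10000.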
Ugh your mother would be ashamed.
Reversing a Caesar cipher
So you've got a good idea: going through each line in turn and applying every possible shift to it. If I were you, I'd dump the results into separate files, or even into separate list elements. Remember, though, that if you're reversing the cipher you need to use -k, not k. With the modulo version this just works in Python, because % with a positive divisor never returns a negative number (-2 % 26 == 24). If your cipher indexes into the alphabet without %, it's probably a good idea to change it to detect a negative k instead. Try something like:
def cipher(text, k):
    cipherkey = "SOMESTRINGGOESHERE"
    if k < 0:
        k = len(cipherkey) + k
        # len(cipherkey) - abs(k) would be more clear, but if it HAS to be
        # a negative number to get in here, it seems silly to add the call
        # to abs
    # ...then shift each character by k as before
Then you can do:
startingtext = "Encrypted_text_goes_here"
possibledecrypts = [cipher(startingtext, -i) for i in range(1,26)]
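This ties back to the original question, which is how to compare the candidates against the dictionary. One option is to score each of the 26 shifts by how many of its words appear in the dictionary set, then keep the highest-scoring one. This is a sketch under some assumptions. `shift_letters` and `best_decrypt` are hypothetical helpers. `dictionary.txt` is assumed to hold lowercase words. Punctuation is trimmed from the ends of each word before the lookup.

```python
def shift_letters(text, k):
    # Shift ASCII letters by k places (k may be negative); % 26 handles the wrap.
    out = []
    for ch in text:
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + k) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + k) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)

def best_decrypt(ciphertext, dictionary):
    """Try all 26 shifts; return (shift, text) with the most dictionary words."""
    best_score, best_k, best_text = -1, 0, ciphertext
    for k in range(26):
        candidate = shift_letters(ciphertext, -k)
        # Lowercase and trim punctuation so "Dawn," can match "dawn".
        words = [w.strip('.,;:!?"\'').lower() for w in candidate.split()]
        score = sum(1 for w in words if w in dictionary)
        if score > best_score:
            best_score, best_k, best_text = score, k, candidate
    return best_k, best_text

dictionary = {"attack", "at", "dawn"}  # stand-in for the set built from dictionary.txt
print(best_decrypt("Dwwdfn dw gdzq", dictionary))  # (3, 'Attack at dawn')
```

With the real files, you would call best_decrypt(line, dictionary) for each line of encryptMystery1.txt and write only the returned text to plain1.txt. That produces one output per line rather than one for every shift that happens to contain a dictionary word.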

Related

How to start again at the beginning of the word?

To apply a Vigenere coding, we have to shift the letters, but not all by the same number. This time the key is a keyword, and each of its letters gives the shift to apply (taking A for a shift of 0, B for a shift of 1, ...).
Let's take an example to explain the method: Let's imagine that the keyword is "MATHS" and the word to code is "PYTHON".
To code P, I shift it by the number corresponding to M, i.e. 12 (because we start at 0 with A), which gives me B as the coding for P.
Let's move on to Y: I shift it by the number corresponding to A, i.e. 0, so Y is the coding for Y here.
Let's go to T, which is shifted by the number corresponding to T, i.e. 19, so T becomes M once shifted.
And so on.
import string
def vigenere_cipher(msg, shift):
    encrypted = ''
    for i,j in zip(msg,shift):
        new_index = ( string.ascii_uppercase.index(i) + string.ascii_uppercase.index(j) ) % 26
        encrypted += string.ascii_uppercase[new_index]
    return encrypted
print(vigenere_cipher('PYTHON', 'MATH'))
If our keyword is too short, we start again at the beginning of the word, i.e. O will be shifted by the number corresponding to M.
My problem right here is actually with the last part: how can I simply say that if the keyword is too short, we start again at the beginning of the word?
Only the "PYTH" part is encrypted (to "BYMO") with MATH as the key; the "ON" is not.
I think the main issue here is that you're zipping both msg and shift together when you don't actually need to. You already understand the concept of using % to guarantee that you stay on a number smaller than your max number, so I'll modify your function to also use % to select which character from shift you want to use:
import string
def vigenere_cipher(msg, shift):
    encrypted = ''
    shift_length = len(shift)
    for i, char in enumerate(msg):
        new_index = ( string.ascii_uppercase.index(char) + string.ascii_uppercase.index(shift[i % shift_length]) ) % 26
        encrypted += string.ascii_uppercase[new_index]
    return encrypted
print(vigenere_cipher('PYTHON', 'MATH'))
Just add the line shift = shift * (len(msg) // len(shift) + 1) at the start of the function so shift is repeated until it's longer than msg (e.g. this line turns MATH into MATHMATH)
import string
def vigenere_cipher(msg, shift):
    shift = shift * (len(msg) // len(shift) + 1)
    encrypted = ''
    for i,j in zip(msg,shift):
        new_index = (string.ascii_uppercase.index(i) + string.ascii_uppercase.index(j)) % 26
        encrypted += string.ascii_uppercase[new_index]
    return encrypted
print(vigenere_cipher('PYTHON', 'MATH'))
Output: BYMOAN
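Another way to repeat the key, not used in the answers above, is `itertools.cycle`. It repeats the key indefinitely, and `zip` stops at the end of the message. Here is a sketch in which the `decrypt` flag is an addition for illustration: decryption subtracts each shift instead of adding it.

```python
import string
from itertools import cycle

ALPHABET = string.ascii_uppercase

def vigenere(msg, key, decrypt=False):
    # cycle() repeats the key forever; zip() stops when msg runs out.
    sign = -1 if decrypt else 1
    out = []
    for m, k in zip(msg, cycle(key)):
        new_index = (ALPHABET.index(m) + sign * ALPHABET.index(k)) % 26
        out.append(ALPHABET[new_index])
    return ''.join(out)

print(vigenere('PYTHON', 'MATH'))                # BYMOAN
print(vigenere('BYMOAN', 'MATH', decrypt=True))  # PYTHON
```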

How can I optimize this function which is related to reversal of string?

I have a string: "String"
The first thing you do is reverse it: "gnirtS"
Then you will take the string from the 1st position and reverse it again: "gStrin"
Then you will take the string from the 2nd position and reverse it again: "gSnirt"
Then you will take the string from the 3rd position and reverse it again: "gSntri"
Continue this pattern until you have done every single position, and then you will return the string you have created. For this particular string, you would return: "gSntir"
And I have to repeat this entire procedure x times, where both the string and x can be very big (millions or billions).
My code is working fine for small strings but it's giving timeout error for very long strings.
def string_func(s,x):
    def reversal(st):
        n1=len(st)
        for i in range(0,n1):
            st=st[0:i]+st[i:n1][::-1]
        return st
    for i in range(0,x):
        s=reversal(s)
    return s
This linear implementation could point you in the right direction:
from collections import deque
from itertools import cycle
def special_reverse(s):
    d, res = deque(s), []
    ops = cycle((d.pop, d.popleft))
    while d:
        res.append(next(ops)())
    return ''.join(res)
You can recognize the slice patterns in the following examples:
>>> special_reverse('123456')
'615243'
>>> special_reverse('1234567')
'7162534'
This works too:
my_string = "String"
my_string_len = len(my_string)
result = ""
for i in range(my_string_len):
    my_string = my_string[::-1]
    result += my_string[0]
    my_string = my_string[1:]
print(result)
And this, though it looks spaghetti :D
s = "String"
lenn = len(s)
resultStringList = []
first_half = list(s[0:len(s) // 2])
second_half = None
middle = None
if lenn % 2 == 0:
    second_half = list(s[len(s) // 2 : len(s)][::-1])
else:
    second_half = list(s[len(s) // 2 + 1 : len(s)][::-1])
    middle = s[len(s) // 2]
    lenn -= 1
for k in range(lenn // 2):
    resultStringList.append(second_half.pop(0))
    resultStringList.append(first_half.pop(0))
if middle is not None:
    resultStringList.append(middle)
print(''.join(resultStringList))
From the pattern of the original string and the result, I constructed this algorithm. It uses a minimal number of operations.
s = 'Strings'
lens = len(s)
lensh = lens // 2
nstr = ''
for i in range(lensh):
    nstr = nstr + s[lens - i - 1] + s[i]
if lens % 2 == 1:
    nstr = nstr + s[lensh]
print(nstr)
or a short version using iterator magic:
def string_func(s):
    ops = (iter(reversed(s)), iter(s))
    return ''.join(next(ops[i % 2]) for i in range(len(s)))
which does the right thing for me, while if you're happy using some library code, you can golf it down to:
from itertools import cycle, islice
def string_func(s):
    ops = (iter(reversed(s)), iter(s))
    return ''.join(map(next, islice(cycle(ops), len(s))))
My original version takes 80 µs for a 512-character string and this updated version takes 32 µs, while your version took 290 µs and schwobaseggl's solution takes about 75 µs.
I've had a play in Cython and I can get runtime down to ~0.5 µs. Measuring this under perf_event_open, I can see my CPU is retiring ~8 instructions per character, which seems pretty good, while a hard-coded loop in C gets this down to ~4.5 instructions per ASCII char. These don't seem to be very "Pythonic" solutions, so I'll leave them out of this answer. I've included this paragraph to show that the OP has options to make things faster, and that running this a billion times on a string of ~500 characters will still take hundreds of seconds, even with relatively careful C code.
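One pass always moves characters between the same positions, so it is a fixed permutation. That means the question's x repetitions need not be simulated one at a time. Instead, each position can follow its cycle of that permutation x steps forward. This sketch uses cycle decomposition, which is a different technique from the answers above, and `repeat_special_reverse` is a hypothetical name. It runs in time proportional to len(s), however large x is:

```python
def one_pass_permutation(n):
    # perm[i] is the input index whose character lands at position i after
    # one pass: last, first, second-to-last, second, ...
    return [n - 1 - i // 2 if i % 2 == 0 else i // 2 for i in range(n)]

def special_reverse(s):
    # A single pass, equivalent to the deque version above.
    return ''.join(s[p] for p in one_pass_permutation(len(s)))

def repeat_special_reverse(s, x):
    n = len(s)
    perm = one_pass_permutation(n)
    result = [''] * n
    seen = [False] * n
    for start in range(n):
        if seen[start]:
            continue
        # Walk one cycle: start -> perm[start] -> perm[perm[start]] -> ... -> start
        cyc = []
        j = start
        while not seen[j]:
            seen[j] = True
            cyc.append(j)
            j = perm[j]
        # After x passes, position cyc[t] holds the character that was
        # originally at cyc[(t + x) % len(cyc)].
        for t, pos in enumerate(cyc):
            result[pos] = s[cyc[(t + x) % len(cyc)]]
    return ''.join(result)

print(repeat_special_reverse('String', 1))  # gSntir
print(repeat_special_reverse('String', 2))  # rgiStn
print(repeat_special_reverse('String', 10 ** 9))
```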

python intelligent hexadecimal numbers generator

I want to be able to generate 12-character hexadecimal strings, BUT with no more than 2 identical digits in a row: 00 is fine, but not 000.
I know how to generate ALL possibilities, from 000000000000 to FFFFFFFFFFFF. But I know that I won't use all those values, and the file generated with ALL possibilities is many GB in size. So I want to reduce the size by skipping the chains I don't need.
So my goal is to have results like 00A300BF8911 and not like 000300BF8911
Could you please help me to do so?
Many thanks in advance!
If you picked the same digit twice in a row, remove it from the choices for the next round:
import random
hex_digits = set('0123456789ABCDEF')
result = ""
pick_from = hex_digits
for _ in range(12):
    # random.choice needs a sequence, so sort the set first
    cur_digit = random.choice(sorted(pick_from))
    # a second identical digit in a row: exclude it for the next pick
    if result and result[-1] == cur_digit:
        pick_from = hex_digits - {cur_digit}
    else:
        pick_from = hex_digits
    result += cur_digit
print(result)
Since the title mentions generators, here's the above as a generator:
import random
hex_digits = set('0123456789ABCDEF')
def hexGen():
    while True:
        result = ""
        pick_from = hex_digits
        for _ in range(12):
            # random.choice needs a sequence, so sort the set first
            cur_digit = random.choice(sorted(pick_from))
            # a second identical digit in a row: exclude it for the next pick
            if result and result[-1] == cur_digit:
                pick_from = hex_digits - {cur_digit}
            else:
                pick_from = hex_digits
            result += cur_digit
        yield result
my_hex_gen = hexGen()
counter = 0
for result in my_hex_gen:
    print(result)
    counter += 1
    if counter > 10:
        break
Results:
1ECC6A83EB14
D0897DE15E81
9C3E9028B0DE
CE74A2674AF0
9ECBD32C003D
0DF2E5DAC0FB
31C48E691C96
F33AAC2C2052
CD4CEDADD54D
40A329FF6E25
5F5D71F823A4
You could also change the while True loop to produce only a certain number of these, based on a number passed into the function.
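Here is a sketch of that variant. It assumes the caller passes the number of strings wanted as `count`; `count`, `length` and `hex_gen` are hypothetical names. It picks with `random.choice` from a string of digits and excludes a digit only right after that digit has appeared twice in a row:

```python
import random

HEX_DIGITS = '0123456789ABCDEF'

def hex_gen(count, length=12):
    """Yield `count` random hex strings with no digit three times in a row."""
    for _ in range(count):
        result = ''
        for _ in range(length):
            choices = HEX_DIGITS
            # After a pair such as '00', the next digit must be different.
            if len(result) >= 2 and result[-1] == result[-2]:
                choices = HEX_DIGITS.replace(result[-1], '')
            result += random.choice(choices)
        yield result

for value in hex_gen(5):
    print(value)
```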
I interpret this question as, "I want to construct a rainbow table by iterating through all strings that have the following qualities. The string has a length of 12, contains only the characters 0-9 and A-F, and it never has the same character appearing three times in a row."
def iter_all_strings_without_triplicates(size, last_two_digits=(None, None)):
    a, b = last_two_digits
    if size == 0:
        yield ""
    else:
        for c in "0123456789ABCDEF":
            if a == b == c:
                continue
            else:
                for rest in iter_all_strings_without_triplicates(size-1, (b, c)):
                    yield c + rest
for s in iter_all_strings_without_triplicates(12):
    print(s)
Result:
001001001001
001001001002
001001001003
001001001004
001001001005
001001001006
001001001007
001001001008
001001001009
00100100100A
00100100100B
00100100100C
00100100100D
00100100100E
00100100100F
001001001010
001001001011
...
Note that this still outputs about 2.7 × 10^14 values, roughly 96% of all 16^12 strings and on the order of 3.5 petabytes as newline-separated text. So you aren't saving much room compared to just saving every single string, triplicates or not.
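That estimate can be checked without generating anything. A small dynamic program can count the valid strings by tracking how many of them end in a single copy of their last digit and how many end in a pair. `count_without_triplicates` is a hypothetical name:

```python
def count_without_triplicates(length, alphabet_size=16):
    """Count strings with no symbol three times in a row (dynamic programming)."""
    if length == 0:
        return 1
    ends_run1 = alphabet_size  # strings whose last symbol appears once at the end
    ends_run2 = 0              # strings ending in a pair such as '00'
    for _ in range(length - 1):
        # Appending a different symbol starts a new run of 1;
        # repeating the last symbol turns a run of 1 into a run of 2.
        ends_run1, ends_run2 = (alphabet_size - 1) * (ends_run1 + ends_run2), ends_run1
    return ends_run1 + ends_run2

total = 16 ** 12
kept = count_without_triplicates(12)
print(kept, f"{kept / total:.1%}")  # 271205508150000 96.4%
```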
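import string, random
source = string.hexdigits[:16]
result = ''
while len(result) < 12:
    # randrange excludes the upper bound, so idx is always a valid index
    idx = random.randrange(len(source))
    # only reject a digit that would extend an existing pair into a triple
    if len(result) < 2 or result[-1] != result[-2] or result[-1] != source[idx]:
        result += source[idx]
print(result)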
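You could extract a random sequence from a list that contains each hexadecimal digit twice. Note that this also limits every digit to two occurrences in the whole string, not just two in a row:
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
hex_number = ''.join(digits[:12])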
If you wanted to allow shorter sequences, you could randomize that too and left-fill the blanks with zeros.
import random
digits = list('1234567890ABCDEF') * 2
random.shuffle(digits)
num_digits = random.randrange(3, 13)
hex_number = '0' * (12 - num_digits) + ''.join(digits[:num_digits])
print(hex_number)
You could use a generator that slides a window over the strings your current implementation yields, something like (hex_str[i:i + 3] for i in range(len(hex_str) - 2)). Using len and set, you can count the number of distinct characters in each slice; a slice with only one distinct character is a triplicate. In your example, though, it might be easier to just compare all 3 characters.
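Here is a sketch of that window idea. `has_triplicate` and `without_triplicates` are hypothetical names. A generator over every 4-digit hex string stands in for the asker's real generator, because running all 12-digit strings as a demo would take far too long:

```python
from itertools import product

def has_triplicate(hex_str, window_size=3):
    # A window with only one distinct character is a run such as '000'.
    return any(len(set(hex_str[i:i + window_size])) == 1
               for i in range(len(hex_str) - window_size + 1))

def without_triplicates(strings):
    # Lazily pass through only the strings that contain no triplicate run.
    return (s for s in strings if not has_triplicate(s))

# Stand-in for "the strings your current implementation yields".
all_strings = (''.join(p) for p in product('0123456789ABCDEF', repeat=4))
print(sum(1 for _ in without_triplicates(all_strings)))  # 65040
```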
You can create a list of the numbers 0 to 255 and use random.sample on that list to pick your values.

Make python code for searching through dictionary more efficent (vigenere cipher)

I'm writing some code to try the vigenere cipher with different keys. I only want it to add a key to the list of possible options when the decrypted result is a word. That way I don't get thousands of printouts; I'm only told the most likely possibilities.
Here's the code so far
def vigenere(input):
    print("VIGENERE")
    key = ""
    keyList = []
    textList = []
    for word in englishDictionary:
        key = re.sub(r'[\W_]+', '', word)
        if key[:len(input)] not in keyList:
            while len(key) < len(input):
                key = key + key
            keyLetters = list(key.upper())
            keyNumbers = []
            for x in keyLetters:
                keyNumbers.append(alphaNum.get(x))
            vigenereOutput = ""
            for z in list(input):
                try:
                    keyNum = int(keyNumbers[0])
                except IndexError:
                    pass
                keyNum = keyNum - 1
                inbetween = int(alphaNum.get(z)) - keyNum
                del keyNumbers[0]
                if inbetween < 1:
                    inbetween = 26 + inbetween
                vigenereOutput = vigenereOutput + numAlpha.get(str(inbetween))
            if vigenereOutput in englishDictionary:
                keyList.append(key[:len(input)])
                textList.append(vigenereOutput)
    if len(keyList) > 0:
        for keyEntry in range(len(keyList)):
            print("Key:", keyList[keyEntry])
            print("Text:", textList[keyEntry])
            print()
The englishDictionary is a list of all the words in my dictionary text file (a fairly big file, but using a small one would defeat the point of this decoder).
Right now, however, it takes over 20 minutes to go through the entire dictionary... how can I speed this process up?
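Two checks in the code above are likely to dominate: `vigenereOutput in englishDictionary` and `key[:len(input)] not in keyList`. Both scan a list from the start every time, which makes the whole loop roughly quadratic in the dictionary size. Below is a sketch of the same search using sets instead of lists. Where the time goes is an assumption, not a measured profile. `find_keys` and `vigenere_decrypt` are hypothetical names, key letter A is assumed to mean a shift of 0, and the ciphertext is assumed to contain only the letters A-Z:

```python
import string
from itertools import cycle

ALPHABET = string.ascii_uppercase

def vigenere_decrypt(ciphertext, key):
    # Assumes key letter A = shift 0, B = shift 1, ...; cycle() repeats the key.
    return ''.join(
        ALPHABET[(ALPHABET.index(c) - ALPHABET.index(k)) % 26]
        for c, k in zip(ciphertext, cycle(key))
    )

def find_keys(ciphertext, english_dictionary):
    """Return (key, plaintext) pairs where the plaintext is a dictionary word."""
    ciphertext = ciphertext.upper()
    # Build the lookup set once; each `in` test is then O(1) on average.
    words = {w.upper() for w in english_dictionary}
    tried = set()
    matches = []
    for word in english_dictionary:
        # Keep only A-Z; only the first len(ciphertext) key letters matter.
        key = ''.join(ch for ch in word.upper() if ch in ALPHABET)[:len(ciphertext)]
        if not key or key in tried:
            continue
        tried.add(key)
        plaintext = vigenere_decrypt(ciphertext, key)
        if plaintext in words:
            matches.append((key, plaintext))
    return matches

print(find_keys("BYMOAN", ["math", "python", "cat"]))  # [('MATH', 'PYTHON')]
```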

Cesar Cipher on Python beginner level

''' Cesar Cipher '''
def encrypt(word, shift):
    word = word.lower()
    for i in word:
        r = chr(ord(i)+shift)
        if r > "z":
            r = chr(ord(i) - 26 + shift)
        word = word.replace(i, r)
    return word
if __name__ == "__main__": print encrypt("programming", 3)
This gives me wrong answers for shifts higher than 1 and words longer than 2 letters. I can't figure out why. Any help, please?
Thilo explains the problem exactly. Let's step through it:
''' Cesar Cipher '''
def encrypt(word, shift):
    word = word.lower()
    for i in word:
        r = chr(ord(i)+shift)
        if r > "z":
            r = chr(ord(i) - 26 + shift)
        word = word.replace(i, r)
    return word
Try encrypt('abc', 1) and see what happens:
First loop:
i = 'a'
r = chr(ord('a')+1) = 'b'
word = 'abc'.replace('a', 'b') = 'bbc'
Second loop:
i = 'b'
r = chr(ord('b')+1) = 'c'
word = 'bbc'.replace('b', 'c') = 'ccc'
Third loop:
i = 'c'
r = chr(ord('c')+1) = 'd'
word = 'ccc'.replace('c', 'd') = 'ddd'
You don't want to replace every instance of i with r, just this one. How would you do this? Well, if you keep track of the index, you can just replace at that index. The built-in enumerate function lets you get each index and each corresponding value at the same time.
for index, ch in enumerate(word):
    r = chr(ord(ch)+shift)
    if r > "z":
        r = chr(ord(ch) - 26 + shift)
    word = new_word_replacing_one_char(index, r)
Now you just have to write that new_word_replacing_one_char function, which is pretty easy if you know slicing. (If you haven't learned slicing yet, you may want to convert the string into a list of characters, so you can just say word[index] = r, and then convert back into a string at the end.)
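Here is one way to write that helper with slicing, as a sketch. This version also takes `word` as its first argument, because the call in the snippet above passes only `index` and `r`. It keeps the original wrap-around check, so it assumes the input contains only lowercase letters:

```python
def new_word_replacing_one_char(word, index, r):
    # Strings are immutable, so build a new one around the slot at index.
    return word[:index] + r + word[index + 1:]

def encrypt(word, shift):
    word = word.lower()
    # enumerate() keeps walking the original string even though word is rebound.
    for index, ch in enumerate(word):
        r = chr(ord(ch) + shift)
        if r > "z":
            r = chr(ord(ch) - 26 + shift)
        word = new_word_replacing_one_char(word, index, r)
    return word

print(encrypt("programming", 3))  # surjudpplqj
```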
Rebinding word inside the loop doesn't change what the loop iterates over, because the for loop keeps walking the original string. But one thing that is definitely a problem is repeated letters. replace will replace all occurrences of the letter, not just the one you are currently looking at, so you will end up shifting those repeated letters more than once (as you hit them again in a later iteration).
Come to think of it, this will also happen with non-repeated letters. For example, shifting ABC by 1 goes ABC -> BBC -> CCC -> DDD over your three iterations.
I had this assignment as well. The hint is that you have to keep track of where the values wrap, and use that to your advantage. I also recommend calling upper() so everything is the same case; that reduces the number of checks you need.
In Python, strings are immutable; that is, they cannot be changed. Lists, however, can be. So to use your algorithm, use a list instead:
''' Cesar Cipher '''
def encrypt(word, shift):
    word = word.lower()
    # Convert the word to a list
    word = list(word)
    # Iterate over the word by index
    for i in range(len(word)):
        # Get the character at i
        c = word[i]
        # Apply shift algorithm
        r = chr(ord(c)+shift)
        if r > "z":
            r = chr(ord(c) - 26 + shift)
        # Replace the character at i
        word[i] = r
    # Convert the list back to a string
    return ''.join(word)
if __name__ == "__main__":
    print(encrypt("programming", 3))
