I'm trying to improve my coding skills on entwicklerheld.de
and right now I'm trying to solve the transposition cipher challenge:
We consider a cipher in which the plaintext is written downward and diagonally in successive columns. The number of rows or rails is given. When reaching the lowest rail, we traverse diagonally upwards, and after reaching the top rail, there is a change of direction again. Thus, the alphabets [sic] of the message are written in a zigzag pattern. After each alphabet is written, the individual lines are combined to obtain the cipher text.
Given is the plain text "coding" and the number of rails 2. The plain text is now arranged in a zigzag pattern as described above. The encoded text is obtained by combining the lines one after the other.
Thus, the encrypt() function should return the cipher "cdnoig".
The same procedure is used for entire sentences or texts as for individual words. The only thing to note here is that spaces also count as a single character.
Given is the plain text "rank the code" and the number of rails 2.
Your function should return the cipher "rn h oeaktecd".
This should work with other examples with 2 rails as well.
The encryption is very easy with a multidimensional array.
My question
I'm stuck at the decryption part.
My idea is to build an array of 0s and 1s (to show where a character has to be), then fill each row (line 1, line 2, line 3, ...) with the characters in the order of the cipher text.
Then I iterate a third time over the array to read the word in zig-zag.
I don't know, but it feels very strange to iterate over the array three times. Maybe there is a zig-zag algorithm for this?
You could first define a generator that gives the mapping for each index to the index where the character has to be taken from during encryption. But this generator would not need to get the plain text input, just the length of it. As this generator just produces the indices, it can be used to decrypt as well.
It was not clear to me whether the question is only about the case where the number of rails is 2. With a bit of extra logic, this can be made to work for any larger number of rails as well.
Here is how that could look:
# This generator can be used for encryption and decryption:
def permutation(size, numrails):
    period = numrails * 2 - 2
    yield from range(0, size, period)  # top rail
    # Following yield-from statement only needed when number of rails > 2
    yield from (
        index
        for rail in range(1, numrails - 1)
        for pair in zip(range(rail, size, period),
                        range(rail + period - rail*2, size + period, period))
        for index in pair
        if index < size
    )
    yield from range(numrails - 1, size, period)  # bottom rail

def encrypt(plain, numrails):
    n = len(plain)
    return "".join([plain[i] for i in permutation(n, numrails)])

def decrypt(encrypted, numrails):
    n = len(encrypted)
    plain = [None] * n
    for source, target in enumerate(permutation(n, numrails)):
        plain[target] = encrypted[source]
    return "".join(plain)
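As a cross-check, here is a minimal alternative sketch that does not use the generator above: it computes the rail of every index and relies on a stable sort to get the same index order. The names rail_order, rail_encrypt and rail_decrypt are made up for this illustration, and the sort makes it O(n log n) rather than linear.

```python
def rail_order(size, numrails):
    """Plaintext indices in the order they appear in the ciphertext."""
    # One full zigzag (down and back up) covers `period` characters.
    period = max(1, 2 * numrails - 2)

    def rail(i):
        r = i % period
        # Going down: rail r. Going back up: mirrored rail.
        return r if r < numrails else period - r

    # sorted() is stable, so indices on the same rail keep their order.
    return sorted(range(size), key=rail)


def rail_encrypt(plain, numrails):
    return "".join(plain[i] for i in rail_order(len(plain), numrails))


def rail_decrypt(encrypted, numrails):
    plain = [None] * len(encrypted)
    for source, target in enumerate(rail_order(len(encrypted), numrails)):
        plain[target] = encrypted[source]
    return "".join(plain)


print(rail_encrypt("coding", 2))
print(rail_encrypt("rank the code", 2))
print(rail_decrypt("rn h oeaktecd", 2))
```

With the two examples from the challenge text this should reproduce "cdnoig" and "rn h oeaktecd", and decrypting gives the plain text back.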
I have a script in Python 3.6.8 which reads through a very large text file, where each line is an ASCII string drawn from the alphabet {a,b,c,d,e,f}.
For each line, I have a function which fragments the string using a sliding window of size k, and then increments a fragment counter dictionary fragment_dict by 1 for each fragment seen.
The same fragment_dict is used for the entire file, and it is initialized for all possible 5^k fragments mapping to zero.
I also ignore any fragment which has the character c in it. Note that c is uncommon, and most lines will not contain it at all.
def fragment_string(mystr, fragment_dict, k):
    for i in range(len(mystr) - k + 1):
        fragment = mystr[i:i+k]
        if 'c' in fragment:
            continue
        fragment_dict[fragment] += 1
Because my file is so large, I would like to optimize the performance of the above function as much as possible. Could anyone provide any potential optimizations to make this function faster?
I'm worried I may be rate limited by the speed of Python loops, in which case I would need to consider dropping down into C/Cython.
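NumPy may help speed up your code:
import collections
import numpy as np

x = np.array([ord(c) - ord('a') for c in mystr])
weights = 6 ** np.arange(k)  # [1, 6, 36, ...], exact integer powers
fragment_dict = collections.Counter(np.convolve(x, weights, mode='valid'))
The idea is to represent each k-length segment as a k-digit base-6 number, with one digit (0-5) per letter a-f. Converting the list of 0-5 integers that corresponds to the string into these numbers is then equivalent to applying a convolution with [1, 6, 36, 216, ...] as the filter. Base 6 is needed because the alphabet has six letters; with base 5 and digits 0-5, different fragments could map to the same number. Note that this counts every window, so the codes of fragments containing c still have to be discarded afterwards.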
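If you would rather stay in plain Python, one possible sketch is to split each line on the ignored character first, so that no window containing c is ever built, and let collections.Counter do the counting. The function name count_fragments is made up here, and whether this is actually faster on your data would need to be measured.

```python
from collections import Counter


def count_fragments(line, k, counts=None):
    """Count all length-k windows of `line` that do not contain 'c'."""
    if counts is None:
        counts = Counter()
    # A window containing 'c' cannot lie inside a piece between two 'c's,
    # so splitting on 'c' removes exactly the windows we want to skip.
    for piece in line.split('c'):
        counts.update(piece[i:i + k] for i in range(len(piece) - k + 1))
    return counts


counts = Counter()
for line in ["abab", "abcab"]:
    count_fragments(line, 2, counts)
print(counts)
```

Unlike the pre-initialized dictionary in the question, a Counter only stores fragments that were actually seen; missing fragments read as zero.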
I am creating a very simple encryption algorithm where I convert each letter of a word to its ASCII value, place the values in an array, and then add a number to each value. I then convert the ASCII values back to letters, which gives the new encrypted word. This is known as the Caesar cipher.
But I cannot figure out how to add the key number to each element of the array.
As others will mention, when posing a question like this, please post a code attempt first. Having clear input/output and any associated stack traces helps people answer your question better.
That being said, I've written a simple Caesar cipher encryption method that shifts to the right based on a given key. It works by converting the characters of a string to their numerical ASCII values using the built-in function ord(). We then add the shift value to move each value to the right by the given amount, and convert back to characters using chr(). If the shifted value exceeds that of 'z', we wrap around to the beginning of the alphabet. Note that this assumes the input contains only letters and that the shift is smaller than 26.
def ceaser_cipher_encryption(string, shift):
    alpha_limit = 26
    encrypted_msg = []
    for index, character in enumerate(string.lower()):
        isUpperCharacter = string[index].isupper()
        shifted_value = ord(character) + shift
        if shifted_value > ord('z'):
            encrypted_msg.append(chr(shifted_value - alpha_limit))
        else:
            encrypted_msg.append(chr(shifted_value))
        if isUpperCharacter:
            encrypted_msg[index] = encrypted_msg[index].upper()
    return ''.join(encrypted_msg)
Sample Output:
>>> ceaser_cipher_encryption("HelloWorld", 5)
'MjqqtBtwqi'
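A possible variant, sketched here under the assumption that only the letters a-z and A-Z should be shifted and everything else left alone, uses the modulo operator for the wrap-around. That also makes negative shifts work, so the same function can decrypt. The name caesar_shift is made up for this example.

```python
def caesar_shift(text, shift):
    """Shift ASCII letters by `shift` positions, leaving other characters as-is."""
    result = []
    for ch in text:
        if 'a' <= ch <= 'z':
            base = ord('a')
        elif 'A' <= ch <= 'Z':
            base = ord('A')
        else:
            result.append(ch)  # digits, spaces, punctuation: unchanged
            continue
        # Map the letter to 0-25, shift, wrap with modulo, map back.
        result.append(chr((ord(ch) - base + shift) % 26 + base))
    return ''.join(result)


encrypted = caesar_shift("HelloWorld", 5)
print(encrypted)
print(caesar_shift(encrypted, -5))
```

Decryption is just the same call with the negated shift.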
Try having a look online for solutions:
Caesar Cipher Function in Python
https://inventwithpython.com/chapter14.html
These links will provide you with clear answers to your questions.
I am trying to find the key of a one-time pad, and I have 10 ciphertexts. (The plaintext letters are encoded as 8-bit ASCII, the given ciphertexts are written in hex, and I'm using Python 2.7.)
The idea is that when you XOR a letter with a space, the letter switches between uppercase and lowercase, and when you XOR x with x you get zero. So when I XOR two characters from two ciphertexts, the key cancels against the key and what remains is the XOR of the two message characters.
So I wrote this code for XORing two hex strings:
def hex_to_text(s):
    string=binascii.unhexlify(s)
    return string

def XoR (a,b):
    a="0x"+a
    b="0x"+b
    xor=chr(int(a,16) ^ int(b,16))
    return hex_to_text(xor[2:])
When the key is an even number the XOR function works correctly, but when the key is odd it does not return the same character in the other case.
What am I doing wrong?
a general idea on how to solve this, disregarding python:
let's start by saying a char is 8 bit ascii
if you look at the first char from the first ciphertext, you will probably notice that it is outside of the ascii values for plain text, which one could say are a-z (0x61-0x7a) and A-Z (0x41-0x5a)
there is a high probability that you only have to take values into account that, xored with this char, make it something inside the specified value ranges
the same holds for the other 9 texts and their respective first char
and, interestingly, the list of possible key values for this char has to hold for every ciphertext with the same key, so each and every ciphertext we look at reduces the range further
now, what can you do with this approach?
write a function that takes 2 parameters (bytes) and tests if the result of a xor falls into the specified range, if yes, return 1, if no return 0
now make 3 nested loops to call this function
outer loop (X) goes through the char positions in the ciphertext
middle loop (Y) goes from 0 to 255
inner loop (Z) goes through the ciphertexts
in the inner loop call your function with parameter 1 being the X character of your Z ciphertext and parameter 2 being Y
now what to do with the result:
you want to have a dictionary/lookup table that per position X holds an array of 256 elements
the index on these elements will be Y
the value for these elements will be the sum of your function results for all Z
in the end what you will have is for every position in your ciphertext, an array that tells you for each keybyte how likely it is the key ... the higher the value the higher the probability of being the key byte
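the scoring step described so far could be sketched like this in Python 3 (not the 2.7 from the question). the names is_plausible and score_key_bytes are made up, the demo key and messages are hypothetical, and counting a space as plausible is an extra assumption on top of the a-z / A-Z ranges:

```python
def is_plausible(cipher_byte, key_byte):
    """Return 1 if cipher_byte XOR key_byte is a letter or a space, else 0."""
    p = cipher_byte ^ key_byte
    is_letter = 0x41 <= p <= 0x5A or 0x61 <= p <= 0x7A
    return 1 if is_letter or p == 0x20 else 0  # space: assumption, see above


def score_key_bytes(ciphertexts):
    """scores[x][y] = how many ciphertexts look plausible at position x under key byte y."""
    longest = max(len(c) for c in ciphertexts)
    scores = []
    for x in range(longest):              # outer loop: char position
        row = [0] * 256
        for y in range(256):              # middle loop: candidate key byte
            for c in ciphertexts:         # inner loop: ciphertexts
                if x < len(c):
                    row[y] += is_plausible(c[x], y)
        scores.append(row)
    return scores


# Hypothetical demo data: three short messages under one made-up key.
key = bytes([0x13, 0xA7, 0x5C])
messages = [b"abc", b"Hey", b"x z"]
ciphertexts = [bytes(m ^ k for m, k in zip(msg, key)) for msg in messages]
scores = score_key_bytes(ciphertexts)
print([scores[x][key[x]] for x in range(len(key))])
```

for real input, each hex ciphertext would first be turned into bytes, e.g. with bytes.fromhex(). the true key byte always gets the maximum score at its position here, but other candidates may tie with it, which is why the later dictionary step is needed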
then for each position in your ciphertext order the possible keybytes by their probability and partition them by probability
then take a chunk of all ciphertexts, let's say the first 8 to 16 chars, and calculate the plaintext for all keys in the highest probability group
store key chunk and plaintext chunk together in a list
now test your list of possible plaintexts against a common dictionary, and again rate them 1 if they contain words that can be found in a dictionary and 0 otherwise ... sum up for all different ciphertexts ... (or use another metric to rate how good a key is)
order the key chunks by the highest value (read: the key that potentially solved the most chunks across all ciphertexts comes first and the one that produced garbage comes last) and continue with the next chunk ...
repeat this with bigger chunks, selecting not keybytes but the next smaller size of key chunks, until your chunksize gets to the ciphertext size...
of course this is an automated way to find a likely key, and there is some implementation work until you have a completely automated solution. if you just want to solve these 10 ciphertexts, you can abort the approach after the likely keybytes or the first chunks, and do the rest by hand ...
I'm trying to code an encryption program that will encode a user's input string (a word, say). The encryption method is just a basic use of elliptic curve encryption, and I am currently working on the encryption part of the program before I work on the mathematics, modular inverses, etc. required for the public and private key calculations. Currently I am using the key pub = 5 and a max value (derived from the product of 2 random primes) of 91. This is all the information needed, and the word I am testing the encryption on is 'happy'.
So far here is the code.
word = 'happy'
pub = 5
m = 91
for i in range(pub):
    if i == 0:
        word = word
    else:
        word = output
    for x in word:
        a = [(((ord(z)*ord(z))+1)/m) for z in word]
        b = [chr(i) for i in a]
        c = [str(i) for i in b]
        d = ''.join([str(i) for i in c])
        output = d
What I am trying to do is encrypt each letter by multiplying its ASCII value by itself, adding 1, and dividing by m, and then use chr() to rebuild the string, thus creating a new word. That new string then becomes the value of word for the next cycle of the loop, so the process continues until it has run pub times and the word is encrypted. I'm having a lot of difficulty with this and I don't know where to start in explaining the issues. I'm relatively new to Python, and any suggestions or advice on completing this quickly would be very much appreciated. Thank you in advance.
First, check that your math is right. Your formula (z**2 + 1)/m grows quadratically. My understanding of crypto is quite limited, but it doesn't look right to me. It should be some kind of one-to-one mapping from input to output. But it maps several neighboring characters to the same output. Also, the results grow with every round.
You can only convert the integers back to ASCII characters for a range up to 256. That's what your error message says. It's probably thrown in the second iteration of your outer for loop.
You probably need to get the value range down to 256 again.
I suppose you are missing a crucial part of the algorithm you are trying to implement, maybe some modulo operation.
Also some minor Python hints:
You can use the built-in power operator **, so you don't have to evaluate ord() twice.
((ord(z) ** 2) + 1) / m
You can do the conversion back to the string in one step like this:
output = ''.join([str(chr(i)) for i in a])
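To illustrate only the range issue, here is a rough sketch that keeps every value inside what chr() accepted in the question's setting by using integer division and a modulo 256. The function name transform_round and the choice of % 256 are assumptions made for this example; this is not a correct or reversible cipher, and it is not what elliptic curve encryption does.

```python
def transform_round(word, m):
    """One round of the question's formula, squeezed back into 0..255."""
    # // keeps the result an integer; % 256 keeps it in the single-byte range.
    return ''.join(chr(((ord(z) ** 2 + 1) // m) % 256) for z in word)


word = 'happy'
pub = 5
m = 91
for _ in range(pub):
    word = transform_round(word, m)

print([ord(ch) for ch in word])
```

Even in this form the mapping is not one-to-one: for m = 91, a space and '!' both end up as chr(11) after one round, so the original text cannot be recovered from the output.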
There are masses of text information in people's comments which are parsed from html, but there are no delimiting characters in them. For example: thumbgreenappleactiveassignmentweeklymetaphor. Apparently, there are 'thumb', 'green', 'apple', etc. in the string. I also have a large dictionary to query whether the word is reasonable.
So, what's the fastest way to extract these words?
I'm not really sure a naive algorithm would serve your purpose well, as pointed out by eumiro, so I'll describe a slightly more complex one.
The idea
The best way to proceed is to model the distribution of the output. A good first approximation is to assume all words are independently distributed. Then you only need to know the relative frequency of all words. It is reasonable to assume that they follow Zipf's law, that is the word with rank n in the list of words has probability roughly 1/(n log N) where N is the number of words in the dictionary.
Once you have fixed the model, you can use dynamic programming to infer the position of the spaces. The most likely sentence is the one that maximizes the product of the probabilities of the individual words, and it's easy to compute with dynamic programming. Instead of using the probability directly, we use a cost defined as the logarithm of the inverse of the probability, to avoid underflows.
The code
import math

# Build a cost dictionary, assuming Zipf's law and cost = -math.log(probability).
words = open("words-by-frequency.txt").read().split()
wordcost = dict((k,math.log((i+1)*math.log(len(words)))) for i,k in enumerate(words))
maxword = max(len(x) for x in words)

def infer_spaces(s):
    """Uses dynamic programming to infer the location of spaces in a string
    without spaces."""

    # Find the best match for the i first characters, assuming cost has
    # been built for the i-1 first characters.
    # Returns a pair (match_cost, match_length).
    def best_match(i):
        candidates = enumerate(reversed(cost[max(0, i-maxword):i]))
        return min((c + wordcost.get(s[i-k-1:i], 9e999), k+1) for k,c in candidates)

    # Build the cost array.
    cost = [0]
    for i in range(1,len(s)+1):
        c,k = best_match(i)
        cost.append(c)

    # Backtrack to recover the minimal-cost string.
    out = []
    i = len(s)
    while i>0:
        c,k = best_match(i)
        assert c == cost[i]
        out.append(s[i-k:i])
        i -= k

    return " ".join(reversed(out))
which you can use with
s = 'thumbgreenappleactiveassignmentweeklymetaphor'
print(infer_spaces(s))
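If you want to try the idea without a words-by-frequency.txt file, the following self-contained sketch runs the same kind of dynamic program on a tiny in-memory word list. The list toy_words and the function names build_wordcost and segment are made up for the illustration, and a three-word list is of course no substitute for a real frequency-ranked dictionary.

```python
import math


def build_wordcost(words):
    """Zipf-style cost: the word at rank n (1-based) costs log(n * log(N))."""
    total = len(words)
    return {w: math.log((rank + 1) * math.log(total))
            for rank, w in enumerate(words)}


def segment(s, wordcost):
    maxword = max(len(w) for w in wordcost)
    # best[i] = (minimal cost, length of the last word) for the first i characters
    best = [(0.0, 0)]
    for i in range(1, len(s) + 1):
        best.append(min(
            (best[i - k][0] + wordcost.get(s[i - k:i], math.inf), k)
            for k in range(1, min(i, maxword) + 1)
        ))
    # Walk back through the stored word lengths to recover the words.
    out = []
    i = len(s)
    while i > 0:
        k = best[i][1]
        out.append(s[i - k:i])
        i -= k
    return " ".join(reversed(out))


toy_words = ["green", "apple", "thumb"]  # most frequent first (assumed ranking)
print(segment("thumbgreenapple", build_wordcost(toy_words)))
```

Storing the last word length next to each cost means the backtracking step does not have to recompute the best match.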
Examples
I am using this quick-and-dirty 125k-word dictionary I put together from a small subset of Wikipedia.
Before: thumbgreenappleactiveassignmentweeklymetaphor.
After: thumb green apple active assignment weekly metaphor.
Before: thereismassesoftextinformationofpeoplescommentswhichisparsedfromhtmlbuttherearenodelimitedcharactersinthemforexamplethumbgreenappleactiveassignmentweeklymetaphorapparentlytherearethumbgreenappleetcinthestringialsohavealargedictionarytoquerywhetherthewordisreasonablesowhatsthefastestwayofextractionthxalot.
After: there is masses of text information of peoples comments which is parsed from html but there are no delimited characters in them for example thumb green apple active assignment weekly metaphor apparently there are thumb green apple etc in the string i also have a large dictionary to query whether the word is reasonable so what s the fastest way of extraction thx a lot.
Before: itwasadarkandstormynighttherainfellintorrentsexceptatoccasionalintervalswhenitwascheckedbyaviolentgustofwindwhichsweptupthestreetsforitisinlondonthatoursceneliesrattlingalongthehousetopsandfiercelyagitatingthescantyflameofthelampsthatstruggledagainstthedarkness.
After: it was a dark and stormy night the rain fell in torrents except at occasional intervals when it was checked by a violent gust of wind which swept up the streets for it is in london that our scene lies rattling along the housetops and fiercely agitating the scanty flame of the lamps that struggled against the darkness.
As you can see it is essentially flawless. The most important part is to make sure your word list was trained on a corpus similar to what you will actually encounter, otherwise the results will be very bad.
Optimization
The implementation consumes a linear amount of time and memory, so it is reasonably efficient. If you need further speedups, you can build a suffix tree from the word list to reduce the size of the set of candidates.
If you need to process a very large consecutive string it would be reasonable to split the string to avoid excessive memory usage. For example you could process the text in blocks of 10000 characters plus a margin of 1000 characters on either side to avoid boundary effects. This will keep memory usage to a minimum and will almost certainly have no effect on the quality.
"Apparently" is good for humans, not for computers…
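# Load the candidate words, here from /usr/share/dict/words.
with open('/usr/share/dict/words') as f:
    words = set(f.read().split())
s = 'thumbgreenappleactiveassignmentweeklymetaphor'
# Note: with these bounds, substrings ending at the last character are never tested.
for i in range(len(s) - 1):
    for j in range(1, len(s) - i):
        if s[i:i+j] in words:
            print(s[i:i+j])
With the words from /usr/share/dict/words and for j in range(3, len(s) - i): (a minimum word length of 3), it finds: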
thumb
hum
green
nap
apple
plea
lea
act
active
ass
assign
assignment
sign
men
twee
wee
week
weekly
met
eta
tap