Enciphering and Deciphering by Shuffling in Python

I am writing a program that enciphers (and will eventually decipher) a given string.
The encipher function takes two arguments: the string and a seed value.
Here is what I have so far:
import random
def random_encipher(string,seed):
    random.seed(seed)
    alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
    #shuffle alphabet
    random.shuffle(alphabet)
    #assign index to each letter in alphabet
    for letter in alphabet:
        letter = ord(letter)-97
To sum all that up, basically I'm shuffling the alphabet and assigning each letter a number value ("a" = 0, "b" = 1, . . .)
Here's what I need help with:
I need string[0] to be printed as alphabet[0] (which is the shuffled alphabet, therefore with the current seed value, alphabet[0] = "e").
But for each letter of the string, not just the zero index.

Maybe something like that?
>>> import random
>>> def random_encipher(string,seed):
        random.seed(seed)
        alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
        #shuffle alphabet
        random.shuffle(alphabet)
        ciphers = []
        #assign index to each letter in alphabet
        for letter in string:
            index = ord(letter)-97
            cipher = alphabet[index]
            ciphers.append(cipher)
        return "".join(ciphers)
>>> random_encipher("foobar", 3)
'fwwqgc'
The point of using a list is that strings are immutable: appending to a string copies the whole string each time, which is costly. Appending to a list and joining the elements at the end is a better choice (or use a StringIO).
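Since the question also mentions deciphering later on, here is a minimal sketch of how the same shuffle could be reversed. It assumes the same seed is used on both sides and that only lowercase a-z is substituted; the helper name shuffled_alphabet and the use of a private random.Random instance (instead of random.seed) are my own choices, not part of the answer above.

```python
import random
import string

def shuffled_alphabet(seed):
    """Return the lowercase alphabet shuffled reproducibly for this seed."""
    letters = list(string.ascii_lowercase)
    # A private Random instance leaves the global generator untouched
    # (an assumption of this sketch; the answer above calls random.seed).
    random.Random(seed).shuffle(letters)
    return letters

def random_encipher(text, seed):
    alphabet = shuffled_alphabet(seed)
    # Lowercase letters are substituted; everything else passes through.
    return "".join(
        alphabet[ord(ch) - ord("a")] if "a" <= ch <= "z" else ch
        for ch in text
    )

def random_decipher(text, seed):
    alphabet = shuffled_alphabet(seed)
    # alphabet.index(ch) recovers the position the plain letter had.
    return "".join(
        chr(alphabet.index(ch) + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )

secret = random_encipher("foo bar!", 3)
print(random_decipher(secret, 3))  # prints the original text again
```

The exact ciphertext depends on the Python version's shuffle implementation, so only the round trip is relied on here.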

What you are effectively doing is creating a mapping between letters.
Luckily, Python has an easy-to-use mapping type (dict).
So, to create a random mapping:
keys = string.ascii_lowercase
# this shuffles the string
values = ''.join(
    random.sample(string.ascii_lowercase, len(string.ascii_lowercase))
)
mapping = dict(zip(keys, values))
And we want the reverse of this mapping for deciphering, so we use a dict comprehension:
reverse_mapping = {v: k for k, v in mapping.items()}
Now it's simply a matter of using the mapping on ciphering and deciphering:
def translate(s, mapping, missing=' '):
    return ''.join([mapping.get(c, missing) for c in s])
And to use the translation function:
encrypted = translate("my string", mapping)
print(encrypted)
# verify deciphering works
decrypted = translate(encrypted, reverse_mapping)
print(decrypted)
The whole thing together:
#!/usr/bin/env python
import string
import random
# ... set 'seed', or factor to a function
random.seed(seed)
keys = string.ascii_lowercase
values = ''.join(
    random.sample(string.ascii_lowercase, len(string.ascii_lowercase))
)
mapping = dict(zip(keys, values))
reverse_mapping = {v: k for k, v in mapping.items()}
def translate(s, mapping, missing=' '):
    return ''.join([mapping.get(c, missing) for c in s])
encrypted = translate("my string", mapping)
print(encrypted)
# verify deciphering works
decrypted = translate(encrypted, reverse_mapping)
print(decrypted)
Output:
up jsqlao
my string
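As a possible alternative to the hand-written translate() function, Python 3 strings have built-in translation tables. The sketch below assumes Python 3; the function name make_tables and the seed value 42 are made up for illustration. Unlike the missing=' ' default above, str.translate leaves unmapped characters unchanged.

```python
import random
import string

def make_tables(seed):
    """Build (encipher, decipher) translation tables for lowercase a-z."""
    keys = string.ascii_lowercase
    # random.sample over the whole string yields a shuffled copy of it.
    values = "".join(random.Random(seed).sample(keys, len(keys)))
    return str.maketrans(keys, values), str.maketrans(values, keys)

enc, dec = make_tables(42)  # 42 is an arbitrary example seed
encrypted = "my string".translate(enc)
decrypted = encrypted.translate(dec)
print(decrypted)  # round trip gives back "my string"
```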

Here's what I ended up using:
def random_encipher(s,n):
    random.seed(n)
    alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
    #shuffle alphabet
    random.shuffle(alphabet)
    print(alphabet)
    #build empty list
    chars = []
    #build enciphered string
    for letter in s:
        if "a" <= letter <= "z": #if character is a lowercase letter, use the letter at its index in the shuffled alphabet
            chars.append(alphabet[ord(letter)-97])
        else: #includes punctuation, spaces, etc.
            chars.append(letter)
    word = "".join(chars)
    print(word)
random_encipher(s,n)

Related

Checking for duplicate letters within lists by using a histogram function

I'm trying to knock out my homework, but having difficulties incorporating the required histogram function.
This is the code I have to work with:
alphabet = "abcdefghijklmnopqrstuvwxyz"
test_dups = ["zzz","dog","bookkeeper","subdermatoglyphic","subdermatoglyphics"]
test_miss = ["zzz","subdermatoglyphic","the quick brown fox jumps over the lazy dog"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
I need to write a function called has_duplicates() that takes a string parameter and returns True if the string has any repeated characters. Otherwise, it should return False.
Implement has_duplicates() by creating a histogram using the histogram() function above. Do not use any of the implementations of has_duplicates() that are given in your textbook. Instead, your implementation should use the counts in the histogram to decide if there are any duplicates.
Write a loop over the strings in the provided test_dups list. Print each string in the list and whether or not it has any duplicates based on the return value of has_duplicates() for that string. For example, the output for aaa and abc would be the following.
aaa has duplicates
abc has no duplicates
Print a line like one of the above for each of the strings in test_dups.
Write a function called missing_letters that takes a string parameter and returns a new string with all the letters of the alphabet that are not in the argument string. The letters in the returned string should be in alphabetical order.
My implementation should use a histogram from the histogram() function. It should also use the global variable alphabet. It should use this global variable directly, not through an argument or a local copy. It should loop over the letters in alphabet to determine which are missing from the input parameter.
The function missing_letters should combine the list of missing letters into a string and return that string.
Write a loop over the strings in list test_miss and call missing_letters with each string. Print a line for each string listing the missing letters. For example, for the string "aaa", the output should be the following.
aaa is missing letters bcdefghijklmnopqrstuvwxyz
If the string has all the letters in alphabet, the output should say it uses all the letters. For example, the output for the string alphabet itself would be the following.
"abcdefghijklmnopqrstuvwxyz uses all the letters"
Print a line like one of the above for each of the strings in test_miss.
This is as far as I got...
def has_duplicates(t):
    if histogram(t) > 1:
        return True
    else:
        return False
Result:
'>' not supported between instances of 'str' and 'int'
The following should provide the desired result:
alphabet = "abcdefghijklmnopqrstuvwxyz"
test_dups = ["zzz","dog","bookkeeper","subdermatoglyphic","subdermatoglyphics"]
test_miss = ["zzz","subdermatoglyphic","the quick brown fox jumps over the lazy dog"]
def histogram(s):
    d = dict()
    for c in s:
        if c not in d:
            d[c] = 1
        else:
            d[c] += 1
    return d
def has_duplicates(s):
    # Return True if some character repeats: the histogram then has
    # fewer keys than s has characters.
    return len(histogram(s)) != len(s)
def missing_letters(s):
    h = histogram(s)
    rv = ''
    # Loop over letters in alphabet, if the letter is not in the histogram then
    # append to the return string.
    for c in alphabet:
        if c not in h:
            rv = rv + c
    return rv
# Loop over test strings as required.
for s in test_miss:
    miss = missing_letters(s)
    if miss:
        print(f"{s} is missing letters {miss}.")
    else:
        print(f"{s} uses all the letters.")
Output:
zzz is missing letters abcdefghijklmnopqrstuvwxy.
subdermatoglyphic is missing letters fjknqvwxz.
the quick brown fox jumps over the lazy dog uses all the letters.
alphabet = "abcdefghijklmnopqrstuvwxyz"
test_dups = ["zzz", "dog", "bookkeeper", "subdermatoglyphic", "subdermatoglyphics"]
test_miss = ["zzz", "subdermatoglyphic", "the quick brown fox jumps over the lazy dog"]
def histogram(string):
    d = dict()
    for char in string:
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    return d
# Part 1
def has_duplicates(string):
    h = histogram(string)
    for k, v in h.items():
        if v > 1:
            return True
    return False
for string in test_dups:
    if has_duplicates(string):
        print(string, "has duplicates")
    else:
        print(string, "has no duplicates")
# Part 2
def missing_letters(string):
    h = histogram(string)
    new_list = []
    for char in alphabet:
        if char not in h:
            new_list.append(char)
    return "".join(new_list)
for string in test_miss:
    new_list = missing_letters(string)
    if len(new_list):
        print(string, "is missing letters", new_list)
    else:
        print(string, "uses all the letters")
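For comparison, both functions can be written more compactly while still going through the histogram counts, as the assignment asks. This is only a sketch of one way to do it; the use of any() and of d.get() is my own variation and not taken from the answers above.

```python
alphabet = "abcdefghijklmnopqrstuvwxyz"

def histogram(s):
    """Map each character of s to the number of times it occurs."""
    d = {}
    for c in s:
        d[c] = d.get(c, 0) + 1
    return d

def has_duplicates(s):
    # True as soon as any count in the histogram exceeds 1.
    return any(count > 1 for count in histogram(s).values())

def missing_letters(s):
    h = histogram(s)
    # Iterating over the global alphabet keeps the result in order.
    return "".join(c for c in alphabet if c not in h)

for word in ["aaa", "abc"]:
    label = "has duplicates" if has_duplicates(word) else "has no duplicates"
    print(word, label)
```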

How do I find the predominant letters in a list of strings

I want to check for each position in the string what is the character that appears most often on that position. If there are more of the same frequency, keep the first one. All strings in the list are guaranteed to be of identical length!!!
I tried the following way:
print(max(((letter, strings.count(letter)) for letter in strings), key=lambda x:[1])[0])
But I get: mistul or qagic
And I can not figure out what's wrong with my code.
My list of strings looks like this:
Input: strings = ['mistul', 'aidteh', 'mhfjtr', 'zxcjer']
Output: mister
Explanation: In the first position, m appears twice. In the second, i appears twice. In the third, there is no predominant character, so we choose the first one, that is, s. In the fourth position, we have t twice and j twice, but t is seen first, so we keep it. In the fifth position, e appears twice, and in the last, r appears twice.
More examples:
Input: ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
Output: magic
Input: ['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv', 'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq', 'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq', 'djtfzr', 'uenleo']
Expected Output: secret
Some help?
Finally a use case for zip() :-)
If you like cryptic code, it could even be done in one statement:
def solve(strings):
    return ''.join([max([(letter, letters.count(letter)) for letter in letters], key=lambda x: x[1])[0] for letters in zip(*strings)])
But I prefer a more readable version:
def solve(strings):
    result = ''
    # "zip" the strings, so in the first iteration `letters` would be a tuple
    # containing the first letter of each word, the second iteration it would
    # be a tuple of all second letters of each word, and so on...
    for letters in zip(*strings):
        # Create a list of (letter, count) pairs:
        letter_counts = [(letter, letters.count(letter)) for letter in letters]
        # Get the first letter with the highest count, and append it to result:
        result += max(letter_counts, key=lambda x: x[1])[0]
    return result
# Test function with input data from question:
assert solve(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']) == 'mister'
assert solve(['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn',
              'mzsev', 'saqbl', 'myead']) == 'magic'
assert solve(['sacbkt', 'tnqaex', 'vhcrhl', 'obotnq', 'vevleg', 'rljnlv',
              'jdcjrk', 'zuwtee', 'xycbvm', 'szgczt', 'imhepi', 'febybq',
              'pqkdfg', 'swwlds', 'ecmrut', 'buwruy', 'icjwet', 'gebgbq',
              'djtfzr', 'uenleo']) == 'secret'
UPDATE
@dun suggested a smarter way of using the max() function, which makes the one-liner actually quite readable :-)
def solve(strings):
    return ''.join([max(letters, key=letters.count) for letters in zip(*strings)])
Using collections.Counter() is a nice strategy here. Here's one way to do it:
from collections import Counter
def most_freq_at_index(strings, idx):
    chars = [s[idx] for s in strings]
    char_counts = Counter(chars)
    return char_counts.most_common(n=1)[0][0]
strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih',
           'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
result = ''.join(most_freq_at_index(strings, idx) for idx in range(5))
print(result)
## 'magic'
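The Counter approach can also be combined with zip() so the word length does not have to be hard-coded as range(5). A small sketch follows; the function name predominant is made up here, and the tie-breaking relies on most_common() returning equal counts in first-encountered order, which is documented for Python 3.7 and later.

```python
from collections import Counter

def predominant(strings):
    """Pick the most frequent character of each column, first one on ties."""
    # zip(*strings) yields one tuple per position: all first letters,
    # then all second letters, and so on.
    return "".join(
        Counter(column).most_common(1)[0][0]
        for column in zip(*strings)
    )

print(predominant(['mistul', 'aidteh', 'mhfjtr', 'zxcjer']))  # mister
```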
If you want something more manual without the magic of Python libraries you can do something like this:
def f(strings):
    dic = {}
    for string in strings:
        for i in range(len(string)):
            word_dic = dic.get(i, { string[i]: 0 })
            word_dic[string[i]] = word_dic.get(string[i], 0) + 1
            dic[i] = word_dic
    largest_string = max(strings, key = len)
    result = ""
    for i in range(len(largest_string)):
        result += max(dic[i], key = lambda x : dic[i][x])
    return result
strings = ['qagic', 'cafbk', 'twggl', 'kaqtc', 'iisih', 'mbpzu', 'pbghn', 'mzsev', 'saqbl', 'myead']
f(strings)
'magic'

Matching a string (any order) to strings in an array of a huge size

def Get_Word_List(File_=[]):
    with open("Words.txt") as File: #File of 250k+ words separated via new line
        for Line in File:
            File_.append(Line.replace("\n",""))
    return File_
def Get_Input(Str=str):
    Str = raw_input("Input 7 letters: ")
    while len(Str) != 7:
        Str = raw_input("Input 7 letter: ")
    return Str.upper()
def Find_Words():
    Letters = Get_Input()
    List = Get_Word_List() #An Array of strings, all in uppercase
    for Word in List:
        pass
I am trying to match a string of letters in any order (max length 7) against the words in an array of 250k+ entries. For example, "ZZAFIEA" could give "FIZZ" or "FEZ". I can't find a way to do it; I've tried all sorts of things and would appreciate any help.
This is a pretty good solution:
from collections import Counter
def counter_is_subset(x, y):
    # If subtracting y from x doesn't return an empty Counter,
    # x is NOT a subset of y.
    return not (x - y)
def find_buildable_words(words, letters):
    letters = Counter(letters)
    for word in words:
        if counter_is_subset(Counter(word), letters):
            yield word
words = ['BLAH', 'FIZZ', 'FEZ', 'FOO', 'FAZE', 'ZEE']
letters = 'ZZAFIEA'
buildable_words = find_buildable_words(words, letters)
for word in buildable_words:
    print(word)
On my computer, this runs in ~1.2 seconds with a 250,000 word list.
You can use itertools.ifilter (Python 2; in Python 3 the built-in filter is already lazy): write a predicate that checks whether every character of a word from the list appears in your string, then run ifilter with that predicate on the list.
Demo:
from itertools import ifilter
compare_against = "ABEFGZ"
lst = ['EFZ', 'ZIP', 'AGA', 'ABM']
def pred(word):
    for char in word:
        if char not in compare_against:
            return False
    return True
x = ifilter(pred, lst)
for y in x:
    print(y)
OUTPUT:
EFZ
AGA
Disclaimer
This example does not handle duplicate characters well: you have to decide whether AGA should be returned or not (the character 'A' appears only once in compare_against). If you decide that AGA is not a valid output, the pred function should be modified to enforce that restriction.
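If AGA should be rejected, one way to adapt the predicate is to compare letter counts instead of plain membership. This is a sketch under that assumption, written for Python 3; the name can_build is hypothetical.

```python
from collections import Counter

def can_build(word, letters):
    """True if word can be spelled using each available letter at most once."""
    available = Counter(letters)
    needed = Counter(word)
    # A missing letter counts as 0 in a Counter, so no KeyError here.
    return all(available[ch] >= n for ch, n in needed.items())

lst = ['EFZ', 'ZIP', 'AGA', 'ABM']
matches = [word for word in lst if can_build(word, "ABEFGZ")]
print(matches)  # ['EFZ']
```

AGA is now dropped because it needs two A's and only one is available.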

How to collect defined items in lists python

I have to find the signs "a, ..., z", "A, ..., Z", "space", "." and "," in some data.
I have tried the code:
fh = codecs.open("mydata.txt", encoding = "utf-8")
text = fh.read()
fh1 = unicode(text)
dic_freq_signs = dict(Counter(fh1.split()))
All_freq_signs = dic_freq_signs.items()
List_signs = dic_freq_signs.keys()
List_freq_signs = dic_freq_signs.values()
BUT it gives me ALL signs, not just the ones I am looking for.
Can anyone help?
(And it has to be unicode)
Check dictionary iteration. Each item is a (key, count) pair, so you can filter on either part:
All_freq_signs = [ item for item in dic_freq_signs.items() if item[0] == "somevalue"]
def criteria(value):
    return value%2 == 0
All_freq_signs = [ item for item in dic_freq_signs.items() if criteria(item[1])]
Make sure you import the string module; with it you can get the character ranges a to z and A to Z easily:
import string
A Counter(any_string) gives the count of each character in the string. With split(), the Counter instead counts each word in the string, which is not what you asked for. So I have assumed that you need character counts.
dic_all_chars = dict(Counter(fh1)) # this gives counts of all characters in the string
signs = string.ascii_lowercase + string.ascii_uppercase + ' .,' # these are the characters you want to check
# using dict comprehension and checking if the key is in the characters you want
dic_freq_signs = {key: value for key, value in dic_all_chars.items()
                  if key in signs}
dic_freq_signs would only have the signs that you want to count as keys and their counts as values.
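The same idea can be sketched in Python 3, where every str is already Unicode, by filtering the characters before they reach the Counter. The function name count_signs and the sample text are made up for illustration.

```python
from collections import Counter
import string

def count_signs(text, signs=string.ascii_letters + " .,"):
    """Count only the characters of text that appear in signs."""
    wanted = set(signs)  # set membership is cheaper than scanning a string
    return Counter(ch for ch in text if ch in wanted)

counts = count_signs("Hi, you. Hi!")
print(counts["H"], counts[" "], "!" in counts)  # 2 2 False
```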

Python string replace

I have this code:
ALPHABET1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
key = "TES"
ALPHABET2 = key + ALPHABET1
count_result = ALPHABET2.count("T")
if (count_result > 1):
    ALPHABET3 = ALPHABET1.replace("T","")
    ALPHABET2 = key + ALPHABET3
print(ALPHABET2)
I want to be able to put the keyword at the start of the alphabet string to create a new string without repeating the letters in the keyword. I'm having some problems doing this though. I need the keyword to work for all letters as it will be user input in my program. Any suggestions?
Two things:
You don't need to make the alphabet yourself, import string and use string.ascii_uppercase; and
You can use a for loop to work through the characters in your key.
To illustrate the latter:
for c in key:
    alphabet = alphabet.replace(c, "")
Better yet, build a list directly, with the key letters first and in order:
alpha = list(key) + [c for c in string.ascii_uppercase if c not in key]
It's easy and clean to do this with a regex:
import re
ALPHABET1 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
key = "TES"
newalphabet = key.upper() + re.sub(r'%s'%'|'.join(key.upper()), '', ALPHABET1)
or with a list comprehension, as @jonrsharpe suggested.
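Putting the suggestions together, a keyed alphabet for arbitrary user input might look like the sketch below. It assumes the key contains only letters and that repeated letters in the key should be kept once, in first-seen order; the function name keyed_alphabet is hypothetical, and dict.fromkeys is used here only as an order-preserving way to drop duplicates (Python 3.7+).

```python
import string

def keyed_alphabet(key):
    """Return the key's distinct letters followed by the rest of A-Z."""
    # dict.fromkeys keeps the first occurrence of each letter, in order.
    key_letters = "".join(dict.fromkeys(key.upper()))
    rest = "".join(c for c in string.ascii_uppercase if c not in key_letters)
    return key_letters + rest

print(keyed_alphabet("TES"))     # TESABCDFGHIJKLMNOPQRUVWXYZ
print(keyed_alphabet("letter"))  # starts with LETR
```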
