Python: Automatically introduce slight word-typos into phrases?

Does anyone have ideas on how to automatically introduce common typos into the words of a phrase?
I found "How to introduce typo in a string?", but I think it's a bit too generic because it simply replaces every n-th letter with a random character.
I would like to kind of introduce "common" typos.
Any idea on how to do it?

For the purpose of this explanation, let's assume that you have a string variable message that you would like to introduce typos into. To produce typos that look like common mistakes, my strategy is to replace random letters in message with other letters that are nearby on the keyboard (i.e. replace a with s, or d with f). Here's how:
import random  # random typos

message = "The quick brown fox jumped over the big red dog."
# convert the message to a list of characters
message = list(message)
typo_prob = 0.1  # fraction (out of 1.0) of characters to become typos
# the number of characters that will be typos
n_chars_to_flip = round(len(message) * typo_prob)
# is a letter capitalized?
capitalization = [False] * len(message)
# make all characters lowercase & record uppercase
for i in range(len(message)):
    capitalization[i] = message[i].isupper()
    message[i] = message[i].lower()
# list of positions that will be flipped (a position may be picked twice)
pos_to_flip = []
for i in range(n_chars_to_flip):
    pos_to_flip.append(random.randint(0, len(message) - 1))
# dictionary... for each letter list of letters
# nearby on the keyboard
nearbykeys = {
    'a': ['q','w','s','x','z'],
    'b': ['v','g','h','n'],
    'c': ['x','d','f','v'],
    'd': ['s','e','r','f','c','x'],
    'e': ['w','s','d','r'],
    'f': ['d','r','t','g','v','c'],
    'g': ['f','t','y','h','b','v'],
    'h': ['g','y','u','j','n','b'],
    'i': ['u','j','k','o'],
    'j': ['h','u','i','k','n','m'],
    'k': ['j','i','o','l','m'],
    'l': ['k','o','p'],
    'm': ['n','j','k','l'],
    'n': ['b','h','j','m'],
    'o': ['i','k','l','p'],
    'p': ['o','l'],
    'q': ['w','a','s'],
    'r': ['e','d','f','t'],
    's': ['w','e','d','x','z','a'],
    't': ['r','f','g','y'],
    'u': ['y','h','j','i'],
    'v': ['c','f','g','b'],
    'w': ['q','a','s','e'],
    'x': ['z','s','d','c'],
    'y': ['t','g','h','u'],
    'z': ['a','s','x'],
    ' ': ['c','v','b','n','m']
}
# insert typos
for pos in pos_to_flip:
    # special characters have no entry in nearbykeys; skip them
    try:
        typo_arrays = nearbykeys[message[pos]]
        message[pos] = random.choice(typo_arrays)
    except KeyError:
        continue
# reinsert capitalization
for i in range(len(message)):
    if capitalization[i]:
        message[i] = message[i].upper()
# recombine the message into a string
message = ''.join(message)
# show the message in the console
print(message)
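The same idea can be wrapped in a reusable function and combined with another typo that shows up often in practice: swapping two adjacent letters. What follows is only a sketch under stated assumptions. The function name add_typos, the swap_prob parameter, and the abbreviated NEARBY_KEYS table are illustrative and are not part of the answer above.

```python
import random

# Abbreviated QWERTY neighbour table (assumption: extend it as needed).
NEARBY_KEYS = {
    'a': 'qwsz', 'e': 'wsdr', 'i': 'ujko', 'o': 'iklp',
    'r': 'edft', 's': 'awedxz', 't': 'rfgy', 'n': 'bhjm',
}

def add_typos(text, typo_prob=0.1, swap_prob=0.5, rng=None):
    """Return a copy of text with neighbour-key and letter-swap typos."""
    rng = rng or random.Random()
    chars = list(text)
    i = 0
    while i < len(chars):
        if rng.random() < typo_prob:
            can_swap = (i + 1 < len(chars)
                        and chars[i].isalpha() and chars[i + 1].isalpha())
            if can_swap and rng.random() < swap_prob:
                # Transposition: swap this letter with the next one.
                chars[i], chars[i + 1] = chars[i + 1], chars[i]
                i += 2  # skip the swapped partner
                continue
            neighbours = NEARBY_KEYS.get(chars[i].lower())
            if neighbours:
                # Substitution: pick a nearby key, keeping the original case.
                typo = rng.choice(neighbours)
                chars[i] = typo.upper() if chars[i].isupper() else typo
        i += 1
    return ''.join(chars)

# A seeded generator makes the result repeatable between runs.
print(add_typos("The quick brown fox", typo_prob=0.2, rng=random.Random(42)))
```

Passing a seeded random.Random instance keeps test runs reproducible, while leaving rng unset gives different typos on every call.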

Related

I need to encode a message from an input in Python

So I have to encode a message, but with an unusual encoding: if the input is CAT the output must be DQ6. It is supposed to encode by changing every letter of the input into the key to its upper left on the keyboard. Another example: in: bear, out: G3Q4. I tried to code this with dictionaries like this:
d1 = {"q": 1,"Q": 1,"w": 2,"W": 2,"e": 3,"E": 3,"r": 4,"R": 4,"t": 5,"T": 5,"y": 6,"Y": 6,"u": 7,"U": 7,"i": 8,"I": 8,"o": 9,"O": 9,"p": 0,"P": 0}
d2 = {"a": 'Q',"A": 'Q',"s": 'W',"S": 'W',"d": 'E',"D": 'E',"f": 'R',"F": 'R',"g": 'T',"G": 'T',"h": 'Y',"H": 'Y',"j": 'U',"J": 'U',"k": 'I',"K": 'I',"l": 'O',"L": 'O',"ñ": 'P',"Ñ": 'P'}
d3 = {"z": 'A',"Z": 'A',"x": 'S',"X": 'S',"c": 'D',"C": 'D',"v": 'F',"V": 'F',"b": 'G',"B": 'G',"n": 'H', "N": 'H',"m": 'J',"M": 'J',",": 'K',".": 'L',"-": 'Ñ'}
I tried this function to check every key, but all I'm getting is "None" as the value.
text = input("Text: ")
def cif(text):
    cifrd = ""
    for i in text:
        if i in d1:
            cifrd += d1[(d1.index(i))%(len(d1))]
        elif i in d2:
            cifrd += d2[(d2.index(i))%(len(d2))]
        elif i in d3:
            cifrd += d3[(d3.index(i))%(len(d3))]
        else:
            cifrd += i
print("New text: ",cif(cifrd))
Appreciate any help.

Your encoding:
d1 = {"q": 1,"Q": 1,"w": 2,"W": 2,"e": 3,"E": 3,"r": 4,"R": 4,"t": 5,"T": 5,"y": 6,"Y": 6,"u": 7,"U": 7,"i": 8,"I": 8,"o": 9,"O": 9,"p": 0,"P": 0}
d2 = {"a": 'Q',"A": 'Q',"s": 'W',"S": 'W',"d": 'E',"D": 'E',"f": 'R',"F": 'R',"g": 'T',"G": 'T',"h": 'Y',"H": 'Y',"j": 'U',"J": 'U',"k": 'I',"K": 'I',"l": 'O',"L": 'O',"ñ": 'P',"Ñ": 'P'}
d3 = {"z": 'A',"Z": 'A',"x": 'S',"X": 'S',"c": 'D',"C": 'D',"v": 'F',"V": 'F',"b": 'G',"B": 'G',"n": 'H', "N": 'H',"m": 'J',"M": 'J',",": 'K',".": 'L',"-": 'Ñ'}
There are a few issues. See my comments:
text = input("Text: ")

def cif(text):
    cifrd = ""
    for letter in text:
        # There is no need to manually write out each dictionary and do a check
        # Put the dictionaries in a list, iterate over each one, and if the letter
        # is in the dictionary, you will get the respective letter back
        for encode in [d1, d2, d3]:
            # check if my letter is in the dictionary
            actual = encode.get(letter)
            # If the key is not in the dictionary, get() returns `None`; this check
            # ensures you only append actual numbers/characters. Compare against
            # None explicitly, because the digit 0 (the code for "p") is falsy.
            if actual is not None:
                cifrd += str(actual)
    # When using a function, return something if you need it outside of the function
    return cifrd

encoded = cif(text)
print("New text: {}".format(encoded))
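One detail worth keeping in mind when testing the result of dict.get(): the value 0 is falsy, so a plain truthiness test treats a stored 0 the same as a missing key. The small sketch below illustrates the difference. The digits dictionary and the two helper names are made up for this illustration.

```python
# Tiny sample of the top-row mapping from the question.
digits = {"o": 9, "p": 0}

def lookup_truthy(letter):
    value = digits.get(letter)
    # A truthiness test drops 0 as if the key were missing.
    return str(value) if value else ""

def lookup_not_none(letter):
    value = digits.get(letter)
    # Comparing against None keeps 0 and only skips missing keys.
    return str(value) if value is not None else ""

print(repr(lookup_truthy("p")))    # ''
print(repr(lookup_not_none("p")))  # '0'
```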

There are a number of issues with your code:
You need to return the "encoded" text at the end of the cif() function
You need to pass the text variable to the cif() function, not cifrd which isn't defined outside your function
Dictionaries do not have an .index() method, you access dictionary items by key, e.g., d1["q"] returns 1.
For what it's worth, there's no need to maintain three separate dictionaries, nor is there reason to maintain both lower- and upper-cased letters in your dictionary; store lower-cased or upper-cased keys, and transform the input to the correct case when accessing the translation, i.e., input "Q" -> lowercase "q" -> d1["q"].
Here:
mapping = {'q': '1', 'w': '2', 'e': '3', 'r': '4', 't': '5', 'y': '6', 'u': '7', 'i': '8', 'o': '9', 'p': '0',
           'a': 'q', 's': 'w', 'd': 'e', 'f': 'r', 'g': 't', 'h': 'y', 'j': 'u', 'k': 'i', 'l': 'o', 'ñ': 'p',
           'z': 'a', 'x': 's', 'c': 'd', 'v': 'f', 'b': 'g', 'n': 'h', 'm': 'j', ',': 'k', '.': 'l', '-': 'ñ'}

def cif(s: str) -> str:
    encoded_string = ""
    for char in s:
        # leaves the character un-encoded if it does not have a mapping;
        # the digits are stored as strings so they can be concatenated
        encoded_string += mapping.get(char.lower(), char)
    return encoded_string
I would actually suggest using str.translate(). You can pass two strings to str.maketrans(), the first being the input characters, the second being the characters to which those inputs should map:
>>> t = str.maketrans("qwertyuiopasdfghjklñzxcvbnm,.-", "1234567890qwertyuiopasdfghjklñ")
>>> "hello world".translate(t)
'y3oo9 294oe'
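To get the upper-case output shown in the question (bear becoming G3Q4), the translation table can be combined with lower() and upper(). This is a minimal sketch. The names SOURCE, TARGET, TABLE and encode are illustrative, and the layout is the one implied by the dictionaries in the question.

```python
# Keys in keyboard order, and the key up and to the left of each one.
SOURCE = "qwertyuiopasdfghjklñzxcvbnm,.-"
TARGET = "1234567890qwertyuiopasdfghjklñ"
TABLE = str.maketrans(SOURCE, TARGET)

def encode(text):
    """Encode text case-insensitively and return upper-case output."""
    # Characters without an entry in the table pass through unchanged.
    return text.lower().translate(TABLE).upper()

print(encode("bear"))  # G3Q4
```

Because the table only contains lower-case keys, the input is lower-cased first, which avoids storing both cases of every letter.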

how to use for i loop with strings and array [closed]

I'm trying to make a function that takes a string as input and gives back the NATO alphabet word for each corresponding letter. I've been trying to do this for days and I'm so frustrated; I don't know whether to treat them as an array or as a string.
How I imagine it would be done is by making the elements of both alphabets correspond to each other,
like Nato=alphabet,
and using a for i loop to print out only the words for the input letters.
Any hints or ideas on how/what to do?
import numpy as np
def textToNato(plaintext):
    plaintext=plaintext.lower()
    plaintext="-".join(plaintext)
    plaintext=plaintext.split()
    Nato=np.array(["Alpha","Bravo","Charlie","Delta","Echo","Foxtrot",
                   "Golf","Kilo","Lima","Mike","November","Oscar",
                   "Papa","Quebec","Romeo","Sierra","Tango","Uniform"
                   "Victor","Whiskey","Xray","Yankee","Zulu"])
    alphabet=np.array(["a","b","c","d","e","f","g","k","l","m","n","o"
                       ,"p","q","r","s","t","u","v","w","x","y","z"])
    new=plaintext
    for i in range(len(Nato)): # i have no idea what i'am trying to do here
        new=np.append(alphabet[i],Nato[i])
    return new

I would create a different data structure for doing the lookup. You can make a dictionary that maps each letter of the alphabet to its word. Then loop through each letter in the input, look the letter up in the dictionary, and add the NATO word to a list.
nato_alphabet = {
    'a': 'Alpha', 'b': 'Bravo', 'c': 'Charlie', 'd': 'Delta', 'e': 'Echo',
    'f': 'Foxtrot', 'g': 'Golf', 'h': 'Hotel', 'i': 'India', 'j': 'Juliet',
    'k': 'Kilo', 'l': 'Lima', 'm': 'Mike', 'n': 'November', 'o': 'Oscar',
    'p': 'Papa', 'q': 'Quebec', 'r': 'Romeo', 's': 'Sierra', 't': 'Tango',
    'u': 'Uniform', 'v': 'Victor', 'w': 'Whiskey', 'x': 'Xray', 'y': 'Yankee',
    'z': 'Zulu'
}

def word_to_nato(word):
    output = []
    for letter in word:
        output.append(nato_alphabet[letter.lower()])
    return output

print(word_to_nato("foobar"))
['Foxtrot', 'Oscar', 'Oscar', 'Bravo', 'Alpha', 'Romeo']

I see two problems with your code:
# ...
plaintext = plaintext.split()
# ...
# "plaintext"'s value here is overwritten by
# each element of alphabet
for plaintext in alphabet:
    # ...
    # Below compares each element of the alphabet
    # with the alphabet list.
    # You want to check each element of plaintext against
    # each element of alphabet, to find the index and get the
    # corresponding element of Nato
    if plaintext == alphabet:
        # ...
I think what you want to do is:
loop through each element of your input, and
loop through each letter of the alphabet to find the index, and
use that index to get the corresponding phonetic alphabet word.
That could be done like this:
output = ''
for char1 in plaintext:
    found = False
    for i, char2 in enumerate(alphabet):
        if char1 == char2:
            output += Nato[i] + ', '
            found = True
            break
    if not found:
        output += 'not found'
return output
An easier and more efficient way is to use a dictionary (aka a hashmap):
nato_map = {
    'a' : 'Alpha',
    'b' : 'Bravo',
    # ...
}
output = ''
for char in plaintext:
    output += nato_map[char] + ', '
That way, the lookup is in constant time, rather than needing to loop through every element of the Nato list.
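A direct nato_map[char] lookup raises KeyError for any character that is not in the dictionary, such as a space or punctuation. One possible way around that is dict.get() with the character itself as the fallback, as in this sketch. The truncated NATO_MAP and the function name to_nato are illustrative only.

```python
# Truncated sample table; a real one would list all 26 letters.
NATO_MAP = {'a': 'Alpha', 'b': 'Bravo', 'c': 'Charlie'}

def to_nato(text, sep=', '):
    """Spell out text, leaving unknown characters unchanged."""
    # dict.get returns the second argument when the key is missing.
    words = [NATO_MAP.get(ch.lower(), ch) for ch in text]
    return sep.join(words)

print(to_nato("Cab!"))  # Charlie, Alpha, Bravo, !
```

Joining a list with sep.join also avoids the trailing ', ' that repeated string concatenation leaves at the end.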

Letter Count with Frequency, using Dictionaries

I was wondering if anyone could help me out.
How do I get this code to record ONLY the frequency of letters in a text file into a dictionary (it should NOT count spaces, line breaks, numbers, etc.)?
Additionally, how do I divide each letter's count by the total number of letters to report the percent frequency of each letter in the file?
This is what I have currently:
import os

def linguisticCalc():
    """
    Asks user to input a VALID filename. File must be a text file. IF valid, returns the frequency of ONLY letters in file.
    """
    filename = input("Please type your VALID filename")
    if os.path.exists(filename) == True:
        with open(filename, 'r') as f:
            f_content = f.read()
            freq = {}
            for i in f_content:
                if i in freq:
                    freq[i] += 1
                else:
                    freq[i] = 1
        print(str(freq))
    else:
        print("This filename is NOT valid. Use the getValidFilename function to test inputs.")

Something that might help you determine whether the character in question is a letter, is this:
import string
# code here
if character in string.ascii_letters:
    # code here

Check out collections.Counter().
You can use it to count every character in a string:
from collections import Counter
Counter('Articles containing potentially dated statements from 2011')
It gives this output, which counts every character in the string, including spaces and digits:
Counter({'A': 1,
'r': 2,
't': 8,
'i': 4,
'c': 2,
'l': 3,
'e': 5,
's': 3,
' ': 6,
'o': 3,
'n': 5,
'a': 4,
'g': 1,
'p': 1,
'y': 1,
'd': 2,
'm': 2,
'f': 1,
'2': 1,
'0': 1,
'1': 2})
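To count only letters and report percentages, as the question asks, the characters can be filtered before they reach Counter. The following is a sketch, not a drop-in replacement for linguisticCalc. The function name letter_frequencies is made up here, and it assumes upper and lower case should be counted together. Note that str.isalpha() also accepts non-ASCII letters; testing against string.ascii_letters instead would restrict it to a-z.

```python
from collections import Counter

def letter_frequencies(text):
    """Map each letter in text to its share of all letters, in percent."""
    # Keep letters only; spaces, digits, and punctuation are skipped.
    counts = Counter(ch.lower() for ch in text if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return {}  # no letters at all, avoid dividing by zero
    return {letter: 100.0 * n / total for letter, n in counts.items()}

for letter, percent in sorted(letter_frequencies("Hello, World 2011!").items()):
    print("{}: {:.1f}%".format(letter, percent))
```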

Python 3.6 for loop is only printing one string per line, why? [duplicate]

I am making a small Morse Code translator, and everything works fine; nonetheless, the output is not shown properly.
CIPHER = {'E': "⦾", "A": '⦿', 'R': '⦾⦿', 'I': '⦿⦾', 'O': '⦿⦿',
          'T': '⦾⦾⦿', 'N': '⦾⦿⦾', 'S': '⦾⦿⦿', 'L': '⦿⦾⦾',
          'C': '⦿⦾⦿', 'U': '⦿⦿⦾', 'D': '⦿⦿⦿',
          'P': '⦾⦾⦾⦿', 'M': '⦾⦾⦿⦾', 'H': '⦾⦾⦿⦿',
          'G': '⦾⦿⦾⦾', 'B': '⦾⦿⦾⦿', 'F': '⦾⦿⦿⦾',
          'Y': '⦾⦿⦿⦿', 'W': '⦿⦾⦾⦾', 'K': '⦿⦾⦾⦿',
          'V': '⦿⦾⦿⦾', 'X': '⦿⦾⦿⦿', 'Z': '⦿⦿⦾⦾',
          'J': '⦿⦿⦾⦿', 'Q': '⦿⦿⦿⦾',
          '1': '⦾⦾⦾⦾⦿', '2': '⦾⦾⦾⦿⦿', '3': '⦾⦾⦿⦿⦿',
          '4': '⦾⦿⦿⦿⦿', '5': '⦿⦿⦿⦿⦿', '6': '⦿⦾⦾⦾⦾',
          '7': '⦿⦿⦾⦾⦾', '8': '⦿⦿⦿⦾⦾', '9': '⦿⦿⦿⦿⦾',
          '0': '⦿⦿⦿⦿⦿'
          }
def main():
    msg = input("Type your message below:\n\n")
    for char in msg:
        if char == ' ':
            print (' '*7,)
        elif char not in 'abcdefghijklmnopqrstuvwxyz1234567890':
            print ('')
        else:
            print (CIPHER[char.upper()])
I would expect the output for "Hello, World!" to be something like this:
⦾⦿⦾⦾⦿⦾⦾⦿⦿ ⦿⦿⦾⦿⦿⦾⦾⦿⦿⦿
However, the actual output looks much more like this:
⦾
⦿⦾⦾
⦿⦾⦾
⦿⦿
⦿⦿
⦾⦿
⦿⦾⦾
⦿⦿⦿
I tried removing and placing commas randomly. Then I tried removing the '\n' characters from the input prompt, but nothing changed in the output. I tried to use .splitlines as shown here (For loop outputting one character per line), but it stopped printing completely! Then I googled it and did not find anything close to this problem, so I started to read more material on Python strings. I found a website (https://automatetheboringstuff.com/chapter6/) that has a good amount of material about Python strings, but I could not find anything there that solved my problem.
I would greatly appreciate your help!

You seem to be accustomed to the Python 2 convention of putting a comma at the end of the print arguments to suppress the automatic newline. This no longer works in Python 3. You should use the keyword argument end='' instead, like this: print (' '*7, end='')

Use
print(sth, end='')
to print without breaking line.
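An alternative to passing end='' on every call is to collect the pieces in a list and print the joined string once. The sketch below uses a hypothetical two-letter CIPHER table with ASCII symbols to stay short; the function name encode is also illustrative.

```python
# Hypothetical two-entry table; the real one maps every letter and digit.
CIPHER = {'H': '--**', 'I': '*-'}

def encode(msg):
    """Return the encoded message as one string, with no newlines."""
    parts = []
    for char in msg:
        if char == ' ':
            parts.append(' ' * 7)  # word gap, as in the question
        else:
            # Characters without a code are skipped.
            parts.append(CIPHER.get(char.upper(), ''))
    return ''.join(parts)

print(encode("hi hi"))
```

Returning the string instead of printing inside the loop also makes the function easier to test.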

Turning an integer string to an integer in Python

I am trying to write a program in Python that encodes words by first turning the input word into Morse code and then changing the dots and dashes to ones and zeros, which will be treated as binary numbers, etc.
This is a code snippet:
def mimary_encode(input):
    if input.find('!')!=-1 or input.find('#')!=-1 or input.find('#')!=-1 or input.find('$')!=-1 or input.find('%')!=-1 or input.find('^')!=-1 or input.find('&')!=-1 or input.find('*')!=-1 or input.find('(')!=-1 or input.find(')')!=-1 or input.find('_')!=-1 or input.find('-')!=-1 or input.find('=')!=-1 or input.find('+')!=-1 or input.find('.')!=-1 or input.find('"')!=-1 or input.find("'")!=-1 or input.find(',')!=-1 or input.find(' ')!=-1 or input.find(';')!=-1 or input.find(':')!=-1 or input.find('[')!=-1 or input.find(']')!=-1 or input.find('{')!=-1 or input.find('}')!=-1 or input.find('?')!=-1 or input.find('<')!=-1 or input.find('>')!=-1:
        print "Inputs cannot contain spaces or symbols"
    else:base=input
    nol=len(input)
    if base.find("a")!=-1:
        base=base.replace("a",".-")
    if base.find("b")!=-1:
        base=base.replace("a","-...")
    if base.find("c")!=-1:
        base=base.replace("c","-.-.")
    if base.find("d")!=-1:
        base=base.replace("d","-..")
    if base.find("e")!=-1:
        base=base.replace("e",".")
    if base.find("f")!=-1:
        base=base.replace("f","..-.")
    if base.find("g")!=-1:
        base=base.replace("g","--.")
    if base.find("h")!=-1:
        base=base.replace("h","....")
    if base.find("i")!=-1:
        base=base.replace("i","..")
    if base.find("j")!=-1:
        base=base.replace("j",".---")
    if base.find("k")!=-1:
        base=base.replace("k","-.-")
    if base.find("l")!=-1:
        base=base.replace("l",".-..")
    if base.find("m")!=-1:
        base=base.replace("m","--")
    if base.find("n")!=-1:
        base=base.replace("n","-.")
    if base.find("o")!=-1:
        base=base.replace("o","---")
    if base.find("p")!=-1:
        base=base.replace("p",".--.")
    if base.find("q")!=-1:
        base=base.replace("q","--.-")
    if base.find("r")!=-1:
        base=base.replace("r",".-.")
    if base.find("s")!=-1:
        base=base.replace("s","...")
    if base.find("t")!=-1:
        base=base.replace("t","-")
    if base.find("u")!=-1:
        base=base.replace("u","..-")
    if base.find("v")!=-1:
        base=base.replace("v","...-")
    if base.find("w")!=-1:
        base=base.replace("w",".--")
    if base.find("x")!=-1:
        base=base.replace("x","-..-")
    if base.find("y")!=-1:
        base=base.replace("y","-.--")
    if base.find("z")!=-1:
        base=base.replace("z","--..")
    if base.find("1")!=-1:
        base=base.replace("1",".----")
    if base.find("2")!=-1:
        base=base.replace("2","..---")
    if base.find("3")!=-1:
        base=base.replace("3","...--")
    if base.find("4")!=-1:
        base=base.replace("4","....-")
    if base.find("5")!=-1:
        base=base.replace("5",".....")
    if base.find("6")!=-1:
        base=base.replace("6","-....")
    if base.find("7")!=-1:
        base=base.replace("7","--...")
    if base.find("8")!=-1:
        base=base.replace("8","---..")
    if base.find("9")!=-1:
        base=base.replace("9","----.")
    if base.find("0")!=-1:
        base=base.replace("0","-----")
    if base.find("-")!=-1:
        base=base.replace("-","0")
    if base.find(".")!=-1:
        base=base.replace(".","1")
    int(base)
mimary_encode("hi")
I know this is probably not the best way to write it, but the problem is that the error Python keeps giving me is:
Traceback (most recent call last):
  File "C:/Documents and Settings/Moshe's Programming/Desktop/Python Projects/Mimary/Mimary attempt 1.py", line 86, in <module>
    mimary_encode("hi")
  File "C:/Documents and Settings/Moshe's Programming/Desktop/Python Projects/Mimary/Mimary attempt 1.py", line 83, in mimary_encode
    print base + 1
TypeError: cannot concatenate 'str' and 'int' objects
What does this error mean? How can I fix it? I already did turn base into an integer, didn't I?

Although your code is really messy, it works. However, your first error was raised by the line int("base").
If you write int("base"), you are trying to turn the literal string "base" into an integer, which is impossible.
Then you changed the code to print base + 1, which is also impossible, since base is a string and you can't add strings and integers with the + sign. Note also that int(base) on its own does not change base; it returns a new value that you have to assign or return.
So, what you want to do is:
def mimary_encode(base):
    # Do whatever you want
    return int(base)  # Only if you are sure base contains only digits
print mimary_encode("hi")

The error is coming from print base + 1, where base is a string and 1 is an integer.
Here is an alternative implementation of your function. First, I define the Morse code encoding as a dictionary. In the function, I first convert all letters to lower case. I then use the dictionary's get method to return the Morse code value if the character is in the dictionary; otherwise it returns an empty string, which filters the character out. This differs from the original approach, where bad data is rejected up front. Here, I'm only looking for data that is in my dictionary. Finally, I join the encoded letters together using a generator expression, code = " ".join((morse.get(c, "") for c in input_string)), which reads like a list comprehension but does not need a named intermediate list.
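from string import ascii_letters  # available in both Python 2 and 3

msg = 'I hear 13 knights from the Round Table are here!!!'

def mimary_encode(input_string):
    # lower-case the letters; every other character is left as it is
    input_string = ''.join([c.lower() if c in ascii_letters else c
                            for c in input_string])
    # characters without a Morse code become empty strings
    code = " ".join((morse.get(c, "") for c in input_string))
    return code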
morse = {
'0': '-----',
'1': '.----',
'2': '..---',
'3': '...--',
'4': '....-',
'5': '.....',
'6': '-....',
'7': '--...',
'8': '---..',
'9': '----.',
'a': '.-',
'b': '-...',
'c': '-.-.',
'd': '-..',
'e': '.',
'f': '..-.',
'g': '--.',
'h': '....',
'i': '..',
'j': '.---',
'k': '-.-',
'l': '.-..',
'm': '--',
'n': '-.',
'o': '---',
'p': '.--.',
'q': '--.-',
'r': '.-.',
's': '...',
't': '-',
'u': '..-',
'v': '...-',
'w': '.--',
'x': '-..-',
'y': '-.--',
'z': '--..'}
To encode the message (defined earlier as msg):
>>> mimary_encode(msg)
'.. .... . .- .-. .---- ...-- -.- -. .. --. .... - ... ..-. .-. --- -- - .... . .-. --- ..- -. -.. - .- -... .-.. . .- .-. . .... . .-. .'
Given the one-to-one mapping of your dictionary, you can reverse it using a dictionary comprehension:
reverse_morse = {v: k for k, v in morse.items()}
You can then reverse the morse code to convert it back into an alpha/numeric string.
>>> ''.join([reverse_morse.get(c, "") for c in mimary_encode(msg).split(" ")])
'ihear13knightsfromtheroundtablearehere'
Notice that all letters are converted to lower case and that the exclamations have been removed.
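For the second half of the original goal, turning dots and dashes into ones and zeros and reading the result as a binary number, one possible sketch follows. It is written for Python 3, unlike the Python 2 code in this thread. The function name morse_to_int and the four-entry MORSE sample are illustrative, and the mapping of "." to 1 and "-" to 0 follows the replace calls in the question.

```python
# Small sample of the Morse table; a full table would cover a-z and 0-9.
MORSE = {'h': '....', 'i': '..', 'e': '.', 't': '-'}

def morse_to_int(word):
    """Encode word as Morse, map '.' to 1 and '-' to 0, parse as binary."""
    code = ''.join(MORSE[c] for c in word.lower())
    bits = code.translate(str.maketrans('.-', '10'))
    # The second argument tells int() that the digits are base 2.
    return int(bits, 2)

print(morse_to_int("hi"))  # 63
```

Keep in mind that the resulting number is not uniquely decodable: leading zeros are lost and the letter boundaries disappear, so "hi" and any other word with six dots in total map to the same value.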
