Why does my code remove 999 in my replacement code? - python

I have the code below to replace all punctuation with 999 and all alphabet characters with their position in the alphabet. I have included a print statement that confirms punctuation is being replaced. However, the rest of my code seems to overwrite it when it replaces the other characters.
import string

def encode(text):
    punct = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for x in text.lower():
        if x in punct:
            text = text.replace(x, ".999")
    print(text)
    nums = [str(ord(x) - 96)
            for x in text.lower()
            if x >= 'a' and x <= 'z'
            ]
    return ".".join(nums)

print(encode(str(input("Enter Text: "))))
Input: 'Morning! \n'
Output: '13.15.18.14.9.14.7 \n'
Expected Output: 13.15.18.14.9.14.7.999

No, you have two independent logical "stories" here. One replaces punctuation with 999. The other keeps only the letters and builds an independent list of their alphabetic positions.
nums = [str(ord(x) - 96)
        for x in text.lower()
        if x >= 'a' and x <= 'z'
        ]
return ".".join(nums)
Note that this does nothing to alter text, and it takes nothing but letters from text. If you want to include the numbers, do so:
nums = [str(ord(x) - 96)
        if x >= 'a' and x <= 'z'
        else x
        for x in text.lower()
        ]
return ".".join(nums)
Output of print(encode("[hello]")):
..9.9.9.8.5.12.12.15...9.9.9

nums = [str(ord(x) - 96)
        for x in text.lower()
        if x >= 'a' and x <= 'z'
        ]
This means: take every character from the lowercase version of the string, and only if it is between 'a' and 'z', convert the value and put the result in nums.
In the first step, you replace a bunch of punctuation with text that includes '.' and '9' characters. But neither '9' nor '.' is between 'a' and 'z', so of course neither is preserved in the second step.
Now that I understand what you are going for: your approach splits up the problem in the wrong place. You are separating the two halves of the rule for "encoding" a given part of the input. What you want instead is to separate the whole rule for encoding a single element from the process of applying that single-element rule to the whole input. After all, that is what list comprehensions do.
This is the concept of separation of concerns. The two business rules are part of the same concern, because implementing one rule doesn't help you implement the other. Being able to encode one input character, though, does help you encode the whole string, because there is a tool for exactly that job.
We can have a complicated rule for single characters - no problem. Just put it in a separate function, so that we can give it a meaningful name and keep things simple to understand. Conceptually, our individual-character encoding is a numeric value, so we will consistently encode as a number, and then let the string-encoding process do the conversion.
def encode_char(c):
    if c in '''!()-[]{};:'"\\,<>./?@#$%^&*_~''':
        return 999
    if 'a' <= c.lower() <= 'z':
        # Lowercase before ord() so that 'M' and 'm' both give 13
        return ord(c.lower()) - 96
    # You should think about what to do in other cases!
    # In particular, you don't want digit symbols 1 through 9 to be
    # confused with letters A through I.
    # So I leave the rest up to you, depending on your requirements.
Now we can apply the overall encoding process: we want a string that puts '.' in between the string representations of the values. That's straightforward:
def encode(text):
    return '.'.join(str(encode_char(c)) for c in text)
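For illustration, here is a runnable end-to-end sketch. It assumes one possible answer to the "other cases" question above: characters that are neither listed punctuation nor letters (spaces, newlines, digits) are simply skipped. That choice, and the module-level name PUNCTUATION, belong to this sketch, not to the answer. Lowercasing before ord() is what makes the capital M in 'Morning!' come out as 13.

```python
PUNCTUATION = '''!()-[]{};:'"\\,<>./?@#$%^&*_~'''


def encode_char(c):
    """Return the numeric code for one character, or None if it has no code."""
    if c in PUNCTUATION:
        return 999
    if 'a' <= c.lower() <= 'z':
        # Lowercase first so that 'M' and 'm' both give 13
        return ord(c.lower()) - 96
    # Assumption for this sketch: every other character is skipped
    return None


def encode(text):
    codes = (encode_char(c) for c in text)
    # Drop the characters that have no code, then join the rest with '.'
    return '.'.join(str(code) for code in codes if code is not None)


print(encode('Morning! \n'))  # 13.15.18.14.9.14.7.999
```

Returning None from encode_char keeps the decision about what to do with uncoded characters in one place (encode), so changing that policy later does not touch the per-character rule.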

Related

Different results when returning multiple values in Python (Cryptopals challenges)

I'm working on problem 3 (set 1) of the cryptopals challenges (https://cryptopals.com/sets/1/challenges/3)
I've already found the key ('x') and decrypted the message ('Cooking mcs like a pound of bacon')
Here is my code:
from hexToBase64 import hexToBinary
from fixedXOR import xorBuffers

def binaryToChar(binaryString):
    asciiValue = 0
    for i in range(int(len(binaryString))-1,-1,-1):
        if(binaryString[i] == '1'):
            asciiValue = asciiValue + 2**(7-i)
    return chr(asciiValue)

def decimalToBinary(number):
    binaryString = ""
    while (number != 0):
        bit = number % 2
        binaryString = str(bit) + binaryString
        number = int(number/2)
    while(len(binaryString) < 8):
        binaryString = "0" + binaryString
    return binaryString

def breakSingleByteXOR(cipherString):
    decryptedMess = ""
    lowestError = 10000
    realKey = ""
    for i in range(0,128):
        errorChar = 0
        tempKey = decimalToBinary(i)
        tempMess = ""
        for j in range(0,len(cipherString),2):
            #Take each byte of the cipherString
            cipherChar = hexToBinary(cipherString[j:j+2])
            decryptedChar = binaryToChar(xorBuffers(cipherChar,tempKey))
            asciiValue = ord(decryptedChar)
            if (not ((asciiValue >= 65) and (asciiValue <= 90)) \
                or ((asciiValue >= 90) and (asciiValue <= 122)) \
                or ( asciiValue == 32 )):
                # if the character is not one of the characters ("A-Z" or "a-z"
                # or " ") consider it as an "error"
                errorChar += 1
            tempMess = tempMess + decryptedChar
        if(errorChar < lowestError):
            lowestError = errorChar
            decryptedMess = tempMess
            realKey = chr(i)
    return (realKey,decryptedMess)

if __name__ == "__main__":
    print(breakSingleByteXOR("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"))
The problem is that when I use the function breakSingleByteXOR to return one value (decryptedMess), it comes out okay: "cOOKING mcS LIKE A POUND OF BACON"
But when I return 2 values from the function (as in the code above, (realKey, decryptedMess)), I get a weird result: ('x', 'cOOKING\x00mc\x07S\x00LIKE\x00A\x00POUND\x00OF\x00BACON'). Can anybody explain to me why this is the case?
Tbh, I'm learning Python as I do the challenges, so hopefully I don't trigger anyone with this code... I'd also really appreciate any advice on writing good Python code.
Thanks guys :D
It's true that the reason for the difference in the printed string is a quirk of the print function.
The deeper problem with that program is that it's not producing the correct answer. That's because the big ugly if that tries to decide whether a decrypted character is in the acceptable range is incorrect.
It's incorrect in two ways. The first is that (asciiValue >= 90) should be (asciiValue >= 97). A better way to write all of those expressions, which would have avoided this error, is to express them as (asciiValue >= ord('a')) and (asciiValue == ord(' ')) and so on, avoiding the inscrutable numbers.
The second way is that the expressions are not properly grouped. As they stand they do this:
character is not in the range 'A' to 'Z',
or character is in the range 'a' to 'z',
or character is 'space',
then count this as an error
so some of the characters that should be good (specifically 'a' through 'z' and space) are counted as bad. To fix, you need to rework the parentheses so that the condition is:
character is not in the range 'A' to 'Z',
and character is not in the range 'a' to 'z',
and character is not space,
then count this as an error
or (this is the style you were trying for)
character is not (in the range 'A' to 'Z'
or in the range 'a' to 'z'
or a space)
I'm not going to give you the exact drop-in expression to fix the program, it'll be better for you to work it out for yourself. (A good way to deal with this kind of complexity is to move it into a separate function that returns True or False. That makes it easy to test that your implementation is correct, just by calling the function with different characters and seeing that the result is what you wanted.)
When you get the correct expression, you'll find that the program discovers a different "best key" and the decrypted string for that key contains no goofy out-of-range characters that behave strangely with print.
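A self-contained sketch of the separate-predicate approach described above, using bytes.fromhex() and the ^ operator in place of the asker's hexToBinary and xorBuffers helpers. Be aware that it spoils the exercise: is_acceptable contains a working version of the condition. The names is_acceptable and break_single_byte_xor are hypothetical, and the sketch tries all 256 byte values rather than 0 to 127.

```python
def is_acceptable(ch):
    """Return True for ASCII letters and space, the characters the asker wants to keep."""
    return ('A' <= ch <= 'Z') or ('a' <= ch <= 'z') or ch == ' '


def break_single_byte_xor(cipher_hex):
    """Try every one-byte key; keep the one whose output has the fewest unacceptable chars."""
    cipher = bytes.fromhex(cipher_hex)
    best_key, best_message, lowest_error = None, '', None
    for key in range(256):
        # XOR each ciphertext byte with the candidate key and turn it into a character
        message = ''.join(chr(b ^ key) for b in cipher)
        errors = sum(1 for ch in message if not is_acceptable(ch))
        if lowest_error is None or errors < lowest_error:
            best_key, best_message, lowest_error = chr(key), message, errors
    return best_key, best_message


# Testing the predicate on its own, as suggested above
assert is_acceptable('q') and is_acceptable('Q') and is_acceptable(' ')
assert not is_acceptable('\x00') and not is_acceptable('[')

print(break_single_byte_xor(
    "1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"))
# ('X', "Cooking MC's like a pound of bacon")
```

With this scoring the best key is 'X' rather than 'x'. The two differ only in the 0x20 bit, which is why the 'x' result had its letter case swapped, its spaces turned into \x00 and its apostrophe turned into \x07.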
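The print function is the culprit, or more precisely the difference between str() and repr(). When you print a string on its own, its characters, including the control characters \x00 and \x07, are sent to the terminal as-is, and the terminal decides how to display them (here \x00 showed up as a space and \x07 as nothing). When you print a tuple or another container (like the one your function returns), print shows the container's repr(), which uses the repr() of each element, so the control characters appear as the escape sequences \x00 and \x07.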
This is an example:
>>> s = 'This\x00string\x00is\x00an\x00\x07Example.'
>>> s
'This\x00string\x00is\x00an\x00\x07Example.'
>>> print(s)
This string is an Example.
If you put the string s in a container (tuple, set, or list) and print that, the escape sequences are shown instead, because the container's repr() uses repr(s):
>>> s_list = [s]
>>> print(s_list) # List
['This\x00string\x00is\x00an\x00\x07Example.']
>>> print(set(s_list)) # Set
{'This\x00string\x00is\x00an\x00\x07Example.'}
>>> print(tuple(s_list)) # Tuple
('This\x00string\x00is\x00an\x00\x07Example.',)
Edit
Because \x00 and \x07 are ASCII control characters (\x00 is NUL and \x07 is BEL), they have no visible printed form. So one of the few ways to remove these characters from the string itself, rather than just from the printed output, is the .replace() method; but since the terminal displays the \x00 bytes as spaces, you would have to use s.replace('\x00', ' ') to get the same output, which changes the true content of the string.
Otherwise, when building the string, you could add some logic that checks for ASCII control characters and either doesn't add them to tempMess or adds a different character, such as a space, instead.
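A minimal sketch of that idea, using the built-in str.isprintable() to detect control characters; the helper name replace_control_chars is hypothetical. It also shows repr(), which gives the escaped view of a single string without wrapping it in a container. Note that isprintable() treats newline and tab as non-printable as well, so they get replaced too.

```python
def replace_control_chars(text, replacement=' '):
    """Swap every non-printable character (e.g. NUL, BEL) for `replacement`."""
    return ''.join(ch if ch.isprintable() else replacement for ch in text)


s = 'This\x00string\x00is\x00an\x00\x07Example.'
print(repr(s))                       # 'This\x00string\x00is\x00an\x00\x07Example.'
print(replace_control_chars(s))      # This string is an  Example.
print(replace_control_chars(s, ''))  # ThisstringisanExample.
```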
References
ASCII Wiki: https://en.wikipedia.org/wiki/ASCII
Curses Module: https://docs.python.org/3.7/library/curses.ascii.html?highlight=ascii#module-curses.ascii (Might be useful if you wish to implement any logic).

Find all floats or ints in a given string

Given a string, "Hello4.2this.is random 24 text42", I want to return all ints or floats, [4.2, 24, 42]. All the other questions have solutions that return just 24. I want to return a float even if non-digit characters are next to the number. Since I am new to Python, I am trying to avoid regex or other complicated imports. I have no idea how to start. Please help. Here is some research I did: the answers to "Python: Extract numbers from a string" didn't work, since they don't recognize 4.2 and 42. There are other questions like that one, none of which, sadly, recognize 4.2 and 42.
A regex from perldoc perlretut:
import re

re_float = re.compile(r"""(?x)
    ^
    [+-]?\ *          # first, match an optional sign *and space*
    (                 # then match integers or f.p. mantissas:
        \d+           # start out with a ...
        (
            \.\d*     # mantissa of the form a.b or a.
        )?            # ? takes care of integers of the form a
       |\.\d+         # mantissa of the form .b
    )
    ([eE][+-]?\d+)?   # finally, optionally match an exponent
    $""")

m = re_float.match("4.5")
print(m.group(0))
# -> 4.5
To get all numbers from a string:
str = "4.5 foo 123 abc .123"
print(re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", str))
# -> ['4.5', ' 123', ' .123']
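findall() returns strings (including any leading space matched by " *"), while the question asks for numbers like [4.2, 24, 42]. A sketch that drops the optional-space part of the pattern and converts each match with int() or float(); the name find_numbers is hypothetical:

```python
import re

# pythad's pattern without the " *" that allowed spaces after the sign
NUMBER_RE = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def find_numbers(text):
    """Return every number in text as an int or a float."""
    results = []
    for token in NUMBER_RE.findall(text):
        # A decimal point or an exponent means float; otherwise int
        if any(ch in token for ch in '.eE'):
            results.append(float(token))
        else:
            results.append(int(token))
    return results


print(find_numbers("Hello4.2this.is random 24 text42"))  # [4.2, 24, 42]
```

One consequence of allowing a leading sign: a hyphen used as a separator is read as a minus sign, so "pages 10-20" gives [10, -20].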
Using regular expressions is likely to give you the most concise code for this problem. It is hard to beat the conciseness of
re.findall(r"[+-]? *(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?", str)
from pythad's answer.
However, you say "I am trying to avoid regex", so here's a solution that does not use regular expressions. It is obviously a bit longer than a solution using a regular expression (and probably much slower), but it is not complicated.
The code loops through the input character by character.
As it pulls each character from the string, it appends it to current (a string that holds the number currently being parsed) if appending it still maintains a valid number. When it encounters a character that cannot be appended to current, current is saved to a list of numbers, but only if current itself isn't one of '', '.', '-' or '-.'; these are strings that could potentially begin a number but are not themselves valid numbers.
When current is saved, a trailing 'e', 'e-' or 'e+' is removed. That will happen with a string such as '1.23eA'. While parsing that string, current will eventually become '1.23e', but then 'A' is encountered, which means the string does not contain a valid exponential part, so the 'e' is discarded.
After saving current, it is reset. Usually current is reset to '', but when the character that triggered current to be saved was '.' or '-', current is set to that character, because those characters could be the beginning of a new number.
Here's the function extract_numbers(s). The line before return numbers converts the list of strings to a list of integers and floating point values. If you want just the strings, remove that line.
def extract_numbers(s):
    """
    Extract numbers from a string.

    Examples
    --------
    >>> extract_numbers("Hello4.2this.is random 24 text42")
    [4.2, 24, 42]
    >>> extract_numbers("2.3+45-99")
    [2.3, 45, -99]
    >>> extract_numbers("Avogadro's number, 6.022e23, is greater than 1 million.")
    [6.022e+23, 1]
    """
    numbers = []
    current = ''
    for c in s.lower() + '!':
        if (c.isdigit() or
                (c == 'e' and ('e' not in current) and (current not in ['', '.', '-', '-.'])) or
                (c == '.' and ('e' not in current) and ('.' not in current)) or
                (c == '+' and current.endswith('e')) or
                (c == '-' and ((current == '') or current.endswith('e')))):
            current += c
        else:
            if current not in ['', '.', '-', '-.']:
                if current.endswith('e'):
                    current = current[:-1]
                elif current.endswith('e-') or current.endswith('e+'):
                    current = current[:-2]
                numbers.append(current)
            if c == '.' or c == '-':
                current = c
            else:
                current = ''

    # Convert from strings to actual python numbers.
    numbers = [float(t) if ('.' in t or 'e' in t) else int(t) for t in numbers]
    return numbers
If you want to get integers or floats from a string, follow pythad's approach.
If you want to get both integers and floats from a single string, do this:
import re

string = "These are floats: 10.5, 2.8, 0.5; and these are integers: 2, 1000, 1975, 308 !! :D"
numbers = []
# Split into whitespace-separated tokens (looping over the string itself gives single characters)
for actualValue in string.split():
    if "." in actualValue:
        value = re.findall(r'\d+\.\d+', actualValue)
    else:
        value = re.findall(r'\d+', actualValue)
    numbers += value
print(numbers)  # ['10.5', '2.8', '0.5', '2', '1000', '1975', '308']

Recursive function to convert characters

I am trying to write a program in Python which uses a recursive function to convert all the lower-case characters in a string to the next character. Here's my attempt:
def convert(s):
    if len(s) < 1:
        return ""
    else:
        return convert(chr(ord(s[0+1])))

print(convert("hello"))
When I try to run this program, it gives me the error: string index out of range. Could anyone please help me correct this? I'm not even sure if my program is coded correctly to give the required output :/
You want to return the shifted character and then call your convert function on the remainder of the string. If you must use recursion, you need to check if the string is exhausted (if not s is the same as if len(s) == 0 here, because an empty string is falsy) and bail:
def convert(s):
    if not s:
        return ''
    c = s[0]
    i = ord(c)
    if 96 < i < 123:
        # for lower-case characters permute a->b, b->c, ... y->z, z->a
        c = chr(((i-97)+1)%26 + 97)
    return c + convert(s[1:])

print(convert('hello'))
print(convert('abcdefghijklmnopqrstuvwxyz'))
Output:
ifmmp
bcdefghijklmnopqrstuvwxyza
The ASCII codes for 'a' and 'z' are 97 and 122 respectively, so we only apply the shift to characters whose codes, i, are in this range. Don't forget to wrap if the character is z: you can do this with modular arithmetic: ((i-97)+1)%26 + 97.
EDIT explanation: Subtract 97 so that the code becomes 0 to 25, then add 1 mod 26 such that 0+1 = 1, 1+1 = 2, ..., 24+1 = 25, 25+1 = 0. Then add back 97 so that the code represents a letter between a and z. This way your letters wrap around from z back to a.
You are trying to index the second character each time; Python indexes start at 0, so s[0+1] is s[1], the second character. Your len() test doesn't guard against that; it only catches empty strings.
You also pass just one character to the recursive call, so the next call always gets a string of length 1, which doesn't have a second character.
So your test with 'hello' does this:
convert('hello')
    len('hello') < 1 -> False
    s[0+1] == s[1] == 'e'; chr(ord('e')) is 'e'
    return convert('e')
        len('e') < 1 -> False
        s[0+1] == s[1] -> 'e'[1] raises an IndexError
If you wanted to use recursion, then you need to decide how to detect the end of the recursion path correctly. You could test for strings shorter than 2 characters, for example, as there is no next character to use in that case.
You also need to decide what to delegate to the recursive call. For a conversion like this, you could pass in the remainder of the string.
Last but not least, you need to test if the character you are going to replace is actually lowercase.

Find symmetric words in a text [duplicate]

This question already has answers here:
how to find words that made up of letter exactly facing each other? (python) [closed]
(4 answers)
Closed 9 years ago.
I have to write a function which takes one argument, text, containing a block of text in the form of a str, and returns a sorted list of “symmetric” words. A symmetric word is defined as a word where for all values i, the letter i positions from the start of the word and the letter i positions from the end of the word are equi-distant from the respective ends of the alphabet. For example, bevy is a symmetric word as: b (1 position from the start of the word) is the second letter of the alphabet and y (1 position from the end of the word) is the second-last letter of the alphabet; and e (2 positions from the start of the word) is the fifth letter of the alphabet and v (2 positions from the end of the word) is the fifth-last letter of the alphabet.
For example:
>>> symmetrics("boy bread aloz bray")
['aloz','boy']
>>> symmetrics("There is a car and a book;")
['a']
All I can think of for the solution is this, but I can't run it because it's wrong:
def symmetrics(text):
    func_char= ",.?!:'\/"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    sym = []
    for word in text.lower().split():
        n = range(0,len(word))
        if word[n] == word[len(word)-1-n]:
            sym.append(word)
    return sym
The code above doesn't take the positions in alpha1 and alpha2 into account, because I don't know how to express that. Can anyone help me?
Here is a hint:
In [16]: alpha1.index('b')
Out[16]: 1
In [17]: alpha2.index('y')
Out[17]: 1
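Taking the hint one step further: since alpha1.index(c) equals ord(c) - ord('a') and alpha2.index(c) equals ord('z') - ord(c), two letters are mirror images exactly when their ord() values add up to ord('a') + ord('z'). A sketch built on that, which also returns the sorted, de-duplicated list the question asks for; the helper names and the choice to treat every non-letter as a word separator are assumptions:

```python
MIRROR_SUM = ord('a') + ord('z')  # 97 + 122 == 219


def is_symmetric(word):
    """True if each letter and its mirror-position letter sum to ord('a') + ord('z')."""
    n = len(word)
    # Only the first half needs checking; the middle letter of an odd-length word is ignored
    return all(ord(word[i]) + ord(word[n - 1 - i]) == MIRROR_SUM for i in range(n // 2))


def symmetric_words(text):
    """Return the sorted, de-duplicated symmetric words in text."""
    # Assumption: anything that is not a lowercase a-z letter separates words
    cleaned = ''.join(ch if 'a' <= ch <= 'z' else ' ' for ch in text.lower())
    return sorted({word for word in cleaned.split() if is_symmetric(word)})


print(symmetric_words("boy bread aloz bray"))         # ['aloz', 'boy']
print(symmetric_words("There is a car and a book;"))  # ['a']
```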
An alternative way to approach the problem is by using the str.translate() method:
def is_sym(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    # Python 3: str.maketrans (Python 2 used string.maketrans)
    tr = str.maketrans(alpha1, alpha2)
    n = len(word) // 2
    # Compare the first half with the reversed, translated second half
    return word[:n] == word[::-1][:n].translate(tr)

print(is_sym('aloz'))
print(is_sym('boy'))
print(is_sym('bread'))
(The building of the translation table can be easily factored out.)
The for loop could be modified as:
for word in text.lower().split():
    for n in range(0,len(word)//2):
        if alpha1.index(word[n]) != alpha2.index(word[len(word)-1-n]):
            break
    else:
        sym.append(word)
return sym
According to your symmetric rule, we may verify a symmetric word with the following is_symmetric_word function:
def is_symmetric_word(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    length = len(word)
    # // keeps the half-length an int (plain / gives a float in Python 3)
    for i in range(length // 2):
        if alpha1.index(word[i]) != alpha2.index(word[length - 1 - i]):
            return False
    return True
And then the whole function to get all unique symmetric words out of a text can be defined as:
def is_symmetrics(text):
    func_char= ",.?!:'\/;"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    sym = []
    for word in text.lower().split():
        if is_symmetric_word(word) and not (word in sym):
            sym.append(word)
    return sym
The following are two test cases from you:
is_symmetrics("boy bread aloz bray") #['boy', 'aloz']
is_symmetrics("There is a car and a book;") #['a']
Code first. Discussion below the code.
import string

# get alphabet and reversed alphabet
try:
    # Python 2.x
    alpha1 = string.lowercase
except AttributeError:
    # Python 3.x and newer
    alpha1 = string.ascii_lowercase

alpha2 = alpha1[::-1]  # use slicing to reverse alpha1

# make a dictionary where the key, value pairs are symmetric
# for example symd['a'] == 'z', symd['b'] == 'y', and so on
_symd = dict(zip(alpha1, alpha2))

def is_symmetric_word(word):
    if not word:
        return False  # zero-length word is not symmetric
    i1 = 0
    i2 = len(word) - 1
    while True:
        if i1 >= i2:
            return True  # we have checked the whole string
        # get a pair of chars
        c1 = word[i1]
        c2 = word[i2]
        if _symd[c1] != c2:
            return False  # the pair wasn't symmetric
        i1 += 1
        i2 -= 1

# note, added a space to list of chars to filter to a space
_filter_to_space = ",.?!:'\/ "

def _filter_ch(ch):
    if ch in _filter_to_space:
        return ' '  # return a space
    elif ch in alpha1:
        return ch  # it's an alphabet letter so return it
    else:
        # It's something we don't want. Return empty string.
        return ''

def clean(text):
    return ''.join(_filter_ch(ch) for ch in text.lower())

def symmetrics(text):
    # filter text: keep only chars in the alphabet or spaces
    for word in clean(text).split():
        if is_symmetric_word(word):
            # use of yield makes this a generator.
            yield word

lst = list(symmetrics("The boy...is a yob."))
print(lst)  # prints: ['boy', 'a', 'yob']
No need to type the alphabet twice; we can reverse the first one.
We can make a dictionary that pairs each letter with its symmetric letter. This will make it very easy to test whether any given pair of letters is a symmetric pair. The function zip() makes pairs from two sequences and stops at the end of the shorter one; since we are using a string and a reversed copy of the same string, they are the same length and every letter gets a partner.
It's best to write a simple function that does one thing, so we write a function that does nothing but check if a string is symmetric. If you give it a zero-length string it returns False; otherwise it sets i1 to the index of the first character in the string and i2 to the index of the last. It compares characters as long as they continue to be symmetric, and increments i1 while decrementing i2. If the two meet or pass each other, we know we have seen the whole string and it must be symmetric, in which case we return True; if it ever finds any pair of characters that are not symmetric, it returns False. We have to do the check for whether i1 and i2 have met or passed at the top of the loop, so it won't try to check if a character is its own symmetric character. (A character can't be both 'a' and 'z' at the same time, so a character is never its own symmetric character!)
Now we write a wrapper that filters out the junk, splits the string into words, and tests each word. Not only does it convert the chosen punctuation characters to spaces, but it also strips out any unexpected characters (anything not an approved punctuation char, a space, or a letter). That way we know nothing unexpected will get through to the inner function. The wrapper is "lazy"... it is a generator that yields up one word at a time, instead of building the whole list and returning that. It's easy to use list() to force the generator's results into a list. If you want, you can easily modify this function to just build a list and return it.
If you have any questions about this, just ask.
EDIT: The original version of the code didn't do the right thing with the punctuation characters; this version does. Also, as @heltonbiker suggested, why type the alphabet when Python has a copy of it you can use? So I made that change too.
EDIT: @heltonbiker's change introduced a dependency on Python version! I left it in with a suitable try:/except block to handle the problem. It appears that Python 3.x has improved the name of the lowercase ASCII alphabet to string.ascii_lowercase instead of plain string.lowercase.

How to make letters in a string lowercase, Python

I want to make a function that would uncapitalize capitalized letters.
I was thinking I could do
if x in caps
where caps is a list of all the capital letters. If that returned the position of the letter in the list, I could have it replace itself with the lowercase letter.
How would I do this?
Thanks.
Why not use the built-in str.lower() method?
a = 'aBcDeFg'
print(a.lower())  # abcdefg
If that doesn't work, you can always iterate and then use ord and chr:
"".join(
    # if i is between A and Z change it to between a and z
    [chr(ord(i)+32) if 65 <= ord(i) <= 90
     # otherwise leave it as is
     else i for i in a])
If you simply want to make the string lowercase try:
>>> import string
>>> myString = "abcDEfGHiJ"
>>> myString.lower()
'abcdefghij'
If you want the index of every uppercase letter (for whatever reason):
>>> [pos for pos, let in enumerate(myString) if 65 <= ord(let) <= 90]
[3, 4, 6, 7, 9]
Assuming this isn't some kind of academic problem, you would use the standard string lower() method.
x.lower()
If it is an exercise, you can convert the character to an int with ord(), check if it's in the ASCII uppercase letter range, and if it is, add 32 to convert it. This of course only works with the ASCII letters.
x = chr(ord(x)+32) if ord(x) > 64 and ord(x) < 91 else x
Not sure what else to say other than the library docs are right here: http://docs.python.org/library/stdtypes.html
Also, it's worth noting there isn't really a character type, only numbers and strings; when you're working with one char in a string, it's really a string of length 1. Also, don't forget strings are immutable.
Python has a function for this -- str.lower(). http://docs.python.org/library/stdtypes.html#str.lower
If you really want to write the function yourself, Python's string library has a list of ASCII lowercase letters, but you could exploit the fact that ASCII characters are really just int values and add 32 to each letter that has a value between 65 and 90 (65 = A and 90 = Z in ASCII, but there are a few punctuation values in between Z and a.) No need to parse lists!
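The index-lookup idea from the question also works without any arithmetic on character codes: find the letter's position in string.ascii_uppercase and take the character at the same position in string.ascii_lowercase. A sketch (the function name to_lower is hypothetical; like the ord()/chr() versions above, it only handles ASCII letters):

```python
import string


def to_lower(text):
    """Lowercase ASCII capitals by position lookup; other characters pass through."""
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            # 'A' is at position 0, 'B' at 1, ...; the same position in
            # ascii_lowercase holds the matching lowercase letter
            pos = string.ascii_uppercase.index(ch)
            result.append(string.ascii_lowercase[pos])
        else:
            result.append(ch)
    return ''.join(result)


print(to_lower("aBcDeFg"))  # abcdefg
```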
