I want to make a function that would uncapitalize capitalized letters.
I was thinking I could do
if x in caps
where caps is a list of all the capital letters, and I had it return the position in the list that letter is in, I could have it replace itself with the lowercase.
How would I do this?
Thanks.
Why not use the built-in "lower"?
a = 'aBcDeFg'
print(a.lower()) # abcdefg
If that doesn't work, you can always iterate and then use ord and chr:
"".join(
# if i is between A and Z change it to between a and z
[chr(ord(i)+32) if 65<= ord(i) <= 90
# otherwise leave it as is
else i for i in a])
If you simply want to make the string lowercase try:
>>> import string
>>> myString = "abcDEfGHiJ"
>>> myString.lower()
'abcdefghij'
If you want the index of every uppercase letter (for whatever reason):
>>> [pos for pos, let in enumerate(myString) if 65 <= ord(let) <= 90]
[3, 4, 6, 7, 9]
Assuming this isn't some kind of academic problem, you would use the standard string lower() method.
x.lower()
If it is an exercise, you can flip the character to an int, check if it's in the ASCII upper case letter range, if it is then add 32 to convert it. This of course only works with the ASCII letters.
x = chr(ord(x)+32) if ord(x) > 64 and ord(x) < 91 else x
Not sure what else to say other than the library docs are right here: http://docs.python.org/library/stdtypes.html
Also, it's worth noting that there really isn't a character type, only numbers and strings: when you're working with one char in a string, it's really a string of length 1. And don't forget that strings are immutable.
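A quick check of both points, as a minimal sketch (the variable names are made up for illustration):

```python
s = "Hello"
first = s[0]            # indexing yields another str, of length 1
print(type(first).__name__, len(first))

try:
    s[0] = "h"          # strings are immutable, so item assignment fails
except TypeError as exc:
    print("TypeError:", exc)

# Build a new string instead of modifying the old one.
t = s[0].lower() + s[1:]
print(t)
```

So any "replace this letter" logic has to assemble a new string rather than change the original in place.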
Python has a function for this -- str.lower(). http://docs.python.org/library/stdtypes.html#str.lower
If you really want to write the function yourself, Python's string library has a list of ASCII lowercase letters, but you could exploit the fact that ASCII characters are really just int values and add 32 to each letter that has a value between 65 and 90 (65 = A and 90 = Z in ASCII, but there are a few punctuation values in between Z and a.) No need to parse lists!
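For what it's worth, the lookup idea from the question can also be made to work. Here is a rough sketch, assuming only ASCII letters need converting; the function name uncapitalize is hypothetical, and the constants come from the standard string module:

```python
import string

def uncapitalize(text):
    """Lowercase ASCII capitals by looking up their position in the alphabet."""
    out = []
    for ch in text:
        pos = string.ascii_uppercase.find(ch)  # -1 when ch is not A-Z
        out.append(string.ascii_lowercase[pos] if pos != -1 else ch)
    return "".join(out)

print(uncapitalize("aBcDeFg"))
```

In real code str.lower() is still the better choice, since it also handles non-ASCII letters.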
I'm trying to change numbers in a string to an assigned letter (0 would be 'A', 1 => 'B' and so on...).
In this case I must use a function (def) to change the numbers to the assigned letter. I tried doing the program with ifs in a for loop with indices and it worked, but apparently there is an easier and shorter solution. I tried doing it with isdigit() instead, but my function doesn't recognize the numbers in the given word and just prints out the same word.
You can use a dictionary to map digits to letters, write a generator to get those letters and join it all into a resultant string. The dictionary's get method either gets the value or returns a default. Use this to return any non-digit characters.
>>> digit_map = dict(zip('0123456789', 'ABCDEFGHIJ'))
>>> test = 'a1b2c3'
>>> "".join(digit_map.get(c,c) for c in test)
'aBbCcD'
Use ord to get the numeric value of a character, and chr to turn the numeric value back into a character. That lets you convert a number like 1 into its offset from another character.
Then use join to put it all together:
>>> ''.join(chr(int(c)+ord('A')) if c.isdecimal() else c for c in 'a1b2c3')
'aBbCcD'
One solution would be creating an array that would store each letter as a string as such:
alphabet = ["A", "B", "C"]  # and so on
Then you could loop through the string and find all the digits. For each digit, get the corresponding letter by indexing the array with alphabet[i], where i is the digit's integer value, then build a new string from the results and return it.
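A minimal sketch of that list-based idea, assuming only the ASCII digits 0 to 9 should be replaced (the name digits_to_letters is made up for this example):

```python
import string

# Index i holds the letter for digit i: 0 -> 'A', 1 -> 'B', ..., 9 -> 'J'
alphabet = list(string.ascii_uppercase[:10])

def digits_to_letters(text):
    result = []
    for ch in text:
        # Only the ASCII digits 0-9 are valid indexes into alphabet.
        if ch in string.digits:
            result.append(alphabet[int(ch)])
        else:
            result.append(ch)
    return "".join(result)

print(digits_to_letters("a1b2c3"))
```

Collecting the pieces in a list and joining once at the end avoids rebuilding the string on every iteration.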
Here is a simple function that you can use to convert numbers to letters in a given string:
def convert_numbers_to_letters(s):
    # Create a dictionary that maps numbers to letters
    number_to_letter = {str(i): chr(i + ord('A')) for i in range(10)}
    # Iterate through each character in the string
    result = ""
    for c in s:
        # If the character is a number, convert it to a letter
        if c.isdigit():
            result += number_to_letter[c]
        # Otherwise, just add the character to the result
        else:
            result += c
    return result
I have the code below to replace all punctuation with 999 and all alphabet characters with its number position. I have included the print statement that confirms punctuation is being replaced. However I seem to override with my remaining code to replace the other characters.
import string
def encode(text):
    punct = '''!()-[]{};:'"\,<>./?@#$%^&*_~'''
    for x in text.lower():
        if x in punct:
            text = text.replace(x, ".999")
            print(text)
    nums = [str(ord(x) - 96)
            for x in text.lower()
            if x >= 'a' and x <= 'z'
            ]
    return ".".join(nums)
print(encode(str(input("Enter Text: "))))
Input: 'Morning! \n'
Output: '13.15.18.14.9.14.7 \n'
Expected Output: 13.15.18.14.9.14.7.999
No, you have two independent logical "stories" here. One replaces punctuation with 999. The other filters out all the letters and builds an independent list of their alphabetic positions.
nums = [str(ord(x) - 96)
        for x in text.lower()
        if x >= 'a' and x <= 'z'
        ]
return ".".join(nums)
Note that this does nothing to alter text, and it takes nothing but letters from text. If you want to include the numbers, do so:
nums = [str(ord(x) - 96)
            if x >= 'a' and x <= 'z'
            else x
        for x in text.lower()
        ]
return ".".join(nums)
Output of print(encode("[hello]")):
..9.9.9.8.5.12.12.15...9.9.9
nums = [str(ord(x) - 96)
        for x in text.lower()
        if x >= 'a' and x <= 'z'
        ]
This means: take every character from the lowercase version of the string, and only if it is between 'a' and 'z', convert the value and put the result in nums.
In the first step, you replace a bunch of punctuation with text that includes '.' and '9' characters. But neither '9' nor '.' is between 'a' and 'z', so of course neither is preserved in the second step.
Now that I understand what you are going for: you have fundamentally the wrong approach to splitting up the problem. You are trying to separate the two halves of the rule for "encoding" a given part of the input. What you should separate instead is the whole rule for encoding a single element from the process of applying that single-element rule to the whole input. After all, that is what list comprehensions do.
This is the concept of separation of concerns. The two business rules are part of the same concern - because implementing one rule doesn't help you implement the other. Being able to encode one input character, though, does help you encode the whole string, because there is a tool for that exact job.
We can have a complicated rule for single characters - no problem. Just put it in a separate function, so that we can give it a meaningful name and keep things simple to understand. Conceptually, our individual-character encoding is a numeric value, so we will consistently encode as a number, and then let the string-encoding process do the conversion.
def encode_char(c):
    if c in '''!()-[]{};:'"\,<>./?@#$%^&*_~''':
        return 999
    if 'a' <= c.lower() <= 'z':
        # Lowercase first so that 'M' and 'm' both map to 13.
        return ord(c.lower()) - 96
    # You should think about what to do in other cases!
    # In particular, you don't want digit symbols 1 through 9 to be
    # confused with letters A through I.
    # So I leave the rest up to you, depending on your requirements.
Now we can apply the overall encoding process: we want a string that puts '.' in between the string representations of the values. That's straightforward:
def encode(text):
    return '.'.join(str(encode_char(c)) for c in text)
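To make that concrete, here is one possible way to fill in the "other cases". It is only a sketch under two assumptions that are not in the original post: string.punctuation stands in for the hand-written punctuation list, and every character that is neither punctuation nor an ASCII letter (spaces, digits, newlines) is simply dropped, which happens to match the expected output in the question.

```python
import string

def encode_char(c):
    """Return the numeric code for one character, or None to skip it."""
    if c in string.punctuation:          # assumption: stdlib punctuation set
        return 999
    if c.isascii() and c.isalpha():
        return ord(c.lower()) - ord('a') + 1
    return None  # assumption: spaces, digits, etc. are dropped

def encode(text):
    codes = (encode_char(c) for c in text)
    return '.'.join(str(code) for code in codes if code is not None)

print(encode("Morning! \n"))
```

If digits or whitespace need their own codes, only encode_char has to change; encode stays the same.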
I'm working on problem 3(set 1) of the cryptopals challenges (https://cryptopals.com/sets/1/challenges/3)
I've already found the key ('x') and decrypted the message ('Cooking mcs like a pound of bacon')
Here is my code:
from hexToBase64 import hexToBinary
from fixedXOR import xorBuffers
def binaryToChar(binaryString):
    asciiValue = 0
    for i in range(int(len(binaryString))-1,-1,-1):
        if(binaryString[i] == '1'):
            asciiValue = asciiValue + 2**(7-i)
    return chr(asciiValue)
def decimalToBinary(number):
    binaryString = ""
    while (number != 0):
        bit = number % 2
        binaryString = str(bit) + binaryString
        number = int(number/2)
    while(len(binaryString) < 8):
        binaryString = "0" + binaryString
    return binaryString
def breakSingleByteXOR(cipherString):
    decryptedMess = ""
    lowestError = 10000
    realKey = ""
    for i in range(0,128):
        errorChar = 0
        tempKey = decimalToBinary(i)
        tempMess = ""
        for j in range(0,len(cipherString),2):
            #Take each byte of the cipherString
            cipherChar = hexToBinary(cipherString[j:j+2])
            decryptedChar = binaryToChar(xorBuffers(cipherChar,tempKey))
            asciiValue = ord(decryptedChar)
            if (not ((asciiValue >= 65) and (asciiValue <= 90)) \
                or ((asciiValue >= 90) and (asciiValue <= 122)) \
                or ( asciiValue == 32 )):
                # if the character is not one of the characters ("A-Z" or "a-z"
                # or " ") consider it as an "error"
                errorChar += 1
            tempMess = tempMess + decryptedChar
        if(errorChar < lowestError):
            lowestError = errorChar
            decryptedMess = tempMess
            realKey = chr(i)
    return (realKey,decryptedMess)
if __name__ == "__main__":
    print(breakSingleByteXOR("1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736"))
The problem is when I use the function breakSingleByteXOR to return one value (decryptedMess), it came out okay "cOOKING mcS LIKE A POUND OF BACON"
But when I return 2 values from the function (as in the code above: (key, decryptedMess)), I receive a weird result ('x', 'cOOKING\x00mc\x07S\x00LIKE\x00A\x00POUND\x00OF\x00BACON'). Can anybody explain to me why this is the case?
Tbh, I'm learning Python as I do the challenges, so hopefully I don't trigger anyone with this code... I'd also really appreciate any advice on writing good Python code.
Thanks guys :D
It's true that the reason for the difference in the printed string is a quirk of the print function.
The deeper problem with that program is that it's not producing the correct answer. That's because the big ugly if that tries to decide whether a decrypted character is in the acceptable range is incorrect.
It's incorrect in two ways. The first is that (asciiValue >= 90) should be (asciiValue >= 97). A better way to write all of those expressions, which would have avoided this error, is to express them as (asciiValue >= ord('a')) and (asciiValue == ord(' ')) and so on, avoiding the inscrutable numbers.
The second way is that the expressions are not properly grouped. As they stand they do this:
character is not in the range 'A' to 'Z',
or character is in the range 'a' to 'z',
or character is 'space',
then count this as an error
so some of the characters that should be good (specifically 'a' through 'z' and space) are counted as bad. To fix, you need to rework the parentheses so that the condition is:
character is not in the range 'A' to 'Z',
and character is not in the range 'a' to 'z',
and character is not space,
then count this as an error
or (this is the style you were trying for)
character is not (in the range 'A' to 'Z'
or in the range 'a' to 'z'
or a space)
I'm not going to give you the exact drop-in expression to fix the program, it'll be better for you to work it out for yourself. (A good way to deal with this kind of complexity is to move it into a separate function that returns True or False. That makes it easy to test that your implementation is correct, just by calling the function with different characters and seeing that the result is what you wanted.)
When you get the correct expression, you'll find that the program discovers a different "best key" and the decrypted string for that key contains no goofy out-of-range characters that behave strangely with print.
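The print function is the culprit: when you pass it a string, it writes the \x00 and \x07 characters to the terminal as raw control characters instead of showing their escape sequences. This only happens when the string itself is passed to print. When the string sits inside a container (like your tuple), print shows the repr() of each element, which keeps the escapes visible.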
This is an example:
>>> s = 'This\x00string\x00is\x00an\x00\x07Example.'
>>> s
'This\x00string\x00is\x00an\x00\x07Example.'
>>> print(s)
This string is an Example.
If you put the string s inside a container (tuple, set, or list), print shows the escaped form of s instead:
>>> s_list = [s]
>>> print(s_list) # List
['This\x00string\x00is\x00an\x00\x07Example.']
>>> print(set(s_list)) # Set
{'This\x00string\x00is\x00an\x00\x07Example.'}
>>> print(tuple(s_list)) # Tuple
('This\x00string\x00is\x00an\x00\x07Example.',)
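The difference comes down to str() versus repr(). A small sketch with a made-up string and tuple:

```python
s = 'A\x00B\x07C'
pair = ('x', s)

# str() of a string is the string itself, control characters included.
assert str(s) == s
# repr() escapes non-printable characters so they become visible.
print(repr(s))        # 'A\x00B\x07C'
# str() of a tuple is built from the repr() of each element.
print(str(pair))      # ('x', 'A\x00B\x07C')
```

print(obj) writes str(obj), so a bare string goes out raw while a string inside a tuple goes out escaped.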
Edit
Because the \x00 and \x07 bytes are ASCII control characters, (\x00 being NUL and \x07 being BEL), you can't represent them in any other way. So one of the only ways you could strip these characters from the string without printing would be to use the .replace() method; but given \x00 bytes are being treated as spaces by the terminal, you would have to use s.replace('\x00', ' ') to get the same output, which has now changed the true content of the string.
Otherwise when building the string; you could try and implement some logic to check for ASCII control characters and either not add them to tempMess or add a different character like a space or similar.
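One way to sketch that kind of check uses str.isprintable(); the helper name visible and the choice of a space as placeholder are assumptions for illustration. Note that isprintable() also treats newlines and tabs as non-printable, so they would be replaced too.

```python
def visible(text, placeholder=' '):
    """Swap non-printable characters for a placeholder."""
    return ''.join(ch if ch.isprintable() else placeholder for ch in text)

print(visible('cOOKING\x00mc\x07S\x00LIKE'))
```

As with replace(), this changes the content of the string, so it is only suitable for display.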
References
ASCII Wiki: https://en.wikipedia.org/wiki/ASCII
Curses Module: https://docs.python.org/3.7/library/curses.ascii.html?highlight=ascii#module-curses.ascii (Might be useful if you wish to implement any logic).
This is a convoluted example, but it shows what I'm attempting to do. Say I have a string:
from string import ascii_uppercase, ascii_lowercase, digits
s = "Testing123"
I would like to replace all values in s that appear in ascii_uppercase with "L" for capital letter, all values that appear in ascii_lowercase with "l" for lowercase letter, and those in digits with "n" for a number.
I'm currently doing:
def getpattern(data):
    pattern = ""
    for c in data:
        if c in ascii_uppercase: pattern += "L"; continue
        if c in ascii_lowercase: pattern += "l"; continue
        if c in digits: pattern += "n"; continue
        pattern += "?"
    return pattern
However, this is tedious with several more lists to replace. I'm usually better at finding map-type algorithms for things like this, but I'm stumped. I can't have it replace anything that was already replaced. For example, if I run the digits one and replace it with "n", the next iteration might replace that with "l" because "n" is a lowercase letter.
getpattern("Testing123") == "Lllllllnnn"
You can create a translation table that maps all upper case letters to 'L', all lower case letters to 'l' and all digits to 'n'. Once you have such a map, you can pass it to str.translate().
from string import ascii_uppercase, ascii_lowercase, digits, maketrans
s = "Testing123"
intab = ascii_uppercase + ascii_lowercase + digits
outtab = ('L' * 26) + ('l' * 26) + ('n' * 10)
trantab = maketrans(intab, outtab)
print s.translate(trantab)
Note that in Python 3 there is no string.maketrans function. Instead, you use the static method str.maketrans() on the str type itself.
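For reference, a Python 3 version of the same idea might look like this (a sketch that reuses the variable names from the code above):

```python
from string import ascii_uppercase, ascii_lowercase, digits

intab = ascii_uppercase + ascii_lowercase + digits
outtab = 'L' * 26 + 'l' * 26 + 'n' * 10
trantab = str.maketrans(intab, outtab)

print("Testing123".translate(trantab))  # Lllllllnnn
```

Characters that are not in intab pass through unchanged, so unlike the loop in the question this does not turn them into "?".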
I'm not exactly certain of the internals of str.translate(), but my educated guess is that the mapping is a length-256 string with one entry per character code. So as it passes over your string, it'll translate \x00 to \x00, \x01 to \x01, etc., but A to L. That way you don't have to check whether each character is in your translation dictionary. I presume blindly translating all characters with no branches would result in better performance. Print ''.join(chr(i) for i in range(256)) in comparison to see this.
They're in different 32-blocks of ASCII, so you can do this:
>>> ''.join(' nLl'[ord(c) // 32] for c in s)
'Lllllllnnn'
Your example suggests that you don't have other characters, but if you do, this should work:
>>> s = "Testing123 and .?#!-+ äöüß"
>>> ''.join(' nLl'[ord(c) // 32] if c <= 'z' and c.isalnum() else '?' for c in s)
'Lllllllnnn?lll????????????'
Just in case you need to process unicode data:
import unicodedata
cat = {'Lu':'L', 'Ll':'l', 'Nd':'n'}
def getpattern(data):
    return ''.join(cat.get(unicodedata.category(c),c) for c in data)
I am trying to write a program in Python which uses a recursive function to convert all the lower-case characters in a string to the next character. Here's my attempt:
def convert(s):
    if len(s) < 1:
        return ""
    else:
        return convert(chr(ord(s[0+1])))
print(convert("hello"))
When I try to run this program, it gives me the error: string index out of range. Could anyone please help me correct this? I'm not even sure if my program is coded correctly to give the required output :/
You want to return the shifted character and then call your convert function on the remainder of the string. If you must use recursion, you need to check if the string is exhausted (if not s is the same as if len(s) == 0 here because '' is equivalent to False) and bail:
def convert(s):
    if not s:
        return ''
    c = s[0]
    i = ord(c)
    if 96 < i < 123:
        # for lower-case characters permute a->b, b->c, ... y->z, z->a
        c = chr(((i-97)+1)%26 + 97)
    return c + convert(s[1:])
print(convert('hello'))
print(convert('abcdefghijklmnopqrstuvwxyz'))
Output:
ifmmp
bcdefghijklmnopqrstuvwxyza
The ASCII codes for 'a' and 'z' are 97 and 122 respectively, so we only apply the shift to characters whose codes, i, are in this range. Don't forget to wrap if the character is z: you can do this with modular arithmetic: ((i-97)+1)%26 + 97.
EDIT explanation: Subtract 97 so that the code becomes 0 to 25, then add 1 mod 26 such that 0+1 = 1, 1+1 = 2, ..., 24+1 = 25, 25+1=0. Then add back on 97 so that the code represents a letter between a and z. This way your letters will cycle round
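The wrap-around arithmetic on its own, as a small sketch (shift_lower is a made-up helper that assumes its argument is a lowercase ASCII letter):

```python
def shift_lower(c):
    """Return the next lowercase ASCII letter, wrapping z back to a."""
    offset = ord(c) - ord('a')          # 0..25
    return chr((offset + 1) % 26 + ord('a'))

for c in 'ayz':
    print(c, '->', shift_lower(c))
```

Writing ord('a') instead of 97 keeps the intent visible without changing the result.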
You are trying to index the second character each time; Python indexes start at 0 so 0+1 is 1 is the second character. Your len() test doesn't guard against that, it only tests for empty strings.
You also pass in just one character to the recursive call, so you always end up with a string of length 1, which doesn't have a second character.
So your test with 'hello' does this:
convert('hello')
    len('hello') < 1 -> False
    s[0+1] == s[1] == 'e'; chr(ord('e')) is 'e'
    return convert('e')
        len('e') < 1 -> False
        s[0+1] == s[1] -> 'e'[1] raises an index error
If you wanted to use recursion, then you need to decide how to detect the end of the recursion path correctly. You could test for strings shorter than 2 characters, for example, as there is no next character to use in that case.
You also need to decide what to delegate to the recursive call. For a conversion like this, you could pass in the remainder of the string.
Last but not least, you need to test if the character you are going to replace is actually lowercase.