Python text encryption: rot13 - python

I am currently doing an assignment that encrypts text using rot13, but some of my text won't register.
# cgi is to escape html
# import cgi
def rot13(s):
    #string encrypted
    scrypt=''
    alph='abcdefghijklmonpqrstuvwxyz'
    for c in s:
        # check if char is in alphabet
        if c.lower() in alph:
            #find c in alph and return its place
            i = alph.find(c.lower())
            #encrypt char = c incremented by 13
            ccrypt = alph[ i+13 : i+14 ]
            #add encrypted char to string
            if c==c.lower():
                scrypt+=ccrypt
            if c==c.upper():
                scrypt+=ccrypt.upper()
        #don't encrypt special chars or spaces
        else:
            scrypt+=c
    return scrypt
    # return cgi.escape(scrypt, quote = True)
given_string = 'Rot13 Test'
print rot13(given_string)
OUTPUT:
13 r
[Finished in 0.0s]

Hmmm, seems like a bunch of things are not working.
The main problem is in ccrypt = alph[ i+13 : i+14 ]: you're missing a % len(alph), so if, for example, i is equal to 18, the slice falls past the end of the string and comes back empty.
In your output, in fact, only e is encoded to r, because it's the only letter in your test string that, moved by 13, stays within bounds.
The rest of this answer is just tips to clean up the code a little:
instead of alph='abc.. you can put import string at the beginning of the script and use string.ascii_lowercase (string.lowercase is locale-dependent in Python 2 and no longer exists in Python 3)
instead of string slicing, for just one character it's better to use string[i], which gets the job done
instead of c == c.upper(), you can use the string method: if c.isupper() ....

The trouble you're having is with your slice. It will be empty if your character is in the second half of the alphabet, because i+13 will be off the end. There are a few ways you could fix it.
The simplest might be to double your alphabet string (literally: alph = alph * 2). This means you can access indexes up to 51, rather than just up to 25. This is a pretty crude solution though, and it would be better to just fix the indexing.
A better option would be to subtract 13 from your index, rather than adding 13. Rot13 is symmetric, so both will have the same effect, and it will work because negative indexes are legal in Python (they refer to positions counted backwards from the end).
In either case, it's not actually necessary to do a slice at all. You can simply grab a single value (unlike C, there's no char type in Python, so single characters are strings too). If you were to make only this change, it would probably make it clear why your current code is failing, as trying to access a single value off the end of a string will raise an exception.
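The subtract-13 suggestion, combined with single-character indexing, can be sketched roughly like this. It is only an illustration: it assumes a correctly ordered lowercase alphabet, and the helper name rot13_char is made up for the example.

```python
alph = 'abcdefghijklmnopqrstuvwxyz'

def rot13_char(c):
    """Rotate one lowercase letter by 13; return anything else unchanged."""
    i = alph.find(c)
    if i == -1:
        return c
    # i - 13 lies between -13 and 12; negative indexes count back from
    # the end of the string, so no index can fall out of range.
    return alph[i - 13]

print(''.join(rot13_char(c) for c in 'hello'))  # uryyb
```

Because the alphabet has 26 letters, alph[i - 13] and alph[(i + 13) % 26] always pick the same character.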
Edit: Actually, after thinking about what solution is really best, I'm inclined to suggest avoiding index-math based solutions entirely. A better approach is to use Python's fantastic dictionaries to do your mapping from original characters to encrypted ones. You can build and use a Rot13 dictionary like this:
alph="abcdefghijklmnopqrstuvwxyz"
rot13_table = dict(zip(alph, alph[13:]+alph[:13])) # lowercase character mappings
rot13_table.update((c.upper(),rot13_table[c].upper()) for c in alph) # uppercase
def rot13(s):
    return "".join(rot13_table.get(c, c) for c in s) # non-letters are left unchanged

First thing that may have caused you some problems - your alphabet string has the n and the o switched, so you'll want to adjust that :) As for the algorithm, when you run:
ccrypt = alph[ i+13 : i+14 ]
Think of what happens when find gives you back 25 (for z). You are now asking for the slice alph[38:39] (side note: for a single character you can index directly, as in alph[i], instead of slicing), which is far past the bounds of the 26-character string, so the slice returns '':
In [1]: s = 'abcde'
In [2]: s[2]
Out[2]: 'c'
In [3]: s[2:3]
Out[3]: 'c'
In [4]: s[49:50]
Out[4]: ''
As for how to fix it, there are a number of interesting methods. Your code functions just fine with a few modifications. One thing you could do is create a mapping of characters that are already 'rotated' 13 positions:
alph = 'abcdefghijklmnopqrstuvwxyz'
coded = 'nopqrstuvwxyzabcdefghijklm'
All we did here is split the original string into halves of 13 and then swap them - we now know that if we take a letter like a and get its position (0), the same position in the coded string will be the rot13 value. As this is for an assignment I won't spell out how to do it, but see if that gets you on the right track (and @Makoto's suggestion is a perfect way to check your results).

This line
ccrypt = alph[ i+13 : i+14 ]
does not do what you think it does - it returns a string slice from i+13 to i+14, but if these indices are greater than the length of the string, the slice will be empty:
"abc"[5:6] #returns ''
This means your solution turns everything from n onward into an empty string, which produces your observed output.
The correct way of implementing this would be (1.) using a modulo operation to constrain the index to a valid number and (2.) using simple character access instead of string slices, which is easier to read, faster, and throws an IndexError for invalid indices, meaning your error would have been obvious.
ccrypt = alph[(i+13) % 26]
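Put together, the two fixes might look like the sketch below. It is one possible cleaned-up version of the function from the question, not the only way to write it; it assumes plain ASCII letters and uses string.ascii_lowercase in place of the hand-typed alphabet.

```python
import string

def rot13(text):
    """Rotate ASCII letters by 13 places, preserving case."""
    alph = string.ascii_lowercase
    out = []
    for c in text:
        i = alph.find(c.lower())
        if i == -1:
            # digits, spaces and punctuation pass through untouched
            out.append(c)
            continue
        # the modulo wraps indexes 26..38 back around to 0..12
        rotated = alph[(i + 13) % len(alph)]
        out.append(rotated.upper() if c.isupper() else rotated)
    return ''.join(out)

print(rot13('Rot13 Test'))  # Ebg13 Grfg
```

Applying the function twice returns the original text, which is a quick way to check it.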

If you're doing this as an exercise for a course in Python, ignore this, but just saying...
>>> import codecs
>>> codecs.encode('Some text', 'rot13')
'Fbzr grkg'
>>>

Related

In Python, does a set count as a buffer?

I am working through Cracking the Coding Interview (4th ed), and one of the questions is as follows:
Design an algorithm and write code to remove the duplicate characters in a string
without using any additional buffer. NOTE: One or two additional variables are fine.
An extra copy of the array is not.
I have written the following solution, which satisfies all of the test cases specified by the author:
def remove_duplicate(s):
    return ''.join(sorted(set(s)))
print(remove_duplicate("abcd")) # output "abcd"
print(remove_duplicate("aaaa")) # output "a"
print(remove_duplicate("")) # output ""
print(remove_duplicate("aabb")) # output "ab"
Does my use of a set in my solution count as the use of an additional buffer, or is my solution adequate? If my solution is not adequate, what would be a better way to go about this?
Thank you very much!
Only the person administering the question or evaluating the answer could say for sure, but I would say that a set does count as a buffer.
If there are no repeated characters in the string, the length of the set would equal that of the string. In fact, since a set is built on a hash table and carries significant overhead, the set would probably take more memory than the string. If the string holds Unicode, the number of unique characters could be very large.
If you do not know how many unique characters are in the string, you will not be able to predict the length of the set. The possibly long and probably unpredictable length of the set makes it count as a buffer, or worse, given that it may take more space than the string itself.
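For comparison, here is a rough sketch of the kind of answer the exercise seems to be asking for: a quadratic scan that uses only a few index variables. Python strings are immutable, so the sketch assumes the caller already holds the characters in a list, which stands in for the book's character array. The function name is made up for this example.

```python
def remove_duplicates_in_place(chars):
    """Drop repeated characters from a list in place, keeping first occurrences.

    Only index variables are used as extra storage; the cost is O(n^2) time.
    """
    tail = 0  # chars[:tail] always holds the unique characters kept so far
    for i in range(len(chars)):
        # look for chars[i] in the unique prefix
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:
            # not found in the prefix: keep it by moving it to the tail
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # cut off the leftover suffix
    return chars

print(''.join(remove_duplicates_in_place(list('aabbcabc'))))  # abc
```

Unlike sorted(set(s)), this keeps the characters in their original order rather than sorting them.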
To follow up on v.coder's comment, I rewrote the code he (or she) was referring to in Python, and added some comments to try to explain what is going on.
def removeduplicates(s):
    """Original java implementation by
    Dhruv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
    in his/her answer
    http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
    """
    # python strings are immutable, so first converting the string to a list of integers,
    # each integer representing the ascii value of the letter
    # (hint: look up "ascii table" on the web)
    L = [ord(char) for char in s]
    # easiest solution is to use a set, but to use Dhruv Gairola's method...
    # (hint, look up "bitmaps" on the web to learn more!)
    bitmap = 0
    #seen = set()
    for index, char in enumerate(L):
        # first check for duplicates:
        # number of bits to shift left. The space is the "lowest"
        # printable character on the ascii table, and 'char' here is the
        # position of the current character in the ascii table. So if
        # 'char' is a space, the shift length will be 0, if 'char' is '!',
        # the shift length will be 1, and so on. This requires the
        # integer to have as many "bit positions" as there are
        # characters in the ascii table from the space to the ~.
        # Python integers have arbitrary precision, so that is fine.
        # (Characters below the space, such as a newline, would give a
        # negative shift and are not handled here.)
        shift_length = char - ord(' ')
        # make a new integer where only one bit is set:
        # the bit position the character corresponds to
        bit_position = 1 << shift_length
        # if the same bit is already set [to 1] in the bitmap,
        # the result of AND'ing the two integers together
        # will be an integer where only that exact bit is
        # set, so the result will be greater than zero.
        # (Python integers have no fixed-width sign bit, so this
        # holds no matter how large the shift is.)
        bit_position_already_occupied = (bitmap & bit_position) > 0
        if bit_position_already_occupied:
            #if char in seen:
            L[index] = 0
        else:
            # update the bitmap to indicate that this character
            # is now seen.
            # so, same procedure as above. first find the bit position
            # this character represents...
            bit_position = char - ord(' ')
            # make an integer that has a single bit set:
            # the bit that corresponds to the position of the character
            integer = 1 << bit_position
            # "add" the bit to the bitmap. The way we do this is that
            # we OR the current bitmap with the integer that has the
            # required bit set to 1. The result of OR'ing two integers
            # is that all bits that are set to 1 in *either* of the two
            # will be set to 1 in the result.
            bitmap = bitmap | integer
            #seen.add(char)
    # finally, turn the list back to a string to be able to return it
    # (again, just kind of a way to "get around" immutable python strings)
    return ''.join(chr(i) for i in L if i != 0)
if __name__ == "__main__":
    print(removeduplicates('aaaa'))
    print(removeduplicates('aabcdee'))
    print(removeduplicates('aabbccddeeefffff'))
    print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][ ao8faeo~~~````%!)"%fakfzzqqfaklnz'))

best way to get an integer from string without using regex

I would like to get some integers from a string (the 3rd one). Preferably without using regex.
I saw a lot of stuff.
my string:
xp = '93% (9774/10500)'
So I would like the code to return a list of the integers in the string. The desired output would be: [93, 9774, 10500]
Some stuff like this doesn't work:
>>> new = [int(s) for s in xp.split() if s.isdigit()]
>>> print new
[]
>>> int(filter(str.isdigit, xp))
93977410500
Since the problem is that you have to split on different chars, you can first replace everything that's not a digit with a space and then split. A one-liner would be:
xp = '93% (9774/10500)'
''.join([ x if x.isdigit() else ' ' for x in xp ]).split() # ['93', '9774', '10500']
Using regex (sorry!) to split the string by a non-digit, then filter on digits (can have empty fields) and convert to int.
import re
xp = '93% (9774/10500)'
print([int(x) for x in filter(str.isdigit,re.split(r"\D+",xp))])
result:
[93, 9774, 10500]
Since this is Py2, using str, it looks like you don't need to consider the full Unicode range; since you're doing this more than once, you can slightly improve on polku's answer using str.translate:
# Create a translation table once, up front, that replaces non-digits with spaces
# (Python 2 only: string.maketrans does not exist on Py3)
import string
nondigits = ''.join(c for c in map(chr, range(256)) if not c.isdigit())
nondigit_to_space_table = string.maketrans(nondigits, ' ' * len(nondigits))
# Then, when you need to extract integers use the table to efficiently translate
# at C layer in a single function call:
xp = '93% (9774/10500)'
intstrs = xp.translate(nondigit_to_space_table).split() # ['93', '9774', '10500']
myints = map(int, intstrs) # Wrap in `list` constructor on Py3
Performance-wise, for the test string on my 64 bit Linux 2.7 build, using translate takes about 374 nanoseconds to run, vs. 2.76 microseconds for the listcomp and join solution; the listcomp+join takes >7x longer. For larger strings (where the fixed overhead is trivial compared to the actual work), the listcomp+join solution takes closer to 20x longer.
The main advantage of polku's solution is that it requires no changes on Py3 (on which it should seamlessly support non-ASCII strings). On Py3 the translation table for str.translate is built a different way (with str.maketrans), and it would be impractical to make a translation table that handled all non-digits in the whole Unicode space.
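For reference, a Python 3 variant of the translate approach could look like the sketch below. It sidesteps the Unicode problem by assuming the input contains ASCII characters only; that restriction, and the names NONDIGIT_TO_SPACE and extract_ints, are assumptions made for this example.

```python
# Assumption: the input only contains ASCII characters. Anything outside
# that range is not in the table and would be passed through unchanged.
NONDIGIT_TO_SPACE = str.maketrans(
    {chr(code): ' ' for code in range(128) if not chr(code).isdigit()}
)

def extract_ints(text):
    """Return the runs of ASCII digits in text as a list of ints."""
    return [int(part) for part in text.translate(NONDIGIT_TO_SPACE).split()]

print(extract_ints('93% (9774/10500)'))  # [93, 9774, 10500]
```

The table is built once at import time, so each call costs a single translate plus a split.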
Since the format is fixed, you can use consecutive split().
It's not very pretty, or general, but sometimes the direct and "stupid" solution is not so bad:
a, b = xp.split("%")
x = int(a)
y = int(b.split("/")[0].strip()[1:])
z = int(b.split("/")[1].strip()[:-1])
print(x, y, z) # prints "93 9774 10500"
Edit: Clarified that the poster specifically said that his format is fixed. This solution is not very pretty, but it does what it's supposed to.

Error:string index out of range, defining a function

I'm practicing coding on codingbat.com since I'm a complete beginner in python, and here is one of the exercises:
Given a string, return a new string made of every other char starting with the first, so "Hello" yields "Hlo".
Here is my attempt at defining the function string_bits(str):
def string_bits(str):
    char = 0
    first = str[char]
    for char in range(len(str)):
        char += 2
        every_other = str[char]
    return (first + every_other)
Running the code gives an error. What's wrong with my code?
A different approach, with an explanation:
If you need to handle a sentence, where spaces would be included, you can do this using slicing. On a string slicing works as:
[start_of_string:end_of_string:jump_this_many_char_in_string]
So, you want to jump only every second letter, so you do:
[::2]
The first two are empty, because you just want to step every second character.
So, you can do this in one line, like this:
>>> " ".join(i[::2] for i in "Hello World".split())
'Hlo Wrd'
What just happened above, is we take our string, use split to make it a list. The split by default will split on a space, so we will have:
["Hello", "World"]
Then, what we will do from there, is using a comprehension, iterate through each item of the list, which will give us a word at a time, and from there we will perform the desired string manipulation per i[::2].
The comprehension is:
i[::2] for i in "Hello World".split()
Finally, we call " ".join, which joins the pieces back into a single string with a space between them, to finally give us the output:
"Hlo Wrd"
Check out the slicing section from the docs: https://docs.python.org/3/tutorial/introduction.html
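A few quick experiments with the three slice fields may make the notation clearer. This is just an illustration; the variable name word is arbitrary.

```python
word = "Hello"

every_other = word[::2]      # start and end omitted, step 2: indexes 0, 2, 4
odd_positions = word[1::2]   # start at index 1, step 2: indexes 1, 3
first_three = word[:3]       # indexes 0, 1, 2
past_the_end = word[10:12]   # slices never raise IndexError, they come back empty

print(every_other)           # Hlo
print(odd_positions)         # el
print(first_three)           # Hel
print(repr(past_the_end))    # ''
```

Indexing a single position past the end, such as word[10], does raise IndexError, which is the error in the question.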
The problem is that char += 2 produces an index past the end of the string: the loop takes char all the way up to len(str)-1, and adding 2 to that gives a position that does not exist. You could do:
def string_bits(string):
    if len(string) == 2:
        return string[0]
    result = ''
    for char in range(0,len(string),2): # range creates values in increments of two
        result += string[char]
    return result
A more succinct method would be:
def string_bits(string):
    return string[::2]
You should avoid using 'str' as a variable name, as it shadows Python's built-in str type.
Ok, for me:
You should not use str as a variable name, as it is the name of a Python built-in type (replace str with my_str, for example)
For example, the length of 'Hello' is 5, so 0 <= index <= 4. Here you are trying to access index 3+2=5 (when char = 3) in your for loop.
You can achieve what you want with the following code:
def string_bits(my_str):
    result = ""
    for char in range(0, len(my_str), 2):
        result += my_str[char]
    return result
The error you are getting means that you are trying to get the nth letter of a string that has less than n characters.
As another suggestion, strings are Sequence-types in Python, which means they have a lot of built-in functionalities for doing exactly what you're trying to do here. See Built-in Types - Python for more information, but know that sequence types support slicing - that is, selection of elements from the sequence.
So, you could slice your string like this:
def string_bits(input_string):
    return input_string[::2]
Meaning "take my input_string from the start (:) to the end (:) and select every second (2) element"

Caesar Cipher algorithm with strings and for loop Python

The assignment is to write a Caesar Cipher algorithm that receives 2 parameters, the first being a String parameter, the second telling how far to shift the alphabet. The first part is to set up a method and set up two strings, one normal and one shifted. I have done this. Then I need to make a loop to iterate through the original string to build a new string, by finding the original letters and selecting the appropriate new letter from the shifted string. I've spent at least two hours staring at this one, and talked to my teacher so I know I'm doing some things right. But as for what goes in the while loop, I really don't have a clue. Any hints or pushes in the right direction would be very helpful, so I at least have somewhere to start. Thank you.
def cipher(x, dist):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    shifted = "xyzabcdefghijklmnopqrstuvw"
    stringspot = 0
    shiftspot = (x.find("a"))
    aspot = (x.find("a"))
    while stringspot < 26:
        aspot = shifted(dist)
        shifted =
        stringspot = stringspot + 1
    ans =
    return ans
print(cipher("abcdef", 1))
print(cipher("abcdef", 2))
print(cipher("abcdef", 3))
print(cipher("dogcatpig", 1))
Here are some pushes and hints:
You should validate your inputs. In particular, make sure that the shift distance is "reasonable," where reasonable means something you can handle. I recommend <=25.
If the maximum shift amount is 25, the letter 'a' plus 25 would get 'z'. The letter 'z' plus 25 will go past the end of the alphabet. But it wouldn't go past the end of TWO alphabets. So that's one way to handle wrap-around.
User @zondo, in his solution, handles upper-case letters. You didn't mention if you want to handle them or not. You may want to clarify that with your teacher.
If you know about dictionaries, you might want to build one to make it easy to map the old letters to the new letters.
You need to realize that strings are sequences, just like tuples or lists - you can index them. I don't see you doing that in your code.
You can get an "ASCII code" number for a letter using ord(). The numbers are arbitrary, but both upper and lower case numbers are packed together tightly in ranges of 26. This means you can do math with them. (For example, ord('a') is 97. Not super useful. But ord('b') - ord('a') is 1, which might be good to know.)
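As a small illustration of the ord() arithmetic from the last hint, the sketch below shifts a single letter. It handles the wrap-around with the modulo operator rather than a doubled alphabet, assumes lowercase ASCII letters only, and the helper name shift_letter is made up for this example.

```python
def shift_letter(letter, dist):
    """Shift one lowercase ASCII letter by dist places, wrapping past 'z'."""
    offset = ord(letter) - ord('a')   # 'a' -> 0, 'b' -> 1, ..., 'z' -> 25
    # the modulo keeps the shifted offset inside 0..25
    return chr(ord('a') + (offset + dist) % 26)

print(shift_letter('a', 25))  # z
print(shift_letter('z', 25))  # y
```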
alphabet and shifted are supposed to be a mapping between the original stream and the ciphertext. The loop's job is to iterate over all letters in the stream and substitute them. More specifically, the letter in alphabet and the substitute letter in shifted reside at the same index, hence the mapping. In pseudocode:
ciphertext = empty
for each letter in x
    i = index of letter in alphabet
    new_letter = shifted[i]
    add new_letter to ciphertext
The whole loop can be simplified to a list comprehension, but this shouldn't be your primary concern.
For more direct mapping than doing as in the pseudocode above, look into dictionaries.
Another thing that stands out in your code is the generation of shifted, which should depend on the argument dist so it can't just be hardcoded. So, if dist is 5, the first letter in shifted should be whatever lies at the 0+5 in alphabet, and so on. Hint: modulo operator.
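One way these hints could come together is sketched below. It follows the convention from the last hint (position 0 of shifted holds the letter at index dist of alphabet), which runs in the opposite direction from the hard-coded shifted string in the question, so the direction is an assumption to check against the assignment. Leaving non-letters unchanged is also an assumption.

```python
def cipher(x, dist):
    """Caesar-shift the lowercase letters of x by dist places."""
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    dist %= len(alphabet)                         # keep the shift within 0..25
    shifted = alphabet[dist:] + alphabet[:dist]   # rotate the alphabet by dist
    ciphertext = ""
    for letter in x:
        i = alphabet.find(letter)                 # -1 if not a lowercase letter
        ciphertext += shifted[i] if i != -1 else letter
    return ciphertext

print(cipher("abcdef", 1))     # bcdefg
print(cipher("dogcatpig", 1))  # ephdbuqjh
```

Because the alphabet is rotated with the modulo operator, shift distances of 26 or more (and negative ones) wrap around instead of failing.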

Python trick in finding leading zeros in string

I have a binary string, say '01110000', and I want to return the number of leading zeros in front without writing a for loop. Does anyone have any idea on how to do that? Preferably a way that also returns 0 if the string immediately starts with a '1'
If you're really sure it's a "binary string":
input = '01110000'
zeroes = input.index('1')
Update: it breaks when there's nothing but "leading" zeroes
An alternate form that handles the all-zeroes case.
zeroes = (input+'1').index('1')
Here is another way:
In [36]: s = '01110000'
In [37]: len(s) - len(s.lstrip('0'))
Out[37]: 1
It differs from the other solutions in that it actually counts the leading zeroes instead of finding the first 1. This makes it a little bit more general, although for your specific problem that doesn't matter.
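Wrapped in a function, the lstrip approach can be checked against the edge cases raised in this thread (a string of only zeros, a string starting with '1', and the empty string). The function name count_leading_zeros is made up for this sketch.

```python
def count_leading_zeros(s):
    """Count the '0' characters at the start of s."""
    # lstrip('0') removes every leading '0', so the length difference
    # is exactly the number of zeros that were removed
    return len(s) - len(s.lstrip('0'))

for sample in ('01110000', '1000', '0000', ''):
    print(repr(sample), count_leading_zeros(sample))
```

This should report 1, 0, 4 and 0 for the four samples, with no special-casing needed.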
A simple one-liner:
x = '01110000'
leading_zeros = len(x.split('1', 1)[0])
This partitions the string into everything up to the first '1' and the rest after it, then counts the length of the prefix. The second argument to split is just an optimization and represents the number of splits to perform, meaning the function will stop after it finds the first '1' instead of splitting on all occurrences. You could just use x.split('1')[0] if performance doesn't matter.
I'd use:
import itertools
s = '00001010'
sum(1 for _ in itertools.takewhile('0'.__eq__, s))
Rather pythonic, works in the general case, for example on the empty string and non-binary strings, and can handle strings of any length (or even iterators).
If you know it's only 0 or 1:
x.find('1')
(will return -1 if all zeros; you may or may not want that behavior)
If you don't know which digit follows the zeros (a "1" in this case) and you just want to check whether there are leading zeros, you can convert to int and back and compare the two. The comparison below is False when the string had leading zeros:
"0012300" == str(int("0012300"))
How about the re module?
import re
a = re.search('(?!0)', data)
then a.start() is the position of the first character that is not a 0, which equals the number of leading zeros.
I'm using has_leading_zero = re.match(r'0\d+', str(data)) as a solution that accepts any number and treats 0 as a valid number without a leading zero
