I am working through Cracking the Coding Interview (4th ed), and one of the questions is as follows:
Design an algorithm and write code to remove the duplicate characters in a string
without using any additional buffer. NOTE: One or two additional variables are fine.
An extra copy of the array is not.
I have written the following solution, which satisfies all of the test cases specified by the author:
def remove_duplicate(s):
    return ''.join(sorted(set(s)))

print(remove_duplicate("abcd"))  # output "abcd"
print(remove_duplicate("aaaa"))  # output "a"
print(remove_duplicate(""))      # output ""
print(remove_duplicate("aabb"))  # output "ab"
Does my use of a set in my solution count as the use of an additional buffer, or is my solution adequate? If my solution is not adequate, what would be a better way to go about this?
Thank you very much!
Only the person administering the question or evaluating the answer could say for sure, but I would say that a set does count as a buffer.
If there are no repeated characters in the string, the length of the set would equal that of the string. In fact, because a set is built on a hash table and carries significant overhead, it would probably take more memory than the string. If the string holds Unicode, the number of unique characters could be very large.
If you do not know how many unique characters are in the string, you cannot predict the length of the set. The possibly long and probably unpredictable length of the set makes it count as a buffer, or worse, since it may take more space than the string itself.
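One way to stay within "one or two additional variables" is to compact the characters in place. The sketch below is only an illustration, not the book's reference solution. It assumes the input arrives as a mutable list of characters (Python strings are immutable, so converting a string to a list is itself a copy), and the name remove_duplicates_in_place is made up for this example. It trades memory for time: each character is compared against the prefix already kept, so it runs in O(n^2).

```python
def remove_duplicates_in_place(chars):
    """Remove duplicate characters from a list in place; return the new length."""
    tail = 0  # chars[:tail] holds the unique characters kept so far
    for i in range(len(chars)):
        # Scan the kept prefix for the current character.
        j = 0
        while j < tail and chars[j] != chars[i]:
            j += 1
        if j == tail:  # not seen before, so keep it
            chars[tail] = chars[i]
            tail += 1
    del chars[tail:]  # drop the leftover slots at the end
    return tail

chars = list("aabcdee")
remove_duplicates_in_place(chars)
print(''.join(chars))  # abcde
```

Unlike the sorted(set(s)) version, this keeps the characters in their original order of first appearance.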
To follow up on v.coder's comment, I rewrote the code he (or she) was referring to in Python, and added some comments to try to explain what is going on.
def removeduplicates(s):
    """Original java implementation by
    Dhruv Gairola (http://stackoverflow.com/users/495545/dhruv-gairola)
    in his/her answer
    http://stackoverflow.com/questions/2598129/function-to-remove-duplicate-characters-in-a-string/10473835#10473835
    """
    # python strings are immutable, so first converting the string to a list of integers,
    # each integer representing the ascii value of the letter
    # (hint: look up "ascii table" on the web)
    L = [ord(char) for char in s]

    # easiest solution is to use a set, but to use Dhruv Gairola's method...
    # (hint, look up "bitmaps" on the web to learn more!)
    bitmap = 0
    #seen = set()

    for index, char in enumerate(L):
        # first check for duplicates:
        # number of bits to shift left (the space is the "lowest"
        # character on the ascii table, and 'char' here is the position
        # of the current character in the ascii table. so if 'char' is
        # a space, the shift length will be 0, if 'char' is '!', shift
        # length will be 1, and so on. This naturally requires the
        # integer to actually have as many "bit positions" as there are
        # characters in the ascii table from the space to the ~,
        # but python uses "very big integers" (BigNums? I am not really
        # sure here..) - so that's probably going to be fine..
        shift_length = char - ord(' ')

        # make a new integer where only one bit is set;
        # the bit position the character corresponds to
        bit_position = 1 << shift_length

        # if the same bit is already set [to 1] in the bitmap,
        # the result of AND'ing the two integers together
        # will be an integer where only that exact bit is
        # set - but that still means that the integer will be greater
        # than zero. (assuming that the so-called "sign bit" of the
        # integer doesn't get set. Again, I am not entirely sure about
        # how python handles integers this big internally.. but it
        # seems to work fine...)
        bit_position_already_occupied = (bitmap & bit_position) > 0

        if bit_position_already_occupied:
        #if char in seen:
            L[index] = 0
        else:
            # update the bitmap to indicate that this character
            # is now seen.
            # so, same procedure as above. first find the bit position
            # this character represents...
            bit_position = char - ord(' ')

            # make an integer that has a single bit set:
            # the bit that corresponds to the position of the character
            integer = 1 << bit_position

            # "add" the bit to the bitmap. The way we do this is that
            # we OR the current bitmap with the integer that has the
            # required bit set to 1. The result of OR'ing two integers
            # is that all bits that are set to 1 in *either* of the two
            # will be set to 1 in the result.
            bitmap = bitmap | integer
            #seen.add(char)

    # finally, turn the list back to a string to be able to return it
    # (again, just kind of a way to "get around" immutable python strings)
    return ''.join(chr(i) for i in L if i != 0)


if __name__ == "__main__":
    print(removeduplicates('aaaa'))
    print(removeduplicates('aabcdee'))
    print(removeduplicates('aabbccddeeefffff'))
    print(removeduplicates('&%!%)(FNAFNZEFafaei515151iaaogh6161626)([][][ ao8faeo~~~````%!)"%fakfzzqqfaklnz'))
I'm attempting to create a switch statement for a given value. The value is a 16-bit unsigned number, and I want to jump to the appropriate pattern. Each pattern is a hexadecimal string, but an underscore denotes a wildcard. For example, (0x1234 matches '1234' and '12_4' but not '56_8'). While I'm only posting a subset of these patterns, assume they cover the entire range of 0x0000-0xFFFF.
patterns = {
    '15__': foo,
    '2__0': bar,
    '8__0': baz,
    ...
}
...
def run(self, x: int) -> None:
    # x to string (0x567f -> "567F")
    x_str = str(hex(x))[2:].zfill(4).upper()
    # Search for the matching pattern and execute the associated method
    for pattern, instruction in patterns.items():
        if all([x_str[i] == pattern[i] for i in range(len(pattern)) if pattern[i] != '_']):
            instruction(x)
            break
Now, this works. However, it is incredibly slow, and defeats the purpose of using a dictionary since it just iterates through it. Also, since it has to convert x to a string (with formatting) then check that string against the pattern string, the whole thing is a giant bottleneck. I'm looking for a way to, preferably, get it closer to an actual lookup table, bonus points if we don't need to convert x to a string.
A switch statement of this type is an iterative process, not a jump table. In the general case you present, the way to avoid iteration is to generate the graph of partial indexing decisions, based on the specific arrangement of common digits (hexits) and wild cards in your table.
Instead, try simply speeding up your matching. I suggest that you take a mask-and-match approach to your table keys. Code the "don't-care" (wild-card) positions separately, and keep the key as a tuple of match and mask values. Your examples would be
patterns = {
    (0x1500, 0xFF00): foo,
    (0x2000, 0xF00F): bar,
    (0x8000, 0xF00F): baz,
    ...
}
To check a particular key against your candidate cand, you look for bit equality, but mask off any mismatches in the wild-card positions:
cand ^ match # bit inequality; mismatch is 1
result & mask # force don't-care bits to 0
So that you can check
if (cand ^ match) & mask:
    continue  # Something doesn't match
else:
    return value from dict
Your dict format is
(match, mask): value
Can you handle the logic for iteration and return value?
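For readers who want to see the pieces together, here is a minimal sketch of the mask-and-match idea. The helper names compile_pattern and lookup are invented for this example, and the strings 'foo', 'bar' and 'baz' stand in for the real handler functions. It still iterates over the table, as the answer says, but each check is two integer operations and no string conversion is needed.

```python
def compile_pattern(pattern):
    """Turn a 4-hexit pattern such as '2__0' into a (match, mask) pair."""
    match = mask = 0
    for ch in pattern:
        match <<= 4
        mask <<= 4
        if ch != '_':          # a fixed hexit: record its value and care about it
            match |= int(ch, 16)
            mask |= 0xF
    return match, mask

def lookup(table, cand):
    """Return the value of the first (match, mask) key that fits cand, else None."""
    for (match, mask), value in table.items():
        if ((cand ^ match) & mask) == 0:   # no mismatch outside the wildcards
            return value
    return None

# Placeholder strings stand in for the handlers foo, bar and baz.
table = {compile_pattern(p): name
         for p, name in [('15__', 'foo'), ('2__0', 'bar'), ('8__0', 'baz')]}

print(lookup(table, 0x15AB))  # foo
print(lookup(table, 0x2340))  # bar
print(lookup(table, 0x9999))  # None
```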
The assignment is to write a Caesar Cipher algorithm that receives 2 parameters, the first being a String parameter, the second telling how far to shift the alphabet. The first part is to set up a method and set up two strings, one normal and one shifted. I have done this. Then I need to make a loop to iterate through the original string to build a new string, by finding the original letters and selecting the appropriate new letter from the shifted string. I've spent at least two hours staring at this one, and talked to my teacher so I know I'm doing some things right. But as for what goes in the while loop, I really don't have a clue. Any hints or pushes in the right direction would be very helpful, so that I at least have somewhere to start. Thank you.
def cipher(x, dist):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    shifted = "xyzabcdefghijklmnopqrstuvw"
    stringspot = 0
    shiftspot = (x.find("a"))
    aspot = (x.find("a"))
    while stringspot < 26:
        aspot = shifted(dist)
        shifted =
        stringspot = stringspot + 1
    ans =
    return ans

print(cipher("abcdef", 1))
print(cipher("abcdef", 2))
print(cipher("abcdef", 3))
print(cipher("dogcatpig", 1))
Here are some pushes and hints:
You should validate your inputs. In particular, make sure that the shift distance is "reasonable," where reasonable means something you can handle. I recommend <=25.
If the maximum shift amount is 25, the letter 'a' plus 25 would get 'z'. The letter 'z' plus 25 will go past the end of the alphabet. But it wouldn't go past the end of TWO alphabets. So that's one way to handle wrap-around.
User @zondo, in his solution, handles upper-case letters. You didn't mention if you want to handle them or not. You may want to clarify that with your teacher.
If you know about dictionaries, you might want to build one to make it easy to map the old letters to the new letters.
You need to realize that strings can be indexed just like tuples or lists. I don't see you doing that in your code.
You can get an "ASCII code" number for a letter using ord(). The numbers are arbitrary, but both upper and lower case numbers are packed together tightly in ranges of 26. This means you can do math with them. (For example, ord('a') is 97. Not super useful. But ord('b') - ord('a') is 1, which might be good to know.)
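As a small illustration of the ord() arithmetic from the last hint, the sketch below shifts a single lowercase letter. The function name shift_letter is made up for this example, and it handles wrap-around with the modulo operator rather than the "two alphabets" trick mentioned above. It assumes the input is a lowercase letter.

```python
def shift_letter(letter, dist):
    """Shift one lowercase letter by dist places, wrapping past 'z'."""
    offset = ord(letter) - ord('a')   # 0 for 'a', 25 for 'z'
    wrapped = (offset + dist) % 26    # stay inside the alphabet
    return chr(ord('a') + wrapped)

print(shift_letter('a', 25))  # z
print(shift_letter('z', 25))  # y
```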
alphabet and shifted are supposed to be a mapping between the original stream and the ciphertext. The loop's job is to iterate over all letters in the stream and substitute them. More specifically, the letter in alphabet and the substitute letter in shifted reside at the same index, hence the mapping. In pseudocode:
ciphertext = empty
for each letter in x
    i = index of letter in alphabet
    new_letter = shifted[i]
    add new_letter to ciphertext
The whole loop can be simplified to a list comprehension, but this shouldn't be your primary concern.
For more direct mapping than doing as in the pseudocode above, look into dictionaries.
Another thing that stands out in your code is the generation of shifted, which should depend on the argument dist so it can't just be hardcoded. So, if dist is 5, the first letter in shifted should be whatever lies at the 0+5 in alphabet, and so on. Hint: modulo operator.
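A minimal sketch of how these pieces could fit together is shown below. It is one possible reading of the assignment, not the expected solution. It assumes that a positive dist moves letters forward (so 'a' with dist 1 becomes 'b') and that characters outside a-z are left unchanged; neither is stated in the question.

```python
def cipher(x, dist):
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    # shifted[i] is the letter dist places after alphabet[i], wrapping around
    shifted = ''.join(alphabet[(i + dist) % 26] for i in range(26))
    ciphertext = ''
    for letter in x:
        i = alphabet.find(letter)
        # assumption: anything outside a-z is copied through unchanged
        ciphertext += shifted[i] if i != -1 else letter
    return ciphertext

print(cipher("abcdef", 1))     # bcdefg
print(cipher("dogcatpig", 1))  # ephdbuqjh
```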
I was answering some programming problems on the internet and this problem interested me. The problem is defined as follows:
This code prints all the permutations of the string lexicographically. Something is wrong with it. Find and fix it by modifying or adding one line!
Input:
The input consists of a single line containing a string of lowercase characters with no spaces in between. Its length is at most 7 characters, and its characters are sorted lexicographically.
Output:
All permutations of the string printed one in each line, listed lexicographically.
def permutations():
    global running
    global characters
    global bitmask
    if len(running) == len(characters):
        print(''.join(running))
    else:
        for i in xrange(len(characters)):
            if ((bitmask>>i)&1) == 0:
                bitmask |= 1<<i
                running.append(characters[i])
                permutations()
                running.pop()

raw = raw_input()
characters = list(raw)
running = []
bitmask = 0
permutations()
Can somebody answer it for me and explain how it works? I am not really familiar with the applications of bitmasking. Thank you.
You should make the bitmask bit 0 again by adding the line:
bitmask ^= 1<<i
Code:
def permutations():
    global running
    global characters
    global bitmask
    if len(running) == len(characters):
        print(''.join(running))
    else:
        for i in xrange(len(characters)):
            if ((bitmask>>i)&1) == 0:
                bitmask |= 1<<i
                running.append(characters[i])
                permutations()
                bitmask ^= 1<<i  # make the bit zero again.
                running.pop()

raw = raw_input()
characters = list(raw)
running = []
bitmask = 0
permutations()
Explanation:
Bitmask is an integer that is treated as a string of bits. In your case the length of this string is equal to the length of the input string.
Each position in this string signifies whether the corresponding character has already been added to the partially built string or not.
The code works by building a new string starting from an empty string. Whenever a character is added, the bitmask records it. Then the string is sent deeper into the recursion so that further characters can be added. When the code returns from the recursion, the added character has to be removed and the bitmask restored to its original value.
More information about masking can be found here: http://en.wikipedia.org/wiki/Mask_%28computing%29
EDIT:
Say the input string is "abcde" and the bitmask at any point in the execution of the code is "00100". This means that only the character 'c' has been added so far to the partially built string.
Hence we should not add the character 'c' again.
The "if" condition ((bitmask >> i) & 1) == 0 checks whether the i'th bit in bitmask has been set, ie., whether the i'th character has already been added in the string. If it is not added, only then the character gets appended, otherwise not.
If bit operations are new to you, then I suggest you look this topic up on the internet.
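For comparison, here is a Python 3 sketch of the same idea without global variables. It is a variant, not the puzzle's intended one-line fix: the mask is passed down as an argument, so each recursive call works on its own copy and nothing has to be cleared on the way back. The generator form and the parameter names are choices made for this example.

```python
def permutations(characters, bitmask=0, running=None):
    """Yield the permutations of characters in lexicographic order
    (assuming characters is already sorted)."""
    if running is None:
        running = []
    if len(running) == len(characters):
        yield ''.join(running)
        return
    for i, ch in enumerate(characters):
        if ((bitmask >> i) & 1) == 0:      # character i is not used yet
            running.append(ch)
            # pass a mask with bit i set; this call's own mask is untouched
            yield from permutations(characters, bitmask | (1 << i), running)
            running.pop()

print(list(permutations("abc")))
# ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```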
I am currently doing an assignment that encrypts text using ROT13, but some of my text won't register.
# cgi is to escape html
# import cgi

def rot13(s):
    #string encrypted
    scrypt=''
    alph='abcdefghijklmonpqrstuvwxyz'
    for c in s:
        # check if char is in alphabet
        if c.lower() in alph:
            #find c in alph and return its place
            i = alph.find(c.lower())
            #encrypt char = c incremented by 13
            ccrypt = alph[ i+13 : i+14 ]
            #add encrypted char to string
            if c==c.lower():
                scrypt+=ccrypt
            if c==c.upper():
                scrypt+=ccrypt.upper()
        #dont encrypt special chars or spaces
        else:
            scrypt+=c
    return scrypt
    # return cgi.escape(scrypt, quote = True)

given_string = 'Rot13 Test'
print rot13(given_string)
OUTPUT:
13 r
[Finished in 0.0s]
Hmmm, seems like a bunch of things are not working.
The main problem is in ccrypt = alph[ i+13 : i+14 ]: you're missing a % len(alph), so if, for example, i is equal to 18, the index ends up past the end of the string.
In your output, in fact, only e is encoded to r because it's the only letter in your test string which, moved by 13, doesn't go out of bounds.
The rest of this answer is just tips to clean the code up a little:
instead of alph='abc..., you can import string at the beginning of the script and use string.lowercase (string.ascii_lowercase in Python 3)
instead of string slicing, for just one character it's better to use string[i], which gets the work done
instead of c == c.upper(), you can use the built-in string method: if c.isupper() ....
The trouble you're having is with your slice. It will be empty if your character is in the second half of the alphabet, because i+13 will be off the end. There are a few ways you could fix it.
The simplest might be to simply double your alphabet string (literally: alph = alph * 2). This means you can access values up to 52, rather than just up to 26. This is a pretty crude solution though, and it would be better to just fix the indexing.
A better option would be to subtract 13 from your index, rather than adding 13. Rot13 is symmetric, so both will have the same effect, and it will work because negative indexes are legal in Python (they refer to positions counted backwards from the end).
In either case, it's not actually necessary to do a slice at all. You can simply grab a single value (unlike C, there's no char type in Python, so single characters are strings too). If you were to make only this change, it would probably make it clear why your current code is failing, as trying to access a single value off the end of a string will raise an exception.
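The short sketch below illustrates the points above on a correctly spelled alphabet string. It is only a demonstration of how slicing and indexing behave, using the letters 't' and 'c' as arbitrary examples.

```python
alph = 'abcdefghijklmnopqrstuvwxyz'
i = alph.find('t')             # 19, in the second half of the alphabet

print(repr(alph[i+13:i+14]))   # '' because the slice runs off the end
print((alph * 2)[i + 13])      # g, using the doubled alphabet
print(alph[i - 13])            # g, subtracting instead of adding

j = alph.find('c')             # 2, in the first half
print(alph[j - 13])            # p, via the negative index -11

try:
    alph[i + 13]               # a plain index off the end raises
except IndexError as exc:
    print('IndexError:', exc)
```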
Edit: Actually, after thinking about what solution is really best, I'm inclined to suggest avoiding index-math based solutions entirely. A better approach is to use Python's fantastic dictionaries to do your mapping from original characters to encrypted ones. You can build and use a Rot13 dictionary like this:
alph="abcdefghijklmnopqrstuvwxyz"
rot13_table = dict(zip(alph, alph[13:]+alph[:13])) # lowercase character mappings
rot13_table.update((c.upper(),rot13_table[c].upper()) for c in alph) # uppercase

def rot13(s):
    return "".join(rot13_table.get(c, c) for c in s) # non-letters pass through unchanged
First thing that may have caused you some problems - your alphabet string has the n and the o switched, so you'll want to adjust that :) As for the algorithm, when you run:
ccrypt = alph[ i+13 : i+14 ]
Think of what happens when find gives you 25 (for z). You are now looking for the index position alph[38:39] (side note: for a single character you can index directly, as in alph[i], instead of slicing), which is far past the bounds of the 26-character string, so the slice returns '':
In [1]: s = 'abcde'
In [2]: s[2]
Out[2]: 'c'
In [3]: s[2:3]
Out[3]: 'c'
In [4]: s[49:50]
Out[4]: ''
As for how to fix it, there are a number of interesting methods. Your code functions just fine with a few modifications. One thing you could do is create a mapping of characters that are already 'rotated' 13 positions:
alph = 'abcdefghijklmnopqrstuvwxyz'
coded = 'nopqrstuvwxyzabcdefghijklm'
All we did here is split the original string into halves of 13 and then swap them - we now know that if we take a letter like a and get its position (0), the same position in the coded string will be the rot13 value. As this is for an assignment I won't spell out how to do it, but see if that gets you on the right track (and @Makoto's suggestion is a perfect way to check your results).
This line
ccrypt = alph[ i+13 : i+14 ]
does not do what you think it does - it returns a string slice from i+13 to i+14, but if these indices are greater than the length of the string, the slice will be empty:
"abc"[5:6] #returns ''
This means your solution turns everything from n onward into an empty string, which produces your observed output.
The correct way of implementing this would be (1.) using a modulo operation to constrain the index to a valid number and (2.) using simple character access instead of string slices, which is easier to read, faster, and throws an IndexError for invalid indices, meaning your error would have been obvious.
ccrypt = alph[(i+13) % 26]
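Put into a complete function, the modulo fix could look like the sketch below. It follows the structure of the question's code but uses a correctly ordered alphabet and direct indexing. The constant name ALPH and the list-based string building are choices made for this example.

```python
ALPH = 'abcdefghijklmnopqrstuvwxyz'

def rot13(s):
    out = []
    for c in s:
        i = ALPH.find(c.lower())
        if i == -1:                 # not a letter: copy it through unchanged
            out.append(c)
            continue
        r = ALPH[(i + 13) % 26]     # modulo keeps the index inside the string
        out.append(r.upper() if c.isupper() else r)
    return ''.join(out)

print(rot13('Rot13 Test'))  # Ebg13 Grfg
```

Because ROT13 is its own inverse, applying rot13 twice returns the original text, which makes a quick sanity check.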
If you're doing this as an exercise for a course in Python, ignore this, but just saying...
>>> import codecs
>>> codecs.encode('Some text', 'rot13')
'Fbzr grkg'
>>>
Hey, I'm trying to decode a multilevel Caesar cipher. By that I mean a string of letters could have been shifted several times, so if I say apply_shifts[(2,3),(4,5)], that means I shift everything from the 2nd letter by 3 followed by everything from the 4th letter by 5. Here's my code so far.
def find_best_shifts_rec(wordlist, text, start):
    """
    Given a scrambled string and a starting position from which
    to decode, returns a shift key that will decode the text to
    words in wordlist, or None if there is no such key.

    Hint: You will find this function much easier to implement
    if you use recursion.

    wordlist: list of words
    text: scrambled text to try to find the words for
    start: where to start looking at shifts
    returns: list of tuples. each tuple is (position in text, amount of shift)
    """
    for shift in range(27):
        text=apply_shifts(text, [(start,-shift)])
        #first word is text.split()[0]
        #test if first word is valid. if not, go to next shift
        if is_word(wordlist,text.split()[0])==False:
            continue
        #enter the while loop if word is valid, otherwise never enter and go to the next shift
        i=0
        next_index=0
        shifts={}
        while is_word(wordlist,text.split()[i])==True:
            next_index+= len(text.split()[i])
            i=i+1
            #once a word isn't valid, then try again, starting from the new index.
            if is_word(wordlist,text.split()[i])==False:
                shifts[next_index]=i
                find_best_shifts_rec(wordlist, text, next_index)
    return shifts
My problems are
1) my code isn't running properly and I don't understand why it is messing up (it's not entering my while loop)
and
2) I don't know how to test whether none of my "final shifts" (e.g. the last part of my string) are valid words and I also don't know how to go from there to the very beginning of my loop again.
Help would be much appreciated.
I think the problem is that you always work on the whole text, but apply the (new) shifting at some start inside of the text. So your check is_word(wordlist,text.split()[0]) will always check the first word, which is - of course - a word after your first shift.
What you need to do instead is to get the first word after your new starting point, so check the actually unhandled parts of the text.
edit
Another problem I noticed is the way you are trying to find the correct shift:
for shift in range(27):
text=apply_shifts(text, [(start,-shift)])
So you basically want to try all shifts from 0 to 26 until the first word is accepted. It is okay to do it like that, but note that after the first tried shifting, the text has changed. As such you are not shifting it by 1, 2, 3, ... but by 1, 3, 6, 10, ... which is of course not what you want, and you will of course miss some shifts while doing some identical ones multiple times.
So you need to temporarily shift your text and check the status of that temporary text, before you continue to work with the text. Or alternatively, you always shift by 1 instead.
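The difference is easy to see on a tiny example. In the sketch below, shift_text is a hypothetical stand-in for apply_shifts that only shifts the 26 lowercase letters (the real assignment may use a different alphabet). The first loop reassigns text on every pass, as in the question, and the second always starts from the untouched original.

```python
import string

def shift_text(text, shift):
    """Hypothetical stand-in for apply_shifts: shift every lowercase letter."""
    low = string.ascii_lowercase
    return ''.join(low[(low.index(c) + shift) % 26] if c in low else c
                   for c in text)

original = 'khoor'   # 'hello' shifted by 3

# Reassigning text each time accumulates the shifts: 0, 1, 3, 6, ...
text = original
cumulative = []
for shift in range(4):
    text = shift_text(text, -shift)
    cumulative.append(text)

# Shifting a fresh copy of the original tries each shift exactly once.
independent = [shift_text(original, -shift) for shift in range(4)]

print(cumulative)   # ['khoor', 'jgnnq', 'hello', 'ebiil']
print(independent)  # ['khoor', 'jgnnq', 'ifmmp', 'hello']
```

In the cumulative run, 'hello' shows up on the pass where shift is 2, even though the total shift applied is 3, so the recorded shift would be wrong.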
edit²
And another problem I noticed is with the way you are trying to use recursion to get your final result. A recursive function that produces a result usually calls itself and then either passes the returned value along or combines it with its own contribution. In your case, since you want multiple values and not just a single value from somewhere inside, you need to collect the shift found at each level.
But right now, you are throwing away the return values of the recursive calls and just return the last value. So store all the values and make sure you don't lose them.
Pseudo-code for recursive function:
coded_text = text from start-index to end of string
if length of coded_text is 0, return "valid solution (no shifts)"
for shift in possible_shifts:
decoded_text = apply shift of (-shift) to coded_text
first_word = split decoded_text and take first piece
if first_word is a valid word:
rest_of_solution = recurse on (text preceding start-index)+decoded_text, starting at start+(length of first_word)+1
if rest_of_solution is a valid solution
if shift is 0
return rest_of_solution
else
return (start, -shift mod alphabet_size) + rest_of_solution
# no valid solution found
return "not a valid solution"
Note that this is guaranteed to give an answer composed of valid words - not necessarily the original string. One specific example: 'a add hat' can be decoded in place of 'a look at'.
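The pseudocode above can be turned into a runnable sketch under some simplifying assumptions. It assumes a 26-letter lowercase alphabet with spaces left untouched (the assignment's apply_shifts may treat spaces differently), a set of valid words instead of is_word, and hypothetical helper names shift_text and find_shifts. A "valid solution" is represented as a list of (position, shift) pairs and "not a valid solution" as None.

```python
import string

LOW = string.ascii_lowercase

def shift_text(text, shift):
    """Hypothetical helper: shift lowercase letters, leave other characters."""
    return ''.join(LOW[(LOW.index(c) + shift) % 26] if c in LOW else c
                   for c in text)

def find_shifts(words, text, start=0):
    """Return (position, shift) pairs that decode text from start onward,
    or None if no combination of shifts yields only known words."""
    coded = text[start:]
    if not coded:
        return []                       # nothing left: valid, no shifts needed
    for shift in range(26):
        decoded = shift_text(coded, -shift)
        first_word = decoded.split(' ')[0]
        if first_word in words:
            rest = find_shifts(words, text[:start] + decoded,
                               start + len(first_word) + 1)
            if rest is not None:
                if shift == 0:
                    return rest
                return [(start, -shift % 26)] + rest
    return None                         # no valid solution found

# 'hello world' shifted by 3 from position 0, then by 5 from position 6
print(find_shifts({'hello', 'world'}, 'khoor ewztl'))  # [(0, 23), (6, 21)]
```

The returned shifts are the ones that decode the text: applying 23 from position 0 and then 21 from position 6 undoes the original 3 and 5.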