I want to replace each character of a string by a different one, shifted over in the alphabet. I'm shifting by 2 in the example below, so a -> c, b -> d, etc.
I'm trying to use a regular expression and the sub function to accomplish this, but I'm getting an error.
This is the code that I have:
p = re.compile(r'(\w)')
test = p.sub(chr(ord('\\1') + 2), text)
print test
where the variable text is an input string.
And I'm getting this error:
TypeError: ord() expected a character, but string of length 2 found
I think the problem is that the ord function is being called on the literal string "\1" and not on the \w character matched by the regular expression. What is the right way to do this?
It won't work like this. Python first evaluates chr(ord('\\1') + 2) and only then passes the result to p.sub, so ord is called on the two-character string '\\1' before any matching happens.
You need to put it in a separate function or use an anonymous function (lambda):
p = re.compile(r'(\w)')
test = p.sub(lambda m: chr(ord(m.group(1)) + 2), text)
print test
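For Python 3, here is a minimal sketch of the same callback idea. It assumes only lowercase ASCII letters should be shifted and that the shift should wrap around after 'z' (neither is stated in the question); shift_letter is a hypothetical helper name.

```python
import re

def shift_letter(match, shift=2):
    """Return the matched lowercase letter shifted by `shift`, wrapping after 'z'."""
    offset = ord(match.group(0)) - ord('a')
    return chr((offset + shift) % 26 + ord('a'))

text = "xyz abc"
# re.sub calls shift_letter once per match and substitutes its return value.
shifted = re.sub(r'[a-z]', shift_letter, text)
print(shifted)  # zab cde
```

Because the replacement is a function, it runs once per match with the actual match object, which is what the original chr(ord('\\1') + 2) attempt could not do.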
Or better yet use maketrans instead of regular expressions:
import string
shift = 2
t = string.maketrans(string.ascii_lowercase, string.ascii_lowercase[shift:] +
                     string.ascii_lowercase[:shift])
string.translate(text, t)
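Note that string.maketrans and string.translate are the Python 2 spellings. On Python 3, a rough equivalent (a sketch, assuming only lowercase ASCII letters need shifting) uses the str.maketrans static method and the str.translate method:

```python
import string

shift = 2
lower = string.ascii_lowercase
# str.maketrans builds a table mapping each letter to its shifted counterpart.
table = str.maketrans(lower, lower[shift:] + lower[:shift])

text = "hello, xyz"
result = text.translate(table)
print(result)  # jgnnq, zab
```

Characters that are not in the table (the comma and the space here) pass through unchanged.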
Full version
def shouldShift(char):
    return char in string.lowercase
def caesarShift(string, n):
    def letterToNum(char):
        return ord(char)-ord('a')
    def numToLetter(num):
        return chr(num+ord('a'))
    def shiftByN(char):
        return numToLetter((letterToNum(char)+n) % 26)
    return ''.join((shiftByN(c) if shouldShift(c) else c) for c in string.lower())
One-liner
If you really want a one-liner, it would be this, but I felt it was uglier:
''.join(chr((ord(c)-ord('a')+n)%26 + ord('a')) for c in string)
Demo
>>> caesarShift(string.lowercase, 3)
'defghijklmnopqrstuvwxyzabc'
Try this, using a generator expression:
input = 'ABC'
''.join(chr(ord(c)+2) for c in input)
> 'CDE'
It's simpler than using regular expressions.
def CaesarCipher(s1,num):
    new_str = ''
    for i in s1:
        asc_V = ord(i)
        if asc_V in range(65, 91):
            if asc_V + num > 90:
                asc_val = 65 + (num - 1 - (90 - asc_V))
            else:
                asc_val = asc_V + num
            new_str = new_str + chr(asc_val)
        elif (asc_V in range(97, 123)):
            if asc_V + num > 122:
                asc_val = 97 + (num - 1 - (122 - asc_V))
            else:
                asc_val = asc_V + num
            new_str = new_str + chr(asc_val)
        else:
            new_str = new_str + i
    return new_str
print (CaesarCipher("HEllo", 4))
print (CaesarCipher("xyzderBYTE", 2))
Given a string with variables and parentheses:
'a((bc)((de)f))'
and a string of operators:
'+-+-+'
I would like to insert each operator (in order) into the first string between the following patterns (where char is defined as a character that is not an open or close parenthesis):
char followed by char
char followed by '('
')' followed by '('
')' followed by char
To give the result:
'a+((b-c)+((d-e)+f))'
Edit: I got it to work with the following code, but is there a more elegant way to do this, i.e. without a for loop?
x = 'a((bc)((de)f))'
operators = '+-+-+'
y = x
z = 0
for i in range(len(x)):
    if i < len(x)-1:
        xx = x[i]
        isChar = True if x[i] != '(' and x[i] != ')' else False
        isPO = True if x[i] == '(' else False
        isPC = True if x[i] == ')' else False
        isNxtChar = True if x[i+1] != '(' and x[i+1] != ')' else False
        isNxtPO = True if x[i+1] == '(' else False
        isNxtPC = True if x[i+1] == ')' else False
        if (isChar and (isNxtChar or isNxtPO)) or (isPC and (isNxtPO or isNxtChar)):
            aa = operators[z]
            split1 = x[:i+1]
            split2 = x[i+1:]
            y = y[:i+z+1] + operators[z] + x[i+1:]
            if z+1 < len(operators):
                z+=1
print (y)
initialExpr = 'a((bc)((de)f))'
operators = '+-+-+'
countOp = 0
countChar = 0
for char in initialExpr:
    countChar += 1
    print(char,end='')
    if countChar < len(initialExpr) and (char == ')' or char.isalpha()) and (initialExpr[countChar] == '(' or initialExpr[countChar].isalpha()):
        print(operators[countOp], end='')
        countOp += 1
This should do the job.
The assumption is that the variables, parentheses and operators are in the right order and number.
One-liner using re:
import re
s = "a((bc)((de)f))"
o = "+-+-+"
print(
    re.sub(
        r"(?:[a-z](?:\(|[a-z]))|(?:\)(?:\(|[a-z]))",
        lambda g, i=iter(o): next(i).join(g.group()),
        s,
    )
)
Prints:
a+((b-c)+((d-e)+f))
You can use a regex matching the pairs of characters inside which you want to insert an operator.
Then, you can use re.sub with a replacement function that joins the two characters with the next operator.
We can use a class with a __call__ method, that uses an iterator on the operators:
import re
rules = re.compile(r'[a-z]{2}|[a-z]\(|\)\(|\)[a-z]')
class Replace:
    def __init__(self, operators):
        self.it_operators = iter(operators)
    def __call__(self, match):
        return next(self.it_operators).join(match.group())
variables = 'a((bc)((de)f))'
operators = '+-+-+'
print(rules.sub(Replace(operators), variables))
# a+((b-c)+((d-e)+f))
Replace(operators) returns a callable Replace instance with an it_operators attribute that is an iterator, ready to iterate on the operators.
For each matching pair of characters, sub calls this instance, and its __call__ method returns the replacement for the two characters, that it builds by joining them with the next operator.
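As a variation on the same idea, here is a sketch that matches the empty position between the two characters (using lookarounds) instead of the pair itself, with a closure over the iterator in place of a class. insert_operators is a hypothetical name, and the sketch assumes variables are single lowercase letters and that there are at least as many operators as gaps.

```python
import re

# Zero-width position preceded by a letter or ')' and followed by a letter or '('.
GAP = re.compile(r'(?<=[a-z)])(?=[a-z(])')

def insert_operators(expr, operators):
    """Insert the operators, in order, into each gap found in expr."""
    ops = iter(operators)
    # The lambda is called once per gap and returns the next operator.
    return GAP.sub(lambda match: next(ops), expr)

print(insert_operators('a((bc)((de)f))', '+-+-+'))  # a+((b-c)+((d-e)+f))
```

Since the match consumes no characters, a run such as 'abc' gets an operator in both gaps ('a+b-c'), which a pattern that consumes two characters at a time would not do.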
I want to replace all substring occurrences in a string, but I wish not to use the replace method. At the moment, experiments have led me to this:
def count_substrings_and_replace(string, substring, rpl=None):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    _o = string
    for i in range(0, string_size - substring_size + 1):
        if string[i:i + substring_size] == substring:
            if rpl:
                print(_o[:i] + rpl + _o[i + substring_size:])
            count += 1
    return count, _o
count_substrings_and_replace("aaabaaa", "aaa", "ddd")
but I have output like this:
dddbaaa
aaabddd
not dddbddd.
Update 1:
I figured out that I can only replace correctly with a string of the same length of substring. For example for count_substrings_and_replace("aaabaaa", "aaa", "d") I got output: (2, 'dbaad') not dbd
Update 2:
The issue described in Update 1 appeared because the comparison (line 8) was made against the original string, which does not change throughout the process.
Fixed:
def count_substrings_and_replace(string, substring, rpl=None):
    string_size = len(string)
    substring_size = len(substring)
    count = 0
    _o = string
    for i in range(0, string_size - substring_size + 1):
        if _o[i:i + substring_size] == substring:
            if rpl:
                _o = _o[:i] + rpl + _o[i + substring_size:]
            count += 1
    return count, _o
count_substrings_and_replace("aaabaaa", "aaa", "d")
Output: (2, dbd)
You never update the value of _o when a match is found; you're only printing what it would look like if it were replaced. Instead, inside that innermost if statement there should be two lines like:
_o = _o[:i] + rpl + _o[i + substring_size:]
print(_o)
That would print the string every time a match is found and replaced. Moving the print statement to after the for loop would make it run only once, after the entire string has been processed and replaced.
Just my mistake. I had to pass the value to the variable on each iteration not print:
_o = _o[:i] + rpl + _o[i + substring_size:]
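One thing to watch with the in-place approach: the loop range is computed from the original length, so when the replacement is longer than the substring, later matches can shift out of range (for example "abab", "ab", "xyz" only replaces the first "ab"). A sketch that sidesteps this, still without str.replace, scans with str.find and collects the pieces. count_and_replace is a hypothetical name, and matches are taken left to right without overlapping.

```python
def count_and_replace(text, substring, replacement):
    """Return (count, new_text) with non-overlapping matches replaced."""
    if not substring:
        raise ValueError("substring must not be empty")
    pieces = []
    count = 0
    start = 0
    while True:
        idx = text.find(substring, start)
        if idx == -1:
            break
        pieces.append(text[start:idx])   # text before the match
        pieces.append(replacement)
        start = idx + len(substring)     # continue after the match
        count += 1
    pieces.append(text[start:])          # whatever is left
    return count, "".join(pieces)

print(count_and_replace("aaabaaa", "aaa", "d"))  # (2, 'dbd')
```

Because the scan always runs over the unmodified input, the length of the replacement does not affect which matches are found.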
I have a string:
a = 'babababbaaaaababbbab'
And it needs to be shortened so it looks like this:
(ba)3(b)2(a)5ba(b)3ab
So basically it needs to take all repeating characters and write how many times they are repeating instead of printing them.
I managed to do half of this:
from itertools import groupby
a = 'babababbaaaaababbbab'
grouped = ["".join(grp) for patt,grp in groupby(a)]
solved = [str(len(i)) + i[0] for i in grouped if len(i) >= 2]
but this only handles repeated single characters, not repeated patterns. I get that I could do this by looking for the 'ab' pattern in the string, but it needs to work for every possible string. Has anyone encountered something similar?
You can easily do this with regex:
>>> repl= lambda match:'({}){}'.format(match.group(1), len(match.group())//len(match.group(1)))
>>> re.sub(r'(.+?)\1+', repl, 'babababbaaaaababbbab')
'(ba)3(b)2(a)5ba(b)3ab'
Not much to explain here. The pattern (.+?)\1+ matches repeating character sequences, and the lambda function rewrites them to the form (sequence)number.
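To see what the pattern actually captures, here is a small sketch that lists each repeated unit and its repeat count with re.finditer (the variable names are just for illustration):

```python
import re

pattern = re.compile(r'(.+?)\1+')
text = 'babababbaaaaababbbab'

runs = []
for m in pattern.finditer(text):
    unit = m.group(1)                        # shortest unit that repeats here
    repeats = len(m.group(0)) // len(unit)   # whole match length / unit length
    runs.append((unit, repeats))

print(runs)  # [('ba', 3), ('b', 2), ('a', 5), ('b', 3)]
```

The lazy (.+?) tries the shortest unit first at each position, and \1+ then greedily consumes as many further copies of it as possible; anything that never repeats (the 'ba' and 'ab' leftovers) is simply not matched.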
This is what I came up with. The code is a mess, but I just wanted to have some quick fun, so I left it like this:
a = 'babababbaaaaababbbab'
def compress(text):
    for i in range(1, len(text) // 2):
        for j, c in enumerate(text[:-i if i > 0 else len(text)]):
            pattern = text[j:i+j]
            new_text = pattern_repeats_processor(pattern, text, j)
            if new_text != text:
                return compress(new_text)
    return text
def pattern_repeats_processor(pattern, text, i):
    chunk = pattern
    count = 1
    while chunk == pattern and i + (count + 1) * len(pattern) < len(text):
        chunk = text[i + count * len(pattern): i + (count + 1) * len(pattern)]
        if chunk == pattern:
            count = count + 1
        else:
            break
    if count > 1:
        return text[:i] + '(' + pattern + ')' + str(count) + text[i + (count + 0) * len(pattern):]
    return text
print(compress(a))
print(a)
It makes
babababbaaaaababbbab =>
(ba)3(b)2(a)5ba(b)3ab
P.S. Of course, Rowing's answer is miles better, pretty impressive even.
I'm not sure exactly what you're looking for, but I hope this helps.
A=a.count('a')
B=a.count('b')
AB=a.count('ab')
BAB=a.count('bab')
BA=a.count('ba')
print(A,'(a)',B,'(b)',AB,'(ab)',BAB,'(bab)',BA,'(ba)')
I am looking at this python program and almost understood its flow but I am unable to understand ch[:prefix_len%len(ch)] in the following part:
    else:
        prefix = ch * (prefix_len/len(ch)) + ch[:prefix_len%len(ch)]
        suffix = ch * (suffix_len/len(ch)) + ch[:suffix_len%len(ch)]
Here is the context:
def banner(text, ch='=', length=78):
    if text is None:
        return ch * length
    elif len(text) + 2 + len(ch)*2 > length:
        # Not enough space for even one line char (plus space) around text.
        return text
    else:
        remain = length - (len(text) + 2)
        prefix_len = remain / 2
        suffix_len = remain - prefix_len
        if len(ch) == 1:
            prefix = ch * prefix_len
            suffix = ch * suffix_len
        else:
            prefix = ch * (prefix_len/len(ch)) + ch[:prefix_len%len(ch)]
            suffix = ch * (suffix_len/len(ch)) + ch[:suffix_len%len(ch)]
        return prefix + ' ' + text + ' ' + suffix
Could somebody please help me to understand this. Thank you.
Sure!
ch[:prefix_len % len(ch)] takes a slice of the ch sequence, starting from the beginning (since there's no value before the :) and going up to, but not including, the index given by prefix_len % len(ch).
This value is prefix_len (defined earlier as the length of the prefix, not surprisingly) modulo the length of ch. (Think of it as the remainder left over after the integer division prefix_len / len(ch).)
I ran the function like: print(banner("Hello everyone!", "1234")) and got:
123412341234123412341234123412 Hello everyone! 1234123412341234123412341234123
so you can see it's fitting the ch value (1234 in my case) in the space it has.
They're adding the remainder.
Say prefix_len = 10 and ch = '#&+'.
If you just multiply ch by prefix_len / len(ch) (integer division, so 3), you'll get 9 characters, but you know you need 10.
So ch[:prefix_len % len(ch)] just slices the first prefix_len % len(ch) characters (here, 1) off the ch string to make up the remainder.
Make sense?
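The same arithmetic as a standalone sketch, written for Python 3 with divmod so the integer division is explicit. repeat_to_length is a hypothetical helper, not part of the banner function.

```python
def repeat_to_length(pattern, length):
    """Repeat pattern until the result is exactly `length` characters long."""
    # full = number of whole copies, remainder = extra characters still needed
    full, remainder = divmod(length, len(pattern))
    return pattern * full + pattern[:remainder]

print(repeat_to_length('#&+', 10))  # #&+#&+#&+#
```

With length 10 and a 3-character pattern, divmod gives 3 whole copies (9 characters) plus a remainder of 1, so one more character is sliced off the front of the pattern.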
I'm new to Python, coming from Java and C. How can I increment a char? In Java or C, chars and ints are practically interchangeable, and in certain loops it's very useful to me to be able to increment chars and index arrays by chars.
How can I do this in Python? It's bad enough not having a traditional for(;;) looper - is there any way I can achieve what I want to achieve without having to rethink my entire strategy?
In Python 2.x, just use the ord and chr functions:
>>> ord('c')
99
>>> ord('c') + 1
100
>>> chr(ord('c') + 1)
'd'
>>>
Python 3.x makes this more organized and interesting, due to its clear distinction between bytes and unicode. By default, a "string" is unicode, so the above works (ord receives Unicode chars and chr produces them).
But if you're interested in bytes (such as for processing some binary data stream), things are even simpler:
>>> bstr = bytes('abc', 'utf-8')
>>> bstr
b'abc'
>>> bstr[0]
97
>>> bytes([97, 98, 99])
b'abc'
>>> bytes([bstr[0] + 1, 98, 99])
b'bbc'
"bad enough not having a traditional for(;;) looper"?? What?
Are you trying to do
import string
for c in string.lowercase:
    ...do something with c...
Or perhaps you're using string.uppercase or string.letters?
Python doesn't have for(;;) because there are often better ways to do it. It also doesn't have character math because it's not necessary, either.
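For what it's worth, string.lowercase, string.uppercase and string.letters only exist in Python 2; string.ascii_lowercase and friends are available in both. A small sketch for Python 3, assuming the goal is the C-style "index an array by a char" pattern from the question (word and counts are illustrative names):

```python
import string

word = "banana"
counts = [0] * 26
for ch in word:
    # ord(ch) - ord('a') turns 'a'..'z' into the list index 0..25
    counts[ord(ch) - ord('a')] += 1

# Iterate over the alphabet directly instead of incrementing a char.
for letter in string.ascii_lowercase:
    n = counts[ord(letter) - ord('a')]
    if n:
        print(letter, n)
```

In practice a dict (or collections.Counter) keyed by the character itself is often the more natural choice, since it needs no offset arithmetic.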
Check this, using a for loop:
for a in range(5):
    x='A'
    val=chr(ord(x) + a)
    print(val)
Loop output (one letter per line): A B C D E
I came from PHP, where you can increment a char (A to B, Z to AA, AA to AB, etc.) using the ++ operator. I made a simple function that does the same in Python. You can also change the list of chars to whatever you need (lowercase, uppercase, etc.).
import re
# Increment char (A -> B, AZ -> BA)
def inc_char(text, chlist = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'):
    # Unique and sort
    chlist = ''.join(sorted(set(str(chlist))))
    chlen = len(chlist)
    if not chlen:
        return ''
    text = str(text)
    # Replace all chars but chlist
    text = re.sub('[^' + chlist + ']', '', text)
    if not len(text):
        return chlist[0]
    # Increment
    inc = ''
    over = False
    for i in range(1, len(text)+1):
        lchar = text[-i]
        pos = chlist.find(lchar) + 1
        if pos < chlen:
            inc = chlist[pos] + inc
            over = False
            break
        else:
            inc = chlist[0] + inc
            over = True
    if over:
        inc += chlist[0]
    result = text[0:-len(inc)] + inc
    return result
You can also increment a character using ascii_letters from the string module, which is a string containing all the letters of the English alphabet, lowercase followed by uppercase:
>>> from string import ascii_letters
>>> ascii_letters[ascii_letters.index('a') + 1]
'b'
>>> ascii_letters
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
It can also be done manually:
>>> letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> letters[letters.index('c') + 1]
'd'
def doubleChar(str):
    result = ''
    for char in str:
        result += char * 2
    return result
print(doubleChar("amar"))
output:
aammaarr
I wrote the following as a test:
string_1="abcd"
def test(string_1):
    i = 0
    p = ""
    x = len(string_1)
    while i < x:
        y = (string_1)[i]
        i=i+1
        s = chr(ord(y) + 1)
        p=p+s
    print(p)
test(string_1)