Make multiple modifications to a string: how to do it, given that strings are immutable in Python? - python

I'm new to Python, so maybe I'm asking for something very easy but I can't think of the problem in a Python way.
I have a compressed string. The idea is, if a character gets repeated 4-15 times, I make this change:
'0000' ---> '0|4'
If more than 15 times, I use a slash and two digits to represent the amount (working with hexadecimal values):
'00...(16 times)..0' ---> '0/10'
So, accustomed to other languages, my approach is the following:
def uncompress(line):
    verticalBarIndex = line.index('|')
    while verticalBarIndex!=-1:
        repeatedChar = line[verticalBarIndex-1:verticalBarIndex]
        timesRepeated = int(line[verticalBarIndex+1:verticalBarIndex+2], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:verticalBarIndex-1] + uncompressedString + line[verticalBarIndex+2:]
        verticalBarIndex = line.index('|') #next one
    slashIndex = line.index('/')
    while slashIndex!=-1:
        repeatedChar = line[slashIndex-1:slashIndex]
        timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16)
        uncompressedChars = [repeatedChar]
        for i in range(timesRepeated):
            uncompressedChars.append(repeatedChar)
        uncompressedString = uncompressedChars.join()
        line = line[:slashIndex-1] + uncompressedString + line[slashIndex+3:]
        slashIndex = line.index('/') #next one
    return line
I know this is wrong, since strings are immutable in Python and I keep changing the contents of line until no '|' or '/' is left.
I know UserString exists, but I guess there is an easier and more Pythonish way of doing it, which would be great to learn.
Any help?

The changes necessary to get your code running with the sample strings:
Change .index() to .find(). .index() raises an exception if the substring isn't found, .find() returns -1.
Change uncompressedChars.join() to ''.join(uncompressedChars).
Change timesRepeated = int(line[slashIndex+1:verticalBarIndex+3], 16) to timesRepeated = int(line[slashIndex+1:slashIndex+3], 16)
Set uncompressedChars = [] to start with, instead of uncompressedChars = [repeatedChar].
This should get it working properly. There are many places where the code can be tidied and optimised, but this works.
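Applied together, the four changes above give something like the following. This is only a sketch: it keeps the question's two-pass structure, swaps the inner append loop for string repetition (a tidy-up, not one of the four changes), and assumes well-formed input in which the repeated character is never itself '|' or '/'.

```python
def uncompress(line):
    # Pass 1: "c|N" stands for character c repeated N times (one hex digit).
    bar = line.find('|')  # find() returns -1 once no '|' is left
    while bar != -1:
        repeated = line[bar - 1]
        times = int(line[bar + 1:bar + 2], 16)
        line = line[:bar - 1] + repeated * times + line[bar + 2:]
        bar = line.find('|')
    # Pass 2: "c/NN" stands for character c repeated NN times (two hex digits).
    slash = line.find('/')
    while slash != -1:
        repeated = line[slash - 1]
        times = int(line[slash + 1:slash + 3], 16)
        line = line[:slash - 1] + repeated * times + line[slash + 3:]
        slash = line.find('/')
    return line


print(uncompress('0|4'))  # 0000
```

Rebinding line to a new string on every pass is fine: immutability only rules out changing a string object in place, not assigning a new string to the same name.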

The most common pattern I have seen is to use a list of characters. Lists are mutable and work as you describe above.
To create a list from a string
mystring = 'Hello'
mylist = list(mystring)
To create a string from a list
mystring = ''.join(mylist)

You should build a list of substrings as you go and join them at the end:
def uncompress(line):
    # No validation of malformed input (e.g. a trailing '|' with no count).
    result = []
    chars = iter(line)
    prevchar = next(chars, None)  # the character waiting to be emitted
    while prevchar is not None:
        curchar = next(chars, None)  # the character after it, if any
        if curchar == '|':
            # current character is a pipe.
            # Previous character is the character to repeat
            # Get next character, the number of repeats
            result.append(prevchar * int(next(chars), 16))
            prevchar = next(chars, None)
        elif curchar == '/':
            # current character is a slash.
            # Previous character is the character to repeat
            # Get two next characters, the number of repeats
            result.append(prevchar * int(next(chars) + next(chars), 16))
            prevchar = next(chars, None)
        else:
            # No need to repeat the previous character, append it to result.
            result.append(prevchar)
            prevchar = curchar
    return ''.join(result)
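If a regular expression is acceptable, the same expansion can be done in a single pass with re.sub and a replacement function. This is a sketch rather than part of the answer above; the names RUN and uncompress_re are made up for the example, and it assumes the counts are always valid hexadecimal digits.

```python
import re

# One character, followed by either '|' and one hex digit
# or '/' and two hex digits.
RUN = re.compile(r'(.)(?:\|([0-9A-Fa-f])|/([0-9A-Fa-f]{2}))')


def uncompress_re(line):
    def expand(match):
        char, short_count, long_count = match.groups()
        # Exactly one of the two count groups matched; the other is None.
        return char * int(short_count or long_count, 16)

    return RUN.sub(expand, line)


print(uncompress_re('0|4'))  # 0000
```

Because re.sub builds the result string itself, there is no need to slice and reassemble the input by hand.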

Related

how to recursively create nested list from string input

So, I would like to convert my string input
'f(g,h(a,b),a,b(g,h))'
into the following list
['f',['g','h',['a','b'],'a','b',['g','h']]]
Essentially, I would like to replace every '(' with [ and every ')' with ].
I have tried, unsuccessfully, to do this recursively. I thought I would iterate through all the characters of my word, and when I hit a '(' I would create a new list and start extending the values into that newest list. If I hit a ')', I would stop extending the values into the newest list and append the newest list to the closest outer list. But I am very new to recursion, so I am struggling to think of how to do it:
word='f(a,f(a))'
empty=[]
def newlist(word):
    listy=[]
    for i, letter in enumerate(word):
        if letter=='(':
            return newlist([word[i+1:]])
        if letter==')':
            listy.append(newlist)
        else:
            listy.extend(letter)
    return empty.append(listy)
Assuming your input is something like this:
a = 'f,(g,h,(a,b),a,b,(g,h))'
We start by splitting it into primitive parts ("tokens"). Since your tokens are always a single symbol, this is rather easy:
tokens = list(a)
Now we need two functions to work with the list of tokens: next_token tells us which token we're about to process and pop_token marks a token as processed and removes it from the list:
def next_token():
    return tokens[0] if tokens else None
def pop_token():
    tokens.pop(0)
Your input consists of "items", separated by commas. Schematically, it can be expressed as
items = item ( ',' item )*
In the python code, we first read one item and then keep reading further items while the next token is a comma:
def items():
    result = [item()]
    while next_token() == ',':
        pop_token()
        result.append(item())
    return result
An "item" is either a sublist in parentheses or a letter:
def item():
    return sublist() or letter()
To read a sublist, we check if the token is a '(', then use items above to read the content, and finally check for the ')' and panic if it is not there:
def sublist():
    if next_token() == '(':
        pop_token()
        result = items()
        if next_token() == ')':
            pop_token()
            return result
        raise SyntaxError()
letter simply returns the next token. You might want to add some checks here to make sure it's indeed a letter:
def letter():
    result = next_token()
    pop_token()
    return result
You can organize the above code like this: have one function parse that accepts a string and returns a list and put all functions above inside this function:
def parse(input_string):
    def items():
        ...
    def sublist():
        ...
    ...etc
    tokens = list(input_string)
    return items()
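Put together, the pieces above might look like the following. This is a sketch under a few assumptions: item checks for '(' explicitly instead of relying on sublist() returning None, letter rejects structural characters (the check the answer suggests adding), and nothing verifies that the whole input was consumed.

```python
def parse(input_string):
    tokens = list(input_string)  # every token is a single character

    def next_token():
        return tokens[0] if tokens else None

    def pop_token():
        tokens.pop(0)

    def items():
        # items = item ( ',' item )*
        result = [item()]
        while next_token() == ',':
            pop_token()
            result.append(item())
        return result

    def item():
        # An item is either a parenthesised sublist or a single letter.
        if next_token() == '(':
            return sublist()
        return letter()

    def sublist():
        pop_token()  # consume '('
        result = items()
        if next_token() != ')':
            raise SyntaxError("expected ')'")
        pop_token()  # consume ')'
        return result

    def letter():
        result = next_token()
        if result is None or result in '(),':
            raise SyntaxError("expected a letter")
        pop_token()
        return result

    return items()


print(parse('f,(g,h,(a,b),a,b,(g,h))'))
```

For the sample input this returns the nested list ['f', ['g', 'h', ['a', 'b'], 'a', 'b', ['g', 'h']]].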
Quite an interesting question, and one I originally misinterpreted. But now this solution works accordingly. Note that I have used list concatenation + operator for this solution (which you usually want to avoid) so feel free to improve upon it however you see fit.
Good luck, and I hope this helps!
# set some global values, I prefer to keep it
# as a set in case you need to add functionality
# eg if you also want {{a},b} or [ab<c>ed] to work
OPEN_PARENTHESIS = set(["("])
CLOSE_PARENTHESIS = set([")"])
SPACER = set([","])
def recursive_solution(input_str, index):
    # base case A: when index exceeds or equals len(input_str)
    if index >= len(input_str):
        return [], index
    char = input_str[index]
    # base case B: when we reach a closed parenthesis stop this level of recursive depth
    if char in CLOSE_PARENTHESIS:
        return [], index
    # do the next recursion, return its value and the index it stops at
    recur_val, recur_stop_i = recursive_solution(input_str, index + 1)
    # with an open parenthesis, we want to continue the recursion after its associated
    # closed parenthesis. and also the recur_val should be within a new dimension of the list
    if char in OPEN_PARENTHESIS:
        continued_recur_val, continued_recur_stop_i = recursive_solution(input_str, recur_stop_i + 1)
        return [recur_val] + continued_recur_val, continued_recur_stop_i
    # for spacers eg "," we just ignore it
    if char in SPACER:
        return recur_val, recur_stop_i
    # and finally with normal characters, we just extend it
    return [char] + recur_val, recur_stop_i
You can get the expected answer using the following code but it's still in string format and not a list.
import re
a='(f(g,h(a,b),a,b(g,h))'
ans=[]
sub=''
def rec(i,sub):
    if i>=len(a):
        return sub
    if a[i]=='(':
        if i==0:
            sub=rec(i+1,sub+'[')
        else:
            sub=rec(i+1,sub+',[')
    elif a[i]==')':
        sub=rec(i+1,sub+']')
    else:
        sub=rec(i+1,sub+a[i])
    return sub
b=rec(0,'')
print(b)
b=re.sub(r"([a-z]+)", r"'\1'", b)
print(b,type(b))
Output
[f,[g,h,[a,b],a,b,[g,h]]
['f',['g','h',['a','b'],'a','b',['g','h']] <class 'str'>
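If the goal is an actual list rather than a string, one option is to build a balanced list literal and hand it to ast.literal_eval. The sketch below is not the code above: it uses plain str.replace instead of recursion, the function name to_nested_list is made up for the example, and it assumes balanced parentheses and names made only of lowercase letters.

```python
import ast
import re


def to_nested_list(text):
    # 'f(g,h)' -> 'f,[g,h]': each '(' opens a new list element.
    literal = text.replace('(', ',[').replace(')', ']')
    # Quote every run of lowercase letters so it parses as a string.
    literal = re.sub(r'([a-z]+)', r"'\1'", literal)
    # Wrap in brackets and let Python parse the literal safely.
    return ast.literal_eval('[' + literal + ']')


print(to_nested_list('f(g,h(a,b),a,b(g,h))'))
```

ast.literal_eval only evaluates literals, so unlike eval it will not execute arbitrary code embedded in the input.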

I'm trying to replace a character in Python while iterating over a string, but it doesn't work

This is the code I currently have:
letter = raw_input("Replace letter?")
traversed = raw_input("Traverse in?")
replacewith = raw_input("Replace with?")
traverseint = 0
for i in traversed:
    traverseint = traverseint + 1
    if i == letter:
        traversed[traverseint] = replacewith
    print i
print(traversed)
Strings (str) in Python are immutable by nature. That means you cannot modify an existing string object. For example:
>>> 'HEllo'[3] = 'o'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
To replace a character in a string, the ideal way is to use the str.replace() method. For example:
>>> 'HEllo'.replace('l', 'o')
'HEooo'
Without using str.replace(), you can make your program work by building up a temporary string:
my_str = ''  # Temporary string
for i in traversed:
    # traverseint = traverseint + 1  # Not required
    if i == letter:
        i = replacewith
    my_str += i
Here my_str will hold the transformed value of traversed. Or, even better, convert the string to a list (as mentioned by @chepner), update the values in the list, and finally join the list to get back a string. For example:
traversed_list = list(traversed)
for i, val in enumerate(traversed_list):
    if val == letter:
        traversed_list[i] = replacewith
    print i
my_str = ''.join(traversed_list)
I can't comment yet, but I want to add a bit to Moinuddin Quadri's answer.
If the index of the replacement is not required, str.replace() should be the best solution.
If the replacement index is required, use str.index() or str.find() to determine it, then use slicing to cut the string around that index and concatenate the replacement between the two pieces, or just call str.replace() with a count of 1.
while True:
    index = traversed.find(letter)
    if index < 0:
        break
    print index
    traversed = traversed[:index] + replacewith + traversed[index + len(letter):]
    # or, instead of the slicing line above:
    # traversed = traversed.replace(letter, replacewith, 1)
A str is immutable, so direct slice assignment is not possible.
If you want to modify a string directly, you should use a mutable type, such as bytearray.
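As a small illustration of the bytearray suggestion (a sketch with made-up sample data, and assuming ASCII text, since a bytearray holds bytes rather than Unicode characters):

```python
data = bytearray(b"hello world")

data[0] = ord("j")     # item assignment takes an integer in range(256)
data[6:11] = b"there"  # slice assignment takes a bytes-like object

text = data.decode("ascii")
print(text)  # jello there
```

Both assignments change the same bytearray object in place; only the final decode() creates a new str.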
To check if string contains a substring you can use in
letter in traversed
"System" does not allow me to post more than 2 links. But all methods I have mentioned are on the same page.
You shouldn't modify containers you are iterating over, and you can't edit strings by position.
Make a copy of the string first, as a list object:
letter = raw_input("Replace letter?")
traversed = raw_input("Traverse in?")
modify = list(traversed)
replacewith = raw_input("Replace with?")
for traverseint,i in enumerate(modify):
    if i == letter:
        modify[traverseint] = replacewith
    print i
print(''.join(modify))
You can also just create an empty string and add letters to it (Python 3.5):
letter = input("Replace letter?")
traversed = input("Traverse in?")
replacewith = input("Replace with?")
temp = ''
for i in traversed:
    if i == letter:
        temp += replacewith
    else:
        temp += i
print(temp)
We can also define our own replace, like below:
def replace(str, idx, char):
    if -1 < idx < len(str):
        return '{str_before_idx}{char}{str_after_idx}'.format(
            str_before_idx=str[0:idx],
            char=char,
            str_after_idx=str[idx+1:len(str)]
        )
    else:
        raise IndexError
Here str is the string to be manipulated, idx is an index, and char is the replacement character to place at index idx.

How do I reverse words in a string with Python

I am trying to reverse words of a string, but having difficulty, any assistance will be appreciated:
S = " what is my name"
def reversStr(S):
    for x in range(len(S)):
        return S[::-1]
        break
What I get now is: eman ym si tahw
However, I am trying to get: tahw is ym eman (individual words reversed)
def reverseStr(s):
    return ' '.join([x[::-1] for x in s.split(' ')])
orig = "what is my name"
reverse = ""
for word in orig.split():
    reverse = "{} {}".format(reverse, word[::-1])
print(reverse)
Since everyone else's covered the case where the punctuation moves, I'll cover the one where you don't want the punctuation to move.
import re
def reverse_words(sentence):
    return re.sub(r'[a-zA-Z]+', lambda x : x.group()[::-1], sentence)
Breaking this down.
re is Python's regex module, and re.sub is the function in that module that handles substitutions. It has three required parameters.
The first is the regex you're matching by. In this case, I'm using r'[a-zA-Z]+'. The r denotes a raw string, [a-zA-Z] matches all letters, and + means "at least one".
The second is either a string to substitute in, or a function that takes in a match object and outputs a string. I'm using a lambda (or nameless) function that simply outputs the matched string, reversed.
The third is the string you want to do a find and replace in.
So "What is my name?" -> "tahW si ym eman?"
Addendum:
I considered a regex of r'\w+' initially, because better unicode support (if the right flags are given), but \w also includes numbers and underscores. Matching - might also be desired behavior: the regexes would be r'[a-zA-Z-]+' (note trailing hyphen) and r'[\w-]+' but then you'd probably want to not match double-dashes (ie --) so more regex modifications might be needed.
The built-in reversed outputs a reversed object, which you have to cast back to string, so I generally prefer the [::-1] option.
In place means modifying the object without creating a copy. As many have already pointed out, Python strings are immutable, so technically a str object cannot be reversed in place. However, if you store the characters in a mutable type such as bytearray, you can reverse them in place:
#slicing creates copy; implies not-inplace reversing
def rev(x):
    return x[-1::-1]
# inplace reversing, if input is bytearray datatype
def rev_inplace(x: bytearray):
    i = 0; j = len(x)-1
    while i<j:
        t = x[i]
        x[i] = x[j]
        x[j] = t
        i += 1; j -= 1
    return x
Input:
x = bytearray(b'some string to reverse')
rev_inplace(x)
Output:
bytearray(b'esrever ot gnirts emos')
Try splitting each word in the string into a list (see: https://docs.python.org/2/library/stdtypes.html#str.split).
Example:
>>> string = "This will be split up"
>>> string_list = string.split(" ")
>>> string_list
['This', 'will', 'be', 'split', 'up']
Then iterate through the list and reverse each constituent list item (i.e. word) which you have working already.
def reverse_in_place(phrase):
    res = []
    phrase = phrase.split(" ")
    for word in phrase:
        word = word[::-1]
        res.append(word)
    res = " ".join(res)
    return res
[thread has been closed, but IMO, not well answered]
Python's str type doesn't include an in-place reverse() method.
So use the built-in reversed() function to accomplish the same thing.
>>> S = " what is my name"
>>> ("").join(reversed(S))
'eman ym si tahw'
There is no obvious way of reversing a string "truly" in-place with Python. However, you can do something like:
def reverse_string_inplace(string):
    w = len(string)-1
    p = w
    while True:
        q = string[p]
        string = ' ' + string + q
        w -= 1
        if w < 0:
            break
    return string[(p+1)*2:]
Hope this makes sense.
In Python, strings are immutable. This means you cannot change the string once you have created it. So in-place reverse is not possible.
There are many ways to reverse the string in python, but memory allocation is required for that reversed string.
print(' '.join(word[::-1] for word in string.split()))
s1 = input("Enter a string with multiple words:")
print(f'Original:{s1}')
print(f'Reverse is:{s1[::-1]}')
each_word_new_list = []
s1_split = s1.split()
for i in range(0,len(s1_split)):
    each_word_new_list.append(s1_split[i][::-1])
print(f'New Reverse as List:{each_word_new_list}')
each_word_new_string=' '.join(each_word_new_list)
print(f'New Reverse as String:{each_word_new_string}')
If the sentence contains multiple spaces then usage of split() function will cause trouble because you won't know then how many spaces you need to rejoin after you reverse each word in the sentence. Below snippet might help:
# Sentence having multiple spaces
given_str = "I know this country runs by mafia "
tmp = ""
tmp_list = []
for i in given_str:
    if i != ' ':
        tmp = tmp + i
    else:
        if tmp == "":
            tmp_list.append(i)
        else:
            tmp_list.append(tmp)
            tmp_list.append(i)
            tmp = ""
print(tmp_list)
rev_list = []
for x in tmp_list:
    rev = x[::-1]
    rev_list.append(rev)
print(rev_list)
print(''.join(rev_list))
output:
def rev(a):
    if a == "":
        return ""
    else:
        z = rev(a[1:]) + a[0]
        return z
Reverse string --> gnirts esreveR
# Renamed from rev so that it no longer calls itself forever.
def rev_words(k):
    y = rev(k).split()
    for i in range(len(y)-1,-1,-1):
        print y[i],
-->esreveR gnirts

How to remove substring from a string in python?

How can I remove the all lowercase letters before and after "Johnson" in these strings?
str1 = 'aBcdJohnsonzZz'
str2 = 'asdVJohnsonkkk'
Expected results are as below:
str1 = 'BJohnsonZ'
str2 = 'VJohnson'
You can partition the string, check that the separator was found, then translate out the lowercase letters, e.g.:
from string import ascii_lowercase as alc
str1 = 'aBcdJohnsonzZz'
p1, sep, p2 = str1.partition('Johnson')
if sep:
    str1 = p1.translate(None, alc) + sep + p2.translate(None, alc)
print str1
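The two-argument translate(None, alc) call is Python 2 only. A rough Python 3 equivalent, sketched here with a made-up function name, builds a deletion table with str.maketrans:

```python
from string import ascii_lowercase

# Translation table that deletes every ASCII lowercase letter.
DROP_LOWER = str.maketrans('', '', ascii_lowercase)


def strip_lower_around(text, keep):
    before, sep, after = text.partition(keep)
    if not sep:
        # keep was not found; leave the text untouched.
        return text
    return before.translate(DROP_LOWER) + sep + after.translate(DROP_LOWER)


print(strip_lower_around('aBcdJohnsonzZz', 'Johnson'))  # BJohnsonZ
```

Like the original, this only removes ASCII lowercase letters; a check based on str.islower() would be needed for accented characters.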
str.partition() is your friend here:
def munge(text, match):
    prefix, match, suffix = text.partition(match)
    prefix = "".join(c for c in prefix if not c.islower())
    suffix = "".join(c for c in suffix if not c.islower())
    return prefix + match + suffix
Example use:
>>> munge("aBcdJohnsonzZz", "Johnson")
'BJohnsonZ'
>>> munge("asdVJohnsonkkk", "Johnson")
'VJohnson'
import re
def foo(input_st, keep_st):
    parts = input_st.split(keep_st)
    clean_parts = [re.sub("[a-z]*", "", part) for part in parts]
    return keep_st.join(clean_parts)
The other answers, which use str.partition(), don't seem to account for the trigger word being repeated. This version also handles a case such as 'aBcJohnsonDeFJohnsonHiJkL', if that matters to you.
There are a couple of ways you could tackle this. Here's the simplest one I could think of. The idea is to tackle it in three parts. First off, you need to know the middle string. In your case 'Johnson.' Then you can remove the lowercase letters from the part before and the part after.
def removeLowercaseAround(full, middle):
    stop_at = full.index(middle) #the beginning of the name
    start_again = stop_at+len(middle) #the end of the name
    new_str = '' #the string we'll return at the end
    for i in range(stop_at): #for each char until the middle starts
        if not full[i].islower(): #if it is not a lowercase char
            new_str += full[i] #add it to the end of the new string
    new_str+=middle #then add the middle string
    for i in range(start_again, len(full)): #do the same thing with the end
        if not full[i].islower(): #if it is not a lowercase char
            new_str += full[i] #add it to the string
    return new_str
print removeLowercaseAround('ABcdJohnsonzZZ', 'Johnson')
Not exactly very simple or streamlined, but you could do this sort of thing (based partially on Zero Piraeus')
(edited to reflect errors)
def remove_lower(string):
    return ''.join(filter(str.isupper, string))
def strip_johnson(input_str):
    prefix, match, postfix = input_str.partition("Johnson")
    return (
        remove_lower(prefix) +
        match +
        remove_lower(postfix)
    )

Parsing strings in python

So my problem is this, I have a file that looks like this:
[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1
This would of course translate to
' This is an example file!'
I am looking for a way to parse the original content into the end content, so that a [BACKSPACE] will delete the last character (spaces included) and multiple backspaces will delete multiple characters. The [SHIFT] doesn't really matter as much to me. Thanks for all the help!
Here's one way, but it feels hackish. There's probably a better way.
def process_backspaces(input, token='[BACKSPACE]'):
    """Delete character before an occurrence of "token" in a string."""
    output = ''
    for item in (input+' ').split(token):
        output += item
        output = output[:-1]
    return output
def process_shifts(input, token='[SHIFT]'):
    """Replace characters after an occurrence of "token" with their uppercase
    equivalent. (Doesn't turn "1" into "!" or "2" into "@", however!)."""
    output = ''
    for item in (' '+input).split(token):
        output += item[0].upper() + item[1:]
    return output
test_string = '[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1'
print process_backspaces(process_shifts(test_string))
If you don't care about the shifts, just strip them, load
(defun apply-bspace ()
  (interactive)
  (let ((result (search-forward "[BACKSPACE]")))
    (backward-delete-char 12)
    (when result (apply-bspace))))
and hit M-x apply-bspace while viewing your file. It's Elisp, not python, but it fits your initial requirement of "something I can download for free to a PC".
Edit: Shift is trickier if you want to apply it to numbers too (so that [SHIFT]2 => @, [SHIFT]3 => #, etc). The naive way that works on letters is
(defun apply-shift ()
  (interactive)
  (let ((result (search-forward "[SHIFT]")))
    (backward-delete-char 7)
    (upcase-region (point) (+ 1 (point)))
    (when result (apply-shift))))
This does exactly what you want:
def shift(s):
    LOWER = '`1234567890-=[];\'\\,./'
    UPPER = '~!@#$%^&*()_+{}:"|<>?'
    if s.isalpha():
        return s.upper()
    else:
        return UPPER[LOWER.index(s)]
def parse(input):
    input = input.split("[BACKSPACE]")
    answer = ''
    i = 0
    while i<len(input):
        s = input[i]
        if not s:
            pass
        elif i+1<len(input) and not input[i+1]:
            s = s[:-1]
        else:
            answer += s
            i += 1
            continue
        answer += s[:-1]
        i += 1
    return ''.join(shift(i[0])+i[1:] for i in answer.split("[SHIFT]") if i)
>>> print parse("[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1")
This is an example file!
It seems that you could use a regular expression to search for (something)[BACKSPACE] and replace it with nothing...
re.sub('.?\[BACKSPACE\]', '', YourString.replace('[SHIFT]', ''))
Not sure what you meant by "multiple spaces delete multiple characters".
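One caveat with a single re.sub pass: as far as I can tell, with two [BACKSPACE] tokens in a row the second one has no ordinary character immediately before it, so only one character gets deleted. A sketch that handles runs of backspaces by keeping the output in a list used as a stack (the function name strip_backspaces is made up for the example, and [SHIFT] is assumed to have been removed already):

```python
def strip_backspaces(text, token="[BACKSPACE]"):
    out = []  # characters kept so far, used as a stack
    i = 0
    while i < len(text):
        if text.startswith(token, i):
            # A backspace removes the most recent character, if there is one.
            if out:
                out.pop()
            i += len(token)
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(strip_backspaces("this isrd[BACKSPACE][BACKSPACE] an example file"))
# this is an example file
```

Each token pops one character, so any number of consecutive backspaces deletes the same number of characters.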
You need to read the input, extract the tokens, recognize them, and give a meaning to them.
This is how I would do it:
# -*- coding: utf-8 -*-
import re
upper_value = {
    1: '!', 2: '"',
}
tokenizer = re.compile(r'(\[.*?\]|.)')
origin = "[SHIFT]this isrd[BACKSPACE][BACKSPACE] an example file[SHIFT]1"
result = ""
shift = False
for token in tokenizer.findall(origin):
    if not token.startswith("["):
        if shift:
            shift = False
            try:
                token = upper_value[int(token)]
            except (ValueError, KeyError):
                # Not a digit, or a digit with no entry in upper_value
                token = token.upper()
        result = result + token
    else:
        if token == "[SHIFT]":
            shift = True
        elif token == "[BACKSPACE]":
            result = result[0:-1]
print(result)
It's neither the fastest nor the most elegant solution, but I think it's a good start.
Hope it helps :-)
