Iteratively replacing strings in a text - Python

I’m writing a program that has to replace the string “+” by “!”, and strings “*+” by “!!” in a particular text. As an example, I need to go from:
some_text = 'here is +some*+ text and also +some more*+ text here'
to
some_text_new = 'here is !some!! text and also !some more!! text here'
You’ll notice that “+” and “*+” enclose particular words in my text. After I run the program, those words need to be enclosed between “!” and “!!” instead.
I wrote the following code, but it iterates several times before giving the right output. How can I avoid that iteration?
def many_cues(value):
    if has_cue_marks(value) is True:
        add_text(value)
        #print value
def has_cue_marks(value):
    return '+' in value and '+*' in value
def add_text(value):
    n = '+'
    m = "+*"
    text0 = value
    for n in text0:
        text1 = text0.replace(n, '!', 3)
        print text1
    for m in text0:
        text2 = text0.replace(m, '!!', 3)
        print text2

>>> x = 'here is +some*+ text and also +some more*+ text here'
>>> x = x.replace('*+','!!')
>>> x
'here is +some!! text and also +some more!! text here'
>>> x = x.replace('+','!')
>>> x
'here is !some!! text and also !some more!! text here'
The final argument to replace is optional: if you leave it out, every occurrence of the substring is replaced. So call replace on the longer substring first, so that you don't accidentally consume part of it while replacing the shorter one, then call replace on the shorter substring, and you should be all set.
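The "longer substring first" rule carries over to any number of markers. Here is a minimal sketch of that idea; the helper name `replace_markers` and the dict of replacements are illustrative assumptions, not part of the question.

```python
def replace_markers(text, replacements):
    """Apply str.replace for each marker, longest marker first.

    `replace_markers` and `replacements` are hypothetical names used
    only for this illustration.
    """
    for marker in sorted(replacements, key=len, reverse=True):
        text = text.replace(marker, replacements[marker])
    return text


some_text = 'here is +some*+ text and also +some more*+ text here'
some_text_new = replace_markers(some_text, {'+': '!', '*+': '!!'})
print(some_text_new)
# here is !some!! text and also !some more!! text here
```

Because the replacements run one after another, this only works when no replacement text contains a marker that is processed later.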

It can be done using regex groups:
import re
def replacer(matchObj):
    if matchObj.group(1) == '*+':
        return '!!'
    elif matchObj.group(2) == '+':
        return '!'
text = 'here is +some*+ text and also +some more*+ text here'
replaced = re.sub(r'(\*\+)|(\+)', replacer, text)
Notice that the order of the groups is important: the two patterns share the '+' character, so the longer '*+' alternative has to come first.
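The same single-pass idea can be written without numbered groups by looking each match up in a dict. This is only a sketch; `REPLACEMENTS`, `PATTERN` and `replace_once` are assumed names, not taken from the answer above.

```python
import re

# Hypothetical mapping from marker to replacement.
REPLACEMENTS = {'*+': '!!', '+': '!'}

# Longer markers go first in the alternation so '*+' wins over '+'.
PATTERN = re.compile('|'.join(
    re.escape(marker)
    for marker in sorted(REPLACEMENTS, key=len, reverse=True)
))


def replace_once(text):
    # The matched marker is the dict key, so the text is scanned only once.
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], text)


text = 'here is +some*+ text and also +some more*+ text here'
replaced = replace_once(text)
print(replaced)
# here is !some!! text and also !some more!! text here
```

Since every position in the text is matched at most once, replaced text is never scanned again.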

Related

Replace string in list using dictionary in Python

How can I replace string in list using dictionary?
I have
text = ["h#**o+","+&&&orld"]
replacement = {"#":"e","*":"l","+":"w","&":""}
I want:
correct = ["Hellow World"]
I have tried:
def correct(text,replacement):
    for word, replacement in replacement.items():
        text = text.replace(word, replacement)
But I get:
AttributeError: 'list' object has no attribute 'replace'
What you have is mostly correct, except that your correct function is written to fix a single str (e.g. "h#**o+" => "hellow"), whereas your variable text is a list of strs. So to get "hellow world" you need to call correct once per word to build a list of corrected words, which you can then join into a string.
Try this runnable example!
#!/usr/bin/env python
words = ["h#**o+","+&&&orld"]
replacement = {"#":"e","*":"l","+":"w","&":""}
def correct(text,replacement):
    for word, replacement in replacement.items():
        text = text.replace(word, replacement)
    return text
def correct_multiple(words, replacement):
    new_words = [correct(word, replacement) for word in words] # get a list of results
    combined_str = " ".join(new_words) # join the list into a string
    return combined_str
output = correct_multiple(words, replacement)
print(f"{output=}")
You can do this too:
text = ["h#**o+","+&&&orld"]
replacement = {"#":"e","*":"l","+":"w","&":""}
string1 = " ".join(text) # join the words into one string
string2 = string1.translate(string1.maketrans(replacement))
string3 = string2.title()
print(string1 + '\n' + string2 + '\n' + string3)
# h#**o+ +&&&orld
# hellow world
# Hellow World
I've separated the proceedings into 3 successive steps to demonstrate the effect of each step.
text is a LIST of strings, not a string. You can't call string methods on it.
text[0].replace() would be a thing...
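Since string methods only exist on the elements, one option is to call the method on each element and keep the result as a list. A small sketch, assuming every key in the dict is a single character (the names `table` and `corrected` are made up for the example):

```python
text = ["h#**o+", "+&&&orld"]
replacement = {"#": "e", "*": "l", "+": "w", "&": ""}

# str.maketrans accepts a dict whose keys are single characters.
table = str.maketrans(replacement)

# Call the string method on each element, not on the list itself.
corrected = [word.translate(table) for word in text]
print(corrected)
# ['hellow', 'world']
```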

Clean long string from spaces and tab in python

Suppose I have to build a long string inside a method of a class. What is the best way to write the code?
def printString():
    mystring = '''title\n
    {{\\usepackage}}\n
    text continues {param}
    '''.format(param='myParameter')
    return mystring
This method is well formatted, but the final string has unwanted spaces:
a = printString()
print(a)
title
    {\usepackage}
    text continues myParameter
This method, on the other hand, gives the correct result, but the code can become messy if the string is long:
def printString():
    mystring = '''title\n
{{\\usepackage}}\n
text continues {param}
'''.format(param='myParameter')
    return mystring
a = printString()
print(a)
title
{\usepackage}
text continues myParameter
Any hints on how to get both good code quality and the correct result?
Try enclosing the string in parentheses, like so:
def printString():
    mystring = ('title\n'
                '{{\\usepackage}}\n'
                'text continues {param}').format(param='myParameter')
    return mystring
This lets you break the string across several lines while keeping control over the whitespace.
You can use parentheses to keep long strings tidy inside functions.
def printString():
    mystring = ("title\n"
                "{{\\usepackage}}\n"
                "text continues {param}"
                ).format(param='myParameter')
    return mystring
print(printString())
Results in:
title
{\usepackage}
text continues myParameter
You can also use the + operator to make the concatenation explicit. Formally that turns a compile-time operation (adjacent string literals are joined when the code is compiled) into a runtime operation, although CPython may still fold constant concatenations as an optimization.
def printString():
    mystring = ("title\n" +
                "{{\\usepackage}}\n" +
                "text continues {param}"
                ).format(param='myParameter')
    return mystring
You can use re.sub to clean up any spaces and tabs at the beginning of each line:
>>> import re
>>> def printString():
...     mystring = '''title\n
...     {{\\usepackage}}\n
...     text continues {param}
...     '''.format(param='myParameter')
...     return re.sub(r'\n[ \t]+', '\n', mystring)
...
This gives the following output:
>>> a = printString()
>>> print(a)
title
{\usepackage}
text continues myParameter
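A related option, not mentioned in the answers above, is `textwrap.dedent` from the standard library. It strips the common leading whitespace, so the triple-quoted string can stay indented with the surrounding code. The sketch below assumes the function is renamed `print_string` and that single line breaks are wanted:

```python
import textwrap


def print_string():
    # The backslash right after ''' suppresses the leading newline.
    mystring = textwrap.dedent('''\
        title
        {{\\usepackage}}
        text continues {param}
        ''').format(param='myParameter')
    return mystring


result = print_string()
print(result)
```

Unlike the re.sub approach, dedent removes only the indentation shared by all lines, so deliberate extra indentation inside the text is preserved.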

Python Regular Expression: Replace Within a group

Is there a way to do substitution on a group?
Say I am trying to insert a link into text, based on custom formatting. So, given something like this:
This is a random text. This should be a [[link somewhere]]. And some more text at the end.
I want to end up with
This is a random text. This should be a <a href="link_somewhere">link somewhere</a>. And some more text at the end.
I know that '\[\[(.*?)\]\]' will match stuff within square brackets as group 1, but then I want to do another substitution on group 1, so that I can replace space with _.
Is that doable in a single re.sub regex expression?
You can use a function as the replacement instead of a string.
>>> import re
>>> def as_link(match):
...     link = match.group(1)
...     return '<a href="{}">{}</a>'.format(link.replace(' ', '_'), link)
...
>>> text = 'This is a random text. This should be a [[link somewhere]]. And some more text at the end.'
>>> re.sub(r'\[\[(.*?)\]\]', as_link, text)
'This is a random text. This should be a <a href="link_somewhere">link somewhere</a>. And some more text at the end.'
You could do something like this.
import re
pattern = re.compile(r'\[\[([^]]+)\]\]')
def convert(text):
    def replace(match):
        link = match.group(1)
        return '<a href="{}">{}</a>'.format(link.replace(' ', '_'), link)
    return pattern.sub(replace, text)
s = 'This is a random text. This should be a [[link somewhere]]. .....'
convert(s)
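To answer the "single re.sub expression" part directly, the replacement function can be an inline lambda. This is a sketch under one assumption: the wanted markup is an HTML anchor whose href is the link text with spaces turned into underscores.

```python
import re

text = ('This is a random text. This should be a [[link somewhere]]. '
        'And some more text at the end.')

# Assumption: the target markup is <a href="...">...</a>, with the href
# built from the captured group by replacing spaces with underscores.
linked = re.sub(
    r'\[\[(.*?)\]\]',
    lambda m: '<a href="{}">{}</a>'.format(m.group(1).replace(' ', '_'),
                                           m.group(1)),
    text,
)
print(linked)
```

The second substitution happens in Python code inside the callback, because a plain replacement string cannot transform the contents of a group.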

Search and replace with "whole word only" option [duplicate]

I have a script that goes through my text and searches for and replaces every sentence listed in a database.
The script:
with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        editor.replace(s[0],s[1])
And an example of the database:
Event*Evento*
result*resultado*
And so on...
The problem is that I need a "whole word only" option in that script.
For example, once Result and Event have been replaced with Resultado and Evento, running the script on the text a second time replaces the Result inside Resultado and the Event inside Evento again.
The text then ends up with Resultadoado and Eventoo.
Just so you know, this is not only about Event and Result: I have already set up more than 1000 sentences for the search and replace.
A simple search and replace for two words is not enough, because I am going to keep editing the database with different sentences.
You want a regular expression. You can use the token \b to match a word boundary: i.e., \bresult\b would match only the exact word "result."
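import re
# assumes editor holds the text as a str
with open('C:/Users/User/Desktop/Portuguesetranslator.txt') as f:
    for l in f:
        s = l.split('*')
        # re.escape keeps any regex metacharacters in the search text literal
        editor = re.sub(r"\b%s\b" % re.escape(s[0]), s[1], editor)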
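The effect of the word boundary is easiest to see by running the replacement twice. In the sketch below, `database_lines` is a hypothetical stand-in for the lines of the translation file and `translate` is an assumed helper name. Matching is case-sensitive, as in the answer.

```python
import re

# Hypothetical stand-in for the lines of Portuguesetranslator.txt.
database_lines = ['Event*Evento*', 'result*resultado*']


def translate(text, lines):
    for line in lines:
        parts = line.strip().split('*')
        if len(parts) < 2 or not parts[0]:
            continue  # skip blank or malformed lines
        source, target = parts[0], parts[1]
        # re.escape keeps characters such as '.' or '?' literal.
        pattern = r'\b' + re.escape(source) + r'\b'
        # A function replacement avoids backslash processing in `target`.
        text = re.sub(pattern, lambda m, t=target: t, text)
    return text


once = translate('Event result', database_lines)
twice = translate(once, database_lines)
print(once)   # Evento resultado
print(twice)  # Evento resultado
```

The second run changes nothing because "Event" inside "Evento" is followed by a word character, so `\b` does not match there.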
Use re.sub:
import re
replacements = {'the':'a',
                'this':'that'}
def replace(match):
    return replacements[match.group(0)]
# notice that the 'this' in 'thistle' is not matched
print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in replacements),
             replace, 'the cat has this thistle.'))
Prints
a cat has that thistle.
Notes:
All the strings to be replaced are joined into a single pattern so that the string needs to be looped over just once.
The source strings are passed to re.escape to avoid interpreting them as regular expressions.
The words are surrounded by r'\b' to make sure matches are for whole words only.
A replacement function is used so that any match can be replaced.
Use re.sub instead of a plain string replace so that only whole words are replaced. That way your script will not replace the already replaced words even if it runs again.
>>> import re
>>> editor = "This is result of the match"
>>> new_editor = re.sub(r"\bresult\b","resultado",editor)
>>> new_editor
'This is resultado of the match'
>>> newest_editor = re.sub(r"\bresult\b","resultado",new_editor)
>>> newest_editor
'This is resultado of the match'
It is very simple: use re.sub, don't use replace.
import re
replacements = {r'\bthe\b':'a',
                r'\bthis\b':'that'}
def replace_all(text, dic):
    for i, j in dic.items():
        text = re.sub(i,j,text)
    return text
print(replace_all("the cat has this thistle.", replacements))
It will print
a cat has that thistle.
import re
match = {} # create a dictionary of words-to-replace and words-to-replace-with
with open("filename", "r") as f:
    data = f.read() # string of all file content
def replace_all(text, dic):
    for i, j in dic.items():
        # r"\b%s\b" restricts each replacement to whole-word matches
        text = re.sub(r"\b%s\b" % re.escape(i), j, text)
    return text
data = replace_all(data, match)
print(data) # you can copy and paste the result to whatever file you like

parsing a line of text to get a specific number

I have a line of text in the form " some spaces variable = 7 = '0x07' some more data"
I want to parse it and get the number 7 from "some variable = 7". How can this be done in python?
I would use a simpler solution, avoiding regular expressions.
Split on '=' and take the value at the position you expect:
text = 'some spaces variable = 7 = ...'
if '=' in text:
    chunks = text.split('=')
    assignedval = chunks[1].strip()  # second chunk, '7', without the surrounding spaces
    print('assigned value is', assignedval)
else:
    print('no assignment in line')
Use a regular expression.
Essentially, you create an expression that goes something like "variable = (\d+)", do a match, and then take the first group, which will give you the string 7. You can then convert it to an int.
The Regular Expression HOWTO in the Python documentation covers this in detail.
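The steps described in this answer can be sketched as follows. The pattern allows flexible spacing around the equals sign, which is an assumption about the input, and `line`, `match` and `value` are illustrative names.

```python
import re

line = " some spaces variable = 7 = '0x07' some more data"

# Capture the digits that follow "variable =", allowing flexible spacing.
match = re.search(r"variable\s*=\s*(\d+)", line)

# group(1) is the string '7'; convert it to an int if the pattern matched.
value = int(match.group(1)) if match else None
print(value)
# 7
```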
Basic regex code snippet to find numbers in a string:
>>> import re
>>> line = " some spaces variable = 7 = '0x07' some more data"
>>> nums = re.findall(r"[0-9]+", line)  # + skips the empty matches that * would return
>>> nums
['7', '0', '07']
Check out the documentation and How-To on python.org.
