Optional multiline string replacement [duplicate] - python

This question already has answers here:
How do I match any character across multiple lines in a regular expression?
(26 answers)
python regular expression across multiple lines
(2 answers)
Closed 2 years ago.
I have a long string with placeholders like %name% that should be substituted with values from a dict. According to this link, I was able to solve it. However, some parts should only be included in the returned string if another parameter is True. For example, with this formatting:
>>optional:This should only get printed, if 'optional' is True<<
I might get it to work, but I was not able to write a regular expression that also matches when the optional text spans multiple lines:
import re
# This works (https://stackoverflow.com/questions/26844742/advanced-string-replacements-in-python)
def replaceParameter(string, replacements):
    return re.sub(r'%(\w+)%', lambda m: replacements[m.group(1)], string)
# This does not work
def replaceOptionalText(myString, replacements):
    occurences = re.findall(">>(.*?):(.*)<<", myString, re.MULTILINE)
    # ... #
myLongString = r"""My name is %name%.
I want to >>eat:eat some %food%.
(in two lines)<<
>>drink:drink something<<
"""
replacements = {
    'name': 'John',
    'eat': True,
    'food': 'Apples',
    'drink': False,
}
myLongString = replaceOptionalText(myLongString, replacements)
myLongString = replaceParameter(myLongString, replacements)
print(myLongString)
The expected output is:
My name is John.
I want to eat some Apples.
(in two lines)
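One possible approach, following the linked duplicates, is to pass re.DOTALL so that . also matches newlines (re.MULTILINE only changes how ^ and $ behave) and to make the body of the optional block lazy so it stops at the first <<. The sketch below is only an illustration under those assumptions: the snake_case function names are hypothetical, the key is assumed to consist of word characters, and a missing key is treated as False.

```python
import re

def replace_optional_text(text, replacements):
    # re.DOTALL lets "." match newlines; the lazy ".*?" stops at the first "<<".
    pattern = re.compile(r">>(\w+):(.*?)<<", re.DOTALL)
    # Keep the body when the flag is truthy, otherwise drop the whole block.
    return pattern.sub(
        lambda m: m.group(2) if replacements.get(m.group(1)) else "", text
    )

def replace_parameter(text, replacements):
    # str() guards against non-string values in the dict.
    return re.sub(r"%(\w+)%", lambda m: str(replacements[m.group(1)]), text)

text = """My name is %name%.
I want to >>eat:eat some %food%.
(in two lines)<<
>>drink:drink something<<
"""
replacements = {
    'name': 'John',
    'eat': True,
    'food': 'Apples',
    'drink': False,
}
result = replace_parameter(replace_optional_text(text, replacements), replacements)
print(result)
```

The optional blocks are resolved first, so placeholders inside a dropped block are never looked up.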

Related

why python regex is not finding numbers? [duplicate]

This question already has answers here:
re.findall behaves weird
(3 answers)
Closed 2 years ago.
I'm trying to find numbers in a string.
import re
text = "42 ttt 1,234 uuu 6,789,001"
finder = re.compile(r'\d{1,3}(,\d{3})*')
print(re.findall(finder, text))
It returns this:
['', ',234', ',001']
What's wrong with the regex?
How can I get ['42', '1,234', '6,789,001']?
Note: I'm getting correct result at https://regexr.com
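Parentheses (...) mark the groups that the regex captures. When a pattern contains a capturing group, re.findall returns the group's contents instead of the whole match.
In your case, the group is repeated by *, so you only get the text of its last repetition (the final comma and the three digits after it), or an empty string when the group did not match at all. Instead, you can capture the whole number by putting a group around everything, and make the parentheses you need for * non-capturing through an initial ?:, like so: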
r'(\d{1,3}(?:,\d{3})*)'
This gives the correct result:
>>> print(re.findall(finder, text))
['42', '1,234', '6,789,001']
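To make the difference concrete, here is a small sketch comparing the original pattern with a non-capturing variant. The variable names are illustrative only. It also shows that re.finditer with group(0) returns the full match regardless of how the groups are written.

```python
import re

text = "42 ttt 1,234 uuu 6,789,001"

# One capturing group: findall returns only that group's (last) content.
capturing = re.findall(r'\d{1,3}(,\d{3})*', text)

# No capturing groups: findall returns the whole match.
non_capturing = re.findall(r'\d{1,3}(?:,\d{3})*', text)

# finditer yields match objects; group(0) is always the whole match.
via_finditer = [m.group(0) for m in re.finditer(r'\d{1,3}(,\d{3})*', text)]

print(capturing)
print(non_capturing)
print(via_finditer)
```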
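Alternatively, you can change your finder like this. Note that this pattern needs at least two digits to match and handles at most two commas per number: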
finder = re.compile(r'\d+\,?\d+,?\d*')

Split the string every special character with regular expressions [duplicate]

This question already has answers here:
What are non-word boundary in regex (\B), compared to word-boundary?
(2 answers)
Closed 3 years ago.
I want to split my string into pieces at every occurrence of some text that ends in a special character. I have a string:
str = "ImEmRe#b'aEmRe#b'testEmRe#b'string"
I want my string to be split at every occurrence of EmRe#b'. As you can see, it contains the ' character, and that's the problem.
I tried re.split(r"EmRe#b'\B", str) and re.split(r"EmRe#b?='\B", str), and I also tried both without the r before the pattern. How do I do it? I'm really new to regular expressions; I would even say I've never used them.
Firstly, change the name of your variable, since str is the name of Python's built-in string type and assigning to it shadows that type.
If you named your variable word, you could get a list of elements split by your specified string by doing this:
>>> word = "ImEmRe#b'aEmRe#b'testEmRe#b'string"
>>> word
"ImEmRe#b'aEmRe#b'testEmRe#b'string"
>>> word.split("EmRe#b'")
['Im', 'a', 'test', 'string']
The result is a list, so you can work with the pieces individually rather than with one string. It can be saved to a variable, of course:
>>> foo = word.split("EmRe#b'")
>>> foo
['Im', 'a', 'test', 'string']
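If a regular expression is still preferred, a minimal sketch is shown below. The attempts with \B find nothing because every ' in this string is followed by a letter, which is a word boundary, so \B (non-boundary) can never match there. Using re.escape on the separator is an addition of this sketch, not part of the original answer. It simply guards against characters that have a special meaning in a pattern.

```python
import re

word = "ImEmRe#b'aEmRe#b'testEmRe#b'string"
separator = "EmRe#b'"

# \B demands a non-boundary after the quote, but a letter follows it,
# so the pattern never matches and the string comes back unsplit.
with_non_boundary = re.split(r"EmRe#b'\B", word)

# Escaping the literal separator gives the same result as str.split.
with_escape = re.split(re.escape(separator), word)

print(with_non_boundary)
print(with_escape)
```

For a fixed separator like this one, str.split stays the simpler choice.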

Replace word only if it stands alone [duplicate]

This question already has answers here:
How to match a whole word with a regular expression?
(4 answers)
Closed 3 years ago.
I have a function with which I want to anonymize texts by replacing the name of a person by 'visitor'.
To do so, I have written the following function:
def replaceName(text, name):
    newText = text.replace(name, 'visitor')
    return str(newText)
And I apply it using:
all_transcripts['msgText'] = all_transcripts.apply(lambda x: replaceName(x['msgText'], x['nameGuest']), axis=1)
However, this also replaces the name if it is part of another word. Therefore, I want to replace only the instances where the name stands by itself. I have tried " "+name+" ", but this does not work if the name is at the beginning or end of a sentence.
Furthermore, I have considered: Python regular expression match whole word. However, that explains how to find such words, not how to replace them. I am having trouble both finding and replacing them.
Who can help me with this?
import re
text = "Mark this isMark example Mark."
print(re.sub(r"\bMark\b", "visitor", text))
Output:
visitor this isMark example visitor.
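Applied to the function from the question, the same idea could look like the sketch below. The name replace_name is hypothetical, and re.escape is added on the assumption that a guest name might contain characters that are special in a pattern. The \b anchors only work as intended when the name starts and ends with a word character.

```python
import re

def replace_name(text, name):
    # \b matches at word boundaries, so "Mark" inside "isMark" is left alone.
    # re.escape keeps characters such as "." in a name from acting as regex syntax.
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.sub(pattern, "visitor", text)

anonymized = replace_name("Mark this isMark example Mark.", "Mark")
print(anonymized)
```

This function can be passed to the DataFrame apply call from the question in place of replaceName.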

why do i get these results using rstrip()? [duplicate]

This question already has answers here:
Remove substring only at the end of string [duplicate]
(11 answers)
Closed 4 years ago.
I want to keep the file names without the .csv extension, but using rstrip('.csv') also deletes the last letter of the names ending in s:
data_files = [
    "ap_2010.csv",
    "class_size.csv",
    "demographics.csv",
    "graduation.csv",
    "hs_directory.csv",
    "sat_results.csv"
]
data_names = [name.rstrip('.csv') for name in data_files]
I get these results:
["ap_2010", "class_size", "demographic", "graduation", "hs_directory", "sat_result"]
The final s of demographics and sat_results has been removed. Why does this happen?
This is because rstrip() treats its argument as a set of characters and strips every trailing character that is in that set, not the exact substring.
>>> 'abcdxyx'.rstrip('yx')
'abcd'
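Here, rstrip keeps removing trailing characters as long as each one is y or x. In your case the set is ., c, s and v, so the s at the end of demographics and sat_results is stripped as well. If you want to remove the .csv, you can use split instead (this assumes the name contains no other dots):
>>> "ap_2010.csv".split('.')[0]
'ap_2010'
Also, for file names it is good practice to use the function os.path.splitext:
>>> import os
>>> os.path.splitext('ap_2010.csv')[0]
'ap_2010'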
You can get your intended output with this (note that replace removes every occurrence of '.csv' in the name, not only the one at the end):
data_files = [
    "ap_2010.csv",
    "class_size.csv",
    "demographics.csv",
    "graduation.csv",
    "hs_directory.csv",
    "sat_results.csv"
]
data_names = [name.replace('.csv','') for name in data_files]
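To strip only a trailing .csv, a small helper along these lines may be safer. The name strip_suffix is hypothetical. On Python 3.9 or later, the built-in str.removesuffix does the same job. The sketch also shows os.path.splitext applied to the whole list.

```python
import os

def strip_suffix(name, suffix):
    # Remove the suffix only when the string actually ends with it.
    if suffix and name.endswith(suffix):
        return name[:-len(suffix)]
    return name

data_files = [
    "ap_2010.csv",
    "class_size.csv",
    "demographics.csv",
    "graduation.csv",
    "hs_directory.csv",
    "sat_results.csv",
]

by_suffix = [strip_suffix(name, ".csv") for name in data_files]
# splitext splits off the last extension, whatever it is.
by_splitext = [os.path.splitext(name)[0] for name in data_files]

print(by_suffix)
print(by_splitext)
```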

Replace sequence of chars in string with its length [duplicate]

This question already has answers here:
Python replace string pattern with output of function
(4 answers)
Closed 5 years ago.
Say I have the following string:
mystr = "6374696f6e20????28??????2c??2c????29"
And I want to replace every run of "??" pairs with its length divided by 2. So for the example above, I want to get the following result:
mystr = "6374696f6e2022832c12c229"
Meaning:
???? replaced with 2
?????? replaced with 3
?? replaced with 1
???? replaced with 2
I tried the following, but I'm not sure it's a good approach, and anyway it doesn't work:
regex = re.compile(r'(\?+)')
matches = regex.findall(mystr)
if matches:
    for match in matches:
        match_length = len(match)/2
        if (match_length > 0):
            mystr = regex.sub(match_length, mystr)
You can use a callback function in Python's re.sub. FYI, lambda expressions are shorthand for creating anonymous functions.
import re
mystr = "6374696f6e20????28??????2c??2c????29"
regex = re.compile(r"\?+")
print(re.sub(regex, lambda m: str(int(len(m.group())/2)), mystr))
There seems to be uncertainty about what should happen in the case of ???. The above code will result in 1, since int() truncates the value. Without the int conversion the result would be 1.5. If you want ??? to become 1?, you can use the pattern (?:\?{2})+ instead.
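A minimal sketch of that variant follows. The function name pairs_to_count is hypothetical. Because the pattern only consumes complete pairs, the length of each match is always even, so integer division with // is exact and a leftover single ? stays in the string.

```python
import re

def pairs_to_count(s):
    # (?:\?{2})+ matches one or more complete "??" pairs;
    # the callback replaces each run with its number of pairs.
    return re.sub(r"(?:\?{2})+", lambda m: str(len(m.group()) // 2), s)

mystr = "6374696f6e20????28??????2c??2c????29"
converted = pairs_to_count(mystr)
odd_case = pairs_to_count("a???b")

print(converted)
print(odd_case)
```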
