Is there a way to do substitution on a group?
Say I am trying to insert a link into text, based on custom formatting. So, given something like this:
This is a random text. This should be a [[link somewhere]]. And some more text at the end.
I want to end up with
This is a random text. This should be a link somewhere. And some more text at the end.
I know that '\[\[(.*?)\]\]' will match stuff within square brackets as group 1, but then I want to do another substitution on group 1, so that I can replace space with _.
Is that doable in a single re.sub regex expression?
You can use a function as a replacement instead of string.
>>> import re
>>> def as_link(match):
... link = match.group(1)
... return '{}'.format(link.replace(' ', '_'), link)
...
>>> text = 'This is a random text. This should be a [[link somewhere]]. And some more text at the end.'
>>> re.sub(r'\[\[(.*?)\]\]', as_link, text)
'This is a random text. This should be a link somewhere. And some more text at the end.'
You could do something like this.
import re
pattern = re.compile(r'\[\[([^]]+)\]\]')
def convert(text):
def replace(match):
link = match.group(1)
return '{}'.format(link.replace(' ', '_'), link)
return pattern.sub(replace, text)
s = 'This is a random text. This should be a [[link somewhere]]. .....'
convert(s)
See working demo
Related
in python, i'm trying to extract all time-ranges (of the form HHmmss-HHmmss) from a string. i'm using this python code.
text = "random text 0700-1300 random text 1830-2230 random 1231 text"
regex = "(.*(\d{4,10}-\d{4,10}))*.*"
match = re.search(regex, text)
this only returns 1830-2230 but i'd like to get 0700-1300 and 1830-2230. in my application there may be zero or any number of time-ranges (within reason) in the text string. i'd appreciate any hints.
Try to use re.findall to find all matches (Regex demo.):
import re
text = "random text 0700-1300 random text 1830-2230 random 1231 text"
pat = r"\b\d{4,10}-\d{4,10}\b"
for m in re.findall(pat, text):
print(m)
Prints:
0700-1300
1830-2230
Try this regex.
(\d{4}-\d{4})
All you need is pattern {four digts}minus{ four digits}. Other parts are not needed.
I want to add quotes around all hyphenated words in a string.
With an example string, the desired function add_quotes() should perform like this:
>>> s = '{name = first-name}'
>>> add_quotes(s)
{name = "first-name"}
I know how to find all occurances of hyphenated works using this Regex selector, but don't know how to add quotes around each of those occurances in the original string.
>>> import re
>>> s = '{name = first-name}'
>>> re.findall(r'\w+(?:-\w+)+', s)
['first-name']
Regex can be used to do this with Python Module re from the standard library.
import re
def add_quotes(s):
return re.sub(r'\w+(?:-\w+)+', r'"\g<0>"', s)
s = '{name = first-name}'
add_quotes(s) # returns '{name = "first-name"}'
where the occurances of hyphenated words are found using this selector.
I want to convert time separator from the French way to a more standard way:
"17h30" becomes "17:30"
"9h" becomes "9:00"
Using regexp I can transform 17h30 to 17:30 but I did not find an elegant way of transforming 9h into 9:00
Here's what I did so far:
import re
texts = ["17h30", "9h"]
hour_regex = r"(\d?\d)h(\d\d)?"
[re.sub(hour_regex, r"\1:\2", txt) for txt in texts]
>>> ['17:30', '9:']
What I want to do is "if \2 did not match anything, write 00".
PS: Of course I could use a more detailed regex like "([12]?\d)h[0123456]\d" to be more precise when matching hours, but this is not the point here.
Effectively with re.compile function and or condition:
import re
texts = ["17h30", "9h"]
hour_regex = re.compile(r"(\d{1,2})h(\d\d)?")
res = [hour_regex.sub(lambda m: f'{m.group(1)}:{m.group(2) or "00"}', txt)
for txt in texts]
print(res) # ['17:30', '9:00']
You can do a slight (crooked) way:
import re
texts = ["17h30", "9h"]
hour_regex = r"(\d?\d)h(\d\d)?"
print([re.sub(r':$', ':00', re.sub(hour_regex, r"\1:\2", txt)) for txt in texts])
# ['17:30', '9:00']
I want to find a pattern and replace it with another
Suppose i have:
"Name":"hello"
And want to do this
Name= "hello"
Using python regex
The string could be anything inside double quotes so i need to find pattern "": "" and replace it with =""
This expression,
^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$
has two capturing groups:
([^"]+?)
for collecting our desired data. Then, we would simply re.sub.
In this demo, the expression is explained, if you might be interested.
Test
import re
result = re.sub('^"\s*([^"]+?)\s*"\s*:\s*"?([^"]+)"?$', '\\1= "\\2"', '" Name ":" hello "')
print(result)
Why not use this regex:
import re
s = '"Name":"hello"'
print(re.sub('"(.*)":"(.*)"', '\\1= \"\\2\"', s))
Output:
Name= "hello"
Explanation here.
For strings containing more than one of those kind of strings, you would need to add some python code to it:
import re
s = '"Name":"hello", "Name2":"hello2"'
print(re.sub('"(.*?)":"(.*?)"', '\\1= \"\\2\"', s))
Output:
Name= "hello", Name2= "hello2"
Using pure Python, this is as simple as:
s = '"Name":"hello"'
print(s.replace(':', '= ').replace('"', '', 2))
# Name= "hello"
I have a string with tagged elements inside. I want to remove the tags and add some characters to the content inside the tags.
s = 'Hello there <something>, this is more text <tagged content>'
result = 'Hello there somethingADDED, this is more text tagged contentADDED
So far, I've tried
import re
result = re.search('\<(.*)\>', s)
result = result.group(1)
and s = s.split('>') and regex each substring one by one, but it doesn't seem like the correct or efficient way of doing this.
Use back-reference \1.
x="Hello there <something>, this is more text <tagged content>"
print re.sub(r"<([^>]*)>",r"\1added",x)
Output :Hello there somethingadded, this is more text tagged contentadded