Replacing regex with optional pattern - python

I want to convert the time separator from the French style to a more standard one:
"17h30" becomes "17:30"
"9h" becomes "9:00"
Using a regex I can transform 17h30 into 17:30, but I did not find an elegant way to transform 9h into 9:00.
Here's what I did so far:
import re
texts = ["17h30", "9h"]
hour_regex = r"(\d?\d)h(\d\d)?"
[re.sub(hour_regex, r"\1:\2", txt) for txt in texts]
# ['17:30', '9:']
What I want to do is "if \2 did not match anything, write 00".
PS: Of course I could use a more detailed regex like "([01]?\d|2[0-3])h([0-5]\d)?" to be more precise when matching hours and minutes, but this is not the point here.

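You can do it with a compiled pattern and a function as the replacement. m.group(2) is None when the minutes are missing, so `or "00"` supplies the default: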
import re
texts = ["17h30", "9h"]
hour_regex = re.compile(r"(\d{1,2})h(\d\d)?")
res = [hour_regex.sub(lambda m: f'{m.group(1)}:{m.group(2) or "00"}', txt)
       for txt in texts]
print(res) # ['17:30', '9:00']
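The same function replacement also works on times embedded in running text. The sketch below combines it with a stricter pattern (hours 0 to 23, minutes 00 to 59) plus word boundaries; that stricter pattern and the helper name french_to_colon are assumptions for illustration, not part of the answer above.

```python
import re

# Hypothetical stricter pattern: hours 0-23, optional minutes 00-59,
# and \b so only whole tokens like "9h" or "17h30" are rewritten.
HOUR_RE = re.compile(r"\b([01]?\d|2[0-3])h([0-5]\d)?\b")

def french_to_colon(text: str) -> str:
    """Rewrite French-style times ("9h", "17h30") as "9:00" / "17:30"."""
    # group(2) is None when the optional minutes are absent, so `or` supplies "00"
    return HOUR_RE.sub(lambda m: f"{m.group(1)}:{m.group(2) or '00'}", text)

print(french_to_colon("Meet at 9h or 17h30."))          # Meet at 9:00 or 17:30.
print(french_to_colon("25h and 9h60 are left alone"))   # unchanged
```

Because of the word boundaries and the restricted ranges, invalid tokens such as "25h" or "9h60" simply do not match and are left as they are.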

A slightly hacky alternative is to keep the original substitution and then append 00 to any result that ends with a colon:
import re
texts = ["17h30", "9h"]
hour_regex = r"(\d?\d)h(\d\d)?"
print([re.sub(r':$', ':00', re.sub(hour_regex, r"\1:\2", txt)) for txt in texts])
# ['17:30', '9:00']

Related

Regex to add quotes around hyphenated strings

I want to add quotes around all hyphenated words in a string.
Given an example string, the desired function add_quotes() should behave like this:
>>> s = '{name = first-name}'
>>> add_quotes(s)
{name = "first-name"}
I know how to find all occurrences of hyphenated words using this regex, but I don't know how to add quotes around each of those occurrences in the original string.
>>> import re
>>> s = '{name = first-name}'
>>> re.findall(r'\w+(?:-\w+)+', s)
['first-name']
You can do this with re.sub from the standard library's re module; in the replacement string, \g<0> stands for the whole match:
import re
def add_quotes(s):
    return re.sub(r'\w+(?:-\w+)+', r'"\g<0>"', s)
s = '{name = first-name}'
add_quotes(s) # returns '{name = "first-name"}'
The hyphenated words are matched with the same pattern as in the findall call above.
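Since re.sub replaces every non-overlapping match, the same pattern also handles several hyphenated words and words with more than one hyphen. A minimal sketch with a made-up input string:

```python
import re

# Same pattern as above: a word followed by one or more "-word" parts
HYPHENATED = re.compile(r"\w+(?:-\w+)+")

def add_quotes(s: str) -> str:
    # \g<0> is the whole match, so each hyphenated run is wrapped in quotes
    return HYPHENATED.sub(r'"\g<0>"', s)

print(add_quotes('{name = first-name, id = user-id-42, plain = word}'))
# {name = "first-name", id = "user-id-42", plain = word}
```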

Getting word from string

How can I get the word "example" from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it doesn't work correctly if the string is, for example:
"http://tests-example:123/wd/hub"
You can use lookarounds to capture the value preceded by "-" and followed by ":" with this regex:
(?<=-).+(?=:)
Python code:
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Output:
example
A non-regex way to get the same result is to split twice:
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints:
example
Since you know "example" is a 7-letter word, you can use this non-regex approach:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]
There are many ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
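Because the input is a URL, another option (not from the answers here) is to let urllib.parse split off the scheme, port, and path first, so the colons in "http:" and ":123" never need handling by hand. The helper name word_after_hyphen is hypothetical, and the sketch assumes the wanted word is whatever follows the last hyphen in the host name.

```python
from urllib.parse import urlsplit

def word_after_hyphen(url: str) -> str:
    # urlsplit separates scheme, host, port and path; .hostname is the
    # lowercased host without the port, e.g. "test-example"
    host = urlsplit(url).hostname or ""
    # rpartition splits on the last "-"; index 2 is everything after it
    # (if there is no hyphen, the whole host name is returned)
    return host.rpartition("-")[2]

print(word_after_hyphen("http://test-example:123/wd/hub"))   # example
print(word_after_hyphen("http://tests-example:123/wd/hub"))  # example
```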
I am not sure why you want to get a particular word from a string. I guess you want to check whether this word is present in the given string.
If that is the case, the code below can be used:
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)
Split on "-" and then on ":":
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example
Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
    print(m.group())
Python strings have a built-in find method:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
This prints:
12
-1
That is the index of the found substring; -1 means the substring was not found. You can also use the in keyword:
'example' in 'http://test-example:123/wd/hub'
True
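Both in and find() test for a substring, so they also succeed when "example" is only part of a longer word. If a whole-word check is wanted (an assumption about the requirement, not something the answers state), re.search with \b word boundaries can tell the cases apart. The second URL below is made up for the comparison:

```python
import re

url = "http://test-example:123/wd/hub"
other = "http://test-counterexamples:123/wd/hub"  # hypothetical second input

# Substring checks: `in` gives a bool, find() gives an index or -1
print("example" in url, url.find("example"))      # True 12
print("example" in other, other.find("example"))  # True 19

# Whole-word check: \b needs a word boundary on both sides,
# so "counterexamples" does not count as a match
whole_word = re.compile(r"\bexample\b")
print(bool(whole_word.search(url)))    # True
print(bool(whole_word.search(other)))  # False
```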

Extract date from string in python

How can I extract "20151101" (as string) from "Campaign on 01.11.2015"?
I have read this one: Extracting date from a string in Python. But I am getting stuck when converting from a Match object to a string.
With minor tweaks in the aforementioned post, you can get it to work.
import re
from datetime import datetime
text = "Campaign on 01.11.2015"
match = re.search(r'\d{2}\.\d{2}\.\d{4}', text)
date = datetime.strptime(match.group(), '%d.%m.%Y').date()
print(str(date).replace("-", ""))
20151101
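Instead of converting the date to a string and stripping the hyphens, strftime("%Y%m%d") can produce the compact form directly. A minimal sketch, assuming the date always appears as dd.mm.yyyy; the helper name extract_yyyymmdd is hypothetical:

```python
import re
from datetime import datetime
from typing import Optional

# Escaped dots so "." only matches a literal dot
DATE_RE = re.compile(r"\b(\d{2}\.\d{2}\.\d{4})\b")

def extract_yyyymmdd(text: str) -> Optional[str]:
    match = DATE_RE.search(text)
    if match is None:
        return None  # no dd.mm.yyyy date in the text
    # strptime raises ValueError for impossible dates such as 31.02.2015
    parsed = datetime.strptime(match.group(1), "%d.%m.%Y")
    # strftime writes the compact form directly
    return parsed.strftime("%Y%m%d")

print(extract_yyyymmdd("Campaign on 01.11.2015"))  # 20151101
print(extract_yyyymmdd("No date here"))            # None
```

Parsing through datetime (rather than only rearranging the captured groups) has the side effect of validating the date.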
Here is one way, using re.sub():
import re
s = "Campaign on 01.11.2015"
new_s = re.sub(r"Campaign on (\d+)\.(\d+)\.(\d+)", r'\3\2\1', s)
print(new_s)
And another, using re.match():
import re
s = "Campaign on 01.11.2015"
match = re.match(r"Campaign on (\d+)\.(\d+)\.(\d+)", s)
new_s = match.group(3)+match.group(2)+match.group(1)
print(new_s)
A slightly more robust regex: .*?\b(\d{2})\.(\d{2})\.(\d{4})\b
(nn.nn.nnnn format with word boundaries)
Replacement string: \3\2\1
Let's get crazy :D
a = "Campaign on 01.11.2015"
"".join(reversed(a.split()[-1].split(".")))
Or with list slicing:
In [15]: ''.join(a.split()[-1].split('.')[::-1])
Out[15]: '20151101'

Python Regular Expression: Replace Within a group

Is there a way to do substitution on a group?
Say I am trying to insert a link into text, based on custom formatting. So, given something like this:
This is a random text. This should be a [[link somewhere]]. And some more text at the end.
I want to end up with
This is a random text. This should be a <a href="link_somewhere">link somewhere</a>. And some more text at the end.
I know that '\[\[(.*?)\]\]' will match stuff within square brackets as group 1, but then I want to do another substitution on group 1, so that I can replace spaces with _.
Is that doable in a single re.sub call?
You can use a function as the replacement instead of a string.
>>> import re
>>> def as_link(match):
...     link = match.group(1)
...     return '<a href="{}">{}</a>'.format(link.replace(' ', '_'), link)
...
>>> text = 'This is a random text. This should be a [[link somewhere]]. And some more text at the end.'
>>> re.sub(r'\[\[(.*?)\]\]', as_link, text)
'This is a random text. This should be a <a href="link_somewhere">link somewhere</a>. And some more text at the end.'
You could do something like this:
import re
pattern = re.compile(r'\[\[([^]]+)\]\]')
def convert(text):
    def replace(match):
        link = match.group(1)
        return '<a href="{}">{}</a>'.format(link.replace(' ', '_'), link)
    return pattern.sub(replace, text)
s = 'This is a random text. This should be a [[link somewhere]]. .....'
convert(s)
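The replacement function can also be written inline as a lambda, and re.sub calls it once per [[...]] occurrence. A sketch assuming the link should be rendered as an HTML anchor with underscores in the href only; the helper name to_html_links is hypothetical:

```python
import re

# [^\]]+ stops at the first "]", so two links on one line stay separate
LINK_RE = re.compile(r"\[\[([^\]]+)\]\]")

def to_html_links(text: str) -> str:
    # group(1) is the text inside [[...]]; spaces become "_" in the href,
    # while the visible link text is kept unchanged
    return LINK_RE.sub(
        lambda m: '<a href="{}">{}</a>'.format(m.group(1).replace(" ", "_"), m.group(1)),
        text,
    )

print(to_html_links("See [[link somewhere]] and [[another page]]."))
```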

Regex: Replace one pattern with another

I am trying to replace one regex pattern with another regex pattern.
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile('\d+x\d+') # for st_srt
re.sub(pattern, 'S\1E\2',st_srt)
I know the use of S\1E\2 is wrong here. The reason I am using \1 and \2 is to capture the values 01 and 02 and use them in S\1E\2.
My desired output is:
st_srt = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
So, what is the correct way to achieve this?
You need to capture what you're trying to preserve. Try this:
pattern = re.compile(r'(\d+)x(\d+)') # for st_srt
st_srt = re.sub(pattern, r'S\1E\2', st_srt)
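If some file names use unpadded numbers such as 1x2, a function replacement can zero-pad them while building the tag. The two-digit padding and the helper name to_sxxexx are assumptions about the naming convention, not part of the question:

```python
import re

EPISODE_RE = re.compile(r"\b(\d+)x(\d+)\b")

def to_sxxexx(name: str) -> str:
    # int(...) drops any existing leading zeros; :02d pads back to two digits
    return EPISODE_RE.sub(
        lambda m: f"S{int(m.group(1)):02d}E{int(m.group(2)):02d}", name
    )

print(to_sxxexx("Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt"))
# Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt
print(to_sxxexx("Show.1x2.srt"))  # Show.S01E02.srt
```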
Well, it looks like you already accepted an answer, but I think this is what you said you're trying to do, which is get the replace string from 'st_mkv', then use it in 'st_srt':
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'Awake\.([^.]+)\.')
m = replace_pattern.match(st_mkv)
replace_string = m.group(1)
new_srt = re.sub(r'^Awake\.[^.]+\.', 'Awake.{0}.'.format(replace_string), st_srt)
print(new_srt)
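The approach above hard-codes the "Awake." prefix. A more general sketch, assuming the .mkv name contains an SxxEyy tag and the .srt name an NNxNN tag, copies the tag across without depending on the show name:

```python
import re

st_srt = "Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt"
st_mkv = "Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv"

# Assumes the .mkv name carries an SxxEyy tag
tag = re.search(r"S\d+E\d+", st_mkv)
if tag:
    # A lambda replacement inserts the tag literally (no backslash
    # processing); count=1 replaces only the first NNxNN occurrence
    new_srt = re.sub(r"\b\d+x\d+\b", lambda m: tag.group(0), st_srt, count=1)
    print(new_srt)  # Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt
```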
Try using this regex:
([\w+\.]+){5}\-\w+
Copy the strings into http://www.gskinner.com/RegExr/
and paste the regex at the top.
It captures the names of each string, leaving out the extension.
You can then go ahead and append the extension you want, to the string you want.
EDIT:
Here's what I used to do what you're after:
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'  # don't actually need this one
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
replace_pattern = re.compile(r'([\w+\.]+){5}\-\w+')
m = replace_pattern.match(st_mkv)
new_string = m.group(0)
new_string += '.srt'
>>> new_string
'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.srt'
import re
st_srt = 'Awake.01x02.iNTERNAL.WEBRiP.XViD-GeT.srt'
st_mkv = 'Awake.S01E02.iNTERNAL.WEBRiP.XViD-GeT.mkv'
pattern = re.compile(r'(\d+)x(\d+)')
st_srt_new = re.sub(pattern, r'S\1E\2', st_srt)
print(st_srt_new)
