Extract date from string in python - python

How can I extract "20151101" (as string) from "Campaign on 01.11.2015"?
I have read "Extracting date from a string in Python", but I am getting stuck when converting from a Match object to a string.

With minor tweaks to the code in the aforementioned post, you can get it to work:
import re
from datetime import datetime
text = "Campaign on 01.11.2015"
match = re.search(r'\d{2}\.\d{2}\.\d{4}', text)  # escape the dots so they match literal periods
date = datetime.strptime(match.group(), '%d.%m.%Y').date()
print(str(date).replace("-", ""))
20151101
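Since the question is about getting from a Match object to a string, here is a minimal sketch that wraps the same steps in a helper. The function name extract_date is hypothetical and not from the original post. It returns None when no date is found, because re.search returns None in that case and calling .group() on it would raise AttributeError.

```python
import re
from datetime import datetime

def extract_date(text):
    """Return the first dd.mm.yyyy date in text as 'yyyymmdd', or None."""
    match = re.search(r'\d{2}\.\d{2}\.\d{4}', text)
    if match is None:
        # re.search returns None when nothing matches
        return None
    # match.group() is the matched substring, e.g. '01.11.2015'
    parsed = datetime.strptime(match.group(), '%d.%m.%Y')
    return parsed.strftime('%Y%m%d')

print(extract_date("Campaign on 01.11.2015"))  # 20151101
print(extract_date("No date here"))            # None
```

Note that strptime raises ValueError if the digits match the pattern but are not a real date, so you may want to catch that as well.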

Here is one way, using re.sub():
import re
s = "Campaign on 01.11.2015"
new_s = re.sub(r"Campaign on (\d+)\.(\d+)\.(\d+)", r'\3\2\1', s)
print(new_s)  # 20151101
And another, using re.match():
import re
s = "Campaign on 01.11.2015"
match = re.match(r"Campaign on (\d+)\.(\d+)\.(\d+)", s)
new_s = match.group(3)+match.group(2)+match.group(1)
print(new_s)  # 20151101

A slightly more robust regex: .*?\b(\d{2})\.(\d{2})\.(\d{4})\b
(nn.nn.nnnn format with word boundaries)
Replacement string: \3\2\1
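In Python, that pattern and replacement string could be applied with re.sub() roughly like this. This is only a sketch; the variable names and the count=1 argument are my own choices, not part of the answer above.

```python
import re

s = "Campaign on 01.11.2015"
# .*? lazily consumes the leading text; \b keeps longer digit runs from matching
pattern = r'.*?\b(\d{2})\.(\d{2})\.(\d{4})\b'
# count=1 limits the substitution to the first date found
new_s = re.sub(pattern, r'\3\2\1', s, count=1)
print(new_s)  # 20151101
```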

Let's get crazy :D
a = "Campaign on 01.11.2015"
"".join(reversed(a.split()[-1].split(".")))

With the magic of list slicing:
In [15]: ''.join(a.split()[-1].split('.')[::-1])
Out[15]: '20151101'

Related

I want to extract data using regular expression in python

I have a string = "ProductId%3D967164%26Colour%3Dbright-royal" and I want to extract data using a regex so that the output is 967164bright-royal.
I have tried (?:ProductId%3D|Colour%3D)(.*) in Python, but I get 967164%26Colour%3Dbright-royal as output.
Can anyone please help me find a regex for this?
You don't need a regex here; use the urllib.parse module:
from urllib.parse import parse_qs, unquote
qs = "ProductId%3D967164%26Colour%3Dbright-royal"
d = parse_qs(unquote(qs))
print(d)
# Output:
{'ProductId': ['967164'], 'Colour': ['bright-royal']}
Final output:
>>> ''.join(i[0] for i in d.values())
'967164bright-royal'
Update
>>> ''.join(re.findall(r'%3D(\S*?)(?=%26|$)', qs))
'967164bright-royal'
The alternation matches only at the first part, and the greedy (.*) then consumes the rest of the string. You cannot get a single match for two separate parts of the string.
If you want to capture both values using a regex in a capture group:
(?:ProductId|Colour)%3D(\S*?)(?=%26|$)
import re
pattern = r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)"
s = "ProductId%3D967164%26Colour%3Dbright-royal"
print(''.join(re.findall(pattern, s)))
Output
967164bright-royal
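To see why the original attempt returned too much, this sketch runs the pattern from the question next to the lookahead version. The variable names greedy and lazy are just for illustration.

```python
import re

s = "ProductId%3D967164%26Colour%3Dbright-royal"

# Original attempt: (.*) is greedy, so after the first key it runs to the end
greedy = re.findall(r"(?:ProductId%3D|Colour%3D)(.*)", s)
print(greedy)  # ['967164%26Colour%3Dbright-royal']

# Lazy quantifier plus lookahead: each value stops at %26 or at the end
lazy = re.findall(r"(?:ProductId|Colour)%3D(\S*?)(?=%26|$)", s)
print(lazy)  # ['967164', 'bright-royal']
```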
If you must use a regular expression and you can guarantee that the string will always be formatted the way you expect, you could try this.
import re
pattern = r"ProductId%3D(\d+)%26Colour%3D(.*)"
string = "ProductId%3D967164%26Colour%3Dbright-royal"
matches = re.match(pattern, string)
print(f"{matches[1]}{matches[2]}")

Remove String between two characters for all occurrences

I am looking for help on string manipulation in Python 3.
Input String
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
Desired Output
s = "ID,FIRST_NM,LAST_NM,FILLER1"
Basically, the objective is to remove anything between space and comma at all occurrences in the input string.
Any help is much appreciated
Using a simple regex:
import re
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
res = re.sub(r'\s\w+', '', s)  # raw string avoids invalid-escape warnings
print(res)
# output ID,FIRST_NM,LAST_NM,FILLER1
You can use a regex with a lookahead:
import re
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"
s = ','.join(re.findall(r'\w+(?= \w+)', s))  # keep each word that is followed by " type"
print(s)
Output:
ID,FIRST_NM,LAST_NM,FILLER1
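If every field really has the form "name type", a plain split may be enough. This is a sketch without regex, assuming fields are separated by commas and none of them is empty; the names list is my own helper variable.

```python
s = "ID bigint,FIRST_NM string,LAST_NM string,FILLER1 string"

# Split into column definitions, then keep the first word of each
names = [col.split()[0] for col in s.split(",")]
res = ",".join(names)
print(res)  # ID,FIRST_NM,LAST_NM,FILLER1
```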

Replacing regex with optional pattern

I want to convert time separator from the French way to a more standard way:
"17h30" becomes "17:30"
"9h" becomes "9:00"
Using regexp I can transform 17h30 to 17:30 but I did not find an elegant way of transforming 9h into 9:00
Here's what I did so far:
import re
texts = ["17h30", "9h"]
hour_regex = r"(\d?\d)h(\d\d)?"
[re.sub(hour_regex, r"\1:\2", txt) for txt in texts]
>>> ['17:30', '9:']
What I want to do is "if \2 did not match anything, write 00".
PS: Of course I could use a more detailed regex like "([12]?\d)h[0123456]\d" to be more precise when matching hours, but this is not the point here.
You can do this with a compiled pattern and a replacement function that falls back to "00" via `or`:
import re
texts = ["17h30", "9h"]
hour_regex = re.compile(r"(\d{1,2})h(\d\d)?")
res = [hour_regex.sub(lambda m: f'{m.group(1)}:{m.group(2) or "00"}', txt)
       for txt in texts]
print(res) # ['17:30', '9:00']
A slightly hacky alternative is to patch the trailing colon afterwards:
import re
texts = ["17h30", "9h"]
hour_regex = r"(\d?\d)h(\d\d)?"
print([re.sub(r':$', ':00', re.sub(hour_regex, r"\1:\2", txt)) for txt in texts])
# ['17:30', '9:00']
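The question mentions in passing that the hour pattern could be stricter. Here is one possible sketch that combines the replacement-function idea with range checks. The exact ranges (hours 0-23, minutes 00-59), the word boundaries, and the helper name to_colon_time are my own assumptions.

```python
import re

# Hours 0-23, optional minutes 00-59; \b avoids matching inside longer tokens
hour_regex = re.compile(r"\b([01]?\d|2[0-3])h([0-5]\d)?\b")

def to_colon_time(text):
    # m.group(2) is None when the minutes are absent, so fall back to "00"
    return hour_regex.sub(lambda m: f"{m.group(1)}:{m.group(2) or '00'}", text)

print(to_colon_time("17h30"))  # 17:30
print(to_colon_time("9h"))     # 9:00
print(to_colon_time("25h99"))  # 25h99 (left unchanged)
```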

regex to extract data between quotes

As the title says, the string is '="24digit number"' and I want to extract the number between the quotes (example: ="000021484123647598423458" should get me '000021484123647598423458').
There are answers explaining how to get data between quotes, but in my case I also need to confirm that =" exists, without capturing it (there are other "\d{24}" strings too, but they are for other things).
I couldn't modify these answers to get what I need.
My latest regex was ((?<=\")\d{24}(?=\")) and string is ="000021484123647598423458".
UPDATE: I think I will settle with pattern r'^(?:\=\")(\d{24})(?:\")' because I just want to capture digit characters.
import re
word = '="000021484123647598423458"'
pattern = r'^(?:\=\")(\d{24})(?:\")'
match = re.findall(pattern, word)[0]
Thank you all for the suggestions.
You could write it like this:
=(['"])(\d{24})\1
In Python:
import re
string = '="000021484123647598423458"'
rx = re.compile(r'''=(['"])(\d{24})\1''')
print(rx.search(string).group(2))
# 000021484123647598423458
Any one of the following works (note that each returns the match including its surrounding quotes):
>>> st = '="000021484123647598423458"'
>>> import re
>>> re.findall(r'".*\d+.*"',st)
['"000021484123647598423458"']
or
>>> re.findall(r'".*\d{24}.*"',st)
['"000021484123647598423458"']
or
>>> re.findall(r'"\d{24}"',st)
['"000021484123647598423458"']
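The question also asks that =" be present but not captured, and that other quoted 24-digit strings be ignored. One way to sketch that is with a lookbehind and a lookahead, so that findall returns only the digits. The sample text below is made up for illustration.

```python
import re

# Made-up sample: the first quoted number has no '=' in front of it
text = 'other "' + '1' * 24 + '" id="000021484123647598423458"'

# (?<==") requires =" just before the digits; (?=") requires a closing quote
pattern = r'(?<==")\d{24}(?=")'
print(re.findall(pattern, text))  # ['000021484123647598423458']
```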

python String Pattern Matching

I have two string expressions: #RequestMapping(value = "/list/base/info") or #RequestMapping("/list/base/info")
How do I get the URI value /list/base/info using string pattern matching?
In this case you can try getting it by split:
expression.split("\"")[1]
import re
re.match(
    r'#RequestMapping\((value\s*=\s*)?"([^"]+)"\)',
    '#RequestMapping(value = "/list/base/info")'
).group(2)
That would output:
'/list/base/info'
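To check that one pattern covers both forms from the question, here is a small sketch. It uses a non-capturing group for the optional "value =" prefix so the URI lands in group 1; that change and the variable names are my own.

```python
import re

# Optional 'value = ' prefix; the URI is captured in group 1
pattern = re.compile(r'#RequestMapping\((?:value\s*=\s*)?"([^"]+)"\)')

expressions = [
    '#RequestMapping(value = "/list/base/info")',
    '#RequestMapping("/list/base/info")',
]
for expression in expressions:
    match = pattern.search(expression)
    # search returns None if the expression has neither form
    print(match.group(1) if match else None)
```

Both lines should print /list/base/info.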
