I have the following options, but all of them return the full string. I need to remove the date at the beginning with a regex.
d = re.match('^\d{2}.\d{2}.\d{4}(.*?)$', '01.10.2018Any text..')
d = re.match('^[0-9]{2}.[0-9]{2}.[0-9]{4}(.*?)$', '01.10.2018Any text..')
How can I do that? I'm using Python 3.6.
You could use re.sub to match the date-like pattern at the start of the string, ^\d{2}\.\d{2}\.\d{4}, and replace it with an empty string. (Note that this does not validate the date.)
And as @UnbearableLightness mentioned, you have to escape the dot (\.) if you want to match it literally.
import re
result = re.sub(r'^\d{2}\.\d{2}\.\d{4}', '', '01.10.2018Any text..')
print(result) # Any text..
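To see why the escape matters, here is a small sketch comparing the unescaped and escaped patterns. The sample input is made up for illustration and is deliberately not a dotted date.

```python
import re

loose = r'^\d{2}.\d{2}.\d{4}'     # '.' matches any character
strict = r'^\d{2}\.\d{2}\.\d{4}'  # '\.' matches only a literal dot

sample = '01x10y2018Any text..'   # hypothetical input, not a dotted date

loose_result = re.sub(loose, '', sample)    # prefix is removed anyway
strict_result = re.sub(strict, '', sample)  # no match, string is unchanged

print(loose_result)
print(strict_result)
```

With the unescaped dot the prefix is stripped even though it is not a date; with the escaped dot the string is left alone.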
Grab the first group of the match:
>>> d = re.match(r'^\d{2}\.\d{2}\.\d{4}(.*?)$', '01.10.2018Any text..').group(1)
>>> print(d)
Any text..
If you are not sure whether there will be a match, check for one first:
>>> s = '01.10.2018Any text..'
>>> match = re.match(r'^\d{2}\.\d{2}\.\d{4}(.*?)$', s)
>>> d = match.group(1) if match else s
>>> print(d)
Any text..
Use a group to extract the date part:
d = re.match(r'^(\d{2}\.\d{2}\.\d{4})(.*?)$', '01.10.2018Any text..')
if d:
    print(d.group(1))
    print(d.group(2))
Group 0 is the whole match. I added a pair of parentheses around the date in the regex; this is group 1. Group 2 is the text after the date, which is what you're after.
I have strings like the ones below:
>>> s1
'this_is-a.string-123-with.number'
>>> s2
'this_is-a123.456string-123-with.number'
>>> s3
'one-0more-str.999'
I need to get everything before the all-digit token (all digits, not alphanumeric) after splitting, so this_is-a.string- from s1, this_is-a123.456string- from s2 and one-0more-str. from s3.
>>> for a in re.split(r'-|_|\.', s2):
...     if a.isdigit():
...         r = re.split(a, s2)[0]
...         break
>>> print(r)
# expected: this_is-a123.456string-
# got: this_is-a
The above piece of code works for s1 but not for s2, because 123 also matches inside a123 in s2. Is there a better, more Pythonic way?
More info:
With the s3 example, when we split with -, _ or . as the delimiter, 999 is the only all-digit token, so everything before it, one-0more-str., needs to be printed. Taking s2 as an example, after splitting on dash, underscore or dot, 123 is the all-digit token (isdigit), so we take everything before it, which is this_is-a123.456string-. Likewise, if the input string is this_1s-a-4.test, the output should be this_1s-a-, because 4 is the all-digit token after splitting.
This will work for your example cases:
def fn(s):
    return re.match(r"(.*?[-_.]|^)\d+([-_.]|$)", s).group(1)
(^ and $ match the beginning and end of the string respectively and the ? in .*? does a non-greedy match.)
Some more cases:
>>> fn("111")
''
>>> fn(".111")
'.'
>>> fn(".1.11")
'.'
You might also want to think about what you want to get if there is no group of all numbers:
>>> fn("foobar")
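For an input like "foobar", re.match returns None, so calling .group(1) on it raises AttributeError. One possible way to handle that, sketched below, is to return a fallback value instead. The name fn_safe and the default parameter are made up for this illustration.

```python
import re

def fn_safe(s, default=None):
    """Return the text before the first all-digit token, or `default` if there is none."""
    m = re.match(r"(.*?[-_.]|^)\d+([-_.]|$)", s)
    return m.group(1) if m else default

print(fn_safe("this_is-a123.456string-123-with.number"))
print(fn_safe("foobar"))
```

Whether None, an empty string, or the whole input is the right fallback depends on what the caller expects.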
Not sure it will work in all cases, but you can try:
for a in reversed(re.split(r'-|_|\.', s2)):
    if a.isdigit():
        r = s2.rsplit(a, 1)[0]
        break
print(r)
This works for your examples.
Code
import re

def parse(s):
    """Split on runs of digits, then keep everything
    before the last run of digits."""
    return ''.join(re.split(r'(\d+)', s)[:-2])
Tests
Using specified strings
for t in ['this_is-a.string-123-with.number',
          'this_is-a123.456string-123-with.number',
          'one-0more-str.999']:
    print(f'{parse(t)}')
Output
this_is-a.string-
this_is-a123.456string-
one-0more-str.
Explanation
String
s = 'this_is-a123.456string-123-with.number'
Split on group of digits
re.split(r'(\d+)', s)
Out: ['this_is-a', '123', '.', '456', 'string-', '123', '-with.number']
Leave out last two items in split
re.split(r'(\d+)', s)[:-2] # [:-2] slice dropping last two items of list
Out: ['this_is-a', '123', '.', '456', 'string-']
Join list into string
''.join(re.split(r'(\d+)', s)[:-2]) # join items
Out: this_is-a123.456string-
If I understood correctly what you want, you can use a single regular expression to get the values you need:
import re
s1='this_is-a.string-123-with.number'
s2='this_is-a123.456string-123-with.number'
s3='one-0more-str.999'
# matches any group that is in between "all numbers"...
regex = re.compile(r'(.*[-\._])\d+([-\._].*)?')
m = regex.match(s1)
print(m.groups())
m = regex.match(s2)
print(m.groups())
m = regex.match(s3)
print(m.groups())
When you run this, the result is the following:
('this_is-a.string-', '-with.number')
('this_is-a123.456string-', '-with.number')
('one-0more-str.', None)
If you are interested only in the first group you can use only:
>>> print(m.group(1))
one-0more-str.
If you want to filter for the cases where there is no second group:
>>> print([i for i in m.groups() if i])
['one-0more-str.']
How to split a string using regex
Input:
result = '1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM'
I want to split this so that I can store the pieces in separate strings for further use, like the following.
The output should be:
a = 1,000.03AM, b = 2,97.23, c = 23,089.30, d = 1,903.23, e = 0.00, f = 34,928.99, g = 11,24.30AM
I have tried this, but it gives the wrong output:
import re
print(re.findall(r'[0-9.]+|[^0-9.]', result))
You may extract the strings using
re.findall(r'\d+(?:,\d+)*(?:\.\d{2})?[^,\d]*', text)
Details
\d+ - 1+ digits
(?:,\d+)* - 0 or more repetitions of a comma and 1+ digits
(?:\.\d{2})? - an optional occurrence of a dot and 2 digits
[^,\d]* - any 0 or more chars other than a comma and digit.
Python demo:
import re
text = "1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM"
print( re.findall(r'\d+(?:,\d+)*(?:\.\d{2})?[^,\d]*', text) )
# => ['1,000.03AM', '2,97.23', '23,089.30', '1,903.23', '0.00', '34,928.99', '11,24.30AM']
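Since the question asks for the pieces in separate names a through g, here is a minimal sketch of two ways to do that with the list from findall. The names parts and named are made up for this example, and tuple unpacking assumes the input always yields exactly seven values.

```python
import re

text = "1,000.03AM2,97.2323,089.301,903.230.0034,928.9911,24.30AM"
parts = re.findall(r'\d+(?:,\d+)*(?:\.\d{2})?[^,\d]*', text)

# Tuple unpacking: raises ValueError if there are not exactly seven parts.
a, b, c, d, e, f, g = parts

# A dict keyed by letter is more forgiving if the number of parts can vary.
named = dict(zip("abcdefg", parts))

print(a, g)
print(named["e"])
```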
For your result you need the following regex:
re.findall(r"[\d,]+\.\d{2}(?:AM)?", result)
This produces the following:
['1,000.03AM', '2,97.23', '23,089.30', '1,903.23', '0.00', '34,928.99', '11,24.30AM']
Regex explanation:
[\d,] - matches digits and commas
[\d,]+\.\d{2} - matches the whole float value (with two digits after the dot)
(?:AM)? - matches an optional AM in a non-capturing group; in the examples below I use (?=AM)? so that AM is not included in the result
If something other than AM can appear in that position, you can change (?:AM) to (?:AM|Other|...)
If you need to parse the values as floats, I have two suggestions for you. The first is removing the commas:
list(map(lambda x: float(x.replace(",", "")), re.findall(r"[\d,]+\.\d{2}(?=AM)?", result)))
Result:
[1000.03, 297.23, 23089.3, 1903.23, 0.0, 34928.99, 1124.3]
Another variant is using locale:
>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US.UTF8')
'en_US.UTF8'
>>> list(map(lambda x: locale.atof(x), re.findall(r"[\d,]+\.\d{2}(?=AM)?", result)))
[1000.03, 297.23, 23089.3, 1903.23, 0.0, 34928.99, 1124.3]
Provided that the string length and the width of each field stay the same, the most efficient solution would be plain slicing:
a = result[0:10]
b = result[10:17]
c = result[17:26]
d = result[26:34]
e = result[34:38]
f = result[38:47]
g = result[47:]
Hope this helps.
I am trying to find the following in the string TreeModel/Node/Node[1]/Node[4]/Node[1]:
TreeModel/Node
TreeModel/Node/Node[1]
TreeModel/Node/Node[1]/Node[4]
TreeModel/Node/Node[1]/Node[4]/Node[1]
using a regular expression in Python. Here is the code I tried:
string = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'
pattern = r'.+?Node\[[1-9]\]'
print(re.findall(pattern=pattern, string=string))
#result : ['TreeModel/Node/Node[1]', '/Node[4]', '/Node[1]']
#expected result : ['TreeModel/Node', 'TreeModel/Node/Node[1]', 'TreeModel/Node/Node[1]/Node[4]', 'TreeModel/Node/Node[1]/Node[4]/Node[1]']
You can use split here:
>>> s = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'
>>> split_s = s.split('/')
>>> ['/'.join(split_s[:i]) for i in range(2, len(split_s)+1)]
['TreeModel/Node',
'TreeModel/Node/Node[1]',
'TreeModel/Node/Node[1]/Node[4]',
'TreeModel/Node/Node[1]/Node[4]/Node[1]']
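The same prefixes can also be built incrementally rather than re-joining a slice for every length. The sketch below uses itertools.accumulate for that; it is an alternative to the list comprehension above, not the answerer's method, and the name prefixes is made up.

```python
from itertools import accumulate

s = 'TreeModel/Node/Node[1]/Node[4]/Node[1]'

# accumulate yields 'TreeModel', 'TreeModel/Node', ... one segment at a time;
# [1:] drops the bare 'TreeModel' so the list starts at 'TreeModel/Node'.
prefixes = list(accumulate(s.split('/'), lambda acc, part: acc + '/' + part))[1:]

for p in prefixes:
    print(p)
```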
You can also use regex:
for i in range(2, s.count('/')+2):
    s_ = '[^/]+/*'
    regex = re.search(r'('+s_*i+')', s).group(0)
    print(regex)
TreeModel/Node/
TreeModel/Node/Node[1]/
TreeModel/Node/Node[1]/Node[4]/
TreeModel/Node/Node[1]/Node[4]/Node[1]
I'm not good at Python at all, but for the regex part, given the specific structure of your string, the regex below matches each segment:
/?(?:{[^{}]*})?[^/]+
Here the braces and the preceding / are optional. It matches a slash (if any), then braces with their content (if any), then the rest up to the next slash.
Python code:
matches = re.findall(r'/?(?:{[^{}]*})?[^/]+', string)
output = ''
for i in range(len(matches)):
    output += matches[i]
    print(output)
I want to execute substitutions using regex, not for all matches but only for specific ones. However, re.sub substitutes for all matches. How can I do this?
Here is an example.
Say, I have a string with the following content:
FOO=foo1
BAR=bar1
FOO=foo2
BAR=bar2
BAR=bar3
What I want to do is this:
re.sub(r'^BAR', '#BAR', s, index=[1,2], flags=re.MULTILINE)
to get the below result.
FOO=foo1
BAR=bar1
FOO=foo2
#BAR=bar2
#BAR=bar3
You could pass a replacement function to re.sub that keeps track of the match count and checks whether the given index should be substituted:
import re
s = '''FOO=foo1
BAR=bar1
FOO=foo2
BAR=bar2
BAR=bar3'''
i = 0
index = {1, 2}
def repl(x):
    global i
    if i in index:
        res = '#' + x.group(0)
    else:
        res = x.group(0)
    i += 1
    return res
print(re.sub(r'^BAR', repl, s, flags=re.MULTILINE))
Output:
FOO=foo1
BAR=bar1
FOO=foo2
#BAR=bar2
#BAR=bar3
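The global counter means repl can only be used once per run without resetting i. A possible variant, sketched here, keeps the counter inside a closure so each call starts from zero. The helper name sub_at and its parameters are made up for this example; re.sub itself has no index argument.

```python
import re
from itertools import count

def sub_at(pattern, prefix, text, indexes, flags=0):
    """Prepend `prefix` only to the matches whose 0-based position is in `indexes`."""
    counter = count()  # fresh counter for every call

    def repl(m):
        # next(counter) is the position of the current match among all matches
        if next(counter) in indexes:
            return prefix + m.group(0)
        return m.group(0)

    return re.sub(pattern, repl, text, flags=flags)

s = '''FOO=foo1
BAR=bar1
FOO=foo2
BAR=bar2
BAR=bar3'''

result = sub_at(r'^BAR', '#', s, {1, 2}, flags=re.MULTILINE)
print(result)
```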
You could
Split your string using s.splitlines()
Iterate over the individual lines in a for loop
Track how many matches you have found so far
Only perform substitutions on those matches in the numerical ranges you want (e.g. matches 1 and 2)
And then join them back into a single string (if need be).
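The steps above can be sketched roughly as follows. The function name comment_out_matches and its parameters are made up for this illustration, and match positions are counted from zero.

```python
import re

def comment_out_matches(text, pattern, indexes, prefix="#"):
    """Prefix only the matching lines whose 0-based match number is in `indexes`."""
    out = []
    seen = 0  # how many matching lines have been found so far
    for line in text.splitlines():
        if re.match(pattern, line):
            if seen in indexes:
                line = prefix + line
            seen += 1
        out.append(line)
    return "\n".join(out)

s = """FOO=foo1
BAR=bar1
FOO=foo2
BAR=bar2
BAR=bar3"""

result = comment_out_matches(s, r"BAR", {1, 2})
print(result)
```

Because each line is tested on its own with re.match, the pattern needs neither ^ nor re.MULTILINE here.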
I have some strings that look like this
S25m\S25m_16Q_-2dB.png
S25m\S25m_1_16Q_0dB.png
S25m\S25m_2_16Q_2dB.png
I want to get the string between slash and the last underscore, and also the string between last underscore and extension, so
Desired:
[S25m_16Q, S25m_1_16Q, S25m_2_16Q]
[-2dB, 0dB, 2dB]
I was able to get the whole thing between slash and extension by doing
foo = r"S25m\S25m_16Q_-2dB.png"
match = re.search(r'([a-zA-Z0-9_-]*)\.(\w+)', foo)
match.group(1)
But I don't know how to make a pattern so I could split it by the last underscore.
Capture the groups you want to get.
>>> re.search(r'([-\w]*)_([-\w]+)\.\w+', r"S25m\S25m_16Q_-2dB.png").groups()
('S25m_16Q', '-2dB')
>>> re.search(r'([-\w]*)_([-\w]+)\.\w+', r"S25m\S25m_1_16Q_0dB.png").groups()
('S25m_1_16Q', '0dB')
>>> re.search(r'([-\w]*)_([-\w]+)\.\w+', r"S25m\S25m_2_16Q_2dB.png").groups()
('S25m_2_16Q', '2dB')
* matches the previous character set greedily (consumes as many as possible); it continues to the last _ since \w includes letters, numbers, and underscore.
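To make the effect of greediness concrete, here is a small sketch that runs the same pattern with a greedy and a non-greedy first group. The variable names are made up for the comparison.

```python
import re

name = r"S25m\S25m_1_16Q_0dB.png"

# Greedy: the first group runs as far as it can, so the split is at the LAST underscore.
greedy = re.search(r'([-\w]*)_([-\w]+)\.\w+', name).groups()

# Non-greedy: the first group stops as early as it can, so the split is at the FIRST underscore.
lazy = re.search(r'([-\w]*?)_([-\w]+)\.\w+', name).groups()

print(greedy)
print(lazy)
```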
>>> list(zip(*[m.groups() for m in re.finditer(r'([-\w]*)_([-\w]+)\.\w+', r'''
... S25m\S25m_16Q_-2dB.png
... S25m\S25m_1_16Q_0dB.png
... S25m\S25m_2_16Q_2dB.png
... ''')]))
[('S25m_16Q', 'S25m_1_16Q', 'S25m_2_16Q'), ('-2dB', '0dB', '2dB')]
A non-regex solution (albeit rather messy):
>>> import os
>>> s = r"S25m\S25m_16Q_-2dB.png"
>>> first, _, last = s.partition("\\")[2].rpartition('_')
>>> print((first, os.path.splitext(last)[0]))
('S25m_16Q', '-2dB')
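Another non-regex option, sketched here on the assumption that the strings are always Windows-style paths, is pathlib.PureWindowsPath, which treats the backslash as a separator on any platform. The helper name split_name is made up for this example.

```python
from pathlib import PureWindowsPath

def split_name(path_str):
    """Return (text before the last underscore, text after it) of the file name, without extension."""
    stem = PureWindowsPath(path_str).stem  # file name without directory and extension
    first, _, last = stem.rpartition('_')
    return first, last

pairs = [split_name(p) for p in (r"S25m\S25m_16Q_-2dB.png",
                                 r"S25m\S25m_1_16Q_0dB.png",
                                 r"S25m\S25m_2_16Q_2dB.png")]
print(pairs)
```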
I know it says using re, but why not just use split?
strings = r"""S25m\S25m_16Q_-2dB.png
S25m\S25m_1_16Q_0dB.png
S25m\S25m_2_16Q_2dB.png"""
strings = strings.split("\n")
parts = []
for string in strings:
    string = string.split(".png")[0]  # get rid of the file extension
    string = string.split("\\")
    splitString = string[1].split("_")
    firstPart = "_".join(splitString[:-1])  # string between the slash and the last underscore
    parts.append([firstPart, splitString[-1]])
for line in parts:
    print(line)
['S25m_16Q', '-2dB']
['S25m_1_16Q', '0dB']
['S25m_2_16Q', '2dB']
Then just transpose the array:
for line in zip(*parts):
    print(line)
('S25m_16Q', 'S25m_1_16Q', 'S25m_2_16Q')
('-2dB', '0dB', '2dB')