Python regular expression: extracting a set of numbers from a string

How can I get the numbers 24 and 200 from the string "Size:24 Resp_code:200"
using re in Python? I have tried \d+, but then I only get 24.
I have also tried this:
import re
string2 = " Size:24 Resp_code:200"
regx = r"(\d+) Resp_code:(\d+)"
print(re.search(regx, string2).group(0))
print(re.search(regx, string2).group(1))
Here the output is:
24 Resp_code:200
24
Any advice on how to solve this?
Thanks in advance.

Group 0 contains the whole matched string. Extract group 1 and group 2 instead.
>>> import re
>>> string2 = " Size:24 Resp_code:200"
>>> regx = r"(\d+) Resp_code:(\d+)"
>>> match = re.search(regx, string2)
>>> match.group(1), match.group(2)
('24', '200')
>>> match.groups() # to get all groups from 1 to however many groups
('24', '200')
Or use re.findall, which returns every non-overlapping match of the pattern:
>>> re.findall(r'\d+', string2)
['24', '200']
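If the labels matter as well as the numbers, the findall approach can be extended to capture each key together with its value. This is only a sketch: the pattern (\w+):(\d+) and the variable names pairs and fields are assumptions for illustration, not part of the question.

```python
import re

string2 = " Size:24 Resp_code:200"

# The pattern has two groups, so findall returns one (key, digits) tuple per match.
pairs = re.findall(r"(\w+):(\d+)", string2)

# Convert the digit strings to integers and index them by key.
fields = {key: int(value) for key, value in pairs}
print(fields)
```

This keeps working if the fields appear in a different order, which the fixed "(\d+) Resp_code:(\d+)" pattern does not.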

Use:
print(re.search(regx, string2).group(1))  # 24
print(re.search(regx, string2).group(2))  # 200
group(0) returns the whole string matched by your regex, whereas group(1) is the first capture group and group(2) is the second.

Check the doc:
If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, group1 defaults to zero (the whole match is returned)
You are doing it right, but capture groups are numbered from 1, not 0; group(0) returns the whole match. You can also ask for several groups at once:
>>> re.search(regx, string2).group(1,2)
('24', '200')
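One thing the snippets above do not cover is input that does not match: re.search then returns None, and calling .group() on it raises AttributeError. A minimal sketch of a guarded version follows; the function name extract_numbers is hypothetical.

```python
import re

def extract_numbers(text):
    """Return (size, resp_code) as integers, or None if the text does not match."""
    match = re.search(r"(\d+) Resp_code:(\d+)", text)
    if match is None:
        return None
    # group(1, 2) returns both captures as a tuple of strings.
    size, resp_code = match.group(1, 2)
    return int(size), int(resp_code)

print(extract_numbers(" Size:24 Resp_code:200"))
print(extract_numbers("no numbers here"))
```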

Related

how to skip backslash followed by integer?

I have this regex: https://regex101.com/r/2H5ew6/1
(\!|\#)(1)
Hello!1 World
I want to keep the first mark (! or #) and change the number 1 to another number, 2.
I tried
{\1}2_
\1\\2_
but they add extra text, and I just want to change the number.
I expect the result to be
Hello!2_World
and, if using #, to be
Hello#2_World
Match and capture either ! or # in a named capture group, here called char, if followed by one or more digits and a whitespace:
(?P<char>[!#])\d+\s
Substitute with the named capture, \g<char> followed by 2_:
\g<char>2_
If you only want the substitution if there's a 1 following either ! or #, replace \d+ with 1.
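In Python, the named-group pattern and substitution described above could be applied roughly like this. The variable names are placeholders chosen for the sketch.

```python
import re

# Capture ! or # in the named group "char", then consume the digits and one whitespace.
pattern = re.compile(r"(?P<char>[!#])\d+\s")

# \g<char> re-inserts the captured mark; "2_" is the literal replacement text.
replacement = r"\g<char>2_"

results = [pattern.sub(replacement, text) for text in ("Hello!1 World", "Hello#1 World")]
print(results)
```

Because the whitespace is part of the match, it is replaced along with the digit, which gives the underscore-joined result the question asks for.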
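In your substitution you need to change {\1}2_ to \g<1>2_, so that the captured mark is kept and the following digit is not read as part of the group number. To drop the space as well, include it in the pattern:
import re
string = "Hello!1 World"
pattern = r"(!|#)1 "
replacement = r"\g<1>2_"
result = re.sub(pattern, replacement, string)  # 'Hello!2_World'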
Why not: string.replace('!1 ', '!2_').replace('#1 ', '#2_') ?
>>> string = "Hello!1 World"
>>> repl = lambda s: s.replace('!1 ', '!2_').replace('#1 ', '#2_')
>>> string2 = repl(string)
>>> string2
'Hello!2_World'
>>> string = "Hello!12 World"
>>> string2 = repl(string)
>>> string2
'Hello!12 World'
The replacement for your pattern should be \g<1>2_
You could also shorten your pattern to a single capture group, using the character class [!#] followed by a literal 1, and use the same replacement as above.
([!#])1
Or use a lookbehind assertion, without any groups, and replace with 2_:
(?<=[!#])1
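The reason \g<1> is needed is that a backslash followed by digits is read as one group number: in r"\12_" Python's re module looks for group 12 rather than group 1 followed by a literal 2. A small sketch comparing the variants; the variable names are illustrative only.

```python
import re

text = "Hello!1 World"

# r"\12_" is parsed as a reference to group 12, which this pattern does not have.
try:
    re.sub(r"([!#])1", r"\12_", text)
    ambiguous_failed = False
except re.error:
    ambiguous_failed = True

# \g<1> ends the group number explicitly, so the 2 is taken literally.
with_group = re.sub(r"([!#])1", r"\g<1>2_", text)

# The lookbehind only checks for the mark; nothing is captured or re-inserted.
with_lookbehind = re.sub(r"(?<=[!#])1", "2_", text)

print(ambiguous_failed, with_group, with_lookbehind)
```

Note that neither pattern consumes the space after the digit, so both results still contain it; add \s to the pattern if the space should go too.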

How to get the matching word in a regex with alternations?

In python, suppose I want to search the string
"123"
for occurrences of the pattern
"abc|1.*|def|.23" .
I would currently do this as follows:
import re
re.match ("abc|1.*|def|.23", "123") .
The above returns a match object from which I can retrieve the starting and ending indices of the match in the string, which in this case would be 0 and 3.
My question is: How can I retrieve the particular word(s) in the regular expression which matched with
"123" ?
In other words: I would like to get "1.*" and ".23". Is this possible?
If the alternatives in your pattern are always separated by a common separator (here "|"),
you can try:
import re
pattern = "abc|1.*|def|.23"
matches = [s for s in pattern.split("|") if re.match(s, "123")]
print(matches)
Output:
['1.*', '.23']
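Splitting on "|" stops working as soon as one alternative contains its own | (inside a group, for example). A sketch that avoids this by keeping the alternatives in a list from the start; the function name is hypothetical, and re.fullmatch is used here instead of re.match on the assumption that each alternative should cover the whole string.

```python
import re

def matching_alternatives(alternatives, text):
    """Return the sub-patterns that match the whole of text."""
    return [alt for alt in alternatives if re.fullmatch(alt, text)]

print(matching_alternatives(["abc", "1.*", "def", ".23"], "123"))
```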
Another approach would be to create one capture group for each token in the alternation:
import re
s = 'def'
rgx = r'\b(?:(abc)|(1.*)|(def)|(.23))\b'
m = re.match(rgx, s)
print(m.group(0)) #=> def
print(m.group(1)) #=> None
print(m.group(2)) #=> None
print(m.group(3)) #=> def
print(m.group(4)) #=> None
This example shows that the match is 'def' and that it was matched by the 3rd capture group, (def). Note that only the first alternative that succeeds is reported.
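With named groups, the match object's lastgroup attribute gives the name of the group that matched, so the groups do not have to be checked one by one. A sketch, assuming made-up group names for the four alternatives:

```python
import re

# The group names are hypothetical labels, one per alternative.
rgx = re.compile(
    r"(?P<alt_abc>abc)|(?P<alt_1_any>1.*)|(?P<alt_def>def)|(?P<alt_any_23>.23)"
)

def which_alternative(text):
    """Return the name of the alternative that matched, or None."""
    m = rgx.match(text)
    return m.lastgroup if m else None

print(which_alternative("123"))
print(which_alternative("def"))
```

As with the numbered groups, this reports only the first alternative that succeeds, so for "123" it names the 1.* branch and never tries .23.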

Replacing Certain Parts of a String Python

I cannot seem to solve this. I have many strings, and they are all different. I need to remove the ends of them, but the ends always have different lengths. Here is an example of a few strings:
string1 = "thisisnumber1(111)"
string2 = "itsraining(22252)"
string3 = "fluffydog(3)"
Now when I print these out it will of course print the following:
thisisnumber1(111)
itsraining(22252)
fluffydog(3)
What I would like it to print, though, is the following:
thisisnumber1
itsraining
fluffydog
I would like to remove the part in the parentheses from each string, but I do not know how, since the lengths are always changing. Thank you.
You can use str.rsplit for this:
>>> string1 = "thisisnumber1(111)"
>>> string2 = "itsraining(22252)"
>>> string3 = "fluffydog(3)"
>>>
>>> string1.rsplit("(")
['thisisnumber1', '111)']
>>> string1.rsplit("(")[0]
'thisisnumber1'
>>>
>>> string2.rsplit("(")
['itsraining', '22252)']
>>> string2.rsplit("(")[0]
'itsraining'
>>>
>>> string3.rsplit("(")
['fluffydog', '3)']
>>> string3.rsplit("(")[0]
'fluffydog'
>>>
str.rsplit splits the string from right to left rather than left to right like str.split. So, we split the string on ( and then retrieve the element at index 0 (the first element). For these strings, that is everything before the (...) at the end. Note that without a maxsplit argument both methods return the same list; pass maxsplit=1, as in string1.rsplit("(", 1), to split only on the last (.
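The difference between the two calls shows up once a string contains more than one "(". A short sketch; the function name and the second sample string are made up for illustration.

```python
def strip_trailing_parens(text):
    """Drop everything from the last "(" onwards."""
    # maxsplit=1 makes rsplit split only once, starting from the right.
    return text.rsplit("(", 1)[0]

print(strip_trailing_parens("fluffydog(3)"))
print(strip_trailing_parens("big(old)dog(3)"))
```

Without the maxsplit argument, "big(old)dog(3)".rsplit("(")[0] would return only 'big'.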
Your other option is to use regular expressions, which can give you more precise control over what you want to get.
import re
regex = r"(.+)\(\d+\)"
print(re.match(regex, string1).groups()[0])  # prints thisisnumber1
print(re.match(regex, string2).groups()[0])  # prints itsraining
print(re.match(regex, string3).groups()[0])  # prints fluffydog
Breakdown of what's happening:
regex = r"(.+)\(\d+\)" is the regular expression, the formula for the string you're trying to find
.+ means match 1 or more character of any kind except newline
\d+ means match 1 or more digit
\( and \) are the "(" and ")" characters
putting .+ in parentheses puts that string sequence in a group, meaning that group of characters is one that you want to be able to access later on. We don't put the sequence \(\d+\) in a group because we don't care about those characters.
re.match(regex, string1).groups() gives every substring in string1 that was part of a group. Since you only want 1 substring, you just access the 0th element.
Tutorials Point has a nice tutorial on regular expressions if you want to learn more.
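The breakdown above can be wrapped in a small helper that also copes with strings that have no trailing number. This is a sketch under that assumption; the names TRAILING_NUMBER and drop_trailing_number are hypothetical, and re.fullmatch is used so the (digits) part has to sit at the very end.

```python
import re

# Same pattern as above: group 1 is everything before the final "(digits)".
TRAILING_NUMBER = re.compile(r"(.+)\(\d+\)")

def drop_trailing_number(text):
    """Return text without a trailing "(digits)" part, or text unchanged."""
    m = TRAILING_NUMBER.fullmatch(text)
    return m.group(1) if m else text

for s in ("thisisnumber1(111)", "itsraining(22252)", "fluffydog(3)", "plain"):
    print(drop_trailing_number(s))
```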
Since you say in a comment:
"all that will be in the parentheses will be numbers"
so you'll always have digits between your parens, I'd recommend taking a look at removing them with the regular expression module:
import re
string1 = "thisisnumber1(111)"
string2 = "itsraining(22252)"
string3 = "fluffydog(3)"
strings = string1, string2, string3
for s in strings:
    s_replaced = re.sub(
        r'''
        \(   # must escape the parens, since these are special characters in regex
        \d+  # one or more digits, 0-9
        \)
        ''',  # this regular expression will be replaced by the next argument
        '',   # replace the above with an empty string
        s,    # the string we're modifying
        flags=re.VERBOSE)  # pass as flags= (the 4th positional argument is count)
    print(s_replaced)
prints:
thisisnumber1
itsraining
fluffydog

python return matching and non-matching patterns of string

I would like to split a string into parts that match a regexp pattern and parts that do not match into a list.
For example
import re
string = 'my_file_10'
pattern = r'\d+$'
# I know the matching part can be obtained with:
m = re.search(pattern, string).group()
print(m)  # 10
# The final result should be as follows:
['my_file_', '10']
Put parentheses around the pattern to make it a capturing group, then use re.split() to produce a list of matching and non-matching elements:
pattern = r'(\d+$)'
re.split(pattern, string)
Demo:
>>> import re
>>> string = 'my_file_10'
>>> pattern = r'(\d+$)'
>>> re.split(pattern, string)
['my_file_', '10', '']
Because you are splitting on digits at the end of the string, an empty string is included.
If you only ever expect one match, at the end of the string (which the $ in your pattern forces here), then just use the match object's start() method to obtain an index to slice the input string:
pattern = r'\d+$'
match = re.search(pattern, string)
not_matched, matched = string[:match.start()], match.group()
This returns:
>>> pattern = r'\d+$'
>>> match = re.search(pattern, string)
>>> string[:match.start()], match.group()
('my_file_', '10')
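The slicing approach fails with an AttributeError when the string does not end in digits, because re.search returns None. A minimal sketch of a guarded helper; the name split_trailing_digits and the choice to return a one-element list on no match are assumptions.

```python
import re

def split_trailing_digits(text):
    """Split text into [non-matching prefix, trailing digits]."""
    match = re.search(r"\d+$", text)
    if match is None:
        # Nothing to split off; return the input as the only element.
        return [text]
    return [text[:match.start()], match.group()]

print(split_trailing_digits("my_file_10"))
print(split_trailing_digits("my_file"))
```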
You can use re.split to make a list of those separate parts and use filter, which filters out all elements that are considered false (empty strings). In Python 3, filter returns an iterator, so wrap it in list():
>>> import re
>>> list(filter(None, re.split(r'(\d+$)', 'my_file_015_01')))
['my_file_015_', '01']

Regex find a digit

I have the following string, Hello, season 2 (VSF) and I need to parse "2" out of it. Here is what I'm trying:
s = 'Hello, season 2 (VSF)'
re.findall('Season|Saison|Staffel[\s]+\d',s)
>>> ["Season"]
How would I get "Season 2" here?
Season|Saison|Staffel should be grouped. Also specify re.IGNORECASE or re.I flag to match case-insensitively.
>>> s = 'Hello, season 2 (VSF)'
>>> re.findall(r'(?:Season|Saison|Staffel)\s+\d+', s, flags=re.IGNORECASE)
['season 2']
>>> re.findall(r'(?:Season|Saison|Staffel)\s+\d+', s) # without re.I
[]
Use a non-capturing group. Otherwise the pattern includes a capturing group, and re.findall returns a list of the captured groups instead of the whole matched strings.
>>> re.findall(r'(Season|Saison|Staffel)\s+\d+', s, flags=re.IGNORECASE)
['season']
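The same capturing behaviour can be turned to your advantage if only the number is wanted: keep the alternation non-capturing and put the capture group around the digits. A sketch, with season_number as an illustrative name:

```python
import re

s = 'Hello, season 2 (VSF)'

# Only the digits are captured; the keyword alternation stays non-capturing.
match = re.search(r'(?:Season|Saison|Staffel)\s+(\d+)', s, flags=re.IGNORECASE)

# Guard against strings that contain no season marker at all.
season_number = int(match.group(1)) if match else None
print(season_number)
```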
