I have a regex <type '_sre.SRE_Pattern'> and I would like to substitute the matched string with another string. Here is what I have:
compiled = re.compile(r'some regex expression')
s = 'some regex expression plus some other stuff'
compiled.sub('substitute', s)
print(s)
and s should be
'substitute plus some other stuff'
However, my code is not working and the string is not changing.
re.sub is not an inplace operation. From the docs:
Return the string obtained by replacing the leftmost non-overlapping
occurrences of pattern in string by the replacement repl.
Ergo, you must assign the return value back to a.
...
s = compiled.sub('substitute', s)
print(s)
This gives
'substitute plus some other stuff'
As you'd expect.
Related
I'm trying to match strings in the lines of a file and write the matches minus the first one and the last one
import os, re
infile=open("~/infile", "r")
out=open("~/out", "w")
pattern=re.compile("=[A-Z0-9]*>")
for line in infile:
out.write( pattern.search(line)[1:-1] + '\n' )
Problem is that it says that Match is not subscriptable, when I try to add .group() it says that Nonegroup has no attritube group, groups() returns that .write needs a tuple etc
Any idea how to get .search to return a string ?
The re.search function returns a Match object.
If the match fails, the re.search function will return None. To extract the matching text, use the Match.group method.
>>> match = re.search("a.", "abc")
>>> if match is not None:
... print(match.group(0))
'ab'
>>> print(re.search("a.", "a"))
None
That said, it's probably a better idea to use groups to find the required section of the match:
>>> match = re.search("=([A-Z0-9]*)>", "=abc>") # Notice brackets
>>> match.group(0)
'=abc>'
>>> match.group(1)
'abc'
This regex can then be used with findall as #WiktorStribiżew suggests.
You seem to need only the part of strings between = and >. In this case, it is much easier to use a capturing group around the alphanumeric pattern and use it with re.findall that will never return None, but just an empty list upon no match, or a list of captured texts if found. Also, I doubt you need empty matches, so use + instead of *:
pattern=re.compile(r"=([A-Z0-9]+)>")
^ ^
and then
"\n".join(pattern.findall(line))
I would like to left pad zeros to the number in a string. For example, the string
hello120_c
padded to 5 digits should become
hello00120_c
I would like to use re.sub to make the replacement. Here is my code:
>>> re.sub('(\d+)', r'\1'.zfill(5), 'hello120_c')
which returns
>>> 'hello000120_c'
which has 6 digits rather than 5. Checking '120'.zfill(5) alone gives '00120'. Also, re.findall appears to confirm the regular expression is matching the full '120'.
What is causing re.sub to act differently?
You cannot use the backreference directly. Use a lamda:
re.sub(r'\d+', lambda x: x.group(0).zfill(5), 'hello120_c')
# => hello00120_c
Also, note that you do not need a capturing group since you can access the matched value via .group(0). Also, note the r'...' (raw string literal) used to declare the regex.
See IDEONE demo:
import re
res = re.sub(r'\d+', lambda x: x.group(0).zfill(5), 'hello120_c')
print(res)
I was in IDLE, and decided to use regex to sort out a string. But when I typed in what the online tutorial told me to, all it would do was print:
<_sre.SRE_Match object at 0x00000000031D7E68>
Full program:
import re
reg = re.compile("[a-z]+8?")
str = "ccc8"
print(reg.match(str))
result:
<_sre.SRE_Match object at 0x00000000031D7ED0>
Could anybody tell me how to actually print the result?
You need to include .group() after to the match function so that it would print the matched string otherwise it shows only whether a match happened or not. To print the chars which are captured by the capturing groups, you need to pass the corresponding group index to the .group() function.
>>> import re
>>> reg = re.compile("[a-z]+8?")
>>> str = "ccc8"
>>> print(reg.match(str).group())
ccc8
Regex with capturing group.
>>> reg = re.compile("([a-z]+)8?")
>>> print(reg.match(str).group(1))
ccc
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
If you need to get the whole match value, you should use
m = reg.match(r"[a-z]+8?", text)
if m: # Always check if a match occurred to avoid NoneType issues
print(m.group()) # Print the match string
If you need to extract a part of the regex match, you need to use capturing groups in your regular expression. Enclose those patterns with a pair of unescaped parentheses.
To only print captured group results, use Match.groups:
Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
So, to get ccc and 8 and display only those, you may use
import re
reg = re.compile("([a-z]+)(8?)")
s = "ccc8"
m = reg.match(s)
if m:
print(m.groups()) # => ('ccc', '8')
See the Python demo
I can not seem to solve this. I have many different strings, and they are always different. I need to replace the ends of them though, but they are always different lengths. Here is a example of a couple strings:
string1 = "thisisnumber1(111)"
string2 = "itsraining(22252)"
string3 = "fluffydog(3)"
Now when I print these out it will of course print the following:
thisisnumber1(111)
itsraining(22252)
fluffydog(3)
What I would like it to print though is the follow:
thisisnumber1
itsraining
fluffydog
I would like it to remove the part in the parentheses for each string, but I do not know how sense the lengths are always changing. Thank You
You can use str.rsplit for this:
>>> string1 = "thisisnumber1(111)"
>>> string2 = "itsraining(22252)"
>>> string3 = "fluffydog(3)"
>>>
>>> string1.rsplit("(")
['thisisnumber1', '111)']
>>> string1.rsplit("(")[0]
'thisisnumber1'
>>>
>>> string2.rsplit("(")
['itsraining', '22252)']
>>> string2.rsplit("(")[0]
'itsraining'
>>>
>>> string3.rsplit("(")
['fluffydog', '3)']
>>> string3.rsplit("(")[0]
'fluffydog'
>>>
str.rsplit splits the string from right-to-left rather than left-to-right like str.split. So, we split the string from right-to-left on ( and then retrieve the element at index 0 (the first element). This will be everything before the (...) at the end of each string.
Your other option is to use regular expressions, which can give you more precise control over what you want to get.
import re
regex = regex = r"(.+)\(\d+\)"
print re.match(regex, string1).groups()[0] #returns thisisnumber1
print re.match(regex, string2).groups()[0] #returns itsraining
print re.match(regex, string3).groups()[0] #returns fluffydog
Breakdown of what's happening:
regex = r"(.+)\(\d+\)" is the regular expression, the formula for the string you're trying to find
.+ means match 1 or more character of any kind except newline
\d+ means match 1 or more digit
\( and \) are the "(" and ")" characters
putting .+ in parentheses puts that string sequence in a group, meaning that group of characters is one that you want to be able to access later on. We don't put the sequence \(\d+\) in a group because we don't care about those characters.
regex.match(regex, string1).groups() gives every substring in string1 that was part of a group. Since you only want 1 substring, you just access the 0th element.
There's a nice tutorial on regular expressions on Tutorial's Point here if you want to learn more.
Since you say in a comment:
"all that will be in the parentheses will be numbers"
so you'll always have digits between your parens, I'd recommend taking a look at removing them with the regular expression module:
import re
string1 = "thisisnumber1(111)"
string2 = "itsraining(22252)"
string3 = "fluffydog(3)"
strings = string1, string2, string3
for s in strings:
s_replaced = re.sub(
r'''
\( # must escape the parens, since these are special characters in regex
\d+ # one or more digits, 0-9
\)
''', # this regular expression will be replaced by the next argument
'', replace the above with an empty string
s, # the string we're modifying
re.VERBOSE) # verbose flag allows us to comment regex clearly
print(s_replaced)
prints:
thisisnumber1
itsraining
fluffydog
This question already has answers here:
What exactly do "u" and "r" string prefixes do, and what are raw string literals?
(7 answers)
What exactly is a "raw string regex" and how can you use it?
(7 answers)
Closed 7 months ago.
I need a python regular expression to check if a word is present in a string. The string is separated by commas, potentially.
So for example,
line = 'This,is,a,sample,string'
I want to search based on "sample", this would return true. I am crappy with reg ex, so when I looked at the python docs, I saw something like
import re
re.match(r'sample', line)
But I don't know why there was an 'r' before the text to be matched. Can someone help me with the regular expression?
Are you sure you need a regex? It seems that you only need to know if a word is present in a string, so you can do:
>>> line = 'This,is,a,sample,string'
>>> "sample" in line
True
The r makes the string a raw string, which doesn't process escape characters (however, since there are none in the string, it is actually not needed here).
Also, re.match matches from the beginning of the string. In other words, it looks for an exact match between the string and the pattern. To match stuff that could be anywhere in the string, use re.search. See a demonstration below:
>>> import re
>>> line = 'This,is,a,sample,string'
>>> re.match("sample", line)
>>> re.search("sample", line)
<_sre.SRE_Match object at 0x021D32C0>
>>>
r stands for a raw string, so things like \ will be automatically escaped by Python.
Normally, if you wanted your pattern to include something like a backslash you'd need to escape it with another backslash. raw strings eliminate this problem.
short explanation
In your case, it does not matter much but it's a good habit to get into early otherwise something like \b will bite you in the behind if you are not careful (will be interpreted as backspace character instead of word boundary)
As per re.match vs re.search here's an example that will clarify it for you:
>>> import re
>>> testString = 'hello world'
>>> re.match('hello', testString)
<_sre.SRE_Match object at 0x015920C8>
>>> re.search('hello', testString)
<_sre.SRE_Match object at 0x02405560>
>>> re.match('world', testString)
>>> re.search('world', testString)
<_sre.SRE_Match object at 0x015920C8>
So search will find a match anywhere, match will only start at the beginning
You do not need regular expressions to check if a substring exists in a string.
line = 'This,is,a,sample,string'
result = bool('sample' in line) # returns True
If you want to know if a string contains a pattern then you should use re.search
line = 'This,is,a,sample,string'
result = re.search(r'sample', line) # finds 'sample'
This is best used with pattern matching, for example:
line = 'my name is bob'
result = re.search(r'my name is (\S+)', line) # finds 'bob'
As everyone else has mentioned it is better to use the "in" operator, it can also act on lists:
line = "This,is,a,sample,string"
lst = ['This', 'sample']
for i in lst:
i in line
>> True
>> True
One Liner implementation:
a=[1,3]
b=[1,2,3,4]
all(i in b for i in a)