I am trying to find words in a string and replace each one with its reverse.
So, given This 17, I want to output sihT 17.
But I don't know how to reverse the matched string itself in re.sub().
import re
pat_word = re.compile("[a-zA-Z]+")
input = raw_input ("Input: ")
match = pat_word.findall(input)
if match:
    s = re.sub(pat_word, "reverse", input)
    print s
You can use a function inside re.sub:
s = re.sub(pat_word, lambda m:m.group(0)[::-1], input)
Or simply:
s = pat_word.sub(lambda m:m.group(0)[::-1], input)
From help(re.sub):
sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used.
Note that input is a built-in function in Python, so don't use it as a variable name.
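As a minimal, self-contained sketch of the callable approach in Python 3 syntax (the names reverse_words and text are illustrative and not from the question):

```python
import re

# Matches runs of ASCII letters, as in the question.
pat_word = re.compile(r"[a-zA-Z]+")

def reverse_words(text):
    """Return text with every alphabetic word reversed in place."""
    # The callable receives a match object; m.group(0) is the matched word.
    return pat_word.sub(lambda m: m.group(0)[::-1], text)

print(reverse_words("This 17"))  # sihT 17
```

Naming the parameter text rather than input keeps the built-in input() usable.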
Suppose I have a string
exp = '"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME" IN ("CPU","Storage")'
I want to split the string on the word IN,
so my expected result is
['"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME"','IN','("CPU","Storage")']
but in my case it doesn't work.
This is what I have tried:
import re
exp_split = re.split(r'( in )',exp,re.I)
re documentation:
re.split(pattern, string, maxsplit=0, flags=0)
The split() function expects that the third positional argument is the maxsplit argument. Your code gives re.I to maxsplit and no flags. You should give flags as a keyword argument like so:
exp_split = re.split(r'( in )',exp, flags=re.I)
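To see the difference concretely, here is a small sketch in Python 3. The variable names no_flags and with_flags are illustrative; the first call spells out what the original positional call amounts to, assuming re.I has the integer value 2, as it does in CPython:

```python
import re

exp = '"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME" IN ("CPU","Storage")'

# re.I is an integer flag; passed as the third positional argument
# it is interpreted as maxsplit, and no flags are set at all.
print(int(re.I))  # 2

# Equivalent of the original call: case-sensitive ' in ' never matches ' IN '.
no_flags = re.split(r'( in )', exp, maxsplit=2)
print(no_flags == [exp])  # True: nothing was split

# Passing the flag by keyword makes the pattern case-insensitive.
with_flags = re.split(r'( in )', exp, flags=re.I)
print(with_flags)
```

Because the spaces are inside the capturing group here, the middle element comes back as ' IN ' with its surrounding spaces.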
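It's simply necessary to capitalize your delimiter, and if you don't want the spaces in your result, keep them outside your capturing group. With the delimiter capitalized, no flag is needed (and re.I must not be passed positionally, where it would be read as maxsplit):
exp_split = re.split(r'\s(IN)\s', exp)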
exp_split
Output
['"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME"', 'IN', '("CPU","Storage")']
I want to replace each XML tag with a run of a repeated character that has the same length as the tag.
For example:
<o:LastSaved>2013-01-21T21:15:00Z</o:LastSaved>
I want to replace it with:
#############2013-01-21T21:15:00Z##############
How can we use RegEx for this?
re.sub accepts a function as replacement:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
Here's an example:
In [1]: import re
In [2]: def repl(m):
...: return '#' * len(m.group())
...:
In [3]: re.sub(r'<[^<>]*?>', repl,
...: '<o:LastSaved>2013-01-21T21:15:00Z</o:LastSaved>')
Out[3]: '#############2013-01-21T21:15:00Z##############'
The pattern I used may need some polishing; I'm not sure what the canonical solution to matching XML tags is. But you get the idea.
I have a regex <type '_sre.SRE_Pattern'> and I would like to substitute the matched string with another string. Here is what I have:
compiled = re.compile(r'some regex expression')
s = 'some regex expression plus some other stuff'
compiled.sub('substitute', s)
print(s)
and s should be
'substitute plus some other stuff'
However, my code is not working and the string is not changing.
re.sub is not an in-place operation; Python strings are immutable. From the docs:
Return the string obtained by replacing the leftmost non-overlapping
occurrences of pattern in string by the replacement repl.
Ergo, you must assign the return value back to s.
...
s = compiled.sub('substitute', s)
print(s)
This gives
'substitute plus some other stuff'
As you'd expect.
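A short sketch of the same point, keeping the original string around to show that it is untouched (the names result, new_s and count are illustrative; subn is the standard variant of sub that also returns the number of replacements):

```python
import re

compiled = re.compile(r'some regex expression')
s = 'some regex expression plus some other stuff'

result = compiled.sub('substitute', s)  # returns a new string
print(s)       # some regex expression plus some other stuff (unchanged)
print(result)  # substitute plus some other stuff

# subn returns a (new_string, number_of_replacements) tuple,
# which is handy for checking whether anything matched at all.
new_s, count = compiled.subn('substitute', s)
print(count)   # 1
```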
I would like to left pad zeros to the number in a string. For example, the string
hello120_c
padded to 5 digits should become
hello00120_c
I would like to use re.sub to make the replacement. Here is my code:
>>> re.sub('(\d+)', r'\1'.zfill(5), 'hello120_c')
which returns
'hello000120_c'
which has 6 digits rather than 5. Checking '120'.zfill(5) alone gives '00120'. Also, re.findall appears to confirm the regular expression is matching the full '120'.
What is causing re.sub to act differently?
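You cannot call a string method on the backreference: r'\1'.zfill(5) is evaluated before re.sub runs, so the replacement passed in is already the literal template 000\1, and the three zeros are prepended to whatever the group matched. Use a lambda: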
re.sub(r'\d+', lambda x: x.group(0).zfill(5), 'hello120_c')
# => hello00120_c
Note that you do not need a capturing group, since the whole match is available via .group(0). Also note the r'...' (raw string literal) used to declare the regex.
See IDEONE demo:
import re
res = re.sub(r'\d+', lambda x: x.group(0).zfill(5), 'hello120_c')
print(res)
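A quick check of what the original call actually handed to re.sub, as a sketch in Python 3 (template, broken and fixed are illustrative names):

```python
import re

# zfill operates on the two-character string backslash + '1',
# padding it to width 5 before re.sub ever sees it.
template = r'\1'.zfill(5)
print(template)  # 000\1

# The padded template prepends three zeros to the captured digits.
broken = re.sub(r'(\d+)', template, 'hello120_c')
print(broken)    # hello000120_c

# A callable pads the matched text itself, at substitution time.
fixed = re.sub(r'\d+', lambda m: m.group(0).zfill(5), 'hello120_c')
print(fixed)     # hello00120_c
```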
I was in IDLE, and decided to use regex to sort out a string. But when I typed in what the online tutorial told me to, all it would do was print:
<_sre.SRE_Match object at 0x00000000031D7E68>
Full program:
import re
reg = re.compile("[a-z]+8?")
str = "ccc8"
print(reg.match(str))
result:
<_sre.SRE_Match object at 0x00000000031D7ED0>
Could anybody tell me how to actually print the result?
You need to call .group() on the result of match() to print the matched string; otherwise you print the match object itself, which only tells you that a match happened. To print the text captured by a capturing group, pass the corresponding group index to .group().
>>> import re
>>> reg = re.compile("[a-z]+8?")
>>> str = "ccc8"
>>> print(reg.match(str).group())
ccc8
Regex with capturing group.
>>> reg = re.compile("([a-z]+)8?")
>>> print(reg.match(str).group(1))
ccc
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
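The MULTILINE remark can be checked with a small sketch in Python 3. The sample text and the names at_start and found are made up for illustration:

```python
import re

text = "first line\nccc8"
pattern = re.compile(r"^[a-z]+8", re.MULTILINE)

# match() only tries position 0 of the string, even with MULTILINE.
at_start = pattern.match(text)
print(at_start)        # None

# search() scans the string, and with MULTILINE ^ also matches after a newline.
found = pattern.search(text)
print(found.group())   # ccc8
```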
If you need to get the whole match value, you should use
m = re.match(r"[a-z]+8?", text)  # or reg.match(text) with a compiled pattern
if m:  # Always check if a match occurred to avoid NoneType issues
    print(m.group())  # Print the match string
If you need to extract a part of the regex match, you need to use capturing groups in your regular expression. Enclose those patterns with a pair of unescaped parentheses.
To only print captured group results, use Match.groups:
Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
So, to get ccc and 8 and display only those, you may use
import re
reg = re.compile("([a-z]+)(8?)")
s = "ccc8"
m = reg.match(s)
if m:
    print(m.groups())  # => ('ccc', '8')
See the Python demo
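The default argument of groups() mentioned above only matters when a group takes no part in the match. A minimal sketch in Python 3, using a slightly different pattern than the answer's, in which the whole second group is optional:

```python
import re

# Here the group itself is optional, (8)?, rather than its contents, (8?).
reg = re.compile(r"([a-z]+)(8)?")
m = reg.match("ccc")

if m:
    print(m.groups())       # ('ccc', None)
    print(m.groups("n/a"))  # ('ccc', 'n/a')
```

With the answer's original (8?), the second group always participates and yields an empty string instead of None when there is no 8.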