Use python regex to replace password with asterisks [duplicate] - python

I want to replace XML tags, with a sequence of repeated characters that has the same number of characters of the tag.
For example:
<o:LastSaved>2013-01-21T21:15:00Z</o:LastSaved>
I want to replace it with:
#############2013-01-21T21:15:00Z##############
How can we use RegEx for this?

re.sub accepts a function as replacement:
re.sub(pattern, repl, string, count=0, flags=0)
If repl is a function, it is called for every non-overlapping occurrence of pattern. The function takes a single match object argument, and returns the replacement string.
Here's an example:
In [1]: import re
In [2]: def repl(m):
...: return '#' * len(m.group())
...:
In [3]: re.sub(r'<[^<>]*?>', repl,
...: '<o:LastSaved>2013-01-21T21:15:00Z</o:LastSaved>')
Out[3]: '#############2013-01-21T21:15:00Z##############'
The pattern I used may need some polishing, I'm not sure what's the canonical solution to matching XML tags is. But you get the idea.

Related

split string based on regular expression value in python

suppose i have a string
exp = '"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME" IN ("CPU","Storage")'
I want to split the string based on word IN
so my exprected result is
['"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME"','IN','("CPU","Storage")']
but in my case it doesnt work
This is what i have tried
import re
exp_split = re.split(r'( in )',exp,re.I)
re documentation:
re.split(pattern, string, maxsplit=0, flags=0)
The split() function expects that the third positional argument is the maxsplit argument. Your code gives re.I to maxsplit and no flags. You should give flags as a keyword argument like so:
exp_split = re.split(r'( in )',exp, flags=re.I)
its simply necessary to capitalize your delimiter and if you dont want the spaces in your result keep them outside your capturing group:
exp_split = re.split(r'\s(IN)\s', exp, re.I)
exp_split
Output
['"security_datafilter"."PRODUCT_CATEGORIES"."CATEGORY_NAME"', 'IN', '("CPU","Storage")']

Python re.sub not returning expected string of int

I would like to left pad zeros to the number in a string. For example, the string
hello120_c
padded to 5 digits should become
hello00120_c
I would like to use re.sub to make the replacement. Here is my code:
>>> re.sub('(\d+)', r'\1'.zfill(5), 'hello120_c')
which returns
>>> 'hello000120_c'
which has 6 digits rather than 5. Checking '120'.zfill(5) alone gives '00120'. Also, re.findall appears to confirm the regular expression is matching the full '120'.
What is causing re.sub to act differently?
You cannot use the backreference directly. Use a lamda:
re.sub(r'\d+', lambda x: x.group(0).zfill(5), 'hello120_c')
# => hello00120_c
Also, note that you do not need a capturing group since you can access the matched value via .group(0). Also, note the r'...' (raw string literal) used to declare the regex.
See IDEONE demo:
import re
res = re.sub(r'\d+', lambda x: x.group(0).zfill(5), 'hello120_c')
print(res)

How to print regex match results in python 3?

I was in IDLE, and decided to use regex to sort out a string. But when I typed in what the online tutorial told me to, all it would do was print:
<_sre.SRE_Match object at 0x00000000031D7E68>
Full program:
import re
reg = re.compile("[a-z]+8?")
str = "ccc8"
print(reg.match(str))
result:
<_sre.SRE_Match object at 0x00000000031D7ED0>
Could anybody tell me how to actually print the result?
You need to include .group() after to the match function so that it would print the matched string otherwise it shows only whether a match happened or not. To print the chars which are captured by the capturing groups, you need to pass the corresponding group index to the .group() function.
>>> import re
>>> reg = re.compile("[a-z]+8?")
>>> str = "ccc8"
>>> print(reg.match(str).group())
ccc8
Regex with capturing group.
>>> reg = re.compile("([a-z]+)8?")
>>> print(reg.match(str).group(1))
ccc
re.match(pattern, string, flags=0)
If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
Note that even in MULTILINE mode, re.match() will only match at the beginning of the string and not at the beginning of each line.
If you need to get the whole match value, you should use
m = reg.match(r"[a-z]+8?", text)
if m: # Always check if a match occurred to avoid NoneType issues
print(m.group()) # Print the match string
If you need to extract a part of the regex match, you need to use capturing groups in your regular expression. Enclose those patterns with a pair of unescaped parentheses.
To only print captured group results, use Match.groups:
Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The default argument is used for groups that did not participate in the match; it defaults to None.
So, to get ccc and 8 and display only those, you may use
import re
reg = re.compile("([a-z]+)(8?)")
s = "ccc8"
m = reg.match(s)
if m:
print(m.groups()) # => ('ccc', '8')
See the Python demo

Search and replace string with reverse

I am trying to find words in a string and replace them with themselves in reverse-form.
So, when I have This 17, I want to put out sihT 17.
But I don't know how to reverse the string itself in re.sub()
import re
pat_word = re.compile("[a-zA-Z]+")
input = raw_input ("Input: ")
match = pat_word.findall(input)
if match:
s = re.sub(pat_word, "reverse", input)
print s
You can use a function inside re.sub:
s = re.sub(pat_word, lambda m:m.group(0)[::-1], input)
Or simply:
s = pat_word.sub(lambda m:m.group(0)[::-1], input)
From help(re.sub):
sub(pattern, repl, string, count=0, flags=0)
Return the string obtained by replacing the leftmost
non-overlapping occurrences of the pattern in string by the
replacement repl. repl can be either a string or a callable;
if a string, backslash escapes in it are processed. If it is
a callable, it's passed the match object and must return
a replacement string to be used.
Note that input is a built-in function in Python, so don't use it as a variable name.

How to make re.split() inclusive

I have the following:
>>> x='STARSHIP_TROOPERS_INVASION_2012_LOCDE'
>>> re.split('_\d{4}',x)[0]
'STARSHIP_TROOPERS_INVASION'
How would I get the year included? For example:
STARSHIP_TROOPERS_INVASION_2012
Note there are tens of thousands of titles, and I need to split on the year for each. I can't do a normal python split() here.
A more straightforward solution would be using re.search()/MatchObject.end():
m = re.search('_\d{4}', x)
print x[:m.end(0)]
If you want to stick with split(), you can use a lookbehind:
re.split('(?<=_\d{4}).', x)
(This work even when the year is at the end of the string, because split() returns an array with the original string in case the delimiter isn't found.)
If its always going to be the same pattern, then why not:
>>> x = 'STARSHIP_TROOPERS_INVASION_2012_LOCDE'
>>> x[:x.rfind('_')]
'STARSHIP_TROOPERS_INVASION_2012'
For your original regular expression, since you aren't capturing the matched group, it is not part of your matches:
>>> re.split('_\d{4}',x)
['STARSHIP_TROOPERS_INVASION', '_LOCDE']
>>> re.split('_(\d{4})',x)
['STARSHIP_TROOPERS_INVASION', '2012', '_LOCDE']
The () marks the selection as a captured group:
Matches whatever regular expression is inside the parentheses, and
indicates the start and end of a group; the contents of a group can be
retrieved after a match has been performed, and can be matched later
in the string with the \number special sequence, described below. To
match the literals '(' or ')', use ( or ), or enclose them inside a
character class: [(] [)].
you may use both split() and search() supposing you have a single such date in your string you wish to split at.
import re
x='STARSHIP_TROOPERS_INVASION_2012_LOCDE'
date=re.search('_\d{4}',x).group(0)
print(date)
gives
>>>
_2012
and
print(re.split('_\d{4}',x)[0]+date)
gives
STARSHIP_TROOPERS_INVASION_2012

Categories