How to input a variable string into re.search in python [duplicate] - python

This question already has answers here:
How to use a variable inside a regular expression?
(12 answers)
Closed 4 years ago.
Initially I had my date regex working as follows, to capture "February 12, 2018" for example
match = re.search(r'(January|February|March|April|May|June|July|August|September?|October?|November|December)\s+\d{1,2},\s+\d{4}', date).group()
But I want it to be more flexible and pass my variable string into the regex, and I can't get it to work, even after looking through many Stack Overflow threads about similar issues. I'm quite a novice, so I'm not sure what's going wrong. I'm aware that simply writing MONTHS inside the pattern won't work. Thank you
MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"
match = re.search(r'(MONTHS)\s+\d{1,2},\s+\d{4}', date).group()
print(match)
'NoneType' object has no attribute 'group'

You've got MONTHS as just part of the pattern string, so Python doesn't know that it's supposed to refer to a variable holding another string.
So instead, try:
match = re.search(r'(' + MONTHS + r')\s+\d{1,2},\s+\d{4}', date).group()
That will concatenate (stick together) three strings, the first bit, then the string stored in your MONTHS variable, and then the last bit.

If you want to substitute something into a string, you need to use either format strings (whether an f-string literal or the format or format_map methods on string objects) or printf-style formatting (or template strings, or a third-party library… but usually one of the first two).
Normally, format strings are the easiest solution, but they don't play nice with strings that need braces for other purposes. You don't want that {4} to be treated as "fill in the 4th argument", and escaping it as {{4}} makes things less readable (and when you're dealing with regular expressions, they're already unreadable enough…).
So, printf-style formatting is probably a better option here:
pattern = r'(%s)\s+\d{1,2},\s+\d{4}' % (MONTHS,)
… or:
pattern = r'(%(MONTHS)s)\s+\d{1,2},\s+\d{4}' % {'MONTHS': MONTHS}
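As a rough end-to-end sketch of the printf-style approach, the snippet below builds the pattern once and guards against the 'NoneType' error from the question. The helper name find_date and the sample sentences are made up for illustration; the f-string variant is shown only to compare the doubled braces.

```python
import re

MONTHS = ("January|February|March|April|May|June|July|August|"
          "September|October|November|December")

# printf-style substitution leaves the regex quantifiers {1,2} and {4} alone.
pattern = r'(%s)\s+\d{1,2},\s+\d{4}' % (MONTHS,)

# Equivalent raw f-string: every literal brace has to be doubled.
pattern_f = rf'({MONTHS})\s+\d{{1,2}},\s+\d{{4}}'


def find_date(text):
    """Return the first 'Month D, YYYY' date in text, or None (hypothetical helper)."""
    m = re.search(pattern, text)
    # re.search returns None when nothing matches, so check before .group().
    return m.group() if m else None


print(find_date("Posted on February 12, 2018 by someone"))
print(find_date("no date in here"))
```

Checking the match object before calling .group() avoids the AttributeError when the text contains no date.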

Related

Reversing/mirroring special characters in Python [duplicate]

This question already has an answer here:
Modify a string by swapping multiple letters
(1 answer)
Closed 2 years ago.
I'd like to reverse/mirror special characters in Python.
Let's say my string is 'hello (one) sun {apple}'
My output string would have to be '{elppa} nus (eno) olleh'
Of course, with typical reversing, the outcome is '}elppa{ nus )eno( olleh' which is not what I need.
Is there any "easy" way to do this? Using regex, maybe?
So, basically you want the brackets enclosing the text to stay the same and everything else to be mirrored? In that case you can run a function after mirroring to swap all the brackets back. The easiest way would be to use the string replace method, like below:
line = line.replace('<', '!#!#')
line = line.replace('>', '<')
line = line.replace('!#!#', '>')
Here I am taking <> as an example: I replace '<' with '!#!#' temporarily, then replace '>' with '<', and finally substitute '!#!#' with '>'.
Not a very robust method, but an easy quick fix.
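A possibly more robust alternative, sketched here as an assumption rather than as part of the answer above, is to reverse the string and then swap each bracket with its partner in a single pass using str.translate, which avoids the temporary placeholder. The names MIRROR and mirror are made up for this sketch.

```python
# Map every opening bracket to its closing partner and vice versa.
MIRROR = str.maketrans("(){}[]<>", ")(}{][><")


def mirror(text):
    """Reverse text, then flip brackets so they still enclose their contents."""
    return text[::-1].translate(MIRROR)


print(mirror('hello (one) sun {apple}'))  # {elppa} nus (eno) olleh
```

Because translate maps all characters at once, a bracket that has already been swapped is never swapped back by a later step.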

Ignore special characters when creating a regular expression in python [duplicate]

This question already has answers here:
Escaping regex string
(4 answers)
Closed 6 years ago.
Is there a way to ignore special character meaning when creating a regular expression in python? In other words, take the string "as is".
I am writing code that uses internally the expect method from a Telnet object, which only accepts regular expressions. Therefore, the answer cannot be the obvious "use == instead of regular expression".
I tried this
import re
SPECIAL_CHARACTERS = "\\.^$*+?{}[]|():"  # backslash must be placed first
def str_to_re(s):
    result = s
    for c in SPECIAL_CHARACTERS:
        # prefix each special character with a backslash
        result = result.replace(c, '\\' + c)
    return re.compile(result)
TEST = "Bob (laughing). Do you know 1/2 equals 2/4 [reference]?"
re_bad = re.compile(TEST)
re_good = str_to_re(TEST)
print(re_bad.match(TEST))
print(re_good.match(TEST))
It works, since the first one does not recognize the string, and the second one does. I looked at the options in the python documentation, and was not able to find a simpler way. Or are there any cases my solution does not cover (I used python docs to build SPECIAL_CHARACTERS)?
P.S. The problem can apply to other libraries. It does not apply to the pexpect library, because it provides the expect_exact method which solves this problem. However, someone could want to specify a mix of strings (as is) and regular expressions.
If 'reg' is the regex pattern itself, write it as a raw string literal:
pat = re.compile(r'reg')
If reg is a name bound to a string that should be matched literally, escape it first:
reg = re.escape(reg)
pat = re.compile(reg)
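A small sketch of re.escape applied to the test string from the question, including the "mix of literal strings and regular expressions" case mentioned in the P.S. The variable names literal and mixed are chosen only for this illustration.

```python
import re

TEST = "Bob (laughing). Do you know 1/2 equals 2/4 [reference]?"

# re.escape backslash-escapes every character that is special in a regex,
# so the compiled pattern matches the string exactly as written.
literal = re.compile(re.escape(TEST))

# Mixing: escape only the literal part, then append a real regex fragment.
mixed = re.compile(re.escape("1/2 equals ") + r"\d+/\d+")

print(literal.match(TEST) is not None)
print(mixed.search(TEST).group())
```

Escaping only the literal pieces keeps the regex fragments active, which is what a caller needs when an API accepts nothing but regular expressions.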

Python generate string based on regex format [duplicate]

This question already has answers here:
Reversing a regular expression in Python
(8 answers)
Closed 1 year ago.
I'm having some difficulty learning regex in Python. I want to turn my Tornado web route configuration, together with its arguments, into a request path string without using the handler's request.path.
For example, I have route with patterns like:
/entities/([0-9]+)
/product/([0-9]+)/actions
The expected result combine with integer parameter (123) will be a string like:
/entities/123
/product/123/actions
How do I generate string based on that pattern?
Thank you very much in advance!
This might be a duplicate of:
Reversing a regular expression in Python
Generate a String that matches a RegEx in Python
Using the answer provided by @bjmc, a solution works like this:
>>> import rstr
>>> intermediate = rstr.xeger(r'\d+')
>>> path = '/product/' + intermediate + '/actions'
Depending on how long you want the intermediate integer to be, you could bound the regex, for example \d{1,3}
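If the goal is to plug a known value such as 123 into the route rather than generate a random match, one possible sketch is to replace each capture group in the pattern with the next argument. The helper fill_route is hypothetical and assumes the groups are simple and not nested, as in the two routes from the question.

```python
import re


def fill_route(pattern, *args):
    """Replace each non-nested (...) group in pattern with the next argument.

    Hypothetical helper: it does not check that the argument actually
    matches the group it replaces.
    """
    values = iter(args)
    return re.sub(r'\([^()]*\)', lambda m: str(next(values)), pattern)


print(fill_route('/entities/([0-9]+)', 123))         # /entities/123
print(fill_route('/product/([0-9]+)/actions', 123))  # /product/123/actions
```

The arguments are consumed in order, so a route with two groups takes two values.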

Adding thousand separator while printing a number [duplicate]

This question already has answers here:
How to print a number using commas as thousands separators
(30 answers)
Closed 9 years ago.
I don't really know the "name" for this problem, so the title might be incorrect, but the problem is simple. If I have a number, for example:
number = 23543
second = 68471243
I want to make it print() like this.
23,543
68,471,243
I hope this explains enough or else add comments.
Any help is appreciated!
If you only need to add a comma as the thousands separator and are using Python 3.6 or greater:
print(f"{number:,}")
This uses the formatted string literal (f-string) style. The item in braces, {number:,}, is the object to be formatted as a string. The colon : introduces the format specification, and the comma , states that a comma should be used as the thousands separator. No type code is needed for integers; note that g would switch to scientific notation beyond six significant digits, so 68471243 would print as 6.84712e+07. [1]
With older Python versions, without f-strings:
print("{0:,}".format(number))
This uses the format method of str objects [2]. The item in braces, {0}, is a placeholder for the first argument. The colon : introduces the format specification, and the comma , states that a comma should be used as the thousands separator [3]. The format method of the string object is then called and the variable number is passed as an argument.
The 68,471,24,3 seems a bit odd to me. Is it just a typo?
Formatted string literals
Python 3 str.format()
Python 3 Format String Syntax
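A short sketch of the comma option applied to both numbers from the question, plus a float. The variable price and the result names are made up for illustration.

```python
number = 23543
second = 68471243
price = 1234567.891  # made-up float to show the comma combined with a precision

as_fstring = f"{number:,}"     # formatted string literal
as_builtin = format(second, ",")  # built-in format() with the same spec
as_float = f"{price:,.2f}"     # thousands separator plus two decimals

print(as_fstring)  # 23,543
print(as_builtin)  # 68,471,243
print(as_float)    # 1,234,567.89
```

The same format specification works in f-strings, str.format() and the built-in format(), so the choice is mostly a matter of readability.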
An easy way is to set the locale to en_US (the locale must be available on your system).
Example:
import locale
locale.setlocale(locale.LC_ALL, 'en_US')
number = 23543
second = 68471243
print(locale.format_string("%d", number, grouping=True))
print(locale.format_string("%d", second, grouping=True))
prints:
23,543
68,471,243

How can I print a string using .format(), and print literal curly brackets around my replaced string [duplicate]

This question already has answers here:
Closed 11 years ago.
Possible Duplicate:
How can I print a literal “{}” characters in python string and also use .format on it?
Basically, I want to use .format(), like this:
my_string = '{{0}:{1}}'.format('hello', 'bonjour')
And have it match:
my_string = '{hello:bonjour}' #this is a string with literal curly brackets
However, the first piece of code gives me an error.
The curly brackets are important, because I'm using Python to communicate with a piece of software via text-based commands. I have no control over what kind of formatting the software expects, so it's crucial that I sort out all the formatting on my end. It uses curly brackets around strings to ensure that strings containing spaces are interpreted as single strings, rather than multiple arguments, much like you normally do with quotation marks in file paths, for example.
I'm currently using the older method:
my_string = '{%s:%s}' % ('hello', 'bonjour')
Which certainly works, but .format() seems easier to read, and when I'm sending commands with five or more variables all in one string, then readability becomes a significant issue.
Thanks!
Here is the new style:
>>> '{{{0}:{1}}}'.format('hello', 'bonjour')
'{hello:bonjour}'
But I think the escaping is somewhat hard to read, so I prefer to switch back to the older style to avoid it:
>>> '{%s:%s}' % ('hello', 'bonjour')
'{hello:bonjour}'
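The escaping rule can be sketched as follows: a doubled brace produces one literal brace, and a single pair still marks a replacement field. The function names braced and braced_f are made up for this sketch; the f-string version assumes Python 3.6 or later.

```python
def braced(key, value):
    """Wrap 'key:value' in literal braces using str.format()."""
    # '{{' -> '{', '{0}' -> key, '{1}' -> value, '}}' -> '}'
    return '{{{0}:{1}}}'.format(key, value)


def braced_f(key, value):
    """Same result with an f-string; the doubling rule is identical."""
    return f'{{{key}:{value}}}'


print(braced('hello', 'bonjour'))    # {hello:bonjour}
print(braced_f('hello', 'bonjour'))  # {hello:bonjour}
```

Wrapping the format string in a small function keeps the triple braces in one place when many commands need the same shape.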
