How to search/extract patterns in a string? - python

I have several patterns I want to search for in my message.
The patterns are:
1. "aaa-b3-c"
2. "a3-b6-c"
3. "aaaa-bb-c"
I know how to search for one of the patterns, but how do I search for all 3?
Also, how do you identify and extract dates in this format: 5/21 or 5/21/2019?
found = re.findall(r'.{3}-.{2}-.{1}', message)

Try this:
found = re.findall(r'a{2,4}-b{2}-c', message)

You could use
a{2,4}-bb-c
as a pattern.
Now you need to check the match for truthiness:
match = re.search(pattern, string)
if match:
    pass  # do something with match here
As of Python 3.8, you can use the walrus operator, as in
if (match := re.search(pattern, string)) is not None:
    pass  # do something with match here

try this:
re.findall(r'a.*-b.*-c',message)

The first part could use a quantifier {2,4} instead of 3. The dot matches any character except a newline, whereas [a-zA-Z0-9] matches an upper- or lowercase char a-z or a digit:
\b[a-zA-Z0-9]{2,4}-[a-zA-Z0-9]{2}-[a-zA-Z0-9]\b
You could add word boundaries \b or anchors ^ and $ on either side if the characters should not be part of a longer word.
For the second pattern you could also use \d with a quantifier to match the digits and an optional group to match the part with / and 4 digits:
\d{1,2}/\d{2}(?:/\d{4})?
Note that the pattern only checks the format; it does not validate the date itself. A more specific date pattern can be built if stricter checking is needed.
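To see both patterns from this answer at work, here is a minimal sketch. The sample message is made up for illustration and is not from the question; only the two regexes come from the answer above.

```python
import re

# Hypothetical sample message (an assumption, not from the original question).
message = "Codes aaa-b3-c, a3-b6-c and aaaa-bb-c arrived on 5/21 and 5/21/2019."

# Pattern for the three hyphenated codes, taken from the answer above.
code_pattern = re.compile(r"\b[a-zA-Z0-9]{2,4}-[a-zA-Z0-9]{2}-[a-zA-Z0-9]\b")
# Pattern for M/DD with an optional /YYYY part, taken from the answer above.
date_pattern = re.compile(r"\d{1,2}/\d{2}(?:/\d{4})?")

codes = code_pattern.findall(message)
dates = date_pattern.findall(message)

print(codes)  # ['aaa-b3-c', 'a3-b6-c', 'aaaa-bb-c']
print(dates)  # ['5/21', '5/21/2019']
```

Because the optional year uses a non-capturing group (?:...), findall returns the whole match rather than just the group contents.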

Here, we might just want to write three expressions, one per pattern, and connect them using logical ORs (alternation). If we had more patterns, we could simply add them to the list, similar to:
([a-z]+-[a-z]+[0-9]+-[a-z]+)
([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])
([a-z]+-[a-z]+-[a-z])
which combine to:
([a-z]+-[a-z]+[0-9]+-[a-z]+)|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])|([a-z]+-[a-z]+-[a-z])
Then, we might want to bound it with start and end chars:
^([a-z]+-[a-z]+[0-9]+-[a-z]+)$|^([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])$|^([a-z]+-[a-z]+-[a-z])$
or
^(([a-z]+-[a-z]+[0-9]+-[a-z]+)|([a-z]+[0-9]+-[a-z]+[0-9]+-[a-z])|([a-z]+-[a-z]+-[a-z]))$
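As a rough sketch of how the anchored alternation behaves in Python, the three alternatives can be joined in one compiled pattern. The last two sample strings are made up to show rejected inputs; a non-capturing group is used here only to keep the example short.

```python
import re

# The three alternatives from above, joined with | inside one anchored group.
pattern = re.compile(
    r"^(?:[a-z]+-[a-z]+[0-9]+-[a-z]+"
    r"|[a-z]+[0-9]+-[a-z]+[0-9]+-[a-z]"
    r"|[a-z]+-[a-z]+-[a-z])$"
)

# The first three samples are the patterns from the question;
# the last two are hypothetical negatives.
samples = ["aaa-b3-c", "a3-b6-c", "aaaa-bb-c", "aaa-b3-c-extra", "AAA-B3-C"]
results = {s: bool(pattern.match(s)) for s in samples}

for sample, matched in results.items():
    print(sample, matched)
```

With the anchors in place, the pattern only accepts strings that consist of exactly one of the three shapes, so this form suits validating a single token rather than scanning a longer message.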
RegEx
If this expression isn't quite what you need, it can be modified and tested on regex101.com.

Related

Regex for parsing uid from URL

I am trying to parse UIDs from URLs. However, regex is not something I am good at, so I am looking for some help.
Example Input:
https://example.com/d/iazs9fEil/somethingelse?foo=bar
Example Output:
iazs9fEil
What I've tried so far is
([/d/]+[\d\x])\w+
Which somehow works, but returns it with the /d/ prefix, so the output is /d/iazs9fEil.
How to change the regex to not contain the /d/ prefix?
EDIT:
I've tried this regex ([^/d/]+[\d\x])\w+ which outputs the correct string which is iazs9fEil, but also returns the rest of the url, so here it is somethingelse?foo=bar
In short, you may use
match = re.search(r'/d/(\w+)', your_string)  # Look for a match
if match:  # Check if there is a match first
    print(match.group(1))  # Now, get Group 1 value
NOTE
/ is not a special regex metacharacter, so there is no need to escape it in Python string patterns.
([/d/]+[\d\x])\w+ matches and captures into Group 1 one or more slashes or d characters (see [/d/]+, a positive character class), then a digit or \x (here Python raises sre_constants.error: incomplete escape \x; the intent was probably a literal x, but that is not how it is parsed), and then matches one or more word chars. Because /d/ was put into a character class, it stopped matching a character sequence: [/d/]+ matches slashes and d characters in any order and amount, and that text ends up in Group 1.
Try (?<=/d/)[^/]+
Explanation:
(?<=/d/) - positive lookbehind, asserts that what precedes is /d/
[^/]+ - match one or more characters other than /, so it matches everything until /
You could use a capturing group:
https?://.*?/d/([^/\s]+)
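For comparison, here is a small sketch that runs the three suggestions from the answers above against the example URL from the question. The variable names are only for illustration.

```python
import re

url = "https://example.com/d/iazs9fEil/somethingelse?foo=bar"

# 1. Capturing group after the literal /d/ prefix.
m1 = re.search(r"/d/(\w+)", url)
# 2. Fixed-width lookbehind, so the whole match is the UID itself.
m2 = re.search(r"(?<=/d/)[^/]+", url)
# 3. Capturing group that also requires the scheme in front.
m3 = re.search(r"https?://.*?/d/([^/\s]+)", url)

uid1 = m1.group(1) if m1 else None
uid2 = m2.group(0) if m2 else None
uid3 = m3.group(1) if m3 else None

print(uid1, uid2, uid3)  # iazs9fEil iazs9fEil iazs9fEil
```

The three differ mainly in what they tolerate: \w+ stops at the first non-word character, while [^/]+ and [^/\s]+ would also accept characters such as hyphens inside the UID.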

Python regex number by looking behind

I am extracting numbers in such format string.
AB1234
AC1234
AD1234
As you can see, A is always present and the second char can be anything except ". I wrote the code below to extract the number.
re.search(r'(?<=A[^"])\d*',input)
But I encountered an error.
look-behind requires fixed-width pattern
So is there a convenient way to extract the numbers? For now, I only know how to search twice to get them. Thanks in advance.
Note: A stands for a pattern; in fact, A is a word in a long string.
The regex in your example works, so I'm guessing your actual pattern has variable width character matches (*, +, etc). Unfortunately, regex look behinds do not support those. What I can suggest as an alternative, is to use a capture group and extract the matching string -
m = re.search(r'A\D+(\d+)', s)
if m:
    r = m.group(1)
Details
A # your word
\D+ # anything that is not a digit
( # capture group
\d+ # 1 or more digits
)
If you want to take care of double quotes, you can make a slight modification to the regular expression by including a character class -
r'A[^\d"]+(\d+)'
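A minimal sketch of the capture-group approach, together with the error that a variable-width lookbehind triggers in Python's re module. The input list and the helper name extract_number are hypothetical, chosen only for this example.

```python
import re

# Capture-group pattern from the answer above (quote-aware variant).
pattern = re.compile(r'A[^\d"]+(\d+)')

def extract_number(text):
    """Return the digits following A and its non-digit, non-quote chars, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

# Hypothetical inputs modeled on the question's examples.
lines = ['AB1234', 'AC1234', 'A"1234', 'A--5678']
numbers = [extract_number(line) for line in lines]
print(numbers)  # ['1234', '1234', None, '5678']

# A lookbehind containing a variable-width quantifier is rejected at compile time.
error_message = None
try:
    re.compile(r'(?<=A\D+)\d+')
except re.error as exc:
    error_message = str(exc)
print(error_message)
```

The capture group sidesteps the fixed-width restriction because the prefix is matched normally and only the group contents are read back.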
Try using a capture group instead of the lookbehind:
re.search(r'A[^"](\d*)', input)  # the digits are in group(1)

How to search part of pattern in regex python

I can match the pattern as it is. But can I search for only part of the pattern, or do I have to pass that part separately?
e.g. pattern = r'/(\w+)/(.+?)'
I can search for this pattern using re.search and then use group to get the individual groups.
But can I search only for, say, (\w+)?
e.g.
pattern = r'/(\w+)/(.+?)'
pattern_match = re.search(pattern, string)
print(pattern_match.group(1))
Can I search for just part of the pattern, e.g. pattern.group(1) or something similar?
You can make any part of a regular expression optional by wrapping it in a non-capturing group followed by a ?, i.e. (?: ... )?.
pattern = r'/(\w+)(?:/(.+))?'
This will match /abc/def as well as /abc.
In both examples pattern_match.group(1) will be abc, but pattern_match.group(2) will be def in the first one and None in the second one, because the optional group did not take part in the match.
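A quick way to check what each group holds is to print groups() for both inputs. This is only a sketch using the two example strings mentioned above; note that in Python's re module a group that did not participate in the match is reported as None.

```python
import re

# Optional second segment, as in the pattern above.
pattern = re.compile(r"/(\w+)(?:/(.+))?")

full = pattern.search("/abc/def")
short = pattern.search("/abc")

print(full.groups())   # ('abc', 'def')
print(short.groups())  # ('abc', None)

# group(2, default) is not available, but groups(default=...) can supply a fallback.
print(short.groups(default=""))  # ('abc', '')
```

If an empty string is more convenient than None downstream, groups(default="") gives that without an extra check.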
For further reference, have a look at (?:x) in the special characters table at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions
EDIT
Changed the second group to (.+), since I assume you want to match more than one character. .+ is called a "greedy" match, which will try to match as much as possible. .+? on the other hand is a "lazy" match that will only match the minimum number of characters necessary. In case of /abc/def, this will only match the d from def.
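The difference between the greedy and lazy forms is easy to check directly. The sketch below uses the /abc/def example from this answer; the third pattern, with a trailing $, is an extra illustration that is not part of the original answer.

```python
import re

text = "/abc/def"

# Greedy: .+ takes as much as it can.
greedy = re.search(r"/(\w+)/(.+)", text).group(2)
# Lazy: .+? stops as soon as the overall pattern can succeed.
lazy = re.search(r"/(\w+)/(.+?)", text).group(2)
# Lazy, but forced to reach the end of the string by the $ anchor.
lazy_anchored = re.search(r"/(\w+)/(.+?)$", text).group(2)

print(greedy)         # def
print(lazy)           # d
print(lazy_anchored)  # def
```

A lazy quantifier at the very end of a pattern has nothing after it to stop for, which is why it captures only a single character here.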
That pattern is merely a character string; send the needed slice however you want. For instance:
re.search(pattern[:6], string)
uses only the first 6 characters of your pattern. If you need to detect the end of the first pattern -- and you have no intervening right-parens -- you can use
rparen_pos = pattern.index(')')
re.search(pattern[:rparen_pos+1], string)
Another possibility is
pat1 = r'/(\w+)'
pat2 = r'/(.+?)'
big_match = re.search(pat1+pat2, string)
small_match = re.search(pat1, string)
You can get more inventive with backreferences (\1, \2, etc. in Python; $1, $2 in some other regex flavors); see the links below for more help.
http://flockhart.virtualave.net/RBIF0100/regexp.html
https://docs.python.org/2/howto/regex.html

regex: Only match a hierarchical number in python when the whole number is correct

In Python 2.7 I want to match a hierarchical number only when the whole number is correct.
my_str1 = "10.2.15"
my_str2 = "10..2.15"
my_str3 = "10.2..15"
My regex is:
pattern = re.compile(r"^\d+\.?\d+\.?\d+")
This matches my_str1 and also my_str3 (though only part of it).
I want no match for my_str3, just as for my_str2. What do I have to change in the regex?
Thank you.
You need to also use an end of string anchor $ that will force the match from the start (^) to the end of string.
^\d+\.?\d+\.?\d+$
See demo
If you need to allow optional . + digits sequences, use this version with grouping and a limiting quantifier:
^\d+(?:\.?\d+){0,2}$
You can adjust the min/max values as needed.
See another demo
You can also use a negative lookahead to reject any string in which .. occurs:
^(?!.*\.\.)[\d.]+$
See demo:
https://regex101.com/r/vV1wW6/29
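To compare the two suggestions, here is a small sketch that runs both against the three strings from the question. The extra sample ".5." is an assumption added to show one place where the patterns differ.

```python
import re

# End-anchored version of the question's pattern.
anchored = re.compile(r"^\d+\.?\d+\.?\d+$")
# Lookahead version that only forbids two consecutive dots.
no_double_dot = re.compile(r"^(?!.*\.\.)[\d.]+$")

samples = ["10.2.15", "10..2.15", "10.2..15", ".5."]  # ".5." is a made-up edge case

anchored_results = {s: bool(anchored.match(s)) for s in samples}
lookahead_results = {s: bool(no_double_dot.match(s)) for s in samples}

print(anchored_results)
# {'10.2.15': True, '10..2.15': False, '10.2..15': False, '.5.': False}
print(lookahead_results)
# {'10.2.15': True, '10..2.15': False, '10.2..15': False, '.5.': True}
```

Both reject the doubled dots, but the lookahead version accepts leading or trailing dots, so the anchored form is probably the safer choice if those should be invalid as well.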

Using anchors in python regex to get exact match

I need to validate a version number consisting of 'v' plus positive int, and nothing else
e.g. "v4", "v1004"
I have
import re
pattern = r"\Av(?=\d+)\W"
m = re.match(pattern, "v303")
if m is None:
    print("noMatch")
else:
    print("match")
But this doesn't work! Removing the \A and \W will match for v303 but will also match for v30G, for example
Thanks
Pretty straightforward. First, put anchors on your pattern:
"^patternhere$"
Now, let's put together the pattern:
"^v\d+$"
That should do it.
I think you may want \b (word boundary) rather than \A (start of string) and \W (non word character), also you don't need to use lookahead (the (?=...)).
Try: "\bv(\d+)" if you need to capture the int, "\bv\d+" if you don't.
Edit: You probably want to use raw string syntax for Python regexes, r"\bv\d+\b", since "\b" is a backspace character in a regular string.
Edit 2: Since + is "greedy", no trailing \b is necessary or desired.
Simply use
\bv\d+\b
Or enclose it in anchors, ^\bv\d+\b$, to match the entire string.
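Here is a hedged sketch comparing the anchored pattern with the unanchored \bv\d+ variant on the inputs mentioned in the question. The helper is_version and its use of re.fullmatch are my own illustration, not something proposed in the answers; fullmatch behaves like ^...$ but does not tolerate a trailing newline.

```python
import re

def is_version(text):
    """True only if the whole string is 'v' followed by one or more digits."""
    return re.fullmatch(r"v\d+", text) is not None

anchored = re.compile(r"^v\d+$")
prefix_only = re.compile(r"\bv\d+")  # no end anchor or trailing \b

# For each sample: (anchored result, prefix-only result).
checks = {
    s: (anchored.match(s) is not None, prefix_only.match(s) is not None)
    for s in ["v4", "v1004", "v30G", "v"]
}
print(checks)
# {'v4': (True, True), 'v1004': (True, True), 'v30G': (False, True), 'v': (False, False)}
print(is_version("v303"), is_version("v30G"))  # True False
```

The v30G row shows why some end condition is needed for validation: without it, re.match is satisfied by the v30 prefix.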
