Regex pattern to match the string - python

What regex pattern matches a string that starts with abc-def-xyz and ends with anything?

Update
Since you only want to match host names that begin with abc-def you can simply use str.startswith():
hosts = ['abc-def.1.desktop.rul.com',
'abc-def.2.desktop.rul.com',
'abc-def.3.desktop.rul.com',
'abc-def.4.desktop.rul.com',
'abc-def.44.desktop.rul.com',
'abc-def.100.desktop.rul.com',
'qwe-rty.100.desktop.rul.com',
'z.100.desktop.rul.com',
'192.168.1.10',
'abc-def.100abc.desktop.rul.com']
filtered_hosts = [host for host in hosts if host.startswith('abc-def')]
print(filtered_hosts)
Output
['abc-def.1.desktop.rul.com', 'abc-def.2.desktop.rul.com', 'abc-def.3.desktop.rul.com', 'abc-def.4.desktop.rul.com', 'abc-def.44.desktop.rul.com', 'abc-def.100.desktop.rul.com', 'abc-def.100abc.desktop.rul.com']
Original regex solution follows.
Let's say that your data is a list of host names such as these:
hosts = ['abc-def.1.desktop.rul.com',
'abc-def.2.desktop.rul.com',
'abc-def.3.desktop.rul.com',
'abc-def.4.desktop.rul.com',
'abc-def.44.desktop.rul.com',
'abc-def.100.desktop.rul.com',
'qwe-rty.100.desktop.rul.com',
'z.100.desktop.rul.com',
'192.168.1.10',
'abc-def.100abc.desktop.rul.com']
import re
pattern = re.compile(r'abc-def\.\d+\.')
filtered_hosts = [host for host in hosts if pattern.match(host)]
print(filtered_hosts)
Output
['abc-def.1.desktop.rul.com', 'abc-def.2.desktop.rul.com', 'abc-def.3.desktop.rul.com', 'abc-def.4.desktop.rul.com', 'abc-def.44.desktop.rul.com', 'abc-def.100.desktop.rul.com']
The regex pattern matches any string that starts with abc-def. followed by one or more digits and then a dot. Because pattern.match() only matches at the beginning of the string, no leading ^ is needed.
If you wanted to match a more generic pattern such as any sequence of 3 lowercase letters followed by a - and then another 3 lowercase letters, you could do this:
pattern = re.compile(r'[a-z]{3}-[a-z]{3}\.\d+\.')
Now the output also includes 'qwe-rty.100.desktop.rul.com'.
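As a quick check of the generic pattern, here is a minimal Python 3 sketch run against a shortened version of the host list above. The variable name generic is only illustrative.

```python
import re

# A shortened version of the host list from the answer above.
hosts = [
    'abc-def.1.desktop.rul.com',
    'qwe-rty.100.desktop.rul.com',
    'z.100.desktop.rul.com',
    '192.168.1.10',
    'abc-def.100abc.desktop.rul.com',
]

# Three lowercase letters, a hyphen, three lowercase letters,
# a dot, one or more digits, and another dot.
generic = re.compile(r'[a-z]{3}-[a-z]{3}\.\d+\.')

# match() anchors at the start of each host name.
filtered_hosts = [host for host in hosts if generic.match(host)]
print(filtered_hosts)
```

Only the first two hosts should pass: 'z.100...' has too few letters, the bare IP has none, and in '100abc' the digits are not followed directly by a dot.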

Related

Split string with multiple possible delimiters to get substring

I am trying to make a simple Discord bot that responds to some user input, and I am having difficulty parsing the response for the info I need. I am trying to get the user's "gamertag"/username, but the format is sometimes a little different.
So, my idea was to make a list of delimiter words to look for (different versions of the word gamertag, such as Gamertag:, Gamertag -, username, etc.).
Then, look line by line for a line that contains any of those delimiters.
Split the string on the first matching delimiter and strip non-alphanumeric characters.
I had it kind of working for a single line, then realized some people don't put it on the first line, so I added a line-by-line check and messed it up (on line 19, I just realized). I also thought there must be a better way than this. Please advise; my partly working code is copied below:
testString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""
applicationString = testString
gamertagSplitList = [ "gamertag", "Gamertag","Gamertag:", "gamertag:"]
#splWord = 'Gamertag'
lineNum = 0
for line in applicationString.partition('\n'):
    print(line)
    if line in gamertagSplitList:
        applicationString = line
        break
#get first line
#applicationString = applicationString.partition('\n')[0]
res = ""
#split on word, want to split on first occurrence of list of words
for splitWord in gamertagSplitList:
    if splitWord in applicationString:
        res = applicationString.split(splitWord)
        break
splitString = res[1]
#res = test_string.split(spl_word, 1)
#splitString = res[1]
#get rid of non alphaNum characters
finalString = "" #define string for output
for character in splitString:
    if(character.isalnum()):
        # if character is alphanumeric concat to finalString
        finalString = finalString + character
print(finalString)
I don't know whether this will work with all your different inputs, but you can tweak it to get what you want:
import re
gamertagSplitList = ["gamertag", "Gamertag", "Gamertag:", "gamertag:"]
applicationString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""
for line in applicationString.split('\n'):
    line = line.replace(' ', '')
    for tag in gamertagSplitList:
        if tag in line:
            gamer_tag = line.replace(tag, '', 1)
            break
print(re.sub(r'\W+', '', gamer_tag))
Output :
testGamertag
You can do it without any loops with a single regex:
import re
gamertagSplitList = ["gamertag", "Gamertag"]
applicationString = """Application
Gamertag : testGamertag
Discord - testDiscord
Age - 25"""
print(re.search(r'(' + '|'.join(gamertagSplitList) + r')\s*[:-]?\s*(\w+)\s*', applicationString)[2])
If all values in gamertagSplitList differ just by casing, you can simplify that even further:
print(re.search(r'gamertag\s*[:-]?\s*(\w+)\s*', applicationString, re.IGNORECASE)[1])
Let's take a closer look at this regex:
gamertag will match a string 'gamertag'
\s* will match any (including none) whitespace characters (space, newline, tab, etc.)
[:-]? will match either none or a single character which is either : or -
(\w+) will match 1 or more word characters (letters, digits, and the underscore). The parentheses here denote a group: a specific substring that we can extract later from the match.
By using re.IGNORECASE we make matching case-insensitive, so that the separator GaMeRtAg will also be recognised by this pattern.
The indexing part [1] means that we're interested in the first group in our pattern (remember the parentheses). The group with index 0 is always the full match, and groups from index 1 upwards represent substrings that match subexpressions in parentheses (ordered by the position of their opening ( in the regex).
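One thing the one-liner does not cover is input with no gamertag at all: re.search then returns None, and indexing it raises a TypeError. A minimal sketch of a guarded version follows. The helper name extract_gamertag and the sample inputs are made up for illustration.

```python
import re

# Hypothetical sample inputs, not taken from real applications.
samples = [
    "Application\nGamertag : testGamertag\nDiscord - testDiscord",
    "Age - 25\ngamertag- second_tag",
    "GAMERTAG:ThirdTag",
    "Discord - noTagHere",
]

pattern = re.compile(r'gamertag\s*[:-]?\s*(\w+)', re.IGNORECASE)

def extract_gamertag(text):
    """Return the captured gamertag, or None if the keyword is absent."""
    m = pattern.search(text)
    return m.group(1) if m else None

results = [extract_gamertag(s) for s in samples]
print(results)
```

The last sample contains no "gamertag" keyword, so the helper returns None for it instead of raising.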

Python : Extract mails from the string of filenames

I want to get the email address from the filenames. Here is a set of example filenames:
string1 = "benoit.m.fontaine#outlook.fr_2022-05-11T11_59_58+00_00.pdf"
string2 = "jeane_benrand#toto.pt_test.pdf"
string3 = "rosy.gray#amazon.co.uk-fdsdfsd-saf.pdf"
I would like to split each filename into two parts. The first part would contain the email and the second part is the rest. So for string2 it should give:
['jeane_benrand#toto.pt', '_test.pdf']
I tried this regex, but it does not work for the second and third strings.
email = re.search(r"[a-z0-9\.\-+_]+#[a-z0-9\.\-+_]+\.[a-z]+", string)
Thank you for your help
Given the samples you provided, you can do something like this:
import re
strings = ["benoit.m.fontaine#outlook.fr_2022-05-11T11_59_58+00_00.pdf",
"jeane_benrand#toto.pt_test.pdf",
"rosy.gray#amazon.co.uk-fdsdfsd-saf.pdf"]
pattern = r'([^#]+#[\.A-Za-z]+)(.*)'
[re.findall(pattern, string)[0] for string in strings]
Output:
[('benoit.m.fontaine#outlook.fr', '_2022-05-11T11_59_58+00_00.pdf'),
('jeane_benrand#toto.pt', '_test.pdf'),
('rosy.gray#amazon.co.uk', '-fdsdfsd-saf.pdf')]
Mail pattern explanation ([^#]+#[\.A-Za-z]+):
[^#]+: any combination of characters except #
#: at
[\.A-Za-z]+: any combination of letters and dots
Rest pattern explanation (.*)
(.*): any combination of characters
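The same pattern can be wrapped in a small helper that returns the two-element list the question asks for. This is a sketch under the same assumption as the answer, namely that the domain contains only letters and dots and is followed by some other character. The name split_filename is hypothetical.

```python
import re

strings = [
    "benoit.m.fontaine#outlook.fr_2022-05-11T11_59_58+00_00.pdf",
    "jeane_benrand#toto.pt_test.pdf",
    "rosy.gray#amazon.co.uk-fdsdfsd-saf.pdf",
]

# Group 1: everything up to '#', then '#', then letters and dots.
# Group 2: whatever is left.
pattern = re.compile(r'([^#]+#[.A-Za-z]+)(.*)')

def split_filename(name):
    """Return [email, rest], or None when the name contains no '#'."""
    m = pattern.match(name)
    return list(m.groups()) if m else None

parts = [split_filename(s) for s in strings]
for part in parts:
    print(part)
```

A domain that contains digits or hyphens, or one that is followed directly by a letter or a dot, would be split in the wrong place, so the character class may need adjusting for other data.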

Select String after string with regex in python

Imagine that we have a string like:
Routing for Networks:
0.0.0.0/32
5.6.4.3/24
2.3.1.4/32
Routing Information Sources:
Gateway Distance Last Update
192.168.61.100 90 00:33:51
192.168.61.103 90 00:33:43
Irregular IPs:
1.2.3.4/24
5.4.3.3/24
I need to get a list of IPs between "Routing for Networks:" and "Routing Information Sources:" like below:
["0.0.0.0/32","5.6.4.3/24","2.3.1.4/32"]
What I have done till now is:
Routing for Networks:\n(.+(?:\n.+)*)\nRouting
But it is not working as expected.
UPDATE:
My code is as below:
re.findall("Routing for Networks:\n(.+(?:\n.+)*)\nRouting", string)
The value of capture group 1 includes the newlines. You can split the value of capture group 1 on a newline to get the separate values.
If you want to use re.findall, you will get a list of group 1 values, and you can split every value in the list on a newline.
An example with a single group 1 match:
import re
pattern = r"Routing for Networks:\n(.+(?:\n.+)*)\nRouting"
s = ("Routing for Networks:\n"
"0.0.0.0/32\n"
"5.6.4.3/24\n"
"2.3.1.4/32\n"
"Routing Information Sources:\n"
"Gateway Distance Last Update\n"
"192.168.61.100 90 00:33:51\n"
"192.168.61.103 90 00:33:43")
m = re.search(pattern, s)
if m:
    print(m.group(1).split("\n"))
Output
['0.0.0.0/32', '5.6.4.3/24', '2.3.1.4/32']
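The re.findall variant mentioned above can be sketched as follows. The input is hypothetical: it contains two "Routing for Networks:" blocks separated by a blank line, which is what keeps the greedy group from running across both blocks.

```python
import re

pattern = r"Routing for Networks:\n(.+(?:\n.+)*)\nRouting"

# Hypothetical input with two blocks, separated by a blank line.
s = (
    "Routing for Networks:\n"
    "0.0.0.0/32\n"
    "5.6.4.3/24\n"
    "Routing Information Sources:\n"
    "\n"
    "Routing for Networks:\n"
    "2.3.1.4/32\n"
    "Routing Information Sources:\n"
)

# findall returns one group 1 value per block; split each on newlines.
blocks = [group.split("\n") for group in re.findall(pattern, s)]
print(blocks)
```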
For a bit more precise match, and if there can be multiple of the same consecutive parts, you can match the format and use an assertion for Routing instead of a match:
Routing for Networks:\n((?:(?:\d{1,3}\.){3}\d{1,3}/\d+\n)+)(?=Routing)
Example
pattern = r"Routing for Networks:\n((?:(?:\d{1,3}\.){3}\d{1,3}/\d+\n)+)(?=Routing)"
s = "..."
m = re.search(pattern, s)
if m:
    print([s for s in m.group(1).split("\n") if s])
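For a self-contained run of the stricter pattern, the sketch below uses the full sample from the question, including the "Irregular IPs:" section, and assumes the lines carry no leading indentation. The names text and networks are illustrative.

```python
import re

# The sample from the question, assuming no leading whitespace on any line.
text = (
    "Routing for Networks:\n"
    "0.0.0.0/32\n"
    "5.6.4.3/24\n"
    "2.3.1.4/32\n"
    "Routing Information Sources:\n"
    "Gateway Distance Last Update\n"
    "192.168.61.100 90 00:33:51\n"
    "192.168.61.103 90 00:33:43\n"
    "Irregular IPs:\n"
    "1.2.3.4/24\n"
    "5.4.3.3/24"
)

pattern = re.compile(
    r"Routing for Networks:\n((?:(?:\d{1,3}\.){3}\d{1,3}/\d+\n)+)(?=Routing)"
)

m = pattern.search(text)
# Group 1 ends with a newline; splitlines() does not yield a trailing empty item.
networks = m.group(1).splitlines() if m else []
print(networks)
```

The addresses under "Irregular IPs:" are not picked up, because they do not directly follow "Routing for Networks:".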

Match a word that doesn't contain a dot and isn't an IP regex

I want to take a list and filter it (in this case it's a list containing a record, a domain name, and an IP).
I want the list to look something like this:
10.0.0.10 ansible0 ben1.com
ansible1 ben1.com 10.0.0.10
In other words, you can put the IP, the zone, and the record in any order and it will still catch them.
Now I have 2 regexes, one that catches the domain (with the dot) and one that catches the IP:
Domain: [a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}
Simple IP: (?:[0-9]{1,3}\.){3}[0-9]{1,3}
With these I can catch all the domain names and all the IPs in Python and put them into lists.
Now I only need to catch the "subdomain" (in this case ansible1 and ansible0).
I want it to allow numbers and characters like - _ * and so on: anything but a dot (.).
How can I do it via regex?
You can use this regex with 3 alternations and 3 named groups:
(?P<domain>[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})|
(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})|
(?P<sub>[^\s.]+)
The named groups domain and ip use the regexes you provided. The third group, (?P<sub>[^\s.]+), matches one or more characters that are neither a dot nor whitespace. The pattern is wrapped across three lines above for readability; it is a single line in the code below.
Code:
import re
arr = ['10.0.0.10 ansible0 ben1.com', 'ansible1 ben1.com 10.0.0.10']
rx = re.compile(r'(?P<domain>[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})|(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})|(?P<sub>[^\s.]+)')
subs = []
for i in arr:
    for m in rx.finditer(i):
        if m.group('sub'):
            subs.append(m.group('sub'))
print(subs)
Output:
['ansible0', 'ansible1']
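Since each alternative is a named group, the same regex can also sort every token into its category in one pass. The sketch below relies on m.lastgroup, which holds the name of the group that matched. The function name classify is hypothetical.

```python
import re

rx = re.compile(
    r'(?P<domain>[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,})|'
    r'(?P<ip>(?:[0-9]{1,3}\.){3}[0-9]{1,3})|'
    r'(?P<sub>[^\s.]+)'
)

def classify(line):
    """Map each named group to the list of tokens it matched in line."""
    found = {'domain': [], 'ip': [], 'sub': []}
    for m in rx.finditer(line):
        # lastgroup is the name of the alternative that matched this token.
        found[m.lastgroup].append(m.group())
    return found

arr = ['10.0.0.10 ansible0 ben1.com', 'ansible1 ben1.com 10.0.0.10']
records = [classify(line) for line in arr]
for record in records:
    print(record)
```

The order of the alternatives matters: domain and ip are tried first, so the catch-all sub group only sees tokens that neither of them could match.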

Filtering a list of strings using regex

I have a list of strings that looks like this:
strlist = [
'list/category/22',
'list/category/22561',
'list/category/3361b',
'list/category/22?=1512',
'list/category/216?=591jf1!',
'list/other/1671',
'list/1y9jj9/1yj32y',
'list/category/91121/91251',
'list/category/0027',
]
I want to use regex to find the strings in this list that consist of list/category/ followed by an integer of any length and nothing else: no letters or symbols may follow it.
So in my example, the output should look like this:
list/category/22
list/category/22561
list/category/0027
I used the following code:
newlist = []
for i in strlist:
    if re.match('list/category/[0-9]+[0-9]',i):
        newlist.append(i)
        print(i)
but this is my output:
list/category/22
list/category/22561
list/category/3361b
list/category/22?=1512
list/category/216?=591jf1!
list/category/91121/91251
list/category/0027
How do I fix my regex? And also is there a way to do this in one line using a filter or match command instead of a for loop?
You can try the below regex:
^list\/category\/\d+$
Explanation of the above regex:
^ - Represents the start of the given test String.
\d+ - Matches digits that occur one or more times.
$ - Matches the end of the test string. This is the part your regex missed.
IMPLEMENTATION IN PYTHON
import re
pattern = re.compile(r"^list\/category\/\d+$", re.MULTILINE)
match = pattern.findall("list/category/22\n"
"list/category/22561\n"
"list/category/3361b\n"
"list/category/22?=1512\n"
"list/category/216?=591jf1!\n"
"list/other/1671\n"
"list/1y9jj9/1yj32y\n"
"list/category/91121/91251\n"
"list/category/0027")
print (match)
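To answer the one-line part of the question directly on the original list, one option is re.fullmatch, which requires the whole string to match and so replaces the explicit ^ and $ anchors. This is a sketch of an alternative, not the approach used in the answer above.

```python
import re

strlist = [
    'list/category/22',
    'list/category/22561',
    'list/category/3361b',
    'list/category/22?=1512',
    'list/category/216?=591jf1!',
    'list/other/1671',
    'list/1y9jj9/1yj32y',
    'list/category/91121/91251',
    'list/category/0027',
]

pattern = re.compile(r'list/category/\d+')

# fullmatch succeeds only if the entire string matches the pattern.
newlist = [s for s in strlist if pattern.fullmatch(s)]

# Equivalent form using filter(), as asked in the question.
newlist_via_filter = list(filter(pattern.fullmatch, strlist))

print(newlist)
```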
