Python Regex find all matches after specific word

I have a string as below
"Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
I'm struggling to write a Python function that finds all the aliases and puts them in a list. So basically, I need the list ['alias1.myserver.mysite.com', 'myserver.mysite.com'].
I tried the following code
pattern = '(?<=Aliases: )([\S*]+)'
name = re.findall(pattern, mystring)
but it only matches the first alias and not both of them.
Any ideas on this?
Greatly appreciated!

Try the following:
import re
s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
l = re.findall(r'\S+', s.split('Aliases: ')[1])
print(l)
Prints:
['alias1.myserver.mysite.com', 'myserver.mysite.com']
Explanation
First we split the string into two pieces and keep the second piece with s.split('Aliases: ')[1]. This evaluates to the part of the string that follows 'Aliases: '.
Next we use findall with the regular expression:
\S+
This matches every run of one or more consecutive non-whitespace characters.
In this case it can be done more simply without a regex, because str.split() with no arguments splits on any run of whitespace:
s = "Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
l = s.split('Aliases: ')[1].split()
print(l)

Try this:
import re
regex = re.compile(r'[\n\r\t]')
t="Server: myserver.mysite.com\r\nAddress: 111.122.133.144\r\n\r\nName: myserver.mysite.com\r\nAddress: 123.144.412.111\r\nAliases: alias1.myserver.mysite.com\r\n\t myserver.mysite.com\r\n\r\n"
t = regex.sub(" ", t)
t = t.split("Aliases:")[1].strip().split()
print(t)

Related

Regular expression for YYYY-MM-DDTHH:MM:SS is not detecting the presence of .00Z [duplicate]

Suppose I have a string like test-123.
I want to test whether it matches a pattern like test-<number>, where <number> means one or more digit symbols.
I tried this code:
import re
correct_string = 'test-251'
wrong_string = 'test-123x'
regex = re.compile(r'test-\d+')
if regex.match(correct_string):
    print('Matching correct string.')
if regex.match(wrong_string):
    print('Matching wrong_string.')
How can I make it so that only the correct_string matches, and the wrong_string doesn't? I tried using .search instead of .match but it didn't help.

Try specifying start and end anchors in your regex:
re.compile(r'^test-\d+$')

For an exact match, use regex = r'^(some-regex-here)$'
^ : Start of string
$ : End of string

Since Python 3.4 you can use re.fullmatch to avoid adding ^ and $ to your pattern.
>>> import re
>>> p = re.compile(r'\d{3}')
>>> bool(p.match('1234'))
True
>>> bool(p.fullmatch('1234'))
False
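Applied to the strings from the question, this could look like the following minimal sketch (assuming Python 3.4 or later):

```python
import re

regex = re.compile(r'test-\d+')

for candidate in ('test-251', 'test-123x'):
    # fullmatch succeeds only if the pattern consumes the whole string.
    if regex.fullmatch(candidate):
        print(candidate, 'matches')
    else:
        print(candidate, 'does not match')
```

Note that fullmatch is slightly stricter than ^...$: with $, a string that ends in a newline, such as 'test-251\n', still matches, while fullmatch rejects it.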

I think this may help you:
import re
pattern = r"test-[0-9]+$"  # re.match already anchors at the start, so only $ is needed
s = input()
if re.match(pattern, s):
    print('matched')
else:
    print('not matched')

You can try re.findall():
import re
correct_string = 'test-251'
# Note: without anchors, findall also finds 'test-123' inside 'test-123x'.
if len(re.findall(r"test-\d+", correct_string)) > 0:
    print("Match found")

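A pattern such as \btest-\d+\b should work:
matches = re.search(r'\btest-\d+\b', search_string)
This requires word boundaries on both sides of the match, so a word character (letter, digit, or underscore) immediately after the digits, as in test-123x, prevents the match. Unlike ^ and $, it still finds test-123 inside a longer string such as 'run test-123 now'.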

How to start at a specific letter and end when it hits a digit?

I have some sample strings:
s = 'neg(able-23, never-21) s2-1/3'
i = 'amod(Market-8, magical-5) s1'
I can figure out whether the string ends with 's1' or 's3' using:
word = re.search(r's\d$', s)
But if I want to know whether the string contains 's2-1/3', it won't work.
Is there a regex that works for both cases, 's#' and 's#+'?
Thanks!

You can allow the characters "-" and "/" to be captured as well, in addition to just digits. It's hard to tell the exact pattern you're going for here, but something like this would capture "s2-1/3" from your example:
import re
s = "neg(able-23, never-21) s2-1/3"
word = re.search(r"s\d[-/\d]*$", s)
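A quick check of that pattern against both sample strings from the question (a sketch; the helper name trailing_tag is made up for illustration):

```python
import re

# 's', one digit, then any mix of '-', '/' and digits up to the end of the string.
TAG = re.compile(r"s\d[-/\d]*$")

def trailing_tag(text):
    """Return the trailing 's<digit>...' token, or None if there is none."""
    match = TAG.search(text)
    return match.group() if match else None

print(trailing_tag('neg(able-23, never-21) s2-1/3'))  # s2-1/3
print(trailing_tag('amod(Market-8, magical-5) s1'))   # s1
```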

I'm guessing you might want to extract the parts with an expression such as:
(s\d+)-?(.*)$
or:
(s\d+)-?([0-9]+)?\/?([0-9]+)?$
Test
import re
expression = r"(s\d+)-?(.*)$"
string = """
neg(able-23, never-21) s211-12/31
neg(able-23, never-21) s2-1/3
amod(Market-8, magical-5) s1
"""
print(re.findall(expression, string, re.M))
Output
[('s211', '12/31'), ('s2', '1/3'), ('s1', '')]

Getting word from string

How can I get the word example from a string like this:
str = "http://test-example:123/wd/hub"
I wrote something like this:
print(str[10:str.rfind(':')])
but it doesn't work correctly if the string is, for example,
"http://tests-example:123/wd/hub"

You can use a regex with lookarounds to capture the value that is preceded by a hyphen and followed by a colon:
(?<=-).+(?=:)
Python code:
import re
str = "http://test-example:123/wd/hub"
print(re.search(r'(?<=-).+(?=:)', str).group())
Output:
example
A non-regex way to get the same result is to use two splits:
str = "http://test-example:123/wd/hub"
print(str.split(':')[1].split('-')[1])
Prints:
example

You can use the following non-regex approach if you know that example is a 7-letter word:
s.split('-')[1][:7]
For any arbitrary word, that would change to:
s.split('-')[1].split(':')[0]

There are several ways.
Using splitting:
example_str = str.split('-')[-1].split(':')[0]
This is fragile, and could break if there are more hyphens or colons in the string.
Using a regex:
import re
pattern = re.compile(r'-(.*):')
example_str = pattern.search(str).group(1)
This still expects a particular format, but is more easily adaptable (if you know how to write regexes).
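Since the string is a URL, another option (not one proposed in the answers here) is to let the standard library isolate the host name first, so that hyphens or colons elsewhere in the URL cannot interfere. A minimal sketch, assuming the wanted word is everything after the first hyphen in the host name; the function name word_after_hyphen is hypothetical:

```python
from urllib.parse import urlsplit

def word_after_hyphen(url):
    """Return the part of the host name after its first hyphen, or None."""
    # urlsplit(...).hostname drops the scheme, port and path (and lowercases
    # the host), e.g. 'test-example' for 'http://test-example:123/wd/hub'.
    host = urlsplit(url).hostname
    if host is None or '-' not in host:
        return None
    return host.split('-', 1)[1]

print(word_after_hyphen("http://test-example:123/wd/hub"))
print(word_after_hyphen("http://tests-example:123/wd/hub"))
```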

I am not sure why you want to get a particular word from a string. I guess you want to check whether the word is present in the given string.
If that is the case, the code below can be used.
import re
str1 = "http://tests-example:123/wd/hub"
matched = re.findall('example',str1)

Split on the -, and then on :
s = "http://test-example:123/wd/hub"
print(s.split('-')[1].split(':')[0])
#example

Using re:
import re
text = "http://test-example:123/wd/hub"
m = re.search('(?<=-).+(?=:)', text)
if m:
    print(m.group())

Python strings have a built-in find method:
a="http://test-example:123/wd/hub"
b="http://test-exaaaample:123/wd/hub"
print(a.find('example'))
print(b.find('example'))
This prints:
12
-1
The result is the index of the found substring; -1 means the substring was not found. You can also use the in keyword:
>>> 'example' in 'http://test-example:123/wd/hub'
True

Breaking up substrings in Python based on characters

I am trying to write code that will take a string and extract specific data from it. I know that the data will look like the line below, and I only need the data within the " " marks, not the marks themselves.
inputString = 'type="NN" span="123..145" confidence="1.0" '
Is there a way to take a substring of a string that lies between two characters marking the start and stop points?

You can extract all the text between pairs of " characters using regular expressions:
import re
inputString = 'type="NN" span="123..145" confidence="1.0" '
pat = re.compile('"([^"]*)"')
strings = []  # collected matches
while True:
    mat = pat.search(inputString)
    if mat is None:
        break
    strings.append(mat.group(1))
    inputString = inputString[mat.end():]  # continue after this match
print(strings)
or, easier:
import re
inputString = 'type="NN" span="123..145" confidence="1.0" '
strings = re.findall('"([^"]*)"', inputString)
print(strings)
Output for both versions:
['NN', '123..145', '1.0']

fields = inputString.split('"')
# The odd-indexed fields are the quoted values.
print(fields[1], fields[3], fields[5])

You could split the string on whitespace to get a list of 'key="value"' substrings and then use regular expressions to parse the substrings.
Using your input string:
>>> input_string = 'type="NN" span="123..145" confidence="1.0" '
>>> input_string_split = input_string.split()
>>> print(input_string_split)
['type="NN"', 'span="123..145"', 'confidence="1.0"']
Then use regular expressions:
>>> import re
>>> pattern = r'"([^"]+)"'
>>> for substring in input_string_split:
...     match_obj = re.search(pattern, substring)
...     print(match_obj.group(1))
...
NN
123..145
1.0
The regular expression '"([^"]+)"' matches anything within quotation marks (provided there is at least one character). The round brackets indicate the bit of the regular expression that you are interested in.
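If the attribute names are needed as well, a second capture group can collect them in the same pass. A minimal sketch, assuming every attribute has the form name="value" with a name made of word characters:

```python
import re

input_string = 'type="NN" span="123..145" confidence="1.0" '

# Two capture groups: the attribute name and the text inside the quotes.
pairs = re.findall(r'(\w+)="([^"]*)"', input_string)
attributes = dict(pairs)

print(attributes)
# {'type': 'NN', 'span': '123..145', 'confidence': '1.0'}
```

With two groups in the pattern, re.findall returns a list of (name, value) tuples, which dict() accepts directly.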
