Find string between two patterns with an AND condition in Python - python

I would like to identify the string of characters that lies between two patterns (lettre/ and " in the example). In addition, the identified string should not match a third pattern (somth?other in the example).
Python 3.7 running on macOS 10.13
import re
strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
res0_1 = re.search('lettre/.*?\"', strings[0])
res1_1 = re.search('lettre/.*?\"', strings[1])
print(res0_1)
<re.Match object; span=(0, 11), match='lettre/abc"'>
print(res1_1)
<re.Match object; span=(0, 19), match='lettre/somth?other"'>
res0_2 = re.search('lettre/(.*?\"&^[somth\?other])', strings[0])
res1_2 = re.search('lettre/(.*?\"&^[somth\?other])', strings[1])
print(res0_2)
None
print(res1_2)
None
I would like to get res0_1 for strings[0] and res1_2 for strings[1].

As I understand it, try this:
import re
strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
res0_1 = re.findall('lettre/(.*)\"', strings[0])
res1_2 = re.findall('lettre/(.*)\"', strings[1])
print(res0_1)
print(res1_2)
Hope it helps

I think the code below gives you what you asked for in the question.
import re
strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
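for i in strings:
    # Skip entries whose part after the slash contains the excluded pattern
    if 'somth?other' not in i.split('/')[1]:
        # Print the text between the slash and the first double quote
        print(i.split('/')[1].split('"')[0])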

Since you do not want to get a match if there is somth?other to the right of / you may use
r'lettre/(?!somth\?other)[^"]*"'
Details
lettre/ - a literal substring
(?!somth\?other) - no somth?other substring allowed immediately to the right of the current location
[^"]* - 0+ chars other than "
" - a double quotation mark.

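The lookahead pattern above can be tried against the list from the question. This is a minimal sketch: the capture group around [^"]* and the names pattern and results are additions for illustration, not part of the original answer.

```python
import re

strings = ['lettre/abc"', 'lettre/somth?other"', 'lettre/acc"',
           'lettre/edf"de', 'lettre/nhy"', 'lettre/somth?other"']

# Same pattern as in the answer, with a capture group (an assumption added here)
# so that only the text between lettre/ and " is returned.
pattern = re.compile(r'lettre/(?!somth\?other)([^"]*)"')

results = []
for s in strings:
    m = pattern.search(s)
    # None marks the strings rejected by the negative lookahead
    results.append(m.group(1) if m else None)

print(results)  # ['abc', None, 'acc', 'edf', 'nhy', None]
```
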
Try using this site instead of trial and error:
https://regex101.com/
In [7]: import re
   ...: strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
   ...:
In [8]: c = re.compile(r'(?=lettre/.*?")(^((?!.*somth\?other.*).)*$)')
In [9]: for string in strings:
   ...:     print(c.match(string))
   ...:
<re.Match object; span=(0, 11), match='lettre/abc"'>
None
<re.Match object; span=(0, 11), match='lettre/acc"'>
<re.Match object; span=(0, 13), match='lettre/edf"de'>
<re.Match object; span=(0, 11), match='lettre/nhy"'>
None

Related

Extract from (a word) to (another word) in a string using REGEX

I'm trying to extract an entire piece of text using a regex, but I can't find the right syntax.
For example, this could be my string (it comes from .read()):
Here there are some stuff that can be whatever
Run: 55 / 100
Here there are some stuff that can be whatever
DOCKED: ENDMDL
Here there are some stuff that can be whatever
I want to extract everything from "Run:" to "ENDMDL".
So far I have this:
import re

with open("docking.txt", "r") as f:
    new_content = f.read()
pattern_tot = r'(\w{3}\W\s{3})(\d+)(\s/\s)(\d\d)(.+)(DOCKED:\sENDMDL)'
pattern_2 = r'(\w{3}\W\s{3})(\d+)(\s/\s)(\d\d)'
for i in re.finditer(pattern_2, new_content):
    print(i)
The output is:
<re.Match object; span=(6242, 6255), match='Run:   1 / 10'>
<re.Match object; span=(10453, 10466), match='Run:   2 / 10'>
<re.Match object; span=(14664, 14677), match='Run:   3 / 10'>
<re.Match object; span=(18875, 18888), match='Run:   4 / 10'>
<re.Match object; span=(23086, 23099), match='Run:   5 / 10'>
<re.Match object; span=(423401, 423416), match='Run:   100 / 10'>
With pattern_2 I get the right output (see above).
If I use pattern_tot, it returns nothing.
I understand that the problem is somewhere in the pattern_tot regex r'(\w{3}\W\s{3})(\d+)(\s/\s)(\d\d)(.+)(DOCKED:\sENDMDL)' (probably the (.+) part). I don't really know what to use instead.
You can use re.findall with a pattern that captures the substring between the two strings; it returns a list of all matches in the string:
import re
str = "Here there are some stuff that can be whatever1\
Run: 55 / 100\
Here there are some stuff that can be whatever2\
DOCKED: ENDMDL \
Here there are some stuff that can be whatever3\
Run: 80 / 100\
Here there are some stuff that can be whatever4\
DOCKED: ENDMDL "
matches = re.findall('Run:(.*?)ENDMDL', str)
print(matches)
Output:
[' 55 / 100 Here there are some stuff that can be whatever2DOCKED: ', ' 80 / 100 Here there are some stuff that can be whatever4DOCKED: ']
In your case, since the text is read from a file and spans several lines, you should pass the re.DOTALL flag so that . also matches newlines:
re.findall('Run:(.*?)ENDMDL', str, re.DOTALL)
Update:
You could also define a function to find the text between two strings:
def find_between2str(start, end, text):
    # re.escape treats start and end as literal text, not as regex syntax
    return re.findall(f'{re.escape(start)}(.*?){re.escape(end)}', text, re.DOTALL)
matches = find_between2str("Run:", "ENDMDL", str)
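To illustrate why the flag matters, here is a small sketch that uses the sample text from the question with real newlines, as it would come from f.read(). The variable names text, without_flag and with_flag are made up for this example.

```python
import re

# Sample text from the question, with explicit newlines as read from a file
text = (
    "Here there are some stuff that can be whatever\n"
    "Run: 55 / 100\n"
    "Here there are some stuff that can be whatever\n"
    "DOCKED: ENDMDL\n"
    "Here there are some stuff that can be whatever\n"
)

# Without re.DOTALL, "." stops at the end of the line, so nothing matches
without_flag = re.findall(r'Run:(.*?)ENDMDL', text)

# With re.DOTALL, "." also matches "\n" and the block is captured
with_flag = re.findall(r'Run:(.*?)ENDMDL', text, re.DOTALL)

print(without_flag)  # []
print(with_flag)     # one captured block spanning three lines
```

This is likely also why pattern_tot returns nothing: its (.+) group cannot cross the line breaks between the "Run:" line and "DOCKED: ENDMDL".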

How do I validate this string that can be enclosed between one or more spaces (but never none) via a regex?

This is my code:
import re

input_text_l = "hjahsdjhas pal sahashjas"
regex = re.compile(r"\s*\¿?(?:pal1|pal2|pal)\s*\??")  # this is the regex that does not work correctly
if regex.search(input_text_l):
    not_tag = " ".join(regex_tag.split(input_text_l))
    #print(not_tag)
else:
    pass
And this is a simple diagram on how the regular expression should work.
I hope you can help me with this.
Don't do complicated things: \s+…\s+ is sufficient. + means exactly what you wrote: "one or more but not none".
import re
re.search(r'\s+pal\s+', input_text_l)
or for several patterns:
re.search(r'\s+(?:pal1|pal2|pal)\s+', input_text_l)
Example:
>>> input_text_l = "hjahsdjhas pal sahashjas"
>>> re.search(r'\s+(?:pal1|pal2|pal)\s+', input_text_l)
<re.Match object; span=(10, 15), match=' pal '>
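The difference between \s* and \s+ can be seen by running both against a string where the word is glued to its neighbours. This is only a sketch: the names loose and strict and the second sample string are invented for the comparison.

```python
import re

loose = re.compile(r"\s*(?:pal1|pal2|pal)\s*")   # zero or more spaces allowed
strict = re.compile(r"\s+(?:pal1|pal2|pal)\s+")  # at least one space on each side

spaced = "hjahsdjhas pal sahashjas"
glued = "hjahsdjhaspalsahashjas"   # hypothetical input with no surrounding spaces

print(strict.search(spaced))  # matches ' pal '
print(strict.search(glued))   # None, because no whitespace surrounds 'pal'
print(loose.search(glued))    # matches 'pal' inside the longer word
```

Note that the strict pattern also rejects the word at the very start or end of the string, since it requires whitespace on both sides.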

Why on Windows, python3, os.path.abspath doesn't deal with leading slashes the same way if it's just a dir or if it's more?

On Windows, python3:
>>> print(os.path.abspath("//foo/foo.txt"))
\\foo\foo.txt
>>> print(os.path.abspath("//foo"))
\foo
on python2:
>>> print(os.path.abspath("//foo/foo.txt"))
\\foo\foo.txt
>>> print(os.path.abspath("//foo"))
\\foo
Why is this the case?
And how would you deal with this, given that I have to compare paths with each other, and some look like the first example while others look like the second?
The only (horrible) way I have found to detect this is:
In [34]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "//foo")
Out[34]: <re.Match object; span=(0, 2), match='//'>
In [35]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "\\\\foo")
Out[35]: <re.Match object; span=(0, 2), match='\\\\'>
In [36]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "//foo/bar")
In [37]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "\\\\foo\\bar")
So I end up having to do something like:
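import os
import re

file_path = "//foo"
match = False
# Detect a path that starts with two slashes and has no further separator
if re.match(r"^(//|\\\\)(?!.+(/|\\))", file_path):
    match = True
file_path = os.path.abspath(file_path)
if match:
    # Restore the doubled leading backslash that abspath collapsed
    file_path = file_path.replace("\\", "\\\\")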
Actually, Python 3 is right and Python 2 is not. UNC paths must be composed of at least two "components":
a server or hostname
a share name
The server and the share name make up the volume.
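As a rough illustration of the server plus share rule, the standard ntpath module (the Windows flavour of os.path, importable on any platform) can split a full UNC path into its volume and the rest. The path below reuses the "foo" name from the question with a made-up share "bar".

```python
import ntpath  # Windows path rules, usable on any platform

# Hypothetical UNC path: server "foo", share "bar"
full_unc = "//foo/bar/baz.txt"
drive, rest = ntpath.splitdrive(full_unc)
print(drive)  # //foo/bar
print(rest)   # /baz.txt

# With a single component there is no complete server + share pair.
# The result for this input may differ between Python versions,
# so no expected output is claimed here.
print(ntpath.splitdrive("//foo"))
```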

How to convert REGEX array to String array in Python Chatbot?

I have a chatbot with interactive communication, built with the nltk library. I modified the Chat class to add the functions I need, and I managed to save the session. However, when I print the list that holds the session record, it does not print the way I expect.
Output : [<re.Match object; span=(0, 9), match='Hello'>, <re.Match object; span=(0, 4), match='Fine,How are you'>, <re.Match object; span=(0, 6), match='Thanks'>, <re.Match object; span=(0, 3), match='bye'>]
How can I convert this array to a normal string array? I just need the match='blah blah' part. Thanks all.
Try:
l = [m.group(0) for m in matches]
where matches is the array of match objects you started with.
This will give you for l:
['Hello', 'Fine,How are you', 'Thanks', 'bye']
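A self-contained version of the same idea, as a sketch: the list of match objects is rebuilt here with re.match and a catch-all pattern, which is an assumption standing in for whatever the modified Chat class actually stores.

```python
import re

# Stand-in for the session record: one match object per message (assumed setup)
messages = ["Hello", "Fine,How are you", "Thanks", "bye"]
matches = [re.match(r".+", msg) for msg in messages]

# group(0) returns the full matched text of each match object
session_strings = [m.group(0) for m in matches]
print(session_strings)  # ['Hello', 'Fine,How are you', 'Thanks', 'bye']
```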

Python Regex: OR statement does not work in regex module

Hi, I want to apply the following expression to check the counts of substitutions, insertions, and deletions. However, the OR alternation does not seem to work: the regex only checks the first alternative in the parentheses.
For example:
correct_string = "20181201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
regex.fullmatch(regex_pattern, correct_string)
Output:
<regex.Match object; span=(0, 8), match='20181201', fuzzy_counts=(1, 0, 0)>
It reports one substitution because of the 5th digit, even though that digit is allowed by another alternative in the OR group.
Another example:
correct_string = "20180201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
regex.fullmatch(regex_pattern, correct_string)
Output:
<regex.Match object; span=(0, 8), match='20180201'>
In this case it reports no substitutions, which is correct according to the first alternative in the OR group.
How can I solve this? Thank you.
You need to use regex.ENHANCEMATCH:
By default, fuzzy matching searches for the first match that meets the given constraints. The ENHANCEMATCH flag will cause it to attempt to improve the fit (i.e. reduce the number of errors) of the match that it has found.
Python demo:
import regex
correct_string = "20181201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
print(regex.fullmatch(regex_pattern, correct_string, regex.ENHANCEMATCH))
# => <regex.Match object; span=(0, 8), match='20181201'>
See the online Python demo.
