How to convert REGEX array to String array in Python Chatbot? - python

I have a chatbot with interactive communication, built with the nltk library. I modified the Chat class to add the functions I need, and I managed to save the session. However, when I print the list that holds the session record, the output is not what I expect.
Output : [<re.Match object; span=(0, 9), match='Hello'>, <re.Match object; span=(0, 4), match='Fine,How are you'>, <re.Match object; span=(0, 6), match='Thanks'>, <re.Match object; span=(0, 3), match='bye'>]
How can I convert this array to a plain string array? I only need the match='blah blah' part. Thanks all.

Try:
l = [m.group(0) for m in matches]
where matches is the array of match objects you started with.
This will give you for l:
['Hello', 'Fine,How are you', 'Thanks', 'bye']
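A minimal, self-contained sketch of the same idea. The patterns and inputs here are made up for illustration (the chatbot's real session list is not shown in the question), and the `if m is not None` filter is an extra precaution, since re.match returns None when nothing matches:

```python
import re

# Hypothetical sample data standing in for the chatbot session.
patterns = [r"hello", r"how are you", r"bye"]
inputs = ["hello there", "how are you today", "bye now"]

# Build a list of match objects, similar to the one printed in the question.
matches = [re.match(p, s, re.IGNORECASE) for p, s in zip(patterns, inputs)]

# group(0) is the full matched text; skip entries where nothing matched.
texts = [m.group(0) for m in matches if m is not None]
print(texts)  # ['hello', 'how are you', 'bye']
```

If only part of each match is needed, m.group(1) and so on return the individual capture groups instead.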

Related

Extract from (a word) to (another word) in a string using REGEX

I'm trying to extract an entire piece of text using a regular expression, but I can't find the right syntax.
For example, this could be my string (it comes from .read()):
Here there are some stuff that can be whatever
Run: 55 / 100
Here there are some stuff that can be whatever
DOCKED: ENDMDL
Here there are some stuff that can be whatever
I want to extract everything from "Run:" to "ENDMDL".
This is what I have so far:
import re
with open("docking.txt", "r") as f:
    new_content = f.read()
pattern_tot = r'(\w{3}\W\s{3})(\d+)(\s/\s)(\d\d)(.+)(DOCKED:\sENDMDL)'
pattern_2 = r'(\w{3}\W\s{3})(\d+)(\s/\s)(\d\d)'
for i in re.finditer(pattern_2, new_content):
    print(i)
The output is:
<re.Match object; span=(6242, 6255), match='Run: 1 / 10'>
<re.Match object; span=(10453, 10466), match='Run: 2 / 10'>
<re.Match object; span=(14664, 14677), match='Run: 3 / 10'>
<re.Match object; span=(18875, 18888), match='Run: 4 / 10'>
<re.Match object; span=(23086, 23099), match='Run: 5 / 10'>
<re.Match object; span=(423401, 423416), match='Run: 100 / 10'>
With pattern_2 I get the right output (see above).
If I use pattern_tot, it does not return anything.
I understand that the problem is somewhere in the pattern_tot expression r'(\w{3}\W\s{3})(\d+)(\s/\s)(\d\d)(.+)(DOCKED:\sENDMDL)' (probably the (.+) part), but I don't know what to use instead.
You can use re.findall with a pattern that captures the substring between the two markers; it returns a list of all matches in the string:
import re
str = "Here there are some stuff that can be whatever1\
Run: 55 / 100\
Here there are some stuff that can be whatever2\
DOCKED: ENDMDL \
Here there are some stuff that can be whatever3\
Run: 80 / 100\
Here there are some stuff that can be whatever4\
DOCKED: ENDMDL "
matches = re.findall('Run:(.*?)ENDMDL', str)
print(matches)
Output:
[' 55 / 100 Here there are some stuff that can be whatever2DOCKED: ', ' 80 / 100 Here there are some stuff that can be whatever4DOCKED: ']
In your case, when reading a text file, you should enable the re.DOTALL flag so that . also matches newlines:
re.findall('Run:(.*?)ENDMDL', str, re.DOTALL)
Update:
You could also define a function to find the text between two strings:
def find_between2str(start, end, text):
    # re.escape keeps characters such as "." or ":" in start/end literal
    return re.findall(f'{re.escape(start)}(.*?){re.escape(end)}', text, re.DOTALL)
matches = find_between2str("Run:", "ENDMDL", str)
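To make the effect of re.DOTALL concrete, here is a small sketch on a multi-line string. The sample text is invented and only mimics the layout described in the question. It also shows why (.+) in pattern_tot found nothing: without the flag, . never crosses a line break, so it cannot reach an ENDMDL on a later line.

```python
import re

# Made-up stand-in for the contents of the file being read.
text = (
    "header line\n"
    "Run:   1 / 10\n"
    "some docking output\n"
    "DOCKED: ENDMDL\n"
    "filler\n"
    "Run:   2 / 10\n"
    "more docking output\n"
    "DOCKED: ENDMDL\n"
)

# Without re.DOTALL, "." stops at each newline, so nothing spans several lines.
no_flag = re.findall(r"Run:.*?ENDMDL", text)

# With re.DOTALL, "." also matches "\n"; the lazy ".*?" stops at the first ENDMDL.
blocks = [m.group(0) for m in re.finditer(r"Run:.*?ENDMDL", text, re.DOTALL)]

print(no_flag)      # []
print(len(blocks))  # 2
print(blocks[0])
```

Using group(0) keeps the "Run:" and "ENDMDL" markers in each block; a capture group around .*? would return only the text between them.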

Why on Windows, python3, os.path.abspath doesn't deal with leading slashes the same way if it's just a dir or if it's more?

On Windows, python3:
>>> print(os.path.abspath("//foo/foo.txt"))
\\foo\foo.txt
>>> print(os.path.abspath("//foo"))
\foo
on python2:
>>> print(os.path.abspath("//foo/foo.txt"))
\\foo\foo.txt
>>> print(os.path.abspath("//foo"))
\\foo
Why is this the case?
And how would you deal with this, given that I have to compare paths, some of which look like the first example and others like the second?
The only (horrible) way I have found is:
In [34]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "//foo")
Out[34]: <re.Match object; span=(0, 2), match='//'>
In [35]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "\\\\foo")
Out[35]: <re.Match object; span=(0, 2), match='\\\\'>
In [36]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "//foo/bar")
In [37]: re.match(r"^(//|\\\\)(?!.+(/|\\))", "\\\\foo\\bar")
So I end up having to do something like:
import os
import re
file_path = "//foo"
match = False
if re.match(r"^(//|\\\\)(?!.+(/|\\))", file_path):
    match = True
file_path = os.path.abspath(file_path)
if match:
    file_path = file_path.replace("\\", "\\\\")
Actually, Python 3 is right and Python 2 is not. UNC paths must be composed of at least two "components":
a server or hostname
a share name
The server and the share name make up the volume.
more info here
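For the comparison part of the question, one possible workaround (a sketch, not something the answer above proposes) is to compare the path strings after normalising only separators and case, without passing them through abspath. ntpath.normcase is the Windows flavour of os.path.normcase and can be imported on any platform; same_path is a hypothetical helper name. Note that this does not resolve relative paths, "..", or trailing separators.

```python
import ntpath


def same_path(a, b):
    """Hypothetical helper: compare two Windows-style path strings.

    ntpath.normcase converts "/" to "\\" and lowercases the string,
    so "//foo" and "\\\\foo" end up identical.
    """
    return ntpath.normcase(a) == ntpath.normcase(b)


print(same_path("//foo", "\\\\foo"))                   # True
print(same_path("//foo/foo.txt", "\\\\FOO\\foo.txt"))  # True
print(same_path("//foo", "//foo/bar"))                 # False
```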

Find string between two patterns with an AND condition in Python

I would like to identify the string of characters that lies between two patterns (lettre/ and " in the example). In addition, the identified string should not correspond to a third pattern (somth?other in the example).
Python 3.7 running on Mac OS X 10.13
import re
strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
res0_1 = re.search('lettre/.*?\"', strings[0])
res1_1 = re.search('lettre/.*?\"', strings[1])
print(res0_1)
<re.Match object; span=(0, 11), match='lettre/abc"'>
print(res1_1)
<re.Match object; span=(0, 19), match='lettre/somth?other"'>
res0_2 = re.search('lettre/(.*?\"&^[somth\?other])', strings[0])
res1_2 = re.search('lettre/(.*?\"&^[somth\?other])', strings[1])
print(res0_2)
None
print(res1_2)
None
I would like to get res0_1 for strings[0] and res1_2 for strings[1].
As I understand it, you can try this:
import re
strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
res0_1 = re.findall('lettre/(.*)\"', strings[0])
res1_2 = re.findall('lettre/(.*)\"', strings[1])
print(res0_1)
print(res1_2)
Hope it helps
I think the code below gives you what you asked for in the question.
import re
strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
for i in strings:
    if 'somth?other' not in i.split('/')[1]:
        print(i.split('/')[1].split('"')[0])
Since you do not want to get a match if there is somth?other to the right of / you may use
r'lettre/(?!somth\?other)[^"]*"'
Details
lettre/ - a literal substring
(?!somth\?other) - no somth?other substring allowed immediately to the right of the current location
[^"]* - 0+ chars other than "
" - a double quotation mark.
Try using this site instead of trial and error.
https://regex101.com/
In [7]: import re
   ...: strings = ['lettre/abc"','lettre/somth?other"','lettre/acc"','lettre/edf"de','lettre/nhy"','lettre/somth?other"']
   ...:
In [8]: c = re.compile('(?=lettre/.*?\")(^((?!.*somth\?other.*).)*$)')
In [9]: for string in strings:
   ...:     print(c.match(string))
   ...:
<re.Match object; span=(0, 11), match='lettre/abc"'>
None
<re.Match object; span=(0, 11), match='lettre/acc"'>
<re.Match object; span=(0, 13), match='lettre/edf"de'>
<re.Match object; span=(0, 11), match='lettre/nhy"'>
None
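For comparison, the shorter negative-lookahead pattern from the earlier answer, lettre/(?!somth\?other)[^"]*", can be run over the same list. The capturing group around [^"]* is an addition made here for illustration, so that only the text between lettre/ and the quote is returned:

```python
import re

strings = ['lettre/abc"', 'lettre/somth?other"', 'lettre/acc"',
           'lettre/edf"de', 'lettre/nhy"', 'lettre/somth?other"']

# (?!somth\?other) rejects the match if that text follows "lettre/" directly.
# The parentheses around [^"]* are added here to capture the inner string.
pattern = re.compile(r'lettre/(?!somth\?other)([^"]*)"')

results = []
for s in strings:
    m = pattern.search(s)
    results.append(m.group(1) if m else None)

print(results)  # ['abc', None, 'acc', 'edf', 'nhy', None]
```

Unlike the anchored pattern in the session above, this one only checks what follows lettre/ immediately, so 'lettre/edf"de' yields 'edf' and the trailing de is ignored.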

Python Regex: OR statement does not work in regex module

Hi, I want to apply the following expression to count substitutions, insertions, and deletions. However, the OR alternation does not seem to work: the regex only checks the first alternative in the parentheses.
For example:
correct_string = "20181201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
regex.fullmatch(regex_pattern, correct_string)
Output:
<regex.Match object; span=(0, 8), match='20181201', fuzzy_counts=(1, 0, 0)>
It reports one substitution because of the 5th digit, even though that digit is covered by another alternative in the OR group.
Another example:
correct_string = "20180201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
regex.fullmatch(regex_pattern, correct_string)
Output:
<regex.Match object; span=(0, 8), match='20180201'>
In this case it reports no substitutions, which is correct according to the first alternative in the OR.
How can I solve this? Thank you.
You need to use regex.ENHANCEMATCH:
By default, fuzzy matching searches for the first match that meets the given constraints. The ENHANCEMATCH flag will cause it to attempt to improve the fit (i.e. reduce the number of errors) of the match that it has found.
Python demo:
import regex
correct_string = "20181201"
regex_pattern = r"((20[0-9]{2})(0[1-9]|1[0-2])(0[1-9]|1[0-9]|2[0-9]|3[0-1])){e}"
print(regex.fullmatch(regex_pattern, correct_string, regex.ENHANCEMATCH))
# => <regex.Match object; span=(0, 8), match='20181201'>
See the online Python demo.

Match a completed string with regex pattern

I have a string:
s = '`re.``search`(*pattern*, *string*, *flags=0*)'
It is easy to produce the cleaned-up result using re.sub:
In [100]: re.sub(r'[`*]','',s)
Out[100]: 're.search(pattern, string, flags=0)'
I'd like to refactor this to use a single regex pattern that matches the cleaned string, instead of substituting.
In [101]: re.search(r'[^`*]+',s)
Out[101]: <_sre.SRE_Match object; span=(1, 4), match='re.'>
It stops at the first match, 're.', whereas I want to retrieve the complete string.
How can I accomplish this?
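One possible approach, offered as a sketch rather than an accepted answer: a single regex match is always one contiguous slice of the input, so no pattern can return the cleaned string in one match while skipping the backticks and asterisks in the middle. What does work is collecting every run of allowed characters with re.findall and joining them:

```python
import re

s = '`re.``search`(*pattern*, *string*, *flags=0*)'

# Each element is one maximal run of characters other than "`" and "*".
pieces = re.findall(r'[^`*]+', s)

# Joining the runs gives the same result as re.sub(r'[`*]', '', s).
result = ''.join(pieces)
print(result)  # re.search(pattern, string, flags=0)
```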
