Traceback for regular expression - Python

Let's say I have a regular expression:
match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback')  # here I want details of the regex matching process
If the regular expression fails to match, I want the exception to show how the matching went and where it failed to match the pattern, at what stage, etc. Is it even possible to achieve this?

I have something that helps me debug complex regex patterns in my code. Does this help you?
import re

li = ('ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__12 ty--\t\t ghtr789\n'
      'qfgqrgqrg',
      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',
      '2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877',
      '9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798',
      'ksjdhfqsd\n'
      '5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
      'qfgqrgqrg',
      '6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',
      '25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877',
      '9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798')

tupleRE = ('^\d',
           ' ',
           '\d{5}',
           ' ',
           '[abcdefghi]+',
           ' ',
           '(?=[a-z\d_ ]{14} [^ ]+\t\t ght)',
           '[a-z]+',
           '__',
           '[\d]+',
           ' +',
           '[^\t]+',
           '\t\t',
           ' ',
           'ght',
           '(r[5-9]+|u[0-4]+)',
           '$')

def REtest(ch, tupleRE, flags=re.MULTILINE):
    for n in xrange(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        testmatch = regx.search(ch)
        if not testmatch:
            print '\n -*- tupleRE :\n'
            print '\n'.join(str(i).zfill(2)+' '+repr(u)
                            for i, u in enumerate(tupleRE[:n]))
            print ' --------------------------------'
            # tupleRE stops working because of element n
            print str(n).zfill(2)+' '+repr(tupleRE[n])\
                  +" doesn't match anymore from this line "\
                  +str(n)+' of tupleRE'
            print '\n'.join(str(n+1+j).zfill(2)+' '+repr(u)
                            for j, u in enumerate(tupleRE[n+1:
                                                          min(n+2, len(tupleRE))]))
            # Find the longest prefix of tupleRE that still matches
            for i in xrange(n):
                match = re.search(''.join(tupleRE[:n-i]), ch, flags)
                if match:
                    break
            matching_portion = match.group()
            matching_li = '\n'.join(map(repr,
                                        matching_portion.splitlines(True)[-5:]))
            fin_matching_portion = match.end()
            print ('\n\n -*- Part of the tested string which is concerned :\n\n'
                   '######### matching_portion ########\n'+matching_li+'\n'
                   '##### end of matching_portion #####\n'
                   '-----------------------------------\n'
                   '######## unmatching_portion #######')
            print '\n'.join(map(repr,
                                ch[fin_matching_portion:
                                   fin_matching_portion+300].splitlines(True)))
            break
    else:
        print '\n SUCCESS . The regex integrally matches.'

for x in li:
    print ' -*- Analyzed string :\n%r' % x
    REtest(x, tupleRE)
    print '\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm'
Result:
-*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__12 ty--\t\t ghtr789\nqfgqrgqrg'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg'
-*- tupleRE :
00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
--------------------------------
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' doesn't match anymore from this line 6 of tupleRE
07 '[a-z]+'
-*- Part of the tested string which is concerned :
######### matching_portion ########
'5 12478 abdefgcd '
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n'
-*- tupleRE :
00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
07 '[a-z]+'
08 '__'
09 '[\\d]+'
10 ' +'
11 '[^\t]+'
12 '\t\t'
13 ' '
14 'ght'
15 '(r[5-9]+|u[0-4]+)'
--------------------------------
16 '$' doesn't match anymore from this line 16 of tupleRE
-*- Part of the tested string which is concerned :
######### matching_portion ########
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'940\n'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877'
-*- tupleRE :
00 '^\\d'
--------------------------------
01 ' ' doesn't match anymore from this line 1 of tupleRE
02 '\\d{5}'
-*- Part of the tested string which is concerned :
######### matching_portion ########
'2'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'5 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798'
-*- tupleRE :
00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
--------------------------------
05 ' ' doesn't match anymore from this line 5 of tupleRE
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
-*- Part of the tested string which is concerned :
######### matching_portion ########
'9 54879 bbde'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'Yddf antarctic__13 18:13pomodoro\t\t ghtr6798'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm

I've used Kodos (http://kodos.sourceforge.net/about.html) in the past to perform RegEx debugging. It's not the ideal solution since you want something for run-time, but it may be helpful to you.

If you need to test the re, you can probably use groups followed by * ... as in ( sometext)*.
Use this along with your desired regex, and then you should be able to pluck out your failure locations.
Then leverage the following attributes, as stated on python.org:
pos
The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.
endpos
The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
lastindex
The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.
lastgroup
The name of the last matched capturing group, or None if the group didn’t have a name, or if no group was matched at all.
re
The regular expression object whose match() or search() method produced this MatchObject instance.
string
The string passed to match() or search().
So, for a very simple example:
>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     print res.lastindex  # the groups are unnamed, so lastgroup would be None
...     # raise my exception
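As a minimal, runnable sketch of that idea (the sample text and variable names are illustrative, not from the original code): wrapping each token of the strict pattern in an optional capturing group lets lastindex report how far the match got before failing.

```python
import re

# Sketch: each token of the strict pattern becomes an optional capturing
# group, so lastindex shows which group was the last one to match.
strict = re.compile(r'the real thing')
loose = re.compile(r'(the)* (real)* (thing)*')

text = 'the real stuff'          # deliberately breaks at the third token
if not strict.search(text):
    m = loose.search(text)
    print(m.lastindex)           # 2 -> matching stopped after the second group
```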

Related

Removing different string patterns from Pandas column

I have the following column which consists of email subject headers:
Subject
EXT || Transport enquiry
EXT || RE: EXTERNAL: RE: 0001 || Copy of enquiry
EXT || FW: Model - Jan
SV: [EXTERNAL] Calculations
What I want to achieve is:
Subject
Transport enquiry
0001 || Copy of enquiry
Model - Jan
Calculations
and for this I am using the code below, which only applies the first regular expression I am passing and ignores the rest:
def clean_subject_prelim(text):
    text = re.sub(r'^EXT \|\| $', '', text)
    text = re.sub(r'EXT \|\| RE: EXTERNAL: RE:', '', text)
    text = re.sub(r'EXT \|\| FW:', '', text)
    text = re.sub(r'^SV: \[EXTERNAL]$', '', text)
    return text
df['subject_clean'] = df['Subject'].apply(lambda x: clean_subject_prelim(x))
Why is this not working? What am I missing here?
You can use
pattern = r"""(?mx)        # MULTILINE and VERBOSE modes on
^                          # start of line
(?:                        # non-capturing group start
  EXT\s*\|\|\s*(?:RE:\s*EXTERNAL:\s*RE:|FW:)?   # 'EXT ||', 'EXT || RE: EXTERNAL: RE:' or 'EXT || FW:'
|                          # or
  SV:\s*\[EXTERNAL]        # 'SV: [EXTERNAL]'
)                          # non-capturing group end
\s*                        # zero or more whitespaces
"""
df['subject_clean'] = df['Subject'].str.replace(pattern, '', regex=True)
See the regex demo.
Since re.X ((?x)) is used, you have to escape literal spaces and # chars, or just use \s* or \s+ to match zero-or-more or one-or-more whitespace characters.
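A small sketch (not from the answer) of what goes wrong without the escaping: under re.X an unescaped literal space is simply dropped from the pattern.

```python
import re

# Under re.X, unescaped spaces (and anything after '#') are dropped from
# the pattern, so r'EXT \|\|' compiles as if it were r'EXT\|\|'.
pat_stripped = re.compile(r'EXT \|\|', re.X)
pat_ws = re.compile(r'EXT\s*\|\|', re.X)   # \s survives verbose mode

print(bool(pat_stripped.search('EXT || hello')))   # False
print(bool(pat_ws.search('EXT || hello')))         # True
```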
Get rid of the $ sign in the first expression and reorder some of the regex substitutions, like this:
import pandas as pd
import re

def clean_subject_prelim(text):
    text = re.sub(r'EXT \|\| RE: EXTERNAL: RE:', '', text)
    text = re.sub(r'EXT \|\| FW:', '', text)
    text = re.sub(r'^EXT \|\|', '', text)
    text = re.sub(r'^SV: \[EXTERNAL]', '', text)
    return text

data = {"Subject": [
    "EXT || Transport enquiry",
    "EXT || RE: EXTERNAL: RE: 0001 || Copy of enquiry",
    "EXT || FW: Model - Jan",
    "SV: [EXTERNAL] Calculations"]}
df = pd.DataFrame(data)
df['subject_clean'] = df['Subject'].apply(lambda x: clean_subject_prelim(x))
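A quick sketch of why the $ anchor in the first expression was the culprit: without re.M, $ forces the match to end at the end of the string, so any subject with text after 'EXT || ' can never match.

```python
import re

subject = 'EXT || Transport enquiry'

# With '$', the pattern only matches if the string ends right after 'EXT || ':
print(re.sub(r'^EXT \|\| $', '', subject))   # unchanged
# Without the anchor, the prefix is stripped:
print(re.sub(r'^EXT \|\| ', '', subject))    # 'Transport enquiry'
```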

Matching Regex on new line Python

The following Regex gives me this output (note that I am using Python):
Which is perfect and exactly how I want it to be. However, when I run this code in Python, it works but it doesn't capture the next lines of VLANs when I use groupdict (talking about the second entry):
{'port_name': 'Te1/0/1', 'description': 'CVH10 Mgt+Clstr', 'duplex': 'Full', 'speed': '10000', 'neg': 'Off', 'link_state': 'Up', 'flow_control': 'On', 'mode': ' T', 'vlans': '(1),161-163'}
{'port_name': 'Te1/0/2', 'description': 'CVH10 VM 1', 'duplex': 'Full', 'speed': '10000', 'neg': 'Off', 'link_state': 'Up', 'flow_control': 'On', 'mode': ' T', 'vlans': '(1),11,101,110,'}
{'port_name': 'Fo2/1/1', 'description': None, 'duplex': 'N/A', 'speed': 'N/A', 'neg': 'N/A', 'link_state': 'Detach', 'flow_control': 'N/A', 'mode': None, 'vlans': None}
{'port_name': 'Te2/0/8', 'description': None, 'duplex': 'Full', 'speed': '10000', 'neg': 'Off', 'link_state': 'Down', 'flow_control': 'Off', 'mode': ' A', 'vlans': '1'}
As you can see in the regex above, the second entry matches 19 VLANs, but the Python output only gives me 4. How can I fix this?
This is the code that I'm running:
from sys import argv
import re
import pprint

pp = pprint.PrettyPrinter()
script, filename = argv

interface_details = re.compile(r'^(?P<port_name>[\w\/]+)[^\S\r\n]+(?P<description>(?!Full\b|N\/A\b)\S+(?:[^\S\r\n]+\S+)*?)?\s+(?P<duplex>Full|N\/A)\b\s+(?P<speed>[\d\w\/]+)\s+(?P<neg>[\w\/]+)\s+(?P<link_state>[\w]+)\s+(?P<flow_control>[\w\/]+)(?:(?P<mode>[^\S\r\n]+\w+)(?:[^\S\r\n]+(?P<vlans>[\d(),-]+(?:\r?\n[^\S\r\n]+[\d(),-]+)*))?)?')
local_list = []

def main():
    with open(filename) as current_file:
        for linenumber, line in enumerate(current_file, 1):
            working_dict = {}
            interface_details_result = interface_details.match(line)
            if interface_details_result is not None:
                working_dict.update(interface_details_result.groupdict())
                local_list.append(working_dict)
    for each in local_list:
        print(each)

if __name__ == '__main__':
    main()
Note that I'm using argv, so it's run as: python3 main.py test.txt
The data of the text file is listed below
>show interfaces status
Port Description Duplex Speed Neg Link Flow M VLAN
State Ctrl
--------- --------------- ------ ------- ---- ------ ----- -- -------------------
Te1/0/1 CVH10 Mgt+Clstr Full 10000 Off Up On T (1),161-163
Te1/0/2 CVH10 VM 1 Full 10000 Off Up On T (1),11,101,110,
120,130,140,150,
160,170,180,190,
200,210,230,240,
250,666,999
Fo2/1/1 N/A N/A N/A Detach N/A
Te2/0/8 Full 10000 Off Down Off A 1
Currently you are reading separate lines, so the pattern will not match the lines that contain only this:
 120,130,140,150,
What you could do is read the whole file instead using current_file.read(), and add re.M to enable multiline matching.
In your code you first update the dict and then append working_dict; since every append points at the same dict, this can result in n copies of the same (last) value.
working_dict.update(interface_details_result.groupdict())
local_list.append(working_dict)
If you want to gather all the groupdicts in a list, you can append each match directly with local_list.append(m.groupdict()):
import re
import pprint
from sys import argv

pp = pprint.PrettyPrinter()
script, filename = argv  # filename comes from the command line, as in the question

interface_details = re.compile(r'^(?P<port_name>[\w\/]+)[^\S\r\n]+(?P<description>(?!Full\b|N\/A\b)\S+(?:[^\S\r\n]+\S+)*?)?\s+(?P<duplex>Full|N\/A)\b\s+(?P<speed>[\d\w\/]+)\s+(?P<neg>[\w\/]+)\s+(?P<link_state>[\w]+)\s+(?P<flow_control>[\w\/]+)(?:(?P<mode>[^\S\r\n]+\w+)(?:[^\S\r\n]+(?P<vlans>[\d(),-]+(?:\r?\n[^\S\r\n]+[\d(),-]+)*))?)?', re.M)

def main():
    local_list = []
    with open(filename) as current_file:
        all_lines = re.finditer(interface_details, current_file.read())
        for m in all_lines:
            local_list.append(m.groupdict())
    for each in local_list:
        print(each)

if __name__ == '__main__':
    main()
You are matching line by line.
Te1/0/2 CVH10 VM 1 Full 10000 Off Up On T (1),11,101,110,
120,130,140,150,
160,170,180,190,
200,210,230,240,
250,666,999
The first line, which is
Te1/0/2 CVH10 VM 1 Full 10000 Off Up On T (1),11,101,110,
passes your regex expression.
But the following lines don't. For example, the second line is
 120,130,140,150,
and for it, interface_details.match(" 120,130,140,150,") doesn't match the regex.
Continuing #anirudh's answer:
test_str will hold the entire string data read from the file, and regex will be your regex.
finditer() will return an iterable of matches. The re.MULTILINE param enables pattern search on the entire multi-line string/data.
regex = r"^(?P<port_name>[\w\/]+)[^\S\r\n]+(?P<description>(?!Full\b|N\/A\b)\S+(?:[^\S\r\n]+\S+)*?)?\s+(?P<duplex>Full|N\/A)\b\s+(?P<speed>[\d\w\/]+)\s+(?P<neg>[\w\/]+)\s+(?P<link_state>[\w]+)\s+(?P<flow_control>[\w\/]+)(?:(?P<mode>[^\S\r\n]+\w+)(?:[^\S\r\n]+(?P<vlans>[\d(),-]+(?:\r?\n[^\S\r\n]+[\d(),-]+)*))?)?"
test_str = ("Port Description Duplex Speed Neg Link Flow M VLAN\n"
            " State Ctrl\n"
            "--------- --------------- ------ ------- ---- ------ ----- -- -------------------\n"
            "Te1/0/1 CVH10 Mgt+Clstr Full 10000 Off Up On T (1),161-163\n"
            "Te1/0/2 CVH10 VM 1 Full 10000 Off Up On T (1),11,101,110,\n"
            " 120,130,140,150,\n"
            " 160,170,180,190,\n"
            " 200,210,230,240,\n"
            " 250,666,999\n"
            "Fo2/1/1 N/A N/A N/A Detach N/A\n"
            "Te2/0/8 Full 10000 Off Down Off A 1")

for match in re.finditer(regex, test_str, re.MULTILINE):
    print(match.groupdict())
This will get you the result you need. The above solution is a combination of this answer and the code generated from this site.

How do I split a string and keep the separators using python re library?

Here is my code:
import re
string = r"('Option A' | 'Option B') & ('Option C' | 'Option D')"
word_list = re.split(r"[\(.\)]", string)
-> ['', "'Option A' | 'Option B'", ' & ', "'Option C' | 'Option D'", '']
I want the following result:
-> ["('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')"]
You can use re.findall to capture each parenthesis group:
import re
string = r"('Option A' | 'Option B') & ('Option C' | 'Option D')"
pattern = r"(\([^\)]+\))"
re.findall(pattern, string)
# ["('Option A' | 'Option B')", "('Option C' | 'Option D')"]
This also works with re.split
re.split(pattern, string)
# ['', "('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')", '']
If you want to remove the empty elements from the re.split result, you can:
[s for s in re.split(pattern, string) if s]
# ["('Option A' | 'Option B')", ' & ', "('Option C' | 'Option D')"]
How the pattern works:
( begin capture group
\( matches the character ( literally
[^\)]+ matches one or more characters that are not )
\) matches the character ) literally
) end capture group

Convert a list of tab prefixed strings to a dictionary

Some text-mining attempts here; I would like to turn the below:
a=['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
to this:
{'Colors.of.the universe':{Black:111,Grey:222,White:11},
'Movies of the week':{Mission Impossible:121,Die_Hard:123,Jurassic Park:33},
'Lands.categories.said': {Desert:33212,forest:4532,grassland:431,tundra:243451}}
I tried the code below, but it was no good:
{words[1]:words[1:] for words in a}
which gives
{'o': 'olors.of.the universe:\n',
' ': ' tundra : 243451\n',
'a': 'ands.categories.said:\n'}
It only takes a single character as the key, which is not what's needed.
A dict comprehension is an interesting approach.
a = ['Colors.of.the universe:\n',
     ' Black: 111\n',
     ' Grey: 222\n',
     ' White: 11\n',
     'Movies of the week:\n',
     ' Mission Impossible: 121\n',
     ' Die_Hard: 123\n',
     ' Jurassic Park: 33\n',
     'Lands.categories.said:\n',
     ' Desert: 33212\n',
     ' forest: 4532\n',
     ' grassland : 431\n',
     ' tundra : 243451\n']

result = dict()
current_key = None
for w in a:
    # If it starts with a space, it's an item (under a category)
    if w.startswith(' '):
        # Split the item (e.g. ' Desert: 33212\n' -> [' Desert', ' 33212\n'])
        splitted = w.split(':')
        # Set the key and the value of the item,
        # removing redundant spaces and '\n'
        # and converting the value to a number
        k, v = splitted[0].strip(), int(splitted[1].replace('\n', ''))
        result[current_key][k] = v
    # Else, it's a category
    else:
        # Remove ':' and '\n' from the category name
        current_key = w.replace(':', '').replace('\n', '')
        # If the category doesn't exist yet, create a dictionary for it
        if current_key not in result:
            result[current_key] = {}

# {'Colors.of.the universe': {'Black': 111, 'Grey': 222, 'White': 11}, 'Movies of the week': {'Mission Impossible': 121, 'Die_Hard': 123, 'Jurassic Park': 33}, 'Lands.categories.said': {'Desert': 33212, 'forest': 4532, 'grassland': 431, 'tundra': 243451}}
print(result)
That's really close to valid YAML already. You could just quote the property labels and parse. Parsing a known format is MUCH superior to dealing with, and/or inventing, your own. Even if you're just exploring base Python, exploring good practices is just as (probably more) important.
import re
import yaml

raw = ['Colors.of.the universe:\n',
       ' Black: 111\n',
       ' Grey: 222\n',
       ' White: 11\n',
       'Movies of the week:\n',
       ' Mission Impossible: 121\n',
       ' Die_Hard: 123\n',
       ' Jurassic Park: 33\n',
       'Lands.categories.said:\n',
       ' Desert: 33212\n',
       ' forest: 4532\n',
       ' grassland : 431\n',
       ' tundra : 243451\n']

# Fix spaces in property names by quoting the labels
fixed = []
for line in raw:
    match = re.match(r'^( *)(\S.*?): ?(\S*)\s*', line)
    if match:
        fixed.append('{indent}{safe_label}:{value}'.format(
            indent=match.group(1),
            safe_label="'{}'".format(match.group(2)),
            value=' ' + match.group(3) if match.group(3) else ''
        ))
    else:
        raise Exception("regex failed")

parsed = yaml.load('\n'.join(fixed), Loader=yaml.FullLoader)
print(parsed)

Rewrite file content with new content

In a program (the program itself doesn't matter), in the first lines, I open an empty file (named empty.txt).
Then I define functions, but never call them in main... so I do not actually write anything.
This is the nearly complete code:
from os import chdir
chdir('C:\\Users\\Julien\\Desktop\\PS BOT')
fic = open('empty.txt', 'r+')

def addtodic(txt):
    """Messages de la forme !add id,message ; txt='id,message' """
    fic.write(txt+'\n')
    fic.seek(0)

def checkdic(txt):
    """Messages de la forme !lien id ; txt='id' """
    for i in fic.readlines().split('\n'):
        ind = i.index(',')
        if i[:ind] == txt:
            fic.seek(0)
            return i[ind+1:]
    fic.seek(0)
    return 'Not found'
Then I launch it and, from the console, I simply call fic.write('tadam'), just to check that writing works well before moving on.
%run "C:/Users/Julien/Desktop/PS BOT/dic.py"
fic
Out[8]: <open file 'empty.txt', mode 'r+' at 0x0000000008D9ED20>
fic.write('tadam')
fic.readline()
Out[10]: 'os import chdir\n'
fic.readline()
Out[11]: "chdir('C:\\\\Users\\\\Julien\\\\Desktop\\\\PS BOT')\n"
fic.readline()
Out[12]: '\n'
fic.readline()
Out[13]: "fic=open('empty.txt','r+')\n"
fic.readlines()
Out[14]:
['\n',
'def addtodic(txt):\n',
' """Messages de la forme !add id,message ; txt=\'id,message\' """\n',
' fic.seek(0)\n',
" fic.write(txt)+'\\n'\n",
'\n',
'def checkdic(txt):\n',
' """Messages de la forme !lien id ; txt=\'id\' """\n',
" for i in fic.readline().split('\\n'):\n",
" ind=i.index(',')\n",
' if i[:ind]==txt:\n',
' fic.seek(0)\n',
' return i[ind+1:]\n',
' fic.seek(0)\n',
" return 'Not found'\n",
' \n',
'def removedic(txt):\n',
' """Messages de la forme !remove id ; txt=\'id\' """\n',
' check=True\n',
' while check:\n',
' i=fic.readline()\n',
' if i[:len(txt)]==txt: \n',
' fic.seek(0)\n',
' return check\n',
'#removedic fauxeturn check\r\n',
"#removedic faux tmp_file = open(filename,'w')\n",
' tmp_file.write(data)\n',
' tmp_file.close()\n',
' return filename\n',
'\n',
' # TODO: This should be removed when Term is refactored.\n',
' def write(self,data):\n',
' """Write a string to the default output"""\n',
' io.stdout.write(data)\n',
'\n',
' # TODO: This should be removed when Term is refactored.\n',
' def write_err(self,data):\n',
' """Write a string to the default error output"""\n',
' io.stderr.write(data)\n',
'\n',
' def ask_yes_no(self, prompt, default=None):\n',
' if self.quiet:\n',
' return True\n',
' return ask_yes_no(prompt,default)\n',
'\n',
' def show_usage(self):\n',
' """Show a usage message"""\n',
' page.page(IPython.core.usage.interactive_usage)\n',
'\n',
' def extract_input_lines(self, range_str, raw=False):\n',
' """Return as a string a set of input history slices.\n',
'\n',
' Parameters\n',
' ----------\n',
' range_str : string\n',
' The set of slices is given as a string, like "~5/6-~4/2 4:8 9",\n',
' since this function is for use by magic functions which get their\n',
' arguments as strings. The number before the / is the session\n',
' number: ~n goes n back from the current session.\n',
'\n',
' Optional Parameters:\n',
' - raw(False): by default, the processed input is used. If this is\n',
' true, the raw input history is used instead.\n',
'\n',
' Note that slices can be called with two notations:\n',
'\n',
' N:M -> standard python form, means including items N...(M-1).\n',
'\n',
' N-M -> include items N..M (closed endpoint)."""\n',
' lines = self.history_manager.get_range_by_str(range_str, raw=raw)\n',
' return "\\n".join(x for _, _, x in lines)\n',
'\n',
' def find_user_code(self, target, raw=True, py_only=False, skip_encoding_cookie=True):\n',
' """Get a code string from history, file, url, or a string or macro.\n',
'\n',
' This is mainly used by magic functions.\n',
'\n',
' Parameters\n',
' ----------\n',
'\n',
' target : str\n',
'\n',
' A string specifying code to retrieve. This will be tried respectively\n',
' as: ranges of input history (see %history for syntax), url,\n',
' correspnding .py file, filename, or an expression evaluating to a\n',
' string or Macro in the user namespace.\n',
'\n',
' raw : bool\n',
' If true (default), retrieve raw history. Has no effect on the other\n',
' retrieval mechanisms.\n',
'\n',
' py_only : bool (default False)\n',
' Only try to fetch python code, do not try alternative methods to decode file\n',
' if unicode fails.\n',
'\n',
' Returns\n',
' -------\n',
' A string of code.\n',
'\n',
' ValueError is raised if nothing is found, and TypeError if it evaluates\n',
' to an object of another type. In each case, .args[0] is a printable\n',
' message.\n',
' """\n',
' code = self.extract_input_lines(target, raw=raw) # Grab history\n',
' if code:\n',
' return code\n',
' utarget = unquote_filename(target)\n',
' try:\n',
" if utarget.startswith(('http://', 'https://')):\n",
' return openpy.read_py_url(utarget, skip_encoding_cookie=skip_encoding_cookie)\n',
' except UnicodeDecodeError:\n',
' if not py_only :\n',
' from urllib import urlopen # Deferred import\n',
' response = urlopen(target)\n',
" return response.read().decode('latin1')\n",
' raise ValueError(("\'%s\' seem to be un']
Kaboom! Does anybody have an explanation? By the way, I use Python 2.7 with Enthought Canopy.
When you open a file with 'r+', it doesn't get truncated, it still retains its old contents. To truncate it to 0 bytes, call fic.truncate(0) right after opening it.
You must seek between read and write operations on the same file object (otherwise the results are undefined because of buffering), e.g. add a fic.seek(0, 0) (or any other seek) after the write call.
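Both fixes together in a minimal, self-contained sketch (the temp-file path is just for illustration):

```python
import os
import tempfile

# 'r+' keeps the old contents, so truncate explicitly; and always seek
# when switching between writing and reading on the same file object.
path = os.path.join(tempfile.gettempdir(), 'empty.txt')
open(path, 'w').close()            # make sure the file exists

fic = open(path, 'r+')
fic.truncate(0)                    # actually empty the file
fic.write('tadam\n')
fic.seek(0)                        # required before switching from write to read
print(fic.readline())              # prints 'tadam'
fic.close()
```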
