Why are the spaces between the words not showing up? - Python

I have a Morse program, but the spaces between the words are not showing up. Does anyone have any ideas? I'd prefer the simplest way to fix it.
sample input:
APRIL FOOLS DAY
output for encode_Morse function:
' .- .--. .-. .. .-.. ..-. --- --- .-.. ... -.. .- -.-- '
output for the decode_Morse function:
APRILFOOLSDAY
MORSE_CODES={'A':' .- ','B':' -... ','C':' -.-. ',
'D':' -.. ','E':' . ','F':' ..-. ','G':' --. ',
'H':' .... ','I':' .. ','J':' .--- ','K':' -.- ',
'L':' .-.. ','M':' -- ','N':' -. ','O':' --- ',
'P':' .--. ','Q':' --.- ','R':' .-. ',
'S':' ... ','T':' - ','U':' ..- ','V':' ...- ',
'W':' .-- ','X':' -..- ','Y':' -.-- ','Z':' --.. '}
##Define functions here
def encode_Morse(my_msg):
    #my_msg=my_msg.upper()
    my_msg_Morse=""
    for letter in my_msg:
        if letter!=" " and letter not in MORSE_CODES:
            my_msg_Morse+="*"
        elif letter!=" ":
            my_msg_Morse+= MORSE_CODES[letter]
        else:
            my_msg_Morse+=" "
    return my_msg_Morse+""
def decode_Morse(my_msg):
    string=""
    for word in my_msg.split(" "):
        for ch in word.split():
            if ch!=" " and ch!="*":
                string=string + list(MORSE_CODES.keys())[list(MORSE_CODES.values()).index(" "+ch+" ")]
            elif ch==" ":
                string+=" "
    string=string+""
    return string

The split function absorbs your delimiter, so the inner loop never sees a space that it could add back.
I propose:
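def decode_Morse(my_msg):
    words = []
    # Words end up three spaces apart in the encoded string: one padding space
    # from each neighbouring code plus the single space encode_Morse adds for the gap
    for word in my_msg.split("   "):
        string = ""
        for ch in word.split():
            string=string + list(MORSE_CODES.keys())[list(MORSE_CODES.values()).index(" "+ch+" ")]
        words.append(string)
    return " ".join(words)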
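To see the "absorbed delimiter" effect in isolation, here is a minimal sketch using plain strings (the sample values are made up for illustration, not taken from the question):

```python
text = "ab   cd"                # two chunks separated by three spaces

by_three = text.split("   ")    # the separator is consumed: ['ab', 'cd']
by_one = text.split(" ")        # every single space splits: ['ab', '', '', 'cd']
by_any = " a  b ".split()       # no argument: runs of whitespace collapse: ['a', 'b']

print(by_three, by_one, by_any)
```

None of the resulting pieces ever equals " ", which is why a test like ch == " " inside the loop can never be true.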

I propose this solution:
MORSE_CODES={
'A':'.-','B':'-...','C':'-.-.',
'D':'-..','E':'.','F':'..-.','G':'--.',
'H':'....','I':'..','J':'.---','K':'-.-',
'L':'.-..','M':'--','N':'-.','O':'---',
'P':'.--.','Q':'--.-','R':'.-.',
'S':'...','T':'-','U':'..-','V':'...-',
'W':'.--','X':'-..-','Y':'-.--','Z':'--..'
}
R_MORSE_CODES = {v:k for k,v in MORSE_CODES.items()}
def encode_morse(msg):
    words = msg.split()
    # One space between letters, two spaces between words
    return "  ".join(" ".join(MORSE_CODES.get(c, '*') for c in w) for w in words)
def decode_morse(msg):
    words = msg.split("  ")
    return " ".join("".join(R_MORSE_CODES.get(c, '?') for c in w.split()) for w in words)
# Original message
msg = "APRIL FOOLS DAY"
enc_msg = encode_morse(msg)
print(enc_msg)
# .- .--. .-. .. .-..  ..-. --- --- .-.. ...  -.. .- -.--
dec_msg = decode_morse(enc_msg)
print(dec_msg)
# APRIL FOOLS DAY
Deviating from your solution, I
do not use spaces in the translation table between characters and Morse codes;
use one space character to separate single Morse codes and two spaces to mark word separation.
For back translation, I reverse the dictionary keys and values into another translation table called R_MORSE_CODES for better readability.
Using one and two spaces is sufficient to decode a Morse code back to its original message, as long as no unknown characters appear.
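A small round-trip check may make the last point concrete. This is only a sketch: it uses a four-letter subset of the table and the hypothetical names MORSE, REVERSE, encode and decode, with the same one-space/two-space convention described above.

```python
# Hypothetical four-letter subset of the Morse table, for illustration only.
MORSE = {'A': '.-', 'B': '-...', 'D': '-..', 'Y': '-.--'}
REVERSE = {code: letter for letter, code in MORSE.items()}

def encode(msg):
    # one space between letters, two spaces between words, '*' for unknown characters
    return "  ".join(" ".join(MORSE.get(c, '*') for c in w) for w in msg.split())

def decode(msg):
    # '?' marks any code that is not in the reverse table
    return " ".join("".join(REVERSE.get(c, '?') for c in w.split()) for w in msg.split("  "))

print(decode(encode("BAD DAY")))   # BAD DAY
print(encode("A1"))                # .- *
print(decode(encode("A1")))        # A?
```

Known letters survive the round trip, while an unknown character such as "1" comes back as "?", so the original message cannot be recovered in that case.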

Related

how to separate characters only if they have spaces between them?

I'm making a Morse code translator in Python, and I successfully created a program that translates words into Morse code, but now I want to add an option to translate Morse code into words. While doing so, I realized that when I tried to translate a letter whose code is longer than one character, it printed out the letters E and T instead. I deduced that this was caused by adding every character to a list and translating each one separately. Is there a way I can check whether there is a space between characters and separate them only if there is?
Here is my code so far:
codes = { ' ':' ', 'A':'.-', 'B':'-...',
'C':'-.-.', 'D':'-..', 'E':'.',
'F':'..-.', 'G':'--.', 'H':'....',
'I':'..', 'J':'.---', 'K':'-.-',
'L':'.-..', 'M':'--', 'N':'-.',
'O':'---', 'P':'.--.', 'Q':'--.-',
'R':'.-.', 'S':'...', 'T':'-',
'U':'..-', 'V':'...-', 'W':'.--',
'X':'-..-', 'Y':'-.--', 'Z':'--..',
'1':'.----', '2':'..---', '3':'...--',
'4':'....-', '5':'.....', '6':'-....',
'7':'--...', '8':'---..', '9':'----.',
'0':'-----', ', ':'--..--', '.':'.-.-.-',
'?':'..--..', '/':'-..-.', '-':'-....-',
'(':'-.--.', ')':'-.--.-'}
ask = input("A: translate english to code \nB: translate code to english").upper()
if ask == "A":
    i = input("")
    mylist = list(i)
    for i in mylist:
        if i == " ":
            print(codes[i], end="", flush=True)
        else:
            print(codes[i.upper()] + " ", end="", flush=True)
elif ask == "B":
    print("Make sure to add 1 space between letters and 2 spaces between words!")
    i = input("")
    mylist = list(i)
    key_list = list(codes.keys())
    val_list = list(codes.values())
    for i in mylist:
        position = val_list.index(i)
        print(key_list[position], end="", flush=True)
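The str.split() method without an argument splits on any whitespace, so call it with an explicit separator to tell single spaces (between letters) from double spaces (between words).
Here's a simplification of your code.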
# Modified to make ' ' entry be just single space
# This allows us to add a space after every character rather than treating space as special when generating Morse code
codes = { ' ':' ', 'A':'.-', 'B':'-...',
'C':'-.-.', 'D':'-..', 'E':'.',
'F':'..-.', 'G':'--.', 'H':'....',
'I':'..', 'J':'.---', 'K':'-.-',
'L':'.-..', 'M':'--', 'N':'-.',
'O':'---', 'P':'.--.', 'Q':'--.-',
'R':'.-.', 'S':'...', 'T':'-',
'U':'..-', 'V':'...-', 'W':'.--',
'X':'-..-', 'Y':'-.--', 'Z':'--..',
'1':'.----', '2':'..---', '3':'...--',
'4':'....-', '5':'.....', '6':'-....',
'7':'--...', '8':'---..', '9':'----.',
'0':'-----', ', ':'--..--', '.':'.-.-.-',
'?':'..--..', '/':'-..-.', '-':'-....-',
'(':'-.--.', ')':'-.--.-'}
# Generate reverse code table (i.e. to go from Morse code to English)
codes_rev = {v:k for k, v in codes.items()}
ask = input("A: translate english to code \nB: translate code to english").upper()
if ask == "A":
    for letter in input("enter text: ").upper(): # can apply upper to all letters (leaves space unchanged)
        # We add a space after every letter
        print(codes[letter] + ' ', end="") # all letters are followed by a space
        # this causes a space in the text to become two spaces
elif ask == "B":
    print("Make sure to add 1 space between letters and 2 spaces between words!")
    for word in input("enter morse code: ").split('  '): # Words are separated by double spaces
        for letter in word.split(' '): # letters are separated by single spaces
            if letter: # handles case of empty string on split at end of line
                print(codes_rev[letter], end="")
        print(' ', end = "") # space between words
else:
    print('A or B should be entered')
Usage
Encoding
A: translate english to code
B: translate code to englisha
enter text: A journey of a thousand miles begins with a single step.
.- .--- --- ..- .-. -. . -.-- --- ..-. .- - .... --- ..- ... .- -. -.. -- .. .-.. . ... -... . --. .. -. ... .-- .. - .... .- ... .. -. --. .-.. . ... - . .--. .-.-.-
Decoding
A: translate english to code
B: translate code to englishb
Make sure to add 1 space between letters and 2 spaces between words!
enter morse code: .- .--- --- ..- .-. -. . -.-- --- ..-. .- - .... --- ..- ... .- -. -.. -- .. .-.. . ... -... . --. .. -. ... .-- .. - .... .- ... .. -. --. .-.. . ... - . .--.
A JOURNEY OF A THOUSAND MILES BEGINS WITH A SINGLE STEP
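The difference between list() and split() that produced the stray E and T letters in the question can be seen on a short string. This is a sketch with a made-up input ("AB C" in Morse), assuming one space between letters and two between words:

```python
code = ".- -...  -.-."      # "AB C": one space between letters, two between words

chars = list(".-")          # splits into single symbols: ['.', '-'] (E and T)
words = [w.split(" ") for w in code.split("  ")]

print(chars)
print(words)                # [['.-', '-...'], ['-.-.']]
```

Splitting on the double space first keeps the word boundaries, and splitting each word on a single space then keeps every multi-symbol code intact.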

Convert a list of tab prefixed strings to a dictionary

Text mining attempts here, I would like to turn the below:
a=['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n'
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
to this:
{'Colors.of.the universe':{Black:111,Grey:222,White:11},
'Movies of the week':{Mission Impossible:121,Die_Hard:123,Jurassic Park:33},
'Lands.categories.said': {Desert:33212,forest:4532,grassland:431,tundra:243451}}
Tried this code below but it was not good:
{words[1]:words[1:] for words in a}
which gives
{'o': 'olors.of.the universe:\n',
' ': ' tundra : 243451\n',
'a': 'ands.categories.said:\n'}
It only takes the first word as the key which is not what's needed.
A dict comprehension is an interesting approach.
a = ['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
result = dict()
current_key = None
for w in a:
    # If it starts with the indent character, it's an item (under a category)
    if w.startswith(' '):
        # Splitting item (i.e. ' Desert: 33212\n' -> [' Desert', ' 33212\n'])
        splitted = w.split(':')
        # Setting the key and the value of the item
        # Removing redundant spaces and '\n'
        # Converting value to number
        k, v = splitted[0].strip(), int(splitted[1].replace('\n', ''))
        result[current_key][k] = v
    # Else, it's a category
    else:
        # Removing ':' and '\n' from category name
        current_key = w.replace(':', '').replace('\n', '')
        # If the category does not exist yet, create a dictionary for it
        if current_key not in result:
            result[current_key] = {}
# {'Colors.of.the universe': {'Black': 111, 'Grey': 222, 'White': 11}, 'Movies of the week': {'Mission Impossible': 121, 'Die_Hard': 123, 'Jurassic Park': 33}, 'Lands.categories.said': {'Desert': 33212, 'forest': 4532, 'grassland': 431, 'tundra': 243451}}
print(result)
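One detail worth checking in the input: the list in the question has no comma after ' White: 11\n', while the copy in this answer does. Python joins adjacent string literals, so without the comma the two lines become a single list element. A minimal sketch (the two-element lists are cut-down examples, not the full data):

```python
with_comma = [' White: 11\n', 'Movies of the week:\n']
without_comma = [' White: 11\n' 'Movies of the week:\n']   # adjacent literals are concatenated

print(len(with_comma))       # 2
print(len(without_comma))    # 1
print(repr(without_comma[0]))
```

With the merged element, the category line 'Movies of the week:' is never seen as a separate entry, and the int() conversion in the loop above would likely fail on it.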
That's really close to valid YAML already. You could just quote the property labels and parse. And parsing a known format is MUCH superior to dealing with and/or inventing your own. Even if you're just exploring base python, exploring good practices is just as (probably more) important.
import re
import yaml
raw = ['Colors.of.the universe:\n',
' Black: 111\n',
' Grey: 222\n',
' White: 11\n',
'Movies of the week:\n',
' Mission Impossible: 121\n',
' Die_Hard: 123\n',
' Jurassic Park: 33\n',
'Lands.categories.said:\n',
' Desert: 33212\n',
' forest: 4532\n',
' grassland : 431\n',
' tundra : 243451\n']
# Fix spaces in property names
fixed = []
for line in raw:
    match = re.match(r'^( *)(\S.*?): ?(\S*)\s*', line)
    if match:
        fixed.append('{indent}{safe_label}:{value}'.format(
            indent = match.group(1),
            safe_label = "'{}'".format(match.group(2)),
            value = ' ' + match.group(3) if match.group(3) else ''
        ))
    else:
        raise Exception("regex failed")
parsed = yaml.load('\n'.join(fixed), Loader=yaml.FullLoader)
print(parsed)

Python, find all the possible letter combinations in given morse code

I had to find all the possible letter combinations in a given morse code. The length of the decoded word can be maximum 10 letters. The given file with the letters and the morse code to it looks like this:
A .-
B -...
C -.-.
D -..
E .
F ..-.
G --.
H ....
I ..
J .---
K -.-
L .-..
M --
N -.
O ---
P .--.
Q --.-
R .-.
S ...
T -
U ..-
V ...-
W .--
X -..-
Y -.--
Z --..
The given morse code is this:
morse = '-.----.-.-...----.-.-.-.----.-'
My code looks like this:
def morse_file_to_dict(filename):
    with open(filename) as file:
        return dict(line.strip().split() for line in file)
def word_to_morse(s, my_dict):
    return ''.join([my_dict[w] for w in s])
def adding_to_set(given_morse, my_set, my_dict, word='', start=0):
    for char in my_dict:
        if my_dict[char] == given_morse[start:start + len(my_dict[char])] and len(word) < 10:
            start = start + len(my_dict[char])
            word = word + char
            adding_to_set(given_morse, my_set, my_dict, word, start)
            if word_to_morse(word, my_dict) == given_morse:
                my_set.add(word)
words = set()
morse = '-.----.-.-...----.-.-.-.----.-'
pairs = morse_file_to_dict('morse_alphabet.txt')
adding_to_set(morse, words, pairs)
print(len(words))
print(words)
My output is:
5
{'KMCBMQRKMK', 'KMCBMGKRMQ', 'KMCBMGCKMK', 'KMNCEJCCMQ', 'KMCDAMCCMQ'}
But the answer should be 10571 words, not 5.
What should I change to get all of them?
Thank you for your time and answer!
I would suggest using recursion and a dictionary to map morse code to letters (not letters to morse code):
morseFile="""A .-
B -...
C -.-.
D -..
E .
F ..-.
G --.
H ....
I ..
J .---
K -.-
L .-..
M --
N -.
O ---
P .--.
Q --.-
R .-.
S ...
T -
U ..-
V ...-
W .--
X -..-
Y -.--
Z --.."""
morse = {code:letter for line in morseFile.split("\n") for letter,code in [line.split()]}
The function can be built as a generator to avoid storing all the possibilities in a big list:
def decode(coded,maxLen=10):
    if not maxLen: return
    for size in range(1,min(4,len(coded))+1):
        code = coded[:size]
        if code not in morse: continue
        remaining = coded[size:]
        if not remaining: yield morse[code]
        for rest in decode(remaining,maxLen-1):
            yield morse[code] + rest
output:
print(sum(1 for _ in decode("-.----.-.-...----.-.-.-.----.-")))
10571
for string in decode("-.----.-.-...----.-.-.-.----.-"):
    if len(string)<9: print(string)
YQLWGCYQ
YQLWQRYQ
YQLJNCYQ
YQLJKRYQ
YQLJCNYQ
YQLJCKWQ
YQLJCKJK
YQLJCCMQ
YQLJCCOK
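If only the number of decodings is needed, the same recursion can be memoised instead of enumerating every word. The following is a sketch, not part of the answer above: count_decodings and ways are hypothetical names, and it assumes the same 26-letter table and the same limit of at most max_len letters.

```python
from functools import lru_cache

# International Morse codes for A..Z, in alphabetical order (same table as above).
CODES = set(
    ".- -... -.-. -.. . ..-. --. .... .. .--- -.- .-.. -- -. --- "
    ".--. --.- .-. ... - ..- ...- .-- -..- -.-- --..".split()
)

def count_decodings(coded, max_len=10):
    """Count letter sequences of at most max_len letters whose codes concatenate to coded."""

    @lru_cache(maxsize=None)
    def ways(pos, letters_left):
        if pos == len(coded):
            return 1          # everything consumed: one valid decoding
        if letters_left == 0:
            return 0          # symbols remain but no more letters are allowed
        longest = min(4, len(coded) - pos)
        return sum(
            ways(pos + size, letters_left - 1)
            for size in range(1, longest + 1)
            if coded[pos:pos + size] in CODES
        )

    return ways(0, max_len) if coded else 0

print(count_decodings("..."))             # 4: S, EI, IE, EEE
print(count_decodings("...", max_len=2))  # 3: S, EI, IE
```

Because every letter has a distinct code, counting code sequences is the same as counting words, so on the string from the question this should agree with the 10571 reported above (not re-verified here).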
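Here is a working solution. I made changes based on the code and suggestions in the comments and answers: the recursive call now receives new_start and new_word instead of overwriting start and word inside the loop. (The Morse code to translate is different too.)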
def word_to_morse(s, my_dict):
    return ''.join([my_dict[w] for w in s])
def adding_to_set(given_morse, my_set, my_dict, word='', start=0):
    for char in my_dict:
        if my_dict[char] == given_morse[start:start + len(my_dict[char])] and len(word) < 10:
            new_start = start + len(my_dict[char])
            new_word = word + char
            adding_to_set(given_morse, my_set, my_dict, new_word, new_start)
            if word_to_morse(new_word, my_dict) == given_morse:
                my_set.add(new_word)
words = set()
# the morse code I want to decrypt
morse = '.-.--...-....-.'
# adding morse alphabet here
pairs={'A': '.-', 'B': '-...', 'C': '-.-.',
'D': '-..', 'E': '.', 'F': '..-.',
'G': '--.', 'H': '....', 'I': '..',
'J': '.---', 'K': '-.-', 'L': '.-..',
'M': '--', 'N': '-.', 'O': '---',
'P': '.--.', 'Q': '--.-', 'R': '.-.',
'S': '...', 'T': '-', 'U': '..-',
'V': '...-', 'W': '.--', 'X': '-..-',
'Y': '-.--', 'Z': '--..',
}
adding_to_set(morse, words, pairs)
print(len(words))
print(words)
C++ solution (written in C style; note that it uses '_' instead of '-' for dashes):
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <stdlib.h>
char buffer[26];
int l=0;
char *Morse[26];
//initializing Morse Code array
void initMorse(){
Morse[0] = "._" ;
Morse[1] = "_...";
Morse[2] = "_._." ;
Morse[3] = "_.." ;
Morse[4] = "." ; //E
Morse[5] = ".._." ;
Morse[6] = "__." ;
Morse[7] = "...." ; //H
Morse[8] = ".." ; //I
Morse[9] = ".___" ; //J
Morse[10] = "_._" ; //K
Morse[11] = "._.." ;
Morse[12] = "__" ; //M
Morse[13] = "_." ;
Morse[14] = "___" ; //O
Morse[15] = ".__." ; //P
Morse[16] = "__._" ;
Morse[17] = "._." ; //R
Morse[18] = "..." ;
Morse[19] = "_" ;
Morse[20] = ".._" ;
Morse[21] = "..._" ; //V
Morse[22] = ".__" ;
Morse[23] = "_.._" ;
Morse[24] = "_.__" ;
Morse[25] = "__.." ; //Z
}
int solution(char *s,int strt,char **Morse,int len){
    int i,j,k,noMatch=0;
    int mlen;
    if(strt!=len)
        for(i=0;i<26;i++){
            mlen=strlen(Morse[i]);
            if(strt+mlen<=len){
                // compare the code of letter i with the input starting at strt
                for(j=strt,k=0;j<strt+mlen&&k<mlen;j++,k++){
                    if(Morse[i][k]==s[j])
                        continue;
                    else {
                        noMatch=1;
                        break;
                    }
                }
            }
            else{
                continue;
            }
            if(noMatch==0){
                //print pattern when complete string matched
                if(strt+mlen==len){
                    buffer[l]=i+65;
                    printf("%s\n",buffer);
                    buffer[l]=0;
                }
                else{
                    noMatch=0;
                    buffer[l]=i+65;
                    l++;
                    solution(s,strt+mlen,Morse,len);
                    l--; // while backtracking
                    buffer[l]=0; // clearing buffer just upto the previous location
                }
            }
            else{
                noMatch=0;
            }
        }
    else{
        buffer[l]=0;
    }
    return 1;
}
int main() {
    char s[100];
    printf("Enter the input string of Morse code:\n");
    scanf("%99s",s); // limit the read to the size of s
    initMorse();
    printf("Possible translations are:\n");
    solution(s,0,Morse,strlen(s));
    return 0;
}

replace trademark symbol (™) when alone

I'm trying to remove the trademark symbol (™), but only when it is not part of a longer sequence. For instance, I might have â€™, which is a badly decoded quotation mark (’), and I don't want to remove the trademark symbol (™) there, because that would break the pattern I'm using to replace xx™ with a quotation mark.
import re
dict = {};
chars = {
    '\xe2\x84\xa2': '', # ™
    '\xe2\x80\x99': "'", # ’
}
def stats_change(char, number):
    if dict.has_key(char):
        dict[char] = dict[char]+number
    else:
        dict[char] = number # Add new entry
def replace_chars(match):
    char = match.group(0)
    stats_change(char,1)
    return chars[char]
i, nmatches = re.subn("(\\" + '|\\'.join(chars.keys()) + ")", replace_chars, i)
count_matches += nmatches
Input: foo™ oof
Output: foo oof
Input: o’f oof
Output: o'f oof
Any suggestions ?
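One possible approach, as a sketch only: it assumes Python 3 with Unicode strings (the code above is Python 2 with byte strings) and that the badly encoded quotation mark shows up as the three-character sequence â€™. The names REPLACEMENTS, PATTERN and clean are made up for this example. Matching the longer sequence in the same alternation means a ™ that belongs to it is consumed together with it, so only a lone ™ is removed.

```python
import re
from collections import Counter

# Assumed replacement table: the mojibake sequence first, then the lone trademark sign.
REPLACEMENTS = {
    'â€™': "'",   # UTF-8 bytes of ’ decoded as cp1252
    '™': '',      # trademark sign on its own
}
# Longer keys first, so the full sequence is preferred over the bare '™'.
PATTERN = re.compile(
    '|'.join(re.escape(k) for k in sorted(REPLACEMENTS, key=len, reverse=True))
)

def clean(text, stats=None):
    """Replace every key of REPLACEMENTS in text; optionally count hits in stats."""
    def repl(match):
        if stats is not None:
            stats[match.group(0)] += 1
        return REPLACEMENTS[match.group(0)]
    return PATTERN.sub(repl, text)

stats = Counter()
print(clean("foo™ oof", stats))    # foo oof
print(clean("oâ€™f oof", stats))   # o'f oof
print(dict(stats))
```

A negative lookbehind such as (?<!â€)™ would be another way to express "™ when alone", but the single alternation keeps the per-character statistics from the question in one place.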

Traceback for regular expression

Let's say I have a regular expression:
match = re.search(pattern, content)
if not match:
    raise Exception('regex traceback') # I want to report the regex matching process here.
If the regular expression fails to match, I want the exception to show how the matching proceeded and where it failed to match the pattern, at what stage, etc. Is it even possible to achieve this?
I have something that helps me debug complex regex patterns in my code.
Does this help you?
import re
li = ('ksjdhfqsd\n'
'5 12478 abdefgcd ocean__12 ty--\t\t ghtr789\n'
'qfgqrgqrg',
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n',
'2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877',
'9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798',
'ksjdhfqsd\n'
'5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg',
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n',
'25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877',
'9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798')
tupleRE = ('^\d',
' ',
'\d{5}',
' ',
'[abcdefghi]+',
' ',
'(?=[a-z\d_ ]{14} [^ ]+\t\t ght)',
'[a-z]+',
'__',
'[\d]+',
' +',
'[^\t]+',
'\t\t',
' ',
'ght',
'(r[5-9]+|u[0-4]+)',
'$')
def REtest(ch, tupleRE, flags = re.MULTILINE):
    for n in range(len(tupleRE)):
        regx = re.compile(''.join(tupleRE[:n+1]), flags)
        testmatch = regx.search(ch)
        if not testmatch:
            print('\n -*- tupleRE :\n')
            print('\n'.join(str(i).zfill(2)+' '+repr(u)
                            for i,u in enumerate(tupleRE[:n])))
            print(' --------------------------------')
            # tupleRE stops matching because of element n
            print(str(n).zfill(2)+' '+repr(tupleRE[n])
                  +" doesn't match anymore from this ligne "
                  +str(n)+' of tupleRE')
            print('\n'.join(str(n+1+j).zfill(2)+' '+repr(u)
                            for j,u in enumerate(tupleRE[n+1:
                                                         min(n+2,len(tupleRE))])))
            # look for the longest prefix of tupleRE that still matches
            for i in range(n):
                match = re.search(''.join(tupleRE[:n-i]),ch, flags)
                if match:
                    break
            matching_portion = match.group()
            matching_li = '\n'.join(map(repr,
                                        matching_portion.splitlines(True)[-5:]))
            fin_matching_portion = match.end()
            print('\n\n -*- Part of the tested string which is concerned :\n\n'
                  '######### matching_portion ########\n'+matching_li + '\n'
                  '##### end of matching_portion #####\n'
                  '-----------------------------------\n'
                  '######## unmatching_portion #######')
            print('\n'.join(map(repr,
                                ch[fin_matching_portion:
                                   fin_matching_portion+300].splitlines(True)) ))
            break
    else:
        print('\n SUCCESS . The regex integrally matches.')
for x in li:
    print(' -*- Analyzed string :\n%r' % x)
    REtest(x,tupleRE)
    print('\nmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm')
result
-*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__12 ty--\t\t ghtr789\nqfgqrgqrg'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12340\n'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'2 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'9 54879 bbdecddf antarctic__13 18:13pomodoro\t\t ghtr6798'
SUCCESS . The regex integrally matches.
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'ksjdhfqsd\n5 12478 abdefgcd ocean__1247101247887 ty--\t\t ghtr789\nqfgqrgqrg'
-*- tupleRE :
00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
--------------------------------
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)' doesn't match anymore from this ligne 6 of tupleRE
07 '[a-z]+'
-*- Part of the tested string which is concerned :
######### matching_portion ########
'5 12478 abdefgcd '
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'ocean__1247101247887 ty--\t\t ghtr789\n'
'qfgqrgqrg'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12940\n'
-*- tupleRE :
00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
05 ' '
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
07 '[a-z]+'
08 '__'
09 '[\\d]+'
10 ' +'
11 '[^\t]+'
12 '\t\t'
13 ' '
14 'ght'
15 '(r[5-9]+|u[0-4]+)'
--------------------------------
16 '$' doesn't match anymore from this ligne 16 of tupleRE
-*- Part of the tested string which is concerned :
######### matching_portion ########
'6 48788 bcfgdebc atlantic__7899 %fg#\t\t ghtu12'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'940\n'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'25 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877'
-*- tupleRE :
00 '^\\d'
--------------------------------
01 ' ' doesn't match anymore from this ligne 1 of tupleRE
02 '\\d{5}'
-*- Part of the tested string which is concerned :
######### matching_portion ########
'2'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'5 47890 bbcedefg arctic__124 **juyf\t\t ghtr89877'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
-*- Analyzed string :
'9 54879 bbdeYddf antarctic__13 18:13pomodoro\t\t ghtr6798'
-*- tupleRE :
00 '^\\d'
01 ' '
02 '\\d{5}'
03 ' '
04 '[abcdefghi]+'
--------------------------------
05 ' ' doesn't match anymore from this ligne 5 of tupleRE
06 '(?=[a-z\\d_ ]{14} [^ ]+\t\t ght)'
-*- Part of the tested string which is concerned :
######### matching_portion ########
'9 54879 bbde'
##### end of matching_portion #####
-----------------------------------
######## unmatching_portion #######
'Yddf antarctic__13 18:13pomodoro\t\t ghtr6798'
mwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwmwm
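The core idea of the script above (search with ever-longer prefixes of the pattern until one stops matching) can be condensed into a few lines. This is a sketch with a hypothetical helper name, first_failing_piece, and a shortened pattern; like the original, it assumes that every prefix of the pieces is itself a valid regex.

```python
import re

def first_failing_piece(pieces, text, flags=0):
    """Return the index of the first piece that makes the search fail, or None if all match."""
    for n in range(1, len(pieces) + 1):
        if re.search(''.join(pieces[:n]), text, flags) is None:
            return n - 1
    return None

pieces = (r'^\d', r' ', r'\d{5}', r' ', r'[a-z]+', r'$')

print(first_failing_piece(pieces, '5 12478 ocean'))    # None
print(first_failing_piece(pieces, '25 12478 ocean'))   # 1
print(first_failing_piece(pieces, '5 1247 ocean'))     # 2
```

The returned index can then be put into the exception message, together with the piece itself, to say at which stage the pattern stopped matching.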
I've used Kodos (http://kodos.sourceforge.net/about.html) in the past to perform RegEx debugging. It's not the ideal solution since you want something for run-time, but it may be helpful to you.
If you need to test the regex, you can probably use groups followed by *, as in ( sometext)*.
Use this along with your desired regex, and you should be able to pick out the failure locations,
then leverage the following match object attributes, as documented on python.org:
pos
The value of pos which was passed to the search() or match() method of the RegexObject. This is the index into the string at which the RE engine started looking for a match.
endpos
The value of endpos which was passed to the search() or match() method of the RegexObject. This is the index into the string beyond which the RE engine will not go.
lastindex
The integer index of the last matched capturing group, or None if no group was matched at all. For example, the expressions (a)b, ((a)(b)), and ((ab)) will have lastindex == 1 if applied to the string 'ab', while the expression (a)(b) will have lastindex == 2, if applied to the same string.
lastgroup
The name of the last matched capturing group, or None if the group didn’t have a name, or if no group was matched at all.
re
The regular expression object whose match() or search() method produced this MatchObject instance.
string
The string passed to match() or search().
So, for a very simple example:
>>> m1 = re.compile(r'the real thing')
>>> m2 = re.compile(r'(the)* (real)* (thing)*')
>>> if not m1.search(mytextvar):
...     res = m2.search(mytextvar)
...     print(res.lastindex)  # the groups are unnamed, so lastgroup would be None
...     # raise my exception
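A runnable variation of this idea, as a sketch: instead of starred groups it nests optional groups, so that lastindex reports how many consecutive parts matched. The names strict, lenient and how_far are made up for the example.

```python
import re

strict = re.compile(r'the real thing')
# Same words, with each later part optional and nested inside the previous one,
# so lastindex tells how many consecutive parts matched.
lenient = re.compile(r'(the)(?:( real)( thing)?)?')

def how_far(text):
    """Return None if the strict pattern matches, else the number of parts that matched."""
    if strict.search(text):
        return None
    m = lenient.search(text)
    return m.lastindex if m else 0

print(how_far('it is the real thing'))   # None
print(how_far('the real stuff'))         # 2
print(how_far('the fake'))               # 1
print(how_far('nothing'))                # 0
```

The number can be included in the exception message to show at which part the strict pattern would have failed.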
