I need to find a given pattern in a text file and print the matching patterns. The text file is a string of digits and the pattern can be any string of digits or placeholders represented by 'X'.
I figured the way to approach this problem would be by loading the sequence into a variable, then creating a list of testable subsequences, and then testing each subsequence. This is my first function in python so I'm confused as to how to create the list of test sequences easily and then test it.
def find(pattern):  # finds a pattern in the given input file
    with open('sequence.txt', 'r') as myfile:
        string = myfile.read()
    print('Test data is:', string)
    testableStrings = []
    # how to create a list of testable sequences?
    for x in testableStrings:
        if x == pattern:
            print(x)
    return
For example, searching for "X10X" in "11012102" should print "1101" and "2102".
Let pattern = "X10X", string = "11012102", and n = len(pattern), just for the illustration that follows.
Without using regular expressions, your algorithm may be as follows:
Construct a list of all subsequences of string with length of n:
In[2]: parts = [string[i:i+n] for i in range(len(string) - n + 1)]
In[3]: parts
Out[3]: ['1101', '1012', '0121', '1210', '2102']
Compare pattern with each element in parts:
for part in parts:
The comparison of pattern with part (both now have equal lengths) proceeds character by character at corresponding positions:
    for ch1, ch2 in zip(pattern, part):
If ch1 is the X symbol or ch1 == ch2, the comparison of corresponding characters continues; otherwise we break out of it:
        if ch1 == "X" or ch1 == ch2:
            continue
        else:
            break
Finally, if all character-by-character comparisons were successful, i.e. all pairs of corresponding characters were exhausted, the else branch of the for statement is executed (yes, for statements may have an else branch for that case).
Now you may perform any action with that matched part, e.g. print it or append it to some list:
    else:
        print(part)
So all in one place:
pattern = "X10X"
string = "11012102"
n = len(pattern)
parts = [string[i:i+n] for i in range(len(string) - n + 1)]
for part in parts:
    for ch1, ch2 in zip(pattern, part):
        if ch1 == "X" or ch1 == ch2:
            continue
        else:
            break
    else:
        print(part)
The output:
1101
2102
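If regular expressions are acceptable after all, the same search can be sketched with the re module. This is only an illustration: the function name find_with_regex is made up here, and it assumes that 'X' should match any single character (note that '.' does not match a newline). The lookahead keeps overlapping matches, which a plain re.findall on the pattern would miss.

```python
import re

def find_with_regex(pattern, text):
    """Return every window of text that matches pattern ('X' is a wildcard)."""
    # Translate the pattern: 'X' becomes '.', every other character is literal.
    regex = "".join("." if ch == "X" else re.escape(ch) for ch in pattern)
    # (?=(...)) captures without consuming, so overlapping matches are kept.
    return [m.group(1) for m in re.finditer("(?=(" + regex + "))", text)]

print(find_with_regex("X10X", "11012102"))  # ['1101', '2102']
```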
You probably wanted to create the list of testable sequences from the individual rows of the input file. So instead of
with open('sequence.txt', 'r') as myfile:
    string = myfile.read()
use
with open('sequence.txt') as myfile:  # 'r' is default
    testableStrings = [row.strip() for row in myfile]
The strip() method removes whitespace characters from the start and end of rows, including \n symbols at the end of lines.
Example of the sequence.txt file:
123456789
87654321
111122223333
The output of the print(testableStrings) command:
['123456789', '87654321', '111122223333']
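Putting the two parts together, a possible sketch applies the window comparison to every row. The generator name matches and the literal rows below are assumptions for illustration; in practice rows would be the testableStrings list read from the file.

```python
def matches(pattern, text):
    """Yield each window of text that fits pattern, where 'X' matches anything."""
    n = len(pattern)
    for i in range(len(text) - n + 1):
        part = text[i:i + n]
        # all() is true only if every position is a wildcard or an equal character.
        if all(p == "X" or p == c for p, c in zip(pattern, part)):
            yield part

rows = ['123456789', '87654321', '111122223333']  # stands in for testableStrings
for row in rows:
    print(row, list(matches("X2X", row)))
```

Using all() here plays the same role as the for/else construct above; which one reads better is a matter of taste.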
I have a CSV file with the following data:
bel.lez.za;bellézza
e.la.bo.ra.re;elaboràre
a.li.an.te;alïante
u.mi.do;ùmido
The first value is the word divided into syllables and the second marks the stress.
I'd like to merge the two pieces of information and obtain the following output:
bel.léz.za
e.la.bo.rà.re
a.lï.an.te
ù.mi.do
I computed the position of the stressed vowel and tried to substitute the same unstressed vowel in the first value, but full stops make indexing difficult. Is there a way to tell python to ignore full stops while counting? or is there an easier way to perform it? Thx
After splitting the two values for each line I computed the position of the stressed vowels:
char_list = ['ò', 'à', 'ù', 'ì', 'è', 'é', 'ï']
for character in char_list:
    if character in value[1]:
        position_of_stressed_vowel = value[1].index(character)
I'd suggest merging/aligning the two forms in parallel instead of trying to substitute things via indexing. The idea is to iterate through the plain form and take out one character from the accented form for every character from the plain form, keeping dots as they are.
(Or perhaps, the idea is to add the dots to the accented form instead of adding the accented characters to the syllabified form.)
def merge_accents(plain, accented):
    output = ""
    acc_chars = iter(accented)
    for char in plain:
        if char == ".":
            output += char
        else:
            output += next(acc_chars)
    return output
Test:
data = [['bel.lez.za', 'bellézza'],
        ['e.la.bo.ra.re', 'elaboràre'],
        ['a.li.an.te', 'alïante'],
        ['u.mi.do', 'ùmido']]
# Prints
# bel.léz.za
# e.la.bo.rà.re
# a.lï.an.te
# ù.mi.do
for plain, accented in data:
    print(merge_accents(plain, accented))
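Since the data comes from a CSV file, the merge could be driven by the csv module. The following is a rough sketch: merge_rows is a made-up helper, and io.StringIO stands in for a real file, which one would presumably open with open(path, newline="", encoding="utf-8").

```python
import csv
import io

def merge_accents(plain, accented):
    """Copy the dots of the syllabified form onto the accented form."""
    output = ""
    acc_chars = iter(accented)
    for char in plain:
        output += char if char == "." else next(acc_chars)
    return output

def merge_rows(lines):
    """Merge every 'syllabified;accented' row from an iterable of lines."""
    merged = []
    for row in csv.reader(lines, delimiter=";"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        merged.append(merge_accents(*row))
    return merged

sample = io.StringIO("bel.lez.za;bellézza\nu.mi.do;ùmido\n")
print(merge_rows(sample))  # ['bel.léz.za', 'ù.mi.do']
```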
Is there a way to tell python to ignore full stops while counting?
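Yes, by implementing it yourself with an index lookup that tells you which index in the dot-delimited string corresponds to a given index in the plain word:
i = 0
corrected_index = []
for char in value[0]:
    if char != ".":
        corrected_index.append(i)
    i += 1
Now you can correct the index and replace the character. Strings are immutable in Python, so build a new string by slicing instead of assigning to an index:
idx = corrected_index[position_of_stressed_vowel]
value[0] = value[0][:idx] + character + value[0][idx + 1:]
In Python 3 every element of a str is one Unicode code point, so a precomposed stressed vowel such as 'é' occupies a single index. If your file stores accents as separate combining characters, normalize the text first, for example with unicodedata.normalize('NFC', text).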
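The index lookup can be wrapped into a single function. This is a sketch under a few assumptions: the name place_stress and the constant STRESSED are invented here, the word has at most one stressed vowel, and each accented vowel is a single code point.

```python
STRESSED = "òàùìèéï"  # same vowels as char_list in the question

def place_stress(dotted, accented):
    """Return dotted with its stressed vowel copied over from accented."""
    # Index of the first stressed vowel in the undotted word, or None.
    pos = next((i for i, ch in enumerate(accented) if ch in STRESSED), None)
    if pos is None:
        return dotted
    # letter_index[k] is where the k-th letter sits in the dotted string.
    letter_index = [i for i, ch in enumerate(dotted) if ch != "."]
    idx = letter_index[pos]
    # Strings are immutable, so rebuild the string around the replaced vowel.
    return dotted[:idx] + accented[pos] + dotted[idx + 1:]

print(place_stress("bel.lez.za", "bellézza"))  # bel.léz.za
```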
You can loop over the two halves of the string, keep track of the index in the first half (excluding the dots), and add the character at the tracked index from the second half of the string to a buffer (modified) string, like the code below:
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']
converted_data = []
# Loop over the data.
for pair in data:
    # Split the pair on ";".
    first_half, second_half = pair.split(';')
    # Create variables to keep track of the current letter and the modified string.
    current_letter = 0
    modified_second_half = ''
    # Loop over the letters of the first half of the string.
    for current_char in first_half:
        # If the current_char is a dot, add it to the modified string.
        if current_char == '.':
            modified_second_half += '.'
        # If the current_char is not a dot, add the current letter from the second half to the modified string,
        # and update the current letter value.
        else:
            modified_second_half += second_half[current_letter]
            current_letter += 1
    converted_data.append(modified_second_half)
print(converted_data)
data = ['bel.lez.za;bellézza',
        'e.la.bo.ra.re;elaboràre',
        'a.li.an.te;alïante',
        'u.mi.do;ùmido']
def slice_same(input, lens):
    # slices the given string into the given lengths.
    res = []
    strt = 0
    for size in lens:
        res.append(input[strt : strt + size])
        strt += size
    return res
# split into two.
data = [x.split(';') for x in data]
# Add third column that's the length of each piece.
data = [[x, y, [len(z) for z in x.split('.')]] for x, y in data]
# Put text and lens through function.
data = ['.'.join(slice_same(y, z)) for x, y, z in data]
print(data)
Output:
['bel.léz.za',
'e.la.bo.rà.re',
'a.lï.an.te',
'ù.mi.do']
When a name is given, for example Aberdeen Scotland,
I need to get the result Adbnearldteoecns.
That is, leave the first word as it is, reverse the last word, and interleave its letters between the letters of the first word.
I have done so far:
coordinatesf = "Aberdeen Scotland"
for line in coordinatesf:
    separate = line.split()
    for i in separate [0:-1]:
        lastw = separate[1][::-1]
        print(i)
A bit dirty but it works:
coordinatesf = "Aberdeen Scotland"
new_word = []
# split the two words
words = coordinatesf.split(" ")
# reverse the second and convert it to lowercase
words[1] = words[1][::-1].lower()
# populate the new string
for index in range(0, len(words[0])):
    new_word.insert(2*index, words[0][index])
for index in range(0, len(words[1])):
    new_word.insert(2*index+1, words[1][index])
outstring = ''.join(new_word)
print(outstring)
Note that what you want to do is only well-defined if the input string is composed of two words of the same length.
I use assertions to make sure that is true, but you can leave them out.
def scramble(s):
    words = s.split(" ")
    assert len(words) == 2
    assert len(words[0]) == len(words[1])
    scrambledLetters = zip(words[0], reversed(words[1]))
    return "".join(x[0] + x[1] for x in scrambledLetters)
>>> print(scramble("Aberdeen Scotland"))
AdbnearldteoecnS
You could replace the x[0] + x[1] part with "".join(x), but I think that makes it less readable. (sum() would not work here, because it refuses to add strings.)
This splits the input, zips the first word with the reversed second word, joins the pairs, then joins the list of pairs.
coordinatesf = "Aberdeen Scotland"
a,b = coordinatesf.split()
print(''.join(map(''.join, zip(a,b[::-1]))))
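If the two words may differ in length, one possible convention (an assumption, not something the question specifies) is to keep interleaving until the longer word runs out. The sketch below uses itertools.zip_longest for that; the function name interleave is made up for illustration.

```python
from itertools import zip_longest

def interleave(first, second):
    """Interleave first with the reversed second; leftover letters are kept."""
    # fillvalue="" lets the longer word continue once the shorter one is exhausted.
    pairs = zip_longest(first, reversed(second), fillvalue="")
    return "".join(a + b for a, b in pairs)

a, b = "Aberdeen Scotland".split()
print(interleave(a, b))  # AdbnearldteoecnS
```

Calling .lower() on the second word before interleaving would give the all-lowercase tail shown in the question.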
Sorry in advance for such a long post
EDIT--
Modified from Norman's Solution to print and return if we find an exact solution, otherwise print all approximate matches. It's currently still only getting 83/85 matches for a specific example of searching for etnse on the dictionary file provided below on the third pastebin link.
import re
import time

def doMatching(file, origPattern):
    entireFile = file.read()
    patterns = []
    startIndices = []
    begin = time.time()
    # get all of the patterns associated with the given phrase
    for pattern in generateFuzzyPatterns(origPattern):
        patterns.append(pattern)
        for m in re.finditer(pattern, entireFile):
            startIndices.append((m.start(), m.end(), m.group()))
        # if the first pattern (exact match) is valid, then just print the results and we're done
        if len(startIndices) != 0 and startIndices[0][2] == origPattern:
            print("\nThere is an exact match at: [{}:{}] for {}".format(*startIndices[0]))
            return
    print('Used {} patterns:'.format(len(patterns)))
    for i, p in enumerate(patterns, 1):
        print('- [{}] {}'.format(i, p))
    # list for all non-overlapping matches
    nonOverlapping = []
    # hold the last match's ending position
    lastEnd = 0
    # find non-overlapping matches by comparing each match's starting index to the previous match's ending index
    # if the starting index >= the previous item's ending index they aren't overlapping
    for start in sorted(startIndices):
        print(start)
        if start[0] >= lastEnd:
            # start[1] is the ending index from the current match's tuple
            lastEnd = start[1]
            nonOverlapping.append(start)
    print()
    print('Found {} matches:'.format(len(startIndices)))
    # each entry is a tuple (<starting index>, <ending index>, <string at those indices>)
    for start in sorted(startIndices):
        # *start unpacks the tuple so that format receives its items as separate arguments
        # for explanation, see: http://stackoverflow.com/questions/2921847/what-does-the-star-operator-mean-in-python
        print('- [{}:{}] {}'.format(*start))
    print()
    print('Found {} non-overlapping matches:'.format(len(nonOverlapping)))
    for ov in nonOverlapping:
        print('- [{}:{}] {}'.format(*ov))
    end = time.time()
    print(end-begin)
def generateFuzzyPatterns(origPattern):
    # Escape individual symbols.
    origPattern = [re.escape(c) for c in origPattern]
    # Find exact matches.
    pattern = ''.join(origPattern)
    yield pattern
    # Find matches with changes. (replace)
    for i in range(len(origPattern)):
        t = origPattern[:]
        # replace with a wildcard for each index
        t[i] = '.'
        pattern = ''.join(t)
        yield pattern
    # Find matches with deletions. (omitted)
    for i in range(len(origPattern)):
        t = origPattern[:]
        # remove a char for each index
        t[i] = ''
        pattern = ''.join(t)
        yield pattern
    # Find matches with insertions.
    for i in range(len(origPattern) + 1):
        t = origPattern[:]
        # insert a wildcard between adjacent chars for each index
        t.insert(i, '.')
        pattern = ''.join(t)
        yield pattern
    # Find two adjacent characters being swapped.
    for i in range(len(origPattern) - 1):
        t = origPattern[:]
        if t[i] != t[i + 1]:
            t[i], t[i + 1] = t[i + 1], t[i]
            pattern = ''.join(t)
            yield pattern
ORIGINAL:
http://pastebin.com/bAXeYZcD - the actual function
http://pastebin.com/YSfD00Ju - data to use, should be 8 matches for 'ware' but only gets 6
http://pastebin.com/S9u50ig0 - data to use, should get 85 matches for 'etnse' but only gets 77
I left all of the original code in the function because I'm not sure exactly what is causing the problem.
you can search for 'Board:isFull()' on anything to get the error stated below.
examples:
assume you named the second pastebin 'someFile.txt' in a folder named files in the same directory as the .py file.
file = open('./files/someFile.txt', 'r')
doMatching(file, "ware")
OR
file = open('./files/someFile.txt', 'r')
doMatching(file, "Board:isFull()")
OR
assume you named the third pastebin 'dictionary.txt' in a folder named files in the same directory as the .py file.
file = open('./files/dictionary.txt', 'r')
doMatching(file, "etnse")
--EDIT
The function's parameters work like so:
file is an open file object.
origPattern is a phrase.
The function is basically supposed to be a fuzzy search. It takes the pattern and searches through a file to find matches that are either exact or off by one character, i.e. 1 missing character, 1 extra character, 1 replaced character, or 1 character swapped with an adjacent character.
For the most part it works, but I'm running into a few problems.
First, when I try to use something like 'Board:isFull()' for origPattern I get the following:
raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis
the above is from the re library
I've tried using re.escape() but it doesn't change anything.
Second, when I try some other things like 'Fun()' it says it has a match at some index that doesn't even contain any of that; it's just a line of '*'
Third, When it does find matches it doesn't always find all of the matches. For example, there's one file I have that should find 85 matches, but it only comes up with like 77, and another with 8 but it only comes up with 6. However, they are just alphabetical so it's likely only a problem with how I do searching or something.
Any help is appreciated.
I also can't use fuzzyfinder
I found some issues in the code:
re.escape() seems to not work because its result is not assigned.
Do origPattern = re.escape(origPattern).
When pattern is correctly escaped, be mindful of not breaking the escaping when manipulating the pattern.
Example: re.escape('Fun()') yields the string Fun\(\). The escape sequences \( and \) in it must never be split: never remove, replace, or swap a \ without the character it escapes.
Bad manipulations: Fun(\) (removal), Fu\n(\) (swap), Fun\.{0,2}\).
Good manipulations: Fun\) (removal), Fu\(n\) (swap), Fun.{0,2}\).
You find too few matches because you only try to find fuzzy matches if there are no exact matches. (See line if indices.__len__() != 0:.) You must always look for them.
The loops inserting '.{0,2}' produce one pattern too many, e.g. 'ware.{0,2}' for ware. Unless you intend that, this pattern will find wareXY, which has two insertions.
The patterns with .{0,2} don't work as described; they allow one change and one insertion.
I'm not sure about the code involving difflib.Differ. I don't understand it, but I suspect there should be no break statements.
Even though you use a set to store indices, matches from different regexes may still overlap.
You don't use word boundaries (\b) in your regexes, though for natural language that would make sense.
Not a bug, but: Why do you call magic methods explicitly?
(E.g. indices.__len__() != 0 instead of len(indices) != 0.)
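One way to keep escape sequences intact, sketched here as an illustration, is to escape each character separately and then manipulate the resulting list of tokens instead of the escaped string. The variable names are arbitrary.

```python
import re

# Escape character by character: each list element is one indivisible token.
tokens = [re.escape(c) for c in "Fun()"]
print(tokens)  # ['F', 'u', 'n', '\\(', '\\)']

# Swapping two adjacent tokens moves the backslash together with its character.
swapped = tokens[:]
swapped[2], swapped[3] = swapped[3], swapped[2]
swap_pattern = "".join(swapped)  # Fu\(n\)
print(re.search(swap_pattern, "call Fu(n) here").group())  # Fu(n)

# Deleting a token removes the backslash and the character it escapes.
delete_pattern = "".join(tokens[:3] + tokens[4:])  # Fun\)
print(re.search(delete_pattern, "call Fun) here").group())  # Fun)
```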
I rewrote your code a bit to address any issues I saw:
import re

def doMatching(file, origPattern):
    entireFile = file.read()
    patterns = []
    startIndices = {}
    for pattern in generateFuzzyPatterns(origPattern):
        patterns.append(pattern)
        startIndices.update((m.start(), (m.end(), m.group())) for m in re.finditer(pattern, entireFile))
    print('Used {} patterns:'.format(len(patterns)))
    for i, p in enumerate(patterns, 1):
        print('- [{}] {}'.format(i, p))
    nonOverlapping = []
    lastEnd = 0
    for start in sorted(startIndices):
        if start >= lastEnd:
            lastEnd = startIndices[start][0]
            nonOverlapping.append(start)
    print()
    print('Found {} matches:'.format(len(startIndices)))
    for i in sorted(startIndices):
        print('- [{}:{}] {}'.format(i, *startIndices[i]))
    print()
    print('Found {} non-overlapping matches:'.format(len(nonOverlapping)))
    for i in nonOverlapping:
        print('- [{}:{}] {}'.format(i, *startIndices[i]))
def generateFuzzyPatterns(origPattern):
    # Escape individual symbols.
    origPattern = [re.escape(c) for c in origPattern]
    # Find exact matches.
    pattern = ''.join(origPattern)
    yield pattern
    # Find matches with changes.
    for i in range(len(origPattern)):
        t = origPattern[:]
        t[i] = '.'
        pattern = ''.join(t)
        yield pattern
    # Find matches with deletions.
    for i in range(len(origPattern)):
        t = origPattern[:]
        t[i] = ''
        pattern = ''.join(t)
        yield pattern
    # Find matches with insertions.
    for i in range(len(origPattern) + 1):
        t = origPattern[:]
        t.insert(i, '.')
        pattern = ''.join(t)
        yield pattern
    # Find two adjacent characters being swapped.
    for i in range(len(origPattern) - 1):
        t = origPattern[:]
        if t[i] != t[i + 1]:
            t[i], t[i + 1] = t[i + 1], t[i]
            pattern = ''.join(t)
            yield pattern
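The non-overlapping filter used in doMatching can be looked at in isolation. The sketch below assumes matches are stored as (start, end, text) tuples; the function name non_overlapping and the sample spans are invented for illustration. It is a greedy left-to-right selection, so it does not necessarily keep the largest possible number of matches.

```python
def non_overlapping(spans):
    """Greedily keep (start, end, text) spans that do not overlap an earlier kept span."""
    chosen = []
    last_end = 0
    for span in sorted(spans):
        # A span is kept only if it starts at or after the end of the last kept span.
        if span[0] >= last_end:
            chosen.append(span)
            last_end = span[1]
    return chosen

spans = [(4, 8, "wore"), (0, 4, "ware"), (2, 6, "rewo")]
print(non_overlapping(spans))  # [(0, 4, 'ware'), (4, 8, 'wore')]
```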
I am trying to solve the Longest Common Subsequence in Python. I've completed it and it's working fine although I've submitted it and it says it's 50% partially completed. I'm not sure what I'm missing here, any help is appreciated.
CHALLENGE DESCRIPTION:
You are given two sequences. Write a program to determine the longest common subsequence between the two strings (each string can have a maximum length of 50 characters). NOTE: This subsequence need not be contiguous. The input file may contain empty lines, these need to be ignored.
INPUT SAMPLE:
The first argument will be a path to a filename that contains two strings per line, semicolon delimited. You can assume that there is only one unique subsequence per test case. E.g.:
XMJYAUZ;MZJAWXU
OUTPUT SAMPLE:
The longest common subsequence. Ensure that there are no trailing empty spaces on each line you print. E.g.:
MJAU
My code is
# LONGEST COMMON SUBSEQUENCE
import argparse

def get_longest_common_subsequence(strings):
    # here we will store the subsequence list
    subsequences_list = list()
    # split the strings in 2 different variables and limit them to 50 characters
    first = strings[0]
    second = strings[1]
    startpos = 0
    # we need to start from each index in the first string so we can find the longest subsequence
    # therefore we do a loop with the length of the first string, incrementing the start every time
    for start in range(len(first)):
        # here we will store the current subsequence
        subsequence = ''
        # store the index of the found character
        idx = -1
        # loop through all the characters in the first string, starting at the 'start' position
        for i in first[start:50]:
            # search for the current character in the second string
            pos = second[0:50].find(i)
            # if the character was found and is in the correct sequence add it to the subsequence and update the index
            if pos > idx:
                subsequence += i
                idx = pos
        # if we have a subsequence, add it to the subsequences list
        if len(subsequence) > 0:
            subsequences_list.append(subsequence)
        # increment the start
        startpos += 1
    # sort the list of subsequences with the longest at the top
    subsequences_list.sort(key=len, reverse=True)
    # return the longest subsequence
    return subsequences_list[0]

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('filename')
    args = parser.parse_args()
    # read file as the first argument
    with open(args.filename) as f:
        # loop through each line
        for line in f:
            # if the line is empty it means it's not valid. otherwise print the common subsequence
            if line.strip() not in ['\n', '\r\n', '']:
                strings = line.replace('\n', '').split(';')
                if len(strings[0]) > 50 or len(strings[1]) > 50:
                    break
                print(get_longest_common_subsequence(strings))
    return 0

if __name__ == '__main__':
    main()
The following solution prints the characters that the two strings of each semicolon-separated pair have in common, in arbitrary order. Note that this is the set of shared characters rather than a longest common subsequence, because a set keeps neither order nor repetition. If a string from the pair is longer than 50 characters, the pair is skipped (it's not difficult to trim it to length 50 if that is desired).
Note: if sorting/ordering is desired it can be implemented (alphabetical order, the order of the first string, or the order of the second string).
with open('filename.txt') as f:
    for line in f:
        line = line.strip()
        if line and ';' in line and len(line) <= 101:
            a, b = line.split(';')
            a = set(a.strip())
            b = set(b.strip())
            common = a & b  # intersection
            if common:
                print(''.join(common))
Also note: if the strings have internal whitespace in common (e.g. ABC DE; ZM YCA), it will be part of the output because it is not stripped. If that is not desired, you can replace the line a = set(a.strip()) with a = {char for char in a if char.strip()}, and likewise for b.
def lcs_recursive(xlist, ylist):
    if not xlist or not ylist:
        return []
    x, xs, y, ys = xlist[0], xlist[1:], ylist[0], ylist[1:]
    if x == y:
        return [x] + lcs_recursive(xs, ys)
    else:
        return max(lcs_recursive(xlist, ys), lcs_recursive(xs, ylist), key=len)

s1 = 'XMJYAUZ'
s2 = 'MZJAWXU'
print(''.join(lcs_recursive(s1, s2)))
This gives the correct answer MJAU. X and Z are not part of the answer because a subsequence must keep its characters in the same relative order in both strings, and including either of them would leave a shorter common subsequence.
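Plain recursion without memoization can become very slow as the strings approach the 50-character limit, because the same sub-problems are solved over and over. A common alternative is the dynamic-programming formulation, sketched below. The function name lcs is an assumption; when several subsequences share the maximal length, this version returns just one of them.

```python
def lcs(a, b):
    """Return one longest common subsequence of the strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of the suffixes a[i:] and b[j:].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            if a[i] == b[j]:
                table[i][j] = 1 + table[i + 1][j + 1]
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])
    # Walk the table from the start to rebuild the subsequence itself.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1
        else:
            j += 1
    return "".join(out)

print(lcs("XMJYAUZ", "MZJAWXU"))  # MJAU
```

The table has (m + 1) * (n + 1) entries, so 50-character inputs need only a few thousand steps.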
I have to write a function which takes one argument, text, containing a block of text in the form of a str, and returns a sorted list of “symmetric” words. A symmetric word is defined as a word where, for all values i, the letter i positions from the start of the word and the letter i positions from the end of the word are equidistant from the respective ends of the alphabet. For example, bevy is a symmetric word because b (1 position from the start of the word) is the second letter of the alphabet and y (1 position from the end of the word) is the second-last letter of the alphabet, and e (2 positions from the start of the word) is the fifth letter of the alphabet and v (2 positions from the end of the word) is the fifth-last letter of the alphabet.
For example:
>>> symmetrics("boy bread aloz bray")
['aloz','boy']
>>> symmetrics("There is a car and a book;")
['a']
This is all I could come up with for a solution, but it doesn't run because it's wrong:
def symmetrics(text):
    func_char= ",.?!:'\/"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    sym = []
    for word in text.lower().split():
        n = range(0,len(word))
        if word[n] == word[len(word)-1-n]:
            sym.append(word)
    return sym
The code above doesn't take alpha1 and alpha2 into account, as I don't know how to use them. Can anyone help me?
Here is a hint:
In [16]: alpha1.index('b')
Out[16]: 1
In [17]: alpha2.index('y')
Out[17]: 1
An alternative way to approach the problem is by using the str.translate() method:
def is_sym(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    tr = str.maketrans(alpha1, alpha2)  # in Python 2 this was string.maketrans()
    n = len(word) // 2
    return word[:n] == word[::-1][:n].translate(tr)

print(is_sym('aloz'))
print(is_sym('boy'))
print(is_sym('bread'))
(The building of the translation table can be easily factored out.)
The for loop could be modified as:
for word in text.lower().split():
    for n in range(0,len(word)//2):
        if alpha1.index(word[n]) != alpha2.index(word[len(word)-1-n]):
            break
    else:
        sym.append(word)
return sym
According to your symmetric rule, we may verify a symmetric word with the following is_symmetric_word function:
def is_symmetric_word(word):
    alpha1 = 'abcdefghijklmnopqrstuvwxyz'
    alpha2 = 'zyxwvutsrqponmlkjihgfedcba'
    length = len(word)
    # integer division, so that range() also gets an int in Python 3
    for i in range(length // 2):
        if alpha1.index(word[i]) != alpha2.index(word[length - 1 - i]):
            return False
    return True
And then the whole function to get all unique symmetric words out of a text can be defined as:
def is_symmetrics(text):
    func_char= ",.?!:'\/;"
    for letter in text:
        if letter in func_char:
            text = text.replace(letter, ' ')
    sym = []
    for word in text.lower().split():
        if is_symmetric_word(word) and not (word in sym):
            sym.append(word)
    return sym
The following are two test cases from you:
is_symmetrics("boy bread aloz bray") #['boy', 'aloz']
is_symmetrics("There is a car and a book;") #['a']
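Since the question asks for a sorted list, a compact variant could look like the sketch below. It is only one possible arrangement: the names MIRROR and is_symmetric are made up, and it assumes that anything other than the letters a to z separates words.

```python
import re
import string

# Translation table mapping a->z, b->y, ..., z->a.
MIRROR = str.maketrans(string.ascii_lowercase, string.ascii_lowercase[::-1])

def is_symmetric(word):
    """Compare the first half with the mirrored, reversed second half."""
    n = len(word) // 2  # the middle letter of an odd-length word is ignored
    return word[:n] == word[::-1][:n].translate(MIRROR)

def symmetrics(text):
    """Return the sorted, de-duplicated symmetric words found in text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sorted({w for w in words if is_symmetric(w)})

print(symmetrics("boy bread aloz bray"))         # ['aloz', 'boy']
print(symmetrics("There is a car and a book;"))  # ['a']
```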
Code first. Discussion below the code.
import string

# get alphabet and reversed alphabet
try:
    # Python 2.x
    alpha1 = string.lowercase
except AttributeError:
    # Python 3.x and newer
    alpha1 = string.ascii_lowercase
alpha2 = alpha1[::-1]  # use slicing to reverse alpha1

# make a dictionary where the key, value pairs are symmetric
# for example _symd['a'] == 'z', _symd['b'] == 'y', and so on
_symd = dict(zip(alpha1, alpha2))

def is_symmetric_word(word):
    if not word:
        return False  # zero-length word is not symmetric
    i1 = 0
    i2 = len(word) - 1
    while True:
        if i1 >= i2:
            return True  # we have checked the whole string
        # get a pair of chars
        c1 = word[i1]
        c2 = word[i2]
        if _symd[c1] != c2:
            return False  # the pair wasn't symmetric
        i1 += 1
        i2 -= 1

# note, added a space to list of chars to filter to a space
_filter_to_space = ",.?!:'\/ "

def _filter_ch(ch):
    if ch in _filter_to_space:
        return ' '  # return a space
    elif ch in alpha1:
        return ch  # it's an alphabet letter so return it
    else:
        # It's something we don't want. Return empty string.
        return ''

def clean(text):
    return ''.join(_filter_ch(ch) for ch in text.lower())

def symmetrics(text):
    # filter text: keep only chars in the alphabet or spaces
    for word in clean(text).split():
        if is_symmetric_word(word):
            # use of yield makes this a generator.
            yield word

lst = list(symmetrics("The boy...is a yob."))
print(lst)  # prints: ['boy', 'a', 'yob']
No need to type the alphabet twice; we can reverse the first one.
We can make a dictionary that pairs each letter with its symmetric letter. This will make it very easy to test whether any given pair of letters is a symmetric pair. The function zip() makes pairs from two sequences; they need to be the same length, but since we are using a string and a reversed copy of the string, they will be the same length.
It's best to write a simple function that does one thing, so we write a function that does nothing but check if a string is symmetric. If you give it a zero-length string it returns False; otherwise it sets i1 to the index of the first character in the string and i2 to the index of the last. It compares characters as long as they continue to be symmetric, incrementing i1 while decrementing i2. If the two meet or pass each other, we know we have seen the whole string and it must be symmetric, in which case we return True; if it ever finds a pair of characters that are not symmetric, it returns False. We have to check whether i1 and i2 have met or passed at the top of the loop, so it won't try to check if a character is its own symmetric character. (A character can't be both 'a' and 'z' at the same time, so a character is never its own symmetric character!)
Now we write a wrapper that filters out the junk, splits the string into words, and tests each word. Not only does it convert the chosen punctuation characters to spaces, but it also strips out any unexpected characters (anything not an approved punctuation char, a space, or a letter). That way we know nothing unexpected will get through to the inner function. The wrapper is "lazy"... it is a generator that yields up one word at a time, instead of building the whole list and returning that. It's easy to use list() to force the generator's results into a list. If you want, you can easily modify this function to just build a list and return it.
If you have any questions about this, just ask.
EDIT: The original version of the code didn't do the right thing with the punctuation characters; this version does. Also, as @heltonbiker suggested, why type the alphabet when Python has a copy of it you can use? So I made that change too.
EDIT: @heltonbiker's change introduced a dependency on the Python version! I left it in with a suitable try:/except block to handle the problem. Python 3.x removed string.lowercase; the lowercase ASCII alphabet is available there as string.ascii_lowercase.