I have an assignment that requires me to use regular expressions in Python to find alliterative names in a file that consists of a list of names. Here are the specific instructions:
" Open a file and return all of the alliterative names in the file.
For our purposes a "name" is two sequences of letters separated by
a space, with capital letters only in the leading positions.
We call a name alliterative if the first and last names begin
with the same letter, with the exception that s and sh are considered
distinct, and likewise for c/ch and t/th. The names file will contain a list of strings separated by commas. Suggestion: Do this in two stages." This is my attempt so far:
import re

def check(regex, string, flags=0):
    return not (re.match("(?:" + regex + r")\Z", string, flags=flags)) is None

def alliterative(names_file):
    f = open(names_file)
    string = f.read()
    lst = string.split(',')
    lst2 = []
    for i in lst:
        x=lst[i]
        if re.search(r'[A-Z][a-z]* [A-Z][a-z]*', x):
            k=x.split(' ')
            if check('{}'.format(k[0][0]), k[1]):
                if not check('[cst]', k[0][0]):
                    lst2.append(x)
                elif len(k[0])==1:
                    if len(k[1])==1:
                        lst2.append(x)
                    elif not check('h',k[1][1]):
                        lst2.append(x)
                elif len(k[1])==1:
                    if not check('h',k[0][1]):
                        lst2.append(x)
    return lst2
I have two issues. First, my code seems logical to me. The general idea is: check that each name has the correct format (first name and last name, letters only, with only the first letter of each name capitalized); then check whether the first and last names start with the same letter; then check whether that letter is c, s, or t. If it isn't, I add the name to the new list; if it is, I check that I'm not accidentally matching a c, s, or t with a ch, sh, or th. The code runs, but when I tried it on this list of names:
Umesh Vazirani, Vijay Vazirani, Barbara Liskov, Leslie Lamport, Scott Shenker, R2D2 Rover, Shaq, Sam Spade, Thomas Thing
it returns an empty list instead of ["Vijay Vazirani", "Leslie Lamport", "Sam Spade", "Thomas Thing"], which is what it is supposed to return. I added print statements to alliterative to see where things were going wrong, and it seems that the line
if check('{}'.format(k[0][0]), k[1]):
is an issue.
Second, and more important than the bugs in my program, I feel like I am missing the point of regular expressions: am I overcomplicating this? Is there a nicer way to do this with regular expressions?
Please consider improving your question.
As written, it is only useful to someone answering exactly the same assignment, which is very unlikely.
Please think about how to generalize it so that this Q&A can be helpful to others.
I think your direction is about right.
It's a good idea to validate the input with a regular expression; r'[A-Z][a-z]* [A-Z][a-z]*' is a good expression.
You can capture parts of the match with parentheses (groups), so that you can easily get the first and last names later on.
Keep in mind the difference between re.match and re.search: re.match only matches at the beginning of the string, while re.search scans for a match anywhere. For example, re.search(r'[A-Z][a-z]* [A-Z][a-z]*', 'aaRob Smith') returns a match object for 'Rob Smith'.
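To make the re.match / re.search difference concrete, here is a small sketch using the answer's name pattern (stored in a variable called NAME for illustration). It also shows that re.match does not anchor the end of the string, which is why a check like this needs $, \Z, or re.fullmatch (available since Python 3.4).

```python
import re

NAME = r'[A-Z][a-z]* [A-Z][a-z]*'

# re.search scans the whole string, so it finds 'Rob Smith' at index 2
print(bool(re.search(NAME, 'aaRob Smith')))    # True
# re.match only tries position 0, where 'a' is not [A-Z]
print(bool(re.match(NAME, 'aaRob Smith')))     # False
# re.match does not anchor the end, so trailing junk still matches
print(bool(re.match(NAME, 'Rob Smith2')))      # True
# re.fullmatch requires the entire string to match the pattern
print(bool(re.fullmatch(NAME, 'Rob Smith2')))  # False
print(bool(re.fullmatch(NAME, 'Rob Smith')))   # True
```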
Also, a comment on general programming style:
It is better to name variables first and last for readability, rather than k[0] and k[1] (and how was the letter k picked!?)
Here's one way to do it:
import re

FULL_NAME_RE = re.compile(r'^([A-Z][a-z]*) ([A-Z][a-z]*)$')

def is_alliterative(name):
    """Returns True if name meets the alliteration requirement, otherwise False."""
    # Reject anything that doesn't match the name format
    match = FULL_NAME_RE.match(name)
    if not match:
        return False
    first, last = match.group(1, 2)
    first, last = first.lower(), last.lower()  # compare in lowercase
    if first[0] != last[0]:
        return False
    if first[0] in 'cst':  # Check sh/ch/th
        # Both names must agree on whether an 'h' follows the first letter
        return _is_cst_h(first) == _is_cst_h(last)
    # All checks passed!
    return True

def _is_cst_h(text):
    """Returns True if text starts with 'ch', 'sh', or 'th'."""
    # Assumes (perhaps unwisely) that the first letter is c, s, or t
    return text[1:].startswith('h')

names = [
    'Umesh Vazirani', 'Vijay Vazirani', 'Barbara Liskov',
    'Leslie Lamport', 'Scott Shenker', 'R2D2 Rover', 'Shaq', 'Sam Spade', 'Thomas Thing'
]
print([name for name in names if is_alliterative(name)])
# Output:
# ['Vijay Vazirani', 'Leslie Lamport', 'Sam Spade', 'Thomas Thing']
Try this regular expression:
[a[0] for a in re.findall('((?P<caps>[A-Z])[a-z]*\\s(?P=caps)[a-z]*)', names)]
Note: here names is the whole comma-separated string rather than a list, and this pattern does not handle the sh/ch/th special case.
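One way to fold the sh/ch/th exception into a single regular expression, and to follow the assignment's "two stages" suggestion (split the file, then filter), is sketched below. The names ALLITERATIVE and alliterative_names are hypothetical; the pattern simply spells out one alternative per case of the rule and uses backreferences so the last name must repeat the captured initial. It assumes Python 3.4+ for re.fullmatch.

```python
import re

# One alternative per case of the rule; in VERBOSE mode a literal space
# must be written as [ ] because bare whitespace is ignored.
ALLITERATIVE = re.compile(r"""
      ([CST]h)           [a-z]* [ ] \1       [a-z]*   # Ch/Sh/Th on both names
    | ([CST])(?!h)       [a-z]* [ ] \2(?!h)  [a-z]*   # bare C/S/T on both names
    | (?![CST])([A-Z])   [a-z]* [ ] \3       [a-z]*   # any other shared initial
""", re.VERBOSE)

def alliterative_names(text):
    # Stage 1: split the comma-separated contents into candidate names
    candidates = [s.strip() for s in text.split(',')]
    # Stage 2: keep only candidates that match the whole pattern
    return [name for name in candidates if ALLITERATIVE.fullmatch(name)]

def alliterative(names_file):
    with open(names_file) as f:
        return alliterative_names(f.read())

sample = ("Umesh Vazirani, Vijay Vazirani, Barbara Liskov, Leslie Lamport, "
          "Scott Shenker, R2D2 Rover, Shaq, Sam Spade, Thomas Thing")
print(alliterative_names(sample))
# ['Vijay Vazirani', 'Leslie Lamport', 'Sam Spade', 'Thomas Thing']
```

Because [a-z]* only allows lowercase letters after each initial, the same pattern also enforces the "capital letters only in the leading positions" format check, so no separate validation step is needed.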
My question is pretty simple, but I haven't been able to find a proper solution.
Given below is my program:
given_list = ["Terms","I","want","to","remove","from","input_string"]
input_string = input("Enter String:")
if any(x in input_string for x in given_list):
    #Find the detected word
    #Not in bool format
    a = input_string.replace(detected_word,"")
    print("Some Task",a)
Here, given_list contains the terms I want to exclude from the input_string.
Now, the problem I am facing is that any() only produces a bool result, but I need the word that any() detected so that I can replace it with an empty string and then perform some task.
Edit: the any() function is not required at all; see the useful solutions below.
Iterate over given_list and replace them:
for i in given_list:
    input_string = input_string.replace(i, "")
print("Some Task", input_string)
No need to detect at all:
for w in given_list:
    input_string = input_string.replace(w, "")
str.replace does nothing if the word is not there, and the substring test needed for detection has to scan the string anyway, so checking first saves nothing.
The problem with finding each word and replacing it is that Python has to scan the whole string repeatedly, once per word. Another problem is that you will match substrings you don't intend to. For example, "to" is in the exclude list, so you'd end up changing "tomato" to "ma".
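The "tomato" problem is easy to reproduce. The sketch below contrasts plain str.replace with a word-boundary regular expression built with re.sub; the regex approach is a swapped-in alternative, not what this answer goes on to use, and the sample exclude_list and text are made up for illustration.

```python
import re

exclude_list = ["to", "from"]
text = "walk to the tomato stand from here"

# Plain substring replacement: "to" is also removed from inside "tomato"
naive = text
for w in exclude_list:
    naive = naive.replace(w, "")

# Whole-word replacement: \b anchors each alternative at word boundaries,
# and re.escape keeps terms containing regex metacharacters literal
pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, exclude_list)) + r")\b")
# split()/join collapses the double spaces left behind by the removals
cleaned = " ".join(pattern.sub("", text).split())

print(naive)
print(cleaned)  # walk the tomato stand here
```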
It seems you want to replace whole words. Parsing is a whole subject of its own, but let's simplify: I'm going to assume everything is lowercase with no punctuation, although that can be improved later. Let's use input_string.split() to iterate over whole words.
We want to replace some words with nothing, so let's iterate over the words of input_string and filter out the ones we don't want, using the built-in filter function.
exclude_list = ["terms","i","want","to","remove","from","input_string"]
input_string = "one terms two i three want to remove"
keepers = filter(lambda w: w not in exclude_list, input_string.lower().split())
output_string = ' '.join(keepers)
print(output_string)
# Output: one two three
Note that we create an iterator that allows us to go through the whole input string just once. And instead of replacing words, we just basically skip the ones we don't want by having the iterator not return them.
Since filter requires a function that decides whether to include or exclude each word, we had to define one; I used lambda syntax to do that. You could replace it with a named function:
def keep(word):
    return word not in exclude_list

keepers = filter(keep, input_string.lower().split())
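To answer your question about any(): use an assignment expression (Python 3.8+) to capture the word. Because any() stops at the first True, word holds the first detected term; if nothing matches, it holds the last term checked, so only rely on it inside the if.
if any((word := x) in input_string for x in given_list):
    # the detected term is now in the variable word
    a = input_string.replace(word, "")
    print("Some Task", a)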
I'm a real beginner with Python, and I'm trying to modify code that I have seen in lessons. I tried to find all the uppercase letters in a string, but the problem is that it only gives me one uppercase letter, even when there is more than one.
def finding_upppercase_itterative(string_input):
    for i in range(len(string_input)):
        if string_input[i].isupper:
            return string_input[i]
    return "No uppercases found"
How should I modify this code to return all the uppercase letters in a given string? If someone could also explain the logic behind it, I would be glad.
Thank You!
Edit 1: Thanks to S3DEV, who pointed out that I had mistyped the binary search algorithm.
If you are looking for only small changes that make your code work, one way is to use a generator function, using the yield keyword:
def finding_upppercase_itterative(string_input):
    for i in range(len(string_input)):
        if string_input[i].isupper():
            yield string_input[i]

print(list(finding_upppercase_itterative('test THINGy')))  # ['T', 'H', 'I', 'N', 'G']
If you just print finding_upppercase_itterative('test THINGy'), it shows a generator object, so you need to convert it to a list in order to view the results.
For more about generators, see here: https://wiki.python.org/moin/Generators
This is the fixed code, written out with a lot of detail at each step. Other answers show more compact, 'pythonic' ways to do the same thing.
def finding_upppercase_itterative(string_input):
    uppercase = []
    for i in range(len(string_input)):
        if string_input[i].isupper():
            uppercase.append(string_input[i])
    if len(uppercase) > 0:
        return "".join(uppercase)
    else:
        return "No uppercases found"

# Try the function
test_string = input("Enter a string to get the uppercase letters from: ")
uppercase_letters = finding_upppercase_itterative(test_string)
print(uppercase_letters)
Here's the explanation:
create a function that takes string_input as a parameter
create an empty list called uppercase
loop through every character in string_input
[in the loop] if it is an uppercase letter, add it to the uppercase list
[out of the loop] if the length of the uppercase list is more than 0
[in the if] return the characters in the list joined together with an empty separator ("")
[in the else] otherwise, return "No uppercases found"
[out of the function] get a test_string and store it in a variable
get the uppercase_letters from test_string
print the uppercase_letters to the user
There are shorter (and more complex) ways to do this, but this is just a way that is easier for beginners to understand.
Also: you may want to fix your spelling. Misspelled identifiers make code harder to read and understand, and harder to type correctly. For example, upppercase and itterative should be uppercase and iterative.
Something simple like this would work:
s = "My Word"
s = ''.join(ch for ch in s if ch.isupper())
print(s)  # MW
This is the inverse of another Stack Overflow question: Removing capital letters from a python string
A return statement ends the function immediately. So as soon as your loop finds an uppercase letter, it reaches the return statement and the function stops, returning only that one letter.
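The question's loop has a second, quieter bug: the condition string_input[i].isupper is missing its parentheses, so it tests the method object instead of calling it. Method objects are always truthy, so the condition is true for every character. The sketch below demonstrates this; first_char_returned is a hypothetical name for a copy of the question's loop.

```python
# Without parentheses, .isupper is the bound method object itself
method = "a".isupper
print(bool(method))    # True: method objects are always truthy
print("a".isupper())   # False: calling the method gives the real answer

def first_char_returned(string_input):
    # Copy of the question's loop, keeping the missing () bug
    for i in range(len(string_input)):
        if string_input[i].isupper:
            return string_input[i]
    return "No uppercases found"

# The condition is always true, so the very first character comes back
print(first_char_returned("abC"))  # a
```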
One way to do this is to append letters to list and return them at the end:
def finding_uppercase_iterative(string_input):
    letters = []
    for i in range(len(string_input)):
        if string_input[i].isupper():
            letters.append(string_input[i])
    if letters:
        return letters
    return "No uppercases found"
If I have a list of strings:
matches = [ 'string1', 'anotherstring', 'astringystring' ]
And I have another string that I want to test:
teststring = 'thestring1'
I want to test each string against it, and if any of them is contained in teststring, do something. I have:
match = 0
for matchstring in matches:
    if matchstring in teststring:
        match = 1
if not match:
    continue
This is in a loop, so we just go around again if we don't get a match (I can reverse this logic of course and do something if it matches), but the code looks clumsy and not pythonic, if easy to follow.
I am thinking there is a better way to do this, but I don't grok python as well as I would like. Is there a better approach?
Note the "duplicate" is the opposite question (though the answer approach is the same).
You could use any() here.
Code:
if any(matchstring in teststring for matchstring in matches):
    print("Matched")
Notes:
any() stops as soon as it sees a match, i.e. the first True value produced by the expression.
In the generator expression, for matchstring in matches iterates over each string in matches.
matchstring in teststring checks whether the current string is a substring of teststring.
If you want to know what the first match was you can use next:
match = next((match for match in matches if match in teststring), None)
You have to pass None as the second parameter if you don't want it to raise an exception when nothing matches. It will use the value as the default, so match will be None if nothing is found.
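Putting both answers side by side on the question's own data: any() gives a yes/no answer, while next() over the same generator gives the first matching string itself (or the default when nothing matches). The second test string 'no match here' is made up to show the default.

```python
matches = ['string1', 'anotherstring', 'astringystring']
teststring = 'thestring1'

# any() answers "is at least one of them a substring?" and stops at the first hit
found = any(m in teststring for m in matches)

# next() returns the first hit itself; the None default avoids StopIteration
first = next((m for m in matches if m in teststring), None)
missing = next((m for m in matches if m in 'no match here'), None)

print(found, first, missing)  # True string1 None
```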
How about trying this:
len([x for x in matches if (teststring in x) or (x in teststring)]) > 0
I've updated the answer to check the substring both ways. You can pick and choose or modify as you see fit but I think the basics should be pretty clear.
EDIT: One of the main problems with the code below is storing regular expression objects as dictionary keys, and then trying to look them up to see whether they match another string. I will still leave my original question, because I think there's probably an easy way to do all of this.
I would like to find a method in Python that returns a boolean saying whether or not two strings refer to the same thing. I know that this is difficult, if not completely absurd, in programming, but I am looking into handling it with a dictionary of alternative strings that refer to the same thing.
Here are some examples, since I know this doesn't make a whole lot of sense without them.
If I give the string:
'breakingBad.Season+01 Episode..02'
Then I would like it to match the string:
'Breaking Bad S01E02'
Or 'three.BuCkets+of H2O' can match '3 buckets of water'
I know this is nearly impossible to do with regard to '3' and 'water' etc. being synonymous, but I am willing to provide these as dictionaries of relevant regular expression synonyms to the function if need be.
I have a feeling that there is a much simpler way to do this in python, as there always is, but here is what I have so far:
import re

def check_if_match(given_string, string_to_match, alternative_dictionary):
    print('matching: ', given_string, ' against: ', string_to_match)
    # split the string into its parts on pretty much any special character
    list_of_given_strings = re.split(r' |\+|\.|;|,|\*|\n', given_string)
    print('List of words retrieved from given string: ')
    print(list_of_given_strings)
    check = False
    counter = 0
    for i in range(len(list_of_given_strings)):
        m = re.search(list_of_given_strings[i], string_to_match, re.IGNORECASE)
        m_alt = None
        try:
            m_alt = re.search(alternative_dictionary[list_of_given_strings[i]], string_to_match, re.IGNORECASE)
        except KeyError:
            pass
        if m or m_alt:
            if counter == len(list_of_given_strings) - 1:
                check = True
            else:
                counter += 1
            print(list_of_given_strings[i], ' found to match')
        else:
            print(list_of_given_strings[i], ' did not match')
            break
    return check

string1 = 'breaking Bad.Season+01 Episode..02'
other_string_to_check = 'Breaking.Bad.S01+E01'
# make a dictionary of synonyms - here we should be saying that "S01" is equivalent to "Season 01"
alternative_dict = {re.compile(r'S[0-9]', flags=re.IGNORECASE): re.compile(r'Season [0-9]', flags=re.IGNORECASE),
                    re.compile(r'E[0-9]', flags=re.IGNORECASE): re.compile(r'Episode [0-9]', flags=re.IGNORECASE)}
print(check_if_match(string1, other_string_to_check, alternative_dict))
print()

# another try
string2 = 'three.BuCkets+of H2O'
other_string_to_check2 = '3 buckets of water'
alternative_dict2 = {'H2O': 'water', 'three': '3'}
print(check_if_match(string2, other_string_to_check2, alternative_dict2))
This returns:
matching: breaking Bad.Season+01 Episode..02 against: Breaking.Bad.S01+E01
List of words retrieved from given string:
['breaking', 'Bad', 'Season', '01', 'Episode', '', '02']
breaking found to match
Bad found to match
Season did not match
False
matching: three.BuCkets+of H2O against: 3 buckets of water
List of words retrieved from given string:
['three', 'BuCkets', 'of', 'H2O']
three found to match
BuCkets found to match
of found to match
H2O found to match
True
I realize this probably means I am getting something wrong with the dictionary keys and values, but I feel like I am getting further away from a simple pythonic solution that has probably already been created.
Anyone have any thoughts?
I was tinkering with it and found some interesting things:
It might have to do with the way you are breaking up your initial words into lists
matching: breaking Bad.Season 1.Episode.1 against: Breaking.Bad.S1+E1
List of words retrieved from given string:
['breaking', 'Bad', 'Season', '1', 'Episode', '1']
I think you want it to be ..., 'Season 1', ... instead of having 'Season' and 1 be separate entries in the list.
You specify S[0-9], which matches only a single digit: against S01 it matches just S0, so double-digit numbers are never captured in full. S[0-9]+ would match all the digits.
You are right about your regular expressions being stored in dictionaries; the mapping only applies in one direction. I was fiddling with the code (unfortunately I don't remember exactly what I changed) by mapping r'Season [0-9]' to r'S[0-9]' instead of vice versa, and it was able to match Season.
Suggestions
Instead of mapping, have an equivalence class for each string type (e.g. title, season, episode) and have some matcher code for that.
Separate the parse and compare steps. Parse each string individually into a common format or object, and then do a comparison.
You might need to implement some sort of state machine to know that you are processing a season and expect to see a number in a particular format right after it.
You may want to use a third party tool instead; I've heard good things about Renamer
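The "parse, then compare" suggestion can be sketched for the season/episode example. This is a minimal illustration under assumptions: parse_episode, same_episode, and the SEASON_EPISODE pattern are hypothetical, and the pattern only recognizes forms like "S01E02" or "Season 01 Episode 02". Synonyms such as three/3 or H2O/water would still need a separate lookup table.

```python
import re

# Hypothetical season/episode grammar, applied after separators become spaces:
# "s01e02", "s01 e02", or "season 01 episode 02"
SEASON_EPISODE = re.compile(r"\bs(?:eason)?\s*(\d+)\s*e(?:pisode)?\s*(\d+)")

def parse_episode(text):
    """Parse a show string into a (title, season, episode) tuple."""
    # Split camelCase words such as "breakingBad" into "breaking Bad"
    text = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", text)
    # Lowercase, then collapse any run of non-alphanumerics into one space
    cleaned = re.sub(r"[^0-9a-z]+", " ", text.lower()).strip()
    m = SEASON_EPISODE.search(cleaned)
    if m is None:
        return (cleaned, None, None)
    title = cleaned[:m.start()].strip()
    # int() makes "01" and "1" compare equal
    return (title, int(m.group(1)), int(m.group(2)))

def same_episode(a, b):
    # Compare the parsed forms, not the raw strings
    return parse_episode(a) == parse_episode(b)

print(same_episode('breakingBad.Season+01 Episode..02', 'Breaking Bad S01E02'))    # True
print(same_episode('breaking Bad.Season+01 Episode..02', 'Breaking.Bad.S01+E01'))  # False
```

Because both strings are reduced to the same canonical tuple, the comparison is symmetric, which sidesteps the one-directional dictionary mapping problem.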
This is not a homework question, it is an exam preparation question.
I should define a function syllables(word) that counts the number of syllables in a word in the following way:
• a maximal sequence of vowels is a syllable;
• a final e in a word is not a syllable (or the vowel sequence it is a part of).
I do not have to deal with any special cases, such as a final e in a one-syllable word (e.g., 'be' or 'bee').
>>> syllables('honour')
2
>>> syllables('decode')
2
>>> syllables('oiseau')
2
Should I use a regular expression here or just a list comprehension?
I find regular expressions natural for this question. (I think a non-regex answer would take more coding. I use two string methods, 'lower' and 'endswith' to make the answer more clear.)
import re

def syllables(word):
    word = word.lower()
    if word.endswith('e'):
        word = word[:-1]
    count = len(re.findall('[aeiou]+', word))
    return count

for word in ('honour', 'decode', 'decodes', 'oiseau', 'pie'):
    print(word, syllables(word))
Which prints:
honour 2
decode 2
decodes 3
oiseau 2
pie 1
Note that 'decodes' has one more syllable than 'decode' (which is strange, but fits your definition).
Question: how does this help you? Isn't the point of a study question that you work through it yourself? You may get more benefit in the future by posting a failed attempt in your question, so you can learn exactly where you are lacking.
Use regexps - most languages will let you count the number of matches of a regexp in a string.
Then special-case the terminal-e by checking the right-most match group.
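Here is one reading of that suggestion, as a sketch: collect every vowel run with re.finditer, then inspect only the right-most match and drop it if it is a lone e at the very end of the word. Treating a final e inside a longer run (as in pie) as still counting is an assumption; it gives the same counts as stripping the final e before counting.

```python
import re

def syllables(word):
    """Count maximal vowel runs, ignoring a run that is just a final 'e'."""
    word = word.lower()
    groups = list(re.finditer(r"[aeiou]+", word))
    # Special-case the terminal e by checking the right-most match only
    last = groups[-1] if groups else None
    if last is not None and last.group() == "e" and last.end() == len(word):
        groups.pop()
    return len(groups)

for w in ('honour', 'decode', 'decodes', 'oiseau', 'pie'):
    print(w, syllables(w))
```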
I don't think regex is the right solution here.
It seems pretty straightforward to write this treating each string as a list.
Some pointers:
[abc] matches a, b or c.
A + after a regex token allows the token to match once or more
$ matches the end of the string.
(?<=x) matches the current position only if the previous character is an x.
(?!x) matches the current position only if the next character is not an x.
EDIT:
I just saw your comment that since this is not homework, actual code is requested.
Well, then:
[aeiou]+(?!(?<=e)$)
If you don't want to count final vowel sequences that end in e at all (like the u in tongue or the o in toe), then use
[aeiou]+(?=[^aeiou])|[aeiou]*[aiou]$
I'm sure you'll be able to figure out how it works if you read the explanation above.
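To see the difference between the two patterns, you can count matches with re.findall. The variable names FINAL_E and STRICT are just labels for this sketch; the word list adds tongue and toe from the explanation above.

```python
import re

# Counts every vowel run except a lone final e; backtracking lets
# "ie" in "pie" still match as "i"
FINAL_E = re.compile(r"[aeiou]+(?!(?<=e)$)")
# Stricter variant: a run at the end of the word counts only if it
# does not end in e
STRICT = re.compile(r"[aeiou]+(?=[^aeiou])|[aeiou]*[aiou]$")

for w in ('honour', 'decode', 'oiseau', 'pie', 'tongue', 'toe'):
    print(w, len(FINAL_E.findall(w)), len(STRICT.findall(w)))
```

With the stricter pattern, pie and toe count zero syllables, because their only vowel run is final and ends in e; that is the behavior the answer describes, not a bug in the sketch.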
Here's an answer without regular expressions. My real answer (also posted) uses regular expressions. Untested code:
def syllables(word):
    word = word.lower()
    if word.endswith('e'):
        word = word[:-1]
    vowels = 'aeiou'
    in_vowel_group = False
    vowel_groups = 0
    for letter in word:
        if letter in vowels:
            if not in_vowel_group:
                in_vowel_group = True
                vowel_groups += 1
        else:
            in_vowel_group = False
    return vowel_groups
Both ways work. You said yourself that it was for exam preparation, so use whichever is going to be on the exam. If they're both on the exam, use whichever you need more practice with. Just remember:
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. ~Jamie Zawinski
So in my opinion, don't use regex unless you need the practice.
Regular expressions would be way too complex, and a list comprehension probably wouldn't be robust enough. You will probably be able to solve this easily using a grammar lexer like PyParsing. Give it a shot!
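Use a regex that matches a single vowel (a, e, i, o, or u), convert the string to a list, then iterate through the list and count each switch from non-vowel to vowel: the count becomes 1 at the first match, stays 1 through the following non-matches, becomes 2 at the next match, stays 2, and so on.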
To handle the case where the last letter is 'e' following a consonant (as in ate), just check the last two letters of the word before you start. If they match that pattern truncate the final e and process as normal.
This pattern works for your definition (note that it also treats y as a vowel, which your definition does not):
(?!e$)([aeiouy]+)
Just count how many times it occurs.