Finding uppercase acronyms in a string - python

I'm trying to find uppercase acronyms in a string. For example, if the input is "I need to see you ASAP, because YOLO, you know", the function should return ["ASAP", "YOLO"].
#!/usr/bin/env python3
import string

def acronyms(s):
    s.translate(string.punctuation)
    for i, x in enumerate(s):
        while x.upper():
            print(x)
            i += 1

def main():
    print(acronyms("""I need to see you ASAP, because YOLO, you know."""))

if __name__ == "__main__":
    main()
I tried to get rid of the punctuation, then loop through the string and, while the character is uppercase, print the letter out. It resulted in an infinite loop. I wanted to solve this using string manipulation, so no regex.
Edit (changed how punctuation is removed, for efficiency):
From:
exclude = set(string.punctuation)
s = "".join(ch for ch in s if ch not in exclude)
To:
s.translate(string.punctuation)
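For reference: in Python 3, str.translate() expects a translation table (a mapping from code points) and returns a new string, so s.translate(string.punctuation) neither strips the punctuation nor changes s. A minimal sketch of the edit done with str.maketrans, assuming only ASCII punctuation (string.punctuation) needs removing and reusing the length filter suggested in the answers below:

```python
import string

def acronyms(s):
    # Build a table that deletes every ASCII punctuation character,
    # then apply it; translate returns a new string.
    table = str.maketrans("", "", string.punctuation)
    cleaned = s.translate(table)
    # Keep whole words that are uppercase and longer than one letter.
    return [word for word in cleaned.split() if word.isupper() and len(word) > 1]

print(acronyms("I need to see you ASAP, because YOLO, you know."))  # ['ASAP', 'YOLO']
```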

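A couple of things I'd like to point out. First, you end up with a hanging program because while x.upper() is always true (x.upper() of any character is a non-empty string, which is truthy) and there is not a single break. Second, incrementing i inside the loop is pointless, because enumerate assigns a fresh value to i on every iteration:
for i, x in enumerate(s):
    i += 1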
This can all be simplified; no enumerate is needed:
import string

def acronyms(s):
    exclude = set(string.punctuation)
    s = "".join(ch for ch in s if ch not in exclude)
    acro = [x for x in s.split() if x.isupper()]
    return acro
Output:
['I', 'ASAP', 'YOLO']
Unfortunately, there is an extra I, which is not an acronym, so one fix is to require x to be longer than one letter before it is kept:
acro = [x for x in s.split() if x.isupper() and len(x) != 1]

Your while-loop iterates over the first character, but never breaks out to go to the next character.
You also want to filter out the 'I' as a single letter is not usually classified as an acronym.
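The str.isupper() method checks an entire string rather than a single character, so I recommend using it. Stripping punctuation from each word keeps "ASAP," from being returned with its comma. It would look like this:
import string

def acronyms(s):
    words = s.split()
    acronyms = []
    for word in words:
        word = word.strip(string.punctuation)  # "ASAP," -> "ASAP"
        if word.isupper() and len(word) > 1:
            acronyms.append(word)
    return acronyms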
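To stay closer to the asker's original idea of scanning character by character (no split, no regex), here is a minimal sketch. It assumes a word is a run of letters as judged by str.isalpha(), so punctuation simply ends a word; the name acronyms_by_scan is made up for illustration.

```python
def acronyms_by_scan(s):
    result = []
    word = ""  # letters of the word currently being read
    for ch in s + " ":  # the extra blank flushes the final word
        if ch.isalpha():
            word += ch
        else:
            # A non-letter ends the word: keep it if it is an all-caps
            # run of two or more letters.
            if len(word) > 1 and word.isupper():
                result.append(word)
            word = ""
    return result

print(acronyms_by_scan("I need to see you ASAP, because YOLO, you know."))  # ['ASAP', 'YOLO']
```

Here "ASAP," needs no separate punctuation-stripping step, because the comma already ends the word.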

I would highly recommend NLTK for its tokenization package; it handles edge cases and punctuation very well.
For a simplified approach where you define an acronym as:
all characters are alphabetical
all characters are upper case
The following should suffice:
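from nltk.tokenize import word_tokenize  # requires NLTK's tokenizer data (see nltk.download)

def get_acronyms(text):
    return [
        token for token in word_tokenize(text)
        if token.isalpha() and token.isupper()  # alphabetical and upper case
    ]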

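Here, this should work:
def acronyms(x):
    ans = []
    y = x.split(" ")
    for i in y:
        if i.isupper():
            ans += [i]
    return ans
isupper() returns True as long as the string has at least one cased character and no lowercase ones, even if it also contains punctuation, so words such as "ASAP," are kept along with their punctuation.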
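A quick check of this behavior with plain built-in str methods (Python 3), which also shows why the single-letter "I" slips through and needs the length check used in other answers:

```python
# A few checks of how str.isupper() treats punctuation and single letters.
print("ASAP,".isupper())      # True: the comma is ignored
print("I".isupper())          # True: a single capital letter also passes
print("...".isupper())        # False: no cased characters at all
print("YOLO, you".isupper())  # False: contains lowercase letters
```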


How to reverse the words of a string considering the punctuation?

Here is what I have so far:
def reversestring(thestring):
    words = thestring.split(' ')
    rev = ' '.join(reversed(words))
    return rev

stringing = input('enter string: ')
print(reversestring(stringing))
I know I'm missing something because I need the punctuation to also follow the logic.
So let's say the user enters "Do or do not, there is no try.". The result should be ".try no is there , not do or Do", but I only get "try. no is there not, do or Do". I use a straightforward implementation that reverses all the characters in the string, then checks each word and reverses the characters again, but only for the characters that are letters.
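The approach described in the question (reverse every character, then flip the letter runs back) can be sketched with itertools.groupby. This is a minimal version, assuming "letters" means characters for which str.isalpha() is true; the function name is hypothetical. Note that it attaches the comma to "not", giving ".try no is there ,not do or Do", which is not exactly the spacing asked for above.

```python
from itertools import groupby

def reverse_words_keep_punct(s):
    flipped = s[::-1]  # reverse every character in the string
    parts = []
    # Group consecutive characters by whether they are letters.
    for is_letter, group in groupby(flipped, key=str.isalpha):
        chunk = "".join(group)
        # Letter runs are now spelled backwards, so reverse them again;
        # spaces and punctuation stay where the full reversal put them.
        parts.append(chunk[::-1] if is_letter else chunk)
    return "".join(parts)

print(reverse_words_keep_punct("Do or do not, there is no try."))  # .try no is there ,not do or Do
```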
Try this (explanation in the code comments):
s = "Do or do not, there is no try."
o = []
for w in s.split(" "):
    puncts = [".", ",", "!"]  # change according to needs
    for c in puncts:
        # if a punctuation mark is in the word, move it from the end of the word to the beginning
        if c in w:
            w = c + w[:-1]  # w[:-1] gets everything before the last char
    o.append(w)
o = reversed(o)  # reversing list to reverse sentence
print(" ".join(o))  # printing it as sentence
# output: .try no is there ,not do or Do
Your code does exactly what it should: splitting on spaces doesn't separate a dot or comma from a word.
I'd suggest using re.findall to get all the words and all the punctuation marks you're interested in:
import re

def reversestring(thestring):
    words = re.findall(r"\w+|[.,]", thestring)
    rev = ' '.join(reversed(words))
    return rev

reversestring("Do or do not, there is no try.")  # ". try no is there , not do or Do"
You can use regular expressions to parse the sentence into a list of words and a list of separators, then reverse the word list and combine them together to form the desired string. A solution to your problem would look something like this:
import re

def reverse_it(s):
    t = ""  # result, empty string
    words = re.findall(r'(\w+)', s)  # just the words
    not_s = re.findall(r'(\W+)', s)  # everything else
    j = len(words)
    k = len(not_s)
    words.reverse()  # reverse the order of word list
    if re.match(r'(\w+)', s):  # begins with a word
        for i in range(k):
            t += words[i] + not_s[i]
        if j > k:  # and ends with a word
            t += words[k]
    else:  # begins with punctuation
        for i in range(j):
            t += not_s[i] + words[i]
        if k > j:  # ends with punctuation
            t += not_s[j]
    return t  # result

def check_reverse(p):
    q = reverse_it(p)
    print("\"%s\", \"%s\"" % (p, q))

check_reverse('Do or do not, there is no try.')
Output
"Do or do not, there is no try.", "try no is there, not do or Do."
It is not a very elegant solution, but it does work!

Python: Find the longest word in a string

I'm preparing for an exam, but I'm having difficulties with one past-paper question. Given a string containing a sentence, I want to find the longest word in that sentence and return that word and its length. Edit: I only needed to return the length, but I appreciate your answers to the original question! They help me learn more. Thank you.
For example: string = "Hello I like cookies". My program should then return "cookies" and the length 7.
Now the thing is that, for a full score, I am not allowed to use any function from the class String, and I can only go through the string once. I am not allowed to use string.split() (otherwise there wouldn't be any problem), and the solution shouldn't have too many for and while statements. The strings contain only letters and blanks, and words are separated by a single blank.
Any suggestions? I'm lost, i.e. I don't have any code.
Thanks.
EDIT: I'm sorry, I misread the exam question. It seems you only have to return the length of the longest word, not the length and the word.
EDIT2: Okay, with your help I think I'm onto something...
def longestword(x):
    alist = []
    length = 0
    for letter in x:
        if letter != " ":
            length += 1
        else:
            alist.append(length)
            length = 0
    return alist
But it returns [5, 1, 4] for "Hello I like cookies" so it misses "cookies". Why? EDIT: Ok, I got it. It's because there's no more " " after the last letter in the sentence and therefore it doesn't append the length. I fixed it so now it returns [5, 1, 4, 7] and then I just take the maximum value.
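The fix described in the edit can also be done without building a list at all: track the running maximum while scanning, and compare the last word once the loop ends. A minimal sketch, assuming words are separated by single blanks as the exam states (the function name is illustrative):

```python
def longest_word_length(sentence):
    longest = 0  # best length seen so far
    length = 0   # length of the word currently being read
    for ch in sentence:
        if ch != " ":
            length += 1
        else:
            # A blank ends the current word.
            if length > longest:
                longest = length
            length = 0
    # The last word has no trailing blank, so check it after the loop.
    if length > longest:
        longest = length
    return longest

print(longest_word_length("Hello I like cookies"))  # 7
```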
I suppose using lists (but not .split()) is okay? The question only said that functions from "String" weren't allowed; or are lists part of strings?
You can try to use regular expressions:
import re

string = "Hello I like cookies"
word_pattern = r"\w+"  # raw string, so the backslash reaches the regex engine
regex = re.compile(word_pattern)
words_found = regex.findall(string)
if words_found:
    longest_word = max(words_found, key=lambda word: len(word))
    print(longest_word)
Finding a max in one pass is easy:
current_max = 0
for v in values:
    if v > current_max:
        current_max = v
But in your case, you need to find the words. Remember this quote (attributed to J. Zawinski):
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Besides using regular expressions, you can simply check whether each character is a letter. A first approach is to go through the string and detect the start or end of words:
import string

current_word = ''
current_longest = ''
for c in mystring:
    if c in string.ascii_letters:
        current_word += c
    else:
        # a non-letter ends the current word
        if len(current_word) > len(current_longest):
            current_longest = current_word
        current_word = ''
else:
    # for-else: runs once the loop finishes, to check the last word
    if len(current_word) > len(current_longest):
        current_longest = current_word
A final way is to split words in a generator and find the max of what it yields (here I used the max function):
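import string

def split_words(mystring):
    current = []
    for c in mystring:
        if c in string.ascii_letters:
            current.append(c)
        else:
            if current:
                yield ''.join(current)
                current = []  # start a new word
    if current:
        yield ''.join(current)  # the last word has no trailing separator

max(split_words(mystring), key=len)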
Just search for groups of non-whitespace characters, then find the maximum by length:
import re
longest = len(max(re.findall(r'\S+', string), key=len))
For Python 3. If two words in the sentence have the same length, it returns the one that appears first.
def findMaximum(word):
    li = word.split()
    li = list(li)
    op = []
    for i in li:
        op.append(len(i))
    l = op.index(max(op))
    print(li[l])

findMaximum(input("Enter your word:"))
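It's quite simple, as long as max() is told to compare by length (without key=len it compares the words alphabetically, so 'zz aaa' would give 'zz'):
def long_word(s):
    n = max(s.split(), key=len)
    return n

In [48]: long_word('a bb ccc dddd')
Out[48]: 'dddd'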
I found an error in a previously provided solution; here's the correction:
import string

def longestWord(text):
    current_word = ''
    current_longest = ''
    for c in text:
        if c in string.ascii_letters:
            current_word += c
        else:
            if len(current_word) > len(current_longest):
                current_longest = current_word
            current_word = ''
    if len(current_word) > len(current_longest):
        current_longest = current_word
    return current_longest
I can imagine a few different alternatives. Regular expressions can probably do much of the word splitting you need, which could be a simple option if you understand regexes.
An alternative is to treat the string as a list, iterate over it while keeping track of your index, and look at each character to see whether you're ending a word. Then you just need to keep the longest word (the largest index difference) and you should find your answer.
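A minimal sketch of this index-tracking idea, assuming single blanks between words; the function name is hypothetical, and it returns both the word and its length:

```python
def longest_word_by_index(sentence):
    best_start, best_len = 0, 0
    start = 0  # index where the current word began
    for i, ch in enumerate(sentence + " "):  # sentinel blank ends the last word
        if ch == " ":
            # The word spans sentence[start:i]; its length is the index difference.
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = i + 1  # the next word begins after the blank
    return sentence[best_start:best_start + best_len], best_len

print(longest_word_by_index("Hello I like cookies"))  # ('cookies', 7)
```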
Regular expressions seem to be your best bet. First use re to split the sentence:
>>> import re
>>> string = "Hello I like cookies"
>>> string = re.findall(r'\S+',string)
\S+ matches runs of non-whitespace characters, and findall puts them in a list:
>>> string
['Hello', 'I', 'like', 'cookies']
Now you can find the length of the list element containing the longest word and then use list comprehension to retrieve the element itself:
>>> maxlen = max(len(word) for word in string)
>>> maxlen
7
>>> [word for word in string if len(word) == maxlen]
['cookies']
This method uses only one for loop, doesn't use any methods of the String class, and accesses each character only once. You may have to modify it depending on which characters count as part of a word.
s = "Hello I like cookies"
word = ''
maxLen = 0
maxWord = ''
for c in s + ' ':  # the extra blank ends the last word
    if c == ' ':
        if len(word) > maxLen:
            maxLen = len(word)  # remember the new record length
            maxWord = word
        word = ''
    else:
        word += c
print("Longest word:", maxWord)
print("Length:", len(maxWord))
Given you are not allowed to use string.split() I guess using a regexp to do the exact same thing should be ruled out as well.
I do not want to solve your exercise for you, but here are a few pointers:
Suppose you have a list of numbers and you want to return the highest value. How would you do that? What information do you need to track?
Now, given your string, how would you build a list of all word lengths? What do you need to keep track of?
Now you only have to combine both pieces of logic, so that word lengths are compared as you compute them while going through the string.
My proposal ...
import re

def longer_word(sentence):
    word_list = re.findall(r"\w+", sentence)
    word_list.sort(key=len, reverse=True)  # longest words first
    longer_word = word_list[0]
    print("The longest word is '" + longer_word + "' with a size of", len(longer_word), "characters.")

longer_word("Hello I like cookies")
import re

def longest_word(sen):
    res = re.findall(r"\w+", sen)
    n = max(res, key=lambda x: len(x))
    return n

print(longest_word("Hey!! there, How is it going????"))
Output: there
Here I have used a regex for the problem. re.findall() returns every match of the pattern in the string as a list, which is stored in res. The pattern \w+ matches runs of word characters (letters, digits and underscores), so spaces and symbols such as %, & and ! are left out.
The variable n then holds the longest word in res: max() uses the lambda expression as its key, so it compares the words by len() rather than by their string value.
>>> # import regular expressions for the problem
>>> import re
>>> # initialize a sentence
>>> sen = "fun&!! time zone"
>>> # res finds all the words and stores them in a list
>>> res = re.findall(r"\w+", sen)
>>> res
['fun', 'time', 'zone']
>>> max(res)
'zone'
>>> # Here we get "zone" instead of "time" because, without a key,
>>> # max() compares the strings themselves, and "zone" sorts after "time".
>>> # max() returns the largest item in an iterable (or the largest of two or more arguments).
>>> n = max(res, key=lambda x: len(x))
>>> n
'time'
Here we get "time" because the key makes max() compare lengths. "time" and "zone" both have four characters, and when several items are tied, max() returns the first one it encountered.
list1 = ['Happy', 'Independence', 'Day', 'Zeal']
listLen = []
for i in list1:
    listLen.append(len(i))
print(list1[listLen.index(max(listLen))])
Output: Independence

Removing items from a list using a loop in Python

I am very new to programming in general and have started with Python. I am working through various problems to try and better my understanding.
I am trying to define a function that removes vowels from a string. This is what I have tried:
def anti_vowel(text):
    new = []
    for i in range(len(text)):
        new.append(text[i])
    print(new)
    for x in new:
        if x == "e" or x == "E" or x == "a" or x == "A" or x == "i" or x == "I" or x == "o" or x == "O" or x == "u" or x == "U":
            new.remove(x)
    return "".join(new)
This is removing vowels from the first words of a string, but not the final word:
For example:
anti_vowel("Hey look words!")
returns: "Hy lk words!"
Can somebody please explain where I am going wrong so I can learn from this?
Thanks :)
You should not delete items from a list while iterating through it. You will find numerous posts on Stack Overflow explaining why.
I would use the filter function. In Python 3, filter returns an iterator, so join its result back into a string:
>>> vowels = 'aeiouAEIOU'
>>> myString = 'This is my string that has vowels in it'
>>> ''.join(filter(lambda i: i not in vowels, myString))
'Ths s my strng tht hs vwls n t'
Written as a function, this would be
def anti_vowel(text):
    vowels = 'aeiouAEIOU'
    return ''.join(filter(lambda letter: letter not in vowels, text))
Test
>>> anti_vowel(myString)
'Ths s my strng tht hs vwls n t'
You seem to have approached this a bit backwards. Firstly, note that:
new = []
for i in range(len(text)):
    new.append(text[i])
is just:
new = list(text)
Secondly, why not check before appending, rather than afterwards? Then you only have to iterate over the characters once. This could be:
def anti_vowel(text):
    """Remove all vowels from the supplied text."""  # explanatory docstring
    non_vowels = []  # clear variable names
    vowels = set("aeiouAEIOU")  # sets allow fast membership tests
    for char in text:  # iterate directly over characters, no need for 'i'
        if char not in vowels:  # test membership of vowels
            non_vowels.append(char)  # add non-vowels only
    return "".join(non_vowels)
A quick example:
>>> anti_vowel("Hey look words!")
'Hy lk wrds!'
This simplifies further to a list comprehension:
def anti_vowel(text):
    """Remove all vowels from the supplied text."""
    vowels = set("aeiouAEIOU")
    return "".join([char for char in text if char not in vowels])
You can use a list comp:
def anti_vowel(text):
    vowels = 'aeiouAEIOU'
    return "".join([x for x in text if x not in vowels])

print(anti_vowel("Hey look words!"))
Hy lk wrds!
The list comprehension filters the vowels from the words.
You can also do it succinctly with a generator expression:
def anti_vowel(text):
    return ''.join(ch for ch in text if ch.upper() not in 'AEIOU')
Iterating over a list works by position: the loop keeps an index and moves it forward one step each time. When you remove an item from a list while iterating over it, you shift the index of every item that follows the one you removed. When you loop over the list
['H','e','y',' ','l','o','o','k',' ','w','o','r','d','s','!']
while removing items found in 'aeiou', on the second iteration of the loop you remove 'e' from your list, and you're left with
['H','y',' ','l','o','o','k',' ','w','o','r','d','s','!']
Then, on the third iteration, instead of testing your if statement on the 'y', which was originally in the third position, it tests it on the ' ', which is what is now in the third position of the modified list.
mylist.remove(x)
will search for the first matching value of x in mylist and remove it. When your loop gets to the first 'o' in the list, it removes it, thereby changing the index of the following 'o' by -1. On the next iteration of the loop, it is looking at 'k' instead of the subsequent 'o'.
However, why then did your function remove the first two 'o's and not the last one?
Your loop looked at the first 'o', skipped the second 'o', and looked at the third 'o'. In total your loop found two matches for 'o' and called remove on both. Again, since remove finds the first matching item in the list and removes it, that's why it removed the first two 'o's, even though for the removal of the second 'o' your loop was actually looking at the third 'o'.
You were fortunate to have done this test on a string with consecutive vowels. Had you done it on a string without consecutive vowels, you would have removed all the vowels with your function and it would have appeared to work as you intended.
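If you want to keep the original remove()-based structure, one common fix is to loop over a copy of the list, so removals only affect the list you are building, not the one being iterated. A minimal sketch (Python 3):

```python
def anti_vowel(text):
    new = list(text)
    # Iterate over a copy (new[:]) so removals from `new` don't shift
    # the positions the loop is reading.
    for x in new[:]:
        if x in "aeiouAEIOU":
            new.remove(x)  # removes the first matching character
    return "".join(new)

print(anti_vowel("Hey look words!"))  # Hy lk wrds!
```

This works, but each remove() call rescans the list, so the comprehension versions above are simpler and faster.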

Small issue with Palindrome program

I've been working on this Palindrome program and am really close to completing it. Close to the point that it's driving me a bit crazy haha.
The program is supposed to check each 'phrase' to determine if it is a Palindrome or not and return a lowercase version with white space and punctuation removed if it is in fact a Palindrome. Otherwise, if not, it's supposed to return None.
I'm just having an issue with bringing my test data into the function. I can't seem to think of the correct way of dealing with it. It's probably pretty simple...Any ideas?
Thanks!
import string

def reverse(word):
    newword = ''
    letterflag = -1
    for numoletter in word:
        newword += word[letterflag]
        letterflag -= 1
    return newword

def Palindromize(phrase):
    for punct in string.punctuation:
        phrase = phrase.replace(punct, '')
        phrase = str(phrase.lower())
    firstindex = 0
    secondindex = len(phrase) - 1
    flag = 0
    while firstindex != secondindex and firstindex < secondindex:
        char1 = phrase[firstindex]
        char2 = phrase[secondindex]
        if char1 == char2:
            flag += 1
        else:
            break
        firstindex += 1
        secondindex -= 1
    if flag == len(phrase) // (2):
        print(phrase.strip())
    else:
        print(None)

def Main():
    data = ['Murder for a jar of red rum', 12321, 'nope', 'abcbA', 3443, 'what',
            'Never odd or even', 'Rats live on no evil star']
    for word in data:
        word == word.split()
        Palindromize(word)

if __name__ == '__main__':
    Main()
Maybe this line is causing the problems.
for word in data:
    word == word.split()  # This line.
    Palindromize(word)
You're testing for equality here, rather than reassigning the variable word which can be done using word = word.split(). word then becomes a list, and you might want to iterate over the list using
for elem in word:
    Palindromize(elem)
Also, you seem to be calling the split method on an int, which is not possible; try converting the numbers to strings first.
Also, why do you convert the phrase to lower case inside the for loop? Doing it once will suffice.
At the "core" of your program, you could do much better in Python, using filter for example. Here is a quick demonstration:
>>> phrase = 'Murder for a jar of red rum!'
>>> normalized = ''.join(filter(str.isalnum, phrase.lower()))
>>> normalized
'murderforajarofredrum'
>>> reversed_phrase = normalized[::-1]
>>> reversed_phrase
'murderforajarofredrum'
>>> # Test whether it is a palindrome
>>> reversed_phrase == normalized
True
Before you go bananas, let's rethink the problem:
You have already pointed out that Palindromes only make sense in strings without punctuation, whitespace, or mixed case. Thus, you need to convert your input string, either by removing the unwanted characters or by picking the allowed ones. For the latter, one can imagine:
import string
clean_data = [ch for ch in original_data if ch in string.ascii_letters]
clean_data = ''.join(clean_data).lower()
Having the cleaned version of the input, one might consider the third parameter in slicing of strings, particularly when it's -1 ;)
Does a comparison like
if clean_data[::-1] == clean_data:
    ...
ring a bell?
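Putting these hints together, a minimal sketch of the whole program might look like this. It assumes, as in the question, that a palindrome should be returned in lowercase with everything except letters and digits removed (str.isalnum keeps the digits of 12321 and 3443), and None otherwise; numbers are converted with str() first, and palindromize is just a lowercase rename of the question's Palindromize.

```python
def palindromize(phrase):
    # Keep only letters and digits, lowercased; str() handles the ints.
    clean = "".join(ch for ch in str(phrase).lower() if ch.isalnum())
    # A string is a palindrome if it equals its own reverse.
    return clean if clean == clean[::-1] else None

def main():
    data = ['Murder for a jar of red rum', 12321, 'nope', 'abcbA', 3443, 'what',
            'Never odd or even', 'Rats live on no evil star']
    for item in data:
        print(palindromize(item))

if __name__ == '__main__':
    main()
```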
One of the primary errors that I spotted is here:
for word in data:
    word == word.split()
Here, there are two mistakes:
1. The double equals sign is a comparison, not an assignment, so this line has no effect.
2. If you wish to split the contents of each item of data, assigning to word wouldn't change the original list either, since word is only a loop variable that refers to each item in turn. To update the list itself, do:
for i in range(len(data)):
    data[i] = str(data[i]).split()
This should clear up these errors.

How do I match vowels?

I am having trouble with a small component of a bigger program I am working on. Basically, I need to have a user input a word and print the index of the first vowel.
word = raw_input("Enter word: ")
vowel = "aeiouAEIOU"
for index in word:
    if index == vowel:
        print index
However, this isn't working. What's wrong?
Try:
word = raw_input("Enter word: ")
vowels = "aeiouAEIOU"
for index, c in enumerate(word):
    if c in vowels:
        print index
        break
for ... in iterates over the actual characters of a string, not its indexes. enumerate returns the indexes as well as the characters, which makes referring to both easier.
Just to be different:
import re
def findVowel(s):
    match = re.match('([^aeiou]*)', s, flags=re.I)
    if match:
        index = len(match.group(1))
        if index < len(s):
            return index
    return -1  # not found
The same idea using list comprehension:
word = raw_input("Enter word: ")
res = [i for i,ch in enumerate(word) if ch.lower() in "aeiou"]
print(res[0] if res else None)
index == vowel asks whether the letter index is equal to the entire vowel string. What you want to know is whether it is contained in the vowel string. See some of the other answers for how in works.
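For completeness, a compact Python 3 variant of the same in-plus-enumerate idea uses next() with a generator expression and a default value for words without vowels. The function name is illustrative; the -1 "not found" value mirrors findVowel above.

```python
def first_vowel_index(word, vowels="aeiouAEIOU"):
    # next() returns the first index produced by the generator,
    # or -1 if the generator is exhausted without finding a vowel.
    return next((i for i, ch in enumerate(word) if ch in vowels), -1)

print(first_vowel_index("Python"))  # 4
print(first_vowel_index("rhythm"))  # -1
```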
One alternative solution, and arguably a more elegant one, is to use the re library.
import re
word = raw_input('Enter a word:')
try:
    print re.search('[aeiou]', word, re.I).start()
except AttributeError:
    print 'No vowels found in word'
In essence, the re library implements a regular expression matching engine. re.search() scans the second string for the regular expression given as the first argument and returns a match object for the first match, or None if there is none (hence the AttributeError when .start() is called on None). [aeiou] means "match a or e or i or o or u", and re.I tells re.search() to make the search case-insensitive.
for i in range(len(word)):
    if word[i] in vowel:
        print i
        break
will do what you want.
"for index in word" loops over the characters of word rather than the indices. (You can loop over the indices and characters together using the "enumerate" function; I'll let you look that up for yourself.)
