Python: Find the longest word in a string

Python: Find the longest word in a string - python

I'm preparing for an exam but I'm having difficulties with one past-paper question. Given a string containing a sentence, I want to find the longest word in that sentence and return that word and its length. Edit: I only needed to return the length but I appreciate your answers for the original question! It helps me learn more. Thank you.
For example: string = "Hello I like cookies". My program should then return "Cookies" and the length 7.
Now the thing is that I am not allowed to use any function from the class String for a full score, and for a full score I can only go through the string once. I am not allowed to use string.split() (otherwise there wouldn't be any problem) and the solution shouldn't have too many for and while statements. The strings contains only letters and blanks and words are separated by one single blank.
Any suggestions? I'm lost i.e. I don't have any code.
Thanks.
EDIT: I'm sorry, I misread the exam question. You only have to return the length of the longest word it seems, not the length + the word.
EDIT2: Okay, with your help I think I'm onto something...
def longestword(x):
alist = []
length = 0
for letter in x:
if letter != " ":
length += 1
else:
alist.append(length)
length = 0
return alist
But it returns [5, 1, 4] for "Hello I like cookies" so it misses "cookies". Why? EDIT: Ok, I got it. It's because there's no more " " after the last letter in the sentence and therefore it doesn't append the length. I fixed it so now it returns [5, 1, 4, 7] and then I just take the maximum value.
I suppose using lists but not .split() is okay? It just said that functions from "String" weren't allowed or are lists part of strings?

You can try to use regular expressions:
import re
string = "Hello I like cookies"
word_pattern = "\w+"
regex = re.compile(word_pattern)
words_found = regex.findall(string)
if words_found:
longest_word = max(words_found, key=lambda word: len(word))
print(longest_word)

Finding a max in one pass is easy:
current_max = 0
for v in values:
if v>current_max:
current_max = v
But in your case, you need to find the words. Remember this quote (attribute to J. Zawinski):
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.
Besides using regular expressions, you can simply check that the word has letters. A first approach is to go through the list and detect start or end of words:
current_word = ''
current_longest = ''
for c in mystring:
if c in string.ascii_letters:
current_word += c
else:
if len(current_word)>len(current_longest):
current_longest = current_word
current_word = ''
else:
if len(current_word)>len(current_longest):
current_longest = current_word
A final way is to split words in a generator and find the max of what it yields (here I used the max function):
def split_words(mystring):
current = []
for c in mystring:
if c in string.ascii_letters:
current.append(c)
else:
if current:
yield ''.join(current)
max(split_words(mystring), key=len)

Just search for groups of non-whitespace characters, then find the maximum by length:
longest = len(max(re.findall(r'\S+',string), key = len))

For python 3. If both the words in the sentence is of the same length, then it will return the word that appears first.
def findMaximum(word):
li=word.split()
li=list(li)
op=[]
for i in li:
op.append(len(i))
l=op.index(max(op))
print (li[l])
findMaximum(input("Enter your word:"))

It's quite simple:
def long_word(s):
n = max(s.split())
return(n)
IN [48]: long_word('a bb ccc dddd')
Out[48]: 'dddd'

found an error in a previous provided solution, he's the correction:
def longestWord(text):
current_word = ''
current_longest = ''
for c in text:
if c in string.ascii_letters:
current_word += c
else:
if len(current_word)>len(current_longest):
current_longest = current_word
current_word = ''
if len(current_word)>len(current_longest):
current_longest = current_word
return current_longest

I can see imagine some different alternatives. Regular expressions can probably do much of the splitting words you need to do. This could be a simple option if you understand regexes.
An alternative is to treat the string as a list, iterate over it keeping track of your index, and looking at each character to see if you're ending a word. Then you just need to keep the longest word (longest index difference) and you should find your answer.

Regular Expressions seems to be your best bet. First use re to split the sentence:
>>> import re
>>> string = "Hello I like cookies"
>>> string = re.findall(r'\S+',string)
\S+ looks for all the non-whitespace characters and puts them in a list:
>>> string
['Hello', 'I', 'like', 'cookies']
Now you can find the length of the list element containing the longest word and then use list comprehension to retrieve the element itself:
>>> maxlen = max(len(word) for word in string)
>>> maxlen
7
>>> [word for word in string if len(word) == maxlen]
['cookies']

This method uses only one for loop, doesn't use any methods in the String class, strictly accesses each character only once. You may have to modify it depending on what characters count as part of a word.
s = "Hello I like cookies"
word = ''
maxLen = 0
maxWord = ''
for c in s+' ':
if c == ' ':
if len(word) > maxLen:
maxWord = word
word = ''
else:
word += c
print "Longest word:", maxWord
print "Length:", len(maxWord)

Given you are not allowed to use string.split() I guess using a regexp to do the exact same thing should be ruled out as well.
I do not want to solve your exercise for you, but here are a few pointers:
Suppose you have a list of numbers and you want to return the highest value. How would you do that? What information do you need to track?
Now, given your string, how would you build a list of all word lengths? What do you need to keep track of?
Now, you only have to intertwine both logics so computed word lengths are compared as you go through the string.

My proposal ...
import re
def longer_word(sentence):
word_list = re.findall("\w+", sentence)
word_list.sort(cmp=lambda a,b: cmp(len(b),len(a)))
longer_word = word_list[0]
print "The longer word is '"+longer_word+"' with a size of", len(longer_word), "characters."
longer_word("Hello I like cookies")

import re
def longest_word(sen):
res = re.findall(r"\w+",sen)
n = max(res,key = lambda x : len(x))
return n
print(longest_word("Hey!! there, How is it going????"))
Output : there
Here I have used regex for the problem. Variable "res" finds all the words in the string and itself stores them in the list after splitting them.
It uses split() to store all the characters in a list and then regex does the work.
findall keyword is used to find all the desired instances in a string. Here \w+ is defined which tells the compiler to look for all the words without any spaces.
Variable "n" finds the longest word from the given string which is now free of any undesired characters.
Variable "n" uses lambda expressions to define the key len() here.
Variable "n" finds the longest word from "res" which has removed all the non-string charcters like %,&,! etc.
>>>#import regular expressions for the problem.**
>>>import re
>>>#initialize a sentence
>>>sen = "fun&!! time zone"
>>>res = re.findall(r"\w+",sen)
>>>#res variable finds all the words and then stores them in a list.
>>>res
Out: ['fun','time','zone']
>>>n = max(res)
Out: zone
>>>#Here we get "zone" instead of "time" because here the compiler
>>>#sees "zone" with the higher value than "time".
>>>#The max() function returns the item with the highest value, or the item with the highest value in an iterable.
>>>n = max(res,key = lambda x:len(x))
>>>n
Out: time
Here we get "time" because lambda expression discards "zone" as it sees the key is for len() in a max() function.

list1 = ['Happy', 'Independence', 'Day', 'Zeal']
listLen = []
for i in list1:
listLen.append(len(i))
print list1[listLen.index(max(listLen))]
Output - Independence

Related

Find the occurrence of a particular word from a file in python [duplicate]

I'm trying to find the number of occurrences of a word in a string.
word = "dog"
str1 = "the dogs barked"
I used the following to count the occurrences:
count = str1.count(word)
The issue is I want an exact match. So the count for this sentence would be 0.
Is that possible?

If you're going for efficiency:
import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))
This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.
It also has the benefit of working correctly with punctuation - it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex flag, which matches on word boundaries (transitions between \w a.k.a [a-zA-Z0-9_] and anything else).
If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication, and in many other cases setting the unicode and/or locale flags for the regex would suffice.

You can use str.split() to convert the sentence to a list of words:
a = 'the dogs barked'.split()
This will create the list:
['the', 'dogs', 'barked']
You can then count the number of exact occurrences using list.count():
a.count('dog') # 0
a.count('dogs') # 1
If it needs to work with punctuation, you can use regular expressions. For example:
import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1

Use a list comprehension:
>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(i == word for word in str1.split())
0
>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(i == word for word in str1.split())
1
split() returns a list of all the words in a sentence. Then we use a list comprehension to count how many times the word appears in a sentence.

import re
word = "dog"
str = "the dogs barked"
print len(re.findall(word, str))

You need to split the sentence into words. For you example you can do that with just
words = str1.split()
But for real word usage you need something more advanced that also handles punctuation. For most western languages you can get away with replacing all punctuation with spaces before doing str1.split().
This will work for English as well in simple cases, but note that "I'm" will be split into two words: "I" and "m", and it should in fact be split into "I" and "am". But this may be overkill for this application.
For other cases such as Asian language, or actual real world usage of English, you might want to use a library that does the word splitting for you.
Then you have a list of words, and you can do
count = words.count(word)

#counting the number of words in the text
def count_word(text,word):
"""
Function that takes the text and split it into word
and counts the number of occurence of that word
input: text and word
output: number of times the word appears
"""
answer = text.split(" ")
count = 0
for occurence in answer:
if word == occurence:
count = count + 1
return count
sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence,word_count))
#output
>>> %Run test.py
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2
>>>
Create the function that takes two inputs which are sentence of text and word.
Split the text of a sentence into the segment of words in a list,
Then check whether the word to be counted exist in the segmented words and count the occurrence as a return of the function.

If you don't need RegularExpression then you can do this neat trick.
word = " is " #Add space at trailing and leading sides.
input_string = "This is some random text and this is str which is mutable"
print("Word count : ",input_string.count(word))
Output -- Word count : 3

Below is a simple example where we can replace the desired word with the new word and also for desired number of occurrences:
import string
def censor(text, word):<br>
newString = text.replace(word,"+" * len(word),text.count(word))
print newString
print censor("hey hey hey","hey")
output will be : +++ +++ +++
The first Parameter in function is search_string.
Second one is new_string which is going to replace your search_string.
Third and last is number of occurrences .

Let us consider the example s = "suvotisuvojitsuvo".
If you want to count no of distinct count "suvo" and "suvojit" then you use the count() method... count distinct i.e) you don't count the suvojit to suvo.. only count the lonely "suvo".
suvocount = s.count("suvo") // #output: 3
suvojitcount = s.count("suvojit") //# output : 1
Then find the lonely suvo count you have to negate from the suvojit count.
lonelysuvo = suvocount - suvojicount //# output: 3-1 -> 2

This would be my solution with help of the comments:
word = str(input("type the french word chiens in english:"))
str1 = "dogs"
times = int(str1.count(word))
if times >= 1:
print ("dogs is correct")
else:
print ("your wrong")

If you want to find the exact number of occurrence of the specific word in the sting and you don't want to use any count function, then you can use the following method.
text = input("Please enter the statement you want to check: ")
word = input("Please enter the word you want to check in the statement: ")
# n is the starting point to find the word, and it's 0 cause you want to start from the very beginning of the string.
n = 0
# position_word is the starting Index of the word in the string
position_word = 0
num_occurrence = 0
if word.upper() in text.upper():
while position_word != -1:
position_word = text.upper().find(word.upper(), n, len(text))
# increasing the value of the stating point for search to find the next word
n = (position_word + 1)
# statement.find("word", start, end) returns -1 if the word is not present in the given statement.
if position_word != -1:
num_occurrence += 1
print (f"{word.title()} is present {num_occurrence} times in the provided statement.")
else:
print (f"{word.title()} is not present in the provided statement.")

This is simple python program using split function
str = 'apple mango apple orange orange apple guava orange'
print("\n My string ==> "+ str +"\n")
str = str.split()
str2=[]
for i in str:
if i not in str2:
str2.append(i)
print( i,str.count(i))

I have just started out to learn coding in general and I do not know any libraries as such.
s = "the dogs barked"
value = 0
x = 0
y=3
for alphabet in s:
if (s[x:y]) == "dog":
value = value+1
x+=1
y+=1
print ("number of dog in the sentence is : ", value)

Another way to do this is by tokenizing string (breaking into words)
Use Counter from collection module of Python Standard Library
from collections import Counter
str1 = "the dogs barked"
stringTokenDict = { key : value for key, value in Counter(str1.split()).items() }
print(stringTokenDict['dogs'])
#This dictionary contains all words & their respective count

What would be the easiest way to search through a list?

It's actually a string but I just converted it to a list because the answer is supposed to be returned as a list. I've been looking at this problem for hours now and cannot get it. I'm supposed to take a string, like "Mary had a little lamb" for example and another string such as "ab" for example and search through string1 seeing if any of the letters from string2 occur. So if done correctly with the two example it would return
["a=4","b=1"]
I have this so far:
def problem3(myString, charString):
myList = list(myString)
charList = list(charString)
count = 0
newList = []
newString = ""
for i in range(0,len(myList)):
for j in range(0,len(charList)):
if charList[j] == myList[i]:
count = count + 1
newString = charList[j] + "=" + str(count)
newList.append(newString)
return newList
Which returns [a=5] I know it's something with the newList.append(string) and where it should be placed, anyone have any suggestions?

You can do this very easily with list comprehensions and the count function that strings (and lists!) have:
Split the search string into a list of chars.
For each character in the search string, loop over the input string and determine how much it occurs (via count).
Example:
string = 'Mary had a little lamb'
search_string = 'ab'
search_string_chars = [char for char in search_string]
result = []
for char in search_string_chars:
result.append('%s=%d' % (char, string.count(char)))
Result:
['a=4', 'b=1']
Note that you don't need to split the search_string ('ab') into a list of characters, as strings are already lists of characters - the above was done that way to illustrate the concept. Hence, a reduced version of the above could be (which also yields the same result):
string = 'Mary had a little lamb'
search_string = 'ab'
result = []
for char in search_string:
result.append('%s=%d' % (char, string.count(char)))

Here's a possible solution using Counter as mentioned by coder,
from collections import Counter
s = "Mary had a little lambzzz"
cntr = Counter(s)
test_str = "abxyzzz"
results = []
for letter in test_str:
if letter in s:
occurrances = letter + "=" + str(cntr.get(letter))
else:
occurrances = letter + "=" + "0"
if occurrances not in results:
results.append(occurrances)
print(results)
output
['a=4', 'b=1', 'x=0', 'y=1', 'z=3']

import collections
def count_chars(s, chars):
counter = collections.Counter(s)
return ['{}={}'.format(char, counter[char]) for char in set(chars)]
That's all. Let Counter do the work of actually counting the characters in the string. Then create a list comprehension of format strings using the characters in chars. (chars should be a set and not a list so that if there are duplicate characters in chars, the output will only show one.)

How to find the position of a word in a string / list?

I'm writing a function where the user inputs a word, then inputs a string, and the function identifies all occurrences and the position of that word in a string (although it's actually converted to a list halfway through).
The code I have currently is only identifying the first occurrence of the word, and none further. It also doesn't identify the word if the word is the first word in the string, returning an empty list. It will also say the word's actual position - 1, as the first word is counted as zero.
I have attempted to curb this problem in two ways, the first of which doing a aString.insert(0, ' '), the second of which doing for i in __: if i == int: i += 1. Neither of these work.
Also, when doing the .insert, I tried putting a character in the space instead of, well, a space (as this part doesn't get printed anyway), but that didn't work.
Here is the code:
def wordlocator(word):
yourWord = word
print("You have chosen the following word: " +yourWord)
aString = input("What string would you like to search for the given word?")
aString = aString.lower()
aString = aString.split()
b = [(i, j) for i, j in enumerate(aString)]
c = [(i, x) for i, x in b if x == yourWord]
return c
The output I'm looking for is that if someone did...
wordlocator("word")
"please input a string generic message" "that is a word"
"4, word"
Currently that would work, but it'd print "3, word". If the string was "that is a word and this is also a word" then it'd ignore the further occurrence of "word".
edit: Got it working now, used a simpler piece of code. Thanks for your help all!

Try this:
def wordlocator(word):
yourWord = word
print("You have chosen the following word: " +yourWord)
aString = raw_input("What string would you like to search for the given word?")
aString = aString.lower()
aString = aString.split()
b = [(i+1, j) for i, j in enumerate(aString) if j == yourWord.lower()]
return b
print wordlocator('word')
note that the list comprehension can be filtered on just the match you are looking for. Actually I just changed it
I get this:
What string would you like to search for the given word?This word is not that word is it?
[(2, 'word'), (6, 'word')]
Note that the index is off by one if that matters, add one to x in the comprehension
A new test:
You have chosen the following word: word
What string would you like to search for the given word?word is the word
[(1, 'word'), (4, 'word')]

Comparing lengths of words in strings

Need to find the longest word in a string and print that word.
1.) Ask user to enter sentence separated by spaces.
2.)Find and print the longest word. If two or more words are the same length than print the first word.
this is what I have so far
def maxword(splitlist): #sorry, still trying to understand loops
for word in splitlist:
length = len(word)
if ??????
wordlist = input("Enter a sentence: ")
splitlist = wordlist.split()
maxword(splitlist)
I'm hitting a wall when trying to compare the lenghts of words in a sentance. I'm a student who's been using python for 5 weeks.

def longestWord(sentence):
longest = 0 # Keep track of the longest length
word = '' # And the word that corresponds to that length
for i in sentence.split():
if len(i) > longest:
word = i
longest = len(i)
return word
>>> s = 'this is a test sentence with some words'
>>> longestWord(s)
'sentence'

You can use max with a key:
def max_word(splitlist):
return max(splitlist.split(),key=len) if splitlist.strip() else "" # python 2
def max_word(splitlist):
return max(splitlist.split()," ",key=len) # python 3
Or use a try/except as suggested by jon clements:
def max_word(splitlist):
try:
return max(splitlist.split(),key=len)
except ValueError:
return " "

You're going in the right direction. Most of your code looks good, you just need to finish the logic to determine which is the longest word. Since this seems like a homework question I don't want to give you the direct answer (even though everyone else has which I think is useless for a student like you), but there are multiple ways to solve this problem.
You're getting the length of each word correctly, but what do you need to compare each length against? Try to say the problem aloud and how you'd personally solve the problem aloud. I think you'll find that your english description translates nicely to a python version.
Another solution that doesn't use an if statement might use the built-in python function max which takes in a list of numbers and returns the max of them. How could you use that?

You can use nlargest from heapq module
import heapq
heapq.nlargest(1, sentence.split(), key=len)

sentence = raw_input("Enter sentence: ")
words = sentence.split(" ")
maxlen = 0
longest_word = ''
for word in words:
if len(word) > maxlen:
maxlen = len(word)
longest_word = word
print(word, maxlen)

How do I match vowels?

I am having trouble with a small component of a bigger program I am in the works on. Basically I need to have a user input a word and I need to print the index of the first vowel.
word= raw_input("Enter word: ")
vowel= "aeiouAEIOU"
for index in word:
if index == vowel:
print index
However, this isn't working. What's wrong?

Try:
word = raw_input("Enter word: ")
vowels = "aeiouAEIOU"
for index,c in enumerate(word):
if c in vowels:
print index
break
for .. in will iterate over actual characters in a string, not indexes. enumerate will return indexes as well as characters and make referring to both easier.

Just to be different:
import re
def findVowel(s):
match = re.match('([^aeiou]*)', s, flags=re.I)
if match:
index = len(match.group(1))
if index < len(s):
return index
return -1 # not found

The same idea using list comprehension:
word = raw_input("Enter word: ")
res = [i for i,ch in enumerate(word) if ch.lower() in "aeiou"]
print(res[0] if res else None)

index == vowel asks if the letter index is equal to the entire vowel list. What you want to know is if it is contained in the vowel list. See some of the other answers for how in works.

One alternative solution, and arguably a more elegant one, is to use the re library.
import re
word = raw_input('Enter a word:')
try:
print re.search('[aeiou]', word, re.I).start()
except AttributeError:
print 'No vowels found in word'
In essence, the re library implements a regular expression matching engine. re.search() searches for the regular expression specified by the first string in the second one and returns the first match. [aeiou] means "match a or e or i or o or u" and re.I tells re.search() to make the search case-insensitive.

for i in range(len(word)):
if word[i] in vowel:
print i
break
will do what you want.
"for index in word" loops over the characters of word rather than the indices. (You can loop over the indices and characters together using the "enumerate" function; I'll let you look that up for yourself.)

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Find the longest word in a string - python

You can try to use regular expressions: import re string = "Hello I like cookies" word_pattern = "\w+" regex = re.compile(word_pattern) words_found = regex.findall(string) if words_found: longest_word = max(words_found, key=lambda word: len(word)) print(longest_word)

Just search for groups of non-whitespace characters, then find the maximum by length: longest = len(max(re.findall(r'\S+',string), key = len))

For python 3. If both the words in the sentence is of the same length, then it will return the word that appears first. def findMaximum(word): li=word.split() li=list(li) op=[] for i in li: op.append(len(i)) l=op.index(max(op)) print (li[l]) findMaximum(input("Enter your word:"))

It's quite simple: def long_word(s): n = max(s.split()) return(n) IN [48]: long_word('a bb ccc dddd') Out[48]: 'dddd'

My proposal ... import re def longer_word(sentence): word_list = re.findall("\w+", sentence) word_list.sort(cmp=lambda a,b: cmp(len(b),len(a))) longer_word = word_list[0] print "The longer word is '"+longer_word+"' with a size of", len(longer_word), "characters." longer_word("Hello I like cookies")

list1 = ['Happy', 'Independence', 'Day', 'Zeal'] listLen = [] for i in list1: listLen.append(len(i)) print list1[listLen.index(max(listLen))] Output - Independence

Related

Find the occurrence of a particular word from a file in python [duplicate]

What would be the easiest way to search through a list?

How to find the position of a word in a string / list?

Comparing lengths of words in strings

How do I match vowels?

Categories

Resources