Return first word in sentence? [duplicate]

This question already has answers here:
How to extract the first and final words from a string?
(7 answers)
Closed 5 years ago.
Here's the question I have to answer for school:
For the purposes of this question, we will define a word as ending a sentence if that word is immediately followed by a period. For example, in the text “This is a sentence. The last sentence had four words.”, the ending words are ‘sentence’ and ‘words’. In a similar fashion, we will define the starting word of a sentence as any word that is preceded by the end of a sentence. The starting words from the previous example text would be “The”. You do not need to consider the first word of the text as a starting word. Write a program that has:
An endwords function that takes a single string argument. This function must return a list of all sentence-ending words that appear in the given string. There should be no duplicate entries in the returned list, and the periods should not be included in the ending words.
The code I have so far is:
def startwords(astring):
    mylist = astring.split()
    if mylist.endswith('.') == True:
        return mylist
but I don't know if I'm using the right approach. I need some help.

Several issues with your code. The following would be a simple approach. Create a list of bigrams and pick the second token of each bigram where the first token ends with a period:
def startwords(astring):
    mylist = astring.split()  # a list! Has no 'endswith' method
    bigrams = zip(mylist, mylist[1:])
    return [b[1] for b in bigrams if b[0].endswith('.')]
zip and list comprehensions are two things worth reading up on.

mylist = astring.split()
if mylist.endswith('.')
That cannot work, one of the reasons being that mylist is a list, and lists don't have an endswith method.
Another answer fixed your approach so let me propose a regular expression solution:
import re
print(re.findall(r"\.\s*(\w+)","This is a sentence. The last sentence had four words."))
This matches every word that follows a dot and optional whitespace.
Result: ['The']

def endwords(astring):
    mylist = astring.split('.')
    temp_words = [x.rpartition(" ")[-1] for x in mylist if len(x) > 1]
    return list(set(temp_words))

This creates a set, so there are no duplicates. It loops over the list of sentences (the string split on "."), splits each sentence into words, uses [-1:] to make a list holding only the last word, and takes item [0] of that list.
print(set([x.split()[-1:][0] for x in s.split(".") if len(x.split()) > 0]))
The if is needed because splitting on "." leaves an empty piece after the final period, and indexing into its empty word list raises an IndexError.
This works as well:
print(set([x.split()[len(x.split()) - 1] for x in s.split(".") if len(x.split()) > 0]))

This is one way to do it ->
#!/usr/bin/env python
sentence = 'This is a sentence. The last sentence had four words.'
uniq_end_words = set()
for word in sentence.split():
    # keep only words whose last character is a period (.)
    if word.endswith('.'):
        uniq_end_words.add(word.rstrip('.'))
print(list(uniq_end_words))
Output (list of all the end words in a given sentence; the order may vary because sets are unordered) ->
['words', 'sentence']
If your input string has a period inside one of its words (let's say the last word), something like this ->
'I like the documentation of numpy.random.rand.'
The output would be - ['numpy.random.rand']
And for input string 'I like the documentation of numpy.random.rand a lot.'
The output would be - ['lot']
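The period handling described above can also be written with a regular expression. What follows is only a minimal sketch, not the method of any answer here: the name endwords comes from the assignment text, while the pattern and the order-preserving de-duplication are assumptions made for illustration. It treats a period as sentence-ending only when it is followed by whitespace or the end of the text.

```python
import re

def endwords(text):
    # A word ends a sentence when its final period is followed by
    # whitespace or the end of the text; inner periods stay in the word.
    found = re.findall(r"(\S+)\.(?=\s|$)", text)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))

print(endwords("This is a sentence. The last sentence had four words."))
print(endwords("I like the documentation of numpy.random.rand."))
```

Unlike the set-based version, this keeps the words in the order they first appear. It does not try to handle abbreviations or ellipses.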

Related

how to get the next word from a string according to list element in python

I am new to python and trying to solve this problem.
words = ['plus', 'Constantly', 'the']
string = "Plus, I Constantly adding new resources, guides, and the personality quizzes to help you travel beyond the Guidebook"
output: I adding Guidebook
Here I want to match each list element against the string and get the word that follows it, to construct a new string.
I tried splitting the string into a list of words and checking whether each one is in the list. But 'Plus,' won't match because of the ',', and there are two occurrences of 'the' but I only need the word after the last one.

One way to do this is to use regex to split string by words (the pattern used is [\w]+). Then you can build a dictionary of the pairs, so that you can look up the first word to retrieve the following word.
import re
words = ['plus', 'Constantly', 'the']
string = "Plus, I Constantly adding new resources, guides, and the personality quizzes to help you travel beyond the Guidebook"
string_splits = re.findall(r'[\w]+',string)
pairs = {x:y for x,y in zip(map(lambda x: x.lower(), string_splits), string_splits[1:])}
print(' '.join(pairs.get(word.lower()) for word in words))
Edit to expand the dict comprehension:
pairs = {}
for x, y in zip(map(lambda x: x.lower(), string_splits), string_splits[1:]):
    pairs[x] = y

Python - Recursive word list

I'm trying to make an anagram algorithm, but I'm stuck once I get to the recursive part. Let me know if any more information is needed.
My code:
def ana_words(words, letter_count):
    """Return all the anagrams using the given letters and allowed words.
    - letter_count has 26 keys (one per lowercase letter),
    and each value is a non-negative integer.
    #type words: list[str]
    #type letter_count: dict[str, int]
    #rtype: list[str]
    """
    anagrams_list = []
    if not letter_count:
        return [""]
    for word in words:
        if not _within_letter_count(word, letter_count):
            continue
        new_letter_count = dict(letter_count)
        for char in word:
            new_letter_count[char] -= 1
        # recursive function
        var1 = ana_words(words[1:], new_letter_count)
        sorted_word = ''.join(word)
        for i in var1:
            sorted_word = ''.join([word, i])
        anagrams_list.append(sorted_word)
    return anagrams_list
Words is a list of words from a file, and letter count is a dictionary of characters (already in lower case). the list of words in words is also in lowercase already.
Input: print ana_words('dormitory')
Output I'm getting:
['dirtyroom', 'dotoi', 'doori', 'dormitory', 'drytoori', 'itorod', 'ortoidry', 'rodtoi', 'roomidry', 'rootidry', 'torodi']
Output I want:
['dirty room', 'dormitory', 'room dirty']
Link to word list: https://1drv.ms/t/s!AlfWKzBlwHQKbPj9P_pyKdmPwpg

Without knowing your words list it is hard to tell why it is including the 'wrong' entries. Trying with just
words = ['room','dirty','dormitory']
returns the correct entries.
If you want spaces between the words, you need to change
sorted_word = ''.join([word, i])
to
sorted_word = ' '.join([word, i])
(Note the added space)
Incidentally, if you want to solve this problem more efficiently, storing the words in a 'trie' data structure can help (https://en.wikipedia.org/wiki/Trie).

Question errors:
You are saying:
Words is a list of words from a file, and letter count is a dictionary of characters (already in lower case). the list of words in words is also in lowercase already.
But you are actually calling the function in a different way:
print ana_words('dormitory')
This is not right.
Checking if a dictionary's values are all 0:
if not letter_count: doesn't do what you expect; it is only true for an empty dictionary. To check whether all the values in a dictionary are 0, use if not any(letter_count.values()): which first obtains the values, checks whether any of them is different from 0, and then negates the answer.
Joining words:
The str.join(arg1) method is not for joining two words; it joins the items of the iterable passed as arg1, using the string as the separator. In your case the iterable is a string of characters and the separator is empty, so the result is the same word.
''.join('Hello')
>>> 'Hello'
The second time you use it, the iterable is a list and it joins word with each element of var1, which is a list of words, so that is fine apart from the missing space. The problem is that you are not doing anything with sorted_word inside the loop; only its last value is used. The anagrams_list.append(sorted_word) call should be inside the loop, and the sorted_word = ''.join(word) line should be deleted.
Other errors:
Aside from all these errors, you are never checking whether the letter count has reached 0 to stop the recursion.
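Putting these corrections together, a version of the function might look like the sketch below. It is an illustration under assumptions, not the asker's final code: within_letter_count is a hypothetical stand-in for the unshown _within_letter_count helper, the letter counts are built with collections.Counter, every word is assumed to be non-empty, and the recursion passes the full word list rather than words[1:].

```python
from collections import Counter

def within_letter_count(word, letter_count):
    # Hypothetical helper: True if `word` can be spelled from the remaining letters.
    needed = Counter(word)
    return all(letter_count.get(ch, 0) >= n for ch, n in needed.items())

def ana_words(words, letter_count):
    # Base case: every letter has been used up, so stop recursing.
    if not any(letter_count.values()):
        return [""]
    anagrams = []
    for word in words:  # assumes no empty strings in `words`
        if not within_letter_count(word, letter_count):
            continue
        remaining = dict(letter_count)
        for ch in word:
            remaining[ch] -= 1
        # Append inside the loop, joining with a space between words.
        for rest in ana_words(words, remaining):
            anagrams.append(" ".join([word, rest]).strip())
    return anagrams

print(sorted(ana_words(['room', 'dirty', 'dormitory'], Counter('dormitory'))))
```

With the three-word list used in the other answer, this yields 'dirty room', 'dormitory' and 'room dirty'. On a large word list the search will be slow without further pruning.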

Length of list formed from sentences of paragraph

I have following code:
def splitParagraphIntoSentences(paragraph):
    import re
    sentenceEnders = re.compile('[.!?]')
    sentenceList = sentenceEnders.split(paragraph)
    return sentenceList
sentenceList = splitParagraphIntoSentences(u"""I have a bicycle. I want the car.
""")
print(len(sentenceList))
Python reports that the length of sentenceList is 3, but there are actually just two sentences. I know this is because of the '.' at the end of the second sentence. What is the best way to make the program count sentences correctly without removing the '.' from the end of the second sentence?
Thank you

Instead of splitting, count the ends:
len(sentenceEnders.findall(paragraph))
Or subtract 1 to account for the extra item after the last sentence split:
len(splitParagraphIntoSentences(paragraph)) - 1
or return a filtered list, removing empty items:
return filter(None, sentenceList)
or, when using Python 3 (where filter() returns an iterator rather than a list):
return [s for s in sentenceList if s]
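One detail worth checking with the filtering approach: the example paragraph in the question ends with a newline after the final period, so the last piece of the split is "\n" rather than an empty string. The sketch below, with the hypothetical name count_sentences, strips each piece before testing it. It is an illustration, not part of the original answer.

```python
import re

SENTENCE_ENDERS = re.compile(r'[.!?]')

def count_sentences(paragraph):
    # Split on sentence terminators, then ignore pieces that are
    # empty or contain only whitespace (such as a trailing newline).
    pieces = SENTENCE_ENDERS.split(paragraph)
    return len([p for p in pieces if p.strip()])

print(count_sentences("I have a bicycle. I want the car.\n"))
```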

How do I calculate the number of times a word occurs in a sentence?

So I've been learning Python for some months now and was wondering how I would go about writing a function that will count the number of times a word occurs in a sentence. I would appreciate if someone could please give me a step-by-step method for doing this.

Quick answer:
def count_occurrences(word, sentence):
    return sentence.lower().split().count(word)
'some string'.split() will split the string on whitespace (spaces, tabs and linefeeds) into a list of word-ish things. Then ['some', 'string'].count(item) returns the number of times item occurs in the list.
That doesn't handle removing punctuation. In Python 2 you could do that using string.maketrans and str.translate.
# Make collection of chars to keep (don't translate them)
import string
keep = string.lowercase + string.digits + string.whitespace
table = string.maketrans(keep, keep)
delete = ''.join(set(string.printable) - set(keep))
def count_occurrences(word, sentence):
    return sentence.lower().translate(table, delete).split().count(word)
The key here is that we've constructed the string delete so that it contains all the ASCII characters except letters, numbers and spaces. Then str.translate in this case takes a translation table that doesn't change the string, but also a string of chars to strip out.
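The two-argument form of translate shown above exists only in Python 2. In Python 3 the same idea can be sketched with str.maketrans, whose third argument lists the characters to delete. This is an adaptation for illustration rather than the original answer's code.

```python
import string

# Characters to keep: lowercase letters, digits and whitespace.
keep = set(string.ascii_lowercase + string.digits + string.whitespace)
# Every other printable ASCII character gets deleted.
delete = ''.join(set(string.printable) - keep)
# With three arguments, str.maketrans maps each char of the third one to None.
table = str.maketrans('', '', delete)

def count_occurrences(word, sentence):
    return sentence.lower().translate(table).split().count(word)

print(count_occurrences('sentence', 'This sentence is a simple sentence.'))
```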

wilberforce has the quick, correct answer, and I'll give the long-winded 'how to get to that conclusion' answer.
First, here are some tools to get you started, and some questions you need to ask yourself.
You need to read the section on Sequence Types, in the python docs, because it is your best friend for solving this problem. Seriously, read it. Once you have read that, you should have some ideas. For example you can take a long string and break it up using the split() function. To be explicit:
mystring = "This sentence is a simple sentence."
result = mystring.split()
print(result)
print("The total number of words is: " + str(len(result)))
print("The word 'sentence' occurs: " + str(result.count("sentence")))
This takes the input string, splits it on any whitespace, and gives you:
['This', 'sentence', 'is', 'a', 'simple', 'sentence.']
The total number of words is: 6
The word 'sentence' occurs: 1
Now note here that you do have the period still at the end of the second 'sentence'. This is a problem because 'sentence' is not the same as 'sentence.'. If you are going to go over your list and count words, you need to make sure that the strings are identical. You may need to find and remove some punctuation.
A naive approach to this might be:
no_period_string = mystring.replace(".", " ")
print(no_period_string)
This gives me a period-less sentence:
"This sentence is a simple sentence"
You also need to decide if your input is going to be just a single sentence, or maybe a paragraph of text. If you have many sentences in your input, you might want to find a way to break them up into individual sentences, and find the periods (or question marks, or exclamation marks, or other punctuation that ends a sentence). Once you find out where in the string the 'sentence terminator' is, you could split up the string at that point, or something like that.
You should give this a try yourself - hopefully I've peppered in enough hints to get you to look at some specific functions in the documentation.
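For comparison, one way to sidestep the punctuation problem is to match whole words with a regular expression instead of cleaning the string first. This is a minimal sketch with a hypothetical function name, and it swaps in a different technique from the split-and-count steps described above: \b marks a word boundary, so 'sentence.' still matches 'sentence' while longer words containing the target do not.

```python
import re

def count_word(word, text):
    # \b anchors the match at word boundaries; re.escape guards against
    # special characters in `word`; IGNORECASE makes 'The' match 'the'.
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

print(count_word("sentence", "This sentence is a simple sentence."))
```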

Simplest way:
def count_occurrences(word, sentence):
    return sentence.count(word)

text=input("Enter your sentence:")
print("'the' appears", text.count("the"),"times")
simplest way to do it

The problem with using the count() method is that it does not always give the correct number of occurrences when matches overlap, for example:
print('banana'.count('ana'))
output
1
but 'ana' occurs twice in 'banana'.
To solve this issue, I used:
def total_occurrence(string, word):
    count = 0
    tempstring = string
    while word in tempstring:
        count += 1
        tempstring = tempstring[tempstring.index(word)+1:]
    return count
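The same overlapping count can also be sketched with a zero-width lookahead, which matches at every position where the word starts without consuming any characters. The function name below is hypothetical, and this is an alternative to the slicing loop above, not a replacement the answer proposed. It assumes word is non-empty.

```python
import re

def count_overlapping(text, word):
    # (?=...) is a lookahead: it succeeds at each start position of `word`
    # but consumes nothing, so overlapping matches are all found.
    return len(re.findall('(?=' + re.escape(word) + ')', text))

print(count_overlapping('banana', 'ana'))
```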

You can do it like this:
def countWord(word):
    numWord = 0
    for i in range(1, len(word)-1):
        if word[i-1:i+3] == 'word':
            numWord += 1
    print('Number of times "word" occurs is:', numWord)
Then calling the function:
countWord('wordetcetcetcetcetcetcetcword')
will print: Number of times "word" occurs is: 2

def check_Search_WordCount(mySearchStr, mySentence):
    len_mySentence = len(mySentence)
    len_Sentence_without_Find_Word = len(mySentence.replace(mySearchStr, ""))
    len_Remaining_Sentence = len_mySentence - len_Sentence_without_Find_Word
    # integer division: characters removed divided by the search string's length
    count = len_Remaining_Sentence // len(mySearchStr)
    return count

I assume that you just know about python string and for loop.
def count_occurences(s, word):
    count = 0
    for i in range(len(s)):
        if s[i:i+len(word)] == word:
            count += 1
    return count
mystring = "This sentence is a simple sentence."
myword = "sentence"
print(count_occurences(mystring,myword))
explanation:
s[i:i+len(word)]: slicing the string s to extract a word having the same length with the word (argument)
count += 1 : increase the counter whenever matched.

Limit the number of sentences in a string

A beginner's Python question:
I have a string with x number of sentences. How do I extract the first 2 sentences (each may end with . or ? or !)?

Ignoring considerations such as when a . constitutes the end of sentence:
import re
' '.join(re.split(r'(?<=[.?!])\s+', phrase, 2)[:-1])
EDIT: Another approach that just occurred to me is this:
re.match(r'(.*?[.?!](?:\s+.*?[.?!]){0,1})', phrase).group(1)
Notes:
Whereas the first solution lets you replace the 2 with some other number to choose a different number of sentences, in the second solution, you change the 1 in {0,1} to one less than the number of sentences you want to extract.
The second solution isn't quite as robust in handling, e.g., empty strings, or strings with no punctuation. It could be made so, but the regex would be even more complex than it is already, and I would favour the slightly less efficient first solution over an unreadable mess.
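The first solution can be wrapped in a small function so that the number of sentences is a parameter. This is a sketch under assumptions, not the answer's own code: the name first_sentences is hypothetical, and it slices with [:n] instead of [:-1] so that a text containing exactly n sentences is returned whole.

```python
import re

def first_sentences(text, n=2):
    # Split after ., ? or ! followed by whitespace, at most n times,
    # so anything beyond the first n sentences lands in one last piece.
    parts = re.split(r'(?<=[.?!])\s+', text.strip(), maxsplit=n)
    # Keep only the first n pieces and rejoin them with single spaces.
    return ' '.join(parts[:n])

print(first_sentences("Sentence 1. Sentence 2? Sentence 3! Sentence 4."))
```

As with the original, a period inside an abbreviation or a number followed by a space would be treated as a sentence end.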

I solved it like this: Separating sentences, though a comment on that post also points to NLTK, though I don't know how to find the sentence segmenter on their site...

Here's how you could do it:
text = "Sentence one? Sentence two. Sentence three? Sentence four. Sentence five."
sentences = text.split(".")
allSentences = []
for sentence in sentences:
    allSentences.extend(sentence.split("?"))
# the first two sentences (without their terminating punctuation)
print(allSentences[0:2])
There are probably better ways, I look forward to seeing them.

Here is a step by step explanation of how to disassemble, choose the first two sentences, and reassemble it. As noted by others, this does not take into account that not all dot/question/exclamation characters are really sentence separators.
import re
testline = "Sentence 1. Sentence 2? Sentence 3! Sentence 4. Sentence 5."
# split the first two sentences by the dot/question/exclamation.
sentences = re.split('([.?!])', testline, maxsplit=2)
print("result of split: ", sentences)
# toss everything else (the last item in the list)
firstTwo = sentences[:-1]
print(firstTwo)
# put the first two sentences back together
finalLine = ''.join(firstTwo)
print(finalLine)

Generator alternative using my utility function returning piece of string until any item in search sequence:
from itertools import islice
testline = "Sentence 1. Sentence 2? Sentence 3! Sentence 4. Sentence 5."
def multis(search_sequence, text, start=0):
    """ multisearch by given search sequence values from text, starting from position start
    yielding tuples of text before found item and found sequence item"""
    x = ''
    for ch in text[start:]:
        if ch in search_sequence:
            if x: yield (x, ch)
            else: yield ch
            x = ''
        else:
            x += ch
    else:
        if x: yield x
# split the first two sentences by the dot/question/exclamation.
two_sentences = list(islice(multis('.?!', testline), 2))  ## must save the result of generation
print("result of split: ", two_sentences)
print('\n'.join(sentence.strip() + sep for sentence, sep in two_sentences))
