Count Non-Substring Overlapping - python

Write a program that lets the user input a string. At the end, the program prints out:
a. How many words are repeated.
For example: 'This is Jake and Jake is 24 years old'
The console must print out '4' because 'is' and 'Jake' are the words that are repeated, with two occurrences each.
b. Remove all the repeated words. Print out the rest: 'This and 24 years old'
c. Print out which repeated words have been removed.
So the idea is that the user can type whatever they want; 'This is Jake and Jake is 24 years old' is just an example. The hardest part is: how can the program check for repeated whole words without also matching substrings?

Does this work?
Here is what I am doing.
First I grab the user input, then I convert the string to a list by splitting on the spaces between words. Then I count the occurrences of each word; if the count is greater than 1, I add the word to a dictionary, where the key is the word and the value is the number of times it appears in the string.
After printing out the repeated words, I remove from the string the words that appear in the dictionary.
Note - This code can be improved a lot, but I purposely wrote it this way to make it easier to understand. You should not use this code in a production system.
string = input("Enter your string : ")
items = {}
words = string.split(" ")
for word in words:
    wordCount = words.count(word)
    if wordCount > 1:
        items[word] = wordCount
print("There are {0} repeated words".format(len(items)))
updateString = string
for item in items:
    # replace() also matches substrings, so 'is' is removed from 'This' too
    updateString = updateString.replace(item, "")
print(updateString)
print(items)
Updated
string = input("Enter your string : ")
items = {}
words = string.split(" ")
for word in words:
    wordCount = words.count(word)
    if wordCount > 1:
        items[word] = wordCount
print("There are {0} repeated words".format(len(items)))
for item in items:
    # the surrounding spaces keep 'is' from matching inside 'This'
    string = string.replace(" {0} ".format(item), " ")
print(string)
print(items)
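A more compact sketch of the three tasks from the question, assuming words are separated by whitespace and compared case-sensitively. The function name analyze_repeats is hypothetical and is not part of the answer above. It compares whole tokens instead of calling replace(), so 'is' never matches inside 'This', and it reports the total number of repeated occurrences (4 for the example) instead of the number of distinct repeated words.

```python
from collections import Counter


def analyze_repeats(sentence):
    """Return (total repeated occurrences, remaining text, removed words)."""
    words = sentence.split()
    counts = Counter(words)
    # Words seen more than once, in the order they first appear
    repeated = [w for w in counts if counts[w] > 1]
    # Total number of occurrences of all repeated words
    total = sum(counts[w] for w in repeated)
    # Keep only the words that occur exactly once
    remaining = " ".join(w for w in words if counts[w] == 1)
    return total, remaining, repeated


total, remaining, repeated = analyze_repeats("This is Jake and Jake is 24 years old")
print(total)
print(remaining)
print(repeated)
```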

Related

Find the occurrence of a particular word from a file in python [duplicate]

I'm trying to find the number of occurrences of a word in a string.
word = "dog"
str1 = "the dogs barked"
I used the following to count the occurrences:
count = str1.count(word)
The issue is I want an exact match. So the count for this sentence would be 0.
Is that possible?
If you're going for efficiency:
import re
count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(word), input_string))
This doesn't need to create any intermediate lists (unlike split()) and thus will work efficiently for large input_string values.
It also has the benefit of working correctly with punctuation - it will properly return 1 as the count for the phrase "Mike saw a dog." (whereas an argumentless split() would not). It uses the \b regex anchor, which matches at word boundaries (transitions between \w, i.e. [a-zA-Z0-9_] for ASCII text, and anything else).
If you need to worry about languages beyond the ASCII character set, you may need to adjust the regex to properly match non-word characters in those languages, but for many applications this would be an overcomplication, and in many other cases setting the unicode and/or locale flags for the regex would suffice.
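The regex approach above can be wrapped in a small function. This is a minimal sketch; the name count_whole_word is made up for illustration.

```python
import re


def count_whole_word(word, text):
    """Count occurrences of word in text that sit between word boundaries."""
    # re.escape() protects against regex metacharacters inside the word
    pattern = r'\b%s\b' % re.escape(word)
    return sum(1 for _ in re.finditer(pattern, text))


print(count_whole_word("dog", "the dogs barked"))
print(count_whole_word("dog", "Mike saw a dog."))
```

The first call finds no match because "dog" is followed by the word character "s"; the second finds one because "." is not a word character.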
You can use str.split() to convert the sentence to a list of words:
a = 'the dogs barked'.split()
This will create the list:
['the', 'dogs', 'barked']
You can then count the number of exact occurrences using list.count():
a.count('dog') # 0
a.count('dogs') # 1
If it needs to work with punctuation, you can use regular expressions. For example:
import re
a = re.split(r'\W', 'the dogs barked.')
a.count('dogs') # 1
Use a generator expression:
>>> word = "dog"
>>> str1 = "the dogs barked"
>>> sum(w == word for w in str1.split())
0
>>> word = 'dog'
>>> str1 = 'the dog barked'
>>> sum(w == word for w in str1.split())
1
split() returns a list of all the words in a sentence. The generator expression then yields True for each word equal to the target, and sum() counts those matches.
import re
word = "dog"
str1 = "the dogs barked"
# findall() matches substrings too, so this prints 1 for "dogs"
print(len(re.findall(word, str1)))
You need to split the sentence into words. For your example you can do that with just
words = str1.split()
But for real-world usage you need something more advanced that also handles punctuation. For most Western languages you can get away with replacing all punctuation with spaces before doing str1.split().
This will work for English as well in simple cases, but note that "I'm" will be split into two words: "I" and "m", and it should in fact be split into "I" and "am". But this may be overkill for this application.
For other cases, such as Asian languages or actual real-world usage of English, you might want to use a library that does the word splitting for you.
Then you have a list of words, and you can do
count = words.count(word)
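A sketch of the "replace punctuation with spaces, then split" idea described above, assuming ASCII punctuation as listed in string.punctuation. The function name count_exact is hypothetical.

```python
import string


def count_exact(text, word):
    """Count exact word matches after turning punctuation into spaces."""
    # Map every ASCII punctuation character to a space
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    words = text.translate(table).split()
    return words.count(word)


print(count_exact("Mike saw a dog.", "dog"))
print(count_exact("the dogs barked", "dog"))
```

As noted above, this also splits "I'm" into "I" and "m", which may or may not matter for a given application.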
# counting the number of occurrences of a word in the text
def count_word(text, word):
    """
    Split the text into words and count
    the number of occurrences of the given word.
    input: text and word
    output: number of times the word appears
    """
    answer = text.split(" ")
    count = 0
    for occurrence in answer:
        if word == occurrence:
            count = count + 1
    return count
sentence = "To be a programmer you need to have a sharp thinking brain"
word_count = "a"
print(sentence.split(" "))
print(count_word(sentence, word_count))
#output
>>> %Run test.py
['To', 'be', 'a', 'programmer', 'you', 'need', 'to', 'have', 'a', 'sharp', 'thinking', 'brain']
2
>>>
Create a function that takes two inputs: the text of a sentence and a word.
Split the sentence into a list of words.
Then check each word in the list against the word to be counted, and return the number of matches.
If you don't want to use regular expressions, you can use this neat trick. Note that it misses the word when it is the first or last word of the string, or when it is next to punctuation.
word = " is " # add a space on both the leading and trailing sides
input_string = "This is some random text and this is str which is mutable"
print("Word count : ", input_string.count(word))
Output -- Word count : 3
Below is a simple example that replaces the desired word with a new string, for a desired number of occurrences:
def censor(text, word):
    # replace up to text.count(word) occurrences of word with '+' characters
    newString = text.replace(word, "+" * len(word), text.count(word))
    return newString
print(censor("hey hey hey", "hey"))
The output will be: +++ +++ +++
The first parameter of str.replace() is the string to search for.
The second is the new string that replaces it.
The third and last is the maximum number of occurrences to replace.
Let us consider the example s = "suvotisuvojitsuvo".
Suppose you want to count "suvo" and "suvojit" separately, so that the "suvo" inside "suvojit" is not counted as a "suvo"; only the standalone "suvo" counts. You can use the count() method:
suvocount = s.count("suvo") # output: 3
suvojitcount = s.count("suvojit") # output: 1
To find the count of standalone "suvo", subtract the "suvojit" count:
lonelysuvo = suvocount - suvojitcount # output: 3 - 1 -> 2
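As an alternative to subtracting the two counts, a negative lookahead can skip every "suvo" that is immediately followed by "jit". This is a swapped-in regex technique, not what the answer above uses, and the variable name lonely is made up for illustration.

```python
import re

s = "suvotisuvojitsuvo"
# (?!jit) rejects a match when the next three characters are "jit"
lonely = len(re.findall(r"suvo(?!jit)", s))
print(lonely)
```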
This would be my solution with help of the comments:
word = input("type the french word chiens in english:")
str1 = "dogs"
# count() also accepts substrings of "dogs", such as "dog"
times = str1.count(word)
if times >= 1:
    print("dogs is correct")
else:
    print("you're wrong")
If you want to find the exact number of occurrences of a specific word in the string and you don't want to use any count function, then you can use the following method.
text = input("Please enter the statement you want to check: ")
word = input("Please enter the word you want to check in the statement: ")
# n is the starting point of the search; it is 0 because you want to start from the very beginning of the string.
n = 0
# position_word is the starting index of the word in the string
position_word = 0
num_occurrence = 0
if word.upper() in text.upper():
    while position_word != -1:
        position_word = text.upper().find(word.upper(), n, len(text))
        # move the starting point forward to search for the next occurrence
        n = position_word + 1
        # str.find(sub, start, end) returns -1 if sub is not present in that range
        if position_word != -1:
            num_occurrence += 1
    print(f"{word.title()} is present {num_occurrence} times in the provided statement.")
else:
    print(f"{word.title()} is not present in the provided statement.")
This is a simple Python program using the split function:
text = 'apple mango apple orange orange apple guava orange'
print("\n My string ==> " + text + "\n")
words = text.split()
seen = []
for i in words:
    if i not in seen:
        seen.append(i)
        print(i, words.count(i))
I have just started out to learn coding in general and I do not know any libraries as such.
s = "the dogs barked"
value = 0
x = 0
y = 3
for alphabet in s:
    if s[x:y] == "dog":
        value = value + 1
    x += 1
    y += 1
print("number of dog in the sentence is : ", value)
Another way to do this is by tokenizing the string (breaking it into words).
Use Counter from the collections module of the Python standard library:
from collections import Counter
str1 = "the dogs barked"
# this dictionary contains all words and their respective counts
stringTokenDict = dict(Counter(str1.split()))
print(stringTokenDict['dogs'])
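One detail worth noting for the exact-match question: a plain dict raises KeyError for a word that never occurs, whereas a Counter returns 0 for missing keys. A short sketch, with variable names chosen only for illustration:

```python
from collections import Counter

counts = Counter("the dogs barked".split())
dogs = counts["dogs"]  # the token "dogs" occurs once
dog = counts["dog"]    # "dog" is not a token, so Counter returns 0
print(dogs, dog)
```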

how do I write the positions to a file

The full task that I have been assigned from school is:
Develop a program that identifies individual words in a sentence, stores these in a list and replaces each word in the original sentence with the position of that word in the list.
For example, the sentence
ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY
Contains the words ASK, NOT, WHAT, YOUR, COUNTRY, CAN, DO, FOR, YOU The sentence can be recreated from the positions of these words in this list using the sequence
1,2,3,4,5,6,7,8,9,1,3,9,6,7,8,4,5
Save the list of words and the positions of these words in the sentence as separate files or as a single file.
Analyse the requirements for this system and design, develop, test and evaluate a program to:
identify the individual words in a sentence and store them in a list
create a list of positions for words in that list
save these lists as a single file or as separate files.
Here is my current code:
sentencelist=[] #variable list for the sentences
word=[] #variable list for the words
positionofword=[]
words= open("words.txt","w")
position= open("position.txt","w")
question=input("Do you want to enter a sentence? Answers are Y or N.").upper()
if question=="Y":
    sentence=input("Please enter a sentance").upper() #sets to uppercase so it's easier to read
    sentencetext=sentence.isalpha or sentence.isspace()
    while sentencetext==False: #if letters have not been entered
        print("Only letters are allowed") #error message
        sentence=input("Please enter a sentence").upper() #asks the question again
        sentencetext=sentence.isalpha #checks if letters have been entered this time
elif question=="N":
    print("The program will now close")
else:
    print("please enter a letter")
sentence_word = sentence.split(' ')
for (i, check) in enumerate(word): #orders the words
    print(sentence)
sentence_words = sentence.split(' ')
for (i, check) in enumerate(sentence_words): #orders the words
    if (check == word):
        positionofwords=print(i+1)
        break
else:
    print("This didn't work")
words.write(str(sentence_words) + " ")
position.write(str(positionofwords) + " ")
words.close()
position.close()
This doesn't work. The error I get is:
NameError: name 'positionofwords' is not defined
What I would like to know is why positionofwords=print(i+1) does not work in this case and what I would do instead.
>>> s = 'ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU ASK WHAT YOU CAN DO FOR YOUR COUNTRY'
>>> words = s.split()
>>> for w in words:
...     print(words.index(w) + 1)
Output:
1
2
3
4
5
6
7
8
9
1
3
9
6
7
8
4
5
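On the original question: print() returns None, so positionofwords=print(i+1) would store None even if that line ran. It never runs, because check == word compares a string with the list word, so positionofwords is never assigned and the later write raises the NameError. Below is a minimal sketch of the whole task, assuming whitespace-separated words. The function names and the default file names are illustrative choices, not part of the original code.

```python
def words_and_positions(sentence):
    """Return the unique words and each word's 1-based position in that list."""
    unique_words = []
    positions = []
    for w in sentence.split():
        if w not in unique_words:
            unique_words.append(w)
        positions.append(unique_words.index(w) + 1)
    return unique_words, positions


def save_lists(unique_words, positions,
               words_path="words.txt", positions_path="position.txt"):
    """Write the two lists to separate text files."""
    with open(words_path, "w") as f:
        f.write(" ".join(unique_words))
    with open(positions_path, "w") as f:
        f.write(",".join(str(p) for p in positions))


sentence = ("ASK NOT WHAT YOUR COUNTRY CAN DO FOR YOU "
            "ASK WHAT YOU CAN DO FOR YOUR COUNTRY")
unique_words, positions = words_and_positions(sentence)
print(unique_words)
print(positions)
```

Calling save_lists(unique_words, positions) then writes both files; using with ensures they are closed even if a write fails.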

How to print unique words from an inputted string

I have some code that I intend to print out all the unique words in a string that will be inputted by a user:
str1 = input("Please enter a sentence: ")
print("The words in that sentence are: ", str1.split())
unique = set(str1)
print("Here are the unique words in that sentence: ",unique)
I can get it to print out the unique letters, but not the unique words.
str.split(' ') takes a string and creates a list of elements divided by a space (' ').
set(foo) takes a collection foo and returns a set containing only the distinct elements in foo. Iterating over a string yields its characters, which is why set(str1) gives you the unique letters.
What you want is this: unique_words = set(str1.split(' '))
By default, split() separates on any run of whitespace. I wanted to show that you can supply your own separator to this method.
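To make the difference concrete, here is a small sketch with a fixed sentence instead of user input. The dict.fromkeys() line is an extra not mentioned above; it keeps the distinct words in first-seen order, which a set does not guarantee.

```python
sentence = "the cat and the dog"

letters = set(sentence)               # distinct characters, including the space
unique_words = set(sentence.split())  # distinct words, in arbitrary order
ordered = list(dict.fromkeys(sentence.split()))  # distinct words, first-seen order

print(unique_words)
print(ordered)
```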
Also, you can use:
from collections import Counter
str1 = input("Please enter a sentence: ")
words = str1.split(' ')
c = Counter(words)
unique = [w for w in words if c[w] == 1]
print("Unique words: ", unique)
Another way of doing it:
user_input = input("Input: ").split(' ')
unique_words = []
for i in user_input:
    if i not in unique_words:
        unique_words.append(i)
print(unique_words)

Replacing and Storing

So, here is what I got:
def getSentence():
    sentence = input("What is your sentence? ").upper()
    if sentence == "":
        print("You haven't entered a sentence. Please re-enter a sentence.")
        getSentence()
    elif sentence.isdigit():
        print("You have entered numbers. Please re-enter a sentence.")
        getSentence()
    else:
        import string
        for c in string.punctuation:
            sentence = sentence.replace(c,"")
        return sentence
def list(sentence):
    words = []
    for word in sentence.split():
        if not word in words:
            words.append(word)
    print(words)
def replace(words,sentence):
    position = []
    for word in sentence:
        if word == words[word]:
            position.append(i+1)
    print(position)
sentence = getSentence()
list = list(sentence)
replace = replace(words,sentence)
I have only managed to get this far. My full intention is to take the sentence, separate it into words, and change each word into a number, e.g.
words = ["Hello","world","world","said","hello"]
And make it so that each word has a number:
So let's say that "hello" has the value of 1, the sentence would be '1 world world said 1'
And if "world" was 2, it would be '1 2 2 said 1'
Finally, if "said" was 3, it would be '1 2 2 3 1'
Any help would be greatly appreciated. I will then develop this code so that the sentence and such are stored in a file using file.write() and file.read() etc.
Thanks
If you just want the position of each word, you can do
positions = map(words.index,words)
Also, NEVER use built-in function names for your variables or functions. And never give a variable the same name as one of your functions (replace = replace(...)): functions are objects, so the assignment overwrites the function.
Edit: In Python 3 you must convert the iterator that map returns to a list
positions = list(map(words.index, words))
Or use a list comprehension
positions = [words.index(w) for w in words]
Does it matter what order the words are turned into numbers? Are Hello and hello two words or one? Why not something like:
import string
sentence = input() # user input here
sentence = sentence.translate(str.maketrans('', '', string.punctuation))
# strip out punctuation (translate returns a new string, so reassign it)
replacements = {word: str(idx) for idx, word in enumerate(set(sentence.split()))}
# builds e.g. {"hello": "0", "world": "1", "said": "2"}; set order is arbitrary
result = ' '.join(replacements.get(word, word) for word in sentence.split())
# join back with the replacements
Another idea (although I don't think it's better than the rest) is to use a dictionary:
dictionary = dict()
for word in words:
    if word not in dictionary:
        dictionary[word] = len(dictionary)+1
Also, in your code, when you call "getSentence" inside "getSentence", you should return its return value:
if sentence == "":
    print("You haven't entered a sentence. Please re-enter a sentence.")
    return getSentence()
elif sentence.isdigit():
    print("You have entered numbers. Please re-enter a sentence.")
    return getSentence()
else:
    ...
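Putting the dictionary idea from this answer together with the target output in the question, a minimal sketch could look like the following. It assumes that "Hello" and "hello" count as the same word (so the input is lowercased) and that numbers are assigned in first-seen order. The function name number_words is hypothetical.

```python
def number_words(sentence):
    """Replace each word with the number assigned at its first appearance."""
    words = sentence.lower().split()
    numbering = {}
    for word in words:
        if word not in numbering:
            # The next unused number, starting from 1
            numbering[word] = len(numbering) + 1
    return " ".join(str(numbering[word]) for word in words)


print(number_words("Hello world world said hello"))
```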

Accessing certain words in an split list

I am trying to create a program in Python that takes a sentence from a user and jumbles the middle letters of each word, keeping the first and last letters intact. Right now my code rearranges the whole input and ignores the spaces. It works fine for a single-word input, so I'll let my code speak for itself.
I need to jumble each word the user enters while leaving the rest of the sentence intact.
import random
words = input("Enter a word or sentence") #Gets user input
words.split()
for i in list(words.split()): #Runs the code for how many words there are
    first_letter = words[0] #Takes the first letter out and defines it
    last_letter = words[-1] #Takes the last letter out and defines it
    letters = list(words[1:-1]) #Takes the rest and puts them into a list
    random.shuffle(letters) #shuffles the list above
    middle_letters = "".join(letters) #Joins the shuffled list
    final_word_uncombined = (first_letter, middle_letters, last_letter) #Puts final word all back in place as a list
    final_word = "".join(final_word_uncombined) #Puts the list back together again
    print(final_word) #Prints out the final word all back together again
Your code is almost right. A corrected version would look like this:
import random
words = input("Enter a word or sentence: ")
jumbled = []
for word in words.split(): #Runs the code for how many words there are
    if len(word) > 2: # Only need to change long words
        first_letter = word[0] #Takes the first letter out and defines it
        last_letter = word[-1] #Takes the last letter out and defines it
        letters = list(word[1:-1]) #Takes the rest and puts them into a list
        random.shuffle(letters) #shuffles the list above
        middle_letters = "".join(letters) #Joins the shuffled list
        word = ''.join([first_letter, middle_letters, last_letter])
    jumbled.append(word)
jumbled_string = ' '.join(jumbled)
print(jumbled_string)
So I read this question during lunch at the apartment, then I had to wade through traffic. Anyway, here is my one-line contribution. Seriously, alexeys' answer is where it's at.
import random
sentence = input("Enter a word or sentence")
print(" ".join([word[0] + ''.join(random.sample(list(word[1:-1]), len(word[1:-1]))) + word[-1] if len(word) > 2 else word for word in sentence.split()]))
If I understand your question correctly, it looks like you are on track; you just have to extend this to every word:
randomized_words = []
for word in words.split():
    # perform your word jumbling here, storing the result in jumbled_word
    randomized_words.append(jumbled_word)
print(' '.join(randomized_words))
This creates a separate list of jumbled words. Each word in the user's input is jumbled and added to the list, which retains the order. At the end, the list is joined and printed, so each word is in the same position as entered by the user but its letters are jumbled.
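For completeness, here is a self-contained sketch of the per-word approach described above. The function names and the rng parameter are illustrative assumptions; passing a seeded random.Random instance as rng makes the output repeatable, which can help when testing.

```python
import random


def jumble_word(word, rng=random):
    """Shuffle the middle letters of word, keeping the first and last in place."""
    if len(word) <= 3:
        # With at most one middle letter there is nothing to rearrange
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def jumble_sentence(sentence, rng=random):
    """Jumble every word in sentence, preserving word order."""
    return " ".join(jumble_word(w, rng) for w in sentence.split())


print(jumble_sentence("reading jumbled words is fun"))
```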
