Comparing a Downloaded String to a List in Python

I'm trying to create a sentiment analyser in Python that downloads text and analyses it against lists of negative and positive words. Every word in the text that matches a word in poswords.txt should add 1 to the score, and every word that matches a word in negwords.txt should subtract 1. The overall score is the text's sentiment score. This is how I have tried to do it, but I keep getting a score of 0.
The answer below does not seem to work either; I still get a sentiment score of 0.
split = text.split()
poswords = open('poswords.txt','r')
for word in split:
    if word in poswords:
        sentimentScore +=1
poswords.close()
negwords = open('negwords.txt','r')
for word in split:
    if word in negwords:
        sentimentScore -=1
negwords.close()

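In your code, poswords and negwords are just file handles, and you never read the words out of those files. A test like word in poswords iterates over the file's raw lines, which still end in a newline, so a word from text.split() never matches. After the first test the file is also exhausted, so every later test fails.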
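Here is a small demonstration of both effects. It uses io.StringIO as a stand-in for the open file, which is an assumption made so that the example needs no file on disk:

```python
import io

def fake_file():
    # io.StringIO stands in for open('poswords.txt', 'r') so this runs without a real file
    return io.StringIO("good\nnice\n")

f = fake_file()
stripped_match = "good" in f   # False: iterating a file yields "good\n", not "good"

f = fake_file()
raw_match = "good\n" in f      # True: membership compares against whole raw lines
second_try = "good\n" in f     # False: the first test already consumed that line

print(stripped_match, raw_match, second_try)  # False True False
```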
Here:
sentimentScore = 0
split = text.split()
poswords = open('poswords.txt','r')
pos = []
for line in poswords:
    pos.append(line.strip())
for word in split:
    if word in pos:
        sentimentScore +=1
poswords.close()
negwords = open('negwords.txt','r')
neg = []
for line in negwords:
    neg.append(line.strip())
for word in split:
    if word in neg:
        sentimentScore -=1
negwords.close()
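If the files are huge, the above is not an optimal solution, because checking word in pos scans the whole list for every word in the text. Create a dictionary for the positive words and one for the negative words instead, since dictionary lookups take constant time on average: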
from collections import defaultdict
input_text = text.split()  # avoid naming a variable split; it is easily confused with the str.split method
poswords = open('poswords.txt','r')
pos_dict = defaultdict(int)
for line in poswords:
    pos_dict[line.strip()] += 1
poswords.close()
negwords = open('negwords.txt','r')
neg_dict = defaultdict(int)
for line in negwords:
    neg_dict[line.strip()] += 1
negwords.close()
sentiment_score = 0
for word in input_text:
    if word in pos_dict:
        sentiment_score += 1
    elif word in neg_dict:
        sentiment_score -= 1
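Since only membership matters here, a set does the same job as the defaultdict. The sketch below wraps the approach in functions. load_words and sentiment_score are hypothetical names. Lowercasing and stripping punctuation from each token are extra assumptions, not part of the answer. Without them, "Good," in the text would never match "good" in the word list, which is another common cause of a score stuck at 0.

```python
import string

def load_words(path):
    # Hypothetical helper: one word per line, stored lowercased in a set for fast lookups
    with open(path, "r") as f:
        return {line.strip().lower() for line in f if line.strip()}

def sentiment_score(text, pos_words, neg_words):
    score = 0
    for token in text.split():
        # Assumption: case-insensitive matching, punctuation stripped from both ends
        word = token.strip(string.punctuation).lower()
        if word in pos_words:
            score += 1
        elif word in neg_words:
            score -= 1
    return score

# With real files: sentiment_score(text, load_words('poswords.txt'), load_words('negwords.txt'))
print(sentiment_score("Good day, bad news. Great!", {"good", "great"}, {"bad"}))  # 1
```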

Related

How to read words with ' in them?

I have this code that prints the most common words in a .txt file. I want it to also count and print words that contain an apostrophe ('). How can I do this?
words = open(input('Enter the name of the file: ')).read().lower().split()
number_of_words = int(input('Enter how many top words you want to see: '))
uniques = []
stop_words = ["a", "an", "and", "in", "is", "the"]
for word in words:
    check_special = False
    if word.isalnum():
        check_special = True
    if word not in uniques and word not in stop_words and check_special:
        uniques.append(word)
counts = []
for unique in uniques:
    count = 0
    for word in words:
        if word == unique:
            count += 1
    counts.append((count, unique))
counts.sort()
counts.reverse()
counts_dict = {count: [] for count, word in counts}
for count, word in counts:
    counts_dict[count].append(word)
count_num_word = 0
for count in counts_dict:
    if count_num_word >= number_of_words:
        break
    print('The following words appeared %d times each: %s' % (count, ', '.join(sorted(counts_dict[count]))))
    count_num_word += 1
Write your own function that checks whether every character in a string is alphanumeric or an apostrophe, and use it instead of word.isalnum().
def alnum_or_quote(s):
    return all(c == "'" or c.isalnum() for c in s)
Then replace if word.isalnum(): with if alnum_or_quote(word):
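For comparison, here is a shorter sketch of the same idea. It replaces the nested counting loops with collections.Counter and keeps the same stop words and the alnum_or_quote filter. One difference is that it returns the top n (word, count) pairs instead of grouping words that share a count. top_words is a hypothetical name.

```python
from collections import Counter

STOP_WORDS = {"a", "an", "and", "in", "is", "the"}

def alnum_or_quote(s):
    # True when every character is a letter, a digit, or an apostrophe
    return all(c == "'" or c.isalnum() for c in s)

def top_words(text, n):
    # Keep words made only of letters/digits/apostrophes that are not stop words
    words = [w for w in text.lower().split()
             if alnum_or_quote(w) and w not in STOP_WORDS]
    return Counter(words).most_common(n)

print(top_words("Don't stop. Don't give up, don't", 2))  # [("don't", 3), ('give', 1)]
```

As in the original code, words with trailing punctuation such as "stop." are skipped, not cleaned.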

How to count consecutive vowels in a text file in Python

I'm new to programming in Python and have been stuck on a challenge for a few days; I can't figure out what is wrong with my code. It takes a text file and reports how many sentences, words, and syllables the text contains. Everything works except that a syllable containing consecutive vowels is counted as several syllables, and I can't see how to fix it. Any help would be appreciated.
For example, if the file contains this:
"Or to take arms against a sea of troubles, And by opposing end them? To die: to sleep."
The program should report 21 syllables, but it reports 26 because it counts consecutive vowels more than once.
fileName = input("Enter the file name: ")
inputFile = open(fileName, 'r')
text = inputFile.read()
# Count the sentences
sentences = text.count('.') + text.count('?') + \
            text.count(':') + text.count(';') + \
            text.count('!')
# Count the words
words = len(text.split())
# Count the syllables
syllables = 0
vowels = "aeiouAEIOU"
for word in text.split():
    for vowel in vowels:
        syllables += word.count(vowel)
    for ending in ['es', 'ed', 'e']:
        if word.endswith(ending):
            syllables -= 1
    if word.endswith('le'):
        syllables += 1
# Compute the Flesch Index and Grade Level
index = 206.835 - 1.015 * (words / sentences) - \
        84.6 * (syllables / words)
level = int(round(0.39 * (words / sentences) + 11.8 * \
        (syllables / words) - 15.59))
# Output the results
print("The Flesch Index is", index)
print("The Grade Level Equivalent is", level)
print(sentences, "sentences")
print(words, "words")
print(syllables, "syllables")
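For reference, these are the two formulas the code computes. They are the standard Flesch Reading Ease and Flesch-Kincaid grade-level formulas. The worked example assumes the example sentence with the expected 21 syllables, 18 words, and 3 sentences; the code counts '.', '?' and ':' as sentence ends.

```latex
\text{index} = 206.835 - 1.015\,\frac{\text{words}}{\text{sentences}} - 84.6\,\frac{\text{syllables}}{\text{words}}
\qquad
\text{level} = \operatorname{round}\!\left(0.39\,\frac{\text{words}}{\text{sentences}} + 11.8\,\frac{\text{syllables}}{\text{words}} - 15.59\right)

% Example: words = 18, sentences = 3, syllables = 21
\text{index} = 206.835 - 1.015 \cdot 6 - 84.6 \cdot \tfrac{21}{18} = 206.835 - 6.09 - 98.7 = 102.045
\text{level} = \operatorname{round}(2.34 + 13.77 - 15.59) = \operatorname{round}(0.52) = 1
```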
Instead of counting the number of occurrences of each vowel for each word, we can iterate through the characters of the word, and only count a vowel if it isn't preceded by another vowel:
# Count the syllables
syllables = 0
vowels = "aeiou"
for word in (x.lower() for x in text.split()):
    syllables += word[0] in vowels
    for i in range(1, len(word)):
        syllables += word[i] in vowels and word[i - 1] not in vowels
    for ending in {'es', 'ed', 'e'}:
        if word.endswith(ending):
            syllables -= 1
    if word.endswith('le'):
        syllables += 1
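The same rule can also be written as a function that remembers whether the previous character was a vowel, so it never indexes back into the word. The sketch keeps the answer's heuristic exactly, including its ending adjustments, so it is still only a rough syllable estimate; count_syllables is a hypothetical name. Applied to the example sentence, it gives the expected 21.

```python
VOWELS = "aeiou"

def count_syllables(word):
    """Heuristic: one syllable per run of consecutive vowels,
    with the same ending adjustments as the answer's code."""
    word = word.lower()
    count = 0
    prev_was_vowel = False
    for ch in word:
        is_vowel = ch in VOWELS
        if is_vowel and not prev_was_vowel:
            count += 1  # this vowel starts a new vowel group
        prev_was_vowel = is_vowel
    for ending in ('es', 'ed', 'e'):
        if word.endswith(ending):
            count -= 1
    if word.endswith('le'):
        count += 1
    return count

text = "Or to take arms against a sea of troubles, And by opposing end them? To die: to sleep."
print(sum(count_syllables(w) for w in text.split()))  # 21
```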

Counting syllables in a list of strings in Python without using re

I have to count the number of syllables in a text file. My problem is that I don't know how to iterate over each character of each string. My idea was to check whether a letter is a vowel and, if the following letter is not a vowel, increase the count by 1. But I can't get at the next letter with something like letter + 1. I've also tried using range, but I had problems with that too. What can I try? Thank you.
PS: I can only use Python built-in methods.
txt = ['countingwords', 'house', 'plant', 'alpha', 'syllables']
This is my code so far.
def syllables(text_file):
    count = 0
    vowels = ['a','e','i','o','u','y']
    with open(text_file, 'r') as f:
        txt = f.readlines()
    txt = [line.replace(' ','') for line in txt]
    txt = [line.replace(',','') for line in txt]
    txt = [y.lower() for y in txt]
    for word in txt:
        for letter in word:
            if letter is in vowel and [letter + 1] is not in vowel:
                count += 1
You might try this:
lines = ["You should count me too"]
count = 0
vowels = "aeiouy"
for line in lines:
    for word in line.lower().split(" "):
        for i in range(len(word)):
            if word[i] in vowels and (i == 0 or word[i-1] not in vowels):
                count += 1
print(count) # -> 5
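The question's original idea also works: count a vowel when the next letter is not a vowel, because each vowel group has exactly one last vowel. The only catch is not to read past the end of the word. The sketch below uses only built-ins (enumerate, len, str methods), as the question requires; count_syllables is a hypothetical name.

```python
def count_syllables(lines):
    """Count vowel groups by their last vowel: a vowel NOT followed
    by another vowel closes a group, so it adds 1."""
    vowels = "aeiouy"
    count = 0
    for line in lines:
        for word in line.lower().split():
            for i, letter in enumerate(word):
                # word[i + 1] only exists when i is not the last index
                next_is_vowel = i + 1 < len(word) and word[i + 1] in vowels
                if letter in vowels and not next_is_vowel:
                    count += 1
    return count

print(count_syllables(["You should count me too"]))  # 5
print(count_syllables(['countingwords', 'house', 'plant', 'alpha', 'syllables']))  # 11
```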

How do I search a string for words with no spaces?

I am trying to find names in a string even when there are no spaces between them, e.g. robbybobby. I want to search the string and separate the names into their own groups.
def wordcount(filename, listwords):
    try:
        file = open(filename, "r")
        read = file.readline()
        file.close()
        for word in listwords:
            lower = word.lower()
            count = 0
            for letter in read:
                line = letter.split()
                for each in line:
                    line2 = each.lower()
                    line2 = line2.strip(".")
                    if lower == line2:
                        count += 1
            print(lower, ":", count)
    except FileExistsError:
        print("no")
wordcount("teststring.txt", ["robby"])
With this code, it only finds robby if there is a space after it.
There are several ways to do this. I am posting 2 suggestions so you can understand and improve :)
Solution 1:
def count_occurrences(line, word):
    # Normalize vars
    word = word.lower()
    line = line.lower()
    # Initialize vars
    start_index = 0
    total_count = 0
    word_len = len(word)
    # Count ignoring empty spaces
    while start_index >= 0:
        # Ignore if not found
        if word not in line[start_index:]:
            break
        # Search for the word starting from <start_index> index
        start_index = line.index(word, start_index)
        # Increment if found
        if start_index >= 0:
            start_index += word_len
            total_count += 1
    # Return total occurrences
    return total_count
print(count_occurrences('stackoverflow overflow overflowABC over', 'overflow'))
Output: 3
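The loop above counts non-overlapping occurrences. The built-in str.count does the same in one call, so an equivalent sketch is:

```python
def count_occurrences(line, word):
    # str.count counts non-overlapping occurrences, like the loop in Solution 1
    return line.lower().count(word.lower())

print(count_occurrences('stackoverflow overflow overflowABC over', 'overflow'))  # 3
print(count_occurrences('bobobob', 'bob'))  # 2: matches do not overlap
```

One caveat: an empty word would make str.count return len(line) + 1, whereas the loop version never terminates.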
Solution 2:
If you want to use a regex, these links may be useful:
Count the occurrence of a word in a txt file in python
Exact match for words
If I understand correctly, you want to count occurrences of a word regardless of whether it appears as part of another word or as a word on its own.
You can use a simple regex for that:
import re
def count_line(counts, line, words):
    for word in words:
        # re.escape makes the word match literally, even if it contains regex metacharacters
        counts[word] = len(re.findall(re.escape(word), line, re.IGNORECASE)) + counts.get(word, 0)
    return counts
allLines = """
bobby robbubobby yo xyz
robson bobbyrobin abc
xyz bob amy oo
amybobson robson
"""
print(allLines)
words = ["amy", "robby", "bobby", "jack"]
res = {}
for line in allLines.split("\n"):
    res = count_line(res, line, words)
print(res)
Output:
bobby robbubobby yo xyz
robson bobbyrobin abc
xyz bob amy oo
amybobson robson
{'amy': 2, 'robby': 0, 'bobby': 3, 'jack': 0}
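The pattern above matches a name anywhere, including inside longer words. If you only want whole words, which is the "exact match" case from the links in Solution 2, wrap the escaped name in \b word boundaries. The sketch below compares the two on one line of the sample text:

```python
import re

line = "bobby robbubobby yo xyz"
name = re.escape("bobby")

anywhere = len(re.findall(name, line, re.IGNORECASE))                 # also matches inside robbubobby
whole_word = len(re.findall(r"\b" + name + r"\b", line, re.IGNORECASE))  # standalone "bobby" only

print(anywhere, whole_word)  # 2 1
```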

Sorting and counting words from a text file

I'm new to programming and stuck on my current program. I have to read a story from a file, sort the words, and count the number of occurrences of each word. It counts the words, but it doesn't sort them, remove the punctuation, or remove duplicate words. I'm lost as to why it's not working. Any advice would be helpful.
ifile = open("Story.txt",'r')
fileout = open("WordsKAI.txt",'w')
lines = ifile.readlines()
wordlist = []
countlist = []
for line in lines:
    wordlist.append(line)
    line = line.split()
    # line.lower()
    for word in line:
        word = word.strip(". , ! ? : ")
        # word = list(word)
        wordlist.sort()
        sorted(wordlist)
        countlist.append(word)
        print(word,countlist.count(word))
The main problem in your code is this line:
wordlist.append(line)
You are appending the whole line to wordlist, which I doubt is what you want. Because of this, what ends up in wordlist is never .strip()ed.
Instead, add each word only after you have strip()ed it, and only if it is not already in the list (no duplicates):
ifile = open("Story.txt",'r')
lines = ifile.readlines()
wordlist = []
countlist = []
for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()
        # Add the word into wordlist only if it is not in wordlist
        if word not in wordlist:
            wordlist.append(word)
        # Add the word to countlist so that it can be counted later
        countlist.append(word)
# Sort the wordlist
wordlist.sort()
# Print the wordlist
for word in wordlist:
    print(word, countlist.count(word))
Another way to do this is with a dictionary, storing each word as the key and its number of occurrences as the value:
ifile = open("Story.txt", "r")
lines = ifile.readlines()
word_dict = {}
for line in lines:
    # Get all the words in the current line
    words = line.split()
    for word in words:
        # Perform whatever manipulation to the word here
        # Remove any punctuation from the word
        word = word.strip(".,!?:;'\"")
        # Make the word lowercase
        word = word.lower()
        # Add the word to word_dict
        word_dict[word] = word_dict.get(word, 0) + 1
# Create a wordlist to display the words sorted
word_list = list(word_dict.keys())
word_list.sort()
for word in word_list:
    print(word, word_dict[word])
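The standard library's collections.Counter is a dictionary built for exactly this kind of counting. The sketch below applies the same normalisation as the answer's code (strip punctuation, lowercase) to a list of lines. It also skips tokens that were only punctuation, which is an addition. count_words and the sample lines are hypothetical.

```python
from collections import Counter

PUNCTUATION = ".,!?:;'\""

def count_words(lines):
    counts = Counter()
    for line in lines:
        for word in line.split():
            # Same normalisation as above: strip punctuation from both ends, lowercase
            word = word.strip(PUNCTUATION).lower()
            if word:  # skip tokens that were only punctuation
                counts[word] += 1
    return counts

counts = count_words(["The cat sat.", "The dog, the cat!"])
for word in sorted(counts):
    print(word, counts[word])
```

With a real file you could pass the open file object itself, since iterating over it yields lines.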
To sort the words case-insensitively, provide a key function to the sorting method.
Try this:
r = sorted(wordlist, key=str.lower)
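Without a key, strings sort by character code, so every uppercase letter comes before every lowercase one. A short sketch with made-up words shows the difference:

```python
wordlist = ["banana", "Cherry", "apple"]

default_order = sorted(wordlist)                   # uppercase sorts first
case_insensitive = sorted(wordlist, key=str.lower)

print(default_order)     # ['Cherry', 'apple', 'banana']
print(case_insensitive)  # ['apple', 'banana', 'Cherry']
```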
punctuation = ".,!?: "
counts = {}
with open("Story.txt",'r') as infile:
    for line in infile:
        for word in line.split():
            # strip all punctuation characters from both ends in one call
            word = word.strip(punctuation)
            if not word:
                continue  # the token was only punctuation
            if word not in counts:
                counts[word] = 0
            counts[word] += 1
with open("WordsKAI.txt",'w') as outfile:
    for word in sorted(counts):  # if you want to sort by counts instead, use sorted(counts, key=counts.get)
        outfile.write("{}: {}\n".format(word, counts[word]))
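The comment in that code mentions sorting by count. Here is a sketch that lists the most frequent words first and breaks ties alphabetically. The counts dict is made-up sample data.

```python
counts = {"the": 3, "dog": 1, "cat": 1, "sat": 2}

# Negate the count so higher counts sort first; equal counts fall back to the word itself
by_count = sorted(counts, key=lambda w: (-counts[w], w))

for word in by_count:
    print("{}: {}".format(word, counts[word]))
```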
