Counting Occurrences of words in file - python

Just a preface: I have read far too many of the posts here about the same topic, and none of them quite cover the specific guidelines I'm under. I'm supposed to create an algorithm that counts the occurrences of each word in a text file and displays each one like this:
"The: 4
Jump: 2
Fox: 6".
The terms I'm under are to use only the skills we learned in our beginner Python class, which means we cannot use dictionaries, counters, sets or lists (basically anything that would help shorten our code, tbh). I'm not the best at Python, so I've been struggling pretty hard, to say the least. The closest I've gotten was scraping my old notes together from my previous class and finding some demo code that I reformatted:
wordsinlist = "words.txt"
word = input("Enter word to be searched:")
count = 0
with open("words.txt", 'r') as wordlist:
    for line in wordlist:
        words = line.split()
        for i in words:
            if i == word:
                count = count + 1
print("Occurrences of the word:")
print(count)
The issue with this is that I need my code to display all of the words and their occurrences at once, with no search input. There's definitely a way to do this, but I'm not the sharpest tool in the shed, and I've been going at it for like 5 hours now haha.
It definitely needs to look a little closer to this:
#Output
The: 112
History: 29
Learning: 25
Any help or hints are much appreciated! Thank you in advance! I know it's a dumb question; these online classes are really frustrating.

Without lists (or something similar) I think this is impossible. You're probably allowed to use lists; that is basic Python!
If you need to count the occurrences of all words, you don't need to read a word with input(), right?
So this is one simple solution (it treats each line of words.txt as one word):
with open("words.txt", 'r') as fp:
    lines = fp.readlines()
lines_1 = [element.strip() for element in lines]
lines_2 = list(set(lines_1))  # distinct entries, in arbitrary order
count = 0  # must be initialised before the first increment
for w in lines_2:
    for l in lines_1:
        if l == w:
            count = count + 1
    print("Occurrences of {} : {}".format(w, count))
    count = 0
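If lists, sets and dictionaries really are off limits, one possible workaround is to keep the bookkeeping in plain strings. The following is only a sketch under assumptions: it loops over str.split() (as the question's own code does), the function name count_words is made up for illustration, and punctuation is not stripped, so "fox" and "fox," would count as different words.

```python
def count_words(text):
    """Build a 'Word: n' report using only strings and integers."""
    text = text.lower()
    seen = " "      # space-delimited record of the words already reported
    report = ""
    for word in text.split():
        if " " + word + " " in seen:
            continue                  # this word was already counted
        seen += word + " "
        count = 0
        for other in text.split():    # second pass: count this word
            if other == word:
                count += 1
        report += word.capitalize() + ": " + str(count) + "\n"
    return report


if __name__ == "__main__":
    print(count_words("the fox saw the other fox jump over the fox"), end="")
```

To run it on the file from the question, pass it the file's contents, for example the result of fp.read(). The nested loop makes this quadratic in the number of words, which is probably acceptable for a class exercise but slow on large files.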

Related

Is there a way to print a specific amount of characters from multiple lists? (Word generator)

New to coding here; I am starting out by attempting to create a simple phrase generator.
(I apologize if my questions aren't formatted properly, but I will try my best.)
I have created a small phrase generator, but I am trying to get only the phrases that are 35 characters or fewer, and I don't want sentences that are cut off. So that brings me to my question: is it possible to retrieve only the phrases that are 35 characters or fewer when they are built from a set of different lists?
Here is my code:
import csv
from random import randint
other = ["leave","place","set","rest","prop on","lean","lay","stow","sit","set"]
names = ["front","back"]
verbs = ["door,", "side,","porch,","steps","stairs","stairway","staircase","entry","stoop"]
nouns = ["Thanks a ton", "Thanks a million", "forever indebted", "please thanks","super great","appreciated",
         "thank you","deep gratitude","means the world","TYSM","Congrats Champ","Keep on going","Never quit","Believe 4ever",
         "you did it","always believe","love persists","frenz forever","pat on back","kudos bro","mad thanks","best ever","gift for her",
         "gift for cousin","u deserve it","keep it real","love u girl","u make my wrld","thankful","best wishes","stay warm","stay cool","2 my bestie",]
while True:
    phrase_amount = input("How many phrases would you like to create?")
    for i in range(int(phrase_amount)):
        print((other[randint(0,len(other)-1)]+" "+names[randint(0,len(names)-1)]+" "+verbs[randint(0,len(verbs)-1)]+" "+nouns[randint(0,len(nouns)-1)]))
Secondly, I am doing something wrong when writing to csv. The output is printing in multiple rows and columns, and I am unsure what's going on here. Any help would be much appreciated!
csvname = f"{phrase_amount}"
with open(f'PhrasesbyTrill{csvname}.csv', 'w', newline='') as file:
    myfile = csv.writer(file)
    myfile.writerow(["Phrases"])
    for i in range(int(phrase_amount)):
        myfile.writerow((other[randint(0,len(other)-1)]+" "+names[randint(0,len(names)-1)]+" "+verbs[randint(0,len(verbs)-1)]+" "+nouns[randint(0,len(nouns)-1)]))

Just save the phrase in a variable, check its length, and print it only if it fits. Something like:
for i in range(int(phrase_amount)):
    phrase = (other[randint(0,len(other)-1)]+" "+names[randint(0,len(names)-1)]+" "+verbs[randint(0,len(verbs)-1)]+" "+nouns[randint(0,len(nouns)-1)])
    if len(phrase) <= 35:  # keep only phrases of 35 characters or fewer
        print(phrase)
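On the second part of the question: csv.writer's writerow expects a sequence of fields, and a string is itself a sequence of characters, so passing the phrase directly puts each character in its own column. Wrapping the phrase in a one-element list writes it as a single cell. A sketch of that idea, where write_phrases, make_phrase, max_len and max_attempts are hypothetical names and not part of the original code:

```python
import csv
import io
import random


def make_phrase(parts):
    """Pick one entry from each list in parts and join them with spaces."""
    return " ".join(random.choice(options) for options in parts)


def write_phrases(file, parts, amount, max_len=35, max_attempts=10000):
    """Write up to `amount` phrases of at most max_len characters, one per row."""
    writer = csv.writer(file)
    writer.writerow(["Phrases"])
    written = 0
    for _ in range(max_attempts):      # bounded, in case few phrases fit
        if written == amount:
            break
        phrase = make_phrase(parts)
        if len(phrase) <= max_len:
            writer.writerow([phrase])  # one-element list -> one cell
            written += 1
    return written


if __name__ == "__main__":
    parts = [["leave", "place"], ["front", "back"],
             ["porch", "steps"], ["thank you", "best wishes"]]
    buffer = io.StringIO()             # stands in for the opened .csv file
    write_phrases(buffer, parts, 3)
    print(buffer.getvalue())
```

Because phrases that are too long are skipped and redrawn, the file gets the requested number of rows whenever enough short phrases exist; the attempt limit only guards against lists where almost nothing fits.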

Counting how many times a string appears in a CSV file

I have a piece of code that is supposed to tell me how many times a word occurs in a CSV file. Note: the file is pretty large (2 years of text messages).
This is my code:
key_word1 = 'Exmple_word1'
key_word2 = 'Example_word2'
counter = 0
with open('PATH_TO_FILE.csv', encoding='UTF-8') as a:
    for line in a:
        if (key_word1 or key_word2) in line:
            counter = counter + 1
print(counter)
There are two words because I did not know how to make the search case-insensitive.
To test it, I used the Find function in Word on the whole file (using only one of the words, as I was able to do a case-insensitive search there) and I got more than double what my code had calculated.
At first I used the value_counts() function, BUT I received different values for the same word (searching for Exmple_word1, it appeared 32 times, 56 times, 2 times and so on). I got stuck there for a while, but it got me thinking. I use two keyboards on my phone and switch between them regularly. Could it be that words that look the same are actually different, and that this would explain why I am getting these results?
Also, I checked pretty much all the sources I could find on this, and the approaches I found did not actually do what I want (the value_counts() method, for example).
If that is the case, how can I fix this?

Notice some mistakes in your code:
1. key_word1 or key_word2 - "or" is lazy: if the left part, key_word1, evaluates to True (and any non-empty string does), Python won't even look at key_word2. So (key_word1 or key_word2) in line only ever checks whether key_word1 appears in the line. To test both words, write key_word1 in line or key_word2 in line.
An example to emphasize:
w1 = 'word1'
w2 = 'word2'
s = 'bla word2'
(w1 or w2) in s
>> False
(w2 or w1) in s
>> True
2. Reading the CSV file: I recommend using the csv package (just import it), something like:
import csv
with open('PATH_TO_FILE.csv') as f:
    for line in csv.reader(f):
        pass  # do your logic here
3. Case sensitivity - don't work hard; you can probably lower-case each line you read, so you don't have to hold two words.
I guess the solution you are looking for should look something like:
import csv
word_to_search = 'donald'
counter = 0  # number of rows that contain the word
with open('PATH_TO_FILE.csv', encoding='UTF-8') as f:
    for line in csv.reader(f):
        if any(word_to_search in l for l in map(str.lower, line)):
            counter += 1
Running on the input:
bla,some other bla,donald rocks
make,who,great
again, donald is here, hura
results in:
counter=2
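Note that the loop above counts rows that contain the word, not individual occurrences, which may be part of why the totals differ from what Word's Find reports. A minimal sketch that counts every case-insensitive occurrence instead; count_occurrences is an illustrative name, and it matches substrings, so "donald" would also be found inside "donalds":

```python
import io


def count_occurrences(lines, word):
    """Count every case-insensitive occurrence of word across all lines."""
    needle = word.casefold()
    total = 0
    for line in lines:
        # str.count returns the number of non-overlapping matches
        total += line.casefold().count(needle)
    return total


if __name__ == "__main__":
    # io.StringIO stands in for the opened CSV file
    sample = io.StringIO(
        "Donald rocks, donald rules\nmake,who,great\nagain, DONALD is here\n"
    )
    print(count_occurrences(sample, "donald"))
```

Any iterable of strings works as the first argument, so an open file object can be passed directly and the large file is still read one line at a time.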

Trying to read text file and count words within defined groups

I'm a novice Python user. I'm trying to create a program that reads a text file and searches that text for certain words that are grouped (that I predefine by reading from csv). For example, if I wanted to create my own definition for "positive" containing the words "excited", "happy", and "optimistic", the csv would contain those terms. I know the below is messy - the txt file I am reading from contains 7 occurrences of the three "positive" tester words I read from the csv, yet the results print out to be 25. I think it's returning character count, not word count. Code:
import csv
import string
import re
from collections import Counter
remove = dict.fromkeys(map(ord, '\n' + string.punctuation))
# Read the .txt file to analyze.
with open("test.txt", "r") as f:
    textanalysis = f.read()
    textresult = textanalysis.lower().translate(remove).split()
# Read the CSV list of terms.
with open("positivetest.csv", "r") as senti_file:
    reader = csv.reader(senti_file)
    positivelist = list(reader)
# Convert term list into flat chain.
from itertools import chain
newposlist = list(chain.from_iterable(positivelist))
# Convert chain list into string.
posstring = ' '.join(str(e) for e in newposlist)
posstring2 = posstring.split(' ')
posstring3 = ', '.join('"{}"'.format(word) for word in posstring2)
# Count number of words as defined in list category
def positive(str):
    counts = dict()
    for word in posstring3:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    total = sum (counts.values())
    return total
# Print result; will write to CSV eventually
print ("Positive: ", positive(textresult))

I'm a beginner as well but I stumbled upon a process that might help. After you read in the file, split the text at every space, tab, and newline. In your case, I would keep all the words lowercase and include punctuation in your split call. Save this as an array and then parse it with some sort of loop to get the number of instances of each 'positive,' or other, word.
Look at this, specifically the "train" function:
https://github.com/G3Kappa/Adjustable-Markov-Chains/blob/master/markovchain.py
Also, this link, ignore the JSON stuff at the beginning, the article talks about sentiment analysis:
https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-
Same applies with this link:
http://adilmoujahid.com/posts/2014/07/twitter-analytics/
Good luck!
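The loop described at the start of this answer could look roughly like the sketch below. It is only an illustration: count_group_terms is a made-up name, the term list is passed in directly rather than read from the CSV, and punctuation is removed before splitting so that "happy," still matches "happy".

```python
import string
from collections import Counter


def count_group_terms(text, terms):
    """Count how often each term of a predefined group appears in text."""
    # Map every punctuation character to None so translate() deletes it
    remove = dict.fromkeys(map(ord, string.punctuation))
    words = text.lower().translate(remove).split()
    wanted = {term.lower() for term in terms}
    # Iterate over the words of the text, not over the characters of a string
    return Counter(word for word in words if word in wanted)


if __name__ == "__main__":
    positive_terms = ["excited", "happy", "optimistic"]
    counts = count_group_terms("Happy, happy day! So excited.", positive_terms)
    print("Positive:", sum(counts.values()))
```

The original code loops over posstring3, which is a single string, so each loop step sees one character; looping over the split word list and testing membership in the term set is what makes this a word count.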

I looked at your code and passed some of my own data through it as a sample.
I have two ideas for you, based on what I think you may want.
First Assumption: You want a basic sentiment count?
Getting to 'textresult' is great. Then you did the same with the 'positive lexicon' - to [positivelist] which I thought would be the perfect action? Then you converted [positivelist] to essentially a big sentence.
Would you not just:
1. Pass a 'stop_words' list through [textresult]
2. merge the two dataframes [textresult (less stopwords) and positivelist] for common words - as in an 'inner join'
3. Then basically do your term frequency
4. It is much easier to aggregate the score then
Second assumption: you are focusing on "excited", "happy", and "optimistic"
and you are trying to isolate text themes into those 3 categories?
1. again stop at [textresult]
2. download the 'nrc' and/or 'syuzhet' emotional valence dictionaries
They breakdown emotive words by 8 emotional groups
So if you only want 3 of the 8 emotive groups (subset)
3. Process it like you did to get [positivelist]
4. do another join
Sorry, this is a bit hashed up, but if I was anywhere near what you were thinking, let me know and we can make contact.
Second apology: I'm also a novice Python user, and I am adapting what I use in R to Python in the above (it's not subtle either :) )

importing random words from a file without duplicates Python

I'm attempting to create a program which selects 10 words from a text file that contains 10+ words. When importing these 10 words from the text file, I must not import the same word twice! Currently I'm using a list for this, but the same words still appear more than once. I have some knowledge of sets and know they cannot hold the same value twice. As of now I'm clueless about how to solve this, so any help would be much appreciated. THANKS!
Please find the relevant code below! (P.S. FileSelection is basically an open-file dialog.)
def GameStage03_E():
    global WordList
    if WrdCount >= 10:
        WordList = []
        for n in range(0,10):
            FileLines = open(FileSelection).read().splitlines()
            RandWrd = random.choice(FileLines)
            WordList.append(RandWrd)
        SelectButton.destroy()
        GameStage01Button.destroy()
        GameStage04_E()
    elif WrdCount <= 10:
        tkinter.messagebox.showinfo("ERROR", " Insufficient Amount Of Words Within Your Text File! ")

Make WordList a set:
WordList = set()
Then update that set instead of appending:
WordList.update(set([RandWrd]))
Of course WordList would be a bad name for a set.
There are a few other problems though:
Don't use uppercase names for variables and functions (follow PEP8)
What happens if you draw the same word twice in your loop? There is no guarantee that WordList will contain 10 items after the loop completes, if words may appear multiple times.
The latter might be addressed by changing your loop to:
while len(WordList) < 10:
    FileLines = open(FileSelection).read().splitlines()
    RandWrd = random.choice(FileLines)
    WordList.update(set([RandWrd]))
You would have to account for the case that there don't exist 10 distinct words after all, though.
Even then the loop would still be quite inefficient as you might draw the same word over and over and over again with random.choice(FileLines). But maybe you can base something useful off of that.
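One way to avoid redrawing altogether is to remove duplicates first and then let random.sample pick without replacement. This is a sketch rather than a drop-in replacement: pick_distinct_words is a hypothetical helper, and it expects the file's lines to have been read already.

```python
import random


def pick_distinct_words(lines, k=10):
    """Return k distinct words chosen at random from lines."""
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # blank lines are ignored
    distinct = list(dict.fromkeys(
        line.strip() for line in lines if line.strip()
    ))
    if len(distinct) < k:
        raise ValueError(
            "need at least {} distinct words, found {}".format(k, len(distinct))
        )
    # random.sample draws without replacement, so no word repeats
    return random.sample(distinct, k)


if __name__ == "__main__":
    words = ["word{}".format(i) for i in range(12)] + ["word0", "word1"]
    print(pick_distinct_words(words))
```

This reads the file once instead of once per draw, and the ValueError covers the case where the file does not contain 10 distinct words after all.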

Not sure I understand you right, but:
Line 3: "if WrdCount" ... where did you give WrdCount a value?
Maybe you intend something along the lines below?
wordset = set()  # {} would create an empty dict, not a set
while len(wordset) < 10:
    # do some work to add a word to the set here
    # at end-of-file: break
    break  # placeholder so the sketch runs

Using a dictionary as regex in Python

I had a Python question I was hoping for some help on.
Let's start with the important part, here is my current code:
import re #for regex
import numpy as np #for matrix
f1 = open('file-to-analyze.txt','r') #file to analyze
#convert files of words into arrays.
#These words are used to be matched against in the "file-to-analyze"
math = open('sample_math.txt','r')
matharray = list(math.read().split())
math.close()
logic = open('sample_logic.txt','r')
logicarray = list(logic.read().split())
logic.close()
priv = open ('sample_priv.txt','r')
privarray = list(priv.read().split())
priv.close()
... Read in 5 more files and make associated arrays
#convert arrays into dictionaries
math_dict = dict()
math_dict.update(dict.fromkeys(matharray,0))
logic_dict = dict()
logic_dict.update(dict.fromkeys(logicarray,1))
...Make more dictionaries from the arrays (8 total dictionaries - the same number as there are arrays)
#create big dictionary of all keys
word_set = dict(math_dict.items() + logic_dict.items() + priv_dict.items() ... )
statelist = list()
for line in f1:
    for word in word_set:
        for m in re.finditer(word, line):
            print word.value()
The goal of the program is to take a large text file and perform analysis on it. Essentially, I want the program to loop through the text file and match words found in Python dictionaries and associate them with a category and keep track of it in a list.
So for example, let's say I was parsing through the file and I ran across the word "ADD". ADD is listed under the "math" or '0' category of words. The program should then add it to a list that it ran across a 0 category and then continue to parse the file. Essentially generating a large list that looks like [0,4,6,7,4,3,4,1,2,7,1,2,2,2,4...] with each of the numbers corresponding to a particular state or category of words as illustrated above. For the sake of understanding, we'll call this large list 'statelist'
As you can tell from my code, so far I can take as input the file to analyze, and read the text files that contain the lists of words into arrays and from there into dictionaries with their correct corresponding category value (a numerical value from 0 to 7). However, I'm having trouble with the analysis portion.
As you can tell from my code, I'm trying to go line by line through the text file and regex any of the found words with the dictionaries. This is done through a loop and regexing with an additional, 9th dictionary that is more or less a "super" dictionary to help simplify the parsing.
However, I'm having trouble matching all the words in the file and, when I find a word, matching it to the dictionary value rather than the key. That is, when the program runs across an "ADD", it should add 0 to the list, because ADD is part of the 0 or "math" category.
Would someone be able to help me figure out how to write this script? I really appreciate it! Sorry for the long post, but the code requires a lot of explanation so you know what's going on. Thank you so much in advance for your help!

The simplest change to your existing code would be to keep track of both the word and the category in the loop:
for line in f1:
    for word, category in word_set.iteritems():
        for m in re.finditer(word, line):
            print word, category
            statelist.append(category)
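The code in this thread is Python 2 (print statement, iteritems). For Python 3, a comparable sketch might compile all the words into one pattern and look each match up in the dictionary. The names build_statelist and word_categories are illustrative. Unlike the loop above, it uses re.escape and word boundaries, so "ADD" does not match inside "ADDRESS", and it appends categories in the order the words appear in the text.

```python
import re


def build_statelist(lines, word_categories):
    """Map each known word found in lines to its category, in text order."""
    if not word_categories:
        return []
    # Longest words first, so longer alternatives are tried before shorter ones
    words = sorted(word_categories, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in words) + r")\b"
    )
    statelist = []
    for line in lines:
        for match in pattern.finditer(line):
            statelist.append(word_categories[match.group(1)])
    return statelist


if __name__ == "__main__":
    categories = {"ADD": 0, "SUB": 0, "AND": 1, "OR": 1}
    print(build_statelist(["ADD x AND y", "OR z SUB ADD"], categories))
```

A single compiled pattern scans each line once instead of once per word, which should matter for a large input file.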
