Getting the count of a certain word in a txt file in Python?

I'm trying to get the number of occurrences of a certain word in a txt file.
I've tried this, but it's not working due to "AttributeError: 'list' object has no attribute 'split'":
words = 0
for wordcount in textfile.readlines().split(":"):
    if wordcount == event.getPlayer().getName():
        words += 1
Is there any easier or less complicated way to do this?
Here's my text file:
b2:PlayerName:Location{world=CraftWorld{name=world},x=224.23016231506807,y=71.0,z=190.2291303186236,pitch=31.349741,yaw=-333.30002}
What I want is to search for "PlayerName", which is the player's name; if the player has 5 entries (that is, if the word "PlayerName" has been written to the file five times), it will add +5 to words.
P.S. I'm not sure if this is good for security, because it's a multiplayer game, so there could be many nicknames starting with "PlayerName", such as "PlayerName1337" or whatever. Will this cause a problem?

This should work:
words = 0
for wordcount in textfile.read().split(":"):
    if wordcount == event.getPlayer().getName():
        words += 1
Here's the difference: .readlines() produces a list, while .read() produces a single string that you can split into a list.
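A quick way to see this (the file name here is just a placeholder):
with open("stats.txt") as textfile:
    print(type(textfile.readlines()))   # <class 'list'> - one string per line
with open("stats.txt") as textfile:
    print(type(textfile.read()))        # <class 'str'> - the whole file at once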
Better approach that won't count wrong things:
words = 0
for line in textfile.readlines():
    # I assume that player name position is fixed
    word = line.split(':')[1]
    if word == event.getPlayer().getName():
        words += 1
And yes, there is a security concern if there are players with the same name or with : in their names.
The problem with equal names is that your code doesn't know which player a line belongs to.
If there is a colon in a player's name, your code will also split on it.
I urge you to assign some sort of unique, immutable identifier to every player, and to use a database instead of text files; it will handle all of this for you.
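For example, a minimal sqlite3 sketch of that idea (the table name, columns, and file names are my own invention, not from the question):
import sqlite3

# One row per event, keyed by a unique player id (e.g. the UUID the
# server assigns), so duplicate display names and colons inside names
# can no longer corrupt the data.
conn = sqlite3.connect("players.db")
conn.execute("CREATE TABLE IF NOT EXISTS events (player_id TEXT, location TEXT)")

def record_event(player_id, location):
    conn.execute("INSERT INTO events VALUES (?, ?)", (player_id, location))
    conn.commit()

def count_events(player_id):
    # exact match on the id - no substring false positives
    row = conn.execute("SELECT COUNT(*) FROM events WHERE player_id = ?",
                       (player_id,)).fetchone()
    return row[0]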

There is an even easier way if you want to count multiple names at once: use Counter from the collections module.
from collections import Counter
# split each line on ':' and count the name field (position 1)
counter = Counter(line.split(':')[1] for line in textfile.readlines())
Counter will behave like a dict, so you will count all the names at once and if you need to, you can efficiently look up the count for more than one name.
At the moment your script counts only one name at a time per loop. You can access the count like so:
counter[event.getPlayer().getName()]
I bet you will eventually want to count more than one name. If you do, you should avoid reading the textfile more than once.

You can find how many times a word occurs in a string with str.count:
words = textfile.read().count('PlayerName')
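Note that count('PlayerName') also matches longer names such as 'PlayerName1337', which is exactly the concern raised in the question. Since the fields in the sample line are colon-delimited, counting the delimited form avoids that (assuming the name is always flanked by colons):
words = textfile.read().count(':PlayerName:')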

Related

Trying to read text file and count words within defined groups

I'm a novice Python user. I'm trying to create a program that reads a text file and searches that text for certain words that are grouped (and that I predefine by reading from a csv). For example, if I wanted to create my own definition for "positive" containing the words "excited", "happy", and "optimistic", the csv would contain those terms. I know the below is messy: the txt file I am reading from contains 7 occurrences of the three "positive" tester words I read from the csv, yet the result prints out as 25. I think it's returning a character count, not a word count. Code:
import csv
import string
import re
from collections import Counter

remove = dict.fromkeys(map(ord, '\n' + string.punctuation))

# Read the .txt file to analyze.
with open("test.txt", "r") as f:
    textanalysis = f.read()
    textresult = textanalysis.lower().translate(remove).split()

# Read the CSV list of terms.
with open("positivetest.csv", "r") as senti_file:
    reader = csv.reader(senti_file)
    positivelist = list(reader)

# Convert term list into flat chain.
from itertools import chain
newposlist = list(chain.from_iterable(positivelist))

# Convert chain list into string.
posstring = ' '.join(str(e) for e in newposlist)
posstring2 = posstring.split(' ')
posstring3 = ', '.join('"{}"'.format(word) for word in posstring2)

# Count number of words as defined in list category
def positive(str):
    counts = dict()
    for word in posstring3:
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    total = sum(counts.values())
    return total

# Print result; will write to CSV eventually
print("Positive: ", positive(textresult))
I'm a beginner as well but I stumbled upon a process that might help. After you read in the file, split the text at every space, tab, and newline. In your case, I would keep all the words lowercase and include punctuation in your split call. Save this as an array and then parse it with some sort of loop to get the number of instances of each 'positive,' or other, word.
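A minimal sketch of that approach (the file name and word list are placeholders; the real list would come from the csv):
import string
from collections import Counter

positive_words = {"excited", "happy", "optimistic"}

with open("test.txt") as f:
    text = f.read().lower()

# strip punctuation, then split on spaces, tabs and newlines
words = text.translate(str.maketrans("", "", string.punctuation)).split()

counts = Counter(w for w in words if w in positive_words)
print("Positive:", sum(counts.values()))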
Look at this, specifically the "train" function:
https://github.com/G3Kappa/Adjustable-Markov-Chains/blob/master/markovchain.py
Also this link; ignore the JSON stuff at the beginning, as the article talks about sentiment analysis:
https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-
The same applies to this link:
http://adilmoujahid.com/posts/2014/07/twitter-analytics/
Good luck!
I looked at your code and passed through some of my own as a sample.
I have two ideas for you, based on what I think you may want.
First Assumption: You want a basic sentiment count?
Getting to [textresult] is great, and you did the same with the positive lexicon to get [positivelist], which seemed like the perfect move. But then you converted [positivelist] into what is essentially one big sentence.
Would you not just:
1. Pass a stop_words list over [textresult]
2. Merge the two word lists, [textresult] (less stop words) and [positivelist], on common words, as in an inner join
3. Then basically do your term frequency
4. It is much easier to aggregate the score then (see the sketch below)
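If that first idea is what you want, here is a rough sketch of those steps using plain sets instead of dataframes (it reuses textresult and newposlist from your code; the stop-word list is a placeholder):
from collections import Counter

stop_words = {"the", "a", "and", "of"}   # placeholder stop-word list
positive_set = set(newposlist)           # the flattened csv terms

# steps 1-2: drop stop words, keep only words also in the lexicon
matched = [w for w in textresult if w not in stop_words and w in positive_set]

# steps 3-4: term frequency, then aggregate the score
freq = Counter(matched)
print("Positive:", sum(freq.values()))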
Second assumption: you are focusing on "excited", "happy", and "optimistic", and you are trying to isolate text themes into those 3 categories?
1. Again, stop at [textresult]
2. Download the 'nrc' and/or 'syuzhet' emotional valence dictionaries; they break emotive words down into 8 emotional groups, so you can subset just the 3 groups you want
3. Process it like you did to get [positivelist]
4. Do another join
Sorry, this is a bit hashed up, but if I was anywhere near what you were thinking, let me know and we can make contact.
Second apology: I'm also a novice Python user; in the above I am adapting what I use in R to Python (and it's not subtle either :) ).

importing random words from a file without duplicates Python

I'm attempting to create a program which selects 10 words from a text file that contains 10+ words. For the purposes of the program, I must not import the same word twice! Currently I'm utilising a list for this; however, the same words seem to appear. I have some knowledge of sets and know they cannot hold the same value twice, but as of now I'm clueless on how to solve this. Any help would be much appreciated. THANKS!
Please find the relevant code below! (P.S. FileSelection is basically an open-file dialog.)
def GameStage03_E():
    global WordList
    if WrdCount >= 10:
        WordList = []
        for n in range(0, 10):
            FileLines = open(FileSelection).read().splitlines()
            RandWrd = random.choice(FileLines)
            WordList.append(RandWrd)
        SelectButton.destroy()
        GameStage01Button.destroy()
        GameStage04_E()
    elif WrdCount <= 10:
        tkinter.messagebox.showinfo("ERROR", " Insufficient Amount Of Words Within Your Text File! ")
Make WordList a set:
WordList = set()
Then update that set instead of appending:
WordList.update(set([RandWrd]))
Of course WordList would be a bad name for a set.
There are a few other problems though:
Don't use uppercase names for variables and functions (follow PEP8)
What happens if you draw the same word twice in your loop? There is no guarantee that WordList will contain 10 items after the loop completes, if words may appear multiple times.
The latter might be addressed by changing your loop to:
while len(WordList) < 10:
    FileLines = open(FileSelection).read().splitlines()
    RandWrd = random.choice(FileLines)
    WordList.update(set([RandWrd]))
You would have to account for the case that there don't exist 10 distinct words after all, though.
Even then the loop would still be quite inefficient, as you might draw the same word over and over again with random.choice(FileLines). But maybe you can base something useful off of that.
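For what it's worth, a sketch of a simpler variant (assuming one word per line): de-duplicate first, then let random.sample draw 10 distinct words in one call, which also covers the case where the file has too few distinct words.
import random

def pick_words(filename, count=10):
    with open(filename) as f:
        unique_words = set(f.read().splitlines())
    if len(unique_words) < count:
        return None   # not enough distinct words in the file
    return random.sample(sorted(unique_words), count)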
Not sure I understand you right, but:
Line 3, "if WrdCount": where did you give WrdCount a value?
Maybe you intend something along the lines below:
wordset = set()
while len(wordset) < 10:
    # do some work to update the set here
    # when end-of-file is reached, break

Index error in loop to separate body of text by speaker in python

I've got a corpus of text which takes the following form:
JOHN: Thanks for coming, everyone!
(EVERYONE GRUMBLES)
ROGER: They're really glad to see you, huh?
DAVIS: They're glad to see the both of you.
In order to analyze the text, I want to divide it into chunks, by speaker. I want to retain John and Roger, but not Davis. I also want to find the number of times the certain phrases like (EVERYONE GRUMBLES) occur during each person's speech.
My first thought was to use NLTK, so I imported it and used the following code to remove all the punctuation and tokenize the text, so that each word within the corpus becomes an individual token:
import nltk
from nltk.tokenize import RegexpTokenizer

f = open("text.txt")
raw_t = f.read()
tokenizer = RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(raw_t.decode('utf-8'))
text = nltk.Text(tokens)
Then, I thought that I could create a global list, within which I would include all of the instances of John and Roger speaking.
I figured that I'd first see if each word in the text corpus was upper case and in the list of acceptable names, and if it was, I'd examine every subsequent word until the next incidence of a term that was both upper case and was found in the list of acceptable names. I'd then add all the words from the initial instance of a speaker's name, through to one word less than the next speaker's name, and add this series of tokens/words to my global list.
I've written:
k = 0
i = 0
j = 1
names = ["JOHN", "ROGER"]
global_list = []
for i in range(len(text)):
    if (text[i].isupper() and text[i] in names):
        for j in range(len(text)-i):
            if (text[i+j].isupper() and text[i+j] in names):
                global_list[k] = text[i:(j-1)]
                k += 1
            else: j += 1
    else: i += 1
Unfortunately, this doesn't work, and I get the following index error:
IndexError Traceback (most recent call last)
<ipython-input-49-97de0c68b674> in <module>()
6 for j in range(len(text)-i):
7 if (text[i+j].isupper() and text[i+j] in names):
----> 8 list_speeches[k] = text[i:(j-1)]
9 k+=1
10 else: j+=1
IndexError: list assignment index out of range
I feel like I'm screwing up something really basic here, but I'm not exactly sure why I'm getting this index error. Can anyone shed some light on this?
Break the text into paragraphs with re.split(r"\n\s*\n", text), then examine the first word of each paragraph to see who is speaking. And don't worry about NLTK; you haven't used it yet, and you don't need to.
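A sketch of that suggestion, assuming each turn is its own paragraph and that a stage direction belongs to whoever spoke last:
import re

names = {"JOHN", "ROGER"}
speeches = {name: [] for name in names}   # kept chunks per speaker
current = None                            # whoever spoke last

with open("text.txt") as f:
    text = f.read()

for para in re.split(r"\n\s*\n", text):
    para = para.strip()
    speaker = para.split(":", 1)[0]
    if speaker in names:
        current = speaker
        speeches[current].append(para)
    elif para.startswith("(") and current is not None:
        speeches[current].append(para)    # e.g. (EVERYONE GRUMBLES)
    else:
        current = None                    # another speaker (DAVIS) - skip

grumbles = {name: sum(p.count("(EVERYONE GRUMBLES)") for p in chunks)
            for name, chunks in speeches.items()}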
Ok, figured this out after a bit of digging around. The initial loop mentioned in the question had a whole bunch of extraneous content, so I simplified it to:
names = ["JOHN", "ROGER"]
global_list = []
i = 0
for i in range(len(text)):
    if (text[i].isupper()) and (text[i] in names):
        j = 0
        while (text[i+j].islower()) and (text[i+j] not in names):
            j += 1
        global_list.append(text[i:(j-1)])
This generated a list, although, problematically, each item in this list was made up of words starting from the name through to the end of the document. Because each item began at the appropriate name while ending at the last word of the text corpus, it was easy to derive the length of each segment by subtracting the length of the following segment from it:
x = 1
new_list = range(len(global_list)-1)
for x in range(len(global_list)):
    if x == len(global_list):
        new_list[x-1] = global_list[x]
    else:
        new_list[x-1] = global_list[x][:(len(global_list[x])-len(global_list[x+1]))]
(x was set to 1 because the original code gave me the first speaker's content twice).
This wasn't in the least bit pretty, but it ended up working. If anyone's got a prettier way of doing it (and I'm sure one exists, since I think I've messed up the initial loop), I'd love to see it.

Using a dictionary as regex in Python

I had a Python question I was hoping for some help on.
Let's start with the important part, here is my current code:
import re  # for regex
import numpy as np  # for matrix

f1 = open('file-to-analyze.txt', 'r')  # file to analyze

# convert files of words into arrays.
# These words are used to be matched against in the "file-to-analyze"
math = open('sample_math.txt', 'r')
matharray = list(math.read().split())
math.close()

logic = open('sample_logic.txt', 'r')
logicarray = list(logic.read().split())
logic.close()

priv = open('sample_priv.txt', 'r')
privarray = list(priv.read().split())
priv.close()

# ... Read in 5 more files and make associated arrays

# convert arrays into dictionaries
math_dict = dict()
math_dict.update(dict.fromkeys(matharray, 0))
logic_dict = dict()
logic_dict.update(dict.fromkeys(logicarray, 1))

# ... Make more dictionaries from the arrays (8 total dictionaries - the same number as there are arrays)

# create big dictionary of all keys
word_set = dict(math_dict.items() + logic_dict.items() + priv_dict.items() ... )

statelist = list()
for line in f1:
    for word in word_set:
        for m in re.finditer(word, line):
            print word.value()
The goal of the program is to take a large text file and perform analysis on it. Essentially, I want the program to loop through the text file, match words found in Python dictionaries, associate them with a category, and keep track of them in a list.
So for example, let's say I was parsing through the file and ran across the word "ADD". ADD is listed under the "math", or '0', category of words. The program should then record in a list that it ran across a 0-category word, and then continue to parse the file. Essentially it generates a large list that looks like [0,4,6,7,4,3,4,1,2,7,1,2,2,2,4...], with each of the numbers corresponding to a particular state or category of words as illustrated above. For the sake of understanding, we'll call this large list 'statelist'.
As you can tell from my code, so far I can take as input the file to analyze, take and store the text files that contain the list of words into arrays and from there into dictionaries with their correct corresponding list value (a numerical value from 1 - 7). However, I'm having trouble with the analysis portion.
As you can tell from my code, I'm trying to go line by line through the text file and regex any of the found words with the dictionaries. This is done through a loop and regexing with an additional, 9th dictionary that is more or less a "super" dictionary to help simplify the parsing.
However, I'm having trouble matching all the words in the file and, when I find a word, matching it to the dictionary value rather than the key. That is, when it runs across an "ADD", it should add 0 to the list, because ADD is part of the 0, or "math", category.
Would someone be able to help me figure out how to write this script? I really appreciate it! Sorry for the long post, but the code requires a lot of explanation so you know what's going on. Thank you so much in advance for your help!
The simplest change to your existing code would be to keep track of both the word and the category in the loop:
for line in f1:
    for word, category in word_set.iteritems():
        for m in re.finditer(word, line):
            print word, category
            statelist.append(category)
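One refinement to consider (my addition, not part of the answer above): re.finditer(word, line) matches substrings, so 'ADD' would also hit a longer word like 'ADDRESS'. Anchoring on word boundaries and escaping the key restricts it to whole-word matches:
for line in f1:
    for word, category in word_set.iteritems():
        for m in re.finditer(r'\b%s\b' % re.escape(word), line):
            statelist.append(category)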

can anyone help me with my spell check code? [closed]

This question is unlikely to help any future visitors; it is only relevant to a small geographic area, a specific moment in time, or an extraordinarily narrow situation that is not generally applicable to the worldwide audience of the internet. For help making this question more broadly applicable, visit the help center.
Closed 10 years ago.
This is what I have; comments describe what I'm trying to do.
There is a list of correctly spelt words in a text file (words.txt), and test text files as well, in which some words are spelt wrong; these are to be spell checked.
e.g. >>> spellCheck("test1.txt")
{'exercsie': 1, 'finised': 1}
from string import ascii_uppercase, ascii_lowercase

def spellCheck(textFileName):
    # Use the open method to open the words file.
    # Read the list of words into a list named wordsList
    # Close the file
    file = open("words.txt", "r")
    wordsList = file.readlines()
    file.close()
    # Open the file whose name was provided as the textFileName variable
    # Read the text from the file into a list called wordsToCheck
    # Close the file
    file = open(textFileName, "r")
    wordsToCheck = file.readlines()
    file.close()
    for i in range(0, len(wordsList)): wordsList[i] = wordsList[i].replace("\n", "")
    for i in range(0, len(wordsToCheck)): wordsToCheck[i] = wordsToCheck[i].replace("\n", "")
    # The next line creates the dictionary
    # This dictionary will have the word that has been spelt wrong as the key and the number of times it has been spelt wrong as the value
    spellingErrors = dict(wordsList)
    # Loop through the wordsToCheck list
    # Change the current word into lower case
    # If the current word does not exist in the wordsList then
    #   Check if the word already exists in the spellingErrors dictionary
    #   If it does not exist then add it to the dictionary with the initial value of 1.
    #   If it does exist in the dictionary then increase the value by 1
    # Return the dictionary
    char_low = ascii_lowercase
    char_up = ascii_uppercase
    for char in wordsToCheck[0]:
        if char in wordsToCheck[0] in char_up:
            result.append(char_low)
    for i in wordsToCheck[0]:
        if wordsToCheck[0] not in wordsList:
            if wordsToCheck[0] in dict(wordsList):
                dict(wordsList) + 1
            elif wordsToCheck[0] not in dict(wordsList):
                dict(wordsList) + wordsToCheck[0]
                dict(wordsList) + 1
    return dict(wordsList)
My code returns an error:
Traceback (most recent call last):
  File "", line 1, in <module>
    spellCheck("test1.txt")
  File "J:\python\SpellCheck(1).py", line 36, in spellCheck
    spellingErrors = dict(wordsList)
ValueError: dictionary update sequence element #0 has length 5; 2 is required
So can anyone help me?
I applied PEP 8 and rewrote the unpythonic code.
import collections

def spell_check(text_file_name):
    # dictionary for word counting
    spelling_errors = collections.defaultdict(int)
    # put all possible words in a set
    with open("words.txt") as words_file:
        word_pool = {word.strip().lower() for word in words_file}
    # check words
    with open(text_file_name) as text_file:
        for word in (word.strip().lower() for word in text_file):
            if word not in word_pool:
                spelling_errors[word] += 1
    return spelling_errors
You might want to read about the with statement and defaultdict.
Your code with the ascii_uppercase and ascii_lowercase screams: Read the tutorial and learn the basics. That code is a collection of "I don't know what I'm doing but I do it anyway.".
Some more explanations concerning your old code:
You use
char_low = ascii_lowercase
There is no need for char_low because you never manipulate that value. Just use the original ascii_lowercase. Then there is the following part of your code:
for char in wordsToCheck[0]:
    if char in wordsToCheck[0] in char_up:
        result.append(char_low)
I'm not quite sure what you are trying to do here. It seems that you want to convert the words in the list to lower case. In fact, if that code ran (which it doesn't), you would append the whole lower case alphabet to result for every upper case character of the word in the list. Nevertheless, you don't use result in the later code, so no harm is done. It would be easy to add a print wordsToCheck[0] before the loop or a print char in the loop to see what happens there.
The last part of the code is just a mess. You access just the first word in each list - maybe because you don't know what that list looks like. That is coding by trial and error. Try coding by knowledge instead.
You don't really know what a dict does and how to use it. I could explain it here but there is this wonderful tutorial at www.python.org that you might want to read first, especially the chapter dealing with dictionaries. If you study those explanations and still don't understand it feel free to come back with a new question concerning this.
I used a defaultdict instead of a standard dictionary because it makes life easier here. If you defined spelling_errors as a plain dict instead, part of my code would have to change to:
if word not in word_pool:
    if word not in spelling_errors:
        spelling_errors[word] = 1
    else:
        spelling_errors[word] += 1
BTW, the code I wrote runs for me without any problems. I get a dictionary with the missing words (lower case) as keys and a count of that word as the corresponding value.
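For reference, a usage sketch matching the expected output from the question (assuming words.txt and the test file each hold one word per line):
errors = spell_check("test1.txt")
print(dict(errors))   # e.g. {'exercsie': 1, 'finised': 1}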
