Counting word frequency with a Python list

Today I was trying to write code that returns the number of times each word is repeated in a text (the contents of a .txt file). Before using a dictionary, I first wanted to test whether the list works and the words get appended to it, so I wrote this code:
def word_frequency(file):
    """Returns the frequency of all the words in a txt file"""
    with open(file) as f:
        arg = f.readlines()
    l = []
    for line in arg:
        l = line.split(' ')
        return l
After I gave it the file path and pressed F5, this happened:
In[18]: word_frequency("C:/Users/ASUS/Desktop/Workspace/New folder/tesst.txt")
Out[18]: ['Hello', 'Hello', 'Hello\n']
At first you may think there is no problem with this output, but the text in the txt file is:
As you can see, it only appends the words of the first line to the list, but I want all the words in the txt file to be appended to the list.
Does anyone know what I have to do? What is the problem here?

You should add each line's words to the main list, and return the list only after the loop has finished.
def word_frequency(file):
    with open(file) as f:
        lines = f.readlines()
    words = []
    for line in lines:
        line_words = line.split()
        words += line_words
    return words
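In your code, return is inside the for loop. return terminates the execution of the function and hands back a value, so the function returns right after the first iteration, which in your case means only the first line of the file. (Also, l = line.split(' ') replaces the list on every iteration instead of adding to it.)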
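Once the list really contains every word, the original goal (how often each word occurs) is one small step further. The following is a minimal sketch with a plain dictionary, in Python 3; `count_words` is a hypothetical helper name, and it takes the file's text rather than a path so it is easy to try out:

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to its count."""
    counts = {}
    for line in text.splitlines():
        for word in line.split():
            # dict.get returns 0 for a word that has not been seen yet
            counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words("Hello Hello Hello\nHello world\n"))
```

With a real file you would pass it `f.read()` from inside a `with open(file) as f:` block. Matching is exact, so "Hello" and "hello" are counted separately unless you lower-case the words first.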

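One answer is from https://www.pythonforbeginners.com/lists/count-the-frequency-of-elements-in-a-list#:~:text=Count%20frequency%20of%20elements%20in%20a%20list%20using,the%20frequency%20of%20the%20element%20in%20the%20list.
Note that the words must be collected with extend() (or +=), not append(): appending each line's list produces a list of lists, and Counter then fails because lists are unhashable.
import collections

def word_frequency(file):
    with open(file) as f:
        lines = f.readlines()
    words = []
    for line in lines:
        # extend() keeps words a flat list of strings
        words.extend(line.split())
    frequencyDict = collections.Counter(words)
    print("Input list is:", words)
    print("Frequency of elements is:")
    print(frequencyDict)
    return frequencyDict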
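Because `collections.Counter` accepts any iterable of hashable items, the intermediate list can be skipped by splitting the whole text at once. The sketch below (Python 3) also uses `Counter.most_common()` to list the most frequent words; `top_words` and its default `n=3` are hypothetical choices for illustration:

```python
from collections import Counter


def top_words(text, n=3):
    """Return the n most common words in text as (word, count) pairs."""
    # str.split() with no argument splits on any whitespace, including '\n',
    # so words at the end of a line do not keep a trailing newline
    return Counter(text.split()).most_common(n)


print(top_words("the cat and the hat and the bat"))
```

For a file, `top_words(f.read())` inside a `with open(file) as f:` block would do the same; words with equal counts come out in the order they were first encountered.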

Related

How to loop through two lists and append key-value pairs?

I'm trying to loop through a text file and create a dict that holds dict[line_index] = word_index_position, meaning the key is the line number and the value is all the words in that line. The goal is to create a "matrix" so that the user can later specify x,y coordinates (line, word_index_position) and retrieve the word at those coordinates, if there is one (I'm not sure how that will work with a dict, since it's not ordered). Below is the loop that creates the dict.
try:
    f = open("file.txt", "r")
except Exception as e:
    print("Enter a valid file name")
uppslag = dict()
num_lines = 0
for line in f.readlines():
    num_lines += 1
    print(line)
    for word in line.split():
        print(num_lines)
        print(word)
        uppslag[num_lines] = word
f.close()
uppslag
The loop works as it's supposed to, but uppslag[num_lines] = word seems to store only the last word of each line. Any guidance would be highly appreciated.
Many thanks,
Instead of overwriting the word:
for word in line.split():
    print(num_lines)
    print(word)
    uppslag[num_lines] = word
you may be better off saving the whole line:
uppslag[num_lines] = line.split()
This way you'll be able to get the 3rd word on the 4th line (num_lines starts at 1, but list indices start at 0) with:
uppslag[4][2]
uppslag[num_lines] = word overwrites the dictionary entry for key num_lines every time it runs. You can use a list to hold all the words:
for line in f:
    num_lines += 1
    print(line)
    uppslag[num_lines] = []  # initialize the dictionary entry with an empty list
    for word in line.split():
        print(num_lines, word)
        uppslag[num_lines].append(word)  # add the new word to the list
You can write the same code more compactly, since line.split() already returns a list (start=1 keeps the line numbers starting at 1, as num_lines did):
for line_number, line in enumerate(f, start=1):
    uppslag[line_number] = line.split()
If there is a word on every line (i.e. the line indices are continuous), you can use a list instead of a dictionary and reduce your code to a one-line list comprehension (list indices start at 0, so line 1 is uppslag[0]):
uppslag = [line.split() for line in f]
There is no need for a dictionary, or .readlines().
with open("file.txt") as words_file:
    words = [line.split() for line in words_file]
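The question also asks how to retrieve the word at given (line, word) coordinates "if there is any". With the list-of-lists approach, a small bounds-checked lookup covers that. This is a sketch in Python 3; `word_at` is a hypothetical helper, and it assumes both coordinates are counted from 1, like num_lines in the question:

```python
def word_at(lines, line_no, word_no):
    """Return the word at (line_no, word_no), both counted from 1, or None."""
    if 1 <= line_no <= len(lines):
        words = lines[line_no - 1]
        if 1 <= word_no <= len(words):
            return words[word_no - 1]
    # out-of-range coordinates (or a blank line) mean there is no word there
    return None


# In practice: lines = [line.split() for line in words_file]
lines = [line.split() for line in "the itsy bitsy\nspider\n\nwent up".splitlines()]
print(word_at(lines, 1, 2))
print(word_at(lines, 3, 1))
```

The first call prints `itsy`; the second prints `None`, because line 3 is blank. Checking the bounds explicitly avoids an IndexError and also rejects 0 or negative coordinates, which plain indexing would silently accept.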

How to search for words in a txt file with Python

How can I show the words in a text file whose length is 20?
I know I can list all the words with the following code:
# Program for finding the words of length 20 in the words.txt file
def main():
    file = open("words.txt", "r")
    lines = file.readlines()
    file.close()
    for line in lines:
        print(line)
    return

main()
But I'm not sure how to show only the words with 20 letters.
Many thanks
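If your file contains lines of text rather than a single word per line, you first have to split each line, which returns a list of its words (split() with no argument also drops the trailing newline, which would otherwise count toward the last word's length):
words = line.split()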
Then you can iterate over each word in this list and check whether its length is 20.
for word in words:
    if len(word) == 20:
        print(word)  # do what you want to do here
If each line has a single word, you can operate on line directly and skip the for loop, though you may need to strip the trailing end-of-line character first: word = line.strip('\n'). If you just want to collect all the matching words, you can do this:
words_of_length_20 = []
for word in words:
    if len(word) == 20:
        words_of_length_20.append(word)
If your file has only one word per line, and you want only the words with 20 letters, you can simply use:
with open("words.txt", "r") as f:
    words = f.read().splitlines()
found = [x for x in words if len(x) == 20]
You can then print the list, or print each word separately.
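You can try this (the comprehension must iterate over new_file: f.read() has already consumed the file, so iterating over f would yield nothing):
f = open('file.txt')
new_file = f.read().splitlines()
words = [i for i in new_file if len(i) == 20]
f.close()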
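Splitting the whole text with `split()` handles both file layouts at once, since it treats spaces and newlines alike. A minimal Python 3 sketch; `words_of_length` and its `n=20` default are hypothetical names for illustration:

```python
def words_of_length(text, n=20):
    """Return the words in text that are exactly n characters long.

    split() with no argument splits on spaces and newlines alike, so this
    works whether the file has one word per line or many words per line.
    """
    return [word for word in text.split() if len(word) == n]


sample = "counterrevolutionary\nshort words here\ninternationalization\n"
print(words_of_length(sample))
```

With the question's file this would be `words_of_length(f.read())` inside `with open("words.txt") as f:`. Punctuation attached to a word (for example a trailing comma) counts toward its length, so strip it first if the file contains prose.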

Split string within list into words in Python

I'm a newbie in Python, and I need to write code that reads a text file, splits it into words, sorts them, and prints them out.
Here is the code I wrote:
fname = raw_input("Enter file name: ")
fh = open(fname)
lst = list()
words = list()
for line in fh:
    line = line.strip()
    line.split()
    lst.append(line)
lst.sort()
print lst
That's my output -
['Arise fair sun and kill the envious moon', 'But soft what light through yonder window breaks', 'It is the east and Juliet is the sun', 'Who is already sick and pale with grief', 'with', 'yonder']
However, when I try to split it with lst.split(), it says that a list object has no attribute split.
Please help!
You should extend the new list with the split line, rather than trying to split the strings after appending them:
for line in fh:
    line = line.strip()
    lst.extend(line.split())
The issue is that split() does not magically turn the string into a list in place. Strings are immutable, so split() returns a new list, and you have to do something with that return value.
for line in fh:
    # line.split()  # expression has no effect
    line = line.split()  # assignment does
    # lst += line  # shortcut for the loop underneath
    for token in line:
        lst = lst + [token]  # or, equivalently, lst += [token]
The above is a solution that uses a nested loop and avoids append and extend. However, the whole line-by-line splitting and sorting can be done very concisely with a nested generator expression:
print sorted(word for line in fh for word in line.strip().split())
You can do:
fname = raw_input("Enter file name: ")
fh = open(fname, "r")
lines = list()
words = list()
for line in fh:
    # get a list of the words on this line
    words = line.split()
    for w in words:
        lines.append(w)
lines.sort()
print lines
To avoid duplicates:
no_dups_list = list()
for w in lines:
    if w not in no_dups_list:
        no_dups_list.append(w)
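The `if w not in no_dups_list` check scans the whole list for every word, which gets slow on long texts. A set removes duplicates directly, and `sorted()` turns it back into a sorted list. A sketch in Python 3; `sorted_unique_words` is a hypothetical name, and in the question `lines` would be the open file `fh`:

```python
def sorted_unique_words(lines):
    """Split every line into words and return them sorted, without duplicates."""
    unique = set()
    for line in lines:
        # update() adds every word from this line's list to the set
        unique.update(line.split())
    return sorted(unique)


romeo = ["But soft what light through yonder window breaks",
         "It is the east and Juliet is the sun"]
print(sorted_unique_words(romeo))
```

Note that `sorted()` compares strings by code point, so capitalized words such as "But" and "Juliet" sort before all lower-case words; pass `key=str.lower` if you want a case-insensitive order.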

How to read a text file in Python

I have an assignment that reads:
Write a function which takes the input file name and a list of words,
and writes into the file “Repeated_word.txt” each word and the number of
times it is repeated in the input file.
word_list = ['Emma', 'Woodhouse', 'father', 'Taylor', 'Miss', 'been', 'she', 'her']
My code is below.
All it does is create the new file 'Repeated_word.txt'; it doesn't write the number of times each word from word_list appears in the file.
# obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
# create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]

def repeatedWords(fin, word_list):
    # open the file
    fin = open(filename, "r")
    # create output file
    fout = open("Repeated_word.txt", "w")
    # loop through each word of the file
    for line in fin:
        # split the lines into words
        words = line.split()
        for word in words:
            # check if word in words is equal to a word from word_list
            for i in range(len(word_list)):
                if word == i:
                    # count number of times word is in word
                    count = words.count(word)
                    fout.write(word, count)
    fout.close

repeatedWords(fin, word_list)
These lines,
for i in range(len(word_list)):
    if word == i:
should be
for i in range(len(word_list)):
    if word == word_list[i]:
or
for i in word_list:
    if word == i:
The way you have it right now, word is a string whereas i is an integer. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do that:
words = fin.read().split()
for word in word_list:
    # write() takes a single string, so build the whole line first
    fout.write("{} {}\n".format(word, words.count(word)))
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
Seems like you are making several mistakes here:
[1] for i in range(len(word_list)):
[2]     if word == i:
[3]         #count number of times word is in word
[4]         count = words.count(word)
[5]         fout.write(word, count)
First, you are comparing the word from fin with an integer from the range [line 2].
Then you are writing the count to fout on every match, line by line [line 5]. I guess you should keep the counts (e.g. in a dict) and write them all at the end, after parsing the input file. Note also that fout.write() accepts a single string, not a word and a count.
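Following the suggestion to keep the counts in a dict and write them once at the end, here is a minimal Python 3 sketch. The names `count_listed_words` and `repeated_words` are hypothetical; words are matched exactly as produced by `split()`, so "Emma," (with a comma) or "Her" (capitalized) would not match, which is an assumption you may want to relax:

```python
def count_listed_words(text, word_list):
    """Count how many times each word in word_list occurs in text."""
    # Start every listed word at 0 so words that never appear are still reported
    counts = {word: 0 for word in word_list}
    for word in text.split():
        if word in counts:
            counts[word] += 1
    return counts


def repeated_words(filename, word_list, out_name="Repeated_word.txt"):
    """Read filename once, then write one 'word count' line per listed word."""
    with open(filename) as fin:
        counts = count_listed_words(fin.read(), word_list)
    with open(out_name, "w") as fout:
        for word, count in counts.items():
            # write() takes a single string, so format the line first
            fout.write("{} {}\n".format(word, count))
    return counts


word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
print(count_listed_words("Emma Woodhouse and her father Emma said she had been ill", word_list))
```

Separating the counting from the file handling means the counting logic can be checked on a plain string before pointing `repeated_words` at the real input file.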

I have a txt file. How can I take dictionary key values and print the line of text they appear in?

I have a txt file. I have written code that finds the unique words and the number of times each word appears in that file. I now need to figure out how to print the lines those words appear in as well. How can I go about doing this?
Here is a sample output:
Analyze what file: itsy_bitsy_spider.txt
Concordance for file itsy_bitsy_spider.txt
itsy : Total Count: 2
Line:1: The ITSY Bitsy spider crawled up the water spout
Line:4: and the ITSY Bitsy spider went up the spout again
# this function will get just the unique words without the stop words.
def openFiles(openFile):
    for i in openFile:
        i = i.strip()
        linelist.append(i)
        b = i.lower()
        thislist = b.split()
        for a in thislist:
            if a in stopwords:
                continue
            else:
                wordlist.append(a)
    #print wordlist

# this dictionary is used to count the number of times each stop
countdict = {}
def countWords(this_list):
    for word in this_list:
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1
from collections import defaultdict

target = 'itsy'
word_summary = defaultdict(list)
with open('itsy.txt', 'r') as f:
    lines = f.readlines()
for idx, line in enumerate(lines):
    words = [w.strip().lower() for w in line.split()]
    for word in words:
        word_summary[word].append(idx)
unique_words = len(word_summary.keys())
target_occurence = len(word_summary[target])
# sorted() because a set has no guaranteed iteration order
line_nums = sorted(set(word_summary[target]))
print "There are %s unique words." % unique_words
print "There are %s occurrences of '%s'" % (target_occurence, target)
print "'%s' is found on lines %s" % (target, ', '.join([str(i+1) for i in line_nums]))
If you parse the input text file line by line, you could maintain another dictionary that is a word -> List<Line> mapping, i.e. for each word in a line, you add an entry. It might look something like the following; bear in mind I'm not very familiar with Python, so there may be syntactic shortcuts I've missed.
from string import punctuation

countdict = {}
linedict = {}
for line in text_file:  # text_file is the open input file
    for word in line.split():  # split() so we loop over words, not characters
        depunct = word.strip(punctuation)
        if depunct in countdict:
            countdict[depunct] += 1
        else:
            countdict[depunct] = 1
        # add entry for word in the line dict if not there already
        if depunct not in linedict:
            linedict[depunct] = []
        # now add the word -> line entry
        linedict[depunct].append(line)
One modification you will probably need to make is to prevent duplicates from being added to linedict if a word appears twice on the same line.
The above code assumes that you only want to read the text file once.
openFile = open("test.txt", "r")
words = {}
for line in openFile.readlines():
    for word in line.strip().lower().split():
        wordDict = words.setdefault(word, {'count': 0, 'line': set()})
        wordDict['count'] += 1
        wordDict['line'].add(line)
openFile.close()
print words
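The answers above can be combined into one index from each word to the numbers of the lines it appears on: the total count is the length of that list, and the distinct line numbers give the lines to print in the question's sample format. A Python 3 sketch under stated assumptions: words are lower-cased and stripped with `string.punctuation` (as the question's `countWords` does), stop words are not filtered, and `concordance`/`print_entry` are hypothetical names:

```python
from collections import defaultdict
from string import punctuation


def concordance(lines):
    """Map each lower-cased word to the 1-based numbers of the lines it appears on."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        for word in line.lower().split():
            word = word.strip(punctuation)
            if word:  # skip tokens that were pure punctuation
                index[word].append(line_no)
    return index


def print_entry(word, lines, index):
    """Print the total count, then each line containing the word once."""
    line_numbers = index.get(word, [])
    print("{} : Total Count: {}".format(word, len(line_numbers)))
    # sorted(set(...)) prints each line once, even if the word occurs twice on it
    for line_no in sorted(set(line_numbers)):
        print("Line:{}: {}".format(line_no, lines[line_no - 1].rstrip("\n")))


# Sample input: the two lines shown in the question's sample output
text = [
    "The ITSY Bitsy spider crawled up the water spout\n",
    "and the ITSY Bitsy spider went up the spout again\n",
]
idx = concordance(text)
print_entry("itsy", text, idx)
```

With a real file, `text = f.readlines()` would take the place of the sample list. Keeping line numbers rather than the line strings themselves avoids storing each line many times and keeps the output in file order.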
