How to read a text file in Python

How to read a text file in Python - python

I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = [‘Emma’, ‘Woodhouse’, ‘father’, ‘Taylor’, ‘Miss’, ‘been’, ‘she’, ‘her’]
My code is below.
All it does is create the new file 'Repeated_word.txt' however it doesn't write the number of times the word from the wordlist appears in the file.
#obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
#create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
def repeatedWords(fin, word_list):
#open the file
fin = open(filename, "r")
#create output file
fout = open("Repeated_word.txt", "w")
#loop through each word of the file
for line in fin:
#split the lines into words
words = line.split()
for word in words:
#check if word in words is equal to a word from word_list
for i in range(len(word_list)):
if word == i:
#count number of times word is in word
count = words.count(word)
fout.write(word, count)
fout.close
repeatedWords(fin, word_list)

These lines,
for i in range(len(word_list)):
if word == i:
should be
for i in range(len(word_list)):
if word == word_list[i]:
or
for i in word_list:
if word == i:
word is a string, whereas i is an integer, the way you have it right now. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do that:
words = fin.read().split()
for word in word_list:
fout.write(word, words.count(word), '\n')
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.

Seems like you are making several mistakes here:
[1] for i in range(len(word_list)):
[2] if word == i:
[3] #count number of times word is in word
[4] count = words.count(word)
[5] fout.write(word, count)
First, you are comparing the word from cin with an integer from the range. [line 2]
Then you are writing the count to fout upon every match per line. [line 5] I guess you should keep the counts (e.g. in a dict) and write them all at the end of parsing input file.

Related

Counting word frequency by python list

Today i was trying to write a code to return the number of times a word is repeated in a text (the text that a txt file contains). at first , before i use a dictionary i wanted to test if the list is working and the words are appended into it so i wrote this code :
def word_frequency(file) :
"""Returns the frequency of all the words in a txt file"""
with open(file) as f :
arg = f.readlines()
l = []
for line in arg :
l = line.split(' ')
return l
After i gave it the file address and i pressed f5 this happened :
In[18]: word_frequency("C:/Users/ASUS/Desktop/Workspace/New folder/tesst.txt")
Out[18]: ['Hello', 'Hello', 'Hello\n']
At first you may think that there is no problem with this output but the text in the txt file is :
As you can see , it only appends the words of the first line to the list but i want all the words that are in the txt file to be appended to the list.
Does anyone know what i have to do? what is the problem here ?

You should save the words in the main list before returning the list.
def word_frequency(file):
with open(file) as f:
lines = f.readlines()
words = []
for line in lines:
line_words = line.split()
words += line_words
return words
In your code, you are saving and returning only the first line, return terminates the execution of the function and returns a value. Which in your case is just the first line of the file.

One answer is from https://www.pythonforbeginners.com/lists/count-the-frequency-of-elements-in-a-list#:~:text=Count%20frequency%20of%20elements%20in%20a%20list%20using,the%20frequency%20of%20the%20element%20in%20the%20list.
import collections
with open(file) as f:
lines = f.readlines()
words = []
for line in lines:
word = line.split(' ')
words.append(word)
frequencyDict = collections.Counter(words)
print("Input list is:", words)
print("Frequency of elements is:")
print(frequencyDict)

How to take out punctuation from string and find a count of words of a certain length?

I am opening trying to create a function that opens a .txt file and counts the words that have the same length as the number specified by the user.
The .txt file is:
This is a random text document. How many words have a length of one?
How many words have the length three? We have the power to figure it out!
Is a function capable of doing this?
I'm able to open and read the file, but I am unable to exclude punctuation and find the length of each word.
def samplePractice(number):
fin = open('sample.txt', 'r')
lstLines = fin.readlines()
fin.close
count = 0
for words in lstLines:
words = words.split()
for i in words:
if len(i) == number:
count += 1
return count

You can try using the replace() on the string and pass in the desired punctuation and replace it with an empty string("").
It would look something like this:
puncstr = "Hello!"
nopuncstr = puncstr.replace(".", "").replace("?", "").replace("!", "")

I have written a sample code to remove punctuations and to count the number of words. Modify according to your requirement.
import re
fin = """This is a random text document. How many words have a length of one? How many words have the length three? We have the power to figure it out! Is a function capable of doing this?"""
fin = re.sub(r'[^\w\s]','',fin)
print(len(fin.split()))
The above code prints the number of words. Hope this helps!!

instead of cascading replace() just use strip() a one time call
Edit: a cleaner version
pl = '?!."\'' # punctuation list
def samplePractice(number):
with open('sample.txt', 'r') as fin:
words = fin.read().split()
# clean words
words = [w.strip(pl) for w in words]
count = 0
for word in words:
if len(word) == number:
print(word, end=', ')
count += 1
return count
result = samplePractice(4)
print('\nResult:', result)
output:
This, text, many, have, many, have, have, this,
Result: 8
your code is almost ok, it just the second for block in wrong position
pl = '?!."\'' # punctuation list
def samplePractice(number):
fin = open('sample.txt', 'r')
lstLines = fin.readlines()
fin.close
count = 0
for words in lstLines:
words = words.split()
for i in words:
i = i.strip(pl) # clean the word by strip
if len(i) == number:
count += 1
return count
result = samplePractice(4)
print(result)
output:
8

How to search words from txt file to python

How can I show words which length are 20 in a text file?
To show how to list all the word, I know I can use the following code:
#Program for searching words is in 20 words length in words.txt file
def main():
file = open("words.txt","r")
lines = file.readlines()
file.close()
for line in lines:
print (line)
return
main()
But I not sure how to focus and show all the words with 20 letters.
Big thanks

If your lines have lines of text and not just a single word per line, you would first have to split them, which returns a list of the words:
words = line.split(' ')
Then you can iterate over each word in this list and check whether its length is 20.
for word in words:
if len(word) == 20:
# Do what you want to do here
If each line has a single word, you can just operate on line directly and skip the for loop. You may need to strip the trailing end-of-line character though, word = line.strip('\n'). If you just want to collect them all, you can do this:
words_longer_than_20 = []
for word in words:
if len(word) > 20:
words_longer_than_20.append(word)

If your file has one word only per line, and you want only the words with 20 letters you can simply use:
with open("words.txt", "r") as f:
words = f.read().splitlines()
found = [x for x in words if len(x) == 20]
you can then print the list or print each word seperately

You can try this:
f = open('file.txt')
new_file = f.read().splitlines()
words = [i for i in f if len(i) == 20]
f.close()

Python Pig Latin convertor and line/word counter

I have to create a python file that prompts the user for a file path to a text document and then convert it into pig Latin and do a line/word count.
• A function to generate the pig Latin version of a single word
• A function to print line and word counts to standard output
• Correct pig Latin output with identical formatting as the original text file
• Correct line and word counts
I can't figure out why the pig latin is coming out wrong. My teacher said that I need another string.strip("\n") because it is making the words convert wrong but I have no idea where I am supposed to put that.
Also my line counter is broken. It counts but it always says 222 lines.
How can I make it just count the lines with words ?
#Step 1: User enters text file.
#Step 2: Pig Latin function rewrites file and saves as .txt.
#Step 3: Tracks how many lines and words it rewrites.
vowels = ("A", "a", "E", "e", "I", "i", "O", "o", "U", "u")
# Functions
def pig_word(string):
line = string.strip("\n")
for word in string.split(" "):
first_letter = word[0]
if first_letter in vowels:
return word + "way"
else:
return word[1:] + first_letter + "ay"
def pig_sentence(sentence):
word_list = sentence.split(" ")
convert = " "
for word in word_list:
convert = convert + pig_word(word)
convert = convert + " "
return convert
def line_counter(s):
line_count = 0
for line in s:
line_count += 1
return line_count
def word_counter(line):
word_count = 0
list_of_words = line.split()
word_count += len(list_of_words)
return word_count
# File path conversion
text = raw_input("Enter the path of a text file: ")
file_path = open(text, "r")
out_file = open("pig_output.txt", "w")
s = file_path.read()
pig = pig_sentence(s)
out_file.write(pig+" ")
out_file.write("\n")
linecount = line_counter(s)
wordcount = word_counter(s)
file_path.close()
out_file.close()
# Results
print "\n\n\n\nTranslation finished and written to pig_output.txt"
print "A total of {} lines were translated successfully.".format(linecount)
print "A total of {} words were translated successfully.".format(wordcount)
print "\n\n\n\n"

your first problem is here:
def pig_word(string):
line = string.strip("\n") #!!!! line is NEVER USED !!!
for word in string.split(" "): #you want *line*.split here
the second issue is caused by iterating over a string, it goes through every character instead of every line like a file does:
>>> for i in "abcd":
... print(i)
a
b
c
d
so in your line_counter instead of doing:
for line in s:
line_count += 1
you just need to do:
for line in s.split("\n"):
line_count += 1

The first reason why your not getting the output you want is because in your pig_word(string) function, you return the first word in the string when you put that return inside of your for loop. Also, your teacher was talking about taking all the lines into the function, and iterating over each line via str.split('\n'). \n represents the "new-line" character.
You can try something like this to correct that.
def pig_sentence(string):
lines = []
for line in string.split('\n'):
new_string = ""
for word in line.split(" "):
first_letter = word[0]
if first_letter in vowels:
new_string += word + "way"
else:
new_string += word[1:] + first_letter + "ay"
lines.append(new_string)
return lines
The Changes Made
Initialized a new list lines that we can append to throughout the loops.
Iterate over each line in the passed in string.
For each line, create a new string new_string.
Use your code, but instead of returning we add it to new_string, then append new_string to our list of new lines, lines.
Note that this removes the need for two functions. Also note that I renamed pig_word to pig_sentence.
The second error is in your function line_counter(s). You are iterating over each character rather than each line. Here add that str.split('\n') again to get the output you want by splitting the string into a list of lines then iterating over the list.
Here is the modified function:
def line_counter(s):
line_count = 0
for _ in s.split('\n'):
line_count += 1
return line_count
(Since there is nothing erroneous with your file i.o., I'm just going to use a string literal here for the testing.)
Test
paragraph = """\
Hello world
how are you
pig latin\
"""
lines = line_counter(paragraph)
words = sum([word_counter(line) for line in paragraph.split('\n')])
out = pig_sentence(paragraph)
print(lines, words, out)
The output is what we expect!
3 7 ['elloHay', 'elloHayorldway', 'owhay', 'owhayareway', 'owhayarewayouyay', 'igpay', 'igpayatinlay']

You are removing only the space, you need to remove all punctuation as well as end of line characters. Replace
split(" ")
with
split()
Your sentence list is the equivalent of
sentence = 'Hello there.\nMy name is Roxy.\nHow are you?
If you print after split(" ") and split() you will see the difference and you will get the results that you expect.
Additionally, you will get incorrect results because you will have there translated to heretay. you need to loop around so that it comes out as erethay
That is move every consonent to the end before adding 'ay' so that the new word starts with a vowel.

reading and checking the consecutive words in a file

I want to read the words in a file, and say for example, check if the word is "1",if word is 1, I have to check if the next word is "two". After that i have to do some other task. Can u help me to check the occurance of "1" and "two" consecutively.
I have used
filne = raw_input("name of existing file to be proceesed:")
f = open(filne, 'r+')
for word in f.read().split():
for i in xrange(len(word)):
print word[i]
print word[i+1]
but its not working.

The easiest way to deal with consecutive items is with zip:
with open(filename, 'r') as f: # better way to open file
for line in f: # for each line
words = line.strip().split() # all words on the line
for word1, word2 in zip(words, words[1:]): # iterate through pairs
if word1 == '1' and word2 == 'crore': # test the pair
At the moment, your indices (i and i+1) are within each word (i.e. characters) not for words within the list.

I think you want to print two consecutive words from the file,
In your code you are iterating over the each character instead of each word in file if thats what you intend to do.
You can do that in following way:
f = open('yourFileName')
str1 = f.read().split()
for i in xrange(len(str1)-1): # -1 otherwise it will be index out of range error
print str1[i]
print str1[i+1]
and if you want to check some word is present and want check for word next to it, use
if 'wordYouWantToCheck' in str1:
index=str1.index('wordYouWantToCheck')
Now you have index for the word you are looking for, you can check for the word next to it using str1[index+1].
But 'index' function will return only the first occurrence of the word. To accomplish your intent here, you can use 'enumerate' function.
indices = [i for i,x in enumerate(str1) if x == "1"]
This will return list containing indices of all occurrences of word '1'.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to read a text file in Python - python

Related

Counting word frequency by python list

How to take out punctuation from string and find a count of words of a certain length?

How to search words from txt file to python

Python Pig Latin convertor and line/word counter

reading and checking the consecutive words in a file

Categories

Resources