How to search words from txt file to python - python

How can I show the words of length 20 in a text file?
I know I can list all the words with the following code:
#Program for finding the words of length 20 in the words.txt file
def main():
    file = open("words.txt","r")
    lines = file.readlines()
    file.close()
    for line in lines:
        print (line)
    return
main()
But I am not sure how to filter and show only the words with 20 letters.
Many thanks

If each line contains several words rather than a single word, you first have to split it, which returns a list of the words:
words = line.split()
Then you can iterate over each word in this list and check whether its length is 20.
for word in words:
    if len(word) == 20:
        print(word)  # or do whatever you want with the word here
If each line has a single word, you can operate on line directly and skip the for loop. You may need to strip the trailing end-of-line character though: word = line.strip('\n'). If you just want to collect them all, you can do this:
words_with_20_letters = []
for word in words:
    if len(word) == 20:
        words_with_20_letters.append(word)
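Putting the two steps together, a minimal Python 3 sketch might look like the following. The function name words_of_length and the sample lines are made up for illustration; in practice the lines would come from words.txt.

```python
def words_of_length(lines, n):
    """Return every whitespace-separated word in `lines` with exactly n characters."""
    found = []
    for line in lines:
        # split() with no argument also discards the trailing newline
        for word in line.split():
            if len(word) == n:
                found.append(word)
    return found


# Hypothetical sample data standing in for the lines of words.txt
sample = ["short words only\n", "x" * 20 + " and more\n", "y" * 19 + "\n"]
print(words_of_length(sample, 20))
```

Because it accepts any iterable of strings, the same function can be called with an open file object, e.g. words_of_length(f, 20) inside a with open("words.txt") as f: block.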

If your file has only one word per line and you want only the words with 20 letters, you can simply use:
with open("words.txt", "r") as f:
    words = f.read().splitlines()
found = [x for x in words if len(x) == 20]
You can then print the list or print each word separately.

You can try this:
f = open('file.txt')
new_file = f.read().splitlines()
# filter the list of lines; the file object itself is already exhausted by read()
words = [i for i in new_file if len(i) == 20]
f.close()

Related

Returning a line of txt.-file that has a word with more than 6 characters and starts with "A" in Python

I have a task to accomplish in Python in a single statement:
I need to return the lines of my txt file that include words which have more than 6 characters and start with the letter "A".
My code is the following:
[line for line in open('test.txt') if line.split().count('A') > 6]
I am not sure how to add another condition to say that the word starts with "A" and has more than 6 characters. That is as far as I could get. Thank you for your time.
Greetings
I would split up your list comprehension into explicit for loops, to make it easier to understand what's going on. Once you do that, it should be clearer what you're missing so you can assemble it back into a list comprehension.
lines = []
with open('test.txt', 'r') as f:
    for line in f: # this line reads each line in the file
        add_line = False
        for word in line.split():
            if (word.startswith('A') and len(word) > 6):
                add_line = True
                break
        if (add_line):
            lines.append(line)
This roughly translates to
[line for line in open('test.txt', 'r') if any(len(word) > 6 and word.startswith('A') for word in line.split())]
You should split each line and test each word separately. Note that this version also matches a lowercase "a":
[line for line in open('test.txt') if len([word for word in line.split() if word[0].lower() == 'a' and len(word) > 6]) > 0]
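For reference, here is a small Python 3 sketch of the same filter written as a reusable function, so it can be tried without a file. The function name, its parameters and the sample lines are assumptions for illustration only.

```python
def lines_with_long_a_word(lines, prefix="A", min_len=7):
    """Return the lines containing at least one word that starts with
    `prefix` and has at least `min_len` characters (i.e. more than 6)."""
    return [
        line
        for line in lines
        if any(w.startswith(prefix) and len(w) >= min_len for w in line.split())
    ]


# Made-up sample lines standing in for the contents of test.txt
sample = ["Apples are Amazing\n", "Absolutely nothing here\n", "A short line\n"]
print(lines_with_long_a_word(sample))
```

any() stops at the first matching word on a line, which mirrors the break in the explicit loop version.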

Counting word frequency by python list

Today I was trying to write code that returns the number of times each word is repeated in a text (the contents of a txt file). Before using a dictionary, I first wanted to test whether the words were being appended to a list, so I wrote this code:
def word_frequency(file) :
    """Returns the frequency of all the words in a txt file"""
    with open(file) as f :
        arg = f.readlines()
        l = []
        for line in arg :
            l = line.split(' ')
            return l
After I gave it the file path and pressed F5, this happened:
In[18]: word_frequency("C:/Users/ASUS/Desktop/Workspace/New folder/tesst.txt")
Out[18]: ['Hello', 'Hello', 'Hello\n']
At first you may think that there is no problem with this output but the text in the txt file is :
As you can see, it only appends the words of the first line to the list, but I want all the words in the txt file to be appended to the list.
Does anyone know what I have to do? What is the problem here?
You should add each line's words to the main list before returning it.
def word_frequency(file):
    with open(file) as f:
        lines = f.readlines()
    words = []
    for line in lines:
        line_words = line.split()
        words += line_words
    return words
In your code, return sits inside the for loop. return terminates the function and hands back a value, so the function exits during the first iteration and returns only the words of the first line. Each pass also overwrites l instead of extending it.
One answer is from https://www.pythonforbeginners.com/lists/count-the-frequency-of-elements-in-a-list#:~:text=Count%20frequency%20of%20elements%20in%20a%20list%20using,the%20frequency%20of%20the%20element%20in%20the%20list.
import collections
with open(file) as f:
    lines = f.readlines()
words = []
for line in lines:
    # extend, not append: Counter needs hashable items (strings), not lists
    words.extend(line.split())
frequencyDict = collections.Counter(words)
print("Input list is:", words)
print("Frequency of elements is:")
print(frequencyDict)
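If the goal is only the counts, the intermediate list can be skipped. A minimal Python 3 sketch, assuming the whole text fits in memory; the sample text below is invented, since the original file contents are not shown:

```python
from collections import Counter


def word_frequency(text):
    """Count how often each whitespace-separated word occurs in `text`."""
    return Counter(text.split())


# Made-up text standing in for the contents of the txt file
sample_text = "Hello Hello Hello\nworld Hello\n"
counts = word_frequency(sample_text)
print(counts.most_common())
```

For a real file, the text would come from f.read() inside a with open(...) block. A Counter returns 0 for words it has never seen, so lookups need no special casing.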

How to take out punctuation from string and find a count of words of a certain length?

I am trying to create a function that opens a .txt file and counts the words that have the same length as the number specified by the user.
The .txt file is:
This is a random text document. How many words have a length of one?
How many words have the length three? We have the power to figure it out!
Is a function capable of doing this?
I'm able to open and read the file, but I am unable to exclude punctuation and find the length of each word.
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close
    count = 0
    for words in lstLines:
        words = words.split()
    for i in words:
        if len(i) == number:
            count += 1
    return count
You can call replace() on the string, passing in each punctuation character and replacing it with an empty string ("").
It would look something like this:
puncstr = "Hello!"
nopuncstr = puncstr.replace(".", "").replace("?", "").replace("!", "")
I have written some sample code that removes punctuation and counts the number of words. Modify it according to your requirements.
import re
fin = """This is a random text document. How many words have a length of one? How many words have the length three? We have the power to figure it out! Is a function capable of doing this?"""
fin = re.sub(r'[^\w\s]','',fin)
print(len(fin.split()))
The above code prints the total number of words. Hope this helps!
Instead of chaining replace() calls, use a single strip() call with all the punctuation characters. Note that strip() only removes them from the ends of each word.
Edit: a cleaner version
pl = '?!."\'' # punctuation list
def samplePractice(number):
    with open('sample.txt', 'r') as fin:
        words = fin.read().split()
    # clean words
    words = [w.strip(pl) for w in words]
    count = 0
    for word in words:
        if len(word) == number:
            print(word, end=', ')
            count += 1
    return count
result = samplePractice(4)
print('\nResult:', result)
Output:
This, text, many, have, many, have, have, this,
Result: 8
Your code is almost OK; the second for block is just in the wrong position. It has to be nested inside the first loop:
pl = '?!."\'' # punctuation list
def samplePractice(number):
    fin = open('sample.txt', 'r')
    lstLines = fin.readlines()
    fin.close()
    count = 0
    for words in lstLines:
        words = words.split()
        for i in words:
            i = i.strip(pl) # clean the word by strip
            if len(i) == number:
                count += 1
    return count
result = samplePractice(4)
print(result)
Output:
8
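As a variation on the strip() approach, the standard library's string.punctuation constant can stand in for a hand-written punctuation list. The sketch below (Python 3) works on a string rather than a file so it is easy to try; the function name count_words_of_length is made up, and the sample text is the one from the question.

```python
import string


def count_words_of_length(text, number):
    """Count words whose length equals `number` after trimming
    punctuation from both ends of each word."""
    count = 0
    for raw in text.split():
        word = raw.strip(string.punctuation)
        if len(word) == number:
            count += 1
    return count


sample_text = (
    "This is a random text document. How many words have a length of one?\n"
    "How many words have the length three? We have the power to figure it out!\n"
    "Is a function capable of doing this?\n"
)
print(count_words_of_length(sample_text, 4))
```

Like the answers above, this only trims punctuation at the ends of a word, so an apostrophe inside a word such as "don't" still counts towards its length.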

How to read a text file in Python

I have an assignment that reads:
Write a function which takes the input file name and list of words
and write into the file “Repeated_word.txt” the word and number of
times word repeated in input file?
word_list = [‘Emma’, ‘Woodhouse’, ‘father’, ‘Taylor’, ‘Miss’, ‘been’, ‘she’, ‘her’]
My code is below.
All it does is create the new file 'Repeated_word.txt'; it doesn't write the number of times each word from the word list appears in the file.
#obtain the name of the file
filename = raw_input("What is the file being used?: ")
fin = open(filename, "r")
#create list of words to see if repeated
word_list = ["Emma", "Woodhouse", "father", "Taylor", "Miss", "been", "she", "her"]
def repeatedWords(fin, word_list):
    #open the file
    fin = open(filename, "r")
    #create output file
    fout = open("Repeated_word.txt", "w")
    #loop through each word of the file
    for line in fin:
        #split the lines into words
        words = line.split()
        for word in words:
            #check if word in words is equal to a word from word_list
            for i in range(len(word_list)):
                if word == i:
                    #count number of times word is in word
                    count = words.count(word)
                    fout.write(word, count)
    fout.close
repeatedWords(fin, word_list)
These lines,
for i in range(len(word_list)):
    if word == i:
should be
for i in range(len(word_list)):
    if word == word_list[i]:
or
for i in word_list:
    if word == i:
As written, word is a string whereas i is an integer. These are never equal, hence nothing ever gets written to the file.
In response to your further question, you can either 1) use a dictionary to keep track of how many of each word you have, or 2) read in the whole file at once. This is one way you might do the latter:
words = fin.read().split()
for word in word_list:
    # write() takes a single string, so format the line first
    fout.write("%s %d\n" % (word, words.count(word)))
I leave it up to you to figure out where to put this in your code and what you need to replace. This is, after all, your assignment, not ours.
It seems you are making several mistakes here:
[1] for i in range(len(word_list)):
[2]     if word == i:
[3]         #count number of times word is in word
[4]         count = words.count(word)
[5]         fout.write(word, count)
First, you are comparing the word read from fin with an integer from the range. [line 2]
Then you are writing the count to fout on every match, once per line of input. [line 5] Note also that write() accepts a single string argument, so fout.write(word, count) raises a TypeError. I suggest keeping the counts (e.g. in a dict) and writing them all out after parsing the input file.
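To illustrate the "keep the counts, then write them at the end" idea, here is one possible Python 3 sketch (the question itself uses Python 2). The function name write_repeated_words, the "word count" output format and the demo file contents are all assumptions, not part of the assignment.

```python
import os
import tempfile
from collections import Counter


def write_repeated_words(input_path, word_list, output_path="Repeated_word.txt"):
    """Write one 'word count' line to output_path for each entry of word_list."""
    with open(input_path) as fin:
        counts = Counter(fin.read().split())  # count every word once, up front
    with open(output_path, "w") as fout:
        for word in word_list:
            fout.write("{} {}\n".format(word, counts[word]))


# Demonstration with a throwaway input file (made-up contents)
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "input.txt")
    dst = os.path.join(tmp, "Repeated_word.txt")
    with open(src, "w") as f:
        f.write("Emma Woodhouse and her father\nEmma said she had been there\n")
    write_repeated_words(src, ["Emma", "father", "Taylor"], dst)
    with open(dst) as f:
        report = f.read()
print(report)
```

Counting once and looking each word up afterwards avoids rescanning the text for every entry in the word list, and the with blocks close both files even if an error occurs.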

reading and checking the consecutive words in a file

I want to read the words in a file and, for example, check whether a word is "1"; if it is, I have to check whether the next word is "two". After that I have to do some other task. Can you help me check for "1" and "two" occurring consecutively?
I have used
filne = raw_input("name of existing file to be processed:")
f = open(filne, 'r+')
for word in f.read().split():
    for i in xrange(len(word)):
        print word[i]
        print word[i+1]
but it's not working.
The easiest way to deal with consecutive items is with zip:
with open(filename, 'r') as f: # better way to open file
    for line in f: # for each line
        words = line.strip().split() # all words on the line
        for word1, word2 in zip(words, words[1:]): # iterate through pairs
            if word1 == '1' and word2 == 'two': # test the pair
                pass # do your other task here
At the moment, your indices (i and i+1) index into each word (i.e. its characters), not into the list of words.
I think you want to print two consecutive words from the file.
Your code iterates over each character of a word instead of over each word in the file.
You can do it in the following way:
f = open('yourFileName')
str1 = f.read().split()
for i in xrange(len(str1)-1): # -1, otherwise str1[i+1] raises an index out of range error
    print str1[i]
    print str1[i+1]
If you want to check whether some word is present and then check the word next to it, use
if 'wordYouWantToCheck' in str1:
    index=str1.index('wordYouWantToCheck')
Now you have the index of the word you are looking for, and you can check the word next to it using str1[index+1].
But the index function returns only the first occurrence of the word. To find every occurrence, you can use the enumerate function:
indices = [i for i,x in enumerate(str1) if x == "1"]
This returns a list containing the indices of all occurrences of the word '1'.
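The zip and enumerate ideas can be combined to find every position where one word is immediately followed by another. A short Python 3 sketch (the snippets above are Python 2); the function name consecutive_positions and the sample words are invented for illustration:

```python
def consecutive_positions(words, first, second):
    """Return each index i where words[i] == first and words[i + 1] == second."""
    return [
        i
        for i, (a, b) in enumerate(zip(words, words[1:]))
        if a == first and b == second
    ]


# Made-up word list; in practice this would be f.read().split()
sample_words = "1 two 3 1 one 1 two".split()
print(consecutive_positions(sample_words, "1", "two"))
```

Splitting the whole file at once, rather than line by line, also catches a pair where "1" ends one line and "two" starts the next; whether that is wanted depends on the task.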
