Print output to text file (.txt) using Python

I want to print my output to a text file, but the result is different from what I get when I print in the terminal. My code:
...
words = ["makan","Rina"]
sentences = text.split(".")
for itemIndex in range(len(sentences)):
    for word in words:
        if word in sentences[itemIndex]:
            print('"' + sentences[itemIndex] + '."')
            break
The output looks like this:
"Semalam saya makan nasi padang."
" Saya makan bersama Rina."
" Rina pesan ayam goreng."
If I write to a text file instead:
words = ["makan","Rina"]
sentences = text.split(".")
for itemIndex in range(len(sentences)):
    for word in words:
        if word in sentences[itemIndex]:
            with open("corpus.txt",'w+') as f:
                f.write(sentences[itemIndex])
                f.close()
The file contains only:
Rina pesan ayam goreng
Why? How can I write the output to the text file so that it looks the same as when I print it in the terminal?

You are reopening the file on every iteration of the loop, and because 'w+' truncates the file when it is opened, each write overwrites what is already there. As it stands, when the loop finishes you end up with only the last line in the file.
You need to open the file once, outside of all the loops, or open it in append mode, denoted by 'a'. If you open the file without a with statement, remember to close it with f.close() when you are done with it.

You have to reorder the lines of your code, moving the opening/closing of the file outside the loop:
with open("corpus.txt",'w+') as f:
    words = ["makan","Rina"]
    sentences = text.split(".")
    for itemIndex in range(len(sentences)):
        for word in words:
            if word in sentences[itemIndex]:
                f.write(sentences[itemIndex])
                break  # stop after the first match so a sentence isn't written twice
Also, print adds a newline character after its output by default. If you want each sentence on its own line in the file, you may want to add f.write('\n') after every sentence.
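Another option, sketched here assuming Python 3: print() accepts a file argument, so the terminal version of the loop can write to the file almost unchanged, trailing newline included. The sample text below is an assumption reconstructed from the output shown in the question, because the real text is elided.

```python
# Sketch: write to a file exactly what print() would show in the terminal.
# Assumption: this sample `text` is reconstructed from the question's output;
# the real text is elided in the question.
text = "Semalam saya makan nasi padang. Saya makan bersama Rina. Rina pesan ayam goreng."
words = ["makan", "Rina"]
sentences = text.split(".")

# Open the file once, before the loops, so it is not truncated on every match.
with open("corpus.txt", "w") as f:
    for sentence in sentences:
        for word in words:
            if word in sentence:
                # file=f sends the output to the file; print still appends "\n".
                print('"' + sentence + '."', file=f)
                break  # one match is enough; avoids writing a sentence twice
```

corpus.txt should then contain the same three quoted lines that appear in the terminal.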

Because you are calling with open inside the loop and using 'w+' mode, your program overwrites the file each time, so you only end up with the last line written to the file. Try it with 'a' instead, or move the with open outside the loop.
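To see the difference between the two modes, here is a small sketch (the helper name write_each_in_loop is hypothetical) that reopens the file once per line, the way the question's loop does, inside a throwaway temporary directory:

```python
import os
import tempfile


def write_each_in_loop(lines, mode):
    """Hypothetical helper: reopen the file once per line, like the question's loop."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "corpus.txt")
        for line in lines:
            # "w"/"w+" truncate the file on open; "a" writes after existing content.
            with open(path, mode) as f:
                f.write(line + "\n")
        with open(path) as f:
            return f.read()


lines = ["first", "second", "third"]
print(repr(write_each_in_loop(lines, "w+")))  # 'third\n'
print(repr(write_each_in_loop(lines, "a")))   # 'first\nsecond\nthird\n'
```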

You don't need to call close on a file handle that you have opened using the with syntax. The closing of the file is handled for you.
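A quick way to check this, assuming Python 3, is to look at the file object's closed attribute inside and after the with block:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "corpus.txt")
    with open(path, "w") as f:
        f.write("Semalam saya makan nasi padang.\n")
        print(f.closed)  # False: still inside the with block
    # Leaving the with block closed the file; no f.close() call was needed.
    print(f.closed)  # True
```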
I would open the file just once, before the for loops (the for loops should be inside the with statement), instead of opening it multiple times. You are overwriting the file each time you open it to write a new line.
Your code should be:
words = ["makan","Rina"]
sentences = text.split(".")
with open("corpus.txt",'w+') as f:
    for itemIndex in range(len(sentences)):
        for word in words:
            if word in sentences[itemIndex]:
                f.write(sentences[itemIndex] + '\n')
                break  # stop after the first match so a sentence isn't written twice

Related

How to get rid of the last whitespace when printing with end=" "?

Task:
Create a solution that accepts an input identifying the name of a text file, for example, "WordTextFile1.txt". Each text file contains three rows with one word per row. Using the open() function and write() and read() methods, interact with the input text file to write a new sentence string composed of the three existing words to the end of the file contents on a new line. Output the new file contents.
The solution output should be in the format:
cat
chases
dog
cat chases dog
The "WordTextFile1.txt" file has only 3 words, each on a different row:
cat
chases
dog
This is what I have. It works, but the last line (the sentence) has an extra trailing whitespace, which is breaking my program. What can I do to get rid of the whitespace and fix my code? Help!
file = input()
with open(file, "r+") as f:
    list_words = f.readlines()
for word in list_words:
    print(word.strip())
for word in list_words:
    print(word.strip(), end = " ")
This is the current output:
student
reads
book
student reads book(extra whitespace)
You are correctly removing the trailing whitespace with word.strip(), but end = " " adds a space after the last word again. Change it to:
file = input()
with open(file, "r+") as f:
    list_words = f.readlines()
# I don't see any reason for having this for loop
# for word in list_words:
#     print(word.strip())
print(' '.join(word.strip() for word in list_words))  # this should work
Edit: Removed the list as it was not required. Thanks to #PranavHosangadi
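The task also asks for the sentence to be written to the end of the file with write() and for the new file contents to be output, which the answer above does not do. A minimal sketch under those assumptions (the function name append_sentence is hypothetical, and the file is assumed to hold only the words to join, one per row):

```python
def append_sentence(filename):
    """Hypothetical helper: append 'word1 word2 word3' as a new last line.

    Assumes the file holds only the words to join, one per row.
    """
    # read(): load the current contents once.
    with open(filename) as f:
        contents = f.read()
    # str.split() with no argument also drops the newline after each word,
    # and " ".join puts spaces only between words, so there is no trailing space.
    sentence = " ".join(contents.split())
    # write(): append the sentence on its own line at the end of the file.
    with open(filename, "a") as f:
        if contents and not contents.endswith("\n"):
            f.write("\n")
        f.write(sentence)
    # read(): return the new file contents so the caller can output them.
    with open(filename) as f:
        return f.read()
```

Called as print(append_sentence(input())), it should print the three words followed by the sentence, without the trailing space.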

How to compare contents of two large text files in Python?

Datasets: two large text files, one for training and one for testing, in which all words are tokenized. Part of the data looks like this: " the fulton county grand jury said friday an investigation of atlanta's recent primary election produced `` no evidence '' that any irregularities took place . "
Question: How can I replace every word in the test data not seen in training with the word "unk" in Python?
So far, I have built a dictionary with the following code to count the frequency of each word in the file:
#open text file and assign it to variable with the name "readfile"
readfile= open('C:/Users/amtol/Desktop/NLP/Homework_1/brown-train.txt','r')
writefile=open('C:/Users/amtol/Desktop/NLP/Homework_1/brown-trainReplaced.txt','w')
# Create an empty dictionary
d = dict()
# Loop through each line of the file
for line in readfile:
    # Split the line into words
    words = line.split(" ")
    # Iterate over each word in line
    for word in words:
        # Check if the word is already in dictionary
        if word in d:
            # Increment count of word by 1
            d[word] = d[word] + 1
        else:
            # Add the word to dictionary with count 1
            d[word] = 1
#replace all words occurring in the training data once with the token<unk>.
for key in list(d.keys()):
    line= d[key]
    if (line==1):
        line="<unk>"
        writefile.write(str(d))
    else:
        writefile.write(str(d))
#close the file that we have created and we wrote the new data in that
writefile.close()
Honestly, the code above doesn't work with writefile.write(str(d)), which is how I want to write the result to the new text file. With print(key, ":", line) it works and shows the frequency of each word, but only in the console; it doesn't create the new file. If you also know the reason for this, please let me know.
First off, your task is to replace the words in the test file that are not seen in the train file, but your code never mentions the test file. You have to:
1. Read the train file and gather the words that appear in it. Your code mostly does this, but you need to .strip() each line, or the last word in each line will end with a newline. Also, it would make more sense to use a set instead of a dict, since you don't need the count (you only want to know whether a word is there or not). Sets are cool because you don't have to check whether an element is already in; you just toss it in. If you absolutely need the count, using collections.Counter is easier than counting yourself.
2. Read the test file and write to the replacement file, replacing the words in each line as you go. Something like this (where replaced_line stands for the per-line replacement described below):
with open("test", "rt") as reader:
    with open("replacement", "wt") as writer:
        for line in reader:
            writer.write(replaced_line(line.strip()) + "\n")
3. Make sense, which your last block does not :P Instead of checking whether each word from the test file has been seen and replacing the unseen ones, you are iterating over the words you have seen in the train file and writing <unk> if you've seen them exactly once. This does something, but nothing close to what it should.
Instead, split the line you got from the test file and iterate over its words; if a word is not in the seen set (word not in seen, literally), replace it with '<unk>'; finally, add it to the output sentence. You can do this in a loop, but here's a comprehension that does it:
new_line = ' '.join(word if word in seen else '<unk>'
                    for word in line.split(' '))
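Putting those steps together, a minimal sketch could look like this. The names build_vocabulary, replaced_line and replace_unseen are hypothetical (replaced_line here takes the vocabulary as a second argument), and it uses split() with no argument so that newlines and repeated spaces are handled too:

```python
def build_vocabulary(train_lines):
    """Hypothetical helper: collect every token seen in the training data."""
    seen = set()
    for line in train_lines:
        # split() with no argument discards the trailing newline and extra spaces.
        seen.update(line.split())
    return seen


def replaced_line(line, seen):
    """Replace each token that never appeared in training with '<unk>'."""
    return " ".join(word if word in seen else "<unk>" for word in line.split())


def replace_unseen(train_path, test_path, out_path):
    """Hypothetical driver: read the train file, then rewrite the test file line by line."""
    with open(train_path) as f:
        seen = build_vocabulary(f)
    with open(test_path) as reader, open(out_path, "w") as writer:
        for line in reader:
            writer.write(replaced_line(line, seen) + "\n")
```

If you do need the counts, collections.Counter(word for line in f for word in line.split()) builds them in one expression, and `word in counter` works as the membership test in place of the set.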

How do I print specific strings from text files?

file_contents = x.read()
#print (file_contents)
for line in file_contents:
    if "ase" in line:
        print (line)
I'm looking for all the sentences that contain the phrase "ase" in the file. When I run it, nothing is printed.
Since file_contents is the result of x.read(), it's a string, not a list of strings.
So you're iterating over each character.
Do this instead:
file_contents = x.readlines()
Now you can search in your lines.
Or, if you're not planning to reuse file_contents, iterate over the file handle directly with:
for line in x:
so you don't have to call readlines() and store the whole file in memory (if the file is big, it can make a difference).
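A small sketch of both points, assuming Python 3 (lines_containing is a hypothetical name, and io.StringIO stands in for the file object x):

```python
import io


def lines_containing(handle, phrase):
    """Yield each line of an open text handle that contains `phrase`."""
    # Iterating over the handle reads one line at a time instead of
    # loading the whole file into memory.
    for line in handle:
        if phrase in line:
            # Each line keeps its "\n"; strip it so print() doesn't add a blank line.
            yield line.rstrip("\n")


# Iterating over a string, as the question's code does, yields single characters,
# and a one-character string can never contain "ase".
print(any("ase" in ch for ch in "a base case"))  # False

# io.StringIO stands in for a file opened with open(...).
for line in lines_containing(io.StringIO("a base case\nnothing here\nphase two\n"), "ase"):
    print(line)
```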
read() returns the whole content of the file (not line by line) as a string, so when you iterate over it you iterate over single characters:
file_contents = """There is a ase."""
for char in file_contents:
    print(char)
You can simply iterate over the file object (which returns it line-by-line):
for line in x:
    if "ase" in line:
        print(line)
Note that if you are actually looking for sentences rather than lines that contain 'ase', it gets a bit more complicated. For example, you could read the complete file and split it at '.':
for sentence in x.read().split('.'):
    if "ase" in sentence:
        print(sentence)
However, that would fail if there are periods that don't mark the end of a sentence (like abbreviations).
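A slightly more flexible sketch, assuming sentences end with '.', '!' or '?' followed by whitespace, uses re.split with a lookbehind so that '!' and '?' also end a sentence and the punctuation stays attached (sentences_containing is a hypothetical name):

```python
import re


def sentences_containing(text, phrase):
    """Hypothetical helper: split text into sentences and keep those containing phrase."""
    # Split after ".", "!" or "?" when followed by whitespace; the punctuation
    # stays with the sentence because the lookbehind doesn't consume it.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if phrase in s]


print(sentences_containing("This is a base case. Nothing here! Is it a phase? No.", "ase"))
# ['This is a base case.', 'Is it a phase?']
```

This is still a heuristic with the same weakness: sentences_containing("Ask Dr. Case about it.", "Case") returns ['Case about it.'], because the period in "Dr." is treated as a sentence end.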

How to load a word list into Python

I'm working through an introductory Python programming course on MIT OCW. On this problem set I've been given some code to work on and a text file. The code and the text file are in the same folder. The code looks like this:
import random
import string
def load_words( ):
    print "Loading word list from file..."
    inFile = open (WORDLIST_FILENAME, 'r', 0)
    line = inFile.readline( )
    wordlist = string.split (line)
    print " ", len(wordlist), "words loaded."
    return wordlist
def choose_word (wordlist):
    return random.choice (wordlist)
wordlist = load_words ( )
When I run the code as it is, the problem set instructions say I should get this:
Loading word list from file...
55900 words loaded.
For some reason though, when I run the code I get:
Loading word list from file...
1 words loaded
I've tried omitting the 2nd and 3rd parameters from the input to the open function but to no avail. What could the problem be?
Moreover, when I try to print the value of wordlist I get
['AA']
When I print the value of line within the context of the relevant function I get:
AA
The text file does begin with 'AA', but what about all of the letters that follow?
line = inFile.readline( ) should be readlines(), plural.
readline reads only a single line, which is why only one word is loaded.
readlines() gives you a list of all the lines in your input file, each still ending in a newline character, so you would then need to strip each line rather than pass the list to string.split.
Given a raw file like this:
cat wordlist.txt
aa
bb
cc
dd
ee
and a Python file like this:
import random
def load_words(WORDLIST_FILENAME):
    print "Loading word list from file..."
    wordlist = list()
    # 'with' closes the file automatically when the block ends
    with open(WORDLIST_FILENAME) as f:
        # fetch one line at a time, including the trailing '\n'
        for line in f:
            # strip '\n', then append the word to wordlist
            wordlist.append(line.rstrip('\n'))
    print " ", len(wordlist), "words loaded."
    print '\n'.join(wordlist)
    return wordlist
def choose_word (wordlist):
    return random.choice (wordlist)
wordlist = load_words('wordlist.txt')
the result is:
python load_words.py
Loading word list from file...
5 words loaded.
aa
bb
cc
dd
ee
The function you have written can only read words from a single line. It assumes all words are written on a single line in the text file, so it reads that line and creates a list by splitting it. However, your text file appears to contain newlines as well, so you can replace the following:
line = inFile.readline( )
wordlist = string.split (line)
with:
wordlist =[]
for line in inFile:
    line = line.split()
    wordlist.extend(line)
print " ", len(wordlist), "words loaded."
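For anyone running this under Python 3 (where print is a function, and open(..., 'r', 0) raises a ValueError because text mode can't be unbuffered), here is a minimal sketch of an alternative: read the whole file and call split() with no argument, which splits on any whitespace, newlines included. The parameter name is an assumption; the original reads a global WORDLIST_FILENAME.

```python
def load_words(wordlist_filename):
    """Python 3 sketch: return every whitespace-separated word in the file."""
    print("Loading word list from file...")
    with open(wordlist_filename) as in_file:
        # split() with no argument splits on spaces *and* newlines, so this
        # works whether the words are on one line or one per line.
        wordlist = in_file.read().split()
    print(" ", len(wordlist), "words loaded.")
    return wordlist
```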

getting context for a word

I am dealing with an extremely large text file (around 3.77 GB) and trying to extract all the sentences in which a specific word occurs and write them out to a text file.
So the large text file is just many lines of text:
line 1 text ....
line 2 text ....
I have also extracted the unique word list from the text file, and want to extract all the sentences each word occurs in and write out the context associated with the word. Ideally, the output file will take the format of
word1 \t sentence 1\n sentence 2\n sentence N\n
word2 \t sentence 1\n sentence 2\n sentence M\n
My current code looks something like this:
fout=open('word_context_3000_4000(4).txt','a')
for x in unique_word[3000:4000]:
    fout.write('\n'+x+'\t')
    fin=open('corpus2.txt')
    for line in fin:
        if x in line.strip().split():
            fout.write(line)
        else:
            pass
fout.close()
Since the unique word list is big, I process it chunk by chunk. But somehow the code fails to get the context for all the words and only returns the context for the first few hundred words in the unique word list.
Has anyone worked on a similar problem before? I am using Python, btw.
Thanks a lot.
First problem, you never close fin.
Maybe you should try something like this:
fout=open('word_context_3000_4000(4).txt','a')
fin=open('corpus2.txt')
for x in unique_word[3000:4000]:
    fout.write('\n'+x+'\t')
    fin.seek(0) # go to the beginning of the file
    for line in fin:
        if x in line.strip().split():
            fout.write(line)
        else:
            pass
fout.close()
fin.close()
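Scanning a 3.77 GB corpus once per word is also very slow: for a chunk of 1000 words, the loop above reads the whole file 1000 times. A minimal sketch of a single-pass alternative, assuming the matching lines for one chunk fit in memory and the chunk's words are unique (write_contexts is a hypothetical name), looks up each line's tokens in a set of the chunk's words:

```python
def write_contexts(corpus_path, words, out_path):
    """Hypothetical helper: one pass over the corpus for a whole chunk of words."""
    targets = set(words)  # fast membership test for each token
    contexts = {word: [] for word in words}
    with open(corpus_path) as fin:
        for line in fin:
            # set() so a line that repeats a word is recorded once for that word
            for token in set(line.split()) & targets:
                contexts[token].append(line)
    # Same layout as the loop above: "\n" + word + "\t" followed by its lines.
    with open(out_path, "a") as fout:
        for word in words:
            fout.write("\n" + word + "\t")
            fout.writelines(contexts[word])
```

Called as write_contexts('corpus2.txt', unique_word[3000:4000], 'word_context_3000_4000(4).txt'), it should produce the same word/tab/lines layout as the loop above while reading the corpus only once per chunk.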
