I needed to create a program that reads a text file and counts the number of lines, words and characters. I got it all working with the separate snippets below, but I wanted to convert it to use functions so that the file is read only once. I keep getting different answers and I'm unsure what I'm doing wrong.
Words Code
print ' '
fname = "question2.txt"
infile = open ( fname, 'r' )
fcontents = infile.read()
words = fcontents.split()
cwords = len(words)
print "Words: ",cwords
Characters Code
fname = "question2.txt"
infile = open ( fname, 'r' )
fcontents = infile.read()
char = len(fcontents)
print "Characters: ", char
Lines Code
fname = "question2.txt"
infile = open ( fname, 'r' )
fcontents = infile.readlines()
lines = len(fcontents)
print "Lines: ", lines
Correct Results
Words: 87
Characters: 559
Lines: 12
This is what I came up with while trying to use functions, but I just can't figure out what's wrong.
def filereader():
    fname = 'question2.txt'
    infile = open ( fname, 'r' )
    fcontents = infile.read()
    fcontents2 = infile.readlines()
    return fname, infile, fcontents, fcontents2

def wordcount(fcontents):
    words = fcontents.split(fcontents)
    cwords = len(words)
    return cwords

def charcount(fcontents):
    char = len(fcontents)
    return char

def linecount(fcontents2):
    lines = len(fcontents2)
    return lines

def main():
    print "Words: ", wordcount ('cwords')
    print "Character: ", charcount ('char')
    print "Lines: ", linecount ('lines')

main()
Wrong Results
Words: 2
Character: 4
Lines: 5
You need to use filereader in main:
def main():
    fname, infile, fcontents, fcontents2 = filereader()
    print "Words: ", wordcount (fcontents)
    print "Character: ", charcount (fcontents)
    print "Lines: ", linecount (fcontents2)
Otherwise, how would you obtain the values for fcontents and fcontents2 to pass to your other functions? You also need to fix filereader to make sure it will read the file once:
def filereader():
    fname = 'question2.txt'
    infile = open ( fname, 'r' )
    fcontents = infile.read()
    fcontents2 = fcontents.splitlines(True)
    return fname, infile, fcontents, fcontents2
Note that the line for fcontents2 now splits fcontents on newlines (see str.splitlines); passing True keeps the line endings. This also gives you a list of strings, just as .readlines() would.
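Putting the pieces together, here is a minimal Python 3 sketch of the whole program, assuming the goal is to read the file once and reuse the text. The names read_file and report are hypothetical and not from the original post. The sketch also swaps fcontents.split(fcontents) for fcontents.split(): the former uses the entire text as the separator, so for any non-empty text it returns two empty strings and the word count comes out as 2.

```python
def read_file(fname):
    """Read the file once; return its full text and its list of lines."""
    with open(fname, "r") as infile:
        fcontents = infile.read()
    # splitlines(True) keeps the line endings, like readlines() does
    return fcontents, fcontents.splitlines(True)


def wordcount(fcontents):
    # split() with no argument splits on any run of whitespace
    return len(fcontents.split())


def charcount(fcontents):
    return len(fcontents)


def linecount(lines):
    return len(lines)


def report(fname):
    """Return (words, characters, lines) for the given file."""
    fcontents, lines = read_file(fname)
    return wordcount(fcontents), charcount(fcontents), linecount(lines)
```

Calling report('question2.txt') would return the three counts as a tuple, which main() can then print.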
infile = open ( fname, 'r' )
fcontents = infile.read()
fcontents2 = infile.readlines()
You cannot read the same file contents twice without rewinding the file.
When you read from a file, the file handle remembers its position in the file. Thus, after your call infile.read(), infile will be placed at the end of the file. When you then call infile.readlines(), it will try to read all the characters between its current position and the end of the file, and hence return an empty list.
You can rewind the file to its initial position using infile.seek(0). Thus:
>>> fcontents = infile.read()
>>> infile.seek(0)
>>> flines = infile.readlines()
will work.
Alternatively, having read the file into the string fcontents, you can split the string into lines using splitlines:
>>> fcontents = infile.read()
>>> flines = fcontents.splitlines()
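The file-position behaviour described above can be checked with a small Python 3 sketch. The function name read_twice is hypothetical; it simply returns what each call sees so the results can be compared.

```python
def read_twice(path):
    """Show how the file position affects a second read on the same handle."""
    with open(path, "r") as infile:
        first = infile.read()          # consumes everything; position is now at the end
        leftover = infile.readlines()  # nothing left to read, so this is []
        infile.seek(0)                 # rewind to the start of the file
        rewound = infile.readlines()   # now the lines are available again
    return first, leftover, rewound
```

For a file containing two lines, leftover is an empty list while rewound holds both lines.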
Related
I am writing code in Python that removes all the text after a specific word, but lines are missing from the output. I have a Unicode text file with 3 lines:
my name is test1
my name is
my name is test 2
I want to remove the text after the word "test" so that I get the output below:
my name is test
my name is
my name is test
I have written code that does the task, but it also removes the second line, "my name is".
My code is below
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index > 0:
            txt += line[:index + len(splitStr)] + "\n"
with open(r"test.txt", "w") as fp:
    fp.write(txt)
It looks like the index becomes -1 when the keyword is not found, so your if skips the lines without the keyword.
I would modify your if by adding a condition as follows:
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index >= 0:  # find() returns 0 when the line starts with the keyword
            txt += line[:index + len(splitStr)] + "\n"
        elif index < 0:
            txt += line
with open(r"test.txt", "w") as fp:
    fp.write(txt)
No need to add \n because the line already contains it.
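To see why the exact comparison matters, here is a small Python 3 sketch of what str.find returns for three sample lines. The sample strings are made up for illustration. A match at the very start of a line gives 0, so a test of index > 0 would treat that line as if the keyword were missing.

```python
keyword = "test"
samples = ["test1", "my name is", "my name is test 2"]

# find() returns the index of the first match, or -1 when there is none
positions = [line.find(keyword) for line in samples]

# Only a comparison against -1 (or >= 0) treats a match at index 0 as a match
matched = [line for line in samples if line.find(keyword) != -1]
```

Here positions is [0, -1, 11], and matched contains the first and third samples.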
Your code does not append the line if splitStr is not found in it.
txt = ""
with open(r"test.txt", 'r') as fp:
    for line in fp.readlines():
        splitStr = "test"
        index = line.find(splitStr)
        if index != -1:
            txt += line[:index + len(splitStr)] + "\n"
        else:
            txt += line
with open(r"test.txt", "w") as fp:
    fp.write(txt)
In my solution I simulate the input file via io.StringIO. Compared to your code, my solution removes the else branch and uses only one += operator. Also, splitStr is set only once rather than on each iteration. This makes the code clearer and reduces possible sources of error.
import io
# simulates a file for this example
the_file = io.StringIO("""my name is test1
my name is
my name is test 2""")
txt = ""
splitStr = "test"
with the_file as fp:
    # each line
    for line in fp.readlines():
        # cut something?
        if splitStr in line:
            # find index
            index = line.find(splitStr)
            # cut after 'splitStr' and add newline
            line = line[:index + len(splitStr)] + "\n"
        # append line to output
        txt += line
print(txt)
When working with files in Python 3, it is recommended to use pathlib, like this:
import pathlib
file_path = pathlib.Path("test.txt")
# read from file
with file_path.open('r') as fp:
    pass  # do something
# write back to the file
with file_path.open('w') as fp:
    pass  # do something
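As a rough sketch of how the pathlib approach could be combined with the truncation logic, the function below reads the file, cuts each line after the keyword, and writes the result back. The function name truncate_after_keyword and the UTF-8 encoding are assumptions for illustration, not part of the original answers.

```python
import pathlib


def truncate_after_keyword(file_path, keyword="test"):
    """Cut every line right after the first occurrence of keyword; keep other lines as they are."""
    path = pathlib.Path(file_path)
    out = []
    # keepends=True preserves each line's own newline
    for line in path.read_text(encoding="utf-8").splitlines(keepends=True):
        index = line.find(keyword)
        if index != -1:
            # keep everything up to and including the keyword
            line = line[:index + len(keyword)] + "\n"
        out.append(line)
    path.write_text("".join(out), encoding="utf-8")
```

Lines without the keyword, such as "my name is", pass through unchanged, so none are lost.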
Suggestion:
for line in fp.readlines():
    i = line.find('test')
    if i != -1:
        line = line[:i + len('test')]  # keep the keyword itself, as in the desired output
I'm trying to generate a dataset based on an existing one. I was able to implement a method that randomly changes the contents of files, but I can't write the result back to a file. I also need to write the number of changed words to the file, since I want to use this dataset to train a neural network. Could you help me?
Input: files with 2 lines of text in each.
Output: files with 3 (maybe) lines: the first line does not change, the second changes according to the method, and the third shows the number of words changed (if it is better to do this differently for deep learning tasks, I would be glad of advice, since I'm a beginner).
from random import randrange
import os
Path = "D:\corrected data\\"
filelist = os.listdir(Path)
if __name__ == "__main__":
    new_words = ['consultable', 'partie ', 'celle ', 'également ', 'forte ', 'statistiques ', 'langue ',
                 'cadeaux', 'publications ', 'notre', 'nous', 'pour', 'suivr', 'les', 'vos', 'visitez ', 'thème ', 'thème ', 'thème ', 'produits', 'coulisses ', 'un ', 'atelier ', 'concevoir ', 'personnalisés ', 'consultable', 'découvrir ', 'fournit ', 'trace ', 'dire ', 'tableau', 'décrire', 'grande ', 'feuille ', 'noter ', 'correspondant', 'propre',]
    nb_words_to_replace = randrange(10)
    #with open("1.txt") as file:
    for i in filelist:
        # if i.endswith(".txt"):
        with open(Path + i,"r",encoding="utf-8") as file:
            # for line in file:
            data = file.readlines()
            first_line = data[0]
            second_line = data[1]
            print(f"Original: {second_line}")
            # print(f"FIle: {file}")
            second_line_array = second_line.split(" ")
            for j in range(nb_words_to_replace):
                replacement_position = randrange(len(second_line_array))
                old_word = second_line_array[replacement_position]
                new_word = new_words[randrange(len(new_words))]
                print(f"Position {replacement_position} : {old_word} -> {new_word}")
                second_line_array[replacement_position] = new_word
            res = " ".join(second_line_array)
            print(f"Result: {res}")
        with open(Path + i,"w") as f:
            for line in file:
                if line == second_line:
                    f.write(res)
In short, you have two questions:
1. How to properly replace line number 2 (and 3) of the file.
2. How to keep track of the number of words changed.
How to properly replace line number 2 (and 3) of the file.
Your code:
with open(Path + i,"w") as f:
for line in file:
if line == second_line:
f.write(res)
The file is opened with mode "w", so reading is not enabled, and for line in file will not work. Also, f is defined, but file is used instead. To fix this, do the following instead:
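with open(Path + i, "r+", encoding="utf-8") as file:
    lines = file.read().splitlines() # splitlines() removes the \n characters
    lines[1] = res.rstrip("\n") # the modified second line
    file.seek(0) # rewind; otherwise the write is appended after the old content
    file.truncate() # drop the old content
    file.write("\n".join(lines) + "\n")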
However, you want to add more lines to it. I suggest you structure the logic differently.
How to keep track of number of words changed.
Add a variable changed_words_count and increment it when old_word != new_word.
Resulting code:
for i in filelist:
    filepath = Path + i
    # The lines that will be replacing the file
    new_lines = [""] * 3
    with open(filepath, "r", encoding="utf-8") as file:
        data = file.readlines()
        first_line = data[0]
        second_line = data[1]
        second_line_array = second_line.split(" ")
        changed_words_count = 0
        for j in range(nb_words_to_replace):
            replacement_position = randrange(len(second_line_array))
            old_word = second_line_array[replacement_position]
            new_word = new_words[randrange(len(new_words))]
            # A word replaced does not mean the word has changed.
            # It could be replacing itself.
            # Check if the replacing word is different
            if old_word != new_word:
                changed_words_count += 1
            second_line_array[replacement_position] = new_word
        # Add the lines to the new file lines
        new_lines[0] = first_line
        # The last word carries the line's "\n"; normalize it so the count starts on its own line
        new_lines[1] = " ".join(second_line_array).rstrip("\n") + "\n"
        new_lines[2] = str(changed_words_count)
        print(f"Result: {new_lines[1]}")
    with open(filepath, "w", encoding="utf-8") as file:
        file.writelines(new_lines)
Note: Code not tested.
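The replacement step can also be isolated in a small function, which makes it easier to check on its own. This is only a sketch: replace_random_words and its rng parameter are hypothetical names, and it counts changes by comparing the final words against the original ones, which is a different design choice from incrementing a counter on every substitution (a position replaced twice is counted once here).

```python
import random


def replace_random_words(line, new_words, n_replacements, rng=None):
    """Replace up to n_replacements randomly chosen words; return (new_line, changed_count)."""
    if rng is None:
        rng = random.Random()
    original = line.rstrip("\n").split(" ")
    words = list(original)
    for _ in range(n_replacements):
        position = rng.randrange(len(words))
        words[position] = rng.choice(new_words)
    # Count positions whose final word differs from the original word
    changed = sum(1 for old, new in zip(original, words) if old != new)
    return " ".join(words), changed
```

The returned line has no trailing newline, so the caller decides how to write the three output lines.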
I am having an issue getting the train function to work correctly in Python. I cannot modify the function definition. I am at the point where I need to read the second file, PosList, one line at a time and match the value of movieWordCount[z] against the lines in OpenPos. If the value is there, I need to increment column 2 of that line by one (columns are separated by a space). If it is not, the else branch needs to append it to the end of the file. It does not work: it does not append the value when it is missing, and I am not sure whether it will find the value when it is there. I have been stuck on this for two days.
Here is my code segment I am working with:
with open("PosList") as OpenPos:
    lines = OpenPos.readlines()
    print lines
    if movieWordCount[z] in lines:
        print "found"
        #Now use tokenize to split it apart by space and set to new array for me to call column2
    else:
        print "not found"
        lines.append(movieWordCount[z] + " 1" + "\n")
Here is my full code:
#!/usr/bin/python
#Import Counter
import collections
from collections import Counter
#Was already here but pickle is used for data input and export
import math, os, pickle, re

class Bayes_Classifier:

    def __init__(self, trainDirectory = "movie_reviews/"):
        #If file listing exists skip to train
        if os.path.isfile('iFileList'):
            print "file found"
            self.train()
            #self.classify()
        #If file listing does not exist skip to train
        if not os.path.isfile('iFileList'):
            print "no file"
            newfile = 'iFileList'
            tempList = set()
            subDir = './movie_reviews'
            for filenames in os.listdir(subDir):
                my_sub_path = os.path.join(os.sep,subDir,filenames)
                tempList.add(filenames)
            self.save("filenames", "try3")
            f = []
            for fFileObj in os.walk("movie_reviews/"):
                f.extend(fFileObj)
                break
            pickle.dump(f, open( "save.p", "wb" ))
            self.save(f, "try4")
            with open(newfile, 'wb') as fi:
                pickle.dump(tempList, fi)
                #print tempList
            self.train()
            #self.classify()

    def train(self):
        '''Trains the Naive Bayes Sentiment Classifier.'''
        print "File ready for training"
        #Open iFileList to use as input for opening movie files
        x = 0
        OpenIFileList = open('iFileList','r')
        print "iFileList now Open"
        #Loop through the file
        for line in OpenIFileList:
            #print "Ready to read lines"
            #print "reading line " + line
            if x > 4:
                if x % 2 == 0:
                    #print line
                    s = line
                    if '-' in s:
                        comp = s.split("'")
                        #print comp[2]
                        print comp[1] #This is What you need for the movie file
                        compValue1 = comp[1]
                        #Determine Positive/Negative.
                        #compType is the variable I am storing it to.
                        compType = compValue1.split("-",2)[1]
                        #print compType #Prints that middle value like 5 or 1
                        # This will do the work based on the value.
                        if compType == '5':
                            # print "you have a five" #Confirms the loop I am in.
                            #If file does not exists create it
                            if not os.path.exists('PosList'):
                                print "no file"
                                file('PosList', 'w').close()
                            #Open file that needs to be reviewed for word count
                            compValue2 = "movie_reviews/" + compValue1
                            print compValue2 #Prints the directory and file path
                            OpenMovieList = open(compValue2,'r')
                            for commentLine in OpenMovieList:
                                commentPositive = commentLine.split(" ")
                                commentPositiveCounter = Counter(commentPositive)
                                #print commentPositiveCounter # " Comment Pos goes here"
                                #if commentLine != '' or commentLine != ' ':
                                #Get first word, second word, ....
                                if commentLine and (not commentLine.isspace()):
                                    movieWordCount = self.tokenize(commentLine)
                                    y = len(movieWordCount) #determines length of string
                                    print y
                                    z = 0
                                    #print movieWordCount[0] # Shows the zero position in the file.
                                    while z < y:
                                        print "position " + str(z) + " word is " + movieWordCount[z] # Shows the word we are at and position id
                                        with open("PosList") as OpenPos:
                                            lines = OpenPos.readlines()
                                            print lines
                                            if movieWordCount[z] in lines:
                                                print "found"
                                            else:
                                                print "not found"
                                                lines.append(movieWordCount)
                                        z = z + 1
                            #Close the files
                            OpenMovieList.close()
                            OpenPos.close()
            x += 1
        #for line2 in OpenIFileList.readlines():
        #for line in open('myfile','r').readlines():
        #do_something(line)
        #Save results
        #Close the File List
        OpenIFileList.close()

    def loadFile(self, sFilename):
        '''Given a file name, return the contents of the file as a string.'''
        f = open(sFilename, "r")
        sTxt = f.read()
        f.close()
        return sTxt

    def save(self, dObj, sFilename):
        '''Given an object and a file name, write the object to the file using pickle.'''
        f = open(sFilename, "w")
        p = pickle.Pickler(f)
        p.dump(dObj)
        f.close()

    def load(self, sFilename):
        '''Given a file name, load and return the object stored in the file.'''
        f = open(sFilename, "r")
        u = pickle.Unpickler(f)
        dObj = u.load()
        f.close()
        return dObj

    def tokenize(self, sText):
        '''Given a string of text sText, returns a list of the individual tokens that
        occur in that string (in order).'''
        lTokens = []
        sToken = ""
        for c in sText:
            if re.match("[a-zA-Z0-9]", str(c)) != None or c == "\'" or c == "_" or c == '-':
                sToken += c
            else:
                if sToken != "":
                    lTokens.append(sToken)
                    sToken = ""
                if c.strip() != "":
                    lTokens.append(str(c.strip()))
        if sToken != "":
            lTokens.append(sToken)
        return lTokens
To open a file for writing, you can use
with open('PosList', 'w') as Open_Pos:
As you are using the with form, you do not need to close the file; Python will do that for you at the end of the with-block.
So assuming that the way you add data to the lines variable is correct, you could remove the superfluous code OpenMovieList.close() and OpenPos.close(), and append 2 lines to your code:
with open("PosList") as OpenPos:
    lines = OpenPos.readlines()
    print lines
    if movieWordCount[z] in lines:
        print "found"
    else:
        print "not found"
        lines.append(movieWordCount[z] + " 1" + "\n")
with open("PosList", "w") as OpenPos:
    OpenPos.writelines(lines)  # lines is a list of strings, so use writelines
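One thing the snippet above leaves open is the lookup itself: movieWordCount[z] in lines compares the word against whole lines such as "good 3\n", so a bare word will not match a line that also holds a count. Below is a hedged Python 3 sketch of the "increment or append" step, assuming each line of the file has the form "word count" separated by a single space. The function name bump_word_count is hypothetical, and the original code is Python 2, so this would need adapting.

```python
def bump_word_count(path, word):
    """Add 1 to the count stored next to word, or append 'word 1' if the word is absent."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            lines = f.read().splitlines()
    except FileNotFoundError:
        lines = []  # treat a missing file as an empty word list

    for i, line in enumerate(lines):
        parts = line.split(" ")
        # Compare against the first column only, not the whole line
        if len(parts) == 2 and parts[0] == word:
            lines[i] = f"{word} {int(parts[1]) + 1}"
            break
    else:
        # The loop finished without a match, so the word is new
        lines.append(f"{word} 1")

    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join(lines) + "\n")
```

Rewriting the whole file for every word is simple but slow for large inputs; keeping the counts in a dictionary and writing the file once at the end would avoid that.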
When reading through my files, printing to my console gives me the correct result, but writing to the outfile does not.
with infile as f :
    lines = f.readlines()
    new_line = " "
    for line in lines:
        new_line = ''.join(line).replace('*',letter.upper())
        new_line = new_line.replace(':',letter.lower())
        print(new_line)
This prints out all of the letters that I inputted
with infile as f :
    lines = f.readlines()
    new_line = " "
    for line in lines:
        new_line = ''.join(line).replace('*',letter.upper())
        new_line = new_line.replace(':',letter.lower())
        outfile.write(new_line)
It only gives me the last letter of the word inputted.
folder = r"C:\Users\sarah\Documents\a CPS 111\Bonus PA\stars\stars"
# os.listdir(folder) returns a list of files in folder
file_list = os.listdir(folder)
letter_art = {}
word = str(input("Please input a letter: "))
word = word.upper()
for fname in file_list:
    letter_extension_list = fname.split(".")
for letter in word:
    key = letter
    value = letter_extension_list[1]
    value = "%s."%(key) + value
    letter_art[key] = value
    fname = "\\".join([folder, value])
    infile = open(fname, "r")
    outfile = open("word_art.txt", "w")
    with infile as f :
        lines = f.readlines()
        new_line = " "
        for line in lines:
            new_line = ''.join(line).replace('*',letter.upper())
            new_line = new_line.replace(':',letter.lower())
            print(new_line)
            outfile.write(new_line)
infile.close()
outfile.close()
This is the code I am currently working with. I am taking symbols from a txt file and changing them to the corresponding letter, depending on what the user entered.
Open the output file before the loop instead of within it:
outfile = open("word_art.txt", "w")
for letter in word:
    ...  # read the letter's art file and write the converted lines to outfile
outfile.close()
with open("test.txt",'r') as f :
    lines = f.readlines()
with open('out.txt','w') as outfile:
    for line in lines:
        new_line = line.replace('*',letter.upper())
        new_line = new_line.replace(':',letter.lower())
        outfile.write(new_line)
This worked for me.
EDIT:
TigerhawkT3 is correct. I checked out your full code and you were opening the file again and again inside the loop, each time discarding the prior changes.
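The reason only the last letter survives is that mode "w" truncates the file every time it is opened. A minimal sketch of the "open once, write many" pattern follows. The names write_word_art and templates are hypothetical: templates is an in-memory dictionary standing in for the per-letter art files, so the example does not depend on the folder layout from the question.

```python
def write_word_art(word, templates, outfile):
    """Write the art for every letter of word to a single, already open output file."""
    for letter in word:
        template = templates[letter.upper()]
        for line in template.splitlines(keepends=True):
            # '*' becomes the upper-case letter, ':' the lower-case letter
            line = line.replace("*", letter.upper()).replace(":", letter.lower())
            outfile.write(line)
```

A caller would open the output exactly once, for example with open("word_art.txt", "w") as outfile: write_word_art(word, templates, outfile), so each letter is added after the previous one instead of replacing it.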
def match_text(raw_data_file, concentration):
    file = open(raw_data_file, 'r')
    lines = ""
    print("Testing")
    for num, line in enumerate(file.readlines(), 0):
        w = ' WITH A CONCENTRATION IN ' + concentration
        if re.search(w, line):
            for i in range(0, 6):
                lines += linecache.getline(raw_data_file, num+1)
            try:
                write(lines, "lines.txt")
                print("Lines Data Created...")
            except:
                print("Could not print Line Data")
        else:
            print("Didn't Work")
I am trying to open a .txt file and search for a specific string.
If you are simply trying to write all of the lines that hold your string to a file, this will do.
def match_text(raw_data_file, concentration):
    look_for = ' WITH A CONCENTRATION IN ' + concentration
    with open(raw_data_file) as fin, open('lines.txt', 'w') as fout:
        fout.writelines(line for line in fin if look_for in line)
Fixed my own issue. The following works to find a specific line and get the lines following the matched line.
def match_text(raw_data_file, match_this_text):
    w = match_this_text
    lines = ""
    with open(raw_data_file, 'r') as inF:
        for line in inF:
            if w in line:
                lines += line  # Will add the matched text to the lines string
                for i in range(0, however_many_lines_after_matched_text):
                    lines += next(inF, "")  # "" if the file ends before enough lines are read
    # do something with 'lines', which is final multiline text
    return lines
This will return multiple lines plus the matched string that the user wants. I apologize if the question was confusing.
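The same idea can be written with itertools.islice, which takes the following lines from the open file without an inner loop. This is a sketch under the assumption that the number of trailing lines is passed in as a parameter; lines_after_match and n_after are hypothetical names.

```python
from itertools import islice


def lines_after_match(path, needle, n_after):
    """Return each line containing needle plus the next n_after lines, as one string."""
    collected = []
    with open(path, "r") as fin:
        for line in fin:
            if needle in line:
                collected.append(line)
                # islice consumes up to n_after further lines from the same iterator;
                # it simply stops early if the file ends first
                collected.extend(islice(fin, n_after))
    return "".join(collected)
```

Because the trailing lines are consumed from the same iterator, a second match that falls inside them is not detected as a new match.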