Text to Excel in Python overwrites existing data - python

I have a small script that parses data from a text file into an Excel file. Instead of running until the counter reaches 166, it stops at 134 and then does nothing.
The script also closes both files at the end, but the files never get closed and the script appears to keep running.
Any thoughts? What am I doing wrong?
path = ('C:\\Users\\40081\\PycharmProjects\\abcd')
#file_name = open('parsed_DUT1.txt', 'r')
file_name = 'parsed_DUT1.txt'
count=1
for line in file_name:
    inputfile = open(file_name)
    outputfile = open("parsed_DUT1" + '.xls', 'w+')
    while count < 166:
        for line in inputfile:
            text = "TestNum_" + str(count*1)
            if text in line :
                #data = text[text.find(" ")+1:].split()[0]
                outputfile.writelines(line)
                count = count+1
    inputfile.close()
    outputfile.close()

w+
Opens a file for both writing and reading. Overwrites the existing
file if the file exists. If the file does not exist, creates a new
file for reading and writing.
You are opening the output file in w+ mode, which overwrites it every time the file is opened. Try:
outputfile = open("parsed_DUT1" + '.xls', 'a') # 'a' opens a file for appending.
I also suggest handling files with the with statement, which closes them automatically:
with open(file_name) as inputfile, open("parsed_DUT1" + '.xls', 'a') as outputfile:
    # do stuff with input and output files
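As a rough illustration of the suggestion above, the sketch below reads the input once and writes every matching line inside a single with block, so both files are closed when the block ends. The function name copy_test_lines, the last_test parameter, and the regular expression are assumptions made for this example and are not part of the original script. Note that the regular expression compares the exact test number, whereas the original substring check would also let TestNum_1 match a line containing TestNum_12.

```python
import re


def copy_test_lines(input_path, output_path, last_test=165):
    """Copy lines mentioning TestNum_1 .. TestNum_<last_test>; return the count.

    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r"TestNum_(\d+)")
    written = 0
    # Both files are opened once and closed automatically when the block ends.
    with open(input_path) as inputfile, open(output_path, "w") as outputfile:
        for line in inputfile:
            match = pattern.search(line)
            if match and 1 <= int(match.group(1)) <= last_test:
                outputfile.write(line)
                written += 1
    return written
```

Because the output file is opened only once, mode 'w' is enough here: nothing written earlier in the same run is lost. A call might look like copy_test_lines('parsed_DUT1.txt', 'parsed_DUT1.xls').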

Related

How to write a list into a specific line in a text file using Python?

I have copied a certain part of a .txt file into a list. I need to go to particular line and paste/append it.
file = open(filepath, 'r')
with open(filepath) as f: # SEARCH IF STAGE1 & STAGE2 EXIST OR NOT
    if 'stage1' in f.read():
        print("stage 1")
        data = file.readlines()[11:26]
        print(*data, sep='')
with open(filepath) as f:
    srno = f.readlines()[7:8]
    print(*srno, sep='')
Now that I've copied lines 11-26 and lines 7-8, I want to paste them into the text file above line 5. How would I go about doing that?
I've written this, but it only appends to the end of the text file.
with open(filepath, 'a+') as fa:
    fa.writelines(srno)
    fa.writelines("M0\n\n")
    for i in data:
        fa.writelines(i)
file.close()
I want to write what I've copied in 'data' and 'srno' to line 5 in the text file.
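One common way to do this, sketched here under the assumption that the file is small enough to hold in memory, is to read all lines into a list, splice the new lines in at the wanted position, and write the whole list back. Append mode can only ever add to the end of a file. The function name insert_before_line and its parameters are made up for this example.

```python
def insert_before_line(path, new_lines, line_number):
    """Insert new_lines so that they start at the 1-based line_number.

    Hypothetical helper: it rewrites the whole file, so it suits small files.
    """
    with open(path) as f:
        lines = f.readlines()
    index = max(line_number - 1, 0)
    # Slice assignment inserts without replacing any existing line.
    lines[index:index] = new_lines
    with open(path, "w") as f:
        f.writelines(lines)
```

With the names from the question, the call could look like insert_before_line(filepath, srno + ["M0\n\n"] + data, 5).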

hadoop filesystem open file and skip first line

I'm reading files in HDFS using Python.
Each file has a header, and I'm trying to merge the files. However, the header of each file also gets merged.
Is there a way to skip the header from the second file onward?
hadoop = sc._jvm.org.apache.hadoop
conf = hadoop.conf.Configuration()
fs = hadoop.fs.FileSystem.get(conf)
src_dir = "/mnt/test/"
out_stream = fs.create(hadoop.fs.Path(dst_file), overwrite)
files = []
for f in fs.listStatus(hadoop.fs.Path(src_dir)):
    if f.isFile():
        files.append(f.getPath())
for file in files:
    in_stream = fs.open(file)
    hadoop.io.IOUtils.copyBytes(in_stream, out_stream, conf, False)
I have currently solved the problem with the logic below, but I would like to know whether there is a better, more efficient solution. I'd appreciate your help.
for idx,file in enumerate(files):
    if debug:
        print("Appending file {} into {}".format(file, dst_file))
    # remove header from the second file
    if idx>0:
        file_str = ""
        with open('/'+str(file).replace(':',''),'r+') as f:
            for idx,line in enumerate(f):
                if idx>0:
                    file_str = file_str + line
        with open('/'+str(file).replace(':',''), "w+") as f:
            f.write(file_str)
    in_stream = fs.open(file) # InputStream object and copy the stream
    try:
        hadoop.io.IOUtils.copyBytes(in_stream, out_stream, conf, False) # False means don't close out_stream
    finally:
        in_stream.close()
What you are doing now is appending repeatedly to a string. This is a fairly slow process. Why not write directly to the output file as you are reading?
for file_idx, file in enumerate(files):
    # open the output in append mode ('a') so earlier files are kept
    with open(...) as f_out, open(...) as f_in:
        for line_num, line in enumerate(f_in):
            # keep the header only for the first file
            if file_idx == 0 or line_num > 0:
                f_out.write(line)
If you can load the file all at once, you can also skip the first line by using readline followed by readlines:
for file_idx, file in enumerate(files):
    with open(...) as f_out, open(...) as f_in:
        if file_idx != 0:
            f_in.readline()  # discard the header line
        f_out.writelines(f_in.readlines())
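Put together as something runnable, the idea could look like the sketch below. It uses Python's built-in open on ordinary local paths rather than the Hadoop FileSystem API from the question, and the function name merge_skipping_headers is invented for this example, so treat it as an illustration of the technique only.

```python
def merge_skipping_headers(paths, out_path):
    """Concatenate text files, keeping the header line of the first file only.

    Hypothetical helper working on local files, not on HDFS streams.
    """
    # The output is opened once, outside the loop, so nothing gets overwritten.
    with open(out_path, "w") as f_out:
        for file_idx, path in enumerate(paths):
            with open(path) as f_in:
                if file_idx > 0:
                    next(f_in, None)  # skip the header; safe on empty files
                # A file object is an iterable of lines, so this streams them.
                f_out.writelines(f_in)
```

Passing the file object straight to writelines avoids building one large string or list in memory first.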

How to export text to a new file, userinput?

So I wrote a little program in Python that takes a .csv file, filters out the lines I need, and exports them to a new .txt file.
This worked quite well, so I decided to make it more user-friendly by letting the user choose the file to convert through the console (command line).
My problem: the file is read as a .csv file, but the output is not written to a separate .txt file. Instead, my program overwrites the original file, which ends up empty because of a later step that deletes the first two lines of the output text.
Does anyone know a solution for this?
Thanks :)
import csv
import sys
userinput = raw_input('List:')
saveFile = open(userinput, 'w')
with open(userinput, 'r') as file:
    reader = csv.reader(file)
    count = 0
    for row in reader:
        print(row[2])
        saveFile.write(row[2] + ' ""\n')
saveFile.close()
saveFile = open(userinput, 'r')
data_list = saveFile.readlines()
saveFile.close()
del data_list[1:2]
saveFile = open(userinput, 'w')
saveFile.writelines(data_list)
saveFile.close()
Try This:
userinput = raw_input('List:')
f_extns = userinput.split(".")
saveFile = open(f_extns[0]+'.txt', 'w')
I think you probably just want to save the file under a new name. The question "Extracting extension from filename in Python" covers splitting off the extension, so you can then add your own.
You would end up with something like:
import os
name, ext = os.path.splitext(userinput)
saveFile = open(name + '.txt', 'w')
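To make the difference between the two suggestions concrete, here is a small sketch. os.path.splitext removes only the final extension, while splitting on "." and taking the first piece cuts the name at the first dot. The helper name txt_name_for is an assumption for this example.

```python
import os


def txt_name_for(path):
    """Return path with its final extension replaced by .txt (hypothetical helper)."""
    name, _ext = os.path.splitext(path)
    return name + ".txt"


# splitext keeps earlier dots in the name intact:
#   txt_name_for("data.v2.csv")  gives "data.v2.txt"
# whereas splitting on "." loses everything after the first dot:
#   "data.v2.csv".split(".")[0]  gives "data"
```

If the input already ends in .txt, the result equals the input name, so the output would still overwrite the source file.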
You probably just need to change the extension of the output file. Here is a solution that sets the output file extension to .txt; if the input file is also .txt then there will be a problem, but for all other extensions of the input file this should work.
import csv
import os
file_name = input('Name of file:')
# https://docs.python.org/3/library/os.path.html#os.path.splitext
# https://stackoverflow.com/questions/541390/extracting-extension-from-filename-in-python
file_name, file_ext_r = os.path.splitext(file_name)
file_ext_w = '.txt'
file_name_r = '{}{}'.format(file_name, file_ext_r)
file_name_w = '{}{}'.format(file_name, file_ext_w)
print('File to read:', file_name_r)
print('File to write:', file_name_w)
with open(file_name_r, 'r') as fr, open(file_name_w, 'w') as fw:
    reader = csv.reader(fr)
    for i, row in enumerate(reader):
        print(row[2])
        if i >= 2:
            fw.write(row[2] + ' ""\n')
I also simplified your logic to avoid writing the first 2 lines to the output file; there is no need to read and write the output file again.
Does this work for you?

Code not outputting to correct folder Python

So I have some code that opens a text file containing a list of paths to files, like so:
C:/Users/User/Desktop/mini_mouse/1980
C:/Users/User/Desktop/mini_mouse/1982
C:/Users/User/Desktop/mini_mouse/1984
It then opens these files individually, line-by-line, and does some filtering to the files. I then want it to output the result to a completely different folder called:
output_location = 'C:/Users/User/Desktop/test2/'
As it stands, my code writes the result to the folder the original file was opened from, i.e., if it opens the file C:/Users/User/Desktop/mini_mouse/1980, the output ends up in the same folder under the name '1980_filtered'. I would like the output to go into output_location instead. Can anyone see where I am going wrong? Any help would be greatly appreciated! Here is my code:
import os
def main():
    stop_words_path = 'C:/Users/User/Desktop/NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:/Users/User/Desktop/test2/'
    list_file = 'C:/Users/User/Desktop/list_of_files.txt'
    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name) # joins the new path of the file to the current file in order to access the file
            filestring = '' # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2: # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2: # read file line by line
                    x = remove_stop_words(line, stopwords) # remove stop words from line
                    filestring += x # add newly filtered line to the file string
                    filestring += '\n' # Create new line
            new_file_path = os.path.join(output_location, file_name) + '_filtered' # creates a new file of the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file: # opens output file
                output_file.write(filestring)
if __name__ == "__main__":
    main()
Assuming you're on Windows (your paths look like Windows paths), you can write the pathnames with backslashes, which is the native separator there. Note that Python's file functions also accept forward slashes on Windows, so this change alone may not alter where the output goes. In a normal string literal each backslash has to be doubled, because a single backslash starts an escape sequence. Here is your code with the paths changed:
import os
def main():
    stop_words_path = 'C:\\Users\\User\\Desktop\\NLTK-stop-word-list.txt'
    stopwords = get_stop_words_list(stop_words_path)
    output_location = 'C:\\Users\\User\\Desktop\\test2\\'
    list_file = 'C:\\Users\\User\\Desktop\\list_of_files.txt'
    with open(list_file, 'r') as f:
        for file_name in f:
            #print(file_name)
            if file_name.endswith('\n'):
                file_name = file_name[:-1]
            #print(file_name)
            file_path = os.path.join(file_name) # joins the new path of the file to the current file in order to access the file
            filestring = '' # file string which will take all the lines in the file and add them to itself
            with open(file_path, 'r') as f2: # open the file
                print('just opened ' + file_name)
                print('\n')
                for line in f2: # read file line by line
                    x = remove_stop_words(line, stopwords) # remove stop words from line
                    filestring += x # add newly filtered line to the file string
                    filestring += '\n' # Create new line
            new_file_path = os.path.join(output_location, file_name) + '_filtered' # creates a new file of the file that is currently being filtered of stopwords
            with open(new_file_path, 'a') as output_file: # opens output file
                output_file.write(filestring)
if __name__ == "__main__":
    main()
Based on your code, the issue looks like it is in this line:
new_file_path = os.path.join(output_location, file_name) + '_filtered'
In Python's os.path.join() any absolute path (or drive letter in Windows) in the inputs will discard everything before it and restart the join from the new absolute path (or drive letter). Since you're calling file_name directly from list_of_files.txt and you have each path formatted there relative to the C: drive, each call to os.path.join() is dropping output_location and being reset to the original file path.
See "Why doesn't os.path.join() work in this case?" for a better explanation of this behavior.
When building the output path you could strip the file name, "1980" for instance, from the path "C:/Users/User/Desktop/mini_mouse/1980" and join based on the output_location variable and the isolated file name.
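A minimal sketch of that suggestion follows. It uses ntpath, the Windows flavour of os.path, so it behaves the same on any platform; on Windows itself, os.path gives the same results. The function name filtered_output_path is an assumption for this example.

```python
import ntpath  # Windows implementation of os.path, importable on any platform


def filtered_output_path(output_location, source_path):
    """Build <output_location>/<file name>_filtered (hypothetical helper)."""
    # Keep only the last path component, e.g. "1980".
    file_name = ntpath.basename(source_path)
    return ntpath.join(output_location, file_name) + "_filtered"


source = "C:/Users/User/Desktop/mini_mouse/1980"
output_location = "C:/Users/User/Desktop/test2/"

# Joining with the full source path discards output_location entirely,
# because the second argument is itself an absolute path:
discarded = ntpath.join(output_location, source)

# Joining with just the file name keeps the output folder:
kept = filtered_output_path(output_location, source)
```

Here discarded is the original source path again, while kept points inside the test2 folder.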

Why is it only writing last input to txt?

Output:
Sorry, it was awfully awkward trying to paste my Python code into the code box on this forum post.
Code:
# update three quotes to a file
file_name = "my_quote.txt"
# create a file called my_quote.txt
new_file = open(file_name, 'w')
new_file.close()
def update_file(file_name, quote):
    # First open the file
    new_file = open(file_name, 'w')
    new_file.write("This is an update\n")
    new_file.write(quote)
    new_file.write("\n\n")
    # now close the file
    new_file.close()
for index in range(3):
    quote = input("Enter your favorite quote: ")
    update_file(file_name, quote)
# Now print the contents to the screen
new_file = open(file_name, 'r')
print(new_file.read())
# And finally close the file
new_file.close()
You should be using append instead of write. When you use write, it creates a new file regardless of what was there before. Try new_file = open(file_name, 'a')
Why is it only writing last input to txt?
Every time you call open(file_name, 'w'), it clears the contents of the file and starts writing from the beginning of the file.
If you would like to append new content to that file, use:
open(file_name, 'a')
I guess you should use 'a' instead of 'w' to append to the file:
new_file = open(file_name, 'a')
And read the docs before asking of course ;)
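For illustration, here is a sketch of the update function rewritten with append mode and a with statement. The function name append_quote is an assumption for this example; the text it writes mirrors the question's update_file.

```python
def append_quote(file_name, quote):
    """Add one quote to the end of the file without erasing earlier ones."""
    # Mode 'a' creates the file if it is missing and always writes at the end.
    with open(file_name, "a") as new_file:
        new_file.write("This is an update\n")
        new_file.write(quote)
        new_file.write("\n\n")
```

Calling it three times leaves all three quotes in the file, whereas mode 'w' would keep only the last one.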
