Write a program to merge files and convert to .csv in Python?

I have written a program in Python that does the following:
write an initial header to a new file
merge the files into the new file (i.e. append each file to the new file; I want all my log files to be put together)
finally, convert the space-separated file to CSV.
What I do is specify the output directory where my file should end up, and also a filelist, which contains the location of each file that should be merged. It looks like this:
/Users/ra/Documents/Dryad01/meow.log
/Users/ra/Documents/Dryad01/meow1.log
Then I do python program.py path_to_list_file output_dir
Here is my program:
import argparse
import csv

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("filelist", help="Format: Value File in each line")
    parser.add_argument("output_dir", help="output directory")
    args = parser.parse_args()
    # write header
    fout = open(args.output_dir+"merged.txt","a")
    fout.write("timestamp type response_time")
    # from each file get the data and put it in fout/merge
    with open(args.filelist) as f:
        for file in f:
            file_read = open(file)
            for line in file_read:
                fout.write(line)
    fout.close()
    # now all files in filelist have been merged
    # next make them into csv files
    make_csv(args.output_dir+"merged.txt", args.output_dir+"merged_csv.csv")

def make_csv(file1, file2):
    with open(file1) as fin, open(file2, 'w') as fout:
        o = csv.writer(fout)
        for line in fin:
            o.writerow(line.split())
But for some reason I get no error and no warning, just no file!
What do you think is wrong?

Did you maybe forget the following main pattern at the end of your source file?
if __name__ == '__main__':
    main()
If so, your main() function will never be called.
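For reference, here is a rough sketch of how the overall layout of program.py would look with that guard in place (function bodies elided; the names match the question's code):
import argparse
import csv

def main():
    ...  # parse arguments, merge the log files, then call make_csv()

def make_csv(file1, file2):
    ...  # split each space-separated line and write it with csv.writer

if __name__ == '__main__':
    main()  # without this call at module level, nothing ever runs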

Related

Unable to find string in text file

I am trying to simply check whether a string exists in a text file, but I am having issues. I am assuming it's something wrong on one line, but I am boggled.
def extract(mPath, frequency):
    if not os.path.exists('history.db'):
        f = open("history.db", "w+")
        f.close()
    for cFile in fileList:
        with open('history.db', "a+") as f:
            if cFile in f.read():
                print("File found - skip")
            else:
                #with ZipFile(cFile, 'r') as zip_ref:
                #    zip_ref.extractall(mPath)
                print("File Not Found")
                f.writelines(cFile + "\n")
                print(cFile)
Output:
File Not Found
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\Test1.zip
File Not Found
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\test2.zip
Text within the history.db file:
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\Test1.zip
C:\Users\jefhill\Desktop\Python Stuff\Projects\autoExtract\test2.zip
What am I missing? Thanks in advance
Note: cFile is the file path shown in the output, and fileList is the list containing both of the paths shown in the output.
You're using the wrong flags for what you want to do. open(file, 'a') opens a file for append-writing, meaning that it seeks to the end of the file. Adding the + modifier means that you can also read from the file, but you're doing so from the end of the file; so read() returns nothing, because there's nothing beyond the end of the file.
You can use r+ to read from the start of the file while having the option of writing to it. But keep in mind that anytime you write you'll be writing to the reader's current position in the file.
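For illustration, a small hypothetical demo of that behaviour (it assumes history.db already exists and has some content):
# In 'a+' mode the stream position starts at the end of the file,
# so the first read() returns an empty string until we seek back.
with open('history.db', 'a+') as f:
    print(repr(f.read()))  # ''
    f.seek(0)              # jump back to the start of the file
    print(repr(f.read()))  # now the existing contents are returned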
I haven't tested the code but this should put you on the right track!
def extract(mPath, frequency):
    if not os.path.exists('history.db'):
        f = open("history.db", "w+")
        f.close()
    with open('history.db', "r") as f:  # "r", not "rb", so the lines are strings
        data = f.readlines()
    for line in data:
        if line.rstrip() in fileList:  # assuming fileList is a list of strings
            pass  # do everything else here

How to amend an existing python file

I am trying to amend a group of files in a folder by adding F to the 4th line (which is index 3 in Python, if I'm correct). With the code below, the script just keeps running and never makes the amendments. Anyone got any ideas?
import os
from glob import glob

list_of_files = glob('*.gjf')  # get list of all .gjf files
for file in list_of_files:
    # read file:
    with open(file, 'r+') as f:
        lines = f.readlines()
        for line in lines:
            lines.insert(3, 'F')
for file in list_of_files:
    # read your file's lines
    with open(file, 'r') as f:
        lines = f.readlines()
    # add the value to insert in the list of lines at index 3
    # don't forget the line-break (\n)
    lines.insert(3, 'F\n')
    # join the lines to create a single string
    text = ''.join(lines)
    # don't forget to write the string back to your file
    with open(file, 'w') as f:
        f.write(text)
You are not able to edit a file in place with Python this way. You would need to write to a temporary file and then do some cleanup at the end by renaming the temporary file over the original. The core logic is to read through the original file and add an F to the 4th line, otherwise just copy the line as-is. This can be done with a function like this:
import os

def add_f_4th_line(filename):
    with open(filename, 'r') as f_in:
        with open('temp', 'w') as f_out:
            for line_number, line_contents in enumerate(f_in):
                if line_number == 3:
                    f_out.write(line_contents.replace('\n', 'F\n'))
                else:
                    f_out.write(line_contents)
    os.rename('temp', filename)  # replace the original with the amended copy
enumerate makes an iterator of the line number and the contents of each line, so just iterate over the lines and you can find the 4th line with line_number == 3. Then it takes that line and replaces \n with F\n. \n is the newline character, so I'm assuming each line ends with it. After adding this function, you just need to call it for each file you get from your glob call.
import os
from glob import glob

list_of_files = glob('*.gjf')  # get list of all .gjf files
for file in list_of_files:
    add_f_4th_line(file)
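As a quick sanity check of the indexing (hypothetical data, not from the question): enumerate counts from 0, so the 4th line is the one with index 3.
# enumerate numbers lines from 0, so the 4th line has line_number == 3.
sample = ["line1\n", "line2\n", "line3\n", "line4\n"]
for line_number, line_contents in enumerate(sample):
    if line_number == 3:
        print(line_contents.replace('\n', 'F\n'), end='')  # prints "line4F"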

How to open a file using strings from another file in Python?

I have 1000 files whose names are numbers, for example 2323.csv.
I have these names in a file called 1.txt.
Now I want to open these files one by one in Python, using 1.txt to find them.
How can I do this?
Why not this?
with open('1.txt', 'r') as listFile:
    for line in listFile:
        with open(line.rstrip(), 'r') as individualFile:
            pass  # do stuff
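If it helps, here is one hypothetical way the # do stuff placeholder could be filled in, using the csv module and assuming 1.txt lists complete file names such as 2323.csv:
import csv

# Read every CSV named in 1.txt and print its rows.
with open('1.txt', 'r') as listFile:
    for line in listFile:
        name = line.strip()
        if not name:          # skip blank lines
            continue
        with open(name, 'r', newline='') as individualFile:
            for row in csv.reader(individualFile):
                print(row)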
Rough and very basic but understandable code (no error handling):
with open('1.txt', 'r') as f:
    for line in f.readlines():  # this assumes each line holds just a number
        with open('.'.join([line.strip(), 'csv'])) as cf:
            file_content = cf.readlines()
            print(file_content)

Search and replace a string in a number of .txt files in Python

There are multiple files in a directory with extensions .txt, .dox, .qcr, etc.
I need to list out the txt files and search & replace text in the txt files only.
I need to search for $$\d, where \d stands for a digit 1, 2, 3, ..., 100.
It needs to be replaced with xxx.
Please let me know a Python script for this.
Thanks in advance.
-Shrinivas
#I created the following script; it works for a single txt file, but it is not working when more than one txt file lies in the directory.
-----
import fileinput
import glob
import sys

def replaceAll(file, searchExp, replaceExp):
    for line in fileinput.input(file, inplace=1):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        sys.stdout.write(line)

# following code is not working; I expect it to list out the files starting
# with "um_*.txt", open each file & replace the "$$\d" with the replaceAll function.
for um_file in glob.glob('*.txt'):
    t = open(um_file, 'r')
    replaceAll("t.read", "$$\d", "xxx")
    t.close()
fileinput.input(...) is designed to process a batch of files, and must be ended with a corresponding fileinput.close(). So you can either process all the files in one single call:
def replaceAll(file, searchExp, replaceExp):
    for line in fileinput.input(file, inplace=True):
        if searchExp in line:
            line = line.replace(searchExp, replaceExp)
        dummy = sys.stdout.write(line)  # assign to discard write()'s return value
    fileinput.close()  # to orderly close everything

replaceAll(glob.glob('*.txt'), "$$\d", "xxx")
or consistently close fileinput after processing each file, though that rather bypasses fileinput's main feature of treating several files as one stream; a sketch of that variant follows.
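A minimal, untested sketch of that per-file variant, reusing the replaceAll function defined above and calling it once per file (each call ends with its own fileinput.close()):
import glob

# Per-file variant: one fileinput session per file instead of one for all files.
for um_file in glob.glob('*.txt'):
    replaceAll(um_file, "$$\d", "xxx")  # same literal search string as above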
Try this out.
import glob
import re
import sys

def replaceAll(file, searchExp, replaceExp):
    for line in file.readlines():
        try:
            line = line.replace(re.findall(searchExp, line)[0], replaceExp)
        except IndexError:  # no match on this line
            pass
        sys.stdout.write(line)

# list the txt files, open each one and run replaceAll on its file handle
for um_file in glob.glob('*.txt'):
    t = open(um_file, 'r')
    replaceAll(t, r"\d+", "xxx")
    t.close()
Here we are passing a file handle to the replaceAll function rather than a string.
You can try this:
import os
import re

# build full paths so open() works regardless of the current directory
the_files = [os.path.join("foldername", i) for i in os.listdir("foldername") if i.endswith("txt")]
for file in the_files:
    new_data = re.sub(r"\d+", "xxx", open(file).read())
    final_file = open(file, 'w')
    final_file.write(new_data)
    final_file.close()
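One detail none of the snippets above handle: $ is a special character in regular expressions, so matching a literal $$ followed by digits needs escaping. A minimal sketch (the example string is made up):
import re

text = "value $$12 and $$3 change, $5 does not match"
print(re.sub(r"\$\$\d+", "xxx", text))
# -> "value xxx and xxx change, $5 does not match"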

Why does my script write every input string twice to the output file?

When I look at what was written, it's always doubled. For example, if I write 'dog' I'll get 'dogdog'. Why?
I am reading and writing to a file, with the filename taken from the command line arguments:
from sys import argv
script, text = argv

def reading(f):
    print f.read()

def writing(f):
    print f.write(line)

# opening file
filename = open(text)
reading(filename)
filename.close()

filename = open(text, 'w')
line = raw_input()
filename.write(line)
writing(filename)
filename.close()
As I said, the output I am getting is double the input I am giving.
You are getting the doubled value because you are writing twice:
1) from the function call
def writing(f):
    print f.write(line)
2) by writing to the file directly with filename.write(line)
Use this code:
from sys import argv
script, text = argv

def reading(f):
    print f.read()

def writing(f):
    print f.write(line)

filename = open(text, 'w')
line = raw_input()
writing(filename)
filename.close()
Also, there is no need to close the file twice; once you have finished all the read and write operations, just close it once.
If you want to display each line and then write a new line, you should probably just read the entire file first, and then loop over the lines while writing the new content.
Here's how you can do it. When you use with open(), you don't have to call close() on the file, since that's done automatically.
from sys import argv

filename = argv[1]

# first read the file content
with open(filename, 'r') as fp:
    lines = fp.readlines()
# `lines` is now a list of strings.

# then open the file for writing.
# This will empty the file so we can write from the start.
with open(filename, 'w') as fp:
    # by using enumerate, we can get the line numbers as well.
    for index, line in enumerate(lines, 1):
        print 'line %d of %d:\n%s' % (index, len(lines), line.rstrip())
        new_line = raw_input()
        fp.write(new_line + '\n')
