Combine files as handy as possible - python

Suppose I have the following markdown files:
1.md # contains 'foo'
2.md # contains 'bar'
3.md # contains 'zoo'
4.md # contains 'zxc'
They are easy to merge with the cat command:
$ cat {1..4}.md > merged_5.md
Nevertheless, Python takes several steps to achieve the same result.
Create read and write helpers:
def read(filename):
    with open(filename) as file:
        content = file.read()
    return content

def write(filename, content):
    with open(filename, 'w') as file:
        file.write(content)
Retrieve the matching files:
import glob
filenames = glob.glob('*.md')
In [17]: filenames
Out[17]: ['1.md', '2.md', '3.md', '4.md']
Read and combine:
def combine(filenames):
    merged_content = ""
    for filename in filenames:
        content = read(filename)
        merged_content += content
    write('merged.md', merged_content)
Encapsulate the data and methods in a main module and save it as 'combine_files.py':
def main():
    filenames = glob.glob('*.md')
    combine(filenames)

if __name__ == '__main__':
    main()
Run it on the command line:
python3 combine_files.py
It's not as handy as the 'cat' command.
How can the code be refactored to be as handy as possible?

How about something like this?
with open('merged.md', 'w') as out_f:
    for filename in glob.glob('*.md'):
        with open(filename) as f:
            out_f.write(f.read())

How about just doing it the easy way:
def cat(out, *src):
    '''Concatenate files'''
    with open(out, 'wb') as f:
        data = b'\n'.join(open(i, 'rb').read() for i in src)
        f.write(data)
You may now call it with cat('merged.md', *glob.glob('*.md')) (note the * that unpacks the list into separate arguments). How's that for handy? Certainly much easier than the source of GNU Coreutils.
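If the files are large, a streaming variant avoids holding everything in memory at once. This is a sketch using shutil.copyfileobj; the name cat_stream is my own:

```python
import glob
import shutil

def cat_stream(out, *src):
    """Concatenate files by streaming them in chunks."""
    with open(out, 'wb') as dst:
        for name in src:
            with open(name, 'rb') as f:
                shutil.copyfileobj(f, dst)  # copies in fixed-size chunks

# Same unpacking trick as above:
# cat_stream('merged.md', *sorted(glob.glob('*.md')))
```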

Related

How to read pickled files using command line in Python?

I have 2 pickled files on the computer and I want to load them into my script. Would this code I wrote work in Python?
import sys
import pickle

filename1 = sys.argv(1)
filename2 = sys.argv(2)

def read_file(filename1,filename2):
    with open(filename1, 'rb') as file1:
        file1=pickle.load()
with open(filename1, 'rb') as file2:
    file2=pickle.load()
return file1
return file2
sys.exit()
As mentioned here:
# Load the dictionary back from the pickle file.
import pickle

favorite_color = pickle.load( open( "save.p", "rb" ) )
# favorite_color is now { "lion": "yellow", "kitty": "red" }
So in your case you can do:
def read_file(filename1, filename2):
    with open(filename1, 'rb') as file1:
        f1 = pickle.load(file1)
    with open(filename2, 'rb') as file2:
        f2 = pickle.load(file2)
    return f1, f2
There are a few things that don't quite work here:
- Your second with-clause is not inside your function, and neither are your returns.
- A function can only return once. When Python reaches the first return inside a function, it exits that function. But you can return multiple things at once, e.g. with return file1, file2.
- Your pickle loading is the other way around. As written, you redefine your file1 and file2 variables with whatever your call of pickle.load() (notice that it didn't get any arguments) would give you. What you want is to pass the file you opened to pickle.load and then save its output to a new variable.
- You never actually called your read_file function.
- Your indentation seems to be off.
- sys.argv is a list, so you can't call it (round parentheses, like sys.argv(1)), but you can index it (square brackets, sys.argv[1]).
import sys
import pickle

def read_file(filename1, filename2):
    with open(filename1, 'rb') as file1:
        data1 = pickle.load(file1)
    with open(filename2, 'rb') as file2:
        data2 = pickle.load(file2)
    return data1, data2

filename1 = sys.argv[1]
filename2 = sys.argv[2]
your_data = read_file(filename1, filename2)
print(your_data)  # Now you use your data; I used print as an example
sys.exit()  # You usually only write that if you specifically need to
Further suggestion: you don't necessarily have to make one function handle two files; you can also make one function handle one file and call it twice.
import sys
import pickle

def read_file(filename):
    with open(filename, 'rb') as file1:
        data = pickle.load(file1)
    return data

filename1 = sys.argv[1]
filename2 = sys.argv[2]
your_data = read_file(filename1), read_file(filename2)
print(your_data)
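For completeness, here is a minimal round-trip sketch showing that pickle.dump and pickle.load are symmetric; the helper names and the file name are illustrative:

```python
import pickle

def write_pickle(filename, obj):
    # Pickle data is binary, so the file must be opened in 'wb' mode.
    with open(filename, 'wb') as f:
        pickle.dump(obj, f)

def read_pickle(filename):
    with open(filename, 'rb') as f:
        return pickle.load(f)

# write_pickle('save.p', {"lion": "yellow"})
# read_pickle('save.p')  # -> {"lion": "yellow"}
```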

How to add for loop in python?

I'm creating new files from originally existing ones in the mdp folder by changing a couple of lines in those files using Python. I need to do this for 1000 files. Can anyone suggest a for loop that reads all the files, changes them, and creates the new ones in one go?
As it stands, I have to change the number following 'md_' in the path by hand, which is tedious because there are 1000 files.
I tried using str() but got a 'could not read file' error.
fin = open("/home/abc/xyz/mdp/md_1.mdp", "rt")
fout = open("/home/abc/xyz/middle/md_1.mdp", "wt")
for line in fin:
    fout.write(line.replace('integrator = md', 'integrator = md-vv'))
fin = open("/home/abc/xyz/middle/md_1.mdp", "rt")
fout = open("/home/abc/xyz/mdb/md_1.mdp", "wt")
for line in fin:
    fout.write(line.replace('dt = 0.001', 'dt = -0.001'))
fin.close()
fout.close()
os.listdir(path) is your friend:
import os

sourcedir = "/home/abc/xyz/mdp"
destdir = "/home/abc/xyz/middle"

for filename in os.listdir(sourcedir):
    if not filename.endswith(".mdp"):
        continue
    source = os.path.join(sourcedir, filename)
    dest = os.path.join(destdir, filename)
    # 'with open(xxx) as varname' makes sure the file(s)
    # will be closed whatever happens in the 'with' block
    # NB text mode is the default, and so is read mode
    with open(source) as fin, open(dest, "w") as fout:
        # Python files are iterable... avoids reading
        # the whole file into memory at once
        for line in fin:
            # will only work for those exact strings;
            # you may want to use regexps if the number of
            # whitespaces varies etc.
            line = line.replace("dt = 0.001", "dt = -0.001")
            line = line.replace(
                'integrator = md',
                'integrator = md-vv'
            )
            fout.write(line)
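As the comment above warns, exact-string replacement silently misses lines whose spacing differs. A hedged sketch with the re module (the patterns are my assumption about how the .mdp lines look) that tolerates variable whitespace around '=':

```python
import re

# Capture the key and '=' with whatever spacing they have,
# then replace only the value part.
DT_RE = re.compile(r'^(\s*dt\s*=\s*)0\.001\b')
INTEGRATOR_RE = re.compile(r'^(\s*integrator\s*=\s*)md\b')

def rewrite_line(line):
    line = DT_RE.sub(r'\g<1>-0.001', line)
    line = INTEGRATOR_RE.sub(r'\g<1>md-vv', line)
    return line
```

rewrite_line would replace line.replace(...) calls in the loop above; lines that match neither pattern pass through unchanged.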
Assuming you want to edit all files that are located in the mdp folder, you could do something like this:
import os

dir = "/home/abc/xyz/mdp/"
for filename in os.listdir(dir):
    with open(dir + filename, "r+") as file:
        text = file.read()
        text = text.replace("dt = 0.001", "dt = -0.001")
        file.seek(0)
        file.write(text)
        file.truncate()
This will go through every file and change it using str.replace().
If there are other files in the mdp folder that you do not want to edit, you could use an if-statement to check for the correct file name. Add something like this around the with open statement:
if filename.startswith("md_"):

Save file after converting from pdf to txt

I have converted some pdf files to txt format so they can be read. However, how do I save the result after converting? I was trying to use file.write('file1'), but it doesn't seem to work.
file1 = pdf_to_txt("important_file_1.pdf")
file2 = pdf_to_txt("important_file_2.pdf")
Thank you for the help.
You need to open a new file with write mode:
file1 = pdf_to_txt("important_file_1.pdf")
f = open('pdf_to_text.txt', 'w')
f.write(file1)
f.close()
To make it reusable, do something like this:
import time

def save_pdf_to_text(file_to_save, filename=None):
    if not filename:
        timestr = time.strftime("%Y-%m-%d-%H-%M-%S")
        filename = '{}.txt'.format(timestr)
    with open(filename, 'w') as f:
        f.write(file_to_save)
Usage:
file1 = pdf_to_txt("important_file_1.pdf")
save_pdf_to_text(file1)
It will create a file named with a timestamp, or you can pass a filename as the second argument.
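If only the save step is needed, pathlib condenses open/write/close into one call. A sketch; pdf_to_txt is the asker's own function and is not defined here:

```python
from pathlib import Path

def save_text(text, filename):
    # write_text opens the file in text write mode, writes, and closes it.
    Path(filename).write_text(text)

# save_text(pdf_to_txt("important_file_1.pdf"), "important_file_1.txt")
```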

Looping multiple files into a single csv file in python

I am trying to process several files into a single, merged csv file using Python. So far, I have:
files = ["file1.txt", "file2.txt", "file3.txt"]

def doSomething(oldfile):
    content = []
    with open oldfile as file:
        content = file.read().splitlines()
        file.close()
    return content.reverse()

with open("newfile.txt", "w") as file:
    w = csv.writer(file, dialect = "excel-tab")
    for i in range(0, len(files)):
        w. writerows(doSomething(files[i])
    file.close()
The new file is being created, but there is nothing in it. I am curious about what is going on.
Thanks!
For starters, list.reverse() reverses the list in place and doesn't return anything, so you're essentially returning None from your doSomething() function. You'll actually want to split that into two lines:
content.reverse()
return content
If you want to streamline your code, here's a suggestion:
import csv

def doSomething(oldfile):
    with open(oldfile, "r") as f:
        return reversed(f.read().splitlines())

files = ["file1.txt", "file2.txt", "file3.txt"]
with open("newfile.txt", "w", newline='') as file:
    w = csv.writer(file, dialect="excel-tab")
    for current_file in files:
        w.writerows(doSomething(current_file))
I think your program crashes for several reasons:
- open(..) is a function, so you cannot write:
with open oldfile as file:
- a with statement for files is used to enforce closing of a file, so file.close() is actually not necessary.
- .reverse() works in place: it returns None. You can use reversed(..) instead.
You can fix it with:
import csv

files = ["file1.txt", "file2.txt", "file3.txt"]

def doSomething(oldfile):
    with open(oldfile, 'r') as file:
        return list(reversed(file.readlines()))

with open("newfile.txt", "w", newline='') as file:
    w = csv.writer(file, dialect="excel-tab")
    for oldfile in files:
        w.writerows(doSomething(oldfile))
I also used a for loop over the list, instead of the indices, since that is more "pythonic". Furthermore, a file's lines can be fetched with readlines(), so reversed(file.readlines()) gives the lines of the file in reverse (a bare file object cannot be passed to reversed()).
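One caveat with both answers: csv.writerows expects each row to be a sequence of fields, so passing plain strings makes it write every character as its own column. If the input files are themselves tab-separated, a sketch that parses them into real rows first (the helper names are illustrative):

```python
import csv

def reversed_rows(filename):
    """Parse a tab-separated file and return its rows in reverse order."""
    with open(filename, newline='') as f:
        return list(csv.reader(f, dialect='excel-tab'))[::-1]

def merge_reversed(filenames, outname):
    # Each row is now a list of fields, which is what writerows expects.
    with open(outname, 'w', newline='') as out:
        writer = csv.writer(out, dialect='excel-tab')
        for name in filenames:
            writer.writerows(reversed_rows(name))
```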

File append in python

I have n files in the location /root as follows
result1.txt
abc
def
result2.txt
abc
def
result3.txt
abc
def
and so on.
I must create a consolidated file called result.txt with the values from all the result files concatenated, looping through the n files in the location /root/samplepath.
It may be easier to use cat, as others have suggested. If you must do it with Python, this should work. It finds all of the text files in the directory and appends their contents to the result file.
import glob, os

os.chdir('/root')
with open('result.txt', 'w+') as result_file:
    for filename in glob.glob('result*.txt'):
        with open(filename) as file:
            result_file.write(file.read())
        # append a line break if you want to separate them
        result_file.write("\n")
That could be an easy way of doing so.
Let's say, for example, that my script.py is in a folder, and along with that script there is a folder called testing containing all the text files, named file_0, file_1, ...
import os

# reads all the files and puts everything in data
number_of_files = 3  # set this to how many files you actually have
data = []
for i in range(number_of_files):
    fn = os.path.join(os.path.dirname(__file__), 'testing/file_%d.txt' % i)
    f = open(fn, 'r')
    for line in f:
        data.append(line)
    f.close()

# write everything to result.txt
fn = os.path.join(os.path.dirname(__file__), 'result.txt')
f = open(fn, 'w')
for element in data:
    f.write(element)
f.close()
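A hedged alternative sketch that avoids hard-coding the file count: discover the files with glob and sort them by their numeric suffix so file_10 comes after file_2 (the paths and naming scheme follow the example above):

```python
import glob
import os
import re

def consolidate(folder, outname):
    """Concatenate every file_<n>.txt in folder into outname, in numeric order."""
    def index(path):
        # Pull the number out of 'file_<n>.txt' for numeric sorting.
        m = re.search(r'file_(\d+)\.txt$', path)
        return int(m.group(1)) if m else -1

    names = sorted(glob.glob(os.path.join(folder, 'file_*.txt')), key=index)
    with open(outname, 'w') as out:
        for name in names:
            with open(name) as f:
                out.write(f.read())
```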
