I have converted some PDF files to txt format. However, how do I save the result after converting? I tried using file.write('file1'), but it does not seem to work.
file1 = pdf_to_txt("important_file_1.pdf")
file2 = pdf_to_txt("important_file_2.pdf")
Thank you for the help.
You need to open a new file in write mode:
file1 = pdf_to_txt("important_file_1.pdf")
f = open('pdf_to_text.txt', 'w')
f.write(file1)
f.close()
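Alternatively, a with block closes the file for you even if something goes wrong in between (a small variant of the same idea; pdf_to_txt is your existing conversion function):

file1 = pdf_to_txt("important_file_1.pdf")
with open('pdf_to_text.txt', 'w') as f:
    f.write(file1)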
To make it reusable do something like this:
import time

def save_pdf_to_text(file_to_save, filename=None):
    if not filename:
        timestr = time.strftime("%Y-%m-%d-%H-%M-%S")
        filename = '{}.txt'.format(timestr)
    with open(filename, 'w') as f:
        f.write(file_to_save)
Usage:
file1 = pdf_to_txt("important_file_1.pdf")
save_pdf_to_text(file1)
It will create a file named with a timestamp, or you can pass a filename as the second argument.
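For example, passing the name yourself (the output filename here is just an illustration):

file1 = pdf_to_txt("important_file_1.pdf")
save_pdf_to_text(file1, filename='important_file_1.txt')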
I'm creating new files from existing ones in the mdp folder by changing a couple of lines in those files using Python. I need to do this for 1000 files. Can anyone suggest a for loop which reads all the files, changes them, and creates the new ones in one go?
The way I do it now, I have to change the number after 'md_' in the path by hand, which is tedious because there are 1000 files.
I tried using str() but got a 'could not read file' error.
fin = open("/home/abc/xyz/mdp/md_1.mdp", "rt")
fout = open("/home/abc/xyz/middle/md_1.mdp", "wt")
for line in fin:
fout.write(line.replace('integrator = md', 'integrator
= md-vv'))
fin = open("/home/abc/xyz/middle/md_1.mdp", "rt")
fout = open("/home/abc/xyz/mdb/md_1.mdp", "wt")
for line in fin:
fout.write(line.replace('dt = 0.001', 'dt
= -0.001'))
fin.close()
fout.close()
os.listdir(path) is your friend:
import os

sourcedir = "/home/abc/xyz/mdp"
destdir = "/home/abc/xyz/middle"

for filename in os.listdir(sourcedir):
    if not filename.endswith(".mdp"):
        continue
    source = os.path.join(sourcedir, filename)
    dest = os.path.join(destdir, filename)
    # 'with open(xxx) as varname' makes sure the file(s)
    # will be closed whatever happens in the 'with' block
    # NB text mode is the default, and so is read mode
    with open(source) as fin, open(dest, "w") as fout:
        # python files are iterable... avoids reading
        # the whole file in memory at once
        for line in fin:
            # will only work for those exact strings,
            # you may want to use regexps if the number of
            # whitespaces varies etc
            line = line.replace("dt = 0.001", "dt = -0.001")
            line = line.replace(
                'integrator = md',
                'integrator = md-vv'
            )
            fout.write(line)
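As the comments say, replace() only matches those exact strings. If the amount of whitespace around '=' varies between files, a regex substitution is more robust. A minimal sketch, assuming the .mdp lines look like the ones in the question (fix_line is just a helper name I picked; you could call it in place of the two replace() calls above):

import re

def fix_line(line):
    # tolerant of varying whitespace around '=' (e.g. "dt   = 0.001")
    line = re.sub(r"integrator\s*=\s*md\b", "integrator = md-vv", line)
    line = re.sub(r"dt\s*=\s*0\.001\b", "dt = -0.001", line)
    return line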
Assuming you want to edit all the files located in the mdp folder, you could do something like this.
import os

dir = "/home/abc/xyz/mdp/"

for filename in os.listdir(dir):
    with open(dir + filename, "r+") as file:
        text = file.read()
        text = text.replace("dt = 0.001", "dt = -0.001")
        file.seek(0)
        file.write(text)
        file.truncate()
This will go through every file and change it using str.replace().
If there are other files in the mdp folder that you do not want to edit, you could use an if-statement to check for the correct file name. Wrap the with open statement in something like this:
if filename.startswith("md_"):
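For example, a sketch reusing the loop above (the "md_" prefix is just the pattern from the question):

import os

dir = "/home/abc/xyz/mdp/"

for filename in os.listdir(dir):
    if filename.startswith("md_"):
        with open(dir + filename, "r+") as file:
            text = file.read()
            text = text.replace("dt = 0.001", "dt = -0.001")
            file.seek(0)
            file.write(text)
            file.truncate()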
So I wrote a little program in Python which allows me to take a .csv file, filter out the lines I need, and then export these into a new .txt file.
This worked quite well, so I decided to make it more user friendly by letting the user select the file to be converted through the console (command line).
My problem: the file is imported as a .csv file but not exported as a .txt file, so my program overwrites the original file, which then gets emptied by the step in my program that deletes the first two lines of the output text.
Does anyone know a solution for this?
Thanks :)
import csv
import sys

userinput = raw_input('List:')
saveFile = open(userinput, 'w')

with open(userinput, 'r') as file:
    reader = csv.reader(file)
    count = 0
    for row in reader:
        print(row[2])
        saveFile.write(row[2] + ' ""\n')

saveFile.close()

saveFile = open(userinput, 'r')
data_list = saveFile.readlines()
saveFile.close()

del data_list[1:2]

saveFile = open(userinput, 'w')
saveFile.writelines(data_list)
saveFile.close()
Try This:
userinput = raw_input('List:')
f_extns = userinput.split(".")
saveFile = open(f_extns[0]+'.txt', 'w')
I think you probably just want to save the file with a new name. Extracting extension from filename in Python talks about splitting out the extension, so you can then add your own.
You would end up with something like:
name, ext = os.path.splitext(userinput)
saveFile = open(name + '.txt', 'w')
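For instance, with a hypothetical input of data.csv, splitext gives ('data', '.csv') and the output goes to data.txt:

import os

userinput = 'data.csv'                   # hypothetical example input
name, ext = os.path.splitext(userinput)  # ('data', '.csv')
saveFile = open(name + '.txt', 'w')      # writes to data.txt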
You probably just need to change the extension of the output file. Here is a solution that sets the output file extension to .txt; if the input file is also .txt then there will be a problem, but for all other extensions of the input file this should work.
import csv
import os

file_name = input('Name of file:')

# https://docs.python.org/3/library/os.path.html#os.path.splitext
# https://stackoverflow.com/questions/541390/extracting-extension-from-filename-in-python
file_name, file_ext_r = os.path.splitext(file_name)
file_ext_w = '.txt'

file_name_r = '{}{}'.format(file_name, file_ext_r)
file_name_w = '{}{}'.format(file_name, file_ext_w)

print('File to read:', file_name_r)
print('File to write:', file_name_w)

with open(file_name_r, 'r') as fr, open(file_name_w, 'w') as fw:
    reader = csv.reader(fr)
    for i, row in enumerate(reader):
        print(row[2])
        if i >= 2:
            fw.write(row[2] + ' ""\n')
I also simplified your logic to avoid writing the first 2 lines to the output file; there is no need to read and rewrite the output file afterwards.
Does this work for you?
Combine files as handy as possible
Suppose I have the following markdown files
1.md # contains 'foo'
2.md # contains 'bar'
3.md # 'zoo'
4.md # 'zxc'
They are easy to merge using the cat command:
$ cat {1..4}.md > merged_5.md
Nevertheless, Python requires multiple steps to achieve this result.
Create Read and Write Methods
def read(filename):
    with open(filename) as file:
        content = file.read()
    return content

def write(filename, content):
    with open(filename, 'w') as file:
        file.write(content)
Retrieve the qualified files
import glob
filenames = glob.glob('*.md')
In [17]: filenames
Out[17]: ['1.md', '2.md', '3.md', '4.md']
Read and combine
def combine(filenames):
    merged_content = ""
    for filename in filenames:
        content = read(filename)
        merged_content += content
    write('merged.md', merged_content)
Encapsulate data and methods in a main module and save it as 'combine_files.py'
import glob

def main():
    filenames = glob.glob('*.md')
    combine(filenames)

if __name__ == '__main__':
    main()
Run it on the command line:
python3 combine_files.py
It's not as handy as the cat command.
How can I refactor the code to be as handy as possible?
How about something like this?
import glob

with open('merged.md', 'w') as out_f:
    for filename in glob.glob('*.md'):
        with open(filename) as f:
            out_f.write(f.read())
How about just doing it the easy way:
def cat(out, *src):
    '''Concatenate files'''
    with open(out, 'wb') as f:
        data = b'\n'.join(open(i, 'rb').read() for i in src)
        f.write(data)
You may now call it with cat('merged.md', *glob.glob('*.md')). How's that for handy? Certainly much easier than the source of GNU Coreutils.
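Since cat takes *src, the glob list has to be unpacked with *; sorting it keeps the merge order predictable (a small usage sketch):

import glob

# note: if merged.md already exists from an earlier run, it also matches '*.md'
cat('merged.md', *sorted(glob.glob('*.md')))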
I have n files in the location /root as follows
result1.txt
abc
def
result2.txt
abc
def
result3.txt
abc
def
and so on.
I must create a consolidated file called result.txt with the values from all the result files concatenated, looping through the n files in the location /root/samplepath.
It may be easier to use cat, as others have suggested. If you must do it with Python, this should work. It finds all of the text files in the directory and appends their contents to the result file.
import glob, os

os.chdir('/root')

with open('result.txt', 'w+') as result_file:
    for filename in glob.glob('result*.txt'):
        # skip the output file itself, which also matches the pattern
        if filename == 'result.txt':
            continue
        with open(filename) as file:
            result_file.write(file.read())
        # append a line break if you want to separate them
        result_file.write("\n")
This could be an easy way of doing it.
Let's say, for example, that my file script.py is in a folder, and along with that script there is a folder called testing containing all the text files, named like file_0, file_1, ...
import os

# read all the files and put everything in data
number_of_files = 0  # set this to the actual number of files in 'testing'
data = []
for i in range(number_of_files):
    fn = os.path.join(os.path.dirname(__file__), 'testing/file_%d.txt' % i)
    f = open(fn, 'r')
    for line in f:
        data.append(line)
    f.close()

# write everything to result.txt
fn = os.path.join(os.path.dirname(__file__), 'result.txt')
f = open(fn, 'w')
for element in data:
    f.write(element)
f.close()
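If you would rather not hard-code the number of files, you could discover them with glob instead. A sketch, assuming the same testing/file_N.txt layout:

import glob
import os

base = os.path.dirname(__file__)
data = []
# note: sorted() is lexicographic, so file_10 comes before file_2
for fn in sorted(glob.glob(os.path.join(base, 'testing', 'file_*.txt'))):
    with open(fn) as f:
        data.extend(f)

with open(os.path.join(base, 'result.txt'), 'w') as f:
    f.writelines(data)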
I am trying to save my data to a file. My problem is that the file I saved contains double quotes at the beginning and the end of each line. I have tried many ways to solve it, from str.replace() and strip to csv, json, and pickle, but the problem persists and I am stuck. Please help me. I will detail my problem below.
Firstly, I have a file called angles.txt like this:
{'left_w0': -2.6978887076110842, 'left_w1': -1.3257428944152834, 'left_w2': -1.7533400385498048, 'left_e0': 0.03566505327758789, 'left_e1': 0.6948932961181641, 'left_s0': -1.1665923878540039, 'left_s1': -0.6726505747192383}
{'left_w0': -2.6967382220214846, 'left_w1': -0.8440729275695802, 'left_w2': -1.7541070289428713, 'left_e0': 0.036048548474121096, 'left_e1': 0.16682041049194338, 'left_s0': -0.7731263162109375, 'left_s1': -0.7056311616210938}
I read line by line from the text file and transfer to a dict variable called data. Here is the reading file code:
def read_data_from_file(file_name):
    data = dict()
    f = open(file_name, 'r')
    for index_line in range(1, number_lines + 1):
        data[index_line] = eval(f.readline())
    f.close()
    return data
Then I changed something in the data. Something like data[index_line]['left_w0'] = data[index_line]['left_w0'] + 0.0006. After that I wrote my data into another text file. Here is the code:
def write_data_to_file(data, file_name):
    f = open(file_name, 'wb')
    data_convert = dict()
    for index_line in range(1, number_lines):
        data_convert[index_line] = repr(data[index_line])
        data_convert[index_line] = data_convert[index_line].replace('"', '')  # I also used strip
        json.dump(data_convert[index_line], f)
        f.write('\n')
    f.close()
The result I received in the new file is:
"{'left_w0': -2.6978887076110842, 'left_w1': -1.3257428944152834, 'left_w2': -1.7533400385498048, 'left_e0': 0.03566505327758789, 'left_e1': 0.6948932961 181641, 'left_s0': -1.1665923878540039, 'left_s1': -0.6726505747192383}"
"{'left_w0': -2.6967382220214846, 'left_w1': -0.8440729275695802, 'left_w2': -1.7541070289428713, 'left_e0': 0.036048548474121096, 'left_e1': 0.166820410 49194338, 'left_s0': -0.7731263162109375, 'left_s1': -0.7056311616210938}"
I cannot remove the double quotes.
The quotes appear because you call repr() on each dict and then json.dump() the resulting string, so the string itself gets JSON-encoded (quoted). Dump the data directly instead; you can simplify your code by removing the unnecessary transformations:
import json

def write_data_to_file(data, filename):
    with open(filename, 'w') as file:
        json.dump(data, file)

def read_data_from_file(filename):
    with open(filename) as file:
        return json.load(file)
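A usage sketch, assuming the data is first built in Python (for example from your original angles.txt with your eval-based reader) and then kept in JSON from there on; note that JSON turns the integer line-number keys into strings, so after loading you index with '1' rather than 1:

# hypothetical example data with the same structure as the question
data = {1: {'left_w0': -2.6978887076110842}, 2: {'left_w0': -2.6967382220214846}}
write_data_to_file(data, 'angles.json')

loaded = read_data_from_file('angles.json')
loaded['1']['left_w0'] += 0.0006  # keys are strings after json.load
write_data_to_file(loaded, 'angles.json')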