Format Pdftotext output object - python

I am extracting text from a PDF file and want a formatted output of the content. Since the object is made of a list of sentences, I thought that using textwrap.wrap on the joined content would do the job.
I have tried:
for file in listoffiles:
    with open(file, "rb") as f:
        pdf = pdftotext.PDF(f)
    with open("filename" + ".txt", 'w', encoding='utf-8') as f:
        f.write(textwrap.wrap("\n\n".join(pdf), width=70))
I have also tried "{:<}".format("\n\n".join(pdf)) instead of textwrap, but it gives me back the same type of result.
Is there any way to pass a clean file to the wrapper?
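textwrap.wrap returns a list of lines, not a single string, so its result cannot be passed straight to f.write. textwrap.fill returns the wrapped text as one string. Below is a minimal sketch. It assumes that iterating over a pdftotext.PDF object yields one string per page (the "\n\n".join(pdf) call above already relies on this) and that blank lines separate paragraphs. The helper name wrap_pages is hypothetical. The helper also collapses the hard line breaks inside each paragraph so that fill can re-flow the text:

```python
import textwrap


def wrap_pages(pages, width=70):
    """Re-wrap every paragraph of every page to at most `width` columns.

    `pages` is any iterable of strings, e.g. a pdftotext.PDF object
    (assumed to yield one string per page when iterated).
    """
    wrapped_paragraphs = []
    for page in pages:
        # Assumption: blank lines mark paragraph breaks within a page.
        for paragraph in page.split("\n\n"):
            # Collapse the hard line breaks and extra spaces inside the paragraph.
            text = " ".join(paragraph.split())
            if text:
                # fill() returns one str; wrap() would return a list of lines.
                wrapped_paragraphs.append(textwrap.fill(text, width=width))
    return "\n\n".join(wrapped_paragraphs)
```

Inside the loop above, the write would then become f.write(wrap_pages(pdf, width=70)).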

Related

Merge video with base64

I am merging videos in Python (ffmpeg, moviepy), but it is very slow.
So I am trying to encode 1.mp4 and 2.mp4 using base64 and combine them.
I have the code below.
import base64
with open('1.mp4', "rb") as videoFile:
    text = base64.b64encode(videoFile.read())
with open("2.mp4", "rb") as videoFile:
    texts = base64.b64encode(videoFile.read())
fh = open("dfghhffdssdf.mp4", "wb")
fh.write(base64.b64decode(text+texts))
fh.close()
I tried running the code and the video didn't merge.
So I wrote the following new code:
import base64
with open('1.mp4', "rb") as videoFile:
    text = base64.b64encode(videoFile.read())
with open("2.mp4", "rb") as videoFile:
    texts = base64.b64encode(videoFile.read())
text = str(text).replace("=","") + str(texts)
fh = open("dfghhffdssdf.mp4", "wb")
fh.write(base64.b64decode(text+texts))
fh.close()
Then, the following error is displayed.
can only concatenate str (not "bytes") to str
If I instead convert the "text" variable to bytes with the bytes() function, I get the following error:
string argument without an encoding
What should I do?
If that's not possible, please tell me how to quickly merge video files.
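Base64 only changes how the same bytes are represented, so it cannot make merging faster. The first encoded string may end in '=' padding, which means that joining the encoded strings first does not give well-formed base64. Decoding each string separately avoids that problem and the str/bytes mix-up above, but the result is just the raw concatenation of the two files. Below is a minimal sketch; concat_via_base64 is a hypothetical name:

```python
import base64


def concat_via_base64(data1: bytes, data2: bytes) -> bytes:
    """Encode two byte strings with base64, then decode each one separately.

    Decoding each string on its own avoids feeding b64decode a string with
    '=' padding in the middle. The result is simply data1 + data2.
    """
    encoded1 = base64.b64encode(data1)
    encoded2 = base64.b64encode(data2)
    # Stay in bytes throughout; mixing str() and bytes causes the TypeError above.
    return base64.b64decode(encoded1) + base64.b64decode(encoded2)
```

Two MP4 files joined byte for byte generally do not form one playable video, because each file carries its own container metadata. A container-aware tool is needed instead. One example is ffmpeg's concat demuxer with stream copy (-c copy), which can join files without re-encoding when both use the same codecs.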

How can I manipulate a txt file to be all in lowercase in python?

Let's say that I have a txt file whose contents I need to convert to all lowercase. I tried this:
def lowercase_txt(file):
    file = file.casefold()
    with open(file, encoding="utf8") as f:
        f.read()
Here I get "'str' object has no attribute 'read'"
Then I tried:
def lowercase_txt(file):
    with open(poem_filename, encoding="utf8") as f:
        f = f.casefold()
        f.read()
and here I get "'_io.TextIOWrapper' object has no attribute 'casefold'".
What can I do?
EDIT: I re-ran this exact code and now there are no errors (dunno why), but the file doesn't change at all; all the letters stay the way they are.
This will rewrite the file. Warning: if some type of error occurs in the middle of processing (power failure, you spill coffee on your computer, etc.), you could lose your file. So you might want to make a backup of your file first.
def lowercase_txt(file_name):
    """
    file_name is the full path to the file to be opened
    """
    with open(file_name, 'r', encoding="utf8") as f:
        contents = f.read()  # read contents of file
    contents = contents.lower()  # convert to lower case
    with open(file_name, 'w', encoding="utf8") as f:  # open for output
        f.write(contents)
For example:
lowercase_txt('/mydirectory/test_file.txt')
Update
The following version opens the file for both reading and writing. After the file is read, the file position is reset to the start of the file before the contents are rewritten. This might be a safer option.
def lowercase_txt(file_name):
    """
    file_name is the full path to the file to be opened
    """
    with open(file_name, 'r+', encoding="utf8") as f:
        contents = f.read()  # read contents of file
        contents = contents.lower()  # convert to lower case
        f.seek(0, 0)  # position back to start of file
        f.write(contents)
        f.truncate()  # in case the new content is shorter than the old content
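If the main worry is losing the file when a write is interrupted, a common alternative (not part of the answer above) is to write to a temporary file first. The lowercased text goes to a temporary file in the same directory, and os.replace then swaps it in for the original in one step. On POSIX systems that rename is atomic when both paths are on the same filesystem. Below is a minimal sketch; lowercase_txt_atomic is a hypothetical name:

```python
import os
import tempfile


def lowercase_txt_atomic(file_name):
    """Lowercase a text file so the original is never left half-written."""
    with open(file_name, 'r', encoding="utf8") as f:
        contents = f.read().lower()
    # Create the temp file next to the target so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(file_name))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # open() accepts the OS-level descriptor returned by mkstemp and closes it.
        with open(fd, 'w', encoding="utf8") as tmp:
            tmp.write(contents)
        # Swap the new file in; the original stays intact until this point.
        os.replace(tmp_path, file_name)
    except BaseException:
        # Clean up the temp file if anything failed before the replace.
        os.remove(tmp_path)
        raise
```

On POSIX systems, tempfile.mkstemp creates the file as readable and writable only by its owner. As a result, this sketch does not preserve the original file's permissions.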

How Do I Decode/Encode A Video To A Text File and then Back To Video?

I want to take a video, read its contents, and turn them into base64. Then I want to take that text file, decode it from base64, and turn it back into a video.
Currently, I have been able to turn the video into a text file, but when I try to convert it back into a video I get an empty text file instead of a video file.
How do I fix this?
import base64
with open("al.mp4", "rb") as videoFile:
    text = base64.b64encode(videoFile.read())
print(text)
file = open("textTest.txt", "wb")
file.write(text)
file.close()
fh = open("video.mp4", "wb")
fh.write(base64.b64decode(str))
fh.close()
import base64
with open("al.mp4", "rb") as videoFile:
    text = base64.b64encode(videoFile.read())
print(text)
file = open("textTest.txt", "wb")
file.write(text)
file.close()
fh = open("video.mp4", "wb")
fh.write(base64.b64decode(text))
fh.close()
This is the code that works.
You were passing str to b64decode. In Python, str is the name of the built-in string class. You can do something like str = "assda", but that is not recommended. Furthermore, str is not the data you just read from the file and encoded; that is text. Just decode text and you're good.
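The working code above decodes the in-memory variable text and never reads textTest.txt back. Below is a minimal sketch of the full round trip through the text file, as the question describes it. The function names are hypothetical; the file names in the usage line are the ones from the question:

```python
import base64


def video_to_base64_text(video_path, text_path):
    """Store the base64 encoding of a binary file as ASCII text."""
    with open(video_path, "rb") as video_file:
        encoded = base64.b64encode(video_file.read())
    # b64encode returns bytes, so write the text file in binary mode.
    with open(text_path, "wb") as text_file:
        text_file.write(encoded)


def base64_text_to_video(text_path, video_path):
    """Read the base64 text back from disk and restore the original bytes."""
    with open(text_path, "rb") as text_file:
        decoded = base64.b64decode(text_file.read())
    with open(video_path, "wb") as video_file:
        video_file.write(decoded)
```

Usage: video_to_base64_text("al.mp4", "textTest.txt"), then base64_text_to_video("textTest.txt", "video.mp4").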

Problems writing to text file

I'm having a little trouble writing data to text files with Python. Basically, what I want to do is read information in a text file, update the read text, and write the updated information back to the same text file. Reading and updating the text is easy enough; however, I run into difficulties when I try to write the updated text back to the text file.
The text file is very basic and consists of three lines. Here it is:
48850
z_merged_shapefiles
EDRN_048850
I used the code below to try and update it but got this error: 'file' object has no attribute 'writeline'
Here is the code that I used:
fo = open("C:\\Users\\T0015685\\Documents\\Python\\Foo1.txt", "r")
read1 = fo.readline()
read2 = fo.readline()
read3 = fo.readline()
fo.close()
edrn_v = int(read1) + 1
newID = "EDRN_" + str(edrn_v)
fo = open("C:\\Users\\T0015685\\Documents\\Python\\Foo1.txt", "w")
fo.writeline(edrn_v)
fo.writeline(read2)
fo.writeline(newID)
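Although there is a readline method, there is no analogous writeline.
You can either use write and append '\n' to terminate each line. write only accepts strings, so the int has to be converted first:
with open("C:\\Users\\T0015685\\Documents\\Python\\Foo1.txt", "w") as fo:
    fo.write(str(edrn_v) + '\n')  # write() needs a str, not an int
    fo.write(read2)  # readline() already kept the trailing '\n'
    fo.write(newID + '\n')
Or put all the strings in a list and use writelines. Note that writelines does not add line separators either, so each item needs its own '\n':
with open("C:\\Users\\T0015685\\Documents\\Python\\Foo1.txt", "w") as fo:
    fo.writelines([str(edrn_v) + '\n', read2, newID + '\n'])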
Note
I am using the with statement
with open("Foo1.txt") as f:
so you don't have to manage opening and closing the file yourself, as you would with
f = open("Foo1.txt")
f.read()
f.close()
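The whole read-update-write cycle can also be done by reading every line at once with splitlines(), which drops the trailing newlines, and then joining the values back together with '\n' before writing. Below is a minimal sketch that assumes the three-line layout shown in the question. bump_edrn_id is a hypothetical name. It keeps the question's "EDRN_" + str(edrn_v) format, so the leading zero in EDRN_048850 is not reproduced:

```python
def bump_edrn_id(path):
    """Increment the number on line 1 and rebuild the EDRN_ ID on line 3."""
    with open(path, "r") as fo:
        # splitlines() removes the '\n' endings that readline() would keep.
        number, name, _old_id = fo.read().splitlines()[:3]
    edrn_v = int(number) + 1
    new_id = "EDRN_" + str(edrn_v)
    with open(path, "w") as fo:
        # write() adds no newline, so join the lines with '\n' explicitly.
        fo.write("\n".join([str(edrn_v), name, new_id]) + "\n")
```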

How can I create, read and write files in Python?

All the tutorials I can find follow the same format, which isn't working. I don't get an error message, but I don't get normal output either. What I get appears to be a description of the file object at some memory location.
# file_test
ftpr= open("file","w")
ftpr.write("This is a sample line/n")
a=open("file","r")
print a
#This is the result
<open file 'file', mode 'r' at 0x00000000029DDDB0>
>>>
Do you want to read the contents of the file? Try print a.readlines().
For example:
with open('file', 'w') as f:
    f.write("Hello, world!\nGoodbye, world!\n")
with open('file', 'r') as f:
    print(f.readlines())  # ['Hello, world!\n', 'Goodbye, world!\n']
FYI, in case you're unfamiliar with them, the with blocks ensure that the open()-ed files are closed when the block ends.
This is not the correct way to read the file. You are printing the return value of the open() call, which is a file object. Do it like this for reading and writing.
For writing:
f = open("myfile", "w")
f.write("hello\n")
f.write("This is a sample line\n")
f.close()
For reading:
f = open("myfile", "r")
string = f.read()
print(string)
f.close()
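Besides 'w' and 'r', open() has modes for creating and extending files. 'x' creates a new file and raises FileExistsError if it already exists, and 'a' appends to the end of a file. Below is a minimal Python 3 sketch (mode 'x' does not exist in Python 2); create_write_read is a hypothetical name:

```python
def create_write_read(path):
    """Create a new file, append to it, then read its lines back."""
    # 'x' creates the file and fails with FileExistsError if it already exists.
    with open(path, "x") as f:
        f.write("This is a sample line\n")
    # 'a' appends to the end of the existing file instead of truncating it.
    with open(path, "a") as f:
        f.write("This line was appended\n")
    # Iterating over a file object yields one line at a time, '\n' included.
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in f]
```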
