Getting NotImplementedError while fetching table data from PDF using Python pdftables

I am using Python pdftables to fetch table data from a PDF, and I followed the instructions given on GitHub:
https://github.com/drj11/pdftables
However, when I run this code:
filepath = 'tests.pdf'
fileobj = open(filepath,'rb')
from pdftables.pdf_document import PDFDocument
doc = PDFDocument.from_fileobj(fileobj)
I get this error:
File "<stdin>", line 1, in <module>
File "pdftables/pdf_document.py", line 53, in from_fileobj
raise NotImplementedError
Can anyone help me with this problem?

If you look at the file implementing the from_fileobj function, you can see the following comment:
# TODO(pwaller): For now, put fh into a temporary file and call
# .from_path. Future: when we have a working stream input function for
# poppler, use that.
If I understand it correctly, you should use the from_path function instead, as from_fileobj is not implemented yet. This is easy with your current code:
filepath = 'tests.pdf'
from pdftables.pdf_document import PDFDocument
doc = PDFDocument.from_path(filepath)
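If you really have to start from a file object (for example an uploaded stream) rather than a path, the TODO comment describes the workaround: copy the stream into a temporary file and call from_path on it. A minimal sketch of that idea, assuming only the standard library; temp_path_from_fileobj is a hypothetical helper name, and the temporary file is deleted when the with block exits, so do all work with the document inside that block.

```python
import contextlib
import os
import shutil
import tempfile


@contextlib.contextmanager
def temp_path_from_fileobj(fileobj, suffix=".pdf"):
    """Hypothetical helper: copy a binary file object to a temporary file,
    yield its path, and delete the file when the block exits."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Copy the stream to disk and close our handle first, so other code
        # can reopen the path (this matters on Windows).
        with os.fdopen(fd, "wb") as tmp:
            shutil.copyfileobj(fileobj, tmp)
        yield path
    finally:
        os.remove(path)


# Usage with pdftables (not executed here; assumes pdftables is installed):
# from pdftables.pdf_document import PDFDocument
# with open('tests.pdf', 'rb') as fileobj:
#     with temp_path_from_fileobj(fileobj) as path:
#         doc = PDFDocument.from_path(path)
#         # ... extract the tables from doc here, before the file is removed
```

If the library keeps the file open after from_path returns, removing it at the end of the block may fail on Windows; in that case, delete the file later instead.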

Related

Getting AssertionError while reading a PDF file in Python - PyPDF2

I am getting the below error when I try to read a PDF file.
Code:
from PyPDF2 import PdfFileReader
import os
os.chdir("Path to dir")
pdf_document = 'sample.pdf'
pdf = PdfFileReader(pdf_document,'rb') #Error here
Error:
Traceback (most recent call last):
File "/home/krishna/PycharmProjects/sample/sample.py", line 9, in
pdf = PdfFileReader(filehandle)
File "/home/krishna/PycharmProjects/AI_DRC/venv/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1084, in __init__
self.read(stream)
File "/home/krishna/PycharmProjects/AI_DRC/venv/lib/python3.6/site-packages/PyPDF2/pdf.py", line 1838, in read
assert start >= last_end
AssertionError
NOTE: File is 18 MB in size

I wrote the following and it works for me. The PDF is in the same folder; you can also use os to build the path as a string.
import PyPDF2
pdf_file = PyPDF2.PdfFileReader("Sample.pdf")  # address the file by path; building the path with os works as well
page_content = pdf_file.getPage(0).extractText()  # get page one (index zero) and extract its text
print(page_content)  # then do whatever you want with it
I think the problem with your program is the "rb" argument. That mode string belongs to normal file handling with open(); PdfFileReader's second positional parameter is strict, not a file mode, and PyPDF2 opens the file itself when you pass a path (it provides PdfFileReader, PdfFileWriter and PdfFileMerger for this kind of work).
Hope it helps.
If you encounter any problem, just mention it and I will try to get back to you.

How to generate a DOCX in Python and save it in memory?

I have a task of generating a DOCX file from a template and then serving it via Flask. I use python-docx-template (docxtpl), which is simply a wrapper around python-docx that allows the use of Jinja templates.
In the end, they suggest using StringIO to save the file only in memory, so this is what my code looks like:
def report_doc(user_id):
    # Prepare the data...
    from docxtpl import DocxTemplate
    doc = DocxTemplate(app.root_path+'/templates/report.docx')
    doc.render({
        # Pass parameters
    })
    from io import StringIO
    file_stream = StringIO()
    doc.save(file_stream)
    return send_file(file_stream, as_attachment=True, attachment_filename='report_'+user_id+'.docx')
On saving, it throws TypeError: string argument expected, got 'bytes'. After googling, I found an answer saying that ZipFile expects BytesIO. However, when I substituted StringIO with BytesIO, it returned an empty file: no error is thrown, but the file is definitely not saved.
What exactly would work in this case? If something is entirely wrong here, how in general could this work?
Thank you!
UPD: Here's the exception with the full trace from the save function call:
File "/ms/controllers.py", line 1306, in report_doc
doc.save(file_stream)
File "/.env/lib/python3.5/site-packages/docx/document.py", line 142, in save
self._part.save(path_or_stream)
File "/.env/lib/python3.5/site-packages/docx/parts/document.py", line 129, in save
self.package.save(path_or_stream)
File "/.env/lib/python3.5/site-packages/docx/opc/package.py", line 160, in save
PackageWriter.write(pkg_file, self.rels, self.parts)
File "/.env/lib/python3.5/site-packages/docx/opc/pkgwriter.py", line 33, in write
PackageWriter._write_content_types_stream(phys_writer, parts)
File "/.env/lib/python3.5/site-packages/docx/opc/pkgwriter.py", line 45, in _write_content_types_stream
phys_writer.write(CONTENT_TYPES_URI, cti.blob)
File "/.env/lib/python3.5/site-packages/docx/opc/phys_pkg.py", line 155, in write
self._zipf.writestr(pack_uri.membername, blob)
File "/usr/lib/python3.5/zipfile.py", line 1581, in writestr
self.fp.write(zinfo.FileHeader(zip64))
TypeError: string argument expected, got 'bytes'

Using a BytesIO instance is correct, but you need to rewind the file pointer before passing it to send_file:
Make sure that the file pointer is positioned at the start of data to send before calling send_file().
So this should work:
import io
from docxtpl import DocxTemplate
from flask import send_file
def report_doc(user_id):
    # Prepare the data...
    doc = DocxTemplate(app.root_path+'/templates/report.docx')
    doc.render({
        # Pass parameters
    })
    # Create in-memory buffer
    file_stream = io.BytesIO()
    # Save the .docx to the buffer
    doc.save(file_stream)
    # Reset the buffer's file-pointer to the beginning of the file
    file_stream.seek(0)
    # Flask 2.0+ renamed attachment_filename to download_name (the old name was removed in 2.2)
    return send_file(file_stream, as_attachment=True, attachment_filename='report_'+user_id+'.docx')
(Testing on Firefox, I found the browser kept retrieving the file from cache even if I specified a different filename, so you may need to clear your browser's cache while testing, or disable caching in dev tools if your browser supports this, or adjust Flask's cache control settings).
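To see why the rewind matters without Flask or docxtpl, the sketch below writes a small ZIP archive (the container format a .docx file uses) into a BytesIO buffer, much as doc.save(file_stream) does, and then reads it back before and after seek(0). build_zip_in_memory and hello.txt are illustrative names only.

```python
import io
import zipfile


def build_zip_in_memory():
    """Write a tiny ZIP archive into a BytesIO buffer and return the buffer.
    The buffer's position is left at the end of the written data."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return buffer


buffer = build_zip_in_memory()
# Reading from the current position (the end) yields nothing: this is
# what send_file streams when the pointer is not rewound.
print(len(buffer.read()))  # 0
# After rewinding, the whole archive is readable again.
buffer.seek(0)
data = buffer.read()
print(data[:2])  # b'PK' (the ZIP signature)
```

In the Flask view, file_stream.seek(0) plays the same role as buffer.seek(0) here.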

Unable to pass argument from batch file to python file

I am trying to pass an argument from a batch file to my Python file.
I followed the steps given in these two links:
Passing Argument from Batch File to Python
Sending arguments from Batch file to Python script
Here is the part of my Python file where I'm trying to use the argument:
def main(argv):
    imapServ = 'imap.gmail.com'
    filename = 'TestRunLog.log'
    attachment = open("{} {}".format(argv[0], filename), 'rb')
    ...  # rest of the code
import sys
try:
    if __name__ == '__main__':
        print 'go ahead'
        main(sys.argv[:1])
except ImportError:
    print 'hi'
Also, here is the part of the batch file that I'm using to send the argument to the Python file:
c:\python27\python.exe C:\Users\abcd\Documents\automation\testsendemail.py %%myhome%\Documents\automation\Testresults\%resultDir%
pause
Above, %resultDir% is a variable generated from a timestamp.
Here is the output:
go ahead
Traceback (most recent call last):
C:/Users/abcd/Documents/automation/testsendemail.py\TestRunLog.log
File "C:/Users/abcd/Documents/automation/testsendemail.py", line 44, in <module>
main(sys.argv[:1])
File "C:/Users/abcd/Documents/automation/testsendemail.py", line 25, in main
attachment = open("{} {}".format(argv[0], filename), 'rb')
IOError: [Errno 2] No such file or directory: 'C:/Users/abcd/Documents/automation/testsendemail.py TestRunLog.log'
I have gone through many Stack Overflow questions about this issue, but I still can't get it to run. I'm not sure where the mistake is.

The issue is related to how Python's sys.argv works.
In this scenario, when you run:
main(sys.argv[:1])  # ["C:\Users\abcd\Documents\automation\testsendemail.py"]
you actually get only the first element of sys.argv, which is the path of the script itself, not the argument passed from the batch file.
To get all the arguments except the script name, fix the slice:
main(sys.argv[1:])  # ["%%myhome%\Documents\automation\Testresults\%resultDir%"]
Note that this slice will also include any other arguments that you might add to the command line.
Also, as a side note, you should consider using the standard library to join the paths. Your "{} {}" format string puts a space between the directory and the file name, which is why the path in the error message ends in "testsendemail.py TestRunLog.log".
It should be something like this:
from os.path import join
(...)
filename = 'TestRunLog.log'
attachment = open(join(argv[0], filename), 'rb')
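Both fixes together (slicing off the script name and joining with os.path.join) can be sketched as below. It is written so that it runs on Python 3 and on the Python 2.7 used in the question; build_attachment_path is a hypothetical helper name, and only the TestRunLog.log file name comes from the question.

```python
import os
import sys


def build_attachment_path(argv, filename="TestRunLog.log"):
    """Return the path of `filename` inside the directory passed as the
    first command-line argument. `argv` must already exclude the script
    name, i.e. it should be sys.argv[1:]."""
    if not argv:
        raise ValueError("expected the results directory as the first argument")
    return os.path.join(argv[0], filename)


if __name__ == "__main__":
    # sys.argv[0] is the script path; arguments from the batch file start at 1.
    args = sys.argv[1:]
    if args:
        print(build_attachment_path(args))
    else:
        print("usage: python testsendemail.py <results-directory>")
```

If the results directory can contain spaces, wrap the argument in double quotes in the batch file; otherwise Windows splits it into several entries of sys.argv.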

Accessing .zipx with Python

I'm attempting to write a very simple script that counts the number of entries/files a given ZIP file has, for some statistics.
I'm using the zipfile library, and I'm running into a problem where the library appears not to support the .zipx format.
bash-3.1$ python zipcount.py t.zipx
Traceback (most recent call last):
File "zipcount.py", line 10, in <module>
zipCount(file)
File "zipcount.py", line 5, in zipCount
with ZipFile(file, "r") as zf:
File "c:\Python34\lib\zipfile.py", line 937, in __init__
self._RealGetContents()
File "c:\Python34\lib\zipfile.py", line 978, in _RealGetContents
raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
Googling for help reveals that the zipx format is not the same as zip, so maybe I shouldn't expect this to work. Further googling, though, fails to turn up a library that can actually deal with zipx, and searching Stack Overflow didn't find much either.
I can't possibly be the only person who wants to manipulate zipx files in python, right? Any suggestions?

Chilkat might work for this. It's not a free library, but there is a 30-day trial. Here is an example from http://www.example-code.com/python/ppmd_compress_file.asp:
import sys
import chilkat
compress = chilkat.CkCompression()
# Any string argument automatically begins a 30-day trial.
success = compress.UnlockComponent("30-day trial")
if not success:
    print("Compression component unlock failed")
    sys.exit(1)
compress.put_Algorithm("ppmd")
# Decompress back to the original:
success = compress.DecompressFile("t.zipx", "t")
if not success:
    print(compress.lastErrorText())
    sys.exit(1)
print("Success!")
The API documentation: http://www.chilkatsoft.com/refdoc/pythonCkCompressionRef.html

There is no direct Python package to unzip .zipx files.
One simple way to unzip them is to call the WinZip command-line tool (wzunzip.exe) through subprocess, as in the code below.
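import subprocess
# Raw strings keep the backslashes literal ("\u" in a normal string is a SyntaxError);
# a list of arguments avoids quoting problems with the space in "Program Files".
command = [r"C:\Program Files\WinZip\wzunzip.exe", r"D:\Downloads\hello.zipx", r"D:\unzip_location"]
subprocess.run(command, timeout=120, check=True)  # check=True raises if wzunzip reports an error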
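For the counting script the question describes, a standard-library sketch is below. It does not make .zipx archives readable; it only catches the same BadZipFile error seen in the traceback and reports the file instead of crashing, so such archives can be handed to one of the tools above. zip_count is a hypothetical name.

```python
import sys
import zipfile


def zip_count(path):
    """Return the number of entries in a ZIP archive, or None if Python's
    zipfile module cannot open it (as with the .zipx file in the question)."""
    try:
        with zipfile.ZipFile(path, "r") as zf:
            return len(zf.infolist())
    except zipfile.BadZipFile:
        return None


if __name__ == "__main__":
    for name in sys.argv[1:]:
        count = zip_count(name)
        if count is None:
            print("{}: not readable by zipfile (try an external tool)".format(name))
        else:
            print("{}: {} entries".format(name, count))
```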

Import a new file format without using maya api commands

Is it possible to use maya.cmds, rather than the Maya API, to load/import a file format that is not one of Maya's file types?
I have tried googling, but found nothing other than the fileDialog command in Maya; otherwise it would mean I need to use the Maya API, which I have no experience with.
I tried the following:
multipleFilters = "chan (*.chan)"
fileList = cmds.fileDialog2(fileMode=1, fileFilter=multipleFilters, dialogStyle=2)
if not fileList:
    # return or print something or bail out early
    raise RuntimeError("No file selected")
filename = fileList[0]
cmds.file(filename, i=True)
Instead I keep getting the following error:
# Error: Unrecognized file.
# Traceback (most recent call last):
# File "<maya console>", line 3, in <module>
# RuntimeError: Unrecognized file. #
Any ideas?

cmds.file only works for files with translators that are registered via the API, either in Python or C++.
You can, however, easily write Python (or even MEL) scripts that read files off disk and create things in your scenes. You can use cmds.fileDialog2 to present a file dialog so the user can pick a file off disk, but it will be up to you to read the file.
import maya.cmds as cmds
multipleFilters = "chan (*.chan)"
fileList = cmds.fileDialog2(fileMode=1, fileFilter=multipleFilters, dialogStyle=2)
if fileList:  # fileDialog2 returns None if the user cancels
    with open(fileList[0], 'rt') as filehandle:
        for line in filehandle:
            print(line)  # or do something useful with the data
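As an illustration of the "read the file yourself" approach, here is a sketch that parses a .chan-style text file into per-frame values and keys them onto an object with cmds.setKeyframe. The column layout (frame, then translate XYZ and rotate XYZ) is an ASSUMPTION about your files, not something stated above; adjust ATTRS and the parsing to match your data. parse_chan and key_object are hypothetical names.

```python
# ASSUMED column order after the frame number; adjust to your .chan files.
ATTRS = ("translateX", "translateY", "translateZ",
         "rotateX", "rotateY", "rotateZ")


def parse_chan(lines):
    """Parse lines of a .chan-style file into (frame, values) pairs.

    ASSUMPTION: each non-empty line holds whitespace-separated numbers,
    starting with the frame followed by translate XYZ and rotate XYZ.
    Extra trailing columns are ignored.
    """
    records = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        numbers = [float(p) for p in parts]
        frame = numbers[0]
        values = numbers[1:1 + len(ATTRS)]
        records.append((frame, values))
    return records


def key_object(records, obj):
    """Set one keyframe per record and attribute on a Maya object."""
    import maya.cmds as cmds  # only available inside Maya
    for frame, values in records:
        for attr, value in zip(ATTRS, values):
            cmds.setKeyframe(obj, attribute=attr, time=frame, value=value)
```

Inside Maya, after picking the file with fileDialog2, something like `with open(fileList[0], 'rt') as fh: key_object(parse_chan(fh), 'camera1')` would apply it, where 'camera1' is a placeholder object name. parse_chan accepts any iterable of lines, so it can be tested outside Maya.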
