I am trying to merge 2 PDFs and write them out as one PDF using PyPDF2.
Below is the code that reads the file content:
output = PyPDF2.PdfFileWriter()
for pdffile in self.files:
    input = PyPDF2.PdfFileReader(pdffile, strict=False)
where self.files is a list of file objects.
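For context, the full merge step I intend to run looks roughly like this (merged.pdf is just a placeholder name):

import PyPDF2

output = PyPDF2.PdfFileWriter()
for pdffile in self.files:
    input = PyPDF2.PdfFileReader(pdffile, strict=False)
    # copy every page of the current reader into the writer
    for page_num in range(input.getNumPages()):
        output.addPage(input.getPage(page_num))

# write the merged result to a single PDF
with open("merged.pdf", "wb") as merged:
    output.write(merged)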
I am getting the below error when trying to read one particular PDF file:
TypeError: 'NumberObject' object has no attribute '__getitem__'
When I ran Ghostscript on the PDF file, I found that the file is corrupted; the repaired copy I am able to read without errors. I wanted to check if there is any way I can read the corrupted PDF file using PyPDF2 only, without errors?
Thanks in Advance.
Related
I'm trying to read a PDF file extracted from a zip file in memory to get the tables inside the file. Camelot seems a good way to do it, but I'm getting the following error:
AttributeError: '_io.StringIO' object has no attribute 'lower'
Is there some way to read the file and extract the tables with camelot, or should I use another library?
import zipfile
from io import StringIO
import camelot

z = zipfile.ZipFile(self.zip_file)
for file in z.namelist():
    if file.endswith(".pdf"):
        pdf = z.read(file).decode(encoding="latin-1")
        pdf = StringIO(pdf)
        pdf = camelot.read_pdf(pdf, codec='utf-8')
camelot.read_pdf(filepath, ...)
accepts a file path as its first parameter, so it appears to be a bad match for your requirements. Search for another library.
In any case, StringIO(pdf) will return the following:
<_io.StringIO object at 0x000002592DD33E20>
For starters, when you read a file from StringIO, do it by calling the read() function
pdf = StringIO(pdf)
pdf.read()
That will indeed return the file contents themselves. Next, think about the encoding that the library will accept.
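If you want to stay with Camelot anyway, one possible workaround (just a sketch, assuming read_pdf only takes a path) is to dump each PDF from the zip to a temporary file and pass that file's path in:

import tempfile
import zipfile
import camelot

z = zipfile.ZipFile(self.zip_file)
for name in z.namelist():
    if name.endswith(".pdf"):
        # write the raw PDF bytes to a temp file Camelot can open by path
        with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
            tmp.write(z.read(name))
        tables = camelot.read_pdf(tmp.name)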
I'm trying to use Pandas to parse an Excel file someone uploaded to a Flask web application but not having much success.
I save the raw stream to a temporary file and then try to read it, but pandas complains about the raw byte array:
tmpfile = tempfile.NamedTemporaryFile()
tmpfile.write(request.files['spreadsheet'].read())
sheet = pandas.ExcelFile(tmpfile.name)
results in the error:
*** XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '\xd0\xcf'
Is there a way to do this?
Found the answer: I need to call tmpfile.seek(0) before being able to read from it.
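For completeness, the working version of the snippet above (still assuming the upload arrives under the 'spreadsheet' key):

import tempfile
import pandas
from flask import request

tmpfile = tempfile.NamedTemporaryFile()
tmpfile.write(request.files['spreadsheet'].read())
tmpfile.seek(0)  # rewind the temp file before pandas reads it
sheet = pandas.ExcelFile(tmpfile.name)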
I have a file from SAS that is exported as an older Excel .xls file. I would like to import this file into Python 3.5.
When I do:
import pandas as pd
Filewant = pd.read_excel("Filepath\\Filename.xls")
I get a bunch of error messages culminating in
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<html xm'
If I open up the file and manually save it as a current .xlsx file and use the same command with:
Filewant = pd.read_excel("Filepath\\Filename.xlsx")
then the file is imported into Python properly. However, I want the process to be more automated so I don't have to manually save the file to .xlsx format to make it work.
SAS tech support told me that this won't work and that I'll need to convert the .xls SAS output into a .xlsx file:
Unfortunately, the MSOffice2K destination creates an HTML file even though it uses the .XLS extension here, which allows the file to be opened with Excel.
You can use VBScript to convert the file to .XLSX, however, there is no way to do this using the MSoffice2K destination.
The error message tells you the problem: found b'<html xm'. Your file is an HTML file, not an XLS file. This was commonly done with "old" SAS, since it did not support writing XLS files, but Excel did support reading HTML files.
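Since the file is really HTML, one way to avoid the manual re-save (a sketch, assuming the SAS output stores the data in an HTML table) is to let pandas parse it as HTML:

import pandas as pd

# read_html returns a list of DataFrames, one per <table> in the file
tables = pd.read_html("Filepath\\Filename.xls")
Filewant = tables[0]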
I am trying to convert a CSV to the .xlsx file format because I have code that is meant to read an Excel file, but I ended up getting a CSV. Is there a way to convert a CSV file to a TEMP Excel file and have it not destroyed until the reading process is done? I have tried using openpyxl, but it ends up not working and throws an error saying it's not a good zip file. I even tried converting the CSV to text and then storing it in a dictionary, but writing to Excel using the xlrd package did not work as well. I was wondering if there is a way to do it in a cc
Seems like you opened the file in text mode. Try this to open the file:
open('sample.csv', "rt", encoding="utf8")
or
open('sample.csv', "rt", encoding="ascii")
depending on the encoding of the file.
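If the end goal is simply a temporary .xlsx that the existing reader can open, a possible sketch with pandas (assuming pandas and openpyxl are installed; sample.csv is a placeholder name):

import tempfile
import pandas as pd

df = pd.read_csv("sample.csv", encoding="utf8")
# delete=False keeps the temp file around until you remove it yourself
tmp = tempfile.NamedTemporaryFile(suffix=".xlsx", delete=False)
tmp.close()
df.to_excel(tmp.name, index=False)  # needs an xlsx writer such as openpyxl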
I'm trying to read a file stored in one of my buckets in GAE.
The file is stored in a public bucket.
I've tried to:
archivo = cloudstorage.open('/bucket/workbook.xlsx')
wb = xlrd.open_workbook(filename=archivo)
but xlrd expects to open the file by itself, so I get a TypeError:
TypeError: coercing to Unicode: need string or buffer, ReadBuffer found
Is there any way to give xlrd an open file so I can read it without having to change xlrd.py?
I should read the documentation with more attention before asking stuff...
To provide xlrd with an open file, I have to give it the file contents instead of a filename.
This is done by:
archivo = cloudstorage.open('/bucket/workbook.xlsx')
wb = xlrd.open_workbook(file_contents=archivo.read())
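Continuing from the snippet above, the workbook then behaves like any other xlrd workbook, for example:

sheet = wb.sheet_by_index(0)      # first worksheet
print(sheet.nrows, sheet.ncols)   # row and column counts
print(sheet.cell_value(0, 0))     # value of the top-left cell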