How to load a workbook from a temporary file using openpyxl - Python

In my Flask web app, I write the uploaded Excel data to a temporary file, which I then parse. This works fine with xlrd, but not with openpyxl.
Here is how I write the temporary file that I then parse with xlrd:
import os
import tempfile
from base64 import b64decode
xls_str = request.json.get('file')
try:
    xls_str = xls_str.split('base64,')[1]
    xls_data = b64decode(xls_str)
except IndexError:
    return 'Invalid form data', 406
save_path = os.path.join(tempfile.gettempdir(), random_alphanum(10))
# The with block closes the file on exit, so no explicit f.close() is needed
with open(save_path, 'wb') as f:
    f.write(xls_data)
try:
    bundle = parse(save_path, current_user)
except UnsupportedFileException:
    return 'Unsupported file format', 406
except IncompatibleExcelException as ex:
    return str(ex), 406
finally:
    os.remove(save_path)
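A minimal stdlib sketch of the decoding step, with one addition that bears on the format question: the decoded bytes can be sniffed before any parser sees them. Legacy .xls files are OLE2 compound documents (which xlrd reads but openpyxl does not), while .xlsx files are zip archives. The helper names decode_data_url and sniff_excel_format are hypothetical, and the zip check only shows the payload is zip-based, not that it is a valid workbook.

```python
import binascii
import zipfile
from base64 import b64decode
from io import BytesIO

# First 8 bytes of an OLE2 compound file, the container used by legacy .xls
OLE2_SIGNATURE = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def decode_data_url(data_url):
    """Return the bytes after 'base64,' in a data URL, or raise ValueError."""
    try:
        payload = data_url.split('base64,', 1)[1]
        # validate=True rejects characters outside the base64 alphabet
        return b64decode(payload, validate=True)
    except (IndexError, binascii.Error) as exc:
        raise ValueError('Invalid form data') from exc


def sniff_excel_format(data):
    """Guess the container: 'xls' (OLE2), 'xlsx' (any zip archive) or 'unknown'."""
    if data.startswith(OLE2_SIGNATURE):
        return 'xls'
    # .xlsx/.xlsm/.xltx/.xltm are all zip archives; so is any other zip
    if zipfile.is_zipfile(BytesIO(data)):
        return 'xlsx'
    return 'unknown'
```

If sniff_excel_format returns 'xls', the upload is a legacy workbook and openpyxl will reject it regardless of the file name.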
When I use openpyxl with the code above, it complains about an unsupported file format. I think this is because the temporary file has no ".xlsx" extension, and even if I added one, I doubt it would work, since it is not really an Excel file after all:
openpyxl.utils.exceptions.InvalidFileException: openpyxl does not support file format, please check you can open it with Excel first. Supported formats are: .xlsx,.xlsm,.xltx,.xltm
What should I do?
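One way to keep the temporary-file approach: openpyxl checks the extension only when it is given a filename string, so naming the temporary file with a .xlsx suffix gets past that check, provided the bytes really are an .xlsx workbook (a legacy .xls upload will still fail). A sketch under that assumption; load_from_temp_xlsx is a hypothetical helper name.

```python
import os
import tempfile

from openpyxl import load_workbook


def load_from_temp_xlsx(xls_data):
    """Write uploaded bytes to a temp file ending in .xlsx and load it with openpyxl.

    Assumes xls_data holds a real .xlsx (zip-based) workbook.
    """
    # delete=False so the file can be reopened by name on Windows as well
    with tempfile.NamedTemporaryFile(suffix='.xlsx', delete=False) as f:
        f.write(xls_data)
        save_path = f.name
    try:
        # In the default (non read-only) mode the whole workbook is read into
        # memory, so the temporary file can be removed right afterwards
        return load_workbook(save_path)
    finally:
        os.remove(save_path)
```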

Why not create the Excel file in memory with openpyxl instead? Give this example a try; I did something similar in the past.
from io import BytesIO
from flask import Flask, send_file
from openpyxl import Workbook
app = Flask(__name__)
def create_xlsx():
    wb = Workbook()
    ws = wb.active
    row = ('Hello', 'Boosted_d16')
    ws.append(row)
    return wb
@app.route('/', methods=['GET'])
def main():
    xlsx = create_xlsx()
    # Save into an in-memory buffer; wb.save() accepts a file-like object
    # (save_virtual_workbook is deprecated and was removed in openpyxl 3.1)
    buffer = BytesIO()
    xlsx.save(buffer)
    buffer.seek(0)
    return send_file(
        buffer,
        download_name='test.xlsx',  # called attachment_filename before Flask 2.0
        as_attachment=True
    )
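For the reading side of the original question, a sketch that skips the temporary file altogether: load_workbook also accepts a binary file-like object, and the extension check behind InvalidFileException is not applied in that case. This assumes xls_data holds .xlsx bytes; load_uploaded_xlsx is a hypothetical helper name.

```python
from io import BytesIO

from openpyxl import load_workbook


def load_uploaded_xlsx(xls_data):
    """Parse .xlsx bytes entirely in memory, with no temporary file.

    openpyxl only validates the extension when given a filename string,
    so wrapping the bytes in BytesIO avoids InvalidFileException for
    genuine .xlsx content.
    """
    return load_workbook(BytesIO(xls_data))
```

The bundle could then be built from the returned workbook instead of save_path, assuming parse can be adapted to take a workbook object.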

Related

How do I get my .xlsx file I created using Pandas (python) to save to a different file location?

I've just started out with Pandas and have managed to convert my xls file into an xlsx file. However, I now want the file to be saved to a different location, such as OneDrive. Could you help me out?
Here is the code I have written for it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
#Deleting original file
path = (r"C:\Users\MQ\Downloads\Incident Report.xls")
os.remove(path)
print("Original file has been deleted :)")
#Identifying the xls file
excel_file_1 = 'Incident Report.xls'
df_first_shift = pd.read_excel(r'C:\Users\MQ\3D Objects\New Folder\Incident Report.xls')
print(df_first_shift)
#combining data
df_all = pd.concat([df_first_shift])
print(df_all)
#Creating the .xlsx file
df_all.to_excel("Incident_Report.xlsx")
Use pd.ExcelWriter and pass in your destination path:
destination_path = "path\\to\\your\\onedrive\\filename.xlsx"
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter(destination_path, engine='xlsxwriter') as writer:
    df_all.to_excel(writer, sheet_name='sheetname')
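If OneDrive here means the desktop sync client rather than the web API, writing into the locally synced folder is enough. A sketch, assuming the client has set the OneDrive environment variable (it falls back to ~/OneDrive otherwise); onedrive_destination is a hypothetical helper.

```python
import os
from pathlib import Path


def onedrive_destination(filename, subfolder=''):
    """Return a path inside the locally synced OneDrive folder.

    Assumption: the OneDrive client sets the OneDrive environment
    variable; otherwise ~/OneDrive is used. The subfolder is created
    if it does not exist yet.
    """
    root = os.environ.get('OneDrive') or str(Path.home() / 'OneDrive')
    folder = Path(root) / subfolder
    folder.mkdir(parents=True, exist_ok=True)
    return folder / filename
```

Usage with the code from the question: df_all.to_excel(onedrive_destination('Incident_Report.xlsx')).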
To write to OneDrive in the cloud, the following code is suggested. I did not run it, but offer it as a suggestion.
Refer to www.lieben.nu's example for uploading a file to OneDrive.
import io
import requests
import pandas as pd
def cloudOneDrive(filename, bytesIO):
    '''
    Reference : https://www.lieben.nu/liebensraum/2019/04/uploading-a-file-to-onedrive-for-business-with-python/
    Write to cloud (bytesIO)
    '''
    data = {'grant_type': "client_credentials",
            'resource': "https://graph.microsoft.com",
            'client_id': 'XXXXX',
            'client_secret': 'XXXXX'}
    URL = "https://login.windows.net/YOURTENANTDOMAINNAME/oauth2/token?api-version=1.0"
    # Request an access token first, then send it as a bearer token with the upload
    token = requests.post(URL, data=data).json()['access_token']
    headers = {'Authorization': 'Bearer ' + token}
    # Assumed Microsoft Graph upload endpoint; replace YOURUSER with the target user's ID or UPN
    upload_url = "https://graph.microsoft.com/v1.0/users/YOURUSER/drive/root:/" + filename + ":/content"
    r = requests.put(upload_url, data=bytesIO, headers=headers)
    if r.status_code in (200, 201):
        print("succeeded")
        return True
    print("Fail", r.status_code)
    return False
fn = 'junk.xlsx'
with io.BytesIO() as bio:
    # The workbook is only written into bio when the ExcelWriter is closed
    with pd.ExcelWriter(bio) as xio:
        df_all.to_excel(xio, sheet_name='sh1')
    bio.seek(0)
    cloudOneDrive(fn, bio)

Processing files stored on cloud (S3 or Spaces)

I've set up a script to process Excel files uploaded by users. The script works fine when the file is stored on the local disk.
from openpyxl import load_workbook
wb = load_workbook("file_path.xlsx") # Load workbook from disk works fine
ws = wb.worksheets[0]
I then set up django-storages so that user-uploaded files are stored on DigitalOcean Spaces.
My problem now is how to access and process the cloud-stored file. For the record, if I pass the file URL to load_workbook, it fails with the error No such file or directory: file_url.
Do I have to download the file using requests and then process it as a local file? That feels inefficient. What options do I have?
You can get the byte content of the file, wrap it in a ContentFile and pass it to openpyxl. Assuming your model is FileContainer and the field name is file:
from django.core.files.base import ContentFile
from openpyxl import load_workbook
fc = FileContainer.objects.first()
bytefile = fc.file.read()
wb = load_workbook(ContentFile(bytefile))
ws = wb.worksheets[0]
I checked it with S3 and it works just fine.
If you want to actually save file locally, you can try this:
from django.core.files.base import ContentFile
from django.core.files.storage import FileSystemStorage
from openpyxl import load_workbook
fc = FileContainer.objects.first()
local_storage = FileSystemStorage()
bytefile = fc.file.read()
newfile = ContentFile(bytefile)
relative_path = local_storage.save(fc.file.name, newfile)
wb = load_workbook(local_storage.path(relative_path))
ws = wb.worksheets[0]
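When the stored file is reachable by URL (for example a public or pre-signed Spaces URL, which is an assumption about your bucket setup), the standard library can fetch it straight into memory, so nothing touches the local disk. Passing the URL string itself to load_workbook fails because openpyxl treats a string as a local path. load_workbook_from_url is a hypothetical helper.

```python
from io import BytesIO
from urllib.request import urlopen

from openpyxl import load_workbook


def load_workbook_from_url(url, timeout=30):
    """Download a workbook into memory and open it with openpyxl.

    Nothing is written to disk; the whole response body is held in RAM,
    which is fine for typical spreadsheet sizes.
    """
    with urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    return load_workbook(BytesIO(data))
```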

Python - Converting XLSX to PDF

I have always used the win32com module on my development server to convert xlsx to pdf easily:
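import os
import win32com.client
o = win32com.client.Dispatch("Excel.Application")
o.Visible = False
o.DisplayAlerts = False
# Excel resolves relative paths against its own working directory, so pass absolute paths
wb = o.Workbooks.Open(os.path.abspath("test.xlsx"))
wb.WorkSheets("sheet1").Select()
wb.ActiveSheet.ExportAsFixedFormat(0, os.path.abspath("test.pdf"))
wb.Close(False)
o.Quit()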
However, I have deployed my Django app on a production server where Excel is not installed, and it raises the following error:
  File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\__init__.py", line 95, in Dispatch
    dispatch, userName = dynamic._GetGoodDispatchAndUserName(dispatch,userName,clsctx)
  File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py", line 114, in _GetGoodDispatchAndUserName
    return (_GetGoodDispatch(IDispatch, clsctx), userName)
  File "C:\virtualenvs\structuraldb\lib\site-packages\win32com\client\dynamic.py", line 91, in _GetGoodDispatch
    IDispatch = pythoncom.CoCreateInstance(IDispatch, None, clsctx, pythoncom.IID_IDispatch)
com_error: (-2147221005, 'Invalid class string', None, None)
Is there any good alternative to convert from xlsx to PDF in Python?
I have tested xtopdf with PDFWriter, but with this solution you need to read and iterate the range and write lines one by one. I wonder if there is a more direct solution similar to win32com.client.
Thanks!
Since my original answer was deleted and may still be useful, I am reposting it here.
You could do it in 3 steps:
excel to pandas: pandas.read_excel
pandas to HTML: pandas.DataFrame.to_html
HTML to pdf: python-pdfkit (git), python-pdfkit (pypi.org)
import pandas as pd
import pdfkit
df = pd.read_excel("file.xlsx")
df.to_html("file.html")
pdfkit.from_file("file.html", "file.pdf")
install (openpyxl rather than xlrd, because xlrd 2.0 and later no longer read .xlsx files):
sudo pip3.6 install pandas openpyxl pdfkit
sudo apt-get install wkhtmltopdf
This is a far more efficient method than trying to load a redundant script that is hard to find and was written in Python 2.7.
Load the Excel spreadsheet into a DataFrame
Write the DataFrame to an HTML file
Convert the HTML file to an image
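import os
import imgkit
import pandas as pd
from flask import send_from_directory
def xlsx_thumbnail(source):
    # Render the first rows of an Excel file as a PNG and send it from a Flask view
    basename = os.path.basename(source)
    data = pd.read_excel(source).head(6)
    css = """
    """  # optional CSS for styling the HTML table goes here
    html_path = f"{basename}.html"
    with open(html_path, "w") as text_file:
        # write the CSS
        text_file.write(css)
        # write the HTML-ized Pandas DataFrame
        text_file.write(data.to_html())
    # "format" should match the output extension, otherwise JPEG data ends up in a .png file
    imgkitoptions = {"format": "png"}
    imgkit.from_file(html_path, f"{basename}.png", options=imgkitoptions)
    try:
        os.remove(html_path)
    except OSError as e:
        print(e)
    return send_from_directory('./', f'{basename}.png')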
Taken from here: https://medium.com/@andy.lane/convert-pandas-dataframes-to-images-using-imgkit-5da7e5108d55
It works really well: I have XLSX files converting on the fly and displaying as image thumbnails in my application.
from openpyxl import load_workbook
from PDFWriter import PDFWriter  # from the xtopdf package
# guess_types is not accepted by current openpyxl versions, so it is omitted
workbook = load_workbook('fruits2.xlsx', data_only=True)
worksheet = workbook.active
pw = PDFWriter('fruits2.pdf')
pw.setFont('Courier', 12)
pw.setHeader('XLSXtoPDF.py - convert XLSX data to PDF')
pw.setFooter('Generated using openpyxl and xtopdf')
# Rows 1-13, columns A-H (iter_rows no longer takes a range string like 'A1:H13')
ws_range = worksheet.iter_rows(min_row=1, max_row=13, min_col=1, max_col=8)
for row in ws_range:
    s = ''
    for cell in row:
        if cell.value is None:
            s += ' ' * 11
        else:
            s += str(cell.value).rjust(10) + ' '
    pw.writeLine(s)
pw.savePage()
pw.close()
I have been using this and it works fine
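The fixed-width layout that loop produces can be factored into a plain function and checked without openpyxl or xtopdf installed. A sketch mirroring the loop's rule (values right-justified to 10 characters plus a separating space, 11 spaces for empty cells); format_row is a hypothetical name.

```python
def format_row(values, width=10):
    """Render cell values as one fixed-width text line, like the xtopdf loop.

    None (an empty cell) becomes width + 1 spaces; anything else is
    right-justified to width characters and followed by one space.
    Values longer than width are not truncated.
    """
    parts = []
    for value in values:
        if value is None:
            parts.append(' ' * (width + 1))
        else:
            parts.append(str(value).rjust(width) + ' ')
    return ''.join(parts)
```

Inside the loop, pw.writeLine(format_row(cell.value for cell in row)) would then replace the inner string building.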

Open Excel file from zipfolder in openpyxl

I am trying the following code:
from zipfile import ZipFile
from openpyxl import load_workbook
from io import BytesIO
zip_path = r"path/to/zipfile.zip"
with ZipFile(zip_path) as myzip:
    with myzip.open(myzip.namelist()[0]) as myfile:
        wb = load_workbook(filename=BytesIO(myfile.read()))
data_sheet = wb.worksheets[1]
for row in data_sheet.iter_rows(min_row=3, min_col=3):
    print(row[0].value)
It shows:
ValueError: stat: path too long for Windows
Is this possible?
I am following the approach from "Using openpyxl to read file from memory".
With xlrd, the following code works fine:
import xlrd
from zipfile import ZipFile
with ZipFile(zip_path) as myzip:
    with myzip.open(myzip.namelist()[0]) as myfile:
        book = xlrd.open_workbook(file_contents=myfile.read())
        sh = book.sheet_by_index(0)
        # your code here
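A defensive variant of the loading step: instead of assuming namelist()[0] is the workbook, pick the first member whose extension openpyxl accepts, skipping directory entries, and hand openpyxl a BytesIO of its bytes. The helper name first_workbook_in_zip is hypothetical, and whether this is related to the stat error above is not established here; it only rules out opening the wrong member.

```python
import zipfile
from io import BytesIO

# Extensions listed as supported in openpyxl's InvalidFileException message
XLSX_EXTENSIONS = ('.xlsx', '.xlsm', '.xltx', '.xltm')


def first_workbook_in_zip(zip_source):
    """Return (member_name, BytesIO) for the first openpyxl-readable member.

    zip_source may be a path or a binary file-like object. Directory
    entries and other files are skipped.
    """
    with zipfile.ZipFile(zip_source) as myzip:
        for info in myzip.infolist():
            if info.is_dir():
                continue
            if info.filename.lower().endswith(XLSX_EXTENSIONS):
                # Read the member fully so the buffer outlives the closed archive
                return info.filename, BytesIO(myzip.read(info))
    raise FileNotFoundError('no .xlsx/.xlsm/.xltx/.xltm file in the archive')
```

Usage: name, buffer = first_workbook_in_zip(zip_path), then wb = load_workbook(buffer).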

Parsing xlsx sheet from HTTP response using openpyxl library

I am writing a test case for testing Excel sheet parsing.
I tried to parse response.content into a list of objects using openpyxl.
I extracted the filename from the response header and converted it into a file-like object, but load_workbook() does not accept it.
def test_export_timesheet(self):
    change_url = '/admin/core/timesheet/'
    #response contains the generated file using openpyxl
    response = self.client.post(change_url, {'action': 'export_xlsx', '_selected_action': [x.id for x in timesheets]})
    content = response._headers.get('content-disposition')[1]
    start = content.find('=') + 1
    end = content.find('.xlsx')
    content_path = (content[start:end]+'.xlsx')
    #Passing file like object
    wb = load_workbook(BytesIO(filename="'"+content_path+"'"))
    ws = wb.get_sheet_by_name(name="'" + content[start:end] + "'")
    for row in ws.iter_rows():
        for cell in row:
            print cell.value
Basically I am trying to validate the contents of the file in my testcase.
Is there a way to do this?
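If the filename is still needed (the test above uses it to derive the sheet name), the standard library's email.message.Message can parse a Content-Disposition value, including surrounding quotes, instead of slicing on '=' and '.xlsx'. A minimal sketch; filename_from_content_disposition is a hypothetical helper, and the header values in the checks are made up for illustration.

```python
from email.message import Message


def filename_from_content_disposition(value):
    """Return the filename parameter of a Content-Disposition value, or None.

    Message.get_filename() handles quoted and unquoted filenames.
    """
    msg = Message()
    msg['Content-Disposition'] = value
    return msg.get_filename()
```

In a Django test this could be called as filename_from_content_disposition(response['Content-Disposition']).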
# response contains the generated file using openpyxl
response = self.client.post(change_url, ...
In the response above, response.content is of type bytes, so you can load it into an in-memory buffer with BytesIO. Continuing from above, write:
from io import BytesIO
from openpyxl import load_workbook  # if not already imported
file_like_object = BytesIO(response.content)
wb = load_workbook(file_like_object)
Now you can use this wb for general openpyxl operations.
