I have a Flask view that generates an Excel file (using openpyxl) from some data and it's returned to the user using send_file(). A very simplified version:
import io
from flask import send_file
from openpyxl.workbook import Workbook
@app.route("/download/<int:id>")
def file_download(id):
    wb = Workbook()
    # Add sheets and data to the workbook here.
    file = io.BytesIO()
    wb.save(file)
    file.seek(0)
    # Note: attachment_filename was renamed to download_name in Flask 2.0.
    return send_file(file, attachment_filename=f"{id}.xlsx", as_attachment=True)
This works fine -- the file downloads and is a valid Excel file. But I'm not sure how to test the file download. So far I have something like this (using pytest):
def test_file_download(test_client):
    response = test_client.get("/download/123")
    assert response.status_code == 200
    assert response.content_type == "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
This passes, but I'd like to test that (a) the filename used is as expected and (b) that the file... exists? Is an Excel file?
I can access response.get_data(), which is a bytes object, but I'm not sure what to do with it.
To check the filename, you could assert on the Content-Disposition header. For example:
assert response.headers['Content-Disposition'] == 'attachment; filename=123.xlsx'
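The exact formatting of this header can vary (the filename may be quoted, for instance), so an exact string comparison can be brittle. One option is to parse the header and compare only the filename. A minimal sketch using the standard library; the helper name filename_from_disposition is made up for illustration:

```python
from email.message import Message


def filename_from_disposition(header_value):
    """Return the filename parameter of a Content-Disposition header, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


# Both the bare and the quoted form yield the same filename.
print(filename_from_disposition("attachment; filename=123.xlsx"))
print(filename_from_disposition('attachment; filename="123.xlsx"'))
```

In the test this could be used as assert filename_from_disposition(response.headers["Content-Disposition"]) == "123.xlsx".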
To check "the existence of the file", you could check that, for some known test data, its size lies within an expected range. For example:
assert 3000 <= response.content_length <= 5000
assert 3000 <= len(response.data) <= 5000
Another level of verifying that the Excel file works would be attempting to load the data back into openpyxl and checking if it reports any problems. For example:
from io import BytesIO
from openpyxl import load_workbook
load_workbook(filename=BytesIO(response.data))
Here you risk running into some sort of exception like:
zipfile.BadZipFile: File is not a zip file
This would indicate that the contents of the file are not valid as an Excel file.
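If loading the whole workbook in every test feels heavy, a cheaper structural check is possible, since an .xlsx file is a zip archive. The sketch below assumes the conventional part names [Content_Types].xml and xl/workbook.xml; the helper names are hypothetical, and the check says nothing about whether the sheet contents are correct:

```python
import io
import zipfile

# Part names conventionally present in an .xlsx archive (an assumption here).
REQUIRED_PARTS = {"[Content_Types].xml", "xl/workbook.xml"}


def looks_like_xlsx(data):
    """Cheap structural check: a zip archive containing the core workbook parts."""
    buf = io.BytesIO(data)
    if not zipfile.is_zipfile(buf):
        return False
    with zipfile.ZipFile(buf) as archive:
        return REQUIRED_PARTS.issubset(archive.namelist())


def make_stub_archive():
    """Build a tiny archive with the expected part names (not a real workbook)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name in sorted(REQUIRED_PARTS):
            archive.writestr(name, "<stub/>")
    return buf.getvalue()


print(looks_like_xlsx(make_stub_archive()))
print(looks_like_xlsx(b"plain text, not a zip archive at all"))
```

In the Flask test, the call would be assert looks_like_xlsx(response.data).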
Related
I have created a script that loads the Excel sheets stored in S3 into my local Postgres database. I use pandas' read_excel and ExcelFile to read the sheets.
The code is below.
import boto3
import pandas as pd
import io
import os
from sqlalchemy import create_engine
import xlrd
os.environ["AWS_ACCESS_KEY_ID"] = "xxxxxxxxxxxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxxxxxxxxxxxxxxxxx"
s3 = boto3.client('s3')
obj = s3.get_object(Bucket='bucket-name', Key='file.xlsx')
data = pd.ExcelFile(io.BytesIO(obj['Body'].read()))
print(data.sheet_names)
a = len(data.sheet_names)
engine1 = create_engine('postgresql://postgres:postgres@localhost:5432/postgres')
for i in range(a):
    df = pd.read_excel(io.BytesIO(obj['Body'].read()),sheet_name=data.sheet_names[i], engine='openpyxl')
    df.to_sql("test"+str(i), engine1, index=False)
Basically, the code reads the file from the S3 bucket and loops over its sheets. For each sheet, it creates a table and dumps the sheet's data into that table.
The trouble is that when I run this code, I get this error:
df = pd.read_excel(io.BytesIO(obj['Body'].read()),sheet_name=data.sheet_names[i-1], engine='openpyxl')
zipfile.BadZipFile: File is not a zip file
This started after I added the 'openpyxl' engine to the read_excel call. When I remove the engine, I get this error instead:
raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.
Please note that I can print the database connection, so connectivity is not the problem, and I'm using the latest versions of Python and pandas. I can also get all the sheet_names from the Excel file, so I am able to reach the file as well.
Many Thanks!
You are reading the obj twice, fully:
data = pd.ExcelFile(io.BytesIO(obj['Body'].read()))
pd.read_excel(io.BytesIO(obj['Body'].read()), ...)
The body stream can only be .read() once; a second read returns nothing, just an empty b"".
To avoid re-reading the S3 stream, store its contents once in a BytesIO and rewind that buffer with seek(0) before each read.
buf = io.BytesIO(obj["Body"].read())
pd.ExcelFile(buf)
buf.seek(0)
pd.read_excel(buf, ...)
# repeat
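The behaviour can be reproduced without S3 or pandas. In this small sketch an io.BytesIO stands in for the S3 body (an assumption for illustration; like that stream, it returns nothing once it has been read to the end), and the variable names are made up:

```python
import io

# Stand-in for obj["Body"]: a stream that is consumed by reading it.
stream = io.BytesIO(b"workbook bytes")

first = stream.read()    # the full payload
second = stream.read()   # empty: the stream is already exhausted

# Keep the payload in a buffer and rewind it before each consumer.
buf = io.BytesIO(first)
sheet_names_pass = buf.read()   # e.g. what pd.ExcelFile would consume
buf.seek(0)
data_pass = buf.read()          # e.g. what pd.read_excel would consume

print(len(first), len(second), sheet_names_pass == data_pass)
```

The second read is the empty payload that makes the openpyxl engine complain that the file is not a zip file.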
I am trying to create an API function that takes an uploaded .csv file and opens it as a pandas DataFrame, like this:
from fastapi import FastAPI
from fastapi import UploadFile, Query, Form
import pandas as pd
app = FastAPI()
@app.post("/check")
def foo(file: UploadFile):
    df = pd.read_csv(file.file)
    return len(df)
Then, I am invoking my API:
import requests
url = 'http://127.0.0.1:8000/check'
file = {'file': open('data/ny_pollution_events.csv', 'rb')}
resp = requests.post(url=url, files=file)
print(resp.json())
But I get this error: FileNotFoundError: [Errno 2] No such file or directory: 'ny_pollution_events.csv'
As far as I understand from the docs, pandas can read a .csv file from a file-like object, which file.file is supposed to be. But it seems that read_csv() here receives the name (not the file object itself) and tries to find it locally.
Am I doing something wrong?
Can I somehow implement this logic?
One option is to save the upload to disk first and read it from there (don't forget to import shutil). If you don't need to keep the file afterwards, delete it with os.remove(filepath).
# Inside the upload handler (needs: import shutil, import pandas as pd)
if not file.filename.lower().endswith(('.csv', ".xlsx", ".xls")):
    return 404, "Please upload xlsx,csv or xls file."
if file.filename.lower().endswith(".csv"):
    extension = ".csv"
elif file.filename.lower().endswith(".xlsx"):
    extension = ".xlsx"
elif file.filename.lower().endswith(".xls"):
    extension = ".xls"
# eventid = datetime.datetime.now().strftime('%Y%m-%d%H-%M%S-') + str(uuid4())
filepath = "location where you want to store file" + extension
# Copy the uploaded stream to disk before handing the path to pandas.
with open(filepath, "wb") as buffer:
    shutil.copyfileobj(file.file, buffer)
try:
    if filepath.endswith(".csv"):
        df = pd.read_csv(filepath)
    else:
        df = pd.read_excel(filepath)
except Exception:
    return 401, "File is not proper"
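The save-then-delete pattern can be sketched with the standard library alone. Here io.BytesIO stands in for the uploaded file object (file.file in the question), the csv module stands in for pd.read_csv(path), and the helper name save_upload_to_temp is hypothetical:

```python
import csv
import io
import os
import shutil
import tempfile


def save_upload_to_temp(fileobj, suffix):
    """Copy a binary file-like object to a temporary file and return its path."""
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        shutil.copyfileobj(fileobj, tmp)
        return tmp.name


# Stand-in for the uploaded file; the CSV content is invented sample data.
upload = io.BytesIO(b"city,events\nAlbany,3\nBuffalo,5\n")
path = save_upload_to_temp(upload, ".csv")
try:
    with open(path, newline="") as handle:
        rows = list(csv.reader(handle))
finally:
    os.remove(path)  # clean up once the data has been read

print(rows)
```

Using a temporary file avoids name clashes between concurrent uploads, and the finally block removes the file even if parsing fails.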
I am new to Python development.
I am developing a Flask API in Python that lets the user download an Excel file in .xlsx format.
My code generates the file in .xlsx format, but when I download the report I get the error: "File format or file extensions are not valid. Verify the file has not been corrupted".
Please help me with this.
import io
import pandas as pd
from flask import send_file
def get_data():
    buf = io.BytesIO()
    with pd.ExcelWriter(buf, date_format='dd/mm/yyyy', datetime_format='dd/mm/yyyy') as test:
        dtl_ext = detail.to_excel(test, index=False,encoding='utf-16')
        detail_rec.save()
    excel_data = buf.getvalue()
    buf.seek(0)
    return send_file(
        buf,
        mimetype='application/vnd.openxmlformats-officedocument.spreadsheetml.sheet',
        attachment_filename='test11.xlsx',
        as_attachment=True,
        cache_timeout=0
    )
You can use the following libraries to make it easier:
http://flask.pyexcel.org/en/latest/
https://github.com/pyexcel/pyexcel-xls
https://github.com/pyexcel/pyexcel-xlsx
After you have installed them (see the links for instructions), import them in Python as follows:
import flask_excel as excel
import pyexcel.ext.xls
import pyexcel.ext.xlsx
Then:
@app.route('/download', methods=['GET'])
def download_data():
    sample_data = [0, 1, 2]
    excel.init_excel(app)
    extension_type = "xlsx"
    filename = "test123" + "." + extension_type
    d = {'colName': sample_data}
    return excel.make_response_from_dict(d, file_type=extension_type, file_name=filename)
# Check the flask-excel library for other export options.
# This one uses a dictionary of lists.
# You can also use a list of dictionaries, a plain array, and so on.
Right now I have a Flask app in which part of the functionality lets me select a date range and see data from a SQL database for that range. I can then click a button that exports this to a CSV file, which is just saved in the Flask project directory. I want the user to be able to download this CSV file, and I want to know the best practice for letting a user download a dynamically generated CSV file. Should I use send_file() and then delete the file after the user has downloaded it, since this data shouldn't be saved and the user won't be using that file again? Should the file be saved to the database and then deleted from the db? Or can I just keep it within the Flask directory? Please provide insight if possible, thank you so much.
@brunns pointed in the right direction.
You don't have to save the file in your database, in your file structure, or anywhere else. It is created in memory on each user request.
I've done this with Django for PDF and CSV files, and it works the same way with Flask. The basics are the same.
For Python 3 use io.StringIO; for Python 2 use StringIO.
from io import StringIO
import csv
from flask import make_response
@app.route('/download')
def post():
    si = StringIO()
    cw = csv.writer(si)
    # csvList is assumed to be a list of rows (each row a list of values).
    cw.writerows(csvList)
    output = make_response(si.getvalue())
    output.headers["Content-Disposition"] = "attachment; filename=export.csv"
    output.headers["Content-type"] = "text/csv"
    return output
Courtesy: vectorfrog
Based on @xxbinxx's answer, adapted for pandas:
from io import StringIO
import csv
import pandas as pd
from flask import make_response

# Helper to call from a view function; Flask itself does not pass a DataFrame to a route.
def download_csv(df: pd.DataFrame):
    si = StringIO()
    cw = csv.writer(si)
    cw.writerow(df.columns.tolist())   # header: a single row
    cw.writerows(df.values.tolist())   # data: one row per record
    output = make_response(si.getvalue())
    output.headers["Content-Disposition"] = "attachment; filename=export.csv"
    output.headers["Content-type"] = "text/csv"
    return output
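One detail worth checking when writing the header: writerow writes a single row, whereas writerows expects an iterable of rows, so passing a flat list of column names to writerows would treat each name as its own row. A minimal, framework-free sketch of the in-memory approach; the function name rows_to_csv and the sample data are made up for illustration:

```python
import csv
from io import StringIO


def rows_to_csv(header, rows):
    """Serialize a header and data rows to CSV text in memory."""
    si = StringIO()
    cw = csv.writer(si)
    cw.writerow(header)   # one call, one row
    cw.writerows(rows)    # one row per inner list
    return si.getvalue()


text = rows_to_csv(["name", "count"], [["a", 1], ["b", 2]])
print(repr(text))
```

The returned string is what would be passed to make_response in the Flask views above.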
Is it possible to define a route in bottle which would return a file?
I have a mongo database which is accessed by pandas.
Pandas generates an xls file based on the request parameters.
The two steps above are clear and easy to implement.
The third step is the one I have a problem with:
Define a bottle route which returns a file for the user to download.
I don't want to use static, previously generated files.
Thanks in advance.
I'm not familiar with Pandas, but you need to get the binary contents of the Excel file to send to the user via a Bottle route. A modified example for Python 3:
from io import BytesIO
from bottle import route, response
from pandas import ExcelWriter
@route('/get-xlsx')
def get_xlsx():
    output = BytesIO()
    writer = ExcelWriter(output, engine='xlsxwriter')
    # Do something with your Pandas data
    # ...
    pandas_dataframe.to_excel(writer, sheet_name='Sheet1')
    # close() finalizes the workbook; writer.save() was removed in pandas 2.0.
    writer.close()
    response.content_type = 'application/vnd.openxmlformats-officedocument.spreadsheetml.sheet'
    response.add_header('Content-Disposition', 'attachment; filename="report.xlsx"')
    return output.getvalue()
When a user clicks a link that corresponds to this route, a file download dialog for "report.xlsx" will open in their browser.