Streaming a CSV file in Django

I am attempting to stream a csv file as an attachment download. The CSV files are getting to be 4MB in size or more, and I need a way for the user to actively download the files without waiting for all of the data to be created and committed to memory first.
I first used my own file wrapper based on Django's FileWrapper class. That failed. Then I saw a method here for using a generator to stream the response:
How to stream an HttpResponse with Django
When I raise an error within the generator, I can see that I am creating the proper data with the get_row_data() function, but when I try to return the response it comes back empty. I've also disabled the Django GZipMiddleware. Does anyone know what I'm doing wrong?
Edit: The issue I was having was with the ConditionalGetMiddleware. I had to replace it; the code is in an answer below.
Here is the view:
from django.db import models
from django.http import HttpResponse
from django.views.decorators.http import condition

@condition(etag_func=None)
def csv_view(request, app_label, model_name):
    """ Based on the filters in the query, return a csv file for the given model """
    #Get the model
    model = models.get_model(app_label, model_name)
    #if there are filters in the query
    if request.method == 'GET':
        #if the query is not empty
        if request.GET:
            keyword_arg_dict = {}
            for key, value in request.GET.items():
                #get the query filters
                keyword_arg_dict[str(key)] = str(value)
            #generate a list of row objects, based on the filters
            objects_list = model.objects.filter(**keyword_arg_dict)
        else:
            #get all the model's objects
            objects_list = model.objects.all()
    else:
        #get all the model's objects
        objects_list = model.objects.all()
    #create the response object with a csv mimetype
    response = HttpResponse(
        stream_response_generator(model, objects_list),
        mimetype='text/plain',
    )
    response['Content-Disposition'] = "attachment; filename=foo.csv"
    return response
Here is the generator I use to stream the response:
import time

def stream_response_generator(model, objects_list):
    """Streaming function to return data iteratively """
    for row_item in objects_list:
        yield get_row_data(model, row_item)
        #artificial delay between rows
        time.sleep(1)
And here is how I create the csv row data:
import csv
import cStringIO
from django.db.models.fields.related import RelatedField

def get_row_data(model, row):
    """Get a row of csv data from an object"""
    #Create a temporary csv handle
    csv_handle = cStringIO.StringIO()
    #create the csv output object
    csv_output = csv.writer(csv_handle)
    value_list = []
    for field in model._meta.fields:
        #default to an empty cell so a failed lookup never reuses the previous field's value
        value = ''
        #if the field is a related field (ForeignKey, ManyToMany, OneToOne)
        if isinstance(field, RelatedField):
            #get the related model from the field object
            related_model = field.rel.to
            for key in row.__dict__.keys():
                #find the field in the row that matches the related field
                if key.startswith(field.name):
                    #Get the unicode version of the row in the related model, based on the id
                    try:
                        entry = related_model.objects.get(
                            id__exact=int(row.__dict__[key]),
                        )
                    except (related_model.DoesNotExist, TypeError, ValueError):
                        pass
                    else:
                        value = entry.__unicode__().encode("utf-8")
                        break
        #if it isn't a related field
        else:
            #get the value of the field
            if isinstance(row.__dict__[field.name], basestring):
                value = row.__dict__[field.name].encode("utf-8")
            else:
                value = row.__dict__[field.name]
        value_list.append(value)
    #add the row of csv values to the csv file
    csv_output.writerow(value_list)
    #Return the string value of the csv output
    return csv_handle.getvalue()

Here's some simple code that'll stream a CSV; you can probably go from this to whatever you need to do:
import cStringIO as StringIO
import csv
from django.http import HttpResponse

# named csv_view so the view does not shadow the csv module
def csv_view(request):
    def data():
        for i in xrange(10):
            csvfile = StringIO.StringIO()
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow([i,"a","b","c"])
            yield csvfile.getvalue()
    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
This simply writes each row to an in-memory file, reads the row and yields it.
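For reference, a Python 3 version of the same per-row idea might look like the sketch below. It assumes io.StringIO in place of cStringIO, and csv_rows is a hypothetical name; in a Django 1.5+ view you would pass its result to StreamingHttpResponse rather than HttpResponse.

```python
import csv
import io


def csv_rows(rows):
    """Yield each row as a CSV-formatted string, using a fresh buffer per row."""
    for row in rows:
        buf = io.StringIO()
        csv.writer(buf).writerow(row)  # csv.writer ends each row with "\r\n"
        yield buf.getvalue()


chunks = list(csv_rows([[i, "a", "b", "c"] for i in range(3)]))
print(chunks)
```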
This version is more efficient for generating bulk data, but be sure to understand the above before using it:
import cStringIO as StringIO
import csv
from django.http import HttpResponse

# named csv_view so the view does not shadow the csv module
def csv_view(request):
    csvfile = StringIO.StringIO()
    csvwriter = csv.writer(csvfile)
    def read_and_flush():
        csvfile.seek(0)
        data = csvfile.read()
        csvfile.seek(0)
        csvfile.truncate()
        return data
    def data():
        for i in xrange(10):
            csvwriter.writerow([i,"a","b","c"])
            chunk = read_and_flush()
            yield chunk
    response = HttpResponse(data(), mimetype="text/csv")
    response["Content-Disposition"] = "attachment; filename=test.csv"
    return response
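The read_and_flush pattern reuses one buffer instead of creating a new one per row. A further step, sketched here under the assumption that fewer, larger chunks suit your setup better, is to flush the shared buffer only every batch_size rows (csv_chunks and batch_size are hypothetical names; Python 3):

```python
import csv
import io


def csv_chunks(rows, batch_size=100):
    """Yield CSV text in chunks of up to batch_size rows, reusing one buffer."""
    buf = io.StringIO()
    writer = csv.writer(buf)

    def read_and_flush():
        data = buf.getvalue()
        buf.seek(0)
        buf.truncate()  # empty the buffer for the next batch
        return data

    for count, row in enumerate(rows, start=1):
        writer.writerow(row)
        if count % batch_size == 0:
            yield read_and_flush()
    tail = read_and_flush()
    if tail:  # rows left over after the last full batch
        yield tail
```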

As of Django 1.5 the middleware issue has been solved, and StreamingHttpResponse has been introduced. The following should do:
import cStringIO as StringIO
import csv
from django.http import StreamingHttpResponse

def csv_view(request):
    ...
    # Assume `rows` is an iterable of lists
    def stream():
        buffer_ = StringIO.StringIO()
        writer = csv.writer(buffer_)
        for row in rows:
            writer.writerow(row)
            buffer_.seek(0)
            data = buffer_.read()
            buffer_.seek(0)
            buffer_.truncate()
            yield data
    response = StreamingHttpResponse(
        stream(), content_type='text/csv'
    )
    disposition = "attachment; filename=file.csv"
    response['Content-Disposition'] = disposition
    return response
There is documentation on how to output CSV from Django, but it doesn't take advantage of StreamingHttpResponse, so I went ahead and opened a ticket to track it.
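The Django documentation's streaming CSV example avoids the buffer altogether with a pseudo-buffer whose write() simply returns what it is given, which works because csv.writer.writerow() returns the value of the underlying write() call. A Django-free sketch of that idea (Echo as in the docs, stream_csv a hypothetical name); the generator can be handed to StreamingHttpResponse as above:

```python
import csv


class Echo:
    """Pseudo-buffer: write() hands the value back instead of storing it."""

    def write(self, value):
        return value


def stream_csv(rows):
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns whatever Echo.write() returned: the CSV line
        yield writer.writerow(row)
```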

The problem I was having was with the ConditionalGetMiddleware. I found that django-piston provides a replacement for ConditionalGetMiddleware that allows streaming:
from django.middleware.http import ConditionalGetMiddleware

def compat_middleware_factory(klass):
    """
    Class wrapper that only executes `process_response`
    if `streaming` is not set on the `HttpResponse` object.
    Django has a bad habit of looking at the content,
    which will prematurely exhaust the data source if we're
    using generators or buffers.
    """
    class compatwrapper(klass):
        def process_response(self, req, resp):
            if not hasattr(resp, 'streaming'):
                return klass.process_response(self, req, resp)
            return resp
    return compatwrapper

ConditionalMiddlewareCompatProxy = compat_middleware_factory(ConditionalGetMiddleware)
Then replace ConditionalGetMiddleware with ConditionalMiddlewareCompatProxy in your middleware settings, and in your view (code borrowed from a clever answer to this question):
import cStringIO as StringIO
import csv
from django.http import HttpResponse

def csv_view(request):
    def data():
        for i in xrange(10):
            csvfile = StringIO.StringIO()
            csvwriter = csv.writer(csvfile)
            csvwriter.writerow([i,"a","b","c"])
            yield csvfile.getvalue()
    #create the response object with a csv mimetype
    response = HttpResponse(
        data(),
        mimetype='text/csv',
    )
    #Set the response as an attachment with a filename
    response['Content-Disposition'] = "attachment; filename=test.csv"
    response.streaming = True
    return response
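The effect of compat_middleware_factory can be checked without Django. The sketch below copies the factory and wraps a hypothetical ContentReadingMiddleware that joins the response body, standing in for a middleware that inspects content; FakeResponse is likewise a stub. It reflects pre-1.5 behaviour: from Django 1.5 on, HttpResponse and StreamingHttpResponse both define a streaming attribute, so the hasattr() check would always be true and StreamingHttpResponse is the better route.

```python
def compat_middleware_factory(klass):
    """Same factory as above: skip process_response when resp has 'streaming'."""
    class compatwrapper(klass):
        def process_response(self, req, resp):
            if not hasattr(resp, 'streaming'):
                return klass.process_response(self, req, resp)
            return resp
    return compatwrapper


class ContentReadingMiddleware:
    """Hypothetical stand-in for a middleware that reads the whole body."""

    def process_response(self, req, resp):
        resp.content = ''.join(resp.content)  # exhausts a generator body
        return resp


class FakeResponse:
    """Hypothetical stub with just a content attribute."""

    def __init__(self, content):
        self.content = content


Proxy = compat_middleware_factory(ContentReadingMiddleware)

plain = Proxy().process_response(None, FakeResponse(iter(['a', 'b'])))
streamed = FakeResponse(iter(['a', 'b']))
streamed.streaming = True  # marks the response as one to leave alone
streamed = Proxy().process_response(None, streamed)
```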

Related

Django how to create a tmp excel file and return it to the browser within the response

I have a process that builds a temporary file and returns it to the browser as CSV. Now I want to do the same but return an Excel file.
For the CSV, I have a Django view that does:
import tempfile
from django.http import FileResponse

def export_wallet_view(request):
    tmp = tempfile.NamedTemporaryFile(delete=False)
    with open(tmp.name, 'w', encoding="utf-8-sig") as fi:
        csv_headers = [
            'Id',
            'Name'
        ]
        fi.write(';'.join(csv_headers))
        fi.write('\n')
        # here I also save the rows into the file
    response = FileResponse(open(tmp.name, 'rb'))
    response['Content-Disposition'] = 'attachment; filename="wallet.csv"'
    return response
To convert it to Excel, I tried something like this using pandas:
df = pd.read_csv(tmp.name)
df.to_excel('pandas_to_excel.xlsx', sheet_name='new_sheet_name')
The problem is that this creates the Excel file on the server, and I would like to do something like:
df = pd.read_csv(tmp.name)
df.to_excel('pandas_to_excel.xlsx', sheet_name='new_sheet_name')  # this being a tmp file
response = FileResponse(open(tmp.name, 'rb'))  # this should be the new excel tmp file
response['Content-Disposition'] = 'attachment; filename="wallet.csv"'
return response
Thanks

I don't understand your problem.
You should use the same 'pandas_to_excel.xlsx' in both calls:
df.to_excel('pandas_to_excel.xlsx', ...)
... open('pandas_to_excel.xlsx', 'rb')
or the same tmp.name in both:
df.to_excel(tmp.name, ...)
... open(tmp.name, 'rb')
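You can even call NamedTemporaryFile() again to create a new temporary name. Give it an .xlsx suffix, because pandas picks the Excel writer from the file extension:
tmp = tempfile.NamedTemporaryFile(suffix='.xlsx', delete=False)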
df.to_excel(tmp.name, ...)
... open(tmp.name, 'rb')
But a popular method is to use io.StringIO() or io.BytesIO() to create a file-like object in memory, without creating a file on disk.
import io
from django.http import FileResponse

def export_wallet_view(request):
    csv_headers = ['Id', 'Name']
    file_like_object = io.BytesIO()
    # 'utf-8-sig' adds a BOM to every string it encodes, so use it only for the first write
    file_like_object.write(';'.join(csv_headers).encode('utf-8-sig'))
    file_like_object.write('\n'.encode('utf-8'))
    file_like_object.write('other rows'.encode('utf-8'))
    file_like_object.seek(0)  # move to the beginning of file
    response = FileResponse(file_like_object)
    response['Content-Disposition'] = 'attachment; filename="wallet.csv"'
    return response
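If the rows come from Python data rather than pre-joined strings, csv.writer can take care of quoting. A sketch under that assumption (build_csv_bytes is a hypothetical helper): it writes with delimiter=';' into an io.StringIO and encodes once, so 'utf-8-sig' puts a single BOM at the start. The returned BytesIO can be passed to FileResponse as above.

```python
import csv
import io


def build_csv_bytes(headers, rows, delimiter=';'):
    """Return an in-memory CSV file (bytes) with one UTF-8 BOM at the start."""
    text = io.StringIO()
    writer = csv.writer(text, delimiter=delimiter)
    writer.writerow(headers)
    writer.writerows(rows)  # fields containing the delimiter get quoted
    # A BytesIO created from bytes is already positioned at 0.
    return io.BytesIO(text.getvalue().encode('utf-8-sig'))
```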
For Excel it could be something like this. I use io.StringIO() to read the CSV text directly into pandas, and then io.BytesIO() to create a file-like object with the Excel data.
import io
import pandas as pd
from django.http import FileResponse

def export_wallet_view(request):
    csv_headers = ['Id', 'Name']
    text = ';'.join(csv_headers)
    text += '\n'
    text += 'other rows'
    df = pd.read_csv(io.StringIO(text), sep=';')  # the data is ';'-separated
    file_like_object = io.BytesIO()
    df.to_excel(file_like_object)
    file_like_object.seek(0)  # move to the beginning of file
    response = FileResponse(file_like_object)
    response['Content-Disposition'] = 'attachment; filename="pandas_to_excel.xlsx"'
    return response

How to write row names while exporting django querysets to csv?

The function below exports to CSV properly, but the header row for those values is missing.
How can I write a header for those values?
I have quite a large queryset, so I am using StreamingHttpResponse as suggested by the Django documentation, but I couldn't find a way to add a header for these values.
import csv
from django.http import StreamingHttpResponse

class Echo:
    def write(self, value):
        return value

def some_streaming_csv_view(request):
    rows = MyModel.objects.values_list("value1", "value2")
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    writer.writerow(["name1", "name2"])  # didn't work
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in rows),
        status=200,
        content_type="text/csv",
    )
    response["Content-Disposition"] = 'attachment; filename=filname.csv'
    return response

To add the header row, pass it through the same csv writer and yield the result, just like the data rows. Echo.write() only returns what it is given, so a writer.writerow() call made outside the generator produces a string that is simply thrown away.
def iter_content(rows, headers):
    pseudo_buffer = Echo()
    writer = csv.writer(pseudo_buffer)
    yield writer.writerow(headers)
    for row in rows:
        yield writer.writerow(row)
and then use it like this:
response = StreamingHttpResponse(
    iter_content(rows, headers),
    status=200,
    content_type="text/csv",
)
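An equivalent variant, sketched here with the hypothetical name iter_csv, prepends the header with itertools.chain so that a single generator expression formats every line; rows could be the values_list() queryset from the question.

```python
import csv
from itertools import chain


class Echo:
    def write(self, value):
        return value


def iter_csv(rows, headers):
    writer = csv.writer(Echo())
    # Put the header row in front of the data rows, then format each one.
    return (writer.writerow(row) for row in chain([headers], rows))
```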

Export CSV files in a bulk in Django

I'm creating a custom administration action to download a list of orders as a CSV file.
I have this in my orders/admin.py:
import csv
import datetime
from django.http import HttpResponse

def export_to_csv(modeladmin, request, queryset):
    opts = modeladmin.model._meta  # opts = options
    for order in queryset:
        content_disposition = f'attachment; filename=OrderID-{order.id}.csv'
        response = HttpResponse(content_type='text/csv')
        response['Content-Disposition'] = content_disposition
        writer = csv.writer(response)
        fields = [field for field in opts.get_fields() if not field.many_to_many
                  and not field.one_to_many]
        writer.writerow([field.verbose_name for field in fields])
        # Write data rows
        for obj in queryset:
            data_row = []
            for field in fields:
                value = getattr(obj, field.name)
                if isinstance(value, datetime.datetime):
                    value = value.strftime('%d/%m/%Y')
                data_row.append(value)
            writer.writerow(data_row)
        return response
However, this code doesn't download more than one CSV file, even when I've selected more than one order.

You cannot download several files with a single HTTP request (details).
Instead, you can zip your CSVs into one zip file and return that, as this answer describes.
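A minimal in-memory sketch of that idea with the standard zipfile module. zip_csvs and the tables mapping (file name to list of rows) are hypothetical; in the admin action you would build one entry per order, for example f'OrderID-{order.id}.csv', then return HttpResponse(zip_csvs(tables), content_type='application/zip') with a Content-Disposition such as attachment; filename="orders.zip".

```python
import csv
import io
import zipfile


def zip_csvs(tables):
    """tables maps a file name to a list of rows; returns the zip archive as bytes."""
    archive = io.BytesIO()
    with zipfile.ZipFile(archive, 'w', zipfile.ZIP_DEFLATED) as zf:
        for name, rows in tables.items():
            text = io.StringIO()
            csv.writer(text).writerows(rows)
            zf.writestr(name, text.getvalue())  # str is stored UTF-8 encoded
    return archive.getvalue()
```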

Better way to file response with DRF?

I have this action:
import os
from django.http import FileResponse
from rest_framework.decorators import action

@action(methods=['get'], detail=True)
def download_csv(self, request, pk, *args, **kwargs):
    project = self.get_object()
    data = show_stages_tasks(request, pk)
    file_name = f"{project.name}.csv"
    export_to_csv(data, file_name)
    file_handle = open(file_name, "r")
    response = FileResponse(file_handle.read(), content_type='application/csv')
    response['Content-Disposition'] = f'attachment; filename="{file_handle.name}"'
    file_handle.close()
    os.remove(file_name)
    return response
and export_to_csv is:
import csv
import io
from rest_framework.parsers import JSONParser
from rest_framework.renderers import JSONRenderer

def export_to_csv(data, filename="project"):
    content = JSONRenderer().render(data)
    stream = io.BytesIO(content)
    content_parsed = JSONParser().parse(stream)
    tasks = content_parsed[0]["related_tasks"]
    keys = tasks[0].keys()
    # newline='' is what the csv module expects for files it writes to
    with open(filename, 'w', newline='') as output_file:
        dict_writer = csv.DictWriter(output_file, fieldnames=keys)
        dict_writer.writeheader()
        for task in tasks:
            task['children'] = []
            task['task_folders'] = []
            dict_writer.writerow(task)
show_stages_tasks returns serialized data from a DRF serializer with three nested serializers (too big, and I think unnecessary, to post here).
As you can see, I parse the serializer data, create a CSV file, save it, open it again, pass its content to the response, and delete the file. Can I somehow pass the content of the file without actually creating a CSV file and then deleting it?

Django's official documentation has a similar example.
It uses the django.http.HttpResponse class, which can also be used in your case:
import csv
from django.http import HttpResponse

def some_view(request):
    # Create the HttpResponse object with the appropriate CSV header.
    response = HttpResponse(content_type='text/csv')
    response['Content-Disposition'] = 'attachment; filename="somefilename.csv"'
    writer = csv.writer(response)
    writer.writerow(['First row', 'Foo', 'Bar', 'Baz'])
    writer.writerow(['Second row', 'A', 'B', 'C', '"Testing"', "Here's a quote"])
    return response
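Applied to the question, the CSV writing can target any object with a write() method instead of a file name; an HttpResponse qualifies, which is why csv.writer(response) works above. A sketch with a hypothetical write_tasks_csv, assuming tasks is the list of dicts taken from content_parsed[0]["related_tasks"] and that each dict has the children and task_folders keys, as in the question:

```python
import csv
import io


def write_tasks_csv(tasks, out):
    """Write task dicts as CSV to any object with a write() method."""
    if not tasks:
        return
    writer = csv.DictWriter(out, fieldnames=list(tasks[0].keys()))
    writer.writeheader()
    for task in tasks:
        # Same clean-up as the question, without mutating the input dicts.
        writer.writerow(dict(task, children=[], task_folders=[]))


out = io.StringIO()
write_tasks_csv([{'id': 1, 'name': 'A', 'children': [{'x': 1}], 'task_folders': []}], out)
```

In the DRF action you could then create response = HttpResponse(content_type='text/csv'), set Content-Disposition, call write_tasks_csv(tasks, response) and return the response, with no temporary file on disk.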

Saving and Retrieving Python object Attributes values to a file

I need two things done.
First, take the request object and save its attribute values to a file as the values of some known keys. The file needs to be editable after saving, i.e. a user can modify the values of the keys (so I used JSON format). This is handled in the function save_auth_params_to_file().
Second, read the file contents back in a format from which I can retrieve the values by key. This is handled in the function get_auth_params_from_file().
import json
import os

SUCCESS_AUTH_PARAM_FILE = '/auth/success_auth_params.json'

def save_auth_params_to_file(request):
    auth_params = {}
    if request is not None:
        auth_params['token'] = request.token
        auth_params['auth_url'] = request.auth_url
        auth_params['server_cert'] = request.server_cert
        auth_params['local_key'] = request.local_key
        auth_params['local_cert'] = request.local_cert
        auth_params['timeout'] = request.timeout_secs
    with open(SUCCESS_AUTH_PARAM_FILE, 'w') as fout:
        json.dump(auth_params, fout, indent=4)

def get_auth_params_from_file():
    auth_params = {}
    if os.path.exists(SUCCESS_AUTH_PARAM_FILE):
        with open(SUCCESS_AUTH_PARAM_FILE, "r") as fin:
            auth_params = json.load(fin)
    return auth_params
Question:
Is there a more pythonic way to achieve these two things?
Are there any potential issues in the code that I have overlooked?
Are there any error conditions I need to handle?

There are some things to be noted, yes:
i) When your request is None for some reason, you are saving an empty JSON object to your file. Maybe you'll want to write to your file only if request is not None?
auth_params = {}
if request is not None:
    auth_params['token'] = request.token
    auth_params['auth_url'] = request.auth_url
    auth_params['server_cert'] = request.server_cert
    auth_params['local_key'] = request.local_key
    auth_params['local_cert'] = request.local_cert
    auth_params['timeout'] = request.timeout_secs
    with open(SUCCESS_AUTH_PARAM_FILE, 'w') as fout:
        json.dump(auth_params, fout, indent=4)
ii) Why not create the dict all at once?
auth_params = {
    'token': request.token,
    'auth_url': request.auth_url,
    'server_cert': request.server_cert,
    'local_key': request.local_key,
    'local_cert': request.local_cert,
    'timeout': request.timeout_secs,
}
iii) Make sure this file is in a SAFE location with SAFE permissions. This is sensitive data, like anything related to authentication.
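iv) You are overwriting your file every time save_auth_params_to_file is called. Maybe you mean to append your JSON to the file instead of overwriting it? Keep in mind that json.load() cannot read a file containing several appended JSON objects (it raises a JSONDecodeError), so get_auth_params_from_file would need to change as well. If appending is what you want: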
with open(SUCCESS_AUTH_PARAM_FILE, 'a') as fout:
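If you do append, one common layout is JSON Lines: one compact JSON object per line, read back line by line instead of with a single json.load(). A sketch with hypothetical helper names; the demo path is also hypothetical and stands in for SUCCESS_AUTH_PARAM_FILE. Note that one-object-per-line output gives up the indent=4 formatting, which makes hand-editing a little less comfortable.

```python
import json
import os
import tempfile


def append_auth_params(path, auth_params):
    with open(path, 'a') as fout:
        fout.write(json.dumps(auth_params) + '\n')  # one object per line


def read_all_auth_params(path):
    with open(path) as fin:
        return [json.loads(line) for line in fin if line.strip()]


demo_path = os.path.join(tempfile.mkdtemp(), 'success_auth_params.jsonl')
append_auth_params(demo_path, {'token': 'a'})
append_auth_params(demo_path, {'token': 'b'})
print(read_all_auth_params(demo_path)[-1])  # the most recent entry
```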