Saving PDFs to disk as they are generated with django-wkhtmltopdf

What I'm trying to implement is this:
1. User sends query parameters from React FE microservice to the Django BE microservice.
   - URI is something like /api/reports?startingPage=12&dataView=Region
   - These PDFs are way too big to be generated in FE, so doing it server side
2. Request makes its way into the view.py where the data related to dataView=Region is queried from the database, each row is iterated through and a PDF report is generated for each item
   - Each dataView=Region can consist of a few hundred items and each of those items is its own report that can be a page long or several pages long
3. As the reports are generated, they should be saved to the server persistent volume claim and not be sent back to FE until they have all run.
4. When they have all run, I plan to use pypdf2 to combine all of the PDFs into one large file.
5. At that point, the file is sent back to the FE to download.
I'm only working on 1. and 3. at this point and I'm unable to:
1. Get the files to save to storage
2. Prevent the default behavior of the PDF being sent back to the FE after it has been generated
The PDFs are being generated, so that is good.
I'm trying to implement the suggestions as found here, but I'm not getting the desired results:
Save pdf from django-wkhtmltopdf to server (instead of returning as a response)
This is what I currently have on the Django side:
# urls.py
from django.urls import path
from .views import GeneratePDFView

app_name = 'Reports'
urlpatterns = [
    path('/api/reports',
         GeneratePDFView.as_view(), name='generate_pdf'),
]
# views.py
from django.conf import settings
from django.views.generic.base import TemplateView
from rest_framework.permissions import IsAuthenticated
from wkhtmltopdf.views import PDFTemplateResponse

# Create your views here.
class GeneratePDFView(TemplateView):
    permission_classes = [IsAuthenticated]
    template_name = 'test.html'
    filename = 'test.pdf'

    def generate_pdf(self, request, **kwargs):
        context = {'key': 'value'}
        # generate response
        response = PDFTemplateResponse(
            request=self.request,
            template=self.template_name,
            filename=self.filename,
            context=context,
            cmd_options={'load-error-handling': 'ignore'})
        self.save_pdf(response.rendered_content, self.filename)

    # Handle saving the document
    # This is what I'm using elsewhere where files are saved and it works there
    def save_pdf(self, file, filename):
        with open(settings.PDF_DIR + '/' + filename, 'wb+') as destination:
            for chunk in file.chunks():
                destination.write(chunk)
# settings.py
...
DOWNLOAD_ROOT = '/mnt/files/client-downloads/'
MEDIA_ROOT = '/mnt/files/client-submissions/'
PDF_DIR = '/mnt/files/pdf-sections/'
...
I should note the other DOWNLOAD_ROOT and MEDIA_ROOT are working fine where the app uses them. I've even tried using settings.MEDIA_ROOT because I know it works, but still nothing is saved there. But as you can see, I'm starting out super basic and haven't added a query, loops, etc.
My save_pdf() is different than the SO question I linked to because that is what I'm using in other parts of my application and it is saving files fine there. I did try what they provided in the SO question, but had the same results with it not saving. That being:
with open("file.pdf", "wb") as f:
    f.write(response.rendered_content)
So what do I need to do to get these PDFs to save to disk?
Perhaps I need to be using a different library for my needs as django-wkhtmltopdf seems to do a number of things out of the box that I don't want that I'm not clear I can override.

OK, my smooth brain gained a few ripples overnight and figured it out this morning:
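# views.py
from django.conf import settings
from django.http import HttpResponse
from django.views.generic.base import TemplateView
from rest_framework.permissions import IsAuthenticated
from wkhtmltopdf.views import PDFTemplateResponse


class GeneratePDFView(TemplateView):
    permission_classes = [IsAuthenticated]

    def get(self, request, *args, **kwargs):
        template_name = 'test.html'
        filename = 'test.pdf'
        context = {'key': 'value'}
        # generate response
        response = PDFTemplateResponse(
            request=request,
            template=template_name,
            filename=filename,
            context=context,
            cmd_options={'load-error-handling': 'ignore'})
        # write the rendered content (bytes) to a file
        with open(settings.PDF_DIR + '/' + filename, "wb") as f:
            f.write(response.rendered_content)
        # return something other than the PDF response
        return HttpResponse('Hello, World!')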
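This saved the PDF to disk and also did not respond with the PDF. The difference from my first attempt is that the work now happens in get(), which Django calls for a GET request; nothing ever called my generate_pdf() method, so no file was written. Obviously this is a minimally functioning example that I can expand on, but at least those two issues are figured out.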
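Once this is expanded to a few hundred reports, the write step can be pulled out into a small helper. The following is only a minimal sketch, assuming the rendered PDF is already available as bytes (as response.rendered_content is in the view). The name save_pdf_bytes and its arguments are hypothetical and not part of django-wkhtmltopdf.

```python
from pathlib import Path


def save_pdf_bytes(content: bytes, pdf_dir: str, filename: str) -> Path:
    """Write already-rendered PDF bytes to pdf_dir/filename and return the path.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    target_dir = Path(pdf_dir)
    # Create the directory if the volume is mounted but the folder is missing
    target_dir.mkdir(parents=True, exist_ok=True)
    # Path(filename).name drops any directory parts so the file stays in pdf_dir
    target = target_dir / Path(filename).name
    target.write_bytes(content)
    return target
```

In the view it would be called along the lines of save_pdf_bytes(response.rendered_content, settings.PDF_DIR, filename), and the returned paths could be collected for the later merge step.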

Related

Downloading an Excel File in a Django View

I have included an Excel file in my project directory. I want to create a Django view that allows a user to download that file. How best do I handle that?

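import csv
from django.http import StreamingHttpResponse


class Echo:
    """File-like object that returns each value written to it instead of storing it."""
    def write(self, value):
        return value


def download_csv(request):
    # query_data is an iterable of rows (lists or tuples) prepared elsewhere
    # Create an echo handler that returns whatever the writer puts into it
    pseudo_buffer = Echo()
    # Build a csv writer on top of the echo file-like instance
    writer = csv.writer(pseudo_buffer)
    # Stream the response row by row using the writer
    response = StreamingHttpResponse(
        (writer.writerow(row) for row in query_data),
        content_type="text/csv",
    )
    response['Content-Disposition'] = 'attachment; filename="Something.csv"'
    return response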
This is a code snippet that I use to return a streaming HTTP response with the data. query_data can be raw CSV rows read from a file handle, or arrays of data built from querysets. Just format your data for query_data and return this response from either an API view or a template view. Hope this helps!
Data should be formatted as arrays of rows. You can use .__dict__ to get parsable data from most model instances, or simply parse the CSV into memory with the csv library to accomplish the same thing.
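The Echo trick can be tried outside of Django to see what the response would actually stream. This is a minimal sketch; the function name stream_csv_rows is hypothetical. It relies on csv.writer's writerow() returning whatever the underlying write() call returned.

```python
import csv


class Echo:
    """File-like object whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def stream_csv_rows(rows):
    """Yield one formatted CSV line per input row, without buffering the whole file."""
    writer = csv.writer(Echo())
    for row in rows:
        # writerow() returns whatever Echo.write() returned: the formatted line
        yield writer.writerow(row)
```

A generator like this is the kind of iterable that can be handed to StreamingHttpResponse, so each row is formatted only when the response is being sent.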

Is your file associated with a Model? I prefer to create a Model to store general resources.
models.py
from django.db import models

class Resources(models.Model):
    description = models.CharField(max_length=200)
    file = models.FileField(upload_to='uploads/resources/')

    def __str__(self):
        return self.description
views.py
...
some_file = Resources.objects.get(description='Your description')
...
return render(request, "template.html", {
    "some_file": some_file,
})
template.html
...
<p><a href="{{ some_file.file.url }}">Download file.</a></p>
...

Loading Template in Admin Form for Custom Fields

I learned that I can add a custom button in my Admin form by adding it to
fields = ["connect"]
readonly_fields = ('connect',)

def connect(self, obj):
    return format_html("<button></button>")
connect.allow_tags=True
connect.short_description = ''
However, the html I want to add to the connect is getting out of control. I was wondering if there's a proper (Django-nic) way to move that to a template and load and return the content of the template in the connect function.
I could read the template file directly (open('file.html', 'r')), but I am looking for a suggestion that aligns with Django standards (if any).
P.S. I also tried creating a view for getting the HTML content of the connect file, but that for some reason doesn't seem to work and feels unnatural to do.

from django.template.loader import render_to_string
...
def connect(self, obj):
    html = render_to_string('file.html')
    return html
With file.html in templates directory

RESTful way to upload file along with some data in django

I am creating a webservice with django using django rest framework.
Users are able to upload some images and videos. Uploading media is a two step action, first user uploads the file and receives an ID then in a separate request uses that ID to refer to the media (for example (s)he can use it as profile picture or use it in a chat message).
I need to know who is uploading the media, both for the HMAC authentication middleware and for setting the owner of the media in the database. All other requests are in JSON format and include a username field that is used by the HMAC middleware to retrieve the shared secret key.
It first came to my mind that media upload api may look like this:
{
    "username":"mjafar",
    "datetime":"2015-05-08 19:05",
    "media_type":"photo",
    "media_data": /* base64 encoded image file */
}
But I thought that base64 encoding may have significant overhead for larger files like videos, or that there may be restrictions on the size of data that can be parsed as JSON or created on the client side. (This webservice is supposed to communicate with an Android/iOS app, and those have limited memory!) Is this a good solution? Are my concerns real problems, or shouldn't I worry? Are there better solutions?
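To put a number on the base64 concern: standard base64 turns every 3 input bytes into 4 output characters, so the payload grows by about a third before any JSON overhead. The following sketch checks that arithmetic; the 3,000,000-byte buffer is just a stand-in for a photo, and base64_size is a hypothetical helper name.

```python
import base64


def base64_size(raw_size: int) -> int:
    """Size in bytes of the standard (padded) base64 encoding of raw_size bytes."""
    # Every 3 input bytes become 4 output characters; the last group is padded
    return 4 * ((raw_size + 2) // 3)


raw = bytes(3_000_000)  # stand-in for a roughly 3 MB photo
encoded = base64.b64encode(raw)
overhead = len(encoded) / len(raw) - 1  # fraction added by the encoding
```

On top of the size increase, a client that builds the JSON body in memory holds both the raw file and its encoded copy at once, which is the memory concern raised above.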

You could separate the two. Meta data at one interface with a URL pointing to the actual file. Depending on how you store the actual file you could then reference the file directly via URL at a later point.
You could then have the POST API directly accept the file and simply return the JSON meta data
{
    "username":"mjafar", // inferred from self.request.user
    "datetime":"2015-05-08 19:05", // timestamp on server
    "media_type":"photo", // inferred from header content-type?
    // auto-generated hashed location for file
    "url": "/files/1dde/2ecf/4075/f61b/5a9c/1cec/53e0/ca9b/4b58/c153/09da/f4c1/9e09/4126/271f/fb4e/foo.jpg"
}
Creating such an interface using DRF would be more along the lines of implementing rest_framework.views.APIView
Here's what I'm doing for one of my sites:
class UploadedFile(models.Model):
    creator = models.ForeignKey(auth_models.User,blank=True)
    creation_datetime = models.DateTimeField(blank=True,null=True)
    title = models.CharField(max_length=100)
    file = models.FileField(max_length=200, upload_to=FileSubpath)
    sha256 = models.CharField(max_length=64,db_index=True)

    def save(self,*args,**kw_args):
        if not self.creation_datetime:
            self.creation_datetime = UTC_Now()
        super(UploadedFile,self).save(*args,**kw_args)
serializer:
class UploadedFileSerializer(serializers.HyperlinkedModelSerializer):
    class Meta:
        model = UploadedFile
        fields = ('url', 'creator','creation_datetime','title','file')
And the view to use this:
from rest_framework.views import APIView
from qc_srvr import serializers,models
from rest_framework.response import Response
from rest_framework import status
from rest_framework import parsers
from rest_framework import renderers
import django.contrib.auth.models as auth_models
import hashlib

class UploadFile(APIView):
    '''A page for uploading files.'''
    throttle_classes = ()
    permission_classes = ()
    parser_classes = (parsers.FormParser, parsers.JSONParser,)
    renderer_classes = (renderers.JSONRenderer,)
    serializer_class = serializers.UploadedFileSerializer

    def calc_sha256(self,afile):
        hasher = hashlib.sha256()
        blocksize=65536
        hasher.update('af1f9847d67300b996edce88889e358ab81f658ff71d2a2e60046c2976eeebdb') # salt
        buf = afile.read(blocksize)
        while len(buf) > 0:
            hasher.update(buf)
            buf = afile.read(blocksize)
        return hasher.hexdigest()

    def post(self, request):
        if not request.user.is_authenticated():
            return Response('User is not authenticated.', status=status.HTTP_401_UNAUTHORIZED)
        uploaded_file = request.FILES.get('file',None)
        if not uploaded_file:
            return Response('No upload file was specified.', status=status.HTTP_400_BAD_REQUEST)
        # calculate sha
        sha256 = self.calc_sha256(uploaded_file)
        # does the file already exist?
        existing_files = models.UploadedFile.objects.filter(sha256=sha256)
        if len(existing_files):
            serializer = self.serializer_class(instance=existing_files[0],context={'request':request})
        else:
            instance = models.UploadedFile.objects.create(
                creator = request.user,
                title= uploaded_file.name,
                file = uploaded_file,
                sha256 = sha256)
            serializer = self.serializer_class(instance=instance,context={'request':request})
        #import rpdb2; rpdb2.start_embedded_debugger('foo')
        #serializer.is_valid()
        return Response(serializer.data)
FYI, this is a bit of security-through-obscurity since all the uploaded files are retrievable if you have the URL to the file.
I'm still using DRF 2.4.4, so this may not work for you on 3+. I haven't upgraded due to the dropped nested-serializers support.
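The block-wise hashing in calc_sha256 can be tried on its own. This is a minimal sketch for Python 3, where hashlib only accepts bytes, so the salt has to be a bytes literal. The SALT value and the function name salted_sha256 are hypothetical and stand in for the fixed salt used in the view.

```python
import hashlib
import io

# Hypothetical salt for illustration; the view above uses its own fixed value
SALT = b"example-salt"


def salted_sha256(fileobj, blocksize=65536):
    """Return the hex SHA-256 of SALT followed by the file's bytes, read in blocks."""
    hasher = hashlib.sha256()
    hasher.update(SALT)  # must be bytes on Python 3
    # iter() with a sentinel keeps reading until read() returns empty bytes
    for block in iter(lambda: fileobj.read(blocksize), b""):
        hasher.update(block)
    return hasher.hexdigest()


digest = salted_sha256(io.BytesIO(b"hello world"))
```

Reading in blocks keeps memory use flat for large uploads, and the digest is the same as hashing the salt and the whole file in one call.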

File upload not working after changing models format

I have a model class similar to following -
class Document(models.Model):
    docfile = models.FileField(upload_to='documents/%Y/%M/%D')
Everything is working fine and files are uploaded successfully based on directory structure.
Now I don't want to upload files in this format, but simply put all files in one folder, so I changed the logic:
class Document(models.Model):
    docfile = models.FileField(upload_to='documents')
Now it is not uploading the files and is throwing an error. Maybe I need to run some command, but I do not know which.
Please suggest something
Edit1:
Ok .. I found that the actual problem lies somewhere else.
I have a view like this - (please ignore the bad spacing but that is fine in actual code)
def lists(request):
    # Problematic Code Start
    path = settings.MEDIA_URL + 'upload/location.txt'
    f = open(path, 'w')
    myfile = File(f)
    myfile.write('Hello World')
    myfile.closed
    f.closed
    # Problematic Code ends
    # Handle file upload
    if request.method == 'POST':
        form = DocumentForm(request.POST, request.FILES)
        if form.is_valid():
            filename = Document(docfile = request.FILES['docfile'])
            filename.save()
            # Redirect to the document list after POST
            return HttpResponseRedirect(reverse('sdm:lists'))
            #return render_to_response(reverse('sdm:lists'))
    else:
        form = DocumentForm() # A empty, unbound form
    # Load documents for the list page
    documents = Document.objects.all()
    # Render list page with the documents and the form
    return render_to_response(
        'sdm/lists.html',
        {'documents': documents, 'form': form},
        context_instance=RequestContext(request)
    )
When I remove the problematic code, everything works fine. (Ignore the purpose of this weird code; the actual goal is something bigger.)
MEDIA_URL=/media/
Here is the error:
IOError at /sdm/lists
[Errno 2] No such file or directory: '/media/upload/location.txt'
This happens although the file exists and everything is owned by www-data:www-data with 755 permissions.

"Problematic" code indeed - whoever wrote this should find another job. This code is wrong in more than one way: it uses MEDIA_URL instead of MEDIA_ROOT (which is the cause of the IOError you get, since MEDIA_URL is a URL prefix and not a filesystem path), and it badly misuses Python's dead simple file objects. It is also totally useless, and looks like a leftover of someone programming by accident. To make a long story short: just remove it and you'll be fine.

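If a file really does need to be written under the media directory, the usual shape is to join against the filesystem root and let a with block close the file. This is a minimal sketch, assuming media_root is the value of settings.MEDIA_ROOT and points at a writable directory; the function name write_media_file is hypothetical.

```python
import os


def write_media_file(media_root: str, relative_path: str, text: str) -> str:
    """Write text to a file under media_root and return its filesystem path."""
    # Join against the filesystem root (MEDIA_ROOT), never the URL prefix (MEDIA_URL)
    path = os.path.join(media_root, relative_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    # The with block closes the file; no wrapper object or manual close is needed
    with open(path, "w") as f:
        f.write(text)
    return path
```

In the view from the question this would be called as write_media_file(settings.MEDIA_ROOT, 'upload/location.txt', 'Hello World').
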
Django: Serving a Download in a Generic View

So I want to serve a couple of mp3s from a folder in /home/username/music. I didn't think this would be such a big deal but I am a bit confused on how to do it using generic views and my own url.
urls.py
url(r'^song/(?P<song_id>\d+)/download/$', song_download, name='song_download'),
The example I am following is found in the generic view section of the Django documentations:
http://docs.djangoproject.com/en/dev/topics/generic-views/ (It's all the way at the bottom)
I am not 100% sure on how to tailor this to my needs. Here is my views.py
def song_download(request, song_id):
    song = Song.objects.get(id=song_id)
    response = object_detail(
        request,
        object_id = song_id,
        mimetype = "audio/mpeg",
    )
    response['Content-Disposition'] = "attachment; filename=%s - %s.mp3" % (song.artist, song.title)
    return response
I am at a loss as to how to convey that I want it to spit out my mp3. What it does now is output an .mp3 file containing all of the current page's HTML. Should my template be my mp3? Do I need to set up Apache to serve the files, or is Django able to retrieve the mp3 from the filesystem (with proper permissions, of course) and serve that? If I do need to configure Apache, how do I tell Django that?
Thanks in advance. These files are all on the HD so I don't need to "generate" anything on the spot and I'd like to prevent revealing the location of these files if at all possible. A simple /song/1234/download would be fantastic.

Why do you want to do this with a generic view? It's very easy to do this without generic views:
from django.http import HttpResponse

def song_download(request, song_id):
    song = Song.objects.get(id=song_id)
    fsock = open('/path/to/file.mp3', 'rb')
    response = HttpResponse(fsock, content_type='audio/mpeg')
    response['Content-Disposition'] = "attachment; filename=%s - %s.mp3" % \
        (song.artist, song.title)
    return response
I'm not sure if it's possible to make this work somehow with a generic view. But either way, using one is redundant here. With no template to render, the context that is automatically provided by the generic view is useless.

To wrap my comment to Tomasz Zielinski into a real answer:
For several reasons it is indeed better to let Apache/nginx/etc. do the work of sending files.
Most servers have mechanisms to help in that use case: Apache and lighttpd have X-Sendfile, nginx has X-Accel-Redirect.
The idea is that you can use all the features of Django like nice URLs, authentication methods, etc., but let the server do the work of serving files. What your Django view has to do is return a response with a special header. The server will then replace the response with the actual file.
Example for apache:
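import os

from django.http import HttpResponse
from django.utils.encoding import smart_str

def song_download(request):
    path = '/path/to/file.mp3'
    response = HttpResponse()
    # Apache (with mod_xsendfile) replaces this response with the file at path
    response['X-Sendfile'] = smart_str(path)
    response['Content-Type'] = "audio/mpeg"
    response['Content-Length'] = os.stat(path).st_size
    return response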
Install mod_xsendfile.
Add XSendFile On and (depending on the version) XSendFileAllowAbove on or XSendFilePath the/path/to/serve/from to your Apache configuration.
This way you don't reveal the file location, and you keep all the URL management in Django.

Serving static files with Django is a bad idea, use Apache, nginx etc.
https://docs.djangoproject.com/en/dev/howto/static-files/deployment/

To answer the original question how to use a generic view, you could do the following:
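from django.views.generic import DetailView
from django.http.response import FileResponse

class DownloadSong(DetailView):
    model = Song
    # Matches the song_id group in the question's URL pattern
    pk_url_kwarg = 'song_id'

    def get(self, request, *args, **kwargs):
        # Look up the Song from the URL kwargs without rendering a template
        song = self.get_object()
        # Assumes Song has a FileField named `file`; adjust to your model
        return FileResponse(open(song.file.path, 'rb'),
                            as_attachment=True,
                            filename=f'{song.artist} - {song.title}.mp3')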
Docs:
Detailview: https://docs.djangoproject.com/en/3.2/ref/class-based-views/generic-display/#detailview
FileResponse: https://docs.djangoproject.com/en/3.2/ref/request-response/#fileresponse-objects
If your Django version does not have the FileResponse object, use the HttpResponse as shown in the other answers.
