I have a Pyramid web application that uses an internal web service to convert some data into a PDF file using reportlab. This works great, but only one PDF file is generated at a time. The client now wants to be able to print multiple PDF files (they can be zipped up rather than actually sent to a printer).
I currently have this piece of code at the bottom:
result = {"sales_order_data": {"sales_order_list": order_pdf_data}}
encoded_result = urllib.urlencode(result)
received_data = requests.post('http://convert_to_pdf/sales_order', data=encoded_result)
pdf_file = received_data.content
content_disposition = received_data.headers['Content-Disposition']
res = Response(content_type='application/pdf', content_disposition=content_disposition)
res.body = pdf_file
return res
The pdf_file is the binary form of the PDF file. I was thinking of running my PDF conversion code multiple times, storing each PDF's binary data in a list, and then using StringIO and ZipFile to zip the files together.
I'm not sure whether this is possible:
list_of_pdf_files = []
for order in list_of_orders:
    <processing of data here>
    result = {"sales_order_data": {"sales_order_list": order_pdf_data}}
    encoded_result = urllib.urlencode(result)
    received_data = requests.post('http://convert_to_pdf/sales_order', data=encoded_result)
    pdf_file = received_data.content
    list_of_pdf_files.append(pdf_file)

zipped_file = <What do I do here to zip the list of pdf files?>
content_disposition = 'attachment; filename="pdf.zip"'
res = Response(content_type='application/zip', content_disposition=content_disposition)
res.body = zipped_file
return res
What do I do after getting a list of binary pdf files so that I can generate a zipped file in-memory and return that file via content-disposition as the response?
Use ZipFile.writestr() to write the PDF files one at a time to the archive.
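For example, a minimal sketch (the question's snippets look like Python 2; this uses Python 3's io.BytesIO, where Python 2 would use StringIO.StringIO, and "order_%d.pdf" is just a placeholder naming scheme):

```python
import io
import zipfile

def zip_pdfs(list_of_pdf_files):
    """Bundle a list of PDF byte strings into one in-memory ZIP archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for idx, pdf_bytes in enumerate(list_of_pdf_files):
            # each entry needs a unique name inside the archive
            zf.writestr("order_%d.pdf" % idx, pdf_bytes)
    return buf.getvalue()
```

Since zip_pdfs returns bytes, the result can be assigned to res.body directly.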
I want to download images from URLs stored in an .xlsx file, zip them, and use Streamlit's download button to download the zip file.
Running the code without st.download_button seems to work, but I don't know how to introduce the download button into my code.
The code below gives me this error message:
RuntimeError: Invalid binary data format: <class 'zipfile.ZipFile'>
code:
import pandas as pd
import urllib.request
import zipfile
import streamlit as st

df = pd.read_excel("images.xlsx")
# Column containing the image names
title = df["Name"].tolist()
# Column containing the image urls
images = df["Image"].tolist()

with zipfile.ZipFile("images.zip", "w", zipfile.ZIP_DEFLATED) as z:
    for idx, image in enumerate(images):
        response = urllib.request.urlopen(image)
        img_data = response.read()
        z.writestr(title[idx], img_data)

st.download_button(
    label="Download Images",
    data=z,
    file_name="images.zip",
    mime="application/zip",
)
As the st.download_button documentation lists, data accepts:
data (str or bytes or file)
A zipfile.ZipFile is neither of those. However, your code creates an actual images.zip file, and that is a file, so it can be supplied the way the last example on the same page (with a flower.png) shows. Putting the pieces together, the result probably looks like this:
# create the file, from your code
with zipfile.ZipFile("images.zip", "w", zipfile.ZIP_DEFLATED) as z:
    for idx, image in enumerate(images):
        response = urllib.request.urlopen(image)
        img_data = response.read()
        z.writestr(title[idx], img_data)

# open it as a regular file and supply it to the button as shown in the example:
with open("images.zip", "rb") as file:
    btn = st.download_button(
        label="Download Images",
        data=file,
        file_name="images.zip",
        mime="application/zip",
    )
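If the intermediate file on disk bothers you, the archive can also be built entirely in memory and the raw bytes passed to the button; a sketch under the same assumptions as the code above (build_zip is my own helper name, not a Streamlit API):

```python
import io
import zipfile

def build_zip(names, blobs):
    """Zip (name, bytes) pairs in memory and return the archive as bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
        for name, blob in zip(names, blobs):
            z.writestr(name, blob)
    return buf.getvalue()

# st.download_button accepts bytes directly:
# st.download_button(label="Download Images", data=build_zip(title, image_data),
#                    file_name="images.zip", mime="application/zip")
```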
I have PDF files where I want to extract info only from the first page. My solution is to:
Use PyPDF2 to read from S3 and save only the first page.
Read the same one-page PDF I saved, convert it to a byte array, and analyse it with AWS Textract.
It works, but I do not like this solution. Why save and then read back the exact same file? Can I not use the file directly at runtime?
Here is what I have done that I don't like:
from PyPDF2 import PdfReader, PdfWriter
from io import BytesIO
import boto3

def analyse_first_page(bucket_name, file_name):
    s3 = boto3.resource("s3")
    obj = s3.Object(bucket_name, file_name)
    fs = obj.get()['Body'].read()
    pdf = PdfReader(BytesIO(fs), strict=False)
    writer = PdfWriter()
    page = pdf.pages[0]
    writer.add_page(page)

    # Here is the part I do not like
    with open("first_page.pdf", "wb") as output:
        writer.write(output)
    with open("first_page.pdf", "rb") as pdf_file:
        encoded_string = bytearray(pdf_file.read())

    # Analyse text
    textract = boto3.client('textract')
    response = textract.detect_document_text(Document={"Bytes": encoded_string})
    return response

analyse_first_page(bucket, file_name)
Is there no AWS way to do this? Is there no better way to do this?
You can use BytesIO as an in-memory stream, without writing to a file and then reading it back:
from base64 import b64encode

with BytesIO() as bytes_stream:
    writer.write(bytes_stream)
    bytes_stream.seek(0)
    encoded_string = b64encode(bytes_stream.getvalue())
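The same pattern works with any writer that accepts a file-like object. A self-contained illustration of the stream mechanics (FakeWriter is a hypothetical stand-in for PyPDF2's PdfWriter, just to show the idea):

```python
from base64 import b64encode
from io import BytesIO

class FakeWriter:
    """Hypothetical stand-in for PyPDF2's PdfWriter."""
    def write(self, stream):
        stream.write(b"%PDF-1.4 first page only")

writer = FakeWriter()
with BytesIO() as bytes_stream:
    writer.write(bytes_stream)           # written to memory, never to disk
    pdf_bytes = bytes_stream.getvalue()  # the raw bytes of the document
encoded_string = b64encode(pdf_bytes)
```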
I have a Django app that reads a document template (DOCX) and modifies it. The program is working well, but it returns a DOCX document for download (as expected). So I want to change the download file format to PDF. I thought of converting the DOCX file to PDF, but I couldn't find a working way to do that.
My actual code looks like this:
f = io.BytesIO()
document.write(f)
length = f.tell()
f.seek(0)
response = HttpResponse(
    f.getvalue(),
    content_type='application/vnd.openxmlformats-officedocument.wordprocessingml.document'
)
response['Content-Disposition'] = 'attachment; filename=' + formular.name + '.docx'
response['Content-Length'] = length
return response
I want to find a working method of converting the DOCX file f to a PDF file before returning that as response.
You need the pywin32 library for that purpose (Windows only, with Word installed), which you can use like so:
from win32com import client

def convert_to_pdf(doc):
    try:
        word = client.DispatchEx("Word.Application")
        new_name = doc.replace(".docx", ".pdf")
        worddoc = word.Documents.Open(doc)
        worddoc.SaveAs(new_name, FileFormat=17)  # 17 = wdFormatPDF
        worddoc.Close()
    except Exception as e:
        return e
    finally:
        word.Quit()
I am trying to download multiple image files from the server. I am using Django for my backend.
The question for a single image has already been answered; I tried the code and it works for a single image. In my application, I want to download multiple images over a single HTTP connection.
from PIL import Image
img = Image.open('test.jpg')
img2 = Image.open('test2.png')
response = HttpResponse(content_type = 'image/jpeg')
response2 = HttpResponse(content_type = 'image/png')
img.save(response, 'JPEG')
img2.save(response2, 'PNG')
return response #SINGLE
How can I fetch both img and img2 at once? One way I was thinking of is to zip both images and unzip them on the client side, but I don't think that is a good solution. Is there a way to handle this?
I looked around and found an older solution using a temporary ZIP file on disk: https://djangosnippets.org/snippets/365/
It needed some updating, and this should work (tested on Django 2.0):
import tempfile, zipfile
from django.http import HttpResponse
from wsgiref.util import FileWrapper

def send_zipfile(request):
    """
    Create a ZIP file on disk and transmit it in chunks of 8KB,
    without loading the whole file into memory. A similar approach can
    be used for large dynamic PDF files.
    """
    temp = tempfile.TemporaryFile()
    archive = zipfile.ZipFile(temp, 'w', zipfile.ZIP_DEFLATED)
    for index in range(10):
        filename = 'C:/Users/alex1/Desktop/temp.png'  # Replace by your files here.
        archive.write(filename, 'file%d.png' % index)  # 'file%d.png' will be the name of the file in the zip
    archive.close()
    temp.seek(0)
    wrapper = FileWrapper(temp)
    response = HttpResponse(wrapper, content_type='application/zip')
    response['Content-Disposition'] = 'attachment; filename=test.zip'
    return response
Right now, this takes my .png and writes it 10 times in my .zip, then sends it.
You could add your files/images to a ZIP file and return that one in the response. I think that is the best approach.
Here is some example code of how you could achieve that (from this post):
import io
import zipfile

def zip_files(files):
    outfile = io.BytesIO()  # StringIO() in Python 2
    with zipfile.ZipFile(outfile, 'w') as zf:
        for n, f in enumerate(files):
            zf.writestr("{}.csv".format(n), f.getvalue())
    return outfile.getvalue()

zipped_file = zip_files(myfiles)
response = HttpResponse(zipped_file, content_type='application/octet-stream')
response['Content-Disposition'] = 'attachment; filename=my_file.zip'
Otherwise (if you don't like ZIP files) you could make individual requests from the client.
In a web app I am working on, the user can create a zip archive of a folder full of files. Here's the code:
files = torrent[0].files
zipfile = z.ZipFile(zipname, 'w')
output = ""
for f in files:
    zipfile.write(settings.PYRAT_TRANSMISSION_DOWNLOAD_DIR + "/" + f.name, f.name)
downloadurl = settings.PYRAT_DOWNLOAD_BASE_URL + "/" + settings.PYRAT_ARCHIVE_DIR + "/" + filename
output = "Download " + torrent_name + ""
return HttpResponse(output)
But this has the nasty side effect of a long wait (10+ seconds) while the zip archive is being created. Is it possible to skip this? Instead of saving the archive to a file, is it possible to send it straight to the user?
I do believe that torrentflux provides this exact feature I am talking about: being able to zip GBs of data and have the download start within a second.
Check this Serving dynamically generated ZIP archives in Django
As mandrake says, the constructor of HttpResponse accepts iterable objects.
Luckily, the ZIP format is such that an archive can be created in a single pass; the central directory record is located at the very end of the file:
(Picture from Wikipedia)
And luckily, zipfile indeed doesn't do any seeks as long as you only add files.
Here is the code I came up with. Some notes:
I'm using this code for zipping up a bunch of JPEG pictures. There is no point compressing them; I'm using ZIP only as a container.
Memory usage is O(size_of_largest_file), not O(size_of_archive). That is good enough for me: many relatively small files that add up to a potentially huge archive.
This code doesn't set a Content-Length header, so the user doesn't get a nice progress indication. It should be possible to calculate this in advance if the sizes of all files are known.
Serving the ZIP straight to the user like this means that resuming downloads won't work.
So, here goes:
import zipfile

class ZipBuffer(object):
    """ A file-like object for zipfile.ZipFile to write into. """

    def __init__(self):
        self.data = []
        self.pos = 0

    def write(self, data):
        self.data.append(data)
        self.pos += len(data)

    def tell(self):
        # zipfile calls this so we need it
        return self.pos

    def flush(self):
        # zipfile calls this so we need it
        pass

    def get_and_clear(self):
        result = self.data
        self.data = []
        return result

def generate_zipped_stream():
    sink = ZipBuffer()
    archive = zipfile.ZipFile(sink, "w")
    for filename in ["file1.txt", "file2.txt"]:
        archive.writestr(filename, "contents of file here")
        for chunk in sink.get_and_clear():
            yield chunk

    archive.close()
    # close() generates some more data, so we yield that too
    for chunk in sink.get_and_clear():
        yield chunk

def my_django_view(request):
    response = HttpResponse(generate_zipped_stream(), mimetype="application/zip")
    response['Content-Disposition'] = 'attachment; filename=archive.zip'
    return response
Here's a simple Django view function which zips up (as an example) any readable files in /tmp and returns the zip file.
from django.http import HttpResponse
import zipfile
import os
from cStringIO import StringIO  # caveats for Python 3.0 apply

def somezip(request):
    file = StringIO()
    zf = zipfile.ZipFile(file, mode='w', compression=zipfile.ZIP_DEFLATED)
    for fn in os.listdir("/tmp"):
        path = os.path.join("/tmp", fn)
        if os.path.isfile(path):
            try:
                zf.write(path)
            except IOError:
                pass
    zf.close()
    response = HttpResponse(file.getvalue(), mimetype="application/zip")
    response['Content-Disposition'] = 'attachment; filename=yourfiles.zip'
    return response
Of course this approach will only work if the ZIP file conveniently fits into memory - if not, you'll have to use a disk file (which you're trying to avoid). In that case, you just replace file = StringIO() with file = open('/path/to/yourfiles.zip', 'wb') and replace file.getvalue() with code to read the contents of the disk file.
Does the zip library you are using allow output to a stream? You could stream directly to the user instead of temporarily writing to a zip file and THEN streaming to the user.
It is possible to pass an iterator to the constructor of an HttpResponse (see docs). That would allow you to create a custom iterator that generates data as it is being requested. However, I don't think that will work with a ZIP (you would have to send a partial ZIP as it is being created).
The proper way, I think, would be to create the files offline, in a separate process. The user could then monitor the progress and download the file when it's ready (possibly by using the iterator method described above). This would be similar to what sites like YouTube do when you upload a file and wait for it to be processed.