Generate PDF of Protected Django Webpage with Attachments - python

So I'm trying to generate a PDF of a view that I have in a django web application. This view is protected, meaning the user has to be logged in and have specific permission to view the page. I also have some attachments (stored in the database as FileFields) that I would like to append to the end of the PDF.
I've read most of the posts I could find on how to generate PDFs from a webpage using pdfkit or reportlab, but all of them fail for me for some reason or another.
Currently, the closest I've gotten is successfully generating a PDF of the page using pdfkit, but this requires me to remove the restrictions that require the user to be logged in and have page permissions, which really isn't an option long term. I found a couple of posts that discuss printing PDFs of protected pages by providing login information, but I couldn't get any of that to work.
I haven't found anything on how to include attachments, and don't really know where to start with that.
I'm more than happy to update this question with more information or snippets of code if need be, but there are quite a few moving parts here and I don't want to flood people with useless information. Let me know if there's any other information I should provide, and thanks in advance for any help.

I got it working! Through a combination of PyPDF2 and pdfkit, this turned out to be pretty simple. It works on protected pages because Django takes care of rendering the complete HTML as a string, which I just pass to pdfkit. It also supports appending attachments, though I doubt (I haven't tested) that it works with anything other than PDFs.
from django.template.loader import get_template
from PyPDF2 import PdfFileWriter, PdfFileReader
import pdfkit

def append_pdf(pdf, output):
    # copy every page from the reader into the output writer
    for page_num in range(pdf.numPages):
        output.addPage(pdf.getPage(page_num))

def render_to_pdf():
    # render the template directly, so no HTTP request (and no login) is needed
    t = get_template('app/template.html')
    c = {'context_data': context_data}  # whatever context the template needs
    html = t.render(c)
    pdfkit.from_string(html, 'path/to/file.pdf')
    output = PdfFileWriter()
    append_pdf(PdfFileReader(open('path/to/file.pdf', "rb")), output)
    # append each stored attachment (assumed to also be a PDF) to the end
    attaches = Attachment.objects.all()
    for attach in attaches:
        append_pdf(PdfFileReader(open(attach.file.path, "rb")), output)
    output.write(open('path/to/file_with_attachments.pdf', "wb"))

If you just want to secure it, you could write a custom authentication backend that lets your server spoof users. Way overkill, but it would solve your problem, and at least you get to learn about custom auth backends! (Note: you should be using HTTPS.)
https://docs.djangoproject.com/en/1.11/topics/auth/customizing/#writing-an-authentication-backend
1. Create the auth backend in app/auth_backends.py.
2. Add the app.auth_backends.SpoofAuthBackend backend to AUTHENTICATION_BACKENDS in settings.py; its authenticate() takes a shared_secret and user_id.
3. Create a URL route like url(r'^spoof-user/(?P<user_id>\d+)/$', 'app.views.spoof_user', name="spoof-user").
4. Add the view spoof_user, which must invoke django.contrib.auth.authenticate (which in turn invokes the backend from #1); after getting the user back from authenticate(...), attach them to the request with django.contrib.auth.login(request, user). Finally, this view should return HttpResponseForbidden if the shared secret is wrong, or HttpResponseRedirect to the PDF URL you actually want (after logging in the spoofed user programmatically via authenticate and login).
You would probably want to create a random secret for each request, using something like cache.set('spoof-user-%s' % user_id, RANDOM_STRING, 30), which persists the shared secret for 30 seconds to allow time for the request. Then perform pdf_response = requests.get("%s?shared_secret=1a2b3c&redirect_uri=/path/to/pdf/" % reverse('spoof-user', kwargs={'user_id': 1234})). Your new view will test the provided shared_secret in the auth backend, log the user in to the request, and redirect to request.GET.get('redirect_uri'). A sketch of the backend and view follows below.
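Something along these lines, assuming Django 1.11 conventions; the class name, cache-key format, and query parameters are illustrative assumptions, not a definitive implementation:

# app/auth_backends.py (sketch)
from django.contrib.auth import get_user_model
from django.core.cache import cache

class SpoofAuthBackend(object):
    def authenticate(self, request, user_id=None, shared_secret=None):
        expected = cache.get('spoof-user-%s' % user_id)
        if expected is None or shared_secret != expected:
            return None  # wrong or expired secret; let other backends try
        cache.delete('spoof-user-%s' % user_id)  # one-time use
        try:
            return get_user_model().objects.get(pk=user_id)
        except get_user_model().DoesNotExist:
            return None

    def get_user(self, user_id):
        try:
            return get_user_model().objects.get(pk=user_id)
        except get_user_model().DoesNotExist:
            return None

# app/views.py (sketch)
from django.contrib.auth import authenticate, login
from django.http import HttpResponseForbidden, HttpResponseRedirect

def spoof_user(request, user_id):
    user = authenticate(request, user_id=user_id,
                        shared_secret=request.GET.get('shared_secret'))
    if user is None:
        return HttpResponseForbidden()
    login(request, user)
    return HttpResponseRedirect(request.GET.get('redirect_uri', '/'))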

You can use pdfkit to do that. You can retrieve the page using the url and pdfkit will handle the rest:
pdfkit.from_url('http://website.com/somepage', 'somepage.pdf')
You will have to access the page with the appropriate headers, since it is protected, of course:
options = {
    'cookie': [
        ('cookie-name1', 'cookie-value1'),
        ('cookie-name2', 'cookie-value2'),
    ]
}
pdfkit.from_url('http://website.com/somepage', 'somepage.pdf', options=options)

Related

How to run Python script on a webserver

I use a webapp that can generate a PDF report of some data stored in the app. To get to that report, however, requires several clicks and monkeying around with the app.
I support a group of users of this app (we use the app, we don't create the app) and I'd like them to be able to generate and view this report with as few clicks as possible. Thankfully, this web app provides a lot of data via a RESTful API. So I did some scripting.
I have a Python script that makes an HTTP GET request, processes the JSON results, and uses that resultant data to dynamically build a URL. Here's a simplified version of my python code:
#!/usr/bin/env python
import requests
app_id="12345"
secret="67890"
api_url='https://api.webapp.example/some_endpoint'
resp = requests.get(api_url, auth=(app_id,secret))
json_data = resp.json()
# Simplification of the data processing I'm doing
my_data = json_data['attr1']['attr2'] + my_data_processing
# Result of the script is a link to a dynamically generated PDF
pdf_url = 'https://pdf.webapp.example/items/' + my_data
The above is a simplification of the code I actually have, but it shows the relevant points. In my actual script, I continue on by doing another GET with the dynamically built URL. The webapp generates a PDF based on the my_data portion of the URL, and I write that PDF to file. This works very well today.
Currently, this is a python script that runs on my local machine on-demand. However, I'd like to host this somewhere on the web so that when a user hits a URL in their browser it runs and generates the pdf_url, instead of having to install this script on each user's local machine, and so that the PDF can be generated and viewed on a mobile device.
The thought is that the user can open http://example.com/report-shortcut, the python script would run server-side, dynamically build the URL, and redirect the user to that URL, which would then show the PDF in the browser (assuming the user is using a browser that shows PDFs like Chrome, Safari, etc). Alternately, if a redirect is problematic, going to http://example.com/report-shortcut could just show an HTML page with a link to the URL generated by the Python script.
I'm looking for a solution on how to host this Python script and have it run when a user accesses a webpage. I've looked into AWS Lambda and Django, but both seem like overkill for such a simple script (~20 lines of code, plus comments and whitespace). I've also looked at Python CGI scripting, which looks promising, but I have no experience setting up something like that.
Looking for suggestions on how best to host and run this code when a user goes to the example URL.
PS: I thought about just re-implementing in Javascript, but I'd rather the API key not be publicly accessible.
I suggest building the script in AWS Lambda and using API Gateway to invoke it.
You could create the PDF, store it in S3, and generate a pre-signed URL. Then return a 302 response to redirect the user to the pre-signed URL, which will display the PDF in their browser.
Very quick to set up, and with Boto3, getting the PDF into S3 and generating the URL is simple.
It will be much simpler than some of your other suggestions.
See API Gateway & Boto3.
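A rough sketch of what that handler could look like; the bucket name, object key, and build_report_pdf helper are placeholders for your own code, and the 302 return shape assumes Lambda proxy integration with API Gateway:

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-report-bucket'  # placeholder bucket name

def lambda_handler(event, context):
    pdf_bytes = build_report_pdf()  # hypothetical: your ~20-line script goes here
    key = 'reports/report.pdf'
    s3.put_object(Bucket=BUCKET, Key=key, Body=pdf_bytes,
                  ContentType='application/pdf')
    url = s3.generate_presigned_url('get_object',
                                    Params={'Bucket': BUCKET, 'Key': key},
                                    ExpiresIn=300)  # link valid for five minutes
    # API Gateway turns this into an HTTP 302 redirect to the pre-signed URL
    return {'statusCode': 302, 'headers': {'Location': url}}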

Python Flask: Download a file generated on the fly AND print a response

I followed the tutorial to stream a file generated on the fly in Flask. Now I want to display a message using the same data that was used to generate the file. It's a large dataset and I cannot afford to download it twice, once to generate the file and once to print a result on the page.
Unlike In Flask how can I redirect to a template and show a message after returning send_file in a view? , I do not want a redirect or a refresh. Is it possible to send both a file and HTML response in a single page load?
I tried using a generator but did not have any success.
I am using Heroku.
You're trying to return a "multipart HTTP response", with an HTML part and another part (the file). After a quick search I'm not sure such a thing exists, and if it does, how well it is supported by browsers.
A slightly different approach would be to respond with a "classic" HTML response which then fires a second request, for instance an XHR call at document load, to start the download of the file.
Server-side, you would have to store the content of the file somewhere. I'm not really familiar with Heroku, but I guess you could use a key-value store like Redis for this, or even a dedicated service like Amazon S3.
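A minimal sketch of that two-request pattern in Flask, assuming Redis as the temporary store; the route names, token scheme, and generate_large_dataset helper are all illustrative:

import uuid
from flask import Flask, Response, render_template_string
import redis

app = Flask(__name__)
store = redis.Redis()

PAGE = """
<p>{{ message }}</p>
<script>window.location = '/download/{{ token }}';</script>
"""

@app.route('/report')
def report():
    data = generate_large_dataset()  # hypothetical expensive computation, run once
    token = uuid.uuid4().hex
    store.setex(token, 300, data)  # keep the generated file for five minutes
    return render_template_string(PAGE, message='Report ready', token=token)

@app.route('/download/<token>')
def download(token):
    data = store.get(token)
    if data is None:
        return 'expired', 404
    return Response(data, mimetype='text/csv',
                    headers={'Content-Disposition': 'attachment; filename=result.csv'})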

Python: Request URL via POST and show result in browser

I'm a developer for a big GUI app and we have a web site for bug tracking. Anybody can submit a new bug to the bug tracking site. We can detect certain failures from our desktop app (i.e. an unhandled exception) and in such cases we would like to open the submit-new-bug form in the user predefined browser, adding whatever information we can gather about the failure to some form fields. We can either retrieve the submit-new-bug form using GET or POST http methods and we can provide default field values to that form. So from the http server side everything is pretty much OK.
So far we can successfully open a URL, passing the default values as GET parameters, using the webbrowser module from the Python Standard Library. There are, however, some limitations to this method, such as the maximum allowed URL length in some browsers (especially MS IE). The webbrowser module doesn't seem to have a way to request the URL using POST. OTOH, there's the urllib2 module, which provides the kind of control we want, but AFAIK it lacks the ability to open the retrieved page in the user's preferred browser.
Is there a way to get the mixed behavior we want (the fine control of urllib2 with the higher-level functionality of webbrowser)?
PS: We have thought about the possibility of retrieving the URL with urllib2, saving its content to a temp file and opening that file with webbrowser. This is a bit of a nasty solution, and in that case we would have to deal with other issues such as relative URLs. Is there a better solution?
This is not a proper answer, but it works too:
import os
import requests
import webbrowser

url = "https://www.facebook.com/login/device-based/regular/login/?login_attempt=1&lwv=110"
myInput = {'email': 'mymail#gmail.com', 'pass': 'mypaass'}
x = requests.post(url, data=myInput)

# write the returned HTML to a file and open it in the default browser
with open("home.html", "w") as f:
    f.write(x.text)
webbrowser.open('file://' + os.path.abspath("home.html"))
I don't know of any way you can open the result of a POST request in a web browser without saving the result to a file and opening that.
What about taking an alternative approach and temporarily storing the data on the server? Then the page can be opened in the browser with a simple id parameter, and the saved, partially filled form would be shown.
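A hedged sketch of that flow from the desktop app's side; the endpoint URLs and payload fields are invented for illustration:

import uuid
import webbrowser
import requests

tb_text = '...'  # traceback gathered from the unhandled exception (placeholder)
draft_id = uuid.uuid4().hex

# 1. POST the (possibly large) failure details to the server first
requests.post('https://bugs.example.com/api/drafts/%s' % draft_id,  # hypothetical endpoint
              json={'traceback': tb_text})

# 2. open the form with only a short id in the URL, well under any length limit
webbrowser.open('https://bugs.example.com/new-bug?draft=%s' % draft_id)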
You could use tempfile.NamedTemporaryFile():
import tempfile
import webbrowser
import jinja2

t = jinja2.Template('hello {{ name }}!')  # you could load the template from a file
# text mode so we can write str; deleted when the file object is closed
f = tempfile.NamedTemporaryFile(mode='w', suffix='.html')
f.write(t.render(name='abc'))
f.flush()
webbrowser.open_new_tab(f.name)  # returns immediately
A better approach, if the server can be easily modified, is to make the POST request with partial parameters using urllib2 and open the URL generated by the server using webbrowser, as suggested by @Acorn.

Download from Megaupload with login - Python

It's my first question here.
Today I made a little application using wxPython: a simple Megaupload downloader, but it doesn't support premium accounts yet.
Now I would like to know how to download from MU with a login (free or premium user).
I'm very new to Python, so please don't be specific and "professional".
I used to download files with urlretrieve, but is there a way to pass "arguments" or something so I can log in as a premium user?
Thank you. :D
EDIT:
News: new help needed xD
After trying PycURL, httplib2 and mechanize, I've managed to log in with urllib2 and cookiejar (the returned HTML shows my username).
But when I start downloading a file, the server apparently doesn't keep my login; in fact the downloaded file seems corrupted (I changed the wait time from 45 to 25 seconds).
How can I download a file from Megaupload while keeping my previously established login? Thanks for your patience :D
Questions like this are usually frowned upon; they are very broad, and there is already an abundance of answers if you just search on Google.
You can use urllib, or mechanize, or any library you can make an http post request with.
Megaupload's login form looks to have these values:
login:1
redir:1
username:
password:
Just POST those values to http://megaupload.com/?c=login; all you should have to do is set username and password to the correct values!
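For instance, a hedged sketch using requests.Session, which re-sends the login cookie on every subsequent request (the download URL and output filename are illustrative):

import requests

session = requests.Session()
session.post('http://megaupload.com/?c=login',
             data={'login': '1', 'redir': '1',
                   'username': 'your_username', 'password': 'your_password'})

# the session keeps the login cookie, so later downloads stay authenticated
response = session.get('http://megaupload.com/?d=FILEID', stream=True)
with open('downloaded_file', 'wb') as out:
    for chunk in response.iter_content(8192):
        out.write(chunk)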
For logging in using Python, follow these steps.
1. Find the list of parameters to be sent in the POST request, and the URL the request has to be made to, by viewing the source of the login form. A browser with an "Inspect Element" feature makes this easy (parameter name examples: userid, password); just check each input tag's name attribute.
2. Most sites set a cookie on login, and that cookie has to be sent along with subsequent requests. To handle this, download httplib2 (http://code.google.com/p/httplib2/) and read the wiki page linked there; it shows how to log in, with examples.
3. Now you can make subsequent requests for files; the cookies etc. will be handled automatically by httplib2.
I do a lot of web stuff with Python; I prefer using PycURL, which you can get here.
It is very simple to post data and log in with curl; I've used it across many languages such as PHP, Python, and C++. Hope this helps.
You can use urllib; this is a good example.

Accessing Facebook Profile URLs

The goal here: given a user's Facebook profile URL, access and open the profile page. Some simple Python code:
from urllib2 import urlopen
url = "http://www.facebook.com/username"
page = urlopen(url)
The problem is that for some usernames this causes an HTTP 404 error. I noticed this error happening only when the path includes a name rather than the "profile.php?id=XXX" format.
Notice that we only have the url here and not the user id.
UPDATE:
This turned out to happen also for some of the "profile.php?id=XXX" and other username formats.
This is a privacy feature of Facebook. Users have the ability to hide their profile page so that only logged in users can view their page. Accessing the page with /profile.php?id=XXX or with /username makes no difference. You must be logged-in in order to view the HTML page.
In your context, you'd have to first log in to a valid Facebook account before requesting the page and you should no longer receive the 404's.
One way to check this is via the Graph API: graph.facebook.com/USERNAME will return a link property in the resulting JSON if the user has a public page, and the property is omitted for private pages.
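A quick sketch of that check, in the same urllib2 style as the question (the function name is just for illustration):

import json
from urllib2 import urlopen

def has_public_page(username):
    # public profiles include a "link" property; private ones omit it
    data = json.load(urlopen('http://graph.facebook.com/' + username))
    return 'link' in data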
Not every Facebook account is accessible as FIRST.LAST, so you won't be able to reliably do this.
There is currently no guarantee that an account is accessible with a vanity name.
Works perfectly fine as long as the username exists.
Are you trying to open the page in a Web Browser or access the HTML source generated by the page?
If the latter, have you thought of using the Facebook Graph API to achieve whatever it is that you are doing? This will be much faster and the API is all documented. Plus the page's HTML source could change at any point in time, whereas the Graph API will not.
Edit
You could use the Graph API without even having to create an application: get the user ID by going to http://graph.facebook.com/username and parsing the JSON response. You can then access the profile HTML using http://www.facebook.com/profile.php?id=userId.
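For instance, a minimal sketch of that lookup, again in the question's urllib2 style:

import json
from urllib2 import urlopen

data = json.load(urlopen('http://graph.facebook.com/username'))
profile_url = 'http://www.facebook.com/profile.php?id=%s' % data['id']
page = urlopen(profile_url)  # fetch the profile HTML by numeric id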
