I use a webapp that can generate a PDF report of some data stored in the app. Getting to that report, however, requires several clicks and some monkeying around in the app.
I support a group of users of this app (we use the app, we don't create the app) and I'd like them to be able to generate and view this report with as few clicks as possible. Thankfully, this web app provides a lot of data via a RESTful API. So I did some scripting.
I have a Python script that makes an HTTP GET request, processes the JSON results, and uses that resultant data to dynamically build a URL. Here's a simplified version of my Python code:
#!/usr/bin/env python
import requests

app_id = '12345'
secret = '67890'
api_url = 'https://api.webapp.example/some_endpoint'

resp = requests.get(api_url, auth=(app_id, secret))
json_data = resp.json()

# Simplification of the data processing I'm doing
my_data = json_data['attr1']['attr2'] + my_data_processing

# Result of the script is a link to a dynamically generated PDF
pdf_url = 'https://pdf.webapp.example/items/' + my_data
The above is a simplification of the code I actually have, but it shows the relevant points. In my actual script, I continue on by doing another GET with the dynamically built URL. The webapp generates a PDF based on the my_data portion of the URL, and I write that PDF to file. This works very well today.
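A sketch of that follow-up step (assuming, for illustration, that the PDF endpoint takes the same credentials, which is specific to my setup):

# Fetch the dynamically built URL and write the generated PDF to disk
pdf_resp = requests.get(pdf_url, auth=(app_id, secret))  # auth here is an assumption
with open('report.pdf', 'wb') as f:
    f.write(pdf_resp.content)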
Currently, this is a Python script that runs on my local machine on demand. However, I'd like to host it somewhere on the web, so that when a user hits a URL in their browser the script runs and generates the pdf_url. That way I don't have to install the script on each user's machine, and the PDF can be generated and viewed on a mobile device.
The thought is that the user can open http://example.com/report-shortcut, the Python script would run server-side, dynamically build the URL, and redirect the user to that URL, which would then show the PDF in the browser (assuming the user is using a browser that shows PDFs, like Chrome or Safari). Alternatively, if a redirect is problematic, going to http://example.com/report-shortcut could just show an HTML page with a link to the URL generated by the Python script.
I'm looking for a solution on how to host this Python script and have it run when a user accesses a webpage. I've looked into AWS Lambda and Django, but both seem like overkill for such a simple script (~20 lines of code, plus comments and whitespace). I've also looked at Python CGI scripting, which looks promising, but I have no experience setting up something like that.
Looking for suggestions on how best to host and run this code when a user goes to the example URL.
PS: I thought about just re-implementing this in JavaScript, but I'd rather the API key not be publicly accessible.
I suggest building the script in AWS Lambda and using API Gateway to invoke it.
You could create the PDF, store it in S3, and generate a pre-signed URL, then return a 302 response to redirect the user to that pre-signed URL. This will display the PDF in their browser.
It's very quick to set up, and with Boto3, getting the PDF into S3 and generating the URL is simple.
It will be much simpler than some of the other options you mention.
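A minimal sketch of the handler (assuming the Lambda proxy integration in API Gateway; the bucket name, key, and fetch_pdf_bytes() helper are hypothetical stand-ins for your existing logic):

import boto3

s3 = boto3.client('s3')
BUCKET = 'my-report-bucket'  # hypothetical bucket name

def lambda_handler(event, context):
    # Your existing requests logic goes here and returns the PDF bytes
    pdf_bytes = fetch_pdf_bytes()
    key = 'reports/latest.pdf'
    s3.put_object(Bucket=BUCKET, Key=key, Body=pdf_bytes,
                  ContentType='application/pdf')
    # A pre-signed URL lets the browser fetch the object without AWS credentials
    url = s3.generate_presigned_url('get_object',
                                    Params={'Bucket': BUCKET, 'Key': key},
                                    ExpiresIn=300)  # valid for 5 minutes
    # API Gateway turns this into an HTTP 302 redirect to the PDF
    return {'statusCode': 302, 'headers': {'Location': url}}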
See the API Gateway and Boto3 documentation.
Is it possible to upload and manipulate a photo in the browser with GitHub Pages? The photo doesn't need to be stored beyond that session.
PS: I'm new to this area, and I am using Python to manipulate the photo.
GitHub Pages allows users to create static HTML sites. This means you have no control over the server that hosts the HTML files - it is essentially a file server.
Even if you did have full control over the server (e.g. if you hosted your own website), it would not be possible to have the client run Python code in the browser, since the browser only interprets JavaScript.
Therefore the easiest solution is to rewrite your code in JavaScript.
Failing this, you could offer a download link to your Python script, and have users trust you enough to run it on their computer.
I'm doing a personal project where I am trying to scrape HTML tables from a financial data website using Python. I am able to successfully use the requests package to access public websites and extract any information, using BeautifulSoup4 afterwards for processing. The code I am using is shown below:
import requests

# access the website through the EZproxy URL
url = 'https://financial-data-url.ezproxy1.library.uniname.edu.com/path/to/financial/data'
headers = example_header  # placeholder for the real request headers I send
page = requests.get(url, headers=headers)
However, accessing the website normally requires logging in through my university's library database via an EZproxy server (as shown in the example URL). When I request the URL of the financial data webpage after getting access through the library database, it returns what seems to be the university library's EZproxy page - the page where I need to click "login" before being directed to the financial data webpage.
Is there some credential provision that I may be missing in the request function, or potentially a different way of passing the proxy server to the URL so that the request does not end up on the proxy server login page?
I found that the fastest and most effective workaround for this problem is to use the Selenium browser-automation package (https://selenium-python.readthedocs.io/).
Selenium makes it very easy to replicate a login, as well as navigation within the browser, just as a person would. IMO, its simplicity may far outweigh the benefits of calling the webpage directly, depending on the use case: it's not the right choice when speed and efficiency are the primary goals, but if those are not major constraints it works quite well. A sketch of the login flow is below.
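This is only a sketch; the login URL and element IDs are hypothetical and have to be taken from the actual EZproxy page:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

# Log in through the proxy just as a person would (URL and IDs are hypothetical)
driver.get('https://ezproxy1.library.uniname.edu.com/login')
driver.find_element(By.ID, 'username').send_keys('my_username')
driver.find_element(By.ID, 'password').send_keys('my_password')
driver.find_element(By.ID, 'login-button').click()

# Once authenticated, the driver carries the session cookies, so the data
# page loads directly; hand the HTML to BeautifulSoup4 as before
driver.get('https://financial-data-url.ezproxy1.library.uniname.edu.com/path/to/financial/data')
html = driver.page_source
driver.quit()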
I am trying to write code that will download all the data from a server holding .rar files of imaginary cadastral parcels for student projects. What I have so far is the query for the server, which takes the number of a specific parcel and is accessed as a URL to download the .rar file.
url = 'http://www.pg.geof.unizg.hr/geoserver/wfs?request=getfeature&version=1.0.0&service=wfs&&propertyname=broj,naziv_ko,kc_geom&outputformat=SHAPE-ZIP&typename=gf:katastarska_cestica&filter=<Filter+xmlns="http://www.opengis.net/ogc"><And><PropertyIsEqualTo><PropertyName>broj</PropertyName><Literal>1900/1</Literal></PropertyIsEqualTo><PropertyIsEqualTo><PropertyName>naziv_ko</PropertyName><Literal>Suma Striborova Stara (9997)</Literal></PropertyIsEqualTo></And></Filter>'
This is the "url" I want to open with the web browser module for a particle "1900/1" but this way I get an error:
This XML file does not appear to have any style information associated with it. The document tree is shown below.
When I input this URL manually in a browser, it downloads the file without a problem.
How can I make this Python script work?
I tried webbrowser.open_new(url), which does not work.
You're using the wrong tool. webbrowser is for controlling a native web browser. If you just want to download a file, use the requests module (or urllib.request if you can't install Requests).
import requests

r = requests.get('http://www.pg.geof.unizg.hr/geoserver/wfs', params={
    'request': 'getfeature',
    ...
    'filter': '<Filter xmlns=...>',
})
print(r.content)  # or write it to a file, or whatever
Note that requests will handle encoding the GET parameters for you; you don't need to worry about escaping the query string yourself.
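For completeness, here is the same call with every parameter lifted from the URL in your question; the filter XML is passed as a plain string and requests takes care of the percent-encoding:

import requests

params = {
    'request': 'getfeature',
    'version': '1.0.0',
    'service': 'wfs',
    'propertyname': 'broj,naziv_ko,kc_geom',
    'outputformat': 'SHAPE-ZIP',
    'typename': 'gf:katastarska_cestica',
    'filter': ('<Filter xmlns="http://www.opengis.net/ogc"><And>'
               '<PropertyIsEqualTo><PropertyName>broj</PropertyName>'
               '<Literal>1900/1</Literal></PropertyIsEqualTo>'
               '<PropertyIsEqualTo><PropertyName>naziv_ko</PropertyName>'
               '<Literal>Suma Striborova Stara (9997)</Literal>'
               '</PropertyIsEqualTo></And></Filter>'),
}

r = requests.get('http://www.pg.geof.unizg.hr/geoserver/wfs', params=params)
with open('parcel_1900_1.zip', 'wb') as f:  # output filename is up to you
    f.write(r.content)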
I followed the tutorial to stream a file generated on the fly in Flask. Now I want to display a message using the same data that was used to generate the file. It's a large dataset, and I cannot afford to download it twice: once to generate the file and once to print a result on the page.
Unlike in "In Flask how can I redirect to a template and show a message after returning send_file in a view?", I do not want a redirect or a refresh. Is it possible to send both a file and an HTML response in a single page load?
I tried using a generator but did not have any success.
I am using Heroku.
You're trying to return a "multipart HTTP response", with an HTML part and another part for the file. After some quick research, I'm not sure such a thing exists, and if it does, how well browsers support it.
A slightly different approach would be to respond with a "classic" HTML response which then fires a second request, an XHR call at document load for instance, to start the download of the file. A sketch of this pattern is below.
Server-side, you would have to store the content of the file. I'm not really familiar with Heroku, but I guess you could use a key-value store like Redis to do so, or even a dedicated service like Amazon S3.
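A minimal sketch of the two-request pattern, assuming Flask 2.x; the file content is kept in app.config for brevity, where a real app would use Redis or S3:

from flask import Flask, render_template_string, send_file
import io

app = Flask(__name__)

# The message page triggers the file download with a second request on load
PAGE = """
<p>{{ message }}</p>
<script>
  window.addEventListener('load', function () {
    window.location = '/download';  // second request fetches the file
  });
</script>
"""

@app.route('/report')
def report():
    # Generate the data once, then stash the file bytes where /download can
    # reach them (Redis, S3, ...); app.config stands in here for a demo
    app.config['REPORT_BYTES'] = b'col1,col2\n1,2\n'
    return render_template_string(PAGE, message='Your report is ready.')

@app.route('/download')
def download():
    buf = io.BytesIO(app.config['REPORT_BYTES'])
    return send_file(buf, as_attachment=True, download_name='report.csv')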
I want to download a zip file using Python.
With this type of URL,
http://server.com/file.zip
this is quite simple: use urllib2.urlopen and write the result to a local file, as in the snippet below.
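For instance (Python 2, since the question uses urllib2):

import urllib2

# The simple case: fetch the archive and write it to disk
data = urllib2.urlopen('http://server.com/file.zip').read()
with open('file.zip', 'wb') as f:
    f.write(data)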
But in my case I have this type of URL:
http://server.com/customer/somedata/download?id=121&m=zip,
where the download is launched only after a form validation.
It may be worth mentioning that in my case I want to deploy this on Heroku, so I can't use spynner, which is built with C++. The download is launched after a scraping step that uses Scrapy.
From a browser the download works well: I get a good zip file with its name. Using Python I just get HTML and header data.
Is there any way to get a file from this type of URL in Python?
This site serves JavaScript, which then invokes the download.
You have no choice but to either: a) evaluate the JavaScript in a simulated browser environment, or b) manually parse what the JS does and re-implement it in Python, e.g. extracting the URL and download key from the script, possibly invoking an AJAX request, and finally downloading the file.
I generally recommend Mechanize for webpage-related automation, but it cannot deal with JavaScript either, so I guess you can stick with Scrapy if you want to go for plan b).
When you do the download in the browser, open the network tab of the developer console and record the HTTP method (probably POST), the POST parameters, the cookies, and everything else that is part of the validation; then use a library to replicate that, as in the sketch below.
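A minimal sketch of that replication with requests; the form fields and headers are hypothetical placeholders for whatever the network tab actually shows:

import requests

session = requests.Session()

# Replay what the browser sends: method, query string, form fields,
# cookies, and headers all come from the devtools recording
resp = session.post(
    'http://server.com/customer/somedata/download',
    params={'id': '121', 'm': 'zip'},
    data={'form_field': 'value'},  # hypothetical validated form fields
    headers={'Referer': 'http://server.com/customer/somedata'},  # if required
)

with open('somedata.zip', 'wb') as f:
    f.write(resp.content)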