flask urlretrieve transaction isolation - python

I'm using Flask to process requests that contain a URL pointing to a document. When a request arrives, the document the URL points to is saved to a file. The file is opened and processed, and a JSON string that depends on the data in the document is generated. The JSON string is sent in the response.
My question is about requests that arrive within a very short time of each other. When User1 sends url_1 in his request, the document at url_1 is saved. User2 then sends a request with url_2 before the document from User1 is opened. Will the JSON string sent to User1 be based on the document at url_2? Is this likely to happen?
The following picture illustrates the scenario:
Here is what the flask app looks like:
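import urllib.request
from flask import Flask, request
app = Flask(__name__)
@app.route("/process_document", methods=['GET'])
def process_document():
    # Every request writes to the same path, so concurrent requests can overwrite each other
    download_location = "document.txt"
    urllib.request.urlretrieve(request.args.get('document_location'), download_location)
    json = some_module.construct_json(download_location)
    return json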

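This can happen whenever the server handles requests concurrently. Flask's development server has run threaded by default since Flask 1.0, and production servers usually run several threads or worker processes. If you must use the local file system, it is always advisable to isolate each request's files, e.g. with a temporary directory. You can use tempfile.TemporaryDirectory for that.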
import os
import urllib.request
from tempfile import TemporaryDirectory
# ...
@app.route("/process_document", methods=['GET'])
def process_document():
    # Each request gets its own directory, removed again when the block exits
    with TemporaryDirectory() as path:
        download_location = os.path.join(path, "document.txt")
        urllib.request.urlretrieve(
            request.args.get('document_location'),
            download_location
        )
        json = some_module.construct_json(download_location)
    return json
Using a temporary directory or file helps to avoid concurrency issues like the one you describe. It also guards against cases where, say, your function throws an exception and leaves the file behind (though it may not guard against serious crashes). You would then not accidentally pick up a file from a previous run.
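To see the isolation at work without a web server, here is a minimal sketch in which threads stand in for concurrent requests. The names handle_request and run_concurrently are made up for illustration, and the barrier is only there to force the worst case, where every "request" has written its file before any of them reads it back.

```python
import os
import threading
from tempfile import TemporaryDirectory

def handle_request(payload, results, index, barrier):
    """Stand-in for a request handler: write to a private file, then read it back."""
    with TemporaryDirectory() as path:
        location = os.path.join(path, "document.txt")
        with open(location, "w") as f:
            f.write(payload)
        # Wait until every thread has written its file before anyone reads
        barrier.wait()
        with open(location) as f:
            results[index] = f.read()

def run_concurrently(payloads):
    """Run one handler thread per payload and collect what each one read."""
    results = [None] * len(payloads)
    barrier = threading.Barrier(len(payloads))
    threads = [
        threading.Thread(target=handle_request, args=(p, results, i, barrier))
        for i, p in enumerate(payloads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each thread reads back exactly what it wrote, because every TemporaryDirectory has a unique path. With a single shared "document.txt", all threads would read whatever the last writer left behind.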

Related

Django + Gunicorn + Nginx + Python -> Link to download file from webserver

On my web page, served by a Debian web server hosted on Amazon Lightsail behind nginx and gunicorn, a user can send a request that starts a Django view function. This function adds some work to a background process and checks every 5 seconds whether the background process has created a file. If the file exists, the view sends a response and the user can download the file. Sometimes this process takes a long time and the user gets a 502 Bad Gateway message. If the process takes too long, I would like to send the user an email with a link where he can download the file from the web server. I know how to send the email after the process is finished, but I don't know how to serve the file to the user via a download link.
This is the end of my view function:
print('######### Serve Downloadable File #########')
while not os.path.exists(f'/srv/data/ship_notice/{user_token}'):
    print('wait on file is servable')
    time.sleep(5)
# Open the file for reading content
path = open(filepath, 'r')
# Set the mime type
mime_type, _ = mimetypes.guess_type(filepath)
# Set the return value of the HttpResponse
response = HttpResponse(path, content_type=mime_type)
# Set the HTTP header for sending to browser
response['Content-Disposition'] = f"attachment; filename={filename}"
# Return the response value
return response
Another model function which sends the mail to the user after process is finished:
def send_mail_precipitation(filepath, user_token, email):
    from django.core.mail import EmailMessage
    import time
    import os
    while not os.path.exists(f'/srv/data/ship_notice/{user_token}'):
        print('wait 30secs')
        time.sleep(30)
    msg = EmailMessage(
        subject = 'EnviAi data',
        body = 'The process is finished, you can download the file here.... ',
        to = [email]
    )
    msg.send()
The file is too big to send with msg.attach_file(filepath).
What options do I have for sending the user a link to download these files? Do I need to set up an FTP server or folder, or are there other options? And what do I have to do if I want the link to be valid for only 72 hours? Thanks a lot!
Update
One way would be to copy the file to the static folder, which is available to the public. Should I avoid this approach for any reason?
Not a straight answer but a possible way to go.
Long-running tasks like this are usually implemented with an additional tool such as Celery. It is bad practice to let a view or API endpoint run for as long as it takes and keep the requesting process waiting until it completes. Good practice is to respond as fast as you can.
In your case it would be:
create a celery task to build your file (creating a task is fast)
return task id in response
request task status from frontend with given task id
when task is done file URL should be returned
It is also possible to add on_success code, which Celery runs automatically when the task is done. You can call your email_user_when_file_is_ready function in response to this event.
To make files downloadable, you can add a location to the nginx config, the same way you did for the static and media folders. Put your files in the folder mapped to that location and that's it. Give the user the URL of the file.
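For the 72-hour validity asked about in the question, one possible approach is to put a signed, timestamped token in the emailed link and have a view check it before serving the file. The following is only a sketch using the standard library. SECRET_KEY, make_token and check_token are hypothetical names, and the now parameter exists only to make the expiry testable. Django also ships django.core.signing.TimestampSigner, whose unsign() accepts a max_age, which may be the more natural choice inside a Django project.

```python
import hashlib
import hmac
import time

SECRET_KEY = b"change-me"        # hypothetical secret; keep the real one out of source control
MAX_AGE_SECONDS = 72 * 3600      # links stay valid for 72 hours

def _sign(message):
    """HMAC-SHA256 signature of the message as a hex string."""
    return hmac.new(SECRET_KEY, message.encode(), hashlib.sha256).hexdigest()

def make_token(user_token, now=None):
    """Build 'user_token:issued_at:signature' for use in a download URL."""
    issued = int(time.time() if now is None else now)
    return f"{user_token}:{issued}:{_sign(f'{user_token}:{issued}')}"

def check_token(token, now=None):
    """Return the user_token if the signature is valid and not expired, else None."""
    try:
        user_token, issued, signature = token.rsplit(":", 2)
        issued_at = int(issued)
    except ValueError:
        return None  # malformed token
    expected = _sign(f"{user_token}:{issued}")
    if not hmac.compare_digest(signature.encode(), expected.encode()):
        return None  # tampered token
    current = time.time() if now is None else now
    if current - issued_at > MAX_AGE_SECONDS:
        return None  # link has expired
    return user_token
```

The email would then contain a URL that carries make_token(user_token), and the download view would serve the file only when check_token returns a value. Because the expiry is part of the signed data, nothing needs to be stored on the server, although old files still have to be cleaned up separately.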

How to serialize an OAuth1Session?

I currently have a monolithic Python script which performs an OAuth authentication, returning an OAuth1Session, and then proceeds to perform some business logic using that OAuth1Session to gain authorization to a third-party service.
I need to split this up into two separate scripts, one which performs the OAuth authentication and will run on one machine, and the other which will run on a remote machine to perform the business logic authorized against the third-party service.
How can I serialize the OAuth1Session object so that the authenticated tokens can be handed off seamlessly from the authentication script on machine A to the processing script on machine B?
I tried the obvious:
print(json.dumps(session))
But I got this error:
TypeError: Object of type OAuth1Session is not JSON serializable
Is there a canonical solution for this simple requirement?
UPDATE
Here's the entire source code. Please note this is not my code, I downloaded it from the author and now I'm trying to modify it to work a bit differently.
"""This Python script provides examples on using the E*TRADE API endpoints"""
from __future__ import print_function
import webbrowser
import json
import logging
import configparser
import sys
import requests
from rauth import OAuth1Service
def oauth():
    """Allows user authorization for the sample application with OAuth 1"""
    etrade = OAuth1Service(
        name="etrade",
        consumer_key=config["DEFAULT"]["CONSUMER_KEY"],
        consumer_secret=config["DEFAULT"]["CONSUMER_SECRET"],
        request_token_url="https://api.etrade.com/oauth/request_token",
        access_token_url="https://api.etrade.com/oauth/access_token",
        authorize_url="https://us.etrade.com/e/t/etws/authorize?key={}&token={}",
        base_url="https://api.etrade.com")
    base_url = config["DEFAULT"]["PROD_BASE_URL"]
    # Step 1: Get OAuth 1 request token and secret
    request_token, request_token_secret = etrade.get_request_token(
        params={"oauth_callback": "oob", "format": "json"})
    # Step 2: Go through the authentication flow. Login to E*TRADE.
    # After you login, the page will provide a text code to enter.
    authorize_url = etrade.authorize_url.format(etrade.consumer_key, request_token)
    webbrowser.open(authorize_url)
    text_code = input("Please accept agreement and enter text code from browser: ")
    # Step 3: Exchange the authorized request token for an authenticated OAuth 1 session
    session = etrade.get_auth_session(request_token,
                                      request_token_secret,
                                      params={"oauth_verifier": text_code})
    return(session, base_url)
# loading configuration file
config = configparser.ConfigParser()
config.read(sys.argv[1])
(session, base_url) = oauth()
print(base_url)
print(json.dumps(session))
#original code
#market = Market(session, base_url)
#quotes = market.quotes(sys.argv[2])
Please note the last two commented-out lines. That is the original code: immediately after the OAuth step is performed, the code invokes some business functionality. I want to break this up into two separate scripts running as isolated processes: Script 1 performs the OAuth step and persists the session, Script 2 reads the session from a file and performs the business functionality.
Unfortunately it fails at the last line, print(json.dumps(session)).
"XY Problem" Alert
My goal is to split up the script into two so that the business logic can run in a separate machine from the authentication code. I believe that the way to do this is to serialize the session object and then parse it back in the second script. Printing out the session using json.dumps() is an intermediate step, 'Y', in my journey to solving problem 'X'. If you can think of a better way to achieve the goal, that could be a valid answer.
From the comments available in the source code here: https://github.com/litl/rauth/blob/a6d887d7737cf21ec896a8104f25c2754c694011/rauth/session.py
You only need to serialize some attributes of your object to reinstantiate it:
Line 103
def __init__(self,
             consumer_key,
             consumer_secret,
             access_token=None,
             access_token_secret=None,
             signature=None,
             service=None):
    ...
Thus I would suggest serializing the following dict on the first machine:
info_to_serialize = {
    'consumer_key': session.consumer_key,
    'consumer_secret': session.consumer_secret,
    'access_token': session.access_token,
    'access_token_secret': session.access_token_secret
}
serialized_data = json.dumps(info_to_serialize)
And on the second machine reinstantiate your session like that:
from rauth.session import OAuth1Session
info_deserialized = json.loads(serialized_data)
session = OAuth1Session(**info_deserialized)
Hope this helped
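Since the goal is to hand the tokens from one script to another, the serialized dict still has to travel through a file (or some other channel). A minimal sketch of that step follows. The helper names and the session.json file name are made up for illustration, and because the file contains secrets it is created with owner-only permissions where the platform supports that.

```python
import json
import os
import tempfile

def save_tokens(info, path):
    """Write the token dict as JSON; create the file with owner-only permissions."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(info, f)

def load_tokens(path):
    """Read the token dict back; on machine B pass it to OAuth1Session(**info)."""
    with open(path) as f:
        return json.load(f)

def round_trip(info):
    """Demonstrate save and load using a throwaway directory."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "session.json")
        save_tokens(info, path)
        return load_tokens(path)
```

Script 1 would call save_tokens(info_to_serialize, path) and script 2 would rebuild the session from load_tokens(path). How the file gets from machine A to machine B (scp, a shared volume, a secrets store) is a separate decision, and the tokens should be treated like passwords in transit.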

Postman, Python and passing images and metadata to a web service

This is a two-part question. I have seen individual pieces discussed, but can't seem to get the recommended suggestions to work together. I want to create a web service that stores images and their metadata passed from a caller, and run a test call from Postman to make sure it is working. So to pass an image (Drew16.jpg) to the web service via Postman, it appears I need something like this:
For the web service, I have some python/flask code to read the request (one of many variations I have tried):
from flask import Flask, jsonify, request, render_template
from flask_restful import Resource, Api, reqparse
...
def post(self, name):
    request_data = request.get_json()
    userId = request_data['UserId']
    type = request_data['ImageType']
    image = request.files['Image']
Had no problem with the data portion and straight JSON but adding the image has been a bugger. Where am I going wrong on my Postman config? What is the actual set of Python commands for reading the metadata and the file from the post? TIA
Pardon the almost blog post. I am posting this because while you can find partial answers in various places, I haven't run across a complete post anywhere, which would have saved me a ton of time. The problem is you need both sides to the story in order to verify either.
So I want to send a request using Postman to a Python/Flask web service. It has to have an image along with some metadata.
Here are the settings for Postman (URL, Headers):
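And Body (judging by the code below: form-data with a text field named some_text and a file field named imagefile):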
Now on to the web service. Here is a bare bones service which will take the request, print the metadata and save the file:
from flask import Flask, request
app = Flask(__name__)
# POST - just get the image and metadata
@app.route('/RequestImageWithMetadata', methods=['POST'])
def post():
    # Text fields of a multipart form arrive in request.form, files in request.files
    request_data = request.form['some_text']
    print(request_data)
    imagefile = request.files.get('imagefile', '')
    imagefile.save('D:/temp/test_image.jpg')
    return "OK", 200
app.run(port=5000)
Enjoy!
Make sure request.files['Image'] contains the image you are sending, and follow http://flask.pocoo.org/docs/1.0/patterns/fileuploads/ to save the file to your file system. Something like
file = request.files['Image']
file.save('./test_image.jpg')
might do what you want, while you will have to work out the details of how the file should be named and where it should be placed.

Python micro web service always hang

I built a micro web service, but I find it hangs a lot. By hang I mean that all requests just time out; when it hangs, I can see the process still running fine on the server, using only about 15 MB of memory as usual. I think it's an interesting problem to post. The code is super simple, so please tell me what I am doing wrong.
app = Bottle()
# static routing
@app.route('/')
def server_static_home():
    return static_file('index.html', root='client/')
@app.route('/<filename>')
def server_static(filename):
    return static_file(filename, root='client/')
@app.get('/api/data')
def getData():
    data = {}
    arrayToReturn = []
    with open("data.txt", "r") as dataFile:
        entryArray = json.load(dataFile)
    for entry in entryArray:
        if not entry['deleted']:
            arrayToReturn.append(entry)
    data["array"] = arrayToReturn
    return data
@app.put('/api/data')
def changeEntry():
    jsonObj = request.json
    with open("data.txt", "r+") as dataFile:
        entryArray = json.load(dataFile)
        for entry in entryArray:
            if entry['id'] == jsonObj['id']:
                entry['val'] = jsonObj['val']
        # Rewrite the file in place, then cut off any leftover bytes
        dataFile.seek(0)
        json.dump(entryArray, dataFile, indent=4)
        dataFile.truncate()
    return {"success":True}
run_simple('0.0.0.0', 80, app, use_reloader=True)
Basically, mydomain.com is routed to my index.html, which loads the necessary JS and CSS files; that's what the static routing part does. Once the page is loaded, an AJAX GET request is fired to /api/data to load the data, and when I modify data, another AJAX PUT request is fired to /api/data to save the change.
How to reproduce
It's very easy to reproduce the hang: I just need to visit mydomain.com and refresh the page rapidly 10-30 times, and then it stops responding. But I was never able to reproduce this locally, however fast I refresh, and data.txt is the same on my local machine.
Update
Turns out it's not problem with read/write to file but a problem with trying to write to broken pipe. The client that sent request close the connection before receiving all the data. I'm looking into solution now...
It looks like you open, read and rewrite the same data.txt file on every PUT request. Eventually you are going to run into concurrency issues with this architecture, as you will have multiple requests trying to open and write to the same file.
The best solution is to persist the data to a database (something like MySQL, Postgres or MongoDB) instead of writing to a flat file on disk.
However, if you must write to a flat file, then you should write to a different file per request, where the name of the file could be the jsonObj['id']. This way you avoid the problem of multiple requests trying to read and write the same file at the same time.
Reading and writing your data.txt file will fall victim to race conditions, as Calvin mentions. Databases are pretty easy in Python, especially with libraries like SQLAlchemy. But if you insist, you can also keep the data in a global in-memory structure guarded by a lock, assuming your web server is not running as multiple processes. Something like
import threading
entryArray = []  # in-memory list of entries, shaped like the contents of data.txt
mylock = threading.Lock()
@app.put('/api/data')
def changeEntry():
    jsonObj = request.json
    with mylock:  # the Lock object itself is the context manager
        for entry in entryArray:
            if entry['id'] == jsonObj['id']:
                entry['val'] = jsonObj['val']
    return {"success": True}
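If the data has to stay in data.txt, the same lock idea can wrap the whole read-modify-write cycle. The sketch below is one way to do that with the standard library only. update_entry and _data_lock are hypothetical names, and the new contents go to a temporary file that is then moved over the original with os.replace, so a reader never sees a half-written file. Like the in-memory version, this only protects threads inside a single process.

```python
import json
import os
import tempfile
import threading

_data_lock = threading.Lock()  # guards every read-modify-write of the data file

def update_entry(path, entry_id, new_val):
    """Set 'val' on the entry with the given id, rewriting the file atomically."""
    with _data_lock:
        with open(path) as f:
            entries = json.load(f)
        for entry in entries:
            if entry["id"] == entry_id:
                entry["val"] = new_val
        # Write to a temp file in the same directory, then swap it into place
        directory = os.path.dirname(os.path.abspath(path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as tmp:
            json.dump(entries, tmp, indent=4)
        os.replace(tmp_path, path)
```

The PUT handler would then reduce to a call such as update_entry("data.txt", jsonObj['id'], jsonObj['val']). With several worker processes a threading.Lock is not enough, which is one more argument for the database suggested above.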

Serving HTTP client with versioned resource only if it has changed - using Flask

I am running a web server based on Flask that serves a versioned resource (e.g. the installation file of some versioned program). I want to serve the resource to my HTTP client only if the client does not already have the current version. If there is a new version, I want the client to download the resource and install it.
My Flask server looks like this:
import json
import redis
import math
import requests
from flask import Flask, render_template, request
app = Flask(__name__)
@app.route('/version', methods=['GET', 'POST'])
def getversion():
    r_server = redis.Redis("127.0.0.1")
    if request.method == 'POST':
        jsonobj = request.data
        data = json.loads(jsonobj)
        currentversion = r_server.hget('version')
        if data == currentversion:
            pass  # code to return a 'ok'
        else:
            pass  # code to return 'not ok' also should send the updated file to the client
    else:
        return r_server.hget('version')
if __name__ == '__main__':
    app.run(
        debug=True,
        host="127.0.0.1",
        port=80
    )
my client is very basic:
import sys
import json
import requests
url = "http://127.0.0.1/version"
jsonobj = json.dumps(str(sys.argv[1]))
print(jsonobj)
r = requests.post(url, data=jsonobj)
I will likely have to recode the entire client, this is not a problem but I really have no idea where to start....
Requirements Review
There is a web app serving a versioned resource, e.g. a file containing an application.
There is a client that fetches the resource only if the version on the server differs from the version the client already has locally.
The client knows the version string of the resource it holds.
The client can learn the new version string when a new version is available.
HTTP-like design of your solution
If you want to allow downloading the application only when the client does not already have it, the following design could be used:
Use the ETag header. It usually contains a string that uniquely describes the state of the resource you get from that URL. In your case it could be the current version number of your application.
In your request, send the If-None-Match header with the version number of the application present on the client. This results in HTTP status code 304 Not Modified when client and server share the same version of the resource. If the versions differ, the server simply returns the content of the resource and the client uses it. The response should also state the current version of the resource in its ETag header, and the client should take note of it or find the new version name from another source (such as the downloaded file).
This design follows HTTP principles.
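Stripped of any framework, the server-side decision described above fits in a few lines. The sketch below is illustrative only. parse_if_none_match and conditional_get are made-up names, and the parsing is deliberately simple: it accepts a comma-separated list of tags, quoted or bare, and ignores a weak W/ prefix. In HTTP proper, ETag values are quoted strings, which is why the response header is emitted with quotes.

```python
def parse_if_none_match(value):
    """Split an If-None-Match header value into a set of bare tag strings."""
    if not value:
        return set()
    tags = set()
    for tag in value.split(","):
        tag = tag.strip()
        if tag.startswith("W/"):
            tag = tag[2:]  # weak validators compare equal for this purpose
        tags.add(tag.strip('"'))
    return tags

def conditional_get(if_none_match, current_version, body):
    """Return (status, headers, body) for a GET with an optional If-None-Match."""
    tags = parse_if_none_match(if_none_match)
    headers = {"ETag": '"%s"' % current_version}
    if "*" in tags or current_version in tags:
        return 304, headers, b""  # client is up to date, send no content
    return 200, headers, body     # client is outdated or has nothing yet
```

A client holding version 1.0 gets an empty 304 while the server is still at 1.0, and the full body plus the new ETag once the server moves on.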
Flask serving resource with declaring version in etag
This focuses on showing the principle; you will need to elaborate on providing the real content of the resource.
from flask import Flask, Response, request
import werkzeug.exceptions
app = Flask(__name__)
class NotModified(werkzeug.exceptions.HTTPException):
    code = 304
    def get_response(self, environment):
        return Response(status=304)
@app.route('/download/app')
def downloadapp():
    currver = "1.0"
    # Client already has the current version: answer 304 without a body
    if request.if_none_match and currver in request.if_none_match:
        raise NotModified
    def generate():
        yield "app_file_part 1"
        yield "app_file_part 2"
        yield "app_file_part 3"
    return Response(generate(), headers={"etag": currver})
if __name__ == '__main__':
    app.run(debug=True)
Client getting resource only, if it is new
import requests
ver = "1.0"
url = "http://localhost:5000/download/app"
req = requests.get(url, headers={"If-None-Match": ver})
if req.status_code == 200:
    print("new content of resource", req.content)
    new_ver = req.headers["etag"]
else:
    print("resource did not change since last time")
Alternative solution of web part using web server (e.g. NGINX)
Assuming the resource is a static file that is updated only occasionally, you should be able to configure your web server, e.g. NGINX, to serve that resource and to set the ETag header explicitly to the version string in its configuration.
Note that, as it was not requested, this alternative solution is not elaborated here (and was not tested).
The client implementation would not need to change (this is where following HTTP concepts in the design pays off).
There are multiple ways of achieving this but as this is a Flask app, here's one using HTTP.
If the version is OK, just return a relevant status code, like a 200 OK. You can add a JSON response in the body if that's necessary. If you return a string with flask, the status code will be 200 OK and you can inspect that in your client.
If the version differs, return the URL where the file is located. The client will then have to download the file. That's pretty simple using requests. Here's a typical example of downloading a file by streaming with requests:
def get(url, chunk_size=1024):
    """ Download a file in chunks of n bytes """
    fn = url.split("/")[-1]  # if your URL is complicated, use urllib.parse
    stream = requests.get(url, stream=True)
    with open(fn, "wb") as local:
        for chunk in stream.iter_content(chunk_size=chunk_size):
            if chunk:
                local.write(chunk)
    return fn
This is very simplified. If your file is not static and cannot live on the server (like software update patches probably shouldn't) then you'll have to figure out a way to get the file from a database or generate it on the fly.
