I'm not sure if I worded the question correctly, but essentially I want to return the response without returning from the function.
My context here is: the user asks for a large Excel file to be generated, so a link is returned to them right away, and when the Excel file is done an email is also sent.
Pseudo example:
from flask import Flask
from flask import send_file
from someXlsLib import createXls
from someIoLib import deleteFile
from someMailLib import sendMail
import uuid

app = Flask(__name__)
host = 'https://myhost.com/myApi'

@app.route('/getXls')
def getXls():
    fileName = uuid.uuid4().hex + '.xls'
    downloadLink = host + '/tempfiles/' + fileName
    # Returning the downloadLink for the user to access when the xls file is ready
    return downloadLink
    # But then this code is unreachable
    generateXls(fileName, downloadLink)

def generateXls(fileName, downloadLink):
    createXls('/tempfiles/' + fileName)
    sendMail(downloadLink)

@app.route('/tempfiles/<fileName>')
def getTempFile(fileName):
    # Same problem here, I need the user to finish the download before deleting the file
    return send_file('/tempfiles/' + fileName, attachment_filename=fileName)
    deleteFile('/tempfiles/' + fileName)
Other commenters are right that you need something to manage asynchronous actions. One of the most popular options, and one that comes with lots of tools for delayed, scheduled, and asynchronous work, is Celery. You can do what you want with Celery using something like the following:
from celery import Celery
...
# This is for Redis on the local host. You can also use RabbitMQ or AWS SQS.
celery = Celery(app.name, broker='redis://localhost:6379/0')
celery.conf.update(app.config)
...
# Create your Celery task
@celery.task
def generateXls(file_name, download_link):
    createXls('/tempfiles/' + file_name)
    sendMail(download_link)

@app.route('/getXls')
def getXls():
    file_name = uuid.uuid4().hex + '.xls'
    download_link = host + '/tempfiles/' + file_name
    # Asynchronously call your Celery task.
    generateXls.delay(file_name, download_link)
    return download_link
This will return the download link immediately while generateXls continues in a separate Celery worker process.
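The second half of the original question (deleting the temp file only after the user has finished downloading it) can be handled the same way. A minimal sketch, assuming you are happy to give the client a fixed window to download; the one-hour countdown is an arbitrary choice, not something from the original answer:

@celery.task
def cleanupXls(file_name):
    deleteFile('/tempfiles/' + file_name)

@app.route('/tempfiles/<fileName>')
def getTempFile(fileName):
    # Schedule the delete to run an hour from now, well after the download.
    cleanupXls.apply_async(args=[fileName], countdown=3600)
    return send_file('/tempfiles/' + fileName, attachment_filename=fileName)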
I have a Python Rumps application that monitors a folder for new files using the rumps.Timer(...) feature. When it sees new files, it transfers them offsite (to AWS S3) and runs a GET request. Sometimes that transfer and GET request can take over 1 second, and sometimes up to about 5 seconds. During this time, the application is frozen and can't do anything else.
Here is the current code:
class MyApp(rumps.App):
    def __init__(self):
        super(MyApp, self).__init__("App", quit_button="Stop")
        self.process_timer = rumps.Timer(self.my_tick, 1)
        self.process_timer.start()

    def my_tick(self, sender):
        named_set = set()
        for file in os.listdir(self.process_folder):
            fullpath = os.path.join(self.process_folder, file)
            if os.path.isfile(fullpath) and fullpath.endswith(('.jpg', '.JPG')):
                named_set.add(file)

        if len(named_set) == 0:
            self.files_in_folder = set()

        new_files = sorted(named_set - self.files_in_folder)
        if len(new_files) > 0:
            for new_file in new_files:
                # upload file
                self.s3_client.upload_file(
                    new_file,
                    '##bucket##',
                    '##key##'
                )
                # GET request
                return requests.get(
                    '##url##',
                    params={'file': new_file}
                )

        self.files_in_folder = named_set


if __name__ == "__main__":
    MyApp().run()
Is there a way to have this transfer and GET request run as a background process?
I've tried using subprocess with the transfer code in a separate script
subprocess.Popen(['python3', 'transferscript.py', newfile])
and it doesn't appear to do anything. It will work if I run that line outside of rumps, but once it's in rumps, it will not run.
Edit: code provided
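A minimal sketch of one way to keep the timer callback responsive: move the slow upload and GET request into a helper method and hand it to a background thread. The transfer_and_notify name is illustrative, not from the original code, and __init__ plus the imports are assumed unchanged from above:

import threading

class MyApp(rumps.App):
    # __init__ unchanged from the original code

    def my_tick(self, sender):
        named_set = set()
        for file in os.listdir(self.process_folder):
            fullpath = os.path.join(self.process_folder, file)
            if os.path.isfile(fullpath) and fullpath.endswith(('.jpg', '.JPG')):
                named_set.add(file)
        if len(named_set) == 0:
            self.files_in_folder = set()

        new_files = sorted(named_set - self.files_in_folder)
        for new_file in new_files:
            # Hand the slow work to a background thread so the timer
            # callback (and the UI) returns immediately.
            threading.Thread(
                target=self.transfer_and_notify,
                args=(new_file,),
                daemon=True,
            ).start()

        self.files_in_folder = named_set

    def transfer_and_notify(self, new_file):
        # Runs off the main thread; does the upload and the GET request.
        self.s3_client.upload_file(new_file, '##bucket##', '##key##')
        requests.get('##url##', params={'file': new_file})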
I am trying to write a file to a specific mount location in Linux. The API returns a path, which is required for further operations. The problem is that if the file size is huge, I face a request time-out error, because of which I am not able to get the path. The code is as follows:
@migration_blueprint.route("/migration/upload", methods=["POST"])
def upload_migration_file():
    file_abs_path = ""
    try:
        file = request.files['files']
        logger.debug("The file received is '{}'".format(file))
        file_name = str(datetime.now().strftime("%H%M%s")) + file.filename
        proxy_bin = db.find_one("bins", query={"bin_type": "proxy", "status": "active"})
        if not proxy_bin:
            raise Exception("Proxy Bin not found")
        base_proxy_path = "/mnt/share_{}/migration/".format(proxy_bin['_id'])
        if not os.path.exists(base_proxy_path):
            os.makedirs(base_proxy_path)
        file_abs_path = os.path.join(base_proxy_path, file_name)
        file.save(file_abs_path)
    except Exception as ex:
        logger.exception("Error: {}".format(str(ex)))
        abort(500, {"message": str(ex)})
    return {"path": file_abs_path}
Is there a workaround so that, irrespective of the file size, the file gets written to the location and the path is also returned in the response before the request times out?
You could try uploading files via AJAX and polling the server at intervals until the filename is ready (you could also use server-sent events), or you could use websockets to upload the files and then notify the client when the filename is available.
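A rough sketch of the polling variant on the server side, assuming an in-memory status dict and a background thread. All names here (UPLOAD_STATUS, copy_in_background, the status route) are illustrative, not from the original code: the upload is first written to fast local temp storage inside the request, and the slow copy to the mount happens afterwards while the client polls.

import os
import shutil
import tempfile
import threading
import uuid

from flask import jsonify, request

UPLOAD_STATUS = {}  # upload_id -> final path once the copy has finished, None while pending

def copy_in_background(upload_id, tmp_path, final_path):
    # Move from local temp storage to the (slow) mount, then record the path.
    shutil.move(tmp_path, final_path)
    UPLOAD_STATUS[upload_id] = final_path

@migration_blueprint.route("/migration/upload", methods=["POST"])
def upload_migration_file():
    file = request.files['files']
    upload_id = uuid.uuid4().hex
    UPLOAD_STATUS[upload_id] = None

    # Save quickly to local temp storage while the request is still open.
    tmp_fd, tmp_path = tempfile.mkstemp()
    os.close(tmp_fd)
    file.save(tmp_path)

    final_path = os.path.join("/mnt/share_x/migration/", file.filename)  # path building is illustrative
    threading.Thread(target=copy_in_background,
                     args=(upload_id, tmp_path, final_path)).start()

    # Return immediately; the client polls the status route below.
    return jsonify({"upload_id": upload_id})

@migration_blueprint.route("/migration/status/<upload_id>", methods=["GET"])
def migration_status(upload_id):
    path = UPLOAD_STATUS.get(upload_id)
    return jsonify({"ready": path is not None, "path": path})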
I tried to implement the solution through multiprocessing in Python.
from multiprocessing import Process

def copy_file(file_data, abs_path):
    file_data.save(abs_path)
In the API I updated it:
p = Process(target=copy_file(file, file_abs_path))
p.start()
# file.save(file_abs_path)
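One thing worth noting (an observation, not from the original post): Process(target=copy_file(file, file_abs_path)) calls copy_file immediately in the request process and hands its return value (None) to Process as the target, so nothing actually runs in the child. The callable and its arguments need to be passed separately:

# Pass the function itself plus its arguments; do not call it here.
p = Process(target=copy_file, args=(file, file_abs_path))
p.start()

Even then, the werkzeug FileStorage object is tied to the request and may not pickle cleanly into a child process, which is why a thread or a task queue is usually a safer fit for this kind of background save.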
I'm trying to use Celery on my Flask application.
I'm defining a task in a file insight_tasks.py.
In that file is defined a function:
@celery_app.task
def save_insights_task():
That function does some stuff and, here comes the error: I'm trying to save data into MongoDB and the console throws:
MongoEngineConnectionError('You have not defined a default connection',)
So I think it's because MongoEngine has not been initialized, and here is my question:
How should I use MongoDB inside a Celery task? When I use MongoDB from my routes (Flask app) it works as expected.
Does Celery not share the db instance?
Files:
__init__.py (Celery initialization)
celery_app = Celery('insights',
                    broker=config.CELERY_LOCATIONS_BROKER_URL,
                    backend=config.CELERY_LOCATIONS_RESULT_BACKEND,
                    include=['app.insight_tasks'])
insight_tasks.py
from app.google import google_service
from app.models import LocationStats
from . import celery_app
from firebase_admin import db as firebase_db
import arrow
@celery_app.task
def save_insight_task(account_location, uid, gid, locations_obj, aggregation):
    try:
        insights, reviews = google_service.store_location_resources(
            gid, uid,
            start_datetime, end_datetime,
            account_location, aggregation
        )
    except StandardError as err:
        from pprint import pprint
        import traceback
        pprint(err)
        pprint(traceback.print_exc())

    path = 'saved_locations/{}/accounts/{}'.format(gid, account_location)
    location = [loc for loc in locations_obj if loc['name'] == 'accounts/' + account_location]

    if len(location) > 0:
        firebase_db.reference(path).update(location[0])
Here google_service.store_location_resources() is the function that saves the data into MongoDB. This function is also used elsewhere, by the routes of my app, where it works as expected; it only fails in the Celery task.
---------
The Celery task is called into a POST request
accounts/routes.py
@account.route('/save/queue', methods=['POST'])
def save_all_locations():
    data = request.data
    dataDict = json.loads(data)
    uid = request.headers.get('uid', None)
    gid = request.headers.get('gid', None)
    account_locations = dataDict['locations']
    locations_obj = dataDict['locations_obj']

    for path in account_locations:
        save_insight_task.delay(account_location=path, uid=uid, gid=gid,
                                locations_obj=locations_obj, aggregation='SOME_TEXT')
You are supposed to connect to the database inside the task. The reason is that the child processes created by Celery must each have their own instance of the Mongo client.
More details here: Using PyMongo with Multiprocessing
For example, define a utils.py:
from pymodm import connect

def mongo_connect():
    return connect("mongodb://{0}:{1}/{2}".format(MONGODB['host'],
                                                  MONGODB['port'],
                                                  MONGODB['db_name']),
                   alias=MONGODB['db_name'])
Then in insight_tasks.py
from utils import mongo_connect
@celery_app.task
def save_insight_task(account_location, uid, gid, locations_obj, aggregation):
    # connect to mongodb
    mongo_connect()

    # do your db operations
    try:
        insights, reviews = google_service.store_location_resources(
            gid, uid,
            start_datetime, end_datetime,
            account_location, aggregation
        )
    except StandardError as err:
        from pprint import pprint
        import traceback
        pprint(err)
        pprint(traceback.print_exc())

    path = 'saved_locations/{}/accounts/{}'.format(gid, account_location)
    location = [loc for loc in locations_obj if loc['name'] == 'accounts/' + account_location]

    if len(location) > 0:
        firebase_db.reference(path).update(location[0])
Note that I use the pymodm package instead of the mongoengine package as the ODM for Mongo.
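If you would rather stay with MongoEngine, as in the question, the same idea applies. A minimal sketch, assuming the same MONGODB settings dict as above:

from mongoengine import connect

@celery_app.task
def save_insight_task(account_location, uid, gid, locations_obj, aggregation):
    # Connect inside the task so every Celery worker process gets its own
    # client instead of reusing one created before the workers were forked.
    connect(db=MONGODB['db_name'],
            host=MONGODB['host'],
            port=MONGODB['port'])
    # ... db operations as above ...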
I am using a thread pool to upload to S3. However, I suspect that sometimes one thread might fail. Therefore, I want to restart the thread that has failed, and I would like some pointers on how I can achieve this:
pool = ThreadPool(processes=10)
pool.map(uploadS3_helper, args)

def uploadS3_helper(args):
    return uploadS3(*args)

def uploadS3(myfile, bucket_name, key_root, path_root, use_rel, define_region):
    if define_region:
        conn = S3Connection(aws_access_key_id=S3_ACCESS_KEY, aws_secret_access_key=S3_SECRET_KEY,
                            host='s3-us-west-2.amazonaws.com')
    else:
        conn = S3Connection(aws_access_key_id=S3_ACCESS_KEY, aws_secret_access_key=S3_SECRET_KEY)
    bucket = conn.get_bucket(bucket_name)
    print key_root + myfile
    print path_root
    print os.path.join(path_root, myfile)
    if use_rel:
        bucket.new_key(key_root + myfile).set_contents_from_file(open(os.path.join(path_root, myfile[1:])))
    else:
        bucket.new_key(key_root + myfile).set_contents_from_file(open(path_root))
To expand on @Martin James's comment: consider a "retry decorator". Add the linked code to your project, decorate the uploadS3 function with it, and it will try the upload multiple times, waiting longer after each failure.
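The decorator itself isn't reproduced here, but the idea looks roughly like this (a sketch with illustrative defaults, not the exact code behind the link):

import time
from functools import wraps

def retry(tries=4, delay=3, backoff=2, exceptions=(Exception,)):
    """Retry the wrapped function, waiting longer after each failure."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for _ in range(tries - 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    time.sleep(wait)
                    wait *= backoff
            # Last attempt: let any exception propagate to the caller.
            return func(*args, **kwargs)
        return wrapper
    return decorator

@retry(tries=4, delay=3, backoff=2)
def uploadS3(myfile, bucket_name, key_root, path_root, use_rel, define_region):
    # original upload body from the question goes here
    pass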
I want to use twisted for some basic FTP server, just like this example:
from twisted.protocols.ftp import FTPFactory, FTPRealm
from twisted.cred.portal import Portal
from twisted.cred.checkers import AllowAnonymousAccess, FilePasswordDB
from twisted.internet import reactor
#pass.dat looks like this:
# jeff:bozo
# grimmtooth:bozo2
p = Portal(FTPRealm('./'), (AllowAnonymousAccess(), FilePasswordDB("pass.dat")))
f = FTPFactory(p)
reactor.listenTCP(21, f)
reactor.run()
...with one simple customization: I want to fire an event when a file upload (STOR) is completed successfully, so that my custom code can adequately handle this file.
I found no documentation for FTPFactory or FTP that helps me do this. Should I subclass the FTP object or some other object? How do I wire everything up?
I have done simple custom HTTP servers with twisted in the past and it was pleasantly easy, but I can find nearly no material about FTP.
First off, this is just a modification of Rakis' answer; without his answer this would not exist. His version just wouldn't work on my setup, which may simply be because the API has changed in the five years since.
from twisted.protocols import ftp

class MyFTP(ftp.FTP):
    def ftp_STOR(self, path):
        d = super(MyFTP, self).ftp_STOR(path)

        def onStorComplete(d):
            print 'STORED', repr(d), path
            return d

        d.addCallback(onStorComplete)
        return d

f = ftp.FTPFactory(some_portal_object)
f.protocol = MyFTP
It looks like the following may do the trick
from twisted.protocols import ftp

class MyFTP(ftp.FTP):
    def ftp_STOR(self, path):
        d = super(MyFTP, self).ftp_STOR(path)
        d.addCallback(lambda _: self.onStorComplete(path))
        return d

    def onStorComplete(self, path):
        # XXX your code here
        pass

f = ftp.FTPFactory(some_portal_object)
f.protocol = MyFTP
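For completeness, wiring this back into the question's setup would look roughly like the following (it simply combines the question's portal with the subclass above; nothing new is introduced):

from twisted.protocols import ftp
from twisted.protocols.ftp import FTPRealm
from twisted.cred.portal import Portal
from twisted.cred.checkers import AllowAnonymousAccess, FilePasswordDB
from twisted.internet import reactor

p = Portal(FTPRealm('./'), (AllowAnonymousAccess(), FilePasswordDB("pass.dat")))
f = ftp.FTPFactory(p)
f.protocol = MyFTP  # the subclass defined above
reactor.listenTCP(21, f)
reactor.run()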