Is anyone doing asynchronous DB commits?


Most of the longest (most time-consuming) logic I've encountered basically involves two things: sending email and committing items to the database.
Is there any kind of built-in mechanism for doing these things asynchronously so as not to slow down page load?
Validation should be handled synchronously, but it really seems that the most performant way to send email and write to the database would be to do it asynchronously.
For example, let's say that I want to track pageviews. Thus, every time I get a view, I do:
pv = PageView.objects.get(page = request.path)
pv.views = pv.views + 1
pv.save() # SLOWWWWWWWWWWWWWW
Is it natural to think that I should speed this up by making the whole process asynchronous?

Take a look at Celery. It gives you asynchronous workers to offload tasks exactly like you're asking about: sending e-mails, counting page views, etc. It was originally designed to work only with Django, but now works in other environments too.

I use this pattern to update the text index (which is slow), since this can be done in the background. This way the user sees a fast response time:
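import os
from django.conf import settings

# create an empty marker file, like the "touch" shell command
# (runs inside a model method; self is the changed object)
changed_dir = os.path.join(settings.DT.HOME, 'var', 'belege-changed')
marker_path = os.path.join(changed_dir, str(self.id))
with open(marker_path, 'a'):
    pass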
A cron job scans this directory every N minutes and updates the text index.
In your case, I would write the request.path to a file and update the PageView model in the background. This would improve performance, since you don't need to hit the database for every increment.
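A minimal sketch of that suggestion, assuming one append-only log file per process and a cron job that runs the aggregation step. The names `record_view` and `aggregate_views` are hypothetical, and the log is moved aside before reading so new requests start a fresh file.

```python
import os
from collections import Counter


def record_view(log_path, page_path):
    """Request side: append one path per line (much cheaper than a DB write)."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(page_path + "\n")


def aggregate_views(log_path):
    """Cron side: count and consume everything logged since the last run.

    Returns a Counter mapping path -> number of views.
    """
    if not os.path.exists(log_path):
        return Counter()
    # Move the log aside so new requests start writing to a fresh file.
    processing_path = log_path + ".processing"
    os.replace(log_path, processing_path)
    with open(processing_path, encoding="utf-8") as fh:
        counts = Counter(line.rstrip("\n") for line in fh if line.strip())
    os.remove(processing_path)
    return counts
```

The cron job would then apply each count in one query, for example `PageView.objects.filter(page=path).update(views=F('views') + n)` with `from django.db.models import F`. A request that opened the log just before the rename can still lose a line, so treat the counts as approximate.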

You can use a Python thread pool and hand the database writes off to it. Although the GIL prevents Python threads from executing Python code in parallel, this lets the response flow continue before the write has finished.
I use this technique when the result of the write is not needed to render the response.
Of course, if you are handling a POST request and want to return a 201 (Created), this is not good practice, because you can't be sure the write succeeded before you respond.
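A sketch of the thread-pool idea using the standard library's `concurrent.futures.ThreadPoolExecutor`. The in-memory `view_counts` dict stands in for the PageView table, and `save_page_view` / `handle_request` are hypothetical names; `max_workers=4` is an arbitrary choice.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One pool per process; max_workers=4 is an arbitrary choice for this sketch.
db_pool = ThreadPoolExecutor(max_workers=4)

# In-memory stand-in for the PageView table so the sketch runs on its own.
view_counts = {}
view_counts_lock = threading.Lock()


def save_page_view(path):
    """Hypothetical slow write; in Django this is where pv.save() would go."""
    with view_counts_lock:
        view_counts[path] = view_counts.get(path, 0) + 1


def log_failure(future):
    # Exceptions raised in the pool are stored on the future, not raised here,
    # so surface them explicitly or they disappear silently.
    exc = future.exception()
    if exc is not None:
        print("background write failed:", exc)


def handle_request(path):
    """Schedule the write and build the response without waiting for it."""
    future = db_pool.submit(save_page_view, path)
    future.add_done_callback(log_failure)
    return "<html>page for %s</html>" % path, future
```

In Django, each pool thread opens its own database connection outside the request cycle, so you may need to close it yourself (for example with `django.db.connection.close()`) to avoid leaking connections.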

MongoDB (http://www.mongodb.org/) can do this, for example with unacknowledged ("fire-and-forget") writes.

Related

Can Flask or Django handle concurrent tasks?

What I'm trying to accomplish:
I have a sensor that is constantly reading in data. I need to print this data to a UI whenever data appears. While that is happening, the user should be able to write data to the sensor. Ideally, both tasks would happen at the same time. Currently, I have the program written using Flask, but if Django (or a third-party library) would be better suited, I would be willing to make the switch. Note: this website will never be deployed, so there's no need to worry about that. The only user will be me, running the program from my laptop.
I have spent a lot of time researching Flask async functions and coroutines; however, I have not seen any clear indication of whether something like this is possible.
I'm not looking for a line-by-line solution, but rather a way (async, threading, etc.) to structure the code so that both tasks are possible. All help is appreciated, thanks.

I'm a Django guy, so I'll throw out what I think could be possible:
You can put a @start_new_thread decorator on any function so that it runs in a thread. (This isn't built into Django; it's a few lines of your own code wrapped around threading.Thread.)
You could make a view, POST to it with JavaScript/Ajax, and start a thread that communicates with the sensor using the POSTed data.
You could also make a threaded function that reads from the sensor.
It could be started by a management command, or by a 'Start' button that POSTs to a view that then starts the thread.
Note: you need locks or some other logic so the two threads don't conflict when reading from and writing to the sensor.
Alternatively, use a single thread that reads from and writes to the sensor and, on each loop, checks whether there's anything to write (the existence and contents of a file? A DB entry?).
As for the UI, let's say it's a webpage. Your best bet would be WebSockets, but because you're the only one who will ever use it, you could just write some JavaScript/Ajax that polls a view every x seconds and displays the new data on the page.
Note: polling only approximates what WebSockets give you. A WebSocket keeps one connection open so the server can push new data as it arrives, instead of the client asking every x seconds.
The common thread is JavaScript/Ajax: it lets you watch the data coming in continuously without refreshing the page.
You can probably do all of this in Flask as well: Python's threading module works the same there, and you just add some JavaScript to the frontend.
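A sketch of the single-thread option, assuming a hypothetical `FakeSensor` in place of the real driver and a hand-written `start_new_thread` decorator (not a Django or Flask API). It swaps the file/DB check for a standard-library `queue.Queue` of pending writes, so only one thread ever touches the sensor.

```python
import queue
import threading
import time
from functools import wraps


def start_new_thread(func):
    """Decorator: calling func starts it in a daemon thread and returns the thread."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        thread = threading.Thread(target=func, args=args, kwargs=kwargs, daemon=True)
        thread.start()
        return thread
    return wrapper


class FakeSensor:
    """Hypothetical stand-in for the real sensor driver."""

    def __init__(self):
        self.counter = 0
        self.written = []

    def read(self):
        self.counter += 1
        return self.counter

    def write(self, value):
        self.written.append(value)


pending_writes = queue.Queue()   # filled by the "write" view
latest_readings = []             # read by the polling view
readings_lock = threading.Lock()
stop_event = threading.Event()


@start_new_thread
def sensor_loop(sensor, interval=0.01):
    # Only this thread touches the sensor, so reads and writes never overlap.
    while not stop_event.is_set():
        # Drain any writes queued since the last iteration.
        try:
            while True:
                sensor.write(pending_writes.get_nowait())
        except queue.Empty:
            pass
        value = sensor.read()
        with readings_lock:
            latest_readings.append(value)
        time.sleep(interval)
```

A POST view would then only call `pending_writes.put(...)`, and the view the JavaScript polls would return a copy of `latest_readings` taken under `readings_lock`.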
Hopefully you find some of this useful. I don't know why Stack Overflow dislikes these types of questions; they're perfectly fine.

Is it a bad practice to use sleep() in a web server in production?

I'm working with Django 1.8 and Python 2.7.
In a certain part of the project, I open a socket and send some data through it. Due to the way the other end works, I need to leave some time (let's say 10 milliseconds) between each chunk of data that I send:
from time import sleep
while True:
    send(data)
    sleep(0.01)
So my question is: is it considered bad practice to simply use sleep() to create that pause? Is there a more efficient approach?
UPDATED:
The reason I need that pause is that the other end of the socket is an external service that takes some time to process the chunks of data I send. I should also point out that it doesn't return anything after receiving the data, let alone after processing it. Leaving that brief pause ensures that each chunk of data I send gets properly processed by the receiver.
EDIT: changed the sleep to 0.01.

Yes, this is bad practice and an anti-pattern. You will tie up the "worker" that is processing this request for an unknown period of time, which will make it unavailable to serve other requests. The classic pattern for web applications is to serve a request as fast as possible, as there is generally a fixed or maximum number of concurrent workers. While this worker is continually sleeping, it's effectively out of the pool. If multiple requests hit this endpoint, multiple workers are tied up, so the rest of your application will experience a bottleneck. Beyond that, you also have potential issues with database locks or race conditions.
The standard approach to handling your situation is to use a task queue like Celery. Your web-application would tell Celery to initiate the task and then quickly finish with the request logic. Celery would then handle communicating with the 3rd party server. Django works with Celery exceptionally well, and there are many tutorials to help you with this.
If you need to provide information to the end user, you can generate a unique ID for the task and poll the result backend for updates by having the client refresh the URL every so often. (Celery generates a UUID task ID automatically, but I usually specify my own.)
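To show the shape of "start a task, return an ID, poll for status" without needing a broker, here is a standard-library sketch that uses a plain background thread as a stand-in for a Celery worker. It is not a replacement for Celery: the thread doesn't block the request, but it still lives inside the web process and won't survive a restart. `start_send`, `send_chunks` and `get_status` are hypothetical names, and `send` is whatever function writes to the socket.

```python
import threading
import time
import uuid

# task_id -> status string; with Celery, the result backend plays this role.
task_status = {}
status_lock = threading.Lock()


def _set_status(task_id, status):
    with status_lock:
        task_status[task_id] = status


def send_chunks(task_id, chunks, send, pause=0.01):
    """Background job: the sleep happens here, not in a web worker."""
    for i, chunk in enumerate(chunks, start=1):
        send(chunk)
        _set_status(task_id, "sent %d/%d" % (i, len(chunks)))
        time.sleep(pause)
    _set_status(task_id, "done")


def start_send(chunks, send):
    """What the view would do: kick off the job and return an ID immediately."""
    task_id = str(uuid.uuid4())
    _set_status(task_id, "queued")
    worker = threading.Thread(target=send_chunks, args=(task_id, chunks, send), daemon=True)
    worker.start()
    return task_id


def get_status(task_id):
    """What the polling endpoint would return."""
    with status_lock:
        return task_status.get(task_id, "unknown")
```

With Celery itself, the equivalent pieces are passing `task_id=...` to `apply_async` and reading the state back through `AsyncResult(task_id)`.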

Like most things, short answer: it depends.
Slightly longer answer:
If you're running it in an environment where you have many (50+, for example) connections to the webserver, all of which trigger the sleep code, you're really not going to like the behavior. I would strongly recommend looking at something like Celery/RabbitMQ, so Django can hand the time-delayed part off to something else and then quickly respond with a "task started" message.
If this is production, but you're the only person hitting the webserver, it still isn't great design, but if it works, it's going to be hard to justify the extra complexity of the task queue approach mentioned above.

Get feedback from a scheduled job while it is processed

I would like to run jobs, but as they may be long, I would like to know how far along they are during their execution. That is, the executor would regularly report its progress without ending the job it is executing.
I have tried to do this with APScheduler, but it seems the scheduler can only receive event messages like EVENT_JOB_EXECUTED or EVENT_JOB_ERROR.
Is it possible to get information from an executor while it is executing a job?
Thanks in advance!

There is, I think, no particular support for this within APScheduler. This requirement has come up for me many times, and the best solution will depend on exactly what you need. Some possibilities:
Job status dictionary
The simplest solution would be to use a plain Python dictionary. Use the job's ID as the key, and whatever status information you require as the value. This solution works best if you only have one copy of each job running concurrently (max_instances=1), of course. If you need some structure to your status information, I'm a fan of namedtuples for this. Then you either keep the dictionary as an evil global variable or pass it into each job function.
There are some drawbacks, though. The status information will stay in the dictionary forever, unless you delete it. If you delete it at the end of the job, you don't get to read a 'job complete' status, and otherwise you have to make sure that whatever is monitoring the status definitely checks and clears every job. This of course isn't a big deal if you have a reasonably sized set of jobs/keys.
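A minimal sketch of the status-dictionary approach with a namedtuple, assuming the dictionary is passed into the job rather than kept global. `JobStatus`, `process_items` and `describe` are hypothetical names; the APScheduler call in the comment assumes the 3.x `add_job` signature.

```python
from collections import namedtuple

JobStatus = namedtuple("JobStatus", ["done", "total", "state"])

# Shared status dictionary: job ID -> latest JobStatus.
job_status = {}


def process_items(job_id, items, status):
    """Example job: reports progress after every item it handles."""
    total = len(items)
    status[job_id] = JobStatus(0, total, "running")
    for done, item in enumerate(items, start=1):
        # ... the real work on `item` would happen here ...
        status[job_id] = JobStatus(done, total, "running")
    status[job_id] = JobStatus(total, total, "complete")


def describe(status):
    """Human-readable progress line for a monitor to display."""
    return "%d/%d items processed" % (status.done, status.total)


# With APScheduler 3.x the dictionary is passed in like any other argument, e.g.
#   scheduler.add_job(process_items, args=["import", items, job_status],
#                     id="import", max_instances=1)
```

Each update replaces the whole tuple, so a monitor reading `job_status[job_id]` from another thread always sees a consistent snapshot.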
Custom dict
If you need some extra functions, you can do as above, but subclass dict (or UserDict or MutableMapping, depending on what you want).
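As one example of an "extra function", this hypothetical `StatusDict` records when each entry was last updated, so stale statuses can be purged. That addresses the "stays in the dictionary forever" drawback mentioned above. It subclasses `UserDict` because the built-in `dict.update()` bypasses an overridden `__setitem__`.

```python
import time
from collections import UserDict


class StatusDict(UserDict):
    """Job-status mapping that remembers when each entry was last updated.

    UserDict is used rather than dict because dict.update() and friends
    bypass an overridden __setitem__, whereas UserDict routes them through it.
    """

    def __init__(self, *args, **kwargs):
        self.updated_at = {}  # must exist before UserDict.__init__ stores items
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        super().__setitem__(key, value)
        self.updated_at[key] = time.monotonic()

    def __delitem__(self, key):
        super().__delitem__(key)
        self.updated_at.pop(key, None)

    def purge_older_than(self, seconds):
        """Drop entries nobody has updated recently; returns the removed keys."""
        cutoff = time.monotonic() - seconds
        stale = [k for k, t in self.updated_at.items() if t < cutoff]
        for key in stale:
            del self[key]
        return stale
```

Because a finished job's 'complete' status is kept until it goes stale, the monitor gets a chance to read it without having to clear each key itself.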
Memcached
If you've got a memcached server you can use, storing the status reports in memcached works great, since they can expire automatically and they should be globally accessible to your application. One probably-minor drawback is that the status information could be evicted from the memcached server if it runs out of memory, so you can't guarantee that the information will be available.
A bigger drawback is that this requires you to have a memcached server available. If you might or might not have one, you can use dogpile.cache and choose whichever backend is appropriate at the time.
Something else
Pieter's comment about using a callback function is worth taking note of. If you know what kind of status information you'll need, but you're not sure how you'll end up storing or using it, passing a wrapper to your jobs will make it easy to use a different backend later.
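A sketch of the callback idea: the job only receives a `report_progress(done, total)` callable, and the storage decision lives in small factory functions. All names here (`run_job`, `dict_reporter`, `log_reporter`) are hypothetical; a memcached-backed reporter would have the same shape.

```python
def run_job(items, report_progress):
    """Example job that knows nothing about where progress ends up."""
    total = len(items)
    for done, item in enumerate(items, start=1):
        # ... real work on `item` goes here ...
        report_progress(done, total)


def dict_reporter(status, job_id):
    """Backend 1: write progress into a shared dictionary."""
    def report(done, total):
        status[job_id] = (done, total)
    return report


def log_reporter(prefix):
    """Backend 2: just print it (could equally be memcached or a logger)."""
    def report(done, total):
        print("%s: %d/%d items processed" % (prefix, done, total))
    return report
```

Switching backends later means passing a different reporter to the job; the job function itself doesn't change.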
As always, though, be wary of over-engineering your solution. If all you want is a report that says "20/133 items processed", a simple dictionary is probably enough.

Perform Task Directly After Returning JSON

I need to perform a task whenever the mobile app requests certain data. The user does not need the task performed right away, but may need it within the next 2 minutes.
I am still fairly new to Python / web dev so I am not quite sure how to accomplish this.
I don't want the user to wait for the task to be performed. It'll probably take 30 seconds, and I'd rather the response be 30 seconds faster.
Is there any way I can send a response, so that the user gets the required info immediately, and then perform the task right after sending the JSON?
Is it possible to send a Response to the mobile app that asked for the data without using return so that the method can continue to perform the task the user does not need to wait for?
from flask import Flask, Response

app = Flask(__name__)

@app.route('/image/<image_id>/')
def images(image_id):
    # get the resource (unnecessary code removed; js holds the JSON string)
    return Response(js, status=200, mimetype='application/json')
    # once the JSON response is returned, do some action
    # (what I would like to do somehow, but don't know how to get it to work)
On second thought, maybe I need to do this action asynchronously so it does not block the router (but it still needs to be done right after returning the JSON).
UPDATE - in response to some answers
To perform such tasks, is a worker server on Heroku recommended, or even required, or is there another, cheaper way to do this?

You can create a second thread to do the extra work:
import threading

# daemon=False (the default) lets the thread finish even if the main thread exits
t = threading.Thread(target=some_function, args=[argument], daemon=False)
t.start()
You should also take a look at Celery or python-rq.

Yes, you need a task queue. There are a couple of options.
Look at this other question: uWSGI for uploading and processing files
And of course your code as written can't work: once you return, execution of that function ends, so nothing after the return statement ever runs.

Multithreading, or how to avoid blocking in a Python application

I'm developing a Python application that "talks" to the user and performs tasks based on what the user says (e.g. User: "Do I have any new Facebook messages?", answer: "Yes, you have 2 new messages. Would you like to see them?"). Functionality like integration with Facebook or Twitter is provided by plugins. Based on predefined parsing rules, my application calls the plugin with the parsed arguments and uses its response. The application needs to be able to answer multiple queries from different users at the same time (or practically the same time).
Currently, I need to call a function, "Respond", with the user input as argument. This has some disadvantages, however:
i) The application can only "speak when it is spoken to". It can't decide on its own to query Facebook for new messages and tell the user whether there are any.
ii) Having a conversation with multiple users at a time is very hard, because the application can only do one thing at a time: if Alice asks the application to check her Facebook for new messages, Bob can't communicate with the application.
iii) I can't develop (and use) plugins that take a long time to complete, e.g. downloading a movie, because the application can't do anything else until the previous task has completed.
Multithreading seems like the obvious way to go here, but I'm worried that creating and using 500 threads at a time would dramatically impact performance, so using one thread per query (a query is a statement from the user) doesn't seem like the right option.
What would be the right way to do this? I've read a bit about Twisted, and the "reactor" approach seems quite elegant. However, I'm not sure how to implement something like that in my application.
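For comparison with Twisted's reactor, here is a sketch using the standard library's `asyncio`, which is also built around a single event loop (a swapped-in technique, not Twisted itself). It assumes plugins are written as coroutines; `check_messages`, `respond` and `notifier` are hypothetical. Two users are answered concurrently, and a background task speaks without being spoken to.

```python
import asyncio


async def check_messages(user):
    """Hypothetical slow plugin call (e.g. a Facebook API request)."""
    await asyncio.sleep(0.1)  # stands in for network I/O
    return "%s, you have 2 new messages." % user


async def respond(user, text):
    # While one user's plugin call is waiting on I/O, the loop serves others.
    if "messages" in text:
        return await check_messages(user)
    return "Sorry %s, I didn't understand that." % user


async def notifier(outbox, interval, rounds):
    """Background task: speaks without being spoken to."""
    for _ in range(rounds):
        await asyncio.sleep(interval)
        outbox.append("Reminder: you have unread messages.")


async def main():
    outbox = []
    background = asyncio.create_task(notifier(outbox, 0.05, 2))
    replies = await asyncio.gather(
        respond("Alice", "Do I have any new facebook messages?"),
        respond("Bob", "What's the weather?"),
    )
    await background
    return replies, outbox
```

Thousands of pending coroutines are far cheaper than thousands of threads. The catch is that a plugin that blocks (a synchronous HTTP library, a big download) stalls everyone, so such calls would need to be offloaded, for example with `asyncio.to_thread` on Python 3.9+.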

I didn't really understand what sort of application it's going to be, but I tried to answer your questions:
i) Create a thread that queries, then sleeps for a while.
ii) Create a thread for each user, and close it when the user is gone.
iii) Create a thread that downloads and then stops.
After all, there aren't going to be 500 threads.
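The first suggestion, a thread that queries and then sleeps, could look roughly like this. It is a sketch in which `check` is a hypothetical callable (e.g. "any new messages?"), and `threading.Event.wait` is used as an interruptible sleep so the thread can be stopped promptly.

```python
import threading


class Poller(threading.Thread):
    """A thread that runs a check, sleeps, and repeats until stopped."""

    def __init__(self, check, interval):
        super().__init__(daemon=True)
        self.check = check          # hypothetical callable, e.g. "any new messages?"
        self.interval = interval    # seconds between checks
        self._stop_event = threading.Event()

    def run(self):
        while not self._stop_event.is_set():
            self.check()
            # Event.wait doubles as a sleep that stop() can cut short.
            self._stop_event.wait(self.interval)

    def stop(self):
        self._stop_event.set()
```

A per-user thread or a one-off download thread follows the same pattern: give it its own stop event, and let it end when its work or its user is gone.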
