Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 8 years ago.
What I want to achieve is to run a Python script that collects data and inserts it into the DB in the background.
So basically, a person opens a Django view, clicks a button and then closes the browser, and Django launches this script on the server; the script then collects data in the background while everything else carries on independently.
What is the best library, framework, module or package to achieve such functionality?
Celery is the most used tool for such tasks.
Celery is a good suggestion, but it's a somewhat heavy solution, and simpler, more straightforward solutions exist unless you need the full power of Celery.
So I suggest using RQ and its Django integration, django-rq.
RQ is inspired by the good parts of Celery and Resque, and was created as a lightweight alternative to the heaviness of Celery and other AMQP-based queuing implementations.
I'd humbly recommend the standard library module multiprocessing for this. As long as the background process can run on the same server as the one processing the requests, you'll be fine.
Although I consider this to be the simplest solution, it wouldn't scale well at all, since you'd be running extra processes on your web server. If you expect these jobs to happen only once in a while and not to run for long, it's a good quick solution.
One thing to keep in mind, though: in the newly started process, ALWAYS close your database connection before doing anything else. The forked process shares the same connection to the SQL server as your main Django process and might otherwise enter into data races with it.
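A minimal sketch of the multiprocessing approach, assuming a Unix host: it explicitly uses the "fork" start method, which is the situation the connection warning is about. Django is not used here; `collect_data` is a hypothetical stand-in for the real collection script, and it writes to a throwaway SQLite file whose path is passed in. Following the advice about connections, the child opens its own database connection rather than reusing anything inherited from the parent; in a Django child you would call `django.db.connection.close()` before touching the ORM so that Django reconnects on demand.

```python
import multiprocessing
import sqlite3


def collect_data(db_path):
    """Hypothetical background job: collect some values and insert them into the DB.

    Runs in the child process. It opens its own connection instead of using one
    inherited from the parent; in Django you would call
    django.db.connection.close() first for the same reason.
    """
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS samples (value INTEGER)")
            conn.executemany(
                "INSERT INTO samples (value) VALUES (?)",
                [(i,) for i in range(10)],  # placeholder for the collected data
            )
    finally:
        conn.close()


def launch_background_job(db_path):
    """Start collect_data in a separate process and return immediately.

    The "fork" start method is Unix-only. The caller (e.g. a view) does not
    wait for the job, so the HTTP response can be sent right away.
    """
    ctx = multiprocessing.get_context("fork")
    proc = ctx.Process(target=collect_data, args=(db_path,))
    proc.start()
    return proc
```

In a view you would call `launch_background_job(...)` and return the response without calling `join()`. Finished children that are never joined linger as zombies until reaped, and calling `multiprocessing.active_children()` now and then reaps them.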
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
Recently I've started working on a Python socket server that handles raw UTF input from Java streams and sends the result back to all of the currently connected servers. That works fine, but I'm worried about thread usage: I'm using about 2 threads per connection, and I'm afraid the CPU won't cope with that for long. So I need a better solution now, so that my server can handle hundreds of connections.
I have two ideas for that:
Using non-blocking IO
Having a fixed-size thread pool (i.e. a FixedThreadPool, as it's called in Java)
I have no idea which one is going to work better, so I'd appreciate your advice and ideas.
Thanks!
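For comparison, here is a rough sketch of the second idea (a fixed-size pool, the Python counterpart of Java's `Executors.newFixedThreadPool`) using the standard library's `concurrent.futures.ThreadPoolExecutor`. The upper-casing echo in `handle_client` is a placeholder for the real processing, and `max_workers=8` is an arbitrary assumption. Note the trade-off: each handler blocks on its connection for as long as the client stays connected, so at most `max_workers` clients are served at once and the rest wait in the pool's queue. A fixed pool therefore caps concurrency rather than solving the hundreds-of-connections problem by itself.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def handle_client(conn):
    """Per-connection handler (placeholder): echo each chunk back upper-cased until EOF."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # client closed the connection
                break
            conn.sendall(data.upper())


def serve_with_pool(server_sock, max_workers=8):
    """Accept connections forever; a worker from a fixed-size pool handles each one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while True:
            conn, _addr = server_sock.accept()
            pool.submit(handle_client, conn)
```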
I would advise not reinventing the wheel and instead using a framework for async/streaming processing, for example Tornado.
Also, if you can, consider using the Go language: a lot of developers (including me) are switching from Python to Go for this kind of task. It's designed from the ground up to support async processing.
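A sketch of the non-blocking approach using the standard library's `asyncio`, swapped in here for Tornado (recent Tornado versions run on the asyncio event loop anyway). One thread and one event loop serve every connection, and each message is broadcast to all connected clients. It assumes newline-delimited messages; if the Java side uses `DataOutputStream.writeUTF`, you would instead read the 2-byte length prefix with `reader.readexactly(2)` and then the payload. The host and port defaults in `main()` are placeholders.

```python
import asyncio


class BroadcastServer:
    """Single-threaded server: every line received is sent to all clients."""

    def __init__(self):
        self.clients = set()  # StreamWriter of each connected client

    async def handle(self, reader, writer):
        self.clients.add(writer)
        try:
            while True:
                line = await reader.readline()
                if not line:  # EOF: this client disconnected
                    break
                for client in list(self.clients):
                    if not client.is_closing():
                        # write() only buffers; a real server should also deal
                        # with slow clients whose buffers keep growing
                        client.write(line)
        finally:
            self.clients.discard(writer)
            writer.close()


async def main(host="0.0.0.0", port=9000):
    hub = BroadcastServer()
    server = await asyncio.start_server(hub.handle, host, port)
    async with server:
        await server.serve_forever()


# Run with: asyncio.run(main())
```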
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 8 years ago.
I'm working through possible architectures for a problem. In as few words as possible, the problem is: I need to design a system that allows clients to connect using HTTP/REST to kick off long running processes. Each process will create a persistent connection to a third party server and write the received data to a queue. Each process will terminate only if the third party server closes the connection or another HTTP/REST request is received indicating it should be terminated.
Constraints and background:
Clients must be able to connect using HTTP/REST
System must be written in Python
I'm a lower-level C guy (with enough Python experience to feel competent) trying to wrap my head around the Python frameworks available for making this easier. My gut says to jump into the weeds, but I know that if I implement this the way I'm thinking, I might as well have written it in C. I don't want that. I want to leverage as many Python frameworks and libraries as possible. Performance is not a top priority.
Approaches I've considered:
In doing research, I came across Twisted, which might be a fit and seems to make sense to me (thinking about this as a daemon). I'm imagining the final product would be a Twisted app that exposes a REST interface, dispatches a new thread connecting to the third-party service for each client request received, and manages its own thread pool. I'm familiar with threading, though admittedly I haven't done anything with threads in Python yet. In a nutshell, Twisted looks very cool, but in the end I'm left wondering if I'm overcomplicating this.
The second approach I considered is using Celery and Flask and simply letting Celery handle all the dispatching, thread management, etc. I found this article showing Celery and Flask playing nicely together. It seems like a much simpler approach.
After writing this, I'm leaning towards the second option of using Celery and Flask, though I don't know much about Celery, so I'm looking for any advice you might have, as well as other possible architectures I'm not considering. I really appreciate it, and thank you in advance.
Yes, Twisted is overkill here.
From what you described, the combination of Celery and Flask would suffice. It would allow you to implement a REST interface that kicks off your long-running processes as Celery tasks. You can easily implement a REST method that lets clients stop running tasks by calling Celery's revoke on a task's ID. Note that Celery depends on a message broker for sending and receiving messages (RabbitMQ is frequently used) and on a result backend for storing results (Redis is frequently used).
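>>> from celery import current_app
>>> # revoke through the app's control API (celery.task.control was removed in Celery 5)
>>> current_app.control.revoke(task_id, terminate=True)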
http://docs.celeryproject.org/en/latest/userguide/workers.html#revoking-tasks
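The start/stop contract described above (start one long-running job per request, stop it later by ID) can be illustrated without a broker. The sketch below is a stdlib stand-in, not Celery: `start()` plays the role of `task.delay()` and `stop()` the role of revoking the task. `consume_stream` and its arguments are hypothetical placeholders for the third-party connection and the output queue. Unlike `revoke(..., terminate=True)`, which terminates the process executing the task, stopping a thread here is cooperative: the job checks a `threading.Event` between items, so it stops once the next item arrives or the stream ends.

```python
import threading
import uuid


def consume_stream(stop_event, stream, sink):
    """Hypothetical job: copy items from a third-party stream into a queue
    until the stream ends (remote side closed) or a stop is requested."""
    for item in stream:
        if stop_event.is_set():
            break
        sink.put(item)


class JobRegistry:
    """Start long-running jobs and stop them later by ID."""

    def __init__(self):
        self._jobs = {}  # job_id -> (thread, stop_event)
        self._lock = threading.Lock()

    def start(self, target, *args):
        """Run target(stop_event, *args) on a new thread; return its job ID."""
        job_id = str(uuid.uuid4())
        stop_event = threading.Event()
        thread = threading.Thread(target=target, args=(stop_event,) + args, daemon=True)
        with self._lock:
            self._jobs[job_id] = (thread, stop_event)
        thread.start()
        return job_id

    def stop(self, job_id, timeout=None):
        """Request a stop and wait up to `timeout` seconds; True if the job ended."""
        with self._lock:
            thread, stop_event = self._jobs.pop(job_id)
        stop_event.set()
        thread.join(timeout)
        return not thread.is_alive()
```

In a Flask app, hypothetical routes such as `POST /jobs` would call `start()` and return the ID, and `DELETE /jobs/<job_id>` would call `stop()`.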
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions concerning problems with code you've written must describe the specific problem — and include valid code to reproduce it — in the question itself. See SSCCE.org for guidance.
Closed 9 years ago.
I made a server monitoring script that mainly monitors network drive usage and the cluster's job status. It's really basic and mainly uses Unix commands such as top, status, df and so on.
I rely on subprocess, which works well, but under heavy workload the script gets really slow and uses a lot of CPU. The slowest part is where I grep users from status -a while they have thousands of jobs running.
The script runs in an endless while loop.
So I'm searching for more efficient ways to do this, and any help or hints would be appreciated. I'm using Python 2.7.
I suggest you take a look at iotop, especially its source code, as it is written in Python.
The general philosophy behind it is not to use the Unix tools (top, df...) but to parse their source of information, which is /proc.
Opening a file (especially in an in-memory filesystem like procfs) is much faster than forking a process to launch a Unix command.
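A small sketch of that idea: read `/proc/meminfo` directly instead of running `top`, and get disk usage from `os.statvfs` (the filesystem statistics `df` reports) instead of running `df`. It is Linux/Unix only and is written so it should also run under the question's Python 2.7. The cluster's `status -a` query has no `/proc` equivalent, so it is not covered here.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}.

    Values are in kB where the kernel prints a unit, otherwise plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    """Read memory statistics straight from procfs (no subprocess)."""
    with open(path) as f:
        return parse_meminfo(f.read())


def disk_usage(path):
    """Return (total, used, available) bytes for the filesystem holding path."""
    st = os.statvfs(path)
    total = st.f_blocks * st.f_frsize
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    available = st.f_bavail * st.f_frsize  # space usable by non-root users
    return total, used, available
```

On Python 3.3+, `shutil.disk_usage()` returns the same (total, used, free) triple.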
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
Currently I'm working on a Python project that requires implementing some background jobs (mostly email sending and heavy database updates). I use Redis as the task broker. So at this point I have two candidates: Celery and RQ. I have had some experience with these job queues, but I'd like you to share your experience of using these tools. So:
What are the pros and cons of using Celery vs. RQ?
Are there any examples of projects/tasks suited to Celery vs. RQ?
Celery looks pretty complicated, but it's a full-featured solution. Actually, I don't think I need all these features. On the other hand, RQ is very simple (e.g. configuration, integration), but it seems to lack some useful features (e.g. task revoking, code auto-reloading).
Here is what I have found while trying to answer this exact same question. It's probably not comprehensive, and may even be inaccurate on some points.
In short, RQ is designed to be simpler all around. Celery is designed to be more robust. They are both excellent.
Documentation. RQ's documentation is comprehensive without being complex, and mirrors the project's overall simplicity; you never feel lost or confused. Celery's documentation is also comprehensive, but expect to revisit it quite a lot when you're first setting things up, as there are too many options to internalize.
Monitoring. Celery's Flower and the RQ dashboard are both very simple to set up and give you at least 90% of all the information you would ever want.
Broker support. Celery is the clear winner; RQ only supports Redis. This means less documentation on "what is a broker", but it also means you cannot switch brokers in the future if Redis no longer works for you. For example, Instagram considered both Redis and RabbitMQ with Celery. This is important because different brokers have different guarantees, e.g. Redis cannot (as of writing) guarantee 100% that your messages are delivered.
Priority queues. RQ's priority queue model is simple and effective: workers read from queues in order. Celery requires spinning up multiple workers to consume from different queues. Both approaches work.
OS support. Celery is the clear winner here, as RQ only runs on systems that support fork, e.g. Unix systems.
Language support. RQ only supports Python, whereas Celery lets you send tasks from one language to a different language.
API. Celery is extremely flexible (multiple result backends, a nice config format, workflow canvas support), but naturally this power can be confusing. By contrast, the RQ API is simple.
Subtask support. Celery supports subtasks (e.g. creating new tasks from within existing tasks). I don't know if RQ does.
Community and stability. Celery is probably more established, but both are active projects. As of writing, Celery has ~3500 stars on GitHub while RQ has ~2000, and both projects show active development.
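RQ's "read from queues in order" model is easy to picture with a toy version. In RQ itself you start a worker with the queue names in priority order (for example `rq worker high default low`); the sketch below only models that behaviour with in-memory deques and is not RQ's implementation. The queue names and job labels are made up.

```python
from collections import deque


def next_job(queues):
    """Return (queue_name, job) from the first non-empty queue, or None.

    `queues` is an ordered list of (name, deque) pairs, highest priority first,
    so a job in "high" is always picked before anything in "default" or "low".
    """
    for name, jobs in queues:
        if jobs:
            return name, jobs.popleft()
    return None


def drain(queues):
    """Process every queued job in priority order; return the processing order."""
    order = []
    while True:
        picked = next_job(queues)
        if picked is None:
            return order
        order.append(picked[1])
```

For example, with `high` holding `h1`, `default` holding `d1, d2` and `low` holding `l1`, `drain` returns `['h1', 'd1', 'd2', 'l1']`.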
In my opinion, Celery is not as complex as its reputation might lead you to believe, but you will have to RTFM.
So why would anyone be willing to trade the (arguably more full-featured) Celery for RQ? In my mind, it all comes down to simplicity. By restricting itself to Redis + Unix, RQ provides simpler documentation, a simpler codebase, and a simpler API. This means you (and potential contributors to your project) can focus on the code you care about instead of having to keep details of the task queue system in your working memory. We all have a limit on how many details we can hold in our heads at once, and by removing the need to keep task queue details there, RQ lets you get back to your own code. That simplicity comes at the expense of features like inter-language task queues, wide OS support, 100% reliable message delivery guarantees, and the ability to switch message brokers easily.
Celery is not that complicated. At its core, you follow the step-by-step configuration from the tutorials, create a Celery instance, decorate your function with @celery.task, and then run the task with my_task.delay(*args, **kwargs).
Judging from your own assessment, it seems you have to choose between lacking (key) features or having some excess hanging around. That is not too hard of a choice in my book.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 4 years ago.
I'm building a system that works with web clients (Django) and remote APIs (probably a standalone daemon). I find it easier to coordinate their work with some kind of events framework, like in JavaScript. Unfortunately, Django signals are synchronous, which would make replies to the clients very slow. Also, I might want to be able to migrate the daemon or part of it to a separate machine and still have it work the same way (not RPC, but just triggering an event or sending a message). (This might sound like Erlang's approach.)
Is there a framework that would use proven and reliable ways to communicate between processes (say, RabbitMQ), and require minimum boilerplate?
As for Twisted, which André Paramés suggested, I'd prefer simpler code. Is this doable in Twisted?
from events_framework import subscribe, trigger  # hypothetical framework
import requests  # any HTTP client will do; this is just a sample

@subscribe('data_received')
def reply(data):
    requests.post('http://www.example.com', data=data)
    trigger('data_resent', data)
Here are more details. There is a Django views file that uses some models and notifies others of events. And there is a standalone daemon script that runs forever and reacts to events.
This is just pseudocode; I only mean to show how easy it should be.
# django_project/views.py (a Django views file)
from events_framework import publish, subscribe
from annoying import
from .models import Settings  # assumed location of the Settings model

@subscribe('settings_updated')
def _on_settings_update(event):  # listens to the settings_updated event and saves the data
    Settings.objects.filter(user__id=event.user_id).update(**event.new_settings)

@render_to('form.html')
def show_form(request):  # triggers the 'form_shown' event
    publish('form_shown', {'user_id': request.user.id, 'form_data': request.GET})
    return {...}

# script.py (a standalone script)
import requests
from events_framework import publish, subscribe

@subscribe('form_shown')
def on_form_shown(event):  # listens to the form_shown event and triggers another event
    result = requests.get('third party url', some_data)
    publish('third_party_requested', {'result': result})
Again, this couldn't be done with Django signals alone: some events need to be published over the network, while others should be local but asynchronous.
It may be necessary to instantiate something, like
from events_framework import Environment
env = Environment() # will connect to default rabbitmq server from settings.
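To make the requested API concrete, here is a minimal in-process sketch of `subscribe`/`publish` with the asynchronous behaviour the question asks for: `publish()` only enqueues the event and returns, and a background thread runs the handlers, so a view is not held up the way it would be with synchronous Django signals. `LocalEventBus` is a hypothetical name, not an existing package, and it covers only the "local but asynchronous" case; delivering events to another machine would still need a message broker such as RabbitMQ.

```python
import logging
import queue
import threading
from collections import defaultdict


class LocalEventBus:
    """Toy publish/subscribe bus: handlers run on one background thread."""

    def __init__(self):
        self._handlers = defaultdict(list)  # event name -> list of callables
        self._events = queue.Queue()
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def subscribe(self, event_name):
        """Decorator registering a handler for event_name."""
        def decorator(func):
            self._handlers[event_name].append(func)
            return func
        return decorator

    def publish(self, event_name, payload):
        """Enqueue the event and return immediately (non-blocking for the caller)."""
        self._events.put((event_name, payload))

    def wait_idle(self):
        """Block until every published event has been handled (useful in tests)."""
        self._events.join()

    def _run(self):
        while True:
            event_name, payload = self._events.get()
            for handler in list(self._handlers.get(event_name, ())):
                try:
                    handler(payload)
                except Exception:
                    # Log and keep the worker alive if one handler fails.
                    logging.exception("handler %r failed for %r", handler, event_name)
            self._events.task_done()
```

Handlers may call `publish()` themselves, so a chain like `form_shown` followed by `third_party_requested` from the question works. Because all handlers share one thread, a slow handler (such as an HTTP request) delays the events queued behind it.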
Check out circuits: a lightweight, event-driven and asynchronous application framework for the Python programming language with a strong component architecture.
I decided Celery with RabbitMQ is the most mature software combination, and I will stick with them. Celery allows not just creating events, but flexible specialization via queue routing, and parallelization.
Django ztaskd is a way of calling asynchronous tasks from Django via ZeroMQ (via pyzmq).