Using PyTorch with Celery

I'm trying to run a PyTorch model in a Django app. Since it is not recommended to execute the model (or any long-running task) in the views, I decided to run it in a Celery task. My model is quite big: it takes about 12 seconds to load and about 3 seconds to infer, so I can't afford to load it on every request. Instead I load it in settings.py when the app starts and keep it there for the app to use. So my final scheme is:
When the Django app starts, in the settings the PyTorch model is loaded and it's accessible from the app.
When views.py receives a request, it delays a celery task
The celery task uses the settings.model to infer the result
The problem here is that the celery task throws the following error when trying to use the model:
[2020-08-29 09:03:04,015: ERROR/ForkPoolWorker-1] Task app.tasks.task[458934d4-ea03-4bc9-8dcd-77e4c3a9caec] raised unexpected: RuntimeError("Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method")
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 412, in trace_task
R = retval = fun(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/celery/app/trace.py", line 704, in __protected_call__
return self.run(*args, **kwargs)
/*...*/
File "/home/ubuntu/anaconda3/envs/tensor/lib/python3.7/site-packages/torch/cuda/__init__.py", line 191, in _lazy_init
"Cannot re-initialize CUDA in forked subprocess. " + msg)
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Here's the code in my settings.py loading the model:
if sys.argv and sys.argv[0].endswith('celery') and 'worker' in sys.argv:  # load only for the celery worker
    import torch
    torch.cuda.init()
    torch.backends.cudnn.benchmark = True
    load_model_file()
And the task code
@task
def getResult(name):
    print("Executing on GPU:", torch.cuda.is_available())
    if os.path.isfile(name):
        try:
            outpath = model_inference(name)
            os.remove(name)
            return outpath
        except OSError as e:
            print("Error", name, "doesn't exist")
            return ""
The print in the task shows "Executing on GPU: True".
I've tried setting torch.multiprocessing.set_start_method('spawn') in the settings.py before and after the torch.cuda.init() but it gives the same error.
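For reference, the attempt described above amounts to roughly this in settings.py (a sketch; it does not change how Celery's prefork pool creates its worker processes, since Celery forks via its billiard library rather than torch.multiprocessing):
import torch
import torch.multiprocessing as mp

mp.set_start_method('spawn', force=True)  # force=True avoids "context has already been set"
torch.cuda.init()
torch.backends.cudnn.benchmark = True
load_model_file()  # same helper as in the settings.py snippet above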

Setting this method works as long as you're also using Process from the same library.
from torch.multiprocessing import Pool, Process
Celery uses the "regular" multiprocessing library, hence this error.
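To illustrate that point, here is a minimal standalone sketch (no Django or Celery involved, PyTorch assumed installed) where spawn does apply, because the child process is created through torch.multiprocessing itself:
import torch
import torch.multiprocessing as mp

def child(q):
    # Runs in a freshly spawned process, so any CUDA work done here is
    # initialized in the child rather than re-initialized after a fork.
    q.put(torch.cuda.is_available())

if __name__ == '__main__':
    mp.set_start_method('spawn', force=True)
    q = mp.SimpleQueue()
    p = mp.Process(target=child, args=(q,))
    p.start()
    print('CUDA available in child:', q.get())
    p.join()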
If I were you I'd try either:
run it single threaded to see if that helps
run it with eventlet to see if that helps
read this

A quick fix is to make things single-threaded. To do that, set the worker pool type of celery to solo when starting the celery worker:
celery -A your_proj worker -P solo -l info

This is due to the fact that the Celery worker itself is using forking. This appears to be a currently known issue with Celery >=4.0
You used to be able to configure celery to spawn, rather than fork, but that feature (CELERYD_FORCE_EXECV) was removed in 4.0.
There are no built-in options to get around this. Some custom monkeypatching to do this is probably possible, but YMMV.
Some potentially viable options might be:
Use celery <4.0 with CELERYD_FORCE_EXECV enabled (see the sketch after this list).
Launch celery workers on Windows (where forking is not possible anyhow)
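For the first option, the setting went into the Celery configuration (a sketch for Celery 3.x only; as noted above, it was removed in 4.0):
# celeryconfig.py / Django settings -- Celery 3.x only
CELERYD_FORCE_EXECV = True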

Related

apscheduler: returned more than one DjangoJobExecution -- it returned 2

In my project, the scheduler returns this error when executing the job. Help me, please.
This is the error in my console when I execute the program:
Error notifying listener
Traceback (most recent call last):
File "C:\Users\angel\project\venv\lib\site-packages\apscheduler\schedulers\base.py", line 836, in _dispatch_event
cb(event)
File "C:\Users\angel\project\venv\lib\site-packages\django_apscheduler\jobstores.py", line 53, in handle_submission_event
DjangoJobExecution.SENT,
File "C:\Users\angel\project\venv\lib\site-packages\django_apscheduler\models.py", line 157, in atomic_update_or_create
job_id=job_id, run_time=run_time
File "C:\Users\angel\project\venv\lib\site-packages\django\db\models\query.py", line 412, in get
(self.model._meta.object_name, num)
django_apscheduler.models.DjangoJobExecution.MultipleObjectsReturned: get() returned more than one DjangoJobExecution -- it returned 2!
This is my code
class Command(BaseCommand):
    help = "Runs apscheduler."
    scheduler = BackgroundScheduler(timezone=settings.TIME_ZONE, daemon=True)
    scheduler.add_jobstore(DjangoJobStore(), "default")

    def handle(self, *args, **options):
        self.scheduler.add_job(
            delete_old_job_executions,
            'interval', seconds=5,
            id="delete_old_job_executions",
            max_instances=1,
            replace_existing=True
        )
        try:
            logger.info("Starting scheduler...")
            self.scheduler.start()
        except KeyboardInterrupt:
            logger.info("Stopping scheduler...")
            self.scheduler.shutdown()
            logger.info("Scheduler shut down successfully!")
Not sure if you're still having this issue. I had the same error and found your question. It turned out this happens only in the dev environment.
Because python3 manage.py runserver starts two processes by default, the code seems to register two job records and find two entries at the next run time.
With the --noreload option, it starts only one scheduler thread and works well. As the name implies, it won't automatically reload changes you make, though.
python3 manage.py runserver --noreload
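If you want to keep auto-reloading, another common guard (a sketch: it relies on the RUN_MAIN environment variable that Django's autoreloader sets to 'true' only in the child process that actually serves requests, and the helper name here is made up) is:
import os

def should_start_scheduler() -> bool:
    # Under `runserver` without --noreload the project code is imported twice;
    # RUN_MAIN == 'true' only in the reloader's child process.
    return os.environ.get('RUN_MAIN') == 'true'

# e.g. wherever the scheduler is started:
# if should_start_scheduler():
#     scheduler.start()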
Not sure if you're still having this issue. I think you can use a socket to handle this; the original answer pointed to a screenshot that is not reproduced here.

Django: Execute code only for `manage.py runserver`, not for `migrate`, `help` etc

We are using Django as the backend for a website that provides various things, among others answering certain requests using a neural network built with TensorFlow.
For that, we created an AppConfig and added loading of this app config to the INSTALLED_APPS in Django's settings.py. This AppConfig then loads the Neural Network as soon as it is initialized:
settings.py:
INSTALLED_APPS = [
    ...
    'bert_app.apps.BertAppConfig',
]
.../bert_apps/app.py:
class BertAppConfig(AppConfig):
    name = 'bert_app'

    if 'bert_app.apps.BertAppConfig' in settings.INSTALLED_APPS:
        predictor = BertPredictor()  # loads the ANN
Now while that works and does what it should, the ANN is now loaded for every single command run through manage.py. While we of course want it to be executed if you call manage.py runserver, we don't want it to be run for manage.py migrate, or manage.py help and all other commands.
I am generally not sure if this is the proper way how to load an ANN for a Django-Backend in general, so does anybody have any tips how to do this properly? I can imagine that loading the model on startup is not quite best practice, and I am very open to suggestions on how to do that properly instead.
However, there is also some other code besides the actual model loading that takes a few seconds and that should definitely be executed as soon as the server starts up (so on manage.py runserver), but not on manage.py help (as it takes a few seconds there as well). So is there a quick fix to tell Django to execute it only on runserver and not for its other commands?
I had a similar problem, solved it with checking argv.
class SomeAppConfig(AppConfig):
    def ready(self, *args, **kwargs):
        is_manage_py = any(arg.casefold().endswith("manage.py") for arg in sys.argv)
        is_runserver = any(arg.casefold() == "runserver" for arg in sys.argv)
        if (is_manage_py and is_runserver) or (not is_manage_py):
            init_your_thing_here()
Now a bit closer to the not is_manage_py part: in production you run your app under uwsgi/uvicorn/..., which is still a web server, except it's not started through manage.py. Most likely, that's the only thing you will ever run without manage.py.
Use AppConfig.ready() - it's intended for exactly this:
Subclasses can override this method to perform initialization tasks such as registering signals. It is called as soon as the registry is fully populated. - [django documentation]
To get your AppConfig back, use:
from django.apps import apps
apps.get_app_config(app_name)
# apps.get_app_configs() # all
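Applied to the question, that could look roughly like this (a sketch: the 'bert_app' label and the predictor attribute come from the question's code, while the view and its predict() call are hypothetical):
from django.apps import apps
from django.http import JsonResponse

def answer(request):
    # Reuse the model that BertAppConfig loaded at startup instead of
    # reloading it on every request.
    predictor = apps.get_app_config('bert_app').predictor
    return JsonResponse({'answer': predictor.predict(request.GET.get('q', ''))})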
This is another way: your manage.py will probably have something that looks like this:
def main():
    os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'slambook.settings')
    try:
        from django.core.management import execute_from_command_line
    except ImportError as exc:
        raise ImportError(
            "Couldn't import Django. Are you sure it's installed and "
            "available on your PYTHONPATH environment variable? Did you "
            "forget to activate a virtual environment?"
        ) from exc
    execute_from_command_line(sys.argv)
    # check if sys.argv has runserver
    if 'runserver' in sys.argv:
        pass  # execute your custom function here

if __name__ == '__main__':
    main()
You can check sys.argv for runserver; if it's there, execute your script or function.

Working with flask-rq2 extension in heroku

I followed this tutorial to run tasks with Redis Queue:
https://flask-rq2.readthedocs.io/en/latest/
First
app = Flask(__name__,template_folder='templates')
app.config['RQ_REDIS_URL'] = os.environ['REDIS_URL']
Then
rq = RQ(app)
default_worker.work(burst=True)
And after executing this line
job = task.queue(arg1)
I faced an error. I tried to set the environment variable FLASK_APP="app.py"; I got the error again, but with the message:
AttributeError: module 'app' has no attribute 'task'
18:43:49 Moving job to 'failed' queue
I think there are some misconfigured options related to the worker, but where is this in the official docs?
I changed FLASK_APP="app.py" to FLASK_APP="app:app" and it worked fine, but you also need to move the line default_worker.work(burst=True) inside the method; I don't know why it doesn't work if I put it in the main suite.
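For reference, the overall flask-rq2 pattern discussed here looks roughly like this (a sketch based on the linked docs; the task body is illustrative):
import os

from flask import Flask
from flask_rq2 import RQ

app = Flask(__name__, template_folder='templates')
app.config['RQ_REDIS_URL'] = os.environ['REDIS_URL']
rq = RQ(app)

@rq.job
def task(arg1):
    # placeholder work
    return arg1

# elsewhere, e.g. inside a view:
# job = task.queue(arg1)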

ValueError: Unknown type <class 'redis.client.StrictPipeline'>

I develop locally on Windows 10, which is a problem for using the RQ task queue: it only works on Linux systems because it requires the ability to fork processes. I'm trying to extend the flask-base project https://github.com/hack4impact/flask-base/tree/master/app which can use RQ. I came across https://github.com/michaelbrooks/rq-win. I love the idea of this repo (if I can get it working it will really simplify my life, since I develop on Win 10 64-bit):
After installing this library
I can queue a job in my views by running something like:
@login_required
@main.route('/selected')
def selected():
    messages = 'abcde'
    j = get_queue().enqueue(render_png, messages, result_ttl=5000)
    return j.get_id()
This returns a job_code correctly.
I changed the code in manage.py to:
from rq_win import WindowsWorker

@manager.command
def run_worker():
    """Initializes a slim rq task queue."""
    listen = ['default']
    REDIS_URL = 'redis://localhost:6379'
    conn = Redis.from_url(REDIS_URL)
    with Connection(conn):
        # worker = Worker(map(Queue, listen))
        worker = WindowsWorker(map(Queue, listen))
        worker.work()
When I try to run it with:
$ python -u manage.py run_worker
09:40:44
09:40:44 *** Listening on default...
09:40:58 default: app.main.views.render_png('{"abcde"}') (8c1b6186-39a5-4daf-9c45-f60e4241cd1f)
...\lib\site-packages\rq\job.py:161: DeprecationWarning: job.status is deprecated. Use job.set_status() instead
  DeprecationWarning
09:40:58 ValueError: Unknown type <class 'redis.client.StrictPipeline'>
Traceback (most recent call last):
File "...\lib\site-packages\rq_win\worker.py", line 87, in perform_job
queue.enqueue_dependents(job, pipeline=pipeline)
File "...\lib\site-packages\rq\queue.py", line 322, in enqueue_dependents
for job_id in pipe.smembers(dependents_key)]
File "...\lib\site-packages\rq\queue.py", line 322, in <listcomp>
for job_id in pipe.smembers(dependents_key)]
File "...\lib\site-packages\rq\compat\__init__.py", line 62, in as_text
raise ValueError('Unknown type %r' % type(v))
ValueError: Unknown type <class 'redis.client.StrictPipeline'>
So in summary, I think the jobs are being queued correctly within Redis. However, when the worker process tries to grab a job off the queue to process it, this error occurs. How can I fix this?
So after some digging, it looks like the root of the error is here, where the job_id being sent to the as_text function is, somehow, a StrictPipeline object. However, I have been unable to replicate the error locally; can you post more of your code? Also, I would try re-installing the redis, rq, and rq-win modules, and possibly try importing rq.compat.

Why does Celery work in Python shell, but not in my Django views? (import problem)

I installed Celery (latest stable version).
I have a directory called /home/myuser/fable/jobs. Inside this directory, I have a file called tasks.py:
from celery.decorators import task
from celery.task import Task

class Submitter(Task):
    def run(self, post, **kwargs):
        return "Yes, it works!!!!!!"
Inside this directory, I also have a file called celeryconfig.py:
BROKER_HOST = "localhost"
BROKER_PORT = 5672
BROKER_USER = "abc"
BROKER_PASSWORD = "xyz"
BROKER_VHOST = "fablemq"
CELERY_RESULT_BACKEND = "amqp"
CELERY_IMPORTS = ("tasks", )
In my /etc/profile, I have these set as my PYTHONPATH:
PYTHONPATH=/home/myuser/fable:/home/myuser/fable/jobs
So I run my Celery worker using the console ($ celeryd --loglevel=INFO), and I try it out.
I open the Python console and import the tasks. Then, I run the Submitter.
>>> import fable.jobs.tasks as tasks
>>> s = tasks.Submitter()
>>> s.delay("abc")
<AsyncResult: d70d9732-fb07-4cca-82be-d7912124a987>
Everything works, as you can see in my console
[2011-01-09 17:30:05,766: INFO/MainProcess] Task tasks.Submitter[d70d9732-fb07-4cca-82be-d7912124a987] succeeded in 0.0398268699646s:
But when I go into my Django's views.py and run the exact 3 lines of code as above, I get this:
[2011-01-09 17:25:20,298: ERROR/MainProcess] Unknown task ignored: "Task of kind 'fable.jobs.tasks.Submitter' is not registered, please make sure it's imported.": {'retries': 0, 'task': 'fable.jobs.tasks.Submitter', 'args': ('abc',), 'expires': None, 'eta': None, 'kwargs': {}, 'id': 'eb5c65b4-f352-45c6-96f1-05d3a5329d53'}
Traceback (most recent call last):
File "/home/myuser/mysite-env/lib/python2.6/site-packages/celery/worker/listener.py", line 321, in receive_message
eventer=self.event_dispatcher)
File "/home/myuser/mysite-env/lib/python2.6/site-packages/celery/worker/job.py", line 299, in from_message
eta=eta, expires=expires)
File "/home/myuser/mysite-env/lib/python2.6/site-packages/celery/worker/job.py", line 243, in __init__
self.task = tasks[self.task_name]
File "/home/myuser/mysite-env/lib/python2.6/site-packages/celery/registry.py", line 63, in __getitem__
raise self.NotRegistered(str(exc))
NotRegistered: "Task of kind 'fable.jobs.tasks.Submitter' is not registered, please make sure it's imported."
It's weird, because the celeryd client does show that it's registered, when I launch it.
[2011-01-09 17:38:27,446: WARNING/MainProcess]
Configuration ->
. broker -> amqp://GOGOme@localhost:5672/fablemq
. queues ->
. celery -> exchange:celery (direct) binding:celery
. concurrency -> 1
. loader -> celery.loaders.default.Loader
. logfile -> [stderr]#INFO
. events -> OFF
. beat -> OFF
. tasks ->
. tasks.Decayer
. tasks.Submitter
Can someone help?
This is what I did, which finally worked.
In settings.py I added:
CELERY_IMPORTS = ("myapp.jobs", )
Under the myapp folder I created a file called jobs.py:
from celery.decorators import task

@task(name="jobs.add")
def add(x, y):
    return x * y
Then I ran from the command line: python manage.py celeryd -l info
In another shell I ran python manage.py shell, then:
>>> from myapp.jobs import add
>>> result = add.delay(4, 4)
>>> result.result
and then I get:
16
The important point is that you have to rerun both command shells when you add a new function. You have to register the name both on the client and on the server.
:-)
I believe your tasks.py file needs to be in a django app (that's registered in settings.py) in order to be imported. Alternatively, you might try importing the tasks from an __init__.py file in your main project or one of the apps.
Also try starting celeryd from manage.py:
$ python manage.py celeryd -E -B -lDEBUG
(-E and -B may or may not be necessary, but that's what I use).
See Automatic Naming and Relative Imports, in the docs:
http://celeryq.org/docs/userguide/tasks.html#automatic-naming-and-relative-imports
The task's name is "tasks.Submitter" (as listed in the celeryd output), but you import the task as "fable.jobs.tasks.Submitter". I guess the best solution here is for the worker to also see it as "fable.jobs.tasks.Submitter"; it makes more sense from an app perspective:
CELERY_IMPORTS = ("fable.jobs.tasks", )
