Task is logged but not executing - python

I have a cron job which runs on my db and creates tasks to process certain db records. The cron job runs as expected, but the tasks added with taskqueue.add() never actually run. They show up in the app's logs, but nothing happens. I also put some logging.info() calls in the tasks, but they never execute. This is really puzzling because I'm not getting any errors, and if I access the task directly (through a URL in the browser) it executes fine.
Here's the log:
Here's the empty_task.py code:
import webapp2
import logging

class EmptyHandler(webapp2.RequestHandler):
    def get(self):
        logging.info("empty task!")

app = webapp2.WSGIApplication([
    ('/tasks/empty_task', EmptyHandler),
], debug=True)
EDIT:
Here's the cron.yaml code:
cron:
- description: the shots dispatcher
  url: /tasks/run_schedules
  schedule: every 15 minutes
And here's the run_schedules code:
import webapp2
import logging

from google.appengine.datastore.datastore_query import Cursor
from google.appengine.api import taskqueue

from models.schedule import Schedule

MAX_FETCH = 20

class RunSchedulesHandler(webapp2.RequestHandler):
    def get(self):
        cursor = Cursor(urlsafe=self.request.get('cursor'))
        scheds, next_curs, more = Schedule.query().fetch_page(
            MAX_FETCH, keys_only=True, start_cursor=cursor)
        for key in scheds:
            logging.info("adding to taskqueue schedule: {}".format(key.id()))
            taskqueue.add(url='/tasks/process_schedule', params={'sid': key.id()})
            taskqueue.add(url='/tasks/empty_task', params={'sid': key.id()})
        if more and next_curs:
            taskqueue.add(url='/tasks/run_schedules', params={'cursor': next_curs})

app = webapp2.WSGIApplication([
    ('/tasks/run_schedules', RunSchedulesHandler),
], debug=True)

The default HTTP method for task execution is POST, and you have only defined the GET method in EmptyHandler. Check the status in the App Engine console: it's returning 405, which is "method not allowed".
There are two ways to solve this.
Solution 1: change the method name to post.
class EmptyHandler(webapp2.RequestHandler):
    def post(self):
        logging.info("empty task!")
Solution 2: define the method explicitly in taskqueue.add():
taskqueue.add(url='/tasks/empty_task', params={'sid': key.id()}, method='GET')
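If you want the handler to work both for task queue pushes (POST) and for quick browser checks (GET), a third option (a minimal sketch combining the two solutions above) is to define both methods and delegate to a shared helper:
class EmptyHandler(webapp2.RequestHandler):
    def get(self):
        self._run()

    def post(self):
        self._run()

    def _run(self):
        # Shared logic for both HTTP methods.
        logging.info("empty task!")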

Related

Distributed python: Celery send_task gets COMMAND_INVALID

Context
I developed a Flask API that sends tasks to my computing environment.
To use it, you make a POST request to the API.
The API receives your request, processes it and sends the necessary data, through the RabbitMQ broker, as a message to be handled by the computing environment.
At the end, the result should be sent back to the API.
Some code
Here is an example of my API and my Celery application:
# main.py
# Packages
import time

from flask import Flask
from flask import request, jsonify, make_response

# Own module
from celery_app import celery_app

# Environment
app = Flask(__name__)

# Endpoint
@app.route("/test", methods=["POST"])
def test():
    """
    Test route

    Returns
    -------
    Json formatted output
    """
    # Do some preprocessing in here
    result = celery_app.send_task("tasks.Client", args=[1, 2])
    while result.state == "PENDING":
        time.sleep(0.01)
    result = result.get()

    if result["success"]:
        result_code = 200
    else:
        result_code = 500

    output = str(result)
    return make_response(
        jsonify(
            text=output,
            code_status=result_code,
        ),
        result_code,
    )

# Main thread
if __name__ == "__main__":
    app.run()
In a different file, I have set up my Celery application, connected to the RabbitMQ queue:
# celery_app.py
from celery import Celery, Task

celery_app = Celery("my_celery",
                    broker=f"amqp://{USER}:{PASSWORD}@{HOSTNAME}:{PORT}/{COLLECTION}",
                    backend="rpc://")

celery_app.conf.task_serializer = "pickle"
celery_app.conf.result_serializer = "pickle"
celery_app.conf.accept_content = ["pickle"]
celery_app.conf.broker_connection_max_retries = 5
celery_app.conf.broker_pool_limit = 1

class MyTask(Task):
    def run(self, a, b):
        return a + b

celery_app.register_task(MyTask())
To run it, you launch:
python3 main.py
Do not forget to also run the Celery worker (after registering the tasks in it).
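With the layout above, starting the worker is something along the lines of the following command (assuming celery_app.py is importable from the directory you launch it in):
celery -A celery_app worker --loglevel=INFO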
Then you can make a POST request to it:
curl -X POST http://localhost:8000/test
The problem to resolve
When this simple API is running, I send requests to my endpoint.
Unfortunately, it fails about 1 time in 4.
I get 2 messages.
The first message is:
amqp.exceptions.PreconditionFailed: (0, 0): (406) PRECONDITION_FAILED - delivery acknowledgement on channel 1 timed out. Timeout value used: 1800000 ms. This timeout value can be configured, see consumers doc guide to learn more
Then, because of the timeout, my server has lost the message, so:
File "main.py", line x, in test
result = celery_app.send_task("tasks.Client", args=[1, 2])
amqp.exceptions.InvalidCommand: Channel.close_ok: (503) COMMAND_INVALID - unimplemented method
Resolve this error
There are 2 ways to get around this problem:
- retry sending the task until it has failed 5 times in a row (try / except amqp.exceptions.InvalidCommand)
- change the timeout value.
Unfortunately, neither seems like the best way to solve it.
Can you help me?
Regards
PS:
my_packages:
Flask==2.0.2
python==3.6
celery==4.4.5
rabbitmq==latest
1. PreconditionFailed
I changed my RabbitMQ version from latest to 3.8.14.
Then I set up a Celery task timeout using time_limit and soft_time_limit.
And it works :)
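For reference, a minimal sketch of what such a timeout configuration can look like; the exact limits depend on your tasks, the 300/360 second values here are only illustrative:
# celery_app.py (excerpt), illustrative values only
celery_app.conf.task_soft_time_limit = 300  # SoftTimeLimitExceeded is raised after 5 minutes
celery_app.conf.task_time_limit = 360       # the task is force-terminated after 6 minutes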
2. InvalidCommand
To resolve this problem, I used Celery's retry functionality.
I set up:
max_retries=3
autoretry_for=(InvalidCommand,)
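Roughly, assuming the task is declared with the decorator API instead of register_task, those options can be attached like this (the retry_backoff option is an extra assumption, not required):
from amqp.exceptions import InvalidCommand

@celery_app.task(bind=True,
                 autoretry_for=(InvalidCommand,),  # retry automatically on this exception
                 max_retries=3,                    # give up after three attempts
                 retry_backoff=True)               # wait progressively longer between retries
def client_task(self, a, b):
    return a + b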

Unit test a listener in Sanic without starting the app server

Assuming I have this listener defined in my Sanic app:
@app.before_server_start
async def db_setup(*args):
    ...  # set up the DB as I wish for the app
If I want to unit test this function (with pytest) and I import it in a unit test file with from my.app import db_setup, it seems the test actually starts serving the app, as pytest outputs:
[INFO] Goin' Fast @ http://0.0.0.0:8000
[INFO] Starting worker [485]
Now, I know that I can remove the effects of the decorator by doing db_setup = db_setup.__wrapped__, but in order to do this I actually need to import db_setup, which is where the Sanic server fires up.
Is there a way of removing the effects of the decorator at import time?
Later edit: I've tried patching the Sanic app as follows:
async def test_stuff(mocker):
    mocker.patch('myapp.app.app')  # the last '.app' being `app = Sanic('MyApp')`
    imp = importlib.import_module('myapp.app')
    db_setup = getattr(imp, 'db_setup')
    await db_setup()
but now I get a RuntimeError: Cannot run the event loop while another loop is running for the mocker.patch('myapp.app.app') line.
I am going to make a few assumptions here, so I may need to amend this answer if there are some clarifications.
Before starting, it should be noted that the decorator itself will not start your web server. That will run in one of two scenarios:
You are running app.run() somewhere in the global scope
You are using the Sanic TestClient, which specifically operates by running your application's web server
Now, from what I can understand, you are trying to run db_setup in a test manually by calling it as a function, but you do not want it to attach as a listener to the application in your tests.
You can get access to all of your application instance's listeners in the app.listeners property. Therefore one solution would be something like this:
# conftest.py
import pytest

from some.place import app as myapp

@pytest.fixture
def app():
    myapp.listeners = {}
    return myapp
Like I said earlier, this will just empty out your listeners. It does not actually impact your application starting, so I am not sure it has the utility that you are looking for.
You should be able to have something like this:
from unittest.mock import Mock

import pytest
from sanic import Request, Sanic, json

app = Sanic(__name__)

@app.get("/")
async def handler(request: Request):
    return json({"foo": "bar"})

@app.before_server_start
async def db_setup(app, _):
    app.ctx.db = 999

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_sample():
    await db_setup(app, Mock())
    assert app.ctx.db == 999
For the sake of ease, it is all in the same scope, but even if the test functions, the application instance, and the listener are spread across different modules, the end result is the same: You can run db_setup as any other function and it does not matter if it is registered as a listener or not.

How to get flask request context in celery task?

I have a Flask server running within gunicorn.
In my Flask application I want to handle large file uploads (>20 GB), so I plan on letting a Celery task do the handling of the large file.
The problem is that retrieving the file from request.files already takes quite long, and in the meantime gunicorn terminates the worker handling that request. I could increase the timeout, but the maximum file size is currently unknown, so I don't know how much time I would need.
My plan was to make the request context available to the Celery task, as described here: http://xion.io/post/code/celery-include-flask-request-context.html, but I cannot make it work.
Q1: Is the signature right?
I set the signature with
celery.signature(handle_large_file, args={}, kwargs={})
and nothing is complaining. I get the arguments I pass from the Flask request handler to the Celery task, but that's it. Should I somehow get a handle to the context here?
Q2: How do I use the context?
I would have thought that if the Flask request context were available I could just use request.files in my code, but then I get a warning that I am out of context.
Using celery 4.4.0
Code:
# in celery.py:
from flask import request
from celery import Celery

celery = Celery('celery_worker',
                backend=Config.CELERY_RESULT_BACKEND,
                broker=Config.CELERY_BROKER_URL)

@celery.task(bind=True)
def handle_large_file(task_object, data):
    # do something with the large file...
    # what I'd like to do:
    files = request.files['upfile']
    ...

celery.signature(handle_large_file, args={}, kwargs={})

# in main.py
def create_app():
    app = Flask(__name__.split('.')[0])
    ...
    celery_worker.conf.update(app.config)

# copied from the blog post
class RequestContextTask(Task): ...
celery_worker.Task = RequestContextTask

# in Controller.py
@FILE.route("", methods=['POST'])
def upload():
    data = dict()
    ...
    handle_large_file.delay(data)
What am I missing?

GAE - cron job failing, with no error message in logs

I have been trying to run a cron job with GAE (code developed in Python), but when I trigger the job, it fails without any error message -- I can't find anything at all in the logs.
This is happening for a service for which I'm using the flexible environment.
This is the structure of my files:
my_service.yaml looks like this:
service: my_service
runtime: custom
env: flex

env_variables:
  a:
  b:
and my_service.py looks like this:
from __future__ import absolute_import
from flask import Flask
from flask import request
import logging
import datetime
import os
import tweepy
from google.cloud import datastore
import time

logging.basicConfig(level=logging.INFO)

app = Flask(__name__)

@app.route('/Main')
def hello():
    """A no-op."""
    return 'nothing to see.'

@app.route('/my_service')
def get_service():
    is_cron = request.headers.get('X-Appengine-Cron', False)
    logging.info("is_cron is %s", is_cron)
    # Comment out the following test to allow non cron-initiated requests.
    if not is_cron:
        return 'Blocked.'
    # ... data scraping and saving in Datastore
    return 'Done.'

@app.errorhandler(500)
def server_error(e):
    logging.exception('An error occurred during a request.')
    return """
    An internal error occurred: <pre>{}</pre>
    See logs for full stacktrace.
    """.format(e), 500

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=8080, debug=True)
Then I have a dispatch.yaml with this structure:
dispatch:
- url: "*/my_service*"
  service: my_service
And a cron.yaml:
cron:
- description: run my service
  url: /my_service
  schedule: 1 of month 10:00
  target: my_service
Not sure what I'm doing wrong here.
EDIT
A bit of context: this is something I'm editing, starting from this repo.
The service called backend that is defined in there works perfectly (it also has the same schedule in its cron job as my_service, but when I trigger it on a day different from the one on which it's scheduled, it works just fine). What I did was create an additional service with its own yaml file, which looks exactly the same as backend.yaml, and its own my_service.py, and add it to dispatch.yaml and cron.yaml. In theory this should work, since the structure is exactly the same, but it doesn't.
This service was originally developed in the standard environment, where it was working; the problem appeared when I moved it to the flex environment.
EDIT 2:
The problem was actually in the Dockerfile, which was calling a service that I was not using.
EDIT:
def get(self): may have some issues.
First, get may be reserved. Second, you aren't able to send self to that function. Change that to:
def get_service():
EDIT2:
You also need to import logging at the top of any file that uses it. And you have not imported Flask and its components:
from flask import Flask, request, render_template # etc...
import logging
Your 1 of month 10:00 cron schedule specification is most likely the culprit: it tells the job to run at 10:00 only on the first day of each month! From Defining the cron job schedule:
Example: For the 1,2,3 of month 07:00 schedule, the job runs one time at 07:00 on the first three days of each month.
So the last execution happened 3 days ago (if this cron config was deployed at the time) and no other attempt will happen until Nov 1st :)
Change the schedule to something easier to test with, like every 5 minutes or every 1 hours, and revert the change once you're happy it works as expected.
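For example, a temporary cron.yaml entry for testing (reusing the url and target from your question) could look like this:
cron:
- description: run my service (temporary test schedule)
  url: /my_service
  schedule: every 5 minutes  # revert to the real schedule once verified
  target: my_service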

Google App Engine Python Cron

I've been trying for a few days now to get Google App Engine to run a cron Python script which will simply execute a script hosted on a server of mine.
It doesn't need to post any data to the page; it simply has to open a connection, wait for it to finish, then email me.
The code I previously wrote was logged as "successful", but I never got an email, nor did I see any of the logging.info output I added to test things.
Ideas?
The original and wrong code that I wrote can be found at Google AppEngine Python Cron job urllib - just so you know I have attempted this before.
A mix of weird things was happening here.
Firstly, in app.yaml I had to place my /cron handler before the root handler:
handlers:
- url: /cron
  script: assets/backup/main.py
- url: /
  static_files: assets/index.html
  upload: assets/index.html
Otherwise I'd get crazy errors about not being able to find the file. That bit actually makes sense.
The next bit was the Python code. Not sure what was going on here, but in the end I managed to get it working by doing this:
#!/usr/bin/env python
from google.appengine.ext import webapp
from google.appengine.api import mail
from google.appengine.ext.webapp.util import run_wsgi_app
from google.appengine.api import urlfetch
import logging

class CronMailer(webapp.RequestHandler):
    def get(self):
        logging.info("Backups: Started!")
        urlStr = "http://example.com/file.php"
        rpc = urlfetch.create_rpc()
        urlfetch.make_fetch_call(rpc, urlStr)
        mail.send_mail(sender="example@example.com",
                       to="email@example.co.uk",
                       subject="Backups complete!",
                       body="Daily backups have been completed!")
        logging.info("Backups: Finished!")

application = webapp.WSGIApplication([('/cron', CronMailer)], debug=True)

def main():
    run_wsgi_app(application)

if __name__ == '__main__':
    main()
Whatever it was causing the problems, it's now fixed.
