How to start a mapreduce job from cron on GAE Python

I have a mapreduce job defined in mapreduce.yaml:
mapreduce:
- name: JobName
  mapper:
    input_reader: google.appengine.ext.mapreduce.input_readers.DatastoreInputReader
    handler: handler_name
    params:
    - name: entity_kind
      default: KindName
How do I start it from cron? Is there a URL that can trigger it?

You can start a mapreduce task from any kind of App Engine handler using control.py:
from mapreduce import control

mapreduce_id = control.start_map(
    "My Mapper",
    "main.my_mapper",
    "mapreduce.input_readers.DatastoreInputReader",
    {"entity_kind": "models.MyEntity"},
    shard_count=10)
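Since any handler can make this call, you can expose it behind a URL that cron hits. Below is a minimal sketch of such a handler as a plain WSGI app; the start_map call is left as a comment because it only runs inside App Engine, and the /start_job path is a made-up example:

```python
def start_job(environ, start_response):
    """WSGI handler that a cron entry could point at to launch the job."""
    if environ.get('PATH_INFO') == '/start_job':
        # Inside App Engine you would call here:
        # mapreduce_id = control.start_map(
        #     "My Mapper",
        #     "main.my_mapper",
        #     "mapreduce.input_readers.DatastoreInputReader",
        #     {"entity_kind": "models.MyEntity"},
        #     shard_count=10)
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [b'mapreduce started']
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']
```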

Yes, if you look at the Getting Started page, it shows that you set the URL in your app.yaml:
handlers:
- url: /mapreduce(/.*)?
  script: mapreduce/main.py
  login: admin
You can then schedule it with cron in the usual App Engine fashion, which in this example means writing a cron.yaml like this:
cron:
- description: daily summary job
  url: /mapreduce
  schedule: every 24 hours

Related

Serverless framework can't connect to existing REST API in AWS

Hello, and thanks in advance for the help.
I'm trying to connect my serverless file to an existing REST API in AWS, but when I deploy it fails with this message:
CREATE_FAILED: ApiGatewayResourceOtherversion (AWS::ApiGateway::Resource)
Resource handler returned message: "Invalid Resource identifier specified"
Here is the configuration in my serverless file and the API in the cloud:
service: test-api-2
frameworkVersion: '3'

provider:
  name: aws
  region: us-east-1
  runtime: python3.8
  apiGateway:
    restApiId: 7o3h7b2zy5
    restApiRootResourceId: "/second"

functions:
  hello_oscar:
    handler: test-api/handler.hello_oscar
    events:
      # every Monday at 03:15 AM
      - schedule: cron(15 3 ? * MON *)
      #- sqs: arn:aws:sqs:region:XXXXXX:MyFirstQueue
    package:
      include:
        - test-api/**
  get:
    handler: hexa/application/get/get.get_information
    memorySize: 128
    description: Test function
    events:
      - http:
          path: /hola
          method: GET
          cors: true
    package:
      include:
        - hexa/**
  other_version:
    handler: other_version/use_other.another_version
    layers:
      - xxxxxxxxx
    runtime: python3.7
    description: Uses other version of python3.7
    events:
      - http:
          path: /other_version
          method: POST
          cors: true
    package:
      include:
        - other_version/**
  diferente:
    handler: other_version/use_other.another_version
    layers:
      - xxxxxxxxxxxxxx
    runtime: python3.8
In the example serverless.yml, the restApiRootResourceId property is set to /second. It should instead be set to the root resource id of the API, which your screenshot shows as bt6nd8xw4l.
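Put together, the corrected provider block would look something like this (assuming bt6nd8xw4l really is the root resource id shown in the console; the comment marks the changed line):

```yaml
provider:
  name: aws
  region: us-east-1
  runtime: python3.8
  apiGateway:
    restApiId: 7o3h7b2zy5
    restApiRootResourceId: bt6nd8xw4l  # the id of the API's root "/" resource, not a path
```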

Adding warmup requests to App Engine on Python 3

When deploying my app to App Engine Standard's Python 3 runtime, how can I avoid request latency during an update to a new version, or starting new instances? Can I create some type of "warmup request"?
It's possible to configure custom warmup requests for your app. First, add the inbound_services directive and a corresponding handler in your app.yaml file (on the Python 3 runtime any script entry must be auto):
inbound_services:
- warmup
handlers:
- url: /_ah/warmup
  script: auto
Then, define a warmup route in your main.py file (this example assumes Flask; the view must return a response, so return an empty 200 rather than pass):
@app.route('/_ah/warmup')
def warmup():
    """Warm up an instance of the app."""
    # For example, initiate a db connection
    return '', 200, {}
See https://cloud.google.com/appengine/docs/standard/python3/configuring-warmup-requests for more details.
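Any WSGI framework can serve the warmup route; without Flask, the same idea looks roughly like this (a minimal sketch, not taken from the official docs):

```python
def warmup():
    """Warm up an instance of the app, e.g. open database connections."""
    return '200 OK', b'warmed up'

# Hypothetical route table standing in for a framework's router.
ROUTES = {'/_ah/warmup': warmup}

def app(environ, start_response):
    """Bare WSGI app dispatching on the request path."""
    handler = ROUTES.get(environ.get('PATH_INFO'))
    if handler is None:
        start_response('404 Not Found', [('Content-Type', 'text/plain')])
        return [b'not found']
    status, body = handler()
    start_response(status, [('Content-Type', 'text/plain')])
    return [body]
```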

How to setup an AWS Elastic Beanstalk Worker Tier Environment + Cron + Django for periodic tasks? 403 Forbidden error

The app needs to run periodic background tasks to delete expired files. The app is up and running in a web server environment and in a worker tier environment.
A cron.yaml file is at the root of the app:
version: 1
cron:
- name: "delete_expired_files"
  url: "/networks_app/delete_expired_files"
  schedule: "*/10 * * * *"
The cron url points to an app view:
def delete_expired_files(request):
    users = DemoUser.objects.all()
    for user in users:
        documents = Document.objects.filter(owner=user.id)
        if documents:
            for doc in documents:
                now = timezone.now()
                if now >= doc.date_published + timedelta(days=doc.owner.group.valid_time):
                    doc.delete()
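The date arithmetic inside the loop is easy to get wrong, so it can help to pull it into a small pure function that is testable outside Django (is_expired is a hypothetical helper, not part of the original view):

```python
from datetime import datetime, timedelta, timezone

def is_expired(date_published, valid_days, now=None):
    """True once valid_days have elapsed since date_published."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= date_published + timedelta(days=valid_days)

# A document published 11 days ago with a 10-day validity window is expired:
published = datetime.now(timezone.utc) - timedelta(days=11)
print(is_expired(published, 10))  # True
```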
The Django ALLOWED_HOSTS setting is as follows:
ALLOWED_HOSTS = ['127.0.0.1', 'localhost', 'networksapp.elasticbeanstalk.com']
The task is being scheduled and the requests go to the right URL; however, they end up in the WorkerDeadLetterQueue.
The worker tier environment log file shows a 403 error:
"POST /networks_app/delete_expired_files HTTP/1.1" 403 1374 "-" "aws-sqsd/2.0"
The task is not being executed (expired files aren't being deleted). However, when I access the URL manually, it executes the task properly.
I need to make it work automatically and periodically.
My IAM user has these policies:
AmazonSQSFullAccess
AmazonS3FullAccess
AmazonDynamoDBFullAccess
AdministratorAccess
AWSElasticBeanstalkFullAccess
Why isn't the task being executed? Does this have to do with an IAM permission? Is there some missing configuration? How can I make it work? Thanks in advance.

Google App Engine Cron Job Succeeds but does nothing?

I have a Python cron job that, according to the logs, runs to completion without errors. However, none of the "logging.error()" messages I have included in the code appear in the log, and none of the required processing is done.
So that I can run it manually, I have a link in my HTML menu, "Assign Rental Payments Due", that does the required processing and logs error messages correctly.
----
Section of app.yaml:
- url: /rhrentassign.html
  script: frhrentassign.app
----
Full cron.yaml:
cron:
- description: Rental Payments Due
  url: /rhrentassign
  schedule: every day 14:00
----
Full Python code (file is frhrentassign.py):
import os
import logging
import webapp2
from CronRH import *

class rhrentassignhandler(webapp2.RequestHandler):
    def get(self):
        swork = trhrenttopaycron()
        swork.allnamespaces()

app = webapp2.WSGIApplication([('/rhrentassign.html', rhrentassignhandler)], debug=True)
----
Any thoughts on what I have done wrong would be most appreciated.
Many Thanks, David
Your handler is mapped to '/rhrentassign.html', but the cron job is requesting '/rhrentassign'.
Generally, unless you have a very good reason, there's no need to put '.html' in route names.
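The mismatch can be seen by comparing the two strings directly; a trivial self-contained illustration (the route list here is a stand-in, not webapp2 itself):

```python
cron_url = '/rhrentassign'          # what cron.yaml requests
routes = ['/rhrentassign.html']     # what the WSGIApplication maps

print(cron_url in routes)  # False: the cron request finds no handler

routes = ['/rhrentassign']          # after dropping the '.html' suffix
print(cron_url in routes)  # True: the handler's get() now runs
```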

Google App Engine deferred.defer() error 404

I'm trying to get a task running in the Task Queue using deferred.defer(). The task is added to the default task queue, but it fails with a 404 error.
This is the handler:
import webapp2
import models
import defer_ajust_utils
from google.appengine.ext import ndb
from google.appengine.ext import deferred

class ajust_utils(webapp2.RequestHandler):
    def get(self):
        deferred.defer(defer_ajust_utils.DoTheJob)

application = webapp2.WSGIApplication([('/ajust_utils', ajust_utils)], debug=True)
This is the module defer_ajust_utils:
import logging
import models
from google.appengine.ext import ndb

def DoTheJob():
    logging.info("Debut de la mise a jour des utilisateurs")
    utilisateurs = models.Utilisateur.query()
    utilisateurs = utilisateurs.fetch()
    for utilisateur in utilisateurs:
        utilisateur.produire_factures_i = False
        utilisateur.put()
    logging.info("Fin de la mise a jour des utilisateurs")
And my app.yaml file:
application: xxxx
version: dev
runtime: python27
api_version: 1
threadsafe: yes
builtins:
- deferred: on
handlers:
- url: /ajust_utils
  script: tempo_ajuster_utils.application
  login: admin
Here's the log :
0.1.0.2 - - [10/Mar/2014:17:50:45 -0700] "POST /_ah/queue/deferred HTTP/1.1" 404 113
"http://xxxx.appspot.com/ajust_utils" "AppEngine-Google;
(+http://code.google.com/appengine)" "xxxx.appspot.com" ms=6 cpu_ms=0
cpm_usd=0.000013 queue_name=default task_name=17914595085560382799
app_engine_release=1.9.0 instance=00c61b117c0b3648693af0563b92051423b3cb
Thank you for the help!
If you are using push-to-deploy with git, then when you add a 'builtins' entry to app.yaml, such as
builtins:
- deferred: on
you need to do a 'normal' gcloud deploy before you run the app. Otherwise the running app is not updated, which causes 404 errors for /_ah/queue/deferred.
There is an open bug for this, so vote for it and it may get fixed: https://code.google.com/p/googleappengine/issues/detail?id=10139
I was receiving the same error.
It appears to be a documentation defect with https://cloud.google.com/appengine/articles/deferred
I looked in the source code and found the following, which wasn't in any of the documentation:
In order for tasks to be processed, you need to set up the handler. Add the following to your app.yaml handlers section:
handlers:
- url: /_ah/queue/deferred
  script: $PYTHON_LIB/google/appengine/ext/deferred/handler.py
  login: admin
As I have threadsafe: true set in my app.yaml, I had to add the following handler instead:
- url: /_ah/queue/deferred
  script: google.appengine.ext.deferred.deferred.application
  login: admin
and then the deferred task queue began working and stopped 404'ing.
I think you have to add the option deferred: on to your app's builtins section in app.yaml. Here is an excerpt from
https://developers.google.com/appengine/docs/python/config/appconfig#Python_app_yaml_Builtin_handlers
builtins:
- deferred: on
The following builtin handlers are available:
admin_redirect - For production App Engine, this results in a redirect from /_ah/admin to the admin console. This builtin has no effect in the development web server.
appstats - Enables Appstats at /_ah/stats/, which you can use to measure your application's performance. In order to use Appstats, you also need to install the event recorder.
deferred - Enables the deferred handler at /_ah/queue/deferred. This builtin allows developers to use deferred.defer() to simplify the creation of Task Queue tasks. Also see Background work with the deferred library.
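For background on why that URL matters: deferred.defer() pickles the callable together with its arguments and enqueues the blob as a task that POSTs to /_ah/queue/deferred, where the built-in handler unpickles and runs it. A simplified, self-contained sketch of that round trip (not the real library code):

```python
import pickle

def defer(func, *args, **kwargs):
    """Serialize a call, roughly the way the deferred library does."""
    return pickle.dumps((func, args, kwargs))

def run_task(payload):
    """What the /_ah/queue/deferred handler does with a task body."""
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)

payload = defer(sorted, [3, 1, 2])
print(run_task(payload))  # [1, 2, 3]
```

This is also why the task 404s when the handler route is missing: the task body is fine, but nothing is listening at the URL the queue posts to.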
