AppEngine MapReduce NDB, Repeated InternalError: internal error - python

We're trying to use MapReduce heavily in our project. The problem is that the log contains a lot of 'InternalError: internal error.' errors.
One example:
"POST /mapreduce/worker_callback HTTP/1.1" 500 0 "http://appname/mapreduce/worker_callback" "AppEngine-Google;
(+http://code.google.com/appengine)" "appname.appspot.com" ms=18856 cpu_ms=15980
queue_name=default task_name=appengine-mrshard-15828822618486744D69C-11-195
instance=00c61b117c47e0cba49bc5e5c7f9d328693e95ce
W 2012-10-24 06:51:27.140
suspended generator _put_tasklet(context.py:274) raised InternalError(internal error.)
W 2012-10-24 06:51:27.153
suspended generator put(context.py:703) raised InternalError(internal error.)
E 2012-10-24 06:51:27.207
internal error.
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~appname/1.362664407983567993/mapreduce/base_handler.py", line 65, in post
self.handle()
File "/base/data/home/apps/s~appname/1.362664407983567993/mapreduce/handlers.py", line 208, in handle
ctx.flush()
File "/base/data/home/apps/s~appname/1.362664407983567993/mapreduce/context.py", line 333, in flush
pool.flush()
File "/base/data/home/apps/s~appname/1.362664407983567993/mapreduce/context.py", line 221, in flush
self.__flush_ndb_puts()
File "/base/data/home/apps/s~appname/1.362664407983567993/mapreduce/context.py", line 239, in __flush_ndb_puts
ndb.put_multi(self.ndb_puts.items, config=self.__create_config())
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3650, in put_multi
for future in put_multi_async(entities, **ctx_options)]
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 325, in get_result
self.check_success()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 703, in put
key = yield self._put_batcher.add(entity, options)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 368, in _help_tasklet_along
value = gen.throw(exc.__class__, exc, tb)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 274, in _put_tasklet
keys = yield self._conn.async_put(options, datastore_entities)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 454, in _on_rpc_completion
result = rpc.get_result()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 834, in get_result
result = rpc.get_result()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 604, in get_result
return self.__get_result_hook(self)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1569, in __put_hook
self.check_rpc_success(rpc)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1224, in check_rpc_success
raise _ToDatastoreError(err)
InternalError: internal error.
queue.yaml:
queue:
- name: default
  rate: 500/s
  bucket_size: 100
  max_concurrent_requests: 400
  retry_parameters:
    min_backoff_seconds: 5
    max_backoff_seconds: 120
    max_doublings: 2
MapReduce mapper params:
'shard_count': 16,
'processing_rate': 200,
'batch_size': 20
We would like to increase these numbers because we need faster processing, but whenever we raise them the error rate goes up.
Blobstore Files Count: several (some of them contain millions of lines)
Frontend Instance Class: F4
Processing flow:
We use only mapper for this particular processing.
We use BlobstoreLineInputReader (the blob contains a text file).
Each line represents a new entry that we need to create if it does not already exist (some of them we update).
My questions are:
How can we avoid these errors?
Are there any tips/hints on how we can choose/balance mapper params (shard_count, processing_rate, batch_size)?
What happens with the job, does it get retried (if so, how can we control it?) or not?
BTW, we tried to play with some of the suggestions provided here (control batch_size) but we still see this.

This looks like a timeout error. Check your logs to see how long that process runs before it happens.
If it is, try reducing the number of items you pass to put_multi() (i.e. reduce your batch size) and adding a timer check, so that when your average time per put_multi() call gets close to the process time limit you quit and let another one start.
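A minimal sketch of that timer check, under stated assumptions: put_batch is a hypothetical callable standing in for ndb.put_multi, and the 540-second default is only a guess at a safe budget that you would tune to your own request deadline. The function tracks the average duration of the calls made so far and stops as soon as one more average-length call would cross the limit.

```python
import time


def put_in_batches(entities, put_batch, batch_size=20, time_limit=540.0,
                   clock=time.time):
    """Write entities in small batches, stopping before the time budget runs out.

    put_batch is a hypothetical stand-in for ndb.put_multi; time_limit is an
    assumed budget in seconds. Returns the entities that were NOT written,
    so that a follow-up task can pick them up.
    """
    start = clock()
    calls = 0
    position = 0
    while position < len(entities):
        elapsed = clock() - start
        average = elapsed / calls if calls else 0.0
        # Quit if one more average-length call would cross the limit.
        if elapsed + average >= time_limit:
            break
        put_batch(entities[position:position + batch_size])
        position += batch_size
        calls += 1
    return entities[position:]
```

Whatever comes back from put_in_batches can be handed to a new task, which is the "let another one start" part of the advice.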


GAE xmpp app shows invalid JID error

I'm currently testing the appengine-crowdguru-python app by sending XMPP messages from http://localhost:8000/xmpp, which has a form for posting data. I filled in the from, to and chat (message) fields:
From : avinash@app-live.appspotchat.com
To : ajin@app-live.appspotchat.com
Chat: /tellme Who is Clinton?
Here app-live is the app ID that is currently live. I also changed the from and to fields to IDs like avi@xmpp.jp, an account created through https://www.xmpp.jp/signup, but it still shows invalid JID.
ERROR 2016-06-06 08:45:32,157 wsgi.py:280]
Traceback (most recent call last):
File "/home/gemini/softwares/google_appengine/google/appengine/runtime/wsgi.py", line 268, in Handle
result = handler(dict(self._environ), self._StartResponse)
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 1519, in __call__
response = self._internal_error(e)
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/home/gemini/softwares/google_appengine/google/appengine/ext/webapp/xmpp_handlers.py", line 63, in handle_exception
super(BaseHandler, self).handle_exception(exception, debug_mode)
File "/home/gemini/softwares/google_appengine/lib/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/home/gemini/PycharmProjects/appengine-crowdguru-python/guru.py", line 222, in post
super(XmppHandler, self).post()
File "/home/gemini/softwares/google_appengine/google/appengine/ext/webapp/xmpp_handlers.py", line 73, in post
self.message_received(self.xmpp_message)
File "/home/gemini/softwares/google_appengine/google/appengine/ext/webapp/xmpp_handlers.py", line 118, in message_received
handler(message)
File "/home/gemini/PycharmProjects/appengine-crowdguru-python/guru.py", line 302, in tellme_command
message.reply(WAIT_MSG)
File "/home/gemini/softwares/google_appengine/google/appengine/api/xmpp/__init__.py", line 515, in reply
message_type=message_type, raw_xml=raw_xml)
File "/home/gemini/softwares/google_appengine/google/appengine/api/xmpp/__init__.py", line 346, in send_message
raise InvalidJidError()
InvalidJidError
A common cause of JID errors is a resource that was not defined (in the login phase). A JID is composed as user@server/resource, and a "full JID" includes the resource, so if the resource is null, code that expects a full JID can hit a null value and raise an error.
How to handle it:
SOLUTION 1: retrieve just the "bare JID".
SOLUTION 2: define a resource (a custom name that represents your client).
Hope that helps.
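The two solutions can be sketched in plain Python. This is a simplified illustration, not the App Engine XMPP API: the helper names and the default resource "myclient" are hypothetical, and the parser ignores edge cases such as domain-only JIDs.

```python
def split_jid(jid):
    """Split a JID of the form user@server/resource into its three parts.

    The resource is returned as None when the JID is a bare JID.
    """
    bare, _, resource = jid.partition("/")
    user, sep, server = bare.partition("@")
    if not sep or not user or not server:
        raise ValueError("not a valid JID: %r" % (jid,))
    return user, server, resource or None


def bare_jid(jid):
    """Return user@server, dropping any resource (solution 1)."""
    user, server, _ = split_jid(jid)
    return "%s@%s" % (user, server)


def full_jid(jid, default_resource="myclient"):
    """Return user@server/resource, filling in a resource if missing (solution 2)."""
    user, server, resource = split_jid(jid)
    return "%s@%s/%s" % (user, server, resource or default_resource)
```

Either helper gives the calling code a JID in a known shape, so it never has to handle a missing resource itself.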

unable to configure apprtc.appspot with own url

This is the error I get when I try to configure apprtc with my own URL. I tried to set up my own TURN server and also tried to give a client URL, but it still did not work.
<HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/esuioswebrtc/datasets/prod/tables/analytics/insertAll?alt=json returned "Not Found: Table esuioswebrtc:prod.analytics">
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1511, in __call__
rv = self.handle_exception(request, response, e)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 547, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~esuioswebrtc/2.382445032671238924/apprtc.py", line 503, in post
result = add_client_to_room(self.request, room_id, client_id, is_loopback)
File "/base/data/home/apps/s~esuioswebrtc/2.382445032671238924/apprtc.py", line 373, in add_client_to_room
host=request.host)
File "/base/data/home/apps/s~esuioswebrtc/2.382445032671238924/analytics.py", line 114, in report_event
analytics.report_event(*args, **kwargs)
File "/base/data/home/apps/s~esuioswebrtc/2.382445032671238924/analytics.py", line 94, in report_event
body=obj).execute()
File "/base/data/home/apps/s~esuioswebrtc/2.382445032671238924/third_party/oauth2client/util.py", line 132, in positional_wrapper
return wrapped(*args, **kwargs)
File "/base/data/home/apps/s~esuioswebrtc/2.382445032671238924/third_party/apiclient/http.py", line 723, in execute
raise HttpError(resp, content, uri=self.uri)
HttpError: <HttpError 404 when requesting https://www.googleapis.com/bigquery/v2/projects/esuioswebrtc/datasets/prod/tables/analytics/insertAll?alt=json returned "Not Found: Table esuioswebrtc:prod.analytics">
In order to use insertAll to stream data into a table, you must first create the table and give it the schema you will use.
You should pre-create the table out of band from your streaming insert process, since the rate limits on these APIs differ drastically. For scenarios where you stream data into daily tables, we recommend that you pre-create the next few days of tables with some sort of cron process so they are ready for data before you need to stream into them.
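For the daily-table case, a cron job first has to work out which tables are still missing. The sketch below shows only that step. The "analytics_" prefix and the YYYYMMDD suffix are a hypothetical naming scheme, and the actual table creation (with the schema) through the BigQuery API is deliberately left out.

```python
import datetime


def tables_to_precreate(today, existing, days_ahead=3, prefix="analytics_"):
    """List daily table names for today and the next few days that do not exist yet.

    The prefix and the YYYYMMDD suffix are a hypothetical naming scheme;
    existing is any collection of table names already present.
    """
    wanted = [
        prefix + (today + datetime.timedelta(days=offset)).strftime("%Y%m%d")
        for offset in range(days_ahead + 1)
    ]
    return [name for name in wanted if name not in existing]
```

The cron handler would then create each returned table before any insertAll call needs it.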

sqlalchemy exception mysql is returning version as tuple instead of string

I have a very confusing problem with sqlalchemy. This isn't the first time I've used sqlalchemy. I'm working on a new project we just stood up from scratch, so it is possible there is a configuration error. I include the python connector package below from our pip reqs file in case it is of interest:
mysql-connector-repackaged==0.3.1
I just created a basic unit test to evaluate our sql wrapper class. The method under evaluation is an add_user class, which simply adds a user to the database. The class first runs a query to see if the user exists. This query is not succeeding.
Here is the code for the query:
q = self.session.query(User).\
filter_by(name=name, email=email)
result = q.all()
Seems simple enough, right? I'll include my connection string below in case anyone is interested:
db = create_engine('mysql+mysqlconnector://{user}:{password}@{host}:{port}/{database}'.format(user=conn.USER, password=conn.PASS, host=conn.HOST, port=conn.PORT, database=database), poolclass=NullPool)
The database is empty, although the table structure does exist. When this query is run it throws an exception. The exception reads as below:
StatementError: expected string or buffer (original cause: TypeError: expected string or buffer) u"SHOW VARIABLES LIKE 'sql_mode'" []
This exception comes from the python regular expression class re.py. The end of the stack trace looks like this:
File "/home/vagrant/GitRepos/SqlInteraction/venv/lib/python2.7/re.py", line 137, in match
return _compile(pattern, flags).match(string)
If I break on that line I can see the problem. The string object which the _compile method is running is not a string at all. It is a tuple. The tuple looks like this: (0, 3, 1, '', '')
Now, while it is obvious why that method is breaking, it is not at all obvious why a tuple is being passed in to that method. All of that happens internally to third party libraries. I include the full stack trace below for reference:
Traceback (most recent call last):
File "/home/vagrant/GitRepos/SqlInteraction/test/databasetest.py", line 14, in setUp
userId = self.add_user(Constants.TestObjects.USER_ID, Constants.TestObjects.USER_EMAIL, Constants.TestObjects.USER_PHONE)
File "/home/vagrant/GitRepos/SqlInteraction/test/databasetest.py", line 27, in add_user
result = q.all()
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2320, in all
return list(self)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2438, in __iter__
return self._execute_and_instances(context)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2451, in _execute_and_instances
close_with_result=True)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2442, in _connection_from_session
**kw)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 854, in connection
close_with_result=close_with_result)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 858, in _connection_for_bind
return self.transaction._connection_for_bind(engine)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 322, in _connection_for_bind
conn = bind.contextual_connect()
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1798, in contextual_connect
self.pool.connect(),
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 338, in connect
return _ConnectionFairy._checkout(self)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 644, in _checkout
fairy = _ConnectionRecord.checkout(pool)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 440, in checkout
rec = pool._do_get()
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 963, in _do_get
return self._create_connection()
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 285, in _create_connection
return _ConnectionRecord(self)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/pool.py", line 416, in __init__
exec_once(self.connection, self)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/event/attr.py", line 250, in exec_once
self(*args, **kw)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/event/attr.py", line 260, in __call__
fn(*args, **kw)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 1219, in go
return once_fn(*arg, **kw)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/strategies.py", line 166, in first_connect
dialect.initialize(c)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2453, in initialize
self._detect_ansiquotes(connection)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/base.py", line 2718, in _detect_ansiquotes
connection.execute("SHOW VARIABLES LIKE 'sql_mode'"),
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 721, in execute
return self._execute_text(object, multiparams, params)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 870, in _execute_text
statement, parameters
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 893, in _execute_context
None, None)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1159, in _handle_dbapi_exception
exc_info
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/util/compat.py", line 199, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 889, in _execute_context
context = constructor(dialect, self, conn, *args)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/engine/default.py", line 635, in _init_statement
if not dialect.supports_unicode_statements and \
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 725, in __get__
obj.__dict__[self.__name__] = result = self.fget(obj)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqlconnector.py", line 97, in supports_unicode_statements
return util.py3k or self._mysqlconnector_version_info > (2, 0)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 725, in __get__
obj.__dict__[self.__name__] = result = self.fget(obj)
File "/home/vagrant/GitRepos/SqlInteraction/venv/local/lib/python2.7/site-packages/sqlalchemy/dialects/mysql/mysqlconnector.py", line 131, in _mysqlconnector_version_info
self.dbapi.__version__)
File "/home/vagrant/GitRepos/SqlInteraction/venv/lib/python2.7/re.py", line 137, in match
return _compile(pattern, flags).match(string)
StatementError: expected string or buffer (original cause: TypeError: expected string or buffer) u"SHOW VARIABLES LIKE 'sql_mode'" []
I have no idea why this problem is occurring. I've bounced around the net with no useful results. I'm hoping someone here at SO might have some idea what is going on.

Google app engine datastore timeout on admin task

I have created a backend for my google app that looks like this:
backends:
- name: dbops
  options: dynamic
and I've created an admin handler for it:
- url: /backend/.*
  script: backend.app
  login: admin
Now I understand that admin jobs should be able to run forever and I'm launching this job with a TaskQueue, but for some reason mine is not. My job is simply creating a summary table in datastore from a much larger table. This table holds about 12000 records and it takes several minutes for it to process the job on the development server, but it works fine. When I push the code out to appspot and try to get it to run the same job, I'm getting what looks like datastore timeouts.
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1536, in __call__
rv = self.handle_exception(request, response, e)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1530, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1278, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 1102, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 572, in dispatch
return self.handle_exception(e, self.app.debug)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.5.1/webapp2.py", line 570, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~myzencoder/dbops.362541511260492787/backend.py", line 626, in get
for asset in assets:
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/db/__init__.py", line 2314, in next
return self.__model_class.from_entity(self.__iterator.next())
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 2816, in next
next_batch = self.__batcher.next()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 2678, in next
return self.next_batch(self.AT_LEAST_ONE)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 2715, in next_batch
batch = self.__next_batch.get_result()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 604, in get_result
return self.__get_result_hook(self)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_query.py", line 2452, in __query_result_hook
self._batch_shared.conn.check_rpc_success(rpc)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1224, in check_rpc_success
raise _ToDatastoreError(err)
Timeout: The datastore operation timed out, or the data was temporarily unavailable.
Anyone got any suggestions on how to make this work?
While the backend request can run for a long time, a single query can only run for 60 seconds. You'll have to loop over your query results with cursors, so that each query fetches only one page and the next one resumes where it stopped.
Mapreduce will get you a result quicker by doing the queries in parallel.
In production you use the HR datastore and you can run into contention problems. See this article.
https://developers.google.com/appengine/articles/scaling/contention?hl=nl
And have a look at mapreduce for creating a report. Maybe this is a better solution.
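The cursor loop mentioned above can be sketched generically. This is an illustration under assumptions, not the asker's code: fetch_page is a hypothetical callable that returns (results, next_cursor, more), which is the shape ndb's Query.fetch_page returns, and the page size of 200 is arbitrary.

```python
def process_in_pages(fetch_page, handle, page_size=200):
    """Walk a large result set one short query at a time.

    fetch_page(page_size, cursor) is a hypothetical callable that must return
    (results, next_cursor, more). handle is called once per result.
    Returns the number of results processed.
    """
    cursor = None
    more = True
    total = 0
    while more:
        # Each call is a fresh, short query that resumes at the cursor.
        results, cursor, more = fetch_page(page_size, cursor)
        for item in results:
            handle(item)
        total += len(results)
    return total
```

With ndb, fetch_page could be a small wrapper around query.fetch_page(page_size, start_cursor=cursor). The old db API used in the traceback exposes cursors differently, so the wrapper would need adapting there.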

AppEngine MapReduce NDB, DeadlineExceededError

We're trying to use MapReduce heavily in our project.
The problem is that there are lots of 'DeadlineExceededError' errors in the log.
One example (the traceback differs a bit each time):
Traceback (most recent call last):
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 207, in Handle
result = handler(dict(self._environ), self._StartResponse)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/python27_runtime/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/base_handler.py", line 65, in post
self.handle()
File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/handlers.py", line 208, in handle
ctx.flush()
File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 333, in flush
pool.flush()
File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 221, in flush
self.__flush_ndb_puts()
File "/base/data/home/apps/s~sba/1.362471299468574812/mapreduce/context.py", line 239, in __flush_ndb_puts
ndb.put_multi(self.ndb_puts.items, config=self.__create_config())
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/model.py", line 3625, in put_multi
for future in put_multi_async(entities, **ctx_options)]
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 323, in get_result
self.check_success()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 318, in check_success
self.wait()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 302, in wait
if not ev.run1():
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 219, in run1
delay = self.run0()
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/eventloop.py", line 181, in run0
callback(*args, **kwds)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/tasklets.py", line 365, in _help_tasklet_along
value = gen.send(val)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/ext/ndb/context.py", line 274, in _put_tasklet
keys = yield self._conn.async_put(options, datastore_entities)
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1560, in async_put
for pbs, indexes in pbsgen:
File "/base/python27_runtime/python27_lib/versions/1/google/appengine/datastore/datastore_rpc.py", line 1350, in __generate_pb_lists
incr_size = pb.lengthString(pb.ByteSize()) + 1
DeadlineExceededError
My questions are:
How can we avoid this Error?
What happens with the job, does it get retried (if so how can we control it?) or not ?
Does it cause data inconsistency in the end?
Apparently you are doing more puts than fit into one datastore call. You have multiple options here:
If this is a relatively rare event - ignore it. Mapreduce will retry the slice and will lower put pool size. Make sure that your map is idempotent.
Take a look at http://code.google.com/p/appengine-mapreduce/source/browse/trunk/python/src/mapreduce/context.py - in your main.py you can lower DATASTORE_DEADLINE, MAX_ENTITY_COUNT or MAX_POOL_SIZE to lower the size of the pool for the whole mapreduce.
If you're using an InputReader, you might be able to adjust the default batch_size to reduce the number of entities processed by each task.
I believe the task queue will retry tasks, but you probably don't want it to, since it'll likely hit the same DeadlineExceededError.
Data inconsistencies are possible.
See this question as well.
App Engine - Task Queue Retry Count with Mapper API
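Because a failed slice may be retried, the advice above to keep the map idempotent matters in practice. One way to get there, sketched under assumptions, is to derive each entry's key from the input line itself so that a retry overwrites the same entry instead of creating a duplicate. Here store is a plain dict standing in for the datastore, and both function names are hypothetical.

```python
import hashlib


def entry_key(line):
    """Derive a stable key from the input line, so a retry hits the same entry."""
    return hashlib.sha1(line.strip().encode("utf-8")).hexdigest()


def map_line(line, store):
    """Create or update the entry for one input line.

    store is a plain dict standing in for the datastore. Running this twice
    on the same line leaves the store in the same state as running it once.
    """
    store[entry_key(line)] = {"text": line.strip()}
```

With keys chosen this way, a slice that is processed twice writes the same entities twice, which is harmless.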
