The documentation for GAE's ndb.put_multi is severely lacking. NDB Entities and Keys - Python - Google Cloud Platform shows that it returns a list of keys (list_of_keys = ndb.put_multi(list_of_entities)), but it says nothing about failures. NDB Functions doesn't provide much more information.
Spelunking through the code (below) shows me that, at least for now, put_multi just aggregates the Future.get_result()s returned from the async method, which itself delegates to the entities' put code. Now, the docs for the NDB Future Class indicate that a result will be returned or else an exception will be raised. I've been told, however, that the result will be None if a particular put failed (I can't find any authoritative documentation to that effect, but if it's anything like db.get then that would make sense).
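For reference, here is roughly what the SDK code does (my paraphrase from reading the source, so treat it as approximate rather than authoritative):

# Paraphrased from google.appengine.ext.ndb (approximate, not verbatim)

def put_multi_async(entities, **ctx_options):
    # One put_async() per entity; each call returns a Future.
    return [entity.put_async(**ctx_options) for entity in entities]

def put_multi(entities, **ctx_options):
    # Block on each Future in order and collect the resulting keys.
    return [future.get_result()
            for future in put_multi_async(entities, **ctx_options)]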
So all of this boils down to some questions I can't find the answers to:
Clearly, the return value is a list - is it a list with some elements possibly None? Or are exceptions used instead?
When there is an error, what should be re-put? Can all entities be re-put (idempotent), or only those whose return value is None (if that's even how errors are communicated)?
How common are errors (One answer: 1/3000)? Do they show up in logs (because I haven't seen any)? Is there a way to reliably simulate an error for testing?
Usage of the function in an open source library implies that the operation is idempotent, but that's about it. (Other usages don't even bother checking the return value or catching exceptions.)
Handling Datastore Errors makes no mention of anything but exceptions.
I agree with your reading of the code: put_multi() reacts to an error the same way put_async().get_result() does. If put() would raise an exception, put_multi() will also, and will be unhelpful about which of the multiple calls failed. I'm not aware of a circumstance where put_multi() would return None for some entries in the key list.
You can re-put entities that have been put, assuming no other user has updated those entities since the last put attempt. Entities that are created with system-generated IDs have their in-memory keys updated, so re-putting these would overwrite the existing entities and not create new ones. I wouldn't call it idempotent exactly because the retry would overwrite any updates to those entities made by other processes.
Of course, if you need more control over the result, you can perform this update in a transaction, though all entities would need to be in the same entity group for this to work with a primitive transaction. (Cross-group transactions support up to five distinct entity groups.) A failure during a transaction would ensure that none of the entities are created or updated if any of the attempts fail.
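As a rough sketch (the entity list and retry count are placeholders), an all-or-nothing version might look like this:

from google.appengine.ext import ndb

def put_all_or_nothing(entities):
    # Run all the puts in one cross-group transaction: either every
    # entity is written or none of them are. xg=True allows up to five
    # distinct entity groups.
    return ndb.transaction(lambda: ndb.put_multi(entities), xg=True, retries=3)

If the transaction ultimately fails it raises and nothing has been written, so the whole batch can simply be attempted again.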
I don't know a general error rate for update failures. Such failures are most likely to include contention errors or "hot tablets" (too many updates to nearby records in too short a time, resulting in a timeout), which would depend on app behavior. All such errors are reported by the API as exceptions.
The easiest way to test error handling call paths would be to wrap the Model class and override the methods with a test mode behavior. You could get fancier and dig into the stub API used by testbed, which may have a way to hook into low-level calls and simulate errors. (I don't think this is a feature of testbed directly but you could use a similar technique.)
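For example (the names here are hypothetical, and mock is just one convenient way to do the wrapping), something like this exercises the failure path without touching the datastore stub at all:

import unittest

import mock  # third-party on Python 2; unittest.mock on Python 3

from google.appengine.api import datastore_errors
from google.appengine.ext import ndb

def save_batch(entities):
    # Stand-in for the real code under test (hypothetical).
    return ndb.put_multi(entities)

class PutMultiErrorHandlingTest(unittest.TestCase):
    def test_timeout_surfaces_as_exception(self):
        # Force the batch put to raise a datastore Timeout and check
        # that the calling code reacts the way you expect.
        with mock.patch.object(ndb, 'put_multi',
                               side_effect=datastore_errors.Timeout()):
            with self.assertRaises(datastore_errors.Timeout):
                save_batch([])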
Recently I had a problem when a coworker changed the signature of a function's return value. We have clients that call the function this way:
example = function()
But then, as I was depending on his changes, he unintentionally changed it to this:
example, other_stuff = function()
I was not aware of this change; I did the merge and everything seemed OK, but then the error happened because I was expecting one value while the call was now trying to unpack two.
So my question is: knowing Python is not a statically typed language, is there a way to detect and prevent this kind of change (a tool or something)? Sadly, it wasn't until a runtime error was raised that I noticed it. How should we handle this?
Sounds like a process error. An API shouldn't change its signature without considering its users. Within a project it's easy: just search. Externally, the API version number should be bumped and the change should be in the change notes.
The API should have unit tests which include return value tests. "he unintentionally changed" issues should all be caught there. Since this didn't happen, a bug report against the tests should be written.
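As a sketch (the module and function names are made up), even a trivial return-shape test would have caught this:

import unittest

from mymodule import function  # hypothetical module under test

class ReturnShapeTest(unittest.TestCase):
    def test_function_returns_a_single_value(self):
        result = function()
        # Fails loudly the day the function starts returning a tuple
        # such as (example, other_stuff).
        self.assertNotIsInstance(result, tuple)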
Of course, the coworker could just change those tests. But all of that should be in a code reviewed change set in your source repository. The coworker should have to justify the change and how to mitigate breakage. Since this API appears to have external clients, it should be very difficult to get an API signature change as all clients will need to be notified.
What is the recommended way to store a Python exception – in a structured way that allows access to the different parts of that exception – in a Django model?
It is common to design a Django model that records “an event” or “an attempt to do foo”; part of the information to be recorded is “… and the result was this error”. That then becomes a field on the model.
An important aspect of recording the error in the database is to query it in various structured ways: What was the exception type? What was the specific message? What were the other arguments to the exception instance? What was the full traceback?
(I'm aware of services that will let me stream logging or errors to them across the network; that is not what this question asks. I need the structured exception data in the project's own database, for reference directly from other models in the same database.)
The Python exception object is a data structure that has all of these parts – the exception type, the arguments to the exception instance, the “cause” and “context”, the traceback – available separately, as attributes of the object. The objective of this question is to have the Python exception information stored in the database model, such that they can be queried in an equivalent structured manner.
So it isn't enough to have a free-form TextField to record a string representation of the exception. A structured field or set of fields is needed; perhaps even an entire separate model for storing exception instances.
Of course I could design such a thing myself, and spend time making the same mistakes countless people before me have undoubtedly made in trying to implement something like this. Instead, I want to learn from existing art in solving this problem.
What is a general pattern to store structured Python exception data in a Django model, preferably in an existing mature general-purpose library at PyPI that my project can use for its models?
I am not so sure that many people have sought a custom way of storing exception data.
Ultimately, you have to know what you need. In all the projects I have worked on so far, the traceback text has contained enough information to dig down to the error source. (Sometimes the traceback had even been naively escaped multiple times, yet it was still enough to "reverse escape" it and fix the source with only one instance of the error.)
Some debugging tools offer live interactive introspection of each execution frame when an exception happens - but that has to be "live", because you can't ordinarily serialize Python execution frames and store them in the DB.
That said, if the traceback text is not enough for you, you can get the traceback object from sys.exc_info()[2]. That allows you to introspect each frame and see the file, line number, and local and global variables as dictionaries. If you want the variable values to be available in the database, you have to serialize them, but not all values will be easily serializable. So it is your call to know 'when enough is enough' in this process.
Most modern databases allow for JSON fields, and serializing the exception info to a JSON field, restricting the data to strings, numbers, booleans and None, is probably enough.
One way is to manually go over each key in the f_globals and f_locals dicts of each frame, try to json.dumps that key's value, and on a JSON serialization exception use the object's repr instead. Or you can write a custom JSON serializer that stores relevant data for datetimes and dates, open files, sockets and so on - only you can know your needs.
TL;DR: Use a JSON field; in the except clause get hold of the traceback object via sys.exc_info()[2], and have a custom function serialize what you want from it into the JSON field.
Or, just use traceback.format_tb to save the traceback text - it probably will be enough anyway.
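A minimal sketch of the fuller JSON approach described above (function and key names are just placeholders; adapt the serialization to your needs):

import json
import sys
import traceback

def _jsonable(value):
    # Use json.dumps as a probe; fall back to repr() for anything that
    # is not JSON serializable.
    try:
        json.dumps(value)
        return value
    except (TypeError, ValueError):
        return repr(value)

def exception_as_dict():
    # Call this inside an except clause.
    exc_type, exc_value, tb = sys.exc_info()
    frames = []
    while tb is not None:
        frame = tb.tb_frame
        frames.append({
            'file': frame.f_code.co_filename,
            'line': tb.tb_lineno,
            'function': frame.f_code.co_name,
            'locals': {k: _jsonable(v) for k, v in frame.f_locals.items()},
        })
        tb = tb.tb_next
    return {
        'type': exc_type.__name__,
        'args': [_jsonable(a) for a in exc_value.args],
        'traceback_text': traceback.format_exc(),
        'frames': frames,
    }

The resulting dict contains only JSON-serializable values, so it can go straight into a JSON field.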
Most people delegate tasks like this to third-party services like Rollbar or Logentries, or to middleware services running in a VPC, like Elastic's Logstash.
I'd suggest that the Django ORM is not well suited for this type of storage. Most of the advantage of the ORM is letting you join on relational tables without elaborate SQL and manage schema changes and referential integrity with migrations. These are the problems encountered by the "CRUD" applications Django is designed for — web applications with user accounts, preferences, notification inboxes, etc. The ORM is intended to manage mutable data with more reads than writes.
Your problem, storing Python exceptions that happened in production, is quite different. Your schema needs will almost never change. The data you want to store is never modified at all once written, and all writes are strictly appends. Your data does not contain foreign keys or other fields that would change in a migration. You will almost always query recent data over historical, which will rarely be read outside of offline/bulk analytics.
If you really want to store this information in Django, I'd suggest storing only a rolling window that you periodically rotate into compressed logs on disk. Otherwise you will be maintaining costly indexes on data that is almost never needed. In that case, you should consider constructing your own custom Django model that extracts the exception metadata you need. You could also put this information into a JSON field stored as a string, as @jsbueno suggests, but this sacrifices indexed selection.
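A bare-bones version of such a model might look like this (the field names are my own invention):

from django.db import models

class LoggedException(models.Model):
    # Structured pieces of the exception, each queryable on its own.
    raised_at = models.DateTimeField(auto_now_add=True, db_index=True)
    exception_type = models.CharField(max_length=255, db_index=True)
    message = models.TextField(blank=True)
    args_json = models.TextField(blank=True)       # JSON-encoded exception args
    traceback_text = models.TextField(blank=True)  # output of traceback.format_exc()

You would populate it from an except clause and index only the columns you actually filter on.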
(Note that Python exceptions cannot be directly serialized to JSON or pickled. There is a project called tblib that enables pickling, which in turn could be stored in BLOB fields in Django, but I have no idea whether the performance would be reasonable. My guess is that it would not be worth it.)
In recent years there are many alternative DBMS products for log-like, append-only storage with analytic query patterns. But most of this progress is too recent and too "web-scale" to have off-the-shelf integration with Django, which focuses on smaller, more traditional CRUD applications. You should look for solutions that can be integrated with Python more generally, as complex logging/event storage is mostly out-of-scope for Django.
I'm trying to achieve strong consistency with ndb using python.
And it looks like I'm missing something, because my reads behave as if they're not strongly consistent.
The query is:
links = Link.query(ancestor=lead_key).filter(
    Link.last_status == None).fetch(keys_only=True)
if links:
    do_action()
The key structure is:
Lead root (generic key) -> Lead -> Website (one per lead) -> Link
I have many tasks that are executed concurrently using TaskQueue, and this query is performed at the end of every task. Sometimes I get a "too much contention" exception when updating the last_status field, but I deal with it using retries. Can it break strong consistency?
The expected behavior is having do_action() called when there are no links left with last_status equal to None. The actual behavior is inconsistent: sometimes do_action() is called twice and sometimes not called at all.
Using an ancestor key to get strong consistency has a limitation: you're limited to one update per second per entity group. One way to work around this is to shard the entity groups. Sharding Counters describes the technique. It's an old article, but as far as I know, the advice is still sound.
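A rough sketch of the technique transplanted to your Link entities (the kind name and shard count are made up): writes pick a random shard parent, and reads fan out over the shards with one strongly consistent ancestor query each.

import random

from google.appengine.ext import ndb

NUM_SHARDS = 20  # tune for your write rate

class Link(ndb.Model):  # same shape as the model from the question
    last_status = ndb.StringProperty()

def shard_key(lead_id):
    # Spread writes for one logical lead over NUM_SHARDS entity groups,
    # so no single group takes more than ~1 update per second.
    return ndb.Key('LeadShard', '%s-%d' % (lead_id, random.randint(0, NUM_SHARDS - 1)))

def pending_link_keys(lead_id):
    keys = []
    for shard in range(NUM_SHARDS):
        ancestor = ndb.Key('LeadShard', '%s-%d' % (lead_id, shard))
        keys.extend(Link.query(ancestor=ancestor)
                        .filter(Link.last_status == None)
                        .fetch(keys_only=True))
    return keys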
Adding to Dave's answer, which is the first thing to check:
One thing which isn't well documented and can be a bit surprising is that contention can be caused by read operations as well, not only by writes.
Whenever a transaction starts, the entity groups being accessed (by read or write ops, it doesn't matter) are marked as such. The too much contention error indicates that too many parallel transactions are simultaneously trying to access the same entity group. It can happen even if none of the transactions actually attempts to write!
Note: this contention is NOT emulated by the development server, it can only be seen when deployed on GAE, with the real datastore!
What can add to the confusion is the automatic retrying of transactions, which can happen after both actual write conflicts and plain access contention. These retries may appear to the end user as suspicious repeated execution of some code paths - which I suspect could explain your reports of do_action() being called twice.
Usually when you run into such problems you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience it doesn't :)
There is nothing in your sample that ensures that your code is only called once.
For the moment, I am going to assume that your "do_action" function does something to the Link entities, specifically that it sets the "last_status" property.
If you do not perform the query and the write to the Link Entity inside a transaction, then it is possible for two different requests (task queue tasks) to get results back from the query, then both write their new value to the Link entity (with the last write overwriting the previous value).
Remember that even if you do use a transaction, you don't know until the transaction is successfully completed that nobody else tried to perform a write. This is important if you are trying to do something external to datastore (for example, making a http request to an external system), as you may see http requests from transactions that would eventually fail with a concurrent modification exception.
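Roughly, and only as a sketch against your key structure (the function and parameter names are mine), the query and the write belong inside a single transaction, something like this:

from google.appengine.ext import ndb

class Link(ndb.Model):  # same shape as the model from the question
    last_status = ndb.StringProperty()

@ndb.transactional
def finish_link_and_check(link_key, lead_key, new_status):
    # Read, modify and write inside one transaction; a conflicting
    # concurrent update makes the whole transaction retry.
    link = link_key.get()
    link.last_status = new_status
    link.put()

    # Queries inside the transaction see the snapshot taken at its
    # start, so they will not reflect the put() above; treat our own
    # key as already done.
    pending = Link.query(ancestor=lead_key).filter(
        Link.last_status == None).fetch(keys_only=True)
    return all(key == link_key for key in pending)

The caller fires do_action() only when this returns True, and the caveat above about external side effects still applies if the task is retried after the commit.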
As expected, I'm getting a CypherExecutionException. I would like to catch it, but I can't seem to find the import for it.
Where is it?
How do I find it myself next time?
Depending on which version of py2neo you're using, and which Cypher endpoint - legacy or transactional - this may be one of the auto-generated errors built dynamically from the server response. Newer functionality (i.e. the transaction endpoint) no longer does this and instead holds hard-coded definitions for all exceptions for just this reason. This wasn't possible for the legacy endpoint when the full list of possible exceptions was undocumented.
You should, however, be able to catch py2neo.error.GraphError instead, which is the base class these dynamic errors inherit from. You can then inspect the attributes of that error for more specific checking.
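Something along these lines (adjust the imports and the call to your py2neo version; the handling itself is just a sketch) should work:

from py2neo import Graph
from py2neo.error import GraphError  # base class for the dynamic errors

graph = Graph()

try:
    graph.cypher.execute("MATCH (n) RETURN n LIMIT 1")  # whatever statement fails for you
except GraphError as error:
    # The dynamically generated subclasses are named after the server-side
    # exception, so the class name (or the error's attributes) can be used
    # to narrow the handling down.
    if error.__class__.__name__ == "CypherExecutionException":
        pass  # handle the Cypher failure here
    else:
        raise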
Two code examples (simplified):
.get outside the transaction (object from .get passed into the transactional function)
@db.transactional
def update_object_1_txn(obj, new_value):
    obj.prop1 = new_value
    return obj.put()
.get inside the transaction
@db.transactional
def update_object2_txn(obj_key, new_value):
    obj = db.get(obj_key)
    obj.prop1 = new_value
    return obj.put()
Is the first example logically sound? Is the transaction there useful at all? Does it provide anything? I'm trying to better understand App Engine's transactions. Would choosing the second option prevent concurrent modifications of that object?
To answer your question in one word: yes, your second example is the way to do it. Within the boundaries of a transaction, you get some data, change it, and commit the new value.
Your first one is not wrong, though, because you don't read from obj. So even though it might not have the same value that the earlier get returned, you wouldn't notice. Put another way: as written, your examples aren't good at illustrating the point of a transaction, which is usually called "test and set". See a good Wikipedia article on it here: http://en.wikipedia.org/wiki/Test-and-set
More specific to GAE, as defined in GAE docs, a transaction is:
a set of Datastore operations on one or more entities. Each transaction is guaranteed to be atomic, which means that transactions are never partially applied. Either all of the operations in the transaction are applied, or none of them are applied.
which tells you it doesn't have to be just for test-and-set; it could also be useful for ensuring the batch commit of several entities, etc.