Ndb strong consistency and frequent writes - python

I'm trying to achieve strong consistency with ndb in Python, but it looks like I'm missing something, because my reads behave as if they're not strongly consistent.
The query is:
links = Link.query(ancestor=lead_key).filter(Link.last_status == None).fetch(keys_only=True)
if links:
    do_action()
The key structure is:
Lead root (generic key) -> Lead -> Website (one per lead) -> Link
I have many tasks that are executed concurrently using TaskQueue, and this query is performed at the end of every task. Sometimes I get a "too much contention" exception when updating the last_status field, but I deal with it using retries. Can that break strong consistency?
The expected behavior is having do_action() called when there are no links left with last_status equal to None. The actual behavior is inconsistent: sometimes do_action() is called twice and sometimes not called at all.

Using an ancestor key to get strong consistency has a limitation: you're limited to one update per second per entity group. One way to work around this is to shard the entity groups. Sharding Counters describes the technique. It's an old article, but as far as I know, the advice is still sound.
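A minimal sketch of that idea, assuming a hypothetical LeadShard root kind and shard count (the helper names here are illustrative, not from the question; only Link and last_status come from the original post):

import random
from google.appengine.ext import ndb

N_SHARDS = 20  # hypothetical; pick based on your peak write rate

def shard_root_key(lead_id, shard_id):
    # Each shard gets its own root key, so each shard is a separate
    # entity group and can absorb roughly one write per second on its own.
    return ndb.Key('LeadShard', '%s-%d' % (lead_id, shard_id))

def put_link(lead_id, **props):
    # Spread writes across shards at random so no single group gets hot.
    parent = shard_root_key(lead_id, random.randrange(N_SHARDS))
    return Link(parent=parent, **props).put()

def pending_link_keys(lead_id):
    # Strongly consistent reads now have to fan out over every shard.
    keys = []
    for shard_id in range(N_SHARDS):
        q = Link.query(ancestor=shard_root_key(lead_id, shard_id))
        keys.extend(q.filter(Link.last_status == None).fetch(keys_only=True))
    return keys

The trade-off is the usual one for sharding: writes contend less, but reads have to visit every shard.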

Adding to Dave's answer, which covers the first thing to check:
One thing that isn't well documented and can be a bit surprising is that contention can be caused by read operations as well, not only by writes.
Whenever a transaction starts, the entity groups it accesses (by read or write ops, it doesn't matter) are marked as such. The "too much contention" error indicates that too many parallel transactions are simultaneously trying to access the same entity group. It can happen even if none of the transactions actually attempts a write!
Note: this contention is NOT emulated by the development server; it can only be seen when deployed on GAE, with the real datastore!
What can add to the confusion is the automatic retrying of transactions, which can happen after either actual write conflicts or plain access contention. These retries may appear to the end user as suspicious repeated execution of some code paths - which I suspect could explain your report of do_action() being called twice.
Usually when you run into such problems you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience it doesn't :)

There is nothing in your sample that ensures that your code is only called once.
For the moment, I am going to assume that your "do_action" function does something to the Link entities, specifically that it sets the "last_status" property.
If you do not perform the query and the write to the Link entity inside a transaction, then it is possible for two different requests (task queue tasks) to get results back from the query and then both write their new value to the Link entity (with the last write overwriting the previous value).
Remember that even if you do use a transaction, you don't know until the transaction has successfully completed that nobody else tried to perform a write. This is important if you are trying to do something external to the datastore (for example, making an HTTP request to an external system), as you may see HTTP requests from transactions that eventually fail with a concurrent modification exception.
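As a sketch of what that transaction could look like (the 'done' status value and the use of a deferred task for the side effect are assumptions, not from the question):

from google.appengine.ext import deferred, ndb

@ndb.transactional
def finish_link(lead_key, link_key):
    # Everything here touches lead_key's entity group, so the ancestor
    # query is allowed inside the transaction and sees current data.
    link = link_key.get()
    link.last_status = 'done'  # hypothetical terminal status value
    link.put()

    remaining = Link.query(ancestor=lead_key).filter(
        Link.last_status == None).count(limit=1)
    if remaining == 0:
        # Enqueue the side effect transactionally: the task is only
        # created if this commit succeeds, so do_action() runs once.
        deferred.defer(do_action, _transactional=True)

Transactional tasks are only enqueued on commit (and are limited to five per transaction), and the transaction itself can still raise on contention, so the caller still needs the retry handling you already have.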

Related

Multiple "finally" clauses/"with" statements added on the fly?

Let's say that I am writing code where I need a fail-safe section that - in case of a failure - will save me from wasting money. For example, code that buys virtual servers via the AWS API; since those are paid per hour, it's preferable to shut them down as soon as they stop being useful.
The problem is, I don't know how many such instances I will use, and I would create them on the fly, adding them to some list or whatnot. The "destructor" of each instance might raise an unexpected exception, and because of that I'm afraid that any code with a finally clause would be ugly. I also can't think of how I would use with, because I would be introducing objects on the fly. What other fail-safe solutions could I use with Python?
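For what it's worth, one pattern that fits this shape (a sketch, not from the original post) is contextlib.ExitStack, which lets you register cleanup callbacks on the fly as each resource is created; the start_server, shutdown_server and do_work helpers below are hypothetical:

from contextlib import ExitStack  # Python 3.3+; the contextlib2 package backports it

def run_job(instance_ids):
    with ExitStack() as stack:
        servers = []
        for instance_id in instance_ids:
            server = start_server(instance_id)       # hypothetical helper
            # Register the shutdown as soon as the server exists; callbacks
            # run LIFO on exit, and the rest still run even if one raises.
            stack.callback(shutdown_server, server)  # hypothetical helper
            servers.append(server)
        do_work(servers)                              # hypothetical helper
    # By this point every registered shutdown has run, success or failure.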

How to handle errors from ndb.put_multi

The documentation for GAE's ndb.put_multi is severely lacking. NDB Entities and Keys - Python — Google Cloud Platform shows that it returns a list of keys (list_of_keys = ndb.put_multi(list_of_entities)), but it says nothing about failures. NDB Functions doesn't provide much more information.
Spelunking through the code (below) shows me that, at least for now, put_multi just aggregates the Future.get_result()s returned from the async method, which itself delegates to the entities' put code. Now, the docs for the NDB Future Class indicate that a result will be returned or else an exception will be raised. I've been told, however, that the result will be None if a particular put failed (I can't find any authoritative documentation to that effect, but if it's anything like db.get then that would make sense).
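For reference, the body in question is tiny; paraphrased from the SDK's ndb/model.py (check your SDK version for the exact form), it is roughly:

def put_multi_async(entities, **ctx_options):
    # One put_async() future per entity, issued without waiting.
    return [entity.put_async(**ctx_options) for entity in entities]

def put_multi(entities, **ctx_options):
    # Blocks on every future; any stored exception is raised here.
    return [future.get_result()
            for future in put_multi_async(entities, **ctx_options)]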
So all of this boils down to some questions I can't find the answers to:
Clearly, the return value is a list - is it a list with some elements possibly None? Or are exceptions used instead?
When there is an error, what should be re-put? Can all entities be re-put (idempotent), or only those whose return value are None (if that's even how errors are communicated)?
How common are errors (One answer: 1/3000)? Do they show up in logs (because I haven't seen any)? Is there a way to reliably simulate an error for testing?
Usage of the function in an open source library implies that the operation is idempotent, but that's about it. (Other usages don't even bother checking the return value or catching exceptions.)
Handling Datastore Errors makes no mention of anything but exceptions.
I agree with your reading of the code: put_multi() reacts to an error the same way put_async().get_result() does. If put() would raise an exception, put_multi() will also, and will be unhelpful about which of the multiple calls failed. I'm not aware of a circumstance where put_multi() would return None for some entries in the key list.
You can re-put entities that have been put, assuming no other user has updated those entities since the last put attempt. Entities that are created with system-generated IDs have their in-memory keys updated, so re-putting these would overwrite the existing entities and not create new ones. I wouldn't call it idempotent exactly because the retry would overwrite any updates to those entities made by other processes.
Of course, if you need more control over the result, you can perform this update in a transaction, though all entities would need to be in the same entity group for this to work with an ordinary (single-group) transaction. (Cross-group transactions support up to five distinct entity groups.) A failed transaction ensures that none of the entities are created or updated.
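A minimal sketch of that, assuming your entities span at most five entity groups:

from google.appengine.ext import ndb

@ndb.transactional(xg=True)  # cross-group transaction, at most five entity groups
def put_all_or_nothing(entities):
    # Either every put is applied on commit, or none are if anything raises.
    return ndb.put_multi(entities)

If the call raises (for example on contention), nothing was written, so retrying the whole call is safe as long as the entities are rebuilt from fresh reads.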
I don't know a general error rate for update failures. Such failures are most likely to include contention errors or "hot tablets" (too many updates to nearby records in too short a time, resulting in a timeout), which would depend on app behavior. All such errors are reported by the API as exceptions.
The easiest way to test error handling call paths would be to wrap the Model class and override the methods with a test mode behavior. You could get fancier and dig into the stub API used by testbed, which may have a way to hook into low-level calls and simulate errors. (I don't think this is a feature of testbed directly but you could use a similar technique.)
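One sketch of the "wrap the Model class" idea; the failure flag and the chosen exception are purely illustrative, not a real ndb feature:

from google.appengine.api import datastore_errors
from google.appengine.ext import ndb

class FlakyModel(ndb.Model):
    # Test-only base class: tests flip this flag to make the next put() fail.
    _fail_next_put = False

    def put(self, **ctx_options):
        if FlakyModel._fail_next_put:
            FlakyModel._fail_next_put = False
            raise datastore_errors.TransactionFailedError('simulated contention')
        return super(FlakyModel, self).put(**ctx_options)

In tests you would have your models inherit from FlakyModel instead of ndb.Model (or patch put() with a mock), set _fail_next_put = True, and assert that your retry or error-handling path runs.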

is it a good practice to store data in memory in a django application?

I am writing a reusable Django application that returns JSON results for jQuery UI autocomplete.
Currently I am storing the class/function for getting the result in a dictionary, with a unique key for each class/function.
When a request comes in, I select the corresponding class/function from the dict and return the output.
My question is whether this is the best practice, or whether there are other tricks to obtain the same result.
Sample GIST : https://gist.github.com/ajumell/5483685
You seem to be talking about a form of memoization.
This is OK, as long as you don't rely on that result being in the dictionary. This is because the memory will be local to each process, and you can't guarantee subsequent requests being handled by the same process. But if you have a fallback where you generate the result, this is a perfectly good optimization.
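A sketch of that "memo plus fallback" shape, using Django's cache framework as the shared layer; compute_result is a hypothetical expensive function:

from django.core.cache import cache

_local_results = {}  # per-process memo; may be empty in a freshly spawned worker

def get_result(key):
    if key in _local_results:          # fastest path, but process-local only
        return _local_results[key]
    result = cache.get(key)            # shared cache (memcached, redis, ...)
    if result is None:
        result = compute_result(key)   # hypothetical fallback that recomputes
        cache.set(key, result, timeout=300)
    _local_results[key] = result
    return result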
That's a very general question. It depends primarily on the infrastructure of your code: the way your classes and models are defined and the dynamics of the application.
Second, it's important to take into account the resources of the server where your application is running: how much memory and disk space you have available, so you can judge what would be better for the application.
Last but not least, it's important to consider how much work it takes to put all these resources in memory. Memory is volatile, so if your application restarts you'll have to instantiate all the classes again, and maybe that's too much work.
Summing up, keeping frequently queried objects in memory is a very good optimization (that's what caching is all about), but you have to take all of the above into account.
Storing a series of functions in a dictionary and conditionally selecting one based on the request is a perfectly acceptable way to handle it.
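A bare-bones version of that dispatch pattern (the view and the get_cities/get_users callables are illustrative, not from the gist):

from django.http import JsonResponse

# Registry mapping a request parameter to the callable producing suggestions.
AUTOCOMPLETE_SOURCES = {
    'cities': get_cities,  # hypothetical callables returning lists of strings
    'users': get_users,
}

def autocomplete(request):
    source = AUTOCOMPLETE_SOURCES.get(request.GET.get('source'))
    if source is None:
        return JsonResponse([], safe=False)  # unknown source: empty suggestions
    return JsonResponse(source(request.GET.get('term', '')), safe=False)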
If you would like a more specific answer it would be very helpful to post your actual code. And secondly, this might be better suited to codereview.stackexchange

Conflict resolution in ZODB

I run parallel write requests against my ZODB, which contains multiple BTree instances. When the server accesses the same objects inside such a BTree, I get a ConflictError for the IOBucket class. For all my Django-based classes I have _p_resolveConflict set up, but I can't implement it for IOBucket because it's a C-based class.
I did a deeper analysis, but I still don't understand why it complains about the IOBucket class and what it writes into it. Additionally, what would be the right strategy to resolve it?
Thousand thanks for any help!
IOBucket is part of the persistence structure of a BTree; it exists to try and reduce conflict errors, and it does try and resolve conflicts where possible.
That said, conflicts are not always avoidable, and you should restart your transaction. In Zope, for example, the whole request is re-run up to 5 times if a ConflictError is raised. Conflicts are ZODB's way of handling the (hopefully rare) occasion where two different requests tried to change the exact same data structure.
Restarting your transaction means calling transaction.begin() and applying the same changes again. The .begin() will fetch any changes made by the other process and your commit will be based on the fresh data.
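A sketch of that retry loop with the transaction package (apply_changes is whatever callable re-reads the fresh state and re-applies your mutation):

import transaction
from ZODB.POSException import ConflictError

def commit_with_retries(apply_changes, attempts=5):
    for _ in range(attempts):
        try:
            apply_changes()          # re-read current state, re-apply the change
            transaction.commit()
            return
        except ConflictError:
            # Drop our stale state; the next attempt starts from fresh data.
            transaction.abort()
    raise RuntimeError('gave up after %d conflicting attempts' % attempts)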

how to test in case of eventual consistency?

I have an app (Google App Engine + High Replication Datastore) which was not using eventual consistency (high replication) until now, and all my tests worked perfectly. Now, for local testing with high replication, as soon as I moved to eventual consistency, they began to fail. How do I prevent that? Or how do I test that part?
I need it for cross-entity transactions.
I am using something similar to https://developers.google.com/appengine/docs/python/tools/localunittesting#Writing_HRD_Datastore_Tests
Edit:
I need to test the code correctly. The problem I have is with the testing part. How does anyone test eventual consistency?
Edit 1:
I have temporarily solved the problem by using probability=100% in the example linked above, but ideas are welcome.
Fix the failures.
Since you have no code and are very vague, it's hard to answer your question. But essentially either your app code or your tests are not taking eventual consistency into account (i.e., a query may not return a value that was just updated in the datastore). When you turned on eventual consistency in the datastore, the query results you get will be different.
You either need to update your code to handle the eventual consistency situations with transactions, or update your tests to expect eventual consistency results.
Edit
This question is still too general. It depends on whether you're doing, say, functional or system testing. Are you looking for particular results? Or just an HTTP status=200?
In general, like all testing, you need to identify what constitutes success and what constitutes a failure case. In a given situation, is it acceptable for old data to appear? In that case, the test should succeed with either the old or new values.
I'd recommend starting out considering whether you want to run deterministic or non-deterministic tests. For deterministic tests, you'd essentially want to run through the same tests with probability=0 and probability=100, and ensure you get the correct values for both.
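The knob for that in the local unit-testing stub is PseudoRandomHRConsistencyPolicy; note that it takes a probability between 0 and 1 rather than a percentage. Something along these lines:

from google.appengine.datastore import datastore_stub_util
from google.appengine.ext import testbed

def make_testbed(probability):
    # probability=1.0: every write is immediately visible (fully consistent).
    # probability=0.0: unapplied writes stay invisible to non-ancestor queries,
    # which is the worst-case eventual-consistency behaviour to test against.
    tb = testbed.Testbed()
    tb.activate()
    policy = datastore_stub_util.PseudoRandomHRConsistencyPolicy(
        probability=probability)
    tb.init_datastore_v3_stub(consistency_policy=policy)
    tb.init_memcache_stub()
    return tb

Run the same test suite once with probability=1.0 and once with probability=0.0, and make sure both pass.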
I haven't figured out how to write non-deterministic tests in a completely useful manner, other than as a stress test. You can verify that certain required values are met, and other eventually-consistent values fall within a valid range. This is a lot of work, because most likely you have a range of values that may depend on another range of values, and since your final output may consist of both, you'll have to validate that the combinations are correct - essentially you end up reproducing some of your application logic if you really want to verify everything is correct.
The situation you are facing is one of the drawbacks (or call it a feature) of the High Replication Datastore. Usually these situations are tackled via transparent caching using memcache. If you have prior experience working with a DB master/slave architecture, slave lag is tackled in a similar manner.
