I run parallel write requests against my ZODB, which contains multiple BTree instances. When the server accesses the same objects inside such a BTree, I get a ConflictError for the IOBucket class. For all my Django-based classes I have _p_resolveConflict set up, but I can't implement it for IOBucket because it's a C-based class.
I did a deeper analysis, but still don't understand why it complains about the IOBucket class and what it writes into it. Additionally, what would be the right strategy to resolve it?
A thousand thanks for any help!
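IOBucket is part of the persistence structure of a BTree: the tree keeps its keys and values in buckets, each a separate persistent object, precisely to reduce conflict errors, and buckets already try to resolve conflicts where possible. An assignment such as tree[key] = value modifies the bucket that holds that key, which is why the ConflictError names IOBucket. Concurrent changes to different keys in the same bucket can usually be merged; two transactions changing the same key cannot.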
That said, conflicts are not always avoidable, and you should restart your transaction. In Zope, for example, the whole request is re-run up to 5 times if a ConflictError is raised. Conflicts are ZODB's way of handling the (hopefully rare) occasion where two different requests tried to change the exact same data structure.
Restarting your transaction means calling transaction.begin() and applying the same changes again. The .begin() will fetch any changes made by the other process and your commit will be based on the fresh data.
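A minimal sketch of such a retry loop, assuming your write logic can be wrapped in a function that is safe to run more than once. The helper name run_with_retries and its parameters are hypothetical; with ZODB you would pass transaction.begin, transaction.commit, transaction.abort and ZODB.POSException.ConflictError. It aborts after every failure so the next attempt starts from freshly loaded data.

```python
import logging

log = logging.getLogger(__name__)


def run_with_retries(work, begin, commit, abort, retry_on, max_attempts=5):
    """Run work() in a fresh transaction, retrying when retry_on is raised.

    Hypothetical helper: with ZODB, pass transaction.begin, transaction.commit,
    transaction.abort and ZODB.POSException.ConflictError.
    """
    for attempt in range(1, max_attempts + 1):
        begin()  # with ZODB, a new transaction picks up other processes' commits
        try:
            result = work()
            commit()
            return result
        except retry_on:
            abort()  # throw away this attempt's changes before trying again
            if attempt == max_attempts:
                raise
            log.info("conflict on attempt %d of %d, retrying", attempt, max_attempts)
        except BaseException:
            abort()  # errors other than conflicts are not retried
            raise
    raise ValueError("max_attempts must be at least 1")
```

In ZODB code this would be called as run_with_retries(lambda: update_tree(root), transaction.begin, transaction.commit, transaction.abort, ConflictError), where update_tree stands for your own (hypothetical) function. The transaction package also offers its own helper for this pattern (transaction.manager.attempts()); check the documentation of the version you use.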
I'm new to Python and have been tasked with optimizing some code, and I'm trying to understand why my change has slowed things down. The code I'm working with is in a backend Flask app.
The change I made removed a temporary object that was used to store data before copying all of its fields to a MongoEngine document object. Previously, all fields were assigned to this temporary object, and then a conversion function cast each field to its proper data type for storage. Instead of using this temporary object, I instantiated the MongoEngine document directly and changed every line that assigned to the temporary object to assign to the document instead. I didn't add any lines; I only replaced existing ones.
When I checked the changes using the Werkzeug Application Profiler for Flask, it showed 336,897 calls to __setattr__() before the changes and 502,953 calls after the changes.
I'm just wondering whether there's any explanation for this other than me inadvertently increasing the calls somehow (I don't think that's the case, because I've reviewed the changes using git diff a few times and didn't notice anything).
I appreciate any help I can get. Sorry for not providing any code examples (I don't want to expose the company's code). However, if needed, I can try my best to write some example code to show what I did.
Before:
(profiler screenshot: __setattr__() calls before the changes)
After:
(profiler screenshot: __setattr__() calls after the changes)
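One way to investigate without sharing company code is to reproduce the pattern in a tiny script and count __setattr__ calls with cProfile. The sketch below uses two hypothetical classes: a plain object, and one whose __setattr__ is written in Python (as some ORM document classes do, for example for change tracking). Only the latter is reported as a Python-level __setattr__ call, so moving the same number of assignments from one kind of object to the other can change the reported count without adding any lines. This is an illustration of the measurement, not a confirmed explanation of the MongoEngine numbers.

```python
import cProfile
import pstats


class PlainRecord:
    """Hypothetical temporary object: assignments use the built-in object.__setattr__."""


class TrackedRecord:
    """Hypothetical stand-in for a document class whose __setattr__ is Python code."""

    def __init__(self):
        object.__setattr__(self, "_changed_fields", set())

    def __setattr__(self, name, value):
        # Change tracking runs on every assignment, even if the value is unchanged.
        self._changed_fields.add(name)
        super().__setattr__(name, value)


def fill(record, n_fields):
    """Assign n_fields attributes, the way a request handler might fill a record."""
    for i in range(n_fields):
        setattr(record, f"field_{i}", i)


def count_setattr_calls(record, n_fields):
    """Profile fill() and return how many Python-level __setattr__ calls cProfile saw."""
    profiler = cProfile.Profile()
    profiler.runcall(fill, record, n_fields)
    stats = pstats.Stats(profiler)
    total = 0
    for (filename, _lineno, funcname), (_cc, ncalls, _tt, _ct, _callers) in stats.stats.items():
        # Built-ins are reported with filename "~"; keep only Python functions.
        if filename != "~" and funcname.rsplit(".", 1)[-1] == "__setattr__":
            total += ncalls
    return total


if __name__ == "__main__":
    print("plain:  ", count_setattr_calls(PlainRecord(), 1000))
    print("tracked:", count_setattr_calls(TrackedRecord(), 1000))
```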
I'm trying to figure out when to use session.add and when to use session.add_all with SQLAlchemy.
Specifically, I don't understand the downsides of using add_all. It can do everything that add can do, so why not just always use it? There is no mention of this in the SQLAlchemy documentation.
If you only have one new record to add, use sqlalchemy.orm.session.Session.add(), but if you have multiple records, use sqlalchemy.orm.session.Session.add_all(). There's no significant difference, except that the first method's API takes a single instance whereas the second takes multiple instances. Is that a big difference? No. It's just convenience.
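A minimal sketch showing that both calls end up in the same place, assuming SQLAlchemy 1.4 or later and a hypothetical User model on an in-memory SQLite database: add() and add_all() only register objects as pending in the session, and nothing is written until the commit (or a flush).

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class User(Base):  # hypothetical model for the demo
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), nullable=False)


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(User(name="alice"))                          # a single instance
    session.add_all([User(name="bob"), User(name="carol")])  # an iterable of instances
    pending = len(session.new)  # all three are simply pending; nothing is written yet
    session.commit()
    total = session.scalar(select(func.count()).select_from(User))

print(pending, total)
```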
I was wondering the same thing and, as mentioned by others, there is no real difference. However, I would add that using add in a loop (flushing or committing each instance on its own) instead of add_all lets you handle exceptions at a finer grain. If you pass a list of mapped class instances to add_all and commit them together, one object violating a constraint (e.g., unique) causes a rollback for all of them. I prefer to decouple my data logic from my service logic: my data layer returns the instances it could not store, and the service layer decides what to do with them. However, I think it depends on how you are handling exceptions.
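A sketch of the difference this answer describes, assuming SQLAlchemy 1.4+ and a hypothetical Tag model with a unique name column; the function names are made up for the example. The granularity comes from where you commit (or flush), not from add() versus add_all() themselves: save_all_or_nothing commits the whole batch once, while save_each commits per object and returns the rejected ones for the service layer to handle.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, select
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Tag(Base):  # hypothetical model for the demo
    __tablename__ = "tags"
    id = Column(Integer, primary_key=True)
    name = Column(String(50), unique=True, nullable=False)


def save_all_or_nothing(session, tags):
    """add_all + one commit: a single constraint violation rolls back every tag."""
    session.add_all(tags)
    try:
        session.commit()
        return []
    except IntegrityError:
        session.rollback()
        return list(tags)


def save_each(session, tags):
    """add in a loop, committing per tag: returns only the tags that failed."""
    rejected = []
    for tag in tags:
        session.add(tag)
        try:
            session.commit()
        except IntegrityError:
            session.rollback()  # discards only this tag; earlier commits are kept
            rejected.append(tag)
    return rejected


def tag_count(session):
    return session.scalar(select(func.count()).select_from(Tag))


engine = create_engine("sqlite://")  # in-memory database
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Tag(name="python"))
    session.commit()

    batch_failed = save_all_or_nothing(session, [Tag(name="sql"), Tag(name="python")])
    after_batch = tag_count(session)  # "sql" was rolled back together with the duplicate

    each_failed = save_each(session, [Tag(name="sql"), Tag(name="python")])
    after_each = tag_count(session)  # only the duplicate "python" was rejected
```

Committing per object costs one transaction per row; if you prefer a single outer transaction, wrapping each add in a SAVEPOINT via Session.begin_nested() is the usual alternative.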
I am currently working on a Django 2+ project involving a blockchain, and I want to store copies of some of my objects' states in that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override each "delete", "create", and "update" of each of the models involved in that snapshot, and pray that all of them are implemented right and that I didn't forget any. But I think that this is not scalable at all, and hard to debug and to maintain.
I have thought of a solution that might be more centralized: using a periodic job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However with that solution, I can identify changes when objects are modified or created, but not when they are deleted.
So this is my big question mark: with Django, how can you identify deletions in relationships without any prior context, just by looking at the current state of the database? Is there a Django module to record deleted objects? What are your thoughts on my issue?
Hi!
As I understand your problem, you need something like Django's signals framework: the ORM sends a signal whenever a model instance is saved or deleted, and receivers you register can check whether the desired conditions are met and then run code in your application (including further database writes). Note that signals come from the ORM, not from the database itself, so changes made with raw SQL bypass them.
The documentation is here:
https://docs.djangoproject.com/en/3.1/topics/signals/
I'm trying to achieve strong consistency with ndb using Python.
It looks like I'm missing something, as my reads behave as if they're not strongly consistent.
The query is:
links = Link.query(ancestor=lead_key).filter(Link.last_status == None).fetch(keys_only=True)  # ndb filters need == None, not "is None"
if links:
    do_action()
The key structure is:
Lead root (generic key) -> Lead -> Website (one per lead) -> Link
I have many tasks that are executed concurrently using TaskQueue, and this query is performed at the end of every task. Sometimes I get a "too much contention" exception when updating the last_status field, but I deal with it using retries. Can that break strong consistency?
The expected behavior is having do_action() called when there are no links left with last_status equal to None. The actual behavior is inconsistent: sometimes do_action() is called twice and sometimes not called at all.
Using an ancestor key to get strong consistency has a limitation: you're limited to one update per second per entity group. One way to work around this is to shard the entity groups. Sharding Counters describes the technique. It's an old article, but as far as I know, the advice is still sound.
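To make the sharding idea concrete, here is an in-memory sketch; the class and its names are hypothetical, and a dict stands in for the datastore. In the datastore each shard would be a separate entity in its own entity group, which is the whole point: writes go to a randomly chosen shard, so concurrent writers rarely hit the same group, and reads pay for it by summing all shards.

```python
import random


class ShardedCounter:
    """In-memory illustration of a sharded counter (names are hypothetical).

    In the datastore, each shard would be its own entity in its own entity group,
    so writes to different shards do not contend with each other.
    """

    def __init__(self, name, num_shards=20, rng=None):
        self.name = name
        self.num_shards = num_shards
        self._rng = rng or random.Random()
        self._shards = {}  # shard key -> partial count

    def _shard_key(self, index):
        return f"{self.name}-shard-{index}"

    def increment(self, delta=1):
        # Each write picks a random shard, spreading the write rate across shards.
        key = self._shard_key(self._rng.randrange(self.num_shards))
        self._shards[key] = self._shards.get(key, 0) + delta

    def value(self):
        # Reading is more expensive: it has to fetch and sum every shard.
        return sum(self._shards.values())


counter = ShardedCounter("lead-42-processed", num_shards=5, rng=random.Random(0))
for _ in range(100):
    counter.increment()
print(counter.value())  # 100
```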
Adding to Dave's answer, which is the first thing to check:
One thing that isn't well documented, and can be a bit surprising, is that contention can be caused by read operations as well, not only by write ones.
Whenever a transaction starts, the entity groups it accesses (by read or write operations, it doesn't matter) are marked as such. The "too much contention" error indicates that too many parallel transactions are simultaneously trying to access the same entity group. It can happen even if none of the transactions actually attempts to write!
Note: this contention is NOT emulated by the development server, it can only be seen when deployed on GAE, with the real datastore!
What can add to the confusion is the automatic retrying of transactions, which can happen after actual write conflicts or after plain access contention. These retries may appear to the end user as suspicious repeated execution of some code paths, which I suspect could explain your reports of do_action() being called twice.
Usually when you run into such problems, you have to revisit your data structures and/or the way you're accessing them (your transactions). In addition to solutions that maintain strong consistency (which can be quite expensive), you may want to re-check whether consistency is actually a must. In some cases it's added as a blanket requirement just because it appears to simplify things. From my experience, it doesn't :)
There is nothing in your sample that ensures that your code is only called once.
For the moment, I am going to assume that your "do_action" function does something to the Link entities, specifically that it sets the "last_status" property.
If you do not perform the query and the write to the Link Entity inside a transaction, then it is possible for two different requests (task queue tasks) to get results back from the query, then both write their new value to the Link entity (with the last write overwriting the previous value).
Remember that even if you do use a transaction, you don't know until the transaction has successfully completed that nobody else tried to perform a write. This is important if you are trying to do something external to the datastore (for example, making an HTTP request to an external system), as you may see HTTP requests from transactions that eventually fail with a concurrent modification exception.
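A toy, in-memory model of optimistic transactions (not ndb's API; every class and function here is hypothetical) illustrating the two points above: do the check and the write in the same transaction, and run external side effects only after the commit succeeds. The action_done flag stands for a hypothetical marker stored in the same entity group; in ndb itself the transactional part would be a function decorated with @ndb.transactional. Four concurrent "tasks" race, the body may be re-executed on conflict, yet do_action runs exactly once.

```python
import threading


class ConcurrentModification(Exception):
    """Raised when another transaction committed to the group first."""


class ToyEntityGroup:
    """In-memory stand-in for one entity group with optimistic concurrency (not ndb)."""

    def __init__(self, data):
        self._data = dict(data)
        self._version = 0
        self._lock = threading.Lock()

    def snapshot(self):
        with self._lock:
            return dict(self._data), self._version

    def commit(self, changes, read_version):
        with self._lock:
            if self._version != read_version:
                raise ConcurrentModification()
            self._data.update(changes)
            self._version += 1


def run_in_transaction(group, body, retries=5):
    """Retry body on conflict; run its after-commit callbacks only after success."""
    for _ in range(retries + 1):
        data, version = group.snapshot()
        changes, after_commit = {}, []
        body(data, changes, after_commit)  # may execute several times
        try:
            group.commit(changes, version)
        except ConcurrentModification:
            continue  # discard this attempt's changes and callbacks, then retry
        for callback in after_commit:
            callback()
        return
    raise ConcurrentModification("gave up after retries")


def finish_lead(data, changes, after_commit, do_action):
    # Check and write in the same transaction: only the transaction that sets
    # "action_done" schedules do_action, and it runs only once that commit succeeds.
    if all(status is not None for status in data["link_statuses"]) and not data["action_done"]:
        changes["action_done"] = True
        after_commit.append(do_action)


calls = []
group = ToyEntityGroup({"link_statuses": ["ok", "ok", "ok"], "action_done": False})
start = threading.Barrier(4)


def task():
    start.wait()  # make the four "tasks" start together
    run_in_transaction(group, lambda d, c, a: finish_lead(d, c, a, lambda: calls.append("do_action")))


threads = [threading.Thread(target=task) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(calls)  # ['do_action']
```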
I'm going to have two independent programs (using SQLAlchemy / ORM / Declarative) that will inevitably try to access the same database file/table (SQLite) at the same time.
They could both want to read or write to that table.
Will there be a conflict when this happens?
If the answer is yes, how could this be handled?
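SQLite is designed to handle this: it uses file locking so that several processes can read from and write to the same database file without corrupting it. http://www.sqlite.org/howtocorrupt.html details what could cause problems, and those causes are generally outside anything your code might accidentally do. What you can see in practice is a "database is locked" error when one connection waits longer than its timeout for another connection's write lock.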
If you're concerned because of the way your application accesses its data, use BEGIN TRANSACTION and COMMIT/ROLLBACK as appropriate. If each of your transactions is a single statement (that is, you're not reading a value in one query and then changing it in another based on what you read), this should not be necessary.
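A minimal sketch of that advice with Python's built-in sqlite3 module, with threads standing in for the two programs (the counter table, the worker setup and the function names are made up for the demo). Each worker opens its own connection with a busy timeout, so a writer that finds the database locked waits instead of failing immediately, and wraps its read-then-write in BEGIN IMMEDIATE, which takes SQLite's write lock up front so no other writer can slip in between the read and the update.

```python
import os
import sqlite3
import tempfile
import threading


def open_conn(path):
    # timeout: seconds to wait for another connection's lock before raising
    # "database is locked"; isolation_level=None lets us issue BEGIN/COMMIT ourselves.
    return sqlite3.connect(path, timeout=10, isolation_level=None)


def increment(path, times):
    conn = open_conn(path)
    try:
        for _ in range(times):
            # BEGIN IMMEDIATE takes the write lock now, so the read below and the
            # write that depends on it cannot interleave with another writer.
            conn.execute("BEGIN IMMEDIATE")
            try:
                (value,) = conn.execute("SELECT value FROM counter WHERE id = 1").fetchone()
                conn.execute("UPDATE counter SET value = ? WHERE id = 1", (value + 1,))
                conn.execute("COMMIT")
            except BaseException:
                conn.execute("ROLLBACK")
                raise
    finally:
        conn.close()


def demo(workers=2, times=50):
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        setup = open_conn(path)
        setup.execute("CREATE TABLE counter (id INTEGER PRIMARY KEY, value INTEGER NOT NULL)")
        setup.execute("INSERT INTO counter (id, value) VALUES (1, 0)")
        setup.close()

        threads = [threading.Thread(target=increment, args=(path, times)) for _ in range(workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

        check = open_conn(path)
        (value,) = check.execute("SELECT value FROM counter WHERE id = 1").fetchone()
        check.close()
        return value
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())  # 100: no lost updates
```

With SQLAlchemy, the same timeout can be passed through to sqlite3.connect() with create_engine("sqlite:///your.db", connect_args={"timeout": 15}); the file path there is a placeholder.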