I'm trying to figure out when to use session.add and when to use session.add_all with SQLAlchemy.
Specifically, I don't understand the downsides of using add_all. It can do everything that add can do, so why not just always use it? There is no mention of this in the SQLalchemy documentation.
If you only have one new record to add, then use sqlalchemy.orm.session.Session.add() but if you have multiple records then use sqlalchemy.orm.session.Session.add_all(). There's not really a significant difference, except the API of the first method is for a single instance whereas the second is for multiple instances. Is that a big difference? No. It's just convenience.
I was wondering about the same and as mentioned by others, there is no real difference. However, I would like to add that using add in a loop, instead of using add_all allows you to be more fine grained regarding exception handling. Passing a list of mapped class instances to add_all will cause a rollback for all instances if, for example, one of these objects violates a constraint (e.g., unique). I prefer to decouple my data logic from my service logic and decide what to do with instances not stored in the service layer by returning them from my data layer. However, I think it depends on how you are handling exceptions.
Related
So let's say there's a model A which looks like this:
class A(model):
name = char(unique=True)
When a user tries to create a new A, a view will check whether the name is already taken. Like that:
name_taken = A.objects.get(name=name_passed_by_user)
if name_taken:
return "Name exists!"
# Creating A here
It used to work well, but as the system grew there started to appear concurrent attempts at creating A's with the same name. And sometimes multiple requests pass the "name exists check" in the same few milliseconds, resulting in integrity errors, since the name field has to be UNIQUE, and multiple requests to create a certain name pass the check.
The current solution is a lot of "try: except IntegrityError:" wraps around creation parts, despite the prior check. Is there a way to avoid that? Because there are a lot of models with UNIQUE constraints like that, thus a lot of ugly "try: except IntegrityError:" wraps. Is it possible to lock as to not prevent from SELECTing, but lock as to prevent from SELECTing FOR UPDATE? Or maybe there's a more proper solution? I'm certain it's a common problem with usernames and other fields/columns like them, and there must be a proper approach rather than exception catching.
The DB is Postgres10, ORM is SQLAlchemy of Python, but tweaks to db directly are applicable too.
The only thing you can do is to set the appropriate transaction isolation directly to postgres. Neither python nor the ORM can do anything about it. serialized level will most likely solve your problem. But it might slow down performance, so you should try repeatable read too.
If you are using Python, you should have heard of the “ask forgiveness, not permission” design principle.
To avoid the race condition you describe, simply try to add the new row to the table.
If you get a unique_violation (SQLSTATE 23505), rollback the transaction and return that the name exists.
in my app I have a mixin that defines 2 fields like start_date and end_date. I've added this mixin to all table declarations which require these fields.
I've also defined a function that returns filters (conditions) to test a timestamp (e.g. now) to be >= start_date and < end_date. Currently I'm manually adding these filters whenever I need to query a table with these fields.
However sometimes me or my colleagues forget to add the filters, and I wonder whether it is possible to automatically extend any query on such a table. Like e.g. an additional function in the mixin that is invoked by SQLalchemy whenever it "compiles" the statement. I'm using 'compile' only as an example here, actually I don't know when or how to best do that.
Any idea how to achieve this?
In case it works for SELECT, does it also work for INSERT and UPDATE?
thanks a lot for your help
Juergen
Take a look at this example. You can change the criteria expressed in the private method to refer to your start and end dates.
Note that this query will be less efficient because it overrides the get method to bypass the identity map.
I'm not sure what the enable_assertions false call does; I'd recommend understanding that before proceeding.
I tried extending Query but had a hard time. Eventually (and unfortunately) I moved back to my previous approach of little helper functions returning filters and applying them to queries.
I still wish I would find an approach that automatically adds certain filters if a table (Base) has certain columns.
Juergen
I have a model called Theme. It has a lot of columns, but I need to retrieve only the field called "name", so I did this:
Theme.objects.only("name")
But it doesn't work, it is still retrieving all the columns.
PD: I don't want to use values() because it returns only a python dictionary. I need to return a set of model instances, to access to its attributes and methods.
Using only or its counterpart defer does not prevent accessing the deferred attributes. It only delays retrieval of said attributes until they are accessed. So take the following:
for theme in Theme.objects.all():
print theme.name
print theme.other_attribute
This will execute a single query when the loop starts. Now consider the following:
for theme in Theme.objects.only('name'):
print theme.name
print theme.other_attribute
In this case, the other_attribute is not loaded in the initial query at the start of the loop. However, it is added to the model's list of deferred attributes. When you try to access it, another query is executed to retrieve the value of other_attribute. In the second case, a total of n+1 queries is executed for n Theme objects.
The only and defer methods should only ever be used in advanced use-cases, after the need for optimization arises, and after proper analysing of your code. Even then, there are often workarounds that work better than deferring fields. Please read the note at the bottom of the defer documentation.
If what you want is a single column, I think what you are looking for is .values() instead of .only.
surfing on the web, reading about django dev best practices points to use pickled model fields with extreme caution.
But in a real life example, where would you use a PickledObjectField, to solve what specific problems?
We have a system of social-networks "backends" which do some generic stuff like "post message", "get status", "get friends" etc. The link between each backend class and user is django model, which keeps user, backend name and credentials. Now imagine how many auth systems are there: oauth, plain passwords, facebook's obscure js stuff etc. This is where JSONField shines, we keep all backend-specif auth data in a dictionary on this model, which is stored in db as json, we can put anything into it no problem.
You would use it to store... almost-arbitrary Python objects. In general there's little reason to use it; JSON is safer and more portable.
You can definitely substitute a PickledObjectField with JSON and some extra logic to create an object out of the JSON. At the end of the day, your use case, when considering to use a PickledObjectField or JSON+logic, is serializing a Python object into your database. If you can trust the data in the Python object, and know that it will always be serialize-able, you can reasonably use the PickledObjectField. In my case (I don't use django's ORM, but this should still apply), I have a couple different object types that can go into my PickledObjectField, and their definitions are constantly mutating. Rather than constantly updating my JSON parsing logic to create an object out of JSON values, I simply use a PickledObjectField to just store the different objects, and then later retrieve them in perfectly usable form (calling their functions). Caveat: If you store an object via PickledObjectField, then you change the object definition, and then you retrieve the object, the old object may have trouble fitting into the new object's definition (depending on what you changed).
The problems to be solved are the efficiency and the convenience of defining and handling a complex object consisting of many parts.
You can turn each part type into a Model and connect them via ForeignKeys.
Or you can turn each part type into a class, dictionary, list, tuple, enum or whathaveyou to your liking and use PickledObjectField to store and retrieve the whole beast in one step.
That approach makes sense if you will never manipulate parts individually, only the complex object as a whole.
Real life example
In my application there are RQdef objects that represent essentially a type with a certain basic structure (if you are curious what they mean, look here).
RQdefs consist of several Aspects and some fixed attributes.
Aspects consist of one or more Facets and some fixed attributes.
Facets consist of two or more Levels and some fixed attributes.
Levels consist of a few fixed attributes.
Overall, a typical RQdef will have about 20-40 parts.
An RQdef is always completely constructed in a single step before it is stored in the database and it is henceforth never modified, only read (but read frequently).
PickledObjectField is more convenient and much more efficient for this purpose than would be a set of four models and 20-40 objects for each RQdef.
I'm using SQLAlchemy's declarative extension. I'd like all changes to tables logs, including changes in many-to-many relationships (mapping tables). Each table should have a separate "log" table with a similar schema, but additional columns specifying when the change was made, who made the change, etc.
My programming model would be something like this:
row.foo = 1
row.log_version(username, change_description, ...)
Ideally, the system wouldn't allow the transaction to commit without row.log_version being called.
Thoughts?
There are too many questions in one, so they that full answers to all them won't fit StackOverflow answer format. I'll try to describe hints in short, so ask separate question for them if it's not enough.
Assigning user and description to transaction
The most popular way to do so is assigning user (and other info) to some global object (threading.local() in threaded application). This is very bad way, that causes hard to discover bugs.
A better way is assigning user to the session. This is OK when session is created for each web request (in fact, it's the best design for application with authentication anyway), since there is the only user using this session. But passing description this way is not as good.
And my favorite solution is to extent Session.commit() method to accept optional user (and probably other info) parameter and assign it current transaction. This is the most flexible, and it suites well to pass description too. Note that info is bound to single transaction and is passed in obvious way when transaction is closed.
Discovering changes
There is a sqlalchemy.org.attributes.instance_state(obj) contains all information you need. The most useful for you is probably state.committed_state dictionary which contains original state for changed fields (including many-to-many relations!). There is also state.get_history() method (or sqlalchemy.org.attributes.get_history() function) returning a history object with has_changes() method and added and deleted properties for new and old value respectively. In later case use state.manager.keys() (or state.manager.attributes) to get a list of all fields.
Automatically storing changes
SQLAlchemy supports mapper extension that can provide hooks before and after update, insert and delete. You need to provide your own extension with all before hooks (you can't use after since the state of objects is changed on flush). For declarative extension it's easy to write a subclass of DeclarativeMeta that adds a mapper extension for all your models. Note that you have to flush changes twice if you use mapped objects for log, since a unit of work doesn't account objects created in hooks.
We have a pretty comprehensive "versioning" recipe at http://www.sqlalchemy.org/trac/wiki/UsageRecipes/LogVersions . It seems some other users have contributed some variants on it. The mechanics of "add a row when something changes at the ORM level" are all there.
Alternatively you can also intercept at the execution level using ConnectionProxy, search through the SQLA docs for how to use that.
edit: versioning is now an example included with SQLA: http://docs.sqlalchemy.org/en/rel_0_8/orm/examples.html#versioned-objects