My question comes from a situation where I want to emulate SQL's ON UPDATE CASCADE in Django (when I update an id that a ForeignKey points to, the foreign key is updated automatically), but I realized that there is (apparently) no native way to do it.
I have visited this old question where the author is trying to do the same thing.
My question is: why doesn't Django have a native option for this? Is it bad practice? If so, why? Does it cause problems for my software or its structure?
Is there a simple way to do this?
Thanks in advance!
It is bad practice to update key fields. Note that key fields are used for index generation and other computationally expensive operations, which is why a key field must contain a unique, immutable value that identifies the record. It does not need to have any direct meaning related to the record's content; it can simply be an auto-incremented value. If you need a value you can update, put it in another field of the table, so you can change it without affecting the relationships.
That is why you will not find the operation you are looking for in Django.
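To illustrate that advice with a hedged sketch (the model names here are made up, not from the question): keep the auto-incremented primary key immutable so ForeignKeys always point at it, and put the editable "business" identifier in its own unique column.

    from django.db import models

    class Product(models.Model):
        # Immutable surrogate primary key; all ForeignKeys point here,
        # so it never needs to change. (Django adds this implicitly;
        # shown explicitly for clarity.)
        id = models.AutoField(primary_key=True)
        # The editable business identifier lives in its own unique
        # column; updating it never touches any relationship.
        code = models.CharField(max_length=20, unique=True)

    class Order(models.Model):
        product = models.ForeignKey(Product, on_delete=models.CASCADE)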
I am currently working on a Django 2+ project involving a blockchain, and I want to make copies of some of my object's states into that blockchain.
Basically, I have a model (say "contract") that has a list of several "signature" objects.
I want to make a snapshot of that contract, with the signatures. What I am basically doing is taking the contract at some point in time (when it's created for example) and building a JSON from it.
My problem is: I want to update that snapshot anytime a signature is added/updated/deleted, and each time the contract is modified.
The intuitive solution would be to override each "delete", "create" and "update" of every model involved in that snapshot, and pray that all of them are implemented correctly and that I didn't forget any. But I think this is not scalable at all, and hard to debug and to maintain.
I have thought of a solution that might be more centralized: using a periodical job to get the last update date of my object, compare it to the date of my snapshot, and update the snapshot if necessary.
However with that solution, I can identify changes when objects are modified or created, but not when they are deleted.
So, this is my big question mark: how can you identify deletions in relationships with Django, without any prior context, just by looking at the current database state? Is there a Django module that records deleted objects? What are your thoughts on my issue?
I think that, as I understand your problem, what you need is Django's signals framework, which listens for model events (saves, deletes and so on) and, when one is detected (and if all your desired conditions are met), executes whatever commands you attach in your application (it can even run further database operations).
This is the most recent documentation:
https://docs.djangoproject.com/en/3.1/topics/signals/
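As a minimal sketch of that approach, assuming a hypothetical Signature model with a ForeignKey to the contract and a hypothetical rebuild_snapshot() helper (neither name comes from the question):

    from django.db.models.signals import post_delete, post_save
    from django.dispatch import receiver

    from myapp.models import Signature  # hypothetical app and model

    @receiver([post_save, post_delete], sender=Signature)
    def refresh_contract_snapshot(sender, instance, **kwargs):
        # Fires on every ORM-level save or delete of a Signature.
        # Caveat: bulk queryset.update() bypasses post_save entirely.
        instance.contract.rebuild_snapshot()  # hypothetical helper

One receiver per model keeps the snapshot logic in one place, instead of scattering it across overridden save()/delete() methods.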
I know there are similar questions (like this, this, this and this) but I have specific requirements and looking for a less-expensive way to do the following (on Django 1.10.2):
I'm looking to avoid sequential/guessable integer ids in the URLs, and ideally to meet the following requirements:
Avoid UUIDs since that makes the URL really long.
Avoid a custom primary key. It doesn’t seem to work well if the models have ManyToManyFields. I got hit by at least three bugs while trying that (#25012, #24030 and #22997), including messed-up migrations and having to delete the entire db and recreate the migrations (well, lots of good learning too)
Avoid checking for collisions if possible (hence avoid a db lookup for every insert)
Don’t just want to look up by the slug since it’s less performant than just looking up an integer id.
Don’t care too much about encrypting the id - just don’t want it to be a visibly sequential integer.
Note: The app would likely have 5 million records or so in the long term.
After researching a lot of options on SO, blogs etc., I ended up doing the following:
Encoding the id to base 32 only for the URLs and decoding it back in urls.py (using an edited version of Django’s base 36 util functions, since I needed uppercase letters instead of lowercase).
Not storing the encoded id anywhere. Just encoding and decoding on the fly every time.
Keeping the default id intact and using it as primary key.
(good hints, posts and especially this comment helped a lot)
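For the record, here is a minimal sketch of what such helpers might look like (the alphabet and function names are my assumptions, not the exact code used above):

    # Uppercase base-32 alphabet (an assumption; any 32-char set works).
    ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUV"

    def encode_id(n):
        # Convert a non-negative integer id to its base-32 string form.
        if n < 0:
            raise ValueError("negative ids not supported")
        chars = []
        while True:
            n, rem = divmod(n, 32)
            chars.append(ALPHABET[rem])
            if n == 0:
                break
        return "".join(reversed(chars))

    def decode_id(s):
        # Reverse of encode_id; called in urls.py before the db lookup.
        n = 0
        for ch in s:
            n = n * 32 + ALPHABET.index(ch)
        return n

    # e.g. in a view: obj = get_object_or_404(MyModel, pk=decode_id(token))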
What this solution helps achieve:
Absolutely no edits to models or post_save signals.
No collision checks needed. Avoiding one extra request to the db.
Lookup still happens on the default id, which is fast. Also, no double save() requests on the model for every insert.
Short and sweet encoded ID (the number of characters goes up as the number of records increases, but it's still not very long)
What it doesn’t help achieve/any drawbacks:
Encryption - the ID is encoded but not encrypted, so the user may still be able to figure out the pattern to get to the id (but I don't care about that much, as mentioned above).
A tiny overhead of encoding and decoding on each URL construction/request, but perhaps that's better than collision checks and/or multiple save() calls on the model object for insertions.
For reference, looks like there are multiple ways to generate random IDs that I discovered along the way (like Django’s get_random_string, Python’s random, Django’s UUIDField etc.) and many ways to encode the current ID (base 36, base 62, XORing, and what not).
The encoded ID can also be stored as another (indexed) field and looked up every time (like here), but that depends on the performance parameters of the web app (since looking up a varchar id is less performant than looking up an integer id). This identifier field can be saved either from an overridden model save() method or by using a post_save() signal (see here) (both approaches require save() to be called twice for every insert).
All ears to optimizations to the above approach. I love SO and the community - every time there's so much to learn here.
Update: More than a year after this post, I found a great library called hashids which does pretty much the same thing quite well! It's available in many languages, including Python.
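A quick hedged example of the Python package (the salt and min_length values are just placeholders):

    from hashids import Hashids  # pip install hashids

    hashids = Hashids(salt="my-app-secret", min_length=6)
    token = hashids.encode(12345)   # short, non-sequential string
    (pk,) = hashids.decode(token)   # decode returns a tuple
    assert pk == 12345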
I have an SQLAlchemy model, say Entity, which has a column is_published. If I query it, say by id, I only want it returned if is_published is set to True. I think I can achieve that using a filter. But when there is a relationship and I access it like another_model_obj.entity, I also only want it to give me the corresponding Entity object if its is_published is set to True. How should I do this? One solution would be to wrap every access in an if block, but I use this in too many places, and anywhere I use it again I will have to remember that detail. Is there any way I can automate this in SQLAlchemy, or is there a better solution to this problem as a whole? Thank you
It reads as if you would need a join:
session().query(AnotherModel).join(Entity).filter(Entity.is_published)
This question has been asked many times here and still doesn't have a good answer. Here are possible solutions suggested by the author of SQLAlchemy. A more elaborate query class that excludes unpublished objects is provided in the iktomi library. It works with the SQLAlchemy 0.8.* branch only for now, but should be ported to 0.9.* soon. See the test cases for limitations (tests that fail are marked with #unittest.skip()) and usage examples.
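One of the patterns those suggestions boil down to is constraining the relationship itself with a custom primaryjoin, so that an unpublished Entity is simply never loaded. A minimal sketch, with table and column names assumed from the question:

    from sqlalchemy import Boolean, Column, ForeignKey, Integer
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import relationship

    Base = declarative_base()

    class Entity(Base):
        __tablename__ = "entity"
        id = Column(Integer, primary_key=True)
        is_published = Column(Boolean, default=False)

    class AnotherModel(Base):
        __tablename__ = "another_model"
        id = Column(Integer, primary_key=True)
        entity_id = Column(Integer, ForeignKey("entity.id"))
        # The extra join condition hides unpublished rows, so
        # another_model_obj.entity is None unless is_published is True.
        entity = relationship(
            Entity,
            primaryjoin="and_(AnotherModel.entity_id == Entity.id, "
                        "Entity.is_published == True)",
        )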
I would like to have a map data type for one of my entity types in my Python Google App Engine application. What I need is essentially the Python dict datatype, where I can create a list of key-value mappings. I don't see any obvious way to do this with the datatypes provided in App Engine.
The reason I'd like to do this is that I have a User entity, and I'd like to track within that user a mapping of lesson ids to values representing the user's status with each lesson. I'd like to do this without creating a whole new entity (say, UserLessonStatus) that references the User and has to be queried, since I often want to iterate through all the lesson statuses. Maybe it is better done that way, in which case I'd appreciate opinions saying so. Otherwise, if someone knows a good way to create a mapping within my User entity itself, that'd be great.
One solution I considered is using two ListProperties in conjunction: when adding an object, append the key and value to each list; when locating, find the index of the string in one list and use that index in the other; when removing, find the index in one, use it to remove from each; and so forth.
You're probably better off using another kind, as you suggest. If you do want to store it all in the one entity, though, you have several options - parallel lists, as you suggest, are one option. You could also simply pickle a Python dictionary, assuming you don't want to query on it.
You may want to check out the ndb project, which supports nested entities, which would also be a viable solution.
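A hedged sketch of both options in ndb (the property and kind names are mine): pickle the dict if you never need to query it, or use a repeated StructuredProperty if you want the nested entities to be iterable and queryable.

    from google.appengine.ext import ndb

    class LessonStatus(ndb.Model):
        lesson_id = ndb.StringProperty()
        status = ndb.StringProperty()

    class User(ndb.Model):
        # Option 1: store the whole dict opaquely (cannot be queried).
        lesson_statuses_raw = ndb.PickleProperty()
        # Option 2: nested entities, easy to iterate over and query.
        lesson_statuses = ndb.StructuredProperty(LessonStatus, repeated=True)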
Short story
I have a technical problem with a third-party library that I seem unable to solve without creating a surrogate key (despite the fact that I'll never need it). I've read a number of articles on the net discouraging the use of surrogate keys, and I'm a bit at a loss as to whether it is okay to do what I intend to do.
Long story
I need to specify a primary key because I use the SQLAlchemy ORM (which requires one), and I cannot just set it in __mapper_args__, since the class is being built with classobj and I have yet to find a way to reference a field of a not-yet-existing class in the appropriate PK definition argument. Another problem is that the natural equivalent of the PK is a composite key that is too long for the version of MySQL I use (and it's generally a bad idea to use such long primary keys anyway).
I always make surrogate keys when using ORMs (or rather, I let the ORMs make them for me). They solve a number of problems, and don't introduce any (major) problems.
So, you've done your job by acknowledging that there are "papers on the net" with valid reasons to avoid surrogate keys, and that there's probably a better way to do it.
Now, write "# TODO: find a way to avoid surrogate keys" somewhere in your source code and go get some work done.
"Using a surrogate key allows duplicates to be created when using a natural key would have prevented such problems" Exactly, so you should have both keys, not just a surrogate. The error you seem to be making is not that you are using a surrogate, it's that you are assuming the table only needs one key. Make sure you create all the keys you need to ensure the integrity of your data.
Having said that, in this case it seems like a deficiency of the ORM software (apparently not being able to use a composite key) is the real cause of your problems. It's unfortunate that a software limitation like that should force you to create keys you don't otherwise need. Maybe you could consider using different software.
I use surrogate keys in a db that I use reflection on with SQLAlchemy. The pro is that you can more easily manage the foreign keys / relationships between your tables / models, and the RDBMS handles the key more efficiently. The con is possible data inconsistency: duplicates. To avoid this, always put a unique constraint on your natural key.
Now, I understand from your long story that you can't enforce this uniqueness because of your MySQL limitations: MySQL has problems with long composite keys. I suggest you move to PostgreSQL.
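Putting both answers together in a minimal SQLAlchemy sketch (the table and column names are invented): keep the surrogate primary key for the ORM, and declare the natural key as a unique constraint so duplicates stay impossible.

    from sqlalchemy import Column, Integer, String, UniqueConstraint
    from sqlalchemy.ext.declarative import declarative_base

    Base = declarative_base()

    class Part(Base):
        __tablename__ = "part"
        __table_args__ = (
            # The natural key, enforced even though it is not the PK.
            UniqueConstraint("maker", "model_no", name="uq_part_natural"),
        )
        id = Column(Integer, primary_key=True)  # surrogate key for the ORM
        maker = Column(String(100), nullable=False)
        model_no = Column(String(100), nullable=False)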