MongoEngine: change document structure - Python

I'm trying MongoDB for the first time, and I chose MongoEngine.
After defining the Document structure, if I change it (adding a field, removing a field, renaming one, etc.), read operations still work, but any other operation on previously stored documents fails, since they are no longer compliant with the document structure.
Is there any way to manage this situation? Should I just use DynamicDocuments with dictionaries instead of EmbeddedDocuments?

Using DynamicDocument or setting meta = {'strict': False} on your Document may help in some cases, but the only proper solution to this is running a migration script.
I'd recommend doing this with pymongo, but you could also do it from the mongo shell. Every time your model changes in a way that is not compatible, you should run a migration on the existing data so that it fits the new model; otherwise MongoEngine will complain at some point. (MongoEngine contributor here.)
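To make that concrete, here is a minimal, hedged sketch of such a migration with pymongo, for the case where a new required field was added to the model. The database, collection, and field names (mydb, user, phone) are made up for illustration:

from pymongo import MongoClient

coll = MongoClient().mydb.user    # hypothetical database/collection

# Backfill the newly added required field with a default value
# on every document written before the model changed
coll.update_many(
    {'phone': {'$exists': False}},
    {'$set': {'phone': ''}},
)

Removals and renames follow the same pattern with the $unset and $rename update operators.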


How to initialize django objects automatically for the first time?

This is my first Django application and I've looked all over the place to find an answer, to no avail.
I created my models and now need to initialize values for one of the classes. I could do it through the admin page, one by one, but I want anyone using my application to be able to just load it for the first time and have all the correct objects (and associated records in the database) created automatically.
Please help.
If you want to populate the database, check the Django documentation on providing initial data for models. You can use JSON, XML or YAML (with PyYAML installed). Your question is not that clear, but I think this is what you are looking for; a small example follows.
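As a hedged illustration, suppose your app is called polls and has a Category model (both names are hypothetical). A fixture file such as polls/fixtures/initial_data.json could look like this:

[
  {"model": "polls.category", "pk": 1, "fields": {"name": "General"}},
  {"model": "polls.category", "pk": 2, "fields": {"name": "Announcements"}}
]

You can load it explicitly with python manage.py loaddata initial_data.json, and on Django versions of this era a fixture named initial_data is also loaded automatically every time syncdb runs.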

Is it possible to manually set a django foreignkey ID to an item that doesn't exist yet?

I'm designing a bulk import tool from an old system into a new Django-based system.
I'd like to retain all of the current IDs of objects (they are just 5-digit strings). Due to the design of the current system, there are lots of references between these objects.
To import, I can see two possible techniques: import a known object and carefully recurse through its relationships, making sure to import things in the right order and only setting relationships once I know they exist
... or start at item 00001, set the foreign keys to values I know will exist eventually, and just grab everything in order, knowing that by the time we get to item 99999 all the relationships will exist.
So is there a way to set a foreign key to the ID of an item that doesn't exist yet, but will, even if only for imports?
To add further complexity, not all of these relationships are straightforward foreign keys; some are ManyToMany relationships as well.
To be able to handle any database that Django supports, and to avoid dealing with the peculiarities of each backend, I'd export the old database in the format that Django's loaddata command can read, and then give this exported file to loaddata. This command has no issue importing the type of structure you are talking about, because it disables foreign-key checking while it loads, so entries can reference primary keys that only appear later in the file.
Creating the file that loaddata will read could be done by writing your own converter that reads the old database and dumps an appropriate file. An easier way, though, might be to create a throwaway Django project with models that have the same structure as the tables in the old database, point that project at the old database, and use dumpdata to create the file. If table details differ between the old database and the new one, you'd still have to modify the file, but at least some of the conversion work would already be done.
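A hedged sketch of what such a fixture might look like, with made-up app and model names (shop.item) and a self-referential foreign key, showing one record pointing at a pk that only appears later in the file:

[
  {"model": "shop.item", "pk": "00001",
   "fields": {"name": "Widget", "parent": "00002"}},
  {"model": "shop.item", "pk": "00002",
   "fields": {"name": "Gadget", "parent": null}}
]

ManyToMany fields are expressed the same way, as lists of primary keys, e.g. "related": ["00002"]. Loading is then just python manage.py loaddata old_data.json.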
A more direct way would be to bypass Django completely and do the import in SQL, with foreign key constraints turned off for the duration of the import. For MySQL this is done by setting foreign_key_checks to 0 for the import and back to 1 when done. For SQLite it is done with PRAGMA foreign_keys = OFF; and then ON when done.
PostgreSQL does not allow simply turning these constraints off, but Django creates foreign key constraints as DEFERRABLE INITIALLY DEFERRED, which means the constraint is not checked until the end of the transaction. So starting a transaction, importing, and then committing should work. If something prevents this, you would have to drop the constraint before importing and add it back afterwards.
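If you drive that raw-SQL import from Python anyway, the same switches can be flipped through Django's database connection. A hedged sketch; import_rows() is a hypothetical helper standing in for your actual INSERT logic:

from django.db import connection, transaction

def bulk_import(rows):
    cursor = connection.cursor()
    # SQLite/MySQL: turn FK enforcement off before opening the transaction
    if connection.vendor == 'sqlite':
        cursor.execute("PRAGMA foreign_keys = OFF;")
    elif connection.vendor == 'mysql':
        cursor.execute("SET foreign_key_checks = 0;")
    try:
        # PostgreSQL: Django's DEFERRABLE INITIALLY DEFERRED constraints
        # are only checked at commit, so one enclosing transaction suffices
        with transaction.atomic():
            import_rows(cursor, rows)  # hypothetical helper doing the INSERTs
    finally:
        if connection.vendor == 'sqlite':
            cursor.execute("PRAGMA foreign_keys = ON;")
        elif connection.vendor == 'mysql':
            cursor.execute("SET foreign_key_checks = 1;")

Note that SQLite ignores changes to the foreign_keys PRAGMA inside an open transaction, which is why it is set before entering atomic().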
This sounds like a job for a database migration tool such as South, the de facto standard for Django. It's worth noting that Django 1.7 beta 1 was released recently, and it provides built-in migrations.

django-mongodb-engine Where is GridFSField Model Field

I'm using django-mongodb-engine for a project I'm working on, and the documentation advises users to use the GridFS storage system for storing blobs rather than the filesystem method. Obviously, this is one reason we chose to use MongoDB in the first place. One issue, though: the documentation is sparse, to say the least. The docs mention using GridFSField as your blob model field. One problem... where is GridFSField?
class Better(models.Model):
    blob = GridFSField()
IPython/Django shell:
from django_mongodb_engine.storage import GridFSStorage
#... define the class/exec
/usr/local/lib/python2.7/dist-packages/django/core/management/commands/shell.pyc in Better()
1 class Better(models.Model):
----> 2 blob = GridFSField()
3
NameError: name 'GridFSField' is not defined
Um, okay Django! Where is it defined then?!
This is not really a specific answer to your question, as that answer is most likely going to be about your setup configuration. But since you seem to be going through the documentation examples, and therefore evaluating, I thought it would be worthwhile to provide some points on using GridFS.
The intention of GridFS is not to be a way of storing "blobs" or a replacement for the "filesystem method", as you (or the ODM documentation) put it. Yes, it can be used that way, but the sole reason for its existence is to overcome the 16MB limit MongoDB places on BSON document storage.
There is a common misconception that GridFS is a "feature" of MongoDB, yet it is actually a specification implemented on the driver side, for dealing with chunking large document content. There is no magic that occurs on the server side at all, as far as internal operations to MongoDB are concerned, this is just another BSON document with fields and data.
What the driver implementation is doing is breaking the content up into smaller chunks and distributing the content over several documents in a collection. Likewise when reading the content, there are methods provided to follow and fetch the various documents that make up the total content. In a nutshell, reading and writing using the GridFS methods results in multiple calls over the wire to MongoDB.
With that in mind, if your content is actually always going to be under 16MB in size, then you are probably better off storing your encoded binary data within a single document, as updates will be atomic and reads will be faster: a single read operation per document.
So if you must have documents over 16MB in size, use GridFS. If not, just encode the content into a normal document field, as that is all GridFS is doing anyway.
For more information, please read the FAQ:
http://docs.mongodb.org/manual/faq/developers/#when-should-i-use-gridfs
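To make the driver-side nature of this concrete, here is a hedged sketch using pymongo's gridfs module (the database name mydb is made up); each of the put/get calls below translates into several round trips over the wire:

import gridfs
from pymongo import MongoClient

db = MongoClient().mydb          # hypothetical database
fs = gridfs.GridFS(db)

# Writing: the driver splits the payload into chunk documents
# (fs.chunks) plus one metadata document (fs.files)
file_id = fs.put(b'x' * (20 * 1024 * 1024), filename='big.bin')

# Reading: the driver fetches the metadata, then streams the chunks back
data = fs.get(file_id).read()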
You can find GridFSField under django_mongodb_engine.fields
i.e.
from django_mongodb_engine.fields import GridFSField
from django.db import models

class Image(models.Model):
    blob = GridFSField()
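With the import fixed, usage then follows the django-mongodb-engine docs. A hedged sketch, assuming (per those docs) that GridFSField accepts strings or file-like objects on write and hands back a GridFS file object on read; the file name is made up:

with open('photo.jpg', 'rb') as f:
    img = Image()
    img.blob = f.read()    # raw bytes are stored via GridFS on save
    img.save()

stored = Image.objects.get(pk=img.pk)
data = stored.blob.read()  # assumption: reads return a file-like GridOut object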

Getting text from db as django dictionary

I recently joined a company that is using Django to build its product. I'm currently responsible for one of the apps, which had already been developed a little before I arrived.
One of the entities in the app has a JSON dictionary attribute, which has been kept in the db as text and is declared in the model as a text field. So, as you can imagine, it's not being handled correctly.
I wanted to change this and set it as a json field using https://github.com/bradjasper/django-jsonfield , which works really well.
However, I've run into a peculiar problem. The data previously stored in the db was not handled correctly, and since it was unicode data, the text field in the db looks like:
{u'key': u'value'}
Now when the entity manager tries to load those values through the JSON field, it of course breaks, since that is no longer a valid JSON string.
I've done some research on how to overcome this, but haven't found anything.
My question:
Do you have any suggestions on how to overcome this? It can be any type of solution:
something I can run overnight that alters the field, transforming it into a valid JSON string, or
some change to the json-field code that enables it to correctly handle these values.
Additional info
We use Postgres with psycopg2 as Django's db backend.
Thank you very much.
You're probably just going to need to iterate over the whole table, load the field, convert it into a real Python dict, and dump it back out with json.dumps. ast.literal_eval is a good choice for the conversion step because it works like the built-in eval but only accepts Python literals, so it is far less risky to run against your data.
import ast
import json

for obj in MyModel.objects.all():
    value = ast.literal_eval(obj.dict_value)  # safely parse the repr()-style string
    obj.dict_value = json.dumps(value)        # re-serialize as valid JSON
    obj.save()                                # save the model instance, not the dict
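If you'd rather pursue the second option from the question, a hedged sketch of a tolerant loader that could be wired into the field's deserialization path (the function name load_legacy_json is made up):

import ast
import json

def load_legacy_json(raw):
    # Accept both valid JSON and legacy repr()-style dict strings
    try:
        return json.loads(raw)
    except ValueError:
        # fall back for old rows that look like {u'key': u'value'}
        return ast.literal_eval(raw)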

Does django with mongodb make migrations a thing of the past?

Since Mongo doesn't have a schema, does that mean we won't have to write migrations when we change the models?
What does the migration process look like with a non-relational db?
I think this is a really good question, but the answers are going to be a little scattered, depending on the libs you're using and your expectations of a "migration".
Let's take a look at some common migration actions:
Add a field: Mongo makes this very easy. Just add the field and you're done.
Delete a field: In theory you're not actually tied to your schema, so "deletion" here is relative. If you remove the property and no longer load the field, then it doesn't really matter whether that field is still in the data. So if you don't care about "cleaning up" the database, removing a field doesn't affect the database. If you do care about cleaning the DB, you'll basically need to run a giant for loop against it (see the sketch after this list).
Modify a field name: This is the harder problem. When you rename a field, "where" are you renaming it? If you want the DB to reflect the new field name, then you basically have to execute a giant for loop on the DB. To be safe, you probably have to "add" the data under the new name, then push code, then "unset" the old field (again, see below).
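Both of those "giant for loops" can usually be expressed as a single server-side multi-update. A hedged sketch with pymongo; the collection and field names (products, legacy_notes, price_old, price) are invented:

from pymongo import MongoClient

coll = MongoClient().mydb.products    # hypothetical collection

# "Delete a field" cleanup: strip the dead field from every document
coll.update_many({}, {'$unset': {'legacy_notes': ''}})

# "Modify a field name": either add-then-unset across two deploys as
# described above, or in one shot with $rename
coll.update_many(
    {'price_old': {'$exists': True}},
    {'$rename': {'price_old': 'price'}},
)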
Some Wrinkles
However, the concept of a field name in tandem with an ActiveRecord object is itself a little skewed. An ActiveRecord object effectively provides a mapping of object properties to actual database fields.
In a typical RDBMS the "size" of a field name is not really relevant. However, in Mongo, the field name actually occupies data space and this makes a big difference in terms of performance.
Now, if you're using some form of "data object" like ActiveRecord, why would you store the full field names in the data? The DB could store all fields under short names in alphabetical order, with a map on the object side. So a document could have 8 fields/properties whose DB names are "a", "b" ... "h", while the object names are readable things like "Name", "Price", "Quantity".
The reason I bring this up is that it adds yet another wrinkle to Modify a field name. If you're implementing a mapping then modifying a field name doesn't really cause a migration at all.
Some more Wrinkles
If you do want to implement a migration on a deletion, then you'll have to do so after a deploy. You'll also have to recognize that you won't save any current disk space when you do so.
Mongo pre-allocates space and it doesn't really "give it back" unless you do a DB repair. So if you delete a bunch of fields on documents, those documents still occupy the same space on disk. If the documents are later moved, then you may reclaim space, however documents only move when they grow.
If you remove a large field from lots of documents, you'll want to run a repair or check out the newer in-place compact command.
There is no silver bullet. Adding or removing fields is easier with a non-relational db (just stop using unneeded fields, or start using new ones); renaming a field is easier with a traditional db (in a schemaless db a rename usually means changing a lot of data); and data migration is on par, depending on the task.
What does the migration process look like with a non-relational db?
Depends on if you need to update all the existing data or not.
In many cases, you may not need to touch the old data at all, such as when adding a new optional field. If that field also has a default value, you may not need to update the old documents either, provided your application handles a missing field correctly. However, if you want to build an index on the new field to be able to search/filter/sort, you need to write the default value back into the old documents.
Something like a field rename (trivial in a relational db, because you only need to update the catalog and not touch any data) is a major undertaking in MongoDB (you need to rewrite all documents).
If you need to update the existing data, you usually have to write a migration function that iterates over all the documents and updates them one by one (though the work can be split up and run in parallel). For large data sets this can take a lot of time (and space), and since there are no transactions, a migration that crashes half-way through leaves the data in a mixed state.
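A hedged sketch of such a per-document migration with pymongo; the collection and field names (people, name, first_name, last_name) are invented. Querying only for documents the migration hasn't reached yet makes the loop safe to restart after a crash:

from pymongo import MongoClient

coll = MongoClient().mydb.people    # hypothetical collection

# Only touch documents that still lack the new fields,
# so a crashed run can simply be restarted
for doc in coll.find({'first_name': {'$exists': False}}):
    first, _, last = doc['name'].partition(' ')
    coll.update_one(
        {'_id': doc['_id']},
        {'$set': {'first_name': first, 'last_name': last}},
    )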
