Assume I have a model definition like this:
class Image(db.Model):
    id = db.StringProperty()
    url = db.URLProperty()
Now I want to add some fields to this model to make it look like this:
class Image(db.Model):
    id = db.StringProperty()
    url = db.URLProperty()
    width = db.IntegerProperty()
    height = db.IntegerProperty()
So this new model will be applied properly to newly added Image entities. But I also want to update the already existing entities so that they contain these two new fields, and fill them with values. Will an already existing entity get these two fields automatically, so that referring to them gives me empty values, or will it cause an error? I suppose I will have to create a helper function that goes through all existing entities and sets the new fields' values, right? So what should I keep in mind, and how is this model update best done? I think it will happen from time to time as the application evolves, so it would be useful to have a straightforward flow for it.
This exact scenario is covered in the GAE docs (articles section):
Updating your model's schema.
Basically just change the model definition as you've done, then perform some operation to supply default values for all your extant entities. There are several ways to do the second part - the article describes one.
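One common way to do the second part is a batched backfill: page through the existing entities, give any entity that lacks the new fields a value, and save only the ones that changed. The sketch below is a minimal, hypothetical version: `backfill`, `fetch_batch` and `save_batch` are made-up names, entities are plain dicts, and the "cursor" is just a list offset. On App Engine you would back `fetch_batch` with a query plus cursor and `save_batch` with `db.put()` on the list, and the fixed `0` defaults are placeholders for whatever values (for example the real image dimensions) you actually want.

```python
# Minimal sketch of a batched schema backfill (hypothetical helper names).
# fetch_batch/save_batch are stand-ins: on App Engine they would wrap a query
# with a cursor and db.put(); here they run against an in-memory list.

def backfill(fetch_batch, save_batch, defaults, batch_size=100):
    """Give every entity missing a new field its default value.

    fetch_batch(cursor, limit) -> (entities, next_cursor or None)
    save_batch(entities)       -> persists the modified entities
    Returns the number of entities that were changed.
    """
    cursor = None
    changed = 0
    while True:
        entities, cursor = fetch_batch(cursor, batch_size)
        dirty = []
        for entity in entities:
            updated = False
            for field, value in defaults.items():
                if entity.get(field) is None:  # old entities lack the field
                    entity[field] = value
                    updated = True
            if updated:
                dirty.append(entity)
        if dirty:
            save_batch(dirty)  # write only what actually changed
            changed += len(dirty)
        if cursor is None:
            return changed


# In-memory stand-in for the datastore: entities are dicts, the cursor is an offset.
store = [
    {"id": "a", "url": "http://example.com/a.png"},
    {"id": "b", "url": "http://example.com/b.png", "width": 640, "height": 480},
    {"id": "c", "url": "http://example.com/c.png"},
]

def fetch_batch(cursor, limit):
    start = cursor or 0
    batch = store[start:start + limit]
    end = start + len(batch)
    return batch, (end if end < len(store) else None)

def save_batch(entities):
    pass  # the dicts are mutated in place, so there is nothing else to persist here

print(backfill(fetch_batch, save_batch, {"width": 0, "height": 0}, batch_size=2))  # 2
```

Running the backfill a second time changes nothing, which makes it safe to re-run if it is interrupted partway through.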
No, an already existing entity won't get these two fields automatically, and it won't assume them to be None. It will cause an error when those fields are accessed on existing objects. The only solution available now is to use remote_api and write your own script to update the existing records. It's not a big deal: write a script that gets all the records in the datastore and sets some default values for the new attributes.
Setting_Up_remote_api
Update_schema
I've been trying to build a tutorial system like the ones we usually see on websites, where you click next -> next -> previous, etc. to read.
All posts are stored in a table (model) called Post, basically a pool of post objects.
Post.objects.all() will return all the posts.
Now there's another table (model) called Tutorial that stores the following:
class Tutorial(models.Model):
    user = models.ForeignKey(User, on_delete=models.CASCADE)
    tutorial_heading = models.CharField(max_length=100)
    tutorial_summary = models.CharField(max_length=300)
    series = models.CharField(max_length=40)  # <---- Here [10,11,12]
    ...
Here entries in this series field are post_ids stored as a string representation of a list.
For example, series will hold [10,11,12], where 10, 11 and 12 are post_ids that correspond to their respective entries in the Post table.
So my table entry for the Tutorial model looks like this:
id   heading              summary                     series
"5"  "Series 3 Tutorial"  "lorem on ullt consequat."  "[12, 13, 14]"
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
Now, I've read in several Stack Overflow posts that having multiple entries in a single field is a bad idea, and that mapping this relationship across multiple tables is a better option.
What I want is the ability to insert new posts into this series anywhere I want, maybe at the front or in the middle. This is easy if I treat the series as a list and insert wherever I please. Changing it to "[14,12,13]" will reorder the posts that are displayed.
My question is: is this way of storing multiple values in a field okay for my use case, or will it take a performance hit, or is it generally a bad idea? If it's not okay, is there a way to preserve or alter the order while spanning the relationship across another table, or is there an entirely better way to accomplish this in Django or MySQL?
Here entries in this series field are post_ids stored as a string representation of a list.
(...)
So I just read the series field and get all the Posts with the ids in this list then display them using pagination in Django.
DON'T DO THIS !!!
You are working with a relational database. There is one proper way to model relationships between entities in a relational database, which is to use foreign keys. In your case, depending on whether a post can belong to only a single tutorial ("one to many" relationship) or to many tutorials at the same time ("many to many" relationship), you'll want either to add to Post a foreign key on Tutorial, or to use an intermediate "post_tutorials" table with foreign keys on both post and tutorial.
Your solution doesn't allow the database to do its job properly. It cannot enforce integrity constraints (what if you delete a post that's referenced by a tutorial?), it cannot optimize read access (with a proper schema the database can retrieve a tutorial and all its posts in a single query), it cannot follow reverse relationships (given a post, access the tutorial(s) it belongs to), etc. And it requires an external program (Python code) to interact with your data, while with proper modeling you just need standard SQL.
Finally - but this is django-specific - using proper schema works better with the admin features, and with django rest framework if you intend to build a rest API.
wrt/ the ordering problem, it's a long-known (and solved) issue: you just need to add an "order" field (a small int should be enough). There are a couple of third-party Django apps that add support for this to both your models and the admin, so it's almost plug and play.
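To make the foreign-key and "order" field advice concrete outside Django, here is a minimal sketch in plain SQL run through Python's built-in sqlite3 module. The table and column names (`tutorial_post`, `position`) are illustrative assumptions, not the names Django would generate. It shows the database rejecting a link to a missing post, returning a tutorial's posts in order with a single join, following the relation in reverse, and reordering by touching only the position values.

```python
import sqlite3

# Illustrative schema; table/column names are assumptions, not Django's generated names.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
conn.executescript("""
CREATE TABLE post (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE tutorial (id INTEGER PRIMARY KEY, heading TEXT NOT NULL);
-- one row per (tutorial, post) pair; position carries the reading order
CREATE TABLE tutorial_post (
    tutorial_id INTEGER NOT NULL REFERENCES tutorial(id),
    post_id     INTEGER NOT NULL REFERENCES post(id),
    position    INTEGER NOT NULL,
    PRIMARY KEY (tutorial_id, post_id)
);
""")
conn.executemany("INSERT INTO post VALUES (?, ?)",
                 [(12, "Lists"), (13, "Dicts"), (14, "Sets")])
conn.execute("INSERT INTO tutorial VALUES (5, 'Series 3 Tutorial')")
conn.executemany("INSERT INTO tutorial_post VALUES (5, ?, ?)",
                 [(12, 1), (13, 2), (14, 3)])

# Reorder to [14, 12, 13] by changing positions only; the posts are untouched.
conn.executemany(
    "UPDATE tutorial_post SET position = ? WHERE tutorial_id = 5 AND post_id = ?",
    [(1, 14), (2, 12), (3, 13)])

# One query returns the tutorial's posts in reading order.
rows = conn.execute("""
    SELECT p.id, p.title FROM post p
    JOIN tutorial_post tp ON tp.post_id = p.id
    WHERE tp.tutorial_id = ?
    ORDER BY tp.position
""", (5,)).fetchall()
print(rows)  # [(14, 'Sets'), (12, 'Lists'), (13, 'Dicts')]

# Reverse relationship: which tutorials contain post 13?
tutorials = conn.execute(
    "SELECT tutorial_id FROM tutorial_post WHERE post_id = ?", (13,)).fetchall()
print(tutorials)  # [(5,)]

# Integrity: the database refuses a link to a post that does not exist.
fk_rejected = False
try:
    conn.execute("INSERT INTO tutorial_post VALUES (5, 99, 4)")
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```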
IOW, there is absolutely no good reason to denormalize your schema this way and only good reasons to use proper relational modeling. FWIW I once had to work on a project based on some obscure (and hopefully long dead) PHP CMS that had the brilliant idea to use your "serialized lists" anti-pattern, and I can tell you it was both a disaster wrt/ performance and a complete nightmare to maintain. So do yourself and the world a favour: don't try to be creative, follow well-known and established best practices instead, and your life will be much happier. My 2 cents...
I can think of two approaches:
Approach One: Linked List
One way is using a linked list like this:
class Tutorial(models.Model):
    ...
    previous = models.OneToOneField('self', on_delete=models.SET_NULL, null=True, blank=True, related_name="next")
In this approach, you can walk through each series in order like this:
for tutorial in Tutorial.objects.filter(previous__isnull=True):  # first tutorial of each series
    print(tutorial)
    # the reverse accessor raises if there is no successor, so test it with hasattr()
    while hasattr(tutorial, 'next'):
        tutorial = tutorial.next
        print(tutorial)
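This is a somewhat complicated approach: for example, whenever you want to add a new tutorial in the middle of the linked list, you need to change links in two places, like:
post = Tutorial.objects.first()
next_post = getattr(post, 'next', None)  # None if post is the last in its series
if next_post is not None:
    next_post.previous = None  # free the one-to-one slot first
    next_post.save()
new = Tutorial.objects.create(previous=post)  # plus the tutorial's other fields
if next_post is not None:
    next_post.previous = new
    next_post.save()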
The big benefit of this approach is that you don't have to create a new table for a series. Also, the order of tutorials will probably not be modified frequently, so the extra work may not be much of a hassle.
Approach Two: Create a new Model
You can simply create a new model and FK to Tutorial, like this:
class Series(models.Model):
    name = models.CharField(max_length=255)
class Tutorial(models.Model):
    ...
    series = models.ForeignKey(Series, on_delete=models.SET_NULL, null=True, blank=True, related_name='tutorials')
    order = models.IntegerField(default=0)
    class Meta:
        unique_together = ('series', 'order')  # prevents duplicate order values within the same series
Then you can access tutorials in series by:
series = Series.objects.first()
series.tutorials.all().order_by('order')
The advantage of this approach is that it's much more flexible for accessing tutorials through a series, but an extra table is created for it, plus one extra field to maintain the order.
I want to have several "bundles" (Mjbundle), which essentially are bundles of questions (Mjquestion). The Mjquestion has an integer "index" property which needs to be unique, but only within the bundle containing it. I'm not sure how to model something like this properly. I tried to do it using a structured (repeating) property below, but nothing yet actually constrains the uniqueness of the Mjquestion indexes. What is a better/normal/correct way of doing this?
class Mjquestion(ndb.Model):
    """This is a Mjquestion."""
    index = ndb.IntegerProperty(indexed=True, required=True)
    genre1 = ndb.IntegerProperty(indexed=False, required=True, choices=[1,2,3,4,5,6,7])
    genre2 = ndb.IntegerProperty(indexed=False, required=True, choices=[1,2,3])
    #(will add a bunch of more data properties later)
class Mjbundle(ndb.Model):
    """This is a Mjbundle."""
    mjquestions = ndb.StructuredProperty(Mjquestion, repeated=True)
    time = ndb.DateTimeProperty(auto_now_add=True)
(With the above model and having fetched a certain Mjbundle entity, I am not sure how to quickly fetch a Mjquestion from mjquestions based on the index. The explanation on filtering on structured properties looks like it works on the Mjbundle type level, whereas I already have a Mjbundle entity and was not sure how to quickly query only on the questions contained by that entity, without looping through them all "manually" in code.)
So I'm open to any suggestion on how to do this better.
I read this informational answer: https://stackoverflow.com/a/3855751/129202 It gives some thoughts about scalability and on a related note I will be expecting just a couple of bundles but each bundle will have questions in the thousands.
Maybe I should not use the mjquestions property of Mjbundle at all, but rather focus on parenting: each Mjquestion created should have a certain Mjbundle entity as parent. And then "manually" enforce uniqueness at "insert time" by doing an ancestor query.
When you use a StructuredProperty, all of the entities of that type are stored as part of the containing entity, so when you fetch your bundle, you have already fetched all of the questions. If you stick with this way of storing things, iterating to check in code is the solution.
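Since the questions are already in memory once the bundle is fetched, one pass that builds a dict keyed on index turns later lookups into dictionary hits, and a similar check before appending enforces per-bundle uniqueness in application code. In this sketch, plain dataclasses stand in for the ndb models, and `question_by_index` and `add_question` are hypothetical helpers; nothing here is ndb API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Plain stand-ins for the ndb models, with only the fields used here.
@dataclass
class Mjquestion:
    index: int
    genre1: int
    genre2: int

@dataclass
class Mjbundle:
    mjquestions: List[Mjquestion] = field(default_factory=list)

def question_by_index(bundle: Mjbundle) -> Dict[int, Mjquestion]:
    """One pass over the already-loaded questions; later lookups are dict hits."""
    return {q.index: q for q in bundle.mjquestions}

def add_question(bundle: Mjbundle, question: Mjquestion) -> None:
    """Reject an index that is already used inside this bundle (checked in code)."""
    if any(q.index == question.index for q in bundle.mjquestions):
        raise ValueError(f"index {question.index} already used in this bundle")
    bundle.mjquestions.append(question)

bundle = Mjbundle()
add_question(bundle, Mjquestion(index=1, genre1=3, genre2=1))
add_question(bundle, Mjquestion(index=2, genre1=5, genre2=2))
lookup = question_by_index(bundle)
print(lookup[2].genre1)  # 5

try:
    add_question(bundle, Mjquestion(index=2, genre1=1, genre2=1))
except ValueError as exc:
    print(exc)  # index 2 already used in this bundle
```

With thousands of questions per bundle, keep in mind that the whole bundle, questions included, is stored and loaded as one entity, so it has to stay within the Datastore's per-entity size limit.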
I'm using Flask-Admin and I want to be able to update many fields at once from the list view. It seemed like what I'm looking for is a custom action.
I was able to make it work, but I suspect not in the best way. I'm wondering if it could be done more "Flask"-ily.
What I do now, for example if I was updating all rows in table cars to have tires = 4:
A custom action in the CarView class collects the ids of the rows to be modified, a callback url from request.referrer, and the tablename cars, and returns render_template(mass_update_info.html) with these as parameters.
mass_update_info.html is an HTML form where the user specifies 1) the field they would like to change and 2) the value to change it to. On submit, the form makes a POST to a certain view (do_mass_update) with this data (everything else is passed as hidden fields in this form).
do_mass_update uses the data sent to it to construct a SQL query string -- in its entirety, "UPDATE {} SET {}='{}' WHERE id IN ({})".format(table, column, value, ids) -- which is run via db.engine.execute().
The user is redirected to the callback url.
It bothers me that I don't seem to be using any of SQLAlchemy, but (from a newbie's perspective) it all seems to be based on the model objects, e.g. User.query(...), while I only have access to the model/table name as a string. Can I get some kind of identifier from the model, pass that through, and do a lookup to retrieve the model on the other side?
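One concrete problem with the current flow: the `.format()` query pastes form input straight into SQL, so a crafted value can inject arbitrary SQL. Below is a minimal sketch of a safer version of the same UPDATE, using Python's built-in sqlite3 as a stand-in for db.engine: the value and ids go through `?` placeholders, and since table and column names cannot be bound as parameters, they are checked against an allow-list. `ALLOWED_COLUMNS` and `mass_update` are hypothetical names; with SQLAlchemy, `text()` accepts bound parameters in the same spirit.

```python
import sqlite3

# Hypothetical allow-list of what the mass-update form may touch. Identifiers
# cannot be bound as "?" parameters, so they are validated here instead of
# being pasted in from the request.
ALLOWED_COLUMNS = {"cars": {"tires", "color"}}

def mass_update(conn, table, column, value, ids):
    if column not in ALLOWED_COLUMNS.get(table, set()):
        raise ValueError(f"{table}.{column} may not be mass-updated")
    if not ids:
        return 0
    placeholders = ", ".join("?" for _ in ids)
    sql = f'UPDATE "{table}" SET "{column}" = ? WHERE id IN ({placeholders})'
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(sql, [value, *ids])  # value and ids are bound, never formatted in
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cars (id INTEGER PRIMARY KEY, tires INTEGER, color TEXT)")
conn.executemany("INSERT INTO cars VALUES (?, ?, ?)",
                 [(1, 3, "red"), (2, 2, "blue"), (3, 4, "red")])
conn.commit()

print(mass_update(conn, "cars", "tires", 4, [1, 2]))  # 2
rows = conn.execute("SELECT id, tires FROM cars ORDER BY id").fetchall()
print(rows)  # [(1, 4), (2, 4), (3, 4)]
```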
Is it possible in any way to query entities using one of their parent's property in GAE, like this (which doesn't work)?
class Car(db.Model):
    title = db.StringProperty()
    type = db.StringProperty()
class Part(db.Model):
    title = db.StringProperty()
car = Car()
car.title = 'BMW X5'
car.type = 'SUV'
car.put()
part = Part(parent = car)
part.title = 'Left door'
part.put()
parts = Part.all()
parts.filter('parent.type ==', 'SUV') # this in particular
I've read about ReferenceProperty, and Indexes but I'm not sure what I need.
GAE lets me set a parent on the Part entity, but do I actually need an (essentially duplicate) property like this:
parent = db.ReferenceProperty(Car, required=True)
That would feel like duplicating what the system already does, since the entity has a parent. Or is there another way?
It's not an answer to your question as such, but NDB offers structured properties.
https://developers.google.com/appengine/docs/python/ndb/properties#structured
You can structure a model's properties. For example, you can define a model class Contact containing a list of addresses, each with internal structure.
Although the structured properties instances are defined using the same syntax as for model classes, they are not full-fledged entities. They don't have their own keys in the Datastore. They cannot be retrieved independently of the entity to which they belong. An application can, however, query for the values of their individual fields.
So here car would contain parts as a structured property. Whether this is viable in your use case depends on how you structure your data. If you want to know what parts make up a specific car, that seems viable. If you want to filter global parts regardless of what car they belong to, you can still do that, but you'll have to make the "parts" inside each car also refer to a different model. If you see what I mean (I'm not sure I do), as each car contains its own parts.
Adding the parent as an explicit property isn't going to help.
You can break it up in two parts though:
for suv in Car.all().filter('type =', 'SUV'):
    for part in Part.all().ancestor(suv):
        ...do something with part...
If you want to query on the property of another (parent) object, you gotta get that object first.
I can think of two solutions to your problem:
Guido's way is to query for the parent, and then query for the part. This way issues more queries.
The second way is to store a copy of parent.type inside your Part. The downsides are that you're storing duplicate data (more storage), and you have to be careful that the data in Part and the data in Car match up. However, you only need to issue one query.
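A minimal in-memory sketch of the second approach, with dataclasses standing in for the Car and Part kinds: the hypothetical `car_type` field on Part is the duplicated copy of parent.type, set when the part is written, and the list comprehension plays the role of a single filter on Part. It also shows the price mentioned above: when a car's type changes, every one of its parts has to be updated to match.

```python
from dataclasses import dataclass
from typing import List

# In-memory stand-ins for the Car/Part kinds; car_type on Part is the
# denormalized copy of parent.type (a hypothetical property name).
@dataclass
class Car:
    title: str
    type: str

@dataclass
class Part:
    title: str
    parent: Car
    car_type: str

parts: List[Part] = []

def add_part(car: Car, title: str) -> Part:
    part = Part(title=title, parent=car, car_type=car.type)  # copy at write time
    parts.append(part)
    return part

def set_car_type(car: Car, new_type: str) -> None:
    # The cost of denormalizing: every child copy must be kept in sync.
    car.type = new_type
    for part in parts:
        if part.parent is car:
            part.car_type = new_type

x5 = Car("BMW X5", "SUV")
add_part(x5, "Left door")

suv_parts = [p.title for p in parts if p.car_type == "SUV"]  # the single "query"
print(suv_parts)  # ['Left door']

set_car_type(x5, "Crossover")
print([p.title for p in parts if p.car_type == "SUV"])  # []
```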
You'll have to figure out which one works better for you.
I'm having trouble figuring out how to best implement a document (paragraph-) revision system in Django.
I want to save a revision history of a document, paragraph-by-paragraph. In other words, there will be a class Document, which has a ManyToManyField to Paragraph. To maintain the order of the paragraphs, a third class ParagraphContainer can be created.
My question is, what is a good way to implement this in Django so that the order of paragraphs is maintained when someone adds a new paragraph in-between existing paragraphs?
One obvious way would be to have a position attribute in the ParagraphContainer class, but then this field will have to be updated in all paragraphs following the inserted (or deleted) paragraph. A linked list is another option, but I'm scared that might be very slow for retrieval of the whole document. Any advice?
Editors often solve this problem with a Piece Table. The table is a list of objects that point to spans of characters that are a) contiguous in memory, and b) share common attributes. The order of the pieces in the table is used for mapping character-in-document addresses to memory and vice versa. By reordering the piece table you effectively reorder the document without moving anything around. The key point is that the piece table itself is independent of the objects that make up the content of the document.
So one way of mapping your paragraph order would be to have a simplified version of a piece table. This could be as simple as a list of para-ids in document order. When you need to change something, you fetch the list, unpickle it, make your edits on the list, pickle it and save.
Another advantage of the table is that it greatly simplifies implementing undo. The history file is a simple list of edits to the table, and undoing/redoing is a matter of reversing or reapplying a particular edit to the table; the data itself doesn't change. This should play well with any versioning you want to do.
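A minimal sketch of that simplified piece table, assuming the document's order is stored as one pickled list of paragraph ids and each edit is recorded as an (operation, paragraph id, position) tuple. `ParagraphOrder` and its methods are hypothetical; the point is that undo and redo only touch the id list, never the paragraph contents.

```python
import pickle

# Hypothetical sketch: the document order is just a list of paragraph ids,
# saved as one pickled blob; paragraph contents live elsewhere and never move.
class ParagraphOrder:
    def __init__(self, para_ids):
        self.order = list(para_ids)
        self.history = []  # applied edits, newest last (for undo)
        self.undone = []   # undone edits, newest last (for redo)

    def _apply(self, op, para_id, pos):
        if op == "insert":
            self.order.insert(pos, para_id)
        else:  # "delete": para_id must be the id currently at pos, so undo can restore it
            del self.order[pos]

    def edit(self, op, para_id, pos):
        self._apply(op, para_id, pos)
        self.history.append((op, para_id, pos))
        self.undone.clear()  # a new edit invalidates the redo stack

    def undo(self):
        op, para_id, pos = self.history.pop()
        inverse = "delete" if op == "insert" else "insert"
        self._apply(inverse, para_id, pos)
        self.undone.append((op, para_id, pos))

    def redo(self):
        op, para_id, pos = self.undone.pop()
        self._apply(op, para_id, pos)
        self.history.append((op, para_id, pos))

doc = ParagraphOrder([10, 11, 12])
doc.edit("insert", 99, 1)  # new paragraph between 10 and 11
print(doc.order)           # [10, 99, 11, 12]
doc.undo()
print(doc.order)           # [10, 11, 12]
doc.redo()

blob = pickle.dumps(doc.order)  # what would be saved with the document
print(pickle.loads(blob))       # [10, 99, 11, 12]
```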
You can solve this problem if you add a through table to your ManyToManyField with an order attribute:
class Paragraph(models.Model):
    text = models.TextField()
class Document(models.Model):
    paragraphs = models.ManyToManyField(Paragraph, through='DocumentParagraph')
class DocumentParagraph(models.Model):
    paragraph = models.ForeignKey(Paragraph, on_delete=models.CASCADE)
    document = models.ForeignKey(Document, on_delete=models.CASCADE)
    order = models.PositiveIntegerField()
Of course you will have to add some custom methods for updating the order etc.; for that you can look into overriding Paragraph.save or using a post_save signal, for example!
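As a sketch of the kind of order-maintenance method meant here, and of why the renumbering worry in the question is usually cheap, the following uses Python's built-in sqlite3 with a table that mirrors DocumentParagraph. The table name and the helpers `insert_at`, `remove_at` and `paragraph_ids` are assumptions. Shifting the following paragraphs is one UPDATE statement rather than one save per row, and running it in the same transaction as the INSERT or DELETE keeps positions consistent. The table has no UNIQUE constraint on position, to keep the shift to a single UPDATE.

```python
import sqlite3

# Illustrative table mirroring the DocumentParagraph through model (names assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE document_paragraph (
    document_id  INTEGER NOT NULL,
    paragraph_id INTEGER NOT NULL,
    position     INTEGER NOT NULL)""")

def insert_at(conn, document_id, paragraph_id, position):
    with conn:  # both statements commit together or not at all
        conn.execute("UPDATE document_paragraph SET position = position + 1 "
                     "WHERE document_id = ? AND position >= ?",
                     (document_id, position))
        conn.execute("INSERT INTO document_paragraph VALUES (?, ?, ?)",
                     (document_id, paragraph_id, position))

def remove_at(conn, document_id, position):
    with conn:
        conn.execute("DELETE FROM document_paragraph "
                     "WHERE document_id = ? AND position = ?",
                     (document_id, position))
        conn.execute("UPDATE document_paragraph SET position = position - 1 "
                     "WHERE document_id = ? AND position > ?",
                     (document_id, position))

def paragraph_ids(conn, document_id):
    rows = conn.execute("SELECT paragraph_id FROM document_paragraph "
                        "WHERE document_id = ? ORDER BY position", (document_id,))
    return [r[0] for r in rows]

for pos, pid in enumerate([1, 2, 3]):
    insert_at(conn, 7, pid, pos)
insert_at(conn, 7, 4, 1)       # goes between paragraphs 1 and 2
print(paragraph_ids(conn, 7))  # [1, 4, 2, 3]
remove_at(conn, 7, 0)
print(paragraph_ids(conn, 7))  # [4, 2, 3]
```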