I have a flask app where I made a bunch of classes all with relationships to each other:
User
Course
Lecture
Note
Queue
Asset
So I'm trying to make a new lecture and note, and I have a method defined for each thing.
createNote:
def createPad(user, course, lecture):
    lecture.queues.first().users.append(user)
    # make new etherpad for user to wait in
    newNote = Note(dt)  # init also creates a new pad at /p/groupID$noteID
    db.session.add(newNote)
    #db.session.commit()
    # add note to user, course, and lecture
    user.notes.append(newNote)
    course.notes.append(newNote)
    lecture.notes.append(newNote)
    db.session.commit()
    return newNote
createLecture:
def createLecture(user, course):
    # create new lecture
    now = datetime.now()
    dt = now.strftime("%Y-%m-%d-%H-%M")
    newLecture = Lecture(dt)
    db.session.add(newLecture)
    # add lecture to course, add new queue to lecture, add user to queue and lecture
    course.lectures.append(newLecture)
    newQueue = MatchQueue('neutral')
    db.session.add(newQueue)
    newLecture.users.append(user)
    # hook up the new queue to the user and lecture
    newQueue.users.append(user)
    newQueue.lecture = newLecture
    # put new lecture in correct course
    db.session.commit()
    newLecture.groupID = pad.createGroupIfNotExistsFor(newLecture.course.name + dt)['groupID']
    db.session.commit()
    return newLecture
both of which are called from some controller logic:
newlec = createLecture(user, courseobj)
# make new pad
newNote = createPad(user, courseobj, newlec)
# make lecture live
newlec.live = True
db.session.commit()
redirect(somewhere)
This ends up throwing this error:
ObjectDereferencedError: Can't emit change event for attribute 'Queue.users' - parent object of type has been garbage collected.
At lecture.queues.first().users.append(user) in createPad.
I have no clue what this means. I think I'm lacking some fundamental knowledge of SQLAlchemy here (I am a SQLAlchemy noob). What's going on?
lecture.queues.first().users.append(user)
it means:
The first() method hits the database and produces an object. I'm not following your mappings, but my guess is that it's a Queue object.
Then you access the "users" collection on that Queue.
At this point, Python itself garbage collects the Queue: it is not referenced anywhere once "users" has been returned. That is how reference-counting garbage collection works.
Then you attempt to append a "user" to "users". SQLAlchemy has to track changes to all mapped attributes: if you were to say queue.name = "some name", SQLAlchemy would need to register that with the parent Queue object so it knows to flush it. If you say queue.users.append(someuser), same idea: it needs to register the change event with the parent.
SQLAlchemy can't do that, because the Queue is gone. Hence the message is raised. SQLAlchemy holds only a weakref to the parent here, so it knows exactly what has happened (and we can't prevent it, because people get very upset when we create unnecessary reference cycles in their object models).
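You can watch the same mechanism happen in plain Python (an illustrative sketch, nothing to do with your models):
import weakref

class Parent(object):
    pass

p = Parent()
ref = weakref.ref(p)  # roughly what SQLAlchemy holds for the parent
collection = []       # stand-in for the "users" collection p handed out
del p                 # the only strong reference is gone...
print(ref())          # None: the parent was garbage collected immediately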
The solution is very easy, and also makes the code easier to read: assign the query result to a variable:
queue = lecture.queues.first()
queue.users.append(user)
I did a bunch of refactoring and passed the related objects in during instantiation, which made things a lot neater. Somehow the problem went away.
One thing I did differently was in my many-to-many relationships: I changed the plain backref to a db.backref():
courses = db.relationship('Course', secondary=courseTable, backref=db.backref('users', lazy='dynamic'))
lectures = db.relationship('Lecture', secondary=lectureTable, backref=db.backref('users', lazy='dynamic'))
notes = db.relationship('Note', secondary=noteTable, backref=db.backref('users', lazy='dynamic'))
queues = db.relationship('Queue', secondary=queueTable, backref=db.backref('users', lazy='dynamic'))
Is there any way to explicitly mark an object as clean in the SQLAlchemy ORM?
This is related partly to a previous question on bulk update strategies.
I want to, within a before_flush event listener, mark a bunch of objects as not actually needing to be flushed, because they are synced with the database manually by other means.
I have tried the strategy below, but it results in the object being removed from the session, which then can cause problems later when a lazy load happens.
@event.listens_for(SignallingSession, 'before_flush')
def before_flush(session, flush_context, instances):
    ledgers = []
    if session.dirty:
        for elem in session.dirty:
            if session.is_modified(elem, include_collections=False):
                if isinstance(elem, Wallet):
                    session.expunge(elem)  # causes problems later
                    ledgers.append(Ledger(id=elem.id, amount=elem.balance))
    if ledgers:
        session.bulk_save_objects(ledgers)
        session.execute('UPDATE wallet w JOIN ledger l ON w.id = l.id SET w.balance = l.amount')
        session.execute('TRUNCATE ledger')
I want to do something like:
session.dirty.remove(MyObject)
But that doesn't work, as session.dirty is a computed property, not a regular attribute. I've been digging around in the instrumentation code, but can't see how I might fool the dirty list into not containing something. I see there is also a history on the object state that will need taking care of as well.
Any ideas? The underlying database is MySQL if that makes any difference.
-Matt
When you modify the database outside of the ORM, you can let the ORM know the current database state by using set_committed_value().
Example:
from sqlalchemy.orm.attributes import set_committed_value

wallet = session.query(Wallet).filter_by(id=123).one()
wallet.balance = 0
session.execute("UPDATE wallet SET balance = 0 WHERE id = 123;")
set_committed_value(wallet, "balance", 0)
session.commit()  # won't issue additional SQL to update wallet
If you really wanted to mark the instance as not dirty, you can muck with the internals of SQLAlchemy:
from sqlalchemy import inspect

state = inspect(wallet)  # wallet: the instance you want to mark clean
session.identity_map._modified.discard(state)
state.modified = False
print(wallet in session.dirty)  # False
Let me summarize this insanity.
from sqlalchemy.orm import attributes
attributes.instance_state(your_object).committed_state.clear()
Easy. (no)
I am currently trying to create a CustomUser entity in my app engine project upon a user signing in for the first time. I would like CustomUser entities to be unique, and I would like to prevent the same entity from being created more than once. This would be fairly easy to do if I can supply it with an ancestor upon entity creation, as this will make the transaction strongly consistent.
Unfortunately, this is not the case, due to the fact that a CustomUser entity is a root entity, and it will thus be eventually consistent, not strongly consistent. Because of this, there are instances when the entity is created twice, which I would like to prevent as this will cause problems later on.
So the question is: is there a way I can prevent the entity from being created more than once? Or at least make the commit of the ancestor entity strongly consistent to prevent duplication? Here's my code, with my interim (hacky) solution.
import time
import logging

from google.appengine.ext import ndb

# sample Model
class CustomUser(ndb.Model):
    user_id = ndb.StringProperty(required=True)
    some_data = ndb.StringProperty(required=True)
    some_more_data = ndb.StringProperty(required=True)

externally_based_user_id = "id_taken_from_somewhere_else"

# check if this id already exists in the Model.
# If it does not exist yet, create it
user_entity = CustomUser.query(
    CustomUser.user_id == externally_based_user_id,
    ancestor=None).get()

if not user_entity:
    # prepare the entity
    user_entity = CustomUser(
        user_id=externally_based_user_id,
        some_data="some information",
        some_more_data="even more information",
        parent=None
    )
    # write the entity to ndb
    user_key = user_entity.put()
    # inform of success
    logging.info("user " + str(user_key) + " created")

    # eventual consistency workaround - loop and keep checking if the
    # entity has already been created
    #
    # I understand that a while loop may not be the wisest solution.
    # I could also use a for loop over range(n) to avoid looping infinitely.
    # Both, however, seem to be band-aid solutions.
    entity_check = None
    while not entity_check:
        entity_check = CustomUser.query(
            CustomUser.user_id == externally_based_user_id,
            ancestor=None).get()
        # time.sleep to prevent the instance from consuming too much
        # processing power and memory, although I'm not certain this has any
        # real effect apart from reducing the number of loops
        if not entity_check:
            time.sleep(0.5)
EDIT: Here's the solution I ended up using, based on both of Daniel Roseman's suggestions. It can be further simplified by using get_or_insert, as suggested by voscausa; I've stuck to the usual way of doing things to keep the example clear.
import logging

from google.appengine.ext import ndb

# ancestor Model
# we can skip creating an empty class like this and just use a string
# when constructing the key
class PhantomAncestor(ndb.Model):
    pass

# sample Model
class CustomUser(ndb.Model):
    # user_id is now redundant, since we use it as the entity's key
    # user_id = ndb.StringProperty(required=True)
    some_data = ndb.StringProperty(required=True)
    some_more_data = ndb.StringProperty(required=True)

externally_based_user_id = "id_taken_from_somewhere_else"

# construct the entity key using information we know:
# entity_key = ndb.Key(*arbitrary ancestor kind*, *arbitrary ancestor id*, *Model*, *user_id*)
# we can also use the string "PhantomAncestor" instead of passing in an empty class, like so:
# entity_key = ndb.Key("SomeRandomString", externally_based_user_id, CustomUser, externally_based_user_id)
# see this page on how to construct a key: https://cloud.google.com/appengine/docs/python/ndb/keyclass#Constructors
entity_key = ndb.Key(PhantomAncestor, externally_based_user_id, CustomUser, externally_based_user_id)

# check if this id already exists in the Model.
user_entity = entity_key.get()

# If it does not exist yet, create it
if not user_entity:
    # prepare the entity with the desired key value
    user_entity = CustomUser(
        # user_id=externally_based_user_id,
        some_data="some information",
        some_more_data="even more information",
        # the parent must match the ancestor baked into entity_key above
        parent=ndb.Key(PhantomAncestor, externally_based_user_id),
        # specify the custom key value here
        id=externally_based_user_id
    )
    # write the entity to ndb
    user_key = user_entity.put()
    # inform of success
    logging.info("user " + str(user_key) + " created")

# we should also be able to use CustomUser.get_or_insert to simplify the code further
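For reference, a minimal sketch of that get_or_insert route (it runs in a transaction, so the entity is created at most once even under concurrent requests; same key layout as above):
user_entity = CustomUser.get_or_insert(
    externally_based_user_id,
    parent=ndb.Key(PhantomAncestor, externally_based_user_id),
    some_data="some information",
    some_more_data="even more information")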
A couple of things here.
First, note that the ancestor doesn't have to actually exist. If you want a strongly consistent query, you can use any arbitrary key as an ancestor.
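For example (a sketch; "UserRoot"/"root" is a purely synthetic key, and entities must be written with that same parent for the ancestor query to see them):
ancestor_key = ndb.Key("UserRoot", "root")
user_entity = CustomUser.query(
    CustomUser.user_id == externally_based_user_id,
    ancestor=ancestor_key).get()  # strongly consistent within this entity group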
A second option would be to use user_id as your key. Then you can do a key get, rather than a query, which again is strongly consistent.
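A sketch of that key-based get, assuming the external id was used as the key name at creation time:
user_key = ndb.Key(CustomUser, externally_based_user_id)
user_entity = user_key.get()  # a get by key is strongly consistent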
I'm relatively new to Python, coming from the PHP world. In PHP, I would routinely fetch a row, which corresponds to an object from the database, say a User, and add properties to it before passing the user object to my view page.
For example, the user has properties email, name and id.
I get 5 users from the database and, in a for loop, assign a dynamic property to each user, say image.
This doesn't seem to work with Python/Google App Engine datastore models (I think it has more to do with the datastore model than with Python). It works within the for loop, meaning I can reference user.image there, but once the loop ends, the objects no longer seem to have the new image attribute.
Here is a code example:
# Model
class User(ndb.Model):
    email = ndb.StringProperty()
    name = ndb.StringProperty()

# And then a function that returns a list of users
users = User.get_users()
user_list = []

# For loop
for user in users:
    # For example, get image
    user.image = Image.get_image(user.key)
    user_list.append(user)
    # If I print or log this user in the for loop, I see a result
    logging.info(user.image)  # WORKS!

for ul in user_list:
    print ul.image  # Results in None / AttributeError
Can anyone explain to me why this is happening and how to achieve this goal?
I've searched the forums, but I couldn't find anything.
Try using the Expando model:
Sometimes you don't want to declare your properties ahead of time. A
special model subclass, Expando, changes the behavior of its entities
so that any attribute assigned (as long as it doesn't start with an
underscore) is saved to the Datastore.
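A minimal sketch of how that could look here (field values are made up; any attribute you assign to an Expando instance, as long as it doesn't start with an underscore, is persisted on put()):
from google.appengine.ext import ndb

class User(ndb.Expando):
    email = ndb.StringProperty()
    name = ndb.StringProperty()

user = User(email='x@example.com', name='X')
user.image = 'http://example.com/x.png'  # dynamic property, saved to the datastore
user.put()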
For performance reasons, I've got a denormalized database where some tables contain data which has been aggregated from many rows in other tables. I'd like to maintain this denormalized data cache by using SQLAlchemy events. As an example, suppose I was writing forum software and wanted each Thread to have a column tracking the combined word count of all comments in the thread in order to efficiently display that information:
class Thread(Base):
    __tablename__ = 'thread'

    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    title = Column(UnicodeText(), nullable=False)
    word_count = Column(Integer, nullable=False, default=0)

class Comment(Base):
    __tablename__ = 'comment'

    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    thread_id = Column(UUID, ForeignKey('thread.id', ondelete='CASCADE'), nullable=False)
    thread = relationship('Thread', backref='comments')
    message = Column(UnicodeText(), nullable=False)

    @property
    def word_count(self):
        return len(self.message.split())
So every time a comment is inserted (for the sake of simplicity let's say that comments are never edited or deleted), we want to update the word_count attribute on the associated Thread object. So I'd want to do something like
def after_insert(mapper, connection, target):
    thread = target.thread
    thread.word_count = sum(c.word_count for c in thread.comments)
    print("updated cached word count to", thread.word_count)

event.listen(Comment, "after_insert", after_insert)
So when I insert a Comment, I can see the event firing and see that it has correctly calculated the word count, but that change is not saved to the Thread row in the database. I don't see any caveats about updating other tables in the after_insert documentation, though I do see some caveats for some of the others, such as after_delete.
So is there a supported way to do this with SQLAlchemy events? I'm already using SQLAlchemy events for lots of other things, so I'd like to do everything that way instead of having to write database triggers.
The after_insert() event is one way to do this, and you might notice it is passed a SQLAlchemy Connection object, instead of a Session as is the case with other flush-related events. The mapper-level flush events are normally intended to be used to invoke SQL directly on the given Connection:
@event.listens_for(Comment, "after_insert")
def after_insert(mapper, connection, target):
    thread_table = Thread.__table__
    thread = target.thread
    connection.execute(
        thread_table.update().
        where(thread_table.c.id == thread.id).
        values(word_count=sum(c.word_count for c in thread.comments))
    )
    print("updated cached word count to", thread.word_count)
What is notable here is that invoking an UPDATE statement directly is also a lot more performant than running that attribute change through the whole unit-of-work process again.
However, an event like after_insert() isn't really needed here, as we know the value of "word_count" before the flush even happens. We actually know it as soon as Comment and Thread objects are associated with each other, and we could just as well keep Thread.word_count completely fresh in memory at all times using attribute events:
def _word_count(msg):
    return len(msg.split())

@event.listens_for(Comment.message, "set")
def set_message(target, value, oldvalue, initiator):
    if target.thread is not None:
        target.thread.word_count += (_word_count(value) - _word_count(oldvalue))

@event.listens_for(Comment.thread, "set")
def set_thread(target, value, oldvalue, initiator):
    # the new Thread, if any
    if value is not None:
        value.word_count += _word_count(target.message)
    # the old Thread, if any
    if oldvalue is not None:
        oldvalue.word_count -= _word_count(target.message)
the great advantage of this method is that there's also no need to iterate through thread.comments, which for an unloaded collection means another SELECT is emitted.
Still another method is to do it in before_flush(). Below is a quick and dirty version, which could be refined to more carefully analyze what has changed in order to determine whether word_count needs to be updated or not:
@event.listens_for(Session, "before_flush")
def before_flush(session, flush_context, instances):
    for obj in session.new | session.dirty:
        if isinstance(obj, Thread):
            obj.word_count = sum(c.word_count for c in obj.comments)
        elif isinstance(obj, Comment):
            obj.thread.word_count = sum(c.word_count for c in obj.thread.comments)
I'd go with the attribute event method as it is the most performant and up-to-date.
You can do this with SQLAlchemy-Utils aggregated columns: http://sqlalchemy-utils.readthedocs.org/en/latest/aggregates.html
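For completeness, a sketch of that pattern adapted from the linked docs (the aggregate must be expressible in SQL, so this counts comments rather than words; per the docs' layout, the relationship lives on Thread here):
from sqlalchemy import Column, Integer, func
from sqlalchemy.orm import relationship
from sqlalchemy_utils import aggregated

class Thread(Base):
    __tablename__ = 'thread'

    id = Column(UUID, primary_key=True, default=uuid.uuid4)
    title = Column(UnicodeText(), nullable=False)

    # SQLAlchemy-Utils keeps this column up to date automatically
    @aggregated('comments', Column(Integer, nullable=False, default=0))
    def comment_count(self):
        return func.count('1')

    comments = relationship('Comment', backref='thread')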
I am writing an app to compare products, using Python and GAE. The products belong to sets of similar products, and the app calculates the best value in each set.
When I create a new product, it can be added to an existing set, or a new set can be created.
When testing the app, the first set gets created just fine. I populate an instance of the set with the name of the product. I use a form on one web page to POST the data into the "suppbook" page. I'm still not clear on how a web page can be a class, but that's a different question.
There's more code around all of this but I'm trying to make my question as clear as possible.
class Supp(db.Model):
    name = db.StringProperty(multiline=False)
    # a bunch of other attributes using Google's db.Model

class SuppSet(db.Model):
    name = db.StringProperty(default='')
    supp_list = set([])
    # a bunch of other attributes using Google's db.Model

    # I tried to add this after reading a few questions on SO,
    # but GAE doesn't like it
    def __init__(self):
        self.name = 'NoName'
        self.best_value = 'NoBestValue'
        self.supp_list = set([])

class Suppbook(webapp.RequestHandler):
    def post(self):
        supp = Supp()
        suppSet = SuppSet()
        ...
        supp.name = self.request.get('name')
        supp.in_set = self.request.get('newset')
        suppSet.name = supp.in_set
        suppSet.supp_list.add(supp.name)
        self.response.out.write('%s now contains %s<p>' % (suppSet.name, suppSet.supp_list))
This works well the first time around, and if I only use one SuppSet, I can add many Supps to it. If I create another SuppSet, though, both SuppSets will have the same contents in their supp_list. I have been looking through the questions on here, and I think (know) I'm doing something wrong regarding class vs. instance attribute access. I tried to create an __init__ method for SuppSet, but GAE complained: AttributeError: 'SuppSet' object has no attribute '_entity'
Also, I am using the GAE datastore to put() and get() the Supps and SuppSets, so I'm not clear why I'm not acting on the unique instances that I should be pulling from the DB.
I am not sure if I am providing enough information but I wanted to get started on this issue. Please let me know if more info is needed to help debug this.
I'm also open to the idea that i'm going about this completely wrong. I'm considering re-writing the whole thing, but I'm so close to being "finished" with basic functionality that I'd like to try to solve this issue.
Thanks
In your __init__ you will need to call the superclass's __init__; db.Model does some important setup there, and you will have to match its signature.
However, you likely shouldn't be setting up things like defaults in there anyway. Try to just use the datastore properties' ability to set a default.
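For instance, a small sketch (the default keyword is the documented way to get per-entity defaults, and a ListProperty starts out as a fresh list for each entity):
class SuppSet(db.Model):
    name = db.StringProperty(default='NoName')
    supp_list = db.StringListProperty()  # each entity gets its own list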
You've got some (I assume) typos in your code; Python is sensitive to case and whitespace, and the attribute names you use don't match your definitions, such as in_set. When possible, post actual working examples demonstrating your problem. More importantly, supp_list = set([]) is a plain class attribute: it is shared by every SuppSet instance and never stored in the datastore, which is why both of your SuppSets show the same contents.
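A quick illustration of that class-attribute sharing, independent of GAE:
class SuppSet(object):
    supp_list = set()  # one set object shared by the whole class

a = SuppSet()
b = SuppSet()
a.supp_list.add('x')
print b.supp_list  # set(['x']) -- b sees a's addition

With that in mind, here's roughly what you want: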
class Supp(db.Model):
    name = db.StringProperty(multiline=False)
    in_set = db.StringProperty(multiline=False)
    # your other stuff ...

class SuppSet(db.Model):
    name = db.StringProperty(default='')
    supp_list = db.StringListProperty()
    # your other stuff ...

    # In Python, you need to explicitly call the parent's __init__ with your args.
    # Note that this is NOT needed here.
    def __init__(self, **kwargs):
        db.Model.__init__(self, **kwargs)

class Suppbook(webapp.RequestHandler):
    def post(self):
        # This will create a NEW Supp and SuppSet every request,
        # it won't fetch anything from the datastore.
        # These are also NOT needed (included for explanation)
        supp = Supp()
        suppSet = SuppSet()

        # It sounds like you want something like:
        product_name = self.request.get('name')
        product_set = self.request.get('newset')

        # check for missing name / set:
        if not product_name or not product_set:
            # handle the error
            self.error(500)
            return

        # Build the keys and batch fetch.
        supp_key = db.Key.from_path('Supp', product_name)
        suppset_key = db.Key.from_path('SuppSet', product_set)
        supp, suppset = db.get([supp_key, suppset_key])

        if not supp:
            supp = Supp(key_name=product_name, name=product_name)
        if not suppset:
            suppset = SuppSet(key_name=product_set, name=product_set)

        # Update the entities
        supp.in_set = product_set
        if product_name not in suppset.supp_list:
            suppset.supp_list.append(product_name)

        # Batch put...
        db.put([supp, suppset])

        self.response.out.write('%s now contains %s<p>' % (suppset.name, str(suppset.supp_list)))