I'm implementing a RESTful POST API with Flask, using SQLAlchemy to update a resource in PostgreSQL, say MyResource:
class MyResource(db.Model):
    __tablename__ = 'my_resource'
    res_id = Column(Integer, primary_key=True)
    # <other columns>
    time_updated = Column(TIMESTAMP(timezone=True),
                          onupdate=datetime.now(timezone.utc))
There's a MyResource instance derived from the API's request payload; let's call it input_instance. Below is my approach for updating the resource:
input_instance_dict = input_instance.__dict__
input_instance_dict.pop('_sa_instance_state', None)  # this extra meta field would cause an error in update()
update_count = MyResource.query.filter(MyResource.res_id == input_instance.res_id).update(input_instance_dict)
db.session.commit()
With the above, all columns are updated except time_updated, which remains null; I expect it to be set to the current date/time.
If I remove the time_updated field from the input prior to calling Query.update(),
input_instance_dict.pop('time_updated', None)
then the null value in the time_updated column does get updated with the current date/time, BUT... on subsequent updates, the column keeps that same old value.
My suspicion is that, even with the time_updated field removed from the input dict, onupdate only takes effect on the first update and not afterwards. Why? Thanks.
--- Update 12/23 10:56am GMT+8
Additional observation: I just re-triggered the same update as last night's twice; the time_updated column is updated on the first retry but not the second. Which means that after the very first update, onupdate takes effect on and off for subsequent updates. I can't figure out the pattern of when it works and when it doesn't.
A similar problem is also observed for the other timestamp field, the one populated via default: say a record was inserted yesterday; all records inserted today end up with the same time_created value as yesterday's.
time_created = Column(TIMESTAMP(timezone=True), nullable=False, default=datetime.now(timezone.utc))
After changing the argument (for default and onupdate) to replace the Python datetime call with sqlalchemy.func.now(), the weird behaviour is resolved:
time_created = Column(TIMESTAMP(timezone=True), nullable=False, default=func.now())
time_updated = Column(TIMESTAMP(timezone=True), onupdate=func.now())
I'm not sure why the behaviour differs; many tutorials use the datetime function as the argument, and I wonder if those programs have the same problem.
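For what it's worth, a likely explanation, assuming standard SQLAlchemy behaviour: datetime.now(timezone.utc) is evaluated once, when the class body runs at import time, so a single frozen timestamp gets baked in as the default/onupdate value; each worker process that re-imports the module bakes in a fresh one, which would explain the on-and-off pattern across retries. func.now() works because it is a SQL expression the database evaluates per statement; a plain Python callable works too. A minimal sketch:

from datetime import datetime, timezone
from sqlalchemy import Column, TIMESTAMP

# default=datetime.now(timezone.utc)         -> called once at import, value frozen
# default=lambda: datetime.now(timezone.utc) -> called again for every INSERT
time_created = Column(TIMESTAMP(timezone=True), nullable=False,
                      default=lambda: datetime.now(timezone.utc))
time_updated = Column(TIMESTAMP(timezone=True),
                      onupdate=lambda: datetime.now(timezone.utc))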
Related
I am using Flask-SQLAlchemy with Postgres. I noticed that when I delete a record, the next record will reuse that one's id, which is not ideal for my purposes. Another SO question confirms that this is the default behavior; in particular, that question discussed the SQL behind the scenes. However, when I tested the solution from that question, it did not work. In fact, Postgres was not using SERIAL for the primary key; I was having to edit it in pgAdmin myself. Solutions for other setups mention using a Sequence, but it is not shown where the sequence comes from.
So I would hope this code:
class Test1(db.Model):
    __tablename__ = "test1"
    # id = ... this is what needs to change
    id = db.Column(db.Integer, primary_key=True)
would not reuse, say, 3 if record 3 was deleted and another was created, like so:
i1 = Invoice()
db.session.add(i1)
i2 = Invoice()
db.session.add(i2)
i3 = Invoice()
db.session.add(i3)
db.session.commit()

invs = Invoice.query.all()
for i in invs:
    print(i.id)  # Should print 1, 2, 3

Invoice.query.filter_by(id=3).delete()  # no 3 now
db.session.commit()

i4 = Invoice()
db.session.add(i4)
db.session.commit()

invs = Invoice.query.all()
for i in invs:
    print(i.id)  # Should print 1, 2, 4
Other solutions said to use autoincrement=False. Okay, but then how do I determine what number to set the id to? Is there a way to save a variable in the class without it being a column:
class Test2(db.Model):
    __tablename__ = 'test2'
    id = ...
    last_id = 3
    # code to set last_id when a record is deleted
Edit:
So I could (although I do not think I should) use Python to do this. I think this more clearly illustrates what I am trying to do:
class Test1(db.Model):
    __tablename__ = "test1"
    # id = ... this is what needs to change
    id = db.Column(db.Integer, primary_key=True)
    last_used_id = 30

    def __init__(self):
        self.id = self.last_used_id + 1
        self.last_used_id += 1
        # Not sure if this somehow messes with SQLAlchemy / the db making the id first.
This will make any new record not touch an id that was already used.
However, with this approach I do encounter the class-variable behavior issue of Python, sketched below. See this SO question.
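For reference, a minimal sketch of that class-variable pitfall (plain Python, no SQLAlchemy involved):

class Counter:
    last_used_id = 30

    def bump(self):
        # Reads the class attribute, but the augmented assignment writes a
        # new *instance* attribute that shadows it for this object only.
        self.last_used_id += 1

a, b = Counter(), Counter()
a.bump()
print(a.last_used_id, b.last_used_id, Counter.last_used_id)  # 31 30 30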
Future self checking: see UUID per @net's comment here:
You should use autoincrement=True. This will automatically increment the id every time you add a new row.
class Test1(db.Model):
    __tablename__ = "test1"
    id = db.Column(db.Integer, primary_key=True, autoincrement=True,
                   unique=True, nullable=False)
    ....
By default Postgres will not reuse ids, for performance reasons: attempting to avoid gaps or to re-use deleted IDs creates horrible performance problems. See the PostgreSQL wiki FAQ.
You don't need to keep track of the id. When you call db.session.add(i4) and db.session.commit(), it will automatically insert with the incremented id.
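A small sketch of the expected behaviour, assuming the Invoice model from the question with this autoincrementing id:

# With an Integer primary key, SQLAlchemy emits SERIAL on Postgres, roughly:
#   CREATE TABLE invoice (id SERIAL NOT NULL, PRIMARY KEY (id))
# so ids come from a sequence that never rolls back past a delete.
i4 = Invoice()
db.session.add(i4)
db.session.commit()
print(i4.id)  # 4 -- the sequence skips the deleted 3, leaving a gap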
Given a simple declarative-based class:
class Entity(db.Model):
    __tablename__ = 'brand'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(255), nullable=False)
And the following script:
entity = Entity()
entity.name = 'random name'
db.session.add(entity)
db.session.commit()

# Just by accessing the property name of the created object,
# a SELECT statement is sent to the database.
print(entity.name)
When I enable echo mode in SQLAlchemy, I can see the INSERT statement in the terminal, plus an extra SELECT issued just when I access a property (column) of the model (table row).
If I don't access any property, the extra query is never issued.
What is the reason for that behavior? In this basic example we already have the value of the name property assigned to the object, so why is an extra query needed? Is it to ensure an up-to-date value, or something like that?
By default, SQLAlchemy expires objects in the session when you commit. This is controlled via the expire_on_commit parameter.
The reasoning behind this is that the row behind the instance could have been modified outside of the transaction, so if you are not careful you could run into data races, but if you know what you are doing you can safely turn it off.
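For example, a sketch of turning it off, assuming Flask-SQLAlchemy (plain SQLAlchemy accepts the same flag on sessionmaker):

from flask_sqlalchemy import SQLAlchemy

# With expire_on_commit disabled, committed instances keep their loaded
# attribute values, so print(entity.name) after commit() emits no SELECT.
db = SQLAlchemy(session_options={"expire_on_commit": False})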
I have the following two models (just for a test):
class IdGeneratorModel(models.Model):
    table = models.CharField(primary_key=True, unique=True,
                             null=False, max_length=32)
    last_created_id = models.BigIntegerField(default=0, null=False,
                                             unique=False)

    @staticmethod
    def get_id_for_table(table: str) -> int:
        try:
            last_id_set = IdGeneratorModel.objects.get(table=table)
            new_id = last_id_set.last_created_id + 1
            last_id_set.last_created_id = new_id
            last_id_set.save()
            return new_id
        except IdGeneratorModel.DoesNotExist:
            np = IdGeneratorModel()
            np.table = table
            np.save()
            return IdGeneratorModel.get_id_for_table(table)
class TestDataModel(models.Model):
    class Generator:
        @staticmethod
        def get_id():
            return IdGeneratorModel.get_id_for_table('TestDataModel')

    id = models.BigIntegerField(null=False, primary_key=True,
                                editable=False, auto_created=True,
                                default=Generator.get_id)
    data = models.CharField(max_length=16)
Now I use the normal Django admin site to create a new Test Data Set element. What I expected (and maybe I'm wrong here) is that the method Generator.get_id() is called exactly once, when saving the new dataset to the database. But what really happens is that Generator.get_id() is called three times:
First time when I click the "add a Test Data Set" button in the admin area
A second time shortly after that (no extra interaction from the user's side)
And a third time when finally saving the new data set
The first time could be OK: this would be the value pre-filled in a form field. Since the primary key field is not displayed in my form, this may be an unnecessary call.
The third time is also clear: it's done before saving, when it's really needed.
The code above is only an example and a test for me. In the real project I have to ask a remote system for an ID instead of another table model. But whenever I query that system, the delivered ID gets locked there, much like the way the get_id_for_table() method counts up.
I'm sure there are better ways to get an ID from a method only when it's really needed; the method should be called exactly once, when inserting the new dataset. Any idea how to achieve that?
Forgot the version: It's Django 1.8.5 on Python 3.4.
This is not an answer to your question, but could be a solution to your problem.
I believe this issue is very complicated, especially because you want a transaction that spans a webservice call and a database insert... What I would use in this case: generate a UUID locally. This value is practically guaranteed to be unique in the 4D world (time + location), so use that as the id. Later, when the save is done, sync with your remote services.
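A minimal sketch of that idea as a Django model (uuid4 runs locally, so nothing is asked of the remote system at insert time):

import uuid
from django.db import models

class TestDataModel(models.Model):
    # default is a callable, so Django invokes uuid.uuid4 once per new row
    id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
    data = models.CharField(max_length=16)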
In this code the last_seen field is refreshed with the current time whenever the user uses the site. However, in the call to the db, he (Miguel Grinberg, "Flask Web Development") adds self instead of self.last_seen, which confuses me. I understand the basic principles of OOP, and I (thought I) understood what self is (a reference to the object itself), but I do NOT understand why we don't add self.last_seen in the last line, db.session.add(self). Full code below...
class User(UserMixin, db.Model):
    __tablename__ = 'users'
    id = db.Column(db.Integer, primary_key=True)
    email = db.Column(db.String(64), unique=True, index=True)
    username = db.Column(db.String(64), unique=True, index=True)
    role_id = db.Column(db.Integer, db.ForeignKey('roles.id'))
    password_hash = db.Column(db.String(128))
    confirmed = db.Column(db.Boolean, default=False)
    name = db.Column(db.String(64))
    location = db.Column(db.String(64))
    about_me = db.Column(db.Text())
    member_since = db.Column(db.DateTime(), default=datetime.utcnow)
    last_seen = db.Column(db.DateTime(), default=datetime.utcnow)

    def ping(self):
        self.last_seen = datetime.utcnow()
        db.session.add(self)
Looks very simple and I'm sure it is, but obviously I'm missing something, or haven't learned something I should have. If I knew what to google for an answer, I would certainly have done so, but I'm not even sure what to search for other than the principles of Python OOP, which I thought I already understood (I did review). Any help would be greatly appreciated because this is driving me crazy, lol.
He is adding the updated model to the DB. The model changed, so db.session.add() will update the proper row behind the scenes. I don't believe SQLAlchemy would let you add just a property of a model, because it wouldn't know which row to update.
Perhaps an example would make this clearer. Let's take the following model:
class User(db.Model):
    __tablename__ = 'User'
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(25))
Now there are 2 very important attributes on the model for inserting/updating it in the DB: the table name and the id. So to add that model to the DB with plain SQL, we would need to do something like:
INSERT INTO User (name) VALUES ('Some string');
This is roughly what happens when you use db.session.add() on a new model. To update our model we would need to do something like:
UPDATE User
SET name='Some other String'
WHERE id=1;
Now, if you were to pass only one attribute of a model to SQLAlchemy, how would it be able to figure out which table you wanted to add to, or which row was supposed to be changed?
If you just passed self.name to db.session.add(), the query would end up looking like this:
UPDATE        -- there is no way to know the table
SET name='Some other String'
WHERE ;       -- there is no way to know which row needs to be changed
SQLAlchemy would most likely throw an exception if you tried. As for why it can't deduce the model from self.name alone, that is probably way outside the scope of an SO question.
IanAuld is right, but I'll make an effort to explain it in a long-winded fashion.
Let's put ourselves in SQLAlchemy's role, and let's pretend we are the db.session.add method.
self.last_seen is a datetime object. So let's pretend we're sitting at home, and an envelope comes through the door addressed to db.session.add. Great, that's us, so we open it up and read the message, which just says 2014-07-29, nothing else. We know we need to file it away in the filing cabinet somewhere, but we just don't have enough information to do so. All we know is that we've got a datetime; we've got no idea what User it belongs to, or whether it belongs to a User at all. It's just a datetime -- we're stuck.
If instead the next thing that comes through the door is a parcel, again addressed to db.session.add, and again we open it -- this time it's a little model of a User. It's got a name, an email, and even a last_seen datetime written on its arm. Now it's easy: we can go right to the filing cabinet, check whether we've already got it in there, and either make a few changes so they match, or simply file this one away if it's new.
That's the difference: with an ORM model you're passing full Users or Products (or anything else) around, and SQLAlchemy knows it's a db.Model and therefore knows how and where to handle it by inspecting its details.
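A tiny usage sketch of that idea, assuming the User model from the question and a stored row with id 1:

user = User.query.get(1)  # the session now tracks the whole mapped object
user.ping()               # sets last_seen and re-adds the object itself
db.session.commit()       # emits: UPDATE users SET last_seen=... WHERE users.id = 1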
I want to create a flat forum, where threads are not a separate table, with a composite primary key for posts.
So posts have two fields forming a natural key: thread_id and post_number, where the former is the ID of the thread they are part of, and the latter is their position in the thread. If you aren't convinced, check below the line.
My problem is that I don't know how to tell SQLAlchemy:
when committing the addition of new Post instances with thread_id tid, look up how many posts with thread_id tid exist, and autoincrement from that number on.
Why do I think that schema is a good idea? Because it's natural and performant:
class Post(Base):
    number = Column(Integer, primary_key=True, autoincrement=False, nullable=False)
    thread_id = Column(Integer, primary_key=True, autoincrement=False, nullable=False)
    title = Column(Text)  # nullable for not-first posts
    text = Column(Text, nullable=False)
    ...
PAGESIZE = 10

# test
tid = 5
page = 4

Entire thread (query):
thread5 = session.query(Post).filter_by(thread_id=5)

Thread title:
title = thread5.filter_by(number=0).one().title

Thread page:
page4 = thread5.filter(
    Post.number >= (page * PAGESIZE),
    Post.number < ((page + 1) * PAGESIZE)).all()
# or
page4 = thread5.offset(page * PAGESIZE).limit(PAGESIZE).all()

Number of pages:
ceil(thread5.count() / PAGESIZE)
You can probably do this with an SQL expression as a default value (see the default argument). Give it a callable like this:
from sqlalchemy import select
from sqlalchemy.sql import func

def maxnumber_for_threadid(context):
    return select([func.max(post_table.c.number)]).where(
        post_table.c.thread_id == context.current_parameters['thread_id'])
I'm not absolutely sure you can return an SQL expression from a default callable -- you may have to actually execute the query and return a scalar value inside the callback. (The cursor should be available from the context parameter.)
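A hedged sketch of that execute-and-return-a-scalar variant, using the connection exposed on the execution context as context.connection (nextnumber_for_threadid is just an illustrative name, and post_table is the table from the answer above):

from sqlalchemy import select
from sqlalchemy.sql import func

def nextnumber_for_threadid(context):
    # Runs on the same connection at INSERT time and returns a plain int;
    # coalesce covers the first post of a thread (no rows yet -> number 0).
    tid = context.current_parameters['thread_id']
    return context.connection.execute(
        select([func.coalesce(func.max(post_table.c.number), -1) + 1])
        .where(post_table.c.thread_id == tid)
    ).scalar()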
However, I strongly recommend you do what @kindall says and just use another auto-incrementing sequence for the number column. What you want to do is actually very tricky to get right, even without SQLAlchemy. For example, if you are using an MVCC database, you need to introduce special row-level locking so that the number of rows with a matching thread_id does not change while you are running the transaction. How this is done is database-dependent. For example, with MySQL InnoDB you need to do something like this:
BEGIN TRANSACTION;
SELECT MAX(number)+1 FROM posts WHERE thread_id=? FOR UPDATE;
INSERT INTO posts (thread_id, number) VALUES (?, ?); -- number is from previous query
COMMIT;
If you didn't use FOR UPDATE, then conceivably another connection trying to insert a new post into the same thread at the same time could have gotten the same value for number.
So rather than being performant, post inserts are actually quite slow (relatively speaking) because of the extra query and locking required.
All this is resolved by using a separate sequence and not worrying about post number incrementing only within a thread_id.
You should just use a global post number that increments for posts in any thread. Then you don't need to figure out the right number for a given thread. A given thread might then have posts numbered 7, 20, 42, 51, and so on. This does not matter, because you can easily get the number of posts in the thread from the size of the recordset the query returns, and you can number the posts in the HTML output separately from the actual post numbers.
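A sketch of that display-side numbering, assuming Post carries a single global autoincrementing id instead of the composite key:

posts = session.query(Post).filter_by(thread_id=5).order_by(Post.id).all()
print(len(posts))  # the thread's post count comes free with the recordset
for position, post in enumerate(posts, start=1):
    # position is the per-thread display number; post.id stays global
    print(position, post.id)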