Effective insert-only permissions for peewee tables - python

I'm wondering what the best strategy is for using insert-only permissions on a Postgres DB with Peewee. I'd like this in order to be certain that a specific user can't read any data back out of the database.
I granted INSERT permissions on my table, 'test', in Postgres. But I've run into the problem that when I try to save new rows with something like:
thing = Test(value=1)
thing.save()
The SQL actually contains a RETURNING clause that needs more permissions (namely, SELECT) than just INSERT:
INSERT INTO "test" ("value") VALUES (1) RETURNING "test"."id"
The same SQL seems to be generated when I try to use query = Test.insert(value=1); query.execute() as well.
From looking around, it seems like you either need to grant SELECT privileges, or use a more exotic feature like row-level security in Postgres. Is there any way to go about this with peewee out of the box? Or another suggestion for how to add new rows with truly write-only permissions?

You can omit the returning clause by explicitly writing your INSERT query and supplying a blank RETURNING. Peewee uses RETURNING whenever possible so that the auto-generated PK can be recovered in a single operation, but it is possible to disable it:
# Empty call to returning will disable the RETURNING clause:
iq = Test.insert(value=1).returning()
iq.execute()
You can also override this for all INSERT operations by setting the returning_clause attribute on the DB to False:
db = PostgresqlDatabase(...)
db.returning_clause = False
This is not an officially supported approach, though, and may have unintended side-effects or weird behavior - caveat emptor.
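For reference, here is a minimal end-to-end sketch of the per-query approach (connection details and the model definition are assumed, mirroring the 'test' table from the question):
from peewee import PostgresqlDatabase, Model, IntegerField

db = PostgresqlDatabase('mydb', user='insert_only_user')  # assumed credentials

class Test(Model):
    value = IntegerField()

    class Meta:
        database = db
        table_name = 'test'

# Empty returning() drops the RETURNING clause, so only INSERT permission
# is needed; note the auto-generated PK is not fetched back.
Test.insert(value=1).returning().execute()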

Related

Does Django provide any built-in way to update PostgreSQL autoincrement counters?

I'm migrating a Django site from MySQL to PostgreSQL. The quantity of data isn't huge, so I've taken a very simple approach: I've just used the built-in Django serialize and deserialize routines to create JSON records, then loaded them in the new instance, looping over the objects and saving each one to the new database.
This works very nicely, with one hiccup: after loading all the records, I run into an IntegrityError when I try to add new data after loading the old records. The Postgres equivalent of a MySQL autoincrement ID field is a serial field, but the internal counter for serial fields isn't incremented when id values are specified explicitly. As a result, Postgres tries to start numbering records at 1 -- already used -- causing a constraint violation. (This is a known issue in Django, marked wontfix.)
There are quite a few questions and answers related to this, but none of the answers seem to address the issue directly in the context of Django. This answer gives an example of the query you'd need to run to update the counter, but I try to avoid making explicit queries when possible. I could simply delete the ID field before saving and let Postgres do the numbering itself, but there are ForeignKey references that will be broken in that case. And everything else works beautifully!
It would be nice if Django provided a routine for doing this that intelligently handles any edge cases. (This wouldn't fix the bug, but it would allow developers to work around it in a consistent and correct way.) Do we really have to just use a raw query to fix this? It seems so barbaric.
If there's really no such routine, I will simply do something like the below, which directly runs the query suggested in the answer linked above. But in that case, I'd be interested to hear about any potential issues with this approach, or any other information about what I might be doing wrong. For example, should I just modify the records to use UUIDs instead, as this suggests?
Here's the raw approach (edited to reflect a simplified version of what I actually wound up doing). It's pretty close to Pere Picornell's answer, but his looks more robust to me.
from django.db import connection

# Reset the table's serial sequence to the current max id.
table = model._meta.db_table
cur = connection.cursor()
cur.execute(
    "SELECT setval('{}_id_seq', (SELECT max(id) FROM {}))".format(table, table)
)
About the debate: my case is a one-time migration, and my decision was to run this function right after I finished each table's migration, although you could call it any time you suspect the sequences could be out of sync.
from django.db import connections

def synchronize_last_sequence(model):
    # PostgreSQL auto-increments (called sequences) don't update their last
    # value when you manually specify an ID, so set the sequence to the
    # highest id currently in the table.
    sequence_name = model._meta.db_table + "_" + model._meta.pk.name + "_seq"
    with connections['default'].cursor() as cursor:
        cursor.execute(
            "SELECT setval('" + sequence_name + "', (SELECT max(" + model._meta.pk.name + ") FROM " +
            model._meta.db_table + "))"
        )
    print("Last auto-increment number for sequence " + sequence_name + " synchronized.")
I did this using the SQL query you proposed in your question.
It's been very useful to find your post. Thank you!
It should work with custom PKs but not with multi-field PKs.
One option is to use natural keys during serialization and deserialization. That way, when you insert the records into PostgreSQL, it will auto-increment the primary key field and keep everything in line.
The downside to this approach is that you need to have a set of unique fields for each model that don't include the id.
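As an illustration, here is a hedged sketch of the natural-key machinery (the Person model and username field are assumed, not from the question):
from django.db import models

class PersonManager(models.Manager):
    def get_by_natural_key(self, username):
        # Used during deserialization to resolve references by username
        return self.get(username=username)

class Person(models.Model):
    username = models.CharField(max_length=100, unique=True)
    objects = PersonManager()

    def natural_key(self):
        # Used during serialization in place of the numeric pk
        return (self.username,)
Dumping with python manage.py dumpdata --natural-foreign --natural-primary then omits the id from the JSON, so PostgreSQL assigns fresh serial values on load and the sequence stays correct.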

flask-sqlalchemy delete query failing with "Could not evaluate current criteria in Python"

I have a query using flask-sqlalchemy in which I want to delete all the stocks from the database where their ticker matches one in a list. This is the current query I have:
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete()
Where new_tickers is a list of str of valid tickers.
The error I am getting is the following:
sqlalchemy.exc.InvalidRequestError: Could not evaluate current criteria in Python: "Cannot evaluate clauselist with operator <function comma_op at 0x1104e4730>". Specify 'fetch' or False for the synchronize_session parameter.
You need to use one of the following options for the bulk delete:
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session=False)
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session='evaluate')
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session='fetch')
Basically, SQLAlchemy maintains the session in Python as you issue various SQLAlchemy methods. When you delete rows directly in the database, how will SQLAlchemy remove the corresponding objects from the session? This is controlled by a parameter to the delete method, synchronize_session, which has three possible values:
'evaluate': evaluates the produced query's criteria directly in Python to determine the objects that need to be removed from the session. This is the default and is very efficient, but is not very robust: complicated criteria cannot be evaluated, in which case it raises sqlalchemy.orm.evaluator.UnevaluatableError.
'fetch': performs a SELECT before the delete and uses that result to determine which objects in the session need to be removed. This is less efficient (potentially much less efficient), but will be able to handle any valid query.
False: doesn't attempt to update the session, so it's very efficient; however, if you continue to use the session after the delete, you may get inaccurate results.
Which option you use depends heavily on how your code uses the session. For most simple cases where you just need to delete rows based on complicated criteria, False should work fine (the example in the question fits this scenario, as the sketch below shows).
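To make the trade-off concrete, here is a small sketch (assuming the Stock model and the flask-sqlalchemy db object from the question) of what synchronize_session=False means for objects already loaded into the session:
stocks = Stock.query.filter(Stock.ticker.in_(new_tickers)).all()  # instances now in the session

Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session=False)
# The instances in `stocks` were not removed from the session, so they still
# look "live" even though their rows are gone; 'fetch' or 'evaluate' would
# have removed them at delete time.
db.session.commit()  # commit expires all instances by default, resynchronizing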
SQLAlchemy Delete Method Reference
Try it with this code:
Stock.query.filter(Stock.ticker.in_(new_tickers)).delete(synchronize_session=False)
https://docs.sqlalchemy.org/en/latest/orm/query.html?highlight=delete#sqlalchemy.orm.query.Query.delete

Should I always use 'implicit_returning':False in SQLAlchemy?

What's the potential pitfall of always using 'implicit_returning': False in SQLAlchemy?
I've encountered problems a number of times when working on MSSQL tables that have triggers defined, and since the DB is in replication, all of the tables have triggers.
I'm not sure exactly what the problem is now. It has something to do with auto-increment fields - maybe because I'm prefetching the auto-incremented value so I can insert it in another table.
If I don't set 'implicit_returning': False for the table, when I try to insert values, I get this error:
The target table of the DML statement cannot have any enabled triggers
if the statement contains an OUTPUT clause without INTO clause.
So what if I put __table_args__ = {'implicit_returning': False} into all mapped classes just to be safe?
Particularly frustrating for me is that the local DB I use for development & testing is not in replication and doesn't need that option, but the production DB is replicated, so when I deploy changes they sometimes don't work. :)
As you probably already know, the cause of your predicament is described in the SQLAlchemy docs as follows:
SQLAlchemy by default uses OUTPUT INSERTED to get at newly generated primary key values via IDENTITY columns or other server side defaults. MS-SQL does not allow the usage of OUTPUT INSERTED on tables that have triggers. To disable the usage of OUTPUT INSERTED on a per-table basis, specify implicit_returning=False for each Table which has triggers.
If you set your SQLAlchemy engine to echo the SQL, you will see that by default, it does this:
INSERT INTO [table] (id, ...) OUTPUT inserted.[id] VALUES (...)
But if you disable implicit_returning, it does this instead:
INSERT INTO [table] (id, ...) VALUES (...); select scope_identity()
So the question, "Is there any harm in disabling implicit_returning for all tables just in case?" is really, "Is there any disadvantage to using SCOPE_IDENTITY() instead of OUTPUT INSERTED?"
I'm no expert, but I get the impression that although OUTPUT INSERTED is the preferred method these days, SCOPE_IDENTITY() is usually fine too. In the past, SQL Server 2008 (and maybe earlier versions too?) had a bug where SCOPE_IDENTITY sometimes didn't return the correct value, but I hear that has now been fixed (see this question for more detail). (On the other hand, other techniques like @@IDENTITY and IDENT_CURRENT() are still dangerous since they can return the wrong value in corner cases. See this answer and the others on that same page for more detail.)
The big advantage that OUTPUT INSERTED still has is that it can work for cases where you are inserting multiple rows via a single INSERT statement. Is that something you are doing with SQLAlchemy? Probably not, right? So it doesn't matter.
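For illustration, a multi-row version would look something like this (column name assumed), returning one id per inserted row in a single statement:
INSERT INTO [table] (some_column) OUTPUT inserted.[id] VALUES ('a'), ('b'), ('c')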
Note that if you are going to have to disable implicit_returning for many tables, you could avoid a bit of boilerplate by making a mixin for it (and whichever other columns and properties you want all of the tables to inherit):
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class AutoincTriggerMixin:
    __table_args__ = {
        'implicit_returning': False
    }
    id = Column(Integer, primary_key=True, autoincrement=True)

class SomeModel(AutoincTriggerMixin, Base):
    __tablename__ = 'some_model'  # assumed table name
    some_column = Column(String(1000))
    ...
See this page in the SQLAlchemy documentation for more detail. As an added bonus, it makes it more obvious which tables involve triggers.

Django models - assign id instead of object

I apologize if my question turns out to be silly, but I'm rather new to Django, and I could not find an answer anywhere.
I have the following model:
class BlackListEntry(models.Model):
    user_banned = models.ForeignKey(auth.models.User, related_name="user_banned")
    user_banning = models.ForeignKey(auth.models.User, related_name="user_banning")
Now, when I try to create an object like this:
BlackListEntry.objects.create(user_banned=int(user_id),user_banning=int(banning_id))
I get a following error:
Cannot assign "1": "BlackListEntry.user_banned" must be a "User" instance.
Of course, if I replace it with something like this:
user_banned = User.objects.get(pk=user_id)
user_banning = User.objects.get(pk=banning_id)
BlackListEntry.objects.create(user_banned=user_banned,user_banning=user_banning)
everything works fine. The question is:
Does my solution hit the database to retrieve both users, and if yes, is it possible to avoid it, just passing ids?
The answer to your question is: YES.
Django will hit the database (at least) three times: twice to retrieve the two User objects and a third time to insert your desired information. This causes absolutely unnecessary overhead.
Just try:
BlackListEntry.objects.create(user_banned_id=int(user_id),user_banning_id=int(banning_id))
This is the default naming pattern for the FK columns generated by the Django ORM. This way you can set the information directly and avoid the extra queries.
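The same shortcut works when building an instance first (a sketch using the question's model):
entry = BlackListEntry(user_banned_id=int(user_id), user_banning_id=int(banning_id))
entry.save()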
If you wanted to query for the already saved BlackListEntry objects, you can navigate the attributes with a double underscore, like this:
BlackListEntry.objects.filter(user_banned__id=int(user_id),user_banning__id=int(banning_id))
This is how you access properties in Django querysets: with a double underscore. Then you can compare against the value of the attribute.
Though very similar, the two work completely differently. The first one sets an attribute directly, while the second one is parsed by Django, which splits it at the '__' and builds the query accordingly, treating the part after the double underscore as the name of an attribute on the related model.
You can always compare user_banned and user_banning with the actual User objects, instead of their ids. But there is no use for this if you don't already have those objects with you.
Hope it helps.
I do believe that when you fetch the users, it is going to hit the db...
To avoid it, you would have to write the raw SQL to do the insert, using the method described here:
https://docs.djangoproject.com/en/dev/topics/db/sql/
If you decide to go that route, keep in mind that you are responsible for protecting yourself from SQL injection attacks.
Another alternative would be to cache the user_banned and user_banning objects.
But in all likelihood, simply grabbing the users and creating the BlackListEntry won't cause you any noticeable performance problems. Caching or executing raw sql will only provide a small benefit. You're probably going to run into other issues before this becomes a problem.

getting the id of a created record in SQLAlchemy

How can I get the id of the created record in SQLAlchemy?
I'm doing:
engine.execute("insert into users values (1,'john')")
When you execute a plain text statement, you're at the mercy of the DBAPI you're using as to whether or not the new PK value is available, and via what means. With the SQLite and MySQL DBAPIs you'll have it as result.lastrowid, which just gives you the value of .lastrowid for the cursor. With PG, Oracle, etc., there's no .lastrowid; as someone else said, you can use RETURNING for those, in which case results are available via result.fetchone() (although using RETURNING with Oracle, again not taking advantage of SQLAlchemy expression constructs, requires several awkward steps), or if RETURNING isn't available you can use direct sequence access (NEXTVAL in PG) or a "post fetch" operation (CURRVAL in PG, @@identity or scope_identity() in MSSQL).
Sounds complicated, right? That's why you're better off using table.insert(). SQLAlchemy's primary system for providing newly generated PKs is designed to work with these constructs. Once you're there, the result.last_inserted_ids() method gives you the newly generated (possibly composite) PK in all cases, regardless of backend. The above methods of .lastrowid, sequence execution, RETURNING, etc. are all dealt with for you (0.6 uses RETURNING when available).
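Here's a minimal sketch of that route (the table definition is assumed; note that on modern SQLAlchemy the method described above is spelled result.inserted_primary_key):
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine('sqlite://')
metadata = MetaData()
users = Table('users', metadata,
              Column('id', Integer, primary_key=True),
              Column('name', String(50)))
metadata.create_all(engine)

with engine.begin() as conn:
    result = conn.execute(users.insert().values(name='john'))
    print(result.inserted_primary_key)  # the newly generated (possibly composite) PK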
There's an extra clause you can add: RETURNING
i.e.
INSERT INTO users (name, address) VALUES ('richo', 'beaconsfield') RETURNING id
Then just retrieve a row as if your insert were a SELECT statement.
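With the legacy engine.execute() style from the question, that looks roughly like this (PostgreSQL assumed, since not every backend supports RETURNING):
result = engine.execute(
    "INSERT INTO users (name, address) VALUES ('richo', 'beaconsfield') RETURNING id"
)
new_id = result.fetchone()[0]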
