Python MySQLdb: Update if exists, else insert

I am looking for a simple way to perform either an update or an insert based on whether the row exists in the first place. I am trying to use Python's MySQLdb right now.
This is how I execute my query:
self.cursor.execute("""UPDATE `inventory`
                       SET `quantity` = `quantity` + %s
                       WHERE `item_number` = %s""",
                    (quantity, item_number))
I have seen four ways to accomplish this:
ON DUPLICATE KEY UPDATE. Unfortunately the primary key is already taken up as a unique ID, so I can't use this.
REPLACE. Same as above; I believe it relies on a primary key to work properly.
mysql_affected_rows(). Usually you can use this after updating a row to see if anything was affected. I don't believe Python's MySQLdb supports this feature.
Of course the last-ditch effort: make a SELECT query, fetchall, then update or insert based on the result. Basically I am just trying to keep the queries to a minimum, so 2 queries instead of 1 is less than ideal right now.
Basically I am wondering if I missed any other way to accomplish this before going with option 4. Thanks for your time.

MySQL DOES allow you to have unique indexes, and INSERT ... ON DUPLICATE KEY UPDATE will do the update if any unique index has a duplicate, not just the PK.
However, I'd probably still go for the "two queries" approach. You are doing this in a transaction, right?
Do the update
Check the rows affected, if it's 0 then do the insert
OR
Attempt the insert
If it failed because of a unique index violation, do the update (NB: You'll want to check the error code to make sure it didn't fail for some OTHER reason)
The former is good if the row will usually exist already, but it can cause a race (or deadlock) condition if you do it outside a transaction or if your isolation level is not high enough; a sketch of both approaches follows below.
Creating a unique index on item_number in your inventory table sounds like a good idea to me, because I imagine (without knowing the details of your schema) that one item should only have a single stock level (assuming your system doesn't allow multiple stock locations etc).
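Here's a minimal sketch of all three options with MySQLdb, assuming a UNIQUE index on inventory.item_number (connection details are placeholders; 1062 is MySQL's ER_DUP_ENTRY error code):
import MySQLdb

DUP_ENTRY = 1062  # MySQL's error code for a unique-key violation

conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="shop")
cursor = conn.cursor()

def upsert_one_query(item_number, quantity):
    # Single query: works because item_number has a UNIQUE index,
    # not only because of the primary key.
    cursor.execute(
        """INSERT INTO inventory (item_number, quantity)
           VALUES (%s, %s)
           ON DUPLICATE KEY UPDATE quantity = quantity + VALUES(quantity)""",
        (item_number, quantity))

def upsert_update_first(item_number, quantity):
    # Approach 1: UPDATE, then INSERT if no row was affected.
    cursor.execute(
        "UPDATE inventory SET quantity = quantity + %s WHERE item_number = %s",
        (quantity, item_number))
    if cursor.rowcount == 0:  # MySQLdb does expose affected rows
        cursor.execute(
            "INSERT INTO inventory (item_number, quantity) VALUES (%s, %s)",
            (item_number, quantity))

def upsert_insert_first(item_number, quantity):
    # Approach 2: INSERT, fall back to UPDATE on a duplicate-key error.
    try:
        cursor.execute(
            "INSERT INTO inventory (item_number, quantity) VALUES (%s, %s)",
            (item_number, quantity))
    except MySQLdb.IntegrityError as e:
        if e.args[0] != DUP_ENTRY:
            raise  # it failed for some OTHER reason
        cursor.execute(
            "UPDATE inventory SET quantity = quantity + %s WHERE item_number = %s",
            (quantity, item_number))

upsert_one_query(1001, 5)
conn.commit()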

Related

Query across all schemas on identical tables on Postgres

I'm using postgres and I have multiple schemas with identical tables, which are dynamically added by the application code.
foo, bar, baz, abc, xyz, ...,
I want to be able to query all the schemas as if they are a single table
!!! I don't want to query all the schemas one by one and combine the results
I want to "combine"(not sure if this would be considered a huge join) the tables across schemas and then run the query.
For example, an order by query shouldn't come back as
1. schema_A.result_1
2. schema_A.result_3
3. schema_B.result_2
4. schema_B.result_4
but instead it should be
1. schema_A.result_1
2. schema_B.result_2
3. schema_A.result_3
4. schema_B.result_4
If possible I don't want to generate a query that goes like
SELECT schema_A.table_X.field_1, schema_B.table_X.field_1 FROM schema_A.table_X, schema_B.table_X
But I want that to be taken care of in postgresql, in the database.
Generating a query with all the schemas (namespaces) appended can make my queries HUGE, with ~50 fields and ~50 schemas.
Since these tables are generated I also cannot inherit them from some global table and query that instead.
I'd also like to know if this is simply not possible at a reasonable speed.
EXTRA:
I'm using django and django-tenants so I'd also accept any answer that actually helps me generate the entire query and run it to get a global queryset EVEN THOUGH it would be really slow.
Your question isn't as much of a question as it is an admission that you've got a really terrible database and application design. It's as if you partitioned something that didn't need to be partitioned, or partitioned it in the wrong way.
Since you're doing something awkward, the database itself won't provide you with any elegant solution. Instead, you'll have to get more and more awkward until the regret becomes too much to bear and you redesign your database and/or your application.
I urge you to repent now, the sooner the better.
After that giant caveat based on a haughty moral position, I acknowledge that the only reason we answer questions here is to get imaginary internet points. And so, my answer is this: use a view that unions all of the values together and presents them as if they came from one table. I can't make much sense of the "order by query" part, so I'll ignore it for now. Maybe you mean that you want the results in a certain order; if so, you can add a constant to each SELECT operand of each UNION ALL and ORDER BY that constant column coming out of the union. But if the order of the rows matters, I'd assert that you are showing yet another symptom of a poor database design.
You can programmatically update the view whenever you create or update the new schemas and their catalogs.
A working example is here: http://sqlfiddle.com/#!17/c09265/1
with this schema creation and population code:
CREATE SCHEMA Fooey;
CREATE SCHEMA Junk;
CREATE TABLE Fooey.Baz (SomeInteger INT);
CREATE TABLE Junk.Baz (SomeInteger INT);
INSERT INTO Fooey.Baz (SomeInteger) VALUES (17), (34), (51);
INSERT INTO Junk.Baz (SomeInteger) VALUES (13), (26), (39);
CREATE VIEW AllOfThem AS
SELECT 'FromFooey' AS SourceSchema, SomeInteger FROM Fooey.Baz
UNION ALL
SELECT 'FromJunk' AS SourceSchema, SomeInteger FROM Junk.Baz;
and this query:
SELECT *
FROM AllOfThem
ORDER BY SourceSchema;
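Since the schemas are created dynamically, the view has to be regenerated whenever a tenant is added. A rough sketch of that with psycopg2 (the connection string, view name, and helper function are illustrative, not part of django-tenants):
import psycopg2

def rebuild_union_view(conn, table_name, view_name):
    # Recreate a UNION ALL view over table_name in every schema
    # that contains a table by that name.
    with conn.cursor() as cur:
        cur.execute(
            """SELECT table_schema FROM information_schema.tables
               WHERE table_name = %s
                 AND table_schema NOT IN ('pg_catalog', 'information_schema')""",
            (table_name,))
        schemas = [row[0] for row in cur.fetchall()]
        if not schemas:
            return
        # Identifiers come from the catalog, not from user input,
        # but quote them anyway.
        selects = [
            f'SELECT \'{s}\' AS source_schema, * FROM "{s}"."{table_name}"'
            for s in schemas
        ]
        cur.execute(f'CREATE OR REPLACE VIEW "{view_name}" AS '
                    + " UNION ALL ".join(selects))
    conn.commit()

conn = psycopg2.connect("dbname=app")
rebuild_union_view(conn, "baz", "allofthem")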
Why are per-tenant schemas a bad design?
This design favors laziness over scalability. If you don't want to make changes to your application, you can simply slam connections to a particular schema and keep working without any code changes. Adding more tenants means adding more schemas, which it sounds like you've automated. Adding many schemas will eventually make database management cumbersome (what if you have thousands or millions of tenants?), and even if you have only a few, the dynamic nature of the list and the difficulty of writing system-wide queries are issues that you've already discovered.
Consider instead combining everything and adding the tenant ID as part of a key on each table. In that case, adding more tenants means adding more rows. Any summary queries trivially come from single tables, and all of the features and power of the database implementation and its query language are at your fingertips without any fuss whatsoever.
It's simply false that a database design can't be changed, even in an existing and busy system. It takes a lot of effort to do it, but it can be done and people do it all the time. That's why getting the database design right as early as possible is important.
The README of the django-tenants package you're using describes their decision to trade off towards laziness, and cites a whitepaper that outlines many of the shortcomings and alternatives of that method.

Adding elements to Django database

I have a large database of elements, each of which has a unique key. Every so often (once a minute) I get a load more items which need to be added to the database, but if they are duplicates of something already in the database they are discarded.
My question is - is it better to...:
Get Django to give me a list (or set) of all of the unique keys and then, before trying to add each new item, check if its key is in the list or,
have a try/except statement around the save call on the new item and rely on Django catching duplicates?
Cheers,
Jack
If you're using MySQL, you have the power of INSERT IGNORE at your fingertips, and that would be the most performant solution. You can execute custom SQL queries using the cursor API directly. (https://docs.djangoproject.com/en/1.9/topics/db/sql/#executing-custom-sql-directly)
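As a hedged sketch, using Django's raw cursor with executemany (the table myapp_item and its columns are hypothetical stand-ins for your model's table):
from django.db import connection

def insert_ignore_items(rows):
    # rows: iterable of (item_key, payload) tuples; MySQL silently
    # skips any row whose item_key already exists in the unique index.
    with connection.cursor() as cursor:
        cursor.executemany(
            "INSERT IGNORE INTO myapp_item (item_key, payload) VALUES (%s, %s)",
            rows)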
If you are using Postgres or some other data-store that does not support INSERT IGNORE then things are going to be a bit more complicated.
In the case of Postgres, you can use rules to essentially make your own version of INSERT IGNORE.
It would look something like this:
CREATE RULE "insert_ignore" AS ON INSERT TO "some_table"
WHERE EXISTS (SELECT 1 FROM some_table WHERE pk=NEW.pk) DO INSTEAD NOTHING;
Whatever you do, avoid the "select all rows and check first" approach, as its worst-case performance is O(n) in Python and it essentially short-circuits any performance advantage afforded by your database, since the check is being performed on the app machine (and is also eventually memory-bound).
The try/except approach is marginally better than the "select all rows" approach but it still requires constant hand-off to the app server to deal with each conflict, albeit much quicker. Better to make the database do the work.

Python SQLite executemany() always in order?

In Python, I'm using SQLite's executemany() function with INSERT INTO to insert stuff into a table. If I pass executemany() a list of things to add, can I rely on SQLite inserting those things from the list in order? The reason is because I'm using INTEGER PRIMARY KEY to autoincrement primary keys. For various reasons, I need to know the new auto-incremented primary key of a row around the time I add it to the table (before or after, but around that time), so it would be very convenient to simply be able to assume that the primary key will go up one for every consecutive element of the list I'm passing executemany(). I already have the highest existing primary key before I start adding stuff, so I can increment a variable to keep track of the primary key I expect executemany() to give each inserted row. Is this a sound idea, or does it presume too much?
(I guess the alternative is to use execute() one-by-one with cursor.lastrowid, the wrapper around sqlite3_last_insert_rowid(), but that's slower than using executemany() for many thousands of entries.)
Python's sqlite3 module executes the statement with the list values in the correct order.
Note: if the code already knows the to-be-generated ID value, then you should insert this value explicitly so that you get an error if this expectation turns out to be wrong.
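For illustration, a self-contained sketch of that defensive pattern with the standard sqlite3 module (table and column names are made up):
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

# Predict the IDs from the current maximum...
start = conn.execute("SELECT COALESCE(MAX(id), 0) FROM items").fetchone()[0]
names = ["apple", "banana", "cherry"]
rows = [(start + i + 1, name) for i, name in enumerate(names)]

# ...then insert them explicitly: if the prediction is ever wrong,
# this raises sqlite3.IntegrityError instead of silently diverging.
conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
conn.commit()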

SQLAlchemy (PostgreSQL) - Race Conditions

We are writing an inventory system and I have some questions about SQLAlchemy (PostgreSQL) and transactions/sessions. This is a web app using TG2; not sure this matters, but too much info is never bad.
How can I make sure that when changing inventory quantities I don't run into race conditions? If I understand it correctly: if user one is going to decrement the inventory on an item to, say, 0, and user two is also trying to decrement the inventory to 0, then if user one's session hasn't been committed yet, user two's starting inventory number will be the same as user one's, resulting in a race condition when both commit, one overwriting the other instead of having a compound effect.
If I wanted to use a PostgreSQL sequence for things like order/invoice numbers, how can I get/set next values from SQLAlchemy without running into race conditions?
EDIT: I think I found the solution: I need to use with_lockmode(), using FOR UPDATE or FOR SHARE. I am going to leave this open for more answers or for others to correct me if I am mistaken.
TIA
If two transactions try to update the same row at the same time, one of them will fail. The one that loses will need error handling. For your particular example you will want to query for the number of parts and update the number of parts in the same transaction.
There is no race condition on sequence numbers. Save a record that uses a sequence number and the DB will automatically assign it.
Edit:
Note: as Limscoder points out, you need to set the isolation level to Repeatable Read.
Setup the scenario you are talking about and see how your configuration handles it. Just open up two separate connections to test it.
Also read up on SELECT ... FOR UPDATE and on transaction isolation levels.
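A minimal sketch of the row-locking approach mentioned in the EDIT (model, table, and connection details are illustrative; in modern SQLAlchemy, with_lockmode() has been replaced by with_for_update()):
from sqlalchemy import create_engine, Column, Integer
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Inventory(Base):
    __tablename__ = "inventory"
    id = Column(Integer, primary_key=True)
    quantity = Column(Integer, nullable=False)

engine = create_engine("postgresql://user:secret@localhost/shop")

def decrement(item_id, amount):
    with Session(engine) as session, session.begin():
        # SELECT ... FOR UPDATE: a second transaction blocks here
        # until the first commits, so decrements compound correctly.
        item = (session.query(Inventory)
                       .filter_by(id=item_id)
                       .with_for_update()
                       .one())
        if item.quantity < amount:
            raise ValueError("insufficient stock")
        item.quantity -= amount
        # commit happens automatically when session.begin() exits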

Insert/Delete performance

DB Table:
id int(6)
message char(5)
I have to add a record (message) to the DB table. In case of a duplicate message (the message already exists with a different id) I want to delete (or somehow inactivate) both of the messages and get their IDs in reply.
Is it possible to perform this with only one query? Any performance tips?
P.S.
I use PostgreSQL.
The main problem I'm worried about is the need to use locks when performing this with two or more queries...
Many thanks!
If you really want to worry about locking, do this:
UPDATE table SET status = 'INACTIVE' WHERE id = 'key';
If this succeeds, there was a duplicate.
INSERT the additional inactive record. Do whatever else you want with your duplicates.
If this fails, there was no duplicate.
INSERT the new active record.
Commit.
This seizes an exclusive lock right away. The alternatives aren't quite as nice.
Starting with an INSERT and checking for duplicates doesn't seize a lock until you start updating. It's not clear whether this is a problem or not.
Starting with a SELECT would require adding a LOCK TABLE to ensure that the SELECT held the row it found so that it could be updated. If no row is found, the insert will work fine.
If you have multiple concurrent writers and two writers could attempt access at the same time, you may not be able to tolerate row-level locking.
Consider this.
Process A does a LOCK ROW and a SELECT but finds no row.
Process B does a LOCK ROW and a SELECT but finds no row.
Process A does an INSERT and a COMMIT.
Process B does an INSERT and a COMMIT. You now have duplicate active records.
Multiple concurrent insert/update transactions will only work with table-level locking. Yes, it's a potential slow-down. Three rules: (1) Keep your transactions as short as possible, (2) release the locks as quickly as possible, (3) handle deadlocks by retrying.
You could write a procedure with both of those commands in it, but it may make more sense to use an insert trigger to check for duplicates (or a nightly job, if it's not time-sensitive).
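A sketch of that update-first flow with psycopg2, assuming duplicates are keyed on message and that a status column exists, per the question (all names are illustrative):
import psycopg2

conn = psycopg2.connect("dbname=app")

def add_message(new_id, message):
    with conn:  # commit on success, rollback on error
        with conn.cursor() as cur:
            # Deactivate an existing duplicate; this also takes the row lock.
            cur.execute(
                "UPDATE messages SET status = 'INACTIVE'"
                " WHERE message = %s RETURNING id",
                (message,))
            dup = cur.fetchone()
            if dup is not None:
                # Duplicate: store the new copy as inactive too,
                # and hand back both IDs.
                cur.execute(
                    "INSERT INTO messages (id, message, status)"
                    " VALUES (%s, %s, 'INACTIVE')",
                    (new_id, message))
                return (dup[0], new_id)
            # No duplicate: insert the new active record.
            cur.execute(
                "INSERT INTO messages (id, message, status)"
                " VALUES (%s, %s, 'ACTIVE')",
                (new_id, message))
            return None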
It is a little difficult to understand your exact requirement. Let me rephrase it two ways:
You want both of the entries with the same message in the table (with different IDs), and want to know their IDs for some further processing (marking them as inactive, etc.). For this, you could write a procedure with the separate queries. I don't think you can achieve this with one query.
You do not want either of the entries in the table (I got this from "I want to delete"). For this, you only have to check whether the message already exists, then delete the row if it does, else insert it. I don't think this can be achieved with one query either.
If performance is a constraint during insert, you could insert without any checks and then periodically, sanitize the database.
