sqlite tracking IDs - find missing integers in a seq - python

First, I am not even sure whether I am asking the right question, sorry for that. SQL is new to me. I have a table I create in SQLite like this:
CREATE TABLE ENTRIES (ID INTEGER PRIMARY KEY AUTOINCREMENT, DATA BLOB NOT NULL)
Which is all fine as long as I only add entries: each new entry gets the next ID. Let us say I added 7 entries. Now I delete 3 entries:
DELETE FROM ENTRIES WHERE ID = 3
DELETE FROM ENTRIES WHERE ID = 4
DELETE FROM ENTRIES WHERE ID = 5
The entries I now have are:
1,2,6,7.
The next time I add an entry it will have ID=8.
So, my question is:
How do I get the next 3 entries to reuse the IDs 3, 4, and 5, so that only the 4th new entry gets ID 8? I realize this is similar to SQL: find missing IDs in a table, and it is maybe also a general programming (not just SQL) problem. So I would be happy to see some Python and SQLite solutions.
Thanks,
Oz

I don't think that's the way auto-incrementing fields work. SQLite keeps a counter of the last used integer and will never 'fill in' the deleted values. If you want to get the next 3 rows after an ID, you could:
SELECT * FROM ENTRIES WHERE ID > 2 ORDER BY ID LIMIT 3;
This will give you the next three rows with an ID greater than 2.
Additionally, you could just add a deleted flag or something, so the rows are never actually removed from your database.
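Putting the question's Python angle into code, here is a minimal sketch, assuming the ENTRIES table defined above (whether reusing IDs is a good idea is a separate matter, as the answers here note):

import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
cur = conn.cursor()

# Fetch all existing IDs in order.
cur.execute("SELECT ID FROM ENTRIES ORDER BY ID")
existing = [row[0] for row in cur.fetchall()]

# Compute the missing integers between 1 and the current maximum.
missing = sorted(set(range(1, max(existing) + 1)) - set(existing)) if existing else []
print(missing)  # e.g. [3, 4, 5] for remaining rows 1, 2, 6, 7

# Reusing a gap means inserting the ID explicitly, bypassing the AUTOINCREMENT counter.
if missing:
    cur.execute("INSERT INTO ENTRIES (ID, DATA) VALUES (?, ?)", (missing[0], b"some blob"))
    conn.commit()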

You can't. SQLite will never re-use deleted IDs, for database integrity reasons. Let's assume you have a second table which has a foreign key which references the first table. If, for some reason, a corresponding row is removed without removing the rows which reference it (using the primary ID) as well, it will point to the wrong row.
Example: If you remove a person record without removing the purchases as well, the purchase records will point to the wrong person once you re-assign the old ID.
─────────────────────       ────────────────────
 Table 1 – customers        Table 2 – purchase
─────────────────────       ────────────────────
 *ID  <───────┐             *ID
  Name        │              Item
  Address     └───────────── Customer
  Phone                      Price
This is why pretty much any database engine out there assigns primary IDs strictly incrementally. They are database internals; you usually shouldn't touch them. If you need to assign your own IDs, add a separate column (and think twice before doing so).
If you want to keep track of the number of rows, you can query it like this: SELECT Count(*) FROM table_name.
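For example, from Python's sqlite3 module (reusing the ENTRIES table from the question above):

import sqlite3

conn = sqlite3.connect("example.db")  # hypothetical database file
cur = conn.cursor()

# Number of live rows, independent of any gaps in the ID sequence.
cur.execute("SELECT COUNT(*) FROM ENTRIES")
print(cur.fetchone()[0])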

Related

Why is COUNT returning the wrong number?

I'm very new to programming and trying to figure out what I'm doing wrong. I have a database with two tables. One is called "addresses" and the other is called "tablePlayers". I'm trying to count the number of times a specific person's name appears in the "winner" column and then update it under the "W" column in the table "tablePlayers", on the row with that same person's name.
Here's the code I'm using:
c.execute("UPDATE tablePlayers SET W = COUNT(winner) FROM addresses WHERE winner ='Mika'")
Here's what the tables look like in DB Browser for SQLite. As you can see, "Mika" only appears once under the "winner" column. But the count says 6 in the other table, and it is only written to one row, not the one with the matching name.
[screenshots of the addresses and tablePlayers tables omitted]
I can't tell you why exactly it is going wrong, but I would recommend that you use parentheses and a SELECT statement to build a nested query. For example, your query could look like this instead:
UPDATE tablePlayers SET W = (SELECT COUNT(winner) FROM addresses WHERE winner ='Mika') WHERE Name='Mika'
You could also handle the more general case and do it for all names at once:
UPDATE tablePlayers SET W = (SELECT COUNT(winner) FROM addresses WHERE winner=tablePlayers.Name)
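From Python's sqlite3 module, the same statements could be run roughly like this (a sketch; the table and column names are taken from the question, and the database file name is made up):

import sqlite3

conn = sqlite3.connect("players.db")  # hypothetical database file
c = conn.cursor()

# Single-name version, parameterised instead of hard-coding 'Mika'.
name = "Mika"
c.execute(
    "UPDATE tablePlayers SET W = "
    "(SELECT COUNT(winner) FROM addresses WHERE winner = ?) "
    "WHERE Name = ?",
    (name, name),
)

# General version: recompute W for every player in one statement.
c.execute(
    "UPDATE tablePlayers SET W = "
    "(SELECT COUNT(winner) FROM addresses WHERE winner = tablePlayers.Name)"
)
conn.commit()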

Delete first row from SQLite table in Python

It's a simple question: how can I delete the first row from a table without having to give a search criterion?
Normally it is:
c.execute('DELETE FROM name_table WHERE tada=?', (tadida,))
I just want to delete the first row, without the WHERE part. The reason is that I want to create a FIFO table (a queue): add at the bottom and delete from the top.
I can do this by keeping track of time and date or by giving the rows an ID, but I would prefer the described method.
Thanks.
I just want to delete the first row
SQL tables have no inherent ordering, so there is no defined concept of a "first row" unless a column (or a set of columns) is specified for ordering.
Assuming that you do have an ordering column, say id, you can use LIMIT to restrict which row should be deleted:
delete from mytable order by id limit 1
This removes the record that has the smallest id from the table.
Unless your SQLite build includes the optional support for it, you can't use ORDER BY or LIMIT with DELETE.
If your version of SQLite wasn't built with that option (some OS-distributed ones are, some aren't) and building and installing a copy with it is beyond your comfort level, here is an alternative, assuming a column named id is used for ordering, with the smallest value of id being the oldest record:
DELETE FROM yourtable WHERE id = (SELECT min(id) FROM yourtable);
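Putting it together in Python, a rough FIFO sketch along those lines (the table and column names echo the question; the database file name is made up) could look like this:

import sqlite3

conn = sqlite3.connect("queue.db")  # hypothetical database file
c = conn.cursor()
c.execute("CREATE TABLE IF NOT EXISTS name_table (id INTEGER PRIMARY KEY, tada TEXT)")

def push(value):
    # Add to the bottom of the queue.
    c.execute("INSERT INTO name_table (tada) VALUES (?)", (value,))
    conn.commit()

def pop():
    # Remove and return the oldest row (smallest id); this works even when
    # DELETE ... ORDER BY ... LIMIT is not compiled in.
    row = c.execute("SELECT id, tada FROM name_table ORDER BY id LIMIT 1").fetchone()
    if row is None:
        return None
    c.execute("DELETE FROM name_table WHERE id = ?", (row[0],))
    conn.commit()
    return row[1]

push("first")
push("second")
print(pop())  # "first"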

best way to upsert 300 million entries into postgres?

I have a new csv file every day with 400 million+ entries which I need to upsert into my database (3 tables with 2 foreign keys, indexed). The majority of the entries are already in the table, in which case I need to update a column. Some entries, which are not already in the table need to be inserted.
I tried to insert the CSV each day into a temp table (temptable) and then run:
INSERT INTO restaurants (name, food_id, street_id, datecreated, lastdayobservedopen)
SELECT DISTINCT temptable.name, typesoffood.food_id, location.street_id, temptable.datecreated, temptable.lastdayobservedopen
FROM temptable
INNER JOIN typesoffood ON typesoffood.food_type = temptable.food_type
INNER JOIN location ON location.street_name = temptable.street_name
ON CONFLICT ON CONSTRAINT restaurants_pk
DO UPDATE SET lastdayobservedopen = EXCLUDED.lastdayobservedopen
But it takes over 6 hrs.
Is it possible to make this faster?
Edit:
Some more details: there are 3 tables.
restaurants(name, food_id, street_id, datecreated, lastdayobservedopen) with pk (name, street_id) and fks food_id and street_id;
typesoffood(food_id, food_type) with pk (food_id) and an index on food_type;
location(street_id, street_name) with pk (street_id) and an index on street_name.
As for the csv file, I don't know which entries are new or old, but I do know that the majority are already in the database, which would require me to update the lastdayobserved date. The rest are to be inserted with the lastdayobserved date as today. This is supposed to help distinguish between restaurants that are no longer in operation (whose lastdayobserved column would not be updated) and currently operating restaurants, whose date in that column should always match today's date. Open to more efficient schema suggestions as well. Thanks to all!
There is a feature in SQL called BULK INSERT that can handle large volumes of data (note that this is SQL Server syntax; the PostgreSQL equivalent is the COPY command mentioned below):
BULK INSERT #temp
FROM 'file location path'
If you can change your Postgres settings, you could take advantage of parallelism in Postgres. Otherwise you could at least speed up the CSV upload using Postgres's bulk upload, otherwise known as the COPY command.
Without more details it's hard to give better advice.
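As a rough sketch of the COPY route from Python (psycopg2 is assumed, and the staging table, constraint, and file names are only the placeholders from the question), you load the file first and upsert afterwards:

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # connection details are an assumption
cur = conn.cursor()

# 1. Bulk-load the CSV into the staging table with COPY (much faster than row-by-row INSERTs).
cur.execute("TRUNCATE temptable")
with open("restaurants.csv") as f:  # hypothetical file name
    cur.copy_expert("COPY temptable FROM STDIN WITH (FORMAT csv, HEADER)", f)

# 2. Upsert from the staging table into restaurants, as in the question.
cur.execute(
    "INSERT INTO restaurants (name, food_id, street_id, datecreated, lastdayobservedopen) "
    "SELECT DISTINCT t.name, f.food_id, l.street_id, t.datecreated, t.lastdayobservedopen "
    "FROM temptable t "
    "JOIN typesoffood f ON f.food_type = t.food_type "
    "JOIN location l ON l.street_name = t.street_name "
    "ON CONFLICT ON CONSTRAINT restaurants_pk "
    "DO UPDATE SET lastdayobservedopen = EXCLUDED.lastdayobservedopen"
)
conn.commit()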

I delete an object of the model with pk=1, but the new object has pk=2 [duplicate]

I have a table with an auto-increment primary key. This table is meant to store millions of records, and I don't need to delete anything for now. The problem is that when new rows are inserted, because of some error, the auto-increment key leaves gaps in the ids. For example, after 5 the next id is 8, leaving a gap at 6 and 7. As a result, when I count the rows I get 28000, but the max id is 58000. What can be the reason? I am not deleting anything. And how can I fix this issue?
P.S. I am using INSERT IGNORE while inserting records so that it doesn't give an error when I try to insert a duplicate entry into a unique column.
This is by design and will always happen.
Why?
Let's take 2 overlapping transactions that are doing INSERTs:
Transaction 1 does an INSERT, gets the value (let's say 42), does more work
Transaction 2 does an INSERT, gets the value 43, does more work
Then
Transaction 1 fails. Rolls back. 42 stays unused
Transaction 2 completes with 43
If consecutive values were guaranteed, every transaction would have to happen one after the other. Not very scalable.
Also see Do Inserted Records Always Receive Contiguous Identity Values (SQL Server, but the same principle applies).
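A quick way to reproduce this from Python (mysql-connector-python is assumed; mytable, its columns, and the connection details are made up) is to interleave two transactions and roll one back:

import mysql.connector

# Two independent connections, i.e. two overlapping transactions.
conn1 = mysql.connector.connect(user="root", password="secret", database="test")
conn2 = mysql.connector.connect(user="root", password="secret", database="test")
cur1, cur2 = conn1.cursor(), conn2.cursor()

cur1.execute("INSERT INTO mytable (value) VALUES ('a')")  # reserves one id, say 42
cur2.execute("INSERT INTO mytable (value) VALUES ('b')")  # reserves the next id, 43

conn1.rollback()  # id 42 was consumed but will never appear in the table
conn2.commit()    # id 43 is committed, leaving a permanent gap at 42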
You can create a trigger to handle the auto-increment yourself:
CREATE DEFINER=`root`@`localhost` TRIGGER `mytable_before_insert` BEFORE INSERT ON `mytable` FOR EACH ROW
BEGIN
    SET NEW.id = (SELECT IFNULL(MAX(id), 0) + 1 FROM mytable);
END
This comes from InnoDB, the default storage engine of MySQL.
It really isn't a problem: the docs on “AUTO_INCREMENT Handling in InnoDB” explain that InnoDB initializes its in-memory auto-increment counter at startup.
And the query it uses is something like
SELECT MAX(ai_col) FROM t FOR UPDATE;
This improves concurrency without really having an effect on your data.
To avoid this behavior, use MyISAM instead of InnoDB as the storage engine.
Perhaps (I haven't tested this) a solution is to set innodb_autoinc_lock_mode to 0.
According to http://dev.mysql.com/doc/refman/5.7/en/innodb-auto-increment-handling.html this might make things a bit slower (if you perform inserts of multiple rows in a single query) but should remove gaps.
You can try an insert like:
insert ignore into table select (select max(id)+1 from table), "value1", "value2";
This will:
insert the new data with the last unused id (not relying on auto-increment);
ignore it if a duplicate entry is found in a unique field;
otherwise insert the new data normally.
(But this method does not update existing fields when a duplicate entry is found; see the sketch below for an alternative.)
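If the goal is also to update existing columns when a duplicate is hit, the usual MySQL alternative is INSERT ... ON DUPLICATE KEY UPDATE; a rough Python sketch, in which the table, columns, and connection details are all made up:

import mysql.connector

conn = mysql.connector.connect(user="root", password="secret", database="test")
cur = conn.cursor()

# Let MySQL assign the AUTO_INCREMENT id, and update the row if the unique key already exists.
cur.execute(
    "INSERT INTO mytable (unique_col, value1, value2) VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE value1 = VALUES(value1), value2 = VALUES(value2)",
    ("key-1", "value1", "value2"),
)
conn.commit()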

Postgres: autogenerate primary key in postgres using python

cursor.execute('UPDATE emp SET name = %(name)s',{"name": name} where ?)
I don't understand how to get the primary key of a particular record.
I have N records present in the DB. I want to access those records and
manipulate them.
Through a SELECT query I got all the records, but I want to update each of those records accordingly.
Can someone lend a helping hand?
Thanks in advance!
Table structure:
ID  CustomerName  ContactName
1   Alfreds       Futterkiste
2   Ana           Trujillo
Here ID is auto-generated by the system in Postgres.
I am accessing the CustomerName of two records and updating them. When I update
those records, the last update also overwrites the first record.
I want to add a condition so that each update query applies only to its own record.
Table structure after my update:
ID  CustomerName  ContactName
1   xyz           Futterkiste
2   xyz           Trujillo
Here I want to set the first record to 'abc' and the second record to 'xyz'.
Note: It will be done using the PK, but I don't know how to get that PK.
You mean you want to use the UPDATE SQL command with a WHERE clause:
cursor.execute("UPDATE emp SET CustomerName='abc' WHERE ID=1")
cursor.execute("UPDATE emp SET CustomerName='xyz' WHERE ID=2")
This way you will UPDATE rows with specific IDs.
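From Python, the same idea with parameters instead of hard-coded values (a sketch using psycopg2-style placeholders, as in the question; the connection details are made up):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # connection details are an assumption
cursor = conn.cursor()

# 'emp' and its columns are taken from the question; each update targets one row by its ID.
for name, pk in [("abc", 1), ("xyz", 2)]:
    cursor.execute(
        "UPDATE emp SET CustomerName = %(name)s WHERE ID = %(id)s",
        {"name": name, "id": pk},
    )
conn.commit()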
Maybe you won't like this, but you should not use autogenerated keys in general. The only exception is when you want to insert some rows and do not do anything else with them. The proper solution is this:
Create a sequence for your table: http://www.postgresql.org/docs/9.4/static/sql-createsequence.html
Whenever you need to insert a new row, get the next value from the sequence (using select nextval('sequence_name')). This way you will know the ID before you create the row.
Then insert your row, specifying the id value explicitly (a short Python sketch follows this list).
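A rough sketch of that flow with psycopg2 (the sequence name and connection details are made up):

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # connection details are an assumption
cur = conn.cursor()

# 1. Get the next id from the sequence before creating the row
#    ('emp_id_seq' is a hypothetical sequence name).
cur.execute("SELECT nextval('emp_id_seq')")
new_id = cur.fetchone()[0]

# 2. Insert the row with the id given explicitly, so the application knows it up front.
cur.execute(
    "INSERT INTO emp (ID, CustomerName, ContactName) VALUES (%s, %s, %s)",
    (new_id, "abc", "Futterkiste"),
)
conn.commit()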
For the updates:
You can create unique constraints (or unique indexes) on sets of columns that are known to be unique.
But you should identify the rows with the identifiers internally.
When referring records in other tables, use the identifiers, and create foreign key constraints. (Not always, but usually this is good practice.)
Now, when you need to update a row (for example: a customer), you should already know which customer needs to be modified. Because all records are identified by the primary key id, you should already know the id for that row. If you don't know it, but you have a unique index on a set of fields, then you can try to get the id. For example:
select id from emp where CustomerName='abc' -- but only if you have a unique constraint on CustomerName!
In general, if you want to update a single row, then you should NEVER update this way:
update emp set CustomerName='newname' where CustomerName='abc'
even if you have a unique constraint on CustomerName. The explanation is not easy and won't fit here. But think about this: you may be sending changes in a transaction block, and there can be many open transactions at the same time...
Of course, it is fine to update rows this way if your intention is to update all rows that satisfy your condition.
