G'day,
I'm starting to work with a SQL database in Python, and I want to have multiple tables where rows can reference rows in each other's tables by ID. For that I'm using a column "testID integer PRIMARY KEY".
The column increases in value as desired, but if I delete the row with the maximum ID and then add another entry, the new row receives an ID that was already used before. That happens because deleting the most recent row lowers the maximum ID in the column, which makes sense to me.
I was wondering: is there a way to have the database remember every ID that was ever assigned and never hand out the same ID twice, even when the row with the maximum ID is deleted?
A minimal working example (MWE) to make that clearer:
import sqlite3

conn = sqlite3.connect("db_problem.db")
c = conn.cursor()
with conn:
    c.execute("CREATE TABLE test ("
              "testID integer PRIMARY KEY, "
              "col1 integer)")
    c.execute("INSERT INTO test (col1) VALUES (1)")
    c.execute("INSERT INTO test (col1) VALUES (2)")
    c.execute("DELETE FROM test WHERE col1 = 1")
    c.execute("SELECT testID FROM test WHERE col1 = 2")
    print(c.fetchall())  # still 2, even though only one row is left, which works fine
    c.execute("INSERT INTO test (col1) VALUES (3)")
    c.execute("DELETE FROM test WHERE col1 = 3")
    c.execute("INSERT INTO test (col1) VALUES (4)")
    c.execute("SELECT testID FROM test WHERE col1 = 4")
    print(c.fetchall())  # testID is 3 here, although this is the fourth entry ever made
From SQLite Autoincrement:
If the AUTOINCREMENT keyword appears after INTEGER PRIMARY KEY, that
changes the automatic ROWID assignment algorithm to prevent the reuse
of ROWIDs over the lifetime of the database. In other words, the
purpose of AUTOINCREMENT is to prevent the reuse of ROWIDs from
previously deleted rows.
So create the table with this statement:
c.execute("CREATE TABLE test ("
          "testID integer PRIMARY KEY AUTOINCREMENT, "
          "col1 integer)")
But there is another part of the documentation which you must consider:
The AUTOINCREMENT keyword imposes extra CPU, memory, disk space, and
disk I/O overhead and should be avoided if not strictly needed.
The choice is yours.
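If you do opt for AUTOINCREMENT, rerunning the MWE above gives the behaviour you asked for. A short sketch of its tail end, which also peeks at the internal sqlite_sequence table where SQLite keeps the high-water mark:
c.execute("INSERT INTO test (col1) VALUES (3)")
c.execute("DELETE FROM test WHERE col1 = 3")
c.execute("INSERT INTO test (col1) VALUES (4)")
c.execute("SELECT testID FROM test WHERE col1 = 4")
print(c.fetchall())  # [(4,)] - the deleted ID 3 is not reused
c.execute("SELECT seq FROM sqlite_sequence WHERE name = 'test'")
print(c.fetchone())  # (4,) - the largest testID ever assigned to the table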
I'd like to INSERT selected records from Table A into Table B (in this example, different "databases" == different tables, so as not to worry about ATTACH), where Table A has fewer columns than Table B. The additional B_Table column (col3) should also be populated.
I've tried this sequence in raw SQL (through SQLAlchemy):
1.) INSERTing A_Table into B_Table using an engine.connect().execute(text):
text("INSERT INTO B_Table (col1, col2) SELECT col1, col2 FROM A_Table")
2.) UPDATEing B_Table's col3 with an engine.connect()ion (all newly inserted records are populated/updated with the same identifier, NewInfo):
text("UPDATE B_Table SET col3 = NewInfo WHERE B_Table.ID >= %s" % (starting_ID#_of_INSERT'd_records))
More efficient alternative?
But this is incredibly inefficient: the single-column UPDATE takes 4x longer than the INSERT, when it seems like it should be a fraction of the INSERT time. I'd like to reduce the total time to roughly just the insertion time.
What's a better way to copy data from one table to another without an INSERT followed by an UPDATE? I was considering:
1.) a SQLAlchemy session.query(A_Table), but I wasn't sure how to edit that object (to set col3) and then insert the updated object without loading all the queried A_Table data into RAM (which, as I understand it, raw SQL's INSERT does not do).
You can use 'NewInfo' as a string literal in the SELECT statement:
INSERT INTO B_Table (col1, col2, col3)
SELECT col1, col2, 'NewInfo'
FROM A_Table;
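If you're issuing this through SQLAlchemy like your two-step version, here is a minimal sketch of the same single statement, assuming an already-configured engine (table and column names are taken from your post):

from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///example.db")  # assumed connection URL

with engine.begin() as conn:
    # one INSERT ... SELECT copies the rows and fills col3 in the same pass
    conn.execute(text(
        "INSERT INTO B_Table (col1, col2, col3) "
        "SELECT col1, col2, 'NewInfo' FROM A_Table"
    ))

Since the second UPDATE pass disappears entirely, the total time should come down to roughly the plain INSERT time you measured.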
I create a table with a primary key and AUTOINCREMENT.
import sqlite3

with open('RAND.xml', "rb") as f, sqlite3.connect("race.db") as connection:
    c = connection.cursor()
    c.execute("""CREATE TABLE IF NOT EXISTS race(
                     RaceID INTEGER PRIMARY KEY AUTOINCREMENT,
                     R_Number INT, R_KEY INT, R_NAME TEXT, R_AGE INT,
                     R_DIST TEXT, R_CLASS TEXT, M_ID INT)""")
I then want to insert a tuple which, of course, has one fewer value than the table has columns, because the first column is the autoincrement:
sql_data = tuple(b)
c.executemany('insert into race values(?,?,?,?,?,?,?)', sql_data)
How do I stop this error?
sqlite3.OperationalError: table race has 8 columns but 7 values were supplied
It's extremely bad practice to assume a specific ordering of the columns: some DBA might come along and modify the table, breaking your SQL statements. Besides that, an autoincrement value is only used if you don't supply a value for the field in your INSERT statement; if you give a value, that value is stored in the new row.
If you amend the code to read
c.executemany('''insert into
                 race(R_Number, R_KEY, R_NAME, R_AGE, R_DIST, R_CLASS, M_ID)
                 values(?,?,?,?,?,?,?)''',
              sql_data)
you should find that everything works as expected.
From the SQLite documentation:
If the column-name list after table-name is omitted then the number of values inserted into each row must be the same as the number of columns in the table.
RaceID is a column in the table, so it is expected to be present when you do an INSERT without explicitly naming the columns. You can get the desired behavior (assigning RaceID the next autoincrement value) by passing an SQLite NULL value in that column, which in Python is None:
sql_data = tuple((None,) + a for a in b)
c.executemany('insert into race values(?,?,?,?,?,?,?,?)', sql_data)
The above assumes b is a sequence of sequences of parameters for your executemany statement and attempts to prepend None to each sub-sequence. Modify as necessary for your code.
Can we autoincrement a string in sqlite3? If not, how can we do that?
Example:
RY001
RY002
...
With Python I can do it easily with print("RY" + str(rowid + 1)), but what about its performance?
Thank you
If your version of SQLite is 3.31.0+ you can have a generated column, stored or virtual:
CREATE TABLE tablename(
id INTEGER PRIMARY KEY AUTOINCREMENT,
str_id TEXT GENERATED ALWAYS AS (printf('RY%03d', id)),
<other columns>
);
The column id is declared as the primary key of the table and AUTOINCREMENT makes sure that no missing id value (because of deletions) will ever be reused.
The column str_id will be generated after each new row is inserted, as 'RY' concatenated with the value of id left-padded with zeros to 3 digits.
As it is, str_id will be VIRTUAL, meaning that it will be computed every time you query the table.
If you add STORED to its definition:
str_id TEXT GENERATED ALWAYS AS (printf('RY%03d', id)) STORED
it will be stored in the table.
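A quick sketch of how this behaves from Python, assuming SQLite 3.31+ and a sample name column standing in for <other columns>:

import sqlite3

conn = sqlite3.connect(":memory:")
print(sqlite3.sqlite_version)  # must be 3.31.0 or newer for generated columns
c = conn.cursor()
c.execute("""CREATE TABLE tablename(
                 id INTEGER PRIMARY KEY AUTOINCREMENT,
                 str_id TEXT GENERATED ALWAYS AS (printf('RY%03d', id)),
                 name TEXT)""")
c.execute("INSERT INTO tablename (name) VALUES (?)", ("first",))
c.execute("INSERT INTO tablename (name) VALUES (?)", ("second",))
print(c.execute("SELECT str_id, name FROM tablename").fetchall())
# [('RY001', 'first'), ('RY002', 'second')]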
Something like this?
select printf('RY%03d', rowid) as "id", *
from myTable;
I have a database where I store some values with an auto-generated index key. I also have an n:m mapping table like this:
create table data(id int not null identity(1,1), col1 int not null, col2 varchar(256) not null);
create table otherdata(id int not null identity(1,1), value varchar(256) not null);
create table data_map(dataid int not null, otherdataid int not null);
Every day the data table needs to be updated with a list of new values. A lot of them are already present and only need to be inserted into data_map (the key in otherdata is generated each time, so in that table the data is always new).
One way of doing it would be to first try to insert all values, then select the generated ids, then insert into data_map:
mydata = []  # list of tuples
cursor.executemany("if not exists (select * from data where col1 = %d and col2 = %d) "
                   "insert into data (col1, col2) values (%d, %d)", mydata)
# now select the ids
# [...]
But that is obviously quite bad, because I need to select everything without using the key, and the existence check can't use the key either, so I'd need the data indexed first; otherwise everything is very slow.
My next approach was to use a hash function (like MD5 or CRC64) to compute my own hash over col1 and col2, so that I can insert all values without a prior SELECT and use the indexed key when inserting the missing values.
Can this be optimized, or is it the best I can do?
The number of rows is >500k per change, of which maybe ~20-50% are already in the database.
Timing-wise, calculating the hashes looks much faster than inserting the data into the database.
As far as I can tell, you are using mysql.connector. If so, when you run cursor.execute() you should not use %d placeholders. Use %s for everything, and the connector will handle the type conversions.
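A minimal sketch of that change, keeping the statement from the question and only swapping the placeholders (note that each parameter row then has to supply col1 and col2 twice, once for the existence check and once for the insert):

# hypothetical sample rows: (col1, col2, col1, col2) per record
mydata = [(1, 'a', 1, 'a'), (2, 'b', 2, 'b')]
cursor.executemany(
    "if not exists (select * from data where col1 = %s and col2 = %s) "
    "insert into data (col1, col2) values (%s, %s)",
    mydata)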
I've got a table into which I want to insert around 1000 items per query, and I need their PKs after creation, for later use as FKs in other tables.
I've tried inserting them using the RETURNING syntax in PostgreSQL, but the insert takes around 10 seconds:
INSERT INTO table_name (col1, col2, col3) VALUES (a1,a2,a3)....(a(n-2),a(n-1),a(n)) RETURNING id;
By removing RETURNING I get much better performance, ~50 ms.
I think that if I could get an atomic operation that fetches the first id and inserts the rows at the same time, I could keep the performance of the version without RETURNING, but I don't know whether that is possible.
Generate the ids using nextval:
http://www.postgresql.org/docs/9.1/static/sql-createsequence.html
CREATE TEMP TABLE temp_val AS (
    VALUES (nextval('table_name_id_seq'), a1, a2, a3),
           (nextval('table_name_id_seq'), a1, a2, a3)
);

INSERT INTO table_name (id, col1, col2, col3)
SELECT column1, column2, column3, column4
FROM temp_val;

SELECT column1 FROM temp_val;
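A sketch of the same idea driven from Python with psycopg2 (the sequence name table_name_id_seq is the PostgreSQL default for a serial id column; adjust to your schema): reserve the ids with nextval first, then insert with explicit ids, so RETURNING is no longer needed.

import psycopg2

conn = psycopg2.connect("dbname=mydb")  # hypothetical connection settings
cur = conn.cursor()
rows = [("a1", "a2", "a3"), ("b1", "b2", "b3")]  # sample data

# reserve one id per row up front
cur.execute("SELECT nextval('table_name_id_seq') FROM generate_series(1, %s)",
            (len(rows),))
ids = [r[0] for r in cur.fetchall()]

# the ids are known before the INSERT, so they can be used as FKs right away
args = [(i,) + row for i, row in zip(ids, rows)]
cur.executemany("INSERT INTO table_name (id, col1, col2, col3) "
                "VALUES (%s, %s, %s, %s)", args)
conn.commit()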