SQL Query for filling a SINGLE column with values - python

I have an existing SQL database table I'm interacting with over pyodbc. I have written a class that uses pyodbc to interact with the database by performing reads and creating and deleting columns. The final bit of functionality I require is the ability to fill a created column (full of NULLs by default) with values from a python list (that I plan to iterate over and then finalize with db.commit()) - without having an effect on other columns or adding extra rows.
I tried the following query, iterated over in a for loop:
INSERT INTO table_name (required_column) VALUES (value)
Thus the class method:
def writeToColumn(self, columnName, tableName, writeData):
    for item in writeData:
        self.cursor.execute('INSERT INTO ' + tableName + ' (' + columnName + ') VALUES (' + item + ')')
    self.cursor.commit()
where value represents the current index value of the list.
But this adds an entire new row, filling the cells of the columns not mentioned with NULLs.
What I want to do is replace all of the data in a column without the other columns being affected in any way. Is there a query that can do this?
Thanks!

Not surprisingly, calling INSERT will always insert a new row, hence the name. If you need to update an existing row, you need to call UPDATE.
UPDATE table_name SET required_column=value WHERE ...
where the WHERE condition identifies your existing row somehow (probably via the primary key).
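For example, a minimal sketch of that pattern (using sqlite3 in place of pyodbc for illustration, and assuming a hypothetical id primary key that pairs each list value with its row):

```python
import sqlite3

# Illustration with sqlite3; the same parameterized UPDATE works via pyodbc.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (id INTEGER PRIMARY KEY, required_column TEXT)")
conn.executemany("INSERT INTO table_name (id) VALUES (?)", [(1,), (2,), (3,)])

values = ["a", "b", "c"]  # the Python list to write into the column

# Pair each value with the primary key of the row it should land in,
# then update row by row; parameter markers avoid SQL injection.
for row_id, value in zip([1, 2, 3], values):
    conn.execute(
        "UPDATE table_name SET required_column = ? WHERE id = ?",
        (value, row_id),
    )
conn.commit()
```

Parameter markers (?) also sidestep the string-concatenation problem in the original writeToColumn method.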


ON DUPLICATE KEY UPDATE non-index columns

I have code that updates a few MySQL tables with data coming from a Sybase database. The table structures are exactly the same.
Since the number of tables may increase in the future, I wrote a Python script that loops over an array of table names and, based on the number of columns in each of those tables, dynamically changes the insert statement:
'''insert into databaseName.{} ({}) values ({})'''.format(table, columns, parameters)
As you can see, the value placeholders are not hardcoded, which has caused this problem where I can't modify this query to do an "ON DUPLICATE KEY UPDATE".
For example, the insert statement may look like:
insert into databaseName.table_foo (col1,col2,col3,col4,col5) values (%s,%s,%s,%s,%s)
or
insert into databaseName.table_bar (col1,col2,col3) values (%s,%s,%s)
How can I use "ON DUPLICATE KEY UPDATE" here to update the non-index columns with their corresponding non-index values?
I can update this question with more details if needed.
The easiest solution is this:
'''replace into databaseName.{} ({}) values ({})'''.format(table, columns, parameters)
This works similarly to IODKU, in that if the values conflict with a PRIMARY KEY or UNIQUE KEY of the table, it replaces the row, overwriting the other columns, instead of causing a duplicate key error.
The difference is that REPLACE does a DELETE of the old row followed by an INSERT of the new row. Whereas IODKU does either an INSERT or an UPDATE. We know this because if you create triggers on the table, you'll see which triggers are activated.
Anyway, using REPLACE would make your task a lot simpler in this case.
If you must use IODKU, you would need to append more syntax to the end of the statement. Unfortunately, there is no syntax for "assign all the columns respectively to the new row's values"; you must assign them individually.
For MySQL 8.0.19 or later use this syntax:
INSERT INTO t1 (a,b,c) VALUES (?,?,?) AS new
ON DUPLICATE KEY UPDATE a = new.a, b = new.b, c = new.c;
In earlier MySQL, use this syntax:
INSERT INTO t1 (a,b,c) VALUES (?,?,?)
ON DUPLICATE KEY UPDATE a = VALUES(a), b = VALUES(b), c = VALUES(c);
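Since the question builds its INSERT dynamically anyway, the IODKU clause can be generated the same way; a sketch of a hypothetical helper that emits either syntax from a column list:

```python
# Hypothetical helper: build an IODKU statement dynamically from a column
# list, matching the dynamic INSERT the question already generates.
def iodku_statement(table, columns, mysql_8_0_19=True):
    cols = ", ".join(columns)
    params = ", ".join(["%s"] * len(columns))
    if mysql_8_0_19:
        # MySQL 8.0.19+ row-alias syntax
        updates = ", ".join(f"{c} = new.{c}" for c in columns)
        return (f"INSERT INTO {table} ({cols}) VALUES ({params}) AS new "
                f"ON DUPLICATE KEY UPDATE {updates}")
    # Earlier MySQL: the VALUES() function syntax
    updates = ", ".join(f"{c} = VALUES({c})" for c in columns)
    return (f"INSERT INTO {table} ({cols}) VALUES ({params}) "
            f"ON DUPLICATE KEY UPDATE {updates}")

print(iodku_statement("databaseName.table_bar", ["col1", "col2", "col3"]))
```

The generated string is then passed to the driver together with the parameter tuple, exactly like the existing dynamic INSERT.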

Best way to perform bulk insert SQLAlchemy

I have a table called products
which has the following columns:
id, product_id, data, activity_id
What I am essentially trying to do is copy a bulk of existing products, update their activity_id, and create new entries in the products table.
Example:
I already have 70 existing entries in products with activity_id 2
Now I want to create another 70 entries with the same data, except for an updated activity_id.
I could have thousands of existing entries that I'd like to copy, updating the copied entries' activity_id to a new id.
products = self.session.query(model.Products).filter(filter1, filter2).all()
This returns all the existing products for a filter.
Then I iterate through the products, clone each existing product, and just update the activity_id field:
for product in products:
    product.activity_id = new_id
self.uow.skus.bulk_save_objects(simulation_skus)
self.uow.flush()
self.uow.commit()
What is the best/fastest way to perform these bulk inserts? The performance is OK as of now, but is there a better solution?
You don't need to load these objects locally, all you really want to do is have the database create these rows.
You essentially want to run a query that creates the rows from the existing rows:
INSERT INTO product (product_id, data, activity_id)
SELECT product_id, data, 2 -- the new activity_id value
FROM product
WHERE activity_id = old_id
The above query would run entirely on the database server; this is far preferable over loading your query into Python objects, then sending all the Python data back to the server to populate INSERT statements for each new row.
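A minimal sketch of that INSERT ... SELECT pattern, using sqlite3 and made-up data for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE product (
    id INTEGER PRIMARY KEY,
    product_id INTEGER, data TEXT, activity_id INTEGER)""")
conn.executemany(
    "INSERT INTO product (product_id, data, activity_id) VALUES (?, ?, ?)",
    [(10, "x", 1), (11, "y", 1)],
)

# Copy every activity_id=1 row as a new row with activity_id=2,
# entirely inside the database engine; no rows travel to Python.
conn.execute("""
    INSERT INTO product (product_id, data, activity_id)
    SELECT product_id, data, 2
    FROM product
    WHERE activity_id = ?
""", (1,))
conn.commit()
```

Both original rows are duplicated server-side with the new activity_id, which is exactly what the SQLAlchemy construction below generates.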
Queries like that are something you can do with SQLAlchemy core, the half of the API that deals with generating SQL statements. However, you can use a query built from a declarative ORM model as a starting point. You'd need to:

1. Access the Table instance for the model, as that then lets you create an INSERT statement via the Table.insert() method. (You could also get the same object from the models.Product query; more on that later.)
2. Access the statement that would normally fetch the data for your Python instances for your filtered models.Product query; you can do so via the Query.statement property.
3. Update the statement to replace the included activity_id column with your new value, and remove the primary key (I'm assuming that you have an auto-incrementing primary key column).
4. Apply that updated statement to the Insert object for the table via Insert.from_select().
5. Execute the generated INSERT INTO ... FROM ... query.
Step 1 can be achieved by using the SQLAlchemy introspection API; the inspect() function, applied to a model class, gives you a Mapper instance, which in turn has a Mapper.local_table attribute.
Steps 2 and 3 require a little juggling with the Select.with_only_columns() method to produce a new SELECT statement where we swap out the column. You can't easily remove a column from a select statement, but we can loop over the existing columns in the query to 'copy' them across to the new SELECT, making our replacement at the same time.
Step 4 is then straightforward: Insert.from_select() needs the columns that are inserted and the SELECT query, and the SELECT object we built gives us its columns too.
Here is the code for generating your INSERT; the **replace keyword arguments are the columns you want to replace when inserting:
from sqlalchemy import inspect, literal
from sqlalchemy.sql import ClauseElement

def insert_from_query(model, query, **replace):
    # The SQLAlchemy core definition of the table
    table = inspect(model).local_table

    # and the underlying core select statement to source new rows from
    select = query.statement

    # validate assumptions: make sure the query produces rows from the above table
    assert table in select.froms, f"{query!r} must produce rows from {model!r}"
    assert all(c.name in select.columns for c in table.columns), f"{query!r} must include all {model!r} columns"

    # updated select, replacing the indicated columns
    as_clause = lambda v: literal(v) if not isinstance(v, ClauseElement) else v
    replacements = {name: as_clause(value).label(name) for name, value in replace.items()}
    from_select = select.with_only_columns([
        replacements.get(c.name, c)
        for c in table.columns
        if not c.primary_key
    ])
    return table.insert().from_select(from_select.columns, from_select)
I included a few assertions about the model and query relationship, and the code accepts arbitrary column clauses as replacements, not just literal values. You could use func.max(models.Product.activity_id) + 1 as a replacement value (wrapped as a subselect), for example.
The above function executes steps 1-4, producing the desired INSERT SQL statement when printed (I created a products model and query that I thought might be representative):
>>> print(insert_from_query(models.Product, products, activity_id=2))
INSERT INTO products (product_id, data, activity_id) SELECT products.product_id, products.data, :param_1 AS activity_id
FROM products
WHERE products.activity_id != :activity_id_1
All you have to do is execute it:
insert_stmt = insert_from_query(models.Product, products, activity_id=2)
self.session.execute(insert_stmt)

Update a row with a specific id

id is the first column of my Sqlite table.
row is a list or tuple with the updated content, with the columns in the same order as in the database.
How can I do an update command with:
c.execute('update mytable set * = ? where id = ?', row)
without hardcoding all the column names? (I'm in prototyping phase, and this is often subject to change, that's why I don't want to hardcode the column names now).
Obviously * = ? is incorrect; how should I modify it?
Also, having where id = ? at the end of the query expects id to be the last element of row; however, it's the first element (because, again, row elements use the same column order as the database itself, and id is the first column).
You could extract the column names using the table_info PRAGMA. This will have the column names in order. You could then build the statement in parts and finally combine them.
e.g. for a table defined with :-
CREATE TABLE "DATA" ("idx" TEXT,"status" INTEGER,"unit_val" TEXT DEFAULT (null) );
Then
PRAGMA table_info (data);
returns one row per table column, with the fields cid, name, type, notnull, dflt_value and pk; it's the name field that you want to extract.
You may be interested in - PRAGMA Statements
An alternative approach would be to extract the CREATE SQL from sqlite_master. However, that would require more complex code to extract the column names.
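A minimal sketch of the pragma-based approach (the table and row data are hypothetical), building the SET clause from the pragma's name field and moving id to the end of the parameter tuple to match the WHERE clause:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, status INTEGER, unit_val TEXT)")
conn.execute("INSERT INTO mytable VALUES (1, 0, 'old')")

# Column names, in table order, from the table_info pragma (field 1 is the name)
cols = [r[1] for r in conn.execute("PRAGMA table_info(mytable)")]

row = (1, 5, "new")  # same column order as the table, id first

# Build "col = ?" for every non-id column, then move id to the end
# of the parameter tuple so it lines up with the trailing WHERE id = ?
assignments = ", ".join(f"{c} = ?" for c in cols if c != "id")
sql = f"UPDATE mytable SET {assignments} WHERE id = ?"
params = row[1:] + (row[0],)
conn.execute(sql, params)
conn.commit()
```

No column name is hardcoded: if the table gains a column, the pragma picks it up and the statement adapts.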

Copy row from Cassandra database and then insert it using Python

I'm using the DataStax Python Driver for Apache Cassandra.
I want to read 100 rows from the database and then insert them again after changing one value. I do not want to lose the previous records.
I know how to get my rows:
rows = session.execute('SELECT * FROM columnfamily LIMIT 100;')
for myrecord in rows:
    print(myrecord.timestamp)
I know how to insert new rows into database:
stmt = session.prepare('''
    INSERT INTO columnfamily (rowkey, qualifier, info, act_date, log_time)
    VALUES (?, ?, ?, ?, ?)
    IF NOT EXISTS
''')
results = session.execute(stmt, [arg1, arg2, ...])
My problems are that:
I do not know how to change only one value in a row.
I don't know how to insert rows into the database without writing the CQL by hand. My column family has more than 150 columns, and writing all their names in a query does not seem like the best idea.
To conclude:
Is there a way to get the rows, modify one value in each of them, and then insert these rows back into the database without writing out the full CQL?
First, you need to select only the needed columns from Cassandra - it will be faster to transfer the data. You need to include all columns of the primary key plus the column that you want to change.
After you get the data, you can use the UPDATE command to update only the necessary column (example from the documentation):
UPDATE cycling.cyclist_name
SET comments = 'Rides hard, gets along with others, a real winner'
WHERE id = fb372533-eb95-4bb4-8685-6ef61e994caa;
You can also use a prepared statement to make it more performant.
But be careful - UPDATE & INSERT in CQL are really UPSERTs, so if you change columns that are part of the primary key, it will create a new entry...
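With 150+ columns, the UPDATE statement itself can also be generated rather than written out by hand; a sketch of a hypothetical helper that only builds the CQL string you would pass to session.prepare() (the driver calls themselves are not shown):

```python
# Hypothetical helper: build an UPDATE statement for one non-key column,
# keyed by the primary-key columns, for use with session.prepare().
def build_update_cql(table, primary_key_cols, column_to_change):
    where = " AND ".join(f"{c} = ?" for c in primary_key_cols)
    return f"UPDATE {table} SET {column_to_change} = ? WHERE {where}"

cql = build_update_cql("columnfamily", ["rowkey", "qualifier"], "info")
print(cql)
# UPDATE columnfamily SET info = ? WHERE rowkey = ? AND qualifier = ?
```

You would then prepare it once and execute it for each row, binding the new value followed by that row's primary-key values.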

Python: sqlite3 - how to speed up updating of the database

I have a database, which I store as a .db file on my disk. I implemented all the functions necessary for managing this database using sqlite3. However, I noticed that updating the rows in the table takes a large amount of time. My database currently has 608042 rows. The database has one table - let's call it Table1. This table consists of the following columns:
id | name | age | address | job | phone | income
(the id value is generated automatically when a row is inserted into the database).
After reading in all the rows I perform some operations (ML algorithms for predicting the income) on the values from the rows, and next I have to update (for each row) the value of income (thus, for each of the 608042 rows I perform an SQL update operation).
In order to update, I'm using the following function (copied from my class):
def update_row(self, new_value, idkey):
    update_query = "UPDATE Table1 SET income = ? WHERE name = ?"
    self.cursor.execute(update_query, (new_value, idkey))
    self.db.commit()
And I call this function for each person registered in the database.
for each i out of 608042 rows:
    update_row(new_income_i, i.name)
(values of new_income_i are different for each i).
This takes a huge amount of time, even though the dataset is not giant. Is there any way to speed up the updating of the database? Should I use something else than sqlite3? Or should I instead of storing the database as a .db file store it in memory (using sqlite3.connect(":memory:"))?
Each UPDATE statement must scan the entire table to find any row(s) that match the name.
An index on the name column would prevent this and make the search much faster. (See Query Planning and How does database indexing work?)
However, if the name column is not unique, then that value is not even suitable to find individual rows: each update with a duplicate name would modify all rows with the same name. So you should use the id column to identify the row to be updated; and as the primary key, this column already has an implicit index.
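A minimal sketch of the id-based approach, combined with executemany() and a single commit instead of one commit per row (the table and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Table1 (
    id INTEGER PRIMARY KEY,  -- the primary key carries an implicit index
    name TEXT, income REAL)""")
conn.executemany("INSERT INTO Table1 (name, income) VALUES (?, ?)",
                 [("alice", 0.0), ("bob", 0.0)])

# One prepared UPDATE executed for many (new_income, id) pairs, all inside
# a single transaction; the WHERE uses id, so no full-table scan per row.
predictions = [(50000.0, 1), (62000.0, 2)]
conn.executemany("UPDATE Table1 SET income = ? WHERE id = ?", predictions)
conn.commit()
```

Batching all updates into one transaction avoids paying the per-commit fsync cost 608042 times, which is usually the dominant overhead in this pattern.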
