I need to insert/update bulk rows via SQLAlchemy and get the inserted rows back.
I tried to do it with session.execute:
>>> posts = db.session.execute(Post.__table__.insert(), [{'title': 'dfghdfg', 'content': 'sdfgsdf', 'topic': topic}]*2)
>>> posts.fetchall()
ResourceClosedError Traceback (most recent call last)
And with engine:
In [17]: conn = db.engine.connect()
In [18]: result = conn.execute(Post.__table__.insert(), [{'title': 'title', 'content': 'content', 'topic': topic}]*2)
In [19]: print result.fetchall()
ResourceClosedError: This result object does not return rows. It has been closed automatically.
In both cases the response is the same: the result object has been closed automatically. How do I prevent that?
First answer - on "preventing automatic closing".
For an insert, SQLAlchemy runs the DBAPI execute() or executemany() call and does not issue any select queries.
So the exception you got is expected behavior. The ResultProxy object returned after the insert query wraps a DB-API cursor that doesn't allow .fetchall() on it. Once .fetchall() fails, the ResultProxy surfaces the exception you saw.
The only information you can get after an insert/update/delete operation is the number of affected rows or the value of an auto-incremented primary key (depending on the database and database driver).
If your goal is to get this kind of information, consider checking ResultProxy methods and attributes such as the following (see the sketch after the list):
.inserted_primary_key
.last_inserted_params()
.lastrowid
etc
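For example, a minimal sketch using the Post table from the question (the column values are made up, and which attributes are populated depends on the database and driver):
# single-row insert: the generated primary key is available on the result
result = db.session.execute(
    Post.__table__.insert(),
    {'title': 'title', 'content': 'content', 'topic': topic}
)
print(result.inserted_primary_key)   # primary key generated for that single row

# multi-row insert goes through executemany(); only the row count is reliable
result = db.session.execute(
    Post.__table__.insert(),
    [{'title': 'title', 'content': 'content', 'topic': topic}] * 2
)
print(result.rowcount)               # number of rows the executemany() affected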
Second answer - on "how to do bulk insert/update and get resulting rows".
There is no way to load the inserted rows while doing a single insert query through the DBAPI. The SQLAlchemy SQL Expression API you are using for bulk inserts/updates doesn't provide such functionality either.
SQLAlchemy runs a DBAPI executemany() call and relies on the driver's implementation. See this section of the documentation for details.
A solution would be to design your table so that every record has a natural key (a combination of column values that identifies a record uniquely). Then insert/update/select queries are able to target one record.
After doing that, you can do the bulk insert/update first and then select the rows by their natural key, so you don't need to know the auto-incremented primary key values.
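A rough sketch of that approach, assuming for illustration that (title, topic) happens to uniquely identify a Post (a made-up natural key):
rows = [{'title': 'first', 'content': '...', 'topic': topic},
        {'title': 'second', 'content': '...', 'topic': topic}]

# bulk insert first (one executemany() call) ...
post_table = Post.__table__
db.session.execute(post_table.insert(), rows)

# ... then re-select the rows by their natural key to learn the generated ids
posts = db.session.execute(
    post_table.select()
    .where(post_table.c.title.in_([r['title'] for r in rows]))
    .where(post_table.c.topic == topic)
).fetchall()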
Another option: maybe you can use the SQLAlchemy Object Relational API for creating objects -- then SQLAlchemy may optimize the insert into one executemany() call for you. It worked for me while using Oracle DB.
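Roughly like this (a sketch using the ORM; whether the flush is batched into a single executemany() depends on the dialect and driver):
posts = [Post(title='first', content='...', topic=topic),
         Post(title='second', content='...', topic=topic)]
db.session.add_all(posts)
db.session.commit()
# after the flush/commit each object carries its database identity,
# e.g. posts[0].id (assuming the primary key column is named id)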
There won't be any optimization for updates out of the box. Check this SO question for efficient bulk update ideas.
Related
None of the "similar questions" really get at this specific topic, but I am trying to find out how SQLAlchemy's Session handles transactions, when:
Passing raw SQL text to the execute() method, rather than utilizing any SQLAlchemy model objects, AND
The raw SQL text contains multiple distinct commands.
For instance:
bulk_operation = """
DELETE FROM the_table WHERE id = ...;
INSERT INTO the_table (id, name) VALUES (...);
"""
sql = text(bulk_operation)
session.execute(sql.bindparams(id=foo, name=bar))
The goal here is to restore the original state, if either the DELETE or the INSERT fails for any reason.
But does Session.execute() actually guarantee this in this context? Is it necessary to include BEGIN and COMMIT commands within the raw SQL text itself, or to manage it from the Python level with session.commit() or something else? Thanks in advance!
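For example, the most defensive version I can think of would be to manage the transaction explicitly from Python (just a sketch; part of what I'm asking is whether this is even necessary):
try:
    session.execute(sql.bindparams(id=foo, name=bar))
    session.commit()      # both the DELETE and the INSERT become permanent together
except Exception:
    session.rollback()    # neither statement is kept if anything failed
    raise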
I have worked in Perl, where I am able to get the newly created data object's ID by passing the result back to a variable. For example:
my $data_obj = $schema->resultset('PersonTable')->create(\%psw_rec_hash);
Where the $data_obj contains the primary key's column value.
I want to be able to do the same thing using Python 3.7, Flask and flask-mysqldb, but without having to do another query. I want to be able to use the specific record's primary key column value for another method.
Python and flask-mysqldb inserts data like so:
query = "INSERT INTO PersonTable (fname, mname, lname) VALUES('Phil','','Vil')
cursor = db.connection.cursor()
cursor.execute(query)
db.connection.commit()
cursor.close()
The PersonTable has a primary key column called id. So the newly inserted data row would look like:
23, 'Phil', 'Vil'
Because there are 22 rows of data before the last inserted row, I don't want to perform a search for the data, because there could be more than one entry with the same data. However, all I want is the most recent data row.
Can I do something similar to Perl with python 3.7 and flask-mysqldb?
You may want to consider the Flask-SQLAlchemy package to help you with this.
Although the syntax is going to be slightly different from Perl, what you can do is assign the model object to a variable when you create it. Then, when you either flush or commit the database session, you can read the primary key attribute on that model object (whether it's "id" or something else) and use it as needed.
SQLAlchemy supports MySQL, as well as several other relational databases. In addition, it is able to help prevent SQL injection attacks so long as you use model objects and add/delete them to your database session, as opposed to straight SQL commands.
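A minimal sketch of what that looks like with Flask-SQLAlchemy (the Person model, its columns, and the connection URI are made up to mirror the question's table):
from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config['SQLALCHEMY_DATABASE_URI'] = 'mysql://user:password@localhost/mydb'
db = SQLAlchemy(app)

class Person(db.Model):
    __tablename__ = 'PersonTable'
    id = db.Column(db.Integer, primary_key=True)
    fname = db.Column(db.String(50))
    mname = db.Column(db.String(50))
    lname = db.Column(db.String(50))

person = Person(fname='Phil', mname='', lname='Vil')
db.session.add(person)
db.session.commit()     # the INSERT happens here
print(person.id)        # e.g. 23 -- no second query needed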
I am using Python to perform basic ETL to transfer records from a MySQL database to a Postgres database. I am using the following code to commence the transfer:
Python code:
source_cursor = source_cnx.cursor()
source_cursor.execute(query.extract_query)
data = source_cursor.fetchall()
source_cursor.close()

# load data into warehouse db
if data:
    target_cursor = target_cnx.cursor()
    #target_cursor.execute("USE {};".format(datawarehouse_name))
    target_cursor.executemany(query.load_query, data)
    print('data loaded to warehouse db')
    target_cursor.close()
else:
    print('data is empty')
MySQL Extract (extract_query):
SELECT `tbl_rrc`.`id`,
`tbl_rrc`.`col_filing_operator`,
`tbl_rrc`.`col_medium`,
`tbl_rrc`.`col_district`,
`tbl_rrc`.`col_type`,
DATE_FORMAT(`tbl_rrc`.`col_timestamp`, '%Y-%m-%d %T.%f') as `col_timestamp`
from `tbl_rrc`
PostgreSQL (load_query):
INSERT INTO geo_data_staging.tbl_rrc
(id,
col_filing_operator,
col_medium,
col_district,
col_type,
col_timestamp)
VALUES
(%s,%s,%s,%s,%s,%s);
Of note, there is a PK constraint on Id.
The problem is while I have no errors, I'm not seeing any of the records in the target table. I tested this by manually inserting a record, then running again. The code errored out violating PK constraint. So I know it's finding the table.
Any idea what I could be missing? I would greatly appreciate it.
Using psycopg2, you have to call commit() on your connection in order for transactions to be committed. If you just call close(), the transaction will implicitly roll back.
There are a couple of exceptions to this. You can set the connection to autocommit. You can also use the connection inside a with block, which will automatically commit if the block doesn't throw any exceptions.
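Applied to the code above, that means committing on the target connection after executemany() (a sketch reusing target_cnx, query.load_query and data from the question):
if data:
    target_cursor = target_cnx.cursor()
    target_cursor.executemany(query.load_query, data)
    target_cnx.commit()      # make the inserted rows permanent
    target_cursor.close()
    print('data loaded to warehouse db')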
I am trying to learn how to use peewee with mysql.
I have an existing database on a mysql server with an existing table. The table is currently empty (I am just testing right now).
>>> db = MySQLDatabase('nhl', user='root', passwd='blahblah')
>>> db.connect()
>>> class schedule(Model):
... date = DateField()
... team = CharField()
... class Meta:
... database = db
>>> test = schedule.select()
>>> test
<class '__main__.schedule'> SELECT t1.`id`, t1.`date`, t1.`team` FROM `nhl` AS t1 []
>>> test.get()
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.6/site-packages/peewee.py", line 1408, in get
return clone.execute().next()
File "/usr/lib/python2.6/site-packages/peewee.py", line 1437, in execute
self._qr = QueryResultWrapper(self.model_class, self._execute(), query_meta)
File "/usr/lib/python2.6/site-packages/peewee.py", line 1232, in _execute
return self.database.execute_sql(sql, params, self.require_commit)
File "/usr/lib/python2.6/site-packages/peewee.py", line 1602, in execute_sql
res = cursor.execute(sql, params or ())
File "/usr/lib64/python2.6/site-packages/MySQLdb/cursors.py", line 201, in execute
self.errorhandler(self, exc, value)
File "/usr/lib64/python2.6/site-packages/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
_mysql_exceptions.OperationalError: (1054, "Unknown column 't1.id' in 'field list'")
Why is peewee adding the 'id' column into the select query? I do not have an id column in the table that already exists in the database. I simply want to work with the existing table and not depend on peewee having to create one every time I want to interact with the database. This is where I believe the error is.
The result of the query should be empty since the table is empty but since I am learning I just wanted to try out the code. I appreciate your help.
EDIT
Based on the helpful responses by Wooble and Francis, I have come to wonder whether it even makes sense for me to use peewee or another ORM like SQLAlchemy. What are the benefits of using an ORM instead of just running direct queries in Python using MySQLdb?
This is what I expect to be doing:
-automatically downloading data from various web servers. Most of the data is in xls or csv format. I can convert the xls into csv using the xlrd package.
-parsing/processing the data in list objects before inserting/bulk-inserting into a mysql db table.
-running complex queries to export data from mysql into python into appropriate data structures (lists, for example) for various statistical computations that are easier to do in Python instead of MySQL. Anything that can be done in MySQL will be done there, but I may run complex regressions in Python.
-run various graphical packages on the data retrieved from queries. Some of this may include using the ggplot2 package (from R-project), which is an advanced graphical package. So this will involve some R/Python integration.
Given the above - is it best that I spend the hours hacking away to learn ORM/Peewee/SQLAlchemy or stick to direct mysql queries using MySQLdb?
Most simple active-record pattern ORMs need an id column to track object identity. PeeWee appears to be one of them (or at least I am not aware of any way to not use an id). You probably can't use PeeWee without altering your tables.
Your existing table doesn't seem to be very well designed anyway, since it appears to lack a key or compound key. Every table should have a key attribute - otherwise it is impossible to distinguish one row from another.
If one of these columns is a primary key, try adding a primary_key=True argument, as explained in the docs concerning non-integer primary keys:
date = DateField(primary_key=True)
If your primary key is not named id, then you must declare your table's actual primary key column as a PrimaryKeyField() in your peewee Model for that table.
You should investigate SQLAlchemy, which uses a data-mapper pattern. It's much more complicated, but also much more powerful. It doesn't place any restrictions on your SQL table design, and in fact it can automatically reflect your table structure and interrelationships in most cases. (Maybe not as well in MySQL since foreign key relationships are not visible in the default table engine.) Most importantly for you, it can handle tables which lack a key.
If your primary key column name is something other than 'id', you should add an additional field to that table's model class:
class Table(BaseModel):
id_field = PrimaryKeyField()
That will tell your script that your table's primary key is stored in the column named 'id_field' and that the column is of INT type with auto-increment enabled.
Here is the documentation describing field types in peewee.
If you want more control over your primary key field, as already pointed out by Francis Avila, you should use the primary_key=True argument when creating the field:
class Table(BaseModel):
id_field = CharField(primary_key=True)
See this link to the documentation on non-integer primary keys.
You have to provide a primary_key field for this model.
If your table doesn't have a single primary_key field (just like mine), a CompositeKey defined in Meta will help:
primary_key = peewee.CompositeKey('date', 'team')
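Put together against the schedule model from the question, that looks roughly like this (a sketch; adjust the column names to match your actual table):
from peewee import CompositeKey   # the other imports as in the question

class schedule(Model):
    date = DateField()
    team = CharField()

    class Meta:
        database = db
        primary_key = CompositeKey('date', 'team')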
You need to use peewee's create table method to create the actual database table before you can call select(); this will create an id column in the table.
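Roughly (a sketch; create_table() is a classmethod on the model):
db.connect()
schedule.create_table()   # peewee issues CREATE TABLE, including the implicit id column
test = schedule.select()  # now the generated column list matches the actual table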
I'm inserting multiple rows into an SQLite3 table using SQLAlchemy, and frequently the entries are already in the table. It is very slow to insert the rows one at a time, and catch the exception and continue if the row already exists. Is there an efficient way to do this? If the row already exists, I'd like to do nothing.
You can use an SQL statement
INSERT OR IGNORE INTO ... etc. ...
to simply ignore the insert if it is a duplicate. Learn about the IGNORE conflict clause here.
Perhaps you can use OR IGNORE as a prefix in your SQLAlchemy Insert -- the documentation for how to place OR IGNORE between INSERT and INTO in your SQL statement is here.
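With the SQL Expression API that prefix can be attached to the insert construct, roughly like this (a sketch; my_table and the column names are placeholders for your own Table object):
stmt = my_table.insert().prefix_with("OR IGNORE")
conn.execute(stmt, [{'id': 1, 'col2': 'value'},
                    {'id': 2, 'col2': 'other value'}])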
If you are happy to run 'native' sqlite SQL you can just do:
REPLACE INTO my_table(id, col2, ..) VALUES (1, 'value', ...);
REPLACE INTO my_table(...);
...
COMMIT
However, this won't be portable across all DBMSs, which is the reason it's not found in the general sqlalchemy dialect.
Another thing you could do is use the SQLAlchemy ORM: define a 'domain model' -- a Python class which maps to your database table. Then you can create many instances of your domain class, call session.save_or_update(domain_object) on each of the items you wish to insert (or ignore), and finally call session.commit() when you want to write the items to your database table.
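A rough sketch of that pattern; note that save_or_update() only exists in old SQLAlchemy releases, and on current versions session.merge() (or plain session.add() for new rows) plays that role. MyModel and rows are placeholder names:
for values in rows:
    obj = MyModel(**values)   # MyModel is your mapped 'domain model' class
    session.merge(obj)        # insert the row, or reconcile it with the existing one
session.commit()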
This question looks like a duplicate of SQLAlchemy - INSERT OR REPLACE equivalent