SQLAlchemy core insert does not insert data into db - python

I have an issue where SQLAlchemy Core does not insert rows when I try to insert data using connection.execute(table.insert(), list_of_rows). I construct the connection object without any additional parameters, i.e. connection = engine.connect(), and the engine with only one additional parameter, engine = create_engine(uri, echo=True).
Besides the data not appearing in the DB, I also can't find any "INSERT" statement in my app's logs.
It may be relevant that I'm reproducing this issue during py.test tests.
The DB I use is MSSQL running in a Docker container.
EDIT1:
The rowcount of the result proxy is always -1, regardless of whether I use a transaction or not, and also when I change the insert to connection.execute(table.insert().execution_options(autocommit=True), list_of_rows).rowcount.
EDIT2:
I rewrote this code and now it works. I don't see any major difference.

What's the inserted row count after connection.execute:
proxy = connection.execute(table.insert(), list_of_rows)
print(proxy.rowcount)
If rowcount is a positive integer, it proves the data is indeed being written to the DB, but it may only be visible inside an uncommitted transaction. If so, check whether autocommit is on: https://docs.sqlalchemy.org/en/latest/core/connections.html#understanding-autocommit
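If the rows are being written but then rolled back, a minimal sketch of forcing an explicit, committed transaction (assuming the same engine, table and list_of_rows from the question) would be:
# Sketch only: engine, table and list_of_rows are the objects from the question.
# engine.begin() opens a transaction and commits it when the block exits without
# an exception, so the rows are persisted (or rolled back on error).
with engine.begin() as connection:
    result = connection.execute(table.insert(), list_of_rows)
    print(result.rowcount)  # note: rowcount can legitimately be -1 for executemany on some drivers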

Related

insert records do not show up in postgres

I am using Python to perform basic ETL to transfer records from a MySQL database to a Postgres database. This is the Python code that starts the transfer:
Python code:
source_cursor = source_cnx.cursor()
source_cursor.execute(query.extract_query)
data = source_cursor.fetchall()
source_cursor.close()

# load data into warehouse db
if data:
    target_cursor = target_cnx.cursor()
    #target_cursor.execute("USE {};".format(datawarehouse_name))
    target_cursor.executemany(query.load_query, data)
    print('data loaded to warehouse db')
    target_cursor.close()
else:
    print('data is empty')
MySQL Extract (extract_query):
SELECT `tbl_rrc`.`id`,
`tbl_rrc`.`col_filing_operator`,
`tbl_rrc`.`col_medium`,
`tbl_rrc`.`col_district`,
`tbl_rrc`.`col_type`,
DATE_FORMAT(`tbl_rrc`.`col_timestamp`, '%Y-%m-%d %T.%f') as `col_timestamp`
from `tbl_rrc`
PostgreSQL (load_query):
INSERT INTO geo_data_staging.tbl_rrc
(id,
col_filing_operator,
col_medium,
col_district,
col_type,
col_timestamp)
VALUES
(%s,%s,%s,%s,%s,%s);
Of note, there is a PK constraint on Id.
The problem is while I have no errors, I'm not seeing any of the records in the target table. I tested this by manually inserting a record, then running again. The code errored out violating PK constraint. So I know it's finding the table.
Any idea of what I could be missing? I would greatly appreciate it.
Using psycopg2, you have to call commit() on the connection in order for the transaction to be committed. If you just close the cursor and connection, the transaction is implicitly rolled back.
There are a couple of alternatives to calling commit() explicitly. You can set the connection to autocommit, or you can use the connection as a context manager (with target_cnx:), which commits the transaction if the block doesn't throw any exceptions.
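A minimal sketch of the fix, assuming target_cnx is the psycopg2 connection from the question:
# Option 1: explicit commit on the connection after the bulk insert
target_cursor = target_cnx.cursor()
target_cursor.executemany(query.load_query, data)
target_cnx.commit()   # without this, the transaction is rolled back when the connection closes
target_cursor.close()

# Option 2: let the connection context manager commit for you
with target_cnx:
    with target_cnx.cursor() as target_cursor:
        target_cursor.executemany(query.load_query, data)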

How to truncate and insert on the same transaction using sqlalchemy?

I have a production_table and stage_table.
I have a Python script that runs for a few hours and generates data in the stage_table.
At the end of the script I want to COPY the data from the stage_table to the production_table.
Basically this is what I want:
1. TRUNCATE production_table
2. COPY production_table from stage_table
This is my code:
from sqlalchemy import create_engine
from sqlalchemy.sql import text as sa_text
engine = create_engine("mysql+pymysql:// AMAZON AWS")
engine.execute(sa_text('''TRUNCATE TABLE {1}; COPY TABLE {1} from {0}'''.format(stage_table, production_table)).execution_options(autocommit=True))
This should generate:
TRUNCATE TABLE production_table; COPY TABLE production_table from stage_table
However this doesn't work.
sqlalchemy.exc.ProgrammingError: (pymysql.err.ProgrammingError) (1064,
u"You have an error in your SQL syntax;
How can I make it work, and how can I make sure that the TRUNCATE and COPY happen together? I don't want the TRUNCATE to happen if the COPY aborts.
The usual way to handle multiple statements in a single transaction in SQLAlchemy would be to begin an explicit transaction and execute each statement in it:
with engine.begin() as conn:
    conn.execute(statement_1)
    conn.execute(statement_2)
    ...
As to your original attempt, there is no COPY statement in MySQL. Some other DBMS do have something of the kind. Also not all DB-API drivers support multiple statements in a single query or command, at least out of the box, which would seem to be the case here as well. See this issue and the related note in the PyMySQL ChangeLog.
The biggest issue is that not all statements in MySQL can be rolled back, the most common being DDL statements. In other words you simply cannot execute TRUNCATE [TABLE] ... in the same transaction as the following INSERT INTO ..., and must design your application around that limitation.
As suggested in the comments by Christian W. you could perhaps create an entirely new table from your staging table and rename, or just swap the production and staging tables. RENAME TABLE ... cannot be rolled back either, but at least you'd reduce the window for error, and could undo the changes, since the original production table would still be there, just under a new name. You could then remove the original production table when all else is done.
Here's something that demonstrates the idea, but requires manual intervention if something goes awry:
# No point in faking transactions here, since MySQL in use.
engine.execute("CREATE TABLE new_production AS SELECT * FROM stage_table")
engine.execute("RENAME TABLE production_table TO old_production")
engine.execute("RENAME TABLE new_production TO production_table")
# Point of no return:
engine.execute("DROP TABLE old_production")
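As a small variation on the same idea (a sketch, not part of the answer above): MySQL's RENAME TABLE can rename several tables in a single statement, so the swap itself can be done in one atomic step before the old copy is dropped:
# Sketch: build the new table, swap it into place with one RENAME, then drop the old copy.
engine.execute("CREATE TABLE new_production AS SELECT * FROM stage_table")
engine.execute(
    "RENAME TABLE production_table TO old_production, "
    "new_production TO production_table"
)
# Point of no return:
engine.execute("DROP TABLE old_production")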

Update MSSQL table through SQLAlchemy using dataframes

I'm trying to replace some old MSSQL stored procedures with Python, in an attempt to take some of the heavy calculations off of the SQL server. The part of the procedure I'm having issues replacing is as follows:
UPDATE mytable
SET calc_value = tmp.calc_value
FROM dbo.mytable mytable INNER JOIN
#my_temp_table tmp ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
WHERE (mytable.a = some_value)
and (mytable.x = tmp.x)
and (mytable.b = some_other_value)
Up to this point, I've made some queries with SQLAlchemy, stored the results in dataframes, and done the requisite calculations on them. What I don't know is how to put the data back into the server using SQLAlchemy, either with raw SQL or function calls. The dataframe I have would essentially have to take the place of the temporary table created in MSSQL Server, but I'm not sure how I can do that.
The difficulty, of course, is that I don't know of a way to join a dataframe with an MSSQL table, and I'm guessing that wouldn't work anyway, so I'm looking for a workaround.
As the pandas docs suggest here:
from sqlalchemy import create_engine
engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)
dataframe.to_sql('tablename', engine , if_exists = 'replace')
The engine parameter for MSSQL is basically the connection string; check it here.
The if_exists parameter is a bit tricky, since 'replace' actually drops the table first, then recreates it and inserts all the data at once.
Setting the echo attribute to True shows all background logs and generated SQL.
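To cover the UPDATE part of the original procedure, one approach (a sketch only; calc_results is a hypothetical name for the staging table, and the dataframe and join columns are assumed to match the question) is to push the calculated dataframe into a staging table with to_sql and then run the UPDATE ... JOIN on the server:
from sqlalchemy import text

# Sketch: write the calculated dataframe to a (hypothetical) staging table...
dataframe.to_sql('calc_results', engine, if_exists='replace', index=False)

# ...then update the target table in-place, joining on the staging table.
update_sql = text("""
    UPDATE mytable
    SET calc_value = tmp.calc_value
    FROM dbo.mytable mytable
    INNER JOIN calc_results tmp
        ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
""")
with engine.begin() as conn:   # commits on success, rolls back on error
    conn.execute(update_sql)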

SQL command SELECT fetches uncommitted data from Postgresql database

In short:
I have a PostgreSQL database and I connect to it through Python's psycopg2 module. Such a script might look like this:
import psycopg2

# connect to my database
conn = psycopg2.connect(dbname="<my-dbname>",
                        user="postgres",
                        password="<password>",
                        host="localhost",
                        port="5432")
cur = conn.cursor()

ins = "insert into testtable (age, name) values (%s,%s);"
data = ("90", "George")
sel = "select * from testtable;"

cur.execute(sel)
print(cur.fetchall())
# prints out
# [(100, 'Paul')]
#
# db looks like this
# age | name
# ----+-----
# 100 | Paul

# insert new data - no commit!
cur.execute(ins, data)

# perform the same select again
cur.execute(sel)
print(cur.fetchall())
# prints out
# [(100, 'Paul'), (90, 'George')]
#
# db still looks the same
# age | name
# ----+-----
# 100 | Paul

cur.close()
conn.close()
That is, I connect to that database which at the start of the script looks like this:
age | name
----+-----
100 | Paul
I perform a SQL select and retrieve only Paul's data. Then I do a SQL insert, without any commit, yet the second SQL select still fetches both Paul and George - and I don't want that. I've looked into both the psycopg2 and PostgreSQL docs and found out about the ISOLATION LEVEL (see the PostgreSQL and psycopg2 docs). In the PostgreSQL docs (under 13.2.1. Read Committed Isolation Level) it explicitly says:
However, SELECT does see the effects of previous updates executed within its own transaction, even though they are not yet committed.
I've tried different isolation levels. I understand that Read Committed and Repeatable Read don't work; I thought Serializable might, but it does not -- meaning that I can still fetch uncommitted data with a select.
I could do conn.set_isolation_level(0), where 0 represents psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT, or I could probably wrap the execute commands inside with statements (see).
In the end, I'm a bit confused about whether I understand transactions and isolation correctly (and whether the behavior of select without commit is completely normal) or not. Can somebody enlighten me on this topic?
Your two SELECT statements are using the same connection, and therefore the same transaction. From the psycopg manual you linked:
By default, the first time a command is sent to the database ... a new transaction is created. The following database commands will be executed in the context of the same transaction.
Your code is therefore equivalent to the following:
BEGIN TRANSACTION;
select * from testtable;
insert into testtable (age, name) values (90, 'George');
select * from testtable;
ROLLBACK TRANSACTION;
Isolation levels control how a transaction interacts with other transactions. Within a transaction, you can always see the effects of commands within that transaction.
If you want to isolate two different parts of your code, you will need to open two connections to the database, each of which will (unless you enable autocommit) create a separate transaction.
Note that according to the document already linked, creating a new cursor will not be enough:
...not only the commands issued by the first cursor, but the ones issued by all the cursors created by the same connection
Using autocommit will not solve your problem. When autocommit is on, every insert and update is automatically committed to the database, and all subsequent reads will see that data.
It's most unusual to not want to see data that has been written to the database by you. But if that's what you want, you need two separate connections and you must make sure that your select is executed prior to the commit.
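A minimal sketch of the two-connection approach described above (reusing the testtable and connection parameters from the question); the reader connection will not see the writer's uncommitted insert:
import psycopg2

conn_params = dict(dbname="<my-dbname>", user="postgres",
                   password="<password>", host="localhost", port="5432")

writer = psycopg2.connect(**conn_params)
reader = psycopg2.connect(**conn_params)

with writer.cursor() as wcur:
    wcur.execute("insert into testtable (age, name) values (%s, %s);", ("90", "George"))
    # not committed yet

with reader.cursor() as rcur:
    rcur.execute("select * from testtable;")
    print(rcur.fetchall())   # George is not visible here until writer.commit() runs

writer.rollback()            # or writer.commit() to make the insert permanent
writer.close()
reader.close()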

Redshift COPY operation doesn't work in SQLAlchemy

I'm trying to do a Redshift COPY in SQLAlchemy.
The following SQL correctly copies objects from my S3 bucket into my Redshift table when I execute it in psql:
COPY posts FROM 's3://mybucket/the/key/prefix'
WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey'
JSON AS 'auto';
I have several files named
s3://mybucket/the/key/prefix.001.json
s3://mybucket/the/key/prefix.002.json
etc.
I can verify that the new rows were added to the table with select count(*) from posts.
However, when I execute the exact same SQL expression in SQLAlchemy, execute completes without error, but no rows get added to my table.
session = get_redshift_session()
session.bind.execute("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';")
session.commit()
It doesn't matter whether I do the above or
from sqlalchemy.sql import text
session = get_redshift_session()
session.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
session.commit()
I basically had the same problem, though in my case it was more:
engine = create_engine('...')
engine.execute(text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';"))
By stepping through with pdb, the problem was obviously the lack of a .commit() being invoked. I don't know why session.commit() is not working in your case (maybe the session "lost track" of the sent commands?), so it might not actually fix your problem.
Anyhow, as explained in the SQLAlchemy docs:
Given this requirement, SQLAlchemy implements its own “autocommit” feature which works completely consistently across all backends. This is achieved by detecting statements which represent data-changing operations, i.e. INSERT, UPDATE, DELETE [...] If the statement is a text-only statement and the flag is not set, a regular expression is used to detect INSERT, UPDATE, DELETE, as well as a variety of other commands for a particular backend.
So, there are 2 solutions, either:
text("COPY posts FROM 's3://mybucket/the/key/prefix' WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' JSON AS 'auto';").execution_options(autocommit=True).
Or, get a fixed version of the redshift dialect... I just opened a PR about it
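For reference, a sketch of the first option applied to the statement from the question (this relies on the pre-2.0 "autocommit" execution option quoted above; the engine URL and credentials are the ones from the question):
from sqlalchemy import create_engine
from sqlalchemy.sql import text

engine = create_engine('...')
copy_sql = text(
    "COPY posts FROM 's3://mybucket/the/key/prefix' "
    "WITH CREDENTIALS 'aws_access_key_id=myaccesskey;aws_secret_access_key=mysecretaccesskey' "
    "JSON AS 'auto';"
).execution_options(autocommit=True)   # marks this statement for autocommit
engine.execute(copy_sql)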
Adding a commit to the end of the copy worked for me:
<your copy sql>;commit;
I have had success using the core expression language and Connection.execute() (as opposed to the ORM and sessions) to copy delimited files to Redshift with the code below. Perhaps you could adapt it for JSON.
from sqlalchemy.sql import text

def copy_s3_to_redshift(conn, s3path, table, aws_access_key, aws_secret_key,
                        delim='\t', uncompress='auto', ignoreheader=None):
    """Copy a TSV file from S3 into redshift.

    Note the CSV option is not used, so quotes and escapes are ignored.
    Empty fields are loaded as null.
    Does not commit a transaction.

    :param Connection conn: SQLAlchemy Connection
    :param str uncompress: None, 'gzip', 'lzop', or 'auto' to autodetect from `s3path` extension.
    :param int ignoreheader: Ignore this many initial rows.
    :return: Whatever a copy command returns.
    """
    if uncompress == 'auto':
        uncompress = 'gzip' if s3path.endswith('.gz') else 'lzop' if s3path.endswith('.lzo') else None

    # copy command doesn't like the table name or keys single-quoted
    copy = text("""
        copy "{table}"
        from :s3path
        credentials 'aws_access_key_id={aws_access_key};aws_secret_access_key={aws_secret_key}'
        delimiter :delim
        emptyasnull
        ignoreheader :ignoreheader
        compupdate on
        comprows 1000000
        {uncompress};
        """.format(uncompress=uncompress or '', table=text(table),
                   aws_access_key=aws_access_key, aws_secret_key=aws_secret_key))
    return conn.execute(copy, s3path=s3path, delim=delim, ignoreheader=ignoreheader or 0)
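Since the helper deliberately does not commit, it would typically be called inside an explicit transaction. A usage sketch with placeholder values (the bucket path, table name and keys are hypothetical):
# Usage sketch: the transaction commits when the block exits without error.
with engine.begin() as conn:
    copy_s3_to_redshift(conn, 's3://mybucket/data.tsv.gz', 'posts',
                        aws_access_key='...', aws_secret_key='...',
                        ignoreheader=1)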
