I was playing around with SQLAlchemy and Microsoft SQL Server to get the hang of the functions when I came across some strange behavior. I was taught that the rowcount attribute on the result proxy object tells how many rows were affected by executing a statement. However, when I select or insert single or multiple rows in my test database, I always get -1. How could this be, and how can I fix it to reflect reality?
connection = engine.connect()
metadata = MetaData()
# Ex1: select statement for all values
student = Table('student', metadata, autoload=True, autoload_with=engine)
stmt = select([student])
result_proxy = connection.execute(stmt)
results = result_proxy.fetchall()
print(result_proxy.rowcount)
# Ex2: inserting single values
stmt = insert(student).values(firstname='Severus', lastname='Snape')
result_proxy = connection.execute(stmt)
print(result_proxy.rowcount)
# Ex3: inserting multiple values
stmt = insert(student)
values_list = [{'firstname': 'Rubius', 'lastname': 'Hagrid'},
{'firstname': 'Minerva', 'lastname': 'McGonogall'}]
result_proxy = connection.execute(stmt, values_list)
print(result_proxy.rowcount)
The print call in each example, run separately, prints -1. Ex1 successfully fetches all rows, and both insert statements successfully write the data to the database.
According to the following issue, the rowcount attribute isn't always to be trusted. Is that true here as well? And if so, how can I compensate, for example with a count statement in a SQLAlchemy transaction?
PDO::rowCount() returning -1
The single-row INSERT … VALUES ( … ) is trivial: If the statement succeeds then one row was affected, and if it fails (throws an error) then zero rows were affected.
For a multi-row INSERT, simply perform it inside a transaction and roll back if an error occurs. Then the number of rows affected will be either zero or len(values_list).
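For example, a minimal sketch reusing the engine, student table, and values_list from the question (engine.begin() commits on success and rolls back on error):
# Sketch: run the multi-row insert in a single transaction so it is all-or-nothing.
from sqlalchemy import insert
try:
    with engine.begin() as conn:
        conn.execute(insert(student), values_list)
    rows_affected = len(values_list)   # the transaction committed, every row went in
except Exception:
    rows_affected = 0                  # the transaction was rolled back
print(rows_affected)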
To get the number of rows that a SELECT will return, wrap the select query in a SELECT count(*) query and run that first, for example:
import sqlalchemy as sa

select_stmt = sa.select([Parent])
count_stmt = sa.select([sa.func.count(sa.text("*"))]).select_from(
    select_stmt.alias("s")
)
with engine.connect() as conn:
    conn.execution_options(isolation_level="SERIALIZABLE")
    rows_found = conn.execute(count_stmt).scalar()
    print(f"{rows_found} row(s) found")
    results = conn.execute(select_stmt).fetchall()
    for item in results:
        print(item.id)
I've been trying to use this piece of code:
# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)
    # create VALUES('%s', '%s',...) one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))
    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)
    cur = conn.cursor()
    cur = db_conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()
The goal is to connect to a Postgres DB and insert the values from a DataFrame.
I get these 2 errors for this code:
LINE 1: INSERT INTO mrr.shipments (mainFreight_freight_motherVesselD...
psycopg2.errors.UndefinedColumn: column "mainfreight_freight_mothervesseldepartdatetime" of relation "shipments" does not exist
For some reason, the columns aren't being matched to the values properly.
What can I do to fix it?
You should not do your own string interpolation; let psycopg2 handle it. From the docs:
Warning Never, never, NEVER use Python string concatenation (+) or string parameters interpolation (%) to pass variables to a SQL query string. Not even at gunpoint.
Since you also have dynamic column names, you should use psycopg2.sql to create the statement and then use the standard method of passing query parameters to psycopg2 instead of using format.
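A minimal sketch of that approach, reusing the df and conn from the question (the mrr.shipments table name is taken from the error message; a schema-qualified sql.Identifier needs psycopg2 >= 2.8):
# Sketch: let psycopg2.sql quote the dynamic identifiers and let
# execute_batch pass the row values as query parameters.
import psycopg2.extras
from psycopg2 import sql

df_columns = list(df)
insert_stmt = sql.SQL("INSERT INTO {table} ({cols}) VALUES ({vals})").format(
    table=sql.Identifier("mrr", "shipments"),
    cols=sql.SQL(",").join(sql.Identifier(c) for c in df_columns),
    vals=sql.SQL(",").join(sql.Placeholder() for _ in df_columns),
)
cur = conn.cursor()
psycopg2.extras.execute_batch(cur, insert_stmt.as_string(cur), df.values.tolist())
conn.commit()
cur.close()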
I have a database with two tables. The ssi_processed_files_prod table contains file information including the created date and a boolean indicating if the data has been deleted. The data table contains the actual data the boolean references.
I want to get a list of IDs for files older than 45 days from the file_info table, delete the associated rows from the data table, and then set the boolean in file_info to True to indicate that the data has been deleted.
file_log_test= Table('ssi_processed_files_prod', metadata, autoload=True, autoload_with=engine)
stmt = select([file_log_test.columns.id])
stmt = stmt.where(func.datediff(text('day'),
file_log_test.columns.processing_end_time, func.getDate()) > 45)
connection = engine.connect()
results = connection.execute(stmt).fetchall()
This query returns the correct results; however, I have not been able to work with the output effectively.
For those who would like to know the answer: this was based on reading the Essential SQLAlchemy book. The initial block of code was correct, but I had to flatten the results into a list. From there I could use the in_() conjunction to work with the list of IDs. This allowed me to delete rows from the relevant table and update the data status in another.
file_log_test= Table('ssi_processed_files_prod', metadata, autoload=True,
autoload_with=engine)
stmt = select([file_log_test.columns.id])
stmt = stmt.where(func.datediff(text('day'),
file_log_test.columns.processing_end_time, func.getDate()) > 45)
connection = engine.connect()
results = connection.execute(stmt).fetchall()
ids_to_delete = [x[0] for x in results]
d = delete(data).where(data.c.filename_id.in_(ids_to_delete))
connection.execute(d)
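To finish the second half (flipping the boolean), a sketch in the same style, assuming the flag column on ssi_processed_files_prod is named data_deleted:
# Sketch: mark the files whose data rows were just deleted (column name assumed).
u = update(file_log_test).where(
    file_log_test.columns.id.in_(ids_to_delete)).values(data_deleted=True)
connection.execute(u)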
I'm troubleshooting a script I am using to query the database. To make sure I had everything working right I stripped it down to a simple 'SHOW TABLES' query. The problem is that it is returning a count of the tables instead of the list of names it should return.
import pymysql
connection = pymysql.connect(host='10.0.0.208', user='admin', passwd='Passwrd')
cursor = connection.cursor()
sqlstring = 'SHOW TABLES;'
cursor.execute('USE CustDB')
x = cursor.execute(sqlstring)
print(x)
This is only returning '17'. What am I missing??
Per the documentation, execute returns the number of rows affected:
Returns: Number of affected rows
In order to get the desired results, you need to execute the query and then loop through the cursor:
cursor.execute('USE CustDB')
cursor.execute(sqlstring)
tables = [c for c in cursor]
or use fetchall:
cursor.execute('USE CustDB')
cursor.execute(sqlstring)
tables = cursor.fetchall()
I want to know if a row already exists in one of my tables, in this case coll. In order to do this I played around with SQLite in the shell a little and stumbled upon SELECT EXISTS(SELECT 1 FROM coll WHERE ceeb="1234"). In SQLite this works perfectly and returns either a 0 or a 1, which is exactly what I wanted. So, with code in hand, I wrote up a quick Python script to see if I could get this to work for me before sticking it into my program. This is what I came up with:
import sqlite3
conn = sqlite3.connect('stu.db')
c = conn.cursor()
sceeb = int(raw_input(":> "))
ceeb_exists = c.execute('SELECT EXISTS(SELECT 1 FROM coll WHERE ceeb="%d" LIMIT 1)' % sceeb)
print ceeb_exists
Instead of assigning ceeb_exists a 1 or a 0, it gives me an output that looks like <sqlite3.Cursor object at 0x01DF6860>. What am I doing wrong here?
The execution of a query always results in 0 or more rows. You'd need to fetch those rows; a SELECT EXISTS query results in 1 row, so you'd need to fetch that row.
Rows always consist of 1 or more columns, here you get one, so you could use tuple assignment (note the , comma after ceeb_exists):
c.execute('SELECT EXISTS(SELECT 1 FROM coll WHERE ceeb="%d" LIMIT 1)' % sceeb)
ceeb_exists, = c.fetchone()
However, using an EXISTS query is a bit redundant here; you could just test if there is any row returned. You should also use query parameters to avoid a SQL injection attack (you are asking a user to give you the ceeb value, so that is easily hijacked):
c.execute('SELECT 1 FROM coll WHERE ceeb=? LIMIT 1', (sceeb,))
ceeb_exists = c.fetchone() is not None
cursor.fetchone() returns None if there is no row available to fetch; the is not None test turns that into True or False.
.execute() returns a cursor object, as you can see.
In order to print the results of the query you need to iterate over it:
for result in ceeb_exists:
    print result
I have two tables in my SQL database.
Table 1 contains a lot of data, but Table 2 contains a huge amount.
Here's the code I implemented using Python:
import MySQLdb
db = MySQLdb.connect(host = "localhost", user = "root", passwd="", db="fak")
cursor = db.cursor()
#Execute SQL Statement:
cursor.execute("SELECT invention_title FROM auip_wipo_sample WHERE invention_title IN (SELECT invention_title FROM us_pat_2005_to_2012)")
#Get the result set as a tuple:
result = cursor.fetchall()
#Iterate through results and print:
for record in result:
    print record
print "Finish."
#Finish dealing with the database and close it
db.commit()
db.close()
However, it takes too long. I have been running the Python script for an hour, and it still hasn't given me any results.
Please help me.
Do you have an index on invention_title in both tables? If not, then create them:
ALTER TABLE auip_wipo_sample ADD KEY (`invention_title`);
ALTER TABLE us_pat_2005_to_2012 ADD KEY (`invention_title`);
Then combine your query into one that doesn't use a subquery:
SELECT invention_title FROM auip_wipo_sample
INNER JOIN us_pat_2005_to_2012 ON auip_wipo_sample.invention_title = us_pat_2005_to_2012.invention_title
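For reference, here is a sketch of that join dropped into the original script in place of the IN subquery (same cursor and table names as in the question):
# Sketch: run the join-based query and print the matching titles.
cursor.execute("""
    SELECT auip_wipo_sample.invention_title
    FROM auip_wipo_sample
    INNER JOIN us_pat_2005_to_2012
        ON auip_wipo_sample.invention_title = us_pat_2005_to_2012.invention_title
""")
for record in cursor.fetchall():
    print record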
And let me know about your results.