I would like to switch on the fast_executemany option for the pyODBC driver while using SQLAlchemy to insert rows into a table. By default it is off, and the code runs really slowly. Could anyone suggest how to do this?
Edits:
I am using pyODBC 4.0.21 and SQLAlchemy 1.1.13, and a simplified sample of the code I am using is presented below.
import sqlalchemy as sa

def InsertIntoDB(self, tablename, colnames, data, create = False):
    """
    Inserts data into given db table
    Args:
        tablename - name of db table with dbname
        colnames - column names to insert to
        data - a list of tuples, a tuple per row
    """
    # reflect table into a sqlalchemy object
    meta = sa.MetaData(bind=self.engine)
    reflected_table = sa.Table(tablename, meta, autoload=True)
    # prepare an input object for sa.connection.execute
    execute_inp = []
    for i in data:
        execute_inp.append(dict(zip(colnames, i)))
    # Insert values
    self.connection.execute(reflected_table.insert(), execute_inp)
Try this with pyodbc:
crsr = cnxn.cursor()
crsr.fast_executemany = True
Starting with version 1.3, SQLAlchemy has directly supported fast_executemany, e.g.,
engine = create_engine(connection_uri, fast_executemany=True)
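On SQLAlchemy versions older than 1.3 (such as the 1.1.13 in the question) that create_engine() flag is not available. A commonly used workaround is to set the attribute on the raw pyodbc cursor from a before_cursor_execute event listener. The sketch below assumes a pyodbc release that provides cursor.fast_executemany; enable_fast_executemany is a hypothetical helper name, and the hasattr() check is an added guard so engines for other drivers are left untouched.

```python
from sqlalchemy import event


def enable_fast_executemany(engine):
    """Hypothetical helper: turn on pyodbc's fast_executemany for every
    executemany() call that goes through `engine`."""

    @event.listens_for(engine, "before_cursor_execute")
    def _set_fast_executemany(conn, cursor, statement, parameters, context, executemany):
        # `cursor` is the raw DBAPI cursor; `executemany` is True when a list
        # of parameter dicts was passed to execute(), as in InsertIntoDB above.
        if executemany and hasattr(cursor, "fast_executemany"):
            cursor.fast_executemany = True

    return _set_fast_executemany
```

Call it once right after the engine is created with create_engine(...); the existing self.connection.execute(reflected_table.insert(), execute_inp) call then needs no changes.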
Currently I'm executing the stored procedure this way:
engine = sqlalchemy.create_engine(self.getSql_conn_url())
query = "exec sp_getVariablesList @City = '{0}', @Station='{1}'".format(City, Station)
self.Variables = pd.read_sql_query(query, engine)
but in the question "How set ARITHABORT ON at sqlalchemy" it was correctly pointed out that this makes the code open to SQL injection. I tried different ways, but without success. So how should I pass parameters to the MSSQL stored procedure to eliminate the risk of SQL injection? It can be with SQLAlchemy or any other way.
Write your SQL command text using the "named" paramstyle, wrap it in a SQLAlchemy text() object, and pass the parameter values as a dict:
import pandas as pd
import sqlalchemy as sa
connection_uri = "mssql+pyodbc://@mssqlLocal64"
engine = sa.create_engine(connection_uri)
# SQL command text using "named" paramstyle
sql = """
SET NOCOUNT ON;
SET ARITHABORT ON;
EXEC dbo.breakfast @name = :name_param, @food = :food_param;
"""
# parameter values
param_values = {"name_param": "Gord", "food_param": "bacon"}
# execute query wrapped in SQLAlchemy text() object
df = pd.read_sql_query(sa.text(sql), engine, params=param_values)
print(df)
"""
column1
0 Gord likes bacon for breakfast.
"""
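This is safe because the values travel to the driver separately from the SQL text, so quotes or semicolons inside them are never parsed as SQL. Below is a minimal, self-contained sketch of the same text()-plus-dict pattern; it uses an in-memory SQLite table instead of the SQL Server stored procedure purely so it runs anywhere, and the people table and food_for() function are hypothetical.

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the SQL Server database (assumption).
engine = create_engine("sqlite://")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE people (name TEXT, food TEXT)"))
    conn.execute(
        text("INSERT INTO people VALUES (:name, :food)"),
        {"name": "Gord", "food": "bacon"},
    )


def food_for(name):
    # :name is a bound parameter, so the value is never spliced into the SQL.
    sql = text("SELECT food FROM people WHERE name = :name")
    with engine.connect() as conn:
        return conn.execute(sql, {"name": name}).scalars().all()


print(food_for("Gord"))
print(food_for("Gord' OR '1'='1"))
```

The first call returns ['bacon']; the injection attempt returns an empty list, because the whole string is compared as a literal name.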
I have created a database with pandas:
import numpy as np
import sqlite3
import pandas as pd
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
df = pd.DataFrame(np.random.normal(0, 1, (10, 2)), columns=['A', 'B'])
path = 'sqlite:////home/username/Desktop/example.db'
engine = create_engine(path, echo=False)
df.to_sql('flows', engine, if_exists='append', index=False)
# This is only to show I am able to read the database
df_l = pd.read_sql("SELECT * FROM flows WHERE A>0 AND B<0", engine)
Now I would like to add one or more indexes to the database.
In this case, I would like to create first an index on column A only, and then an index on both columns.
How can I do that?
If possible, I would like a solution that uses only SQLAlchemy, so that it is independent of the choice of database.
You should use reflection to get hold of the table that pandas created for you.
With reference to:
SQLAlchemy Reflecting Database Objects
A Table object can be instructed to load information about itself from
the corresponding database schema object already existing within the
database. This process is called reflection. In the most simple case
you need only specify the table name, a MetaData object, and the
autoload=True flag. If the MetaData is not persistently bound, also
add the autoload_with argument:
You could try this:
meta = sqlalchemy.MetaData()
meta.reflect(bind=engine)
flows = meta.tables['flows']
# alternative way of retrieving the table from meta:
#flows = sqlalchemy.Table('flows', meta, autoload_with=engine)
my_index = sqlalchemy.Index('flows_idx', flows.columns.get('A'))
my_index.create(bind=engine)
# let's confirm it is there
inspector = sqlalchemy.inspect(engine)
print(inspector.get_indexes('flows'))
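The answer above creates the single-column index; the index on both columns that the question also asks for works the same way, by passing several columns to Index. A self-contained sketch, assuming SQLAlchemy 1.4 or later (so autoload_with works without autoload=True) and an in-memory SQLite database standing in for example.db; the index names are arbitrary.

```python
import sqlalchemy as sa

# In-memory SQLite stands in for example.db (assumption).
engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text('CREATE TABLE flows ("A" FLOAT, "B" FLOAT)'))

# Reflect the table that pandas would have created.
meta = sa.MetaData()
flows = sa.Table("flows", meta, autoload_with=engine)

# First a single-column index on A, then a composite index on (A, B).
sa.Index("flows_a_idx", flows.c.A).create(bind=engine)
sa.Index("flows_a_b_idx", flows.c.A, flows.c.B).create(bind=engine)

# Map each index name to its column list, as reported by the database.
indexes = {
    ix["name"]: ix["column_names"]
    for ix in sa.inspect(engine).get_indexes("flows")
}
print(indexes)
```

Because only Table, Index and inspect() are used, the same code should emit the appropriate CREATE INDEX statement for whichever dialect the engine points at.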
This seems to work for me. You will have to define the variables psql_URI, table, and col yourself. Here I assume that the table and column names may be partly in uppercase, but that you want the name of the index to be lowercase.
Derived from the answer here: https://stackoverflow.com/a/72976667/3406189
import sqlalchemy
from sqlalchemy.orm import Session
engine_psql = sqlalchemy.create_engine(psql_URI)
autocommit_engine = engine_psql.execution_options(isolation_level="AUTOCOMMIT")
with Session(autocommit_engine) as session:
    # raw SQL strings must be wrapped in text() in SQLAlchemy 1.4+/2.0
    session.execute(sqlalchemy.text(
        f'CREATE INDEX IF NOT EXISTS idx_{table.lower()}_{col.lower()} ON sdi_ai."{table}" ("{col}");'
    ))
I'm trying to upload a pandas data frame to an SQL table. It seemed to me that pandas' to_sql function is the best solution for larger data frames, but I can't get it to work. I can easily extract data, but I get an error message when trying to write it to a new table:
import pandas as pd
import pyodbc

# connect to Exasol DB
exaString='DSN=exa'
conDB = pyodbc.connect(exaString)
# get some data from somewhere, works without error
sqlString = "SELECT * FROM SOMETABLE"
data = pd.read_sql(sqlString, conDB)
# now upload this data to a new table
data.to_sql('MYTABLENAME', conDB, flavor='mysql')
conDB.close()
The error message I get is
pyodbc.ProgrammingError: ('42000', "[42000] [EXASOL][EXASolution driver]syntax error, unexpected identifier_chain2, expecting assignment_operator or ':' [line 1, column 6] (-1) (SQLExecDirectW)")
Unfortunately, I have no idea what the query that caused this syntax error looks like or what else is wrong. Can someone please point me in the right direction?
(Second) EDIT:
Following Humayun's and Joris's suggestions, I now use pandas version 0.14 and SQLAlchemy in combination with the Exasol dialect (?). Since I am connecting to a defined schema, I am using the MetaData option, but the program crashes with "Bus error (core dumped)".
engine = create_engine('exa+pyodbc://uid:passwd@exa/mySchemaName', echo=True)
# get some data
sqlString = "SELECT * FROM SOMETABLE" # SOMETABLE is a view in mySchemaName
df = pd.read_sql(sqlString, con=engine) # works
print engine.has_table('MYTABLENAME') # MYTABLENAME is a view in mySchemaName
# prints "True"
# upload it to a new table
meta = sqlalchemy.MetaData(engine, schema='mySchemaName')
meta.reflect(engine, schema='mySchemaName')
pdsql = sql.PandasSQLAlchemy(engine, meta=meta)
pdsql.to_sql(df, 'MYTABLENAME')
I am not sure about setting "mySchemaName" in create_engine(..), but the outcome is the same.
Pandas does not support the EXASOL syntax out of the box, so it needs to be changed a bit. Here is a working example of your code without SQLAlchemy:
import pyodbc
import pandas as pd
con = pyodbc.connect('DSN=EXA')
con.execute('OPEN SCHEMA TEST2')
# configure pandas to understand EXASOL as mysql flavor
pd.io.sql._SQL_TYPES['int']['mysql'] = 'INT'
pd.io.sql._SQL_SYMB['mysql']['br_l'] = ''
pd.io.sql._SQL_SYMB['mysql']['br_r'] = ''
pd.io.sql._SQL_SYMB['mysql']['wld'] = '?'
pd.io.sql.PandasSQLLegacy.has_table = \
lambda self, name: name.upper() in [t[0].upper() for t in con.execute('SELECT table_name FROM cat').fetchall()]
data = pd.read_sql('SELECT * FROM services', con)
data.to_sql('SERVICES2', con, flavor = 'mysql', index = False)
If you use the EXASolution Python package, the code would look as follows:
import exasol
con = exasol.connect(dsn='EXA') # normal pyodbc connection with additional functions
con.execute('OPEN SCHEMA TEST2')
data = con.readData('SELECT * FROM services') # pandas data frame per default
con.writeData(data, table = 'services2')
The problem is that even in pandas 0.14, the read_sql and to_sql functions cannot deal with schemas, but using Exasol without schemas makes no sense. This will be fixed in 0.15. If you want to use it now, look at this pull request: https://github.com/pydata/pandas/pull/7952
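From pandas 0.15 on, to_sql() and read_sql_table() accept a schema argument, so with a SQLAlchemy engine for the Exasol dialect the pandas internals should no longer need patching. A minimal sketch of the call pattern, using an in-memory SQLite database (whose default schema is named "main") as a stand-in for Exasol; the table name and data are made up.

```python
import pandas as pd
import sqlalchemy as sa

# In-memory SQLite stands in for the Exasol engine (assumption).
engine = sa.create_engine("sqlite://")

data = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Write into an explicit schema instead of relying on OPEN SCHEMA.
data.to_sql("services2", engine, schema="main", index=False, if_exists="replace")

# Read it back from the same schema.
back = pd.read_sql_table("services2", engine, schema="main")
print(back)
```

With Exasol, the same calls would use the exa+pyodbc engine and schema='TEST2'.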
I want to implement a function that gives information about all the tables (and their column names) that are present in a database (not only those created with SQLAlchemy). While reading the documentation it seems to me that this is done via reflection but I didn't manage to get something working. Any suggestions or examples on how to do this?
Start with an engine:
from sqlalchemy import create_engine
engine = create_engine("postgresql://u:p@host/database")
For a quick path to all table/column names, use an inspector:
from sqlalchemy import inspect
inspector = inspect(engine)
for table_name in inspector.get_table_names():
    for column in inspector.get_columns(table_name):
        print("Column: %s" % column['name'])
docs: http://docs.sqlalchemy.org/en/rel_0_9/core/reflection.html?highlight=inspector#fine-grained-reflection-with-inspector
Alternatively, use MetaData / Tables:
from sqlalchemy import MetaData
m = MetaData()
m.reflect(engine)
for table in m.tables.values():
    print(table.name)
    for column in table.c:
        print(column.name)
docs: http://docs.sqlalchemy.org/en/rel_0_9/core/reflection.html#reflecting-all-tables-at-once
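For the function the question asks for, the inspector loop can be wrapped so that it returns the information instead of printing it. A minimal sketch; describe_database is a hypothetical name, and the result maps each table name to its column names in table order.

```python
from sqlalchemy import inspect


def describe_database(engine):
    """Hypothetical helper: return {table_name: [column names]} for every
    table the inspector reports in the default schema."""
    inspector = inspect(engine)
    return {
        table_name: [col["name"] for col in inspector.get_columns(table_name)]
        for table_name in inspector.get_table_names()
    }
```

Because it relies only on reflection, it also covers tables that were not created through SQLAlchemy.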
First, set up the SQLAlchemy engine.
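from sqlalchemy import create_engine, inspect, text
from sqlalchemy.engine import url

# URL.create() replaces the URL(...) constructor in SQLAlchemy 1.4+
connect_url = url.URL.create(
    'oracle',
    username='db_username',
    password='db_password',
    host='db_host',
    port=1521,  # placeholder: must be an integer (1521 is Oracle's default listener port)
    query=dict(service_name='db_service_name'))

def get_engine():
    engine = create_engine(connect_url)
    try:
        # open (and close) one connection so bad settings fail early
        with engine.connect():
            pass
    except Exception as error:
        print(error)
        return None
    return engine

engine = get_engine()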
As others have mentioned, you can use the inspect() function to get the table names.
But in my case, the list of tables returned by the inspector was incomplete.
So I found another way to get the table names, using a plain SQL query through SQLAlchemy:
query = text("SELECT table_name FROM all_tables WHERE owner = :owner")
with engine.connect() as conn:
    # bind the owner instead of formatting it into the SQL string;
    # placeholder value: Oracle stores unquoted owner names in upper case
    table_name_data = conn.execute(query, {"owner": "DB_USERNAME"}).fetchall()
Just for the sake of completeness, here's the code to fetch table names with the inspector (if it works well in your case):
inspector = inspect(engine)
table_names = inspector.get_table_names()
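One possible reason for an incomplete list (an assumption; the answer does not say) is that get_table_names() without arguments only covers the default schema, which for Oracle is the connected user's own schema. Passing schema= explicitly, or looping over get_schema_names(), covers the rest. The sketch below demonstrates this with SQLite, where an attached database plays the role of a second schema; the table and schema names are made up.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(sa.text("CREATE TABLE t1 (id INTEGER)"))
    # An attached in-memory database acts as a second schema/owner.
    conn.execute(sa.text("ATTACH DATABASE ':memory:' AS other"))
    conn.execute(sa.text("CREATE TABLE other.t2 (id INTEGER)"))

inspector = sa.inspect(engine)

# Default schema only: t2 is missing here.
default_only = inspector.get_table_names()

# Every schema the inspector knows about, each listed explicitly.
all_tables = {
    schema: inspector.get_table_names(schema=schema)
    for schema in inspector.get_schema_names()
}
print(default_only)
print(all_tables)
```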
Hey, I created a small module that makes it easy to reflect all tables in a database you connect to with SQLAlchemy. Give it a look: EZAlchemy
from EZAlchemy.ezalchemy import EZAlchemy
DB = EZAlchemy(
db_user='username',
db_password='pezzword',
db_hostname='127.0.0.1',
db_database='mydatabase',
d_n_d='mysql' # stands for dialect+driver
)
# this function loads all tables in the database to the class instance DB
DB.connect()
# List all associations to DB, you will see all the tables in that database
dir(DB)
I'm proposing another solution as I was not satisfied by any of the previous in the case of postgres which uses schemas. I hacked this solution together by looking into the pandas source code.
from sqlalchemy import MetaData, create_engine
from typing import List
def list_tables(pg_uri: str, schema: str) -> List[str]:
    with create_engine(pg_uri).connect() as conn:
        # MetaData(bind) was removed in SQLAlchemy 2.0; pass the connection to reflect()
        meta = MetaData(schema=schema)
        meta.reflect(bind=conn, views=True)
        return list(meta.tables.keys())
In order to get a list of all tables in your schema, you need to form your postgres database URI pg_uri (e.g. "postgresql://u:p@host/database" as in zzzeek's answer) and know the schema's name, schema. So if we use the example URI together with the typical schema public, we would get all the tables and views with:
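list_tables("postgresql://u:p@host/database", "public")
Because the MetaData is given a schema, the returned keys are schema-qualified (e.g. "public.mytable").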
While reflection/inspection is useful, I had trouble getting the data out of the database. I found sqlsoup to be much more user-friendly. You create the engine using SQLAlchemy and pass that engine to sqlsoup.SQLSoup, i.e.:
import sqlsoup
from sqlalchemy import create_engine

def make_engine():
    # database_username, database_pw, database_host, database_name must be defined elsewhere
    return create_engine(f"mysql+mysqlconnector://{database_username}:{database_pw}@{database_host}/{database_name}")

def test_sqlsoup():
    engine = make_engine()
    db = sqlsoup.SQLSoup(engine)
    # Note: database must have a table called 'users' for this example
    users = db.users.all()
    print(users)

if __name__ == "__main__":
    test_sqlsoup()
If you're familiar with sqlalchemy then you're familiar with sqlsoup. I've used this to extract data from a wordpress database.
I made a table using SQLAlchemy and forgot to add a column. I basically want to do this:
users.addColumn('user_id', ForeignKey('users.user_id'))
What's the syntax for this? I couldn't find it in the docs.
I have the same problem, and the thought of using a migration library only for this trivial thing makes me tremble. Anyway, this is my attempt so far:
from sqlalchemy import Column, String, text

def add_column(engine, table_name, column):
    column_name = column.compile(dialect=engine.dialect)
    column_type = column.type.compile(engine.dialect)
    # engine.execute() was removed in SQLAlchemy 2.0; use a transaction block
    with engine.begin() as conn:
        conn.execute(text('ALTER TABLE %s ADD COLUMN %s %s' % (table_name, column_name, column_type)))

column = Column('new_column_name', String(100), primary_key=True)
add_column(engine, table_name, column)
Still, I don't know how to insert primary_key=True into raw SQL request.
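A partial answer to that: SQLAlchemy's CreateColumn construct renders a column's full DDL fragment (name, type, DEFAULT, NOT NULL), so those attributes do not have to be assembled by hand. It does not solve the primary_key=True case, though; SQLite, for example, refuses ADD COLUMN with a PRIMARY KEY or UNIQUE constraint. A minimal sketch assuming SQLAlchemy 1.4+ (for exec_driver_sql); add_column_via_ddl is a hypothetical name, and it expects a fresh Column not yet attached to any Table.

```python
import sqlalchemy as sa
from sqlalchemy.schema import CreateColumn


def add_column_via_ddl(engine, table_name, column):
    """Hypothetical helper: ALTER TABLE ... ADD COLUMN using SQLAlchemy's
    own rendering of the column definition."""
    # Attach the column to a throwaway Table so the DDL compiler has context.
    sa.Table(table_name, sa.MetaData(), column)
    col_ddl = CreateColumn(column).compile(dialect=engine.dialect)
    table_ddl = engine.dialect.identifier_preparer.quote(table_name)
    with engine.begin() as conn:
        # exec_driver_sql() skips text() parsing, so colons in defaults are safe.
        conn.exec_driver_sql(f"ALTER TABLE {table_ddl} ADD COLUMN {col_ddl}")
```

For example, add_column_via_ddl(engine, "users", sa.Column("nickname", sa.String(100), nullable=False, server_default="n/a")) emits the type, the DEFAULT and the NOT NULL clause for the engine's dialect.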
This is referred to as database migration (SQLAlchemy doesn't support migrations out of the box). You can look at using sqlalchemy-migrate to help in these kinds of situations, or you can just run ALTER TABLE through your chosen database's command-line utility.
See this section of the SQLAlchemy documentation: http://docs.sqlalchemy.org/en/latest/core/metadata.html#altering-schemas-through-migrations
Alembic is the latest software to offer this type of functionality and is made by the same author as SQLAlchemy.
I have a database called "ncaaf.db" built with sqlite3 and a table called "games". So I would cd into the same directory at my Linux command prompt and do
sqlite3 ncaaf.db
ALTER TABLE games ADD COLUMN q4 FLOAT;
and that is all it takes! Just make sure you update your definitions in your SQLAlchemy code.
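from sqlalchemy import create_engine, text
engine = create_engine('sqlite:///db.sqlite3')
# use a SQL type name such as VARCHAR here, not the SQLAlchemy class name String
with engine.begin() as conn:
    conn.execute(text('ALTER TABLE table_name ADD COLUMN column_name VARCHAR'))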
I had the same problem; I ended up just writing my own function in raw SQL. If you are using SQLite, this might be useful.
Then, if you also add the column to your class definition, it seems to do the trick.
import sqlite3

def add_column(database_name, table_name, column_name, data_type):
    # map the type first, so an unsupported type fails before connecting
    if data_type == "Integer":
        data_type_formatted = "INTEGER"
    elif data_type == "String":
        data_type_formatted = "VARCHAR(100)"
    else:
        raise ValueError("unsupported data_type: %r" % data_type)
    connection = sqlite3.connect(database_name)
    cursor = connection.cursor()
    base_command = ("ALTER TABLE '{table_name}' ADD column '{column_name}' '{data_type}'")
    sql_command = base_command.format(table_name=table_name, column_name=column_name, data_type=data_type_formatted)
    cursor.execute(sql_command)
    connection.commit()
    connection.close()
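A variation that can safely be run on every start-up is to check the existing columns first with SQLite's PRAGMA table_info and only issue the ALTER TABLE when the column is missing. A minimal sketch using only the standard library; add_column_if_missing is a hypothetical name, and column_type is expected to be an SQL type name such as "INTEGER" or "VARCHAR(100)".

```python
import sqlite3


def add_column_if_missing(database_name, table_name, column_name, column_type):
    """Hypothetical helper: add a column only if the table lacks it.
    Returns True if the column was added, False if it already existed."""
    connection = sqlite3.connect(database_name)
    try:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        rows = connection.execute(f'PRAGMA table_info("{table_name}")')
        existing = {row[1] for row in rows}
        if column_name in existing:
            return False
        connection.execute(
            f'ALTER TABLE "{table_name}" ADD COLUMN "{column_name}" {column_type}'
        )
        connection.commit()
        return True
    finally:
        connection.close()
```

Calling it twice with the same arguments adds the column once and then returns False.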
I've recently had this same issue, so I took a point from AlexP in an earlier answer. The problem was getting the new column into my program's metadata. Using SQLAlchemy's append_column functionality had some unexpected downstream effects ('str' object has no attribute 'dialect_impl'). I corrected this by adding the column with DDL (a MySQL database in this case) and then reflecting the table back from the DB into my metadata.
The code is roughly as follows (modified slightly from what I have, in order to reduce it to its minimal essence; I apologize for any mistakes, which should be minor)...
try:
    # Use back quotes as a protection against SQL Injection Attacks. Can we do more?
    common.qry_engine.execute('ALTER TABLE %s ADD COLUMN %s %s' %
                              ('`' + self.tbl.schema + '`.`' + self.tbl.name + '`',
                               '`' + self.outputs[new_col] + '`', 'VARCHAR(50)'))
except exc.SQLAlchemyError as msg:
    raise GRError(desc='Unable to physically add derived column to table. Contact support.',
                  data=str(self.outputs), other_info=str(msg))

try:  # Refresh the metadata to show the new column
    self.tbl = sqlalchemy.Table(self.tbl.name, self.tbl.metadata, extend_existing=True, autoload=True)
except exc.SQLAlchemyError as msg:
    raise GRError(desc='Unable to establish metadata for new column. Contact support.',
                  data=str(self.outputs), other_info=str(msg))
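The refresh step is the important part: a Table object that was reflected before the ALTER keeps its old column list until it is reflected again with extend_existing=True. A small self-contained sketch of that behaviour, assuming SQLAlchemy 1.4+ (where autoload_with replaces autoload=True) and an in-memory SQLite database; the items table is made up.

```python
import sqlalchemy as sa

engine = sa.create_engine("sqlite://")
meta = sa.MetaData()

with engine.begin() as conn:
    conn.exec_driver_sql("CREATE TABLE items (id INTEGER PRIMARY KEY)")
items = sa.Table("items", meta, autoload_with=engine)

with engine.begin() as conn:
    # The column is added behind SQLAlchemy's back with raw DDL...
    conn.exec_driver_sql("ALTER TABLE items ADD COLUMN label VARCHAR(50)")

# ...so the Table object in `meta` is stale until it is reflected again.
stale_columns = list(items.c.keys())
items = sa.Table("items", meta, autoload_with=engine, extend_existing=True)
fresh_columns = list(items.c.keys())
print(stale_columns, fresh_columns)
```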
Yes, you can.
Install sqlalchemy-migrate (pip install sqlalchemy-migrate) and use it in your script to call the Table and Column create() methods:
from sqlalchemy import String, MetaData, create_engine
from migrate.versioning.schema import Table, Column
db_engine = create_engine(app.config.get('SQLALCHEMY_DATABASE_URI'))
db_meta = MetaData(bind=db_engine)
table = Table('table_name', db_meta)
col = Column('new_column_name', String(20), default='foo')
col.create(table)
Continuing the simple approach proposed by chasmani, with a small improvement:
'''
# simple migration
# columns to add:
# last_status_change = Column(BigInteger, default=None)
# last_complete_phase = Column(String, default=None)
# complete_percentage = Column(DECIMAL, default=0.0)
'''
import sqlite3
from config import APP_STATUS_DB
from sqlalchemy import types

def add_column(database_name: str, table_name: str, column_name: str, data_type: types, default=None):
    ret = False
    if default is not None:
        try:
            # numeric defaults are emitted unquoted
            float(default)
            ddl = ("ALTER TABLE '{table_name}' ADD column '{column_name}' '{data_type}' DEFAULT {default}")
        except (TypeError, ValueError):
            # anything else is emitted as a string literal
            ddl = ("ALTER TABLE '{table_name}' ADD column '{column_name}' '{data_type}' DEFAULT '{default}'")
    else:
        ddl = ("ALTER TABLE '{table_name}' ADD column '{column_name}' '{data_type}'")
    sql_command = ddl.format(table_name=table_name, column_name=column_name, data_type=data_type.__name__,
                             default=default)
    try:
        connection = sqlite3.connect(database_name)
        cursor = connection.cursor()
        cursor.execute(sql_command)
        connection.commit()
        connection.close()
        ret = True
    except Exception as e:
        print(e)
        ret = False
    return ret

add_column(APP_STATUS_DB, 'procedures', 'last_status_change', types.BigInteger)
add_column(APP_STATUS_DB, 'procedures', 'last_complete_phase', types.String)
add_column(APP_STATUS_DB, 'procedures', 'complete_percentage', types.DECIMAL, 0.0)
If using Docker:
Go to the terminal of the container holding your DB.
Get into the DB: psql -U usr [YOUR_DB_NAME]
Now you can alter tables using raw SQL: ALTER TABLE [TABLE_NAME] ADD COLUMN [COLUMN_NAME] [TYPE];
Note that you will need to have mounted your DB (e.g. on a volume) for the changes to persist between builds.
Adding the column "manually" (not using Python or SQLAlchemy) is perhaps the easiest option?
Same problem over here. What I will do is iterate over the db and add each entry to a new database with the extra column, then delete the old db and rename the new one to the old name.