sqlalchemy add index to existing sqlite3 database - python

I have created a database with pandas :
import numpy as np
import sqlite3
import pandas as pd
import sqlite3
import sqlalchemy
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
df = pd.DataFrame(np.random.normal(0, 1, (10, 2)), columns=['A', 'B'])
path = 'sqlite:////home/username/Desktop/example.db'
engine = create_engine(path, echo=False)
df.to_sql('flows', engine, if_exists='append', index=False)
# This is only to show I am able to read the database
df_l = pd.read_sql("SELECT * FROM flows WHERE A>0 AND B<0", engine)
Now I would like to add one or more indexes to the database.
Is this case I would like to make first only the column A and then both the columns indices.
How can I do that?
If possible I would like a solution that uses only SqlAlchemy so that it is independent from the choice of the database.

You should use reflection to get hold of the table that pandas created for you.
With reference to:
SQLAlchemy Reflecting Database Objects
A Table object can be instructed to load information about itself from
the corresponding database schema object already existing within the
database. This process is called reflection. In the most simple case
you need only specify the table name, a MetaData object, and the
autoload=True flag. If the MetaData is not persistently bound, also
add the autoload_with argument:
you could try this:
meta = sqlalchemy.MetaData()
meta.reflect(bind=engine)
flows = meta.tables['flows']
# alternative of retrieving the table from meta:
#flows = sqlalchemy.Table('flows', meta, autoload=True, autoload_with=engine)
my_index = sqlalchemy.Index('flows_idx', flows.columns.get('A'))
my_index.create(bind=engine)
# lets confirm it is there
inspector = reflection.Inspector.from_engine(engine)
print(inspector.get_indexes('flows'))

This seems to work for me. You will have to define the variables psql_URI, table, and col yourself. Here I assume that the table name / column name may be in (partial) uppercase but you want the name of the index to be lowercase.
Derived from the answer here: https://stackoverflow.com/a/72976667/3406189
import sqlalchemy
from sqlalchemy.orm import Session
engine_psql = sqlalchemy.create_engine(psql_URI)
autocommit_engine = engine_psql.execution_options(isolation_level="AUTOCOMMIT")
with Session(autocommit_engine) as session:
session.execute(
f'CREATE INDEX IF NOT EXISTS idx_{table.lower()}_{col.lower()} ON sdi_ai."{table}" ("{col}");'
)

Related

how to commit changes to SQLAlchemy object in the database table

I am trying to translate a set of columns in my MySQL database using Python's googletrans library.
Sample MySQL table Data:
Label Answer Label_Translated Answer_Translated
cómo estás Wie heißen sie? NULL NULL
wie gehts per favore rivisita NULL NULL
元気ですか Cuántos años tienes NULL NULL
Below is my sample code:
import pandas as pd
import googletrans
from googletrans import Translator
import sqlalchemy
import pymysql
import numpy as np
from sqlalchemy import create_engine, MetaData, Table
from sqlalchemy.orm import sessionmaker
engine = create_engine("mysql+pymysql:.....")
Session = sessionmaker(bind = engine)
session = Session()
translator = Translator()
I read the database table using:
sql_stmt = "SELECT * FROM translate"
data = session.execute(sql_stmt)
I perform the translation steps using:
for to_translate in data:
to_translate.Answer_Translated = translator.translate(to_translate.Answer, dest = 'en')
to_translate.Label_Translated = translator.translate(to_translate.Label, dest = 'en')
I tried session.commit() but the changes are not reflected in the database. Could someone please let me know how to make the changes permanent in the database.
Also when I try:
for rows in data:
print(rows)
I don't see any output. Before enforcing the changes in the database, is there a way we can view the changes in Python ?
Rewriting my answer because I missed OP was using a raw query to get his set.
Your issue seems to be that there is no real update logic in your code (although you might have missed that out. Here is what you could do. Keep in mind that it's not the most efficient or elegant way to deal with this, but this might get you in the right direction.
# assuming import sqlalchemy as sa
for to_translate in data:
session = Session()
print(to_translate)
mappings = {}
mappings['Label'] = to_translate[0]
mappings['Answer_Translated'] = translator.translate(to_translate.Answer, dest="en")
mappings['Label_Translated'] = translator.translate(to_translate.Label, dest="en")
update_str = "update Data set Answer_Translated=:Answer_Translated, set Label_Translated=:Label_Translated where Label == :Label"
session.execute(sa.text(update_str), mappings)
session.commit()
This will update your db. Now I can't guarantee it will work out of the box, because your actual table might differ from the sample you posted, but the print statement should be able to guide you in fixing update_str. Note that using the ORM would make this a lot nicer.

SQLAlchemy: How to transfer data from one table in an old DB to another table in a new/different DB?

I am transfering some data from one DB to another DB using sqlalchemy in python. I want to make a direct and rapid transfer.
I don't know how to use the function of bulk_insert_mappings() from SQLAlchemy. (Field-wise both tables are identical)
This is what I have tried so far.
from sqlalchemy import create_engine, Column, Integer, String, Date
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base
engine_old = create_engine('mysql+pymysql://<id>:<pw>#database_old.amazonaws.com:3306/schema_name_old?charset=utf8')
engine_new = create_engine('mysql+pymysql://<id>:<pw>#database_new.amazonaws.com:3306/schema_name_new?charset=utf8')
data_old = engine_before.execute('SELECT * FROM table_old')
session = sessionmaker()
session.configure(bind=engine_after)
s = session()
how to handle with "s.bulk_insert_mappings(????, data_old)"?**
Could anyone help me?
Thank you.
There are many ways to achieve moving data from one database to another. The specificity of the method depends your individual needs and what you already have implemented. Assuming that both databases old and new already have a schema in their respective DBs, you would need two separate bases and engines. The mapping of an existing database's schema is achieved using automap_base(). Below I am sharing a short example of how this would look like:
from sqlalchemy.orm import Session
from sqlalchemy import create_engine
from sqlalchemy.ext.automap import automap_base
old_base = automap_base()
old_engine = create_engine("<OLD_DB_URI>", echo=True)
old_base.prepare(old_engine, reflect=True)
TableOld = old_base.classes.table_old
old_session = Session(old_engine)
new_base = automap_base()
new_engine = create_engine("<NEW_DB_URI>", echo=True)
new_base.prepare(new_engine, reflect=True)
TableNew = old_base.classes.table_new
new_session = Session(new_engine)
# here you can write your queries
old_table_results = old_session.query(TableOld).all()
new_data = []
for result in old_table_results:
new = TableNew()
new.id = result.id
new.name = result.name
new_data.append(new)
new_session.bulk_save_objects(new_data)
new_session.commit()
Now, about you second question here's a link of examples directly from SQLAlchemy's site: http://docs.sqlalchemy.org/en/latest/_modules/examples/performance/bulk_inserts.html and to answer you question bulk_insert_mappings takes a two parameter a db model (TableNew or TableOld) in the example above and a list of dictionaries representing instances (aka rows) in a db model.

Specifying pyODBC options (fast_executemany = True in particular) using SQLAlchemy

I would like to switch on the fast_executemany option for the pyODBC driver while using SQLAlchemy to insert rows to a table. By default it is of and the code runs really slow... Could anyone suggest how to do this?
Edits:
I am using pyODBC 4.0.21 and SQLAlchemy 1.1.13 and a simplified sample of the code I am using are presented below.
import sqlalchemy as sa
def InsertIntoDB(self, tablename, colnames, data, create = False):
"""
Inserts data into given db table
Args:
tablename - name of db table with dbname
colnames - column names to insert to
data - a list of tuples, a tuple per row
"""
# reflect table into a sqlalchemy object
meta = sa.MetaData(bind=self.engine)
reflected_table = sa.Table(tablename, meta, autoload=True)
# prepare an input object for sa.connection.execute
execute_inp = []
for i in data:
execute_inp.append(dict(zip(colnames, i)))
# Insert values
self.connection.execute(reflected_table.insert(),execute_inp)
Try this for pyodbc
crsr = cnxn.cursor()
crsr.fast_executemany = True
Starting with version 1.3, SQLAlchemy has directly supported fast_executemany, e.g.,
engine = create_engine(connection_uri, fast_executemany=True)

sqlalchemy map table from mssql database with "prefix-namespaces"

I have been struggling on this for a while now and did not find an answer yet, or maybe I already have seen the answer and just didnt get it - however, I hope I am able to describe my problem.
I have a MS SQL database in which the tables are grouped in namespaces (or whatever it is called), denoted by Prefix.Tablename (with a dot). So a native sql statement to request some content looks like this:
SELECT TOP 100
[Value], [ValueDate]
FROM [FinancialDataBase].[Reporting].[IndexedElements]
How to map this to sqlalchemy?
If the "Reporting" prefix would not be there, the solution (or one way to do it) looks like this:
from sqlalchemy import *
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy.orm import sessionmaker
def get_session():
from urllib.parse import quote_plus as urllib_quote_plus
server = "FinancialDataBase.sql.local"
connstr = "DRIVER={SQL Server};SERVER=%s;DATABASE=FinancialDataBase" % server
params = urllib_quote_plus(connstr)
base_url = "mssql+pyodbc:///?odbc_connect=%s" % params
engine = create_engine(base_url,echo=True)
Session = sessionmaker(bind=engine)
session = Session()
return engine, session
Base = declarative_base()
class IndexedElements(Base):
__tablename__ = "IndexedElements"
UniqueID = Column(String,primary_key=True)
ValueDate = Column(DateTime)
Value = Column(Float)
And then requests can be done and wrapped in a Pandas dataframe for example like this:
import pandas as pd
engine, session = get_session()
query = session.query(IndexedElements.Value,IndexedElements.ValueDate)
data = pd.read_sql(query.statement,query.session.bind)
But the SQL statement that is compiled and actually executed in this, includes this wrong FROM part:
FROM [FinancialDataBase].[IndexedElements]
Due to the namespace-prefix it would have to be
FROM [FinancialDataBase].[Reporting].[IndexedElements]
Simply expanding the table name to
__tablename__ = "Reporting.IndexedElements"
doesnt fix it, because it changes the compiled sql statement to
FROM [FinancialDataBase].[Reporting.IndexedElements]
which doesnt work properly.
So how can this be solved?
The answer is given in the comment by Ilja above:
The "namespace" is a so called schema and has to be declarated in the mapped object. Given the example from the opening post, the mapped table has to be defined like this:
class IndexedElements(Base):
__tablename__ = "IndexedElements"
__table_args__ = {"schema": "Reporting"}
UniqueID = Column(String,primary_key=True)
ValueDate = Column(DateTime)
Value = Column(Float)
Or define a base class containing these informations for different schemata. Check also "Augmenting the base" in sqlalchemy docs:
http://docs.sqlalchemy.org/en/latest/orm/extensions/declarative/mixins.html#augmenting-the-base

List database tables with SQLAlchemy

I want to implement a function that gives information about all the tables (and their column names) that are present in a database (not only those created with SQLAlchemy). While reading the documentation it seems to me that this is done via reflection but I didn't manage to get something working. Any suggestions or examples on how to do this?
start with an engine:
from sqlalchemy import create_engine
engine = create_engine("postgresql://u:p#host/database")
quick path to all table /column names, use an inspector:
from sqlalchemy import inspect
inspector = inspect(engine)
for table_name in inspector.get_table_names():
for column in inspector.get_columns(table_name):
print("Column: %s" % column['name'])
docs: http://docs.sqlalchemy.org/en/rel_0_9/core/reflection.html?highlight=inspector#fine-grained-reflection-with-inspector
alternatively, use MetaData / Tables:
from sqlalchemy import MetaData
m = MetaData()
m.reflect(engine)
for table in m.tables.values():
print(table.name)
for column in table.c:
print(column.name)
docs: http://docs.sqlalchemy.org/en/rel_0_9/core/reflection.html#reflecting-all-tables-at-once
First set up the sqlalchemy engine.
from sqlalchemy import create_engine, inspect, text
from sqlalchemy.engine import url
connect_url = url.URL(
'oracle',
username='db_username',
password='db_password',
host='db_host',
port='db_port',
query=dict(service_name='db_service_name'))
engine = create_engine(connect_url)
try:
engine.connect()
except Exception as error:
print(error)
return
Like others have mentioned, you can use the inspect method to get the table names.
But in my case, the list of tables returned by the inspect method was incomplete.
So, I found out another way to find table names by using pure SQL queries in sqlalchemy.
query = text("SELECT table_name FROM all_tables where owner = '%s'"%str('db_username'))
table_name_data = self.session.execute(query).fetchall()
Just for sake of completeness of answer, here's the code to fetch table names by inspect method (if it works good in your case).
inspector = inspect(engine)
table_names = inspector.get_table_names()
Hey I created a small module that helps easily reflecting all tables in a database you connect to with SQLAlchemy, give it a look: EZAlchemy
from EZAlchemy.ezalchemy import EZAlchemy
DB = EZAlchemy(
db_user='username',
db_password='pezzword',
db_hostname='127.0.0.1',
db_database='mydatabase',
d_n_d='mysql' # stands for dialect+driver
)
# this function loads all tables in the database to the class instance DB
DB.connect()
# List all associations to DB, you will see all the tables in that database
dir(DB)
I'm proposing another solution as I was not satisfied by any of the previous in the case of postgres which uses schemas. I hacked this solution together by looking into the pandas source code.
from sqlalchemy import MetaData, create_engine
from typing import List
def list_tables(pg_uri: str, schema: str) -> List[str]:
with create_engine(pg_uri).connect() as conn:
meta = MetaData(conn, schema=schema)
meta.reflect(views=True)
return list(meta.tables.keys())
In order to get a list of all tables in your schema, you need to form your postgres database uri pg_uri (e.g. "postgresql://u:p#host/database" as in the zzzeek's answer) as well as the schema's name schema. So if we use the example uri as well as the typical schema public we would get all the tables and views with:
list_tables("postgresql://u:p#host/database", "public")
While reflection/inspection is useful, I had trouble getting the data out of the database. I found sqlsoup to be much more user-friendly. You create the engine using sqlalchemy and pass that engine to sqlsoup.SQlSoup. ie:
import sqlsoup
def create_engine():
from sqlalchemy import create_engine
return create_engine(f"mysql+mysqlconnector://{database_username}:{database_pw}#{database_host}/{database_name}")
def test_sqlsoup():
engine = create_engine()
db = sqlsoup.SQLSoup(engine)
# Note: database must have a table called 'users' for this example
users = db.users.all()
print(users)
if __name__ == "__main__":
test_sqlsoup()
If you're familiar with sqlalchemy then you're familiar with sqlsoup. I've used this to extract data from a wordpress database.

Categories