Specify Oracle Schema for SQLAlchemy Engine (Python)

I need to create a SQLAlchemy engine for a database that doesn't use the default schema. What I want to be able to do is something like this:
from sqlalchemy import create_engine
string = "oracle+cx_oracle://batman:batpassword@batcave.com:1525/some_database"
engine = create_engine(string, schema="WEIRD_SCHEMA")
tables = engine.table_names()
Is there a way to do this?
I'm working with some legacy code that uses the engine.table_names() method.
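There is no schema argument on create_engine itself, but the table-listing calls do accept one. A minimal sketch, reusing the URL from the question and untested against a real Oracle instance:

from sqlalchemy import create_engine, inspect

string = "oracle+cx_oracle://batman:batpassword@batcave.com:1525/some_database"
engine = create_engine(string)

# SQLAlchemy 1.3 and earlier: the legacy method takes a schema argument
tables = engine.table_names(schema="WEIRD_SCHEMA")

# SQLAlchemy 1.4+: table_names() is deprecated; use the inspector instead
tables = inspect(engine).get_table_names(schema="WEIRD_SCHEMA")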

Related

How to convert SQL Query to Pandas DataFrame using SQLAlchemy ORM?

According to the SQLAlchemy documentation you are supposed to use a Session object when executing SQL statements. But passing a Session to pandas' read_sql gives an error: AttributeError: 'Session' object has no attribute 'cursor'.
However, using the Connection object works even with an ORM Mapped Class:
with ENGINE.connect() as conn:
    df = pd.read_sql_query(
        sqlalchemy.select(MeterValue),
        conn,
    )
Where MeterValue is a Mapped Class.
This doesn't feel like the correct solution, because the SQLAlchemy documentation says you are not supposed to use an engine connection with the ORM. I just can't find out why.
Does anyone know if there is any issue using the connection instead of Session with ORM Mapped Class?
What is the correct way to read SQL into a DataFrame using the SQLAlchemy ORM?
I found a couple of old answers on this where you pass the engine directly as the second argument, or use session.bind, and so on. Nothing works.
Just reading the documentation of pandas.read_sql_query:
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)
Parameters:
sql: str SQL query or SQLAlchemy Selectable (select or text object)
SQL query to be executed.
con: SQLAlchemy connectable, str, or sqlite3 connection
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
...
So pandas does allow a SQLAlchemy Selectable (e.g. select(MeterValue)) and a SQLAlchemy connectable (e.g. engine.connect()). Your code block is correct and pandas will handle the query as expected:
with ENGINE.connect() as conn:
    df = pd.read_sql_query(
        sqlalchemy.select(MeterValue),
        conn,
    )
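If you would rather keep a Session in the picture, Session.connection() returns the underlying Connection, which pandas also accepts. A sketch, assuming SQLAlchemy 1.4+ and the same MeterValue mapped class:

import pandas as pd
import sqlalchemy
from sqlalchemy.orm import Session

with Session(ENGINE) as session:
    # session.connection() hands back the Connection the session is bound to,
    # so the query runs inside the session's transaction
    df = pd.read_sql_query(sqlalchemy.select(MeterValue), session.connection())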

Bulk insert many rows using sqlalchemy

I want to insert thousands of rows into an Oracle DB using Python. For this I am trying to use the bulk_insert_mappings method of a SQLAlchemy session. I am following this tutorial, which shows how to load a CSV file into the database quickly. I failed because bulk_insert_mappings expects a mapper object as well, which the tutorial doesn't pass.
The code to create the connection and the mapping without a csv:
from sqlalchemy.sql import select, sqltypes
from sqlalchemy.schema import CreateSchema
from sqlalchemy import create_engine, MetaData, Table, inspect, engine, Column, String, DateTime, insert
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import get_mapper
import pytz
import datetime
import pandas as pd
engine_url = engine.URL.create(
    drivername='oracle',
    username='ADMIN',
    password='****',
    host='***',
    database='****',
)
oracle_engine = create_engine(engine_url, echo=False)
Session = sessionmaker(bind=oracle_engine)
base = datetime.datetime.today().replace(tzinfo=pytz.utc)
date_list = [base - datetime.timedelta(days=x) for x in range(20)]
df = pd.DataFrame(date_list, columns = ['date_time'])
I use the following line of code to create the table if it doesn't exist:
df[:0].to_sql('test_insert_df', oracle_engine, schema='ADMIN', if_exists='replace')
Then I used this line to insert data into the table:
with Session() as session:
    session.bulk_insert_mappings(df.to_dict('records'))
The traceback I receive is the following:
TypeError: bulk_insert_mappings() missing 1 required positional argument: 'mappings'
How can I create the mapper if I don't use the SQLAlchemy ORM to create the table? Looking at this question I know how to create the mapper object with a SQLAlchemy model, but not otherwise.
I also have the option of inserting with the bulk_save_objects method, but that would also need a model, which I don't have.
PS: I am doing this because I want to insert many rows into an Oracle database using SQLAlchemy. If you have a better solution it is also welcome.
The simplest way to insert would be the dataframe's to_sql method, but if you want to use SQLAlchemy you can use its Core to insert the data; there's no need for the ORM in your use case. Something like this should work, assuming you are using SQLAlchemy 1.4 and can arrange your data as a list of dicts:
import sqlalchemy as sa
...
tbl = sa.Table('test_insert_df', sa.MetaData(), autoload_with=engine)
ins = tbl.insert()
with engine.begin() as conn:
    # engine.begin() commits automatically on success
    conn.execute(ins, list_of_dicts)
If you have a lot of data you might want to insert in batches.
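As a rough sketch of batching, reusing ins and list_of_dicts from above (the chunk size here is arbitrary; tune it for your data and network):

CHUNK = 10_000  # arbitrary batch size

with engine.begin() as conn:
    for start in range(0, len(list_of_dicts), CHUNK):
        # executemany-style insert of one batch of rows
        conn.execute(ins, list_of_dicts[start:start + CHUNK])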

Postgres tables reflected with SQLAlchemy not recognizing columns for Join

I'm trying to learn SQLAlchemy and I feel like I must be missing something fundamental. Hopefully, this is a straightforward error on my part and someone can show me the link to the documentation that explains what I'm doing wrong.
I have a Postgres database running in a Docker container on my local machine. I can connect to it and export queries to Python using psycopg2 with no issues.
I'm trying to recreate what I did with psycopg2 using SQLAlchemy, but I'm having trouble when I try to join two tables. My current code looks like this:
from sqlalchemy import *
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import select
conn = create_engine('postgresql://postgres:my_password@localhost/in_votes')
metadata = MetaData(conn)
metadata.reflect(bind = conn)
pop = metadata.tables['pop']
absentee = metadata.tables['absentee']
Session = sessionmaker(bind=conn)
session = Session()
session.query(pop).join(absentee, county == pop.county).all()
I'm trying to join the pop and absentee tables on the county field and I get the error:
NameError: name 'county' is not defined
I can view the columns in each table and loop through them to access the data.
Can someone clear this up for me and explain what I'm doing wrong?
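The bare name county is the problem: reflected Table objects expose their columns through the .c collection, they don't inject column names into your namespace. The join condition presumably needs to be spelled like this (a sketch against the reflected tables above):

# Columns of a reflected Table live on the .c collection
session.query(pop).join(absentee, pop.c.county == absentee.c.county).all()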

dataframe.to_sql into Teradata fails with "this user does not have permission to create on LABUSERS" for a datalab table name

I have an issue with dataframe.to_sql when trying to use this function: it does not recognize or separate the data lab name from the table name; instead it takes the whole string as the name of the table to create. So it tries to create the table at the default root level and gives the error: this user does not have permission to create on LABUSERS.
from sqlalchemy import create_engine
engine = create_engine(f'teradata://{username}:{password}@tdprod:22/')
df.to_sql('data_lab.table_name', engine)
How can I use df.to_sql function and specify the datalab?
From the to_sql documentation:
schema (str, optional)
Specify the schema (if database flavor supports this). If None, use default schema.
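Applied to the snippet above, that means passing the data lab through schema and leaving only the bare table name in the first argument:

# 'data_lab' goes into schema=, not into the table name
df.to_sql('table_name', engine, schema='data_lab')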

How to get database size in SQL Alchemy?

I am looking for a way to get the size of a database with SQLAlchemy. Ideally, it would be agnostic to the underlying database type. Is this possible?
Edit:
By size, I mean total number of bytes that the database uses.
The way I would do it is to find out whether you can run a SQL query to get the answer; then you can just run that query via SQLAlchemy and read the result.
Currently, SQLAlchemy does not provide any convenience function to determine the database size in bytes, but you can execute an SQL statement yourself. The caveat is that the statement is specific to your database (MySQL, Postgres, etc.).
Checking this answer, for MySQL you can execute a statement manually like:
from sqlalchemy import create_engine, text

connection_string = 'mysql+pymysql://...'
engine = create_engine(connection_string)
statement = 'SELECT table_schema, table_name, data_length, index_length FROM information_schema.tables'
with engine.connect() as con:
    # SQLAlchemy 1.4+ requires raw SQL strings to be wrapped in text()
    res = con.execute(text(statement))
    size_of_table = res.fetchall()
For SQLite you can just check the entire database size with the os module:
import os
os.path.getsize('sqlite.db')
For PostgreSQL you can do it like this:
from sqlalchemy import text
from sqlalchemy.orm import Session

dbsession: Session  # an existing ORM session
engine = dbsession.bind
database_name = engine.url.database
# https://www.a2hosting.com/kb/developer-corner/postgresql/determining-the-size-of-postgresql-databases-and-tables
# Return the database size in bytes
database_size = dbsession.execute(
    text(f"SELECT pg_database_size('{database_name}')")
).scalar()
