Can we specify a different cursor when executing queries via SQLAlchemy?
I want to be able to use psycopg2's RealDictCursor (as mentioned in this article: http://alexharvey.eu/database/postgresql/postgresql-query-result-as-python-dictionary/) with SQLAlchemy.
This is a sample code of how the query is executed:
with connection:
    result = connection.execute(select(columns_to_select).where(clause))
    row = result.fetchone()
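For reference, SQLAlchemy 1.4+ can return dict-style rows without a custom DBAPI cursor, via Result.mappings(). A minimal sketch (an in-memory SQLite database stands in for Postgres here; the behaviour is backend-agnostic):

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for the Postgres database in the question;
# Result.mappings() works the same on any backend.
engine = create_engine("sqlite://")
with engine.begin() as connection:
    connection.execute(text("CREATE TABLE my_table (id INTEGER, name TEXT)"))
    connection.execute(text("INSERT INTO my_table VALUES (1, 'alice')"))

with engine.connect() as connection:
    result = connection.execute(text("SELECT id, name FROM my_table"))
    rows = [dict(row) for row in result.mappings()]  # dict-style rows
print(rows)  # [{'id': 1, 'name': 'alice'}]
```

If you specifically need RealDictCursor at the DBAPI level, it can in principle be passed through create_engine(..., connect_args={"cursor_factory": psycopg2.extras.RealDictCursor}), but SQLAlchemy's own result handling usually makes that unnecessary.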
According to the SQLAlchemy documentation you are supposed to use a Session object when executing SQL statements. But using a Session with pandas' .read_sql gives an error: AttributeError: 'Session' object has no attribute 'cursor'.
However using the Connection object works even with the ORM Mapped Class:
with ENGINE.connect() as conn:
    df = pd.read_sql_query(
        sqlalchemy.select(MeterValue),
        conn,
    )
Where MeterValue is a Mapped Class.
This doesn't feel like the correct solution, because the SQLAlchemy documentation says you are not supposed to use an engine connection with the ORM. I just can't find out why.
Does anyone know if there is any issue using the connection instead of Session with ORM Mapped Class?
What is the correct way to read SQL into a DataFrame using the SQLAlchemy ORM?
I found a couple of old answers for this where you use the engine directly as the second argument, or use session.bind, and so on. Nothing works.
Just reading the documentation of pandas.read_sql_query:
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)
Parameters:
sql: str SQL query or SQLAlchemy Selectable (select or text object)
SQL query to be executed.
con: SQLAlchemy connectable, str, or sqlite3 connection
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
...
So pandas does accept a SQLAlchemy Selectable (e.g. select(MeterValue)) and a SQLAlchemy connectable (e.g. engine.connect()), so your code block is correct and pandas will handle the query properly.
with ENGINE.connect() as conn:
    df = pd.read_sql_query(
        sqlalchemy.select(MeterValue),
        conn,
    )
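If you want to stay inside the ORM's unit of work, one option is to hand pandas the Connection the Session is bound to, via Session.connection(). A sketch with a stand-in mapped class (the table and data are made up for illustration):

```python
import pandas as pd
from sqlalchemy import Column, Integer, create_engine, select
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class MeterValue(Base):  # stand-in for the mapped class in the question
    __tablename__ = "meter_value"
    id = Column(Integer, primary_key=True)

engine = create_engine("sqlite://")  # in-memory database for illustration
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(MeterValue(id=1))
    session.flush()  # make the row visible within this transaction
    # Session.connection() returns the Connection the session is using,
    # which satisfies pandas' "SQLAlchemy connectable" requirement
    df = pd.read_sql_query(select(MeterValue), session.connection())
```

This way the read happens on the same connection (and in the same transaction) as the session's other work.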
I am looking for a way to get the size of a database with SQLAlchemy. Ideally, it will be agnostic to the underlying type of database. Is this possible?
Edit:
By size, I mean total number of bytes that the database uses.
The way I would do it is to find out whether you can run a SQL query to get the answer. Then you can just run that query via SQLAlchemy and read the result.
Currently, SQLAlchemy does not provide any convenience function to determine table sizes in bytes, but you can execute a SQL statement. The caveat is that you have to use a statement that is specific to your type of database (MySQL, Postgres, etc.).
Checking this answer, for MySQL you can execute a statement manually like:
from sqlalchemy import create_engine, text

connection_string = 'mysql+pymysql://...'
engine = create_engine(connection_string)

# As of SQLAlchemy 1.4+, raw SQL strings must be wrapped in text()
statement = text('SELECT table_schema, table_name, data_length, index_length FROM information_schema.tables')

with engine.connect() as con:
    res = con.execute(statement)
    size_of_table = res.fetchall()
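Since the question asks for the total number of bytes, the per-table lengths can also be summed in SQL. A sketch of such a statement for MySQL, with the schema name passed as a bound parameter rather than interpolated into the string (the execution lines are commented out because they need a live MySQL server):

```python
from sqlalchemy import text

# Sums data and index bytes across all tables of one schema (MySQL).
size_stmt = text(
    "SELECT SUM(data_length + index_length) AS total_bytes "
    "FROM information_schema.tables WHERE table_schema = :db"
)

# With an engine as above:
# with engine.connect() as con:
#     total_bytes = con.execute(size_stmt, {"db": "my_database"}).scalar()
```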
For SQLite you can just check the entire database size with the os module:
import os
os.path.getsize('sqlite.db')
For PostgreSQL you can do it like this:
from sqlalchemy import text
from sqlalchemy.orm import Session

dbsession: Session
engine = dbsession.bind
database_name = engine.url.database

# https://www.a2hosting.com/kb/developer-corner/postgresql/determining-the-size-of-postgresql-databases-and-tables
# Return the database size in bytes; bind the name as a parameter
database_size = dbsession.execute(
    text("SELECT pg_database_size(:name)"),
    {"name": database_name},
).scalar()
I am executing a number of SELECT queries on a Postgres database using psycopg2, but I am getting ERROR: out of shared memory. It suggests that I should increase max_locks_per_transaction, but this confuses me because each SELECT query operates on only one table, and max_locks_per_transaction is already set to 512, 8 times the default.
I am using TimescaleDB, which could account for a larger than normal number of locks (one for each chunk rather than one for each table, maybe), but this still can't explain running out when so many are allowed. I'm assuming what is happening here is that all the queries are being run as part of one transaction.
The code I am using looks something as follows.
db = DatabaseConnector(**connection_params)
tables = db.get_table_list()
for table in tables:
    result = db.query(f"""SELECT a, b, COUNT(c) FROM {table} GROUP BY a, b""")
    print(result)
Where db.query is defined as:
def query(self, sql):
    with self._connection.cursor() as cur:
        cur.execute(sql)
        return_value = cur.fetchall()
    return return_value
and self._connection is:
self._connection = psycopg2.connect(**connection_params)
Do I need to explicitly end the transaction in some way to free up the locks? And how can I go about doing this in psycopg2? I would have assumed that there was an implicit end to the transaction when the cursor is closed on __exit__. I know that if I were inserting or deleting rows I would use COMMIT at the end, but it seems strange to use it when I am not changing the table.
UPDATE: When I explicitly open and close the connection in the loop, the error does not appear. However, I assume there is a better way to end the transaction after each SELECT than this.
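For the record, psycopg2 starts a transaction implicitly at the first statement and holds it (and its locks) open until commit() or rollback(); closing the cursor does not end it. A sketch of the helper with an explicit end of transaction after each read:

```python
def query(connection, sql):
    """Run a read-only statement and end the implicit transaction,
    so PostgreSQL releases the locks the SELECT acquired."""
    with connection.cursor() as cur:
        cur.execute(sql)
        rows = cur.fetchall()
    connection.commit()  # rollback() would release the locks just as well
    return rows
```

Alternatively, setting connection.autocommit = True makes every statement its own transaction, which also prevents lock buildup across the loop.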
I am using MySQL. I was, however, able to find ways to do this using the SQLAlchemy ORM, but not using the expression language, so I am specifically looking for Core / expression-language based solutions. The databases lie on the same server.
This is how my connection looks:

connection = engine.connect().execution_options(
    schema_translate_map={current_database_schema: new_database_schema})

engine_1 = create_engine("mysql+mysqldb://root:user#*******/DB_1")
engine_2 = create_engine("mysql+mysqldb://root:user#*******/DB_2", pool_size=5)

metadata_1 = MetaData(engine_1)
metadata_2 = MetaData(engine_2)
metadata_1.reflect(bind=engine_1)
metadata_2.reflect(bind=engine_2)

table_1 = metadata_1.tables['table_1']
table_2 = metadata_2.tables['table_2']

query = select([table_1.c.name, table_2.c.name]).select_from(
    join(table_1, table_2, table_1.c.id == table_2.c.id, isouter=True))
result = connection.execute(query).fetchall()
However, when I try to join tables from different databases it throws an error obviously because the connection belongs to one of the databases. And I haven't tried anything else because I could not find a way to solve this.
Another way to put the question (maybe) is 'how to connect to multiple databases using a single connection in sqlalchemy core'.
Applying the solution from here to Core only you could create a single Engine object that connects to your server, but without defaulting to one database or the other:
engine = create_engine("mysql+mysqldb://root:user#*******/")
and then using a single MetaData instance reflect the contents of each schema:
metadata = MetaData(engine)
metadata.reflect(schema='DB_1')
metadata.reflect(schema='DB_2')
# Note: use the fully qualified names as keys
table_1 = metadata.tables['DB_1.table_1']
table_2 = metadata.tables['DB_2.table_2']
You could also use one of the databases as the "default" and pass it in the URL. In that case you would reflect tables from that database as usual and pass the schema= keyword argument only when reflecting the other database.
Use the created engine to perform the query:
query = select([table_1.c.name, table_2.c.name]).\
    select_from(outerjoin(table_1, table_2, table_1.c.id == table_2.c.id))

with engine.begin() as connection:
    result = connection.execute(query).fetchall()
I am using SQLAlchemy for connection pooling only (I need to call existing procs) and want to return a REF CURSOR which is an OUT parameter.
There seems to be no cursor in SQLAlchemy to do this.
Any advice greatly appreciated.
Gut feel: you may have to dive down to a lower level than SQLAlchemy, perhaps to the underlying cx_Oracle classes.
From an answer provided by Gerhard Häring on another forum:
import cx_Oracle

con = cx_Oracle.connect("me/secret@tns")
cur = con.cursor()
outcur = con.cursor()
cur.execute("""
    BEGIN
        MyPkg.MyProc(:cur);
    END;""", cur=outcur)
for row in outcur:
    print(row)
I would presume that as SQLAlchemy uses cx_Oracle under the hood, there should be some way to use the same pooled connections.
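There is: Engine.raw_connection() checks a DBAPI connection out of SQLAlchemy's pool, so the cx_Oracle pattern above can reuse the pooled connections. A sketch (MyPkg.MyProc and the DSN are hypothetical, carried over from the quoted answer):

```python
import sqlalchemy

def fetch_ref_cursor(engine, plsql_call):
    """Run a PL/SQL block that fills an OUT ref cursor, reusing a
    connection from SQLAlchemy's pool via Engine.raw_connection()."""
    raw = engine.raw_connection()   # DBAPI (cx_Oracle) connection from the pool
    try:
        out_cursor = raw.cursor()
        call_cursor = raw.cursor()
        call_cursor.execute(plsql_call, cur=out_cursor)
        return out_cursor.fetchall()
    finally:
        raw.close()                 # returns the connection to the pool

# Hypothetical usage against an Oracle server:
# engine = sqlalchemy.create_engine("oracle+cx_oracle://me:secret@tns")
# rows = fetch_ref_cursor(engine, "BEGIN MyPkg.MyProc(:cur); END;")
```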
Is there any option to wrap your function returning the REF CURSOR in a view on the Oracle side? Provided the shape of the REF CURSOR doesn't change, and you can somehow get the right parameters into your function (possibly by sticking them in as session variables, or in a temp table), this should work. I've used this approach to retrieve data from a REF CURSOR function in a language that only supports a limited subset of Oracle features.