I am using SQLAlchemy for connection pooling only (I need to call existing stored procedures) and want to return a REF CURSOR, which is an out parameter.
There seems to be no cursor object in SQLAlchemy to do this.
Any advice greatly appreciated.
Gut feel - you may have to dive down a lower level than SQLAlchemy, perhaps to the underlying cx_oracle classes.
From an answer provided by Gerhard Häring on another forum:
import cx_Oracle

con = cx_Oracle.connect("me/secret@tns")
cur = con.cursor()
outcur = con.cursor()   # this cursor will receive the REF CURSOR
cur.execute("""
    BEGIN
        MyPkg.MyProc(:cur);
    END;""", cur=outcur)
for row in outcur:
    print(row)
I would presume that, as SQLAlchemy uses cx_Oracle under the hood, there should be some way to use the same pooled connections.
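For what it's worth, SQLAlchemy does expose the pooled DBAPI connection, so one option is to borrow a raw cx_Oracle connection from the pool and bind the REF CURSOR there. A rough sketch only, not tested against your procs; MyPkg.MyProc and the connection URL are placeholders:

from sqlalchemy import create_engine

engine = create_engine("oracle+cx_oracle://me:secret@tns")

# Borrow a raw DBAPI (cx_Oracle) connection from SQLAlchemy's pool
raw = engine.raw_connection()
try:
    out_cur = raw.cursor()   # cursor that will receive the REF CURSOR
    call_cur = raw.cursor()
    call_cur.execute(
        """
        BEGIN
            MyPkg.MyProc(:cur);
        END;""",
        cur=out_cur,
    )
    for row in out_cur:
        print(row)
finally:
    raw.close()              # returns the connection to the pool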
Is there any option to wrap your function returning the REF CURSOR in a view on the Oracle side? Provided the shape of the REF CURSOR doesn't change, and you can somehow get the right parameters into your function (for example by sticking them in as session variables, or in a temp table), this should work. I've used this approach to retrieve data from a REF CURSOR function in a language that only supports a limited subset of Oracle features.
I am studying the Python sqlite3 database module. I think this is a very simple problem, but it is stopping me from taking the next step. Could you help me?
The print output looks OK in the VS Code terminal, but the change is not saved to the DB file. I have searched several times but I cannot fix it.
When I execute the code, the sorting is not applied to the DB file.
import sqlite3

conn = sqlite3.connect('sqliteDB1.db')
cursor = conn.cursor()

cursor.execute("SELECT * FROM member")
temp123 = cursor.fetchall()
print(temp123)

cursor.execute("SELECT * FROM member ORDER BY -code")
temp321 = cursor.fetchall()
conn.commit()
print(temp321)
conn.close()
A SELECT statement just returns data from a database; it does not modify it. Moreover, tables in SQL databases are inherently unordered sets: they have no intrinsic order, and you should never rely on the order in which rows happen to be returned unless you explicitly sort them with an ORDER BY clause.
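If the goal is simply to read the rows in a particular order, the ORDER BY has to be applied at query time, every time. A minimal sketch, assuming code is the column you want to sort on and that descending order was the intent of ORDER BY -code:

import sqlite3

conn = sqlite3.connect('sqliteDB1.db')
cursor = conn.cursor()

# The sort only affects this result set; nothing in the .db file changes,
# so there is nothing to commit for a SELECT.
cursor.execute("SELECT * FROM member ORDER BY code DESC")
for row in cursor.fetchall():
    print(row)

conn.close()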
I am looking for a way to get the size of a database with SQL Alchemy. Ideally, it will be agnostic to which underlying type of database is used. Is this possible?
Edit:
By size, I mean total number of bytes that the database uses.
The way I would do it is to find out whether you can run a SQL query to get the answer; then you can just run that query via SQLAlchemy and read the result.
Currently, SQLAlchemy does not provide any convenience function to determine the database size in bytes, but you can execute an SQL statement to do it. The caveat is that the statement has to be specific to your type of database (MySQL, Postgres, etc.).
Based on this answer, for MySQL you can execute a statement manually like:

from sqlalchemy import create_engine, text

connection_string = 'mysql+pymysql://...'
engine = create_engine(connection_string)

# data_length + index_length is each table's size in bytes
statement = text('SELECT table_schema, table_name, data_length, index_length FROM information_schema.tables')
with engine.connect() as con:
    res = con.execute(statement)
    table_sizes = res.fetchall()
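Since the question asks for the total number of bytes, you can also let MySQL do the summing. A sketch, assuming the database you connect to is the one you want to measure:

from sqlalchemy import create_engine, text

engine = create_engine('mysql+pymysql://...')  # same placeholder URL as above

# Sum data + index sizes across all tables of the current database
with engine.connect() as con:
    total_bytes = con.execute(text(
        "SELECT SUM(data_length + index_length) "
        "FROM information_schema.tables "
        "WHERE table_schema = DATABASE()"
    )).scalar()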
For SQLite you can just check the entire database size with the os module:
import os
os.path.getsize('sqlite.db')
For PostgreSQL you can do it like this:
from sqlalchemy import text
from sqlalchemy.orm import Session

dbsession: Session  # an existing Session bound to your engine
engine = dbsession.bind
database_name = engine.url.database

# https://www.a2hosting.com/kb/developer-corner/postgresql/determining-the-size-of-postgresql-databases-and-tables
# Return the database size in bytes
database_size = dbsession.execute(text(f"SELECT pg_database_size('{database_name}')")).scalar()
I am using MySQL. I was, however, only able to find ways to do this using the SQLAlchemy ORM, not the expression language, so I am specifically looking for Core / expression-language-based solutions. The databases live on the same server.
This is what my connection looks like:

connection = engine.connect().execution_options(
    schema_translate_map={current_database_schema: new_database_schema})
engine_1 = create_engine("mysql+mysqldb://root:user@*******/DB_1")
engine_2 = create_engine("mysql+mysqldb://root:user@*******/DB_2", pool_size=5)

metadata_1 = MetaData(engine_1)
metadata_2 = MetaData(engine_2)
metadata_1.reflect(bind=engine_1)
metadata_2.reflect(bind=engine_2)

table_1 = metadata_1.tables['table_1']
table_2 = metadata_2.tables['table_2']

query = select([table_1.c.name, table_2.c.name]).select_from(
    join(table_1, table_2, table_1.c.id == table_2.c.id, isouter=True))
result = connection.execute(query).fetchall()
However, when I try to join tables from different databases it throws an error, obviously because the connection belongs to only one of the databases. I haven't tried anything else because I could not find a way around this.
Another way to put the question (maybe) is: how do you connect to multiple databases using a single connection in SQLAlchemy Core?
Applying the solution from here to Core only, you could create a single Engine object that connects to your server, but without defaulting to one database or the other:

engine = create_engine("mysql+mysqldb://root:user@*******/")
and then using a single MetaData instance reflect the contents of each schema:
metadata = MetaData(engine)
metadata.reflect(schema='DB_1')
metadata.reflect(schema='DB_2')
# Note: use the fully qualified names as keys
table_1 = metadata.tables['DB_1.table_1']
table_2 = metadata.tables['DB_2.table_2']
You could also use one of the databases as the "default" and pass it in the URL. In that case you would reflect tables from that database as usual and pass the schema= keyword argument only when reflecting the other database.
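For example, a rough sketch of that variant, assuming DB_1 is the default database named in the URL:

from sqlalchemy import create_engine, MetaData

engine = create_engine("mysql+mysqldb://root:user@*******/DB_1")

metadata = MetaData()
metadata.reflect(bind=engine)                 # DB_1 tables, keyed without a prefix
metadata.reflect(bind=engine, schema='DB_2')  # DB_2 tables, keyed as 'DB_2.<name>'

table_1 = metadata.tables['table_1']
table_2 = metadata.tables['DB_2.table_2']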
Use the created engine to perform the query:
query = select([table_1.c.name, table_2.c.name]).\
    select_from(outerjoin(table_1, table_2, table_1.c.id == table_2.c.id))

with engine.begin() as connection:
    result = connection.execute(query).fetchall()
I'm trying to replace some old MSSQL stored procedures with Python, in an attempt to take some of the heavy calculations off of the SQL server. The part of the procedure I'm having issues replacing is as follows:
UPDATE mytable
SET calc_value = tmp.calc_value
FROM dbo.mytable mytable INNER JOIN
#my_temp_table tmp ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
WHERE (mytable.a = some_value)
and (mytable.x = tmp.x)
and (mytable.b = some_other_value)
Up to this point, I've made some queries with SQLAlchemy, stored the data in DataFrames, and done the requisite calculations on them. I don't know how to put the data back into the server using SQLAlchemy, either with raw SQL or function calls. The DataFrame I have on my end would essentially have to take the place of the temporary table created in MSSQL Server, but I'm not sure how I can do that.
The difficulty is of course that I don't know of a way to join between a DataFrame and an MSSQL table, and I'm guessing this wouldn't work anyway, so I'm looking for a workaround.
As the pandas docs suggest here:

from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)
dataframe.to_sql('tablename', engine, if_exists='replace')
The engine parameter for MSSQL is basically the connection string; check it here.
The if_exists parameter is a bit tricky, since 'replace' actually drops the table first, then recreates it and inserts all the data at once.
Setting the echo attribute to True shows all the background logging and the SQL statements that are emitted.
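If you also need to reproduce the UPDATE ... JOIN from the original procedure, one workaround is to write the DataFrame to a staging table with to_sql and then run the join-update as plain SQL against it. A sketch only: my_staging_table is a name I made up, dataframe is the DataFrame from the snippet above, and some_value / some_other_value remain the placeholders from the original procedure.

from sqlalchemy import create_engine, text

engine = create_engine("mssql+pyodbc://user:password@DSN", echo=False)

# Push the calculated values into a staging table (a stand-in for #my_temp_table)
dataframe.to_sql('my_staging_table', engine, if_exists='replace', index=False)

update_sql = text("""
    UPDATE mytable
    SET calc_value = tmp.calc_value
    FROM dbo.mytable AS mytable
    INNER JOIN my_staging_table AS tmp
        ON mytable.a = tmp.a AND mytable.b = tmp.b AND mytable.c = tmp.c
    WHERE mytable.a = some_value
      AND mytable.x = tmp.x
      AND mytable.b = some_other_value
""")

with engine.begin() as conn:
    conn.execute(update_sql)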
I have a 1,000,000 x 50 Pandas DataFrame that I am currently writing to a SQL table using:
df.to_sql('my_table', con, index=False)
It takes an incredibly long time. I've seen various explanations about how to speed up this process online, but none of them seem to work for MSSQL.
If I try the method in:
Bulk Insert A Pandas DataFrame Using SQLAlchemy
then I get a no attribute copy_from error.
If I try the multithreading method from:
http://techyoubaji.blogspot.com/2015/10/speed-up-pandas-tosql-with.html
then I get a "QueuePool limit of size 5 overflow 10 reached, connection timed out" error.
Is there any easy way to speed up to_sql() to an MSSQL table? Either via BULK COPY or some other method, but entirely from within Python code?
I've used ctds to do a bulk insert that's a lot faster with SQL Server. In the example below, df is the pandas DataFrame, and the column order in the DataFrame is identical to the schema of the target table in mydb.
import ctds

conn = ctds.connect('server', user='user', password='password', database='mydb')
# Rows are passed as a sequence of tuples; column order must match the table schema.
conn.bulk_insert('table', df.to_records(index=False).tolist())
In pandas 0.24+ you can use method='multi' with a chunksize of 1000, which is the SQL Server limit on rows per INSERT ... VALUES statement:

df.to_sql('my_table', con, index=False, chunksize=1000, method='multi')
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
New in version 0.24.0.
The parameter method controls the SQL insertion clause used. Possible values are:
None: Uses standard SQL INSERT clause (one per row).
'multi': Pass multiple values in a single INSERT clause. It uses a special SQL syntax not supported by all backends. This usually provides better performance for analytic databases like Presto and Redshift, but has worse performance for traditional SQL backends if the table contains many columns. For more information check the SQLAlchemy documentation.
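One caveat: with the pyodbc driver, SQL Server also caps a single statement at 2100 bind parameters, so chunksize times the number of columns has to stay below that as well. A rough sketch of picking a chunksize that respects both limits (this is my own adjustment, not from the quoted docs):

# Rows per INSERT must satisfy both the 1000-row VALUES limit
# and SQL Server's 2100-parameter limit (rows * columns < 2100).
safe_chunksize = max(1, min(1000, 2100 // len(df.columns) - 1))
df.to_sql('my_table', con, index=False, chunksize=safe_chunksize, method='multi')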
I had the same issue, so I used SQLAlchemy with fast_executemany:

from sqlalchemy import event, create_engine

engine = create_engine('connection_string_with_database')

@event.listens_for(engine, 'before_cursor_execute')
def plugin_bef_cursor_execute(conn, cursor, statement, params, context, executemany):
    if executemany:
        # switch pyodbc from plain executemany to the much faster fast_executemany
        cursor.fast_executemany = True
Always make sure the listener function is defined after the engine is created and before any cursor execution takes place (i.e. before you call to_sql).

with engine.begin() as conn:
    # for reference see the pandas to_sql documentation
    df.to_sql('table', con=conn, if_exists='append', index=False)