Bulk insert many rows using sqlalchemy - python

I want to insert thousands of rows into an Oracle DB using Python. For this I am trying to use the bulk_insert_mappings method of a SQLAlchemy session. I am following this tutorial which shows how to load a csv file into the database quickly. I failed because bulk_insert_mappings also expects a mapper object, which they don't pass.
The code to create the connection and the mapping without a csv:
from sqlalchemy.sql import select, sqltypes
from sqlalchemy.schema import CreateSchema
from sqlalchemy import create_engine, MetaData, Table, inspect, engine, Column, String, DateTime, insert
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import get_mapper
import pytz
import datetime
import pandas as pd
engine_url = engine.URL.create(
    drivername='oracle',
    username='ADMIN',
    password='****',
    host='***',
    database='****',
)
oracle_engine = create_engine(engine_url, echo=False)
Session = sessionmaker(bind=oracle_engine)
base = datetime.datetime.today().replace(tzinfo=pytz.utc)
date_list = [base - datetime.timedelta(days=x) for x in range(20)]
df = pd.DataFrame(date_list, columns = ['date_time'])
I use the following line of code to create the table if it doesn't exist:
df[:0].to_sql('test_insert_df', oracle_engine, schema='ADMIN', if_exists='replace')
Then I used this line to insert data into the table:
with Session() as session:
    session.bulk_insert_mappings(df.to_dict('records'))
The traceback I receive is the following:
TypeError: bulk_insert_mappings() missing 1 required positional argument: 'mappings'
How can I create the mapper if I don't use the SQLAlchemy ORM to create the table? Looking at this question I know how to create the mapper object with a SQLAlchemy model, but not otherwise.
I also have the option of inserting with the bulk_save_objects method, but that too needs a model, which I don't have.
PS: I am doing this because I want to insert many rows into an Oracle database using SQLAlchemy. If you have a better solution it is also welcome.

The simplest way to insert would be to use the DataFrame's to_sql method, but if you want to use SQLAlchemy you can use its Core features to insert the data. There's no need for the ORM in your use case. Something like this should work, assuming you are using SQLAlchemy 1.4 and can arrange your data as a list of dicts:
import sqlalchemy as sa
...
tbl = sa.Table('test_insert_df', sa.MetaData(), autoload_with=engine, schema='ADMIN')
ins = tbl.insert()
with engine.begin() as conn:
    # engine.begin() commits automatically on success
    conn.execute(ins, list_of_dicts)
If you have a lot of data you might want to insert in batches.
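For example, a minimal batching sketch, self-contained against an in-memory SQLite database standing in for Oracle (the batch size of 5 is arbitrary here and would be much larger in practice):

```python
import sqlalchemy as sa

# In-memory SQLite stand-in for the Oracle engine and reflected table.
engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
tbl = sa.Table("test_insert_df", meta, sa.Column("date_time", sa.String))
meta.create_all(engine)

rows = [{"date_time": f"2023-01-{d:02d}"} for d in range(1, 21)]

BATCH_SIZE = 5  # tune for your data volume
with engine.begin() as conn:
    for start in range(0, len(rows), BATCH_SIZE):
        # Passing a list of dicts triggers an executemany per batch.
        conn.execute(tbl.insert(), rows[start:start + BATCH_SIZE])

with engine.connect() as conn:
    count = conn.execute(sa.select(sa.func.count()).select_from(tbl)).scalar()
print(count)  # 20
```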

Related

How to convert SQL Query to Pandas DataFrame using SQLAlchemy ORM?

According to the SQLAlchemy documentation you are supposed to use a Session object when executing SQL statements, but using a Session with pandas' .read_sql gives an error: AttributeError: 'Session' object has no attribute 'cursor'.
However using the Connection object works even with the ORM Mapped Class:
with ENGINE.connect() as conn:
    df = pd.read_sql_query(
        sqlalchemy.select(MeterValue),
        conn
    )
Where MeterValue is a Mapped Class.
This doesn't feel like the correct solution, because the SQLAlchemy documentation says you are not supposed to use an engine connection with the ORM. I just can't find out why.
Does anyone know if there is any issue using the connection instead of a Session with an ORM mapped class?
What is the correct way to read SQL into a DataFrame using the SQLAlchemy ORM?
I found a couple of old answers where you use the engine directly as the second argument, or use session.bind, and so on. Nothing works.
Just reading the documentation of pandas.read_sql_query:
pandas.read_sql_query(sql, con, index_col=None, coerce_float=True, params=None, parse_dates=None, chunksize=None, dtype=None)
Parameters:
sql: str SQL query or SQLAlchemy Selectable (select or text object)
SQL query to be executed.
con: SQLAlchemy connectable, str, or sqlite3 connection
Using SQLAlchemy makes it possible to use any DB supported by that library. If a DBAPI2 object, only sqlite3 is supported.
...
So pandas does allow a SQLAlchemy Selectable (e.g. select(MeterValue)) and a SQLAlchemy connectable (e.g. engine.connect()), so your code block is correct and pandas will handle the querying correctly.
with ENGINE.connect() as conn:
    df = pd.read_sql_query(
        sqlalchemy.select(MeterValue),
        conn,
    )

Postgres tables reflected with SQLAlchemy not recognizing columns for Join

I'm trying to learn SQLAlchemy and I feel like I must be missing something fundamental. Hopefully this is a straightforward error on my part and someone can show me the link to the documentation that explains what I'm doing wrong.
I have a Postgres database running in a Docker container on my local machine. I can connect to it and export queries to Python using psycopg2 with no issues.
I'm trying to recreate what I did with psycopg2 using SQLAlchemy, but I'm having trouble when I try to join two tables. My current code looks like this:
from sqlalchemy import *
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import select
conn = create_engine('postgresql://postgres:my_password@localhost/in_votes')
metadata = MetaData(conn)
metadata.reflect(bind = conn)
pop = metadata.tables['pop']
absentee = metadata.tables['absentee']
Session = sessionmaker(bind=conn)
session = Session()
session.query(pop).join(absentee, county == pop.county).all()
I'm trying to join the pop and absentee tables on the county field and I get the error:
NameError: name 'county' is not defined
I can view the columns in each table and loop through them to access the data.
Can someone clear this up for me and explain what I'm doing wrong?
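For what it's worth, columns of a reflected (Core) Table live on its .c collection rather than existing as bare names, so the join condition would normally be spelled pop.c.county == absentee.c.county. A minimal sketch against an in-memory SQLite stand-in for the two Postgres tables:

```python
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

# In-memory stand-in for the reflected Postgres tables.
engine = sa.create_engine("sqlite://")
meta = sa.MetaData()
pop = sa.Table("pop", meta,
               sa.Column("county", sa.String),
               sa.Column("population", sa.Integer))
absentee = sa.Table("absentee", meta,
                    sa.Column("county", sa.String),
                    sa.Column("ballots", sa.Integer))
meta.create_all(engine)

with engine.begin() as conn:
    conn.execute(pop.insert(), [{"county": "Wake", "population": 100}])
    conn.execute(absentee.insert(), [{"county": "Wake", "ballots": 7}])

Session = sessionmaker(bind=engine)
session = Session()
# Columns of a Core Table are reached through the .c collection:
rows = session.query(pop).join(absentee, pop.c.county == absentee.c.county).all()
print(tuple(rows[0]))  # ('Wake', 100)
```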

Is there a way to create tables in a DB using python Dictionary?

I have a large dictionary of items with ~400 columns. I have a script to detect the type of each item, so I have something like {'age': INT, 'Name': String, ...}, but I'm not sure how to use that to create a table in SQLAlchemy or to build the query directly.
I am using postgres, but I am familiar with mysql & sqlite, so anything that works for those I would be able to apply to my use case.
What about this:
from sqlalchemy import create_engine, Table, Column, MetaData
metadata = MetaData()
fields = (Column(colname, coltype) for colname, coltype in your_dict.items())
t = Table(name, metadata, *fields)
engine = create_engine(database)
metadata.create_all(engine)
You need to have objects from sqlalchemy.sql.sqltypes rather than strings as values in your_dict:
from sqlalchemy.sql.sqltypes import String, Integer
See https://docs.sqlalchemy.org/en/13/core/type_basics.html for the whole list.
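If your detection script emits type names as strings, you could translate them into those type objects first. A sketch, where the detected dict and the name-to-type mapping are assumptions about what your script produces:

```python
from sqlalchemy import create_engine, Table, Column, MetaData
from sqlalchemy.sql.sqltypes import Integer, String, Float

# Hypothetical output of the type-detection script.
detected = {"age": "INT", "Name": "String", "score": "Float"}

# Translate string names into SQLAlchemy type objects.
TYPE_MAP = {"INT": Integer, "String": String, "Float": Float}
your_dict = {col: TYPE_MAP[t] for col, t in detected.items()}

metadata = MetaData()
fields = (Column(colname, coltype) for colname, coltype in your_dict.items())
t = Table("my_table", metadata, *fields)

engine = create_engine("sqlite://")  # stand-in for your postgres URL
metadata.create_all(engine)
print([c.name for c in t.columns])  # ['age', 'Name', 'score']
```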

Specify Oracle Schema for SQLAlchemy Engine

I need to create a SQLAlchemy engine for a database that doesn't use the default schema. What I want to be able to do is something like this:
from sqlalchemy import create_engine
string = "oracle+cx_oracle://batman:batpassword@batcave.com:1525/some_database"
engine = create_engine(string, schema="WEIRD_SCHEMA")
tables = engine.table_names()
Is there a way to do this?
I'm working with some legacy code that uses the engine.table_names() method.
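There is no schema argument on create_engine itself. One option (a sketch against SQLite; on Oracle you would pass schema='WEIRD_SCHEMA') is to list tables through the inspector:

```python
import sqlalchemy as sa

# SQLite stand-in; on Oracle the cx_oracle URL from the question would be used.
engine = sa.create_engine("sqlite://")
sa.Table("demo", sa.MetaData(), sa.Column("id", sa.Integer)).create(engine)

# inspect() works across dialects; on Oracle, pass schema="WEIRD_SCHEMA".
tables = sa.inspect(engine).get_table_names()
print(tables)  # ['demo']
```

In older SQLAlchemy versions the legacy Engine.table_names() also accepted a schema argument, which may be enough for the legacy code without touching the engine creation.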

How to load data into existing database tables, using sqlalchemy?

I have my data loaded from Excel files and organized as a Python dict where each key is a database table name and its value is a list of dictionaries (the rows):
system_data = {table_name1: [{'col_1': val1, 'col_2': val1, ...},
                             {'col_1': val2, 'col_2': val2, ...}, ...],
               table_name2: [{}, {}, ...],
               ...}
This data needs to be loaded into an existing database, picking the table names (the keys) and rows (the values) from system_data.
Additionally I use an ordered_tables list, created in a specific order, to avoid FK problems while the data is being loaded.
Here is the code (one of the 1000 versions I've tried):
from sqlalchemy import create_engine
from sqlalchemy.sql import insert
def alchemy_load():
system_data = load_current_system_data()
engine = create_engine('mysql+pymysql://username:password@localhost/my_db')
conn = engine.connect()
for table_name in ordered_tables:
    conn.execute(insert(table_name, system_data[table_name]))
print("System's Data successfully loaded into Database!")
This function yields the following error:
"TypeError: 'method' object is not iterable"
I've wasted almost all day on this stuff (((
All the online examples describe the situation when a user uses MetaData and creates its own tables... There is nothing about how to actually add data into existing tables.
There is a solution to my problem using "dataset" library.
The code:
import dataset
def current_data():
    db = dataset.connect(url='mysql+pymysql://user:pass@localhost/my_db')
    system_data = load_current_system_data()
    for table_name in ordered_tables:
        db[table_name].insert_many(system_data[table_name])
    print("System's Data successfully loaded into Database!")
BUT, I have no idea how to implement this code using sqlalchemy...
Any help will be appreciated.
One possible solution using SQLAlchemy metadata would go like this:
from sqlalchemy.schema import MetaData

meta = MetaData()
meta.reflect(bind=engine)

for t in meta.tables:
    for x in engine.execute(meta.tables[t].select()):
        print(x)
(1, u'one')
(2, u'two')
(1, 1, 1)
(2, 1, 2)
(3, 2, 0)
(4, 2, 4)
(I use select instead of insert and apply it to a silly database I've got for trials.)
Hope it helps.
EDIT: After comments, I add some clarification.
On a MetaData object, tables is a dictionary of the tables in the schema, keyed by table name, and it is actually very similar to the dictionary in your code. You could iterate the metadata like this,
for table_name, table in meta.tables.items():
    for x in engine.execute(table.select()):
        print(x)
or, trying to adapt it into your code (note that a SQLAlchemy Table has no insert_many; that method belongs to the dataset library), you could do something like,
for table_name in ordered_tables:
    conn.execute(meta.tables[table_name].insert(), system_data[table_name])
This is the solution I used:
from sqlalchemy import create_engine, MetaData
# from myplace import load_current_system_data and other relevant functions

def alchemy_current():
    system_data = load_current_system_data()
    engine = create_engine('mysql+pymysql://user:password@localhost/your_db_name')
    meta = MetaData()
    meta.reflect(bind=engine)
    conn = engine.connect()
    for table_name in ordered_tables:
        conn.execute(meta.tables[table_name].insert().values(system_data[table_name]))
    conn.close()
    # engine.dispose()
    print("System's Data successfully loaded into Database!")
All this should work, assuming that one has:
An existing MySQL database.
The data organized as described earlier in this question.
An ordered table-name list, to keep referential integrity and avoid FK problems.
You could import text:
from sqlalchemy.sql import text
and then execute the following:
conn.execute(text("MySQL commands go in here"))
example for insert:
conn.execute(text("INSERT INTO `table_name`(column_1,column_2,...) VALUES (value_1,value_2,...);"))
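To avoid building the VALUES list by hand, text() also supports named bound parameters, and passing a list of dicts turns the statement into an executemany. A sketch against an in-memory SQLite database standing in for MySQL:

```python
import sqlalchemy as sa
from sqlalchemy.sql import text

engine = sa.create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE t (a INTEGER, b TEXT)"))
    # Named :params are safer than interpolating values into the SQL string.
    conn.execute(
        text("INSERT INTO t (a, b) VALUES (:a, :b)"),
        [{"a": 1, "b": "one"}, {"a": 2, "b": "two"}],
    )
    n = conn.execute(text("SELECT COUNT(*) FROM t")).scalar()
print(n)  # 2
```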