How to load data into existing database tables using sqlalchemy? - python

I have my data loaded from Excel files and organized as a Python dict where each key is a database table name and its value is a list of dictionaries (the rows):
system_data = {table_name1: [{'col_1': val1, 'col_2': val1, ...},
                             {'col_1': val2, 'col_2': val2, ...}, ...],
               table_name2: [{}, {}, ...], ...}
This data needs to be loaded into an existing database, picking the table names (keys) and rows (values) from system_data.
Additionally, I use an ordered_tables list, which I've built in a specific order to avoid FK problems while the data is being loaded.
Here is the code (one of the 1000 versions I've tried):
from sqlalchemy import create_engine
from sqlalchemy.sql import insert

def alchemy_load():
    system_data = load_current_system_data()
    engine = create_engine('mysql+pymysql://username:password@localhost/my_db')
    conn = engine.connect()
    for table_name in ordered_tables:
        conn.execute(insert(table_name, system_data[table_name]))
    print("System's Data successfully loaded into Database!")
This function yields the following error:
"TypeError: 'method' object is not iterable"
I've wasted almost all day on this stuff (((
All the online examples describe the situation where a user uses MetaData and creates their own tables... There is nothing about how to actually add data into existing tables.
There is a solution to my problem using the "dataset" library.
The code:
import dataset

def current_data():
    db = dataset.connect(url='mysql+pymysql://user:pass@localhost/my_db')
    system_data = load_current_system_data()
    for table_name in ordered_tables:
        db[table_name].insert_many(system_data[table_name])
    print("System's Data successfully loaded into Database!")
BUT, I have no idea how to implement this code using sqlalchemy...
Any help will be appreciated.

One possible solution using SQLAlchemy metadata would go like this:
In [7]:
from sqlalchemy.schema import MetaData
meta = MetaData()
meta.reflect(bind=engine)
In [20]:
for t in meta.tables:
    for x in engine.execute(meta.tables[t].select()):
        print x
(1, u'one')
(2, u'two')
(1, 1, 1)
(2, 1, 2)
(3, 2, 0)
(4, 2, 4)
(I use select instead of insert and apply it to a silly database I've got for trials.)
Hope it helps.
EDIT: After comments, I add some clarification.
On a MetaData object, tables is a dictionary of the tables in the schema, keyed by table name, and it is actually very similar to the dictionary in your code. You could iterate the metadata like this,
for table_name, table in meta.tables.items():
    for x in engine.execute(table.select()):
        print x
or, trying to adapt it into your code, you could just do something like,
for table_name in ordered_tables:
    # Table objects have no insert_many(); passing the list of dicts as the
    # second argument to execute() performs an executemany-style insert.
    conn.execute(meta.tables[table_name].insert(), system_data[table_name])

This is the solution I used:
from sqlalchemy import create_engine, MetaData
# from myplace import load_current_system_data and other relevant functions

def alchemy_current():
    system_data = load_current_system_data()
    engine = create_engine('mysql+pymysql://user:password@localhost/your_db_name')
    meta = MetaData()
    meta.reflect(bind=engine)
    conn = engine.connect()
    for table_name in ordered_tables:
        conn.execute(meta.tables[table_name].insert().values(system_data[table_name]))
    conn.close()
    # engine.dispose()
    print("System's Data successfully loaded into Database!")
All this should work, assuming that one has:
An existing MySQL database.
Data organized as described earlier in this question.
An ordered list of table names (ordered_tables) to keep referential integrity and avoid FK problems.
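As a sketch of a slight variant (not part of the original answer): instead of building one big multi-row VALUES clause with .insert().values(...), the list of dicts can be passed as a separate argument to conn.execute(), which lets the driver run an executemany:

# A minimal executemany-style variant, assuming the same
# load_current_system_data() and ordered_tables as above.
from sqlalchemy import create_engine, MetaData

def alchemy_current_executemany():
    system_data = load_current_system_data()
    engine = create_engine('mysql+pymysql://user:password@localhost/your_db_name')
    meta = MetaData()
    meta.reflect(bind=engine)
    conn = engine.connect()
    for table_name in ordered_tables:
        # The list of row dicts as a second argument triggers executemany.
        conn.execute(meta.tables[table_name].insert(), system_data[table_name])
    conn.close()
    print("System's Data successfully loaded into Database!")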

You could import text:
from sqlalchemy.sql import text
and then execute the following:
conn.execute(text("MySQL commands go in here"))
Example for insert:
conn.execute(text("INSERT INTO `table_name` (column_1, column_2, ...) VALUES (value_1, value_2, ...);"))

Related

Writing pandas dataframe into SQL server table - no result and no error

Here is my code
from sqlalchemy import create_engine
import pandas as pd

engine = create_engine("connection string")
conn_obj = engine.connect()

my_df = pd.DataFrame({'col1': ['29199'], 'date_created': ['2022-06-29 17:15:49.776867']})
my_df.to_sql('SomeSQLTable', conn_obj, if_exists='append', index=False)
I also created SomeSQLTable with script:
CREATE TABLE SomeSQLTable(
    col1 nvarchar(90),
    date_created datetime2)
GO
Everything runs fine, but no records are inserted into the SQL table and no errors are displayed. I am not sure how to troubleshoot. conn_obj works fine; I was able to pull data.
I don't think this is exactly the answer, but I don't have the privileges to comment right now.
First of all, DataFrame.to_sql() returns the number of rows affected by the operation; can you please check that?
Secondly, you are defining the data types in the table creation, so it could be a problem of casting the data types. I never create the table through SQL, as to_sql() can create it if needed.
Thirdly, please check the table name; there can be issues with Pascal case in some databases.
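One more thing worth checking, offered as a sketch under the assumption that conn_obj is a SQLAlchemy 1.4+/2.0-style Connection: such a connection runs inside a transaction that is rolled back if never committed, which would produce exactly "no result and no error". Committing explicitly, or handing to_sql() the engine instead, avoids that:

# Sketch: check the reported row count and make sure the transaction commits.
rows = my_df.to_sql('SomeSQLTable', conn_obj, if_exists='append', index=False)
print(rows)        # rows affected (pandas >= 1.4; earlier versions return None)
conn_obj.commit()  # a 2.0-style Connection must be committed explicitly

# Alternatively, pass the engine and let pandas manage the connection:
my_df.to_sql('SomeSQLTable', engine, if_exists='append', index=False)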

Bulk insert many rows using sqlalchemy

I want to insert thousands of rows into an Oracle db using Python. For this I am trying the bulk_insert_mappings method of a sqlalchemy session. I am following this tutorial, which shows how to load a csv file into the database fast. I failed because bulk_insert_mappings expects a mapper object as well, which they don't pass.
The code to create the connection and the mapping without a csv:
from sqlalchemy.sql import select, sqltypes
from sqlalchemy.schema import CreateSchema
from sqlalchemy import create_engine, MetaData, Table, inspect, engine, Column, String, DateTime, insert
from sqlalchemy.orm import sessionmaker
from sqlalchemy_utils import get_mapper
import pytz
import datetime
import pandas as pd

engine_url = engine.URL.create(
    drivername='oracle',
    username='ADMIN',
    password='****',
    host='***',
    database='****',
)
oracle_engine = create_engine(engine_url, echo=False)
Session = sessionmaker(bind=oracle_engine)

base = datetime.datetime.today().replace(tzinfo=pytz.utc)
date_list = [base - datetime.timedelta(days=x) for x in range(20)]
df = pd.DataFrame(date_list, columns=['date_time'])
I use the following line of code to create the table if it doesn't exist:
df[:0].to_sql('test_insert_df', oracle_engine, schema='ADMIN', if_exists='replace')
Then I used this line to insert data into the table:
with Session() as session:
    session.bulk_insert_mappings(df.to_dict('records'))
The traceback I receive is the following:
TypeError: bulk_insert_mappings() missing 1 required positional argument: 'mappings'
How can I create the mapper if I don't use the sqlalchemy ORM to create the table? Looking at this question I know how to create the mapper object with a sqlalchemy model, but not otherwise.
Also, I have the option of inserting using the bulk_save_objects method, but that would also need a model, which I don't have.
PS: I am doing this because I want to insert many rows into an Oracle database using sqlalchemy. If you have a better solution it is also welcome.
The simplest way to insert would be to use the dataframe's to_sql method, but if you want to use SQLAlchemy you can use its core features to insert the data. There's no need to use the ORM in your use case. Something like this should work, assuming you are using SQLAlchemy 1.4 and you can arrange your data as a list of dicts:
import sqlalchemy as sa
...
tbl = sa.Table('test_insert_df', sa.MetaData(), autoload_with=engine)
ins = tbl.insert()
with engine.begin() as conn:
    # This will automatically commit
    conn.execute(ins, list_of_dicts)
If you have a lot of data you might want to insert in batches.
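For instance, a minimal batching sketch (batch_size is an illustrative value, not a recommendation from the original answer), reusing ins and list_of_dicts from above:

# Insert in chunks of batch_size rows; engine.begin() commits once
# when the block exits.
batch_size = 1000
with engine.begin() as conn:
    for start in range(0, len(list_of_dicts), batch_size):
        conn.execute(ins, list_of_dicts[start:start + batch_size])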

Insert 2d Array into Table without importing sqlalchemy Table, Column, etc. objects?

I'm writing an app in Python, and part of it includes an API that needs to interact with a MySQL database. Coming from sqlite3 to sqlalchemy, there are parts of the workflow that seem a bit too verbose for my taste, and I wasn't sure if there was a way to simplify the process.
Sqlite3 Workflow
If I wanted to take a list from Python and insert it into a table I would use the approach below
# connect to db
con = sqlite3.connect(":memory:")
cur = con.cursor()

# create table
cur.execute("create table lang (name, first_appeared)")

# prep data to be inserted
lang_list = [
    ("Fortran", 1957),
    ("Python", 1991),
    ("Go", 2009),
]

# add data to table
cur.executemany("insert into lang values (?, ?)", lang_list)
con.commit()
SqlAlchemy Workflow
In sqlalchemy, I would have to import the Table, Column, String, Integer, MetaData, etc. objects and do something like this:
# connect to db
engine = create_engine("mysql+pymysql://....")

# (re)create table, seems like this needs to be
# done every time I want to insert anything into it?
metadata = MetaData()
metadata.reflect(engine, only=['lang'])
table = Table('lang', metadata,
              Column('name', String),
              Column('first_appeared', Integer),
              autoload=True, autoload_with=engine)

# prep data to be inserted
lang_list = [
    {'first_appeared': 1957, 'name': 'Fortran'},
    {'first_appeared': 1991, 'name': 'Python'},
    {'first_appeared': 2009, 'name': 'Go'},
]

# add data to table
engine.execute(table.insert(), lang_list)
Question
Is there a way to add data to a table in sqlalchemy without having to use Metadata, Table and Column objects? Specifically just using the connection, a statement and the list so all that needs to be run is execute?
I want to do as little sql work in Python as possible and this seems too verbose for my taste.
Possible different route
I could use a list comprehension to transform the list into one long INSERT statement so the final query looks like this
statement = """
INSERT INTO lang
VALUES ("Fortran", 1957),
("Python", 1991),
("Go", 2009);"""
con.execute(statement)
but I wasn't sure if sqlalchemy had a simple equivalent to sqlite3's executemany for inserting a list, without having to incorporate all these objects every time in order to do so.
If a list comprehension -> big statement -> execute was the simplest way to go in that regard then that's fine, I am just new to sqlalchemy and had been using sqlite3 up until this point.
For clarification, in my actual code the connection is already using the appropriate database and the tables themselves exist - the code snippets used above have nothing to do with the actual data/tables I'm working with and are just for reproducibility/testing sake. It's the workflow for adding to them that felt verbose when I had to reconstruct the tables with imported objects just to add to them.
I didn't know SQLite allowed weakly typed columns, as you demonstrated in your example. As far as I know, most other databases, such as MySQL and PostgreSQL, require strongly typed columns. Usually the table metadata is either reflected or pre-defined and then used, sort of like type definitions in a statically typed language. SQLAlchemy uses these types to determine how to properly format the SQL statements, i.e. wrapping strings with quotes and NOT wrapping integers with quotes.
In your mysql example you should be able to use the table straight off the metadata with metadata.tables["lang"]; the docs call this reflecting-all-tables-at-once. This assumes the table is already defined in the mysql database. You only need to define the table columns if you need to override the reflected table's definition, as they do in the overriding-reflected-columns docs.
The docs state that this should utilize executemany and should work if you reflected the table from a database that already had it defined:
engine = create_engine("mysql+pymysql://....")
metadata = MetaData()

# Pull in table definitions from database, only lang table.
metadata.reflect(engine, only=['lang'])

# prep data to be inserted
lang_list = [
    {'first_appeared': 1957, 'name': 'Fortran'},
    {'first_appeared': 1991, 'name': 'Python'},
    {'first_appeared': 2009, 'name': 'Go'},
]

# add data to table
engine.execute(metadata.tables["lang"].insert(), lang_list)
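As an aside not in the original answer: Engine.execute() is the legacy 1.x calling style and was removed in SQLAlchemy 2.0, so a forward-compatible version of the last line would run the insert on an explicit connection:

# SQLAlchemy 1.4+/2.0 style: engine.begin() opens a connection and
# commits automatically when the block exits.
with engine.begin() as conn:
    conn.execute(metadata.tables["lang"].insert(), lang_list)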

How to convert select_from object into a new table in sqlalchemy

I have a database that contains two tables, cdr and mtr. I want a join of the two based on the columns ego_id and alter_id, and I want to output this into another table in the same database, complete with the column names, without the use of pandas.
Here's my current code:
mtr_table = Table('mtr', MetaData(), autoload=True, autoload_with=engine)
print(mtr_table.columns.keys())
cdr_table = Table('cdr', MetaData(), autoload=True, autoload_with=engine)
print(cdr_table.columns.keys())

query = db.select([cdr_table])
query = query.select_from(
    mtr_table.join(cdr_table,
                   (mtr_table.columns.ego_id == cdr_table.columns.ego_id) &
                   (mtr_table.columns.alter_id == cdr_table.columns.alter_id)))

results = connection.execute(query).fetchmany()
Currently, for my test code, what I do is convert the results to a pandas dataframe and then put it back in the original SQL database:
df = pd.DataFrame(results, columns=results[0].keys())
df.to_sql(...)
but I have two problems:
loading everything into a pandas dataframe would require too much memory when I start working with the full database
the column names are (apparently) not included in results and need to be accessed via results[0].keys()
I've checked this other stackoverflow question but it uses the ORM framework of sqlalchemy, which I unfortunately don't understand. If there's a simpler way to do this (like pandas' to_sql), I think this would be easier.
What's the easiest way to go about this?
So I found out how to do this via CREATE TABLE AS:
query = """
CREATE TABLE mtr_cdr AS
SELECT
mtr.idx,cdr.*
FROM mtr INNER JOIN cdr
ON (mtr.ego_id = cdr.ego_id AND mtr.alter_id = cdr.alter_id)""".format(new_table)
with engine.connect() as conn:
conn.execute(query)
The query string seems to be highly sensitive to parentheses though. If I put a parentheses enclosing the whole SELECT...FROM... statement, it doesn't work.
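As a side note not from the original answer: SQLAlchemy 1.4+/2.0 no longer accepts plain strings in Connection.execute(), so a version that stays compatible would wrap the statement in text() and run it in a transaction block:

from sqlalchemy import text

# text() wraps the raw SQL; engine.begin() commits when the block exits.
with engine.begin() as conn:
    conn.execute(text("""
        CREATE TABLE mtr_cdr AS
        SELECT mtr.idx, cdr.*
        FROM mtr INNER JOIN cdr
        ON (mtr.ego_id = cdr.ego_id AND mtr.alter_id = cdr.alter_id)"""))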

Return last inserted ID sqlalchemy sqlite

I'm trying to get the last-inserted row id from a sqlalchemy insert with sqlite. It appears this should be easy, but I haven't managed to figure it out yet and didn't find anything in my searches so far. I'm new to Python and I know there are some similar posts, so I hope this is not a repeat. Below is a simple sample script:
from sqlalchemy import *

engine = create_engine('sqlite:///myTest.db', echo=False)
metadata = MetaData()
dbTable1 = Table('dbTable1', metadata,
                 Column('ThisNum', Integer),
                 Column('ThisString', String))
metadata.create_all(engine)
dbConn = engine.connect()

insertList = {}
insertList['ThisNum'] = 1
insertList['ThisString'] = 'test'
insertStatement = dbTable1.insert().values(insertList)
lastInsertID = dbConn.execute(insertStatement).inserted_primary_key
The value returned is empty. I get the same result using
lastInsertID = dbConn.execute(insertStatement).last_inserted_ids()
I can get the last rowid using a separate statement after the insert:
lastInsertID = dbConn.execute('SELECT last_insert_rowid();')
But this would not guarantee that the database had not been accessed between the executions, so the returned ID might not be correct. Lastly, I tried executing the insert and select statements in one execution, for instance:
lastInsertID = dbConn.execute('INSERT INTO "dbTable1" ("ThisNum", "ThisString") VALUES (1, "test"); SELECT last_insert_rowid();')
But this gives the error: sqlite3.Warning: You can only execute one statement at a time.
Any help would be appreciated. Thanks.
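A likely culprit, offered as a sketch rather than a confirmed answer: inserted_primary_key only reports values for columns that belong to the table's primary key, and dbTable1 above defines no primary key at all. Declaring one makes the call return the new id (an in-memory database is used here to keep the sketch self-contained):

# Sketch: with an explicit primary-key column, inserted_primary_key
# returns the autogenerated id for the row just inserted.
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String

engine = create_engine('sqlite:///:memory:', echo=False)
metadata = MetaData()
dbTable1 = Table('dbTable1', metadata,
                 Column('id', Integer, primary_key=True),
                 Column('ThisNum', Integer),
                 Column('ThisString', String))
metadata.create_all(engine)

dbConn = engine.connect()
result = dbConn.execute(dbTable1.insert().values(ThisNum=1, ThisString='test'))
print(result.inserted_primary_key)  # e.g. (1,)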
