Bulk insert using sqlalchemy Engine

Bulk insert using sqlalchemy Engine - python

Is there a way to bulk-insert/update values into a Microsoft SQLserver Database using Engine?
I have read several (very) old posts regarding this, and it seems not very easy to do (back then).
E.g in some examples we need to create a class, add those classes to a session and at last commit the session.
Isn't there a way like (pseudo) this:
from sqlalchemy import String, Integer, Float
values= [(1,"hello",2.5),(2,"world",10.5)] #values to insert
table = "my_schema.my_table" #Table name
col = ["id","statement","ratio"] #Name of the columns in the database
type = [Integer,String,Float] #Type of each value
engine = sqlalchemy.create_engine(connection_string)
with engine.session():
try:
engine.bulk_insert(table,values,col,type)
except:
engine.rollback()
or something else, instead of looping over engine.execute("INSERT INTO ...")?
I know I can use pandas.DataFrame.to_sql but since I want to be able to roll-back in case of errors etc. I won't use that

Related

Optional column in sqlAlchmey table

I need to import some sqlite databases to a postgresql data base unfortunately in some of the sqlite databases are some columns missing.
I'd like to use the same model for all the databases and if one column is missing, it should just initialize the variable corresponding to it to None.
I'd imagine the syntax would be like this:
import sqlalchemy as sql
import sqlalchemy.orm
TableBase = sql.orm.declarative_base()
class ExampleTbl(TableBase):
__tablename__ = "example"
# ... other columns
optional = sql.Column(sql.Text, if_not_exsits=None) # optional should be None if the column does not exists
# ... other columns
It probably isn't as easy as setting a flag but how could I achieve this?
EDIT:
I can actually find out if the column is in the table before querying it. So if sqlalchemy doesn't support it out of the box I could probably write a function that generates the required table that would match the one in the database but I would prefer something easier.
EDIT 2:
I'm going to do the approach that I suggested in the first edit, I've run in some problems so I'm going make a new question.

Solved: Adding new Column to ORM SQLAlchemy table in a volatile setting

I am working on a open source persistance layer for a MQTT-Broker https://github.com/volkerjaenisch/amqtt_db
Incoming MQTT messages are irregular blobs of data so usually the DB-Backend is some kind of object storage.
I do it the hard way and deserialize the blobs into typed data colums and store them into a fast relational database. My finally target will be timescaleDB but first I go via SQLAlchemy to access a wide bunch of DBs with one API.
MQTT messages are volatile (think not always complete) so the DB scheme has to adjust dynamically e.g. adding new columns for new information.
First Message:
Time: 1234
Temperature : 23.4
Second Message:
Time: 1245
Temperature : 23.6
Rel Hum : 87 %
I have used SQLalchemy ORM for more than a decade but always for quite static databases. So I am new to work dynmically.
Utilizing the ORM to build DB tables dynamically from the structure of incoming MQTT-Messages was quite doable and worked out perfect.
But currently I am stuck with the case of additional information in the MQTT-Packages that extends the tables with new columns.
What I did so far:
Utilizing sqlalchemy-migration it was quite easy to dynamically add new columns to the existing table in the DB. In the code "topic_cls" is the declarative class and "column_def" a col_name - type mapping.
from migrate.versioning.schema import Table as MiTable, Column as MiColumn
def add_new_colums(self, topic_cls, column_def):
table_name = str(topic_cls.__table__.name)
table = MiTable(table_name, self.metadata)
for col_name, col_type in column_def.items():
col = MiColumn(col_name, col_type)
col.create(table)
Works like a charm. But how to get this changes to the DB reflected back into declarative classes? I tried to get a new inspection of the table:
new_table = Table(topic_cls.__table__.name, self.metadata, autoload_with=self.engine)
This also works but it gives me a new table but not a declarative base.
So my stupid questions are:
Is this the right way to achive my goal?
How can I get a declarative class by inspecting an already existing table in a DB?
"Drop the ORM and use SQL" is not the answer I am looking for.
Cheers,
Volker

Found a solution but it is a bit of a hack.
new_table = Table("test/topic_growth", Base.metadata, autoload_with=self.engine)
Base.metadata.remove(topic_cls.__table__)
new_dcl = type(str(table_name), (Base,), {'__table__': new_table})
Base.metadata._add_table(table_name, None, new_table)
After you obtained the new table via inspection, remove the old table entry from the metadata.
Then generate a new declarative base with the new table and same table name.
At last add the new table to the metadata.

Too many server roundtrips w/ psycopg2

I am making a script, that should create a schema for each customer. I’m fetching all metadata from a database that defines how each customer’s schema should look like, and then create it. Everything is well defined, the types, names of tables, etc. A customer has many tables (fx, address, customers, contact, item, etc), and each table has the same metadata.
My procedure now:
get everything I need from the metadataDatabase.
In a for loop, create a table, and then Alter Table and add each metadata (This is done for each table).
Right now my script runs in about a minute for each customer, which I think is too slow. It has something to do with me having a loop, and in that loop, I’m altering each table.
I think that instead of me altering (which might be not so clever approach), I should do something like the following:
Note that this is just a stupid but valid example:
for table in tables:
con.execute("CREATE TABLE IF NOT EXISTS tester.%s (%s, %s);", (table, "last_seen date", "valid_from timestamp"))
But it gives me this error (it seems like it reads the table name as a string in a string..):
psycopg2.errors.SyntaxError: syntax error at or near "'billing'"
LINE 1: CREATE TABLE IF NOT EXISTS tester.'billing' ('last_seen da...

Consider creating tables with a serial type (i.e., autonumber) ID field and then use alter table for all other fields by using a combination of sql.Identifier for identifiers (schema names, table names, column names, function names, etc.) and regular format for data types which are not literals in SQL statement.
from psycopg2 import sql
# CREATE TABLE
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (ID serial)"""
cur.execute(sql.SQL(query).format(shm = sql.Identifier("tester"),
tbl = sql.Identifier("table")))
# ALTER TABLE
items = [("last_seen", "date"), ("valid_from", "timestamp")]
query = """ALTER TABLE {shm}.{tbl} ADD COLUMN {col} {typ}"""
for item in items:
# KEEP IDENTIFIER PLACEHOLDERS
final_query = query.format(shm="{shm}", tbl="{tbl}", col="{col}", typ=i[1])
cur.execute(sql.SQL(final_query).format(shm = sql.Identifier("tester"),
tbl = sql.Identifier("table"),
col = sql.Identifier(item[0]))
Alternatively, use str.join with list comprehension for one CREATE TABLE:
query = """CREATE TABLE IF NOT EXISTS {shm}.{tbl} (
"id" serial,
{vals}
)"""
items = [("last_seen", "date"), ("valid_from", "timestamp")]
val = ",\n ".join(["{{}} {typ}".format(typ=i[1]) for i in items])
# KEEP IDENTIFIER PLACEHOLDERS
pre_query = query.format(shm="{shm}", tbl="{tbl}", vals=val)
final_query = sql.SQL(pre_query).format(*[sql.Identifier(i[0]) for i in items],
shm = sql.Identifier("tester"),
tbl = sql.Identifier("table"))
cur.execute(final_query)
SQL (sent to database)
CREATE TABLE IF NOT EXISTS "tester"."table" (
"id" serial,
"last_seen" date,
"valid_from" timestamp
)

However, this becomes heavy as there are too many server roundtrips.
How many tables with how many columns are you creating that this is slow? Could you ssh to a machine closer to your server and run the python there?
I don't get that error. Rather, I get an SQL syntax error. A values list is for conveying data. But ALTER TABLE is not about data, it is about metadata. You can't use a values list there. You need the names of the columns and types in double quotes (or no quotes) rather than single quotes. And you can't have a comma between name and type. And you can't have parentheses around each pair. And each pair needs to be introduced with "ADD", you can't have it just once. You are using the wrong tool for the job. execute_batch is almost the right tool, except it will use single quotes rather than double quotes around the identifiers. Perhaps you could add a flag to it tell it to use quote_ident.
Not only is execute_values the wrong tool for the job, but I think python in general might be as well. Why not just load from a .sql file?

How do I handle database columns with reserved characters in SQLAlchemy ORM?

I'm somewhat new to SQLAlchemy ORM, and I'm trying to select and then store data from a column within a view that has a forward slash in the name of the column.
The databases are mapped using the following:
source_engine = create_engine("...")
base = automap_base()
base.prepare(source_engine, reflect=True)
metadata = MetaData(self.engine)
table_1 = Table("table_1", self.metadata, autoload=True)
The second destination table is mapped the same way.
Then, I connect to this database, and I'm trying to select information from columns to copy into a different database:
source_table_session = Session(source_engine)
dest_table_session = Session(dest_engine)
table_1_data = table_1_session.query(table_1)
for instance in table_1_data:
newrow = dest_table.base.classes.dest_table()
newrow.Column1 = instance.Column1 # This works fine, column has normal name
But then, the problem is that there's a column in the view with the name "Slot/Port"
With a direct query, you can do:
select "Slot/Port" from source_database;
But in ORM, you can't just type:
newrow.Slot/Port = instance.Slot/Port
or
newrow.'Slot/Port' = instance.'Slot/Port'
That isn't going to be correct, and the following doesn't work either:
newrow.SlotPort = instance.SlotPort
AttributeError: 'result' object has no attribute 'SlotPort'
I have no control over how columns are named in the source database.
I find the SQLAlchemy documentation to be generally fragmented (only showing small snippets of code) and confusing, so I'm not sure if this is kind of thing is addressed or not. Is there a way to get around this limitation, or perhaps if the columns are already mapped to a valid name without a slash or a way to do so?

Thanks to #DeepSpace for helping me find the answer.
Instead of
newrow.whatever = instance.whatever
I needed:
setattr(newrow, 'Slot/Port', getattr(instance, 'Slot/Port'))

Proper use of MySQL full text search with SQLAlchemy

I would like to be able to full text search across several text fields of one of my SQLAlchemy mapped objects. I would also like my mapped object to support foreign keys and transactions.
I plan to use MySQL to run the full text search. However, I understand that MySQL can only run full text search on a MyISAM table, which does not support transactions and foreign keys.
In order to accomplish my objective I plan to create two tables. My code will look something like this:
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
name = Column(String(50))
description = Column(Text)
users_myisam = Table('users_myisam', Base.metadata,
Column('id', Integer),
Column('name', String(50)),
Column('description', Text),
mysql_engine='MyISAM')
conn = Base.metadata.bind.connect()
conn.execute("CREATE FULLTEXT INDEX idx_users_ftxt \
on users_myisam (name, description)")
Then, to search I will run this:
q = 'monkey'
ft_search = users_myisam.select("MATCH (name,description) AGAINST ('%s')" % q)
result = ft_search.execute()
for row in result: print row
This seems to work, but I have a few questions:
Is my approach of creating two tables to solve my problem reasonable? Is there a standard/better/cleaner way to do this?
Is there a SQLAlchemy way to create the fulltext index, or am I best to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
Looks like I have a SQL injection problem in my search/match against query. How can I do the select the "SQLAlchemy way" to fix this?
Is there a clean way to join the users_myisam select/match against right back to my user table and return actual User instances, since this is what I really want?
In order to keep my users_myisam table in sync with my mapped object user table, does it make sense for me to use a MapperExtension on my User class, and set the before_insert, before_update, and before_delete methods to update the users_myisam table appropriately, or is there some better way to accomplish this?
Thanks,
Michael

Is my approach of creating two tables to solve my problem reasonable?
Is there a standard/better/cleaner way to do this?
I've not seen this use case attempted before, as developers who value transactions and constraints tend to use Postgresql in the first place. I understand that may not be possible in your specific scenario.
Is there a SQLAlchemy way to create the fulltext index, or am I best
to just directly execute "CREATE FULLTEXT INDEX ..." as I did above?
conn.execute() is fine though if you want something slightly more integrated you can use the DDL() construct, read through http://docs.sqlalchemy.org/en/rel_0_8/core/schema.html?highlight=ddl#customizing-ddl for details
Looks like I have a SQL injection problem in my search/match against query. How can I do the
select the "SQLAlchemy way" to fix this?
note: this recipe is only for MATCH against multiple columns simultaneously - if you have just one column, use the match() operator more simply.
most basically you could use the text() construct:
from sqlalchemy import text, bindparam
users_myisam.select(
text("MATCH (name,description) AGAINST (:value)",
bindparams=[bindparam('value', q)])
)
more comprehensively you could define a custom construct:
from sqlalchemy.ext.compiler import compiles
from sqlalchemy.sql.expression import ClauseElement
from sqlalchemy import literal
class Match(ClauseElement):
def __init__(self, columns, value):
self.columns = columns
self.value = literal(value)
#compiles(Match)
def _match(element, compiler, **kw):
return "MATCH (%s) AGAINST (%s)" % (
", ".join(compiler.process(c, **kw) for c in element.columns),
compiler.process(element.value)
)
my_table.select(Match([my_table.c.a, my_table.c.b], "some value"))
docs:
http://docs.sqlalchemy.org/en/rel_0_8/core/compiler.html
Is there a clean way to join the users_myisam select/match against right back
to my user table and return actual User instances, since this is what I really want?
you should probably create a UserMyISAM class, map it just like User, then use relationship() to link the two classes together, then simple operations like this are possible:
query(User).join(User.search_table).\
filter(Match([UserSearch.x, UserSearch.y], "some value"))
In order to keep my users_myisam table in sync with my mapped object
user table, does it make sense for me to use a MapperExtension on my
User class, and set the before_insert, before_update, and
before_delete methods to update the users_myisam table appropriately,
or is there some better way to accomplish this?
MapperExtensions are deprecated, so you'd at least use the event API, and in most cases we want to try applying object mutations outside of the flush process. In this case, I'd be using the constructor for User, or alternatively the init event, as well as a basic #validates decorator which will receive values for the target attributes on User and copy those values into User.search_table.
Overall, if you've been learning SQLAlchemy from another source (like the Oreilly book), its really out of date by many years, and I'd be focusing on the current online documentation.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Bulk insert using sqlalchemy Engine - python

Related

Optional column in sqlAlchmey table

Solved: Adding new Column to ORM SQLAlchemy table in a volatile setting

Too many server roundtrips w/ psycopg2

How do I handle database columns with reserved characters in SQLAlchemy ORM?

Proper use of MySQL full text search with SQLAlchemy

Categories

Resources