etl-process: from python-dataframe to postgres with SQLAlchemy

etl-process: from python-dataframe to postgres with SQLAlchemy - python

I want to create tables in a Postgres database using Python's SQLAlchemy package and insert data from a dataframe into them. I also want to assign foreign keys and primary keys.
The following code creates the two tables, but in the schema "public" instead of the schema "my_schema".
Can anyone find the error?
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship
# Basisklasse für alle Tabellen definieren
metadata_obj = MetaData(schema="mein_schema")
Base = declarative_base(metadata_obj)
# Tabelle 2 definieren
class Tabelle2(Base):
__tablename__ = 'tabelle2'
id = Column(Integer, primary_key=True)
name = Column(String)
# Tabelle 1 definieren
class Tabelle1(Base):
__tablename__ = 'tabelle1'
id = Column(Integer, primary_key=True)
name = Column(String)
tabelle2_id = Column(Integer, ForeignKey('tabelle2.id'))
tabelle2 = relationship(Tabelle2)
# Alle Tabellen in der Datenbank erstellen
Base.metadata.create_all(engine)

You are passing metadata_obj as a parameter to declarative_base but not specifying it belongs to the metadata named parameter:
Base = declarative_base(metadata=metadata_obj)
declarative_base never knows the use the specified schema and falls back to the default schema of the connection.

Related

Efficiently copy data between databases with sqlalchemy

I'm trying to mirror a postgresql + PostGIS database that I defined with sqlalchemy to a sqlite (spatialite) file database. The session.merge() method appears to work for adding the instances queried from the first session to the other session, but it does not scale for nearly a million rows. See the example below that copies data from an in-memory sqlite database to another memory database for the sake of easy reproducibility. I'm looking for an approach (potentially completely different from what I'm doing now) to efficiently move all the data from one database to another.
from sqlalchemy import create_engine
from sqlalchemy import Table, Column, Integer, ForeignKey, String
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.orm import relationship, joinedload
engine_0 = create_engine('sqlite:///:memory:', echo=True)
engine_1 = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
Session0 = sessionmaker(bind=engine_0)
Session1 = sessionmaker(bind=engine_1)
# Define ORM models
association_table = Table('association', Base.metadata,
Column('parent_id', ForeignKey('parent.id'), primary_key=True),
Column('child_id', ForeignKey('child.id'), primary_key=True)
)
class Parent(Base):
__tablename__ = 'parent'
id = Column(Integer, primary_key=True)
name = Column(String)
children = relationship(
"Child",
secondary=association_table,
back_populates="parents")
class Child(Base):
__tablename__ = 'child'
id = Column(Integer, primary_key=True)
name = Column(String)
parents = relationship(
"Parent",
secondary=association_table,
back_populates="children")
# Create schema
Base.metadata.create_all(engine_0)
Base.metadata.create_all(engine_1)
# Create some example instances
# Children
bart = Child(name='Bart')
lisa = Child(name='Lisa')
maggie = Child(name='Maggie')
milhouse = Child(name='Milhouse')
# Parents
homer = Parent(name='Homer',
children=[bart, lisa, maggie])
marge = Parent(name='Marge',
children=[bart, lisa, maggie])
flanders = Parent(name='Ned')
kirk = Parent(name='Kirk', children=[milhouse])
# Insert data into first database
session_0 = Session0()
session_0.add_all([homer, marge, flanders, kirk])
session_0.commit()
# Query the data and insert it into the second database
all_obj = session_0.query(Parent).options(joinedload('*')).all()
session_0.expunge_all()
session_1 = Session1()
for obj in all_obj:
session_1.merge(obj)
session_1.commit()
# MAke sure that 4 instance of child are present in the second database
print(session_1.query(Child).all())
One alternative approach I have tried (unsuccessfully) is to make the parent objects transient using the sqlalchemy.orm.make_transient() function and use session.add_all() instead of session.merge() to insert the objects into the second session. However, this does not propagate to the relationships and only Parent objects are made transient.

null value for primary key when trying to insert list of dictionaries

I'm trying to do a large number of inserts with one call, and the way someone here recommended was by giving .insert a list of dictionaries. This is using SQLAlchemy Core.
As an example:
try:
engine = db.create_engine(f"postgres://user:pass#myip/addressbook", connect_args={'connect_timeout': 5})
connection = engine.connect()
metadata = db.MetaData()
except exc.OperationalError:
print_error(f":: Could not connect to myip!")
sys.exit()
table_addressbook = db.Table('addressbook', metadata, autoload=True, autoload_with=engine)
list = []
list.append({'firstname': "John", 'lastname': "Doe"})
list.append({'firstname': "Jane", 'lastname': "Doe"})
query = db.insert(table_addressbook).values(list)
connection.execute(query)
But I'm getting an error saying the column id violates a non-null constraint. This is because insert normally auto-generates the primary-key id. How do I use this method but specify that id should be auto-generated? Or is there a different method I should use?
edit
Table name is addressbook.
Column id is type integer with default sequence 'untitled_table_id_seq', constraints are PRIMARY_KEY. This was autogenerated by Postico for Mac, but I've always been able to insert without including id and it auto increments from the last inserted ID.
Columns firstname and lastname are type text, no default, no constraints.

Without any information on your model and/or connection it is a bit difficult to answer your question. Please find below a piece of code which uses insert without throwing non-null constraint errors. Hopefully it helps you.
from sqlalchemy import create_engine, Column, Integer, String, Table
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import insert
engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
id = Column(Integer, primary_key=True)
firstname = Column(String)
lastname = Column(String)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
Session.configure(bind=engine) # once engine is available
session = Session()
new_users = []
new_users.append({'firstname': "John", 'lastname': "Doe"})
new_users.append({'firstname': "Jane", 'lastname': "Doe"})
i = insert(User).values(new_users)
session.execute(i)
PS: most of this is coming from the tutorial on: https://docs.sqlalchemy.org/en/13/orm/tutorial.html

from sqlalchemy import Column
from sqlalchemy import create_engine
from sqlalchemy import Integer
from sqlalchemy import String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session
engine = create_engine('sqlite:///:memory:', echo=True)
Base = declarative_base()
# Example Model definition for the illustration
class Customer(Base):
__tablename__ = "customer"
id = Column(Integer, primary_key=True)
name = Column(String(255))
description = Column(String(255))
Base.metadata.create_all(engine)
######################################################
# Bulk insert using dictionaries.
######################################################
# Insert test records into `customer`table.
def bulk_insert_customers(n):
session = Session(bind=engine)
session.bulk_insert_mappings(
Customer,
[
dict(
name="customer name %d" % i,
description="customer description %d" % i,
)
for i in range(n)
],
)
session.commit()
Refer these for more examples of how to do bulk inserts in different ways:
https://docs.sqlalchemy.org/en/13/_modules/examples/performance/bulk_inserts.html

How to define a function(expression dependant on other columns) as a default value to a column while defining table in SQLAlchemy?

How to add a function/expression which takes arguments as other columns as a default value to a column in the table of SQLAlchemy? For example: I want to define c as a column which is 2*x(other column);which should be saved in the database(could be in other table too). Can #hybrid_property decorator be used in this context?
from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import Session, aliased
from sqlalchemy.ext.hybrid import hybrid_property, hybrid_method
from sqlalchemy import create_engine
from sqlalchemy import MetaData
from sqlalchemy.orm import sessionmaker
engine = create_engine('sqlite:///Helloworld.db', echo=False)
Session = sessionmaker(bind=engine)
metadata = MetaData(engine)
Base = declarative_base()
class HelloWorld(Base):
__tablename__ = 'helloworld'
pm_key = Column(Integer, primary_key=True)
x = Column(Integer, nullable=False)
c = Column(Integer,default=2*x)
Base.metadata.create_all(engine)

It is possible. Below I'am just adding a piece of code you can try . For more I think this will help you.
def mydefault(context):
return context.current_parameters.get('X')
class HelloWorld(Base):
__tablename__ = 'helloworld'
pm_key = Column(Integer, primary_key=True)
x = Column(Integer, nullable=False)
c = Column(Integer,default=mydefault)

SQLAlchemy many-to-many without foreign key

Could some one help me figure out how should i write primaryjoin/secondaryjoin
on secondary table that lacking one ForeignKey definition. I can't modify database
itself since it's used by different application.
from sqlalchemy import schema, types, func, orm
from sqlalchemy.dialects.postgresql import ARRAY
from sqlalchemy.ext.declarative import declarative_base
Base = declarative_base()
class A(Base):
__tablename__ = 'atab'
id = schema.Column(types.SmallInteger, primary_key=True)
class B(Base):
__tablename__ = 'btab'
id = schema.Column(types.SmallInteger, primary_key=True)
a = orm.relationship(
'A', secondary='abtab', backref=orm.backref('b')
)
class AB(Base):
__tablename__ = 'abtab'
id = schema.Column(types.SmallInteger, primary_key=True)
a_id = schema.Column(types.SmallInteger, schema.ForeignKey('atab.id'))
b_id = schema.Column(types.SmallInteger)
I've tried specifing foreign on join condition:
a = orm.relationship(
'A', secondary='abtab', backref=orm.backref('b'),
primaryjoin=(id==orm.foreign(AB.b_id))
)
But received following error:
ArgumentError: Could not locate any simple equality expressions involving locally mapped foreign key columns for primary join condition '"atab".id = "abtab"."a_id"' on relationship Category.projects. Ensure that referencing columns are associated with a ForeignKey or ForeignKeyConstraint, or are annotated in the join condition with the foreign() annotation. To allow comparison operators other than '==', the relationship can be marked as viewonly=True.

You can add foreign_keys to your relationship configuration. They mention this in a mailing list post:
from sqlalchemy import create_engine
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import relationship
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
Base = declarative_base()
class User(Base):
__tablename__ = 'users'
logon = Column(String(10), primary_key=True)
group_id = Column(Integer)
class Group(Base):
__tablename__ = 'groups'
group_id = Column(Integer, primary_key=True)
users = relationship('User', backref='group',
primaryjoin='User.group_id==Group.group_id',
foreign_keys='User.group_id')
engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
u1 = User(logon='foo')
u2 = User(logon='bar')
g = Group()
g.users = [u1, u2]
session.add(g)
session.commit()
g = session.query(Group).first()
print([user.logon for user in g.users])
output:
['foo', 'bar']

sqlalchemy and auto increments with postgresql

I created a table with a primary key and a sequence but via the debug ad later looking at the table design, the sequence isn't applied, just created.
from sqlalchemy import create_engine, MetaData, Table, Column,Integer,String,Boolean,Sequence
from sqlalchemy.orm import mapper, sessionmaker
from sqlalchemy.ext.declarative import declarative_base
import json
class Bookmarks(object):
pass
#----------------------------------------------------------------------
engine = create_engine('postgresql://iser:p#host/sconf', echo=True)
Base = declarative_base()
class Tramo(Base):
__tablename__ = 'tramos'
__mapper_args__ = {'column_prefix':'tramos'}
id = Column(Integer, Sequence('seq_tramos_id', start=1, increment=1),primary_key=True)
nombre = Column(String)
tramo_data = Column(String)
estado = Column(Boolean,default=True)
def __init__(self,nombre,tramo_data):
self.nombre=nombre
self.tramo_data=tramo_data
def __repr__(self):
return '[id:%d][nombre:%s][tramo:%s]' % self.id, self.nombre,self.tramo_data
Session = sessionmaker(bind=engine)
session = Session()
tabla = Tramo.__table__
metadata = Base.metadata
metadata.create_all(engine)
the table is just created like this
CREATE TABLE tramos (
id INTEGER NOT NULL,
nombre VARCHAR,
tramo_data VARCHAR,
estado BOOLEAN,
PRIMARY KEY (id)
)
I was hoping to see the declartion of the default nexval of the sequence
but it isn't there.
I also used the __mapper_args__ but looks like it's been ignored.
Am I missing something?

I realize this is an old thread, but I stumbled on it with the same problem and were unable to find a solution anywhere else.
After some experimenting I was able to solve this with the following code:
TABLE_ID = Sequence('table_id_seq', start=1000)
class Table(Base):
__tablename__ = 'table'
id = Column(Integer, TABLE_ID, primary_key=True, server_default=TABLE_ID.next_value())
This way the sequence is created and is used as the default value for column id, with the same behavior as if created implicitly by SQLAlchemy.

I ran into a similar issue with composite multi-column primary keys. The SERIAL is only implicitly applied to a single column primary key. However, this behaviour can be controlled via the autoincrement argument (defaults to "auto"):
id = Column(Integer, primary_key=True, autoincrement=True)

You specified an explicit Sequence() object with name. If you were to omit that, then SERIAL would be added to the id primary key specification:
CREATE TABLE tramos (
id INTEGER SERIAL NOT NULL,
nombre VARCHAR,
tramo_data VARCHAR,
estado BOOLEAN,
PRIMARY KEY (id)
)
A DEFAULT is only generated if the column is not a primary key.
When inserting, SQLAlchemy will issue a select nextval(..) as needed to create a next value. See the PostgreSQL documentation for details.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

etl-process: from python-dataframe to postgres with SQLAlchemy - python

You are passing metadata_obj as a parameter to declarative_base but not specifying it belongs to the metadata named parameter: Base = declarative_base(metadata=metadata_obj) declarative_base never knows the use the specified schema and falls back to the default schema of the connection.

Related

Efficiently copy data between databases with sqlalchemy

null value for primary key when trying to insert list of dictionaries

How to define a function(expression dependant on other columns) as a default value to a column while defining table in SQLAlchemy?

SQLAlchemy many-to-many without foreign key

sqlalchemy and auto increments with postgresql

Categories

Resources