I'm trying to create a table from dictionary values in SQLAlchemy. I'm using Flask, and currently my class looks like this:
class Machine(db.Model):
"""Template for the Machine Info table"""
__tablename__ = 'machine'
id = db.Column(db.Integer, primary_key=True)
machine_name = db.Column(db.String(32))
date = db.Column(db.String(32))
time = db.Column(db.String(32))
sensor1 = db.Column(db.String(32))
sensor2 = db.Column(db.String(32))
This works fine, but my issue is that I will eventually have many columns in this table, possibly 100+. I would rather not fill up my models.py file with 100 lines of this kind of stuff. I wanted to keep the column names in their own dictionary in a separate file; the dictionary looks like this:
SENSOR_LOOKUP_DICT = {
"machine_name":"machine_name",
"date":"date",
"time":"time",
"sensor1":"sensor1",
"sensor2":"sensor2"
}
A list would probably work here too.
I was thinking I could use some kind of loop, like this:
class Machine(db.Model):
"""Template for the Machine Info table"""
__tablename__ = 'machine'
id = db.Column(db.Integer, primary_key=True)
for sensor in SENSOR_LOOKUP_DICT:
sensor = db.Column(db.String(32))
But this just gives me a single column called sensor. I found a couple of somewhat relevant SQLAlchemy questions, but they didn't use this structure for creating tables. If possible, I would much prefer a method that keeps the db.Model structure, rather than one that uses create_engine, due to some JSON serialization later which is easier with this structure (as well as some app structure stuff). Is there any way to do this?
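For reference, the closest pattern I found builds the class dynamically with type() instead of a class statement, roughly like this (a sketch adapted to my dictionary; I haven't verified it):
# Build the attribute namespace first, then create the model class with type(),
# so each dictionary key becomes its own column (sketch only).
attrs = {
    '__tablename__': 'machine',
    'id': db.Column(db.Integer, primary_key=True),
}
attrs.update({name: db.Column(db.String(32)) for name in SENSOR_LOOKUP_DICT})
Machine = type('Machine', (db.Model,), attrs)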
Instead of cramming all the sensor values into a single row with a hundred or more columns, you could split your design into machine and sensor tables:
from datetime import datetime
from sqlalchemy.orm.collections import attribute_mapped_collection
from sqlalchemy.ext.associationproxy import association_proxy
class Machine(db.Model):
"""The Machine Info table"""
__tablename__ = 'machine'
id = db.Column(db.Integer, primary_key=True)
machine_name = db.Column(db.String(32))
datetime = db.Column(db.DateTime, default=datetime.utcnow)
sensors = db.relationship(
'Sensor',
collection_class=attribute_mapped_collection('name'),
cascade='all, delete-orphan')
sensor_values = association_proxy(
'sensors', 'value',
creator=lambda k, v: Sensor(name=k, value=v))
class Sensor(db.Model):
"""The Sensor table"""
__tablename__ = 'sensor'
machine_id = db.Column(db.Integer, db.ForeignKey('machine.id'),
primary_key=True)
# Note that this could be a numeric ID as well
name = db.Column(db.String(16), primary_key=True)
value = db.Column(db.String(32))
The dictionary collection relationship, combined with the association proxy, allows you to handle the sensor values like so:
In [10]: m = Machine(machine_name='Steam Machine')
In [11]: m.sensor_values['sensor1'] = 'some interesting value'
In [12]: db.session.add(m)
In [13]: db.session.commit()
In [14]: m.sensor_values
Out[14]: {'sensor1': 'some interesting value'}
In [16]: m.sensor_values['sensor1']
Out[16]: 'some interesting value'
An added benefit of having separate tables instead of a fixed schema is that if you add sensors later on, you don't need to migrate your schema to accommodate them – in other words, there is no need to alter the table to add columns. Just add the new sensor values to the sensor table as before.
Finally, some RDBMSs support document types, such as Postgresql's hstore, json, and jsonb columns, which you could use here, since the sensor table is essentially a key/value store.
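For example, a minimal sketch of the jsonb route might look like this (assuming the same db object as above; the table and column names are made up for illustration):
from sqlalchemy.dialects.postgresql import JSONB

class MachineJson(db.Model):
    """Machine info with sensor readings kept in a single JSONB column"""
    __tablename__ = 'machine_json'
    id = db.Column(db.Integer, primary_key=True)
    machine_name = db.Column(db.String(32))
    sensor_values = db.Column(JSONB, nullable=False, default=dict)

# Usage:
# m = MachineJson(machine_name='Steam Machine',
#                 sensor_values={'sensor1': 'some interesting value'})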
Related
I want to create a child table using Flask-SQLAlchemy that holds around 600 columns.
Each column is supposed to be a different hobby with a boolean value of true or false (whether or not the user has this hobby).
However, I do have an issue: using Flask-SQLAlchemy's Model to create it seems problematic, since I will have to write each field myself, i.e.:
class hobbies(database.Model):
id = db.Column(db.Integer, primary_key=True)
user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
hobby1 = db.Column(db.Boolean)
hobby2 = db.Column(db.Boolean)
hobby3 = db.Column(db.Boolean)
....
....
hobby600 = db.Column(db.Boolean)
(database is the variable that holds the extension: database = SQLAlchemy(app))
Is there a way such as using a for or a while loop to add all of the columns to the table?
Thank you for your time.
This is bad table design (no offence); instead of this, you can create the tables as given below:
class Hobbies(database.Model):
id = db.Column(db.Integer, primary_key=True)
hobby = db.Column(db.String(50))
class UserHobbies(database.Model):
user_id = db.Column(db.Integer, db.ForeignKey('user.id'),primary_key=True)
hobbie_id = db.Column(db.Integer, db.ForeignKey('hobbies.id'),primary_key=True)
Instead of creating 600 columns in the Hobbies table, just create 600 rows, and create another table, UserHobbies, for the many-to-many relationship between users and hobbies.
You can also utilize the bulk_insert_mappings(ModelName, list_data) function for inserting bulk data into the Hobbies table.
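For example, a rough sketch of seeding the Hobbies table in bulk (the hobby names here are just for illustration):
# Insert many hobby rows in one batch instead of one add() per row.
hobby_names = ['reading', 'running', 'chess']
database.session.bulk_insert_mappings(
    Hobbies,
    [{'hobby': name} for name in hobby_names],
)
database.session.commit()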
Background
I'm working with a table (in a Postgres DB), let's call it Person. It is related to a table, JobTitle, through the association table PersonJobTitleAssociation. (Each person can have many job titles.)
engine = create_engine(DB_URI)
Base = declarative_base(engine)
class Person(Base):
__tablename__ = 'person'
id = Column(Integer, unique=True, primary_key=True)
name = Column(String, unique=False)
# relationship with all job_titles
all_job_titles = relationship('JobTitle',
secondary='person_job_title_association',
order_by='desc(person_job_title_association.c.date_created)')
# Update this
magic_value = Column(String, unique=False)
class PersonJobTitleAssociation(Base):
__tablename__ = 'person_job_title_association'
person_id = Column(Integer, ForeignKey('person.id'), primary_key=True)
job_title_id = Column(Integer, ForeignKey('job_title.id'), primary_key=True)
date_created = Column(DateTime, nullable=False, default=datetime.datetime.utcnow)
class JobTitle(Base):
__tablename__ = 'job_title'
id = Column(Integer, unique=True, primary_key=True)
name = Column(String, unique=True)
# Once everything is declared, bind to the session
session = sessionmaker(bind=engine)()
Problem
I'd like to access each Person and their most recent JobTitle and apply some_magic_function() to that person's name and job title. (This is a stand-in for "some operation which must be done in Python".)
import random
import string
def some_magic_function(name, job_title):
"""This operation must be done in python"""
# Update the job_title if blank
if not job_title:
job_title = 'unspecified'
# Get a random character and check if it's in our person's name
char = random.choice(string.ascii_letters)
if char in name:
return job_title.upper()
else:
return job_title.lower()
I'm updating values like so:
(Let's pretend this query is optimized and doesn't need to be improved)
query = session.query(Person)\
.options(joinedload(Person.all_job_titles))\
.order_by(Person.id)
# operate on all people
for person in query:
# Get and set the magic value
magic_value = some_magic_function(person.name, person.all_job_titles[0])
if person.magic_value != magic_value:
person.magic_value = magic_value
# Finally, once complete, commit the session
session.commit()
Issue
Querying and updating values is pretty fast on the Python side, but things get really slow when calling session.commit(). I did some research, and it appears SQLAlchemy locks the entire person table each time it updates a value. Further, each update is executed as its own statement. (That's 50K independent SQL statements for 50K records.)
Desired outcome
I'd love a Pythonic solution that updates all 50K records in one swoop.
I've considered using a read_only session, collecting the update values into an array of tuples, and sending the updates through a with_updates session. This seems like a more SQL-friendly approach, but it is a bit heavy-handed and not very straightforward.
Much appreciated!
You might be able to reduce the number of round trips to the database simply by enabling the psycopg2 fast execution helpers, but as a more explicit approach, produce a temporary/derived table of changes one way or another:
CREATE TEMPORARY TABLE and COPY
(VALUES ...) AS ..., possibly combined with explicit use of execute_values() (see the sketch after the COPY example below)
unnest() an array of rows
from JSON using json(b)_to_recordset()
allowing you to bulk send the changes, and do UPDATE ... FROM:
import csv
from io import StringIO
# Pretending that the query is optimized and deterministic
query = session.query(Person)\
.options(joinedload(Person.all_job_titles))\
.order_by(Person.id)
# Prepare data for COPY
changes_csv = StringIO()
changes_writer = csv.writer(changes_csv)
for p in query:
mv = some_magic_function(p.name, p.all_job_titles[0])
if p.magic_value != mv:
changes_writer.writerow((p.id, mv))
changes_csv.seek(0)
session.execute("""
CREATE TEMPORARY TABLE new_magic (
person_id INTEGER,
value TEXT
) ON COMMIT DROP
""")
# Access the underlying psycopg2 connection directly to obtain a cursor
with session.connection().connection.cursor() as cur:
stmt = "COPY new_magic FROM STDIN WITH CSV"
cur.copy_expert(stmt, changes_csv)
# Make sure that the planner has proper statistics to work with
session.execute("ANALYZE new_magic ( person_id )")
session.execute("""
UPDATE person
SET magic_value = new_magic.value
FROM new_magic
WHERE person.id = new_magic.person_id
""")
session.commit()
Not exactly "Pythonic" in the sense that it does not let the ORM figure out what to do, but on the other hand explicit is better than implicit.
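For comparison, a rough sketch of the (VALUES ...) AS ... route using psycopg2's execute_values(), assuming the changed (person_id, magic_value) pairs are collected into a plain list instead of the CSV buffer:
from psycopg2.extras import execute_values

# `changes` would be built in the same loop as above, e.g.
# changes = [(p.id, mv), ...]; the two rows here are just placeholders.
changes = [(1, 'UNSPECIFIED'), (2, 'engineer')]

# Access the raw psycopg2 cursor and send the changes as
# UPDATE ... FROM (VALUES ...) in large batches.
with session.connection().connection.cursor() as cur:
    execute_values(cur, """
        UPDATE person
        SET magic_value = new_magic.value
        FROM (VALUES %s) AS new_magic (person_id, value)
        WHERE person.id = new_magic.person_id
    """, changes)
session.commit()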
I have used the following documentation as a guide and tried to implement an upsert mechanism for my Games table. I want to be able to dynamically update all columns of the selected table at once (without having to specify each column individually). I have tried different approaches, but none has produced a proper SQL query that can be executed. What did I misunderstand, and what are the errors in the code?
https://docs.sqlalchemy.org/en/12/dialects/mysql.html?highlight=on_duplicate_key_update#insert-on-duplicate-key-update-upsert
https://github.com/sqlalchemy/sqlalchemy/issues/4483
class Game(CustomBase, Base):
__tablename__ = 'games'
game_id = Column('id', Integer, primary_key=True)
date_time = Column(DateTime, nullable=True)
hall_id = Column(Integer, ForeignKey(SportPlace.id), nullable=False)
team_id_home = Column(Integer, ForeignKey(Team.team_id))
team_id_away = Column(Integer, ForeignKey(Team.team_id))
score_home = Column(Integer, nullable=True)
score_away = Column(Integer, nullable=True)
...
def put_games(games): # games is a/must be a list of type Game
insert_stmt = insert(Game).values(games)
#insert_stmt = insert(Game).values(id=Game.game_id, data=games)
on_upset_stmt = insert_stmt.on_duplicate_key_update(**games)
print(on_upset_stmt)
...
I regularly load original data from an external API (incl. ID) and want to update my database with it, i.e. update the existing entries (with the same ID) with the new data and add missing ones without completely reloading the database.
Updates
The actual code results in:
TypeError: on_duplicate_key_update() argument after ** must be a mapping, not list
With the commented-out line (#insert_stmt = insert(Game).values(id=Game.game_id, data=games)) used instead of the first insert_stmt, the error message is:
sqlalchemy.exc.CompileError: Unconsumed column names: data
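For reference, the direction I understand the linked documentation to point in looks roughly like this (a sketch; it assumes games is a list of dicts keyed by column name and that a session object exists):
from sqlalchemy.dialects.mysql import insert

def put_games(games):
    # Build one multi-row INSERT from the list of dicts.
    insert_stmt = insert(Game).values(games)
    # Map every non-primary-key column to the value that would have been inserted.
    update_cols = {
        c.name: insert_stmt.inserted[c.name]
        for c in Game.__table__.columns
        if not c.primary_key
    }
    upsert_stmt = insert_stmt.on_duplicate_key_update(**update_cols)
    session.execute(upsert_stmt)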
My code works with a mixture of pandas dataframes and ORM tables. Because I wanted to speed up the retrieval of data using an index (as opposed to reading an entire file into a dataframe and re-writing it each time), I created a class definition to facilitate ORM queries. But I'm struggling to put it all together.
Here is my class definition:
engine_local = create_engine(Config.SQLALCHEMY_DATABASE_URI_LOCAL)
Base_local = declarative_base()
Base_local.metadata.create_all(engine_local)
Session_local = sessionmaker(bind=engine_local)
Session_local.configure(bind=engine_local)
session_local = Session_local()
class Clients(Base_local):
__tablename__ = 'clients'
id = sa.Column(sa.Integer, primary_key=True)
client_id = sa.Column(sa.Integer, primary_key=True)
client_year = sa.Column(sa.Integer, primary_key=True)
client_cnt = sa.Column(sa.Integer, nullable=False)
date_posted = sa.Column(sa.DateTime, nullable=False, default=datetime.utcnow)
client_company = sa.Column(sa.Integer, nullable=False)
client_terr = sa.Column(sa.Integer, nullable=False)
client_credit = sa.Column(sa.Integer, nullable=False)
client_ann_prem = sa.Column(sa.Float)
def __repr__(self):
return f"Clients('{self.client_id}', '{self.client_year}', '{self.client_ann_prem}')"
meta = sa.MetaData()
meta.bind = engine_local
meta.drop_all(engine_local)
meta.create_all(engine_local)
And here is my pandas dataframe definition:
clients_df = pd.DataFrame(client_id, columns=feature_list)
clients_df['client_year'] = client_year
clients_df['client_cnt'] = client_cnt
clients_df['client_company'] = client_company
clients_df['client_terr'] = client_terr
clients_df['client_credit'] = client_credit
clients_df['client_ann_prem'] = client_ann_prem
I have an initialize step where I need to save this entire dataframe for the first time (so it will constitute the entire database and can write over any pre-existing data). Later, however, I will want to import only a portion of the table based on client_year, and then append the updated dataframe to the existing table.
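Roughly what I have in mind for the initialize step is a plain to_sql call (a sketch; for the later incremental loads I would switch if_exists to 'append'):
# First load: write the whole dataframe, overwriting any pre-existing data.
clients_df.to_sql(
    'clients',
    con=engine_local,
    if_exists='replace',
    index=False,
)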
Questions I am struggling with:
Is it useful to define a class at all? (I'm choosing this path since I believe the ORM is easier than raw SQL.)
Will the pd.to_sql statement automatically match the dataframe to the class definition?
If I want to create new versions of the table (e.g. for a threaded process), can I create inherited classes based on Clients without having to go through an initialize step (e.g. Clients01 and Clients02 tables)?
Thanks!
I'm creating a REST-like API to get data in and out of a legacy database. At the moment I don't have the option to alter the db structure. Nearly every table in the database has the exact same structure. Here's how I'm currently handling it using Flask-SQLAlchemy:
class MyDatasetBase(object):
timestamp = db.Column('colname', db.Float, primary_key=True)
val1 = db.Column(db.Float)
val2 = db.Column(db.Float)
val3 = db.Column(db.Float)
fkeyid = db.Column(db.Integer)
@declared_attr
def foreigntable(self):
return db.relationship('ForeignTableClass', uselist=False)
class Table1(MyDatasetBase, db.Model):
__tablename__ = 'table1'
class Table2(MyDatasetBase, db.Model):
__tablename__ = 'table2'
class ForeignTableClass(db.Model):
__tablename__ = 'ForeignTable'
id = db.Column(db.Integer, db.ForeignKey('table1.fkeyid'), db.ForeignKey('table2.fkeyid'), primary_key=True)
name = db.Column(db.String)
This all works, but some of these datasets contain a lot of tables and I feel like there has to be a more efficient way to do a few of these things.
Is there a way to get around explicitly defining a class for each of the tables derived from MyDatasetBase? If there is, will that cause problems with the foreign key stuff, where in ForeignTableClass I have to define id as being a foreign key for every table listed above?
One way to do that is to use the type() built-in:
names = ['table1', 'table2', 'table3']
for name in names:
type(name.title(), (MyDatasetBase, db.Model), { '__tablename__' : name })
# generate <class 'flask_sqlalchemy.Table1'> ...
There may be a more Pythonic/elegant way to do that with MetaData and the Table.tometadata() method.
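If you need to refer to the generated classes later (for queries, for example), a small extension of the loop above is to keep them in a dict as you create them (a sketch):
# Keep a name -> class mapping so the generated models can be used later.
tables = {}
for name in names:
    tables[name.title()] = type(
        name.title(),
        (MyDatasetBase, db.Model),
        {'__tablename__': name},
    )

# e.g. tables['Table1'].query.filter_by(fkeyid=1).all()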