Converting a pandas DataFrame to a class and saving using ORM - Python

My code works with a mixture of pandas DataFrames and ORM tables. Because I wanted to speed up data retrieval using an index (as opposed to reading an entire file into a DataFrame and rewriting it each time), I created a class definition to facilitate ORM queries. But I'm struggling to put it all together.
Here is my class statement:
    import sqlalchemy as sa
    from datetime import datetime
    from sqlalchemy import create_engine
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    engine_local = create_engine(Config.SQLALCHEMY_DATABASE_URI_LOCAL)
    Base_local = declarative_base()
    Session_local = sessionmaker(bind=engine_local)
    session_local = Session_local()

    class Clients(Base_local):
        __tablename__ = 'clients'

        id = sa.Column(sa.Integer, primary_key=True)
        client_id = sa.Column(sa.Integer, primary_key=True)
        client_year = sa.Column(sa.Integer, primary_key=True)
        client_cnt = sa.Column(sa.Integer, nullable=False)
        date_posted = sa.Column(sa.DateTime, nullable=False, default=datetime.utcnow)
        client_company = sa.Column(sa.Integer, nullable=False)
        client_terr = sa.Column(sa.Integer, nullable=False)
        client_credit = sa.Column(sa.Integer, nullable=False)
        # was `sa.Float(sa.Float)`, which builds a type object, not a mapped column
        client_ann_prem = sa.Column(sa.Float)

        def __repr__(self):
            return f"Clients('{self.client_id}', '{self.client_year}', '{self.client_ann_prem}')"

    # create_all must run after the class definition so that 'clients' is
    # registered on Base_local.metadata; dropping/creating a fresh MetaData()
    # (as in the original) is a no-op because it contains no tables
    Base_local.metadata.drop_all(engine_local)
    Base_local.metadata.create_all(engine_local)
And here is my pandas DataFrame definition:
    clients_df = pd.DataFrame(client_id, columns=feature_list)
    clients_df['client_year'] = client_year
    clients_df['client_cnt'] = client_cnt
    clients_df['client_company'] = client_company
    clients_df['client_terr'] = client_terr
    clients_df['client_credit'] = client_credit
    clients_df['client_ann_prem'] = client_ann_prem
I have an initialize step where I need to save this entire dataframe for the first time (so it will constitute the entire database and can write over any pre-existing data). Later, however, I will want to import only a portion of the table based on client_year, and then append the updated dataframe to the existing table.
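A minimal sketch of that initialize-then-append workflow with DataFrame.to_sql, assuming the engine and DataFrame above (the year filter value is illustrative):

    # first-time load: drop any pre-existing table and write the whole DataFrame
    clients_df.to_sql('clients', engine_local, if_exists='replace', index=False)

    # later: read back only the rows for one client_year
    subset_df = pd.read_sql_query(
        'SELECT * FROM clients WHERE client_year = 2020', engine_local)

    # ... update subset_df ...

    # append the updated rows to the existing table
    subset_df.to_sql('clients', engine_local, if_exists='append', index=False)

Note that to_sql does not consult the ORM mapping: with if_exists='replace' it drops the table and re-creates it from the DataFrame's dtypes, so the Clients column definitions only take effect if the table is created via create_all first and to_sql is called with if_exists='append'.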
Questions I am struggling with:
Is it useful to define a class at all? (I'm choosing this path since I believe ORM is easier than raw SQL.)
Will the pd.to_sql statement automatically match the DataFrame to the class definitions?
If I want to create new versions of the table (e.g. for a threaded process), can I create inherited classes based upon Clients without having to go through an initialize step (e.g. Clients01 and Clients02 tables)?
Thanks!

SqlAlchemy association_proxy filter returned array (_AssociationList)

I'm using an association_proxy like this:
    study_participantions = association_proxy("quests", "study_participant", creator=lambda sp: sp.questionnaire)
In my DB there are:
A Patient table
A StudyParticipant table
A Questionnaire table
Patient and Questionnaire have a many-to-one relationship.
A Questionnaire can be part of a StudyParticipant via a one-to-one relationship.
StudyParticipant and Patient are not directly linked, since a StudyParticipant can be anonymous.
So via a getter and setter I can query the Patient through the Questionnaire.
Since I'm working with an existing codebase, I have to keep the patient inside the questionnaire.
The StudyParticipant can be found from the Patient via the proxy. Getting and setting work, but if the Questionnaire is not part of a StudyParticipant, the returned array contains None values. Is it possible to filter them out so I get a clean array? It should still be an sqlalchemy.ext.associationproxy._AssociationList, so appending to and removing from it would still work.
Simplified Classes:
    class Patient(Model):
        __tablename__ = 'patient'
        id = Column(Integer, primary_key=True)
        study_participantions = association_proxy(
            "quests", "study_participant",
            creator=lambda sp: sp.questionnaire)

    class StudyParticipant(Model):  # a better name would be Participation
        __tablename__ = "study_participant"
        id = Column(Integer, primary_key=True)
        pseudonym = Column(String(40), nullable=True)
        # why go via the StudyQuestionnaire
        questionnaire = relationship("Questionnaire", backref="study_participant",
                                     uselist=False)

    class Questionnaire(Model, metaclass=QuestionnaireMeta):
        __tablename__ = 'questionnaire'
        id = Column(Integer, primary_key=True)
        patient_id = Column(Integer(), ForeignKey('patient.id'), nullable=True)
        patient = relationship('Patient', backref='quests',
                               primaryjoin=questionnaire_patient_join)
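Not part of the original post, but as a sketch of the filtering itself: the None placeholders can be dropped in plain Python, at the cost of getting back an ordinary list rather than an _AssociationList (so appends and removes still have to go through the proxy):

    def clean_participations(patient):
        # hypothetical helper: drop the None entries the proxy yields for
        # questionnaires that belong to no StudyParticipant
        return [sp for sp in patient.study_participantions if sp is not None]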

Adding multiple columns to a table using a for/while loop in Flask-SQLAlchemy

I want to create a child table using Flask-SQLAlchemy that holds around 600 columns.
Each column is supposed to be a different hobby with a boolean value of true or false (whether or not the user has that hobby).
However, I do have an issue. Using Flask-SQLAlchemy's Model to create it seems problematic, since I will have to write each field myself, i.e.:
    class hobbies(database.Model):
        id = db.Column(db.Integer, primary_key=True)
        user_id = db.Column(db.Integer, db.ForeignKey('user.id'))
        hobby1 = db.Column(db.Boolean)
        hobby2 = db.Column(db.Boolean)
        hobby3 = db.Column(db.Boolean)
        ....
        ....
        hobby600 = db.Column(db.Boolean)
(database stands for the variable that holds the app's SQLAlchemy instance: database = SQLAlchemy(app))
Is there a way, such as using a for or while loop, to add all of the columns to the table?
Thank you for your time.
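Mechanically it can be done: declarative classes accept new Column attributes after the class statement, so a plain loop can attach the columns (a sketch of the asker's own layout, with database/db as in the question; the answer below argues against this design):

    class hobbies(database.Model):
        id = db.Column(db.Integer, primary_key=True)
        user_id = db.Column(db.Integer, db.ForeignKey('user.id'))

    # attach hobby1 .. hobby600 as boolean columns after the fact
    for i in range(1, 601):
        setattr(hobbies, f'hobby{i}', db.Column(db.Boolean, default=False))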
This is bad table design (no offence). Instead, you can create the tables as given below:
    class Hobbies(database.Model):
        id = db.Column(db.Integer, primary_key=True)
        hobby = db.Column(db.String(50))

    class UserHobbies(database.Model):
        user_id = db.Column(db.Integer, db.ForeignKey('user.id'), primary_key=True)
        hobbie_id = db.Column(db.Integer, db.ForeignKey('hobbies.id'), primary_key=True)
Instead of creating 600 columns in the hobbies table, just create 600 rows, and create another table, UserHobbies, for a many-to-many relationship between users and hobbies.
You can also utilize the bulk_insert_mappings(ModelName, list_data) function for inserting bulk data into the hobbies table.
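For example, a short sketch of that bulk insert (the hobby names are made up for illustration):

    hobby_rows = [{'hobby': name} for name in ('reading', 'chess', 'cycling')]
    database.session.bulk_insert_mappings(Hobbies, hobby_rows)
    database.session.commit()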

SQLAlchemy upsert Function for MySQL

I have used the following documentation as a guide and tried to implement an upsert mechanism for my Games table. I want to be able to dynamically update all columns of the selected table at once (without having to specify each column individually). I have tried different approaches, but none have produced a proper SQL query that can be executed. What did I misunderstand, and what are the errors in the code?
https://docs.sqlalchemy.org/en/12/dialects/mysql.html?highlight=on_duplicate_key_update#insert-on-duplicate-key-update-upsert
https://github.com/sqlalchemy/sqlalchemy/issues/4483
    class Game(CustomBase, Base):
        __tablename__ = 'games'

        game_id = Column('id', Integer, primary_key=True)
        date_time = Column(DateTime, nullable=True)
        hall_id = Column(Integer, ForeignKey(SportPlace.id), nullable=False)
        team_id_home = Column(Integer, ForeignKey(Team.team_id))
        team_id_away = Column(Integer, ForeignKey(Team.team_id))
        score_home = Column(Integer, nullable=True)
        score_away = Column(Integer, nullable=True)
        ...

    def put_games(games):  # games is a/must be a list of type Game
        insert_stmt = insert(Game).values(games)
        #insert_stmt = insert(Game).values(id=Game.game_id, data=games)
        on_upset_stmt = insert_stmt.on_duplicate_key_update(**games)
        print(on_upset_stmt)
        ...
I regularly load original data from an external API (incl. IDs) and want to update my database with it, i.e. update the existing entries (with the same ID) with the new data and add missing ones, without completely reloading the database.
Updates
The actual code results in:
    TypeError: on_duplicate_key_update() argument after ** must be a mapping, not list
With the commented line (#insert_stmt = ...) instead of the first insert_stmt, the error message is:
    sqlalchemy.exc.CompileError: Unconsumed column names: data
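For reference, a sketch of how such an upsert is commonly written with the MySQL dialect, assuming games is passed as a list of plain dicts keyed by column name rather than Game instances (stmt.inserted refers to the values each row would insert):

    from sqlalchemy.dialects.mysql import insert

    def put_games(session, games):
        stmt = insert(Game.__table__).values(games)
        # on key collision, update every non-primary-key column from the incoming row
        update_cols = {c.name: stmt.inserted[c.name]
                       for c in Game.__table__.columns if not c.primary_key}
        session.execute(stmt.on_duplicate_key_update(**update_cols))
        session.commit()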

Creating Table from dictionary in SQLAlchemy

I'm trying to create a table from dictionary values in SQLAlchemy. I'm using Flask, and currently my class looks like this:
    class Machine(db.Model):
        """Template for the Machine Info table"""
        __tablename__ = 'machine'
        id = db.Column(db.Integer, primary_key=True)
        machine_name = db.Column(db.String(32))
        date = db.Column(db.String(32))
        time = db.Column(db.String(32))
        sensor1 = db.Column(db.String(32))
        sensor2 = db.Column(db.String(32))
This works fine, but my issue is that I will eventually have many columns in this table, possibly 100+. I would rather not fill up my models.py file with 100 lines of this kind of stuff, so I wanted to have it in its own dictionary in its own file. The dictionary looks like this:
    SENSOR_LOOKUP_DICT = {
        "machine_name": "machine_name",
        "date": "date",
        "time": "time",
        "sensor1": "sensor1",
        "sensor2": "sensor2"
    }
A list would probably work here too.
I was thinking I could use some kind of loop, like this:
    class Machine(db.Model):
        """Template for the Machine Info table"""
        __tablename__ = 'machine'
        id = db.Column(db.Integer, primary_key=True)
        for sensor in SENSOR_LOOKUP_DICT:
            sensor = db.Column(db.String(32))
But this just gives me a single column called sensor. I found a couple of sort-of-relevant questions about SQLAlchemy, but they didn't use this structure for creating tables. I would much prefer a method that keeps the db.Model structure, rather than one that uses create_engine, due to some JSON serialization later which is easier with this structure (as well as some app-structure stuff). Is there any way to do this?
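For what it's worth, the loop idea can work if the columns are assembled outside the class body, e.g. by building an attribute dict and letting type() create the mapped class (a sketch; the answer below recommends a different design entirely):

    # build the attribute dict first, then create the mapped class in one call
    attrs = {
        '__tablename__': 'machine',
        'id': db.Column(db.Integer, primary_key=True),
    }
    attrs.update({name: db.Column(db.String(32)) for name in SENSOR_LOOKUP_DICT})
    Machine = type('Machine', (db.Model,), attrs)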
Instead of cramming all the sensor values into a single row of a hundred or more columns, you could split your design into machine and sensor tables:
    from datetime import datetime
    from sqlalchemy.orm.collections import attribute_mapped_collection
    from sqlalchemy.ext.associationproxy import association_proxy

    class Machine(db.Model):
        """The Machine Info table"""
        __tablename__ = 'machine'
        id = db.Column(db.Integer, primary_key=True)
        machine_name = db.Column(db.String(32))
        datetime = db.Column(db.DateTime, default=datetime.utcnow)
        sensors = db.relationship(
            'Sensor',
            collection_class=attribute_mapped_collection('name'),
            cascade='all, delete-orphan')
        sensor_values = association_proxy(
            'sensors', 'value',
            creator=lambda k, v: Sensor(name=k, value=v))

    class Sensor(db.Model):
        """The Sensor table"""
        __tablename__ = 'sensor'
        machine_id = db.Column(db.Integer, db.ForeignKey('machine.id'),
                               primary_key=True)
        # Note that this could be a numeric ID as well
        name = db.Column(db.String(16), primary_key=True)
        value = db.Column(db.String(32))
The dictionary collection relationship combined with the association proxy allows you to handle the sensor values like so:
    In [10]: m = Machine(machine_name='Steam Machine')
    In [11]: m.sensor_values['sensor1'] = 'some interesting value'
    In [12]: db.session.add(m)
    In [13]: db.session.commit()
    In [14]: m.sensor_values
    Out[14]: {'sensor1': 'some interesting value'}
    In [16]: m.sensor_values['sensor1']
    Out[16]: 'some interesting value'
An added benefit of having separate tables instead of a fixed schema is that if you add sensors later in life, you don't need to migrate your schema to accommodate that – in other words, there is no need to alter the table to add columns. Just add the new sensor values to the sensor table as before.
Finally, some RDBMSs support document types, such as Postgresql's hstore, json, and jsonb columns, that you could use here, since the sensor table is essentially a key/value store.
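A minimal sketch of that idea with a JSONB column, assuming Postgres and the same db object (MachineDoc is a hypothetical alternative model, not part of the design above):

    from sqlalchemy.dialects.postgresql import JSONB

    class MachineDoc(db.Model):
        __tablename__ = 'machine_doc'
        id = db.Column(db.Integer, primary_key=True)
        machine_name = db.Column(db.String(32))
        sensors = db.Column(JSONB, default=dict)  # e.g. {"sensor1": "42.0"}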

Joining tables in Flask-SqlAlchemy

Suppose I have several tables and want to perform join query:
    schedule_calendars = ScheduleCalendar.query\
        .join(Schedule)\
        .join(ClinicBranchHasDoctor)\
        .filter_by(clinic_branch_id=clinic_id, doctor_has_specialty_id=doctor_speciality_id).all()
The thing here is that my result will only contain attributes of the ScheduleCalendar class. How do I query such that my result contains attributes of all joined tables?
Schedule:
    id = Column(db.Integer, primary_key=True)
    added_date = Column(db.DateTime(timezone=True), default=get_current_time, nullable=False)
    start_date = Column(db.Date, nullable=False)
    name = Column(db.String(128), nullable=False)
    is_valid = Column(db.Boolean, default=IS_VALID, nullable=False)
    slot_size = Column(db.Integer, default=30)
ScheduleCalendar:
    schedule_id = Column(db.Integer, db.ForeignKey("schedules.id"), nullable=False)
ClientBranchHasDoctor:
    schedule_id = Column(db.Integer, db.ForeignKey("schedules.id"), nullable=False)
I skipped some attributes here. I think the most important point is that my tables have the appropriate constraints; otherwise the join will fail.
You need to add a back reference to your classes.
For example, in your ScheduleCalendar class, add:
    schedule_id = Column(db.Integer, db.ForeignKey("schedules.id"), nullable=False)
    schedule = db.relationship("Schedule", back_populates="calendar", lazy="dynamic")
And in your Schedule class add:
    calendar = db.relationship("ScheduleCalendar", back_populates="schedule")
Now you can access Schedule objects from ScheduleCalendar.
In your example, you would access it like this:
    schedule_calendars = ScheduleCalendar.query\
        .join(Schedule)\
        .join(ClinicBranchHasDoctor)\
        .filter_by(clinic_branch_id=clinic_id, doctor_has_specialty_id=doctor_speciality_id).all()

    schedule_calendars[0].schedule
I tried many answers but was not able to join tables and get their column data at the same time. After creating the back references as suggested by @AArias, you can use this code to get your tables' data:
    results = db.session.query(Schedule, ScheduleCalendar, ClientBranchHasDoctor) \
        .select_from(Schedule).join(ScheduleCalendar).join(ClientBranchHasDoctor).all()

    for schedule, scheduleCalendar, hasDoctor in results:
        print(schedule.name, scheduleCalendar.schedule_id, hasDoctor.schedule_id)
This way you can access the data of all three tables simultaneously.
