SQLAlchemy query where a column is a substring of another string - python

This question is similar to SQLAlchemy query where a column contains a substring, but the other way around: I'm trying to query a column containing a string which is a sub-string of another given string. How can I achieve this?
Here is a code example of a database set up using the ORM:
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy.sql import exists
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    url = Column(String)
    fullname = Column(String)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
session.add_all([
    User(url='john', fullname='John Doe'),
    User(url='mary', fullname='Mary Contrary')
])
session.commit()
The following works:
e = session.query(exists().where(User.url == 'john')).scalar()
upon which e has the value True. However, I would like to do something like
e = session.query(exists().where(User.url in 'johndoe')).scalar()
where in is in the sense of the __contains__ method of Python's string type. Is this possible?

It's just like (heh) the linked question, except you turn it around:
SELECT ... WHERE 'johndoe' LIKE '%' || url || '%';
You'll need to take care to escape special characters if you've got those in your table:
SELECT ... WHERE 'johndoe' LIKE '%' || replace(replace(replace(url, '\', '\\'), '%', '\%'), '_', '\_') || '%' ESCAPE '\';
In SQLAlchemy:
from sqlalchemy import func, literal
# Escape backslash first, then the LIKE wildcards % and _
escaped_url = func.replace(func.replace(func.replace(User.url, "\\", "\\\\"),
                                        "%", "\\%"),
                           "_", "\\_")
session.query(... .where(literal("johndoe").like("%" + escaped_url + "%", escape="\\")))
Note the escaped backslashes in Python.
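
For completeness, here is a minimal end-to-end sketch of the same idea against an in-memory SQLite database. The helper name any_url_within and the extra row with url 'j%n' are made up for illustration, and type_=String is added only so that "+" is certain to render as string concatenation; treat it as one possible way to wire this up, not the only one.

```python
from sqlalchemy import Column, Integer, String, create_engine, func, literal
from sqlalchemy.orm import declarative_base, sessionmaker
from sqlalchemy.sql import exists

Base = declarative_base()


class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    url = Column(String)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
# 'j%n' is a hypothetical row that contains a LIKE wildcard
session.add_all([User(url="john"), User(url="mary"), User(url="j%n")])
session.commit()


def any_url_within(text):
    """Return True if some users.url value is a substring of `text`."""
    # Escape backslash first, then % and _, so stored values match literally.
    escaped_url = func.replace(
        func.replace(func.replace(User.url, "\\", "\\\\"), "%", "\\%"),
        "_", "\\_",
        type_=String,  # assumption: makes "+" below compile to ||
    )
    condition = literal(text).like("%" + escaped_url + "%", escape="\\")
    return bool(session.query(exists().where(condition)).scalar())


print(any_url_within("johndoe"))
print(any_url_within("jxn"))
```

Without the escaping step, the stored value 'j%n' would act as a pattern and also match a string such as 'jxn'.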

You can use like
e = session.query(exists().where(User.url.like("%{}%".format('put your string here')))).scalar()

Related

SQLalchemy with column names starting and ending with underscores

Set RDBMS_URI env var to a connection string like postgresql://username:password@host/database, then on Python 3.9 with PostgreSQL 15 and SQLAlchemy 1.4 run:
from os import environ
from sqlalchemy import Boolean, Column, Identity, Integer
from sqlalchemy import create_engine
from sqlalchemy.orm import declarative_base
Base = declarative_base()
class Tbl(Base):
    __tablename__ = 'Tbl'
    __has_error__ = Column(Boolean)
    id = Column(Integer, primary_key=True, server_default=Identity())
engine = create_engine(environ["RDBMS_URI"])
Base.metadata.create_all(engine)
Checking the database:
=> \d "Tbl"
                           Table "public.Tbl"
 Column |  Type   | Collation | Nullable |             Default
--------+---------+-----------+----------+----------------------------------
 id     | integer |           | not null | generated by default as identity
Indexes:
    "Tbl_pkey" PRIMARY KEY, btree (id)
How do I force the column names with double underscore to work?
I believe that the declarative machinery explicitly excludes attributes whose names start with a double underscore from the mapping process (based on this and this). Consequently your __has_error__ column is not created in the target table.
There are at least two possible workarounds. Firstly, you could give the model attribute a different name, for example:
_has_error = Column('__has_error__', Boolean)
This will create the database column __has_error__, accessed through Tbl._has_error*.
If you want the model's attribute to be __has_error__, then you can achieve this by using an imperative mapping.
import sqlalchemy as sa
from sqlalchemy import orm
mapper_registry = orm.registry()
tbl = sa.Table(
    'tbl',
    mapper_registry.metadata,
    sa.Column('__has_error__', sa.Boolean),
    sa.Column(
        'id', sa.Integer, primary_key=True, server_default=sa.Identity()
    ),
)
class Tbl:
    pass
mapper_registry.map_imperatively(Tbl, tbl)
mapper_registry.metadata.create_all(engine)
* I tried using a synonym to map __has_error__ to _has_error but it didn't seem to work. It probably gets excluded in the mapper as well, but I didn't investigate further.
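
As a quick check of the first workaround, the sketch below declares the renamed attribute and inspects the resulting table metadata without connecting to a database. The Identity() default from the question is left out only to keep the example backend-independent.

```python
from sqlalchemy import Boolean, Column, Integer
from sqlalchemy.orm import declarative_base

Base = declarative_base()


class Tbl(Base):
    __tablename__ = "Tbl"
    id = Column(Integer, primary_key=True)
    # Python attribute has a single leading underscore;
    # the database column keeps the double-underscore name.
    _has_error = Column("__has_error__", Boolean)


# Column names as they would be emitted in CREATE TABLE
column_names = [c.name for c in Tbl.__table__.columns]
print(column_names)

# The value is read and written through the Python-side attribute name
row = Tbl(_has_error=True)
print(row._has_error)
```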

SQLAlchemy + Pandas: saving array of strings to Postgres saves them as array of chars

I am trying to save an array of strings to Postgres, but when I check, the array of strings is saved as an array of chars. Here is an example using sqlalchemy for my database engine:
df = pd.read_csv('data.csv')
df.to_sql('tablename', dtypes={'array_col':sqlalchemy.dialects.postgresql.Array(sqlalchemy.dialects.postgresql.text)})
When I query for 'array_col', I'm expecting this:
['one','two']
What I get is this:
['','o','n','e','','t','w','o']
I think you need to convert the string from the csv into an array first using converters argument in read_csv() before calling to_sql.
This example assumes the values are stored as a comma-separated string inside a single CSV field. I used hobbies as the name of my array column. Once the value was converted to a list I did not seem to need to pass dtype to to_sql, but if you still need that then you would use dtype={'hobbies': ARRAY(TEXT)} in this example. I think ARRAY and TEXT are the correct types, not to be confused with array or text. I define the array column type in my SQLAlchemy model below.
import sys
from io import StringIO
from sqlalchemy import (
create_engine,
Integer,
String,
)
from sqlalchemy.schema import (
Column,
)
from sqlalchemy.sql import select
from sqlalchemy.orm import declarative_base
import pandas as pd
from sqlalchemy.dialects.postgresql import ARRAY, TEXT
Base = declarative_base()
username, password, db = sys.argv[1:4]
engine = create_engine(f"postgresql+psycopg2://{username}:{password}@/{db}", echo=False)
class User(Base):
    __tablename__ = "users"
    id = Column(Integer, primary_key=True)
    name = Column(String(8), index=True)
    hobbies = Column(ARRAY(TEXT))
Base.metadata.create_all(engine)
csv_content = '''"name","hobbies"
1,"User 1","running,jumping"
2,"User 2","sitting,sleeping"
'''
with engine.begin() as conn:
    def convert_to_array(v):
        return [s.strip() for s in v.split(',') if s.strip()]
    content = StringIO(csv_content)
    df = pd.read_csv(content, converters={'hobbies': convert_to_array})
    df.to_sql('users', schema="public", con=conn, if_exists='append', index_label='id')
with engine.begin() as conn:
    for user in conn.execute(select(User)).all():
        print(user.id, user.name, user.hobbies)
        print("|".join(user.hobbies))
        print(type(user.hobbies))
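
The effect of the converter can be seen without a database. The following sketch (the CSV content is made up for illustration) reads the same data twice: once as-is, where the cell stays a single string, and once with the converter, where it becomes a list of strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical CSV: the hobbies cell holds a comma-separated string
csv_content = (
    "name,hobbies\n"
    '"User 1","running,jumping"\n'
    '"User 2","sitting, sleeping"\n'
)


def convert_to_array(value):
    # Split the cell on commas and drop empty or whitespace-only parts
    return [s.strip() for s in value.split(",") if s.strip()]


raw = pd.read_csv(StringIO(csv_content))
parsed = pd.read_csv(StringIO(csv_content), converters={"hobbies": convert_to_array})

print(type(raw.loc[0, "hobbies"]))     # plain str: iterating it yields characters
print(type(parsed.loc[0, "hobbies"]))  # list: iterating it yields whole strings
print(parsed.loc[1, "hobbies"])
```

A plain string is itself a sequence of characters, which is a plausible reason the unconverted value ends up as an array of chars.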

How to test if a class object was created using Pytest

I wrote a habit tracker app and used SQLAlchemy to store the data in an SQLite3 database. Now I'm writing the unit tests using Pytest for all the functions I wrote. Besides functions returning values, there are functions that create entries in the database by creating objects. Here's my object-relational mapper setup and the two main classes:
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey, Date
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
# Setting up SQLAlchemy to connect to the local SQLite3 database
Base = declarative_base()
engine = create_engine('sqlite:///:main:', echo=True)
Base.metadata.create_all(bind=engine)
Session = sessionmaker(bind=engine)
session = Session()
class Habit(Base):
    __tablename__ = 'habit'
    habit_id = Column('habit_id', Integer, primary_key=True)
    name = Column('name', String, unique=True)
    periodicity = Column('periodicity', String)
    start_date = Column('start_date', Date)
class HabitEvent(Base):
    __tablename__ = 'habit_event'
    event_id = Column('event_id', Integer, primary_key=True)
    date = Column('date', Date)
    habit_id = Column('fk_habit_id', Integer, ForeignKey(Habit.habit_id))
One of the creating functions is the following:
def add_habit(name, periodicity):
    if str(periodicity) not in ['d', 'w']:
        print('Wrong periodicity. \nUse d for daily or w for weekly.')
    else:
        h = Habit()
        h.name = str(name)
        if str(periodicity) == 'd':
            h.periodicity = 'Daily'
        if str(periodicity) == 'w':
            h.periodicity = 'Weekly'
        h.start_date = datetime.date.today()
        session.add(h)
        session.commit()
        print('Habit added.')
Here's my question: since this function doesn't return a value that can be matched against an expected result, I don't know how to test whether the object was created. The same problem occurs when I want to check whether all objects were deleted using the following function:
def delete_habit(habitID):
    id_list = []
    id_query = session.query(Habit).all()
    for i in id_query:
        id_list.append(i.habit_id)
    if habitID in id_list:
        delete_id = int(habitID)
        session.query(HabitEvent).filter(
            HabitEvent.habit_id == delete_id).delete()
        session.query(Habit).filter(Habit.habit_id == delete_id).delete()
        session.commit()
        print('Habit deleted.')
    else:
        print('Non existing Habit ID.')
If I understand correctly, you can utilize the get_habits function as part of the test for add_habit.
def test_add_habit():
    name = 'test_add_habit'
    periodicity = 'd'
    add_habit(name, periodicity)
    # not sure of the input or output from get_habits, but possibly:
    results = get_habits(name)
    assert name in results['name']

How do I alter two different column headers of a pre-existing database table in sqlalchemy?

I am using sqlalchemy to reflect the columns of a table in a mysql database into a python script. This is a database I have inherited and some of the column headers for the table have spaces in eg "Chromosome Position". A couple of the column headers also are strings which start with a digit eg "1st time".
I would like to alter these headers so that spaces are replaced with underscores and there are no digits at the beginning of the column header string, e.g. "1st time" becomes "firsttime". I followed the advice given in sqlalchemy - reflecting tables and columns with spaces, which partially solved my problem.
from sqlalchemy import create_engine, Column, event, MetaData
from sqlalchemy.ext.declarative import declarative_base, DeferredReflection
from sqlalchemy.orm import sessionmaker, Session
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.schema import Table
from twisted.python import reflect
Base = automap_base()
engine = create_engine('mysql://username:password@localhost/variants_database', echo=False)
#Using a reflection event to access the column attributes
@event.listens_for(Table, "column_reflect")
def reflect_col(inspector, table, column_info):
    column_info['key'] = column_info['name'].replace(' ', '_')
metadata = MetaData()
session = Session(engine)
class Variants(Base):
    __table__ = Table("variants", Base.metadata, autoload=True, autoload_with=engine)
Base.prepare(engine, reflect=True)
session = Session(engine)
a = session.query(Variants).filter(Variants.Gene == "AGL").first()
print a.Chromosome_Position
This allows me to return the values in a.Chromosome_Position. Likewise if I change the method reflect_col to:
@event.listens_for(Table, "column_reflect")
def reflect_col(inspector, table, column_info):
    column_info['key'] = column_info['name'].replace('1st time', 'firsttime')
a = session.query(Variants).filter(Variants.Gene == "AGL").first()
print a.firsttime
This also allows me to return the values in a.firsttime. However, I am not able to alter both attributes of the column headers at the same time, so changing the method to:
@event.listens_for(Table, "column_reflect")
def reflect_col(inspector, table, column_info):
    column_info['key'] = column_info['name'].replace(' ', '_')
    column_info['key'] = column_info['name'].replace('1st time', 'secondcheck')
will only modify the last call to column_info which in this case is the column '1st time'. So I can return the values of a.firsttime but not a.Chromosome_Position. How do I change both column name features in the same reflection event?
It seems that you are overwriting the first value after the second replacement. I hope chaining the .replace works:
@event.listens_for(Table, "column_reflect")
def reflect_col(inspector, table, column_info):
    column_info['key'] = column_info['name'].replace(' ', '_').replace('1st_time', 'secondcheck')
[EDIT]: You have to also make sure that the changes wouldn't clash.
Because in this example the first change replaces spaces with underscore, you have to adapt the second replacement, as it's already called 1st_time when the second replace is called.
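
The ordering issue can be checked in plain Python, independent of SQLAlchemy. In this small sketch the function names are made up, and 'firsttime' is used as the target key as in the question:

```python
def overwritten_key(column_name):
    # Mirrors the question's attempt: the second assignment starts again
    # from the original name, so the first replacement is lost.
    key = column_name.replace(" ", "_")
    key = column_name.replace("1st time", "firsttime")
    return key


def chained_key(column_name):
    # Each replace works on the result of the previous one, so the second
    # pattern must already contain the underscore.
    return column_name.replace(" ", "_").replace("1st_time", "firsttime")


for name in ("Chromosome Position", "1st time", "Gene"):
    print(name, "->", overwritten_key(name), "|", chained_key(name))
```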

sqlalchemy - reflecting tables and columns with spaces

How can I use sqlalchemy on a database where the column names (and table names) have spaces in them?
db.auth_stuff.filter("db.auth_stuff.first name"=='Joe') obviously can't work. Rather than manually define everything when doing the reflections I want to put something like lambda x: x.replace(' ','_') between existing table names being read from the db, and being used in my models. (It might also be useful to create a general function to rename all table names that won't work well with python - reserved words etc.)
Is there an easy/clean way of doing this?
I think I need to define my own mapper class?
https://groups.google.com/forum/#!msg/sqlalchemy/pE1ZfBlq56w/ErPcn1YYSJgJ
Or use some sort of __mapper_args__ parameter -
http://docs.sqlalchemy.org/en/rel_0_8/orm/mapper_config.html#naming-all-columns-with-a-prefix
ideally:
class NewBase(Base):
    __mapper_args__ = {
        'column_rename_function' : lambda x: x.replace(' ','_')
    }
class User(NewBase):
    __table__ = "user table"
You can do this using a reflection event to give the columns a .key. However, the full recipe has a bug when primary key columns are involved, which was fixed in the still-unreleased 0.8.3 version (as well as master). If you check out 0.8.3 at https://bitbucket.org/zzzeek/sqlalchemy/get/rel_0_8.zip this recipe will work even with primary key cols:
from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import declarative_base, DeferredReflection
Base = declarative_base(cls=DeferredReflection)
e = create_engine("sqlite://", echo=True)
e.execute("""
create table "user table" (
"id col" integer primary key,
"data col" varchar(30)
)
""")
from sqlalchemy import event
@event.listens_for(Table, "column_reflect")
def reflect_col(inspector, table, column_info):
    column_info['key'] = column_info['name'].replace(' ', '_')
class User(Base):
    __tablename__ = "user table"
Base.prepare(e)
s = Session(e)
print s.query(User).filter(User.data_col == "some data")
DeferredReflection is an optional helper to use with declarative + reflection.
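
The recipe above targets SQLAlchemy 0.8 and Python 2. As a rough sketch of the same event on a more recent release (assuming 1.4 or later, and using plain Core reflection rather than DeferredReflection), something like the following should work; the inserted sample row is made up for illustration.

```python
from sqlalchemy import MetaData, Table, create_engine, event, text

engine = create_engine("sqlite://")
with engine.begin() as conn:
    conn.execute(text(
        'CREATE TABLE "user table" ('
        '"id col" INTEGER PRIMARY KEY, "data col" VARCHAR(30))'
    ))
    conn.execute(text(
        'INSERT INTO "user table" ("data col") VALUES (\'some data\')'
    ))


# Registered on the Table class, so it applies to every reflected table
@event.listens_for(Table, "column_reflect")
def reflect_col(inspector, table, column_info):
    column_info["key"] = column_info["name"].replace(" ", "_")


metadata = MetaData()
user_table = Table("user table", metadata, autoload_with=engine)

# Python-side keys use underscores; the SQL names keep their spaces
print(user_table.c.keys())
print(user_table.c.data_col.name)

with engine.connect() as conn:
    rows = conn.execute(
        user_table.select().where(user_table.c.data_col == "some data")
    ).all()
print(len(rows))
```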
