Proper method to make a class of other class instances - python

I'm working on building a simple report generator that will run a query, put the data into a formatted Excel sheet, and email it.
I am trying to get better at proper coding practices (cohesion, coupling, etc.) and I am wondering if this is the proper way to build this or if there's a better way. It feels somewhat redundant to have to pass the arguments of the Extractor twice: once to the main class and then again to the class it wraps.
Should I be using nested classes? **kwargs? Or is this correct?
from typing_extensions import ParamSpec
import pandas as pd
from sqlalchemy import create_engine
from O365 import Account
from jinjasql import JinjaSql

class Emailer:
    pass

class Extractor:
    '''Executes a SQL query and returns the data.'''
    def __init__(self, query: str, database: str, connection: dict, param_query: bool = False) -> None:
        self.query = query
        self.database = database
        self.param_query = param_query
        # self.conn_params = connection
        self.engine = create_engine(connection)
        # self.engine = Reactor().get_engine(**self.conn_params)

    def parse_query(self) -> None:
        '''If the query needs parameterization, do that here.'''
        pass

    def run_query(self) -> pd.DataFrame:
        '''Run a supplied query. Expects the name of the query.'''
        # TODO: Make this check if it's a query or a query name.
        with open(self.query, "r") as f:
            query = f.read()
        return pd.read_sql_query(query, self.engine)

class ReportGenerator:
    '''Main class'''
    def __init__(self, query: str, database: str, connection: dict, param_query: bool = False) -> None:
        self.extractor = Extractor(query, database, connection, param_query)
        self.emailer = Emailer()

    def build_report(self) -> None:
        pass

This isn't a bad solution, but I think it can be improved.
If you carry this through properly, ReportGenerator should never know about query, connection, or the other members of Extractor, so they shouldn't be passed to its constructor.
What you can do instead is create an Extractor and an Emailer before creating the ReportGenerator and pass the instances to its constructor (i.e. dependency injection).
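A minimal sketch of that shape, reusing the classes from the question (the argument values are placeholders):
class ReportGenerator:
    '''Main class: receives its collaborators instead of constructing them.'''
    def __init__(self, extractor: Extractor, emailer: Emailer) -> None:
        self.extractor = extractor
        self.emailer = emailer

    def build_report(self) -> None:
        data = self.extractor.run_query()
        # ... format `data` into the Excel sheet and hand it to self.emailer ...

# placeholder values; wire in your real query file, database, and connection
extractor = Extractor("report.sql", "sales_db", connection)
emailer = Emailer()
ReportGenerator(extractor, emailer).build_report()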

Related

How to use dependencies with yield in FastAPI

I am using FastAPI and I would like to know if I am using dependencies correctly.
First, I have a function that yields the database session.
class ContextManager:
    def __init__(self):
        self.db = DBSession()

    def __enter__(self):
        return self.db

    def __exit__(self, exc_type, exc_value, traceback):
        # __exit__ must accept the exception details
        self.db.close()

def get_db():
    with ContextManager() as db:
        yield db
I would like to use that function in another function:
def validate(db=Depends(get_db)):
    is_valid = verify(db)
    if not is_valid:
        raise HTTPException(status_code=400)
    yield db
Finally, I would like to use the last functions as a dependency on the routes:
@router.get('/')
def get_data(db=Depends(validate)):
    data = db.query(...)
    return data
I am using this code and it seems to work, but I would like to know if it is the most appropriate way to use dependencies. In particular, I am not sure whether I have to use 'yield db' inside the validate function or whether it would be better to use return. I would appreciate your help. Thanks a lot
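For what it's worth, since validate has no cleanup code after its yield, a plain return behaves the same here: FastAPI accepts both regular and generator dependencies, and yield is only needed when something must run after the response. A minimal sketch of the return-based variant (same verify helper as above):
def validate(db=Depends(get_db)):
    # no teardown needed after the request, so return works;
    # use yield only when code must run once the response is sent
    if not verify(db):
        raise HTTPException(status_code=400)
    return db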

Mocking subclasses created in __init__

I have a database helper class that takes in a connection as an argument and initializes another helper class in __init__:
def generate_table_deprecation_lkp() -> None:
    """
    run sql to generate lkp table that drives table deprecation workflow
    """
    with RedshiftConnection() as conn:
        helper = TableDeprecationHelper(conn)
        helper.generate_table_deprecation_lkp()

class TableDeprecationHelper():
    """
    Class that takes in RedshiftConnection and provides helper functions for table deprecation.
    """
    def __init__(self, redshift_connection: RedshiftConnection) -> None:
        self.redshift_connection = redshift_connection
        self.table_manager = RedshiftTableManager(redshift_connection)
        log.info("Initialized TableDeprecationHelper")

    def generate_table_deprecation_lkp(self) -> None:
        """
        generates transient lkp table used to drive deprecation workflow
        """
        self.table_manager.truncate_table(*UNUSED_TABLES_LKP.split('.'))
        self.redshift_connection.execute_db_command(FLAG_UNUSED_TABLES)
        log.info("Generated table deprecation stage")
I'm trying to test a function that calls this helper using pytest, with code that looks like this:
@pytest.fixture
def redshift_connection_patch():
    with patch("RedshiftConnection") as redshift_connection_patch:
        yield redshift_connection_patch

@pytest.fixture
def truncate_table_patch():
    with patch.object(RedshiftTableManager, "truncate_table") as truncate_table_patch:
        yield truncate_table_patch

def test_generate_table_deprecation_lkp(redshift_connection_patch, truncate_table_patch):
    generate_table_deprecation_lkp()
    truncate_table_patch.assert_called_with("admin", "table_deprecation_lkp")
    redshift_connection_patch.execute_db_command.assert_called_with(FLAG_UNUSED_TABLES)
The truncate_table assertion passes as expected. However, redshift_connection_patch isn't mocking as expected, and I keep ending up with an error like this:
py::test_generate_table_deprecation_lkp Failed: [undefined]AssertionError: expected call not found.
Expected: execute_db_command("\ninsert into admin
I can see my db call in redshift_connection_patch.mock_calls, but it's not appearing in method_calls, and the execute_db_command method is missing from the mock object altogether.
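A likely culprit, for future readers: patching the class makes redshift_connection_patch a mock of RedshiftConnection itself, but execute_db_command is called on the object produced by the with statement, i.e. on return_value.__enter__.return_value. A sketch of the assertion following that chain (assuming the patch target resolves correctly):
def test_generate_table_deprecation_lkp(redshift_connection_patch, truncate_table_patch):
    generate_table_deprecation_lkp()
    # RedshiftConnection() -> .return_value; `with ... as conn` -> .__enter__.return_value
    conn_mock = redshift_connection_patch.return_value.__enter__.return_value
    conn_mock.execute_db_command.assert_called_with(FLAG_UNUSED_TABLES)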

How to avoid a class recreating a client/session when re-instantiated in Python?

Let me lay out my case:
I'm interacting with S3 across multiple parts of my Python code, so I created a class called S3Interactor.
This is the class:
class S3Interactor(BaseInteractor):
    """Class that loads data into dataframes from S3."""

    def __init__(self, bucket_name: str, aws_region: str):
        self.bucket_name = bucket_name
        self.aws_region = aws_region
        self._s3_client = None

    @property
    def s3_client(self):
        if self._s3_client is None:
            self._s3_client = self.create_client()
        return self._s3_client

    @monitor(local_logger=logger)
    def create_client(self):
        """Create client that connects to loader interface"""
        ...  # body elided in the question
The problem is that this creates a bottleneck: every time I instantiate the class it opens a new session with S3, and there is some latency.
I want the @property client to be set already so I avoid this initial latency and just reuse the connection. I was thinking of using a singleton, but I'd like to hear more Pythonic approaches.
Any suggestions are welcome, thanks
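One Pythonic option is to memoize the client factory at module level, so every S3Interactor instance shares one client per region. A sketch, assuming boto3 (the helper name is made up; adjust the cache key if your clients differ by more than region):
from functools import lru_cache

import boto3

@lru_cache(maxsize=None)
def _shared_s3_client(aws_region: str):
    # one client per region, created on first use and reused afterwards
    return boto3.client("s3", region_name=aws_region)

class S3Interactor(BaseInteractor):
    def __init__(self, bucket_name: str, aws_region: str):
        self.bucket_name = bucket_name
        self.aws_region = aws_region

    @property
    def s3_client(self):
        # re-instantiating S3Interactor no longer pays the session-creation latency
        return _shared_s3_client(self.aws_region)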

How to express and enforce that a class has 2 modes of operation, each having some valid and invalid methods

I'm very new to type checking in Python. I'd like to find a way to use it to check for this common situation:

- The class (e.g. my DbQuery class) is instantiated in some uninitialized state: it's a db query-er, but it hasn't connected to a db yet. You could say (abstractly) the instance is of type 'Unconnected DB Query Connector'.
- The user calls .connect(), which sets the instance to connected. You can now think of the instance as belonging to a new category (protocol?): it is of type 'Connected DB Query Connector'.
- The user calls .query(), etc., and uses the class. The query method is annotated to express that self, in this case, must be a 'Connected DB Query Connector'.

In an incorrect usage, which I would like to detect automatically, the user instantiates the db connector and then calls query() without calling connect() first.
Is there a representation for this with annotations? Can I express that the connect() method has caused 'self' to join a new type, or is that even the right way to do it?
Is there some other standard mechanism for expressing this and detecting it in Python or mypy?
I might be able to see how this could be expressed with inheritance, maybe; I'm not sure.
Thanks in advance!
EDIT:
Here's what I wish I could do:
from typing import Protocol, cast

class Connector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"

# This is a version of class 'A' where conn is None and you can't call query()
class NoQuery(Protocol):
    conn: None

# This is a version of class 'A' where conn is initialized. You can query, but you can't call connect()
class CanQuery(Protocol):
    conn: Connector

# This class starts its life as a NoQuery. Should switch personality when connect() is called
class A(NoQuery):
    def __init__(self) -> None:
        self.conn = None

    def query(self: CanQuery, sql: str) -> str:
        return self.conn.run(sql)

    def connect(self: NoQuery, host: str) -> None:
        # Attempting to change from 'NoQuery' to 'CanQuery' like this, mypy complains:
        # Incompatible types in assignment (expression has type "CanQuery", variable has type "NoQuery")
        self = cast(CanQuery, self)
        self.conn = Connector(host)

a = A()
a.connect('host.domain')
print(a.query('SELECT field FROM table'))

b = A()
# mypy should help me spot this. I'm trying to query an unconnected host. self.conn is None
print(b.query('SELECT oops'))
For me, this is a common scenario (an object that has a few distinct and very meaningful modes of operation). Is there no way to express this in mypy?
You may be able to hack something together by making your A class a generic type, (ab)using Literal enums, and annotating the self parameter, but frankly I don't think that's a good idea.
Mypy in general assumes that calling a method won't change the type of an object, and circumventing that is probably not possible without resorting to gross hacks and a bunch of casts or # type: ignores.
Instead, the standard convention is to use two classes -- a "connection" object and a "query" object -- along with context managers. This, as a side benefit, would also let you ensure your connections are always closed once you're done using them.
For example:
from typing import Iterator
from contextlib import contextmanager

class RawConnector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"

    def close(self) -> None:
        print("Closing connection!")

class Database:
    def __init__(self, host: str) -> None:
        self.host = host

    @contextmanager
    def connect(self) -> Iterator["Connection"]:  # quoted: Connection is defined below
        conn = RawConnector(self.host)
        try:
            yield Connection(conn)
        finally:
            # close even if the body raises
            conn.close()

class Connection:
    def __init__(self, conn: RawConnector) -> None:
        self.conn = conn

    def query(self, sql: str) -> str:
        return self.conn.run(sql)

db = Database("my-host")
with db.connect() as conn:
    conn.query("some sql")
If you really want to combine these two new classes into one, you can, by (ab)using literal types, generics, and self annotations, and by keeping within the constraint that you can only ever return instances with new personalities.
For example:
# If you are using Python 3.8+, you can import 'Literal' directly from
# typing. But if you need to support older Pythons, you'll need to
# pip-install typing_extensions and import from there.
from typing import Generic, Optional, TypeVar, cast
from typing_extensions import Literal
from enum import Enum

class RawConnector:
    def __init__(self, host: str) -> None:
        self.host = host

    def run(self, sql: str) -> str:
        return f"I ran {sql} on {self.host}"

    def close(self) -> None:
        print("Closing connection!")

class State(Enum):
    Unconnected = 0
    Connected = 1

# Type aliases here for readability. We use an enum and Literal
# types mostly so we can give each of our states a nice name. We
# could have also created an empty 'State' class and then
# 'Unconnected' and 'Connected' subclasses: all that matters is we
# have one distinct type per state/per "personality".
Unconnected = Literal[State.Unconnected]
Connected = Literal[State.Connected]

T = TypeVar('T', bound=State)

class Connection(Generic[T]):
    # The self annotations are quoted: 'Connection' is still being defined here.
    def __init__(self: "Connection[Unconnected]") -> None:
        self.conn: Optional[RawConnector] = None

    def connect(self: "Connection[Unconnected]", host: str) -> "Connection[Connected]":
        self.conn = RawConnector(host)
        # Important! We *return* the new type!
        return cast("Connection[Connected]", self)

    def query(self: "Connection[Connected]", sql: str) -> str:
        assert self.conn is not None
        return self.conn.run(sql)

c1 = Connection()
c2 = c1.connect("foo")
c2.query("some-sql")

# Does not type check, since the types of c1 and c2 do not match the declared self types:
c1.query("bad")
c2.connect("bad")
Basically, it becomes possible to make a type act more or less as a state machine as long as we stick with returning new instances (even if at runtime, we always return just 'self').
With a little more cleverness/a few more compromises, you might even be able to get rid of the cast whenever you transition from one state to another.
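One way that could look (a sketch layered on the class above, not part of the original answer): have connect construct and return a fresh, correctly parameterized instance instead of casting self:
# inside class Connection(Generic[T]) from the example above:
    def connect(self: "Connection[Unconnected]", host: str) -> "Connection[Connected]":
        # build a new, correctly typed instance rather than casting self;
        # __new__ skips __init__, whose self annotation is Unconnected-only
        new: Connection[Connected] = Connection.__new__(Connection)
        new.conn = RawConnector(host)
        return new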
But tbh, I consider this sort of trick to be overkill/probably inappropriate for what you seem to be trying to do. I would personally recommend the two classes + contextmanager approach.

Python descriptors to chain methods

I'm trying to figure out how to chain class methods to improve a utility class I've been writing - for reasons I'd prefer not to get into :)
Now suppose I wanted to chain class methods on a class instance (in this case for setting the cursor), e.g.:
# initialize the class instance
db = CRUD(table='users', public_fields=['name', 'username', 'email'])
#the desired interface class_instance.cursor(<cursor>).method(...)
with sql.read_pool.cursor() as c:
db.cursor(c).get(target='username', where="omarlittle")
The part that's confusing is that I would prefer the cursor not to persist as an instance attribute after .get(...) has been called and has returned; I'd also like to require that .cursor(cursor) be called first.
class CRUD(object):
    def __init__(self, table, public_fields):
        self.table = table
        self.public_fields = public_fields

    def fields(self):
        return ', '.join([f for f in self.public_fields])

    def get(self, target, where):
        # this is strictly for illustration purposes, I realize all
        # the vulnerabilities this leaves me exposed to.
        query = "SELECT {fields} FROM {table} WHERE {target} = {where}"
        query = query.format(fields=self.fields(), table=self.table,
                             target=target, where=where)
        self.cursor.execute(query)

    def cursor(self, cursor):
        pass  # this is where I get lost.
If I understand what you're asking, what you want is for the cursor method to return some object with a get method that works as desired. There's no reason the object it returns has to be self; it can instead return an instance of some cursor type.
That instance could have a back-reference to self, or it could get its own copy of whatever internals are needed to be a cursor, or it could be a wrapper around an underlying object from your low-level database library that knows how to be a cursor.
If you look at the DB API 2.0 spec, or implementations of it like the stdlib's sqlite3, that's exactly how they do it: A Database or Connection object (the thing you get from the top-level connect function) has a cursor method that returns a Cursor object, and that Cursor object has an execute method.
So:
class CRUDCursor(object):
    def __init__(self, c, crud):
        self.crud = crud
        self.cursor = however_you_get_an_actual_sql_cursor(c)

    def get(self, target, where):
        # this is strictly for illustration purposes, I realize all
        # the vulnerabilities this leaves me exposed to.
        query = "SELECT {fields} FROM {table} WHERE {target} = {where}"
        query = query.format(fields=self.crud.fields(), table=self.crud.table,
                             target=target, where=where)
        self.cursor.execute(query)
        # you may want this to return something as well?

class CRUD(object):
    def __init__(self, table, public_fields):
        self.table = table
        self.public_fields = public_fields

    def fields(self):
        return ', '.join([f for f in self.public_fields])

    # no get method

    def cursor(self, cursor):
        return CRUDCursor(cursor, self)
However, there still seems to be a major problem with your example. Normally, after you execute a SELECT statement on a cursor, you want to fetch the rows from that cursor. You're not keeping the cursor object around in your "user" code, and you explicitly don't want the CRUD object to keep its cursor around, so… how do you expect to do that? Maybe get is supposed to return self.cursor.fetchall() at the end or something?
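If that's the intent, CRUDCursor.get could hand the rows back so the caller never needs the cursor itself (a sketch, assuming a DB API 2.0 cursor):
# inside CRUDCursor from the sketch above:
    def get(self, target, where):
        query = "SELECT {fields} FROM {table} WHERE {target} = {where}"
        query = query.format(fields=self.crud.fields(), table=self.crud.table,
                             target=target, where=where)
        self.cursor.execute(query)
        # return the fetched rows instead of leaking the cursor
        return self.cursor.fetchall()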
