I come from a ruby background and I am noticing some differences to python... In ruby, when I need to create a helper I usually go for a module, something like the below:
module QueryHelper
def db_client
#db ||= DBClient.new
end
def query
db_client.select('whateverquery')
end
end
In python tho, I do something like the following:
db_client = DBClient()
def query():
return db_client.select('whateverquery')
My only worry with the above is that every time I call query() function it will try to instantiate DBClient() over and over... but based on reading and testing, that does not seem to occur due to some caching mechanism in python when I import a module...
The question is if the above in python is bad practice, if so, why and how can it be improved? perhaps lazy evaluating it? Or if you guys believe it's ok as is...
No. The query function will not be re-instantiated every time you call it. This is because you've already created an instance of DBClient outside of the query function. This means that your current code is fine as is.
If your intention was to create a new instance of DBClient every time query is called, then you should just move the declaration into the query function, like this:
def query():
db_client = DBClient()
return db_client.select( ... )
In short you would like to add a method to the DBClient object? Why not adding it dynamically?
# defining the method to add
def query(self, command):
return self.select(command)
# Actually adding it to the DBClient class
DBClient.query = query
# Instances now come with the newly added method
db_client = DBClient()
# Using the method
return_command_1 = db_client.query("my_command_1")
return_command_2 = db_client.query("my_command_2")
Credits to Igor Sobreira.
Related
I have seen different "patterns" in handling this case so I am wondering if one has any drawbacks comapred to the other.
So lets assume that we wish to create a new object of class MyClass and add it to the database. We can do the following:
class MyClass:
pass
def builder_method_for_myclass():
# A lot of code here..
return MyClass()
my_object=builder_method_for_myclass()
with db.managed_session() as s:
s.add(my_object)
which seems that only keeps the session open for adding the new object but I have also seen cases where the entire builder method is called and executed within the managed session like so:
class MyClass:
pass
def builder_method_for_myclass():
# A lot of code here..
return MyClass()
with db.managed_session() as s:
my_object=builder_method_for_myclass()
are there any downsides in either of these methods and if yes what are they? Cant find something specific about this in the documentation.
When you build objects depending on objects fetched from a session you have to be in a session. So a factory function can only execute outside a session for the simplest cases. Usually you have to pass the session around or make it available on a thread local.
For example in this case to build a product I need to fetch the product category from the database into the session. So my product factory function depends on the session instance. The new product is created and added to the same session that the category is also in. An implicit commit should also occur when the session ends, ie the context manager completes.
def build_product(session, category_name):
category = session.query(ProductCategory).where(
ProductCategory.name == category_name).first()
return Product(category=category)
with db.managed_session() as s:
my_product = build_product(s, "clothing")
s.add(my_product)
I have the following function:
def create_tables(tables: list) -> None:
template = jinja2.Template(
open("/opt/airflow/etl_scripts/postgres/staging_orientdb_create_tables.sql", "r").read()
)
pg_hook = PostgresHook(POSTGRES_CONN_ID) # this is airflow module for Postgres
conn = pg_hook.get_conn()
for table in tables:
sql = template.render(TABLE_NAME=table)
with conn.cursor() as cur:
cur.execute(sql)
conn.commit()
Is there a solution to check the content of the "execute" or "sql" internal variable?
My test looks like this, but because I return nothing, I can't check:
def test_create_tables(mocker):
pg_mock = MagicMock(name="pg")
mocker.patch.object(
PostgresHook,
"get_conn",
return_value=pg_mock
)
mock_file_content = "CREATE TABLE IF NOT EXISTS {{table}}"
mocker.patch("builtins.open", mock_open(read_data=mock_file_content))
create_tables(tables=["mock_table_1", "mock_table_2"])
Thanks,
You cannot access internal variables of a function from the outside.
I strongly suggest refactoring your create_tables function then, if you already see that you want to test a specific part of its algorithm.
You could create a function like get_sql_from_table for example. Then test that separately and mock it out in your test_create_tables.
Although, since all it does is call the template rendering function form an external library, I am not sure, if that is even something you should be testing. You should assume/know that they test their functions themselves.
Same goes for the Postgres functions.
But the general advice stands: If you want to verify that a specific part of your code does what you expect it to do, factor it out into its own unit/function and test that separately.
Within a class in Python:
I have this variable that I pass through a function called "changers"
energy_L_gainer = changers('cap_largeover,','sec_energy','-change')
When I call upon this variable from another function, how can I get it to pass back through the function again to refresh the data in the variable. I have searched everywhere and tried all that my small mind could muster, and I cannot get it to work. Any ideas? Thank you all so much.
Can you rerun the class somehow within the function to refresh the variables?
I called upon this variable within another function using:
self.energy_L_gainer
Here is the changers function for reference:
def changers(cap, sec, sort):
screen = requests.get(f'https://finviz.com/screener.ashx?v=111&f={cap}{sec},geo_usa&ft=4&o={sort}', headers = headers).text
tables = pd.read_html(screen)
tables = tables[-2]
tables.columns = tables.iloc[0]
tables = tables[1:]
return tables
Make it a property on your class
#property
def energy_L_gainer(self):
return changers('cap_largeover,','sec_energy','-change')
Now any reference to self.energy_L_gainer will use the result of that function
I'm not quite sure I'm following you, but you probably need to use self.energy_L_gainer = instead of energy_L_gainer =. In Python those two aren't the same like they can be in some other languages.
I'm writing a CLI to interact with elasticsearch using the elasticsearch-py library. I'm trying to mock elasticsearch-py functions in order to test my functions without calling my real cluster.
I read this question and this one but I still don't understand.
main.py
Escli inherits from cliff's App class
class Escli(App):
_es = elasticsearch5.Elasticsearch()
settings.py
from escli.main import Escli
class Settings:
def get(self, sections):
raise NotImplementedError()
class ClusterSettings(Settings):
def get(self, setting, persistency='transient'):
settings = Escli._es.cluster\
.get_settings(include_defaults=True, flat_settings=True)\
.get(persistency)\
.get(setting)
return settings
settings_test.py
import escli.settings
class TestClusterSettings(TestCase):
def setUp(self):
self.patcher = patch('elasticsearch5.Elasticsearch')
self.MockClass = self.patcher.start()
def test_get(self):
# Note this is an empty dict to show my point
# it will contain childs dict to allow my .get(persistency).get(setting)
self.MockClass.return_value.cluster.get_settings.return_value = {}
cluster_settings = escli.settings.ClusterSettings()
ret = cluster_settings.get('cluster.routing.allocation.node_concurrent_recoveries', persistency='transient')
# ret should contain a subset of my dict defined above
I want to have Escli._es.cluster.get_settings() to return what I want (a dict object) in order to not make the real HTTP call, but it keeps doing it.
What I know:
In order to mock an instance method I have to do something like
MagicMockObject.return_value.InstanceMethodName.return_value = ...
I cannot patch Escli._es.cluster.get_settings because Python tries to import Escli as module, which cannot work. So I'm patching the whole lib.
I desperately tried to put some return_value everywhere but I cannot understand why I can't mock that thing properly.
You should be mocking with respect to where you are testing. Based on the example provided, this means that the Escli class you are using in the settings.py module needs to be mocked with respect to settings.py. So, more practically, your patch call would look like this inside setUp instead:
self.patcher = patch('escli.settings.Escli')
With this, you are now mocking what you want in the right place based on how your tests are running.
Furthermore, to add more robustness to your testing, you might want to consider speccing for the Elasticsearch instance you are creating in order to validate that you are in fact calling valid methods that correlate to Elasticsearch. With that in mind, you can do something like this, instead:
self.patcher = patch('escli.settings.Escli', Mock(Elasticsearch))
To read a bit more about what exactly is meant by spec, check the patch section in the documentation.
As a final note, if you are interested in exploring the great world of pytest, there is a pytest-elasticsearch plugin created to assist with this.
I love CherryPy's API for sessions, except for one detail. Instead of saying cherrypy.session["spam"] I'd like to be able to just say session["spam"].
Unfortunately, I can't simply have a global from cherrypy import session in one of my modules, because the cherrypy.session object isn't created until the first time a page request is made. Is there some way to get CherryPy to initialize its session object immediately instead of on the first page request?
I have two ugly alternatives if the answer is no:
First, I can do something like this
def import_session():
global session
while not hasattr(cherrypy, "session"):
sleep(0.1)
session = cherrypy.session
Thread(target=import_session).start()
This feels like a big kludge, but I really hate writing cherrypy.session["spam"] every time, so to me it's worth it.
My second solution is to do something like
class SessionKludge:
def __getitem__(self, name):
return cherrypy.session[name]
def __setitem__(self, name, val):
cherrypy.session[name] = val
session = SessionKludge()
but this feels like an even bigger kludge and I'd need to do more work to implement the other dictionary functions such as .get
So I'd definitely prefer a simple way to initialize the object myself. Does anyone know how to do this?
For CherryPy 3.1, you would need to find the right subclass of Session, run its 'setup' classmethod, and then set cherrypy.session to a ThreadLocalProxy. That all happens in cherrypy.lib.sessions.init, in the following chunks:
# Find the storage class and call setup (first time only).
storage_class = storage_type.title() + 'Session'
storage_class = globals()[storage_class]
if not hasattr(cherrypy, "session"):
if hasattr(storage_class, "setup"):
storage_class.setup(**kwargs)
# Create cherrypy.session which will proxy to cherrypy.serving.session
if not hasattr(cherrypy, "session"):
cherrypy.session = cherrypy._ThreadLocalProxy('session')
Reducing (replace FileSession with the subclass you want):
FileSession.setup(**kwargs)
cherrypy.session = cherrypy._ThreadLocalProxy('session')
The "kwargs" consist of "timeout", "clean_freq", and any subclass-specific entries from tools.sessions.* config.