I have a class that has several methods and I would like to write unit tests for them. The problem I'm facing is that this class has an __init__ method that queries a database, imagine something like this:
class MyClass:
    accepted_values = ['a', 'b', 'c']

    def __init__(self, database_name):
        self.database = database_name
        self.data = self.query_database()

    def query_database(self):
        data = query_this_database(self.database)
        # clean data
        return data

    def check_values_in_db(self, column_name):
        # look the column up on the data loaded in __init__
        column = self.data[column_name]
        if any(item not in self.accepted_values for item in column):
            print('Oh noes!')
        else:
            print('All good')
Now, given this, I would like to unit test the last method using some mock data, but I can't because if I initialize the class, it will want to query the database. This is further complicated by the fact that to actually make the query, one needs an API key, permissions, etc., which is exactly what I want to avoid during unit testing.
I'm relatively new to OOP and unit testing in general, so I'm not even sure if I structured my class properly: maybe the method query_database() should only be called at a later stage and not in __init__?
EDIT:
Asked to add some details so here goes:
This class belongs to an AWS lambda function that runs on a schedule. Every hour, this class queries the DB for the last hour, and checks a specific column against some pre-defined values.
If any value in the column does not belong to those pre-defined values, it sends an alert email. I would like to test this specific functionality without querying the database, using only mock values.
I edited the code accordingly to reflect what I mean.
You can still call a class method without making an instance of it, but you will have problems if you're trying to call attributes that you have defined in __init__.
You might also want to think about making a blank variable data outside of the __init__ and then call classinstance.data = classinstance.query_database() in your main code.
You can use unittest.mock to do this kind of stuff.
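As a rough sketch of what that could look like: patch.object can replace query_database while the instance is being constructed, so __init__ never touches the real database. The query_this_database stub, the make_instance helper and the dict-shaped fake data are assumptions made for this example, and check_values_in_db returns a bool here (instead of printing) only so that a test has something to assert on.

```python
from unittest.mock import patch


def query_this_database(database_name):
    # Hypothetical stand-in for the real query; it must never run in a unit test.
    raise RuntimeError("real database queried during a unit test")


class MyClass:
    accepted_values = ['a', 'b', 'c']

    def __init__(self, database_name):
        self.database = database_name
        self.data = self.query_database()

    def query_database(self):
        return query_this_database(self.database)

    def check_values_in_db(self, column_name):
        # Assumption: returns True when every value is accepted.
        column = self.data[column_name]
        return all(item in self.accepted_values for item in column)


def make_instance(fake_data):
    # query_database is replaced only inside the with-block, so __init__
    # stores fake_data instead of querying anything.
    with patch.object(MyClass, "query_database", return_value=fake_data):
        return MyClass("fake_db")


good = make_instance({"col": ["a", "b"]})
bad = make_instance({"col": ["a", "z"]})
```

Here good.check_values_in_db("col") can be asserted to be True and bad.check_values_in_db("col") to be False, with no API key or permissions involved.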
An uglier way is to override your class's init method in a mock class like:
class MyMockedClass(MyClass):
    def __init__(self):
        self.database = not_a_real_database()
        self.data = not_real_data()
and then use this new class in your tests.
For your last question, it depends on your project structure and the frameworks you might be using. You could ask on Code Review for advice on your project structure.
Related
I have a class that looks like the following
class A:
    communicate = set()

    def __init__(self):
        pass

    ...

    def _some_func(self):
        # ...some logic...
        self.communicate.add(some_var)
The communicate variable is shared among the instances of the class. I use it to provide a convenient way for the instances of this class to communicate with one another (they have some mild orchestration needed and I don't want to force the calling object to serve as an intermediary of this communication). However, I realized this causes problems when I run my tests. If I try to test multiple aspects of my code, since the python interpreter is the same throughout all the tests, I won't get a "fresh" A class for the tests, and as such the communicate set will be the aggregate of all objects I add to that set (in normal usage this is exactly what I want, but for testing I don't want interactions between my tests). Furthermore, down the line this will also cause problems in my code execution if I want to loop over my whole process multiple times (because I won't have a way of resetting this class variable).
I know I can fix this issue where it occurs by having the creator of the A objects do something like
A.communicate = set()
before it creates and uses any instances of A. However, I don't really love this because it forces my caller to know some details about the communication pathways of the A objects, and I don't want that coupling. Is there a better way for me to reset the communicate class variable of A? Perhaps some method I could call on the class rather than on an instance (like A.new_batch()) that would perform this resetting? Or is there a better way I'm not familiar with?
Edit:
I added a class method like
class A:
    communicate = set()

    def __init__(self):
        pass

    ...

    @classmethod
    def new_batch(cls):
        cls.communicate = set()

    def _some_func(self):
        # ...some logic...
        self.communicate.add(some_var)
and this works with the caller running A.new_batch(). Is this the way it should be constructed and called, or is there a better practice here?
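For reference, here is a minimal runnable sketch of the behaviour described above. It is only an illustration: _some_func takes some_var as a parameter here (an assumption, so the example is self-contained), and the values added are arbitrary.

```python
class A:
    communicate = set()  # shared by every instance of A

    @classmethod
    def new_batch(cls):
        # Rebind the class attribute to a fresh, empty set.
        cls.communicate = set()

    def _some_func(self, some_var):
        # self.communicate resolves to the class attribute because no
        # instance attribute of that name exists.
        self.communicate.add(some_var)


first, second = A(), A()
first._some_func("x")
second._some_func("y")
shared_before_reset = set(A.communicate)

A.new_batch()
shared_after_reset = set(A.communicate)
```

After the reset, existing instances also see the new empty set, since they look communicate up on the class each time. In a test suite, calling A.new_batch() from a setUp method or a fixture would keep tests from leaking state into each other.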
Say I have the following class and I want to test it.
class SearchRecommended:
    def __init__(self, request2template):
        self._r2t = request2template

    def handle(self, request: Request):
        return request.user().queries().add_recommendation_query().run(1).print(
            RecommendedSearchMedia(self._r2t(request))
        ).message(RecommendedSearchMessage)
The object returned by .user() belongs to the User "interface" and is database-related.
class User(Equalable, ABC):
    @abstractmethod
    def user_id(self):
        pass

    @abstractmethod
    def lang(self):
        pass

    @abstractmethod
    def queries(self) -> "UserQueries":
        pass

    @abstractmethod
    def subscriptions(self) -> "UserSubscriptions":
        pass

    @abstractmethod
    def notifications(self) -> "UserSubsNotifications":
        pass

    @abstractmethod
    def access(self) -> "UserAccess":
        pass

    def repr(self):
        return self.user_id()
UserQueries, UserSubscriptions, UserSubsNotifications, UserAccess are also base classes for database-interacting classes.
As far as I know, unit-tests are meant to be fast and shouldn't use the actual database connection.
Unit tests also shouldn't know too much about the inner structure of the code they are testing.
Mocking the whole database interaction layer is tedious, but mocking only methods used in the method being tested seems like "knowing too much" about the inner code.
Shouldn't my code in the .handle method be free to call whatever method it pleases from the User interface (or the object it is being mocked by) and the subsequent persistence layer classes (as long as those calls are correct for the given interfaces), unless I explicitly test for the order of the methods called?
Am I getting something wrong & what should I do?
Your handle method is not a good candidate for unit testing. The only thing handle does is interact with other code, and interactions with other code are better covered by integration testing.
The background is that with any kind of testing, your goal is to find bugs. With unit testing you try to find the bugs in the isolated code. But if you really isolate your code, what bugs are there to find?
The bugs in your code are more in the direction of "am I calling the proper methods of the other objects with the right arguments in the right order and will the return values be in the form that I expect them to be." All these questions will not be answered by unit-testing, but by integration testing instead.
Your unit tests need to make sure the class does what it is supposed to do.
To accomplish that, your class needs certain things in order to function, in this case a version of the class User.
Your class knows enough about User to call its methods and use their results, so your tests have to provide enough for those calls to work as expected.
Your mocks don't actually have to build a fake database or have real functionality; they just have to look like they do. If you only really care about making sure that the data layer is called in order, just have each step of the call chain set a variable to true or something, and verify that all of the variables are true at the end of the test. Not great, but it makes sure that this class calls the data layer as expected.
Long term, if you keep having to do something like this, make a test double for User and similar classes, and add functionality as needed.
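One way to get such a "looks like it works" double without writing it by hand is unittest.mock.MagicMock, which creates child mocks for any attribute or call on demand. The sketch below is only an illustration under assumptions: RecommendedSearchMedia and RecommendedSearchMessage are empty stand-ins for the real classes, and the template function is a made-up lambda.

```python
from unittest.mock import MagicMock


class RecommendedSearchMedia:
    # Hypothetical stand-in for the real class.
    def __init__(self, template):
        self.template = template


class RecommendedSearchMessage:
    # Hypothetical stand-in for the real class.
    pass


class SearchRecommended:
    def __init__(self, request2template):
        self._r2t = request2template

    def handle(self, request):
        return request.user().queries().add_recommendation_query().run(1).print(
            RecommendedSearchMedia(self._r2t(request))
        ).message(RecommendedSearchMessage)


# MagicMock answers every call in the chain with another mock, so no
# User, UserQueries, or database is needed.
request = MagicMock()
result = SearchRecommended(lambda req: "template").handle(request)

# Each step of the chain is reachable through .return_value.
run = request.user.return_value.queries.return_value \
    .add_recommendation_query.return_value.run
printed = run.return_value.print
```

From here a test can check, for example, run.assert_called_once_with(1) and that printed received a RecommendedSearchMedia. This does tie the test to the call chain inside handle, which is the trade-off discussed above.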
I am struggling to understand when it makes sense to use an instance method versus a static method. Also, I don't know if my functions are static since there is no @staticmethod decorator. Would I be able to access the class functions when I make a call to one of the methods?
I am working on a web scraper that sends information to a database. It's set up to run once a week. The structure of my code looks like this
# import libraries...

class Get:
    def build_url(url_parameter1, url_parameter2, request_date):
        return url_with_parameters

    def web_data(request_date, url_parameter1, url_parameter2):  # no use of self
        # using parameters pull the variables to look up in the database
        for a in db_info:
            url = Get.build_url(a, url_parameter2, request_date)
            x = requests.Session().get(url, proxies).json()
            # save data to the database
        return None

# same type of function for pulling the web data from the database and parsing it

if __name__ == '__main__':
    Get.web_data(request_date, url_parameter1, url_parameter2)
    Parse.web_data(get_date, parameter)  # to illustrate the second part of the scraper
That is the basic structure. The code is functional but I don’t know if I am using the methods (functions?) correctly and potentially missing out on ways to use my code in the future. I may even be writing bad code that will cause errors down the line that are impossibly hard to debug only because I didn’t follow best practices.
After reading about when class and instance methods are used, I cannot see why I would use them. If I want the url built or the data pulled from the website I call the build_url or get_web_data function. I don’t need an instance of the function to keep track of anything separate. I cannot imagine when I would need to keep something separate either, which I think is part of the problem.
The reason I think my question is different than the previous questions is: the conceptual examples to explain the differences don't seem to help me when I am sitting down and writing code. I have not run into real world problems that are solved with the different methods that show when I should even use an instance method, yet instance methods seem to be mandatory when looking at conceptual examples of code.
Thank you!
Classes can be used to represent objects, and also to group functions under a common namespace.
When a class represents an object, like a cat, anything that this object 'can do', logically, should be an instance method, such as meowing.
But when you have a group of static functions that are all related to each other or are usually used together to achieve a common goal, like build_url and web_data, you can make your code clearer and more organized by putting them under a static class, which provides a common namespace, like you did.
Therefore, in my opinion, the structure you chose is legitimate. It is worth considering, though, that static classes are more common in more strictly OOP languages like Java, while in Python it is more common to use modules for namespace separation.
This code doesn't need to be a class at all. It should just be a pair of functions. You can't see why you would need an instance method because you don't have a reason to instantiate the object in the first place.
The functions you have written in your code are instance methods, but they were written incorrectly.
An instance method must have self as its first parameter, i.e.
def build_url(self, url_parameter1, url_parameter2, request_date):
Then you call it like this:
get_inst = Get()
get_inst.build_url(url_parameter1, url_parameter2, request_date)
The self argument is supplied by Python and allows you to access all attributes and functions, static or not, of your Get class.
If you don't need to access other functions or attributes of your class, then you add the @staticmethod decorator and remove the self parameter:
@staticmethod
def build_url(url_parameter1, url_parameter2, request_date):
And then you can call it directly on the class
Get.build_url(url_parameter1, url_parameter2, request_date)
or call it from an instance of the class
get_inst = Get()
get_inst.build_url(url_parameter1, url_parameter2, request_date)
But what is the problem with your current code, you might ask?
Try calling it from an instance like this and you will see the problem:
get_inst = Get()
get_inst.build_url(url_parameter1, url_parameter2, request_date)
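To make the problem concrete, here is a small sketch that can be run as is. The URL format, the GetStatic class name and the call_from_instance helper are made up for the illustration; only the missing self and the missing decorator mirror the code in the question.

```python
class Get:
    # No self and no @staticmethod, as in the question.
    def build_url(url_parameter1, url_parameter2, request_date):
        return f"{url_parameter1}/{url_parameter2}?date={request_date}"


class GetStatic:
    @staticmethod
    def build_url(url_parameter1, url_parameter2, request_date):
        return f"{url_parameter1}/{url_parameter2}?date={request_date}"


def call_from_instance(cls):
    """Call build_url on an instance; return the result or the error name."""
    try:
        return cls().build_url("a", "b", "2020-01-01")
    except TypeError as exc:
        # Python passed the instance as an extra first argument.
        return type(exc).__name__


via_class = Get.build_url("a", "b", "2020-01-01")
via_plain_instance = call_from_instance(Get)
via_static_instance = call_from_instance(GetStatic)
```

Calling through the class works in both cases, but calling through an instance of Get fails with a TypeError because the instance is passed as an additional first argument. With @staticmethod both ways of calling behave the same.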
Example where creating an instance is useful:
Let's say you want to make a chat client.
You could write code like this
class Chat:
    def send(server_url, message):
        connection = connect(server_url)
        connection.write(message)
        connection.close()

    def read(server_url):
        connection = connect(server_url)
        message = connection.read()
        connection.close()
        return message
But a much cleaner and better way to do it:
class Chat:
    def __init__(self, server_url):
        # Initialize connection only once when instance is created
        self.connection = connect(server_url)

    def __del__(self):
        # Close connection only once when instance is deleted
        self.connection.close()

    def send(self, message):
        self.connection.write(message)

    def read(self):
        return self.connection.read()
To use that last class you do
# Create new instance and pass server_url as argument
chat = Chat("http://example.com/chat")
chat.send("Hello")
chat.read()
# deleting the last reference to chat causes __del__ to be called and the connection to be closed
del chat
From the given example, there is no need for the Get class at all, since you are using it just as an additional namespace. You do not have any 'state' that you want to preserve, in either the class or an instance of it.
A good alternative is to put these functions in a separate module. That way, when you import the module, you get the namespace that you want.
A somewhat simple question, but I'd like some help understanding exactly how classes work in Python. Specifically, when are class variables set, and are they overwritten every time an instance of that class is created?
Here is the general situation. I have a table of data that I would like all instances of a certain class to be able to access. The problem is that this table sits on a database server. I want to connect to the database automatically and retrieve this table into memory the first time an instance of this class is created, but not any time after that.
It seems like a class variable would work, but I just wanted to verify that this would be the case. My program is going to be creating hundreds of instances of this class so I definitely don't want to be accidentally making hundreds of database calls for the same table or storing hundreds of copies of the same table in memory.
Would I be able to write something like the following? Assume that "get_table_from_db" is a function that connects to a database and returns a pandas dataframe with the relevant table.
class ExampleClass:
    table1 = get_table_from_db(arguments)

    def __init__(self, x, y):
        self.value = self.table1.iloc[x, y]

obj1 = ExampleClass(0, 0)
obj2 = ExampleClass(0, 1)
Basically, when I run this script, I want to run "get_table_from_db" once, either before or when obj1 is created and store the relevant dataframe to table1. When obj2 is created, I do not want to call "get_table_from_db" again because table1 already exists. However, I still want it to have access to table1 for some of its methods because it will need that data available to it (as an example, I assigned self.value equal to a specific cell in the table, but my actual code is going to be more complicated than that).
Also, a similar question. If the above method works, can I then reset a class variable from within the class? For example, will something like the following allow me to modify (and by modify I mean redownload) the class variable for all instances of the class?
class ExampleClass:
    table1 = get_table_from_db(arguments)

    def __init__(self, x, y):
        self.value = self.table1.iloc[x, y]

    def reset_table(self):
        ExampleClass.table1 = get_table_from_db(arguments)
The class variable will be initialized once, when the class is declared (so before either obj1 or obj2 are created), so your code will work the way you think.
With regards to your second question, you should do it like this:
class ExampleClass:
    @classmethod
    def reset_table(cls):
        cls.table1 = get_table_from_db(arguments)
That way you can call either obj.reset_table() or ExampleClass.reset_table() without needing an instance of ExampleClass.
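A quick way to convince yourself of this is to count the calls. In the sketch below, get_table_from_db is a hypothetical stand-in that returns a nested list instead of a pandas dataframe and records every call in a list, so no database or pandas is needed.

```python
calls = []


def get_table_from_db():
    # Hypothetical stand-in for the real database call.
    calls.append("called")
    return [[10, 11], [20, 21]]


class ExampleClass:
    # Runs exactly once, when the class body is executed.
    table1 = get_table_from_db()

    def __init__(self, x, y):
        # Plain list indexing here; the real code would use .iloc[x, y].
        self.value = self.table1[x][y]

    @classmethod
    def reset_table(cls):
        cls.table1 = get_table_from_db()


obj1 = ExampleClass(0, 0)
obj2 = ExampleClass(0, 1)
calls_after_two_instances = len(calls)

ExampleClass.reset_table()
calls_after_reset = len(calls)
```

Creating the two instances does not trigger another call, both instances share the same table object, and reset_table triggers exactly one more call. Note that self.value was copied out of the old table in __init__, so it is not refreshed by the reset.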
I am currently trying to implement a Python class that automatically synchronizes with a NoSQL database with implicit buffering, much like SQLAlchemy does.
In order to do this, I need to track attribute updates issued by the user and, on each attribute update, call functions that keep that object in synchronization with the database or buffer.
What is the best way of doing this in Python? If it goes through __setattr__ and __delattr__, how do I do it correctly, to avoid interfering with the garbage collector?
One way to do it (the way I would recommend) is to use descriptors.
First you make a class for your properties, something like:
class Property:
    def __init__(self, *args, **kwargs):
        # initialize the property with any information it needs to do get and set
        pass

    def __get__(self, obj, type=None):
        # logic to get from database or cache
        pass

    def __set__(self, obj, value):
        # logic to set the value and sync with database if necessary
        pass
And then in your entity class you have something like this:
class Student:
    student_id = Property(...)
    name = Property(...)
    classes = Property(...)
Of course in practice you may have multiple Property types. My guess is that SQLAlchemy does something like this, where Column types are descriptors.
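To make the idea a bit more concrete, here is a small runnable sketch of a descriptor that records which attributes were changed. It is an illustration only: the TrackedProperty name, the _dirty set and the flush method are made up for this example, and flush just returns the pending changes where a real implementation would write them to the database or buffer.

```python
class TrackedProperty:
    def __set_name__(self, owner, name):
        # Called once per attribute when the owning class is created.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not on an instance
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        # Store the value on the instance and remember that it changed.
        obj.__dict__[self.name] = value
        obj.__dict__.setdefault("_dirty", set()).add(self.name)


class Student:
    student_id = TrackedProperty()
    name = TrackedProperty()

    def flush(self):
        """Return and clear pending changes (a real class would sync them)."""
        dirty = self.__dict__.get("_dirty", set())
        pending = {attr: self.__dict__[attr] for attr in dirty}
        dirty.clear()
        return pending


student = Student()
student.name = "Ada"
first_flush = student.flush()
second_flush = student.flush()
```

Because the values live in each instance's own __dict__, the descriptor holds no references to instances, so it does not keep objects alive or get in the way of garbage collection.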