When I write business logic, my code often depends on the current time. Take, for example, the algorithm that looks at each unfinished order and checks whether an invoice should be sent (which depends on the number of days since the job ended). In these cases, creating an invoice is not triggered by an explicit user action but by a background job.
Now this creates a problem for me when it comes to testing:
I can test invoice creation itself easily.
However, it is hard to create an order in a test and check that the background job identifies the correct orders at the correct time.
So far I have found two solutions:
In the test setup, calculate the job dates relative to the current date. Downside: the code becomes quite complicated, as there are no explicit dates written anymore. The business logic is sometimes quite complex for edge cases, so all these relative dates make it hard to debug.
I have my own date/time accessor functions, which I use throughout my code. In the test I just set a current date, and all modules get this date. So I can easily simulate an order created in February and check that the invoice is created in April. Downside: third-party modules do not use this mechanism, so it's really hard to integrate and test them.
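A minimal sketch of that second approach, assuming one hypothetical accessor (set_today()/today()) that every module uses instead of datetime.date.today(), and a made-up 30-day invoicing rule; the dates mirror the February/April scenario.

```python
import datetime

# Hypothetical app-wide accessor: every module asks today() for the current date
_fixed_today = None

def set_today(day):
    """Test hook: pin today() to a date, or pass None to use the real clock again."""
    global _fixed_today
    _fixed_today = day

def today():
    return _fixed_today if _fixed_today is not None else datetime.date.today()

INVOICE_DELAY_DAYS = 30  # hypothetical business rule

def orders_to_invoice(orders):
    """Return ids of uninvoiced orders whose job ended at least INVOICE_DELAY_DAYS ago."""
    current = today()
    return [
        order["id"]
        for order in orders
        if not order["invoiced"]
        and (current - order["job_ended"]).days >= INVOICE_DELAY_DAYS
    ]
```

A test can then write explicit dates, e.g. set_today(datetime.date(2010, 4, 1)) before running the job, and call set_today(None) in its cleanup.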
The second approach has been far more successful for me. Therefore I'm looking for a way to set the time that Python's datetime and time modules return. Setting the date is usually enough; I don't need to set the current hour or second (though this would be nice).
Is there such a utility? Is there an (internal) Python API that I can use?
Monkey-patching time.time is probably sufficient, actually, as it provides the basis for almost all the other time-based routines in Python. This appears to handle your use case pretty well, without resorting to more complex tricks, and it doesn't matter when you do it (aside from the few stdlib packages like Queue.py and threading.py that do from time import time in which case you must patch before they get imported):
>>> import datetime
>>> datetime.datetime.now()
datetime.datetime(2010, 4, 17, 14, 5, 35, 642000)
>>> import time
>>> def mytime(): return 120000000.0
...
>>> time.time = mytime
>>> datetime.datetime.now()
datetime.datetime(1973, 10, 20, 17, 20)
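A scoped variant of the same trick, as a sketch: unittest.mock.patch replaces time.time only inside the with block and puts the original back afterwards. order_age_seconds is a hypothetical function standing in for code that calls time.time() at call time.

```python
import time
from unittest import mock

def order_age_seconds(created_at):
    """Hypothetical app function: looks up time.time at call time, so it sees the patch."""
    return time.time() - created_at

# time.time is replaced only for the duration of the with block
with mock.patch("time.time", return_value=120000000.0):
    age = order_age_seconds(119999000.0)
```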
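Keep in mind that the session above depends on how datetime is implemented: in CPython 3, whose datetime module is implemented in C by default, datetime.datetime.now() reads the system clock itself rather than calling time.time, so patching time.time does not change it there. That said, in years of mocking objects for various types of automated testing, I've needed this approach only very rarely, as most of the time it's my own application code that needs the mocking, and not the stdlib routines. After all, you know they work already. If you encounter situations where your own code has to handle values returned by library routines, you may want to mock the library routines themselves, at least when checking how your own app will handle the timestamps.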
The best approach by far is to build your own date/time service routine(s) which you use exclusively in your application code, and build into that the ability for tests to supply fake results as required. For example, I do a more complex equivalent of this sometimes:
# in file apptime.py (for example)
import time as _time

class MyTimeService(object):
    def __init__(self, get_time=None):
        self.get_time = get_time or _time.time

    def __call__(self):
        return self.get_time()

time = MyTimeService()
Now in my app code I just do import apptime as time; time.time() to get the current time value, whereas in test code I can first do apptime.time = MyTimeService(mock_time_func) in my setUp() code to supply fake time results.
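Here is a self-contained sketch of that setUp() pattern. To keep it in one file, apptime is built in memory with types.ModuleType instead of living in apptime.py, and days_since() is a hypothetical app function; the fake clock is a lambda pinned to ten days after the epoch.

```python
import time as _time
import types
import unittest

# Stand-in for apptime.py, created in memory so the sketch is self-contained
apptime = types.ModuleType("apptime")

class MyTimeService(object):
    def __init__(self, get_time=None):
        self.get_time = get_time or _time.time

    def __call__(self):
        return self.get_time()

apptime.MyTimeService = MyTimeService
apptime.time = MyTimeService()

def days_since(ended_at):
    """Hypothetical app code: reads the clock only through apptime.time()."""
    return int((apptime.time() - ended_at) // 86400)

class DaysSinceTest(unittest.TestCase):
    def setUp(self):
        self._saved_service = apptime.time
        # Fake clock: exactly ten days after the epoch
        apptime.time = apptime.MyTimeService(lambda: 10 * 86400.0)

    def tearDown(self):
        # Put the real service back so other tests see the real clock
        apptime.time = self._saved_service

    def test_days_since(self):
        self.assertEqual(days_since(0.0), 10)
```

Run it with python -m unittest as usual; because app code looks up apptime.time on every call, swapping the attribute in setUp() is enough.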
Update: Years later there's an alternative, as noted in Dave Forgac's answer.
The freezegun package was made specifically for this purpose. It allows you to change the date for code under test. It can be used directly or via a decorator or context manager. One example:
from freezegun import freeze_time
import datetime

@freeze_time("2012-01-14")
def test():
    assert datetime.datetime.now() == datetime.datetime(2012, 1, 14)
For more examples see the project: https://github.com/spulec/freezegun
You can patch the system by creating a custom datetime module (even a fake one; see the example below) that acts as a proxy, and then inserting it into the sys.modules dictionary. From then on, every import of the datetime module will return your proxy.
There is still the caveat of the datetime class, especially when someone does from datetime import datetime; for that, you can simply add another proxy just for that class.
Here is an example of what I mean. Of course, it is just something I threw together in 5 minutes, and it may have several issues (for instance, the type of the datetime class is not correct), but hopefully it is already of some use.
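import sys
import datetime as datetime_orig

class DummyDateTimeModule(sys.__class__):
    """ Dummy class, for faking datetime module """
    def __init__(self):
        # Initialise as a real module object named "datetime"
        super(DummyDateTimeModule, self).__init__("datetime")
        # From now on, "import datetime" returns this proxy
        sys.modules["datetime"] = self

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself
        if attr == "datetime":
            return DummyDateTimeClass()
        else:
            return getattr(datetime_orig, attr)

class DummyDateTimeClass(object):
    def __getattr__(self, attr):
        # Forward everything to the real datetime class (override selected names here)
        return getattr(datetime_orig.datetime, attr)

dt_fake = DummyDateTimeModule()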
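A variant of the same sys.modules proxy that actually fakes something, under assumptions of my own: FakeDateTime is a hypothetical datetime subclass with a fixed now(), and the proxy is a types.ModuleType subclass. Because the proxy hands out a real class, from datetime import datetime also works after installation; code that imported datetime before install_proxy() keeps the original module.

```python
import sys
import types
import datetime as datetime_orig

class FakeDateTime(datetime_orig.datetime):
    """Hypothetical fake: a real datetime subclass whose now() is pinned."""
    fixed_now = datetime_orig.datetime(2012, 1, 14)

    @classmethod
    def now(cls, tz=None):
        return cls.fixed_now

class DateTimeModuleProxy(types.ModuleType):
    """Stands in for the datetime module; only the datetime class is replaced."""

    def __init__(self):
        super().__init__("datetime")

    def __getattr__(self, attr):
        # Only reached for names the proxy itself does not define
        if attr == "datetime":
            return FakeDateTime
        return getattr(datetime_orig, attr)

def install_proxy():
    sys.modules["datetime"] = DateTimeModuleProxy()

def uninstall_proxy():
    sys.modules["datetime"] = datetime_orig
```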
Finally, is it worth it?
Frankly speaking, I like your second solution much more than this one :-).
Yes, Python is a very dynamic language where you can do quite a lot of interesting things, but patching code this way always carries a certain degree of risk, even when we are talking about test code.
But mostly, I think the accessor function would make test patching more explicit, and your code would also be more explicit about what is going to be tested, which improves readability.
Therefore, if the change is not too expensive, I would go for your second approach.
I would use the helpers from the 'testfixtures' package to mock out the date, datetime or time calls you're making:
http://packages.python.org/testfixtures/datetime.html
Well, one way to do it is to dynamically patch the time/datetime module,
something like:
import datetime

class MyDatetime(datetime.datetime):
    @classmethod
    def now(cls, tz=None):
        # Return a fixed moment instead of reading the system clock
        return cls(2012, 1, 14, tzinfo=tz)

datetime.datetime = MyDatetime
print(datetime.datetime.now())
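A scoped version of this patch, as a sketch: FixedDatetime and current_year are hypothetical names, and unittest.mock.patch.object restores the real class when the with block ends. It also shows a limitation of patching the module attribute: a name bound with from datetime import datetime before patching still points at the real class.

```python
import datetime
from unittest import mock

# Bound before patching, so this name keeps referring to the real class
from datetime import datetime as early_bound_datetime

class FixedDatetime(datetime.datetime):
    """Hypothetical fake: now() always returns 2012-01-14 00:00."""

    @classmethod
    def now(cls, tz=None):
        return cls(2012, 1, 14, tzinfo=tz)

def current_year():
    # Looks up datetime.datetime at call time, so it sees the patch
    return datetime.datetime.now().year

with mock.patch.object(datetime, "datetime", FixedDatetime):
    patched_year = current_year()
    seen_by_early_binding = early_bound_datetime.now().year  # real clock
```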
There might be a few ways of doing this, like creating the orders (with the current timestamp) and then changing that value directly in the DB with some external process (assuming the data is in a DB).
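A sketch of the backdating idea with an in-memory SQLite database; the orders table, its columns and the 30-day rule are hypothetical. The order is created normally with today's date and then moved into the past with a direct UPDATE before the job's query runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, job_ended TEXT, invoiced INTEGER DEFAULT 0)"
)
# Create the order as the app would, with the current date
conn.execute("INSERT INTO orders (id, job_ended) VALUES (1, date('now'))")
# Then backdate it directly in the DB: the job now "ended" 45 days ago
conn.execute("UPDATE orders SET job_ended = date('now', '-45 days') WHERE id = 1")

def orders_due(conn, delay_days=30):
    """Hypothetical background-job query: uninvoiced orders whose job ended delay_days ago or earlier."""
    rows = conn.execute(
        "SELECT id FROM orders WHERE invoiced = 0 AND job_ended <= date('now', ?)",
        (f"-{delay_days} days",),
    )
    return [row[0] for row in rows]
```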
I'll suggest something else. Have you thought about running your application in a virtual machine, setting the time to, say, February, creating orders, and then just changing the VM's time? This approach is as close as you can get to the real-life situation.
Related
I want to define a bunch of config variables that can be imported in all the modules in my project. The values of those variables will be constant during runtime but are not known before runtime; they depend on the input. Usually I'd define a dict in my top module which would be passed to all functions and classes from other modules; however, I was thinking it may be cleaner to simply create a blank config.py module which would be dynamically filled with config variables by the top module:
# top.py
import config
config.x = x
# config.py
x = None
# other.py
import config
print(config.x)
I like this approach because I don't have to save the parameters as attributes of classes in my other modules, which makes sense to me because the parameters do not describe the classes themselves.
This works but is it considered bad practice?
The question as such may be disputed. But I would generally say yes, it's "bad practice", because the scope and impact of changes really get blurred. Note that the use case you're describing is really not about sharing configuration, but about different parts of the program (functions, objects, modules) exchanging data, and as such it's a bit of a variation on (meta)global variables.
Reading common configuration values could be fine, but changing them along the way... you may lose track of what happened where, and in which order, as modules get imported and values get modified. For instance, assume config.py and two modules, m1.py:
import config
print(config.x)
config.x=1
and m2.py:
import config
print(config.x)
config.x=2
and a main.py that just does:
import m1
import m2
import config
print(config.x)
or:
import m2
import m1
import config
print(config.x)
The state in which you find config in each module and really any other (incl. main.py here) depends on order in which imports have occurred and who assigned what value when. Even for a program entirely under your control, this may get confusing (and source of mistakes) rather quickly.
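The import-order effect can be reproduced in a single script. This sketch simulates config.py, m1.py and m2.py by building modules from source strings with a hypothetical load() helper (types.ModuleType plus exec) instead of real files, and registers them in sys.modules just as an import would.

```python
import sys
import types

def load(name, source):
    """Hypothetical helper: build a module from source and register it like an import would."""
    module = types.ModuleType(name)
    sys.modules[name] = module
    exec(source, module.__dict__)
    return module

M1 = "import config\nseen = config.x\nconfig.x = 1\n"
M2 = "import config\nseen = config.x\nconfig.x = 2\n"

def run(order):
    """Fresh config, then 'import' m1/m2 in the given order.

    Returns the value each module saw on import and the final config.x.
    """
    config = load("config", "x = None\n")
    seen = {}
    for name in order:
        seen[name] = load(name, M1 if name == "m1" else M2).seen
    for name in ("config", "m1", "m2"):
        sys.modules.pop(name, None)  # clean up the simulated modules
    return seen, config.x
```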
For runtime data and passing information between objects and modules (and your example is really that, not configuration that is predefined and shared between modules), I would suggest you look into describing the information in a custom state (config) object and passing it around through an appropriate interface. But really, a plain function/method argument may be all that is needed. The exact form depends on what exactly you're trying to achieve and on your overall design.
In your example, other.py behaves differently when it is called or imported before top.py. That may still seem obvious and manageable in a minimal example, but it really is not a very sound design. Anyone reading the code (including future you) should be able to follow its logic, and this IMO breaks its flow.
The most trivial (and procedural) version of what you've described (which I now hopefully grasp better) would be other.py recreating your current behavior:
def do_stuff(value):
    print(value)  # We did something useful here

if __name__ == "__main__":
    do_stuff(None)  # Could also use config with defaults
And your top.py, presumably the entry point that orchestrates imports and execution, doing:
import other
x = get_the_value()
other.do_stuff(x)
You can of course introduce an interface to configure do_stuff, perhaps a dict or a custom class, even with a default implementation in config.py:
class Params:
    def __init__(self, x=None):
        self.x = x
and your other.py:
import config

def do_stuff(params=config.Params()):
    print(params.x)  # We did something useful here
And in your top.py you can use:
import config
import other

params = config.Params(get_the_value())
other.do_stuff(params)
But you could also have any use case specific source of value(s):
class TopParams:
    def __init__(self, url):
        self.x = get_value_from_url(url)

params = TopParams("https://example.com/value-source")
other.do_stuff(params)
x could even be a property that you retrieve every time you access it, or lazily when needed and then cache. Again, it really is a matter of what you need to do.
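A sketch of the lazy-and-cached option using functools.cached_property (Python 3.8+), plus the fetch-on-every-access variant with a plain property. The fetch callable and the fetch_count counter are hypothetical additions so the behavior can be observed; in real code fetch would be something like get_value_from_url.

```python
from functools import cached_property

class TopParams:
    """x is fetched on first access and then cached on the instance."""

    def __init__(self, url, fetch):
        self.url = url
        self._fetch = fetch   # hypothetical callable that retrieves the value
        self.fetch_count = 0  # only here to make the caching visible

    @cached_property
    def x(self):
        self.fetch_count += 1
        return self._fetch(self.url)

class LiveParams(TopParams):
    """Alternative: a plain property re-fetches x on every access."""

    @property
    def x(self):
        self.fetch_count += 1
        return self._fetch(self.url)
```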
"Is it bad practice to modify attributes of one module from another module?"
It is considered bad practice: it violates the Law of Demeter, which in effect says "talk to friends, not to strangers".
Objects should expose behaviour and functions, but should HIDE the data.
DataStructures should EXPOSE data, but should not have any methods (which are exposed). The law of demeter does not apply to such DataStructures. OOP Purists might cover such DataStructures with setters and getters, but it really adds no value in Python.
There is a lot of literature about that, for example: https://en.wikipedia.org/wiki/Law_of_Demeter
And of course, a must-read: "Clean Code" by Robert C. Martin (Uncle Bob); check out his talks on YouTube as well.
For procedural programming it is perfectly normal to keep data in a DataStructure which does not have any (exposed) methods.
The procedures in the program work with that data. Consider using the attrs module (see https://www.attrs.org/en/stable/) for easy creation of such classes.
My preferred method for keeping config is (here without using attrs):
# conf_xy.py
"""
config is code - so why use damned parsers, textfiles, xml, yaml, toml and all that
if You just can use testable code as config that can deliver the correct types, etc.
as well as hinting in Your favorite IDE ?
Here, for demonstration without using attrs package - usually I use attrs (read the docs)
"""

class ConfXY(object):
    def __init__(self) -> None:
        self.x: int = 1
        self.z: float = get_z_from_input()
        ...

conf_xy = ConfXY()

# other.py
from conf_xy import conf_xy
...
y = conf_xy.x * 2
...
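For comparison, the same config object declared with the standard-library dataclasses module, which I am swapping in here for attrs (attrs is third-party and is not used in this sketch). get_z_from_input is a hypothetical placeholder that just returns a constant; default_factory calls it each time a ConfXY is created, like the __init__ above.

```python
from dataclasses import dataclass, field

def get_z_from_input() -> float:
    """Hypothetical placeholder for however z is really obtained."""
    return 0.5

@dataclass
class ConfXY:
    x: int = 1
    z: float = field(default_factory=get_z_from_input)

conf_xy = ConfXY()
```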
After a lot of research on the subject and a lot of thinking, I'm turning to you with this new question:
Is it possible to mock an entire library with Python? I would like the import of this library and all its packages/modules/etc. to work without having to define each element by hand with mock and sys.modules... :(
In my case, I use a library specific to my job, and I would like to be able to work at home on the parts of my code that don't depend on this library, without having to rewrite my imports.
Example:
"""Main file.
I define the mock here.
"""
mocked = MagicLibraryMock("mylib") # the dream
"""File with lib imports.
I can import anything and use it as a mock.
"""
import mylib
from mylib.a import b
from mylib.z import c
from mylib.a.e.r import x
foo = x()
bar = c.a.e.r.t.d()
bar.side_effect = [1, 2, 3]
bar()
I tried replacing sys.modules with a class derived from dict that overrides __getitem__. But the problem is that the import machinery also uses __iter__, and then it becomes much more complicated to return a MagicMock accordingly, knowing that directly modifying the import source code is not recommended (source).
In the end, I lose less time by extracting the imports from my application into sub-modules that take care of resolving them. That way I can intercept these imports more easily without cluttering my code.
It also makes for a better design.
Thanks for your help.
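For the record, a MagicLibraryMock along the lines of the "dream" above can be sketched with an import hook: a meta path finder placed first on sys.meta_path that answers for mylib and every dotted name below it, and a loader whose create_module() returns a MagicMock. This is a hypothetical helper of my own, not an existing library API, and it is only aimed at the import patterns in the example.

```python
import importlib.abc
import importlib.machinery
import sys
from unittest import mock

class MagicLibraryMock(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical helper: serve a MagicMock for `name` and every submodule below it."""

    def __init__(self, name):
        self.name = name
        # First on the meta path, so it wins over any real installation
        sys.meta_path.insert(0, self)

    def _covers(self, fullname):
        return fullname == self.name or fullname.startswith(self.name + ".")

    def find_spec(self, fullname, path, target=None):
        if not self._covers(fullname):
            return None  # let the normal finders handle everything else
        # is_package=True gives each mocked module a __path__, so submodules import
        return importlib.machinery.ModuleSpec(fullname, self, is_package=True)

    def create_module(self, spec):
        # The module object itself is a MagicMock: any attribute access works
        return mock.MagicMock(name=spec.name)

    def exec_module(self, module):
        pass  # nothing to execute; attributes are created on demand

    def uninstall(self):
        """Remove the finder and forget every mocked module."""
        if self in sys.meta_path:
            sys.meta_path.remove(self)
        for fullname in [n for n in sys.modules if self._covers(n)]:
            del sys.modules[fullname]
```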
In first.py, I imported the datetime library and called a method that is written in second.py. The method works well without importing the datetime library in second.py.
first.py
from datetime import datetime
import second

def method1(time):
    return datetime.strptime(time, "%Y/%m/%d")

a = method1("2019/08/01")
b = second.method2(a)
print(b)

second.py
def method2(para1):
    return para1.second
Output
0
Should second.py import datetime so that para1.second can work? Can someone explain the rationale behind this?
You only need to import modules explicitly when you need to use their names. In first.py, for example, you're using things from the datetime module directly and referring to them by name. So you do from datetime import datetime, which binds the datetime class, and then call datetime.strptime() on that class.
In second.py, however, you don't have to import datetime. This is because of how Python handles attributes: when you do para1.second, Python doesn't need to know exactly what type of object para1 is; it just checks whether it has an attribute called second. It does, so Python returns it. Nowhere in second.py are you referring to datetime directly, only indirectly, via a variable whose value was created from it.
Also consider that the datetime module does a lot of stuff on its own, and almost certainly imports other dependencies that you're not aware of and you're not importing yourself. But you can still use the datetime module, because you don't need to explicitly refer to those modules it's using behind the scenes. They're still in memory somewhere, and if you call certain methods from datetime, that code will still get executed, but you don't need to be directly aware of it.
Python usually uses duck typing (1). This means that instead of requiring a particular type for an object, it looks at the actual attributes the object has.
What this means in your case is that method2 does not care in the slightest whether you pass in a datetime object or not. All that's required is that the input para1 has a second attribute.
Importing datetime into second.py would be counter-productive. It wouldn't affect the operation of your method in any way, but it would pollute your namespace and set up an implication that isn't necessarily true.
(1) A notable counterexample is a sum of strings, e.g. sum(['a', 'b'], ''), which raises a TypeError telling you to use ''.join() instead. Aside from that, your own code can of course choose what to do as you see fit.
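A small demonstration of the duck typing point: method2 is the function from the question, applied to three unrelated objects that all happen to have a second attribute (SimpleNamespace stands in for an arbitrary non-datetime object). Only an object without that attribute fails, with an AttributeError.

```python
import datetime
from types import SimpleNamespace

def method2(para1):
    # No datetime import needed: only the .second attribute is used
    return para1.second

from_datetime = method2(datetime.datetime.strptime("2019/08/01", "%Y/%m/%d"))
from_time = method2(datetime.time(12, 30, 45))
from_namespace = method2(SimpleNamespace(second=7))  # not a datetime at all
```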
I spent the last few months rewriting my Python algorithm from scratch. One of my goals was to write perfectly documented code that is easy for "anyone" to read and understand.
In the same project folder I put a lot of different modules, and each module contains a class. I used classes as containers for functions and related variables, so that each class contains all the functions for a specific task, for example writing all the output results of the algorithm to Excel files.
Here is an example:
Algorithm.py
import os
import pandas as pd
import numpy as np
from Observer import Observer

def main(hdf_path):
    for hdf_file in os.listdir(hdf_path):
        filename = str(hdf_file.replace('.hdf', '.xlsx'))
        Observer.create_workbook(filename)
        dataframe = pd.read_hdf(os.path.join(hdf_path, hdf_file))
        years_array = dataframe.index.levels[0].values
        for year in years_array:
            year_mean = np.mean(dataframe.loc[year].values)
            Observer.mean_values = np.append(Observer.mean_values, year_mean)
        Observer.export_results()

if __name__ == "__main__":
    hdf_path = 'bla/bla/bla/'
    main(hdf_path)

Observer.py
import numpy as np
import openpyxl

class Observer:
    workbook = None
    workbookname = None
    mean_values = np.array([])

    @staticmethod
    def create_workbook(filename):
        Observer.workbook = openpyxl.Workbook()
        Observer.workbookname = filename
        # do other things

    @staticmethod
    def save_workbook():
        Observer.workbook.save('results_path' + Observer.workbookname)

    @staticmethod
    def export_results():
        # print Observer.mean_values values in different workbook cells
        # export result on a specific sheet
        pass
I hope you can understand from this simple example how I use classes in my project. For every class I define a lot of variables (workbook, for example), and I access them from other modules as if they were global variables. That way I can easily access them from anywhere without passing them to functions explicitly, because I can simply write Classname.varname.
My question is: is it bad design? Will it create some problems or performance slowdown?
Thanks for your help.
My question is: is it bad design?
Yes.
I can simply write Classname.varname.
You create a very strong coupling between classes when you enforce calling Classname.varname. The class that accesses this variable is now strongly coupled to Classname. This prevents you from changing the behavior the OOP way, by passing different parameters, and it complicates testing of the class, since you will be unable to mock Classname and use the mock instead of the "real" class.
This will result in code duplication when you try to run two pieces of very similar code in two parts of your app that differ only in these parameters. You will end up creating two almost identical classes, one using Workbook and the other using Notepad classes.
And remember the vicious cycle:
Hard to test code -> Fear of refactor -> Sloppy code
        ^                                     |
        |                                     |
        ---------------------------------------
Using proper objects, with the ability to mock them (and dependency injection), is going to guarantee your code is easily testable, and the rest will follow.
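A sketch of what that could look like for the Observer example, under my own assumptions: Observer is rewritten as an ordinary instance-based class, process() is a hypothetical stand-in for main() that receives the observer as a parameter, and export_results() only sets a flag instead of writing a workbook. Because the observer is injected, a test can pass a unittest.mock.Mock instead of the real class.

```python
from statistics import mean

class Observer:
    """Instance-based version: state lives on the object, not on the class."""

    def __init__(self):
        self.mean_values = []
        self.exported = False

    def record_mean(self, value):
        self.mean_values.append(value)

    def export_results(self):
        # Hypothetical: the real version would write the values to an Excel workbook
        self.exported = True

def process(yearly_values, observer):
    """Hypothetical stand-in for main(): the observer is passed in, not looked up by class name."""
    for year in sorted(yearly_values):
        observer.record_mean(mean(yearly_values[year]))
    observer.export_results()
```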
I am writing an app that depends heavily on dates and times. I want to have an injectable concept of now() and today(). I was thinking that I could write my own versions of these two functions that would check some central setting, which I will refer to as INJECTED_NOW. If INJECTED_NOW is None, the above functions would just return the values of datetime.datetime.now() and datetime.date.today(). However, if INJECTED_NOW has a datetime value, the above functions would use it to get now() and today().
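One possible way to store INJECTED_NOW, as a sketch: a contextvars.ContextVar (my choice, not something the question specifies), which keeps the value separate per thread and per asyncio task, so a test or a request middleware could set it without leaking into other requests. injected_now() is a hypothetical helper that sets the value and restores the previous one on exit; no database is involved.

```python
import contextlib
import contextvars
import datetime

# None means "use the real clock"
INJECTED_NOW = contextvars.ContextVar("INJECTED_NOW", default=None)

def now():
    injected = INJECTED_NOW.get()
    return injected if injected is not None else datetime.datetime.now()

def today():
    return now().date()

@contextlib.contextmanager
def injected_now(value):
    """Hypothetical helper: pin now()/today() inside the block, then restore the previous value."""
    token = INJECTED_NOW.set(value)
    try:
        yield value
    finally:
        INJECTED_NOW.reset(token)
```

A test would wrap its body in with injected_now(datetime.datetime(2012, 2, 1)):, and a middleware could wrap request handling the same way.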
I am wondering how I could store INJECTED_NOW so that it is mutable. I would like to be able to set it at the beginning of a test case and modify it before another test case. Similarly, I would like to be able to set it from the request, perhaps using middleware.
Does this approach make sense, and if so, how should I store INJECTED_NOW? I would like to avoid a DB access. Is there an alternate way of addressing this problem?
There's a recently released library called FreezeGun that lets you specify datetimes as you describe:
http://stevepulec.com/freezegun/
Here is a way to do it using mock, for more information about mock see the docs
# this should be the code you are testing
import datetime

def one_minute_ago():
    return (datetime.datetime.now() - datetime.timedelta(seconds=60)).time()

# this would be in your tests file
import sys
import unittest
from unittest import mock  # Python 3.3+; on older versions, the standalone "mock" package: import mock

class SomeTestcase(unittest.TestCase):
    def test_one_minute_ago(self):
        fake_now = datetime.datetime(2012, 12, 21, 11, 13, 13)
        with mock.patch('datetime.datetime', spec=datetime.datetime) as datetime_mock:
            datetime_mock.now.return_value = fake_now
            self.assertEqual(one_minute_ago(), datetime.time(11, 12, 13))

if __name__ == '__main__':
    sys.exit(unittest.main())
To test it just copy the code to a file and run it with Python.