Caching an instance's method indefinitely raises pylint warning - python

from functools import cache

class A:
    @cache
    def extremely_long_and_expensive_function(self) -> None:
        # series of instructions that MUST access self
        ...
Pylint complains as follows:
'lru_cache(maxsize=None)' or 'cache' will keep all method args alive indefinitely, including 'self' [pylint(method-cache-max-size-none)]
But I could not find a satisfying solution online that actually tells me how to cache that method without building some contrived Rube Goldberg machine.
How do I memoize expensive_function so that the method body runs exactly once and no more, no matter how many times I call it?
Others have suggested using @cached_property, but this is not a property, so it feels wrong to write A().expensive_function. It's a function that executes initialization commands that are not always needed in every instance, and it doesn't return anything, so it should not be a property.
Surely there's some simple way to do this that I'm missing; I don't want to believe that such a simple use case requires a Frankenstein reimplementation like the answer in https://stackoverflow.com/a/33672499/11558993.
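For what it's worth, one minimal sketch of the kind of thing that sidesteps the warning, assuming a plain run-once guard per instance is acceptable (this is an illustration, not a canonical answer):

class A:
    def __init__(self) -> None:
        self._expensive_done = False

    def extremely_long_and_expensive_function(self) -> None:
        # Run-once guard instead of functools.cache, so no module-level
        # cache ever holds a reference to self.
        if self._expensive_done:
            return
        # ... series of instructions that MUST access self ...
        self._expensive_done = True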

Related

Avoid type warnings when mocking objects in unit tests?

Assuming I have a function that takes a complex object and does something with it:
def foo(bar: SomeComplexObject):
    ...
In unit tests bar will be replaced by a mock object, but that of course now raises type warnings. Should I simply ignore or suppress these, or is there a proper way to deal with them (without changing the original function signature, of course)?
Update: I've seen now that this is an open issue on mypy, but it's been in that state for over two years. Has any consensus emerged on how to work around this?
I'm going to put my 2¢ in and say that you should type-check your test suite. It's still code, and static type checking will help you write better code faster.
That leaves the question of how. Ideally, if your function expects SomeComplexObject then you also pass in an instance thereof. Either by building one in your test fixtures, or by subclassing and overriding what you don't need. The best unit test is the one that operates on proper input.
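As a hedged illustration of the subclass-and-override route (the stub class and the overridden method name here are made up, not from the original question):

class StubComplexObject(SomeComplexObject):
    """A real subclass, so the type checker is satisfied, with the
    expensive collaborators overridden away."""

    def __init__(self):
        # Skip the heavyweight parent initialiser on purpose.
        self.loaded = False

    def load_remote_state(self):  # hypothetical expensive method being stubbed out
        return {}

def test_foo_with_stub():
    foo(StubComplexObject())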
That still leaves the case where this is impractical or we explicitly want to test how invalid input is handled. In that case just explicitly cast your mock to the type that mypy requires:
from typing import cast

def test_foo():
    mock_bar = cast(SomeComplexObject, MockBar())
    foo(mock_bar)

python mocking: mock.patch.object gotchas

I have been writing unit tests for over a year now, and have always used patch.object for pretty much everything (modules, classes, etc).
My coworker says that patch.object should never be used to patch an object in a module (i.e. patch.object(socket, 'socket')); instead, you should always use patch('socket.socket').
I much prefer the patch.object method, as it allows me to import modules and is more pythonic in my opinion. Is my coworker right?
Note: I have looked through the patch documentation and can't find any warnings on this subject. Isn't everything an object in Python?
There is no such requirement, and yes, everything is an object in Python.
It is nothing more than a style choice; do you import the module yourself or does patch take care of this for you? Because that's the only difference between the two approaches; either patch() imports the module, or you do.
For the code under test, I prefer mock.patch() to take care of this, as this ensures that the import takes place as the test runs, so I get a test failure rather than an error while loading the test module. All other modules are fair game.
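A small sketch of the equivalence being discussed (assuming the standard library's unittest.mock; the older standalone mock package behaves the same way here):

import socket
from unittest import mock

# Both spellings replace the same attribute; the only difference is who
# performs the import of the socket module.
with mock.patch('socket.socket') as fake_sock:
    assert socket.socket is fake_sock

with mock.patch.object(socket, 'socket') as fake_sock:
    assert socket.socket is fake_sock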
Looking at the mock source code, it really doesn't look like there is a difference.
To investigate I first looked at def patch and see that it does:
getter, attribute = _get_target(target)
return _patch(
    getter, attribute, new, spec, create,
    spec_set, autospec, new_callable, kwargs)
whereas patch.object does the same except: getter = lambda: target
Ok, so what does this _get_target do? It pretty much splits the string and calls _importer on the first part (making an object) and uses the string the same way as get_object.
_importer is a pretty simple mechanism to import from a module (using getattr for every "component"), and pretty clearly just makes an object as well.
So fundamentally, at the source level, there is not really any difference.
Case Closed

Good way to isolate tests that depend on an initializer

I've really tried to start isolating my unit tests so I can pinpoint where errors occur rather than having my entire screen turn red with failures when one thing goes wrong. It's been working in all instances except when something in an initializer fails.
Check out these tests:
@setup_directory(test_path)
def test_filename(self):
    flexmock(lib.utility.time).should_receive('timestamp_with_random').and_return(1234)
    f = SomeFiles(self.test_path)
    assert f.path == os.path.join(self.test_path, '1234.db')

@setup_directory(test_path)
def test_filename_with_suffix(self):
    flexmock(lib.utility.time).should_receive('timestamp_with_random').and_return(1234)
    f = SomeFiles(self.test_path, suffix='.txt')
    assert f.path == os.path.join(self.test_path, '1234.txt')
I'm mocking dependent methods so that the thing I'm testing is completely isolated. What you notice is that the class needs to be instantiated for every single test. If an error is introduced in the initializer, every single test fails.
This is the offending constructor that calls the class's initializer:
SomeFiles(*args)
Is there a way to isolate or mock the initializer or object constructor?
I'm not sure what testing packages you're using, but in general, you can usually just mock the __init__() call on the class before actually attempting to use it. Something like
def my_init_mock_fn(*args, **kwargs):
    print('mock_init')

SomeFiles.__init__ = my_init_mock_fn
SomeFiles()
This probably isn't exactly what you want, as from this point on SomeFiles.__init__ will always be the mock function, but there are utilities like voidspace mock that provide a patch function that allows you to patch the class just for a specific scope.
from mock import patch

with patch.object(SomeFiles, '__init__', my_init_mock_fn):
    SomeFiles()
    # ...other various tests...

SomeFiles()  # __init__ is reset to the original __init__ fn
I'm sure there's probably similar functionality in whatever mocking package you are using.
Just realized you're using flexmock; there's a documentation page for its replace_with feature.
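A hedged sketch only: if the goal is to keep a broken SomeFiles.__init__ from failing every test, flexmock's new_instances hook is one way to intercept construction entirely (the fake attributes below are invented for illustration):

from flexmock import flexmock

# Make SomeFiles(...) return a canned fake for the duration of this test,
# so the real __init__ never runs at all.
fake = flexmock(path='/tmp/1234.db')
flexmock(SomeFiles).new_instances(fake)

assert SomeFiles('/ignored').path == '/tmp/1234.db'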
What's causing the initialising function to fail? Maybe that's a bug that you should be looking into.
Another thing you can do, instead of mocking the object constructor, is simply mocking its return values. I.e.: given this input, I expect this output, so I'm going to use this expected output whether or not it was returned correctly.
You can also stop testing on the first failure (the failfast option).
You also might want to reconsider how your tests are set up. If you have to recreate two files for every test, ask yourself why. Could your tests be structured so that you set up the two files once, then run a series of tests, rinse and repeat? That would make it so only the series of tests assigned to that path fails, helping you isolate why it failed at all.
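For example, with plain unittest the shared, failure-prone setup could run once per group of tests (a hedged sketch; the path and class names are invented):

import unittest

class SomeFilesSuite(unittest.TestCase):
    @classmethod
    def setUpClass(cls):
        # Runs once for the whole series of tests, so a broken constructor
        # fails this suite's setup rather than every individual test body.
        cls.test_path = '/tmp/somefiles-tests'
        cls.files = SomeFiles(cls.test_path)

    def test_filename_starts_with_path(self):
        self.assertTrue(self.files.path.startswith(self.test_path))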

What is your strategy to avoid dynamic typing errors in Python (NoneType has no attribute x)?

I'm not sure if I like Python's dynamic-ness. It often results in me forgetting to check a type, trying to call an attribute and getting the NoneType (or any other) has no attribute x error. A lot of them are pretty harmless but if not handled correctly they can bring down your entire app/process/etc.
Over time I got better predicting where these could pop up and adding explicit type checking, but because I'm only human I miss one occasionally and then some end-user finds it.
So I'm interested in your strategy to avoid these. Do you use type-checking decorators? Maybe special object wrappers?
Please share...
forgetting to check a type
This doesn't make much sense. You so rarely need to "check" a type. You simply run unit tests and if you've provided the wrong type object, things fail. You never need to "check" much, in my experience.
trying to call an attribute and getting the NoneType (or any other) has no attribute x error.
Unexpected None is a plain-old bug. 80% of the time, I omitted the return. Unit tests always reveal these.
Of those that remain, 80% of the time, they're plain old bugs due to an "early exit" which returns None because someone wrote an incomplete return statement. These if foo: return structures are easy to detect with unit tests. In some cases, they should have been if foo: return somethingMeaningful, and in still other cases, they should have been if foo: raise Exception("Foo").
The rest are dumb mistakes from misreading the APIs. Generally, mutator functions don't return anything. Sometimes I forget. Unit tests find these quickly, since basically nothing works right.
That covers the "unexpected None" cases pretty solidly. Easy to unit test for. Most of the mistakes involve fairly trivial-to-write tests for some pretty obvious species of mistakes: wrong return; failure to raise an exception.
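A tiny, made-up illustration of the "early exit that returns None" species and the trivial test that catches it:

def classify(n):
    if n < 0:
        return "negative"
    if n > 0:
        return "positive"
    # Bug: the zero case falls off the end and implicitly returns None.

def test_classify_zero():
    # The unexpected None surfaces immediately as a failed assertion.
    assert classify(0) == "zero"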
Other "has no attribute X" errors are really wild mistakes where a totally wrong type was used. That's either really wrong assignment statements or really wrong function (or method) calls. They always fail elaborately during unit testing, requiring very little effort to fix.
A lot of them are pretty harmless but if not handled correctly they can bring down your entire app/process/etc.
Um... Harmless? If it's a bug, I pray that it brings down my entire app as quickly as possible so I can find it. A bug that doesn't crash my app is the most horrible situation imaginable. "Harmless" isn't a word I'd use for a bug that fails to crash my app.
If you write good unit tests for all of your code, you should find the errors very quickly when testing code.
You can also use decorators to enforce the type of attributes.
>>> @accepts(int, int, int)
... @returns(float)
... def average(x, y, z):
...     return (x + y + z) / 2
...
>>> average(5.5, 10, 15.0)
TypeWarning: 'average' method accepts (int, int, int), but was given
(float, int, float)
15.25
>>> average(5, 10, 15)
TypeWarning: 'average' method returns (float), but result is (int)
15
I'm not really a fan of them, but I can see their usefulness.
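The accepts/returns decorators shown above come from an old recipe; a minimal sketch of how such an argument-checking decorator could be written (warnings only; this is not the original recipe) might look like this:

import functools
import warnings

def accepts(*types):
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Warn (rather than fail) when a positional argument has an
            # unexpected type, in the spirit of the example above.
            for arg, expected in zip(args, types):
                if not isinstance(arg, expected):
                    warnings.warn(
                        "%s expected %s, got %s"
                        % (fn.__name__, expected.__name__, type(arg).__name__))
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@accepts(int, int, int)
def average(x, y, z):
    return (x + y + z) / 3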
One tool to try to help you keep your pieces fitting together well is interfaces. zope.interface is the most notable package in the Python world for using interfaces. Check out http://wiki.zope.org/zope3/WhatAreInterfaces and http://glyph.twistedmatrix.com/2009/02/explaining-why-interfaces-are-great.html to start to get an idea of how interfaces and z.i in particular work. Interfaces can prove very useful in a large Python codebase.
Interfaces are no substitute for testing. Reasonably comprehensive testing is especially important in highly dynamic languages like Python, where there are kinds of bugs that could not exist in a statically typed language. Tests will also help you catch the sorts of bugs that are not unique to dynamic languages. Fortunately, developing in Python means that testing is easy (thanks to the language's flexibility), and you have plenty of time to write the tests with the time you saved by using Python.
One advantage of TDD is that you end up writing code that is easier to write tests for.
Writing code first and then the tests can result in code that superficially works the same, but is much harder to write 100% coverage tests for.
Each case is likely to be different
It might make sense to have a decorator to check whether a particular parameter is None (or some other unexpected value) if you use it in a bunch of places.
Maybe it is appropriate to use the Null pattern - if the code is blowing up because you are setting the initial value to None, you could instead set the initial value to a null version of the object.
More and more wrappers can add up to quite a performance hit, though, so it's usually better to write code from the start that avoids the corner cases.
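As a hedged sketch of the first suggestion, a decorator that checks a particular parameter for None (the names here are invented):

import functools

def reject_none(param_name):
    """Fail fast when the named keyword argument is None."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            if kwargs.get(param_name) is None:
                raise ValueError(
                    "%s: %r must not be None" % (fn.__name__, param_name))
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@reject_none("config")
def start_service(*, config=None):
    return config["port"]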
forgetting to check a type
With duck typing, it shouldn't be necessary to check a type. But that's theory, in reality you will often want to validate input parameters (e.g. checking a UUID with a regex). For that purpose, I created myself some handy decorators for simple type and return type checking which are called like this:
@decorators.params(0, int, 2, str)    # first parameter must be an integer / third a string
@decorators.returnsOrNone(int, long)  # must return an int/long value or None
def doSomething(integerParam, noMatterWhatParam, stringParam):
    ...
For everything else I mostly use assertions. Of course one often forgets to check a parameter, so it's necessary to test and to test often.
trying to call an attribute
Happens to me very seldom. Actually I often use methods instead of direct access to attributes (the "good" old getter/setter approach sometimes).
because I'm only human I miss one occasionally and then some end-user finds it
"Software is always completed at the customers'." - An anti-pattern which you should solve with unit tests that handle all possible cases in a function. Easier said than done, but it helps...
As for other common Python mistakes (mistyped names, wrong imports, ...), I'm using Eclipse with PyDev for projects (not for small scripts). PyDev warns you about most of the simple kinds of mistakes.
I haven’t done a lot of Python programming, but I’ve done no programming at all in statically typed languages, so I don’t tend to think about things in terms of variable types. That might explain why I haven’t come across this problem much. (Although the small amount of Python programming I’ve done might explain that too.)
I do enjoy Python 3’s revised handling of strings (i.e. all strings are unicode, everything else is just a stream of bytes), because in Python 2 you might not notice TypeErrors until dealing with unusual real world string values.
You can hint your IDE via the function docstring, for example: http://www.pydev.org/manual_adv_type_hints.html; in JavaScript, jsDoc helps in a similar way.
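A hedged sketch of what such docstring hints can look like (Sphinx-style fields; whether and how an IDE picks them up depends on the version):

def make_greeting(name, excited):
    """Build a greeting string.

    :type name: str
    :type excited: bool
    :rtype: str
    """
    return name.upper() + "!" if excited else name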
But at some point you will face errors that a statically typed language would catch immediately, without unit tests, via the IDE's compilation and type inference.
Of course this does not remove the benefit of unit tests, static analysis and assertions. For larger projects I tend to use statically typed languages because they have very good IDE support (excellent autocompletion, heavy refactoring...). You can still use scripting or a DSL for some sub-part of the project.
Something you can use to simplify your code is using the Null Object Design Pattern (to which I was introduced in Python Cookbook).
Roughly, the goal with Null objects is to provide an 'intelligent' replacement for the often used primitive data type None in Python or Null (or Null pointers) in other languages. These are used for many purposes including the important case where one member of some group of otherwise similar elements is special for whatever reason. Most often this results in conditional statements to distinguish between ordinary elements and the primitive Null value.
This object just swallows attribute errors, so you can avoid checking for attributes' existence.
It's nothing more than
class Null(object):
    def __init__(self, *args, **kwargs):
        "Ignore parameters."
        return None

    def __call__(self, *args, **kwargs):
        "Ignore method calls."
        return self

    def __getattr__(self, mname):
        "Ignore attribute requests."
        return self

    def __setattr__(self, name, value):
        "Ignore attribute setting."
        return self

    def __delattr__(self, name):
        "Ignore deleting attributes."
        return self

    def __repr__(self):
        "Return a string representation."
        return "<Null>"

    def __str__(self):
        "Convert to a string and return it."
        return "Null"
With this, if you do Null("any", "params", "you", "want").attribute_that_doesnt_exists() it won't explode, but just silently become the equivalent of pass.
Normally you'd do something like
if obj.attr:
    obj.attr()
With this, you just do:
obj.attr()
and forget about it. Beware that extensive use of the Null object can potentially hide bugs in your code.
I tend to use
if x is None:
    raise ValueError('x cannot be None')
But this will only work with the actual None value.
A more general approach is to test for the necessary attributes before you try to use them. For example:
def write_data(f):
    # Here we expect f is a file-like object. But what if it's not?
    if not hasattr(f, 'write'):
        raise ValueError('write_data requires a file-like object')
    # Now we can do stuff with f that assumes it is a file-like object
The point of this code is that instead of getting an error message like "NoneType has no attribute write", you get "write_data requires a file-like object". The actual bug isn't in write_data(), and isn't really a problem with NoneType at all. The actual bug is in the code that calls write_data(). The key is to communicate that information as directly as possible.

Bad practice to run code in a constructor that's likely to fail?

My question is rather a design question.
In Python, if code in your "constructor" fails, the object ends up not being defined. Thus:
someInstance = MyClass("test123")  # let's say the constructor throws an exception
someInstance.doSomething()  # will fail: name someInstance is not defined
I do have a situation, though, where a lot of code copying would occur if I removed the error-prone code from my constructor. Basically my constructor fills a few attributes (via IO, where a lot can go wrong) that can be accessed with various getters. If I removed the code from the constructor, I'd have 10 getters with copy-pasted code that does something like:
is attribute really set?
do some IO actions to fill the attribute
return the contents of the variable in question
I dislike that, because all my getters would contain a lot of code. Instead of that I perform my IO operations in a central location, the constructor, and fill all my attributes.
What's a proper way of doing this?
There is a difference between a constructor in C++ and an __init__ method in Python. In C++, the task of a constructor is to construct an object. If it fails, no destructor is called. Therefore if any resources were acquired before an exception was thrown, the cleanup should be done before exiting the constructor. Thus, some prefer two-phase construction with most of the construction done outside the constructor (ugh).
Python has a much cleaner two-phase construction (construct, then initialize). However, many people confuse an __init__ method (initializer) with a constructor. The actual constructor in Python is called __new__. Unlike in C++, it does not take an instance, but returns one. The task of __init__ is to initialize the created instance. If an exception is raised in __init__, the destructor __del__ (if any) will be called as expected, because the object was already created (even though it was not properly initialized) by the time __init__ was called.
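A small demonstration of that behaviour (my own sketch; the class and messages are made up):

class Demo:
    def __new__(cls, *args, **kwargs):
        print("__new__: object constructed")
        return super().__new__(cls)

    def __init__(self, value):
        print("__init__: initializing")
        raise ValueError("initialization failed")

    def __del__(self):
        # Still runs, because __new__ already created the instance.
        print("__del__: cleaning up")

try:
    Demo(42)
except ValueError:
    print("caught the initialization error")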
Answering your question:
In Python, if code in your "constructor" fails, the object ends up not being defined.
That's not precisely true. If __init__ raises an exception, the object is created but not initialized properly (e.g., some attributes are not assigned). But at the time that it's raised, you probably don't have any references to this object, so the fact that the attributes are not assigned doesn't matter. Only the destructor (if any) needs to check whether the attributes actually exist.
Whats a proper way of doing this?
In Python, initialize objects in __init__ and don't worry about exceptions.
In C++, use RAII.
Update [about resource management]:
In garbage-collected languages, if you are dealing with resources, especially limited ones such as database connections, it's better not to release them in the destructor. This is because objects are destroyed in a non-deterministic way, and if you happen to have a loop of references (which is not always easy to tell), and at least one of the objects in the loop has a destructor defined, they will never be destroyed. Garbage-collected languages have other means of dealing with resources. In Python, it's the with statement.
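For instance, a deterministic release of a connection with the with statement (a minimal sketch using the standard library's sqlite3 and contextlib.closing):

import sqlite3
from contextlib import closing

# The connection is closed when the block exits, regardless of exceptions,
# instead of waiting for a destructor that may never run.
with closing(sqlite3.connect(":memory:")) as conn:
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.execute("INSERT INTO t VALUES (1)")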
In C++ at least, there is nothing wrong with putting failure-prone code in the constructor - you simply throw an exception if an error occurs. If the code is needed to properly construct the object, there really is no alternative (although you can abstract the code into subfunctions, or better into the constructors of subobjects). Worst practice is to half-construct the object and then expect the user to call other functions to complete the construction somehow.
It is not bad practice per se.
But I think you may be after something different here. In your example the doSomething() method will not be called when the MyClass constructor fails. Try the following code:
class MyClass:
    def __init__(self, s):
        print(s)
        raise Exception("Exception")

    def doSomething(self):
        print("doSomething")

try:
    someInstance = MyClass("test123")
    someInstance.doSomething()
except:
    print("except")
It should print:
test123
except
For your software design you could ask the following questions:
What should the scope of the someInstance variable be? Who are its users? What are their requirements?
Where and how should the error be handled for the case that one of your 10 values is not available?
Should all 10 values be cached at construction time or cached one-by-one when they are needed the first time?
Can the I/O code be refactored into a helper method, so that doing something similar 10 times does not result in code repetition?
...
I'm not a Python developer, but in general, it's best to avoid complex/error-prone operations in your constructor. One way around this would be to put a "LoadFromFile" or "Init" method in your class to populate the object from an external source. This load/init method must then be called separately after constructing the object.
One common pattern is two-phase construction, also suggested by Andy White.
First phase: Regular constructor.
Second phase: Operations that can fail.
Integration of the two: Add a factory method to do both phases and make the constructor protected/private to prevent instantiation outside the factory method.
Oh, and I'm not a Python developer either.
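For illustration, a rough Python sketch of that factory-method pattern (hedged; the class, file format, and names are invented):

class Config:
    def __init__(self, values):
        # Phase one: a trivial initializer that cannot fail.
        self._values = values

    @classmethod
    def from_file(cls, path):
        # Phase two: the failure-prone IO lives in the factory method.
        with open(path) as fh:
            values = dict(
                line.strip().split("=", 1) for line in fh if "=" in line)
        return cls(values)

# Usage: callers go through the factory, never the raw constructor.
# cfg = Config.from_file("settings.ini")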
If the code to initialise the various values is really extensive enough that copying it is undesirable (which it sounds like it is in your case), I would personally opt for putting the required initialisation into a private method, adding a flag to indicate whether the initialisation has taken place, and making all accessors call the initialisation method if it has not yet run.
In threaded scenarios you may have to add extra protection in case initialisation is only allowed to occur once for valid semantics (which may or may not be the case since you are dealing with a file).
Again, I've got little experience with Python; however, in C# it's better to try and avoid having a constructor that throws an exception. An example of why that springs to mind is if you want to place your constructor at a point where it's not possible to surround it with a try {} catch {} block, for example initialisation of a field in a class:
class MyClass
{
    MySecondClass mySecondClass = new MySecondClass();

    // Rest of class
}
If the constructor of MySecondClass throws an exception that you wish to handle inside MyClass, then you need to refactor the above - it's certainly not the end of the world, but a nice-to-have.
In this case my approach would probably be to move the failure-prone initialisation logic into an initialisation method, and have the getters call that initialisation method before returning any values.
As an optimisation you should have the getter (or the initialisation method) set some sort of "IsInitialised" boolean to true, to indicate that the (potentially costly) initialisation does not need to be done again.
In pseudo-code (C# because I'll just mess up the syntax of Python):
class MyClass
{
    private bool IsInitialised = false;
    private string myString;

    public void Init()
    {
        // Put initialisation code here
        this.IsInitialised = true;
    }

    public string MyString
    {
        get
        {
            if (!this.IsInitialised)
            {
                this.Init();
            }
            return myString;
        }
    }
}
This is of course not thread-safe, but I don't think multithreading is used that commonly in Python, so this is probably a non-issue for you.
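Roughly the same idea in Python, since the answer apologises for the C# (my own sketch, not the answerer's code):

class MyClass:
    def __init__(self):
        # The constructor stays trivial; nothing here can fail.
        self._initialised = False
        self._my_string = None

    def _init(self):
        # The failure-prone IO goes here instead of __init__.
        self._my_string = "loaded from disk"  # placeholder for real IO
        self._initialised = True

    @property
    def my_string(self):
        if not self._initialised:
            self._init()
        return self._my_string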
Seems Neil had a good point; my friend just pointed me to this:
http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization
which is basically what Neil said...
