I'm getting a bit of a headache trying to figure out how to organise modules and classes together. Coming from C++, I'm used to classes encapsulating all the data and methods required to process that data. In Python, however, there are also modules, and from code I have looked at, some people keep a lot of loose functions in modules, whereas others almost always bind their functions to classes as methods.
For example say I have a data structure and would like to write it to disk.
One way would be to implement a save method for that object so that I could just type
MyObject.save(filename)
or something like that. Another approach I have seen in roughly equal proportion is to have something like
from myutils import readwrite
readwrite.save(MyObject,filename)
This is a small example, and I'm not sure how python specific this problem is at all, but my general question is what is the best pythonic practice in terms of functions vs methods organisation?
It seems like loose functions bother you. This is the Python way. It makes sense because a module in Python is really just an object on the same footing as any other object. It does have language-level support for loading it from a file, but other than that, it's just an object.
So if I have a module foo.py:
import pprint

def show(obj):
    pprint.pprint(obj)
Then when I import it from bar.py
import foo

class fubar(object):
    # code
    def method(self, obj):
        # more stuff
        foo.show(obj)
I am essentially accessing a method on the foo object. The data attributes of the foo module are just the globals that are defined in foo. A module is the language-level implementation of a singleton, without the need to prepend self to every method's argument list.
I try to write as many module-level functions as possible. If some function will only work with an instance of a particular class, I will make it a method on the class. Otherwise, I try to make it work on instances of every class defined in the module for which it would make sense.
The rationale behind the exact example that you gave is that if each class has a save method, then if you later change how you are saving data (say, from the filesystem to a database or a remote XML file), you have to change every class. If each class instead implements an interface to yield the data that it wants saved, then you can write one function to save instances of every class and only change that function once. This is known as the Single Responsibility Principle: each class should have only one reason to change.
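To make that concrete, here is a minimal sketch of the idea (the Record class, to_dict method, and save function are made-up names, not from the question): each class only yields the data it wants persisted, and a single module-level function knows how to store it.
import json

class Record(object):
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def to_dict(self):
        # Yield only the data this object wants saved.
        return {'name': self.name, 'value': self.value}

def save(obj, filename):
    # The one place that knows *how* data is stored; switching from the
    # filesystem to a database or remote XML means changing only this function.
    with open(filename, 'w') as f:
        json.dump(obj.to_dict(), f)
If the storage backend changes, save is the only thing that needs to be touched.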
If you have a regular old class you want to save to disk, I would just make it an instance method. If it were a serialization library that could handle different types of objects I would do the second way.
Related
Suppose I have a project in ~/app/, containing at least files myclass.py, myobject.py, and app.py.
In myclass.py I have something like
class myclass(object):
    # class attributes and methods...
In myobject.py, I have something like
from app import myclass
attribute1 = 1
attribute2 = 2
myobject = myclass(attribute1, attribute2)
Finally, app.py looks something like
from app import myobject
# do stuff with myobject
In practice, I'm using myobject.py to gather a common instance of myclass and make it easily importable, so I don't have to define all the attributes separately. My question is on the convention of myobject.py. Is this okay, or is there something that would better achieve this purpose? The concern I have is that there are all these other variables (in this case attribute1 and attribute2) which are just... there... in the myobject module. It feels a little weird because these aren't things that would ever be accessed individually, yet they are still accessible. I feel like there's some other conventional way to do this. Is this perfectly fine, or am I right to have concerns (and if so, how do I fix it)?
Edit: To make it more clear, here is an example: I have a Camera class which stores the properties of the lens and CCD and such (like in myclass.py). So users are able to define different cameras and use them in the application. However, I want to allow them to have some preset cameras, thus I define objects of the Camera class that are specific to certain cameras I know are commonly used for this application (like in myobject.py). So when they run the application, they can just import these preset cameras (as Camera objects) (like in app.py). How should these preset objects be written, if how it's written in myobject.py is not the best way?
This approach fails when you need to call functions inside the class in the first case. I think you can do it by making a class for the attributes and getting the variables from it:
class Attribute(object):
    def __init__(self, a1, a2):
        self.a1 = a1
        self.a2 = a2

att = Attribute(1, 2)
print(att.a1)
It looks like you stumbled upon the singleton pattern. Essentially, your class should only ever have one instance at any time, most likely to store global configuration or some similar purpose. In Java, you'd implement this pattern by making the constructor private, and have a static method (e.g. getInstance()) that returns a private static instance.
For Python, it's actually quite tricky to implement singletons. You can see some discussion about that subject here. To me how you're doing it is probably the simplest way to do it, and although it doesn't strictly enforce the singleton constraint, it's probably better for a small project than adding a ton of complexity like metaclasses to make 'true' singletons.
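For what it's worth, the preset-camera idea from the question maps directly onto a small module of instances; a rough sketch with made-up attribute and preset names (assuming the Camera class lives in myclass.py):
# presets.py
from myclass import Camera

WIDE_ANGLE = Camera(focal_length=24, pixel_size=4.35)
TELEPHOTO = Camera(focal_length=200, pixel_size=4.35)
The application then just does from presets import WIDE_ANGLE, and the intermediate attribute1/attribute2-style variables never need to exist at module level.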
There are many questions related to the use of the Singleton pattern in python, and although this question might repeat many of the aspects already discussed, I have not found the answer to the following specific question.
Let's assume I have a class MyClass which I want to instantiate only exactly once. In python I can do this as follows in the code myclass.py:
class MyClass(object):
    def foo(self):
        ...

instance = MyClass()
Then in any other program I can refer to the instance simply with
import myclass
myclass.instance.foo()
Under what circumstances is this approach enough? Under what circumstances is the use of a Singleton pattern useful/mandatory?
The singleton pattern is more often a matter of convenience than of requirement. Python is a little bit different from other languages in that it is fairly easy to mock out singletons in testing (just clobber the global variable!), but it is nevertheless a good idea to ask yourself when creating a singleton: am I doing this for the sake of convenience, or because it is strictly necessary that there is only one instance? Is it possible that there may be more than one in the future?
If you create a class that really will be only constructed once, it may make more sense to make the state a part of the module, and to make its methods into module-level functions. If there is a possibility that the assumption of exactly one instance may change in the future, then it is often much better to pass the singleton instance around rather than referencing the singleton through a global name.
For example, you can just as easily implement a "singleton" this way:
if __name__ == '__main__':
    instance = MyClass()
    doSomethingWith(instance)
In the above, "instance" is singleton by virtue of the fact that it is constructed only once, but the code that handles it is provided the instance rather than referencing module.instance, which makes it easier to reuse pieces of the code if, in some future situation, you need more than one MyClass.
Assuming you want to use a module as a singleton as Michael Aaron Safyan suggests, you can make it work even if the module isn't imported by the main code by doing something like the following (in the main code or a module it imports directly or indirectly). What it does is create an instance class attribute initialized to a single instance of the class, and then replace the module object in sys.modules with that instance:
class _MyClass(object):
    def foo(self):
        print('foo()')

_MyClass.instance = _MyClass()

import sys
_ref = sys.modules[__name__]  # Keep a reference to the current module so it's not deleted
sys.modules[__name__] = _MyClass.instance
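For illustration, if that snippet lived in a module named mysingleton.py (a made-up name), client code would treat the instance exactly as if it were the module:
import mysingleton   # the import machinery hands back the _MyClass instance

mysingleton.foo()    # prints foo()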
I've found singletons a useful way to implement "registers" of things when it makes sense to have only one (registry) -- such as a group of classes for a class factory, a group of constants, or a bundle of configuration information. In many cases just a regular Python module will do fine because, by default, modules are effectively already singletons, due to the fact that those already loaded get cached in the sys.modules dictionary.
Occasionally, however, class instances are preferable because they can be passed construction parameters and have properties -- something built-in module objects don't and can't be made to possess. Limitations like that can be worked around using the trick shown above, which effectively turns custom class instances into module objects.
The idea of using class instances as module objects is from Alex Martelli's ActiveState recipe named Constants in Python.
In my humble opinion, there are two sides to the singleton pattern.
you want a single context for a given service because more than one does not make sense.
you want to absolutely prevent people from creating two objects of a given type because it might break your service
While the first case may have some applications (logging service), the second one is often the sign of a bad design.
You should design your API so that your users should not have to think about this problem. But if they dig through your undocumented layers to find your hidden constructor and want to use it for whatever reason, they should not have to deal with useless constructs created to prevent them from doing what they need to do.
Suppose I have an instance method that contains a lot of nested conditionals. What would be a good way to encapsulate that code? Put it in another instance method of the same class, or in a function? Could you say why a certain approach is preferred?
If the function is only used by one class, and especially if the module has more classes with potentially more utility functions (each used only by one class), it might clarify things a bit if you kept the functions as static methods instead, to make it obvious which class they belong to. Also, automated refactorings (using e.g. the rope library, or PyCharm or PyDev etc.) then automatically move the static method along with the class to wherever the class is moved.
P.S. Static methods, unlike module-level functions, can be overridden in subclasses, e.g. in the case of a mathematical formula that doesn't depend on the object but does depend on the type of the object.
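A small sketch of that last point, with made-up names: the formula never touches self, yet each subclass can supply its own version, which a plain module-level function could not do.
class Shape(object):
    @staticmethod
    def area(width, height):
        return width * height

class Triangle(Shape):
    @staticmethod
    def area(width, height):
        return width * height / 2

print(Shape.area(3, 4))      # 12
print(Triangle.area(3, 4))   # 6.0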
There are two different questions here. The first one is what to do with multiple nested conditionals. There's no single right answer: it depends on your coding style, how the conditions interact, the architecture of your program and so on. Have a look at this Programmers.SE question and Jeff Atwood's blog post for some ideas; personally, I like
if not check1: return
code1
if not check2: return
code2
...
although some people object to the multiple exit points.
The second question is what to do with individual functions if you're writing object oriented Python. The usual answer is just to put them as functions inside the module containing the class, since there's no requirement that a function be attached to a particular class. If you want, though, you can include them in the class as static methods.
As Classes are first-class objects in Python, we can pass them to functions. For example, here is some code I've come across:
ChatRouter = sockjs.tornado.SockJSRouter(ChatConnection, '/chat')
where ChatConnection is a class defined in the same module. I wonder what the common use case(s) for such a practice would be?
In addition, in the code example above, why is the variable 'ChatRouter' capitalized?
Without knowing anything else about that code, I'd guess this:
OK, I looked at the source. The next paragraph is incorrect, although plausible. Basically, what the code does is use ChatConnection to create a Session object which does some other stuff. ChatRouter is just a badly named regular variable, not a class name.
SockJSRouter is a class that takes another class (call it connection) and a string as parameters. It uses __new__ to create not an instance of SockJSRouter, but an instance of a special class that uses (possibly subclasses) connection. That would explain why ChatRouter is capitalized, as it would be a class name. The returned class would use connection to generalize a lot of things, as connection would be responsible for handling communicating over a network or whatever. So by using different connections, one could handle different protocols. ChatConnection is probably some layer over IRC.
So basically, the common use case (and likely the use here) is generalization, and the reason for the BactrianCase name is because it's a class (just one generated at runtime).
Passing classes around may be useful for customization and flexible code. The function may want to create several objects of the given class, so passing it a class is one way to implement this (another would be to pass some kind of factory function). For example, in the example you gave, SockJSRouter ends up passing the connection class to Session, which then uses it to construct a new connection object.
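As a toy illustration of that (Point and make_pair are made-up names, not part of sockjs): the function constructs instances of whatever class it is handed, so callers control which type gets built.
class Point(object):
    def __init__(self, x=0, y=0):
        self.x, self.y = x, y

def make_pair(cls):
    # The caller decides which class gets instantiated.
    return cls(), cls()

a, b = make_pair(Point)   # two independent Point instances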
As for ChatRouter, I suppose this is just a naming convention. While Python programmers are advised to follow PEP 8, and many do, it's not strictly required and some projects settle on different naming conventions.
I have a program that I am writing in Python that does the following:
The user enters the name of a folder. Inside that folder are 8-15 .dat files with different extensions.
The program opens those dat files, enters them into a SQL database, and then allows the user to select different changes to make to the database. Then the database is exported back to the .dat files. There are about 5-10 different operations that could be performed.
The way that I had planned on designing this was to create a standard class for each group of files. The user would enter the name of the folder and an object with certain attributes (file names, dictionary of files, version of files (there are different versions), etc) would get created. Determining these attributes requires opening a few of these files, reading file names, etc.
Should this action be carried out in the __init__ method? Or should this action be carried our in different instance methods that get called in the __init__ method? Or should these methods be somewhere else, and only be called when the attribute is required elsewhere in the program?
I have already written this program in Java. And I had a constructor that called other methods in the class to set the object's attributes. But I was wondering what standard practice in Python would be.
Well, there is nothing special about good OOP practices in Python. Decomposition of one big method into a bunch of small ones is a great idea both in Java and in Python. Among other things, small methods give you an opportunity to write different constructors:
class GroupDescriptor(object):
    def __init__(self, file_dictionary):
        self.file_dict = file_dictionary
        self.load_something(self.file_dict['file_with_some_info'])

    @classmethod
    def from_filelist(cls, list_of_files):
        file_dict = cls.get_file_dict(list_of_files)
        return cls(file_dict)

    @classmethod
    def from_dirpath(cls, directory_path):
        files = cls.list_dir(directory_path)
        return cls.from_filelist(files)
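Calling code can then pick whichever constructor matches what it already has; a sketch assuming list_dir and get_file_dict are defined on the class:
group = GroupDescriptor.from_dirpath('/path/to/folder')
# or, if the list of files is already known:
group = GroupDescriptor.from_filelist(['a.dat', 'b.dat'])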
Besides, I don't know how it is in Java, but in Python you don't have to worry about exceptions in the constructor because they are handled cleanly. Therefore, it is totally normal to work with exception-prone things like files there.
It looks like the actions you are describing are initialization, so it'd be perfectly OK to put them into __init__. On the other hand, these actions seem to be pretty expensive, and probably useful in other parts of the program, so you might want to encapsulate them in separate functions.
There's no problem with having a long __init__ method, but I would avoid it simply because it's more difficult to test. My approach would be to create smaller methods which are called from __init__. This way you can test them and the initialization separately.
Whether they should be called when needed or run up front really depends on what you need them to do. If they are expensive operations, and are usually not all needed, then maybe it's better to only call them when needed. On the other hand, you might want to run them up front so that there is no lag when the attributes are required.
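One way to defer the expensive work until an attribute is actually needed is functools.cached_property (Python 3.8+); a hedged sketch, where read_version_from_files stands in for whatever file reading you actually do:
from functools import cached_property

class DatFolder(object):
    def __init__(self, path):
        self.path = path   # cheap work, done up front

    @cached_property
    def version(self):
        # Runs only on first access; the result is then cached on the instance.
        return read_version_from_files(self.path)   # made-up helper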
It's not clear from your question whether you actually need a class, though. I have no experience with Java, but I understand that everything in it is a class. In Python it is perfectly acceptable to just have a function if that's all that's required, and to only create classes when you need instances and other classy things.
The __init__ method is called when the object is instantiated.
Coming from a C++ background, I believe it's not good to do actual work other than initialization in the constructor.