I have a program that I am writing in Python that does the following:
The user enters the name of a folder. Inside that folder a 8-15 .dat files with different extensions.
The program opens those dat files, enters them into a SQL database and then allows the user to select different changes made to the database. Then the database is exported back to the .dat files. There are about 5-10 different operations that could be performed.
The way that I had planned on designing this was to create a standard class for each group of files. The user would enter the name of the folder and an object with certain attributes (file names, dictionary of files, version of files (there are different versions), etc) would get created. Determining these attributes requires opening a few of these files, reading file names, etc.
Should this action be carried out in the __init__ method? Or should this action be carried our in different instance methods that get called in the __init__ method? Or should these methods be somewhere else, and only be called when the attribute is required elsewhere in the program?
I have already written this program in Java. And I had a constructor that called other methods in the class to set the object's attributes. But I was wondering what standard practice in Python would be.
Well, there is nothing special about good OOP practices in Python. Decomposition of one big method into a bunch of small ones is great idea both in Java and in Python. Among other things small methods gives you an opportunity to write different constructors:
class GroupDescriptor(object):
def __init__(self, file_dictionary):
self.file_dict = file_dictionary
self.load_something(self.file_dict['file_with_some_info'])
#classmethod
def from_filelist(cls, list_of_files):
file_dict = cls.get_file_dict(list_of_files)
return cls(file_dict)
#classmethod
def from_dirpath(cls, directory_path):
files = self.list_dir(directory_path)
return cls.from_filelist(files)
Besides, I don't know how it is in Java but in Python you don't have to worry about exceptions in constructor because they are finely handled. Therefore, it is totally normal to work with such exception-prone things like files.
It looks the action you are describing are initialization, so it'd be perfectly ok to put them into __init__. On the other hand, these actions seem to be pretty expensive, and probably useful in the other part of a program, so you might want to encapsulate them in some separate function.
There's no problem with having a long __init__ method, but I would avoid it simply because its more difficult to test. My approach would be to create smaller methods which are called from __init__. This way you can test them and the initialization separately.
Whether they should be called when needed or run up front really depends on what you need them to do. If they are expensive operations, and are usually not all needed, then maybe its better to only call them when needed. On the other hand, you might want to run them up front so that there is no lag when the attributes are required.
Its not clear from your question whether you actually need a class though. I have no experience with Java, but I understand that everything in it is a class. In python it is perfectly acceptable to just have a function if that's all that's required, and to only create classes when you need instances and other classy things.
The __init__ method is called when the object is instantiated.
Coming from a C++ background I believe its not good to do actual work other than initialization in the constructor.
Related
So I am a little confused about what I have been reading on Object Oriented Programming. I realized that while I was focusing on the rule of each object doing only one thing, I created a class that does not have a changing state.
Basically I am writing a program that does a lot of reading and writing on text files. I thought that none of the objects I have should be dealing with these operations and I should have a fileIO class that does these operations for them. However, I am a little worried that this might be the same thing as having a utility class.
Is having a class whose fields never change(or not even need to be initialized) same as a utility class? Is it a bad practice from OOP perspective? Does it make sense to have a fileIO object? If not should objects be allowed to read from and write to files?
class fileIO:
__processFilePath = None
__trackFilePath = None
def __init__(self, iProcessFilePath, iTrackFilePath):
def getProcesses(self):
#checks if appString is in file
def isAppRunning(self,appString):
#reads all
def getAllTrackedLines(self):
#appends
def addNewOnTracked(self,toBeAdded):
#overwrites
def overWriteTrackedLines(self,fullData):
in this particular instance I am actually initializing the fields in the init method. But for the purposes of my program I don't actually need to because there are only two files that I read and write from.
Reading and Writing data from file can be wrapped in some class that handles the state of the data to ensure that the transaction completes. What I mean by this is that the resource needs to be de-allocated properly, preferably in the same transaction, no matter the outcome of the operation. If you consider the allocation and de-allocation of resources as state, then your class is not exactly stateless. In functional programming, a function handling resources will be considered impure as it is not stateless. The state is merely external.
Having a class with no state does not constitute a bad class. It is true that Utility classes are an anti-pattern, but if your class does one small thing, and it does it well, then it is not a problem. The problem comes in when you have a ton of related methods bunched into the class and the code begins to rot. That is what you want to avoid. A class that has a well defined purpose, and only does that thing, will resist rot.
Make sure you write lots of tests around your class as well, as this is key in long term maintainability.
Please let me know if there is anything that I can clarify.
I am new to OOP and am writing a small tool in Python that checks Bitcoin prices using a JSON load from the web Bitcoin() class, it monitors the prices Monitor(), notifies the user when thresholds are met Notify() and uses a console-interface Interface() for now to do so.
I have created a Bitcoin() class that can read the prices and volumes from the JSON load. The __init__ definition connects to the web using socket. Since every instance of this class would result in a new socket, I would only need/want one instance of this class running.
Is a class still the best way to approach this?
What is the best way to get other classes and instances to interact with my Bitcoin() instance?
Should I global a Bitcoin() instance? Pass the instance as an argument to every class that needs it?
The first thing which concerns me is the SRP violation, your Bitcoin class probably shouldn't be responsible for:
opening socket,
parsing results,
rendering output.
I don't know the details but from my point of view you should split that functionality to smaller classes/functions (in case of using only modules), and one of them will be responsible for retrieving data from web. Please also keep in mind that global state is evil (singletons in some contexts could be described as global state).
Another thing which is a smell from my point of view is opening a socket inside the constructor. This isn't testable, of course you could mock/stub socket, but from my point of view it's better when class requires all it's dependencies as a constructor parameter. By doing it that way you could also notice some classes with to wide responsibility (if your constructor requires more that 3,4 parameters it definitely could be simplified).
http://www.youtube.com/watch?v=o9pEzgHorH0
I'm not sure how relevant this video is for your project (no code to actually read). But maybe you'll pick up the answer to your question. At least you'll learn something new and that's what were here for.
If I were you my code would be something like:
( a class for every set of jobs, which is not what you are doing )
class Interface:
''' Handle UI '''
...
class Connect:
''' Handle web interface '''
...
class Bitcoin:
''' Handle the calculations '''
...
class Notify:
''' Notifier '''
...
In short, split your classes into smaller simpler classes.
Now for your question:
Yes, because you have a "complex-ish" problem at hand and you're using Python, so it's definitely easier to create a OOP version than a non-OOP one. So, unless you have a good reason not to, Stick to OOP.
In your case, it might as well be passing the instance as an argument.
This is a good idea. This eliminates the problems caused by scopes if you don't have a very good understanding of them.
But remember you pass the reference, not the value, so manipulating the instance, can and will affect other classes the instance is passed to.
Note: Opening a socket in the constructor of the class is not a good idea. It might be better if you have it in a method.
The answer is maybe. Depends upon you whole architecture,
You should look at the singleton pattern, because you description yells Singleton all over.
http://de.wikipedia.org/wiki/Singleton_%28Entwurfsmuster%29
If you don't find any good reason against creating a class in your given architecture, then just go for it.
OOP is a tool, not a goal, you can make a decision whether to use it or not. If you use a Python module, you can achieve encapsulation without ever writing "class".
Sure, you can use python classes for this purpose. You can use module-level instances as well(no global keyword or explicit passing as arguments needed). It is a matter of taste IMHO.
Basically you're asking about Singleton pattern python-specific implementation, it has been answered here:
Python and the Singleton Pattern
Description of pattern itself can be found here: http://en.wikipedia.org/wiki/Singleton_pattern
I am about to refactor the code of a python project built on top of twisted. So far I have been using a simple settings.py module to store constants and dictionaries like:
#settings.py
MY_CONSTANT='whatever'
A_SLIGHTLY_COMPLEX_CONF= {'param_a':'a', 'param_b':b}
A great deal of modules import settings.py to do their stuff.
The reason why I want to refactor the project is because I am in need to change/add configuration parameters on the fly. The approach that I am about to take is to gather all configuration in a singleton and to access its instance whenever I need to.
import settings.MyBloatedConfig
def first_insteresting_function():
cfg = MyBloatedConfig.get_instance()
a_much_needed_param = cfg["a_respectable_key"]
#do stuff
#several thousands of functions later
def gazillionth_function_in_module():
tired_cfg = MyBloatedConfig.get_instance()
a_frustrated_value = cfg["another_respectable_key"]
#do other stuff
This approach works but feels unpythonic and bloated. An alternative would be to externalize the cfg object in the module, like this:
CONFIG=MyBloatedConfig.get_instance()
def a_suspiciously_slimmer_function():
suspicious_value = CONFIG["a_shady_parameter_key"]
Unfortunately this does not work if I am changing the MyBloatedConfig instance entries in another module. Since I am using the reactor pattern, storing staff on a thread local is out of question as well as using a queue.
For completeness, following is the implementation I am using to implement a singleton pattern
instances = {}
def singleton(cls):
""" Use class as singleton. """
global instances
#wraps(cls)
def get_instance(*args, **kwargs):
if cls not in instances:
instances[cls] = cls(*args, **kwargs)
return instances[cls]
return get_instance
#singleton
class MyBloatedConfig(dict):
....
Is there some other more pythonic way to broadcast configuration changes across different modules?
The big, global (often singleton) config object is an anti-pattern.
Whether you have settings.py, a singleton in the style of MyBloatedConfig.get_instance(), or any of the other approaches you've outlined here, you're basically using the same anti-pattern. The exact spelling doesn't matter, these are all just ways to have a true global (as distinct from a Python module level global) shared by all of the code in your entire project.
This is an anti-pattern for a number of reasons:
It makes your code difficult to unit test. Any code that changes its behavior based on this global is going to require some kind of hacking - often monkey-patching - in order to let you unit test its behavior under different configurations. Compare this to code which is instead written to accept arguments (as in, function arguments) and alters its behavior based on the values passed to it.
It makes your code less re-usable. Since the configuration is global, you'll have to jump through hoops if you ever want to use any of the code that relies on that configuration object under two different configurations. Your singleton can only represent one configuration. So instead you'll have to swap global state back and forth to get the different behavior you want.
It makes your code harder to understand. If you look at a piece of code that uses the global configuration and you want to know how it works, you'll have to go look at the configuration. Much worse than this, though, is if you want to change your configuration you'll have to look through your entire codebase to find any code that this might affect. This leads to the configuration growing over time, as you add new items to it and only infrequently remove or modify old ones, for fear of breaking something (or for lack of time to properly track down all users of the old item).
The above problems should hint to you what the solution is. If you have a function that needs to know the value of some constant, make it accept that value as an argument. If you have a function that needs a lot of values, then create a class that can wrap up those values in a convenient container and pass an instance of that class to the function.
The part of this solution that often bothers people is the part where they don't want to spend the time typing out all of this argument passing. Whereas before you had functions that might have taken one or two (or even zero) arguments, now you'll have functions that might need to take three or four arguments. And if you're converting an application written in the style of settings.py, then you may find that some of your functions used half a dozen or more items from your global configuration, and these functions suddenly have a really long signature.
I won't dispute that this is a potential issue, but should be looked upon mostly as an issue with the structure and organization of the existing code. The functions that end up with grossly long signatures depended on all of that data before. The fact was just obscured from you. And as with most programming patterns which hide aspects of your program from you, this is a bad thing. Once you are passing all of these values around explicitly, you'll see where your abstractions need work. Maybe that 10 parameter function is doing too much, and would work better as three different functions. Or maybe you'll notice that half of those parameters are actually related and always belong together as part of a container object. Perhaps you can even put some logic related to manipulation of those parameters onto that container object.
(note) I would appreciate help generalizing the title. I am sure that this is a class of problems in OO land, and probably has a reasonable pattern, I just don't know a better way to describe it.
I'm considering the following -- Our server script will be called by an outside program, and have a bunch of text dumped at it, (usually XML).
There are multiple possible types of data we could be getting, and multiple versions of the data representation we could be getting, e.g. "Report Type A, version 1.2" vs. "Report Type A, version 2.0"
We will generally want to do the same thing action with all the data -- namely, determine what sort and version it is, then parse it with a custom parser, then call a synchronize-to-database function on it.
We will definitely be adding types and versions as time goes on.
So, what's a good design pattern here? I can come up with two, both seem like they may have som problems.
Option 1
Write a monolithic ID script which determines the type, and then
imports and calls the properly named class functions.
Benefits
Probably pretty easy to debug,
Only one file that does the parsing.
Downsides
Seems hack-ish.
It would be nice to not have to create
knowledge of dataformats in two places, once for ID, once for actual
merging.
Option 2
Write an "ID" function for each class; returns Yes / No / Maybe when given identifying text.
the ID script now imports a bunch of classes, instantiates them on the text and asks if the text and class type match.
Upsides:
Cleaner in that everything lives in one module?
Downsides:
Slower? Depends on logic of running through the classes.
Put abstractly, should Python instantiate a bunch of Classes, and consume an ID function, or should Python instantiate one (or many) ID classes which have a paired item class, or some other way?
You could use the Strategy pattern which would allow you to separate the logic for the different formats which need to be parsed into concrete strategies. Your code would typically parse a portion of the file in the interface and then decide on a concrete strategy.
As far as defining the grammar for your files I would find a fast way to identify the file without implementing the full definition, perhaps a header or other unique feature at the beginning of the document. Then once you know how to handle the file you can pick the best concrete strategy for that file handling the parsing and writes to the database.
I'm getting a bit of a headache trying to figure out how to organise modules and classes together. Coming from C++, I'm used to classes encapsulating all the data and methods required to process that data. In python there are modules however and from code I have looked at, some people have a lot of loose functions stored in modules, whereas others almost always bind their functions to classes as methods.
For example say I have a data structure and would like to write it to disk.
One way would be to implement a save method for that object so that I could just type
MyObject.save(filename)
or something like that. Another method I have seen in equal proportion is to have something like
from myutils import readwrite
readwrite.save(MyObject,filename)
This is a small example, and I'm not sure how python specific this problem is at all, but my general question is what is the best pythonic practice in terms of functions vs methods organisation?
It seems like loose functions bother you. This is the python way. It makes sense because a module in python is really just an object on the same footing as any other object. It does have language level support for loading it from a file but other than that, it's just an object.
so if I have a module foo.py:
import pprint
def show(obj):
pprint(obj)
Then the when I import it from bar.py
import foo
class fubar(object):
#code
def method(self, obj):
#more stuff
foo.show(obj)
I am essentially accessing a method on the foo object. The data attributes of the foo module are just the globals that are defined in foo. A module is the language level implementation of a singleton without the need to prepend self to every methods argument list.
I try to write as many module level functions as possible. If some function will only work with an instance of a particular class, I will make it a method on the class. Otherwise, I try to make it work on instances of every class that is defined in the module for which it would make sense.
The rational behind the exact example that you gave is that if each class has a save method, then if you later change how you are saving data (from say filesystem to database or remote XML file) then you have to change every class. If each class implements an interface to yield that data that it wants saved, then you can write one function to save instances of every class and only change that function once. This is known as the Single Responsibility Principle: Each class should have only one reason to change.
If you have a regular old class you want to save to disk, I would just make it an instance method. If it were a serialization library that could handle different types of objects I would do the second way.