Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
I'm used to the Java model where you can have one public class per file. Python doesn't have this restriction, and I'm wondering what's the best practice for organizing classes.
A Python file is called a "module" and it's one way to organize your software so that it makes "sense". Another is a directory, called a "package".
A module is a distinct thing that may have one or two dozen closely-related classes. The trick is that a module is something you'll import, and you need that import to be perfectly sensible to people who will read, maintain and extend your software.
The rule is this: a module is the unit of reuse.
You can't easily reuse a single class. You should be able to reuse a module without any difficulties. Everything in your library (and everything you download and add) is either a module or a package of modules.
For example, you're working on something that reads spreadsheets, does some calculations and loads the results into a database. What do you want your main program to look like?
# ssReader, theCalcs and theDB are the example's own modules
from ssReader import Reader
from theCalcs import ACalc, AnotherCalc
from theDB import Loader

def main(sourceFileName, options, parameters):
    rdr = Reader(sourceFileName)
    c1 = ACalc(options)
    c2 = AnotherCalc(options)
    ldr = Loader(parameters)
    for myObj in rdr.readAll():
        c1.thisOp(myObj)
        c2.thatOp(myObj)
        ldr.load(myObj)
Think of the import as the way to organize your code in concepts or chunks. Exactly how many classes are in each import doesn't matter. What matters is the overall organization that you're portraying with your import statements.
Since there is no artificial limit, it really depends on what's comprehensible. If you have a bunch of fairly short, simple classes that are logically grouped together, toss in a bunch of 'em. If you have big, complex classes or classes that don't make sense as a group, go one file per class. Or pick something in between. Refactor as things change.
I happen to like the Java model for the following reason. Placing each class in an individual file promotes reuse by making classes easier to see when browsing the source code. If you have a bunch of classes grouped into a single file, it may not be obvious to other developers that there are classes there that can be reused simply by browsing the project's directory structure. Thus, if you think that your class can possibly be reused, I would put it in its own file.
It entirely depends on how big the project is, how long the classes are, whether they will be used from other files, and so on.
For example, I quite often use a series of classes for data abstraction, so I may have 4 or 5 classes that are only 1 line long (class SomeData: pass).
It would be stupid to split each of these into separate files - but since they may be used from different files, putting all these in a separate data_model.py file would make sense, so I can do from mypackage.data_model import SomeData, SomeSubData
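A minimal sketch of what such a data_model.py might contain, reusing the SomeData and SomeSubData names from the answer. It swaps the bare `pass` bodies for dataclasses so each tiny class still carries a few fields; the field names and the extra SomeOtherData class are invented for illustration.

```python
# data_model.py (hypothetical): several tiny, closely related classes in one module
from dataclasses import dataclass, field


@dataclass
class SomeData:
    """Base record; the field names here are illustrative only."""
    name: str
    tags: list[str] = field(default_factory=list)


@dataclass
class SomeSubData(SomeData):
    """A specialised record that adds one extra field."""
    value: float = 0.0


@dataclass
class SomeOtherData:
    """Another small record that still belongs with the data model."""
    source: str
```

Callers then import just the names they need from the one module, e.g. from mypackage.data_model import SomeData, SomeSubData, rather than from several one-class files.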
If you have a class with lots of code in it, maybe with some functions only it uses, it would be a good idea to split this class and the helper-functions into a separate file.
You should structure them so that you write from mypackage.database.schema import MyModel, not from mypackage.email.errors import MyDatabaseModel. If the places you import things from make sense, and the files aren't tens of thousands of lines long, you have organised it correctly.
The Python Modules documentation has some useful information on organising packages.
I find myself splitting things up when I get annoyed with the bigness of files and when the desirable structure of relatedness starts to emerge naturally. Often these two stages seem to coincide.
It can be very annoying if you split things up too early, because you start to realise that a totally different ordering of structure is required.
On the other hand, when any .java or .py file is getting to more than about 700 lines I start to get annoyed constantly trying to remember where "that particular bit" is.
With Python/Jython, circular import dependencies also seem to play a role: if you try to split too many cooperating basic building blocks into separate files, this "restriction"/"imperfection" of the language seems to force you to group things, perhaps in a rather sensible way.
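A small, self-contained demonstration of the circular-import restriction mentioned above, assuming four hypothetical modules (circ_left, circ_right, lazy_left, lazy_right) written to a temporary directory. Pulling a name out of a partially initialised module with `from X import Name` fails, while binding the module with `import X` and looking the name up at call time works. Merging the classes into one module is often the simpler fix.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Version 1: each module pulls a *name* out of the other at import time.
BROKEN = {
    "circ_left.py": "from circ_right import Right\nclass Left:\n    pass\n",
    "circ_right.py": "from circ_left import Left\nclass Right:\n    pass\n",
}

# Version 2: each module imports the other *module* and resolves names lazily.
WORKING = {
    "lazy_left.py": (
        "import lazy_right\n"
        "class Left:\n"
        "    def partner(self):\n"
        "        return lazy_right.Right()\n"
    ),
    "lazy_right.py": (
        "import lazy_left\n"
        "class Right:\n"
        "    def partner(self):\n"
        "        return lazy_left.Left()\n"
    ),
}


def write_modules(directory, files):
    for name, source in files.items():
        (directory / name).write_text(source)


tmp = Path(tempfile.mkdtemp())
write_modules(tmp, BROKEN)
write_modules(tmp, WORKING)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

try:
    import circ_left  # circ_right then asks for Left before it exists
    broken_error = None
except ImportError as exc:
    broken_error = exc

import lazy_left  # succeeds: only module objects are bound at import time

pair = lazy_left.Left().partner()
print(type(broken_error).__name__, type(pair).__name__)
```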
As to splitting into packages, I don't really know, but I'd say probably the same rule of annoyance and emergence of happy structure works at all levels of modularity.
I would say to put as many classes as can be logically grouped in that file without making it too big and complex.
Related
So I've been searching for a bit and couldn't find anything on Google or in the PEPs discussing this.
I am doing a project with tkinter, and one file in the project was only 200 lines of code (excluding all the commented-out code). While the entire file was related to the GUI portion of the project, it felt a bit long and a bit broad to me.
I ended up splitting the file into 4 different files, each handling its own portion of the GUI.
Basically, the directory looks like this:
project/
    guiclasses/
        statisticsframe.py
        textframes.py
        windowclass.py
    main_gui.py
    ...
statisticsframe has a class of a frame that shows statistics about stuff.
textframes holds 3 frame classes containing text areas: one of them inherits from Frame, and the other two inherit from the first one.
windowclass basically creates the root window and does all the general initialization for a tkinter GUI.
main_gui isn't the actual name, but it simply combines the three modules above and runs mainloop().
Overall, each file is now 40-60 lines of code.
I am wondering if there are any conventions regarding this. The rule of thumb in most languages is that if you can reuse the functions/classes elsewhere, you should split them out, though in Python this is less of a problem since you can import specific classes and functions from modules.
Sorry if this isn't coherent enough; it's nearly 3 AM here and this has been sitting in the back of my head.
I'm not familiar with tkinter, so my advice is rather broad.
You can use any split into modules that you feel is better, but:
- as readability counts, try to make names coherent and do not repeat yourself: guiclasses - your entire program is about the GUI, and there are obviously classes somewhere, so why repeat that in a name? Imagine typing all that in an import; make it meaningful to type
- a flat structure is better than a nested one; three modules do not have to go into a subpackage
- the best split is across layers of abstraction (this is probably the hardest part, and specific to tkinter)
- anything in a module should be rather self-sufficient and quite isolated from other parts of the program
- modules should make good units for unit testing (e.g. share the same fixtures)
- can you write an understandable docstring for a module? Then it's a good one.
- try learning by example: I often seek wisdom on naming and package structure in Barry Warsaw's Mailman; maybe you can find a reputable repo using tkinter to follow (e.g. IDLE?).
From a purely syntactic point of view, I would have named the modules as:
- <package_name>
    - baseframe
    - textframe
    - window
    - main
Similar questions have been asked for other languages (C++, Clojure, TypeScript, maybe others) but I am still looking for an answer for Python.
There are a lot of similar questions related to the use of import and global in Python but the associated answers don't fit my needs. I just want to split a big file into smaller ones to easily modify/reuse parts of the code in different versions of the same program, without having to deal with different namespaces or managing global variables.
A simple hack to do what I want would be to merge selected Python files at runtime with a script but I hope there is a pythonic way of doing that.
With an illustration, what I am trying to do is going from several big files which are almost identical:
big_file_v1.py
## First part
# Hundreds of code lines to define things, make computations...
## Second part
# More code to do a few additional things
big_file_v2.py
## First part
# Exactly the same first part as in big_file_v1.py
## Second part
# A lot of small differences compared to big_file_v1.py
...to several smaller files with most of them not needing any modification or only needing modifications that I want to share across all of the different versions:
myprog_v1.py
include first_part_common_to_both_versions.py
include second_part_v1.py
myprog_v2.py
include first_part_common_to_both_versions.py
include second_part_v2.py
In this second case, using include-like commands, I can instantly share the modifications made in first_part_common_to_both_versions.py across versions 1 and 2 of my program, and I just need to modify/copy the smaller file second_part_v2.py if I want to make new modifications or create another version.
The question is: how to do this include in Python?
Just to avoid debates on good software development practices, I use Python as a tool to solve scientific questions and as such, I care more about editing comfort than coding practice.
Have a look at exec or execfile (execfile exists only in Python 2; in Python 3, exec(open(path).read()) is the usual replacement):
"Python's exec statement is similar to the import statement, with an
important difference: The exec statement executes a file in the
current namespace. The exec statement doesn't create a new namespace.
We'll look at this in the section called “The exec Statement”."
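A minimal Python 3 sketch of such an include, assuming the file names from the question and invented file contents. The include helper is hypothetical; it reads a file and runs it with exec(compile(...)) in a namespace shared by both parts, so names defined in the common first part are visible to the version-specific second part without any imports.

```python
import tempfile
from pathlib import Path


def include(path, namespace):
    """Run the file at `path` inside `namespace`, like a textual include."""
    source = Path(path).read_text()
    # compile() with the real filename keeps tracebacks pointing at that file
    exec(compile(source, str(path), "exec"), namespace)


# Hypothetical contents standing in for the question's files.
workdir = Path(tempfile.mkdtemp())
(workdir / "first_part_common_to_both_versions.py").write_text(
    "data = [1, 2, 3, 4]\n"
    "total = sum(data)\n"
)
(workdir / "second_part_v1.py").write_text(
    "result = total / len(data)  # uses names from the first part\n"
)
(workdir / "second_part_v2.py").write_text(
    "result = max(data) - min(data)\n"
)


def run_version(second_part):
    namespace = {}  # one fresh namespace per program version
    include(workdir / "first_part_common_to_both_versions.py", namespace)
    include(workdir / second_part, namespace)
    return namespace["result"]


print(run_version("second_part_v1.py"), run_version("second_part_v2.py"))
```

Inside myprog_v1.py itself, calling include("first_part_common_to_both_versions.py", globals()) at module level would put the names straight into that script's globals (the path is then resolved relative to the current working directory).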
So, I have two scripts temperature.py and rain.py. These two scripts do the same thing: open a file, update it and then save it. The purpose of the functions is the same but the functions do it differently for each script.
My question is: should I give the functions in both scripts the same name, e.g. update(), or should the names be different?
I am going to import them like below:
import temperature
import rain
I would propose using the same names, especially if you are not going to use from temperature import * or similar nastiness. The way you describe it, there is no problem calling rain.update() and temperature.update().
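A small illustration of why qualified calls are safe while bare-name imports are not. It assumes two hypothetical in-memory modules built with types.ModuleType to stand in for temperature.py and rain.py; the function bodies are invented. Module-qualified calls stay unambiguous, whereas importing the bare name update from both modules lets the second import silently shadow the first.

```python
import sys
import types


# Stand-ins for temperature.py and rain.py (hypothetical contents).
def _temperature_update():
    return "temperature updated"


def _rain_update():
    return "rain updated"


temperature = types.ModuleType("temperature")
temperature.update = _temperature_update
rain = types.ModuleType("rain")
rain.update = _rain_update
sys.modules["temperature"] = temperature
sys.modules["rain"] = rain

import temperature  # resolved from sys.modules, as in the question
import rain

# Qualified calls: the module name disambiguates the identical function names.
qualified = (temperature.update(), rain.update())

# Bare-name imports: the second binding of `update` replaces the first.
from temperature import update
from rain import update

shadowed = update()
print(qualified, shadowed)
```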
But maybe you should reconsider whether your two "modules" would be better as two classes that inherit from each other, or at least from a common base class. If you want two things to "do the same, but differently", then overriding inherited behaviour is the object-oriented way to do it. Maybe the two update()s even have something in common that both need to do (a very typical case); that could then be done in the base class.
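A minimal sketch of the base-class idea, assuming a hypothetical DataFileUpdater class whose shared update() performs the common open/update/save steps, while subclasses override only a transform() hook. The class names, file names and unit conversions are invented for illustration.

```python
import tempfile
from pathlib import Path


class DataFileUpdater:
    """Shared workflow: open a file, update its contents, save it."""

    def __init__(self, path):
        self.path = Path(path)

    def update(self):
        text = self.path.read_text()      # common step: read
        new_text = self.transform(text)   # varying step: subclass-specific
        self.path.write_text(new_text)    # common step: save
        return new_text

    def transform(self, text):
        raise NotImplementedError


class TemperatureUpdater(DataFileUpdater):
    def transform(self, text):
        # Hypothetical rule: convert Celsius readings to Fahrenheit.
        return "\n".join(f"{float(c) * 9 / 5 + 32:.1f}" for c in text.split())


class RainUpdater(DataFileUpdater):
    def transform(self, text):
        # Hypothetical rule: convert millimetre readings to centimetres.
        return "\n".join(f"{float(mm) / 10:.1f}" for mm in text.split())


tmp = Path(tempfile.mkdtemp())
(tmp / "temperature.txt").write_text("0 100")
(tmp / "rain.txt").write_text("10 25")
for updater in (TemperatureUpdater(tmp / "temperature.txt"),
                RainUpdater(tmp / "rain.txt")):
    updater.update()
```

Any step both scripts share (locking, backups, logging) would then live once in DataFileUpdater.update().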
And even if you now answer "no, they have nothing in common", I feel they do have something in common on a more theoretical level. In that case, maybe later versions of your software would benefit from the different architecture I propose.
What about making the function specific and descriptive to the context?
from temperature import update_temperature
from rain import update_rain
Naming them like this reduces the chance of naming conflicts (such as both files having an update(), or even a third-party package having its own update()).
It also states clearly what exactly your code is doing: updating the temperature (not the units, as in temperature.update_units, or the location, as in temperature.update_location).
In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
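A minimal sketch of a class that combines configuration-file parsing with option parsing. The answer used optparse, which has been deprecated since Python 3.2, so this sketch swaps in argparse. The Settings class, the [client] section and the --host/--port options are all hypothetical. Values from the config file become argparse defaults, and command-line flags override them.

```python
import argparse
import configparser


class Settings:
    """Config-file values act as defaults; command-line flags override them."""

    def __init__(self, section="client"):
        self.section = section
        self.parser = argparse.ArgumentParser(description="hypothetical client")
        self.parser.add_argument("--host", default="localhost")
        self.parser.add_argument("--port", type=int, default=6600)

    def load(self, config_text, argv):
        config = configparser.ConfigParser()
        config.read_string(config_text)
        if config.has_section(self.section):
            # Values from the file are strings; argparse runs string defaults
            # through each option's `type`, so "6601" becomes the int 6601.
            self.parser.set_defaults(**dict(config[self.section]))
        return self.parser.parse_args(argv)


CONFIG = """
[client]
host = music.local
port = 6601
"""

from_file = Settings().load(CONFIG, [])
overridden = Settings().load(CONFIG, ["--port", "7000"])
print(from_file.host, from_file.port, overridden.port)
```

Because nothing in Settings is specific to one client, a module like this is the kind of piece that ends up reused in other projects.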
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flip side, a Java file will sometimes also include enums, subclasses or interfaces within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about in Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file, i.e. the .py file in Python, is equivalent to a package in Java, in that it can contain many related classes and functions.
For good examples, have a look at the modules in the Python standard library. Just download the source and check them out. The rule of thumb I follow is: when you have very tightly coupled classes or functions, keep them in a single file; otherwise, break them up.
Closed 13 years ago.
Possible Duplicate:
How many Python classes should I put in one file?
Coming from a C++ background I've grown accustomed to organizing my classes such that, for the most part, there's a 1:1 ratio between classes and files. By making it so that a single file contains a single class I find the code more navigable. As I introduce myself to Python I'm finding lots of examples where a single file contains multiple classes. Is that the recommended way of doing things in Python? If so, why?
Am I missing this convention in PEP 8?
Here are some possible reasons:
Python is not exclusively class-based - the natural unit of code decomposition in Python is the module. Modules are just as likely to contain functions (which are first-class objects in Python) as classes. In Java, the unit of decomposition is the class. Hence, Python has one module=one file, and Java has one (public) class=one file.
Python is much more expressive than Java, and if you restrict yourself to one class per file (which Python does not prevent you from doing) you will end up with lots of very small files - more to keep track of with very little benefit.
An example of roughly equivalent functionality: Java's log4j => a couple of dozen files, ~8000 SLOC; Python logging => 3 files, ~2800 SLOC.
There's a mantra, "flat is better than nested," that generally discourages overuse of hierarchy. I'm not sure there are any hard-and-fast rules as to when you want to create a new module; for the most part, people just use their discretion to group logically related functionality (classes and functions that pertain to a particular problem domain).
Good thread from the Python mailing list, and a quote by Fredrik Lundh:
even more important is that in Python, you don't use classes for everything; if you need factories, singletons, multiple ways to create objects, polymorphic helpers, etc, you use plain functions, not classes or static methods.
once you've gotten over the "it's all classes", use modules to organize things in a way that makes sense to the code that uses your components. make the import statements look good.
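A minimal sketch of Lundh's point, assuming a hypothetical parsers.py module: the factory is a plain module-level function rather than a factory class or a static method, and the module itself serves as the namespace. The parser classes, the registry dict and the helper names are invented for illustration.

```python
# parsers.py (hypothetical): the module, not a class, is the organising unit
import csv
import json


class CsvParser:
    def parse(self, text):
        return list(csv.reader(text.splitlines()))


class JsonParser:
    def parse(self, text):
        return json.loads(text)


_PARSERS = {"csv": CsvParser, "json": JsonParser}


def make_parser(kind):
    """Factory as a plain function instead of a ParserFactory.create()."""
    try:
        return _PARSERS[kind]()
    except KeyError:
        raise ValueError(f"unknown parser kind: {kind!r}") from None


def parse_text(text, kind="csv"):
    """Convenience helper: one more plain function, no class needed."""
    return make_parser(kind).parse(text)
```

Client code can then read from parsers import make_parser, which is the "import statements look good" goal.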
The book Expert Python Programming has a related discussion in Chapter 4, "Choosing Good Names": see "Building the Namespace Tree" and "Splitting the Code".
My crude one-line summary: collecting related classes into one module (source file), and related modules into one package, helps with code maintenance.
In Python, classes can also be used for small tasks (just for grouping, etc.). Maintaining a 1:1 relation would result in too many files, each with little functionality.
There is no specific convention for this - do whatever makes your code the most readable and maintainable.
A good example of not having separate files for each class is the models.py file within a Django app. Each Django app may have a handful of classes related to that app, and putting them into individual files just makes more work.
Similarly, having each view in a different file again is likely to be counterproductive.