Splitting a Python script in multiple files sharing the same namespace - python

Similar questions have been asked for other languages (C++, Clojure, TypeScript, maybe others) but I am still looking for an answer for Python.
There are a lot of similar questions related to the use of import and global in Python but the associated answers don't fit my needs. I just want to split a big file into smaller ones to easily modify/reuse parts of the code in different versions of the same program, without having to deal with different namespaces or managing global variables.
A simple hack to do what I want would be to merge selected Python files at runtime with a script but I hope there is a pythonic way of doing that.
With an illustration, what I am trying to do is going from several big files which are almost identical:
big_file_v1.py
## First part
# Hundreds of code lines to define things, make computations...
## Second part
# More code to do a few additional things
big_file_v2.py
## First part
# Exactly the same first part as in big_file_v1.py
## Second part
# A lot of small differences compared to big_file_v1.py
...to several smaller files with most of them not needing any modification or only needing modifications that I want to share across all of the different versions:
myprog_v1.py
include first_part_common_to_both_versions.py
include second_part_v1.py
myprog_v2.py
include first_part_common_to_both_versions.py
include second_part_v2.py
In this second case, using include-like commands, I can instantly share the modifications made in first_part_common_to_both_versions.py across version 1 and 2 of my program and I just need to modify/copy the smaller files second_part_v2.py if I want to make new modifications/create another new version.
The question is: how to do this include in Python?
Just to avoid debates on good software development practices, I use Python as a tool to solve scientific questions and as such, I care more about editing comfort than coding practice.

Have a look at exec or execfile:
"Python's exec statement is similar to the import statement, with an
important difference: The exec statement executes a file in the
current namespace. The exec statement doesn't create a new namespace.
We'll look at this in the section called “The exec Statement”

Related

Maintaining two versions of an ipython notebook

I often need to create two versions of an ipython notebook: One contains tasks to be carried out (usually including some python code and output), the other contains the same text plus solutions. Let's call them the assignment and the solution.
It is easy to generate the solution document first, then strip the answers to generate the assignment (or vice versa). But if I subsequently need to make changes (and I always do), I need to repeat the stripping process. Is there a reasonable workflow that will allow changes in the assignment to be propagated to the solutions document?
Partial self-answer: I have experimented with leveraging mercurial's hg copy, which will let two files with different names share history. But I can only get this to work if assignment and solution are in different directories, in two linked hg repositories. I would much prefer a simpler set-up. I've also noticed that diff gets very confused when one JSON file has more sections than another, making a VCS-based solution even less attractive. (To be clear: Ordinary use of a VCS with notebooks is fine; it's the parallel versions that stumble).
This question covers similar ground, but does not solve my problem. In fact an answer to my question would solve the OP's second remaining problem, "pulling changes" (see the Update section).
It sounds like you are maintaining an assignment and an answer key of some kind and want to be able to distribute the assignments (without solutions) to students, and still have the answers for yourself or a TA.
For something like this, I would create two branches "unsolved" and "solved". First write the questions on the "unsolved" branch. Then create the "solved" branch from there and add the solutions. If you ever need to update a question, update back to the "unsolved" branch, make the update and merge the change into "solved" and fix the solution.
You could try going the other way, but my hunch is that going "backwards" from solved to unsolved might be strange to maintain.
After some experimentation I concluded that it is best to tackle this by processing the notebook's JSON code. Version control systems are not the right approach, for the following reasons:
JSON doesn't diff very well when adding or deleting cells. A minimal change leads to mis-matched braces and a very messy diff.
In my use case, the superset version of the file (containing both the assignments and their solutions) must be the source document. This is because the assignment includes example code and output that depends on earlier parts, to be written by the students. This model does not play well with version control, as pointed out by #ChrisPhillips in his answer.
I ended up filtering the JSON structure for the notebook and stripping out the solution cells; they may be recognized via special metadata (which can be set interactively using the metadata button in the interface), or by pattern-matching on the cell contents. The following snippet shows how to filter out cells whose first line starts with # SOLUTION:
def stripcell(cell, pattern):
"""Check if the first line of the cell's content matches `pattern`"""
if cell["cell_type"] == "code":
content = cell["input"]
else:
content = cell["source"]
return ( len(content) > 0 and re.search(pattern, content[0]) )
pattern = r"^# SOLUTION:"
struct = json.load(open("input.ipynb"))
cells = struct["worksheets"][0]["cells"]
struct["worksheets"][0]["cells"] = [ c for c in cells if not stripcell(c, pattern) ]
json.dump(struct, open("output.ipynb", "wb"), indent=1)
I used the generic json library rather than the notebook API. If there's a better way to go about it, please let me know.

Where to Store Borrowed Python Code?

Recently, I have been working on a Python project with usual directory structure, and have received help from someone else who has given me a code snippet (a single function definition, about 30 lines long) which I would like to import into my code. What is the most proper directory/location in a Python project to store borrowed code of this size? Is it best to store the snippet into an entirely different module and import it from there?
I generally find it easiest to put such code in a separate file, because for clarity you don't want more than one different copyright/licensing term to apply within a single file. So in Python this does indeed mean a separate module. Then the file can contain whatever attribution and other legal boilerplate you need.
As long as your file headers don't accidentally claim copyright on something to which you do not own the copyright, I don't think it's actually a legal problem to mix externally-licensed or public domain code into files you mostly own. I may be wrong, though, which is why I normally avoid giving myself reason to think about it. A comment saying "this is external code from the following source with the following license:" may well be clearer than dividing code into different files that naturally wouldn't be. So I do occasionally do that.
I don't see any definite need for a separate directory (or package) per separate external source. If that's already part of your project structure (that is, it already uses external libraries by incorporating their source) then I suppose you might as well continue the trend.
I usually place scripts I copy off the internet in a folder/package called borrowed so I know all of the code here is stuff that I didn't write myself.
That is, if it's something more substantial than a one or two-liner demonstrating how something works.

Is it costly in Python to put classes in different files?

I am a Java programmer and I have always created separate files for Classes, I am attempting to learn python and I want to learn it right.
Is it costly in python to put Classes in different files, meaning one file contains only one class. I read in a blog that it is costly because resolution of . operator happens at runtime in python (It happens at compile time for Java).
Note: I did read in other posts that we can put them in separate files but they don't mention if they are costlier in any way
It is slightly more costly, but not to an extent you are likely to care. You can negate this extra cost by doing:
from module import Class
As then the class will be assigned to a variable in the local namespace, meaning it doesn't have to do the lookup through the module.
In reality, however, this is unlikely to be important. The cost of looking up something like this is going to be tiny, and you should focus on doing what makes your code the most readable. Split classes across modules and packages as is logical for your program, and as it keeps them clear.
If, for example, you are using something repeatedly in a loop which is a bottleneck for your program, you can assign it to a local variable for that loop, e.g:
import module
...
some_important_thing = module.some_important_thing
#Bottleneck loop
for item in items:
#module.some_important_thing()
some_important_thing()
Note that this kind of optimisation is unlikely to be the important thing, and you should only ever optimise where you have proof you need to do so.

When should a Python script be split into multiple files/modules?

In Java, this question is easy (if a little tedious) - every class requires its own file. So the number of .java files in a project is the number of classes (not counting anonymous/nested classes).
In Python, though, I can define multiple classes in the same file, and I'm not quite sure how to find the point at which I split things up. It seems wrong to make a file for every class, but it also feels wrong just to leave everything in the same file by default. How do I know where to break a program up?
Remember that in Python, a file is a module that you will most likely import in order to use the classes contained therein. Also remember one of the basic principles of software development "the unit of packaging is the unit of reuse", which basically means:
If classes are most likely used together, or if using one class leads to using another, they belong in a common package.
As I see it, this is really a question about reuse and abstraction. If you have a problem that you can solve in a very general way, so that the resulting code would be useful in many other programs, put it in its own module.
For example: a while ago I wrote a (bad) mpd client. I wanted to make configuration file and option parsing easy, so I created a class that combined ConfigParser and optparse functionality in a way I thought was sensible. It needed a couple of support classes, so I put them all together in a module. I never use the client, but I've reused the configuration module in other projects.
EDIT: Also, a more cynical answer just occurred to me: if you can only solve a problem in a really ugly way, hide the ugliness in a module. :)
In Java ... every class requires its own file.
On the flipside, sometimes a Java file, also, will include enums or subclasses or interfaces, within the main class because they are "closely related."
not counting anonymous/nested classes
Anonymous classes shouldn't be counted, but I think tasteful use of nested classes is a choice much like the one you're asking about Python.
(Occasionally a Java file will have two classes, not nested, which is allowed, but yuck don't do it.)
Python actually gives you the choice to package your code in the way you see fit.
The analogy between Python and Java is that a file i.e., the .py file in Python is
equivalent to a package in Java as in it can contain many related classes and functions.
For good examples, have a look in the Python built-in modules.
Just download the source and check them out, the rule of thumb I follow is
when you have very tightly coupled classes or functions you keep them in a single file
else you break them up.

Possibilities for Python classes organized across files? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
The community reviewed whether to reopen this question 3 months ago and left it closed:
Original close reason(s) were not resolved
Improve this question
I'm used to the Java model where you can have one public class per file. Python doesn't have this restriction, and I'm wondering what's the best practice for organizing classes.
A Python file is called a "module" and it's one way to organize your software so that it makes "sense". Another is a directory, called a "package".
A module is a distinct thing that may have one or two dozen closely-related classes. The trick is that a module is something you'll import, and you need that import to be perfectly sensible to people who will read, maintain and extend your software.
The rule is this: a module is the unit of reuse.
You can't easily reuse a single class. You should be able to reuse a module without any difficulties. Everything in your library (and everything you download and add) is either a module or a package of modules.
For example, you're working on something that reads spreadsheets, does some calculations and loads the results into a database. What do you want your main program to look like?
from ssReader import Reader
from theCalcs import ACalc, AnotherCalc
from theDB import Loader
def main( sourceFileName ):
rdr= Reader( sourceFileName )
c1= ACalc( options )
c2= AnotherCalc( options )
ldr= Loader( parameters )
for myObj in rdr.readAll():
c1.thisOp( myObj )
c2.thatOp( myObj )
ldr.laod( myObj )
Think of the import as the way to organize your code in concepts or chunks. Exactly how many classes are in each import doesn't matter. What matters is the overall organization that you're portraying with your import statements.
Since there is no artificial limit, it really depends on what's comprehensible. If you have a bunch of fairly short, simple classes that are logically grouped together, toss in a bunch of 'em. If you have big, complex classes or classes that don't make sense as a group, go one file per class. Or pick something in between. Refactor as things change.
I happen to like the Java model for the following reason. Placing each class in an individual file promotes reuse by making classes easier to see when browsing the source code. If you have a bunch of classes grouped into a single file, it may not be obvious to other developers that there are classes there that can be reused simply by browsing the project's directory structure. Thus, if you think that your class can possibly be reused, I would put it in its own file.
It entirely depends on how big the project is, how long the classes are, if they will be used from other files and so on.
For example I quite often use a series of classes for data-abstraction - so I may have 4 or 5 classes that may only be 1 line long (class SomeData: pass).
It would be stupid to split each of these into separate files - but since they may be used from different files, putting all these in a separate data_model.py file would make sense, so I can do from mypackage.data_model import SomeData, SomeSubData
If you have a class with lots of code in it, maybe with some functions only it uses, it would be a good idea to split this class and the helper-functions into a separate file.
You should structure them so you do from mypackage.database.schema import MyModel, not from mypackage.email.errors import MyDatabaseModel - if where you are importing things from make sense, and the files aren't tens of thousands of lines long, you have organised it correctly.
The Python Modules documentation has some useful information on organising packages.
I find myself splitting things up when I get annoyed with the bigness of files and when the desirable structure of relatedness starts to emerge naturally. Often these two stages seem to coincide.
It can be very annoying if you split things up too early, because you start to realise that a totally different ordering of structure is required.
On the other hand, when any .java or .py file is getting to more than about 700 lines I start to get annoyed constantly trying to remember where "that particular bit" is.
With Python/Jython circular dependency of import statements also seems to play a role: if you try to split too many cooperating basic building blocks into separate files this "restriction"/"imperfection" of the language seems to force you to group things, perhaps in rather a sensible way.
As to splitting into packages, I don't really know, but I'd say probably the same rule of annoyance and emergence of happy structure works at all levels of modularity.
I would say to put as many classes as can be logically grouped in that file without making it too big and complex.

Categories