I've been looking around at some open source projects on Python, and I'm seeing a lot of files and patterns that I'm not familiar with.
First of all, a lot of projects just have a file called setup.py, which usually contains one function:
setup(blah, blah, blah)
Second, a lot contain a file that is simply called __init__.py and contains next to no information.
Third, some .py files contain a statement similar to this:
if __name__ == "__main__"
Finally, I'm wondering if there are any "best practices" for dividing Python files up in a git repository. With Java, the idea of file division comes pretty naturally because of the class structure. With Python, many scripts have no classes at all, and sometimes a program will have OOP aspects, but a class by class division does not make that much sense. Is it just "whatever makes the code the most readable," or are there some guidelines somewhere about this?
The setup.py is part of Python’s module distribution using the distrubution utilities. It allows for easy installation of the Python module and is useful when, well, you want to distribute your project as a whole Python module.
The __init__.py is used for Python’s package system. An empty file is usually enough to make Python recognize the directory it is in as a package, but you can also define different things in it.
Finally, the __name__ == '__main__' check is to ensure that the current script is run directly (e.g. from the command line) and it is not just imported into some other script. During a Python script execution only a single module’s __name__ property will be equal to __main__. See also my answer here or the more general question on that topic.
The setup.py is part of distutils setup process. You'll want to have one of those if you're distributing a module instead of just a basic script (which even then it's a good idea to have one so you can easily expand into a module later).
The __init__.py part of the python module import process:
Files named init.py are used to mark directories on disk as a
Python package directories. If you have the files
mydir/spam/init.py mydir/spam/module.py and mydir is on your path,
you can import the code in module.py as:
import spam.module or
from spam import module If you remove the init.py file, Python
will no longer look for submodules inside that directory, so attempts
to import the module will fail.
if __name == "__main__" is a way to indicate code that would be executed if the file was run directly instead of imported.
To answer on how to layout your code, the distfiles documentation has a good guide on this.
In addition to #poke's answer, see this related question on what the directory structure of a python project should be. Here is another useful tutorial on how to make your project easily runnable.
Related
I have a Python module called util. I would like to import a script in this package _util.py from another script in scripts folder.
Even if the util package has an empty __init__.py file it does not appear as a Python package but a normal directory, without the small dot on folder image.
How can I import this module?
A preliminary answer to your question is that modules or methods beginning with underscores are meant to be used internally.
_my_method() should only be referenced from within the module holding it
_my_module() should only be referenced from within the package holding it
That being said, this convention is meant to be a hint to other developers, not a strict prohibition. Perhaps the first step you can take to solve the import issue is to rename _util.py to util.py and proceed from there.
Occasionally, module name collisions happen between the application and an internal file in a third-party package. For example, a file named profile.py in the current folder will cause jupyter notebook to crash as it attempts to import it instead of its own profile.py. What's a good way to avoid this problem, from the perspective of the package user? (Or is this something that the package developer should somehow prevent?)
Note: while a similar problem occurs due to a collision between application and built-in names (e.g., time.py or socket.py), at least it's relatively easy to remember the names of standard library modules and other built-in objects.
The current directory is the directory which contains the main script of the application. If you want to avoid name collisions in this directory, don't put any modules in it.
Instead, use a namespace. Create a uniquely-named package in the directory of the main script, and import everything from that. The main script should be very simple, and contain nothing more than this:
if __name__ == '__main__':
from mypackage import myapp
myapp.run()
All the modules inside the package should also use from imports to access the other modules within the package. For example, myapp.py might contain:
from mypackage import profile
I'm just starting to get to the point in my python projects that I need to start using multiple packages and I'm a little confused on exactly how everything is supposed to work together. What exactly should go into the __init__.py of the package? Some projects I see just have blank inits and all of their code are in modules in that package. Other projects implement what seems to be the majority of the package's classes and functions inside the init.
Is there a document or style guide or something that describes what the python authors had in mind for the use of packages and the __init__ file and such?
Edit:
I know the point of having the __init__.py file in the simplest sense that it makes a folder a package. But why would I put a function there instead of a module in that same folder(package)?
__init__.py can be empty, but what it really does is make sure Python treats your directories correctly, provide any initialization you might need for when your package is imported (configuring the environment or something along those lines), or defining __all__ so that Python knows what to do when someone uses from package import *.
Most everything you need to know is described in the docs on Packages. Dive Into Python also has a piece on packaging.
You already know, I guess that __init__.py files are required to make Python treat the directories as containing packages.
In the above model __init__.py can remain empty.
You can can also execute initialization code for the package.
You can also set the __all__ variable.
[Edit: learnings]
When you do "from package import item", or "from package import *", then the variable __all__ can be used to import selected packages.
See : http://docs.python.org/tutorial/modules.html
I'm just beginning Python, and I'd like to use an external RSS class. Where do I put that class and how do I import it? I'd like to eventually be able to share python programs.
About the import statement:
(a good writeup is at http://effbot.org/zone/import-confusion.htm and the python tutorial goes into detail at http://docs.python.org/tutorial/modules.html )
There are two normal ways to import code into a python program.
Modules
Packages
A module is simply a file that ends in .py. In order for python, it must exist on the search path (as defined in sys.path). The search path usually consists of the same directory of the .py that is being run, as well as the python system directories.
Given the following directory structure:
myprogram/main.py
myprogram/rss.py
From main.py, you can "import" the rss classes by running:
import rss
rss.rss_class()
#alternativly you can use:
from rss import rss_class
rss_class()
Packages provide a more structured way to contain larger python programs. They are simply a directory which contains an __init__.py as well as other python files.
As long as the package directory is on sys.path, then it can be used exactly the same as above.
To find your current path, run this:
import sys
print(sys.path)
I don't really like answering so late, but I'm not entirely satisfied with the existing answers.
I'm just beginning Python, and I'd like to use an external RSS class. Where do I put that class and how do I import it?
You put it in a python file, and give the python file an extension of .py . Then you can import a module representing that file, and access the class. Supposing you want to import it, you must put the python file somewhere in your import search path-- you can see this at run-time with sys.path, and possibly the most significant thing to know is that the site-packages (install-specific) and current directory ('') are generally in the import search path. When you have a single homogeneous project, you generally put it in the same directory as your other modules and let them import each other from the same directory.
I'd like to eventually be able to share python programs.
After you have it set up as a standalone file, you can get it set up for distribution using distutils. That way you don't have to worry about where, exactly, it should be installed-- distutils will worry for you. There are many other additional means of distribution as well, many OS-specific-- distutils works for modules, but if you want to distribute a proper program that users are meant to run, other options exist, such as using py2exe for Windows.
As for the modules/packages distinction, well, here it goes. If you've got a whole bunch of classes that you want divided up so that you don't have one big mess of a python file, you can separate it into multiple python files in a directory, and give the directory an __init__.py . The important thing to note is that from Python, there's no difference between a package and any other module. A package is a module, it's just a different way of representing one on the filesystem. Similarly, a module is not just a .py file-- if that were the case, sys would not be a module, since it has no .py file. It's built-in to the interpreter. There are infinitely many ways to represent modules on the filesystem, since you can add import hooks that can create ways other than directories and .py files to represent modules. One could, hypothetically, create an import hook that used spidermonkey to load Javascript files as Python modules.
from [module] import [classname]
Where the module is somewhere on your python path.
About modules and packages:
a module is a file ending with .py. You can put your class in such a file. As said by Andy, it needs to be in your python path (PYTHONPATH). Usually you will put the additional module in the same directory as your script is though which can be directly imported.
a package is a directory containing an __init__.py (can be empty) and contains module files. You can then import a la from <package>.<module> import <class>. Again this needs to be on your python path.
You can find more in the documenation.
If you want to store your RSS file in a different place use sys.append("") and pout the module in that directory and use
import or from import *
The first file, where you have created the class, is "first.py"
first.py:
class Example:
...
You create the second file, where you want to use the class contained in the "first.py", which is "second.py"
myprogram/first.py
myprogram/second.py
Then in the second file, to call the class contained in the first file, you simply type:
second.py:
from first import Example
...
I'm thinking how to arrange a deployed python application which will have a
Executable script located in /usr/bin/ which will provide a CLI to functionality implemented in
A library installed to wherever the current site-packages directory is.
Now, currently, I have the following directory structure in my sources:
foo.py
foo/
__init__.py
...
which I guess is not the best way to do things. During development, everything works as expected, however when deployed, the "from foo import FooObject" code in foo.py seemingly attempts to import foo.py itself, which is not the behaviour I'm looking for.
So the question is what is the standard practice of orchestrating situations like this? One of the things I could think of is, when installing, rename foo.py to just foo, which stops it from importing itself, but that seems rather awkward...
Another part of the problem, I suppose, is that it's a naming challenge. Perhaps call the executable script foo-bin.py?
This article is pretty good, and shows you a good way to do it. The second item from the Do list answers your question.
shameless copy paste:
Filesystem structure of a Python project
by Jp Calderone
Do:
name the directory something related to your project. For example, if your
project is named "Twisted", name the
top-level directory for its source
files Twisted. When you do releases,
you should include a version number
suffix: Twisted-2.5.
create a directory Twisted/bin and put your executables there, if you
have any. Don't give them a .py
extension, even if they are Python
source files. Don't put any code in
them except an import of and call to a
main function defined somewhere else
in your projects.
If your project is expressable as a single Python source file, then put it
into the directory and name it
something related to your project. For
example, Twisted/twisted.py. If you
need multiple source files, create a
package instead (Twisted/twisted/,
with an empty
Twisted/twisted/__init__.py) and place
your source files in it. For example,
Twisted/twisted/internet.py.
put your unit tests in a sub-package of your package (note - this means
that the single Python source file
option above was a trick - you always
need at least one other file for your
unit tests). For example,
Twisted/twisted/test/. Of course, make
it a package with
Twisted/twisted/test/__init__.py.
Place tests in files like
Twisted/twisted/test/test_internet.py.
add Twisted/README and Twisted/setup.py to explain and
install your software, respectively,
if you're feeling nice.
Don't:
put your source in a directory called src or lib. This makes it hard
to run without installing.
put your tests outside of your Python package. This makes it hard to
run the tests against an installed
version.
create a package that only has a __init__.py and then put all your code into __init__.py. Just make a module
instead of a package, it's simpler.
try to come up with magical hacks to make Python able to import your module
or package without having the user add
the directory containing it to their
import path (either via PYTHONPATH or
some other mechanism). You will not
correctly handle all cases and users
will get angry at you when your
software doesn't work in their
environment.
Distutils supports installing modules, packages, and scripts. If you create a distutils setup.py which refers to foo as a package and foo.py as a script, then foo.py should get installed to /usr/local/bin or whatever the appropriate script install path is on the target OS, and the foo package should get installed to the site_packages directory.
You should call the executable just foo, not foo.py, then attempts to import foo will not use it.
As for naming it properly: this is difficult to answer in the abstract; we would need to know what specifically it does. For example, if it configures and controls, calling it -config or ctl might be appropriate. If it is a shell API for the library, it should have the same name as the library.
Your CLI module is one thing, the package that supports it is another thing. Don't confuse the names withe module foo (in a file foo.py) and the package foo (in a directory foo with a file __init__.py).
You have two things named foo: a module and a package. What else do you want to name foo? A class? A function? A variable?
Pick a distinctive name for the foo module or the foo package. foolib, for example, is a popular package name.