I am having a bit of trouble getting a grasp on how to structure my Python projects. I have read jcalderone's "Filesystem structure of a Python project" and have been looking at the source code of CouchApp, but I'm still feeling very puzzled.
I understand how the files should be structured, but I don't understand why. I would love it if somebody could give me a detailed walk-through of this, or could explain it to me: simply how to set up a basic Python project, and how the files would interact with each other.
I think this is definitely something that people coming from other languages like C, C++, Erlang and so on, or people who have never programmed before, could benefit from.
name the directory something related to your project. When you do releases, you should include a version number suffix: Twisted-2.5.
Not sure why this is unclear. It seems obvious: it all has to be in one directory.
Why do things have to be in one directory? Because then the whole project can be packaged, checked out, copied, or deleted as a single unit.
create a directory Twisted/bin and put your executables there.
This is the way Linux works. Executables are in a bin directory. It makes it easy to put this specific directory in your PATH environment variable.
If your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py.
Right. You have /Twisted, /Twisted/bin and /Twisted/twisted.py with your actual, running code in it. Where else would you put it?
There's no "why" to this. Where else could you possibly put it?
If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
This is just the way Python packages work. They're directories with __init__.py files. The tutorial is pretty clear on this.
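As a minimal sketch of how such a package is then used (assuming the layout above, with Twisted/ on sys.path):

# An empty Twisted/twisted/__init__.py marks the directory as a package.
import twisted.internet           # loads Twisted/twisted/internet.py
from twisted import internet      # an equivalent spelling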
put your unit tests in a sub-package of your package Twisted/twisted/test/.
Where else would you put your tests? Seriously. There's no "why?" to this. There's no sensible alternative.
add Twisted/README and Twisted/setup.py to explain and install your software, respectively
Right. Where else would you put them? Again. There's no "why?" They go in the top directory because -- well -- that's what a directory is for. It contains files.
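To make the last rule concrete, here is a minimal setup.py sketch using distutils, following the Twisted example above (the script name is made up):

# setup.py, at the top of the project directory
from distutils.core import setup

setup(
    name="Twisted",
    version="2.5",
    packages=["twisted", "twisted.test"],
    scripts=["bin/twisted-tool"],   # hypothetical executable from bin/
)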
I am no expert in Python, but reading these rules from the first link makes sense if you consider that:
There might be computers/programs involved with the project
There might be other people involved with the project
If you have consistent names and file structures, both humans and computers can understand your complex program much better.
This involves topics like: testing, building, deploying, re-usability, searching, structure, consistency...
Standards make interoperability possible.
Let's try to answer each rule:
1) You should have a root dir with a good name. If you make a tarball of your package, it's not considered good behavior to have files at the root of the archive. I feel really angry when I unpack something and the current folder ends up cluttered with junk.
2) You should separate your executables from modules. They are different beasts. And if you plan to use distutils, it will make your life easier.
3) If you have a single module, the reasons above don't apply, so you can simplify your tree.
4) Unit tests should be closely tied to the package they test. But they are not the package itself, so a sub-package is the perfect place for them.
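Putting the four rules together, the tree for the Twisted example looks like this:

Twisted/                       <- root dir named after the project
Twisted/bin/                   <- executables
Twisted/twisted/__init__.py    <- the package itself
Twisted/twisted/internet.py
Twisted/twisted/test/          <- unit tests as a sub-package
Twisted/README
Twisted/setup.py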
Related
I have a question about adding the project path to Python, to make importing easier.
Situation
When I write code in Python, I usually add the necessary path to sys.path by using
import sys
sys.path.append("/path/to/dir/")  # almost every .py file needs this
Sometimes, when my project gets bigger with many levels of directories, this approach seems bulky and error-prone (especially when I reorganize my files).
Recently, I started using a bash script (located at the project root directory) that adds the sys.path.append line, with the project root as its argument, to every .py file in the project. With this approach, I hardly have to care about imports manually.
Question
My question is: is that good practice? I find it convenient compared to my old method, but since the bash script is a separate file, I need two commands to run any script in my project (one for the bash script and one for the .py file). I could make the bash script call the .py file as well, but that is far less flexible than calling it directly from the terminal.
I'd really like to hear some advice! Thanks in advance. Any suggestion will be gratefully appreciated!
It is generally not good practice to manipulate sys.path within a Python library or program. Instead, you should add the relevant paths to PYTHONPATH in the calling environment of your Python program:
PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH" python ...
or
export PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH"
python ...
This allows you to manipulate the paths that your program or library will search for dependencies without modifying your code.
It is also very easy to manage this in your personal development environment by modifying your .bashrc, or in your production environments via an init script (or other wrapper script); this gives you one place to update each time you add or modify your project paths.
Given that you mention that you have almost one directory per .py file, you should also consider how your code might be organized into packages to further simplify your setup.
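As a sketch of what that might look like (all names here are hypothetical): with the project root on PYTHONPATH, none of the individual scripts needs to touch sys.path.

# Hypothetical layout:
#   myproject/
#       mypackage/
#           __init__.py
#           io_utils.py
#           analysis/
#               __init__.py
#               stats.py
#
# With myproject/ on PYTHONPATH, any script can simply do:
from mypackage import io_utils
from mypackage.analysis import stats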
It's not a particularly good practice, though you can get away with it. Better to look into virtualenv (or pipenv) for a smoother workflow.
For some routine work I have found that combining different scripting languages can be the fastest way to get things done. For example, I have a main bash script which calls some awk, Python and bash scripts, or even some compiled Fortran executables.
I can put all the files into a folder that is on my PATH, but that makes modification a bit slower. If I need a new copy with some modifications, I need to add another path to $PATH as well.
Is there a way to merge these files into a single executable?
For example: tar all the files together and somehow declare that the main script is main.sh? This way I could simply vi the file, modify, run, modify, run... but I could also move the file between folders and machines easily. Dependencies could be handled properly too (executing the tar could set PATH itself).
I hope this dream does exist! Thanks for the comments!
Janos
From a software engineering point of view, this approach does not sound so great, because your programs would be badly structured by design (but maybe that is no problem in your case). Also, you will lose syntax highlighting support in your editor when combining multiple languages in one file. I would rather suggest building a package of all your programs, distributing them together, and after deployment having the main script call the other programs.
That being said, you could use here documents in Bash (see bash(1) and search for "Here Documents") to include other text scripts in your main script, then have the main script write them to a temporary directory and execute them. This will not help you with your compiled code, though, unless you have a compiler on your target machine.
This Linux Journal article might be interesting for you too, as it shows how to include a binary payload (e.g. a tgz file) in a Bash script. You could use this to add a compressed archive containing your programs to a never-changing main script.
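If you would rather stay in Python for the launcher, here is a rough sketch of the same idea (all file names are made up): extract a bundled archive into a temporary directory, put that directory on PATH, and run the main script.

# run_bundle.py -- hypothetical launcher; expects bundle.tar.gz next to
# this script, containing main.sh plus the awk/Python/Fortran helpers.
import os
import subprocess
import tarfile
import tempfile

here = os.path.dirname(os.path.abspath(__file__))
workdir = tempfile.mkdtemp(prefix="bundle-")
with tarfile.open(os.path.join(here, "bundle.tar.gz")) as tar:
    tar.extractall(workdir)
env = dict(os.environ)
env["PATH"] = workdir + os.pathsep + env["PATH"]   # let main.sh find its helpers
subprocess.check_call(["bash", os.path.join(workdir, "main.sh")], env=env)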
Thanks for the answers!
Of course this is not a good approach for publishing code! Sorry if that was confusing. But it is a good approach if you are developing, say, a scientific idea, and you want a proof-of-concept result fast, and you want to perform similar tasks several times while quickly replacing some parts of the algorithm. Note that existing code is often available for some parts of the task; such code sometimes just needs to be modified a bit (a few lines). I am a big believer in re-implementing everything, but first it is good to know whether it is worth doing!
For some compromise: can I call a script from outside that is wrapped in a tar or zip archive and is not compressed?
Thanks again,
J.
I've been looking around here but I haven't found what I was searching for, so I hope it hasn't already been answered here. If it has, I will delete my question.
I was wondering if Sublime Text can suggest functions from a module when you write "module.function". For example, if I write "import PyQt4", then Sublime Text should suggest "PyQt4.QtCore" when I write "PyQt4.Q".
For now, I have installed "SublimeCodeIntel", and it does just that, but only for some modules (like math or urllib). Is it possible to configure it for any module? Or can you recommend any other plugin?
Thanks for reading!
PS: also, would it be possible to configure it for my own modules? I mean, for example, modules that I have written myself and that are in the same folder as the file I'm currently editing.
SublimeCodeIntel will work for any module, as long as it's indexed. After you first install the plugin, indexing can take a while, depending on the number and size of third-party modules you have in site-packages. If you're on Linux and have multiple site-packages locations, make sure you define them all in the settings. I'd also recommend changing "codeintel_max_recursive_dir_depth" to 25, especially if you're on OS X, as the default value of 10 may not reach all the way into deep directory trees.
Make sure you read through all the settings, and modify them to suit your needs. The README also contains some valuable information for troubleshooting, so if the indexing still isn't working after a while, and after restarting Sublime a few times, you may want to delete the database and start off fresh.
I'm trying to build a project which includes a few open source third party libraries, but I want to bundle my distribution with said libraries - because I expect my users to want to use my project without an Internet connection. Additionally, I'd like to leave their code 100% untouched and even leave their directory structure untouched if I can. My approach so far has been to extract the tarballs and place the entire folder in MyProject/lib, but I've had to put __init__.py in every sub-directory to be able to reference the third-party code in my own modules.
What's the common best practice for accomplishing this task? How can I best respect the apps' developers by retaining their projects' structure? How can I make importing these libraries in my code less painful?
I'm new to making a distributable project in Python, so I've no clue if there's something I can do in __init__.py or in setup.py to keep myself from having to type from lib.app_name.app_name.app_module import * and whatnot.
For what it's worth, I will be distributing this on OS X and possibly *nix. I'd like to avoid using another library (e.g. setuptools) to accomplish this. (Weirdly, it seems to already be installed on my system, but I didn't install it, so I've no idea what's going on.)
I realize that this question seems to be a duplicate of this one, but I don't think it is because I'm asking for the best practice for the "distribute-the-third-party-code-with-your-own" approach. Please forgive me if I'm asking a garbage question.
Buildout is a good solution for building and distributing Python software.
You could have a look at the inner workings of virtualenv to get some inspiration on how to go about this. Maybe you can reuse code from there.
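One common pattern (a sketch only; the directory names are hypothetical) is to extend sys.path once, in your own package's __init__.py, instead of adding __init__.py files throughout the vendored code. Each bundled library then stays completely untouched and importable under its original name:

# myproject/__init__.py
import os
import sys

_lib = os.path.join(os.path.dirname(__file__), "lib")
# Each extracted tarball keeps its own top-level folder, so put each of
# them on sys.path; "import app_name" then works and the third-party
# trees remain 100% unmodified.
for name in os.listdir(_lib):
    path = os.path.join(_lib, name)
    if os.path.isdir(path) and path not in sys.path:
        sys.path.insert(0, path)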
I've recently been having some problems with my imports in Django (Python)... It's easier to explain using a file diagram:
- project/
  - application/
    - file.py
  - application2/
    - file2.py
In project/application/file.py I have the following:
def test_method():
    return "Working"
The problem occurs in project/application2/file2.py, when I try to import the method from above:
from application.file import test_method
This usually works, but sometimes it doesn't.
from project.application.file import test_method
This does work, but it goes against Django's portability guidelines, since the project folder would always have to have the same name.
I wouldn't mind, but the issue is occurring inconsistently: most of the time, omitting project is fine, but occasionally not (and as far as I can see, for no reason).
I can pretty much guarantee I'm doing something stupid, but has anyone experienced this? Would I be better just putting the project in front of all relevant imports to keep things consistent? Honestly, it's unlikely the project folder name will ever change, I just want things to stick with guidelines where possible.
For import to find a module, the module needs to be somewhere on sys.path. Usually this includes "", so the current directory is searched. If you load "application" from the project directory, it'll be found, since it's in the current directory.
Okay, that's the obvious stuff. A confusing bit is that Python remembers which modules are loaded. If you load application, then you load application2 which imports application, the module "application" is already loaded. It doesn't need to find it on disk; it just uses the one that's already loaded. On the other hand, if you didn't happen to load application yet, it'll search for it--and not find it, since it's not in the same directory as what's loading it ("."), or anywhere else in the path.
That can lead to the weird case where importing sometimes works and sometimes doesn't; it only works if it's already loaded.
If you want to be able to load these modules as just "application", then you need to arrange for project/ to be appended to sys.path.
(Relative imports sound related, but it seems like application and application2 are separate packages--relative imports are used for importing within the same package.)
Finally, be sure to consistently treat the whole thing as a package, or to consistently treat each application as its own package. Do not mix and match. If package/ is on the path (e.g. sys.path includes package/..), then you can indeed do "from package.application import foo", but if you then also do "from application import foo", Python may not realize these are the same thing (their names are different, and they live on different paths) and can end up loading two distinct copies of the module, which you definitely don't want.
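To see that pitfall in runnable form, here is a hypothetical demonstration, assuming the layout from the question plus __init__.py files in project/ and project/application/, run from the directory containing project/:

import sys
sys.path.append("project")        # now "application" is importable directly

import application.file          # loaded under the name "application.file"
import project.application.file  # loaded again as "project.application.file"

# The same file on disk, but two distinct module objects:
print(application.file is project.application.file)   # prints False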
If you dig into the Django philosophy, you will find that a project is a collection of apps. Some of these apps may depend on other apps, which is just fine. However, what you always want is to make your apps pluggable, so you can move them to a different project and use them there as well. To do this, you need to strip everything project-specific from your code, so when doing imports you would write:
from application.file import test_method
This would be the Django way of doing it. Glenn answered why you are getting your errors, so I won't go into that part. You start a new project with the command:
django-admin.py startproject myproject
This will create a folder with a bunch of files that Django needs (manage.py, settings.py, etc.), but it will do another thing for you: it will place the folder "myproject" on your Python path. In short, this means that whatever application you put in that folder, you will be able to import as shown above. You don't need to use django-admin.py to start a project, as nothing magical happens; it's just a shortcut. So you can place your application folders anywhere, really; you just need to have them on a Python path, so you can import from them directly and keep your code project-independent, so it can easily be used in any future project, abiding by the DRY principle that Django is built upon.
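For reference, the flat layout that older versions of django-admin.py startproject create looks roughly like this:

myproject/
    __init__.py
    manage.py
    settings.py
    urls.py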
It is better to always import in the same way - say, using project.app.models - because otherwise you may find your module is imported twice, which sometimes leads to obscure errors, as discussed in this question.