Python: cross-referencing files from another project without hard directory references

I mostly worked in Java/C# before, and I like the feature those IDEs provide of importing a whole project into a workspace tree without messing up any of the references (the underlying linker seems to handle them). Is there anything equivalent in Python?
I now work on a few different Python projects and have realized that making them reference each other is a nightmare. Either I have to write absolute paths when I import, or I have to copy files / make symlinks (not easily portable to Windows, hence platform-dependent).
I know that symlinks / hard directory references (putting the whole directory in the import) work, but when I have to move files around or switch OS, they all break.
Does such a thing exist in the Python world? What is the intended way to modularize things into projects and cross-reference them?
(I have read Python: Referencing another project, but it doesn't help; every method there breaks when modules are moved somewhere else. I really want an automatic solution.)

Related

Is adding project root directory to sys.path a good practice?

I have a question about adding the project path in Python to make imports easier.
Situation
When I write code in Python, I usually add the necessary paths to sys.path with:
import sys
sys.path.append("/path/to/dir/")  # almost every .py file needs this
Sometimes, when my project gets bigger, with many levels of directories, this approach feels bulky and error-prone (especially when I re-organize my files).
Recently, I started using a bash script (located at the project root) that adds the sys.path.append line, with the project root as the argument, to every .py file in the project. With this approach, I hardly ever have to care about imports manually.
Question
My question is: is that good practice? I find it convenient compared to my old method, but since the bash script is a separate file, I need two commands to run any script in my project (one for the bash script and one for the .py). I could include the call to the .py in the bash script, but that is far less flexible than calling it directly from the terminal.
I would really like to hear some advice! Thanks in advance; any suggestion will be gratefully appreciated!
It is generally not good practice to manipulate sys.path within a Python library or program. Instead, add the relevant paths to PYTHONPATH in the environment that calls your program:
PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH" python ...
or
export PYTHONPATH="/path/to/other/projects/directory:$PYTHONPATH"
python ...
This lets you change the paths your program or library searches for dependencies without modifying your code.
It is also easy to manage in your personal development environment (via your bashrc) or in your production environments (via your init script or another wrapper script), and it gives you a single place to update each time you add or modify a project path.
Given that you mention that you have almost one directory per .py file, you should also consider how your code might be organized into packages to further simplify your setup.
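For instance, a layout along these lines (the names are invented for illustration) needs only one PYTHONPATH entry for everything to be importable:
myproject/            <- the only directory PYTHONPATH needs
    mypackage/
        __init__.py
        core.py
        helpers.py
After which any script run in that environment can simply do:
from mypackage import core, helpers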
It's not a particularly good practice, though you can get away with it. Better to look into virtualenv (or pipenv) for a smoother workflow.
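To sketch what that workflow looks like (the project names here are invented, and each project is assumed to have its own setup.py), you create one environment and install every project into it in editable mode, after which the projects can import each other with no path manipulation at all:
virtualenv env
. env/bin/activate
pip install -e /path/to/project_a   # editable install: the env points at your working copy
pip install -e /path/to/project_b
python -c "import project_a"        # resolves through the virtualenv, no sys.path hacks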

PyDev tags an import as "unresolved import", but code using this import works fine.

I'm fairly new to Python, and especially to its import mechanism. I'm not entirely sure I'm using the terminology correctly, so I should apologize for that up front.
Firstly, this seems to be a problem I'm having with a third-party import, so I can't really change the structure of their release.
In the release, all of the packages are in site-packages/[ROOTFOL]/[PACKAGE]
The [ROOTFOL] folder does not have an __init__.py file; only the package folders have one.
This folder is placed into site-packages, and site-packages is present on my PYTHONPATH.
In the examples they provide, they use it like this:
import ROOTFOL.PACKAGE.WhateverObject as obj
I'm trying to avoid adding every single package to the PYTHONPATH, as there are a bunch of them. Everything seems to work fine; however, it really inhibits my ability to work with the auto-complete functionality, and that is the frustrating part.
Something else I find strange is that when the packages are installed, an EGG-INFO folder is placed alongside the package. It contains several .txt files, one of which is namespace_packages.txt, and that file contains only ROOTFOL. Is there some way I should be passing this to PyDev?
So, what you're seeing here is their distribution model. Usually a module will have one root import that everything stems from, but that's not necessarily the case. They're providing a package with (what I assume are) many modules that don't interact with each other, or that can all stand alone.
Instead of importing each package individually, you could use the from keyword:
from ROOTFOL.PACKAGE import *
which will grab everything inside that sub-module. You could e-mail the developer and ask why they deployed it this way... or you could add your own __init__.py to the root folder and do:
from ROOTFOL import *
which will walk the tree. Good luck!
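One note on that __init__.py: since namespace_packages.txt lists ROOTFOL, the vendor appears to be treating ROOTFOL as a setuptools-style namespace package. If you create the file yourself, the conventional content for that style is a pkgutil declaration rather than an empty file, so several distributions can share the same root (a sketch, assuming that is indeed what the vendor intends):
# ROOTFOL/__init__.py
from pkgutil import extend_path
__path__ = extend_path(__path__, __name__)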

Local collection of Python packages: best way to import them?

I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and more robust method? Or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in adding the Library path only once, by doing it in an imported module:
# In module add_Library_path:
import os
import sys
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and that the order of the imports matters: this makes the code less legible and more fragile. It also forces my distribution of programs to include an add_Library_path.py file that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which fails because xlutils is not found in sys.path. So, this method works, but only for modules included in Library, not for packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path leads to pain... The modern-package-template and Distribute contain a vast array of information and were in part set up to solve this problem.
What I would do is set up setup.py to install all your packages either to a specific site-packages location or to the system's site-packages. In the former case, the local site-packages directory would then be added to the PYTHONPATH of the system/user; in the latter case, nothing needs to change.
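A minimal setup.py along those lines might look like this (the metadata values are placeholders, and the package/script names are borrowed from the question for illustration):
from distutils.core import setup

setup(
    name='my-programs',                 # placeholder metadata
    version='1.0',
    packages=['package_from_Library'],  # the packages currently shipped in Library
    scripts=['bin/my_program'],         # the user-facing programs
)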
You could use a batch file to set the Python path as well. Or replace the Python executable with a shell script that sets a modified PYTHONPATH and then executes the real Python interpreter. The latter, of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your libraries themselves, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python "$@"
I think you should have a look at path import hooks, which allow you to modify Python's behaviour when searching for modules.
For example, you could do something like KDE's script engine does for Python plugins [1].
It adds a special token to sys.path (like "<plasmaXXXXXX>", with XXXXXX being a random number just to avoid name collisions), and then, when Python tries to import a module and can't find it on the other paths, it calls your importer, which can deal with it.
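A minimal sketch of that token trick using a sys.path_hooks entry (this uses Python 3's importlib; the token and directory names are invented, and a real implementation would want more error handling):
import sys
from importlib.machinery import FileFinder, SourceFileLoader

LIBRARY_TOKEN = '<mylib-1234>'    # invented token; anything unlikely to be a real path
LIBRARY_DIR = '/path/to/Library'  # the real directory the token stands for

def library_hook(path_entry):
    # Hooks are tried for every sys.path entry; claim only our token.
    if path_entry != LIBRARY_TOKEN:
        raise ImportError
    # Delegate the actual lookup to the standard file-based finder,
    # pointed at the real directory instead of the token.
    return FileFinder(LIBRARY_DIR, (SourceFileLoader, ['.py']))

sys.path_hooks.append(library_hook)
sys.path.append(LIBRARY_TOKEN)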
A simpler alternative is to have a main script that is used as a launcher: it simply adds the path to sys.path and executes the target file (so that you can safely avoid putting the sys.path.append(...) line in every file).
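Such a launcher can be as small as this (launcher.py is an invented name; you would run python launcher.py some_program.py args...):
# launcher.py -- add the bundled Library to sys.path once, then run the target
import os
import runpy
import sys

here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(here, 'Library'))

# Drop our own entry from argv so the target sees its usual arguments,
# then execute it as if it had been run directly.
target = sys.argv.pop(1)
runpy.run_path(target, run_name='__main__')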
Yet another alternative, which works on Python 2.6+, is to install the library under the per-user site-packages directory.
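If you are unsure where that per-user directory lives, Python can tell you:
python -m site --user-site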
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python on a Linux installation with KDE.

Project structuring and file implementation in Python

I am having a bit of trouble getting a grasp on how to structure my Python projects. I have read jcalderone: Filesystem structure of a Python project and have been looking at the source code of CouchApp, but I'm still feeling very puzzled.
I understand how the files should be structured, but I don't understand why. I would love it if somebody could hook me up with a detailed walk-through of this, or could explain it to me: simply how to set up a basic Python project, and how the files interact with each other.
I think this is definitely something that people coming from other languages like C, C++, or Erlang... or people who have never programmed before, could benefit from.
name the directory something related to your project. When you do releases, you should include a version number suffix: Twisted-2.5.
Not sure why this is unclear. It seems obvious. It all has to be in one directory.
Why do things have to be in one directory? Because everyone says so, that's why.
create a directory Twisted/bin and put your executables there.
This is the way Linux works. Executables are in a bin directory. It makes it easy to put this specific directory in your PATH environment variable.
If your project is expressible as a single Python source file, then put it into the directory and name it something related to your project. For example, Twisted/twisted.py.
Right. You have /Twisted, /Twisted/bin and /Twisted/twisted.py with your actual, running code in it. Where else would you put it?
There's no "why" to this. Where else could you possibly put it?
If you need multiple source files, create a package instead (Twisted/twisted/, with an empty Twisted/twisted/__init__.py) and place your source files in it. For example, Twisted/twisted/internet.py.
This is just the way Python packages work. They're directories with __init__.py files. The tutorial is pretty clear on this.
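Concretely, the smallest multi-file layout from the example above looks like this:
Twisted/
    twisted/
        __init__.py     # this (even empty) file is what makes twisted/ a package
        internet.py
Once Twisted/ is on sys.path, import twisted.internet works from anywhere.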
put your unit tests in a sub-package of your package Twisted/twisted/test/.
Where else would you put your tests? Seriously. There's no "why?" to this. There's no sensible alternative.
add Twisted/README and Twisted/setup.py to explain and install your software, respectively
Right. Where else would you put them? Again. There's no "why?" They go in the top directory because -- well -- that's what a directory is for. It contains files.
I am not an expert in Python, but reading those lines from the first link makes sense if you consider that:
there might be computers/programs involved with the project;
there might be other people involved with the project.
If you have consistent names and file structures, both humans and computers will understand your complex program much better.
This involves topics like testing, building, deploying, re-usability, searching, structure, consistency...
Standards enable connectivity.
Let's try to answer each rule:
1) You should have a root dir with a good name. If you make a tarball of your package, it's not considered good behavior to have files at the root. I feel really angry when I unpack something and the current folder ends up cluttered with junk.
2) You should separate your executables from your modules. They are different beasts. And if you plan to use distutils, it will make your life easier.
3) If you have a single module, the reasons above don't apply, so you can simplify your tree.
4) Unit tests should be closely tied to the package they test. But they are not the package itself, so a subpackage is the perfect place for them.

Forced to use inconsistent file import paths in Python (/Django)

I've recently been having some problems with my imports in Django (Python)... It's better to explain using a file diagram:
- project/
    - application/
        - file.py
    - application2/
        - file2.py
In project/application/file.py I have the following:
def test_method():
    return "Working"
The problem occurs in project/application2/file2.py, when I try to import the method from above:
from application.file import test_method
This usually works, but sometimes it doesn't.
from project.application.file import test_method
This does work, but it goes against Django's portability guidelines, as the project folder must then always have the same name.
I wouldn't mind, but the issue occurs inconsistently: most of the time, omitting project is fine, but occasionally it isn't (and as far as I can see, for no reason).
I can pretty much guarantee I'm doing something stupid, but has anyone experienced this? Would I be better off just putting project in front of all relevant imports to keep things consistent? Honestly, it's unlikely the project folder name will ever change; I just want things to stick with guidelines where possible.
For import to find a module, the module needs to be in sys.path. Usually this includes "", so the current directory is searched. If you load "application" from project/, it will be found, since it's in the current directory.
Okay, that's the obvious stuff. A confusing bit is that Python remembers which modules are loaded. If you load application, then you load application2 which imports application, the module "application" is already loaded. It doesn't need to find it on disk; it just uses the one that's already loaded. On the other hand, if you didn't happen to load application yet, it'll search for it--and not find it, since it's not in the same directory as what's loading it ("."), or anywhere else in the path.
That can lead to the weird case where importing sometimes works and sometimes doesn't; it only works if it's already loaded.
If you want to be able to load these modules as just "application", then you need to arrange for project/ to be appended to sys.path.
(Relative imports sound related, but it seems like application and application2 are separate packages--relative imports are used for importing within the same package.)
Finally, be sure to consistently treat the whole thing as a package, or to consistently treat each application as its own package. Do not mix and match. If package/ is in the path (e.g. sys.path includes package/..), then you can indeed do "from package.application import foo"; but if you then also do "from application import foo", it's possible for Python not to realize these are the same thing--their names are different, and they live at different paths--and to end up loading two distinct copies of it, which you definitely don't want.
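You can see the double-loading for yourself (the paths are illustrative, matching the layout above):
import sys
sys.path.append('/path/to/project')              # enables "import application.file"
sys.path.append('/path/to/project/application')  # also enables "import file" -- dangerous

import application.file
import file   # the same source file, loaded again under a different name

print(application.file is sys.modules['file'])   # False: two separate module objects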
If you dig into the Django philosophy, you will find that a project is a collection of apps. Some of these apps may depend on other apps, which is just fine. However, what you always want is to make your apps pluggable, so that you can move them to a different project and use them there as well. To do this, you need to strip out everything in your code that is tied to your project, so when doing imports you would do:
from application.file import test_method
This is the Django way of doing it. Glenn answered why you are getting your errors, so I won't go into that part. When you run the command to start a new project, django-admin.py startproject myproject,
it will create a folder with a bunch of files that Django needs (manage.py, settings.py, etc.), but it will also do another thing for you: it will place the folder "myproject" on your Python path. In short, this means that whatever application you put in that folder, you will be able to import as shown above. You don't need to use django-admin.py to start a project, as nothing magical happens; it's just a shortcut. So you can really place your application folders anywhere; you just need to have them on a Python path, so that you can import from them directly and keep your code project-independent, so it can easily be used in any future project, abiding by the DRY principle that Django is built upon.
It is better to always import the same way - say, using project.app.models - because otherwise you may find your module is imported twice, which sometimes leads to obscure errors, as discussed in this question.
