I have a module that I am writing in Python that needs to download data and store it in a particular directory. Currently, I do this as shown below:
import os
folder = r'd:\data'  # location of the root folder on my system (raw string so the backslash is kept literally)
DATAPATH = os.path.join(folder, 'download_data')
This works for my module on my system. I am interested in distributing this module to other machines and I am not sure how I can control the location of the root folder when I install the module to a different machine. Are there any best practices on how to do this? Is there some way to do this in the setup file?
Yes. The location should be set by your installer and stored in a configuration file. Use the configparser module to read the value. The example in the documentation for BasicInterpolation, which builds paths out of other settings, is close to what you need.
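A minimal sketch of that idea follows. The section name [paths] and the keys root_folder and download_dir are made up for illustration; they are not from the question. BasicInterpolation is configparser's default, so %(root_folder)s is expanded when the value is read.

```python
import configparser
import os

# Hypothetical file contents; an installer would write something like this.
SAMPLE_CONFIG = """\
[paths]
root_folder = /srv/data
download_dir = %(root_folder)s/download_data
"""


def load_datapath(config_text):
    """Return the download directory defined in the given config text."""
    parser = configparser.ConfigParser()  # BasicInterpolation is the default
    parser.read_string(config_text)
    # download_dir refers to root_folder; interpolation fills it in here.
    return os.path.normpath(parser.get("paths", "download_dir"))


DATAPATH = load_datapath(SAMPLE_CONFIG)
```

In a real module you would probably call parser.read() with the path of the installed config file instead of read_string().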
From the sounds of things, you need to install your package with a known structure, including some non-Python files, and then locate those files by their known relative path from your package modules. In that case, you need to combine these two answers:
Including non-Python files with setup.py
Retrieving python module path
Then use your existing logic to construct the full path.
I have a python package built from source code in /Document/pythonpackage directory
/Document/pythonpackage/> python setup.py install
This creates a folder in site-packages directory of python
import pythonpackage
print(pythonpackage.__file__)
>/anaconda3/lib/python3.7/site-packages/pythonpackage-x86_64.egg/pythonpackage/__init__.py
I am running a script in multiple environments, so the only path I know I will have is pythonpackage.__file__. However, /Document/pythonpackage has some data that is not in site-packages. Is there a way to automatically find the path to /Document/pythonpackage given that you only have access to the module in Python?
Working like that is discouraged. It's generally assumed that after installing a package the user can remove the directory it was installed from (as most automated package managers would do). Instead, you'd make sure your setup.py copies any data files into the relevant places, and then your code would pick them up from there.
Assuming you're using the standard setuptools, you can see the docs on Including Data Files, which say at the bottom:
In summary, the three options allow you to:
include_package_data
Accept all data files and directories matched by MANIFEST.in.
package_data
Specify additional patterns to match files that may or may not be matched by MANIFEST.in or found in source control.
exclude_package_data
Specify patterns for data files and directories that should not be included when a package is installed, even if they would otherwise have been included due to the use of the preceding options.
and then says:
Typically, existing programs manipulate a package’s __file__ attribute in order to find the location of data files. However, this manipulation isn’t compatible with PEP 302-based import hooks, including importing from zip files and Python Eggs. It is strongly recommended that, if you are using data files, you should use the ResourceManager API of pkg_resources to access them
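Since that passage was written, setuptools has deprecated pkg_resources in favour of the standard library's importlib.resources. Here is a rough sketch of reading a bundled data file that way (Python 3.9+). The package name demo_pkg and the file data/names.txt are invented for the demonstration, and the package is created in a temporary directory only so the example runs on its own.

```python
import importlib
import importlib.resources
import os
import sys
import tempfile


def make_demo_package(root):
    """Create a throwaway package 'demo_pkg' with one data file (illustration only)."""
    pkg_dir = os.path.join(root, "demo_pkg")
    os.makedirs(os.path.join(pkg_dir, "data"))
    with open(os.path.join(pkg_dir, "__init__.py"), "w", encoding="utf-8") as fh:
        fh.write("")
    with open(os.path.join(pkg_dir, "data", "names.txt"), "w", encoding="utf-8") as fh:
        fh.write("alice\nbob\n")


def read_package_text(package, *parts):
    """Read a text resource shipped inside an importable package."""
    resource = importlib.resources.files(package)
    for part in parts:
        resource = resource.joinpath(part)
    return resource.read_text(encoding="utf-8")


# Stand-in for an installed package: build it, then make it importable.
tmp_root = tempfile.mkdtemp()
make_demo_package(tmp_root)
sys.path.insert(0, tmp_root)
importlib.invalidate_caches()

names = read_package_text("demo_pkg", "data", "names.txt").split()
```

With a real installed package you would only need read_package_text(); the file has to be included through package_data or MANIFEST.in as described above.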
Not sure, but you could create a repository for your module and use pip to install it. The egg folder would then have a file called PKG-INFO which would contain the url to the repository you imported your module from.
I've never had to think about any kind of software distribution (I'm using Python for this project), so I'm not sure what the 'best' or most commonly used approach to filename handling is. Right now I use relative paths for all images, config files, ... from the top-level directory that contains the executable program.
So it naturally fails when the program is executed from a different location. My question is whether it is OK to change the current working directory at the beginning of the program to the dirname of __file__ (it's executed in a sub-shell, so I don't see a problem with this, but I want it to be platform independent and I'm not sure how Windows handles it), or whether this is an issue I can solve using distutils and installing the whole program (I'd prefer not to). Or are there any other (better) ways?
So basically I can solve the problem easily; I just want to know what the usual thing to do is. Thank you for your advice.
Best practice is to use absolute paths.
Don't use the __file__ path to change directories; instead, use it to calculate a base path from which to build absolute paths. In a top-level module, add:
import os.path
BASE = os.path.dirname(os.path.abspath(__file__))
and reuse BASE to build absolute paths:
abspath = os.path.join(BASE, relpath)
Changing the working directory is rarely needed or useful.
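The same idea can be written with pathlib. This is only a sketch: the helper names base_dir and resource_path, and the images/icon.png path in the comment, are made up for illustration.

```python
from pathlib import Path


def base_dir(module_file):
    """Return the absolute directory that contains the given module file."""
    return Path(module_file).resolve().parent


def resource_path(module_file, *relative_parts):
    """Build an absolute path to a file that lives next to the module."""
    return base_dir(module_file).joinpath(*relative_parts)


# In a top-level module you would pass that module's own __file__, e.g.:
#     ICON = resource_path(__file__, "images", "icon.png")
```

Because the result is absolute, it does not matter which directory the program is started from.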
Well, __file__ is defined per module, and not all modules have this attribute. According to the documentation:
__file__ is the pathname of the file from which the module was loaded, if it was loaded from a file. The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
Since you are planning to invoke this in your own module you shouldn't have any problem on linux, windows and even osx. Of course, use os.path module to manipulate the paths.
Using this general structure:
setup.py
/package
    __init__.py
    project.py
    /data
        client.log
I have a script that saves a list of names to client.log, so I don't have to reinitialize that list each time I need access to it or run the module. Before I set up this structure with pkg_resources, I used open('.../data/client.log', 'w') to update the log with explicit paths, but this doesn't work anymore.
Is there any way to edit data files within modules? Or is there a better way to save this list?
No, pkg_resources is for reading resources within a package. You can't use it to write log files, because a package is the wrong place for log files. Your package directory should typically not be writable by the user that loads the library. Also, your package may in fact be inside a ZIP file.
You should instead store the logs in a log directory. Where to put it depends on a lot of things: the biggest factor is your operating system, but it also matters whether it's system software or user software.
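As a rough sketch of the "user software" case, the following picks a per-user directory by platform and appends names to a file there. The function names, the app name, and the exact directory choices are my assumptions, not something from the question; the third-party platformdirs package covers this much more thoroughly.

```python
import os
import sys
from pathlib import Path


def user_data_dir(app_name, platform=sys.platform, env=None, home=None):
    """Guess a per-user, writable data directory for app_name (simplified)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if platform.startswith("win"):
        base = Path(env.get("LOCALAPPDATA") or home / "AppData" / "Local")
    elif platform == "darwin":
        base = home / "Library" / "Application Support"
    else:
        # XDG convention: fall back to ~/.local/share when the variable is unset
        base = Path(env.get("XDG_DATA_HOME") or home / ".local" / "share")
    return base / app_name


def append_name(log_path, name):
    """Append one name to the log file, creating its directory if needed."""
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(name + "\n")


# Typical use (writes under the current user's home directory):
#     append_name(user_data_dir("myapp") / "client.log", "alice")
```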
I need to put a python script somewhere on my computer so that in another file I can use it. How do I do this and where do I put it? And where in the python documentation do I learn how to do this? I'm a beginner + don't use python much.
library file: MyLib.py put in a well-known place
def myfunc():
....
other file SourceFile.py located elsewhere, doesn't need to know where MyLib.py is:
something = MyLib.myfunc()
Option 1:
Put your file at:
<Wherever your Python is>/Lib/site-packages/myfile.py
Add this to your code:
import myfile
Pros: Easy
Cons: Clutters site-packages
Option 2:
Put your file at:
<Wherever your Python is>/Lib/site-packages/mypackage/myfile.py
Create an empty text file called:
<Wherever your Python is>/Lib/site-packages/mypackage/__init__.py
Add this to your code:
from mypackage import myfile
Pros: Reduces clutter in site-packages by keeping your stuff consolidated in a single directory
Cons: Slightly more work; still some clutter in site-packages. This isn't bad for stable stuff, but may be regarded as inappropriate for development work, and may be impossible if Python is installed on a shared drive
Option 3:
Put your file in any directory you like
Add that directory to the PYTHONPATH environment variable
Proceed as with Option 1 or Option 2, except substitute the directory you just created for <Wherever your Python is>/Lib/site-packages/
Pros: Keeps development code out of the site-packages directory
Cons: slightly more setup
This is the approach I usually use for development work
In general, the Modules section of the Python tutorial is a good introduction for beginners on this topic. It explains how to write your own modules and where to put them, but I'll summarize the answer to your question below:
Your Python installation has a site-packages directory; any python file you put in that directory will be available to any script you write. For example, if you put the file MyLib.py in the site-packages directory, then in your script you can say
import MyLib
something = MyLib.myfunc()
If you're not sure where Python is installed, the Stack Overflow question How do I find the location of my Python site-packages directory will be helpful to you.
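If you'd rather ask Python directly, something like the sketch below should print the candidate directories. The helper name site_packages_dirs is my own; sysconfig.get_paths() and site.getsitepackages() are standard library calls, though the latter has been missing in some older virtualenv setups, hence the getattr.

```python
import site
import sysconfig


def site_packages_dirs():
    """Return the site-packages directories of the running interpreter."""
    dirs = [sysconfig.get_paths()["purelib"]]
    # getsitepackages() is not available in every environment, so look it up defensively.
    getter = getattr(site, "getsitepackages", None)
    if getter is not None:
        for directory in getter():
            if directory not in dirs:
                dirs.append(directory)
    return dirs


for directory in site_packages_dirs():
    print(directory)
```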
Alternatively, you can modify sys.path, which is a list of directories where Python looks for libraries when you use the import statement. Your site-packages directory is already in this list, but you can add (or remove) entries yourself. For example, if you wanted to put your MyLib.py file in /usr/local/pythonModules, you could say
import sys
sys.path.append("/usr/local/pythonModules")
import MyLib
something = MyLib.myfunc()
Finally, you could use the PYTHONPATH environment variable to indicate the directory where your MyLib.py is located.
However, I recommend simply placing your MyLib.py file in the site-packages directory, as described above.
No one has mentioned using .pth files in site-packages to abstract away the location.
You will have to place your MyLib.py somewhere in your load path (that is, one of the paths in your sys.path variable), and then you'll be able to import it fine. Your code would look like
import MyLib
MyLib.myfunc()
Generally speaking, you should distribute your packages using distutils (or its successor setuptools, since distutils was removed in Python 3.12) so that they can be easily installed in the proper locations. That would help you here as well.
Also, you might not want to install packages in your global Python install. It's customary (and recommended) to use virtualenv which you can use to create small isolated Python environments that can hold local packages.
It's best you give the whole thing a shot and then ask further questions if you have them.
The private version, from my .profile
export PYTHONPATH=${PYTHONPATH}:$HOME/lib/python
That directory has a subdirectory "msw", so import msw.primes is self-documenting. Alternatively, add your module to a local directory that is already in sys.path.
The Python tutorial section 6 talks about modules, and 6.1.2 talks about the PYTHONPATH, which determines where Python will look for modules you try to import. The tutorial: http://docs.python.org/tutorial/modules.html
What would be the best directory structure strategy to share a utilities module across my python projects? As the common modules would be updated with new functions I would not want to put them in the python install directory.
project1/
project2/
sharedUtils/
From project1 I can not use "import ..\sharedUtils", is there any other way? I would rather not hardcode the "sharedUtils" location
Thanks in advance
Directory structure:
project1/foo.py
sharedUtils/bar.py
With the directories as you've shown them, from foo.py inside the project1 directory you can add the relative path to sharedUtils as follows:
import sys
sys.path.append("../sharedUtils")
import bar
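This avoids hardcoding a C:/../sharedUtils path, and will work as long as you don't change the directory structure. Note that the relative path is resolved against the current working directory, not against the location of foo.py, so the script must be run from inside project1.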
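A variant that does not depend on where the script is started from is to build the path from the module's own location. This is a sketch only; the helper name add_sibling_dir is made up.

```python
import sys
from pathlib import Path


def add_sibling_dir(module_file, dirname):
    """Append <parent of the module's directory>/<dirname> to sys.path and return it."""
    target = str(Path(module_file).resolve().parent.parent / dirname)
    if target not in sys.path:  # avoid duplicate entries on repeated calls
        sys.path.append(target)
    return target


# In project1/foo.py this would be used as:
#     add_sibling_dir(__file__, "sharedUtils")
#     import bar
```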
Make a separate standalone package? And put it in the /site-packages of your python install?
There is also my personal favorite when it comes to development mode: use of symlinks and/or *.pth files.
Suppose you have sharedUtils/utils_foo and sharedUtils/utils_bar.
You could edit your PYTHONPATH to include sharedUtils, then import them in project1 and project2 using
import utils_foo
import utils_bar
etc.
In Linux you could do that by editing ~/.profile with something like this:
PYTHONPATH=/path/to/sharedUtils:/other/paths
export PYTHONPATH
Using the PYTHONPATH environment variable affects the directories that python searches when looking for modules. Since every user can set his own PYTHONPATH, this solution is good for personal projects.
If you want all users on the machine to be able to import modules in sharedUtils, then you can achieve this by using a .pth file. Exactly where you put the .pth file may depend on your python distribution. See Using .pth files for Python development.
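To show what a .pth file does, here is a small sketch that writes one and asks the site module to process it. The file name shared_utils.pth and the temporary directories are stand-ins I made up; in a real setup the .pth file would sit in a site-packages directory and be picked up automatically at interpreter startup, with no call to site.addsitedir() needed.

```python
import os
import site
import sys
import tempfile


def register_with_pth(site_dir, target_dir, pth_name="shared_utils.pth"):
    """Write a .pth file naming target_dir, then have the site module process site_dir."""
    pth_path = os.path.join(site_dir, pth_name)
    with open(pth_path, "w", encoding="utf-8") as fh:
        fh.write(target_dir + "\n")  # one directory per line
    # Adds site_dir to sys.path and processes every .pth file found in it.
    site.addsitedir(site_dir)
    return pth_path


# Throwaway directories standing in for site-packages and sharedUtils.
demo_site = tempfile.mkdtemp()
demo_shared = tempfile.mkdtemp()
register_with_pth(demo_site, demo_shared)
```

After this runs, modules placed in demo_shared can be imported by name, which is the effect a .pth file in site-packages has for every user of that Python installation.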