Is module __file__ attribute absolute or relative? - python

I'm having trouble understanding __file__. From what I understand, __file__ returns the absolute path from which the module was loaded.
I'm having problem producing this: I have a abc.py with one statement print __file__, running from /d/projects/ python abc.py returns abc.py. running from /d/ returns projects/abc.py. Any reasons why?

From the documentation:
__file__ is the pathname of the file from which the module was loaded, if it was loaded from a file. The __file__ attribute is not present for C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file.
From the mailing list thread linked by #kindall in a comment to the question:
I haven't tried to repro this particular example, but the reason is
that we don't want to have to call getpwd() on every import nor do we
want to have some kind of in-process variable to cache the current
directory. (getpwd() is relatively slow and can sometimes fail
outright, and trying to cache it has a certain risk of being wrong.)
What we do instead, is code in site.py that walks over the elements of
sys.path and turns them into absolute paths. However this code runs
before '' is inserted in the front of sys.path, so that the initial
value of sys.path is ''.
For the rest of this, consider sys.path not to include ''.
So, if you are outside the part of sys.path that contains the module, you'll get an absolute path. If you are inside the part of sys.path that contains the module, you'll get a relative path.
If you load a module in the current directory, and the current directory isn't in sys.path, you'll get an absolute path.
If you load a module in the current directory, and the current directory is in sys.path, you'll get a relative path.

__file__ is absolute since Python 3.4, except when executing a script directly using a relative path:
Module __file__ attributes (and related values) should now always contain absolute paths by default, with the sole exception of __main__.__file__ when a script has been executed directly using a relative path. (Contributed by Brett Cannon in bpo-18416.)
Not sure if it resolves symlinks though.
Example of passing a relative path:
$ python script.py

Late simple example:
from os import path, getcwd, chdir
def print_my_path():
print('cwd: {}'.format(getcwd()))
print('__file__:{}'.format(__file__))
print('abspath: {}'.format(path.abspath(__file__)))
print_my_path()
chdir('..')
print_my_path()
Under Python-2.*, the second call incorrectly determines the path.abspath(__file__) based on the current directory:
cwd: C:\codes\py
__file__:cwd_mayhem.py
abspath: C:\codes\py\cwd_mayhem.py
cwd: C:\codes
__file__:cwd_mayhem.py
abspath: C:\codes\cwd_mayhem.py
As noted by #techtonik, in Python 3.4+, this will work fine since __file__ returns an absolute path.

With the help of the of Guido mail provided by #kindall, we can understand the standard import process as trying to find the module in each member of sys.path, and file as the result of this lookup (more details in PyMOTW Modules and Imports.). So if the module is located in an absolute path in sys.path the result is absolute, but if it is located in a relative path in sys.path the result is relative.
Now the site.py startup file takes care of delivering only absolute path in sys.path, except the initial '', so if you don't change it by other means than setting the PYTHONPATH (whose path are also made absolute, before prefixing sys.path), you will get always an absolute path, but when the module is accessed through the current directory.
Now if you trick sys.path in a funny way you can get anything.
As example if you have a sample module foo.py in /tmp/ with the code:
import sys
print(sys.path)
print (__file__)
If you go in /tmp you get:
>>> import foo
['', '/tmp', '/usr/lib/python3.3', ...]
./foo.py
When in in /home/user, if you add /tmp your PYTHONPATH you get:
>>> import foo
['', '/tmp', '/usr/lib/python3.3', ...]
/tmp/foo.py
Even if you add ../../tmp, it will be normalized and the result is the same.
But if instead of using PYTHONPATH you use directly some funny path
you get a result as funny as the cause.
>>> import sys
>>> sys.path.append('../../tmp')
>>> import foo
['', '/usr/lib/python3.3', .... , '../../tmp']
../../tmp/foo.py
Guido explains in the above cited thread, why python do not try to transform all entries in absolute paths:
we don't want to have to call getpwd() on every import ....
getpwd() is relatively slow and can sometimes fail outright,
So your path is used as it is.

__file__ can be relative or absolute depending on the python version used and whether the module is executed directly or not.
TL;DR:
From python 3.5 to 3.8, __file__ is set to the relative path of the module w.r.t. the current working directory if the module is called directly. Otherwise it is set to the absolute path.
In python 3.9 and 3.10, __file__ is set to the absolute path even if the corresponding module is executed directly.
For instance, having the following setup (inspired by this answer):
# x.py:
from pathlib import Path
import y
print(__file__)
print(Path(__file__))
print(Path(__file__).resolve())
# y.py:
from pathlib import Path
print(__file__)
print(Path(__file__))
with x.py and y.py in the same directory. The different outputs after going to this directory and executing:
python x.py
are:
Python 3.5 - 3.8:
D:\py_tests\y.py
D:\py_tests\y.py
x.py
x.py
D:\py_tests\x.py
Python 3.9 - 3.10
D:\py_tests\y.py
D:\py_tests\y.py
D:\py_tests\x.py
D:\py_tests\x.py
D:\py_tests\x.py
Note: this tests have been done in windows 10

Related

Python help working with packages and modules

I need some help with working with a folder structure in python. I was given an structure like this:
/main-folder
/assets
somefiles.txt
/integrations
/module-folder
__init__.py
ingestion.py
__init__.py
models.py
Inside ingestion.py I have:
import os
from models import MyModel
PROJECT_DIR = os.path.dirname(os.path.dirname(os.path.dirname(__file__)))
some_function()
some_processing()
if __name__ == "__main__":
some_function()
Both __init__.py mentioned above are empty.
So I need to process some info and use the models module to store them. when trying to execute intestion.py directly from its dir it says: No module named 'models'. So I'm guessing I have to execute the whole thing as a package. I have no idea how should I import a module located above the package and can't touch the structure.
Any help woud be appreciated.
What you have to do is to add the module's directory to the PYTHONPATH environment variable. If you don't want to do this however, You can modify the sys.path list in your program where the Python interpreter searches for the modules to import, the python documentation says:
When a module named spam is imported, the interpreter first searches for a built-in module with that name. If not found, it then searches for a file named spam.py in a list of directories given by the variable sys.path. sys.path is initialized from these locations:
the directory containing the input script (or the current directory).
PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
the installation-dependent default.
After initialization, Python programs can modify sys.path. The directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory. This is an error unless the replacement is intended.
Knowing this, you can do the following in your program:
import sys
# Add the main-folder folder path to the sys.path list
sys.path.append('/path/to/main-folder/')
# Now you can import your module
from main-folder import models
# Or just
import main-folder

Purpose of some boilerplate code in __main__.py

I've seen the following code in a couple Python projects, in __main__.py. Could someone explain the purpose? Of course it puts the directory containing __main__.py at the head of sys.path, but why? And why the tests (__package__ is None and not hasattr(sys, 'frozen')? Also, in the sys.path.insert, why is os.path.dirname called twice?
import sys
if __package__ is None and not hasattr(sys, 'frozen'):
# direct call of __main__.py
import os.path
path = os.path.realpath(os.path.abspath(__file__))
sys.path.insert(0, os.path.dirname(os.path.dirname(path)))
os.path.dirname(os.path.dirname(path)) - Gets the grand-parent directory (the directory containing the directory of the given path variable); this is being added to the system's PATH variable.
os.path.realpath(os.path.abspath(__file__)) - Gets the realpath (resolves symbolic linking) of the absolute path of the running file.
Through this method, the project can now execute binary files that are included in that grandparent directory without needing to prefix the binary executable.
Sidenote: Without context of where you see this code, it's hard to give more of an answer as to why its used.
The test for __package__ lets the code run when package/__main__.py has been run with a command like python __main__.py or python package/ (naming the file directly or naming the package folder's path), not the more normal way of running the main module of a package python -m package. The other check (for sys.frozen) tests if the package has been packed up with something like py2exe into a single file, rather than being in a normal file system.
What the code does is put the parent folder of the package into sys.path. That is, if __main__.py is located at /some/path/to/package/__main__.py, the code will put /some/path/to in sys.path. Each call to dirname strips off one item off the right side of the path ("/some/path/to/package/__main__.py" => "/some/path/to/package" => "/some/path/to").

Path Changed in Python Script Executed by Another Python Script

A script at /foo/bar.py tries to run a second script /main.py using the subprocess module. Although main.py runs fine with python main.py in the Windows command prompt, running bar.py which calls main.py causes an error
ConfigParser.NoSectionError: No section: 'user'
Why is there now a problem with the path to settings.ini, and how can we fix it?
~/settings.ini
[user]
id: helloworld
~/foo/bar.py
subprocess.Popen([sys.executable, "../main.py"])
~/main.py
Config = ConfigParser.ConfigParser()
Config.read("settings.ini")
userId = Config.get('user', 'id')
If settings.ini is presumed to be in the same directory as main.py you can deduce its full path from __file__ and read the settings using the full path.
main.py:
import os
ini_path = os.path.join(os.path.dirname(__file__), "settings.ini")
Config = ConfigParser.ConfigParser()
Config.read(ini_path)
Check that the read() method returns a non-empty list otherwise it means that settings.ini is not found. Relative paths are resolved relative to the current working directory (place where you've run python), not script's directory (where bar.py is stored).
You should probably use appdirs.user_config_dir(), to get the directory where to put user's configuration files. Different OSes such as Windows, macOS, Linux distributions may use different defaults. appdirs follows conventions that are appropriate for a given OS.
If you want to get data stored in a file that is packaged with your installed Python modules then you could use pkgutil.get_data() or setuptools' pkg_resources.resource_string() that work even if your package is inside an archive. It is not recommended to construct the paths manually but if you need the directory with the current script then you could put get_script_dir() function into your script—it is more general than os.path.dirname(os.path.abspath(__file__)). If the relative position is always the same then you could use:
#!/usr/bin/env python
import os
import subprocess
import sys
main_script_dir = os.path.join(get_script_dir(), os.pardir)
subprocess.check_call([sys.executable, '-m', 'main'], cwd=main_script_dir)
See Python : How to access file from different directory

Get path of file being run after cd to another directory in python

I have following simple script test.py in directory /Users/apps:-
import os
os.chdir("/Users/apps/update/diiferent_path")
path = os.path.dirname(os.path.abspath(__file__))
print(path)
I am getting path value as /Users/apps/update/diiferent_path because I have changed the directory. How can I get test.py path i.e. /Users/apps after changing directory.
It's impossible to do it after the chdir (unless, of course, if __file__ already contains the absolute path), because the information is lost. There is no such thing as what was the current directory when the process started stored anywhere automatically. But you can store it manually by calling os.path.abspath(__file__) before calling the chdir.
I agree with pts post that storing the path is the way to go. It best shows your intention of getting the original path, despite what modifications are done throughout the script.
For reference sake two workarounds. The path changes because the script is executed by name. The interpreter then finds the file in the current directory. The current directory changes, and thus the absolute path of both __file__ and sys.argv.
One way around that is to call the script with an absolute path. Another is to realize that the path of the script is automatically added to the PYTHONPATH, i.e. to sys.path. The following script illustrates that:
#!/usr/bin/python
import os
import sys
os.chdir("/usr/bin")
print(os.path.abspath(__file__))
print(os.path.abspath(sys.argv[0]))
print(sys.path[0])
The output:
# /tmp$ ./so24323731.py
/usr/bin/so24323731.py
/usr/bin/so24323731.py
/tmp
# /tmp$ /tmp/so24323731.py
/tmp/so24323731.py
/tmp/so24323731.py
/tmp

what does the __file__ variable mean/do?

import os
A = os.path.join(os.path.dirname(__file__), '..')
B = os.path.dirname(os.path.realpath(__file__))
C = os.path.abspath(os.path.dirname(__file__))
I usually just hard-wire these with the actual path. But there is a reason for these statements that determine path at runtime, and I would really like to understand the os.path module so that I can start using it.
When a module is loaded from a file in Python, __file__ is set to its path. You can then use that with other functions to find the directory that the file is located in.
Taking your examples one at a time:
A = os.path.join(os.path.dirname(__file__), '..')
# A is the parent directory of the directory where program resides.
B = os.path.dirname(os.path.realpath(__file__))
# B is the canonicalised (?) directory where the program resides.
C = os.path.abspath(os.path.dirname(__file__))
# C is the absolute path of the directory where the program resides.
You can see the various values returned from these here:
import os
print(__file__)
print(os.path.join(os.path.dirname(__file__), '..'))
print(os.path.dirname(os.path.realpath(__file__)))
print(os.path.abspath(os.path.dirname(__file__)))
and make sure you run it from different locations (such as ./text.py, ~/python/text.py and so forth) to see what difference that makes.
I just want to address some confusion first. __file__ is not a wildcard it is an attribute. Double underscore attributes and methods are considered to be "special" by convention and serve a special purpose.
http://docs.python.org/reference/datamodel.html shows many of the special methods and attributes, if not all of them.
In this case __file__ is an attribute of a module (a module object). In Python a .py file is a module. So import amodule will have an attribute of __file__ which means different things under difference circumstances.
Taken from the docs:
__file__ is the pathname of the file from which the module was loaded, if it was loaded from a file. The __file__ attribute is not present
for C modules that are statically linked into the interpreter; for
extension modules loaded dynamically from a shared library, it is the
pathname of the shared library file.
In your case the module is accessing it's own __file__ attribute in the global namespace.
To see this in action try:
# file: test.py
print globals()
print __file__
And run:
python test.py
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__file__':
'test_print__file__.py', '__doc__': None, '__package__': None}
test_print__file__.py
Per the documentation:
__file__ is the pathname of the file from which the module was
loaded, if it was loaded from a file. The __file__ attribute is not
present for C modules that are statically linked into the interpreter;
for extension modules loaded dynamically from a shared library, it is
the pathname of the shared library file.
and also:
__file__ is to be the “path” to the file unless the module is built-in (and thus listed in sys.builtin_module_names) in which case the attribute is not set.
Just going to add a quick note here (mostly answering the question's title rather than its description) about a change which can confuse some people. As of Python 3.4 there has been a slight change in how the __file__ behaves:
It's set to the relative path of the module in which it's used, if that module is executed directly.
It's set to the absolute path of the file otherwise.
Module __file__ attributes (and related values) should now always contain absolute paths by default, with the sole exception of __main__.__file__ when a script has been executed directly using a relative path. (Contributed by Brett Cannon in issue 18416.)
Example:
Calling module x directly and module y indirectly:
# x.py:
from pathlib import Path
import y
print(__file__)
print(Path(__file__))
print(Path(__file__).resolve())
# y.py:
from pathlib import Path
print(__file__)
print(Path(__file__))
Running python3 x.py will output:
/home/aderchox/mytest/y.py
/home/aderchox/mytest/y.py
x.py
x.py
/home/aderchox/mytest/x.py
Using __file__ combined with various os.path modules lets all paths be relative the current module's directory location. This allows your modules/projects to be portable to other machines.
In your project you do:
A = '/Users/myname/Projects/mydevproject/somefile.txt'
and then try to deploy it to your server with a deployments directory like /home/web/mydevproject/ then your code won't be able to find the paths correctly.
To add to aderchox's answer, the behaviour of the __file__ variable was again changed in Python 3.9, and now it's an absolute Path in all cases
Running the same example (but copying it here for self consistency)
# x.py:
from pathlib import Path
import y
print(__file__)
print(Path(__file__))
print(Path(__file__).resolve())
# y.py:
from pathlib import Path
print(__file__)
print(Path(__file__))
Now running x.py with two different versions of the interpreter
$ python3.8 x.py
/private/tmp/y.py
/private/tmp/y.py
x.py
x.py
/private/tmp/x.py
$ python3.9 x.py
/private/tmp/y.py
/private/tmp/y.py
/private/tmp/x.py
/private/tmp/x.py
/private/tmp/x.py
source:
https://docs.python.org/3/whatsnew/3.9.html#other-language-changes

Categories