Python readline module folder name completion

I am trying to implement folder name completion for one of my programs and I am using the readline module for it. I have written a Completer class (file autocompletion.py):
import os.path
import readline

class Completer(object):
    def _match_path(self, text):
        text = readline.get_line_buffer()
        return DirectoryPathCompleter(text).completions()

    def complete(self, text, state):
        matches = self._match_path(text)
        try:
            return matches[state]
        except IndexError:
            return None

class DirectoryPathCompleter(object):
    def __init__(self, text):
        self.text = text

    def completions(self):
        path, rest = os.path.split(self.text)
        if rest == '.' or rest == '..':
            result = [rest + '/']
        else:
            result = [i + '/' for i in get_dirs_under(path) if i.startswith(rest)]
        return result

def get_dirs_under(path):
    if not path:
        path = '.'
    dirpath, dirnames, filenames = os.walk(path).next()
    return dirnames
To try this out, here is a little script, try_readline.py:
#!/usr/bin/env python
from autocompletion import Completer
import readline
readline.parse_and_bind('tab: complete')
readline.set_completer(Completer().complete)
raw_input('Test it! > ')
I can see that it works quite well as long as I do not try to complete dirnames that can only be completed partially. When I create, for example, a directory with the following structure and run the program via the shell:
$ mkdir -p test/bla-foo test/bla-bar test/bla-baz test/something test/else
$ cp autocompletion.py try_readline.py test
$ cd test; tree
.
├── autocompletion.py
├── bla-bar
├── bla-baz
├── bla-foo
├── else
├── something
└── try_readline.py
5 directories, 2 files
$ python try_readline.py
and I try to complete bla- via TAB, I get bla-bla- as a result. I would like the completion to stick with bla- and show the alternatives that can be completed.
How can I achieve this?
EDIT:
Ok, I don't know exactly why this is happening. If I alter the try_readline.py script to look like this:
#!/usr/bin/env python
import readline
readline.parse_and_bind('tab: complete')
raw_input('Test it! > ')
so that I am not using my own Completer, and run it on the same folder structure as shown above, I can see similar behaviour:
Test it! > bla-b<TAB>
Test it! > bla-bla-
Whereas I would expect:
Test it! > bla-b<TAB>
Test it! > bla-ba<TAB><TAB>
bla-bar/ bla-baz/
EDIT 2:
OK, I have another approach now; it took so long because the readline documentation is very sparse... I still don't know exactly why I have to do it this way, but it works.
If I look at the delimiters readline is using, I get:
In [1]: readline.get_completer_delims()
Out[1]: ' \t\n`!@#$^&*()=+[{]}\\|;:\'",<>?'
Especially the \\ part looked odd to me, so I set new delimiters in try_readline.py:
#!/usr/bin/env python
import readline
from autocompletion import Completer
readline.parse_and_bind('tab: complete')
readline.set_completer_delims(' \t\n')
raw_input('Test it! > ')
I am now using the standard readline completion as I don't know exactly how to write a correct Completer... I would like to reuse the standard readline completion function and just filter out any non-directories but I don't know how to extract it from the module.
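For what it's worth, here is a minimal sketch of a directory-only completer built on the two findings above (whitespace-only delimiters, and matching against the text argument rather than the whole line buffer). It reimplements filename completion by hand rather than extracting readline's built-in routine, so treat it as an illustration:
import os
import readline

class DirCompleter(object):
    def complete(self, text, state):
        # With whitespace-only delimiters, 'text' is the whole path fragment
        # typed so far, not just the piece after the last '-'.
        dirname, rest = os.path.split(text)
        lookup = dirname or '.'
        try:
            entries = os.listdir(lookup)
        except OSError:
            entries = []
        # Keep only directories whose names extend the typed fragment.
        matches = [os.path.join(dirname, name) + '/'
                   for name in sorted(entries)
                   if name.startswith(rest)
                   and os.path.isdir(os.path.join(lookup, name))]
        try:
            return matches[state]
        except IndexError:
            return None

readline.set_completer_delims(' \t\n')
readline.parse_and_bind('tab: complete')
readline.set_completer(DirCompleter().complete)
raw_input('Test it! > ')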

Related

How do you properly integrate unit tests for file parsing with pytest?

I'm trying to test file parsing with pytest. I have a directory tree that looks something like this for my project:
project/
    cool_code.py
    setup.py
    setup.cfg
    test/
        test_read_files.py
        test_files/
            data_file1.txt
            data_file2.txt
My setup.py file looks something like this:
from setuptools import setup

setup(
    name = 'project',
    description = 'The coolest project ever!',
    setup_requires = ['pytest-runner'],
    tests_require = ['pytest'],
)
My setup.cfg file looks something like this:
[aliases]
test=pytest
I've written several unit tests with pytest to verify that files are properly read. They work fine when I run pytest from within the "test" directory. However, if I execute any of the following from my project directory, the tests fail because they cannot find data files in test_files:
>> py.test
>> python setup.py pytest
The test seems to be sensitive to the directory from which pytest is executed.
How can I get pytest unit tests to discover the files in "data_files" for parsing when I call it from either the test directory or the project root directory?
One solution is to define a rootdir fixture with the path to the test directory, and reference all data files relative to this. This can be done by creating a test/conftest.py (if not already created) with some code like this:
import os
import pytest

@pytest.fixture
def rootdir():
    return os.path.dirname(os.path.abspath(__file__))
Then use os.path.join in your tests to get absolute paths to test files:
import os

def test_read_favorite_color(rootdir):
    test_file = os.path.join(rootdir, 'test_files/favorite_color.csv')
    data = read_favorite_color(test_file)
    # ...
One solution is to try multiple paths to find the files.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from coolprogram import *
import os

def test_file_locations():
    """Possible locations where test data could be found."""
    return(['./test_files',
            './tests/test_files',
            ])

def find_file(filename):
    """ Searches for a data file to use in tests """
    for location in test_file_locations():
        filepath = os.path.join(location, filename)
        if os.path.exists(filepath):
            return(filepath)
    raise IOError('Could not find test file.')

def test_read_favorite_color():
    """ Test that favorite color is read properly """
    filename = 'favorite_color.csv'
    test_file = find_file(filename)
    data = read_favorite_color(test_file)
    assert(data['first_name'][1] == 'King')
    assert(data['last_name'][1] == 'Arthur')
    assert(data['correct_answers'][1] == 2)
    assert(data['cross_bridge'][1] == True)
    assert(data['favorite_color'][1] == 'green')
One way is to pass a dictionary of command name and custom command class to the cmdclass argument of the setup function.
Another way is the one shown here, posted for quick reference:
pytest-runner will install itself on every invocation of setup.py. In some cases, this causes delays for invocations of setup.py that will never invoke pytest-runner. To help avoid this contingency, consider requiring pytest-runner only when pytest is invoked:
import sys

needs_pytest = {'pytest', 'test', 'ptr'}.intersection(sys.argv)
pytest_runner = ['pytest-runner'] if needs_pytest else []
# ...
setup(
    # ...
    setup_requires=[
        # ... (other setup requirements)
    ] + pytest_runner,
)
Make sure all the data you read in your test module is addressed relative to the directory containing setup.py.
In the OP's case the data file path would be test/test_files/data_file1.txt.
I made a project with the same structure, read data_file1.txt with some text in it, and it works for me.
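As a minimal sketch of that layout assumption (the relative path resolves only when the test run is started from the directory containing setup.py):
import os

def test_data_file_has_text():
    # Relative to the setup.py directory, i.e. the project root
    data_file = os.path.join('test', 'test_files', 'data_file1.txt')
    with open(data_file) as f:
        assert f.read()  # the file has some text in it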

Custom Module Import Issue

I can't seem to import my own custom NYT module. My project structure is as follows, and I'm on a Mac:
articulation/
    articulation/
        __init__.py   # empty
        lib/
            nyt.py
            __init__.py   # empty
        tests/
            test_nyt.py
            __init__.py   # empty
When I try running python articulation/tests/test_nyt.py from that first parent directory, I get
File "articulation/tests/test_nyt.py", line 5, in <module>
from articulation.lib.nyt import NYT
ImportError: No module named articulation.lib.nyt
I also tried
(venv) Ericas-MacBook-Pro:articulation edohring$ Python -m articulation/tests/test_nyt.py
/Users/edohring/Desktop/articulation/venv/bin/Python: Import by filename is not supported.
test_nyt.py
import sys
sys.path.insert(0, '../../')

import unittest
# from mock import patch
# TODO: store example as fixture and complete test
from articulation.lib.nyt import NYT

class TestNYT(unittest.TestCase):
    # @patch('articulation.lib.nyt.NYT.fetch')
    def test_nyt(self):
        print "hi"
        # assert issubclass(NYT, Article)
        # self.assertTrue(sour_surprise.title == '')
nyt.py
from __future__ import division
import regex as re
import string
import urllib2
from collections import Counter
from bs4 import BeautifulSoup
from cookielib import CookieJar

PARSER_TYPE = 'html.parser'

class NYT:
    def __init__(self, title, url):
        self.url = url
        self.title = title
        self.words = get_words(url)

def get_words(url):
    cj = CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    p = opener.open(url)
    soup = BeautifulSoup(p.read(), PARSER_TYPE)
    # title = soup.html.head.title.string
    letters = soup.find_all('p', class_='story-body-text story-content')
    if len(letters) == 0:
        letters = soup.find_all('p', class_='paragraph--story')
    if len(letters) == 0:
        letters = soup.find_all('p', class_='story-body-text')
    words = Counter()
    for element in letters:
        a = element.get_text().split()
        for c in a:
            c = ''.join(ch for ch in c if ch.isalpha())
            c = c.lower()
            if len(c) > 0:
                words[c] += 1
    return words

def test_nyt():
    china_apple_stores = NYT('title_test', 'http://www.nytimes.com/2016/12/29/technology/iphone-china-apple-stores.html?_r=0')
    assert(len(china_apple_stores.words) > 0)
    # print china_apple_stores.words
    fri_brief = NYT('Russia, Syria, 2017: Your Friday Briefing', 'http://www.nytimes.com/2016/12/30/briefing/us-briefing-russia-syria-2017.html')
    assert(fri_brief.title == 'Russia, Syria, 2017: Your Friday Briefing')
    assert(fri_brief.url == 'http://www.nytimes.com/2016/12/30/briefing/us-briefing-russia-syria-2017.html')
    assert(len(fri_brief.words) > 0)
    vet = NYT('title_test', 'http://lens.blogs.nytimes.com/2017/01/03/a-love-story-and-twins-for-a-combat-veteran-amputee/')
    assert(len(vet.words) > 0)
    print "All NYT Tests Passed"

#test_nyt()
I've tried the following and none seem to work - does anyone know how to fix this?
- Adding an __init__.py file to the top directory -> doesn't help
- Entering Memory Python couldn't find this - maybe because I'm using Python 2. If this is the issue I can post more of what I tried.
- Adding sys.path at the top, from the suggestion below
Doing this:
import sys
sys.path.insert(0, '../../')
is usually a bad idea. Sometimes it's useful for when you're testing something, or you have a single-use program that you just need to work for a short time and then you're going to throw away, but in general it's a bad habit to get into because it might stop working once you move directories around or once you give the code to someone else. I would advise you not to let yourself get in the habit of doing that.
The most likely reason to get the kind of error you're seeing is that the directory /Users/edohring/Desktop/articulation does not appear in sys.path. The first thing to do is see what actually is in sys.path, and one good way to do that is to temporarily put these lines at the top of test_nyt.py:
import os.path, sys

for p in sys.path:
    print(p)
    if not os.path.isabs(p):
        print('  (absolute: {})'.format(os.path.abspath(p)))
sys.exit()
Then run
python articulation/tests/test_nyt.py
and look at the output. You will get a line for each directory path that Python looks in to find its modules, and if any of those paths are relative, it will also print out the corresponding absolute path so that there is no confusion. I suspect you will find that /Users/edohring/Desktop/articulation does not appear anywhere in this list.
If that turns out to be the case, the most straightforward (but least future-proof) way to fix it is to run
export PYTHONPATH=".:$PYTHONPATH"
in the shell (not in Python!) before you use Python itself to do anything using your module. Directories named in the PYTHONPATH environment variable will be added to sys.path when Python starts up. This is only a temporary fix, unless you put it in a file like $HOME/.bashrc which will get read by the shell every time you open up a Terminal window. You can read about this and better ways to add the proper directory to sys.path in this question.
Perhaps a better way to run your script is to use the shell command
python -m articulation.tests.test_nyt
This needs to be run in the directory /Users/edohring/Desktop/articulation, or at least that directory needs to appear in sys.path in order for the command to work. But using the -m switch in this way causes Python to handle how it sets up sys.path a little differently, and it may work for you. You can read more about how sys.path is populated in this answer.

Python - Get path of root project structure

I've got a python project with a configuration file in the project root.
The configuration file needs to be accessed in a few different files throughout the project.
So it looks something like: <ROOT>/configuration.conf,
<ROOT>/A/a.py, <ROOT>/A/B/b.py (where a.py and b.py access the configuration file).
What's the best / easiest way to get the path to the project root and the configuration file without depending on which file inside the project I'm in, i.e. without using ../../? It's okay to assume that we know the project root's name.
You can do this how Django does it: define a variable to the Project Root from a file that is in the top-level of the project. For example, if this is what your project structure looks like:
project/
    configuration.conf
    definitions.py
    main.py
    utils.py
In definitions.py you can define (this requires import os):
ROOT_DIR = os.path.dirname(os.path.abspath(__file__)) # This is your Project Root
Thus, with the Project Root known, you can create a variable that points to the location of the configuration (this can be defined anywhere, but a logical place would be to put it in a location where constants are defined - e.g. definitions.py):
CONFIG_PATH = os.path.join(ROOT_DIR, 'configuration.conf') # requires `import os`
Then, you can easily access the constant (in any of the other files) with the import statement (e.g. in utils.py): from definitions import CONFIG_PATH.
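For instance, a hypothetical utils.py under the layout above could read the file like this:
# utils.py (sketch)
from definitions import CONFIG_PATH

with open(CONFIG_PATH) as config_file:
    config_text = config_file.read()  # contents of configuration.conf at the project root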
Other answers advise using a file in the top level of the project. This is not necessary if you use pathlib.Path and parent (Python 3.4 and up). Consider the following directory structure where all files except README.md and utils.py have been omitted.
project
│   README.md
│
└───src
    │   utils.py
    │   ...
    ...
In utils.py we define the following function.
from pathlib import Path

def get_project_root() -> Path:
    return Path(__file__).parent.parent
In any module in the project we can now get the project root as follows.
from src.utils import get_project_root
root = get_project_root()
Benefits: Any module which calls get_project_root can be moved without changing program behavior. Only when the module utils.py is moved do we have to update get_project_root and the imports (refactoring tools can be used to automate this).
All the previous solutions seem to be overly complicated for what I think you need, and often didn't work for me. The following one-liner does what you want:
import os
ROOT_DIR = os.path.abspath(os.curdir)
The code below returns the path up to your project root:
import sys
print(sys.path[1])
To get the path of the "root" module, you can use:
import os
import sys
os.path.dirname(sys.modules['__main__'].__file__)
But more interestingly, if you have a config "object" in your top-most module, you could read from it like so:
app = sys.modules['__main__']
stuff = app.config.somefunc()
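A self-contained sketch of that pattern (Config and somefunc are illustrative names; run as a script, a module is its own '__main__', so this demonstrates the lookup):
import sys

class Config(object):
    def somefunc(self):
        return 'stuff'

config = Config()

# Any imported module could do the same lookup to reach back here:
app = sys.modules['__main__']
print(app.config.somefunc())  # prints 'stuff'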
Try:
ROOT_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
A standard way to achieve this would be to use the pkg_resources module which is part of the setuptools package. setuptools is used to create an install-able python package.
You can use pkg_resources to return the contents of your desired file as a string and you can use pkg_resources to get the actual path of the desired file on your system.
Let's say that you have a package called stackoverflow.
stackoverflow/
|-- app
| `-- __init__.py
`-- resources
|-- bands
| |-- Dream\ Theater
| |-- __init__.py
| |-- King's\ X
| |-- Megadeth
| `-- Rush
`-- __init__.py
3 directories, 7 files
Now let's say that you want to access the file Rush from a module app.run. Use pkg_resources.resource_filename to get the path to Rush and pkg_resources.resource_string to get the contents of Rush; thusly:
import pkg_resources

if __name__ == "__main__":
    print pkg_resources.resource_filename('resources.bands', 'Rush')
    print pkg_resources.resource_string('resources.bands', 'Rush')
The output:
/home/sri/workspace/stackoverflow/resources/bands/Rush
Bass: Geddy Lee
Vocals: Geddy Lee
Guitar: Alex Lifeson
Drums: Neil Peart
This works for all packages in your python path. So if you want to know where lxml.etree exists on your system:
import pkg_resources

if __name__ == "__main__":
    print pkg_resources.resource_filename('lxml', 'etree')
output:
/usr/lib64/python2.7/site-packages/lxml/etree
The point is that you can use this standard method to access files that are installed on your system (e.g. pip install xxx or yum -y install python-xxx) and files that are within the module that you're currently working on.
Simple and Dynamic!
This solution works on any OS and at any level of the directory tree:
Assuming your project folder name is my_project
from pathlib import Path

current_dir = Path(__file__)
project_dir = [p for p in current_dir.parents if p.parts[-1] == 'my_project'][0]
I've recently been trying to do something similar and I have found these answers inadequate for my use cases (a distributed library that needs to detect project root). Mainly I've been battling different environments and platforms, and still haven't found something perfectly universal.
Code local to project
I've seen this example mentioned and used in a few places, Django, etc.
import os
print(os.path.dirname(os.path.abspath(__file__)))
Simple as this is, it only works when the file that the snippet is in is actually part of the project. We do not retrieve the project directory, but instead the snippet's directory.
Similarly, the sys.modules approach breaks down when called from outside the entry point of the application; specifically, I've observed that a child thread cannot determine this without relation back to the 'main' module. I've explicitly put the import inside a function to demonstrate an import from a child thread; moving it to the top level of app.py would fix it.
app/
|-- config
|   |-- __init__.py
|   `-- settings.py
`-- app.py
app.py
#!/usr/bin/env python
import threading

def background_setup():
    # Explicitly importing this from the context of the child thread
    from config import settings
    print(settings.ROOT_DIR)

# Spawn a thread to background preparation tasks
t = threading.Thread(target=background_setup)
t.start()
# Do other things during initialization
t.join()
# Ready to take traffic
settings.py
import os
import sys

ROOT_DIR = None

def setup():
    global ROOT_DIR
    ROOT_DIR = os.path.dirname(sys.modules['__main__'].__file__)
    # Do something slow
Running this program produces an attribute error:
>>> import main
>>> Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python2714\lib\threading.py", line 801, in __bootstrap_inner
    self.run()
  File "C:\Python2714\lib\threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "main.py", line 6, in background_setup
    from config import settings
  File "config\settings.py", line 34, in <module>
    ROOT_DIR = get_root()
  File "config\settings.py", line 31, in get_root
    return os.path.dirname(sys.modules['__main__'].__file__)
AttributeError: 'module' object has no attribute '__file__'
...hence a threading-based solution
Location independent
Using the same application structure as before but modifying settings.py
import os
import sys
import inspect
import platform
import threading

ROOT_DIR = None

def setup():
    main_id = None
    for t in threading.enumerate():
        if t.name == 'MainThread':
            main_id = t.ident
            break
    if not main_id:
        raise RuntimeError("Main thread exited before execution")
    current_main_frame = sys._current_frames()[main_id]
    base_frame = inspect.getouterframes(current_main_frame)[-1]
    if platform.system() == 'Windows':
        filename = base_frame.filename
    else:
        filename = base_frame[0].f_code.co_filename
    global ROOT_DIR
    ROOT_DIR = os.path.dirname(os.path.abspath(filename))
Breaking this down:
First we want to accurately find the thread ID of the main thread. In Python 3.4+ the threading library has threading.main_thread(); however, not everybody is on 3.4+, so we search through all threads looking for the main thread and save its ID. If the main thread has already exited, it won't be listed in threading.enumerate(). We raise a RuntimeError() in this case until I find a better solution.
main_id = None
for t in threading.enumerate():
    if t.name == 'MainThread':
        main_id = t.ident
        break
if not main_id:
    raise RuntimeError("Main thread exited before execution")
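On Python 3.4+, as noted above, this search collapses to a single call; a sketch:
import threading

main_id = threading.main_thread().ident  # requires Python 3.4+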
Next we find the very first stack frame of the main thread. Using the cPython specific function sys._current_frames() we get a dictionary of every thread's current stack frame. Then utilizing inspect.getouterframes() we can retrieve the entire stack for the main thread and the very first frame.
current_main_frame = sys._current_frames()[main_id]
base_frame = inspect.getouterframes(current_main_frame)[-1]
Finally, the differences between Windows and Linux implementations of inspect.getouterframes() need to be handled. Using the cleaned up filename, os.path.abspath() and os.path.dirname() clean things up.
if platform.system() == 'Windows':
    filename = base_frame.filename
else:
    filename = base_frame[0].f_code.co_filename

global ROOT_DIR
ROOT_DIR = os.path.dirname(os.path.abspath(filename))
So far I've tested this on Python 2.7 and 3.6 on Windows, as well as Python 3.4 on WSL.
I settled on the following for myself.
I need to get the path to 'MyProject/drivers' from the main file.
MyProject/
├─── RootPackge/
│ ├── __init__.py
│ ├── main.py
│ └── definitions.py
│
├─── drivers/
│ └── geckodriver.exe
│
├── requirements.txt
└── setup.py
definitions.py is placed not in the root of the project, but in the root of the main package:
from pathlib import Path
ROOT_DIR = Path(__file__).parent.parent
Use ROOT_DIR:
main.py
# imports must be relative,
# not from the root of the project,
# but from the root of the main package.
# Not this way:
# from RootPackge.definitions import ROOT_DIR
# But like this:
from definitions import ROOT_DIR
# Here we use ROOT_DIR
# get path to MyProject/drivers
drivers_dir = ROOT_DIR / 'drivers'
# Thus, you can get the path to any directory
# or file from the project root
driver = webdriver.Firefox(drivers_dir)
driver.get('http://www.google.com')
Then PYTHONPATH will not be needed to access the definitions.py file.
Works in PyCharm:
run file 'main.py' (ctrl + shift + F10 in Windows)
Works in CLI from project root:
$ py RootPackge/main.py
Works in CLI from RootPackge:
$ cd RootPackge
$ py main.py
Works from directories above project:
$ cd ../../../../
$ py MyWork/PythoProjects/MyProject/RootPackge/main.py
Works from anywhere if you give an absolute path to the main file.
Doesn't depend on venv.
Here is a package that solves that problem: from-root
pip install from-root
from from_root import from_root, from_here
# path to config file at the root of your project
# (no matter from what file of the project the function is called!)
config_path = from_root('config.json')
# path to the data.csv file at the same directory where the callee script is located
# (has nothing to do with the current working directory)
data_path = from_here('data.csv')
Check out the link above and read the readme to see more use cases
I struggled with this problem too until I came to this solution.
This is the cleanest solution in my opinion.
In your setup.py, add packages:
setup(
    name='package_name',
    version='0.0.1',
    # ...
    packages=['package_name'],
    # ...
)
In your python_script.py
import pkg_resources
import os

resource_package = pkg_resources.get_distribution('package_name').location
config_path = os.path.join(resource_package, 'configuration.conf')
This worked for me using a standard PyCharm project with my virtual environment (venv) under the project root directory.
The code below isn't the prettiest, but it consistently gets the project root. It returns the full directory path to venv from the VIRTUAL_ENV environment variable, e.g. /Users/NAME/documents/PROJECT/venv.
It then splits the path at the last /, giving an array with two elements. The first element will be the project path, e.g. /Users/NAME/documents/PROJECT.
import os
print(os.path.split(os.environ['VIRTUAL_ENV'])[0])
Just an example: I want to run runio.py from within helper1.py
Project tree example:
myproject_root
- modules_dir/helpers_dir/helper1.py
- tools_dir/runio.py
Get project root:
import os
rootdir = os.path.dirname(os.path.realpath(__file__)).rsplit(os.sep, 2)[0]
Build path to script:
runme = os.path.join(rootdir, "tools_dir", "runio.py")
execfile(runme)
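Note that execfile is Python 2 only; on Python 3, a rough equivalent of that last call is:
# Python 3 replacement for execfile(runme)
with open(runme) as f:
    exec(compile(f.read(), runme, 'exec'))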
I used the ../ method to fetch the current project path.
Example:
Project1 -- D:\projects
    src
        ConfigurationFiles
            Configuration.cfg

Path = "../src/ConfigurationFiles/Configuration.cfg"
I had to implement a custom solution because it's not as simple as you might think.
My solution is based on stack trace inspection (inspect.stack()) + sys.path and works fine no matter the location of the python module in which the function is invoked, nor the interpreter (I tried running it in PyCharm, in a poetry shell, and others...). This is the full implementation, with comments:
import inspect
import os
import sys
from inspect import FrameInfo
from pathlib import Path

def get_project_root_dir() -> str:
    """
    Returns the name of the project root directory.

    :return: Project root directory name
    """
    # stack trace history related to the call of this function
    frame_stack: [FrameInfo] = inspect.stack()
    # get info about the module that has invoked this function
    # (index=0 is always this very module, index=1 is fine as long as this function is not called
    # by some other function in this module)
    frame_info: FrameInfo = frame_stack[1]
    # if there are multiple calls in the stacktrace of this very module, we have to skip those and take the first
    # one which comes from another module
    if frame_info.filename == __file__:
        for frame in frame_stack:
            if frame.filename != __file__:
                frame_info = frame
                break
    # path of the module that has invoked this function
    caller_path: str = frame_info.filename
    # absolute path of the module that has invoked this function
    caller_absolute_path: str = os.path.abspath(caller_path)
    # get the top most directory path which contains the invoker module
    paths: [str] = [p for p in sys.path if p in caller_absolute_path]
    paths.sort(key=lambda p: len(p))
    caller_root_path: str = paths[0]
    if not os.path.isabs(caller_path):
        # file name of the invoker module (eg: "mymodule.py")
        caller_module_name: str = Path(caller_path).name
        # this piece represents a subpath in the project directory
        # (eg. if the root folder is "myproject" and this function has been called from myproject/foo/bar/mymodule.py
        # this will be "foo/bar")
        project_related_folders: str = caller_path.replace(os.sep + caller_module_name, '')
        # fix root path by removing the undesired subpath
        caller_root_path = caller_root_path.replace(project_related_folders, '')
    dir_name: str = Path(caller_root_path).name
    return dir_name
Here's my take on this issue.
I have a simple use-case that bugged me for a while. I tried a few solutions, but I didn't find any of them flexible enough.
So here's what I figured out:
- Create a blank python file in the root dir -> I call this beacon.py (assuming that the project root is in the PYTHONPATH, so it can be imported).
- Add a few lines to my module/class, which I call here not_in_root.py. This will import the beacon.py module and get the path to that module.
Here's an example project structure
this_project
├── beacon.py
├── lv1
│   ├── __init__.py
│   └── lv2
│       ├── __init__.py
│       └── not_in_root.py
...
...
The content of not_in_root.py:
import os
from pathlib import Path

class Config:
    try:
        import beacon
        print(f"'import beacon' -> {os.path.dirname(os.path.abspath(beacon.__file__))}")  # only for demo purposes
        print(f"'import beacon' -> {Path(beacon.__file__).parent.resolve()}")  # only for demo purposes
    except ModuleNotFoundError as e:
        print(f"ModuleNotFoundError: import beacon failed with {e}. "
              f"Please create a file called beacon.py and place it in the project root directory.")
    project_root = Path(beacon.__file__).parent.resolve()
    input_dir = project_root / 'input'
    output_dir = project_root / 'output'

if __name__ == '__main__':
    c = Config()
    print(f"Config.project_root: {c.project_root}")
    print(f"Config.input_dir: {c.input_dir}")
    print(f"Config.output_dir: {c.output_dir}")
The output would be
/home/xyz/projects/this_project/venv/bin/python /home/xyz/projects/this_project/lv1/lv2/not_in_root.py
'import beacon' -> /home/xyz/projects/this_project
'import beacon' -> /home/xyz/projects/this_project
Config.project_root: /home/xyz/projects/this_project
Config.input_dir: /home/xyz/projects/this_project/input
Config.output_dir: /home/xyz/projects/this_project/output
Of course, it doesn't need to be called beacon.py, nor does it need to be empty; essentially any importable python file would do, as long as it's in the root directory.
Using an empty .py file sort of guarantees that it will not be moved elsewhere due to some future refactoring.
Cheers
If you are working with anaconda-project, you can query the PROJECT_ROOT from the environment variable --> os.getenv('PROJECT_ROOT'). This works only if the script is executed via anaconda-project run.
If you do not want your script run by anaconda-project, you can query the absolute path of the executable binary of the Python interpreter you are using and extract the path string up to (but excluding) the envs directory. For example, the python interpreter of my conda env is located at:
/home/user/project_root/envs/default/bin/python
import os
import sys

# You can first retrieve the env variable PROJECT_DIR.
# If not set, get the python interpreter location and strip off the string up to and including envs...
if os.getenv('PROJECT_DIR'):
    PROJECT_DIR = os.getenv('PROJECT_DIR')
else:
    python_path = sys.executable
    path_rem = os.path.join('envs', 'default', 'bin', 'python')
    PROJECT_DIR = python_path.split(path_rem)[0]
This works only with conda-project, with the fixed project structure of an anaconda-project.
I ended up needing to do this in various situations where different answers worked correctly, others didn't, or worked only with various modifications, so I made this package to work for most situations:
pip install get-project-root
from get_project_root import root_path
project_root = root_path(ignore_cwd=False)
# >> "C:/Users/person/source/some_project/"
https://pypi.org/project/get-project-root/
This is not exactly the answer to this question, but it might help someone. In fact, if you know the names of the folders, you can do this.
import os
import sys

TMP_DEL = '×'
PTH_DEL = '\\'

def cleanPath(pth):
    pth = pth.replace('/', TMP_DEL)
    pth = pth.replace('\\', TMP_DEL)
    return pth

def listPath():
    return sys.path

def getPath(__file__):
    return os.path.abspath(os.path.dirname(__file__))

def getRootByName(__file__, dirName):
    return getSpecificParentDir(__file__, dirName)

def getSpecificParentDir(__file__, dirName):
    pth = cleanPath(getPath(__file__))
    dirName = cleanPath(dirName)
    candidate = f'{TMP_DEL}{dirName}{TMP_DEL}'
    if candidate in pth:
        pth = (pth.split(candidate)[0] + TMP_DEL +
               dirName).replace(TMP_DEL * 2, TMP_DEL)
        return pth.replace(TMP_DEL, PTH_DEL)
    return None

def getSpecificChildDir(__file__, dirName):
    for x in [x[0] for x in os.walk(getPath(__file__))]:
        dirName = cleanPath(dirName)
        x = cleanPath(x)
        if TMP_DEL in x:
            if x.split(TMP_DEL)[-1] == dirName:
                return x.replace(TMP_DEL, PTH_DEL)
    return None
List available folders:
print(listPath())
Usage:
#Directories
#ProjectRootFolder/.../CurrentFolder/.../SubFolder
print(getPath(__file__))
# c:\ProjectRootFolder\...\CurrentFolder
print(getRootByName(__file__, 'ProjectRootFolder'))
# c:\ProjectRootFolder
print(getSpecificParentDir(__file__, 'ProjectRootFolder'))
# c:\ProjectRootFolder
print(getSpecificParentDir(__file__, 'CurrentFolder'))
# None
print(getSpecificChildDir(__file__, 'SubFolder'))
# c:\ProjectRootFolder\...\CurrentFolder\...\SubFolder
One-line solution
Hi all! I have been having this issue forever as well, and none of the solutions worked for me, so I used an approach similar to what here::here() does in R.
- Install the groo package: pip install groo-ozika
- Place a hidden file in your root directory, e.g. .my_hidden_root_file.
- Then, from anywhere lower in the directory hierarchy (i.e. within the root), run the following:
from groo.groo import get_root
root_folder = get_root(".my_hidden_root_file")
That's it!
It just executes the following function:
def get_root(rootfile):
    import os
    from pathlib import Path
    d = Path(os.getcwd())
    found = 0
    while found == 0:
        if os.path.isfile(os.path.join(d, rootfile)):
            found = 1
        else:
            d = d.parent
    return d
The project root directory does not have __init__.py.
I solved this problem by looking for an ancestor directory that does not have __init__.py.
from functools import lru_cache
from pathlib import Path

@lru_cache()
def get_root_dir() -> str:
    path = Path().cwd()
    while Path(path, "__init__.py").exists():
        path = path.parent
    return str(path)
There are many answers here but I couldn't find something simple that covers all cases so allow me to suggest my solution too:
import pathlib
import os

def get_project_root():
    """
    There is no way in python to get the project root. This function uses a trick.
    We know that the function that is currently running is in the project.
    We know that the root project path is in the list of PYTHONPATH.
    Look for any path in the PYTHONPATH list that is contained in this function's path.
    Lastly we filter and take the shortest path, because we are looking for the root.
    :return: path to project root
    """
    apth = str(pathlib.Path().absolute())
    ppth = os.environ['PYTHONPATH'].split(':')
    matches = [x for x in ppth if x in apth]
    project_root = min(matches, key=len)
    return project_root
Important: This solution requires you to run the file as a module with python -m pkg.file and not as a script like python file.py.
import sys
import os.path as op
root_pkg_dirname = op.dirname(sys.modules[__name__.partition('.')[0]].__file__)
Other answers have requirements like depending on an environment variable or the position of another module in the package structure.
As long as you run the script as python -m pkg.file (with the -m), this approach is self-contained and will work in any module of the package, including in the top-level __init__.py file.
import sys
import os.path as op

root_pkg_name, _, _ = __name__.partition('.')
root_pkg_module = sys.modules[root_pkg_name]
root_pkg_dirname = op.dirname(root_pkg_module.__file__)

config_path = op.join(root_pkg_dirname, 'configuration.conf')
It works by taking the first component in the dotted string contained in __name__ and using it as a key in sys.modules which returns the module object of the top-level package. Its __file__ attribute contains the path we want after trimming off /__init__.py using os.path.dirname().
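For example, a small illustration of that lookup:
# Inside pkg/sub/mod.py run as `python -m pkg.sub.mod`, __name__ is 'pkg.sub.mod':
name = 'pkg.sub.mod'
assert name.partition('.')[0] == 'pkg'  # the key of the top-level package in sys.modules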

Python merge .py part-files into one .py file

I am doing browser automation using python + splinter.
My structure is like this:
[root]
+--start.py
+--end.py
+--[module1]
| +--mod11area1.py
| +--mod12area2.py
| +--[module1_2]
| | +--mod121area1.py
| +--[module1_3]
| +--mod131area1.py
+--[module2]
+--mod21area1.py
start.py sets up the initialization and opening of the browser,
and the inner module .py files perform actions per module.
This structure would then be merged into one script upon execution by appending the contents in this fashion:
start.py
mod11area1.py
mod12area2.py
mod121area1.py
mod131area1.py
mod21area1.py
end.py
My question is, is there a better way of doing this? I'm quite new to this and usually just create a single script. Since my project keeps expanding, I had to bring several other guys in to script with me; hence I came up with this approach.
No, Python has no simple way to merge scripts into one .py file.
But you can fake it, albeit in a fairly limited way.
Here's an example of how you can define multiple modules (each with their own namespace) in a single file.
It has the following limitations:
- No package support (although this could be made to work).
- No support for modules depending on each other (a module can't be imported unless it's already defined).
Example - 2 modules, each containing a function:
# Fake multiple modules in a single file.
import sys

_globals_init = None  # include ourself in namespace
_globals_init = set(globals().keys())

# ------------------------
# ---- begin
__name__ = "test_module_1"
__doc__ = "hello world"

def test2():
    print(123)

sys.modules[__name__] = type(sys)(__name__, __doc__)
sys.modules[__name__].__dict__.update(globals())
[globals().__delitem__(k) for k in list(globals().keys()) if k not in _globals_init]
# ---- end ------------

# ---------------------
# ---- begin
__name__ = "some_other"
__doc__ = "testing 123"

def test1():
    print(321)

sys.modules[__name__] = type(sys)(__name__, __doc__)
sys.modules[__name__].__dict__.update(globals())
[globals().__delitem__(k) for k in list(globals().keys()) if k not in _globals_init]
# ---- end ------------

# ----------------
# ---- example use
import test_module_1
test_module_1.test2()

import some_other
some_other.test1()

# this will fail (as it should)
test1()
Note, this isn't good practice, if you have this problem, you're probably better off with some alternative solution (such as using https://docs.python.org/3/library/zipimport.html)
See my GitHub project.
There is likely a better way for your needs. I developed this project/hack for programming contests which only allow the contestant to submit a single .py file. It allows one to develop a project with multiple .py files and then combine them into one .py file at the end.
My hack is a decorator @modulize which converts a function into a module. This module can then be imported as usual. Here is an example.
@modulize('my_module')
def my_dummy_function(__name__):  # the function takes one parameter __name__
    # put module code here
    def my_function(s):
        print(s, 'bar')
    # the function must return locals()
    return locals()

# import the module as usual
from my_module import my_function
my_function('foo')  # foo bar
I also have a script which can combine a project of many .py files which import each other into one '.py' file.
For example, assume I had the following directory structure and files:
my_dir/
    __main__.py
        import foo.bar
        fb = foo.bar.bar_func(foo.foo_var)
        print(fb)  # foo bar
    foo/
        __init__.py
            foo_var = 'foo'
        bar.py
            def bar_func(x):
                return x + ' bar'
The combined file will look as follows. The code at the top defines the @modulize decorator.
import sys
from types import ModuleType

class MockModule(ModuleType):
    def __init__(self, module_name, module_doc=None):
        ModuleType.__init__(self, module_name, module_doc)
        if '.' in module_name:
            package, module = module_name.rsplit('.', 1)
            get_mock_module(package).__path__ = []
            setattr(get_mock_module(package), module, self)

    def _initialize_(self, module_code):
        self.__dict__.update(module_code(self.__name__))
        self.__doc__ = module_code.__doc__

def get_mock_module(module_name):
    if module_name not in sys.modules:
        sys.modules[module_name] = MockModule(module_name)
    return sys.modules[module_name]

def modulize(module_name, dependencies=[]):
    for d in dependencies: get_mock_module(d)
    return get_mock_module(module_name)._initialize_

##===========================================================================##

@modulize('foo')
def _foo(__name__):
    ##----- Begin foo/__init__.py ------------------------------------------------##
    foo_var = 'foo'
    ##----- End foo/__init__.py --------------------------------------------------##
    return locals()

@modulize('foo.bar')
def _bar(__name__):
    ##----- Begin foo/bar.py -----------------------------------------------------##
    def bar_func(x):
        return x + ' bar'
    ##----- End foo/bar.py -------------------------------------------------------##
    return locals()

def __main__():
    ##----- Begin __main__.py ----------------------------------------------------##
    import foo.bar
    fb = foo.bar.bar_func(foo.foo_var)
    print(fb)  # foo bar
    ##----- End __main__.py ------------------------------------------------------##

__main__()
Instead of appending the contents into a single *.py file, why not just import what you need from the code that the other people in your team write?
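For instance, a sketch of what start.py could look like under the structure above, assuming each folder gets an __init__.py so it becomes a package, and that each module exposes an entry point (run() here is hypothetical):
# start.py (sketch)
from module1 import mod11area1, mod12area2
from module1.module1_2 import mod121area1
from module1.module1_3 import mod131area1
from module2 import mod21area1

def main():
    # ... initialize and open the browser here ...
    for module in (mod11area1, mod12area2, mod121area1, mod131area1, mod21area1):
        module.run()  # hypothetical per-module entry point

if __name__ == '__main__':
    main()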

Get the current git hash in a Python script

I would like to include the current git hash in the output of a Python script (as the version number of the code that generated that output).
How can I access the current git hash in my Python script?
No need to hack around getting data from the git command yourself. GitPython is a very nice way to do this and a lot of other git stuff. It even has "best effort" support for Windows.
After pip install gitpython you can do
import git
repo = git.Repo(search_parent_directories=True)
sha = repo.head.object.hexsha
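If you also want the abbreviated hash, GitPython's raw command layer can produce it; a sketch (the short=7 keyword is translated to git rev-parse --short=7):
short_sha = repo.git.rev_parse(sha, short=7)  # e.g. fd1cd17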
Something to consider when using this library. The following is taken from gitpython.readthedocs.io
Leakage of System Resources
GitPython is not suited for long-running processes (like daemons) as it tends to leak system resources. It was written in a time where destructors (as implemented in the __del__ method) still ran deterministically.
In case you still want to use it in such a context, you will want to search the codebase for __del__ implementations and call these yourself when you see fit.
Another way to assure proper cleanup of resources is to factor out GitPython into a separate process which can be dropped periodically.
This post contains the command, Greg's answer contains the subprocess command.
import subprocess

def get_git_revision_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', 'HEAD']).decode('ascii').strip()

def get_git_revision_short_hash() -> str:
    return subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD']).decode('ascii').strip()
when running
print(get_git_revision_hash())
print(get_git_revision_short_hash())
you get output:
fd1cd173fc834f62fa7db3034efc5b8e0f3b43fe
fd1cd17
The git describe command is a good way of creating a human-presentable "version number" of the code. From the examples in the documentation:
With something like git.git current tree, I get:
[torvalds@g5 git]$ git describe parent
v1.0.4-14-g2414721
i.e. the current head of my "parent" branch is based on v1.0.4, but since it has a few commits on top of that, describe has added the number of additional commits ("14") and an abbreviated object name for the commit itself ("2414721") at the end.
From within Python, you can do something like the following:
import subprocess
label = subprocess.check_output(["git", "describe"]).strip()
Here's a more complete version of Greg's answer:
import subprocess
print(subprocess.check_output(["git", "describe", "--always"]).strip().decode())
Or, if the script is being called from outside the repo:
import subprocess, os
print(subprocess.check_output(["git", "describe", "--always"], cwd=os.path.dirname(os.path.abspath(__file__))).strip().decode())
Or, if the script is being called from outside the repo and you like pathlib:
import subprocess
from pathlib import Path
print(subprocess.check_output(["git", "describe", "--always"], cwd=Path(__file__).resolve().parent).strip().decode())
numpy has a nice looking multi-platform routine in its setup.py:
import os
import subprocess

# Return the git revision as a string
def git_version():
    def _minimal_ext_cmd(cmd):
        # construct minimal environment
        env = {}
        for k in ['SYSTEMROOT', 'PATH']:
            v = os.environ.get(k)
            if v is not None:
                env[k] = v
        # LANGUAGE is used on win32
        env['LANGUAGE'] = 'C'
        env['LANG'] = 'C'
        env['LC_ALL'] = 'C'
        out = subprocess.Popen(cmd, stdout=subprocess.PIPE, env=env).communicate()[0]
        return out

    try:
        out = _minimal_ext_cmd(['git', 'rev-parse', 'HEAD'])
        GIT_REVISION = out.strip().decode('ascii')
    except OSError:
        GIT_REVISION = "Unknown"

    return GIT_REVISION
If subprocess isn't portable and you don't want to install a package to do something this simple, you can also do this.
import pathlib

def get_git_revision(base_path):
    git_dir = pathlib.Path(base_path) / '.git'
    with (git_dir / 'HEAD').open('r') as head:
        ref = head.readline().split(' ')[-1].strip()
    with (git_dir / ref).open('r') as git_hash:
        return git_hash.readline().strip()
I've only tested this on my repos, but it seems to work pretty consistently.
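A hypothetical usage, assuming a branch is checked out (HEAD is a symbolic ref) and the ref is not packed:
print(get_git_revision('.'))  # e.g. fd1cd173fc834f62fa7db3034efc5b8e0f3b43fe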
This is an improvement of Yuji 'Tomita' Tomita's answer.
import subprocess

def get_git_revision_hash():
    full_hash = subprocess.check_output(['git', 'rev-parse', 'HEAD'])
    full_hash = str(full_hash, "utf-8").strip()
    return full_hash

def get_git_revision_short_hash():
    short_hash = subprocess.check_output(['git', 'rev-parse', '--short', 'HEAD'])
    short_hash = str(short_hash, "utf-8").strip()
    return short_hash

print(get_git_revision_hash())
print(get_git_revision_short_hash())
If you want a bit more data than the hash, you can use git log:
import subprocess

def get_git_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%H']).strip()

def get_git_short_hash():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h']).strip()

def get_git_short_hash_and_commit_date():
    return subprocess.check_output(['git', 'log', '-n', '1', '--pretty=tformat:%h-%ad', '--date=short']).strip()
For a full list of formatting options, check out git log --help.
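Note that check_output returns bytes here, so decode for display; a small sketch:
print(get_git_short_hash_and_commit_date().decode('ascii'))  # e.g. fd1cd17-2016-12-30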
I ran across this problem and solved it by implementing this function.
https://gist.github.com/NaelsonDouglas/9bc3bfa26deec7827cb87816cad88d59
from pathlib import Path

def get_commit(repo_path):
    git_folder = Path(repo_path, '.git')
    head_name = Path(git_folder, 'HEAD').read_text().split('\n')[0].split(' ')[-1]
    head_ref = Path(git_folder, head_name)
    commit = head_ref.read_text().replace('\n', '')
    return commit

r = get_commit('PATH OF YOUR CLONED REPOSITORY')
print(r)
I had a problem similar to the OP, but in my case I'm delivering the source code to my client as a zip file and, although I know they will have python installed, I cannot assume they will have git. Since the OP didn't specify his operating system and if he has git installed, I think I can contribute here.
To get only the hash of the commit, Naelson Douglas's answer was perfect, but to have the tag name, I'm using the dulwich python package. It's a simplified git client in python.
After installing the package with pip install dulwich --global-option="--pure" one can do:
from dulwich import porcelain

def get_git_revision(base_path):
    return porcelain.describe(base_path)

r = get_git_revision("PATH OF YOUR REPOSITORY's ROOT FOLDER")
print(r)
I've just run this code in one repository here and it showed the output v0.1.2-1-gfb41223, similar to what is returned by git describe, meaning that I'm 1 commit after the tag v0.1.2 and the 7-digit hash of the commit is fb41223.
It has some limitations: currently it doesn't have an option to show if a repository is dirty and it always shows a 7-digit hash, but there's no need to have git installed, so one can choose the trade-off.
Edit: in case of errors in the command pip install due to the option --pure (the issue is explained here), pick one of the two possible solutions:
Install Dulwich package's dependencies first:
pip install urllib3 certifi && pip install dulwich --global-option="--pure"
Install without the option --pure: pip install dulwich. This will install some platform dependent files in your system, but it will improve the package's performance.
If you don't have Git available for some reason, but you have the git repo (a .git folder is present), you can fetch the commit hash from .git/refs/heads/[branch].
For example, I've used a following quick-and-dirty Python snippet run at the repository root to get the commit id:
git_head = '.git\\HEAD'

# Open the .git\HEAD file:
with open(git_head, 'r') as git_head_file:
    # Contains e.g. "ref: refs/heads/master" if on "master"
    git_head_data = str(git_head_file.read())

# Open the correct file in .git\refs\heads\[branch]
git_head_ref = '.git\\%s' % git_head_data.split(' ')[1].replace('/', '\\').strip()

# Get the commit hash ([:7] used to get "--short")
with open(git_head_ref, 'r') as git_head_ref_file:
    commit_id = git_head_ref_file.read().strip()[:7]
If you are like me:
- Multiplatform, so subprocess may crash one day
- Using Python 2.7, so GitPython is not available
- Don't want to use Numpy just for that
- Already using Sentry (old deprecated version: raven)
Then (this will not work in a shell, because a shell doesn't detect the current file path; replace BASE_DIR with your current file path):
import os
import raven
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
print(raven.fetch_git_sha(BASE_DIR))
That's it.
I was looking for another solution because I wanted to migrate to sentry_sdk and leave raven, but maybe some of you want to continue using raven for a while.
Here was the discussion that got me into this stackoverflow issue:
So using the code of raven without raven is also possible (see discussion) :
from __future__ import absolute_import

import os.path

__all__ = 'fetch_git_sha'

def fetch_git_sha(path, head=None):
    """
    >>> fetch_git_sha(os.path.dirname(__file__))
    """
    if not head:
        head_path = os.path.join(path, '.git', 'HEAD')
        with open(head_path, 'r') as fp:
            head = fp.read().strip()
        if head.startswith('ref: '):
            head = head[5:]
            revision_file = os.path.join(
                path, '.git', *head.split('/')
            )
        else:
            return head
    else:
        revision_file = os.path.join(path, '.git', 'refs', 'heads', head)

    if not os.path.exists(revision_file):
        # Check for the .git/packed-refs file since a `git gc` may have run
        # https://git-scm.com/book/en/v2/Git-Internals-Maintenance-and-Data-Recovery
        packed_file = os.path.join(path, '.git', 'packed-refs')
        if os.path.exists(packed_file):
            with open(packed_file) as fh:
                for line in fh:
                    line = line.rstrip()
                    if line and line[:1] not in ('#', '^'):
                        try:
                            revision, ref = line.split(' ', 1)
                        except ValueError:
                            continue
                        if ref == head:
                            return revision

    with open(revision_file) as fh:
        return fh.read().strip()
I named this file versioning.py and I import "fetch_git_sha" where I need it, passing the file path as an argument.
Hope it will help some of you ;)
