I just finished a medium-sized Python (3.6) project and I need to clean it up a bit.
I am not a software engineer, and I was not careful about structuring the project during development. As a result, I now have several modules that are no longer imported by any other module, as well as modules that are imported by other .py files but are not actually needed.
So for example, I have
Project/
├── __init__.py
├── main.py
├── foo.py
|
├── tools/
│ ├── __init__.py
│ ├── tool1.py
│ ├── tool2.py
│ └── tool3.py
|
├── math/
│ ├── __init__.py
│ ├── math1.py
│ └── math2.py
├── graph/
│ ├── __init__.py
│ ├── graph1.py
│ ├── graph2.py
│
and inside
main.py
from math import math1
from tools import tool2
graph1.py
from math import math1
from tools import tool1, tool2
foo.py
from tools import tool3
If I could see at a glance that no module imports graph2 or math2, I could delete them, or at least mark them as candidates for deletion (and restructure the project in a better way).
Or I might decide to delete tool3 because I know I no longer need foo.
Is there an easy way to visualize all the "connections" (which module imports which) in a diagram or some other structured form?
You can use Python to do the work for you:
Place a Python file with the following code into the same directory as your Project directory.
from pathlib import Path
# list all the modules you want to check:
modules = ["tool1", "tool2", "tool3", "math1", "math2", "graph1", "graph2"]
# find all the .py files within your Project directory (also searches subdirectories):
p = Path('./Project')
file_list = list(p.glob('**/*.py'))
# check which modules are mentioned in each .py file:
for file in file_list:
    with open(file, "r") as f:
        print('*'*10, file, ':')
        file_as_string = f.read()
        for module in modules:
            # plain substring match, so comments and longer names also count
            if module in file_as_string:
                print(module)
Running this will give you an output looking something like this:
********** Project\main.py :
tool1
tool2
graph1
********** Project\foo.py :
tool2
********** Project\math\math1.py :
tool2
math2
If you're on a Unix-like platform (such as macOS), you can find all files containing specific text with grep. For example, you could search for all files containing 'import math1' in your Project directory with grep -rnw '/path/to/Project/' -e 'import math1'. If there are no results, the module is probably unused and is a candidate for removal. Note that this pattern misses multi-name imports such as from tools import tool1, tool2, where tool2 does not directly follow import. This whole process can easily be automated with a Python or shell script!
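As one possible way to automate the search described above, here is a minimal sketch that parses import statements with the standard ast module instead of matching text. The function names imported_names and unused_modules are made up for this illustration, and the matching is deliberately simple: it only compares the last component of each imported name against file stems, so treat the result as a list of candidates, not a verdict. Entry points such as main.py will show up too, because nothing imports them.

```python
import ast
from pathlib import Path


def imported_names(source: str) -> set:
    """Return the names mentioned by import statements in `source`."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b" -> record "a.b"
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from pkg import x, y" -> record "pkg", "pkg.x" and "pkg.y"
            base = node.module or ""
            if base:
                names.add(base)
            for alias in node.names:
                names.add(f"{base}.{alias.name}" if base else alias.name)
    return names


def unused_modules(project_dir: str) -> list:
    """List module names (file stems) that no file in the project imports."""
    files = [p for p in Path(project_dir).rglob("*.py") if p.name != "__init__.py"]
    used = set()
    for path in files:
        for name in imported_names(path.read_text(encoding="utf-8")):
            # Keep only the last component, e.g. "tools.tool1" -> "tool1"
            used.add(name.rsplit(".", 1)[-1])
    return sorted(p.stem for p in files if p.stem not in used)
```

Calling unused_modules("./Project") from the directory that contains Project would then return the stems of the files that are never imported.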
Maybe this project can help you with visualizing your dependency graph. After a quick google search, it looks like you're not the first person to try to do this.
Related
I want to use my Python "toolbox" - packages I've written myself - in several other packages (let's call them pkg1 and pkg2 in the example below).
Here is what my work directory looks like:
└── WORK
├── PythonToolBox
│ └── Tool1
│ └── module1.py
├── Use1
│ └── pkg1
│ └── p1_script.py
└── SomeSubdirectory
└── Use2
└── pkg2
└── p2_script.py
Let's say I want to import module1.py, from the package Tool1, into p1_script.py and p2_script.py.
Because of the architecture of my directories shown above, I believe I would need to write the following bit of code in p1_script.py:
from pathlib import Path
import sys
path_root = Path(__file__).parents[2]
sys.path.append(str(path_root))
import PythonToolBox.Tool1.module1 as mod1
and do the same in p2_script.py, except that I'd need to make the following modification to the snippet of code above : Path(__file__).parents[2] --> Path(__file__).parents[3].
That works, but it makes it hard to share work with others: I'd need to copy module1 into pkg1 before sending it to someone else, and remove the snippet of code to write import module1 as mod1 instead.
Another option would be to copy module1.py into pkg1 and pkg2 at the beginning. However, this doesn't seem like a good option either, as that would create two different versions of module1: I might improve module1 as I'm working on pkg1, and now pkg2 won't benefit from these improvements.
Is there a better solution to use that toolbox than the two I've listed above?
I am writing some code that will require data from downstream the package hierarchy. Consider the following project structure:
├── app/
│ ├── main.py
│ ├── lib.py
│ └── subpck/
│ └── module.py
And the following code:
# lib.py
some_instance = SomeClass()
# subpck.module
from ..lib import some_instance
some_instance.a = 'abc'
# main.py
from .lib import some_instance
print(some_instance.a)
The key point here is that main.py does not import subpck.module, so the latter's code is not run. However, I have successfully used runpy in main.py to run subpck.module and have gotten the desired results.
My question is this:
If I can somehow figure out how to encapsulate the use of runpy in SomeClass, is it safe to do?
I have been reading the runpy docs, but I am nervous that I am missing something. I also haven't heard whether this is a big "no-no" for code that may be used in production.
Any help is appreciated.
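To make the situation in this question concrete, here is a small self-contained sketch that rebuilds the layout in a temporary directory and shows that the attribute only appears after the submodule's code has run. The package name runpy_demo_app and the helper names are invented for the illustration, and SomeClass is reduced to an empty class; it demonstrates the behaviour described above and says nothing about whether the pattern is advisable in production.

```python
import runpy
import sys
import tempfile
import textwrap
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create a throwaway package mirroring the layout in the question."""
    pkg = root / "runpy_demo_app"  # hypothetical name, chosen to avoid clashes
    (pkg / "subpck").mkdir(parents=True)
    (pkg / "__init__.py").write_text("")
    (pkg / "subpck" / "__init__.py").write_text("")
    (pkg / "lib.py").write_text(textwrap.dedent("""\
        class SomeClass:
            pass

        some_instance = SomeClass()
    """))
    (pkg / "subpck" / "module.py").write_text(textwrap.dedent("""\
        from ..lib import some_instance

        some_instance.a = 'abc'
    """))


def run_demo() -> tuple:
    """Return (attribute present before run_module?, attribute value after)."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            from runpy_demo_app.lib import some_instance
            before = hasattr(some_instance, "a")
            # Executes subpck/module.py; its relative import reuses the
            # already-loaded lib module, so the same instance is modified.
            runpy.run_module("runpy_demo_app.subpck.module")
            after = getattr(some_instance, "a", None)
        finally:
            # Undo the changes made to the interpreter state for the demo.
            sys.path.remove(tmp)
            for name in [n for n in sys.modules if n.startswith("runpy_demo_app")]:
                del sys.modules[name]
    return before, after
```

Under these assumptions, run_demo() returns (False, 'abc'): the instance has no attribute a until module.py has been executed.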
I have the following folder structure:
PROJECT_DIR
| --helpers
| |--utils.py
| --stuff
| |--script.py
I need to run script.py as a script, and from it, I need to use a function from helpers/utils.py.
I tried relative importing from ..helpers.utils import func, but it says
ImportError: attempted relative import with no known parent package
so I added an empty __init__.py file to each folder, including PROJECT_DIR.
Then I read that when a file is run as a script, the Python interpreter runs it as the main module, so it has no parent package and relative imports cannot be used.
But what should I do if I need to use that function? It's a fairly simple use case, and I can't get my head around why it's so hard to import a function from a file outside the current directory. Though I'm not really interested in the whys; I'd just like to know how people solve this.
root_project
└── proj
├── __init__.py
├── helpers
│ ├── __init__.py
│ └── utils.py
└── stuff
├── __init__.py
└── script.py
With this structure just cd to root_project and use this command:
python -m proj.stuff.script
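The difference between the two ways of launching script.py can be checked with a short experiment. The sketch below recreates the layout above in a temporary directory and runs the script both with -m and directly. The contents of utils.py and script.py are assumptions (func simply returns a string), as are the helper names build_project, run and demo.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def build_project(root: Path) -> None:
    """Create the root_project/proj layout with a tiny helper and script."""
    proj = root / "proj"
    for package in (proj, proj / "helpers", proj / "stuff"):
        package.mkdir(parents=True, exist_ok=True)
        (package / "__init__.py").write_text("")
    # func is a hypothetical helper standing in for whatever utils.py provides.
    (proj / "helpers" / "utils.py").write_text(
        "def func():\n    return 'hello from utils'\n"
    )
    (proj / "stuff" / "script.py").write_text(
        "from ..helpers.utils import func\nprint(func())\n"
    )


def run(args, cwd):
    """Run the current interpreter with `args` in `cwd`; return (exit code, stdout)."""
    done = subprocess.run(
        [sys.executable, *args],
        cwd=str(cwd),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return done.returncode, done.stdout.strip()


def demo():
    """Run script.py as a module and as a plain script from root_project."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build_project(root)
        as_module = run(["-m", "proj.stuff.script"], root)
        as_script = run([str(root / "proj" / "stuff" / "script.py")], root)
    return as_module, as_script
```

In this setup the -m run exits with code 0 and prints the helper's string, while the direct run exits with a non-zero code because the relative import has no parent package.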
I have this code structure in python3:
- datalake
__init__.py
utils
__init__.py
utils.py
lambdas
__init__.py
my-lambdas.py
- tests
__init__.py
demo.py
All __init__.py files are empty.
My question is how to import the datalake module from tests/demo.py.
I tried from datalake.utils import utils in demo.py but when I run python tests/demo.py from command line, I get this error ModuleNotFoundError: No module named 'datalake'.
If I use this code:
from ..datalake.utils import utils
I will get error ValueError: attempted relative import beyond top-level package.
I also tried to import the module utils from the my-lambda.py file, which also failed. The code in my-lambda.py is from datalake.utils import utils, but I get a ModuleNotFoundError: No module named 'datalake' error when I run python datalake/lambda/my-lambda.py from the command line.
How can I import the module?
When you run a command like python tests/demo.py, the folder you are in does not get added to sys.path (the module search path); the script's folder does. So a top-level import like import datalake will fail. To get around this you can run your tests as a module:
Python 2:
python -m tests/demo
Python 3:
python -m tests.demo
and any datalake imports in demo.py will work.
It sounds like what you really want is a folder of tests kept separate from your main application, plus a way to run them. For this I recommend pytest (formerly py.test); for your case, read Tests Outside Application Code for how to set it up. TL;DR: run your tests from your top-level project folder with python -m pytest and it will work.
First of all, my-lambdas.py is not importable with the import statement as hyphens are not valid in Python identifiers. Try to follow PEP-8's naming conventions, such as mylambdas.py.
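A quick way to check whether a dotted module path can be written in an import statement is to test each component with str.isidentifier and keyword.iskeyword from the standard library. The helper name is_importable_name is invented for this sketch.

```python
import keyword


def is_importable_name(dotted_name: str) -> bool:
    """True if every dot-separated part can be written in an import statement."""
    parts = dotted_name.split(".")
    return all(
        part.isidentifier() and not keyword.iskeyword(part)
        for part in parts
    )


print(is_importable_name("datalake.lambdas.mylambdas"))   # True
print(is_importable_name("datalake.lambdas.my-lambdas"))  # False: hyphen
print(is_importable_name("datalake.lambda.mylambdas"))    # False: lambda is a keyword
```

Note that the last case also rules out a directory named lambda, since lambda is a reserved word.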
Otherwise the package structure looks good, and it should be importable as long as you are at the level above datalake/, e.g., if you were in the directory myproject/ below:
myproject
├── datalake
│ ├── __init__.py
│ ├── utils
│ │ ├── __init__.py
│ │ └── utils.py
│ └── lambdas
│ ├── __init__.py
│ └── mylambdas.py
└── tests
├── __init__.py
└── demo.py
Then this should work:
~/myproject$ python -c 'from datalake import utils'
Otherwise, setting the environment variable PYTHONPATH to the path above datalake/ or modifying sys.path are both ways of changing where Python can import from. See the official tutorial on modules for more information.
Also some general advice: I've found it useful to stick with simple modules rather than packages (directories) until there is a need to expand. Then you can change foo.py into a foo/ directory with an __init__.py file and import foo will work as before, although you may need to add some imports to the __init__.py to maintain API compatibility. This would leave you with a simpler structure:
myproject
├── datalake
│ ├── __init__.py
│ ├── utils.py
│ └── lambdas.py
└── tests
├── __init__.py
└── demo.py
You can add the module directory into your sys.path:
import sys
sys.path.append("your/own/modules/folder") # like sys.path.append("../tests")
but this is a one-shot method: the added path is valid only for the current run. It is not permanent and is gone once the program finishes executing.
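One caveat with a relative string such as "../tests" is that it is interpreted relative to the current working directory, not relative to the file that contains the call. A minimal sketch of a helper that resolves the directory first is shown below; the name add_import_dir and its anchor parameter are assumptions made for this example, not an existing API.

```python
import sys
from pathlib import Path


def add_import_dir(directory, anchor=None):
    """Append `directory` to sys.path as an absolute path, at most once.

    If `anchor` (typically __file__) is given, a relative `directory` is
    resolved against the anchor's folder instead of the current directory.
    """
    base = Path(anchor).resolve().parent if anchor else Path.cwd()
    target = str((base / directory).resolve())
    if target not in sys.path:
        sys.path.append(target)
    return target
```

Inside a script this could be used as add_import_dir("..", anchor=__file__) to make the script's parent folder importable regardless of where the command is run from.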
One way is to import the file directly instead of using from, e.g. import util.
You can try running:
python -m datalake.lambda.my-lambda
See: https://docs.python.org/3.7/using/cmdline.html#cmdoption-m
Update: I have changed my file directory
I have a directory structure as follows and I would like to import a module in a parent directory.
project/
__init__.py
main.py
APP_NAME/
parser/
__init__.py
parser.py
test/
__init__.py
parser_test.py
parser.py
class Parser(object):
pass
main.py (Works fine)
from APP_NAME.parser.parser import Parser
parser_test.py (Throws error)
from ..APP_NAME.parser.parser import Parser
Throws the following error at parser_test.py
Parent module '' not loaded, cannot perform relative import
I know I can fix it using sys.path.append(), but I want to import it like a package the way I did it in main.py.
Any help is appreciated. Thanks.
I had to check back at one of my projects for a reference.
To test files in the tests folder, you must first create setup.py so that you can install your project for Python to use.
On Linux, use the command sudo python setup.py install to install the package. Whenever the project changes, you must install it again for the changes to take effect.
These folders will be created in your root project directory after installing:
build, dist, and project.egg-info.
You may need to clean the build directory before re-installing to update.
python setup.py clean
python setup.py build
python setup.py install
Project Structure
project
├── setup.py
├── tests
│ └── parser_test.py
│
└── project
├── __init__.py
├── __init__.pyc
├── main.py
└── parser
├── __init__.py
├── __init__.pyc
├── parser.py
└── parser.pyc
project/setup.py
from setuptools import setup
# Make sure the project name will not conflict with other libraries
# For example, do not name the project 'os', 'sys', etc.
setup(
name='project',
description='My project description',
author='your_online_name',
license='MIT', # Check out software licenses
packages=['project', 'project.parser', 'tests']  # subpackages must be listed explicitly
)
project/tests/parser_test.py
from project.parser import Parser
parser = Parser()
project/project/__init__.py
from . import parser
project/project/parser/__init__.py
from .parser import Parser
project/project/parser/parser.py
class Parser(object):
pass
You shouldn't be using absolute import within your package. In-package imports should be done with relative imports this way:
parser_test.py
from ..parser.parser import Parser
With relative imports in Python, the first dot refers to the package containing the current file (the file's directory) and each extra dot moves one level up to the parent package.
In this case, you would be pointing to the project/parser/parser.py file, which from test_parser.py's standpoint is ../parser/parser.py.
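The dot rule can be checked without importing anything by asking importlib.util.resolve_name which absolute name a relative one stands for. The sketch assumes the test file lives in a package called project.tests; the .helpers module is hypothetical and only illustrates the single-dot case.

```python
from importlib.util import resolve_name

# Package that contains the importing file, e.g. project/tests/test_parser.py.
package = "project.tests"

# One dot: a module inside the file's own package.
same_package = resolve_name(".helpers", package)
# Two dots: go up one level, then down into parser/parser.py.
parent_package = resolve_name("..parser.parser", package)

print(same_package)    # project.tests.helpers
print(parent_package)  # project.parser.parser
```

Asking for more dots than the package has levels raises an error, which corresponds to the "attempted relative import beyond top-level package" message.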
If you are using Python 2, you should add the following line at the top of all the files in your parser package
from __future__ import absolute_import
This disables Python 2's implicit relative imports, so an unqualified import inside your package files cannot pick up a sibling module by mistake.
Still assuming you are working with Python 2, you should also import unicode_literals for native unicode support and print_function to replace the print command by the print() function.
However, I would rather have my tests in the top folder of the package, which, assuming the package is called project and not parser, would give the following directory structure:
project/ # top project directory
├── main.py
└── project # top package directory
├── __init__.py # this file is required even if it is empty
├── parser
│ ├── __init__.py
│ └── parser.py
└── tests
└── test_parser.py
Also, the project/project/parser/__init__.py could contain the following:
from .parser import Parser
So that your main.py file could import the Parser class like this:
from project.parser import Parser
instead of the more tedious:
from project.parser.parser import Parser
Your test_parser.py file can import the Parser class through the full module path:
from ..parser.parser import Parser
or, since names exposed in an __init__.py file are also reachable through relative imports, with the shorter from ..parser import Parser.
Finally, if you are starting a new independent project, you should do it in Python 3 (that's a PEP recommendation), where all the above rules apply, except the from __future__ imports which are unnecessary.
Sources: https://axialcorps.wordpress.com/2013/08/29/5-simple-rules-for-building-great-python-packages/