Is runpy safe to use? - python

I am writing some code that will require data from downstream the package hierarchy. Consider the following project structure:
├── app/
│ ├── main.py
│ ├── lib.py
│ └── subpck/
│ └── module.py
And the following code:
# lib.py
some_instance = SomeClass()
# subpck.module
from ..lib import some_instance
some_instance.a = 'abc'
# main.py
from .lib import some_instance
print(some_instance.a)
The key point here is that main.py does not import subpck.module, so the latter's code is not run. However, I have successfully used runpy in main.py to run subpck.module and have gotten the desired results.
My question is this:
If I can somehow figure out how to encapsulate the use of runpy in SomeClass, is it safe to do?
I have been reading the runpy docs, but am nervous if I am missing something. I also haven't heard if this is a big "no-no" for code that may be used in production.
Any help is appreciated.

Related

How to import my own packages in several other Python scripts in a way that's shareable?

I want to use my Python "toolbox" - packages I've written myself - in several other packages (let's call them pkg1 and pkg2 in the example below).
Here is what my work directory looks like:
└── WORK
├── PythonToolBox
│ └── Tool1
│ └── module1.py
├── Use1
│ └── pkg1
│ └── p1_script.py
└── SomeSubdirectory
└── Use2
└── pkg2
└── p2_script.py
Let's say I want to import module1.py, from the package Tool1, into p1_script.py and p2_script.py.
Because of the architecture of my directories shown above, I believe I would need to write the following bit of code in p1_script.py:
from pathlib import Path
import sys
path_root = Path(__file__).parents[2]
sys.path.append(str(path_root))
import PythonToolBox.Tool1.module1 as mod1
and do the same in p2_script.py, except that I'd need to make the following modification to the snippet of code above : Path(__file__).parents[2] --> Path(__file__).parents[3].
That works, but it makes it hard to share work with others : I'd need to copy module1 into pkg1 before sending it to someone else, and remove the snippet of code to write import module1.py as mod1 instead.
Another option would be to copy module1.py into pkg1 and pkg2 at the beginning. However, this doesn't seem like a good option either, as that would create two different versions of module1: I might improve module1 as I'm working on pkg1, and now pkg2 won't benefit from these improvements.
Is there a better solution to use that toolbox than the two I've listed above?

ModuleNotFoundError in Python project

I have the following project structure in Python (the ... means I have n crawler_.py files).
project
├── crawlers
│ ├── __init__.py
│ ├── crawler_1.py
│ ├── crawler_2.py
│ ...
│ ├── crawler_n.py
│ └── useful_functions.py
├── main.py
└── __init__.py
I need to import all crawlers from crawler into main, so I use this.
# main.py
from crawlers import crawler_1
from crawlers import crawler_2
...
from crawlers import crawler_n
But I also need useful_functions.py inside all crawler_.py files, so I use this in each one.
# crawler_.py
import useful_functions
But when I ran main.py I got ModuleNotFoundError: No module named 'useful_functions' when it tried to import crawler_1.
So I tried the following
# crawler_.py
from crawlers import useful_functions
And it works when I run main.py. The problem is that I might want run only one of the crawler_.py directly. Using this last import statement, I get ModuleNotFoundError: No module named 'crawlers'. Not sure how to address this problem, if there's something inside the code I should adjust or if the structure that I'm using is fundamentally wrong (I'm perfectly okay with adjusting the project structure).
You can use this inside the crawler_n.py
if __name__ == '__main__':
import useful_functions
else:
import crawlers.useful_functions as useful_functions
__name__ == '__main__' checks if the module is called or is imported and thus makes the imports accordingly.

Import diagram/structure inside a python folder (clean-up code)

I just finished a middle-sized python (3.6) project and I need to clean it a bit.
I am not a software engineer, so during the development, I was not too accurate structuring the project, so now I have several modules that are no (longer) imported by any other module or modules that are imported by other .py files that are not actually needed.
So for example, I have
Project/
├── __init__.py
├── main.py
├── foo.py
|
├── tools/
│ ├── __init__.py
│ ├── tool1.py
│ └── tool2.py
│ └── tool3.py
|
├── math/
│ ├── __init__.py
│ ├── math1.py
│ └── math2.py
├── graph/
│ ├── __init__.py
│ ├── graph1.py
│ ├── graph2.py
│
and inside
main.py
from math import math1
from tools import tool2
graph1.py
from math import math1
from tools import tool1, tool2
foo.py
from tools import tool3
If I could see in one look that not a module imports graph2 or math2, I could delete them, or at least add them as candidates for deletion (and restructure the project in a better way).
Or I may think to delete tool3 because I know I don't need foo anymore.
Is there an easy way to visualize all the "connections" (which module imports which) in a diagram or some other kind of structured data/visualization manner?
You can use Python to do the work for you:
Place a Python file with the following code into the same directory as your Project directory.
from pathlib import Path
# list all the modules you want to check:
modules = ["tool1", "tool2", "tool3", "math1", "math2", "graph1", "graph2"]
# find all the .py files within your Project directory (also searches subdirectories):
p = Path('./Project')
file_list = list(p.glob('**/*.py'))
# check, which modules are used in each .py file:
for file in file_list:
with open(file, "r") as f:
print('*'*10, file, ':')
file_as_string = f.read()
for module in modules:
if module in file_as_string:
print(module)
Running this will give you an output looking something like this:
********** Project\main.py :
tool1
tool2
graph1
********** Project\foo.py :
tool2
********** Project\math\math1.py :
tool2
math2
If you're in a Unix-like platform (such as macOS), you can find all files containing specific text with grep. So you could search for all files containing ''import math1'' in your Project directory, for example, with grep -rnw '/path/to/Project/' -e 'import math1' , and if there are no results, then you can safely remove the module. All this process can be easily automated with a python or a shell script!
Maybe this project can help you with visualizing your dependency graph. After a quick google search, it looks like you're not the first person to try to do this.

Failed to import python module from different directory

I have this code structure in python3:
- datalake
__init__.py
utils
__init__.py
utils.py
lambdas
__init__.py
my-lambdas.py
- tests
__init__.py
demo.py
All init__.py files are empty.
My problem is how I can import datalake module from tests/demo.py?
I tried from datalake.utils import utils in demo.py but when I run python tests/demo.py from command line, I get this error ModuleNotFoundError: No module named 'datalake'.
If I use this code:
from ..datalake.utils import utils
I will get error ValueError: attempted relative import beyond top-level package.
I also tried to import the module utils from my-lambda.py file which also failed. The code in my-lambda.py is from datalake.utils import utils but I get ModuleNotFoundError: No module named 'datalake' error when run python datalake/lambda/my-lambda.py from command line.
How can I import the module?
When you run a command like python tests/demo.py, the folder you are in does not get added to the PYTHONPATH, the script folder does. So a top-level import like import datalake will fail. To get around this you can run your tests as a module:
Python 2:
python -m tests/demo
Python 3:
python -m tests.demo
and any datalake imports in demo.py will work.
It sounds like what you really want to do is have a folder with tests separate to your main application and run them. For this I recommend py.test, for your case you can read Tests Outside Application Code for how to do it. TL;DR is run your tests from your top level project folder with python -m py.test and it will work.
First of all, my-lambdas.py is not importable with the import statement as hyphens are not valid in Python identifiers. Try to follow PEP-8's naming conventions, such as mylambdas.py.
Otherwise the package structure looks good, and it should be importable as long as you are at the level above datalake/, e.g., if you were in the directory myproject/ below:
myproject
├── datalake
│ ├── __init__.py
│ ├── utils
│ │ ├── __init__.py
│ │ └── utils.py
│ └── lambdas
│ ├── __init__.py
│ └── mylambdas.py
└── tests
├── __init__.py
└── demo.py
Then this should work:
~/myproject$ python -c 'from datalake import utils'
Otherwise, setting the environment variable PYTHONPATH to the path above datalake/ or modifying sys.path are both ways of changing where Python can import from. See the official tutorial on modules for more information.
Also some general advice: I've found it useful to stick with simple modules rather than packages (directories) until there is a need to expand. Then you can change foo.py into a foo/ directory with an __init__.py file and import foo will work as before, although you may need to add some imports to the __init__.py to maintain API compatibility. This would leave you with a simpler structure:
myproject
├── datalake
│ ├── __init__.py
│ ├── utils.py
│ └── lambdas.py
└── tests
├── __init__.py
└── demo.py
You can add the module directory into your sys.path:
import sys
sys.path.append("your/own/modules/folder") # like sys.path.append("../tests")
but this is a one-shot method, which is just valid at this time, the added path is not permanent, it will be eliminated after the code completed execution.
One of the ways to import the file directly instead of using from, like import util
you can try run :
python -m datalake.lambda.my-lambda
follow: https://docs.python.org/3.7/using/cmdline.html#cmdoption-m

Execute module as script breaks imports

I'm trying to build a Python program with the "structure" shown at the end.
The problem is that actions should be able to be executed as scripts as well (I've already included a main in them). When I try to execute DummyAction.py imports keep complaining they can't find misc.
How can I use DummyAction.py as a script and still use the functions in utils.py?
DummyAction.py contains a class called DummyActionClass, and the same for DummyTrigger.py. In utils.py there are several functions that both actions and triggers use and MMD.py contains the main.
/MMD
├── __init__.py
├── MMD.py
├── /actions
│   ├── __init__.py
│   ├── DummyAction.py
├── /misc
│   ├── __init__.py
│   ├── utils.py
└── /triggers
├── DummyTrigger.py
└── __init__.py
The import in DummyAction.py and DummyTrigger.py is:
from misc import utils
And the error is:
File "DDM/actions/DummyAction.py", line 11, in <module>
from misc import utils
ImportError: No module named misc
Seen the updated question, I think the problem is that you should do the import including the root of your dependecies tree: MMD.
So they should all look like:
from MMD.misc import utils
And also you need to call python with the -m option:
python -m MMD.actions.DummyAction
Edit: You said that MMD.py contains the main but it can't be your executable, and that's because is a module (is inside a directory with an __init__.py file). MMD is like your library so you need the executable to be outside and use such library.
You can find [here] some guidelines on how to organize your project.
If you can change you project structure I'll suggest to do it like this:
MMD/
├── runner.py
└── mmd
├── __init__.py
├── main.py
├── /actions
│ ├── __init__.py
│ ├── DummyAction.py
├── /misc
│ ├── __init__.py
│ ├── utils.py
└── /triggers
├── DummyTrigger.py
└── __init__.py
Then in any file inside the mmd directory every import should start with mmd, for example:
from mmd.misc import utils
from mmd.actions import DummyActions
And you put your main code that now is inside MMD.py inside a Main class in main.py, with something like:
# main.py
from mmd.misc import utils
class Main:
def start_session(self):
utils.function()
# etc ...
And then in runner.py you do something like:
# runner.py
from mmd.main import Main
cli = Main()
cli.start_session()
This way inside the MMD directory calling python runner.py you would execute your code, and you can also make executable runner.py so a simply ./runner.py will run your code.
And run your module with:
python -m mmd.actions.DummyAction
I'd do it like this, becasue this way is open to future implementation (and is almost like in the line guides).
If instead you can't, then you can give it a try at removing __init__.py from the MMD directory.
I assume this is a directory structure you're talking about, in which case python doesn't know where to look for utils.py - it tries the local directory, a few places in the path then gives up. It's simple enough to modify the path:
import sys
sys.path.append("/MMD/misc")
import utils
and you should be away.
I've found a "workarround"
try:
#When executing from the main
from misc import utils
except:
#When executing as a standalone script
from MMD.misc import utils
that allows to:
Call the module as a script (while using other modules): python -m MMD.actions.DummyAction
Call the main program and use the module: python MMD/MMD.py
Although I'm not sure if it's strictly correct to use a try - except block with an import, so feel free to add comments or other solutions.
The complete solution would be:
MMD.main
from misc import utils
from actions import DummyAction
class MMD():
def __init__(self):
a = DummyAction.DummyActionClass()
utils.foo()
if __name__ == '__main__':
d = MMD()
Then in actions/DummyAction.py:
try:
#When executing from the main
from misc import utils
except:
#When executing as a standalone script
from MMD.misc import utils
class DummyActionClass():
def __init__(self):
utils.foo()
if __name__ == '__main__':
a = DummyActionClass()
And finally in misc/utils.py:
def foo():
print "Foo was called"

Categories