I have several DAGs of similar structure and I wanted to use the advice described in the Airflow docs under Modules Management:
This is an example structure that you might have in your dags folder:
<DIRECTORY ON PYTHONPATH>
| .airflowignore -- only needed in the ``dags`` folder
| -- my_company
|    | __init__.py
|    | common_package
|    |    | __init__.py
|    |    | common_module.py
|    |    | subpackage
|    |    |    | __init__.py
|    |    |    | subpackaged_util_module.py
|    |
|    | my_custom_dags
|    |    | __init__.py
|    |    | my_dag1.py
|    |    | my_dag2.py
|    |    | base_dag.py
In the case above, these are the ways you could import the Python files:
from my_company.common_package.common_module import SomeClass
from my_company.common_package.subpackage.subpackaged_util_module import AnotherClass
from my_company.my_custom_dags.base_dag import BaseDag
That works fine in Airflow.
However, I used to validate my DAGs locally by running (also as advised by the documentation, under DAG Loader Test):
python my_company/my_custom_dags/my_dag1.py
With the imports above, it complains:
Traceback (most recent call last):
File "/[...]/my_company/my_custom_dags/my_dag1.py", line 1, in <module>
from my_company.common_package.common_module import SomeClass
ModuleNotFoundError: No module named 'my_company'
How should I run it so that it understands the context and recognizes the package?
It works when run this way:
PYTHONPATH=. python my_company/my_custom_dags/my_dag1.py
It seems that when the entry point is my_dag1.py inside my_company/my_custom_dags, Python adds that script's directory to the module search path and only looks for modules there. When I add . to PYTHONPATH, it can also see the entire directory structure and recognize the module my_company and everything below it.
(I'm not an expert in Python, so the above explanation might be somewhat inaccurate, but this is my understanding. I'm also not sure whether that's indeed the cleanest way to make it work.)
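A related trick I've seen (standard Python behaviour rather than anything Airflow-specific) is to run the DAG file as a module with -m from the directory that is on PYTHONPATH; -m prepends the current working directory to the module search path, so the my_company package is found without setting PYTHONPATH explicitly:
python -m my_company.my_custom_dags.my_dag1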
Forgive me for another "relative imports" post, but I've read every Stack Overflow post about them and I'm still confused.
I have a file structure that looks something like this:
|---MyProject
| |---constants.py
| |---MyScripts
| | |---script.py
Inside of script.py I have from .. import constants
And of course when I run python3 script.py, I get the error ImportError: attempted relative import with no known parent package.
From what I understand, this doesn't work because when you run a Python file directly you can't use relative imports. After hours of searching, it seems that the most popular solution is to add "../" to sys.path and then do import constants, but this feels a little hacky. I've got to believe that there is a better solution than this, right? Importing constants has got to be a fairly common task.
I've also seen some people recommend adding __init__.py to make the project into a package but I don't really understand how this solves anything?
Are there other solutions out there that I've either missed or just don't understand?
Your library code should live in a package. For your folder to be a package, it needs to have __init__.py.
Your scripts can live in the top directory during development. Here is how you would structure this layout:
|---MyProject
| |---myproject
| | |---__init__.py
| | |---constants.py
| |---script.py
If you were to productionize this, you could place scripts into a special folder, but then they would rely on the package being installed on your system. The layout could look like this:
|---MyProject
| |---myproject
| | |---__init__.py
| | |---constants.py
| |---scripts
| | |---script.py
In both cases your imports would be normal absolute (non-relative) imports like
from myproject.constants import ...
OR
from myproject import constants
You are correct that attempting a relative import or a path modification in a standalone script is hacky. If you want to use the more flexible second layout, make a setup.py install script in the root folder, and run python setup.py develop.
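A minimal setup.py for that layout might look something like this sketch (the name and version strings are placeholders, not taken from the question):
from setuptools import setup, find_packages

setup(
    name='myproject',          # placeholder project name
    version='0.1.0',           # placeholder version
    packages=find_packages(),  # discovers the myproject package
)
After python setup.py develop (or, equivalently these days, pip install -e .), myproject becomes importable from anywhere, so scripts/script.py can simply do from myproject import constants.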
|---MyProject
| |---__init__.py # File1
| |---constants.py
| |---MyScripts
| | |---__init__.py # File 2
| | |---script.py
And then the content of File1 __init__.py is:
from . import constants
from . import MyScripts
And then the content of File2 __init__.py is:
from . import script
By converting MyProject to a package, you would be able to import constants like:
from MyProject import constants
# Rest of your code
After that, you need to add the root of your MyProject to your Python path. If you are using PyCharm, you do not need to do anything: by default, the following is added to your Python Console or debugging sessions:
import sys
sys.path.extend(['path/to/MyProject'])
If you are not using PyCharm, one way is to add the above snippet before the code you run, or add it as code to run before every debugging session; alternatively, define a setup.py for your MyProject package and install it in your environment, and nothing else needs to be changed.
The syntax from .. import x is only intended to be used inside packages (thus your error message). If you don't want to make a package, the other way to accomplish what you want is to manipulate the path.
In absolute terms, you can do:
import sys
sys.path.append(r'c:\here\is\MyProject')
import constants
where "MyProject" is the folder you described.
If you want to use relative paths, you need to do a little more work:
import inspect
import os
import sys
parent_dir = os.path.split(
    os.path.dirname(inspect.getfile(inspect.currentframe())))[0]
sys.path.append(parent_dir)
import constants
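A simpler equivalent, assuming the code runs from a file so that __file__ is set, skips the inspect machinery:
import os
import sys

# __file__ is this script; take its directory, then go one level up
parent_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
sys.path.append(parent_dir)
import constants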
To put it concisely, the error message is telling you exactly what's happening; it's just not explaining why. .. isn't a package. It resolves to MyProject, but MyProject isn't a package. To make it a package, you must include an __init__.py file directly in the MyProject directory. It can even be an empty file if you're feeling lazy.
But given what you've shown us, it doesn't seem like your project is supposed to be a package. Try changing the import in script.py to import constants and running python3 -m MyScripts.script from your project directory (with -m, Python puts the current directory on the search path, so constants.py is found; running the file by path would put MyScripts there instead).
I am having trouble importing a module from within another module. I understand that that sentence can be confusing, so my question and situation are exactly like the one suggested here: Python relative-import script two levels up
So lets say my directory structure is like so:
main_package
|
| __init__.py
| folder_1
| | __init__.py
| | folder_2
| | | __init__.py
| | | script_a.py
| | | script_b.py
|
| folder_3
| | __init__.py
| | script_c.py
And I want to access code in script_b.py as well as code from script_c.py from script_a.py.
I have also followed exactly what the answer suggested with absolute imports.
I included the following lines of code in script_a.py:
from main_package.folder_3 import script_c
from main_package.folder_1.folder_2 import script_b
When I run script_a.py, I get the following error:
ModuleNotFoundError: No module named 'main_package'
What am I doing wrong here?
This is because Python doesn't know where to find main_package in script_a.py.
There are a couple of ways to expose main_package to python:
run script_a.py as a module from main_package's parent directory (say packages). With -m, Python prepends the current directory (packages), which contains main_package, to the search path (running the file by path would put folder_2 there instead):
python -m main_package.folder_1.folder_2.script_a
add main_package's parent directory (packages) to your PYTHONPATH:
export PYTHONPATH="$PYTHONPATH:/path/to/packages"; python script_a.py
add main_package's parent directory (packages) to sys.path in script_a.py
In your script_a.py, add the following at the top:
import sys
sys.path.append('/path/to/packages')
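If you'd rather not hardcode the path, a sketch of the same idea computes the packages directory relative to script_a.py (assuming the layout above, where the file sits three levels below packages):
import os
import sys

# script_a.py is in packages/main_package/folder_1/folder_2,
# so the packages directory is three levels up from this file
packages_dir = os.path.abspath(
    os.path.join(os.path.dirname(__file__), '..', '..', '..'))
sys.path.append(packages_dir)

from main_package.folder_3 import script_c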
I'm messing around with Azure Functions with Python and running into issues with getting a proper project directory structure. My goal is to have a library directory that I can put all the business logic in and then reference that from the functions entry point and also have a test directory that can test the functions and the library code directly. I have the following structure currently:
__app__
| - MyFirstFunction
| | - __init__.py
| | - function.json
| - library
| | - __init__.py
| | - services
| | | - __init__.py
| | | - sample_service.py
| - host.json
| - requirements.txt
| - __init__.py
| - tests
| | - __init__.py
| | - test_first_function.py
I ended up with this structure and am able to reference sample_service from the Azure Function using from __app__.library.services import sample_service, and things seem to work, but I am unable to get the unit test to properly reference the Azure Function. I have tried from ..HttpTrigger import main and from __app__.HttpTrigger import main in the test, but Visual Studio Code is unable to discover the test when either of those import statements is in place. I get the following errors respectively when I execute the test_first_function.py file directly: ImportError: attempted relative import with no known parent package and ModuleNotFoundError: No module named '__app__'
I'm far from an expert when it comes to Python modules and packages, so any insight into how to fix these issues would be helpful. It would also be good to get other ideas on how anyone else structures their Python Azure Functions projects.
If you move your tests directory one level up, outside the __app__ directory, you should then be able to run the tests and import using from __app__.MyFirstFunction import myfunction
There is an ongoing discussion about improving the testing experience and the proper guidance (discussion) (work-item).
Meanwhile, the above suggestion should work fine. You could use this project (azure-functions branch) as a reference (linked by Brett Cannon in the discussion tagged above).
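As a rough sketch only (the test body, the expected status code, and main being the entry point are assumptions, not from the answer above), a test in the relocated tests directory might look like:
# tests/test_first_function.py  (tests/ now sits next to __app__/)
import unittest

import azure.functions as func

from __app__.MyFirstFunction import main  # assumes main() is the trigger entry point


class TestMyFirstFunction(unittest.TestCase):
    def test_returns_200(self):
        # build a fake HTTP request, as the Azure Functions testing docs show
        req = func.HttpRequest(method='GET', url='/api/MyFirstFunction', body=None)
        resp = main(req)
        self.assertEqual(resp.status_code, 200)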
I have a Python 2.7 project which I have structured as below:
project
|
|____src
|    |
|    |__pkg
|       |
|       |__ __init__.py
|
|____test
     |
     |__test_pkg
     |  |
     |  |__ __init__.py
     |
     |__helpers
     |  |
     |  |__ __init__.py
     |
     |__ __init__.py
I am setting the src folder on the PYTHONPATH, so importing works nicely in the packages inside src. I am using Eclipse, pylint inside Eclipse, and nosetests in Eclipse as well as via bash and in a makefile (for the project). So I have to satisfy, let's say, every stakeholder!
The problem is importing some code from the helpers package in test. Weirdly enough, my test directory is also a Python package, with __init__.py containing some top-level setUp and tearDown methods for all tests. So when I try this:
import helpers
from helpers.blaaa import Blaaa
in some module inside test_pkg, none of my stakeholders are satisfied. I get ImportError: No module named ..., and pylint also complains about not finding the module. I can live with pylint complaining in test folders, but nosetests also dies if I run it in the project directory or the test directory. I would prefer not to do relative imports with a dot (.).
The problem is that you cannot escape the current directory by importing from ..helpers.
But if you start your test code inside the test directory with
python3 -m test_pkg.foo
the current directory will be the test directory and importing helpers will work. On the minus side that means you have to import from . inside test_pkg.
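For instance (foo.py is a hypothetical test module here; helpers.blaaa and Blaaa are the names from the question):
# test/test_pkg/foo.py -- run from the test directory with: python3 -m test_pkg.foo
import helpers                   # found because -m puts the current directory (test) on sys.path
from helpers.blaaa import Blaaa

# sibling modules inside test_pkg need relative imports, e.g.
# from . import other_test_module   (other_test_module is hypothetical)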
I have the following setup of a library in python:
library
| - __init__.py
| - lib1.py
| - ...
| - tools
|   | - __init__.py
|   | - testlib1.py
so in other words, the directory library contains Python modules, and the directory tools, which is a subdirectory of library, contains e.g. one file testlib1.py to test the library lib1.py.
testlib1.py therefore needs to import lib1.py from the directory above to do some tests etc., just by calling python testlib1.py from somewhere on the computer, assuming the file is in the search path.
In addition, I only want ONE PYTHONPATH to be specified.
But we all know the following idea for testlib1.py does not work, because the relative import fails:
from .. import lib1
# ... do something with lib1
I accept two kinds of answers:
An answer which describes how it is still possible to call testlib1.py directly as the executing Python script.
An answer that explains a better conceptual setup of the modules etc., with the premise that everything has to be in the directory project and the tools have to be in a different directory than the actual libraries.
If something is not clear, please ask and I will update the question.
Try adding an __init__.py to the tools directory. The relative import should work.
You can't. If you plan to use relative imports then you can't run the module by itself.
Either drop the relative imports or drop the idea of running testlib1.py directly.
Also, I believe a test file should never use relative imports. Tests should check whether the library works, and thus the code should be as similar as possible to code that users would actually write. And users usually do not add files to the library to use relative imports; they use absolute imports.
By the way, I think your file organization is too "Java-like": mixing source code and tests. If you want to do tests, then why not have something like:
project/
|
+-- src/
|   |
|   +-- library/
|   |   |
|   |   +- lib1.py
|   |   |
|   |   # ...
|   +-- library2/  # etc.
|
+-- tests/
    |
    +-- testlibrary/
        |
        +- testlib1.py
# etc.
To run the tests, just use a tool like nosetests, which automatically looks for this kind of folder structure and provides tons of options to change the search/test settings.
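For instance, run from the project root and pointed at the tests directory of the layout above, that could look like:
nosetests tests/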
Actually, I found a solution which:
works with the current implementation
does not require any changes to PYTHONPATH
does not need the top-level directory hardcoded
In testlib1.py the following code will do (has been tested):
import os
import sys

# split the absolute path of this file's directory on '/'
# (note: this assumes a POSIX-style path separator)
dirparts = os.path.dirname(os.path.abspath(__file__)).split('/')
# append the parent directory (everything except the last component)
sys.path.append('/'.join(dirparts[:-1]))
import lib1
Not exactly sure this is a very clean or straightforward solution, but it allows importing any module from the directory one level up, which is what I need (as the test code or extra code or whatever is always located one level below the actual module files).
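An OS-independent sketch of the same idea applies os.path.dirname twice instead of splitting on '/':
import os
import sys

# append the directory one level above the folder containing this file
sys.path.append(os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import lib1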