I would like to implement one central Prefect project to which flows can be added over time, independently of each other. The structure of the project looks like this:
prefect/
├── src/
│ ├── flows/
│ │ ├── test_pack1/
│ │ │ ├── common/
│ │ │ │ ├── __init__.py
│ │ │ │ └── test_module.py
│ │ │ ├── .env
│ │ │ ├── __init__.py
│ │ │ ├── requirements.txt
│ │ │ └── test_pack1_flow.py
│ │ ├── test_pack2/
│ │ │ ├── __init__.py
│ │ │ ├── .env
│ │ │ ├── requirements.txt
│ │ │ └── test_pack2_flow.py
│ │ ├── __init__.py
│ │ └── Dockerfile
│ ├── utilities/
│ │ ├── __init__.py
│ │ ├── storage.py
│ │ ├── builder.py
│ │ ├── executor.py
│ │ └── run_config.py
│ ├── .env
│ ├── __init__.py
│ └── main.py
├── .gitignore
├── poetry.lock
└── pyproject.toml
I would like each flow in the flows/ folder to be independent of the central project and built as a separate Docker container.
At startup, builder.py searches for all flows in the flows/ folder, applies a specific configuration to each, and registers them on the server.
But I ran into a problem with importing third-party packages. Let's say test_pack1/requirements.txt contains SQLAlchemy==1.4.34, test_pack1/common/test_module.py does import sqlalchemy, and test_pack1/test_pack1_flow.py has a @task that uses a function from test_module.py. When the FlowBuilder class looks for the flow variable in test_pack1_flow.py, it does so via flow = extract_flow_from_file(str(flow_module)). At this step a ModuleNotFoundError occurs, since that dependency is not present in the central Prefect application (in pyproject.toml). But once the Docker container is built, after flow.register(), the dependency will of course be there. How can I handle this step? Or am I doing something wrong?
I use Docker storage, DockerRun, and the LocalExecutor.
This is a matter of packaging flow code dependencies, and it's all definitely doable. Since this was cross-posted on Prefect Discourse here, I responded in much more detail there.
Here is a short summary:
You can use the Prefect Register CLI instead of building custom builder.py functionality that loops over flows.
You can have a custom utility function that sets a different storage and run_config based on your environment (dev/stage/prod, etc.); see the sketch after this list.
To handle dependencies that exist in a Docker image but not in your local environment, you can define a custom package with setup.py.
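For the second point, a minimal sketch of such a helper, assuming Prefect 1.x with the Docker storage and DockerRun mentioned in the question (the function names and the per-flow Dockerfile are illustrative, not from the thread):

import os

from prefect.run_configs import DockerRun
from prefect.storage import Docker

def get_storage(flow_dir: str) -> Docker:
    # Build one image per flow folder so each flow ships only its own
    # requirements.txt, installed by that flow's Dockerfile
    return Docker(
        dockerfile=os.path.join(flow_dir, "Dockerfile"),
        image_name=os.path.basename(flow_dir),
    )

def get_run_config(env: str) -> DockerRun:
    # Pick environment-specific settings (dev/stage/prod) in one place
    return DockerRun(labels=[env])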
Related
I have the following folder structure:
...
│
├── src
│   ├── folder_A
│   │   ├── file_A.py
│   │   └── __init__.py
│   │
│   ├── folder_B
│   │   ├── file_B.py
│   │   └── __init__.py
│   │
│   └── __init__.py
│
└── something else
In file_A.py I put from folder_B import file_B as fb. But file_A.py works only in debug mode (meaning the code produces the expected results there). If I run file_A.py in the standard way, I get ModuleNotFoundError: No module named 'folder_B'.
I also changed the run configuration before running the code, setting C:\Users\***\***\***\src as the working directory of file_A.py, but it still doesn't work.
What could be a solution?
If your current directory is src, then folder_B is in the path because of that. If you want to make a package with sub-packages that can access each other, place everything into a root package:
│
├── src
│ └── root_package
│ ├── folder_A
│ │ ├── file_A.py
│ │ └── __init__.py
│ │
│ ├── folder_B
│ │ ├── file_B.py
│ │ └── __init__.py
│ │
│ └── __init__.py
│
└── something else
Now in file_A, you can do
from ..folder_B import file_B as fb
Since src is not a package, you can't do a relative import through it. By adding root_package, you make it possible to find folder_B in the same package hierarchy (albeit a different branch) as the module doing the import.
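Note that a relative import like this only works when the module runs as part of the package. Assuming the layout above, run it from src with the -m switch:

python -m root_package.folder_A.file_A

Running python root_package/folder_A/file_A.py directly would fail again, because the file then executes as a top-level script with no package context.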
I'm unable to add DLL files using package_data in my setup.py file. Here's a look at the directory structure:
my_project
├── key
│ ├── 1_0
│ │ ├── sub_dir
│ │ │ ├── _required.dll
│ │ │ └── __init__.py
│ │ ├── get_key.py
│ │ └── __init__.py
│ ├── 1_1
│ │ ├── sub_dir
│ │ │ ├── _required.dll
│ │ │ └── __init__.py
│ │ ├── get_key.py
│ │ └── __init__.py
│ ├── 1_2
│ │ ├── sub_dir
│ │ │ ├── _required.dll
│ │ │ └── __init__.py
│ │ ├── get_key.py
│ │ └── __init__.py
│ └── __init__.py
├── my_program.py
└── __init__.py
I've been trying without success to add the _required.dll files to the installation of this module. I know I have to add them in the setup.py file. What I've tried so far (I'll skip all unnecessary parameters):
First:
setuptools.setup(
    name='my_project',
    packages=setuptools.find_packages(),
    include_package_data=True,
    package_data={'': ['my_project\\key\\1_0\\sub_dir\\_required.dll',
                       'my_project\\key\\1_1\\sub_dir\\_required.dll',
                       'my_project\\key\\1_2\\sub_dir\\_required.dll']},
    ...)
Second:
setuptools.setup(
    name='my_project',
    packages=setuptools.find_packages(),
    include_package_data=True,
    package_data={'key': ['1_0\\sub_dir\\_required.dll',
                          '1_1\\sub_dir\\_required.dll',
                          '1_2\\sub_dir\\_required.dll']},
    ...)
Third:
setuptools.setup(
    name='my_project',
    packages=setuptools.find_packages(),
    include_package_data=True,
    package_data={'my_project\\key\\1_0\\sub_dir': ['_required.dll'],
                  'my_project\\key\\1_1\\sub_dir': ['_required.dll'],
                  'my_project\\key\\1_2\\sub_dir': ['_required.dll']},
    ...)
Whenever I call python setup.py sdist --format=zip, the DLL files are never included. By the way, I'd rather not change the directory structure unless there is no other option.
What am I missing here?
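For reference, package_data keys are dotted package names rather than file-system paths, and the patterns are written relative to that package using forward slashes. A sketch along those lines, assuming setup.py sits next to the my_project folder and that the sub_dir folders are importable packages (which the __init__.py files suggest):

setuptools.setup(
    name='my_project',
    packages=setuptools.find_packages(),
    # keys are package names; patterns are relative to that package
    package_data={'my_project.key.1_0.sub_dir': ['_required.dll'],
                  'my_project.key.1_1.sub_dir': ['_required.dll'],
                  'my_project.key.1_2.sub_dir': ['_required.dll']},
    ...)

Depending on the setuptools version, an accompanying MANIFEST.in (e.g. recursive-include my_project *.dll) may also be needed for the files to end up in an sdist.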
I'm writing a library for internal use, called "etllib", and I have the following structure:
etl-lib
├── README.md
├── etllib
│ ├── __init__.py
│ ├── client
│ │ ├── __init__.py
│ │ ├── elastic.py
│ │ └── qradar.py
│ ├── etl
│ │ ├── __init__.py
│ │ └── etl_imperva.py
│ └── util
│ ├── __init__.py
│ ├── config.py
│ ├── daemon.py
│ ├── elastic
│ │ ├── __init__.py
│ │ └── impeva_index_config.py
│ └── imperva
│ ├── __init__.py
│ ├── kpe_config.py
│ └── query_config.py
├── scripts
│ └── etl_imperva
└── setup.py
And I have a script called "etl_imperva" in the scripts folder. The code inside looks like this:
#!/usr/bin/python3
import sys
from etllib.etl.etl_imperva import ETL
# Run with python3 imperva_run.py start|run|stop|restart
ETL.startup(sys.argv)
If I install this package (etllib) and call this script, it works just fine. But when I need to test things, how can I tell Python to use the modules in my working directory instead of the installed ones? Each time I change the modules I need to reinstall the package, which is a little time-consuming.
I also tried uninstalling the package for testing, but when I run this script I get the following error:
Exception has occurred: ModuleNotFoundError
No module named 'etllib'
File "/home/jleonse/etl-lib/scripts/run_imperva", line 3, in <module>
from etllib.etl.etl_imperva import ETL
Is there a better way to do this?
Actually, it is not on the same level in the hierarchy.
from etllib.etl.etl_imperva import ETL
would work only if etllib were in the same directory or in a directory on Python's module search path (sys.path), but etllib is in the parent directory, hence Python cannot find it.
So you can make it work by changing the project structure to:
etl-lib
├── README.md
├── etllib
│ ├── __init__.py
│ ├── client
│ │ ├── __init__.py
│ │ ├── elastic.py
│ │ └── qradar.py
│ ├── etl
│ │ ├── __init__.py
│ │ └── etl_imperva.py
│ └── util
│ ├── __init__.py
│ ├── config.py
│ ├── daemon.py
│ ├── elastic
│ │ ├── __init__.py
│ │ └── impeva_index_config.py
│ └── imperva
│ ├── __init__.py
│ ├── kpe_config.py
│ └── query_config.py
├── etl_imperva
│
└── setup.py
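A common alternative that keeps the original layout is an editable install from the project root, relying on the setup.py already shown in the tree (a different approach from the restructuring above):

pip install -e .

After that, import etllib resolves to the working copy, so changes take effect without reinstalling.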
I want to run my stand-alone script csvImp.py, which interacts with the database used by my Django site BilliClub. I'm running the script from the project root (~/BilliClub) in my virtual environment django2.
I've followed the instructions here, but for DJANGO_SETTINGS_MODULE rather than the secret key. The part that trips me up is what value to assign to the environment variable. Every iteration I've tried has yielded an error like ModuleNotFoundError: No module named 'BilliClub' after running
(django2) 04:02 ~/BilliClub $ python ./pigs/csvImp.py.
I am reloading the shell every time I change the variable in .env, so the postactivate script runs each time, and I'm making sure to re-enter my virtualenv. The problem just seems to be that I can't figure out the right path to the settings module.
The .env file:
# /home/username/BilliClub/.env #
DJANGO_SETTINGS_MODULE="[what goes here???]"
Full path of my settings.py is /home/username/BilliClub/BilliClub/settings.py.
Abridged results from running tree:
(django2) 04:33 ~ $ tree
.
├── BilliClub
│ ├── BilliClub
│ │ ├── __init__.py
│ │ ├── settings.py
│ │ ├── urls.py
│ │ └── wsgi.py
│ ├── manage.py
│ ├── media
│ ├── pigs
│ │ ├── __init__.py
│ │ ├── admin.py
│ │ ├── apps.py
│ │ ├── bc2019.csv
│ │ ├── csvImp.py
│ │ ├── models.py
│ │ ├── models.pyc
│ │ ├── tests.py
│ │ ├── urls.py
│ │ └── views.py
│ └── ...
It looks like you should make csvImp a custom management command; DJANGO_SETTINGS_MODULE is then "BilliClub.settings". When you write your utility as a Django management command, you get all the Django configuration for free, and the root directory of your command is the same as the root directory of the web app: the directory where manage.py lives.
Take a look at https://docs.djangoproject.com/en/3.1/howto/custom-management-commands/
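A minimal sketch of such a command, assuming the layout in the question (the file path, command name, and message are illustrative):

# pigs/management/commands/csvimp.py
from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Import bc2019.csv into the pigs models"

    def handle(self, *args, **options):
        # Settings and the ORM are already configured here, so the
        # import logic from csvImp.py can run unchanged
        self.stdout.write("importing...")

You would then run it as python manage.py csvimp from ~/BilliClub; note that the management/ and commands/ directories each need an __init__.py.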
I have the following directory structure in my home projects folder.
ALL-IN-ONE
├── demo
│   ├── __init__.py
│   └── __main__.py
└── models
    └── grpc
        └── allinone_server.py
And in allinone_server.py I want to import a function called images_demo that is defined in main.py. I have tried
from demo.__main__ import images_demo
It is not working. How can I import it? The function I am trying to import is located inside main.py, which is inside the demo directory. I am trying to import it from allinone_server.py in grpc. I hope I have made my question clear now.
Here is the whole tree for the project
├── demo
│ ├── __init__.py
│ ├── __main__.py
│ └── __pycache__
│ ├── __init__.cpython-36.pyc
│ └── main.cpython-36.pyc
├── description
├── environment.yml
├── HEAD
├── hooks
│ ├── applypatch-msg.sample
│ ├── commit-msg.sample
│ ├── fsmonitor-watchman.sample
│ ├── post-update.sample
│ ├── pre-applypatch.sample
│ ├── pre-commit.sample
│ ├── prepare-commit-msg.sample
│ ├── pre-push.sample
│ ├── pre-rebase.sample
│ ├── pre-receive.sample
│ └── update.sample
├── imgs
│ └── 44.jpg
├── info
│ └── exclude
├── __init__.py
├── loggers
│ ├── __init__.py
│ └── __pycache__
│ └── __init__.cpython-36.pyc
├── models
│ ├── adience_large1.h5
│ ├── adience_small1.h5
│ ├── AgeModel.json
│ ├── detection_age_gender_large1.h5
│ ├── detection_age_gender_small1.h5
│ ├── detection_age_gender_smile_large1.h5
│ ├── detection_age_gender_smile_small1.h5
│ ├── detection_age_large1.h5
│ ├── detection_age_small1.h5
│ ├── detection_large1.h5
│ ├── detection_small1.h5
│ ├── grpc
│ │ ├── adele_2016.jpg
│ │ ├── allinone_client.py
│ │ ├── all_in_one_pb2_grpc.py
│ │ ├── all_in_one_pb2.py
│ │ ├── all_in_one.proto
│ │ ├── allinone_server.py
│ │ ├── benedict_cumberbatch_2014.png
│ │ ├── cat.png
│ │ ├── classroom_in_tanzania.jpg
│ │ ├── decoded1.py
│ │ ├── decoded.py
│ │ ├── elon_musk_2015.jpg
│ │ ├── laos.jpg
│ │ ├── model_face.jpg
│ │ ├── __pycache__
│ │ │ ├── all_in_one_pb2.cpython-36.pyc
│ │ │ ├── all_in_one_pb2_grpc.cpython-36.pyc
│ │ │ └── decoded.cpython-36.pyc
│ │ ├── sophia.jpg
│ │ ├── test
│ │ │ ├── __init__.py
│ │ │ ├── __pycache__
│ │ │ │ └── __init__.cpython-36.pyc
│ │ │ └── test_images
│ │ │ ├── adele_2016.jpg
│ │ │ ├── benedict_cumberbatch_2014.png
│ │ │ ├── classroom_in_tanzania.jpg
│ │ │ ├── elon_musk_2015.jpg
│ │ │ ├── __init__.py
│ │ │ ├── laos.jpg
│ │ │ ├── model_face.jpg
│ │ │ ├── sophia.jpg
│ │ │ ├── waaah.jpg
│ │ │ ├── woman.jpg
│ │ │ └── zebra_stripes.jpg
│ │ ├── waaah.jpg
│ │ ├── woman.jpg
│ │ └── zebra_stripes.jpg
So you've referred to main.py, but you also have __main__.py in your directory structure. I'll assume that your directory actually contains main.py instead of __main__.py.
To import from a level further up in a package, start your import with a period.
To import just one function you would use from .main import images_demo
Now, let's start by saying main.py is in grpc/ along with allinone_server.py, then we'll move it to different directories and see how the import changes.
If it were in grpc/: from .main import images_demo
If it were in models/: from ..main import images_demo
If it were in ALL-IN-ONE/: from ...main import images_demo
If it were in demo/: from ...demo.main import images_demo
Every extra period brings you up one level in the hierarchy, then you use the name of the next level down in the target path until you reach where you want to be.
Now let's suppose you wanted to import the whole of main.py.
If it were in grpc/: from . import main
If it were in models/: from .. import main
If it were in ALL-IN-ONE/: from ... import main
If it were in demo/: from ...demo import main
Finally, the dot notation to move up a level only works if the file that uses it is part of a package, so this will work fine if you start your program in a scope outside this package and then use from ALL-IN-ONE.models.grpc import allinone_server (note that ALL-IN-ONE would need a valid identifier name such as all_in_one for that import to actually work, since hyphens can't appear in import paths).
However, if you run allinone_server.py directly then it will fail to import anything above it as it isn't being imported as part of a package. Try that out, and let me know if that needs better explanation.
Good luck!
You can't import a function from another folder directly; for that you have to add the folder to sys.path:
import sys
sys.path.insert(0, "../../demo/")
Another step is to rename __main__.py to main.py.
Here is the exact example that worked for me:
The tree:
.
├── demo
│   ├── __init__.py
│   └── main.py
│
└── models
    └── grpc
        └── allinone_server.py
main.py:
def images_demo():
print("hello there")
The calling file (allinone_server.py):
import sys
sys.path.insert(0, "../../demo/")
import main
main.images_demo()
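One caveat with that example: the relative path passed to sys.path.insert is resolved against the current working directory, so it only works when allinone_server.py is run from its own folder. Anchoring the path to the file itself is more robust (a small variation on the same idea):

import os
import sys

# Resolve the demo folder relative to this file, not the working directory
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "..", "..", "demo"))

import main
main.images_demo()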