I need to run a DAG that is deployed under a repository-named folder, and from that DAG I need to import modules that are deployed to a different path in the same repository. I have a cloudbuild.yaml that deploys the scripts into the DAGs folder and the Plugins folder, but I still don't know how to import the modules that live on the other path in the Cloud Composer bucket.
This is my Bucket Storage path
cloud-composer-bucket/
    dags/
        github_my_repository_deployed-testing/
            test_dag.py
    plugins/
        github_my_repository_deployed-testing/
            planning/
                modules_1.py
I need to call modules_1.py from my test_dag.py. I used this import to call the module:
from planning.modules_1 import get_data
But with this approach I got the following error:
Broken DAG: [/home/airflow/gcs/dags/github_my_repository_deployed-testing/test_dag.py] Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/home/airflow/gcs/dags/github_my_repository_deployed-testing/test_dag.py", line 7, in <module>
from planning.modules_1 import get_date
ModuleNotFoundError: No module named 'planning'
This is my cloudbuild.yaml
steps:
  - id: 'Push into Composer DAG'
    name: 'google/cloud-sdk'
    entrypoint: 'sh'
    args: ['-c', 'gsutil -m rsync -d -r ./dags ${_COMPOSER_BUCKET}/dags/$REPO_NAME']
  - id: 'Push into Composer Plugins'
    name: 'google/cloud-sdk'
    entrypoint: 'sh'
    args: ['-c', 'gsutil -m rsync -d -r ./plugins ${_COMPOSER_BUCKET}/plugins/$REPO_NAME']
  - id: 'Code Scanning'
    name: 'python:3.7-slim'
    entrypoint: 'sh'
    args: ['-c', 'pip install bandit && bandit --exit-zero -r ./']
substitutions:
  _CONTAINER_VERSION: v0.0.1
  _COMPOSER_BUCKET: gs://asia-southeast1-testing-cloud-composer-025c0511-bucket
My question is: what is the best way to import these other modules into the DAG?
You can put all the modules in the Cloud Composer DAGs folder, for example:
cloud-composer-bucket/
    dags/
        github_my_repository_deployed-testing/
            test_dag.py
            planning/
                modules_1.py
        setup.py
In the DAG Python code, you can then import your module in the following way:
from planning.modules_1 import get_data
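For context, a minimal test_dag.py using that import could look like the sketch below (the dag_id, schedule, and task wiring are assumptions for illustration, not taken from your question):
# Minimal illustrative DAG; dag_id, schedule and task are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

from planning.modules_1 import get_data  # module deployed next to the DAG

with DAG(
    dag_id="test_dag",
    start_date=datetime(2023, 1, 1),
    schedule_interval=None,
    catchup=False,
) as dag:
    fetch = PythonOperator(task_id="get_data", python_callable=get_data)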
As far as I remember, a setup.py is created by Cloud Composer in the DAG root folder; if that is not the case, you can copy a setup.py into the DAG folder yourself:
bucket/dags/setup.py
Example of a setup.py file:
from setuptools import find_packages, setup

setup(
    name="composer_env_python_lib",
    version="0.0.1",
    install_requires=[],
    data_files=[],
    packages=find_packages(),
)
Other possible solution
You can also use internal Python packages from GCP Artifact Registry if you want (for example, for your planning package). You can then install your internal Python packages in Cloud Composer as PyPI packages; I share a link about this:
private repo Composer Artifact registry
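As a rough illustration of that approach (the environment name, region, and version specifier below are placeholders, not taken from your setup), the private package can then be added to the Composer environment with something like:
# Illustrative only: environment name, region and version specifier are assumptions.
gcloud composer environments update my-composer-env \
    --location asia-southeast1 \
    --update-pypi-package "planning>=0.0.1"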
I am building a pipeline job along the lines of this guide:
https://learn.microsoft.com/en-us/azure/machine-learning/tutorial-pipeline-python-sdk
I have a local environment.yaml which I send via the SDK to be built and registered by the ML Studio workspace. This works for third-party dependencies such as numpy. But in my pipeline component's script (scripts/train.py) I import my own module, the very module that the script itself is part of.
The component definition (TRAIN_CONFIG) that uses train.py:
name: training_job
display_name: Training
# version: 1  # Not specifying a version will automatically update the version
type: command
inputs:
  registered_model_name:
    type: string
outputs:
  model:
    type: uri_folder
code: .
environment:
  azureml:AzureML-sklearn-0.24-ubuntu18.04-py37-cpu:21
command: >-
  python scripts/train.py
Snippet from the script that triggers the remote training job:
train_component: Component = load_component(source=TRAIN_CONFIG)
train_component.environment = get_ml_client().environments.get("training-job-env", version="42")
The environment build job (which basically runs conda env create -f environment.yaml) fails when I list <my module> as a dependency in environment.yaml, since it of course cannot find that module, so I took it out. But obviously, <my module> is then missing from the environment when the job runs on Azure:
Traceback (most recent call last):
File "/mnt/azureml/cr/j/xxx/exe/wd/scripts/train.py", line 11, in <module>
from <my module> import <...>
ModuleNotFoundError: No module named '<my module>'
So, given that I seemingly have no more control over the job environment than passing in the environment.yaml, how do I get <my module> installed in the job environment?
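For reference, my environment.yaml is roughly along these lines (a sketch; the name, channels, and pinned packages below are illustrative placeholders, not the real file):
# Illustrative only: name, channels and packages are assumptions.
name: training-job-env
channels:
  - conda-forge
dependencies:
  - python=3.7
  - numpy
  - pip
  - pip:
      - scikit-learn
  # <my module> was removed from here, which is why the remote job cannot import it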
I am trying to build a container for my Express.js application. The Express.js app uses Python via the npm package PythonShell.
I have plenty of Python code, which lives in a subfolder of my Express app, and with npm start everything works perfectly.
However, I am new to Docker and I need to containerize the app. My Dockerfile looks like this:
FROM node:18
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 3001
CMD ["node", "./bin/www"]
I built the image with docker build . -t blahblah-server and ran it with docker run -p 8080:3001 -d blahblah-server.
I use the following imports at the top of the Python script:
import datetime
from pathlib import Path # Used for easier handling of auxiliary file's local path
import pyecma376_2 # The base library for Open Packaging Specifications. We will use the OPCCoreProperties class.
from assi import model
When the Python script is executed (only in the container!) I get the following error message:
/usr/src/app/public/javascripts/service/pythonService.js:12
if (err) throw err;
^
PythonShellError: ModuleNotFoundError: No module named 'pyecma376_2'
at PythonShell.parseError (/usr/src/app/node_modules/python-shell/index.js:295:21)
at terminateIfNeeded (/usr/src/app/node_modules/python-shell/index.js:190:32)
at ChildProcess.<anonymous> (/usr/src/app/node_modules/python-shell/index.js:182:13)
at ChildProcess.emit (node:events:537:28)
at ChildProcess._handle.onexit (node:internal/child_process:291:12)
----- Python Traceback -----
File "/usr/src/app/public/pythonscripts/myPython/wtf.py", line 6, in <module>
import pyecma376_2 # The base library for Open Packaging Specifications. We will use the OPCCoreProperties class. {
traceback: 'Traceback (most recent call last):\n' +
' File "/usr/src/app/public/pythonscripts/myPython/wtf.py", line 6, in <module>\n' +
' import pyecma376_2 # The base library for Open Packaging Specifications. We will use the OPCCoreProperties class.\n' +
"ModuleNotFoundError: No module named 'pyecma376_2'\n",
executable: 'python3',
options: null,
script: 'public/pythonscripts/myPython/wtf.py',
args: null,
exitCode: 1
}
If I comment out the first three imports, I get the same kind of error:
PythonShellError: ModuleNotFoundError: No module named 'assi'
Please note that assi is actually my own Python code, which is included in the Express.js app directory.
Python seems to be installed correctly in the container. I stepped inside the container via docker exec -it <container id> /bin/bash, and the Python packages are there in the /usr/lib directory.
I really have absolutely no idea how all this works together and why Python doesn't find these modules...
You are trying to use libraries that are not in the Python standard library. It seems you are not running pip install when you build the Docker image.
Try adding RUN instructions to your Dockerfile that do this for you. Example:
RUN pip3 install pyecma376_2
RUN pip3 install /path/to/assi
Maybe that will solve your problem. Don't forget to check whether Python is already installed in your container; it seems that it is. And if you have both Python 2 and Python 3 installed, make sure you use pip3 instead of plain pip.
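Putting it together, a hedged sketch of how your Dockerfile could be extended (the requirements.txt file and the apt packages are assumptions on my part, not taken from your question):
FROM node:18

WORKDIR /usr/src/app

COPY package*.json ./
RUN npm install

COPY . .

# Assumption: make sure Python 3 and pip are present in the image.
RUN apt-get update && apt-get install -y python3 python3-pip && rm -rf /var/lib/apt/lists/*

# Assumption: the Python dependencies (pyecma376_2, ...) are listed in a requirements.txt at the project root.
RUN pip3 install --no-cache-dir -r requirements.txt

EXPOSE 3001
CMD ["node", "./bin/www"]

Afterwards, rebuild the image with docker build . -t blahblah-server so the new layers are included.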
I'm trying to create an Azure DevOps pipeline in order to build and release a Python package to the Azure DevOps Artifacts section.
I've started by creating a feed called "utils", then I've created my package and structured it like this:
.
|__ src
|   |__ __init__.py
|   |__ class.py
|__ test
|   |__ __init__.py
|   |__ test_class.py
|__ .pypirc
|__ azure-pipelines.yml
|__ pyproject.toml
|__ requirements.txt
|__ setup.cfg
And this is the content of the files:
.pypirc
[distutils]
Index-servers =
    prelios-utils

[utils]
Repository = https://pkgs.dev.azure.com/OMIT/_packaging/utils/pypi/upload/
pyproject.toml
[build-system]
requires = [
    "setuptools>=42",
    "wheel"
]
build-backend = "setuptools.build_meta"
setup.cfg
[metadata]
name = my_utils
version = 0.1
author = Walter Tranchina
author_email = walter.tranchina#OMIT.com
description = A package containing [...]
long_description = file: README.md
long_description_content_type = text/markdown
url = OMIT.com
project_urls =
classifiers =
    Programming Language :: Python :: 3
    License :: OSI Approved :: MIT License
    Operating System :: OS Independent

[options]
package_dir =
    = src
packages = find:
python_requires = >=3.7
install_requires =

[options.packages.find]
where = src
azure-pipelines.yml
trigger:
- main

pool:
  vmImage: 'ubuntu-latest'
strategy:
  matrix:
    Python38:
      python.version: '3.8'

steps:
- task: UsePythonVersion@0
  inputs:
    versionSpec: '$(python.version)'
  displayName: 'Use Python $(python.version)'

- script: |
    python -m pip install --upgrade pip
  displayName: 'Install dependencies'

- script: |
    pip install twine wheel
  displayName: 'Install buildtools'

- script: |
    pip install pytest pytest-azurepipelines
    pytest
  displayName: 'pytest'

- script: |
    python -m build
  displayName: 'Artifact creation'

- script: |
    twine upload -r utils --config-file ./.pypirc dist/*
  displayName: 'Artifact Upload'
The problem I'm facing is that the pipeline gets stuck at the Artifact Upload stage for hours without completing.
Can someone please help me understand what's wrong?
Thanks!
[UPDATE]
I've updated my yml file as suggested in the answers:
- task: TwineAuthenticate@1
  displayName: 'Twine Authenticate'
  inputs:
    artifactFeed: 'utils'
And now I have this error:
2022-05-19T09:20:50.6726960Z ##[section]Starting: Artifact Upload
2022-05-19T09:20:50.6735745Z ==============================================================================
2022-05-19T09:20:50.6736081Z Task : Command line
2022-05-19T09:20:50.6736434Z Description : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
2022-05-19T09:20:50.6736788Z Version : 2.201.1
2022-05-19T09:20:50.6737008Z Author : Microsoft Corporation
2022-05-19T09:20:50.6737375Z Help : https://learn.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
2022-05-19T09:20:50.6737859Z ==============================================================================
2022-05-19T09:20:50.8090380Z Generating script.
2022-05-19T09:20:50.8100662Z Script contents:
2022-05-19T09:20:50.8102321Z twine upload -r utils --config-file ./.pypirc dist/*
2022-05-19T09:20:50.8102824Z ========================== Starting Command Output ===========================
2022-05-19T09:20:50.8129029Z [command]/usr/bin/bash --noprofile --norc /home/vsts/work/_temp/706c12ef-da25-44b0-b1fc-5ab83e7e0bf9.sh
2022-05-19T09:20:51.1178721Z Uploading distributions to
2022-05-19T09:20:51.1180490Z https://pkgs.dev.azure.com/OMIT/_packaging/utils/pypi/upload/
2022-05-19T09:20:27.0860014Z Traceback (most recent call last):
2022-05-19T09:20:27.0861203Z File "/opt/hostedtoolcache/Python/3.8.12/x64/bin/twine", line 8, in <module>
2022-05-19T09:20:27.0862081Z sys.exit(main())
2022-05-19T09:20:27.0863965Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/__main__.py", line 33, in main
2022-05-19T09:20:27.0865080Z error = cli.dispatch(sys.argv[1:])
2022-05-19T09:20:27.0866638Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/cli.py", line 124, in dispatch
2022-05-19T09:20:27.0867670Z return main(args.args)
2022-05-19T09:20:27.0869183Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/commands/upload.py", line 198, in main
2022-05-19T09:20:27.0870362Z return upload(upload_settings, parsed_args.dists)
2022-05-19T09:20:27.0871990Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/commands/upload.py", line 127, in upload
2022-05-19T09:20:27.0873239Z repository = upload_settings.create_repository()
2022-05-19T09:20:27.0875392Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/settings.py", line 329, in create_repository
2022-05-19T09:20:27.0876447Z self.username,
2022-05-19T09:20:27.0877911Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/settings.py", line 131, in username
2022-05-19T09:20:27.0879043Z return cast(Optional[str], self.auth.username)
2022-05-19T09:20:27.0880583Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/auth.py", line 34, in username
2022-05-19T09:20:27.0881640Z return utils.get_userpass_value(
2022-05-19T09:20:27.0883208Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/utils.py", line 248, in get_userpass_value
2022-05-19T09:20:27.0884302Z value = prompt_strategy()
2022-05-19T09:20:27.0886234Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/auth.py", line 85, in username_from_keyring_or_prompt
2022-05-19T09:20:27.0887440Z return self.prompt("username", input)
2022-05-19T09:20:27.0888964Z File "/opt/hostedtoolcache/Python/3.8.12/x64/lib/python3.8/site-packages/twine/auth.py", line 96, in prompt
2022-05-19T09:20:27.0890017Z return how(f"Enter your {what}: ")
2022-05-19T09:20:27.0890786Z EOFError: EOF when reading a line
2022-05-19T09:20:27.1372189Z ##[error]Bash exited with code 'null'.
2022-05-19T09:20:27.1745024Z ##[error]The operation was canceled.
2022-05-19T09:20:27.1749049Z ##[section]Finishing: Artifact Upload
Seems like twine is waiting for something... :/
I guess this is because you are missing a Python Twine Upload Authenticate task.
- task: TwineAuthenticate@1
  inputs:
    artifactFeed: 'MyTestFeed'
If you are using a project level feed, the value of artifactFeed should be {project name}/{feed name}.
If you are using an organization level feed, the value of artifactFeed should be {feed name}.
A simpler way is to click the gray "setting" button under the task and select your feed from the drop-down list.
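For example, with a project-scoped feed the input would look like this (the project and feed names below are placeholders):
- task: TwineAuthenticate@1
  inputs:
    artifactFeed: 'MyProject/MyTestFeed'  # {project name}/{feed name}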
I found the solution after many attempts...
First, I created a Service Connection to Python in Azure DevOps, containing an API key I had previously generated.
Then I edited the YAML file:
- task: TwineAuthenticate@1
  displayName: 'Twine Authenticate'
  inputs:
    pythonUploadServiceConnection: 'PythonUpload'

- script: |
    python -m twine upload --skip-existing --verbose -r utils --config-file $(PYPIRC_PATH) dist/*
  displayName: 'Artifact Upload'
The key was using the variable $(PYPIRC_PATH), which is automatically set by the previous task. The .pypirc file is ignored by the process, so it can be deleted!
Hope it will help!
TL;DR: How can I set up my GitLab test pipeline so that the tests also run locally on VS Code?
I'm very new to GitLab pipelines, so please forgive me if the question is amateurish. I have a GitLab repo set up online, and I'm using VS Code to develop locally. I've created a new pipeline, I want to make sure all my unit tests (written with PyTest) run anytime I make a commit.
The issue is that even though I use the same setup.py file in both places (obviously), I can't get VS Code test discovery and the GitLab pipeline tests to work at the same time. In my tests I do an import, and if I import like this:
...
from external_workforce import misc_tools
# I want to test functions in this misc_tools module
...
then it works on GitLab, but not in VS Code: VS Code gives an error during test discovery, namely ModuleNotFoundError: No module named 'external_workforce'. But if I import (in my test_tools.py file, see location below) like this:
...
from hr_datapool.external_workforce import misc_tools
...
it works in VS Code, but now GitLab complains with ModuleNotFoundError: No module named 'hr_datapool'.
I think the relevant info is the following; please ask if more is needed!
My file structure is:
.
|__ requirements.txt
|__ setup.py
|__ hr_datapool
    |__ external_workforce
    |   |__ __init__.py
    |   |__ misc_tools.py
    |   |__ tests
    |       |__ test_tools.py
    |__ other_module
        ...
In my pipeline editor (the .gitlab-ci.yml file) I have:
image: python:3.9.7

variables:
  PIP_CACHE_DIR: "$CI_PROJECT_DIR/.cache/pip"

cache:
  paths:
    - .cache/pip
    - venv/

before_script:
  - python --version  # For debugging
  - pip install virtualenv
  - virtualenv venv
  - source venv/bin/activate
  - pip install -r requirements.txt

test:
  script:
    - pytest --pyargs hr_datapool  #- python setup.py test

run:
  script:
    - python setup.py bdist_wheel
  artifacts:
    paths:
      - dist/*.whl
And finally, my setup.py is:
import re
from unittest import removeResult

from setuptools import setup, find_packages

with open('requirements.txt') as f:
    requirements = f.read().splitlines()

for req in ['wheel', 'bar']:
    requirements.append(req)

setup(
    name='hr-datapool',
    version='0.1',
    ...
    packages=find_packages(),
    install_requires=requirements,
)
Basically, the question is: How can I set up my GitLab test pipeline so that the tests also run locally on VS Code? Thank you!
UPDATE:
Adding the full trace coming from VS Code:
> conda run -n base --no-capture-output --live-stream python ~/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/get_output_via_markers.py ~/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/testing_tools/run_adapter.py discover pytest -- --rootdir "." -s --cache-clear hr_datapool
cwd: .
[ERROR 2022-2-23 9:2:4.500]: Error discovering pytest tests:
[r [Error]: ============================= test session starts ==============================
platform darwin -- Python 3.9.7, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /Users/myuser/Documents/myfolder
plugins: anyio-2.2.0
collected 0 items / 1 error
==================================== ERRORS ====================================
_____ ERROR collecting hr_datapool/external_workforce/tests/test_tools.py ______
ImportError while importing test module '/Users/myuser/Documents/myfolder/hr_datapool/external_workforce/tests/test_tools.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../../opt/anaconda3/lib/python3.9/importlib/__init__.py:127: in import_module
return _bootstrap._gcd_import(name[level:], package, level)
hr_datapool/external_workforce/tests/test_tools.py:2: in <module>
from external_workforce import misc_tools
E ModuleNotFoundError: No module named 'external_workforce'
=========================== short test summary info ============================
ERROR hr_datapool/external_workforce/tests/test_tools.py
!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!
===================== no tests collected, 1 error in 0.08s =====================
Traceback (most recent call last):
File "/Users/myuser/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/get_output_via_markers.py", line 26, in <module>
runpy.run_path(module, run_name="__main__")
File "/Users/myuser/opt/anaconda3/lib/python3.9/runpy.py", line 268, in run_path
return _run_module_code(code, init_globals, run_name,
File "/Users/myuser/opt/anaconda3/lib/python3.9/runpy.py", line 97, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/Users/myuser/opt/anaconda3/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "/Users/myuser/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/testing_tools/run_adapter.py", line 22, in <module>
main(tool, cmd, subargs, toolargs)
File "/Users/myuser/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/testing_tools/adapter/__main__.py", line 100, in main
parents, result = run(toolargs, **subargs)
File "/Users/myuser/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/testing_tools/adapter/pytest/_discovery.py", line 44, in discover
raise Exception("pytest discovery failed (exit code {})".format(ec))
Exception: pytest discovery failed (exit code 2)
ERROR conda.cli.main_run:execute(33): Subprocess for 'conda run ['python', '/Users/myuser/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/get_output_via_markers.py', '/Users/A111086670/.vscode/extensions/ms-python.python-2022.2.1924087327/pythonFiles/testing_tools/run_adapter.py', 'discover', 'pytest', '--', '--rootdir', '/Users/myuser/Documents/myfolder', '-s', '--cache-clear', 'hr_datapool']' command failed. (See above for error)
at ChildProcess.<anonymous> (/Users/myuser/.vscode/extensions/ms-python.python-2022.2.1924087327/out/client/extension.js:32:39235)
at Object.onceWrapper (events.js:422:26)
at ChildProcess.emit (events.js:315:20)
at maybeClose (internal/child_process.js:1048:16)
at Process.ChildProcess._handle.onexit (internal/child_process.js:288:5)]
The PYTHONPATH caused the problem.
When you run on GitLab, the parent folder of external_workforce (that is, the hr_datapool folder) is on the PYTHONPATH, while in VS Code it is the parent folder of hr_datapool (the project root) that is on the PYTHONPATH.
Are you running the tests in the terminal in VS Code? And have you added this to your settings.json file?
"terminal.integrated.env.windows": {
"PYTHONPATH": "${workspaceFolder};"
},
Then you can execute pytest in the terminal in VS Code. But you have not configured this in GitLab; there you instead rely on the hr_datapool package itself (- pytest --pyargs hr_datapool and setup(name='hr-datapool', ...)), which is why you get the error message.
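Since your trace shows platform darwin, the macOS equivalent of that setting is presumably what you need (a sketch, assuming you run the tests from the integrated terminal):
"terminal.integrated.env.osx": {
    "PYTHONPATH": "${workspaceFolder}"
},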
I want to run a Python project on a remote server (an AWS cluster).
In the project there is a model ("model") at the path project/folder2/model; in model.py I have df.write.parquet(parquetFileName).
In order for the slaves to have that "model" from "folder2", I am using setup from setuptools.
This is my setup.py (path project/setup.py):
setup(
    name='dataFrameFromSpark',
    version='',
    packages=['project/config', 'project/folder2', 'project/folder3'],
    py_modules=['project/model1'],
    url='',
    license='',
    author='....',
    author_email='',
    description=''
)
Before running main.py I type this line on the remote server:
python project/setup.py build && python project/setup.py sdist && python project/setup.py bdist_egg
The error:
ImportError: No module named folder2.model
It seems the slaves don't get the folders when I run the "write to parquet" code.
What can I do?