Handling config YAML files in setup.py - Python

I have a PyCharm project with the following structure:
my_project:
    setup.py
    config_files:
        config1.yaml
        config2.yaml
    my_package:
        __init__.py
        main.py
When I run main.py from PyCharm, I am able to access my config1.yaml file using the relative path ../config_files/config1.yaml.
My goal is to build the package, install it on my cloud machine, and run the program there. But setup.py does not copy the config files the way they are laid out in my PyCharm project.
My question is: what can I do to make sure these files are copied correctly?
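One common approach, shown here only as a sketch: move (or copy) the config_files directory under my_package so setuptools can ship the YAML files as package data. The metadata values below are illustrative, not taken from the question.
# setup.py (sketch; assumes the layout my_package/config_files/*.yaml)
import setuptools

setuptools.setup(
    name="my_project",
    version="0.1.0",
    packages=setuptools.find_packages(),
    # Ship the YAML files inside the installed package so code can locate them
    # relative to the package rather than relative to the working directory.
    # Older setuptools versions may also need a MANIFEST.in entry for sdists.
    package_data={"my_package": ["config_files/*.yaml"]},
)
The code would then resolve the file relative to the package, for example Path(__file__).parent / "config_files" / "config1.yaml", instead of the ../config_files/config1.yaml path that only works from the source checkout.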

Related

Google Cloud Buildpack custom source directory for Python app

I am experimenting with Google Cloud Platform buildpacks, specifically for Python. I started with the Sample Functions Framework Python example app, and got that running locally, with commands:
pack build --builder=gcr.io/buildpacks/builder sample-functions-framework-python
docker run -it -ePORT=8080 -p8080:8080 sample-functions-framework-python
Great, let's see if I can apply this concept to a legacy project (Python 3.7, if that matters).
The legacy project has a structure similar to:
.gitignore
source/
    main.py
    lib/
        helper.py
    requirements.txt
tests/
    <test files here>
The Dockerfile that came with this project packaged the source directory contents without the "source" directory, like this:
COPY lib/ /app/lib
COPY main.py /app
WORKDIR /app
... rest of Dockerfile here ...
Is there a way to package just the contents of the source directory using the buildpack?
I tried to add this config to the project.toml file:
[[build.env]]
name = "GOOGLE_FUNCTION_SOURCE"
value = "./source/main.py"
But the Python modules/imports aren't set up correctly for that, as I get this error:
File "/workspace/source/main.py", line 2, in <module>
from source.lib.helper import mymethod
ModuleNotFoundError: No module named 'source'
Putting both main.py and /lib into the project root dir would make this work, but I'm wondering if there is a better way.
A related question: is there a way to see which project files are copied into the image by the buildpack? I tried verbose logging but didn't see anything useful.
Update:
The python module error:
File "/workspace/source/main.py", line 2, in <module>
from source.lib.helper import mymethod
ModuleNotFoundError: No module named 'source'
was happening because I moved the lib dir into source in my test project, and when I did that, IntelliJ updated the import statement in main.py without me catching it. I fixed the import, then applied the solution listed below and it worked.
I had been searching the buildpack and Google Cloud Functions documentation, but I found the option I needed on the pack build documentation page: the --path option.
This command only captures the source directory contents:
pack build --builder=gcr.io/buildpacks/builder --path source sample-functions-framework-python
If you change the path, the project.toml descriptor needs to be in that directory too (or be specified with --descriptor on the command line).
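For example, a hedged variant of the command above that keeps the descriptor at the repository root (paths are illustrative):
pack build --builder=gcr.io/buildpacks/builder --path source --descriptor project.toml sample-functions-framework-python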

Error when importing modules from different folders in Python

I have the following:
my_project/
    hybrik/
        __init__.py
        models/
            __init__.py
            builder.py
    scripts/
        demo.py
And in demo.py:
from hybrik.models import builder
When I tried to run demo.py, an error occurred:
ModuleNotFoundError: No module named 'hybrik'
I already have the __init__.py files, so why can't it find the module?
Python will look for modules in locations on the PYTHONPATH.
Assuming the actual code in those 4 Python files makes sense, you can do the following:
on PowerShell:
$env:PYTHONPATH += ";/path/to/my_project"
on Windows command prompt:
set PYTHONPATH=%PYTHONPATH%;/path/to/my_project
on a Linux shell:
export PYTHONPATH=$PYTHONPATH:/path/to/my_project
Alternatively, you can build a package and install it in the environment of your script.
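A minimal sketch of that alternative, assuming an editable install during development (the metadata values are illustrative):
# my_project/setup.py (sketch)
import setuptools

setuptools.setup(
    name="hybrik",
    version="0.1.0",
    # find_packages() picks up hybrik and hybrik.models via their __init__.py files.
    packages=setuptools.find_packages(),
)
Then, from my_project/:
pip install -e .
python scripts/demo.py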

Python module not found when using Docker

I have an Angular-Flask application that I'm trying to Dockerize.
One of the Python files used is a file called trial.py. This file uses a module called _trial.pyd. When I run the project locally, everything works fine. However, when I dockerize it using the following code and run it, it gives me the error "Module _trial not found".
FROM node:latest as node
COPY . /APP
COPY package.json /APP/package.json
WORKDIR /APP
RUN npm install
RUN npm install -g @angular/cli@7.3.9
CMD ng build --base-href /static/
FROM python:3.6
WORKDIR /root/
COPY --from=0 /APP/ .
RUN pip install -r requirements.txt
EXPOSE 5000
ENTRYPOINT ["python"]
CMD ["app.py"]
Where am I going wrong? Do I need to change some path in my Dockerfile?
EDIT:
Directory structure:
APP [folder with src Angular files]
static [folder with js files]
templates [folder with index.html]
_trial.pyd
app.py [Starting point of execution]
Dockerfile
package.json
requirements.txt
trial.py
app.py calls trial.py (successfully).
trial.py needs _trial.pyd; I try to import it with import _trial.
It works fine locally, but when I dockerize it, I get the following error message:
Traceback (most recent call last):
  File "app.py", line 7, in <module>
    from trial import *
  File "/root/trial.py", line 15, in <module>
    import _trial
ModuleNotFoundError: No module named '_trial'
The commands run are:
docker image build -t prj .
docker container run --publish 5000:5000 --name prj prj
UPDATE
It is able to recognize other .py files, but not the .pyd module.
I think this could be due to one of the following reasons:
Path
Make sure the required file is on the PYTHONPATH. It sounds like you have done that, so probably not this one.
Debug Mode
If you are working with a debug build, you will need to rename the module from "_trial.pyd" to "_trial_d.pyd".
Missing DLL
The .pyd file may require a DLL or other dependency that is not available, and it can't be imported because of that. Tools such as depends.exe let you find out what is required.
Namespace Issue
If there were a file already called "_trial.py" on the Python path, that could create unwanted behavior.
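For the first point, a quick hedged check that can be run inside the container (for example with docker container run, from the /root working directory) to see the module search path and the files that were actually copied in:
# Prints where Python will look for modules and what sits next to app.py.
import os
import sys
print(sys.path)
print(sorted(os.listdir(".")))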
Solved it!
The issue was that I was using .pyd files in a Linux container, but Linux doesn't support .pyd files (they are Windows-specific extension modules). This is why it was not able to detect the _trial.pyd module. I had to generate a shared object (.so) file (i.e. _trial.so) instead, and it worked fine.
EDIT: I generated the .so file on a Linux system. Generating the .so on Windows and then using it in a Linux container gives an "invalid ELF header" error.
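For reference, a minimal sketch of producing such a .so with setuptools on Linux, assuming _trial is built from a single C source file; the file names and build details are assumptions, since the question does not say how _trial.pyd was generated:
# setup_ext.py (hypothetical)
import setuptools

setuptools.setup(
    name="trial-ext",
    version="0.1.0",
    ext_modules=[
        # Compiles _trial.c into _trial.cpython-<tag>.so on Linux.
        setuptools.Extension("_trial", sources=["_trial.c"]),
    ],
)
Built inside the Linux image (or an identical base image) with:
python setup_ext.py build_ext --inplace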

pathlib Path resolves installed path directory of package instead of source code directory

I have packaged my project using setup.py, and the project folder structure looks like this:
api-automation
    api
        packagename
            __init__.py
            user.py
            payloads
                a.json
                b.json
    tests
        conftest.py
    setup.cfg
    setup.py
    README.rst
I have created a virtual environment named "myenv_1" in the folder below,
/Users/basavarajlamani/Documents/environments/
and I have installed the above repo in this virtual environment.
I have searched a lot on Stack Overflow and the internet but did not find an answer.
Code of the user.py file:
from pathlib import Path
current_dir = str(Path(__file__).resolve().parent)
def func():
    print("current_dir", current_dir)
Code of conftest.py:
from packagename.user import func
func()
If I run the user.py file directly (python3 user.py), I get the correct directory path:
current_dir /Users/basavarajlamani/Documents/repos/api-automation/api/packagename
But if I run the conftest.py file (python3 conftest.py), I get the installed path below, which I don't want; I want the same directory path as when I run user.py directly:
current_dir
/Users/basavarajlamani/Documents/environments/myenv_1/lib/python3.7/site-packages/packagename
How can I solve this problem?
I suspect you didn't use the correct option when bootstrapping your development environment.
Try:
Clean up your development virtualenv, or delete it and create a new one.
cd the/root/of/your/source/tree
pip install -e .
The important point is the -e option. Read the pip manual.
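For example (illustrative paths, assuming the myenv_1 virtualenv is active):
cd /Users/basavarajlamani/Documents/repos/api-automation
pip install -e .
python3 tests/conftest.py
With an editable install, site-packages only holds a link back to the source tree, so Path(__file__) in user.py resolves to the packagename directory inside the repo again.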

Google Dataflow - Failed to import custom python modules

My Apache Beam pipeline implements custom Transforms and ParDo's in Python modules, which in turn import other modules written by me. On the local runner this works fine, as all the files are available in the same path. In the case of the Dataflow runner, the pipeline fails with a module import error.
How do I make custom modules available to all the dataflow workers? Please advise.
Below is an example:
ImportError: No module named DataAggregation
at find_class (/usr/lib/python2.7/pickle.py:1130)
at find_class (/usr/local/lib/python2.7/dist-packages/dill/dill.py:423)
at load_global (/usr/lib/python2.7/pickle.py:1096)
at load (/usr/lib/python2.7/pickle.py:864)
at load (/usr/local/lib/python2.7/dist-packages/dill/dill.py:266)
at loads (/usr/local/lib/python2.7/dist-packages/dill/dill.py:277)
at loads (/usr/local/lib/python2.7/dist-packages/apache_beam/internal/pickler.py:232)
at apache_beam.runners.worker.operations.PGBKCVOperation.__init__ (operations.py:508)
at apache_beam.runners.worker.operations.create_pgbk_op (operations.py:452)
at apache_beam.runners.worker.operations.create_operation (operations.py:613)
at create_operation (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:104)
at execute (/usr/local/lib/python2.7/dist-packages/dataflow_worker/executor.py:130)
at do_work (/usr/local/lib/python2.7/dist-packages/dataflow_worker/batchworker.py:642)
The issue is probably that you haven't grouped your files as a package. The Beam documentation has a section on it.
Multiple File Dependencies
Often, your pipeline code spans multiple files. To run your project remotely, you must group these files as a Python package and specify the package when you run your pipeline. When the remote workers start, they will install your package. To group your files as a Python package and make it available remotely, perform the following steps:
Create a setup.py file for your project. The following is a very basic setup.py file.
import setuptools

setuptools.setup(
    name='PACKAGE-NAME',
    version='PACKAGE-VERSION',
    install_requires=[],
    packages=setuptools.find_packages(),
)
Structure your project so that the root directory contains the setup.py file, the main workflow file, and a directory with the rest of the files.
root_dir/
    setup.py
    main.py
    other_files_dir/
See Juliaset for an example that follows this required project structure.
Run your pipeline with the following command-line option:
--setup_file /path/to/setup.py
Note: If you created a requirements.txt file and your project spans multiple files, you can get rid of the requirements.txt file and instead add all packages contained in requirements.txt to the install_requires field of the setup call (in step 1).
I ran into the same issue and unfortunately, the docs are not as verbose as they need to be.
So, the problem, as it turns out, is that both root_dir and other_files_dir must contain an __init__.py file. When a directory contains an __init__.py file (even if it's empty), Python will treat that directory as a package, which in this instance is what we want. So, your final folder structure should look something like this:
root_dir/
    __init__.py
    setup.py
    main.py
    other_files_dir/
        __init__.py
        module_1.py
        module_2.py
And what you'll find is that Python will build an .egg-info folder that describes your package, including all pip dependencies. It will also contain the top_level.txt file, which holds the name of the directory that contains the modules (i.e. other_files_dir).
Then you would simply call the modules in main.py as below
from other_files_dir import module_1
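A hedged end-to-end sketch of how the pieces fit when launching from root_dir/; the project, region and bucket names are placeholders, not values from the question:
python main.py \
    --runner DataflowRunner \
    --project my-gcp-project \
    --region us-central1 \
    --temp_location gs://my-bucket/tmp \
    --setup_file ./setup.py
The remote workers then install the package described by setup.py before unpickling the DoFns, which is what makes modules like DataAggregation importable on the worker side.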
