Deploy GCP Cloud Function with local dependencies - Python

I have been trying to deploy a Cloud Function with a private dependency (pyodbc), as I couldn't get it working through requirements.txt. Please note, I don't want to use Docker here. So all I have built here is the below files:
1. main.py
2. process.py (this one uses pyodbc to connect to Teradata)
3. libs (folder)
3.1 pyodbc-4.0.30.dist-info (package)
3.2 pyodbc (Python extension module)
3.3 __init__.py (this is to make the folder a module)
4. requirements.txt
I also updated process.py to import the pyodbc module as below:
import libs.pyodbc
Please note: I used the GCP docs to install the pyodbc package and put it in libs, per https://cloud.google.com/functions/docs/writing/specifying-dependencies-python
On top of this, I am also using requirements.txt for the standard dependencies.
But I am still getting a module error, as below.
Error message: Code in file main.py can't be loaded.
Did you list all required modules in requirements.txt?
Detailed stack trace: Traceback (most recent call last):
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 305, in check_or_load_user_function
_function_handler.load_user_function()
File "/env/local/lib/python3.7/site-packages/google/cloud/functions/worker.py", line 184, in load_user_function
spec.loader.exec_module(main)
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/user_code/main.py", line 9, in <module>
from process import process
File "/user_code/process.py", line 6, in <module>
import libs.pyodbc
ModuleNotFoundError: No module named 'libs.pyodbc'
Any leads or help from here are really appreciated. All I am trying to achieve here is: read CSV files from a GCP bucket, process them through a dataframe that loads into Teradata, and generate an output file back into another GCP bucket. I am trying to achieve all of this with Cloud Functions only. Thank you.
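Roughly, the flow I am after looks like this (a sketch only; the bucket name is a placeholder and the Teradata load via pyodbc is elided):
import io
import pandas as pd
from google.cloud import storage

def process(event, context):
    # triggered by a CSV landing in the input bucket
    client = storage.Client()
    blob = client.bucket(event["bucket"]).blob(event["name"])
    df = pd.read_csv(io.BytesIO(blob.download_as_bytes()))
    # ... load df into Teradata via pyodbc here ...
    out_bucket = client.bucket("output-bucket")  # placeholder name
    out_bucket.blob("output.csv").upload_from_string(df.to_csv(index=False))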

The pyodbc project might be a bit of a special case here, because:
The project requires some platform-specific code;
They haven't published source distributions for their latest release (only built distributions).
Here's what I did to get this to work. Starting with an empty libs directory, first download the latest available source distribution:
$ pip download pyodbc --no-binary :all:
Make a directory for the module:
$ mkdir libs/pyodbc
Untar the source distribution into the module:
$ tar xf pyodbc-4.0.28.tar.gz -C libs/pyodbc
Then, in the function you can do:
import libs.pyodbc
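For reference, the function source tree then ends up looking roughly like this (with 4.0.28 being the version the source distribution resolved to at the time):
main.py
process.py
requirements.txt
libs/
    pyodbc/
        ... (unpacked contents of pyodbc-4.0.28.tar.gz)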

Related

In my src/ my config.py (project variables) won't import to other directories

In the src/ directory of my project, as of this post, I have a config.py that I want to import in most of the Python files of this PyObjC and py2app project. As of the linked commit, everything works fine when run with python3. However, once I build it with my Makefile using make build, which runs python setup.py py2app -A, I get this error when running the build:
Traceback (most recent call last):
File "/Users/leif/PycharmProjects/shoutout/src/dist/Shoutout!.app/Contents/Resources/__boot__.py", line 149, in <module>
_run()
File "/Users/leif/PycharmProjects/shoutout/src/dist/Shoutout!.app/Contents/Resources/__boot__.py", line 143, in _run
exec(compile(source, script, "exec"), globals(), globals())
File "/Users/leif/PycharmProjects/shoutout/src/main.py", line 2, in <module>
import AppDelegate
File "/Users/leif/PycharmProjects/shoutout/src/AppDelegate.py", line 11, in <module>
from WindowController import mainWindow
File "/Users/leif/PycharmProjects/shoutout/src/WindowController.py", line 14, in <module>
from sutils.tasks import backendTasks as tasks
File "/Users/leif/PycharmProjects/shoutout/src/sutils/tasks.py", line 17, in <module>
from config import ymlDir, scheduleDir, configDir, resourcesURL
ModuleNotFoundError: No module named 'config'
2022-11-11 15:45:00.625 Shoutout![60555:5326242] Launch error
2022-11-11 15:45:00.625 Shoutout![60555:5326242] Launch error
See the py2app website for debugging launch issues
It seems my tasks.py, which has some classmethods in it for background tasks, in my utilities folder can't be imported once it's compiled. It can't seem to access config.py once it's built, for some reason or another.
My setup.py includes config.py and the rest of the utilities folder's files (and files beyond that directory) in a list in the data_files variable within the setup() call. I have also tried dropping the sutils/ prefix, using just 'config.py', 'tasks.py' and so on instead of 'sutils/config.py', 'sutils/tasks.py', but this didn't work either.
I am stumped here. Is there a way to get around this, or a clear reason I am missing right under my nose somehow?
Thanks for any input or comments!
Again, here is the specific commit of my src/ as of this post that I want to solve:
https://github.com/leifadev/shoutout/tree/398846a045bbb0c17bbd6905a578a2907349d9fa/src

Absolute/relative import in Python: ModuleNotFoundError and more

This is my project structure:
- config
- data
- src
  - resources
  - db
- test
N.B.: I am using Python 3.9, and every folder that contains a .py file also has an __init__.py file.
All the scripts I want to run are located in the /src folder, and they use code from other scripts placed in the /src/resources folder (which is basically acting like a library).
Some of these scripts also read YAML files from the /config folder.
Here is the problem: I cannot find a way to properly run these scripts from the command line; I always get errors like:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/runpy.py", line 185, in _run_module_as_main
mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
File "/usr/local/lib/python3.8/runpy.py", line 111, in _get_module_details
__import__(pkg_name)
File "/home/pi/crypto/src/ethMessage.py", line 4, in <module>
import update_db
File "/home/pi/crypto/src/update_db.py", line 1, in <module>
from db.mysql_main import insertValueAndFee
File "/home/pi/crypto/src/db/mysql_main.py", line 6, in <module>
from src.resources.parser import read_yaml
ModuleNotFoundError: No module named 'src'
I tried both relative and absolute imports; right now absolute imports are what I am using (e.g. from src.resources.parser import read_yaml).
What is the proper way to run these scripts from the command line?
EDIT:
As you suggested, I added
sys.path.append( os.path.abspath(os.path.dirname(__file__)+'/..') )
to all the main scripts, and I am still getting a similar error:
Traceback (most recent call last):
File "src/ethMessage.py", line 6, in <module>
import update_db
File "/home/pi/crypto/src/update_db.py", line 1, in <module>
from db.mysql_main import insertValueAndFee
File "/home/pi/crypto/src/db/mysql_main.py", line 6, in <module>
from src.resources.parser import read_yaml
ModuleNotFoundError: No module named 'src'
To clarify, I am running my script from the global folder, which in my case is named "crypto".
I am also open to change the project structure with one that doesn't create problems.
If you want to refer to all of those packages by their root name, then all you have to do is add the project root folder to the Python path. So, for main program scripts in src, just add something like this:
import os
import sys
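# put the script's parent directory (the project root) on the import path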
sys.path.append( os.path.abspath(os.path.dirname(__file__)+'/..') )
Now the parent directory of your script will be on the path, no matter where you run it from, and you can say
from src.resources.parser import read_yaml
If someone is still looking for a solution, I highly recommend not bothering with Python's imports: they are probably the worst part of the whole language.
Instead, if you want to use some files as a library, you should use setuptools to create a package from those files.
Then you can install it locally, or publish it on PyPI.
This way, you can import your library in a script just like any other third-party module (e.g. requests, selenium, ...), and things will work, instead of giving you a headache because a file is in one directory instead of another.
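As a minimal sketch, a setup.py for the /src/resources helpers could look like this (the package name and layout here are assumptions; adjust to the real project):
from setuptools import setup, find_packages

setup(
    name="resources",
    version="0.1.0",
    # pick up the packages that live under src/
    packages=find_packages(where="src"),
    package_dir={"": "src"},
)
After a local editable install (pip install -e . from the project root), an import like from resources.parser import read_yaml works regardless of the directory you run from.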

Use pyarrow in Glue pythonshell - ModuleNotFoundError: No module named 'pyarrow.lib'

I created an egg and a whl file of pyarrow and put them on S3, to call them in a Python shell job. I received the message below.
Job code:
import pyarrow
raise
The error (the same occurs with the whl):
Traceback (most recent call last):
File "/tmp/runscript.py", line 118, in <module>
runpy.run_path(temp_file_path, run_name='__main__')
File "/usr/local/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/local/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/local/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/tmp/glue-python-scripts-e67xuz2j/genos.py", line 1, in <module>
File "/glue/lib/installation/kanna-0.1-py3.6.egg/pyarrow/__init__.py", line 49, in <module>
from pyarrow.lib import cpu_count, set_cpu_count
ModuleNotFoundError: No module named 'pyarrow.lib'
P.S.: I cannot find lib.py or a lib folder in the local files.
I was having the same problem with AWS Lambda and came across this question.
For Glue, the AWS docs state only pure Python libraries can be used.
For Lambda:
The underlying problem is that modules like pyarrow port their code from C/C++. When you check the pyarrow codebase, you will find that two pyarrow.lib files do in fact exist, but they have .pyx and .pxd file extensions. This is not pure Python code and therefore depends on the underlying CPU architecture.
I had to manually download the .whl files for my required version of pyarrow and its dependency numpy. From http://pypi.org/project/pyarrow/, click on Download files and search for your matching version: cp39 means CPython 3.9, and x86_64 represents the CPU architecture. Follow the same steps for numpy. I ended up downloading these files: pyarrow-8.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl and numpy-1.22.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
You then have to unzip them and create an archive where both sit together in a folder named python. This folder can be used to create a layer in Lambda. Attach this layer to your function, and import pyarrow should work.
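Roughly, the packaging steps might look like this (the wheel names match the versions above; the zip name is arbitrary):
$ mkdir python
$ unzip pyarrow-8.0.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python
$ unzip numpy-1.22.4-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl -d python
$ zip -r pyarrow-layer.zip python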
The other solution is to use custom Docker images. This worked for me as well. I believe the AWS docs are exhaustive on that topic. I have written a PoC and all the steps that I followed here.
I followed this guide for creating a pyarrow layer.
pyarrow won't work as-is with Glue, as it needs C support and Glue doesn't provide it.
What you could do is try installing the library on a local machine and creating a package manually, then using that egg file.
That worked for my colleague; I haven't tested it personally.

Python app created via PyInstaller does not run

I made a simple, one-file script that I would like to share with the end-user. I found that PyInstaller does the job, so I refactored my project structure accordingly.
I have a project dir. Inside that, I have the package, which has an empty __init__.py and a __main__.py with the actual script, which imports a few libraries like opencv-python, numpy, etc. Outside of the package, I have a setup.py and an entry-point script that imports the main function from __main__.py and calls it.
Then I created the executable against this entry point with PyInstaller in --onefile mode. When I use the created executable on my machine it does the job perfectly, but when I send it to the end-user it ends up with an error (see below). I am not quite sure what this error means, but I see paths in it to my dev environment, which should not be there on other machines. It looks like it is missing dependencies, but I thought that PyInstaller bundles these dependencies into the executable.
What am I missing here? I have read many related questions here on Stack Overflow, but I couldn't find a solution.
I developed this script using PyCharm on the latest macOS, within a venv created by PyCharm. The venv folder is in the project dir - I don't know whether that could be a problem.
Traceback (most recent call last):
  File "PyInstaller/loader/rthooks/pyi_rth_pkgres.py", line 11, in <module>
  File "/Users/hordon/Documents/DEV/projects/scan_detect/venv/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
  File "setuptools-40.8.0-py3.7.egg/pkg_resources/__init__.py", line 33, in <module>
  File "/Users/hordon/Documents/DEV/projects/scan_detect/venv/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
  File "platform.py", line 116, in <module>
  File "/Users/hordon/Documents/DEV/projects/scan_detect/venv/lib/python3.7/site-packages/PyInstaller/loader/pyimod03_importers.py", line 627, in exec_module
  File "subprocess.py", line 153, in <module>
ImportError: dlopen(/var/folders/wk/cwx1b16j50z5_yt1ynq82hr00000gn/T/_MEI7eUUkV/select.cpython-37m-darwin.so, 2): Symbol not found: ____chkstk_darwin
  Referenced from: /var/folders/wk/cwx1b16j50z5_yt1ynq82hr00000gn/T/_MEI7eUUkV/select.cpython-37m-darwin.so (which was built for Mac OS X 10.15)
  Expected in: /usr/lib/libSystem.B.dylib
in /var/folders/wk/cwx1b16j50z5_yt1ynq82hr00000gn/T/_MEI7eUUkV/select.cpython-37m-darwin.so
[15314] Failed to execute script pyi_rth_pkgres
I had the same problem, and sadly it's a problem of macOS. So, if you want to support different versions of macOS, you need to build your app on the oldest version of macOS you want to support.
"In Mac OS X, components from one version of the OS are usually compatible with later versions, but they may not work with earlier versions.
The only way to be certain your app supports an older version of Mac OS X is to run PyInstaller in the oldest version of the OS you need to support."
https://pyinstaller.readthedocs.io/en/stable/usage.html#making-mac-os-x-apps-forward-compatible

Import error on first-party library with dev_appserver.py

On Ubuntu 16.04, I am suddenly getting import errors from the local GAE development server.
The local dev server starts up, including the admin interface, but the app no longer loads.
Native python imports of the same library on same machine (in this case "from google.cloud import datastore") work fine.
The GAE standard app does run when deployed, but development just got a little challenging.
google-cloud is version 0.27.0
gcloud components is 172.0.1
python is Anaconda 2.7.13
GAE is standard
I have confirmed to the best of my middling abilities that $PATH is correct for all named libraries.
I have removed and re-added all the named libraries to no effect.
cachetools (2.0.1), it should probably be noted, is installed as a dependency of the Google Cloud libraries, so I don't think this is addressable through requirements.txt or "libraries" in app.yaml.
I did recently go through a cycle of removing and adding libraries to fix a problem with apache_beam 2.0.1, so I may have jacked up something else, but am not sure where to look.
Suggestions deeply appreciated. Full traceback (from admin, same as from app):
Traceback (most recent call last):
File "/usr/lib/google-cloud-sdk/platform/google_appengine/google/appengine/tools/devappserver2/python/runtime/request_handler.py", line 232, in handle_interactive_request
exec(compiled_code, self._command_globals)
File "<string>", line 1, in <module>
File "/home/brian/anaconda3/lib/python2.7/site-packages/google/cloud/datastore/__init__.py", line 61, in <module>
from google.cloud.datastore.client import Client
File "/home/brian/anaconda3/lib/python2.7/site-packages/google/cloud/datastore/client.py", line 23, in <module>
from google.cloud.client import ClientWithProject
File "/home/brian/anaconda3/lib/python2.7/site-packages/google/cloud/client.py", line 27, in <module>
from google.oauth2 import service_account
File "/home/brian/anaconda3/lib/python2.7/site-packages/google/oauth2/service_account.py", line 79, in <module>
from google.auth import jwt
File "/home/brian/anaconda3/lib/python2.7/site-packages/google/auth/jwt.py", line 49, in <module>
import cachetools
ImportError: No module named cachetools
The stacktrace indicates you're running the libraries from the local system installation (the site-packages dir), not from your app.
For standard env GAE apps you need to install the dependencies inside your app and they will be uploaded to GAE together with your app code.
More specifically, you need to use the -t <your_app_lib_dir> option for the pip installation. From Installing a third-party library:
Use pip (version 6 or later) with the -t <directory> flag to copy the libraries into the folder you created in the previous step. For example:
pip install -t lib/ <library_name>
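For the first-generation Python 2.7 standard environment, the usual companion step is an appengine_config.py at the app root so the runtime picks up the vendored folder:
from google.appengine.ext import vendor

# add the app-local lib/ directory to the import path
vendor.add('lib')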
I addressed my problem through the requirements.txt file in the app root directory.
I had: google-cloud==0.22.0
and changed it to: google-cloud==0.27.0
which fixed it.
