Azure Databricks PySpark custom UDF ModuleNotFoundError: No module named

I was checking this SO question, but none of the solutions helped: PySpark custom UDF ModuleNotFoundError: No module named
I have the current repo on azure databricks:
|-run_pipeline.py
|-__init__.py
|-data_science
|--__init__.py
|--text_cleaning
|---text_cleaning.py
|---__init__.py
On the run_pipeline notebook I have this:
import os
import sys

# Extend sys.path before importing the package
path = os.path.join(os.path.dirname(__file__), os.pardir)
sys.path.append(path)

from pyspark.sql import SparkSession
from data_science.text_cleaning import text_cleaning

spark = SparkSession.builder.master(
    "local[*]").appName('workflow').getOrCreate()
df = text_cleaning.basic_clean(spark_df)
On text_cleaning.py I have a function called basic_clean that runs something like this:
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def basic_clean(df):
    print('Removing links')
    # _remove_links is a plain Python function defined elsewhere in this module
    udf_remove_links = udf(_remove_links, StringType())
    df = df.withColumn("cleaned_message", udf_remove_links("cleaned_message"))
    return df
When I do df.show() on the run_pipeline notebook, I get this error message:
Exception has occurred: PythonException (note: full exception trace is shown but execution is paused at: <module>)
An exception was thrown from a UDF: 'pyspark.serializers.SerializationError: Caused by Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 165, in _read_with_length
return self.loads(obj)
File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'data_science''. Full traceback below:
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/serializers.py", line 165, in _read_with_length
return self.loads(obj)
File "/databricks/spark/python/pyspark/serializers.py", line 466, in loads
return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'data_science'
Shouldn't the imports work? Why is this an issue?

It seems the data_science module is missing on the cluster. Consider installing it on the cluster.
Please check the link below about installing libraries on a cluster:
https://learn.microsoft.com/en-us/azure/databricks/libraries/cluster-libraries
You can run the pip list command to see which libraries are installed on the cluster. You can also run a pip install data_science command directly in a notebook cell.
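If data_science is your own package in this repo rather than something published on PyPI, a related option (a minimal sketch on my part, not part of the answer above; the /tmp path and the assumption that the repo root is the working directory are mine) is to zip the package and ship it to the executors, so the pickled UDF can import it on the workers:
import shutil

# Zip the local data_science package (assumed to sit in the current working
# directory) and distribute it to every executor.
shutil.make_archive("/tmp/data_science", "zip", ".", "data_science")
spark.sparkContext.addPyFile("/tmp/data_science.zip")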

I've been facing the same issue running PySpark tests with UDFs in Azure DevOps. I've noticed that this happens when running from the pool with vmImage: ubuntu-latest. When I use a custom container built from the following Dockerfile, the tests run fine:
FROM python:3.8.3-slim-buster AS py3
FROM openjdk:8-slim-buster

ENV PYSPARK_VER=3.3.0
ENV DELTASPARK_VER=2.1.0

# Overlay the Python 3.8 installation from the first stage onto the JDK 8 image
COPY --from=py3 / /

WORKDIR /setup
COPY requirements.txt .
RUN python -m pip install --upgrade pip
RUN pip install --no-cache-dir -r requirements.txt && \
    rm requirements.txt
WORKDIR /code
requirements.txt contains pyspark==3.3.0 and delta-spark==2.1.0.
This led me to conclude that the problem is due to how Spark runs on the default Ubuntu VM, which runs Python 3.10.6 and Java 11 (at the time of posting). I've tried setting environment variables such as PYSPARK_PYTHON to force PySpark to use the same Python binary on which the package under test is installed, but to no avail.
Maybe you can use this information to find a way to make it work on the default agent pool's Ubuntu VM; otherwise, I recommend just using a pre-configured container like I did.
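For reference, the kind of interpreter pinning described above looks roughly like this (a sketch only; as noted, it did not help in this case):
import os
import sys

# Point both the driver and the workers at the same interpreter (the one
# that has the package under test installed) before creating the session.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable

from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local[*]").appName("tests").getOrCreate()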

Related

AWS CDK - ImportError: cannot import name 'AssetManifestOptions' from 'aws_cdk.cloud_assembly_schema'

When trying to synthesize my CDK app, I receive the following error:
Traceback (most recent call last):
File "C:\Users\myusername\PycharmProjects\rbds-cdk_testing\app.py", line 2, in <module>
from aws_cdk.core import App, Environment
File "C:\Users\myusername\PycharmProjects\rbds-cdk_testing\.venv\lib\site-packages\aws_cdk\__init__.py", line 1260, in <module>
from .cloud_assembly_schema import (
ImportError: cannot import name 'AssetManifestOptions' from 'aws_cdk.cloud_assembly_schema' (C:\Users\myusername\PycharmProjects\rbds-cdk_testing\.venv\lib\site-packages\aws_cdk\cloud_assembly_schema\__init__.py)
I am using Node version 18.0.0. Here are the steps I've taken in creating my CDK app (starting from C:\Users\myusername\):
installed nvm
installed npm
nvm use 18.0.0
npm install -g yarn
npm install -g aws-cdk
cdk bootstrap aws://account-number/region
cd .\PyCharmProjects\mycdkapp
cdk init app --language python
.venv\Scripts\activate.bat
python -m pip install aws-cdk.aws-glue
python -m pip install aws-cdk
I error out even when executing cdk ls, as the runtime tries to run app.py, which contains:
import yaml
from aws_cdk.core import App, Environment
from pipeline import PipelineCDKStack
In checking whether the __init__.py file for aws_cdk contains AssetManifestOptions, I've discovered it is completely missing.
Am I missing something here, or is this a unique bug I am experiencing? Any help much appreciated; I am banging my head on this one.
It's the same here; I think the issue may be a wrong package version.
cloud-assembly-schema==2.50.0 contains AssetManifestOptions.
Can you please paste here the output of
pip list -v | grep aws
I am able to install 2.50.0; however, it depends on other packages of the same version (see attachment).
And I can't set up the core package because there is no matching CDK v2 distribution at the moment.
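One thing worth checking along these lines (my assumption, not confirmed by the poster): from aws_cdk.core import ... is the CDK v1 layout, while in CDK v2 the core names are re-exported from the top-level aws_cdk module of the aws-cdk-lib package. A minimal v2 sketch:
# CDK v2: after `pip install aws-cdk-lib`, core classes come from the
# top-level aws_cdk module rather than aws_cdk.core.
from aws_cdk import App, Environment

app = App()
env = Environment(account="123456789012", region="us-east-1")  # hypothetical values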

Unable to run brownie from a Python script and can't run the console (Windows, VSCode)

If I try to import brownie in a Python script, I get the following:
Traceback (most recent call last):
File "*filepath*", line 3, in <module>
from brownie import *
ModuleNotFoundError: No module named 'brownie'
If I try to run 'brownie console' I get the following:
INFO: Could not find files for the given pattern(s).
Brownie v1.19.0 - Python development framework for Ethereum
No project was loaded.
node:internal/modules/cjs/loader:936
throw err;
^
Error: Cannot find module 'C:\Users\user\AppData\Roaming\npm\node_modules\ganache\dist\node\cli.js'
    at Function.Module._resolveFilename (node:internal/modules/cjs/loader:933:15)
    at Function.Module._load (node:internal/modules/cjs/loader:778:27)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47 {
  code: 'MODULE_NOT_FOUND',
  requireStack: []
}
File "C:\Users\user\.local\pipx\venvs\eth-brownie\lib\site-packages\brownie\_cli\__main__.py", line 64, in main
importlib.import_module(f"brownie._cli.{cmd}").main()
File "C:\Users\user\.local\pipx\venvs\eth-brownie\lib\site-packages\brownie\_cli\console.py", line 58, in main
network.connect(CONFIG.argv["network"])
File "C:\Users\user\.local\pipx\venvs\eth-brownie\lib\site-packages\brownie\network\main.py", line 50, in connect
rpc.launch(active["cmd"], **active["cmd_settings"])
File "C:\Users\user\.local\pipx\venvs\eth-brownie\lib\site-packages\brownie\network\rpc\__init__.py", line 76, in launch
self.process = self.backend.launch(cmd, **kwargs)
File "C:\Users\user\.local\pipx\venvs\eth-brownie\lib\site-packages\brownie\network\rpc\ganache.py", line 70, in launch
ganache_version = get_ganache_version(cmd_list[0])
File "C:\Users\user\.local\pipx\venvs\eth-brownie\lib\site-packages\brownie\network\rpc\ganache.py", line 115, in get_ganache_version
raise ValueError("could not read ganache version: {}".format(ganache_version_stdout))
ValueError: could not read ganache version: b''
What could be causing this? I installed brownie using pipx, and I was able to run brownie console just yesterday. When I opened VSCode this morning, it told me to install Python extensions, and that's what I did; perhaps that messed something up? Any help would be appreciated.
Open a terminal from the upper menu bar (Terminal > New Terminal) and type:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install eth-brownie
Check: https://github.com/eth-brownie/brownie
Then restart VSCode and you should be able to import it. Also check for typos while importing and installing brownie.
If my reply helped you, it would be great if you could upvote.
Pipx seems to be the source of the issue, so I went ahead and installed using regular pip instead, and it worked.
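For reference, the plain-pip route described above looks like this (a sketch, assuming an activated virtual environment):
python -m pip install eth-brownie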

Python package not installing on the first attempt but installing on the second attempt

Okay, so I am working on a Python 2.7-based package called starkit. I am installing it using the following commands:
curl -O https://raw.githubusercontent.com/starkit/starkit/master/starkit_env27.yml
# create env
conda env create --file starkit_env27.yml -n starkit
# activate
source activate starkit
# get starkit
git clone https://github.com/starkit/starkit
cd starkit
# install
python setup.py install
When I run python setup.py install, I get the following error:
Traceback (most recent call last):
File "setup.py", line 65, in <module>
get_debug_option(PACKAGENAME))
File "/Users/97amarnathk/Documents/starkit/astropy_helpers/astropy_helpers/setup_helpers.py", line 125, in get_debug_option
if any(cmd in dist.commands for cmd in ['build', 'build_ext']):
File "/Users/97amarnathk/Documents/starkit/astropy_helpers/astropy_helpers/setup_helpers.py", line 125, in <genexpr>
if any(cmd in dist.commands for cmd in ['build', 'build_ext']):
AttributeError: Distribution instance has no attribute 'commands'
But when I run python setup.py install again, it works perfectly. Why does the package fail to install on the first attempt but succeed on the second?
This happens irrespective of the underlying computer: when I clone this repository somewhere else, the same error occurs on the first try but not on the second. Why?

flask.cli.NoAppException: Could not import "flaskr.flaskr"

I'm working through: http://flask.pocoo.org/docs/1.0/tutorial/
I've written __init__.py (code here: http://codepad.org/4FGIE901) in a /flaskr/ directory, set up a virtual environment called 'venv', and installed Flask.
I then ran these commands on the command line, in the flaskr directory, as 'Run the application' advises:
export FLASK_APP=flaskr
export FLASK_ENV=development
flask run
What I should see is Hello, World!
Instead, I'm presented with the following errors:
Traceback (most recent call last):
File "/Users/David/Desktop/flaskr/venv/lib/python3.6/site-packages/flask/cli.py", line 330, in __call__
rv = self._load_unlocked()
File "/Users/David/Desktop/flaskr/venv/lib/python3.6/site-packages/flask/cli.py", line 317, in _load_unlocked
self._app = rv = self.loader()
File "/Users/David/Desktop/flaskr/venv/lib/python3.6/site-packages/flask/cli.py", line 372, in load_app
app = locate_app(self, import_name, name)
File "/Users/David/Desktop/flaskr/venv/lib/python3.6/site-packages/flask/cli.py", line 246, in locate_app
'Could not import "{name}".'.format(name=module_name)
flask.cli.NoAppException: Could not import "flaskr.flaskr".
Simply put, I'm not sure how I should respond to or go about fixing an error like this. Perhaps I have a mismatch between what I have installed in the venv and what this particular project requires, like this person: Could not Import Pandas: TypeError
Flask:
/Users/David/Desktop/flaskr/venv/bin/Flask
Version: 1.0.2
Pip:
from /Users/David/Desktop/flaskr/venv/lib/python3.6/site-packages (python 3.6)
Version: 9.0.1
Python:
/Users/David/Desktop/flaskr/venv/bin/python
Version: 3.6.0
I think you are in the wrong folder. You probably did:
cd flask_tutorial/flaskr
You need to go up to the tutorial folder:
cd ..
You should run flask run in the flask_tutorial folder rather than flask_tutorial/flaskr, because you want to import flaskr from that folder, not flaskr/flaskr (which doesn't exist).
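For reference, the layout this answer assumes looks roughly like this (a sketch of the tutorial structure):
|-flask_tutorial
|--flaskr
|---__init__.py
|--venv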
For those running Flask for the first time, this works well on Windows.
Find the location of your folder using cmd (mine is C:\users\hp\Desktop), then:
C:\users\hp\Desktop> set FLASK_APP=hello.py   (your file name)
C:\users\hp\Desktop> set FLASK_ENV=development
C:\users\hp\Desktop> flask run
I just installed the latest versions of Flask and Werkzeug and everything was fixed:
python -m pip install Flask==2.0.1
python -m pip install Werkzeug==2.0.1
Notes:
I did not use __init__.py
I run the Flask app as:
set FLASK_ENV=development
set FLASK_APP=app.py
flask run --port 5000

Ansible commands work as root but not as the default user: throws a Python PyYAML error as the default user, but works for root

I recently ran some pip commands on my server, and after doing so none of the Ansible commands would work as the default user. If I try to run any Ansible command as the default user, I get the following error:
Traceback (most recent call last):
File "/usr/bin/ansible-playbook", line 4, in <module>
from pkg_resources import require; require('ansible==2.2.0')
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 2659, in <module>
parse_requirements(__requires__), Environment()
File "/usr/lib/python2.6/site-packages/pkg_resources.py", line 546, in resolve
raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: PyYAML
However, if I sudo and then try to run any Ansible commands, they seem to work fine. I think I might have messed something up with the Python packages that were installed with pip, and now they only work for root.
How can I get the default user to be able to run these commands again?
Fixed it. The issue was with permissions when I ran the pip install. Running
chmod -R ugo+rX /usr/lib64/python2.6/site-packages/ and
chmod -R ugo+rX /usr/lib/python2.6/site-packages/ fixed the issue.
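A quick way to confirm it is a permissions problem before applying the fix (a sketch; the user name is hypothetical) is to compare what the two users can read:
sudo -u defaultuser python -c "import yaml"
ls -ld /usr/lib/python2.6/site-packages/yaml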
