In Amazon EMR, I am using the following script as a custom bootstrap action to install Python packages. The script runs OK (I checked the logs; the packages installed successfully), but when I open a notebook in JupyterLab, I cannot import any of them. If I open a terminal in JupyterLab and run pip list or pip3 list, none of my packages are listed. Even if I go to / and run find . -name mleap, for instance, nothing turns up.
Something I have noticed is that on the master node, I keep getting an error saying bootstrap action 2 has failed (there is no second action, only one). According to this, it is a rare error, and I get it in all my clusters. However, the cluster eventually gets created and I can use it.
My script is called aws-emr-bootstrap-actions.sh
#!/bin/bash
sudo python3 -m pip install numpy scikit-learn pandas mleap sagemaker boto3
I suspect it might have something to do with a Docker image being deployed that invalidates my previous installs, but from my Google searches it seems common to use bootstrap actions to install Python packages, so this should work ...
The PySpark Python interpreter that Spark uses is different from the one into which the OP was installing the modules (as confirmed in the comments).
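As a quick way to confirm this on your own cluster (a sketch, nothing EMR-specific assumed), print which interpreter the notebook kernel is actually using and compare it with the python3 the bootstrap script installed into:

# Run in a notebook cell on the cluster.
import sys
print(sys.executable)  # the interpreter the kernel is really using
print(sys.path)        # where it will look for numpy, mleap, etc.

If that interpreter differs from the one the bootstrap script used, install the packages with that exact binary (e.g. sudo <that interpreter> -m pip install ...) on the relevant nodes.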
I'm trying to write to GCS bucket via Beam (and TF Transform). But I keep getting the following error:
ValueError: Unable to get the Filesystem for path [...]
The answer here and some other sources suggest that I need to pip install apache-beam[gcp] to get a variant of Apache Beam that works with GCP.
So, I tried changing the setup.py of my training package as:
REQUIRED_PACKAGES = ['apache_beam[gcp]==2.14.0', 'tensorflow-ranking', 'tensorflow_transform==0.14.0']
which didn't help. I also tried adding the following to the beginning of my code:
subprocess.check_call('pip uninstall apache-beam'.split())
subprocess.check_call('pip install apache-beam[gcp]'.split())
which didn't work either.
The logs of the failed GCP job are here. The traceback and the error message appear on row 276.
I should mention that running the same code using Beam's DirectRunner and writing the outputs to local disk runs fine. But I'm now trying to switch to DataflowRunner.
Thanks.
It turns out that you need to uninstall google-cloud-dataflow in addition to installing apache-beam with the gcp option. I guess this happens because google-cloud-dataflow is installed on GCP instances by default. Not sure if the same would be true on other platforms like AWS. But anyway, here are the commands I used:
pip uninstall -y google-cloud-dataflow
pip install apache-beam[gcp]
I noticed this in the very first cell of [this notebook](https://github.com/GoogleCloudPlatform/training-data-analyst/blob/master/courses/machine_learning/deepdive/10_recommend/wals_tft.ipynb).
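If you need to apply the fix programmatically, in the spirit of the subprocess calls attempted in the question, a sketch might look like this (note the -y flag, since pip uninstall otherwise waits for confirmation):

import subprocess

# Remove the preinstalled conflicting package, then install the GCP extra.
subprocess.check_call(["pip", "uninstall", "-y", "google-cloud-dataflow"])
subprocess.check_call(["pip", "install", "apache-beam[gcp]"])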
I found this script (tutorial) on GitHub (https://github.com/amyoshino/Dash_Tutorial_Series/blob/master/ex4.py) and I am trying to run it on my local machine.
Unfortunately, I am getting an error.
I would really appreciate it if anyone could help me run this script.
Perhaps this is something easy, but I am new to coding.
Thank you!
You probably just need to pip install the dash-core-components library!
Take a look at the Dash Installation documentation. It currently recommends running these commands:
pip install dash==0.38.0 # The core dash backend
pip install dash-html-components==0.13.5 # HTML components
pip install dash-core-components==0.43.1 # Supercharged components
pip install dash-table==3.5.0 # Interactive DataTable component (new!)
pip install dash-daq==0.1.0 # DAQ components (newly open-sourced!)
For more info on using pip to install Python packages, see: Installing Packages.
If you have run those commands and Flask still throws that error, you may have a path/environment issue, and should add more detail about your Python setup to your question.
Also, just to give you a sense of how to interpret this error message:
It's often easiest to start at the bottom and work your way up.
Here, the bottommost message is a FileNotFound error.
The program is looking for the file in your Python37/lib/site-packages folder. That tells you it's looking for a Python package, since that is the directory to which Python packages get installed when you use a tool like pip.
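If you want to double-check that pip and the interpreter running your script agree on that folder, a small sketch like this helps (nothing Dash-specific assumed):

import sys, site

print(sys.executable)          # the Python actually running your script
print(site.getsitepackages())  # the site-packages folders it searches
                               # (may be unavailable inside some virtualenvs)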
I am creating a Python web app in Google App Engine.
When I
sudo pip install
a third party library and then try to import it, I get the error 'ImportError: No module named x', where x is the name of that library. In my case, for example: boto, boto3, Flask, etc.
If I go into the shell in GAE and type python >> import X, the library can be used inside the Python environment. When deploying the app, though, or running the app in the virtual server in Google App Engine, I get the module import error.
I even tried methods like: python >> import sys >> sys.path.insert(0, "path_here")
export PYTHONPATH, pointing it at where those libraries are located
I even followed several Q&As here on Stack Overflow without any success. Can somebody please give me a proper way to fix the import error in Google App Engine?
FYI
I am not using any local environment on my PC; I am working directly through the GAE bash console and the Launch code editor in GAE, and I am running the command dev_appserver.py $PWD
When I do
pip freeze
I can see that the modules are currently installed and deployed in the GAE virtual environment. Is there a problem with my path? What's the best approach to make GAE load my already installed third-party libraries?
UPDATE:
Importing the library directly on the python shell from Google App Engine works just fine. Importing the library on my python app index.py file results in the error.
Python import directly from Shell
Python import to the index.py file
Though this is an old thread, adding this answer now:
Run the command: gcloud components list
This will show which components are installed in your environment and which are not.
Install app-engine-python components if not installed:
gcloud components install app-engine-python
gcloud components install app-engine-python-extras
If that doesn't work:
On Windows, uninstall the Google SDK, then download and install it again (check which Python version you need). Delete all the files the installer asks you to delete in the last step, and run the gcloud components commands again.
I'm trying to teach myself Python using Google's App Engine, and I can't get the dev server running. I get this error:
Traceback (most recent call last):
  File "/opt/google_appengine/google_appengine_1.2.7/dev_appserver.py", line 60, in <module>
    run_file(__file__, globals())
  File "/opt/google_appengine/google_appengine_1.2.7/dev_appserver.py", line 57, in run_file
    execfile(script_path, globals_)
  File "/opt/google_appengine/google_appengine_1.2.7/google/appengine/tools/dev_appserver_main.py", line 65, in <module>
    from google.appengine.tools import os_compat
ImportError: cannot import name os_compat
Ubuntu 9.10 comes with Python 2.6 (didn't work), and I installed Python 2.5 (didn't work). I have tried running it with python dev_appserver.py helloWorld (didn't work), as well as running dev_appserver.py after editing the first line to be:
#!/usr/bin/env python2.5
I can't seem to find anything online about this error. The only issue I've found relates to using Python 2.5, and I think I've solved that.
Kyle suggested I need to set my PYTHONPATH variable. After running
export PYTHONPATH=/opt/google_appengine/google_appengine_1.2.7
I still get the same error trying to run dev_appserver.py. Am I setting PYTHONPATH wrong? Alternatively, how do I uninstall the protocol buffers python project? I have no use for Ubuntu One and had already uninstalled it.
The problem appears to be the fact that Karmic Koala 9.10 (the latest version of Ubuntu) ships with Ubuntu One, a python app that depends on Google's protocol buffers library. The python-protobuf package provides the google.protobuf package in /usr/lib/pymodules/python2.6.
Unfortunately, the AppEngine SDK includes another package called google.appengine. So somewhere in your code, the google package is being imported, and the package that contains protobuf is being found on PYTHONPATH first. Python caches the first package it finds in sys.modules, so the second google package in the SDK will never be imported.
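A quick way to see which copy of the google package is winning (a sketch; the printed path depends on your machine):

# If this prints a path under /usr/lib/pymodules, the protobuf copy is
# shadowing the AppEngine SDK's google package.
import google
print(google.__path__)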
You could move the google AppEngine SDK up to the front of your PYTHONPATH. That should ensure that Python finds the google.appengine package instead of the package provided by python-protobuf.
PYTHONPATH=/opt/google_appengine/google_appengine_1.2.7 \
python dev_appserver.py helloWorld
This is a bug that should be reported to the AppEngine SDK project.
Update: I've submitted a bug against the AppEngine API.
It was a file permission problem. os_compat.py wasn't readable by my user, only by root. I'm not sure whether I caused this or whether the default permissions just don't include read-all, but that was the fix.
I hate to accept my own answer after Kyle gave such a good response, but I don't need the $PYTHONPATH fix to make it work now that I have run sudo chmod -R +r /opt/google_appengine/google_appengine_1.2.7
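To double-check the result, a quick sketch (path as in the question):

import os

# Should print True for a non-root user once the permissions are fixed.
print(os.access("/opt/google_appengine/google_appengine_1.2.7"
                "/google/appengine/tools/os_compat.py", os.R_OK))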
With that error, Python is saying that it can't find or read the name that it's trying to import. Since the import of os_compat is the very first executable line of AppEngine's dev_appserver.py, I suspect that there's a problem with the way that your paths are configured.
The latest version of Ubuntu (10.10) has also removed Python 2.5 - making it a pain to install the App Engine development environment.
I (finally) got my environment working (including using App Engine Helper for unit testing). I built this bash script which might be useful to others. It installs:
sqlite
libsqlite
pep8
mock
OpenSSL
Python 2.5.2
Python SSL Library
Django 1.1 (latest version in production)
App Engine
App Engine Helper
http://pageforest.googlecode.com/hg/tools/pfsetup
Ubuntu 11.04 comes with Python 2.6 as the default version. It is suggested to use Google App Engine with Python 2.5; I have, however, been using it for years with Python 2.6 without any issues.
To run it smoothly with Python 2.6, edit google/appengine/tools/dev_appserver.py and add these three lines
'_counter',
'_fastmath',
'strxor',
after 'XOR' and before '_Crypto_Cipher__AES', around line 1350.
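For orientation, the edited area ends up looking roughly like this (a sketch from memory; the list name and exact line number vary between SDK versions, so treat the identifiers as assumptions):

# Sketch of the C-module whitelist in google/appengine/tools/dev_appserver.py
# after the edit (the list name is an assumption based on older SDKs):
_WHITE_LIST_C_MODULES = [
    # ... existing entries ...
    'XOR',
    '_counter',   # added
    '_fastmath',  # added
    'strxor',     # added
    '_Crypto_Cipher__AES',
    # ... existing entries ...
]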
If you are now using the Google Cloud SDK, put this into ~/.profile.
export CLOUDSDK_ROOT_DIR="/path/to/google/cloud/sdk/"
export APPENGINE_HOME="${CLOUDSDK_ROOT_DIR}/platform/appengine-java-sdk"
export GAE_SDK_ROOT="${CLOUDSDK_ROOT_DIR}/platform/google_appengine"
# The next line enables Java libraries for Google Cloud SDK
export CLASSPATH="${APPENGINE_HOME}/lib":${CLASSPATH}
# The next line enables Python libraries for Google Cloud SDK
export PYTHONPATH=${GAE_SDK_ROOT}:${PYTHONPATH}
# * OPTIONAL STEP *
# If you wish to import all Python modules, you may iterate in the directory
# tree and import each module.
#
# * WARNING *
# Some modules have two or more versions available (Ex. django), so the loop
# will import always its latest version.
for module in ${GAE_SDK_ROOT}/lib/*; do
if [ -r ${module} ]; then
PYTHONPATH=${module}:${PYTHONPATH}
fi
done
unset module
Do not put this inside ~/.bashrc, because every time you open a bash session those directories will be appended to your PYTHONPATH environment variable again and again.
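After logging back in, so that ~/.profile is read, a quick sketch to confirm the SDK directories made it onto the path:

import sys

# The GAE_SDK_ROOT and lib/ directories exported above should show up here.
print([p for p in sys.path if "google" in p.lower()])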