Python Deployment Package with sklearn, pandas and numpy issue?

I am a newbie to AWS and Python, and I am trying to implement a simple ML recommendation system using an AWS Lambda function for self-learning. I am stuck on packaging the combination of sklearn, numpy and pandas. Any two libraries combined (pandas and numpy, or numpy and scipy) work fine and deploy perfectly. But because this is an ML system I need sklearn (which pulls in scipy, pandas and numpy), and that combination does not work; I get the error below when testing on AWS Lambda.
What I have done so far:
I built my deployment package from within a python3.6 virtualenv rather than directly on the host machine (python3.6, virtualenv and awscli are already installed/configured, and my Lambda function code is in the ~/lambda_code directory):
cd ~ (We'll build the virtualenv in the home directory)
virtualenv venv --python=python3.6 (Create the virtual environment)
source venv/bin/activate (Activate the virtual environment)
pip install sklearn pandas numpy (pip takes space-separated package names, not a comma-separated list)
cp -r ~/venv/lib/python3.6/site-packages/* ~/lambda_code (Copy all installed packages into root level of lambda_code directory. This will include a few unnecessary files, but you can remove those yourself if needed)
cd ~/lambda_code
zip -r9 ~/package.zip . (Zip up the lambda package)
aws lambda update-function-code --function-name my_lambda_function --zip-file fileb://~/package.zip (Upload to AWS)
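For reference, "Unable to import module 'index'" also fires when the handler file or function name does not match the Lambda configuration. A minimal handler sketch (hypothetical example: it assumes the handler is configured as index.handler and the file is named index.py; the scoring logic is illustrative, not the asker's actual model):

```python
# index.py -- minimal AWS Lambda handler sketch (hypothetical example).
# The Lambda console handler setting must be "index.handler" for this file.
# Note: if a module-level import of a packaged library fails, Lambda reports
# exactly the "Unable to import module 'index'" error from the question.
import json

def handler(event, context):
    # Toy "recommendation": return the index of the highest score.
    scores = event.get("scores", [])
    if not scores:
        return {"statusCode": 400, "body": json.dumps({"error": "no scores"})}
    best = max(range(len(scores)), key=lambda i: scores[i])
    return {"statusCode": 200, "body": json.dumps({"recommended_index": best})}
```

Testing this locally (e.g. `handler({"scores": [0.2, 0.5, 0.1]}, None)`) before zipping catches naming and import problems early, without a round trip through `update-function-code`.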
after that getting this error:
**"errorMessage": "Unable to import module 'index'"**
and
START RequestId: 0e9be841-2816-11e8-a8ab-636c0eb502bf Version: $LATEST
Unable to import module 'index': **Missing required dependencies ['numpy']**
END RequestId: 0e9be841-2816-11e8-a8ab-636c0eb502bf
REPORT RequestId: 0e9be841-2816-11e8-a8ab-636c0eb502bf Duration: 0.90 ms Billed Duration: 100 ms Memory Size: 128 MB Max Memory Used: 33 MB
I have tried this on an EC2 instance as well, without success. I googled and read multiple blogs and solutions, but nothing worked.
Please help me out with this.

You are using Python 3.6, so
pip3 install numpy
should be used. Give it a try.

You need to make sure all the dependent libraries AND the Python file containing your function are in one zip file in order for Lambda to detect the correct dependencies.
So essentially, you need to have numpy, pandas and your own files all in one zip file before you upload it. Also make sure that your code refers to the local files (in the same unzipped directory) as dependencies. If you have done that already, the issue is probably how your included libraries get referenced. Make sure you are able to use the included libraries as dependencies by getting the correct relative path on AWS once the package is deployed to Lambda.
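One way to sanity-check the layout before uploading (a sketch; `package.zip`, `index.py` and the library names are taken from the question) is to list the top-level entries of the archive and confirm that both the handler file and the library directories sit at the root, not nested under a `site-packages/` folder:

```python
import zipfile

def top_level_entries(zip_path):
    """Return the unique top-level names inside a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/")[0] for name in zf.namelist()})

# Usage (hypothetical paths):
# entries = top_level_entries("package.zip")
# assert "index.py" in entries and "numpy" in entries
```

If `numpy` only appears as `site-packages/numpy`, Lambda cannot import it and you get exactly the "Missing required dependencies" error above.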

So, as Wai kin chung said, you need to use pip3 to install the libraries.
To figure out which Python version is the default, you can type:
which python
or
python --version (note: lowercase -v turns on verbose mode; use --version or -V)
To install with python3 you need to type:
python3 -m pip install sklearn pandas numpy --user
Once that is done, you can make sure that the packages are installed with:
python3 -m pip freeze
This will list all the Python libraries installed for that interpreter.
Once you have the libraries, continue with your regular steps. Of course, you should first delete everything you previously placed in ~/venv/lib/python3.6/site-packages/*.
cd ~/lambda_code
zip -r9 ~/package.zip . (note the trailing dot: it tells zip to archive the current directory)
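A quick way to confirm which interpreter `pip3` actually targets, and where it puts packages (so the files you copy match the Lambda runtime version), is a small diagnostic run with that interpreter. This is a generic stdlib sketch, not specific to the asker's machine:

```python
import sys
import sysconfig

# Show which interpreter is running and where "python3 -m pip install"
# for this interpreter places pure-Python packages. Run it as
# "python3 this_file.py" to check the python3/pip3 pairing.
print("interpreter:   ", sys.executable)
print("version:       ", "%d.%d" % sys.version_info[:2])
print("site-packages: ", sysconfig.get_paths()["purelib"])
```

If the printed version is not 3.6 (the Lambda runtime in this question), the compiled parts of numpy/scipy copied from that site-packages directory will not import on Lambda.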

If you're running this on Windows (like I was), you'll run into an issue with the libraries being compiled on an incompatible OS.
You can use an Amazon Linux EC2 instance, or a Cloud9 development instance to build your virtualenv as detailed above.
Or you could just download the pre-compiled wheel files, as discussed in this post:
https://aws.amazon.com/premiumsupport/knowledge-center/lambda-python-package-compatible/
Essentially, you need to go to the project page on https://pypi.org and download the files named like the following:
For Python 2.7: module-name-version-cp27-cp27mu-manylinux1_x86_64.whl
For Python 3.6: module-name-version-cp36-cp36m-manylinux1_x86_64.whl
Then unzip the .whl files to your project directory and re-zip the contents together with your lambda code.
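The wheel filename convention above can be parsed mechanically to check compatibility before repackaging. A small sketch (the filenames and the `is_lambda_compatible` helper are illustrative, not part of any library):

```python
def wheel_tags(filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    stem = filename[:-len(".whl")]
    # The last three dash-separated fields are always the compatibility tags,
    # e.g. numpy-1.14.2-cp36-cp36m-manylinux1_x86_64.whl -> cp36, cp36m, ...
    return tuple(stem.split("-")[-3:])

def is_lambda_compatible(filename, python_tag="cp36"):
    """Heuristic: Lambda's Python 3.6 runtime needs cp36 manylinux wheels."""
    py, _abi, plat = wheel_tags(filename)
    return py == python_tag and plat.startswith("manylinux")
```

For example, `is_lambda_compatible("numpy-1.14.2-cp36-cp36m-win_amd64.whl")` is False: a Windows wheel is precisely the incompatible-OS trap described above.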

I was having a similar problem on Ubuntu 18.04.
I solved the issue by using python3.7 and pip3.7.
It's important to use pip3.7 when installing the packages, e.g. pip3.7 install numpy or pip3.7 install numpy --user.
To install python3.7 and pip3.7 on Ubuntu you can use the deadsnakes/ppa:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt-get update
sudo apt-get install python3.7
curl https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py
python3.7 /tmp/get-pip.py
This solution should also work on Ubuntu 16.04.

Related

How to pip freeze source package

I am learning how to use venv here: https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#installing-from-source
And it says I can install a source package by:
python3 -m pip install .
Which works, but now if I do pip freeze then I see:
my-package # file:///Users/joesmith/my-package
The problem is that if I export this to a requirements.txt and try to install the environment on another machine, it won't work, because the path to the source has obviously changed.
What is the proper way to use a source package locally like I did, but also export it afterwards so that another person can recreate the environment and run the code on another machine?
Pip has support for VCS such as git. You can push your code to a git host (e.g. GitHub, GitLab) and then reference it in requirements.txt like this:
git+http://git.example.com/MyProject#egg=MyProject
https://pip.pypa.io/en/stable/cli/pip_install/#vcs-support
Alternatively, you could publish the package to PyPI and install it from there rather than from source,
i.e. pip install requests
That way other developers can also easily run your project.

Airbnb Airflow: Installing Airflow without pip

Is there a way to install Airflow without pip?
I am trying to install Airflow on an offline computer that does not have pip. I have downloaded the packages from the internet, but I am not sure how to run the installation without pip.
Does anybody know how to run an installation with 'setup.py'?
#!/usr/bin/env bash
python setup.py build
python setup.py install
If you get problems like missing setuptools, you can install it (depending on your system it is usually a package; to be safe, also download and later install the python-devel (sometimes python-dev) package with your OS package manager). Alternatively, you could run Airflow via Docker: build the image online, then export it to the offline machine.
To install it without pip:
download the Airflow zip from the git repo.
unzip the contents and follow the instructions in the INSTALL file. (The steps inside create a virtualenv and then start the Airflow installation.)
These are the steps you will find in INSTALL file:
python -m venv my_env
source my_env/bin/activate
# [required] by default one of Apache Airflow's dependencies pulls in a GPL
# library. Airflow will not install (and upgrade) without an explicit choice.
#
# To make sure not to install the GPL dependency:
# export SLUGIFY_USES_TEXT_UNIDECODE=yes
# In case you do not mind:
# export AIRFLOW_GPL_UNIDECODE=yes
# [required] building and installing
# by pip (preferred)
pip install .
# or directly
python setup.py install
Alternatively, if you face conflicts while installing through pip:
create a virtualenv
source-activate the virtualenv
Follow the steps from the Airflow docs to install and configure it.
One way or the other you will have to use pip, as there may be a number of modules you have to download with it. Creating a virtualenv helps here.

'bz2 module is not available' when installing Pandas with pip in a Python virtual environment

I am going through the post Numpy, Scipy, and Pandas - Oh My!, installing some Python packages, but got stuck at the line for installing pandas:
pip install -e git+https://github.com/pydata/pandas#egg=pandas
I changed 'wesm' to 'pydata' for the latest version, and the only other difference from the post is that I'm using pythonbrew.
I found this post, related to the error, but where is the Makefile for bz2 mentioned in the answer? Is there another way to resolve this problem?
Any help would be much appreciated. Thanks.
You need to build python with BZIP2 support.
Install the following package before building python:
Red Hat/Fedora/CentOS: yum install bzip2-devel
Debian/Ubuntu: sudo apt-get install libbz2-dev
Extract the Python tarball, then:
./configure
make
make install
Install pip using the new python.
Alternative:
Install a binary Python distribution using yum or apt that was built with BZIP2 support.
See also: ImportError: No module named bz2 for Python 2.7.2
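After rebuilding, a quick round trip through the bz2 module confirms the new interpreter really was linked against libbz2 (run it with the freshly built python):

```python
import bz2  # raises ImportError if Python was built without libbz2 headers

# A compress/decompress round trip confirms the module is functional,
# not just importable.
data = b"pandas needs bz2 " * 100
compressed = bz2.compress(data)
assert bz2.decompress(compressed) == data
print("bz2 OK: %d -> %d bytes" % (len(data), len(compressed)))
```

If the import line itself fails, the build step did not pick up libbz2-dev; re-run ./configure and make after installing it.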
I spent a lot of time on the internet and found partial answers everywhere. Here is what you need to do to make it work. Follow every step.
sudo apt-get install libbz2-dev (thanks to Freek Wiekmeijer for this).
Now you also need to build Python with bz2 support; a previously installed Python won't work. For that, do the following:
Download a stable Python version from https://www.python.org/downloads/source/ and extract the gzipped source tarball. You can use wget https://python-tar-file-link.tgz to download it and tar -xvzf python-tar-file.tgz to extract it in the current directory.
Go inside the extracted folder, then run the following commands one at a time:
./configure
make
make install
This builds a Python binary with the bz2 support you installed earlier.
Since this Python doesn't have pip installed, the idea is to create a virtual environment with the above-built Python and then install pandas using the previously installed pip.
You will see the python binary in the same directory. Just create a virtual environment:
./python -m venv myenv (create myenv in the same directory or outside, it's your choice)
source myenv/bin/activate (activate virtual environment)
pip install pandas (install pandas in the current environment)
That's it. With this environment you should be able to use pandas without the error.
pyenv
I noticed that installing Python from source takes a long time (even on an i7 :/ ), especially the make and make test steps...
A simpler and shorter solution was to install another version of Python (I used Python 3.7.8) using pyenv, following its installation steps.
It not only solved the problem of running multiple Python instances on the same system, but also maintains my virtual environments without virtualenvwrapper (which turned buggy on my newly set up ubuntu-20.04).

Can not import snappy in python

I use the package named python-snappy. This package requires the snappy library, so I downloaded and installed snappy successfully with the following commands:
./configure
make
sudo make install
When I import snappy, I receive this error:
from _snappy import CompressError, CompressedLengthError, \
ImportError: libsnappy.so.1: cannot open shared object file: No such file or directory
I'm using Python 2.7, snappy, python-snappy and Ubuntu 12.04
How can I fix this problem?
Thanks
Traditionally you might have to run the ldconfig utility to update your /etc/ld.so.cache (or equivalent as appropriate to your OS). Sometimes it might be necessary to add new entries (paths) to your /etc/ld.so.conf.
Basically the shared object (so) loaders on many versions of Unix (and probably other Unix-like operating systems) use a cache to help resolve their base filenames into actual files to be loaded (usually mmap()'d). This is roughly similar to the intermittent need to run hash -r or rehash in your shell after adding things to directories in your PATH.
Usually you can just run ldconfig with no arguments (possibly after adding your new library's path to your /etc/ld.so.conf text file). Good Makefiles will do this for you during make install.
Here's a little bit more info: http://linux.101hacks.com/unix/ldconfig/
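To check from Python whether the loader can now resolve the library, the stdlib's ctypes.util.find_library consults the same loader machinery (the ld.so cache on Linux). A small sketch; "snappy" is the library name from the question:

```python
from ctypes.util import find_library

# find_library returns the soname string (e.g. "libsnappy.so.1") if the
# loader can resolve the library, or None if it cannot.
name = find_library("snappy")
if name is None:
    print("libsnappy not resolvable -- run ldconfig or check /etc/ld.so.conf")
else:
    print("loader resolves snappy as:", name)
```

Running this after `sudo make install` and again after `sudo ldconfig` shows directly whether updating the cache fixed the resolution.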
You can install python-snappy and libsnappy1 from the Ubuntu repos:
$ sudo apt-get install libsnappy1 python-snappy
You should not have to download anything.
The following worked for me:
$ conda install python-snappy
then in my code I used:
import snappy
Here, for example with Anaconda Python:
Download snappy from GitHub
also download the Python bindings
extract both
In the google-snappy folder:
$ ./configure
$ make
$ sudo make install
Then in python folder:
$ python setup.py build # here I get the same import _snappy error
$ python setup.py install # after this import works

associate python packages to a different version of python on ubuntu

I am using Ubuntu 11.04, which comes with the system-wide Python 2.6. I have now installed Python 2.7 in addition to 2.6.
The question is: if I want to install the latest versions of numpy, scipy, matplotlib, etc. and associate them with Python 2.7, what should I do to make sure they are not associated with Python 2.6?
Thanks.
J.
You have a few options. Which is best depends on what you want to use those libraries for. If you're doing development, virtualenv is a good idea:
$ virtualenv -p /usr/bin/python2.7 py27env && . py27env/bin/activate
py27env$ pip install numpy scipy matplotlib
Pull down the latest tarballs for numpy, scipy, and matplotlib. You can get numpy and scipy from here:
http://scipy.org/Download
Matplotlib can be found here:
http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-1.1.0/
Then open up a terminal and use Python 2.7 to install them using the setup.py scripts that come with the tarballs. For example, do the following for numpy (assuming you've already pulled down the latest tarball from SourceForge and it's sitting on your desktop):
$ mv Desktop/numpy-1.6.2.tar.gz /tmp/
$ cd /tmp/
$ tar -xvzf numpy-1.6.2.tar.gz
$ cd numpy-1.6.2
$ python2.7 setup.py install
That should do it. Tarballs for python code generally come with a setup.py script that will install things in the right place for the version of python you run it with.
Seems like this post answers your question:
Newbie hint on installing Python and it’s modules and packages
You install every Python separately, you install every module and
package separately in those Python install, and you use everything
explicitly.
