Heroku unable to deploy Python Pandas app

Hi folks. I recently wrote my first Python Flask app, a simple data-wrangling app whose only extra dependency (apart from Flask itself) is the Pandas library.
It retrieves data from online CSV files, processes it, and returns a few numbers inside a text response. (I use no database; it takes the data from these online files and returns the result to the browser without storing anything.)
I built it within a conda environment to which I added only the Pandas and Flask libraries, and it ran perfectly on my localhost.
But now I'm having a hard time trying to deploy it to Heroku...
First I generated my Procfile and a requirements.txt file using pip:
pip freeze > requirements.txt
But when I try to deploy the app, Heroku returns a series of errors related to "mkl" dependencies. I'm not sure whether these are related to pandas, but here is what they are: mkl-service==2.3.0, mkl_fft==1.3.0 and mkl_random==1.1.1.
I tried changing the versions of the packages to the ones suggested by Heroku, but that didn't work for mkl-service, which does not seem to be supported by Heroku at all. I then simply deleted that line and took a chance: the app deployed with no error messages, but it doesn't run.
I also tried to generate my requirements.txt file through conda rather than pip:
conda list -e > requirements.txt
It generates a file with "=" separators rather than "==", which I corrected through search/replace, also deleting the trailing build strings. This time it also pulled in mkl==2020.2, a dependency of over 300 MB, and it still didn't make my app work properly.
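For context, conda's -e export pins each package as name=version=build with single "=" separators, while pip expects name==version; an illustrative example (the version and build string here are made up):
# conda list -e output (name=version=build)
pandas=1.1.3=py38h0573a6f_0
# pip-style requirements.txt equivalent (name==version)
pandas==1.1.3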
I wonder if the problem has to do with the mess I made by mixing conda and pip in my environment, or maybe with getting pandas to work online. Does anyone know what's going wrong?

After trying several combinations, I managed to get the app to work by creating a requirements.txt that lists only gunicorn and the packages I had originally installed in my local environment:
Flask
pandas
gunicorn
Of course, Flask and pandas have their own dependencies, but it turns out Heroku installs them automatically, just like pip does. So this time, making requirements.txt as simple as possible worked fine.
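For completeness, a setup along these lines is typically all Heroku needs beyond requirements.txt; the file and object names here (app.py, app) are assumptions and must match the Procfile entry:
# Procfile (assumes the Flask object "app" lives in app.py)
web: gunicorn app:app

# app.py -- minimal sketch of the kind of app described above
import pandas as pd
from flask import Flask

app = Flask(__name__)

@app.route("/")
def summary():
    # fetch one of the online CSV files and return a few numbers as text
    df = pd.read_csv("https://example.com/data.csv")  # placeholder URL
    return f"rows: {len(df)}, columns: {len(df.columns)}"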

Related

Flask app deployment can't import my package

I'm following this link: Deploy to Production.
After deploying the .whl file on the server and installing it through pip, the package shows up when I run pip list, but in a Python console I can't import it.
Is there any reason for this?
Also, following this tutorial, packaging-projects, is it mandatory to upload the package as explained in the "Uploading the distribution archives" section?
Thanks
I followed the link in my first post to create a package and discovered that the file tree was not correct. Once I rearranged it, everything worked fine.
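For reference, the packaging-projects tutorial expects a layout roughly like this (a sketch; example_package is the tutorial's placeholder name, not a requirement):
packaging_tutorial/
├── pyproject.toml
├── README.md
├── src/
│   └── example_package/
│       ├── __init__.py
│       └── example.py
└── tests/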

Amazon EMR pip install in bootstrap actions runs OK but has no effect

In Amazon EMR, I am using the following script as a custom bootstrap action to install Python packages. The script runs OK (I checked the logs; the packages installed successfully), but when I open a notebook in JupyterLab, I cannot import any of them. If I open a terminal in JupyterLab and run pip list or pip3 list, none of my packages is there. Even if I go to / and run find . -name mleap, for instance, it does not exist.
Something I have noticed is that on the master node I keep getting an error saying bootstrap action 2 has failed (there is no second action, only one). According to this, it is a rare error, and I get it in all my clusters. However, my cluster eventually gets created and I can use it.
My script is called aws-emr-bootstrap-actions.sh
#!/bin/bash
sudo python3 -m pip install numpy scikit-learn pandas mleap sagemaker boto3
I suspect it might have something to do with a Docker image being deployed that invalidates my previous installs, but from my Google searches it seems common to use bootstrap actions to install Python packages, so this should work...
The Python interpreter that PySpark uses is different from the one into which the OP was installing the modules (as confirmed in the comments).
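A quick way to check for this kind of mismatch (a generic sketch, not EMR-specific) is to compare the interpreter the notebook actually runs against the one the bootstrap script installed into:
# run inside the JupyterLab notebook
import sys
print(sys.executable)  # interpreter the notebook/Spark driver is using

# compare with the interpreter pip installed into, e.g. from a terminal:
#   which python3 && python3 -m pip show mleap
# if they differ, point Spark at the right interpreter, e.g.
#   export PYSPARK_PYTHON=/usr/bin/python3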

Use Pandas in Azure Functions

I'm trying to create a basic Python function and use it in an Azure Function App (Consumption plan). I used the HTTP template via VS Code and was able to get it deployed on Azure. However, when I try to use pandas in the logic, I get an error that I am not able to rectify, me being a rookie in Python. Can you suggest how to fix it?
Tools used: VS Code, Azure Functions Tools
Python version installed locally : 3.8.5
Azure Function App Python Version : 3.8
It seems the pandas module hasn't been installed in your function on Azure. You need to add pandas to your local requirements.txt and then deploy the function from local to Azure; the modules listed in requirements.txt will be installed during deployment.
You can run this command in the "Terminal" window to generate the pandas line in your requirements.txt automatically:
pip freeze > requirements.txt
After running the command above, your requirements.txt should look something like this (illustrative versions; yours will match whatever pip freeze finds locally):
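# illustrative; generated by pip freeze on a local Python 3.8 environment
azure-functions
numpy==1.19.2
pandas==1.1.3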

python: cannot import name beam_runner_api_pb2

I am relatively new to Python and Beam and I have followed the Apache Beam - Python Quickstart (here) to the last letter. My Python 2.7 virtual environment was created with conda.
I cloned the example from https://github.com/apache/beam
When I try to run
python -m apache_beam.examples.wordcount --input sample_text.txt --output counts
I get the following error
/Users/name/anaconda3/envs/py27/bin/python: cannot import name beam_runner_api_pb2
(which after searching I understand means that there is a circular import)
I have no idea where to begin. Is this a bug, or is something wrong with my setup?
(I have now tried redoing the example in three different virtual environments - all with the same result)
It turned out to be my mistake: I had not correctly installed the Google Cloud Platform (gcp) components. Once I did, it all worked:
# As part of the initial setup, install Google Cloud Platform specific extra components.
pip install apache-beam[gcp]

Unable to install pandas on AWS Lambda

I'm trying to install and run pandas on AWS Lambda. I used the recommended zip method of packaging my code file model_a.py and the related Python libraries (pip install pandas -t /path/to/dir/) and uploaded the zip to Lambda. When I try to run a test, this is the error message I get:
Unable to import module 'model_a': C extension:
/var/task/pandas/hashtable.so: undefined symbol: PyFPE_jbuf not built.
If you want to import pandas from the source directory, you may need
to run 'python setup.py build_ext --inplace' to build the C extensions
first.
It looks like an error in a symbol defined in the hashtable.so that ships with pandas. Googling for this did not turn up any relevant articles; there were some references to a failure in the numpy installation, but nothing concrete. I would appreciate any help troubleshooting this! Thanks.
I would advise you to use Lambda Layers for additional libraries. The size of a Lambda function package is limited, but layers can take you up to 250 MB (more here).
AWS has open-sourced a good package, including pandas, for dealing with data in Lambdas, and has also packaged it conveniently for Lambda Layers. You can find instructions here.
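Once such a layer is attached, a handler can import pandas directly; a minimal sketch (the handler name and data are hypothetical):
import pandas as pd  # resolved from the attached Lambda layer

def handler(event, context):
    # trivial example: build a frame and return a couple of numbers
    df = pd.DataFrame({"x": [1, 2, 3]})
    return {"rows": len(df), "mean_x": float(df["x"].mean())}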
I have successfully run pandas code on Lambda before. If your development environment is not binary-compatible with the Lambda environment, you will not be able to simply run pip install pandas -t /some/dir and package it up into a Lambda .zip file. Even if you are developing on Linux, you may still run into compatibility issues.
So how do you get around this? The solution is actually pretty simple: run your pip install in a Lambda-like container and use the pandas module that it downloads/builds instead. When I did this, I had a build script that would spin up an instance of the lambci/lambda container on my local system (a clone of the AWS Lambda environment in Docker), bind my local build folder to /build, and run pip install pandas -t /build/. Once that's done, kill the container and you have a Lambda-compatible pandas module in your local build folder, ready to zip up and send to AWS along with the rest of your code.
You can do this for an arbitrary set of Python modules by making use of a requirements.txt file, and you can even do it for arbitrary versions of Python by first creating a virtual environment on the lambci container. I haven't needed to do this for a couple of years, so maybe there are better tools by now, but this approach should at least be functional.
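A sketch of that build step (assuming Docker and the lambci/lambda build image; the folder names and Python version are placeholders):
# build the dependencies inside a Lambda-like container
docker run --rm -v "$PWD/build":/build lambci/lambda:build-python3.8 \
    pip install pandas -t /build/

# zip the build folder, then add your own code on top
cd build && zip -r ../lambda_package.zip . && cd ..
zip -g lambda_package.zip model_a.py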
If you want to install it directly through the AWS Console, I made a step-by-step youtube tutorial, check out the video here: How to install Pandas on AWS Lambda
