How to bundle Python dependencies in IronWorker?

I'm writing a simple IronWorker in Python to do some work with the AWS API.
To do so I want to use the boto library, which is distributed via the PyPI repository. The boto library is not installed by default in the IronWorker runtime environment.
How can I bundle the boto library dependency with my IronWorker code?
Ideally I'm hoping I can use something like the gem dependency bundling available for Ruby IronWorkers, i.e. in myRuby.worker specify
gemfile '../Gemfile', 'common', 'worker' # merges gems from common and worker groups
In the Python Loggly sample, I see that the hoover library is used:
#here we have to include hoover library with worker.
hoover_dir = os.path.dirname(hoover.__file__)
shutil.copytree(hoover_dir, worker_dir + '/loggly') #copy it to worker directory
However, I can't see where/how you specify which hoover library version you want, or where to download it from.
What is the official/correct way to use 3rd party libraries in Python IronWorkers?

Newer versions of iron_worker have native support for the pip command.
So your .worker file needs:
runtime "python"
exec "something.py"
pip "boto"
pip "someotherpip"
full_remote_build true

[edit]We've worked on our toolset a bit since this answer was written and accepted. The answer from my colleague below is the recommended course moving forward.[/edit]
I wrote the Python client library for IronWorker. I'm also employed by Iron.io.
If you're using the Python client library, the easiest (and recommended) way to do this is to just copy over the library's installed folder, and include it when uploading the package. That's what the Python Loggly sample is doing above. As you said, that doesn't specify a version or where to download the library from, because it doesn't care. It just takes the one installed on your system and uses it. Whatever you get when you enter "import boto" on your local machine is what would be uploaded.
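The copy-the-installed-folder approach can be sketched generically as follows. This is only a minimal sketch and not part of the IronWorker client library: the helper name bundle_installed_package and the worker_dir layout are hypothetical, and it assumes Python 3 and a dependency that is a regular top-level package (a directory with an __init__.py) rather than a single-file module.

```python
import importlib
import os
import shutil


def bundle_installed_package(package_name, worker_dir):
    """Copy a locally installed package into worker_dir (hypothetical helper).

    Whatever version `import package_name` resolves to on this machine
    is the version that gets bundled.
    """
    module = importlib.import_module(package_name)
    module_file = getattr(module, "__file__", None)
    if module_file is None or os.path.basename(module_file) != "__init__.py":
        raise ValueError(f"{package_name} is not a regular package directory")
    source_dir = os.path.dirname(module_file)
    target_dir = os.path.join(worker_dir, package_name)
    # Skip bytecode caches; they are regenerated on the worker.
    shutil.copytree(source_dir, target_dir,
                    ignore=shutil.ignore_patterns("__pycache__", "*.pyc"))
    return target_dir
```

Calling bundle_installed_package("boto", worker_dir) before uploading would then place a boto folder next to the worker code, provided boto is installed locally.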
The other option is using our CLI to upload your worker, with a .worker file.
To do this, here's what you'd need to do:
Create a botoworker.worker file:
runtime "binary"
build 'pip install --install-option="--prefix=`pwd`/pips" boto'
file 'botoworker.py'
exec "botoworker.sh"
That second line is the pip command that will be run to install the dependency. You can modify it like you would any pip command run from the command line. It's going to execute that command on the worker during the "build" phase, so it's only executed once instead of every time you run a task.
The third line should be changed to the Python file you want to run--it's your Python worker file. Here's the one we used to test this:
import boto
If you save that as botoworker.py, the above should work without any modification. :)
The fourth line is a shell script that's going to actually run your worker. I've included the one we used below. Just save it as botoworker.sh, and you won't have to worry about modifying the .worker file above.
PYTHONPATH="$HOME/pips/lib/python2.7/site-packages:$PYTHONPATH" python botoworker.py "$@"
You'll notice it refers to your Python file--if you don't name your Python file botoworker.py, remember to change it here, too. All this does is set your PYTHONPATH to include the installed library, and then runs your Python file.
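The same effect as the shell wrapper can probably be had from inside Python, by prepending the install directory to sys.path before importing the dependency. This is a minimal sketch that assumes the pips/lib/python2.7/site-packages layout used by the shell script; the helper name add_site_dir is hypothetical.

```python
import os
import sys


def add_site_dir(base_dir, python_version="2.7"):
    """Prepend <base_dir>/pips/lib/python<version>/site-packages to sys.path."""
    site_dir = os.path.join(base_dir, "pips", "lib",
                            "python" + python_version, "site-packages")
    # Avoid adding the same directory twice if this is called repeatedly.
    if site_dir not in sys.path:
        sys.path.insert(0, site_dir)
    return site_dir


# In the worker, call this before `import boto`, for example:
# add_site_dir(os.path.expanduser("~"))
```

The shell script remains the simpler option when the worker file should stay free of path handling.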
To upload this, just make sure you have the CLI installed (gem install iron_worker_ng, making sure your Ruby version is 1.9.3 or higher) and then run "iron_worker upload botoworker" in your shell, from the same directory your botoworker.worker file is in.
Hope this helps!

Related

How do I add python libraries to an AWS lambda function for Alexa?

I was following the tutorial to create an Alexa app using Python:
Python Alexa Tutorial
I was able to successfully follow all the steps and get the app to work. I now want to modify the Python code and use external libraries, such as requests or any other libraries that I install using pip. How would I set up my Lambda function to include any pip packages that I install locally on my machine?

As described in the official Amazon documentation (link here), it is as simple as creating a zip of all the folder contents after installing the required packages in the folder where you have your Python Lambda code.
As Vineeth pointed out above in his comment, the very first step in moving from the inline code editor to a zip file upload is to change your Lambda function handler name under the configuration settings to include the name of the Python script file that holds the Lambda handler.
lambda_handler => {your-python-script-file-name}.lambda_handler.
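The naming convention can be illustrated with a small resolver. This is a sketch of the convention only, not AWS Lambda's own loader; the function name resolve_handler and the file name my_skill.py mentioned below are hypothetical.

```python
import importlib


def resolve_handler(handler_setting):
    """Resolve a "<module>.<function>" handler string to the function object.

    Illustration only: this mimics the naming convention, it is not
    AWS Lambda's own loader.
    """
    # Split on the last dot: everything before it is the module name.
    module_name, _, function_name = handler_setting.rpartition(".")
    if not module_name:
        raise ValueError("expected '<module>.<function>', got " + handler_setting)
    module = importlib.import_module(module_name)
    return getattr(module, function_name)
```

So if the handler function lambda_handler lives in my_skill.py at the top level of the zip, the handler setting would be my_skill.lambda_handler.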
Other tools like python-lambda and lambda-uploader simplify the upload process and, most importantly, allow LOCAL TESTING. These will save a lot of development time.

The official documentation is pretty good. In a nutshell, you need to create a zip file of a directory containing both the code of your lambda function and all external libraries you use at the top level.
You can simulate that by deactivating your virtualenv, copying all your required libraries into the working directory (which is always in sys.path if you invoke a script on the command line), and checking whether your script still works.
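Creating such a zip can be sketched with the standard library alone. This is a minimal sketch under assumptions: the function name build_deployment_zip is hypothetical, source_dir is assumed to already contain the handler file plus the installed libraries (for example placed there with pip install --target), and zip_path is assumed to lie outside source_dir.

```python
import os
import zipfile


def build_deployment_zip(source_dir, zip_path):
    """Zip the contents of source_dir so they sit at the top level of the archive."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to source_dir, so source_dir itself
                # does not appear as a folder inside the archive.
                archive.write(full_path, os.path.relpath(full_path, source_dir))
    return zip_path
```

The important detail is the relative archive name: the handler file and each library folder end up at the root of the zip rather than nested under the build directory.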

You may want to look into using frameworks such as zappa which will handle packaging up and deploying the lambda function for you.
You can use that in conjunction with flask-ask to have an easier time making Alexa skills. There's even a video tutorial of this (from the zappa readme) here

To solve this particular problem we're using a library called juniper. In a nutshell, all you need to do is create a very simple manifest file that looks like:
functions:
  # Name the zip file you want juni to create
  router:
    # Where are your dependencies located?
    requirements: ./src/requirements.txt
    # Your source code.
    include:
      - ./src/lambda_function.py
From this manifest file, calling juni build will create the zip file artifact for you. The file will include all the dependencies you specify in the requirements.txt.
The command will create this file ./dist/router.zip. We're using that file in conjunction with a sam template. However, you can then use that zip and upload it to the console, or through the awscli.

Echoing @d3ming's answer, a framework is a good way to go at this point. Creating the deployment package manually isn't impossible, but you'll need to upload your packages' compiled code, and if you compile that code on a non-Linux system, the chance of running into issues caused by differences between your system and the Lambda function's deployed environment is high.
You can work around that by compiling your code on a Linux machine or in a Docker container, but given all that complexity you can get much more from adopting a framework.
Serverless is well adopted and has support for custom python packages. It even integrates with Docker to compile your python dependencies and build the deployment package for you.
If you're looking for a full tutorial on this, I wrote one for Python Lambda functions here.

Amazon created a repository that deals with your situation:
https://github.com/awsdocs/aws-lambda-developer-guide/tree/master/sample-apps/blank-python
The blank app is an example of how to deploy a Lambda function that depends on requirements, with the bonus of being made by Amazon.
All you need to do is follow the step-by-step instructions and update the repository based on your needs.

For Lambda proofs of concept and fast prototyping, you can use the following function, _install_packages. Either call it before the Lambda handler function (so packages are installed at Lambda init time, if your dependencies need less than 10 seconds to install) or call it at the beginning of the Lambda handler (the installation then runs exactly once, on the first Lambda event). Given the pip install options used, the packages to be installed must provide binary wheels for manylinux.
_installed = False
def _install_packages(*packages):
    global _installed
    if not _installed:
        import os
        import sys
        import time
        _started = time.time()
        os.system("mkdir -p /tmp/packages")
        _packages = " ".join(f"'{p}'" for p in packages)
        print("INSTALLED:")
        os.system(
            f"{sys.executable} -m pip freeze --no-cache-dir")
        print("INSTALLING:")
        os.system(
            f"{sys.executable} -m pip install "
            f"--no-cache-dir --target /tmp/packages "
            f"--only-binary :all: --no-color "
            f"--no-warn-script-location {_packages}")
        sys.path.insert(0, "/tmp/packages")
        _installed = True
        _ended = time.time()
        print(f"package installation took: {_ended - _started:.2f} sec")
# usage example before lambda handler
_install_packages("pymssql", "requests", "pillow")
def lambda_handler(event, context):
    pass  # lambda code
# usage example from within the lambda handler
def lambda_handler(event, context):
    _install_packages("pymssql", "requests", "pillow")
    pass  # lambda code
The examples above install the Python packages pymssql, requests and pillow.
Below is an example Lambda that installs requests and then calls ifconfig.me to obtain its egress IP address.
import json
_installed = False
def _install_packages(*packages):
    global _installed
    if not _installed:
        import os
        import sys
        import time
        _started = time.time()
        os.system("mkdir -p /tmp/packages")
        _packages = " ".join(f"'{p}'" for p in packages)
        print("INSTALLED:")
        os.system(
            f"{sys.executable} -m pip freeze --no-cache-dir")
        print("INSTALLING:")
        os.system(
            f"{sys.executable} -m pip install "
            f"--no-cache-dir --target /tmp/packages "
            f"--only-binary :all: --no-color "
            f"--no-warn-script-location {_packages}")
        sys.path.insert(0, "/tmp/packages")
        _installed = True
        _ended = time.time()
        print(f"package installation took: {_ended - _started:.2f} sec")
# usage example before lambda handler
_install_packages("requests")
def lambda_handler(event, context):
    import requests
    return {
        'statusCode': 200,
        'body': json.dumps(requests.get('http://ifconfig.me').content.decode())
    }
Because each package spec is wrapped in single quotes when pip's command line is built, you can include a version constraint in a package spec, such as pillow<9, which installs the most recent 8.X.X version of pillow.

I too struggled with this for a while. After a deep dive into the AWS resources, I learned that Lambda functions on AWS run on Linux, so it's very important to use a build of the Python package that matches that Linux environment.
You can find more information on this at:
https://aws.amazon.com/lambda/faqs/
Follow these steps to get the right version:
1. Find the package's .whl file on PyPI and download it to your local machine.
2. Zip the packages and add them as a layer in AWS Lambda.
3. Add the layer to the Lambda function.
Note: Please make sure that the version of the Python package you install matches the Linux OS on which AWS Lambda runs its compute tasks.
References:
https://pypi.org/project/Pandas3/#files
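When picking a .whl file, the platform it targets is encoded in the file name, so a rough check can be scripted. This is a minimal sketch assuming the standard wheel naming scheme (name-version-pythontag-abitag-platformtag.whl); the function names are hypothetical, and the check deliberately ignores CPU architecture and Python version, which also have to match the Lambda runtime.

```python
def wheel_tags(filename):
    """Split a wheel filename into its python, ABI and platform tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError("unexpected wheel filename: " + filename)
    # The last three dash-separated fields are always the tags.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return python_tag, abi_tag, platform_tag


def looks_linux_compatible(filename):
    """True for pure-Python wheels ("any") and Linux/manylinux wheels."""
    platform_tag = wheel_tags(filename)[2]
    # A platform tag may list several platforms separated by dots.
    return any(tag == "any" or "linux" in tag for tag in platform_tag.split("."))
```

For example, a file name ending in win_amd64.whl or macosx_11_0_arm64.whl would be rejected, while one ending in manylinux2014_x86_64.whl or none-any.whl would pass this check.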

A lot of Python libraries can be imported via Layers from here: https://github.com/keithrozario/Klayers, or you can use a framework like serverless, which has plugins to bundle packages directly into your artifact.

Set up Brew installed Python 2.7.X as sdk for Intellij / Pycharm

I am trying to import the brew installed version of python by emulating the Global Libraries structure existing for the (mostly) working mac os built-in 2.7.2. However IJ is unable to infer the types or to create the library properly.
Update: this is a large existing project. Creating a new project just to get a different version of Python is not an option.
Here are the steps:
Try to create a new Global Library: fails, there is no Python option.
OK, so I use Copy to clone the built-in SDK:
Now - let us try to emulate the paths included in the original built-in but with the brew base dir: here is a starting point:
And here is one of the exact entries from the builtin library:
So let us click on the + to add it:
So IJ is unable to handle it properly. I also tried a half dozen others, all with the same shrug from IJ.
So then what is the correct process?
Update: Here is the project SDK dialog (thanks to scribbles).
And trying to add it: the "OK" button is not enabled! So IJ is not able to load it.

New Project -> Select SDK.
See this video if you still have any questions.
EDIT: Is this more along the lines of what you're looking for (link)?

This is old, but I ran into the same problem with the current Python 2 install from Homebrew on High Sierra. Instead of choosing a directory as the previous setup required, I just set up the Python SDK pointing to the python executable link in /usr/local/opt/python/libexec/bin (which is the directory I added to my PATH for Python 2). It seems to be working just fine now.
Hopefully this will help someone.
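If it is unclear which executable to point the SDK at, the interpreter itself can report it. This is a small sketch, not an IDE feature: run it with the Homebrew python you want to register, and treat the function name interpreter_paths as hypothetical.

```python
import os
import sys


def interpreter_paths():
    """Return the path Python was started from and the file it resolves to."""
    executable = sys.executable
    # realpath follows symlinks such as the ones Homebrew creates.
    return executable, os.path.realpath(executable)


if __name__ == "__main__":
    launched, resolved = interpreter_paths()
    print("launched as:", launched)
    print("resolves to:", resolved)
```

Either path should normally be acceptable as the SDK executable; the symlinked one has the advantage of surviving minor version upgrades.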

eclipse pydev - how to install python modules

Just working my way through a (very good) book called Test Driven Development using Python.
It uses Python 3.4, and I am running Windows 7.
I've got all the stuff working using a simple text editor and running from the command line... in the course of which in particular I used "pip install" to install Django and Selenium, as per book's instructions.
This created folders "selenium" and "django" under ...\Python34\Lib\site-packages\ ... so I added these to the PythonPath for my Eclipse/PyDev project.
With the correct interpreter selected I then tried to run a file which runs fine on the command line: "> python3 functional_tests.py"... but I get
  File "D:\apps\Python34\lib\site-packages\django\http\__init__.py", line 1, in <module>
    from django.http.cookie import SimpleCookie, parse_cookie
  File "D:\apps\Python34\lib\site-packages\django\http\cookie.py", line 5, in <module>
    from django.utils.six.moves import http_cookies
ImportError: cannot import name 'http_cookies'
... to me this looks like a dependency thing... as though "pip install" handles dependency matters in a way just including a single folder doesn't.
Question boils down to this: what's the "proper" way to install a python module using PyDev?
several days later
wow... nothing? Nothing! I suppose this must mean that you either have to add dependencies manually or use something like Ant, Maven or Gradle within Eclipse itself. These latter are not my strong areas, even outside an IDE. Would still be nice to have an answer from a PyDev expert!

Well, pip install should work for PyDev (it should automatically recognize the dependency)...
I.e.: in your use case, the only folder that should be in the PYTHONPATH is D:\apps\Python34\lib\site-packages (and pip should install packages to that folder -- make sure you don't add extra folders for "D:\apps\Python34\lib\site-packages\django" nor anything else inside the site-packages to the PYTHONPATH).
If it's still not working, please check if the module django.utils.six.moves.http_cookies is indeed where you expect it to be. Also, you can print the PYTHONPATH being used in runtime with:
import sys
print('\n'.join(sorted(sys.path)))
To check if that's really what you expect.
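To check where a particular module would actually be loaded from, something like the following can be run inside the PyDev launch. This is a minimal sketch; the helper name locate_module is hypothetical, and for lazily generated modules (such as the ones under six.moves) the result may not be meaningful.

```python
import importlib.util


def locate_module(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or broken.
        return None
    return spec.origin if spec is not None else None


if __name__ == "__main__":
    print(locate_module("django"))
```

If the printed path is not under the expected site-packages folder, the PYTHONPATH configured for the project is picking up a different copy.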

Is there a faster method to load a yaml file than the standard .load method? Django/Python

I am loading a big YAML file and it is taking forever. I am wondering if there is a faster method than yaml.load().
I have read that there is a CLoader method, but I haven't been able to run it.
The website that suggested this CLoader method asks me to do this:
Download the source package PyYAML-3.08.tar.gz and unpack it.
Go to the directory PyYAML-3.08 and run:
$ python setup.py install
If you want to use LibYAML bindings, which are much faster than the pure Python version, you need to download and install LibYAML.
Then you may build and install the bindings by executing
$ python setup.py --with-libyaml install
In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter:
from yaml import load, dump
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
This looks like it will work, but I don't have a setup.py anywhere in my Django project and therefore can't install or import any of these things.
Can anyone help me figure out how to do this, or let me know about another faster loading method?
Thanks for the help!!

I have no idea what's faster - bspymaster's ideas might be the most useful.
When you download PyYAML-3.08.tar.gz, there will be a setup.py inside the archive that you can run.
Note that to use LibYAML, you need to download this: http://pyyaml.org/download/libyaml/yaml-0.1.4.tar.gz
Then build it using the instructions from http://pyyaml.org/wiki/LibYAML
You will need a set of build tools, which should already be installed on Linux/Unix; on OS X make sure Xcode is installed. I'm not sure about Windows.
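After installing, it may help to confirm whether the LibYAML bindings were actually built, since the pure-Python Loader is the silent fallback. This is a minimal sketch; the function name describe_yaml_loader is hypothetical, and it relies on the __with_libyaml__ flag that PyYAML sets in its package module.

```python
import importlib.util


def describe_yaml_loader():
    """Report which YAML loader this environment can use."""
    if importlib.util.find_spec("yaml") is None:
        return "PyYAML not installed"
    import yaml
    # PyYAML sets this flag to True when the C extension was built.
    if getattr(yaml, "__with_libyaml__", False):
        return "CLoader available"
    return "pure-Python Loader only"


if __name__ == "__main__":
    print(describe_yaml_loader())
```

If this reports the pure-Python loader, the speed-up from CLoader will not be available until PyYAML is rebuilt with LibYAML present.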

How can I import hbase in python?

I'm trying to play around with HBase in Python, and I am using the Cloudera repository to install the Hadoop/HBase packages. It seems to work, as I can access and work on the database using the shell, but it's not fully working within Python.
I know that to communicate with HBase I need Thrift, so I downloaded and compiled it from source. I can import thrift into Python, but when I do from hbase import Hbase, I get module not found errors.
Does anyone know what package/module I would need to get it to work? I tried looking around easy_install and yum (I'm using CentOS 6) with no luck. I did find an article where a person using Debian installed it by doing sudo aptitude install python-hbase. I don't have that command/package, so I'm not sure how to get it (or whether I have to compile it from source).
Also, if it helps, I installed most of the base from Cloudera and followed some of the instructions (the ones that didn't require installing anything) from http://yannramin.com/2008/07/19/using-facebook-thrift-with-python-and-hbase/
Any help/tips/suggestions would be great.
Thanks!

Have a look at HappyBase (see https://github.com/wbolster/happybase for info). It is the modern way to interact with HBase from Python. It covers the complete Thrift API but wraps it in a much better interface.

Okay, I figured it out. If anyone else has problems with this in the future, it's actually pretty easy. In the step where you run thrift --gen py Hbase.thrift, it creates an hbase folder in the location where you ran that command. Simply take that folder and copy it to your default module folder (or to the folder where you run your program) and it should work.

Search for /src/contrib/thriftfs/gen-py under the Hadoop installation folder.
Copy the output of thrift --gen py Hbase.thrift to the location below (the part up to /home/hadoop/data/ will differ in your case): /home/hadoop/data/hadoop-1.0.4/src/contrib/thriftfs/gen-py
then
$ python
import sys
sys.path.append("/home/hadoop/data/hadoop-1.0.4/src/contrib/thriftfs/gen-py")
import hbase
It should work now
