I have created a simple HTTP trigger-based Azure Function in Python which calls another Python script to create a sample file in Azure Data Lake Gen1. My solution structure is given below:
requirements.txt contains the following packages:
azure-functions
azure-mgmt-resource
azure-mgmt-datalake-store
azure-datalake-store
__init__.py
import logging, os, sys
import azure.functions as func
import json
def main(req: func.HttpRequest) -> func.HttpResponse:
    logging.info('Python HTTP trigger function processed a request.')
    name = req.params.get('name')
    if not name:
        try:
            req_body = req.get_json()
        except ValueError:
            pass
        else:
            name = req_body.get('name')
    if name:
        full_path_to_script = os.path.join(os.path.dirname(__file__) + '/Test.py')
        logging.info(f"Path: - {full_path_to_script}")
        os.system(f"python {full_path_to_script}")
        return func.HttpResponse(f"Hello {name}!")
    else:
        return func.HttpResponse(
            "Please pass a name on the query string or in the request body",
            status_code=400
        )
Test.py
import json
from azure.datalake.store import core, lib, multithread
directoryId = ''
applicationKey = ''
applicationId = ''
adlsCredentials = lib.auth(tenant_id = directoryId, client_secret = applicationKey, client_id = applicationId)
adlsClient = core.AzureDLFileSystem(adlsCredentials, store_name = '')
with adlsClient.open('stage1/largeFiles/TestFile.json', 'rb') as input_file:
    data = json.load(input_file)

with adlsClient.open('stage1/largeFiles/Result.json', 'wb') as responseFile:
    responseFile.write(data)
Test.py is failing with an error that no module named azure.datalake.store could be found.
Why are the other required modules not working for Test.py, since it is inside the same directory?
pip freeze output:
adal==1.2.2
azure-common==1.1.23
azure-datalake-store==0.0.48
azure-functions==1.0.4
azure-mgmt-datalake-nspkg==3.0.1
azure-mgmt-datalake-store==0.5.0
azure-mgmt-nspkg==3.0.2
azure-mgmt-resource==6.0.0
azure-nspkg==3.0.2
certifi==2019.9.11
cffi==1.13.2
chardet==3.0.4
cryptography==2.8
idna==2.8
isodate==0.6.0
msrest==0.6.10
msrestazure==0.6.2
oauthlib==3.1.0
pycparser==2.19
PyJWT==1.7.1
python-dateutil==2.8.1
requests==2.22.0
requests-oauthlib==1.3.0
six==1.13.0
urllib3==1.25.6
Problem
os.system(f"python {full_path_to_script}") from your functions project is causing the issue.
The Azure Functions runtime sets up the environment, modifying process-level state such as sys.path so that your function can load any dependencies you may have. When you spawn a sub-process like that, not all of that information flows through. Additionally, you will face issues with logging: logs from Test.py would not show up properly unless explicitly handled.
Importing works locally because you have all of your requirements.txt modules installed and available to Test.py. This is not the case in Azure: after the remote build that happens as part of publishing, your modules are included in the code package that gets published; they are not "installed" globally in the Azure environment per se.
Solution
You shouldn't have to run your script like that. In the example above, you could import Test.py from your __init__.py file, and that should behave as if it were called with python Test.py (at least in the case above). Is there a reason you'd want to run python Test.py in a sub-process instead of importing it?
Here's the official guide on how you'd want to structure your app to import shared code -- https://learn.microsoft.com/en-us/azure/azure-functions/functions-reference-python#folder-structure
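For example, here is a minimal sketch of that approach. The run() function and the refactoring of Test.py are assumptions about how you could restructure the code, not something the runtime requires:
# Test.py -- same folder as __init__.py
import json
from azure.datalake.store import core, lib, multithread

def run():
    adls_credentials = lib.auth(tenant_id='<directoryId>', client_secret='<applicationKey>', client_id='<applicationId>')
    adls_client = core.AzureDLFileSystem(adls_credentials, store_name='<storeName>')

    with adls_client.open('stage1/largeFiles/TestFile.json', 'rb') as input_file:
        data = json.load(input_file)

    # a file opened in 'wb' mode expects bytes, so serialize the dict before writing
    with adls_client.open('stage1/largeFiles/Result.json', 'wb') as response_file:
        response_file.write(json.dumps(data).encode('utf-8'))

# __init__.py (excerpt) -- import the script instead of spawning a sub-process
from . import Test
...
Test.run()   # replaces os.system(f"python {full_path_to_script}")
Because Test is imported as part of the function package, it resolves azure.datalake.store from the same dependencies the runtime made available to __init__.py.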
Side-Note
I think once you get through the import issue, you may also face problems with adlsClient.open('stage1/largeFiles/TestFile.json', 'rb'). We recommend following the developer guide above to structure your project and using __file__ to get the absolute path (reference).
For example --
import pathlib

with open(pathlib.Path(__file__).parent / 'stage1' / 'largeFiles' / 'TestFile.json') as f:
    ...
Now, if you really want to make os.system(f"python {full_path_to_script}") work, there are workarounds for the import issue. But I'd rather not recommend such an approach unless you have a really compelling need for it. :)
Related
I would like to be able to run an ad-hoc Python script that would access and run analytics on the model calculated by a dbt run. Are there any best practices around this?
We recently built a tool that caters very much to this scenario. It leverages the ease of referencing dbt tables from Python land. It's called fal.
The idea is that you would define the python scripts you would like to run after your dbt models are run:
# schema.yml
models:
  - name: iris
    meta:
      owner: "@matteo"
      fal:
        scripts:
          - "notify.py"
And then the file notify.py is called if the iris model was run in the last dbt run:
# notify.py
import os
from slack_sdk import WebClient
from slack_sdk.errors import SlackApiError
CHANNEL_ID = os.getenv("SLACK_BOT_CHANNEL")
SLACK_TOKEN = os.getenv("SLACK_BOT_TOKEN")
client = WebClient(token=SLACK_TOKEN)
message_text = f"""Model: {context.current_model.name}
Status: {context.current_model.status}
Owner: {context.current_model.meta['owner']}"""
try:
    response = client.chat_postMessage(
        channel=CHANNEL_ID,
        text=message_text
    )
except SlackApiError as e:
    assert e.response["error"]
Each script is run with a reference to the current model for which it is running, available in a context variable.
To start using fal, just pip install fal and start writing your Python scripts.
For production, I'd recommend an orchestration layer such as Apache Airflow.
See this blog post to get started, but essentially you'll have an orchestration DAG (note: not a dbt DAG) that does something like:
dbt run <with args> -> your python code
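A minimal Airflow sketch of that pattern (assuming Airflow 2.x; the DAG name, schedule, project path, and the analytics callable are all placeholders):
# dags/dbt_then_analytics.py (hypothetical file name)
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator

def run_analytics():
    # placeholder: read the tables dbt just built and run your ad-hoc analysis
    ...

with DAG(
    dag_id="dbt_then_analytics",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /path/to/dbt/project && dbt run",  # placeholder path and args
    )
    analytics = PythonOperator(task_id="analytics", python_callable=run_analytics)

    dbt_run >> analytics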
Fair warning, though, this can add a bit of complexity to your project.
I suppose you could get a similar effect with a CI/CD tool like GitHub Actions or CircleCI.
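For instance, a rough GitHub Actions sketch of the same "dbt run, then Python" sequence (the schedule, the dbt adapter, and the script path are assumptions):
# .github/workflows/dbt_then_analytics.yml (hypothetical)
name: dbt-then-analytics
on:
  schedule:
    - cron: "0 6 * * *"   # daily; adjust as needed
jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: "3.9"
      - run: pip install dbt-core dbt-postgres   # the adapter is an assumption
      - run: dbt run --profiles-dir .            # assumes profiles.yml lives in the repo
      - run: python analysis/my_script.py        # placeholder analytics script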
I'm trying to import a custom module that I created, but merely importing it breaks my API.
Directory structure:
-src
--order
----__init__.py
----app.py
----validator.py
----requirements.txt
--__init__.py
In my app.py I have this code:
import json
from .validator import validate
def handler(event, context):
    msg = ''
    if event['httpMethod'] == 'GET':
        msg = "GET"
    elif event['httpMethod'] == 'POST':
        pass  # msg = validate(json.loads(event['body']))

    return {
        "statusCode": 200,
        "body": json.dumps({
            "message": msg,
        }),
    }
I get this error:
Unable to import module 'app': attempted relative import with no known parent package
However, if I remove line 2 (from .validator import validate) from my code, it works fine, so the problem is with that import. Honestly, I can't figure out what is going on. I have tried importing with:
from src.order.validator import validate
but it doesn't work either.
I was able to solve my issue by generating a build with sam build, zipping the output, and putting it in the root folder inside aws-sam. It's not a great solution because I have to rebuild after every small change, but at least it's a workaround for now.
It seems app.py has not been loaded as part of the package hierarchy (i.e. src and order packages have not been loaded). You should be able to run
from src.order import app
from the parent directory of src and your code will work. If you run python app.py from the terminal — which I assume is what you did — app.py will be run as a standalone script — not as part of a package.
However, I do not believe you need the .validator in your case since both modules are in the same directory. You should be able to do
from validator import validate
When I create an AWS Lambda Layer, all the contents/modules of my zip file end up under /opt/ when the Lambda executes. This quickly becomes cumbersome and frustrating because I have to use absolute imports in all my Lambdas. Example:
import json
import os
import importlib.util
spec = importlib.util.spec_from_file_location("dynamodb_layer.customer", "/opt/dynamodb_layer/customer.py")
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
def fetch(event, context):
    CustomerManager = module.CustomerManager
    customer_manager = CustomerManager()
    body = customer_manager.list_customers(event["queryStringParameters"]["acquirer"])

    response = {
        "statusCode": 200,
        "headers": {
            "Access-Control-Allow-Origin": "*"
        },
        "body": json.dumps(body)
    }

    return response
So I was wondering, is it possible to add these /opt paths to the PATH environment variable beforehand, through serverless.yml? That way I could just do from dynamodb_layer.customer import CustomerManager, instead of that freakish ugliness.
I have a Lambda layer for the Python 3.6 runtime. My my_package.zip structure is:
my_package.zip
  - python
    - lib
      - python3.6
        - site-packages
          - customer
All dependencies are in the build folder in the project root:
e.g. build/python/lib/python3.6/site-packages/customer
Relevant section of my serverless.yml
layers:
  my_package:
    path: build
    compatibleRuntimes:
      - python3.6
In my Lambda I import my package like I would any other package:
import customer
Have you tried setting your PYTHONPATH env var?
https://stackoverflow.com/a/5944201/6529424
Have you tried adding to sys.path?
https://stackoverflow.com/a/12257807/6529424
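For example, a minimal sketch of the sys.path approach, assuming your layer zip has dynamodb_layer/customer.py at its root so it extracts to /opt/dynamodb_layer/customer.py:
import sys

# /opt/python is on the module search path by default, but /opt itself is not,
# so add it before importing from the layer's top-level folder
sys.path.append('/opt')

from dynamodb_layer.customer import CustomerManager
Setting PYTHONPATH as an environment variable on the function (for example through serverless.yml's provider.environment section) is the same idea applied at deploy time.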
In the zip archive, the module needs to be placed in a python subdirectory so that when it is extracted as a layer in Lambda, it is located in /opt/python. That way you'll be able to import your module directly without the need for importlib.
It's documented here, or see this detailed blog post from an AWS developer evangelist for more.
Setting the PYTHONPATH variable is not required, as long as you place items correctly inside the zip file.
Simple modules and package directories should be placed inside a directory named "python", and the whole python/ directory then placed into the zip file for uploading to AWS as a layer. Don't forget to set the "compatible runtimes" (e.g. Python 3.6, 3.7, 3.8 ...) for the layer.
So as an example:
python/
- my_module.py
- my_package
-- __init__.py
-- package_mod_1.py
-- package_mod_2.py
which then get included in the zip file.
zip -r my_layer_zip.zip python/
The modules can then be imported without any more ado, when accessed as a layer:
....
import my_module
from my_package.package_mod_2 import mod_2_function
....
You can see the package structure from within the Lambda if you look at '/opt/python/', which will show my_module.py, my_package/ etc. This is easily tested using the AWS Lambda test console, assuming the layer is attached to the function (otherwise the code will error):
import json
import os
def lambda_handler(event, context):
    # TODO implement
    dir_list = os.listdir('/opt/python/')

    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!'),
        'event': json.dumps(event),
        '/opt/python/': dir_list
    }
I'm new to Python. This is my first Ansible module; it deletes a SimpleDB domain as part of Chaos Monkey cleanup.
When I test it in my local venv on Mac OS X, it keeps saying:
Module unable to decode valid JSON on stdin. Unable to figure out
what parameters were passed.
Here is the code:
#!/usr/bin/python
# Delete SimpleDB Domain
from ansible.module_utils.basic import *
import boto3
def delete_sdb_domain():
    fields = dict(
        sdb_domain_name=dict(required=True, type='str')
    )
    module = AnsibleModule(argument_spec=fields)
    client = boto3.client('sdb')
    response = client.delete_domain(DomainName=module.params['sdb_domain_name'])
    module.exit_json(changed=False, meta=response)

def main():
    delete_sdb_domain()

if __name__ == '__main__':
    main()
I'm trying to pass in parameters from the file /tmp/args.json and run the following command for the local test:
$ python ./delete_sdb_domain.py /tmp/args.json
Please note I'm using a venv test environment on my Mac.
If you find any syntax errors in my module, please also point them out.
This is not how you should test your modules.
AnsibleModule expects to have specific JSON as stdin data.
So the closest thing you can try is:
python ./delete_sdb_domain.py < /tmp/args.json
But I bet your JSON file is in the wrong format (missing the ANSIBLE_MODULE_ARGS wrapper, etc.).
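For reference, the stdin payload wraps the module parameters under an ANSIBLE_MODULE_ARGS key, so /tmp/args.json should look something like this (the domain name is a placeholder):
{
    "ANSIBLE_MODULE_ARGS": {
        "sdb_domain_name": "my-test-domain"
    }
}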
To debug your modules you can use test-module script from Ansible hacking pack:
./hacking/test-module -m delete_sdb_domain.py -a "sdb_domain_name=zzz"
I have a few Munin plugins which report stats from an Autonomy database. They all use a small library which scrapes the XML status output for the relevant numbers.
I'm trying to bundle the library and plugins into a Puppet-installable RPM. The actual RPM-building should be straightforward; once I have a distutils-produced distfile I can make it into an RPM based on a .spec file pinched from the Dag or EPEL repos [1]. It's the distutils bit I'm unsure of -- in fact I'm not even sure my library is correctly written for packaging. Here's how it works:
idol7stats.py:
import datetime
import os
import stat
import sys
import time
import urllib
import xml.sax
class IDOL7Stats:
    cache_dir = '/tmp'

    def __init__(self, host, port):
        self.host = host
        self.port = port
        # ...

    def collect(self):
        self.data = self.__parseXML(self.__getXML())

    def total_slots(self):
        return self.data['Service:Documents:TotalSlots']
Plugin code:
from idol7stats import IDOL7Stats
a = IDOL7Stats('db.example.com', 23113)
a.collect()
print a.total_slots()
I guess I want idol7stats.py to wind up in /usr/lib/python2.4/site-packages/idol7stats, or something else in Python's search path. What distutils magic do I need? This:
from distutils.core import setup
setup(name = 'idol7stats',
      author = 'Me',
      author_email = 'me@example.com',
      version = '0.1',
      py_modules = ['idol7stats'])
almost works, except the code goes in /usr/lib/python2.4/site-packages/idol7stats.py, not a subdirectory. I expect this is down to my not understanding the difference between modules/packages/other containers in Python.
So, what's the rub?
[1] Yeah, I could just plonk the library in /usr/lib/python2.4/site-packages using RPM but I want to know how to package Python code.
You need to create a package to do what you want. You'd need a directory named idol7stats containing a file called __init__.py and any other library modules to package. Also, this will affect your scripts' imports; if you put idol7stats.py in a package called idol7stats, then your scripts need to "import idol7stats.idol7stats".
To avoid that, you could just rename idol7stats.py to idol7stats/__init__.py, or you could put this line into idol7stats/__init__.py to "massage" the imports into the way you expect them:
from idol7stats.idol7stats import *
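A minimal sketch of that layout, assuming you take the rename-to-__init__.py route so the plugins' existing from idol7stats import IDOL7Stats keeps working:
idol7stats/
    __init__.py    # formerly idol7stats.py

# setup.py
from distutils.core import setup

setup(name = 'idol7stats',
      author = 'Me',
      author_email = 'me@example.com',
      version = '0.1',
      packages = ['idol7stats'])
With packages instead of py_modules, distutils installs the whole idol7stats directory into site-packages/idol7stats rather than a single idol7stats.py file.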