Communicating with azure container using server-less function - python

I have created a Python serverless function in Azure that gets executed when a new file is uploaded to Azure Blob Storage (BlobTrigger). The function extracts certain properties of the file and saves them in the DB. As the next step, I want this function to copy the same file into a container instance running in ACS and process it there. The result of the processing should be returned to the same Azure function.
This is a hypothetical architecture that I am currently brainstorming. I would like to know whether it is feasible, and I would appreciate some pointers on how to achieve it.
I don't see any ContainerTrigger-style functionality that would let me trigger the container and carry out the next steps.
I have tried the code examples mentioned here, but they do not really perform the tasks that I need: https://github.com/Azure-Samples/aci-docs-sample-python/blob/master/src/aci_docs_sample.py

Based on the comments above, you can consider the following options.
Azure Container Instance
Deploy your container in ACI (Azure Container Instances) and expose an HTTP endpoint from the container, just like any web URL. Trigger the Azure function using the blob storage trigger and then pass the blob file URL to the HTTP endpoint exposed by your container. Process the file there and return the response to the Azure function, just like a normal HTTP request/response.
Alternatively, you can bypass the Azure function completely: trigger your ACI container instance using Logic Apps, process the file, and save the result directly in the database.
If you do use an Azure function, make sure the processing is short-lived, since the function will be terminated after a certain time (5 minutes by default on the Consumption plan). For long-running processing you may have to consider Azure Durable Functions.
The following URL can help you understand this better.
https://github.com/Azure-Samples/aci-event-driven-worker-queue
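The request/response pattern described above could be sketched roughly as follows, using only the Python standard library. The names `build_processing_request` and `process_in_container`, the `/process`-style endpoint, and the JSON payload shape are assumptions made for illustration; they are not part of any Azure API. Inside a blob-triggered function, the blob URL would typically come from the trigger's input object, and the container would need read access to that URL (for example through a SAS token), which is also assumed here.

```python
import json
import urllib.request


def build_processing_request(container_url: str, blob_url: str) -> urllib.request.Request:
    """Build a POST request asking the container to process one blob.

    The {"blob_url": ...} payload is a hypothetical contract between the
    function and the container; adapt it to whatever your container expects.
    """
    payload = json.dumps({"blob_url": blob_url}).encode("utf-8")
    return urllib.request.Request(
        container_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def process_in_container(container_url: str, blob_url: str, timeout_s: float = 120.0) -> dict:
    """Send the blob URL to the container and return its JSON reply.

    The timeout should stay well below the function's own execution limit.
    """
    request = build_processing_request(container_url, blob_url)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `process_in_container` from the blob-triggered function keeps the function itself short: it only waits for the HTTP reply, while the heavy work happens in the container.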

Related

Save new blobs files using azure durable functions Python

I have a project that will be containerized in Docker. It consists of a DurableFunctionsHttpStart, an orchestrator, and an activity function. The activity function is a web scraper that, once the data has been downloaded, saves a CSV into Azure Blob Storage. When I tested my function, I got the following error inside the Docker container:
HttpResponseError: This request is not authorized to perform this operation using this resource type.
I tried using the function binding with an out blob, as below:
and tried using the Python SDK, as shown below:
This is the host.json of my activity function:
The problem appears only since I started using the durable function; with a normal HttpTrigger I don't have any issue.

Strategies to run user-submitted code on AWS

I'm building an application that can run user-submitted python code. I'm considering the following approaches:
Spinning up a new AWS lambda function for each user's request to run the submitted code in it. Delete the lambda function afterwards. I'm aware of AWS lambda's time limit - so this would be used to run only small functions.
Spinning up a new EC2 machine to run a user's code. One instance per user. Keep the instance running while the user is still interacting with my application. Kill the instance after the user is done.
Same as the 2nd approach but also spin up a docker container inside the EC2 instance to add an additional layer of isolation (is this necessary?)
Are there any security vulnerabilities I need to be aware of? Will the user be able to do anything if they gain access to environment variables in their own lambda function/ec2 machine? Are there any better solutions?
Any code which you run on AWS Lambda will have the capabilities of the associated function. Be very careful what you supply.
Even logging and metrics access can be manipulated to incur additional costs.
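On the environment-variable question specifically: in Lambda the function's credentials are exposed to the code through environment variables, so anything the submitted code can read there it can also use. A minimal sketch of running submitted code in a child process with an empty environment and a time limit is shown below. The helper name `run_untrusted` is hypothetical, and this is not a sandbox: it does not restrict network, filesystem, or (on EC2) the instance metadata service, so it would only be one layer among several.

```python
import subprocess
import sys


def run_untrusted(source: str, timeout_s: float = 5.0) -> subprocess.CompletedProcess:
    """Run submitted Python source in a child interpreter.

    - env={} starts the child without the parent's environment variables
      (assumes a POSIX host; some platforms need a few variables to start).
    - "-I" runs Python in isolated mode (ignores PYTHON* variables and the
      user site directory).
    - timeout raises subprocess.TimeoutExpired if the code runs too long.
    """
    return subprocess.run(
        [sys.executable, "-I", "-c", source],
        env={},
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

For example, `run_untrusted("print(1 + 1)").stdout` holds the child's output, and code that loops forever is stopped after `timeout_s` seconds.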

I want to trigger a python script using a cloud function whenever a specified file is created on the google cloud storage

One CSV file is uploaded to Cloud Storage every day around 02:00, but sometimes, due to a job failure or system crash, the upload happens very late. So I want to create a Cloud Function that triggers my Python BigQuery load script whenever the file is uploaded to the bucket.
file_name : seller_data_{date}
bucket name : sale_bucket/
The question does not describe the desired use case, or any issues the OP has faced, in enough detail. However, here are a few possible approaches that you might choose from depending on the use case.
The simple way: Cloud Functions with Storage trigger.
This is probably the simplest and most efficient way of running a Python function whenever a file gets uploaded to your bucket.
The most basic tutorial is this.
The hard way: App Engine with a few tricks.
Host a basic Flask application on GAE (Standard or Flex) with an endpoint dedicated to checking that the file exists, downloading the object, manipulating it, and then doing something with the result.
This route can act as a custom HTTP-triggered function: once it receives a GET (or POST) request (from a simple curl call, a visit from the browser, a Pub/Sub event, or even another Cloud Function), it downloads the object into the /tmp dir, processes it, and then does something with the result.
The small benefit with GAE over CF is that you can set a minimum of one instance to stay always alive which means you will not have the cold starts, or risk the request timing out before the job is done.
The brutal/overkill way: Cloud Run.
This is a similar approach to App Engine, but with Cloud Run you'll also need to work with a Dockerfile. Keep in mind that Cloud Run scales down to zero when there's no usage, along with other minor things that apply to building any application on Cloud Run.
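For the simple way (a Cloud Function with a Storage trigger), a minimal sketch might look like the following. It assumes the background-function style entry point, where the event dictionary carries the `bucket` and `name` of the uploaded object. `load_to_bigquery` is a hypothetical placeholder for the existing load script, and the `seller_data_` prefix is taken from the file name in the question.

```python
EXPECTED_PREFIX = "seller_data_"  # from the question's file_name pattern


def should_load(object_name: str) -> bool:
    """Return True only for the daily file we care about."""
    return object_name.startswith(EXPECTED_PREFIX)


def load_to_bigquery(bucket: str, object_name: str) -> None:
    """Hypothetical placeholder: call the existing BigQuery load script here."""
    print(f"would load gs://{bucket}/{object_name}")


def on_file_uploaded(event: dict, context=None) -> bool:
    """Entry point for an object-finalize Storage trigger.

    The trigger fires for every object written to the bucket, so other
    files are filtered out by name. The return value is only there to
    make the sketch easy to check; the platform ignores it.
    """
    bucket = event["bucket"]
    name = event["name"]
    if not should_load(name):
        return False
    load_to_bigquery(bucket, name)
    return True
```

Because the function runs whenever the upload actually finishes, a late upload caused by a failed job is handled without any scheduling logic.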
########################################
For all the above approaches, some additional things you might want to achieve are the same:
a) Downloading the object and doing some processing on it:
You will have to download it to the /tmp directory, as that is the directory both GAE and CF provide for storing temporary files. Cloud Run is a bit different here, but let's not go deep into it as it's overkill by itself.
However, keep in mind that a large file might cause high memory usage.
And ALWAYS clean that directory after you have finished with the file. Also, when opening a file, always use "with open(...) as ...", since it makes sure the file is closed afterwards.
b) Downloading the latest object in the bucket:
This is a bit tricky and needs some extra custom code. There are many ways to achieve it, but the one I use (though always paying close attention to memory usage) is this: when I create the object that I upload to the bucket, I get the current time and use a regex to transform it into a name like results_22_6.
As a result, when I list the objects from my other script, they are already listed in ascending order, so the last element in the list is the latest object.
So basically what I do then is check whether the filename I have in /tmp is the same as the name of the last object in the bucket listing. If yes, do nothing; if no, delete the old file and download the latest one from the bucket.
This might not be optimal, but for me it's kind of preferable.
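Points a) and b) above can be sketched together as follows. The function names and the `download` callable are hypothetical stand-ins for whatever storage client is used. One assumption worth stating loudly: the "last element is the latest" trick relies on name order matching time order, so the sketch uses zero-padded names such as results_2021_06_22 (with unpadded names, results_22_10 would sort before results_22_6).

```python
import os
import tempfile
from typing import Callable, Iterable, Optional


def latest_object_name(names: Iterable[str]) -> Optional[str]:
    """Return the lexicographically last name, or None for an empty listing."""
    ordered = sorted(names)
    return ordered[-1] if ordered else None


def refresh_local_copy(
    names: Iterable[str],
    cache_dir: str,
    download: Callable[[str, str], None],
) -> Optional[str]:
    """Keep only the latest object in cache_dir and return its local path.

    `download(object_name, local_path)` is a hypothetical callable that
    fetches one object. Assumes flat object names and a cache_dir that
    contains only files written by this function.
    """
    latest = latest_object_name(names)
    if latest is None:
        return None
    target = os.path.join(cache_dir, latest)
    if os.path.exists(target):
        return target  # already have the latest file, nothing to do
    for stale in os.listdir(cache_dir):
        os.remove(os.path.join(cache_dir, stale))  # clean up old copies
    download(latest, target)
    return target
```

In a function or App Engine handler, `cache_dir` would be a directory under /tmp; using `tempfile.TemporaryDirectory()` as a context manager is one way to make sure it is removed when processing ends.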

Azure Function App Python terminates after a while

I am using a blob trigger with a Python function on an Azure Functions app with the Consumption plan. I know it is in preview, but it is a bummer that the app terminates after a while of no usage, and it does not come back to life when a new blob is added.
The function works perfectly locally.
Is there a way to keep the function app alive?
I did not find the right way to do it, but I added a second, HTTP-triggered function to start the app, and that works. So my current process is: trigger the HTTP function and then upload the blob.
I also tried a cron trigger, but that didn't fire either.

How to show Spark application's percentage of completion on AWS EMR (and Boto3)?

I am running a Spark step on AWS EMR; the step is added to EMR through Boto3. I would like to return a percentage of completion of the task to the user. Is there any way to do this?
I was thinking of calculating this percentage from the number of completed Spark stages. I know this won't be very precise, as stage 4 may take twice as long as stage 5, but I am fine with that.
Is it possible to access this information with Boto3?
I checked the list_steps method (here are the docs), but the response only tells me whether the step is running, without any other information.
DISCLAIMER: I know nothing about AWS EMR and Boto3
I will like to return to the user a percentage of completion of the task, is there anyway to do this?
Any way? Perhaps. Just register a SparkListener and intercept events as they come. That's how web UI works under the covers (which is the definitive source of truth for Spark applications).
Use spark.extraListeners property to register a SparkListener and do whatever you want with the events.
Quoting the official documentation's Application Properties:
spark.extraListeners A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called.
You could also consider REST API interface:
In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available for both running applications, and in the history server. The endpoints are mounted at /api/v1. Eg., for the history server, they would typically be accessible at http://<server-url>:18080/api/v1, and for a running application, at http://localhost:4040/api/v1.
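As a rough sketch of the stage-counting idea from the question on top of this REST API: the stages endpoint under /api/v1/applications/[app-id]/stages returns a list of stage objects with a `status` field. The function names below are hypothetical, and the sketch assumes the driver UI (or history server) is reachable from wherever the code runs, which on EMR may require a tunnel or proxy. Note that only stages of jobs submitted so far are listed, so the figure can drop when a new job starts.

```python
import json
import urllib.request


def completion_percentage(stages: list) -> float:
    """Percentage of known stages with status COMPLETE.

    Skipped stages are left out of the total; this is a coarse estimate,
    since stages can differ widely in duration.
    """
    counted = [s for s in stages if s.get("status") != "SKIPPED"]
    if not counted:
        return 0.0
    done = sum(1 for s in counted if s.get("status") == "COMPLETE")
    return 100.0 * done / len(counted)


def fetch_stages(base_url: str, app_id: str, timeout_s: float = 10.0) -> list:
    """Fetch the stage list, e.g. base_url='http://localhost:4040'."""
    url = f"{base_url}/api/v1/applications/{app_id}/stages"
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

A caller would poll `fetch_stages` periodically and pass the result to `completion_percentage` to report progress to the user.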
This is not supported at the moment and I don't think it will be anytime soon.
You'll just have to follow the application logs the old-fashioned way. So maybe consider formatting your logs so that you can tell what has actually finished.
