I'm using Azure Function Apps with Python. I have two dozen function apps that all use a Postgres DB and Custom Vision. All function apps are set up as HttpTriggers. Right now, when a function is triggered, a new database handler (or Custom Vision handler) object is created, used, and discarded when the function call is done.
It seems very counterproductive to instantiate new objects on every single request that comes in. Is there a way to instantiate shared objects once and then pass them to a function when it is called?
In general, Azure Functions are intended to be stateless and not share objects from one invocation to the next. However, there are some exceptions.
Sharing Connection Objects
The Azure docs describe the Improper Instantiation antipattern, which deals with connection objects that are meant to be opened once by an application and then reused again and again, rather than created and destroyed on every request.
There are some things to keep in mind for this to work for you, mainly:
The key element of this antipattern is repeatedly creating and destroying instances of a shareable object. If a class is not shareable (not thread-safe), then this antipattern does not apply.
They have some walkthroughs there that will probably help you. Since your question is fairly generic, the best I can do is recommend you read through it and see if that will help you.
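To make this concrete, here is a minimal sketch of the "create once, reuse" pattern as it typically looks in a Python function app: the shareable client lives at module level, so it survives across invocations handled by the same worker process. `DbHandler` is a hypothetical stand-in for your own database or Custom Vision wrapper; swap in your real (thread-safe) class.

```python
# shared_clients.py -- a minimal sketch of the "create once, reuse" pattern.
# DbHandler is a placeholder for your own database / Custom Vision wrapper;
# it must be thread-safe to be shared like this.

class DbHandler:
    """Placeholder for an expensive-to-create, shareable client."""
    instances_created = 0

    def __init__(self):
        DbHandler.instances_created += 1  # e.g. open a connection pool here

    def query(self, key):
        return f"result-for-{key}"

_handler = None

def get_handler():
    """Return the shared handler, creating it only on first use.

    In Azure Functions, module-level state like this survives across
    invocations handled by the same worker process (but not across
    workers or cold starts), so the expensive setup runs once per worker.
    """
    global _handler
    if _handler is None:
        _handler = DbHandler()
    return _handler
```

Each HttpTrigger function would then call `get_handler()` instead of constructing its own handler, so all invocations in a warm worker share one instance.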
Durable Functions
The alternative is to consider Durable Functions instead of standard ones. They are designed to be able to pass objects between functions, making them not quite stateless.
Durable Functions is an advanced extension for Azure Functions that isn't appropriate for all applications. This article assumes that you have a strong familiarity with concepts in Azure Functions and the challenges involved in serverless application development.
Related
I am a C# developer who has been tasked with converting some deployed C# Azure functions (mostly webhooks / SB) to Python.
I am struggling with the concept of the Python equivalent of dependency injection.
Take for example an API client class that makes calls continuously to some 3rd-party API to push and pull data. In .NET, if I had a webhook function that needed to use this API client, I would register a singleton service in the Startup.cs class and inject it into my Azure webhook function. This is advantageous because I can handle the token refreshing and whatnot inside the service class itself and keep the token stored in memory, instead of having to re-create an instance of the API client class each time the webhook is fired.
How do I do this in Python? Or what is the right way of doing something similar in a similar environment (Azure Functions), where we store tokens in memory AND create a service once and use that same service across multiple functions?
Thanks
While not widely used because of the way Python is designed, dependency injection can be implemented in Python. There is a package called dependency-injector that helps with implementing DI.
You should not store the tokens in the function's local memory. Functions are stateless and serverless, which means we have neither control over nor knowledge of the servers our functions run on, so we cannot know how long anything will stay in memory. Instead, once you acquire the tokens, add them to Azure Key Vault so you can retrieve them whenever needed.
You can share a Python class among the functions: just place it in a separate folder and import it in your functions.
from ..common_files import test_file
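As a hedged sketch of what such a shared file could look like, here is a module-level "singleton" service that caches a token in memory, roughly analogous to the C# singleton-injection setup described in the question. All names are illustrative, and note the caveat above: in-memory tokens only survive within a warm worker process, so Key Vault remains the durable store.

```python
# common_files/api_client.py -- illustrative sketch, names are made up.
import time

class ApiClient:
    """A service created once per worker and shared by every function."""

    def __init__(self, token_ttl=3600):
        self._token = None
        self._expires_at = 0.0
        self._token_ttl = token_ttl

    def _refresh_token(self):
        # In a real client this would call the 3rd-party auth endpoint
        # (or fetch the stored token from Azure Key Vault).
        self._token = f"token-{int(time.time())}"
        self._expires_at = time.time() + self._token_ttl

    def get_token(self):
        """Return the cached token, refreshing only when it has expired."""
        if self._token is None or time.time() >= self._expires_at:
            self._refresh_token()
        return self._token

# Created once at import time; every function that does
# `from ..common_files.api_client import client` shares this instance
# within the same worker process.
client = ApiClient()
```

Each function then imports `client` and calls `client.get_token()`, so token refresh logic lives in one place, as with the .NET singleton service.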
Refer to this article by Szymon Miks for examples of dependency injection in Python.
Refer to this documentation on sharing files between functions.
Refer to the documentation on retrieving keys from Azure Key Vault.
I've recently started working on a project in Azure Functions that has two components.
Both components use the same shared code, however the plans these components use seem to differ.
An API endpoint for web users for controlling objects saved in a CosmosDB - Would benefit from a Service App plan because of the cold/warm issues with Consumption plan.
A backend scheduled process that uses these objects - Would benefit from a Consumption plan, seeing as it could use automatic scaling, and exact execution time is not so important.
I thought of using the Premium plan to solve both issues, but it seems pretty expensive (my workload is pretty low, and from the calculator on Azure's page it looks like around $150 a month by default - correct me if I'm wrong).
I was wondering if there is a way to split a function app into two plans, or have two function apps share code.
Thanks!
I was wondering if there is a way to split a function app into two plans, or have two function apps share code.
The first option is not something that can be done. The second option is the one to go for here. You can have the shared code in a separate class library project and reference it from both Function Apps.
A class library defines types and methods that are called by an application. If the library targets .NET Standard 2.0, it can be called by any .NET implementation (including .NET Framework) that supports .NET Standard 2.0. If the library targets .NET 5, it can be called by any application that targets .NET 5.
When you create a class library, you can distribute it as a NuGet package or as a component bundled with the application that uses it.
More information: Tutorial: Create a .NET class library using Visual Studio
EDIT:
For Python, you can create modules.
Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a module; definitions from a module can be imported into other modules or into the main module (the collection of variables that you have access to in a script executed at the top level and in calculator mode).
More info: 6. Modules
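As a small illustration of the module approach, shared logic can live in one file that both function apps reference (names here are illustrative; in practice you might also package the module and install it into each app via requirements.txt so the two apps stay in sync):

```python
# shared/validators.py -- a tiny module both function apps could use.
# The API app validates IDs coming from web users; the scheduled backend
# validates IDs it reads from CosmosDB. One definition, two consumers.

def is_valid_id(value: str) -> bool:
    """Shared validation rule: alphanumeric, at most 32 characters."""
    return value.isalnum() and len(value) <= 32
```

Each app then simply does `from shared.validators import is_valid_id`, so a rule change is made once and picked up by both plans.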
I am running a Spark step on AWS EMR; this step is added to EMR through Boto3. I would like to return to the user a percentage of completion of the task. Is there any way to do this?
I was thinking of calculating this percentage from the number of completed Spark stages. I know this won't be too precise, as stage 4 may take twice as long as stage 5, but I am fine with that.
Is it possible to access this information with Boto3?
I checked the list_steps method (here are the docs), but the response only tells me whether the step is running, without any other information.
DISCLAIMER: I know nothing about AWS EMR and Boto3
I would like to return to the user a percentage of completion of the task. Is there any way to do this?
Any way? Perhaps. Just register a SparkListener and intercept events as they come. That's how the web UI works under the covers (and the web UI is the definitive source of truth for Spark applications).
Use spark.extraListeners property to register a SparkListener and do whatever you want with the events.
Quoting the official documentation's Application Properties:
spark.extraListeners A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called.
You could also consider REST API interface:
In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available for both running applications and in the history server. The endpoints are mounted at /api/v1. E.g., for the history server they would typically be accessible at http://&lt;server-url&gt;:18080/api/v1, and for a running application at http://localhost:4040/api/v1.
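Since the question is about a stage-count percentage, here is a rough sketch of that idea against the REST API: poll the `/applications` and `/applications/{id}/stages` endpoints of Spark's monitoring API and report the fraction of stages whose status is `COMPLETE`. The base URL below assumes the driver UI is reachable on the default port 4040; on EMR you would point it at the master node (or the history server on 18080).

```python
# progress.py -- hedged sketch: estimate Spark job progress as the
# fraction of completed stages, via Spark's monitoring REST API.
import json
from urllib.request import urlopen

def percent_complete(stages):
    """Estimate progress from the JSON list returned by the /stages endpoint.

    Stage-count progress is coarse (stages differ in duration, as noted
    in the question), but it is cheap and easy to compute.
    """
    if not stages:
        return 0.0
    done = sum(1 for s in stages if s.get("status") == "COMPLETE")
    return 100.0 * done / len(stages)

def fetch_progress(base_url="http://localhost:4040/api/v1"):
    """Poll a running application's REST API and return a percentage."""
    apps = json.load(urlopen(f"{base_url}/applications"))
    app_id = apps[0]["id"]  # assumes a single running application
    stages = json.load(urlopen(f"{base_url}/applications/{app_id}/stages"))
    return percent_complete(stages)
```

This sidesteps Boto3 entirely: EMR's step API only reports coarse step states, so finer-grained progress has to come from Spark itself.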
This is not supported at the moment and I don't think it will be anytime soon.
You'll just have to follow the application logs the old-fashioned way. So maybe consider formatting your logs in a way that lets you tell what has actually finished.
I am working on a system where a bunch of modules connect to a MS SqlServer DB to read/write data. Each of these modules are written in different languages (C#, Java, C++) as each language serves the purpose of the module best.
My question however is about the DB connectivity. As of now, all these modules use the language-specific SQL connectivity API to connect to the DB. Is this a good way of doing it?
Or alternatively, is it better to have a Python (or some other scripting language) script take over the responsibility of connecting to the DB? The modules would then send input parameters and the name of a stored procedure to the Python script, and the script would run it on the database and send the output back to the respective module.
Are there any advantages of the second method over the first ?
Thanks for helping out!
If we assume that each language you use will have an optimized set of classes to interact with databases, then there shouldn't be a real need to pass all database calls through a centralized module.
Using a "middle-ware" for database manipulation does offer a very significant advantage. You can control, monitor and manipulate your database calls from a central and single location. So, for example, if one day you wake up and decide that you want to log certain elements of the database calls, you'll need to apply the logical/code change only in a single piece of code (the middle-ware). You can also implement different caching techniques using middle-ware, so if the different systems share certain pieces of data, you'd be able to keep that data in the middle-ware and serve it as needed to the different modules.
The above is a very advanced edge-case and it's not commonly used in small applications, so please evaluate the need for the above in your specific application and decide if that's the best approach.
Doing things the way you do them now is fine (if we follow the above assumption) :)
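To illustrate the middleware advantage described above, here is a hedged sketch of a thin layer through which every module's stored-procedure call would pass, giving a single spot for logging and caching. The `execute` callable is injected so the sketch works with any driver (pyodbc, a JDBC bridge, etc.); everything here is illustrative, not a fixed API.

```python
# db_middleware.py -- sketch of a central choke point for DB calls.
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("db-middleware")

class DbMiddleware:
    def __init__(self, execute):
        # execute: a callable (proc_name, args) -> result, supplied by
        # whatever driver actually talks to SQL Server.
        self._execute = execute
        self._cache = {}

    def call_proc(self, proc_name, args=(), cacheable=False):
        """Run a stored procedure, logging every call and caching reads."""
        key = (proc_name, tuple(args))
        if cacheable and key in self._cache:
            log.info("cache hit: %s%s", proc_name, args)
            return self._cache[key]
        log.info("executing: %s%s", proc_name, args)
        result = self._execute(proc_name, args)
        if cacheable:
            self._cache[key] = result
        return result
```

The point of the sketch is the shape, not the code: adding logging, metrics, or caching later means touching this one class instead of the C#, Java, and C++ modules separately.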
I am using a Python module (PyCLIPS) and Django 1.3.
I want to develop a thread-safe class that implements the Object Pool and Singleton patterns and that has to be shared between requests in Django.
For example, I want to do the following:
A request gets the object with some ID from the pool, does something with it, pushes it back to the pool, and then sends a response with the object's ID.
Another request that has the object's ID gets the object with the given ID from the pool and repeats the steps above.
But the object's state has to be kept while it is in the pool, for as long as the server is running.
It should be like a Singleton Session Bean in Java EE.
How should I do it? Is there something I should read?
Update:
I can't store objects from the pool in a database, because these objects are wrappers around a C library that provides the API for the CLIPS expert system engine.
Thanks!
Well, I think a different angle is necessary here. Django is not like Java: the solution should be tailored for a multi-process environment, not a multi-threaded one.
Django has no immediate equivalent of a singleton session bean.
That said, I see no reason your description does not fit a classic database model. You want to save per-object data, which should always go in the DB layer.
Otherwise, you can always save stuff on the session, which Django provides for both logged-in users as well as for anonymous ones - see the docs on Django sessions.
Usage of any other pattern you might be familiar with from a Java environment will ultimately fail, considering the vast difference between running a Java web container, and the Python/Django multi-process environment.
Edit: well, considering these objects are not native to your app but rather accessed via a third-party library, it does complicate things. My gut feeling is that these objects should not be handled by the web layer but rather by some sort of external service which you can access from a multi-process environment. As Daniel mentioned, you can always throw them in the cache (if said objects are pickle-able). But it feels as if these objects do not belong in the web tier.
Assuming the object cannot be pickled, you will need to create an app to manage the object and all of the interactions that need to happen against it. Probably the easiest implementation would be to create a single-process WSGI app (on a different port) that exposes an API to do all of the operations you need. Whether you use a RESTful API or form posts is up to your personal preference.
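A minimal sketch of that suggestion, using only the standard library: a single-process WSGI app that owns the un-picklable objects and exposes operations over HTTP, so Django workers never touch them directly. The dict entries below are stand-ins for your CLIPS wrapper objects; the URL scheme is made up for illustration.

```python
# pool_service.py -- one process owns the pool; Django talks to it over HTTP.
from wsgiref.simple_server import make_server

# Lives only in this one process, so Django being multi-process is irrelevant.
_pool = {}
_next_id = 0

def handle(path):
    """Dispatch a request path like '/create' or '/get/3'."""
    global _next_id
    parts = path.strip("/").split("/")
    if parts[0] == "create":
        _next_id += 1
        _pool[_next_id] = {"state": "new"}  # stand-in for a CLIPS environment
        return str(_next_id)
    if parts[0] == "get" and len(parts) == 2:
        obj = _pool.get(int(parts[1]))
        return repr(obj) if obj else "not found"
    return "unknown"

def app(environ, start_response):
    body = handle(environ["PATH_INFO"]).encode()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]

# To run the service (wsgiref serves one request at a time, so no
# locking is needed around _pool):
#   make_server("127.0.0.1", 8001, app).serve_forever()
```

A real version would add mutation endpoints for whatever CLIPS operations you need, but the key property is already visible: all state sits in one process, keyed by the IDs the requests pass around.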
Are these database objects? Because if so, the db itself is really the pool, and there's no need to do anything special - each request can independently load the instance from the db, modify it, and save it back.
Edit after comment: Well, the biggest problem is that a production web server environment is likely to be multi-process, so any global variables (i.e. the pool) are not shared between processes. You will need to store them somewhere that's globally accessible. A shot in the dark, but are they serializable using Pickle? If so, then perhaps memcache might work.
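A quick way to answer the "are they picklable?" question: if your pool objects survive a pickle round-trip, a shared cache (e.g. memcache behind Django's cache framework, via `cache.set`/`cache.get`, which pickles values for you) can hold them between processes. This standalone sketch uses pickle directly, with a plain dict standing in for the cache; `PooledObject` is an illustrative stand-in for your real objects.

```python
# pickle round-trip check -- if this works for your real objects,
# a memcache-backed cache can store them between processes.
import pickle

class PooledObject:
    """Stand-in for an object you'd keep in the pool."""
    def __init__(self, obj_id):
        self.obj_id = obj_id
        self.state = {"counter": 0}

def store(cache, obj):
    cache[f"pool:{obj.obj_id}"] = pickle.dumps(obj)

def load(cache, obj_id):
    return pickle.loads(cache[f"pool:{obj_id}"])

# Usage: a plain dict plays the role of memcache here.
fake_cache = {}
original = PooledObject(7)
original.state["counter"] = 3
store(fake_cache, original)
restored = load(fake_cache, 7)
```

Note that C-extension wrappers like the CLIPS objects from the update above typically fail this round-trip, which is exactly why the external-service approach was suggested instead.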