I've found that my Azure Function is unavailable for approx. 3-5 mins when publishing changes. Any triggers are not recognized nor captured in App Insights.
This is not a big deal at the moment, but if this Function ever requires increased uptime, how can I ensure it will be available during code changes?
As mentioned in the comment, you just need to use the Azure Functions deployment slots, when you need to publish the changes, swap your function to the staging slot, after publishing, swap it back to the production slot.
Related
I'm building an application that can run user-submitted python code. I'm considering the following approaches:
Spinning up a new AWS lambda function for each user's request to run the submitted code in it. Delete the lambda function afterwards. I'm aware of AWS lambda's time limit - so this would be used to run only small functions.
Spinning up a new EC2 machine to run a user's code. One instance per user. Keep the instance running while the user is still interacting with my application. Kill the instance after the user is done.
Same as the 2nd approach but also spin up a docker container inside the EC2 instance to add an additional layer of isolation (is this necessary?)
Are there any security vulnerabilities I need to be aware of? Will the user be able to do anything if they gain access to environment variables in their own lambda function/ec2 machine? Are there any better solutions?
Any code which you run on AWS Lambda will have the capabilities of the associated function. Be very careful what you supply.
Even logging and metrics access can be manipulated to incur additional costs.
One csv file is uploaded to the cloud storage everyday around 0200 hrs but sometime due to job fail or system crash file upload happens very late. So I want to create a cloud function that can trigger my python bq load script whenever the file is uploaded to the storage.
file_name : seller_data_{date}
bucket name : sale_bucket/
The question lacks enough description of the desired usecase and any issues the OP has faced. However, here are a few possible approaches that you might chose from depending on the usecase.
The simple way: Cloud Functions with Storage trigger.
This is probably the simplest and most efficient way of running a Python function whenever a file gets uploaded to your bucket.
The most basic tutorial is this.
The hard way: App Engine with a few tricks.
Having a basic Flask application hosted on GAE (Standard or Flex), with an endpoint specifically to handle this chek of the files existing, download object, manipulate it and then do something.
This route can act as a custom HTTP triggered function, where once it receives a request (could be from a simple curl request, visit from the browser, PubSub event, or even another Cloud Function).
Once it receives a GET (or POST) request, it downloads the object into the /tmp dir, process it and then do something.
The small benefit with GAE over CF is that you can set a minimum of one instance to stay always alive which means you will not have the cold starts, or risk the request timing out before the job is done.
The brutal/overkill way: Clour Run.
Similar approach to App Engine, but with Cloud Run you'll also need to work with the Dockerfile, have in mind that Cloud Run will scale down to zero when there's no usage, and other minor things that apply to building any application on Cloud Run.
########################################
For all the above approaches, some additional things you might want to achieve are the same:
a) Downloading the object and doing some processing on it:
You will have to download it to the /tmp directory as it's the directory for both GAE and CF to store temporary files. Cloud Run is a bit different here but let's not get deep into it as it's an overkill byitself.
However, keep in mind that if your file is large you might cause a high memory usage.
And ALWAYS clean that directory after you have finished with the file. Also when opening a file always use with open ... as it will also make sure to not keep files open.
b) Downloading the latest object in the bucket:
This is a bit tricky and it needs some extra custom code. There are many ways to achieve it, but the one I use (always tho paying close attention to memory usage), is upon the creation of the object I upload to the bucket, I get the current time, use Regex to transform it into something like results_22_6.
What happens now is that once I list the objects from my other script, they are already listed in an accending order. So the last element in the list is the latest object.
So basically what I do then is to check if the filename I have in /tmp is the same as the name of the object[list.length] in the bucket. If yes then do nothing, if no then delete the old one and download the latest one in the bucket.
This might not be optimal, but for me it's kinda preferable.
I would like developers to give a comment about :) thank you
I am building an application that needs an exact timestamp for multiple devices at the same time. Device time cannot be used, because they are not at the same time.
Asking time from the server is Ok but it is slow and depend on connection speed, and if making this as serverless/regioness then serverside timestamp cannot be used because of time-zones?
On demo application, this works fine with one backend at pointed region but still it needs to respond faster to clients.
Here is simply image to see this in another way?
In the image, there are no IDs etc but the main idea is there
I am planning to build this on nodejs server and later when more timing calculations translate it to python/Django pack...
Thank you
Using server time is perfectly fine and timezone difference due to different regions can be easily adjusted in code. If you are using a cloud provider, they typically group their services by region instead of distributing them worldwide, so adjusting the timezone shouldn't be a problem. Anyway, check the docs for the cloud service you are planning to use, they usually clearly document the geographic distribution.
Alternatively, you could consider implementing (or using) Network Time Protocol (NTP) on your devices. It's a fairly simple protocol for synchronizing clocks, if you need a source code example take a look at the source for node-ntp-client , a javascript implementation.
I am running a Spark step on AWS EMR, this step is added to EMR through Boto3, I will like to return to the user a percentage of completion of the task, is there anyway to do this?
I was thinking to calculate this percentage with the number of completed stages of Spark, I know this won't be too precise, as the stage 4 may take double time than stage 5 but I am fine with that.
Is it possible to access this information with boto3?
I checked the method list_steps (here are the docs) but in the response I am getting only if its running without other information.
DISCLAIMER: I know nothing about AWS EMR and Boto3
I will like to return to the user a percentage of completion of the task, is there anyway to do this?
Any way? Perhaps. Just register a SparkListener and intercept events as they come. That's how web UI works under the covers (which is the definitive source of truth for Spark applications).
Use spark.extraListeners property to register a SparkListener and do whatever you want with the events.
Quoting the official documentation's Application Properties:
spark.extraListeners A comma-separated list of classes that implement SparkListener; when initializing SparkContext, instances of these classes will be created and registered with Spark's listener bus. If a class has a single-argument constructor that accepts a SparkConf, that constructor will be called; otherwise, a zero-argument constructor will be called.
You could also consider REST API interface:
In addition to viewing the metrics in the UI, they are also available as JSON. This gives developers an easy way to create new visualizations and monitoring tools for Spark. The JSON is available for both running applications, and in the history server. The endpoints are mounted at /api/v1. Eg., for the history server, they would typically be accessible at http://:18080/api/v1, and for a running application, at http://localhost:4040/api/v1.
This is not supported at the moment and I don't think it will be anytime soon.
You'll just have to follow application logs the old fashioned way. So maybe consider formatting your logs in a way you know what has actually finished.
Does anybody know, how GAE limit Python interpreter? For example, how they block IO operations, or URL operations.
Shared hosting also do it in some way?
The sandbox "internally works" by them having a special version of the Python interpreter. You aren't running the standard Python executable, but one especially modified to run on Google App engine.
Update:
And no it's not a virtual machine in the ordinary sense. Each application does not have a complete virtual PC. There may be some virtualization going on, but Google isn't saying exactly how much or what.
A process has normally in an operating system already limited access to the rest of the OS and the hardware. Google have limited this even more and you get an environment where you are only allowed to read the very specific parts of the file system, and not write to it at all, you are not allowed to open sockets and not allowed to make system calls etc.
I don't know at which level OS/Filesystem/Interpreter each limitation is implemented, though.
From Google's site:
An application can only access other
computers on the Internet through the
provided URL fetch and email
services. Other computers can only
connect to the application by making
HTTP (or HTTPS) requests on the
standard ports.
An application cannot write to the
file system. An app can read files,
but only files uploaded with the
application code. The app must use
the App Engine datastore, memcache or
other services for all data that
persists between requests.
Application code only runs in
response to a web request, a queued
task, or a scheduled task, and must
return response data within 30
seconds in any case. A request
handler cannot spawn a sub-process or
execute code after the response has
been sent.
Beyond that, you're stuck with Python 2.5, you can't use any C-based extensions, more up-to-date versions of web frameworks won't work in some cases (Python 2.5 again).
You can read the whole article What is Google App Engine?.
I found this site
that has some pretty decent information. What exactly are you trying to do?
Here
FRESH!
Look here: http://code.google.com/appengine/docs/python/runtime.html
Your IO Operations are limited as follows (beyond disabled modules):
App Engine records how much of each resource an application uses in a calendar day, and considers the resource depleted when this amount reaches the app's quota for the resource. A calendar day is a period of 24 hours beginning at midnight, Pacific Time. App Engine resets all resource measurements at the beginning of each day, except for Stored Data which always represents the amount of datastore storage in use.
When an app consumes all of an allocated resource, the resource becomes unavailable until the quota is replenished. This may mean that your app will not work until the quota is replenished.
An application can determine how much CPU time the current request has taken so far by calling the Quota API. This is useful for profiling CPU-intensive code, and finding places where CPU efficiency can be improved for greater cost savings. You can measure the CPU used for the entire request, or call the API before and after a section of code then subtract to determine the CPU used between those two points.
Resource| Free Default Quota| Billing Enabled Default Quota
Blobstore |Stored Data| 1 GB| 1 GB free; no maximum
Resource |Billing Enabled| Default Quota
Daily Limit| Maximum Rate
Blobstore API Calls |140,000,000 calls| 72,000 calls/minute
Hmm my table isn't that good, but hopefully still readable.
EDIT: OK, I understand. But sir, you did not have to use the "f" word. :) And you know, it's kinda like the whole 'teach a man to fish' scenario. Google is who I always ask and that's why I'm answering questions here for fun.
EDIT AGAIN: OK that made more sense before the comment was tooked. So I went and answered the question a little more. I hope it helps.
IMO it's not a standard python, but a version specifically patched for app engine. In other words you can think more or less like an "higher level" VM that however is not emulating x86 instructions but python opcodes (if you don't know what they are try writing a small function named "foo" and the doing "import dis; dis.dis(foo)" you will see the python opcodes that the compiler produced).
By patching python you can impose to it whatever limitations you like. Of course you've however to forbid the use of user supplied C/C++ extension modules as a C/C++ module will have access to everything the process can access.
Using such a virtual environment you're able to run safely python code without the need to use a separate x86 VM for every instance.