I have an Apache Beam program in Python. To save on running costs, I would like to execute this Python script using a managed service instead of on an EC2 instance.
The script runs anywhere from 50 seconds to over 60 minutes.
If this were GCP, I would think of Google App Engine. However, on AWS, I am not sure whether I should use AWS Elastic Beanstalk or AWS Batch.
Generally, which service is best for running a long-running script on AWS?
Thanks,
Yu
AWS Batch is recommended for batch processing at any scale, whereas AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications.
Also, there is no additional charge for either service (AWS Batch or AWS Elastic Beanstalk); you pay only for the AWS resources (e.g. EC2 instances or AWS Lambda functions) you create to store and run your application.
In your case, you are using Apache Beam, which is a unified model and set of language-specific SDKs for defining and executing data-processing workflows/pipelines. That falls squarely into the batch-processing category, so AWS Batch is the better fit; a minimal submission sketch follows the references below.
References:
https://aws.amazon.com/batch/?nc=sn&loc=0
https://aws.amazon.com/elasticbeanstalk/
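For illustration, here is a minimal sketch of kicking off such a run with boto3, assuming you have packaged your Beam script as a container image and already created a Batch job queue and job definition (all names below are placeholders):

```python
import boto3

# Submit the containerized Beam script as a one-off Batch job.
batch = boto3.client("batch")

response = batch.submit_job(
    jobName="beam-pipeline-run",       # hypothetical job name
    jobQueue="my-batch-queue",         # placeholder: your Batch job queue
    jobDefinition="beam-pipeline:1",   # placeholder: job definition wrapping your container
    containerOverrides={
        # Placeholder command; point it at your actual pipeline entry point.
        "command": ["python", "pipeline.py", "--runner=DirectRunner"],
    },
)
print("Submitted Batch job:", response["jobId"])
```

Batch then provisions the compute, runs the container to completion (whether that takes 50 seconds or an hour), and tears it down, which is exactly the pay-per-run model you are after.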
Related
I am new to AWS with Python. I came across boto3 initially; later someone suggested the CDK. What is the difference between AWS CDK and boto3?
In simple terms, CDK helps you programmatically create AWS resources (Infrastructure as Code), while boto3 helps you programmatically access AWS services.
Here are snippets on CDK and Boto3 from the AWS reference links:
CDK:
The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to define your cloud application resources using familiar programming languages. AWS CDK provisions your resources in a safe, repeatable manner through AWS CloudFormation. It also enables you to compose and share your own custom constructs that incorporate your organization's requirements, helping you start new projects faster. (Reference: https://aws.amazon.com/cdk/)
With CDK and Cloudformation, you will get the benefits of repeatable deployment, easy rollback, and drift detection. (Reference: https://aws.amazon.com/cdk/features/)
Boto3:
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.
(Reference: https://pypi.org/project/boto3/)
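To make the distinction concrete, here is a hedged sketch showing both sides in Python: a CDK v2 stack (build time) that provisions an S3 bucket, and boto3 (run time) writing an object to it. The stack and bucket names are illustrative only:

```python
# --- Build time: CDK app (deployed with `cdk deploy`) ---
from aws_cdk import App, Stack
from aws_cdk import aws_s3 as s3
from constructs import Construct

class StorageStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        # Provisioned via CloudFormation when the stack is deployed.
        s3.Bucket(self, "MyBucket", bucket_name="my-example-bucket-name")

app = App()
StorageStack(app, "StorageStack")
app.synth()

# --- Run time: application code using boto3 ---
import boto3

s3_client = boto3.client("s3")
s3_client.put_object(
    Bucket="my-example-bucket-name",  # the bucket the stack created
    Key="hello.txt",
    Body=b"Hello from boto3",
)
```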
Welcome to Stack Overflow and to AWS usage!
boto3 is the Python SDK for AWS. It is useful for letting your software leverage other AWS services.
Use case example: your code has to put an object in an S3 bucket (in other words, store a file).
aws-cdk is a framework that helps you provision infrastructure in an IaC (Infrastructure as Code) manner.
Use case example: describe and provision your application infrastructure (e.g. a Lambda function and an S3 bucket).
In many projects you will use both.
You can find an example URL shortener that uses boto3 and aws-cdk here. The URL shortener uses boto3 to access a DynamoDB table, and aws-cdk to provision the whole infrastructure (including the Lambda function, which uses boto3).
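For a flavor of the boto3 side of such a project, here is a hypothetical Lambda handler that resolves a short URL from a DynamoDB table; the table name, key schema, and attribute names are made up for illustration:

```python
import boto3

# Run-time access to a table that CDK would have provisioned at build time.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("url-shortener")  # placeholder table name

def handler(event, context):
    # API Gateway proxy integration puts path parameters in the event.
    short_id = event["pathParameters"]["id"]
    item = table.get_item(Key={"id": short_id}).get("Item")
    if item is None:
        return {"statusCode": 404, "body": "Not found"}
    # Redirect the caller to the stored target URL.
    return {"statusCode": 301, "headers": {"Location": item["target_url"]}}
```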
You're creating an application that needs to use AWS services and resources. Should you use CDK or boto3?
Consider if your application needs AWS services and resources at build time or run time.
Build time: you need the AWS resources to be available IN ORDER TO build the application.
Run time: you need the AWS resources to be available via API call when your application is up and running.
AWS CDK sets up the infrastructure your application needs in order to run.
The AWS SDK complements your application, providing business logic and making services available through your application.
Another point to add is that AWS CDK manages the state of your deployed resources internally, thus allowing you to keep track of what has been deployed and to specify the desired state of your final deployed resources.
On the other hand, if you're using AWS SDK, you have to store and manage the state of the deployed resources (deployed using AWS SDK) yourself.
I am also new to AWS; here is my understanding of the relevant AWS services and boto3:
AWS Cloud Development Kit (CDK) is a software library, available in several programming languages, for defining and provisioning cloud infrastructure through AWS CloudFormation.
Boto3 is a Python software development kit (SDK) to create, configure, and manage AWS services.
AWS CloudFormation is a low-level service for creating a collection of related AWS and third-party resources, and provisioning and managing them in an orderly and predictable fashion.
AWS Elastic Beanstalk is a high-level service for easily deploying and running applications in the cloud; it sits on top of AWS CloudFormation.
I need to implement a simple web service in Python. It's my first experience with web services and REST APIs, so I want to understand which environment and tools would fit my needs. In my web service, I need to read some data from a database, do some simple logic, and support a GET call from another application (Qualtrics).
I read about and implemented a simple test web service in Python using some useful blogs, such as Building a Basic RestFul API in Python | Codementor,
but I need a real server so that I can call the API from external applications.
As I'm looking for a long-term solution, I thought an AWS EC2 instance might be a good choice for a server. I tried to implement it using guidelines from blogs such as Deploy a Flask app on AWS EC2 | Codementor.
However, as I'm new to this and encountered some implementation/editing errors (e.g. handling of the WSGI file), and as I'm a Windows person and the Ubuntu tooling is not always easy to get used to, I was wondering: what is the best framework for my needs?
Is there any recommended flow in which I'd be able to implement my simple Python code and connect it to a small server (either an AWS EC2 instance or any other recommended one) in a more convenient way?
Another important note: I will need to run it only from time to time; this web server and web service should not be constantly live (that's why I thought an AWS virtual instance would fit best).
To begin, my recommendation would be to look at Elastic Beanstalk, Fargate, and API Gateway with Lambda.
You can use Elastic Beanstalk to easily provision an out-of-the-box AWS environment to host your Python app in Flask, with minimal configuration required:
Deploying a flask application to Elastic Beanstalk.
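For reference, the Elastic Beanstalk Python platform looks for a WSGI callable named application (by default in application.py), so a minimal deployable Flask app can be as small as this sketch:

```python
# application.py -- minimal Flask app for Elastic Beanstalk.
from flask import Flask

# Beanstalk's Python platform expects a module-level `application` callable.
application = Flask(__name__)

@application.route("/")
def index():
    return {"status": "ok"}  # Flask serializes the dict to JSON

if __name__ == "__main__":
    application.run()  # local testing only; EB runs it behind a WSGI server
```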
The other thing to consider would be to develop your Python app as a Docker container using, e.g., tiangolo/uwsgi-nginx-flask as the base image. This would allow you to easily work with it on your localhost and then just move the image to AWS for hosting.
You can host it on Fargate to save time on configuring container instances, or on Beanstalk, which also supports Docker.
Yet another choice is to go fully serverless and develop your Python REST API using API Gateway and Lambda.
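For the serverless route, the contract with an API Gateway proxy integration is simply a handler that returns a dict with statusCode, headers, and body; here is a minimal sketch (the business logic is a placeholder):

```python
import json

def lambda_handler(event, context):
    # API Gateway passes the HTTP request details in `event`;
    # query parameters may be absent, hence the `or {}`.
    params = event.get("queryStringParameters") or {}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": "Hello from Lambda", "params": params}),
    }
```

With this option you also pay only while requests are being served, which matches your "not constantly live" requirement well.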
I have a Python script that pulls some data from an Azure Data Lake cluster, performs some simple compute, then stores it into a SQL Server DB on Azure. The whole shebang runs in about 20 seconds. It needs sqlalchemy, pandas, and some Azure data libraries. I need to run this script daily. We also have a Service Fabric cluster available to use.
What are my best options? I thought of containerizing it with Docker and making it into an HTTP-triggered API, but then how do I trigger it once per day? I'm not good with Azure or microservices design, so this is where I need the help.
You can use WebJobs in Azure App Service. There are two types of Azure WebJobs to choose from: continuous and triggered. As I see it, you need the triggered type.
You could refer to the document here for more details. In addition, here is how to run tasks in WebJobs.
Also, you can use an Azure Functions timer trigger with Python, which became generally available in recent months.
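A timer-triggered function fits a once-a-day job well. Below is a minimal sketch of the Python side; the actual schedule is a CRON expression configured in the function's function.json (shown here only as a comment), and the job body is a placeholder:

```python
# __init__.py of a timer-triggered Azure Function.
# The schedule lives in function.json, e.g. "0 0 6 * * *" for 06:00 UTC daily.
import logging

import azure.functions as func

def main(mytimer: func.TimerRequest) -> None:
    if mytimer.past_due:
        logging.info("The timer is past due!")
    # Placeholder for the actual work: pull from the Data Lake,
    # run the pandas/sqlalchemy compute, write to the Azure SQL DB.
    logging.info("Daily job executed.")
```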
I have created a custom Python package which has a few machine learning algorithms in it.
I would like to deploy this custom Python package on Azure as a service that can be consumed by my other applications, like a batch job and a website.
I have bought an Azure license but have no clue about the deployment strategy. Please advise.
I'd recommend using the following documentation; it will show you how to deploy a Python application to Azure App Service using VSTS. The lab will give you the skills you need to deploy your app to Azure App Service.
I have a small Python (Flask) application running in a Docker container.
The container orchestrator is Kubernetes, all running in Azure.
What is the best approach to set up centralized logging? (similar to Graylog)
Is it possible to get the application logs over OMS to Azure Log Analytics?
Thank you,
Tibor
I have a similar requirement: a continuously running Python application in a Docker container. So far I have found that the Azure SDK for Python supports a lot of integration with Azure from Python. This page might be able to help:
https://pypi.org/project/azure-storage-logging/
Here is also a package and a guide on how to set up Blob Storage and enable logging:
https://github.com/Azure/azure-storage-python/tree/master/azure-storage-blob
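As a hedged example based on the azure-storage-logging package's documentation, the snippet below attaches a logging handler that writes records to an Azure Storage table; the account name and key are placeholders:

```python
import logging

from azure_storage_logging.handlers import TableStorageHandler

# Send standard-library log records to an Azure Storage table.
handler = TableStorageHandler(
    account_name="mystorageaccount",      # placeholder
    account_key="<storage-account-key>",  # placeholder
)
logger = logging.getLogger("myapp")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

logger.info("Application started")  # ends up in the storage table
```

From there, logs in a storage account can be pulled into Log Analytics, although routing container stdout/stderr through the Log Analytics agent for Kubernetes (the OMS agent you mention) is also a common approach.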