I am new to AWS with Python. I came across boto3 initially, and later someone suggested CDK. What is the difference between AWS CDK and boto3?
In simple terms, CDK helps you to programmatically create AWS resources (Infrastructure as Code), while boto3 helps you to programmatically access AWS services.
Here are snippets on CDK and Boto3 from the AWS reference links:
CDK:
The AWS Cloud Development Kit (AWS CDK) is an open source software development framework to define your cloud application resources using familiar programming languages. AWS CDK provisions your resources in a safe, repeatable manner through AWS CloudFormation. It also enables you to compose and share your own custom constructs that incorporate your organization's requirements, helping you start new projects faster. (Reference: https://aws.amazon.com/cdk/)
With CDK and CloudFormation, you will get the benefits of repeatable deployment, easy rollback, and drift detection. (Reference: https://aws.amazon.com/cdk/features/)
Boto3:
Boto3 is the Amazon Web Services (AWS) Software Development Kit (SDK) for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2.
(Reference: https://pypi.org/project/boto3/)
Welcome to Stack Overflow and to AWS usage!
boto3 is the Python SDK for AWS. It is useful when your software needs to leverage AWS services at run time.
Use case example: your code has to put an object into an S3 bucket (in other words, store a file).
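For instance, storing a file with boto3 at run time might look like this (a minimal sketch; the bucket name and file paths are placeholders, and AWS credentials are assumed to already be configured):

```python
import boto3

# Create an S3 client using whatever credentials/region are configured
# in the environment or in ~/.aws.
s3 = boto3.client("s3")

# Upload a local file as an object in an existing bucket.
s3.upload_file("report.csv", "my-example-bucket", "reports/report.csv")
```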
aws-cdk is a framework that helps you provision infrastructure in an IaC (Infrastructure as Code) manner.
Use case example: describe and provision your application infrastructure (e.g. a Lambda function and an S3 bucket).
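As a minimal sketch of that use case using the CDK v2 Python API (the stack and construct names here are illustrative, not from the question):

```python
# app.py -- declare a Lambda function and an S3 bucket as code;
# `cdk deploy` provisions them through CloudFormation.
import aws_cdk as cdk
from aws_cdk import Stack, aws_lambda as _lambda, aws_s3 as s3
from constructs import Construct


class MyServiceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # An S3 bucket for the application's data.
        bucket = s3.Bucket(self, "DataBucket")

        # A Lambda function whose code lives in the local "lambda" directory.
        handler = _lambda.Function(
            self, "Handler",
            runtime=_lambda.Runtime.PYTHON_3_9,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # Let the function read and write objects in the bucket.
        bucket.grant_read_write(handler)


app = cdk.App()
MyServiceStack(app, "MyServiceStack")
app.synth()
```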
In many projects you will use both.
You can find an example URL shortener that uses both boto3 and aws-cdk here. The URL shortener uses boto3 to access a DynamoDB table, and aws-cdk to provision the whole infrastructure (including the Lambda function that uses boto3).
You're creating an application that needs to use AWS services and resources. Should you use CDK or boto3?
Consider whether your application needs AWS services and resources at build time or at run time.
Build time: you need the AWS resources to be available IN ORDER TO build the application.
Run time: you need the AWS resources to be available via API call when your application is up and running.
AWS CDK sets up the infrastructure your application needs in order to run.
The AWS SDK complements your application, providing business logic and making AWS services available to your application at run time.
Another point to add is that AWS CDK manages the state of your deployed resources internally, thus allowing you to keep track of what has been deployed and to specify the desired state of your final deployed resources.
On the other hand, if you're using the AWS SDK, you have to store and manage the state of the resources you deploy with it yourself.
I am also new to AWS; here is my understanding of the relevant AWS services and boto3:
AWS Cloud Development Kit (CDK) is a software library, available in different programming languages, for defining and provisioning cloud infrastructure through AWS CloudFormation.
Boto3 is a Python software development kit (SDK) to create, configure, and manage AWS services.
AWS CloudFormation is a low-level service to create a collection of related AWS and third-party resources, and provision and manage them in an orderly and predictable fashion.
AWS Elastic Beanstalk is a high-level service to deploy and run applications in the cloud easily, and sits on top of AWS CloudFormation.
I need to implement a simple web service in Python. It's my first experience with web services and REST APIs, so I want to understand what environment and tools would fit my needs. In my web service, I need to read some data from a database, do some simple logic, and support a GET call from another application (Qualtrics).
I read about and implemented a simple test web service with Python using some useful blogs such as: Building a Basic RestFul API in Python | Codementor
but I need a real server so that I could call the API from external applications.
As I'm looking for a long-term solution, I thought that using an AWS EC2 instance might be a good solution for a server. I tried to implement it using some guidelines in blogs such as: Deploy a Flask app on AWS EC2 | Codementor
However, as I'm new to this and encountered some implementation/editing errors (e.g. handling of the WSGI file), and as I'm a Windows person and the Ubuntu side of things is not always easy to get used to, I was wondering: what is the best framework for my needs?
Is there any recommended flow in which I'll be able to implement my simple Python code and connect it to a small server (either an AWS EC2 instance or any other recommended option) in a more convenient way?
Another important note: I will need to run it only from time to time; this web server and web service should not be constantly live (that's why I thought an AWS virtual instance would fit best).
To begin, my recommendation would be to look at Elastic Beanstalk, Fargate and API Gateway with Lambda.
You can use Elastic Beanstalk to easily provision an out-of-the-box AWS environment to host your Python Flask app with minimal configuration required:
Deploying a flask application to Elastic Beanstalk.
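For reference, the app you deploy can stay very small. A minimal sketch of an application.py for Elastic Beanstalk (the platform looks for a WSGI callable named "application" by default; the route and response here are placeholders):

```python
# application.py -- minimal Flask app for the Elastic Beanstalk Python platform.
from flask import Flask, jsonify

application = Flask(__name__)


@application.route("/data")
def get_data():
    # Placeholder for your database read and simple logic.
    return jsonify({"status": "ok", "items": []})


if __name__ == "__main__":
    # Local testing only; on Elastic Beanstalk the platform runs the WSGI app.
    application.run(debug=True)
```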
The other thing to consider would be to develop your Python app as a Docker container using, e.g., tiangolo/uwsgi-nginx-flask as the base image. This would allow you to easily work with it on your localhost, and then just move your image to AWS for hosting.
You can host it on Fargate to save time on configuring container instances, or on Elastic Beanstalk, which also supports Docker.
Yet another choice is to go fully serverless and develop your Python REST API using API Gateway and Lambda.
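With that option, your whole service can be a single handler behind an API Gateway proxy integration. A minimal sketch (the response shape is what the proxy integration expects; the logic is a placeholder):

```python
import json


def lambda_handler(event, context):
    # Query string parameters from the GET call (may be absent).
    params = event.get("queryStringParameters") or {}

    # Placeholder for your database read and simple logic.
    result = {"received": params}

    # Return the structure API Gateway's proxy integration expects.
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(result),
    }
```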
Context: I'm in the process of designing an event-driven Python application. Various stakeholders have tasked me with investigating options for deploying the application using GraphQL endpoints within a serverless environment running on Azure Functions. The end goal is that, as the underlying data structures grow, we'd like to easily maintain the usability and performance of the application over time. Based on the resources below, it appears this is possible:
(https://azure.microsoft.com/en-us/resources/videos/build-2019-build-scalable-apis-using-graphql-and-serverless/)
(https://azure.microsoft.com/en-us/resources/videos/azure-friday-live-building-serverless-python-apps-with-azure-functions/)
(https://graphene-python.org/)
Question: User requirements dictate that the Azure Functions MUST be for internal use only and cannot be exposed publicly. Reading through the docs below, I haven't found any resources on security configuration options for private endpoints.
https://learn.microsoft.com/en-us/azure/azure-functions/
Private endpoint in Azure
Can someone please point me in the right direction? Are Azure Functions even capable of this? And if they aren't, can this be achieved with an alternative like Azure App Service?
Azure Functions have multiple hosting options, including the Consumption Plan, the Premium Plan, and the App Service Plan.
Of these, for complete VNET isolation, an App Service Environment is the only way to go as of now, since Private Endpoints for Azure Web Apps are currently in preview.
But note that Azure Functions can be deployed into a Kubernetes cluster as well, which could be the better option if you already have a Kubernetes cluster to deploy to.
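Whichever hosting option you pick, the function code itself doesn't change. Below is a minimal, hedged sketch of how a graphene schema could be served from an HTTP-triggered Python function (it assumes the classic programming model with a function.json binding; the schema and field names are purely illustrative):

```python
import json

import azure.functions as func
import graphene


# A purely illustrative schema with a single field.
class Query(graphene.ObjectType):
    hello = graphene.String()

    def resolve_hello(self, info):
        return "world"


schema = graphene.Schema(query=Query)


def main(req: func.HttpRequest) -> func.HttpResponse:
    # Expect a JSON body like {"query": "{ hello }"} and run it
    # against the graphene schema.
    body = req.get_json()
    result = schema.execute(body.get("query", ""))
    payload = {
        "data": result.data,
        "errors": [str(e) for e in (result.errors or [])],
    }
    return func.HttpResponse(json.dumps(payload), mimetype="application/json")
```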
I have an Apache Beam program in Python. To save on running costs, I would like to execute this Python script using a service instead of on an EC2 instance.
The script runs from 50 seconds to over 60 minutes.
If this were GCP, I would think of Google App Engine. However, on AWS, I am not sure whether I should use AWS Elastic Beanstalk or AWS Batch.
Generally, which service is best for running a long-running script on AWS?
Thanks,
Yu
AWS Batch is recommended for batch processing at any scale, whereas AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications.
Also, there is no additional charge for either service (AWS Batch or AWS Elastic Beanstalk). You pay only for the AWS resources (e.g. EC2 instances or AWS Lambda functions) you create to store and run your application.
In your case, Apache Beam is used, which is a unified model and set of language-specific SDKs for defining and executing data processing workflows/pipelines; this falls under the category of batch processing.
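For context, a Beam pipeline is plain Python code handed to a runner, which is why it fits batch-oriented services better than a web-serving environment. A minimal sketch using the local DirectRunner (the file paths are illustrative):

```python
import apache_beam as beam

# A tiny batch pipeline: read lines, transform them, write results.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("input.txt")
        | "ToUpper" >> beam.Map(str.upper)
        | "Write" >> beam.io.WriteToText("output")
    )
```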
References:
https://aws.amazon.com/batch/?nc=sn&loc=0
https://aws.amazon.com/elasticbeanstalk/
Once an Apache Beam pipeline has been designed and tested in Google Cloud Dataflow using the Python SDK and DataflowRunner, what is a convenient way to have it in Google Cloud and manage its execution?
What is a convenient way to deploy and manage execution of a Python SDK Apache Beam pipeline for Google Cloud Dataflow?
Should it be somehow packaged? Uploaded to Google Cloud Storage? Made into a Dataflow template? How can one schedule its execution beyond a developer executing it from their development environment?
Update
Preferably without 3rd-party tools or the need for additional management tools/infrastructure beyond Google Cloud, and Dataflow in particular.
Intuitively you’d expect the “Deploying a pipeline” section under the How-to guides of the Dataflow documentation to cover that. But you find an explanation only eight sections further down, in the “Templates overview” section.
According to that section:
Cloud Dataflow templates introduce a new development and execution workflow that differs from traditional job execution workflow. The template workflow separates the development step from the staging and execution steps.
In the trivial case, you do not deploy your Dataflow pipeline to Google Cloud at all; you execute it from your development environment. But if you need to share the execution of a pipeline with nontechnical members of your cloud project, or simply want to trigger it without being dependent on a development environment or 3rd-party tools, then Dataflow templates are what you need.
Once a pipeline has been developed and tested, you can create a Dataflow job template from it.
Please note that:
To create templates with the Cloud Dataflow SDK 2.x for Python, you must have version 2.0.0 or higher.
You will need to execute your pipeline using DataflowRunner with pipeline options that generate a template on Google Cloud Storage rather than running the job.
For more details, refer to the creating templates section of the documentation; to run a pipeline from a template, refer to the executing templates section.
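As a hedged sketch, staging a classic template from the Python SDK amounts to running the pipeline with a --template_location option pointing at Cloud Storage (the project, region, bucket, and file paths below are placeholders):

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# With --template_location set, this run stages a template on Cloud Storage
# instead of launching a Dataflow job.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=my-project",
    "--region=us-central1",
    "--staging_location=gs://my-bucket/staging",
    "--temp_location=gs://my-bucket/temp",
    "--template_location=gs://my-bucket/templates/my_template",
])

with beam.Pipeline(options=options) as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
        | "ToUpper" >> beam.Map(str.upper)
        | "Write" >> beam.io.WriteToText("gs://my-bucket/output/result")
    )
```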
I'd say the most convenient way is to use Airflow. This allows you to author, schedule, and monitor workflows. The Dataflow Operator can start your designed data pipeline. Airflow can run either on a small VM or via Cloud Composer, a managed Airflow service on the Google Cloud Platform.
There are more options to automate your workflow, such as Jenkins, Azkaban, Rundeck, or even running a simple cron job (which I'd discourage you from using). You might want to take a look at these options as well, but Airflow probably fits your needs.
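If you do go the Airflow route, the DAG can stay very small. A hedged sketch (the import path and parameter names follow the current apache-airflow-providers-google package and may differ in older Airflow versions; the project, location, bucket, and template paths are placeholders):

```python
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataflow import (
    DataflowTemplatedJobStartOperator,
)

with DAG(
    dag_id="run_beam_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Launch a Dataflow job from the template staged on Cloud Storage.
    start_dataflow_job = DataflowTemplatedJobStartOperator(
        task_id="start_dataflow_job",
        project_id="my-project",
        location="us-central1",
        template="gs://my-bucket/templates/my_template",
        parameters={"input": "gs://my-bucket/input/*.txt"},
    )
```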
I have created a custom Python package which has a few machine learning algorithms in it.
I would like to deploy this custom Python package on Azure as a service that can be consumed by my other applications, such as a batch job and a website.
I have bought an Azure license but have no clue about the deployment strategy. Please advise.
I'd recommend using the following documentation; it shows you how to deploy a Python application to Azure App Service using VSTS. The lab will provide you with the skills needed to deploy your app to Azure App Service.