TaskDefinition CDK and LogConfiguration - python

I am trying to accomplish the following:
If you are using the Fargate launch type for your tasks, all you need to do to turn on the awslogs log driver is add the required logConfiguration parameters to your task definition.
I am using the CDK to generate the FargateTaskDefinition:
task_definition = _ecs.FargateTaskDefinition(self, "TaskDefinition",
    cpu=2048,
    memory_limit_mib=4096,
    execution_role=ecs_role,
    task_role=ecs_role,
)
task_definition.add_container("getFileTask",
    memory_limit_mib=4096,
    cpu=2048,
    image=_ecs.ContainerImage.from_asset(directory="assets", file="Dockerfile-ecs-file-download"))
I looked up the documentation and did not find any attribute called logConfiguration.
What am I missing?
I am not able to send logs from the container running on ECS/Fargate to CloudWatch, and what is needed is to enable this logConfiguration option in the task definition.
Thank you for your help.
Regards

Finally figured out that the logging option in add_container is the one.
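For anyone landing here with the same question, a minimal sketch of what that looks like (reusing the container from the question; the stream prefix is an arbitrary name chosen for illustration): the logging argument of add_container takes a log driver, which is what renders the logConfiguration on the container definition.
import aws_cdk.aws_ecs as _ecs

task_definition.add_container("getFileTask",
    memory_limit_mib=4096,
    cpu=2048,
    image=_ecs.ContainerImage.from_asset(directory="assets", file="Dockerfile-ecs-file-download"),
    # awslogs log driver; this produces the logConfiguration in the task definition
    logging=_ecs.LogDrivers.aws_logs(stream_prefix="getFileTask"))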


CICD for Azure SQL Server Using Python

I am looking to create a CICD pipeline for my Azure SQL Database. I have read about the State-based approach and the Migration-based approach that are described here:
https://devblogs.microsoft.com/azure-sql/devops-for-azure-sql/
However, I want to know if there is an approach I can use to do this in Python. I am looking to deploy both schema and data changes to the other environment through my pipeline. It would also be great if I could deploy only chosen data points, for example by filtering on a stage column for production records.
What kind of approach can I take to accomplish this?
It does not matter if I need to trigger this CICD pipeline manually through an API call or something. I believe this is also possible in Azure Pipelines.
What you can do is deploy the SQL server normally and then make the changes by adding a separate task in the .yaml file which will execute them.
For this you can either use a DACPAC or a plain SQL script.
In both cases you have to create your SQL-based script before deploying.
For DACPAC, add the following type of task:
- task: SqlAzureDacpacDeployment@1
  displayName: 'Execute Azure SQL : DacpacTask'
  inputs:
    azureSubscription: '<Azure service connection>'
    ServerName: '<Database server name>'
    DatabaseName: '<Database name>'
    SqlUsername: '<SQL user name>'
    SqlPassword: '<SQL user password>'
    DacpacFile: '<Location of Dacpac file in $(Build.SourcesDirectory) after compilation>'
For a SQL script, add the following type of task:
- task: AzureMysqlDeployment@1
  inputs:
    ConnectedServiceName: # Or alias azureSubscription
    ServerName:
    #DatabaseName: # Optional
    SqlUsername:
    SqlPassword:
    #TaskNameSelector: 'SqlTaskFile' # Optional. Options: SqlTaskFile, InlineSqlTask
    #SqlFile: # Required when taskNameSelector == SqlTaskFile
    #SqlInline: # Required when taskNameSelector == InlineSqlTask
    #SqlAdditionalArguments: # Optional
    #IpDetectionMethod: 'AutoDetect' # Options: AutoDetect, IPAddressRange
    #StartIpAddress: # Required when ipDetectionMethod == IPAddressRange
    #EndIpAddress: # Required when ipDetectionMethod == IPAddressRange
    #DeleteFirewallRule: true # Optional
For a detailed explanation, please refer to the documentation on the DACPAC task and to the documentation for the SQL script task.
As of now there is no API to start or stop a preexisting pipeline; I consulted this documentation for that.

How to connect kafka IO from apache beam to a cluster in confluent cloud

I've made a simple pipeline in Python to read from Kafka; the thing is that the Kafka cluster is on Confluent Cloud and I am having some trouble connecting to it.
I'm getting the following log on the Dataflow job:
Caused by: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:820)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:631)
at org.apache.kafka.clients.consumer.KafkaConsumer.<init>(KafkaConsumer.java:612)
at org.apache.beam.sdk.io.kafka.KafkaIO$Read$GenerateKafkaSourceDescriptor.processElement(KafkaIO.java:1495)
Caused by: java.lang.IllegalArgumentException: Could not find a 'KafkaClient' entry in the JAAS configuration. System property 'java.security.auth.login.config' is not set
So I think I'm missing something while passing the config, since the error mentions something related to it. I'm really new to all of this and I know nothing about Java, so I don't know how to proceed even after reading the JAAS documentation.
The code of the pipeline is the following:
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions
import os
import json
import logging

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'credentialsOld.json'

with open('cluster.configuration.json') as cluster:
    data = json.load(cluster)

def logger(element):
    logging.info('Something was found')

def main():
    config = {
        "bootstrap.servers": data["bootstrap.servers"],
        "security.protocol": data["security.protocol"],
        "sasl.mechanisms": data["sasl.mechanisms"],
        "sasl.username": data["sasl.username"],
        "sasl.password": data["sasl.password"],
        "session.timeout.ms": data["session.timeout.ms"],
        "auto.offset.reset": "earliest"
    }
    print('======================================================')
    beam_options = PipelineOptions(
        runner='DataflowRunner',
        project='project',
        experiments=['use_runner_v2'],
        streaming=True,
        save_main_session=True,
        job_name='kafka-stream-test')
    with beam.Pipeline(options=beam_options) as p:
        msgs = p | 'ReadKafka' >> ReadFromKafka(consumer_config=config, topics=['users'], expansion_service="localhost:8088")
        msgs | beam.FlatMap(logger)

if __name__ == '__main__':
    main()
I read something about passing a property java.security.auth.login.config in the config dictionary, but since that example is in Java and I am using Python, I'm really lost as to what I have to pass, or even whether that's the property I need to pass.
By the way, I'm getting the API key and secret from here, and this is what I am passing to sasl.username and sasl.password.
I faced the same error the first time I tried Beam's expansion service. The key sasl.mechanisms that you are supplying is incorrect; try sasl.mechanism instead. Also, you do not need to supply the username and password separately, since the connection is authenticated via JAAS. Basically, a consumer_config like the one below worked for me:
config = {
    "bootstrap.servers": data["bootstrap.servers"],
    "security.protocol": data["security.protocol"],
    "sasl.mechanism": data["sasl.mechanisms"],
    "session.timeout.ms": data["session.timeout.ms"],
    "group.id": "tto",
    "sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required serviceName="Kafka" username=\"{data["sasl.username"]}\" password=\"{data["sasl.password"]}\";',
    "auto.offset.reset": "earliest"
}
I got a partial answer to this question: I fixed this problem but ran into another one:
config = {
    "bootstrap.servers": data["bootstrap.servers"],
    "security.protocol": data["security.protocol"],
    "sasl.mechanisms": data["sasl.mechanisms"],
    "sasl.username": data["sasl.username"],
    "sasl.password": data["sasl.password"],
    "session.timeout.ms": data["session.timeout.ms"],
    "group.id": "tto",
    "sasl.jaas.config": f'org.apache.kafka.common.security.plain.PlainLoginModule required serviceName="Kafka" username=\"{data["sasl.username"]}\" password=\"{data["sasl.password"]}\";',
    "auto.offset.reset": "earliest"
}
I needed to provide the sasl.jaas.config property with the API key and secret of my cluster and also the service name; however, now I'm facing a different error when running the pipeline on Dataflow:
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired while fetching topic metadata
This error shows up after 4-5 minutes of trying to run the job on Dataflow. I have no idea how to fix this, but I think it is related to my broker on Confluent rejecting the connection, possibly because the cluster is in a different zone than the job region.
UPDATE:
I tested the code on Linux/Ubuntu, and I don't know why, but the expansion service gets downloaded automatically, so you won't get the unsupported signal error; I'm still having some issues trying to authenticate to Confluent Kafka, though.
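For completeness, a minimal sketch of that setup (this reflects the update above and is not a verified fix for the remaining authentication issue; config is assumed to be the corrected consumer_config with sasl.jaas.config shown earlier):
import apache_beam as beam
from apache_beam.io.kafka import ReadFromKafka
from apache_beam.options.pipeline_options import PipelineOptions

def run(config):
    # On Linux with Java installed, omitting the expansion_service argument
    # lets Beam download and start the default Java expansion service itself.
    beam_options = PipelineOptions(streaming=True, save_main_session=True)
    with beam.Pipeline(options=beam_options) as p:
        p | 'ReadKafka' >> ReadFromKafka(consumer_config=config,
                                         topics=['users'])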

Can't Add IAM Policy to Glue Crawler with get_att

I am currently trying to add a policy statement to a Glue crawler using the AWS CDK (Python) and am running into an issue trying to retrieve the ARN of the crawler using the get_att() method on the crawler (documentation here). I have provided the code that I am using to create the crawler and would like to then use a policy document to add the statement to the resource. I'm happy to provide further info if anyone thinks it would help. Thanks in advance for your time!
from aws_cdk import (
    aws_glue,
    aws_iam
)

def new_glueCrawler(stack):
    glue_job_role = aws_iam.Role(
        stack,
        'roleName',
        role_name='roleName',
        assumed_by=aws_iam.ServicePrincipal('glue.amazonaws.com'),
        managed_policies=[aws_iam.ManagedPolicy.from_aws_managed_policy_name('service-role/AWSGlueServiceRole')])

    def prepend(list, str):
        str += '{0}'
        list = [{"path": str.format(i)} for i in list]
        return list

    s3TargetList = prepend('pathList', 'bucketName')

    glueCrawler = aws_glue.CfnCrawler(stack, 'crawlerName',
        name='crawlerName',
        role=glue_job_role.role_arn,
        targets={"s3Targets": s3TargetList},
        crawler_security_configuration='securityName',
        database_name='dbName',
        schedule=aws_glue.CfnCrawler.ScheduleProperty(schedule_expression='cron(5 2 * * ? *)'),
        schema_change_policy=aws_glue.CfnCrawler.SchemaChangePolicyProperty(delete_behavior='DELETE_FROM_DATABASE',
                                                                            update_behavior='UPDATE_IN_DATABASE'))
    return glueCrawler
adminPolicyDoc = aws_iam.PolicyDocument()
adminPolicyDoc.add_statements([aws_iam.PolicyStatement(actions=['glue:StartCrawler'],
                                                       effect=aws_iam.Effect.ALLOW,
                                                       resources=[glueCrawler.get_att('arn')]
                                                       )
                               ])
Unfortunately, with CfnCrawler, the process isn't as nice as it is with other objects in the CDK framework. For example, if you wanted to obtain the ARN of a lambdaObject, you could simply call lambdaObject.function_arn. It doesn't appear that it is that easy with crawlers. Any insight would be greatly appreciated!
So I was able to obtain the arn using the following code snippet where the crawler is the object that I am trying to get the arn for:
core.Stack.of(stack).format_arn(service='glue',resource='crawler',resource_name=crawler.name)
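Putting that together with the policy document from the question, a rough sketch (CDK v1 Python; assumes from aws_cdk import core in addition to aws_iam, and that glueCrawler is the CfnCrawler returned by new_glueCrawler above):
# Build the crawler ARN from the stack's partition/account/region and the
# crawler name, then use it in place of get_att().
crawler_arn = core.Stack.of(stack).format_arn(
    service='glue',
    resource='crawler',
    resource_name=glueCrawler.name)

adminPolicyDoc = aws_iam.PolicyDocument()
adminPolicyDoc.add_statements(
    aws_iam.PolicyStatement(actions=['glue:StartCrawler'],
                            effect=aws_iam.Effect.ALLOW,
                            resources=[crawler_arn]))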
It looks like you are almost there. I believe the "secret string" to get the ARN attribute is "resource.arn", so change this line:
resources=[glueCrawler.get_att('arn')]
to:
resources=[glueCrawler.get_att('resource.arn')]

Invalid AttributeDataType input, consider using the provided AttributeDataType enum

I am trying to create an AWS Cognito user pool using the AWS CDK.
Below is my code:
user_pool = _cognito.UserPool(
    stack,
    id="user-pool-id",
    user_pool_name="temp-user-pool",
    self_sign_up_enabled=True,
    sign_in_aliases={
        "username": False,
        "email": True
    },
    required_attributes={
        "email": True
    }
)
I want to set the "Attributes" section in the user pool for email.
But the above code gives me this exception:
Invalid AttributeDataType input, consider using the provided AttributeDataType enum. (Service: AWSCognitoIdentityProviderService; Status Code: 400; Error Code: InvalidParameterException; Request ID:
I have tried many scenarios but it didn't work. Am I missing something here? Any help would be appreciated. Thanks!
I was referring to these AWS docs to create the user pool: https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_cognito/UserPool.html and https://docs.aws.amazon.com/cdk/api/latest/python/aws_cdk.aws_cognito/RequiredAttributes.html#aws_cdk.aws_cognito.RequiredAttributes
According to a comment on this GitHub issue this error is thrown when an attempt is made to modify required attributes for a UserPool. This leaves you two options:
Update the code such that existing attributes are not modified.
Remove the UserPool and create a new one. E.g. cdk destroy followed by cdk deploy will recreate your whole stack (this is probably not what you want if your stack is in production).
https://github.com/terraform-providers/terraform-provider-aws/issues/3891
Found a way to get around it in production as well, where you don't need to recreate the user pool.

Using core.Token to pass a String Parameter as a number

I raised a feature request on the CDK GitHub account recently and was pointed in the direction of core.Token as being pretty much the exact functionality I was looking for. I'm now having some issues implementing it and am getting similar errors; here's the feature request I raised previously: https://github.com/aws/aws-cdk/issues/3800
So my current code looks something like this:
fargate_service = ecs_patterns.LoadBalancedFargateService(
    self, "Fargate",
    cluster=cluster,
    memory_limit_mib=core.Token.as_number(ssm.StringParameter.value_from_lookup(self, parameter_name='template-service-memory_limit')),
    execution_role=fargate_iam_role,
    container_port=core.Token.as_number(ssm.StringParameter.value_from_lookup(self, parameter_name='port')),
    cpu=core.Token.as_number(ssm.StringParameter.value_from_lookup(self, parameter_name='template-service-container_cpu')),
    image=ecs.ContainerImage.from_registry(ecrRepo)
)
When I try to synthesise this code I get the following error:
jsii.errors.JavaScriptError:
Error: Resolution error: Supplied properties not correct for "CfnSecurityGroupEgressProps"
fromPort: "dummy-value-for-template-service-container_port" should be a number
toPort: "dummy-value-for-template-service-container_port" should be a number.
Object creation stack:
To me it seems to get past the validation requiring a number to be passed into the FargateService, but when it tries to create the resources after that (CfnSecurityGroupEgressProps) it can't resolve the dummy string as a number. I'd appreciate any help solving this, or alternative suggestions for passing in values from AWS Systems Manager parameters instead (I thought it might be possible to pass the values in via a file pulled from S3 during the build pipeline, or something along those lines, but that seems hacky).
With some help I think we've cracked this!
The problem was that I was using ssm.StringParameter.value_from_lookup; the solution is to obtain the token with ssm.StringParameter.value_for_string_parameter. When this is synthesised it stores a token, and then upon deployment the value stored in Systems Manager Parameter Store is substituted.
(We also came up with another approach for achieving something similar, which we're probably going to use instead of the SSM approach; I've detailed it below the code snippet if you're interested.)
See the complete code below:
from aws_cdk import (
    aws_ec2 as ec2,
    aws_ssm as ssm,
    aws_iam as iam,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    core,
)


class GenericFargateService(core.Stack):

    def __init__(self, scope: core.Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        containerPort = core.Token.as_number(ssm.StringParameter.value_for_string_parameter(
            self, 'template-service-container_port'))

        vpc = ec2.Vpc(
            self, "cdk-test-vpc",
            max_azs=2
        )

        cluster = ecs.Cluster(
            self, 'cluster',
            vpc=vpc
        )

        fargate_iam_role = iam.Role(
            self, "execution_role",
            assumed_by=iam.ServicePrincipal("ecs-tasks"),
            managed_policies=[iam.ManagedPolicy.from_aws_managed_policy_name("AmazonEC2ContainerRegistryFullAccess")]
        )

        fargate_service = ecs_patterns.LoadBalancedFargateService(
            self, "Fargate",
            cluster=cluster,
            memory_limit_mib=1024,
            execution_role=fargate_iam_role,
            container_port=containerPort,
            cpu=512,
            image=ecs.ContainerImage.from_registry("000000000000.dkr.ecr.eu-west-1.amazonaws.com/template-service-ecr")
        )

        fargate_service.target_group.configure_health_check(
            path=self.node.try_get_context("health_check_path"), port="9000")


app = core.App()
GenericFargateService(app, "generic-fargate-service", env={'account': '000000000000', 'region': 'eu-west-1'})
app.synth()
Solutions to problems are like buses: apparently you spend ages waiting for one and then two arrive together. And I think this new bus is the option we're probably going to run with.
The plan is to have developers provide an override for the cdk.json file within their code repos, which can then be parsed in the CDK pipeline where the generic code will be synthesised. This file will contain some "context", which will then be used within the CDK to set our variables for the LoadBalancedFargateService.
I've included some code snippets below for setting up the cdk.json file and then using its values in code.
Example cdk.json:
{
    "app": "python3 app.py",
    "context": {
        "container_name": "template-service",
        "memory_limit": 1024,
        "container_cpu": 512,
        "health_check_path": "/gb/template/v1/status",
        "ecr_repo": "000000000000.dkr.ecr.eu-west-1.amazonaws.com/template-service-ecr"
    }
}
Python example for assigning context to variables:
memoryLimitMib = self.node.try_get_context("memory_limit")
I believe we could also use a try/except block (or a simple fallback) to assign default values if they are not provided by the developer in their cdk.json file.
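For example, a small sketch of one way to do that (the fallback numbers are just illustrative defaults; try_get_context returns None for a missing key, so a simple fallback expression works as well):
# Inside the stack's __init__, falling back to defaults when the developer's
# cdk.json does not define the context keys.
memory_limit_mib = self.node.try_get_context("memory_limit") or 1024
container_cpu = self.node.try_get_context("container_cpu") or 512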
I hope this post has provided some useful information to those looking for ways to create a generic template for deploying CDK code! I don't know if we're doing the right thing here, but this tool is so new it feels like some common patterns don't exist yet.
