Handling S3 Bucket Trigger Event in Lambda Using Python

The AWS Lambda handler has a signature of
def lambda_handler(event, context):
However, I cannot find any documentation of the event's structure when the trigger is an S3 bucket receiving a put.
I thought that it might be defined in the S3 console, but couldn't find it there.
Does anyone have any leads?

The event that S3 sends to the Lambda function is in JSON format, as shown below:
{
  "Records":[
    {
      "eventVersion":"2.0",
      "eventSource":"aws:s3",
      "awsRegion":"us-east-1",
      "eventTime":"The time, in ISO-8601 format, for example, 1970-01-01T00:00:00.000Z, when S3 finished processing the request",
      "eventName":"event-type",
      "userIdentity":{
        "principalId":"Amazon-customer-ID-of-the-user-who-caused-the-event"
      },
      "requestParameters":{
        "sourceIPAddress":"ip-address-where-request-came-from"
      },
      "responseElements":{
        "x-amz-request-id":"Amazon S3 generated request ID",
        "x-amz-id-2":"Amazon S3 host that processed the request"
      },
      "s3":{
        "s3SchemaVersion":"1.0",
        "configurationId":"ID found in the bucket notification configuration",
        "bucket":{
          "name":"bucket-name",
          "ownerIdentity":{
            "principalId":"Amazon-customer-ID-of-the-bucket-owner"
          },
          "arn":"bucket-ARN"
        },
        "object":{
          "key":"object-key",
          "size":object-size,
          "eTag":"object eTag",
          "versionId":"object version if bucket is versioning-enabled, otherwise null",
          "sequencer":"a string representation of a hexadecimal value used to determine event sequence, only used with PUTs and DELETEs"
        }
      }
    },
    {
      // Additional events
    }
  ]
}
Here is the link to the AWS documentation, which can guide you: http://docs.aws.amazon.com/lambda/latest/dg/with-s3-example.html
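As a rough sketch of how a handler might read this structure, the snippet below pulls the bucket name, object key and size out of each record. The helper name extract_s3_objects is made up for illustration. The key is decoded because S3 delivers object keys URL-encoded in the event, and "size" is read with .get() on the assumption that it may be missing for some event types.

```python
from urllib.parse import unquote_plus


def extract_s3_objects(event):
    """Return a (bucket, key, size) tuple for each record in an S3 event."""
    results = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Keys arrive URL-encoded (a space becomes "+"), so decode them.
        key = unquote_plus(s3_info["object"]["key"])
        # Assumption: "size" may be absent for some event types, so use .get().
        size = s3_info["object"].get("size")
        results.append((bucket, key, size))
    return results


def lambda_handler(event, context):
    # Log what was received; real processing would replace the print call.
    for bucket, key, size in extract_s3_objects(event):
        print(f"bucket={bucket} key={key} size={size}")
```

Keeping the parsing in its own function makes it easy to try out locally with a hand-written event dictionary before deploying.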

I think your easiest route is just to experiment quickly:
Create a bucket using the console.
Create a Lambda function that is triggered by puts to the bucket, using the console.
Ensure you choose the default execution role, so that CloudWatch logs are created.
The Lambda function just needs to call print(event), which is then logged.
Save an object to the bucket.
You'll then see the event structure in the log; it's pretty self-explanatory.

Please refer to this URL for the event message structure: http://docs.aws.amazon.com/AmazonS3/latest/dev/notification-content-structure.html

Related

How to create a s3 trigger programmatically to invoke a lambda function in a different lambda function itself

How can I create an S3 trigger programmatically in a Lambda function, so that another Lambda function is invoked whenever a specific file is uploaded to an S3 bucket? I tried the put_bucket_notification_configuration API of boto3, calling it from a Lambda function to create an S3 trigger for another Lambda function. The problem is that this API replaces the existing triggers with the new one instead of adding to them. Also, I can't add both a prefix and a suffix to the trigger, only one or the other. Is there any other way, or any other API, to add an S3 trigger from within a Lambda function? Here is my code:
import boto3
def lambda_handler(event, context):
    client1 = boto3.client('lambda')
    client = boto3.client(
        's3',
        aws_access_key_id='access key',
        aws_secret_access_key='secret key')
    response = client.put_bucket_notification_configuration(
        Bucket='bucketname',
        NotificationConfiguration={
            'LambdaFunctionConfigurations': [{
                'LambdaFunctionArn': 'lambdafunction_arn',
                'Events': ['s3:ObjectCreated:*'],
                'Filter': {
                    'Key': {
                        'FilterRules': [
                            {
                                'Name': 'prefix',
                                'Value': 'folder1/'
                            }
                        ]
                    }
                }
            }]
        })
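One possible way around both limitations, offered as a sketch rather than a definitive answer: read the current configuration with get_bucket_notification_configuration, append the new entry, and write the merged result back. The helper name add_lambda_trigger and its parameters are hypothetical, and the S3 client is passed in so the function can be tried against a stand-in. A prefix rule and a suffix rule can both be listed in the same FilterRules list.

```python
def add_lambda_trigger(s3_client, bucket, lambda_arn, prefix=None, suffix=None,
                       events=("s3:ObjectCreated:*",)):
    """Append one Lambda trigger to a bucket's notification configuration.

    Hypothetical helper: s3_client is expected to be boto3.client("s3").
    """
    current = s3_client.get_bucket_notification_configuration(Bucket=bucket)
    # The response carries ResponseMetadata, which the put call does not accept.
    config = {k: v for k, v in current.items() if k != "ResponseMetadata"}

    # Build the key filter; prefix and suffix may be combined in one list.
    rules = []
    if prefix is not None:
        rules.append({"Name": "prefix", "Value": prefix})
    if suffix is not None:
        rules.append({"Name": "suffix", "Value": suffix})

    new_entry = {"LambdaFunctionArn": lambda_arn, "Events": list(events)}
    if rules:
        new_entry["Filter"] = {"Key": {"FilterRules": rules}}

    # Keep the existing Lambda triggers and add the new one at the end.
    config["LambdaFunctionConfigurations"] = (
        list(config.get("LambdaFunctionConfigurations", [])) + [new_entry]
    )
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket, NotificationConfiguration=config
    )
    return config
```

A call might look like add_lambda_trigger(boto3.client("s3"), "bucketname", "lambdafunction_arn", prefix="folder1/", suffix=".csv"). S3 also has to be permitted to invoke the target function, which this sketch does not set up.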

how to restart instance group via python google cloud library

I am not able to find any code sample or relevant documentation for the Python library for Google Cloud.
I want to restart all VMs in a managed instance group via a Cloud Function.
To list instances I am using something like this:
import googleapiclient.discovery
def list_instances(compute, project, zone):
    result = compute.instances().list(project=project, zone=zone).execute()
    return result['items'] if 'items' in result else None
In the requirements file I have:
google-api-python-client==2.31.0
google-auth==2.3.3
google-auth-httplib2==0.1.0
From the command line this is possible via the SDK:
https://cloud.google.com/sdk/gcloud/reference/compute/instance-groups/managed/rolling-action/restart
gcloud compute instance-groups managed rolling-action restart NAME [--max-unavailable=MAX_UNAVAILABLE] [--region=REGION | --zone=ZONE] [GCLOUD_WIDE_FLAG …]
But I am not able to write the equivalent code in Python.
This is an incomplete answer, since I find the Python docs pretty unreadable.
Looking at the gcloud CLI code (which I couldn't find an official repo for, so I looked here),
the restart command is triggered by something called a "minimal action":
minimal_action = (client.messages.InstanceGroupManagerUpdatePolicy.
                  MinimalActionValueValuesEnum.RESTART)
In the Python docs, there are references to these fields in the applyUpdatesToInstances method.
So I think the relevant code is something similar to:
compute.instanceGroupManagers().applyUpdatesToInstances(
    project=project,
    zone=zone,
    instanceGroupManager='NAME',
    body={"allInstances": True, "minimalAction": "RESTART"},
)
There may or may not be a proper Python object for the body; the docs aren't clear.
The result seems to be an Operation object of some kind, but I don't know whether it has an execute() method.
This is confusing, because gcloud compute instance-groups managed rolling-action is syntactic sugar that does two things:
It turns on the proactive updater by setting the appropriate UpdatePolicy on the InstanceGroupManager resource.
It changes the version name on the same resource to trigger an update.
This is covered in the docs at https://cloud.google.com/compute/docs/instance-groups/rolling-out-updates-to-managed-instance-groups#performing_a_rolling_replace_or_restart
Compare the gcloud and API tabs to get the idea.
Unfortunately I am illiterate in Python, so I am not able to translate it into Python code :(.
Using the documentation that @Grzenio provided, use the patch() method to restart the instance group. See the patch documentation for its parameters.
This could be written in Python using the code below. I provided the required parameters project, zone, instanceGroupManager and body. The value of body is from the example in the documentation.
import googleapiclient.discovery
import json
project = 'your-project-id'
zone = 'us-central1-a'  # the zone of your instance group
instanceGroupManager = 'instance-group-1'  # instance group name
body = {
    "updatePolicy": {
        "minimalAction": "RESTART",
        "type": "PROACTIVE"
    },
    "versions": [{
        "instanceTemplate": "global/instanceTemplates/instance-template-1",
        "name": "v2"
    }]
}
compute = googleapiclient.discovery.build('compute', 'v1')
rolling_restart = compute.instanceGroupManagers().patch(
    project=project,
    zone=zone,
    instanceGroupManager=instanceGroupManager,
    body=body
)
restart_operation = rolling_restart.execute()  # execute the request
print(json.dumps(restart_operation, indent=2))
This returns an operation object, and the instance group should restart in a rolling fashion:
{
  "id": "3206367254887659944",
  "name": "operation-1638418246759-5d221f9977443-33811aed-eed3ee88",
  "zone": "https://www.googleapis.com/compute/v1/projects/your-project-id/zones/us-central1-a",
  "operationType": "patch",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/your-project-id/zones/us-central1-a/instanceGroupManagers/instance-group-1",
  "targetId": "810482163278776898",
  "status": "RUNNING",
  "user": "serviceaccountused@your-project-id.iam.gserviceaccount.com",
  "progress": 0,
  "insertTime": "2021-12-01T20:10:47.654-08:00",
  "startTime": "2021-12-01T20:10:47.670-08:00",
  "selfLink": "https://www.googleapis.com/compute/v1/projects/your-project-id/zones/us-central1-a/operations/operation-1638418246759-5d221f9977443-33811aed-eed3ee88",
  "kind": "compute#operation"
}
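Since the returned operation starts out as RUNNING, a caller may want to wait until it reports DONE. The following is only a sketch: the helper name wait_for_zone_operation, the polling interval and the timeout are assumptions, and the compute client is passed in as a parameter. It polls zoneOperations().get() with the operation name taken from the "name" field above.

```python
import time


def wait_for_zone_operation(compute, project, zone, operation_name,
                            poll_seconds=2.0, timeout_seconds=300.0):
    """Poll a zonal Compute Engine operation until its status is DONE.

    Hypothetical helper: compute is expected to be the client returned by
    googleapiclient.discovery.build('compute', 'v1').
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        result = compute.zoneOperations().get(
            project=project, zone=zone, operation=operation_name
        ).execute()
        if result.get("status") == "DONE":
            # A finished operation can still have failed; surface the error.
            if "error" in result:
                raise RuntimeError(result["error"])
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"operation {operation_name} is still {result.get('status')}"
            )
        time.sleep(poll_seconds)
```

It could be called as wait_for_zone_operation(compute, project, zone, restart_operation['name']). As far as I understand, DONE here only means the patch request itself has finished; the rolling restart of the VMs may continue after that.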

SQS to AWS Lambda Function with AWS Chalice and BOTO3

I am using AWS SQS to store information coming in from an external server and then sending it to a Lambda function to process it and dequeue the information.
The information that I am sending in is JSON and is used as a Python dictionary.
import json
def lambda_handler(event, context):
    for record in event['Records']:
        messageHandler(record)
    return {
        'statusCode': 200,
        'body': json.dumps('Batch Processed')
    }
Assuming that the code for messageHandler is working and properly implemented, how do I catch the messages from the queue in their batches? This is all being deployed by AWS Chalice without the use of the CLI.
I am well out of my depth right now and have no idea why this does not work when I deploy it, yet works when I trigger a normal Lambda function in the AWS Console through the SQS Send/Receive Message feature. As far as I know the triggers are set up correctly and they should have no issue.
If you have any questions please let me know.
The event that you are processing will look something like this:
{
  "Records": [
    {
      "messageId": "11d6ee51-4cc7-4302-9e22-7cd8afdaadf5",
      "receiptHandle": "AQEBBX8nesZEXmkhsmZeyIE8iQAMig7qw...",
      "body": "Test message.",
      "attributes": {
        "ApproximateReceiveCount": "1",
        "SentTimestamp": "1573251510774",
        "SequenceNumber": "18849496460467696128",
        "MessageGroupId": "1",
        "SenderId": "AIDAIO23YVJENQZJOL4VO",
        "MessageDeduplicationId": "1",
        "ApproximateFirstReceiveTimestamp": "1573251510774"
      },
      "messageAttributes": {},
      "md5OfBody": "e4e68fb7bd0e697a0ae8f1bb342846b3",
      "eventSource": "aws:sqs",
      "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:fifo.fifo",
      "awsRegion": "us-east-2"
    }
  ]
}
Here "body" is your JSON-encoded message. You'll want your message handler function to do something like this:
def message_handler(event):
    message = json.loads(event["body"])
    # do stuff...
The return value from the Lambda is pretty pointless if it is being used as the event target from SQS.
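Putting the two pieces together, a batch handler might look roughly like the sketch below. The decision to log and skip a body that is not valid JSON (such as the "Test message." sample above) is an assumption made for illustration; re-raising instead would make the invocation fail so the batch is retried.

```python
import json


def message_handler(record):
    """Decode one SQS record's body; the processing step is a placeholder."""
    message = json.loads(record["body"])
    # do stuff with the decoded dictionary...
    return message


def lambda_handler(event, context):
    processed = []
    for record in event.get("Records", []):
        try:
            processed.append(message_handler(record))
        except json.JSONDecodeError:
            # Assumption: bodies that are not JSON are logged and skipped.
            print(f"Skipping non-JSON message {record.get('messageId')}")
    return processed
```

Calling lambda_handler locally with a dictionary shaped like the event above is a quick way to check the parsing before deploying.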

get step function execution result

I'm new to AWS and struggling with step function.
My workflow is like this:
client ('search_word')-> api gateway -> lambda function (invoke step function) -> step function (generate search output) -> client
Here's my invoke lambda function.
import json
import boto3
import uuid
client = boto3.client('stepfunctions')
def lambda_handler(event, context):
    transactionId = str(uuid.uuid1())
    print(transactionId)
    input = {'TransactionId': transactionId, 'text': 'search_word'}
    response = client.start_execution(
        stateMachineArn='arn:aws:states:ap-northeast-2:xxxxxxxxxx:stateMachine:MyStateMachine',
        name=transactionId,
        input=json.dumps(input)
    )
    print(response)
    return {
        'statusCode': 200,
        'body': json.dumps('Hello from Lambda!')
    }
I want to get the execution result from step function and pass it to client. But I have no idea how to do it.
The workflow doesn't have to be what I suggested as long as I can give the execution result of step function to the client.
Here's my step function.
{
  "Comment": "A simple AWS Step Functions state machine.",
  "StartAt": "Tokenize",
  "States": {
    "Tokenize": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:xxxxxxxx:function:search_ko",
      "Next": "Search"
    },
    "Search": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:ap-northeast-2:xxxxxxx:function:BM-25-Get-Index",
      "End": true
    }
  }
}
Please help.
Thanks in advance!
You are trying to combine API Gateway, which is synchronous, with Step Functions, which is asynchronous. If you use Step Functions, your flow should be asynchronous: you will not get the response with a single API call. You will need to create another API that keeps polling the execution to see whether it has completed successfully and what the response is.
Here is the flow you can try out:
An API call starts the Step Functions execution. You can try connecting the API Gateway POST call directly to your state machine and see whether it returns the execution ARN in the response. If it does, you won't need a Lambda. If it doesn't, stick with your current Lambda and make it return the execution ARN.
Your app then gets the execution ARN and uses it to call another API periodically. Alternatively, if you think the execution will take less than about 30 seconds (the API Gateway timeout limit), the backend Lambda can keep polling for the response itself.
Your second API/Lambda then finds the result of the execution using DescribeExecution and returns the response.
Edit:
If you believe your executions will consistently finish in under 30 seconds, you can try using a single Lambda to start the execution and then keep polling it for completion using DescribeExecution.
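The polling step could be sketched as follows. The helper name wait_for_execution, the one-second interval and the 25-second budget are assumptions chosen to stay under the API Gateway limit mentioned above; the Step Functions client is passed in as a parameter. It calls describe_execution with the executionArn returned by start_execution and parses the "output" string once the status is SUCCEEDED.

```python
import json
import time


def wait_for_execution(sfn_client, execution_arn,
                       poll_seconds=1.0, timeout_seconds=25.0):
    """Poll DescribeExecution until the execution leaves RUNNING or time runs out.

    Hypothetical helper: sfn_client is expected to be
    boto3.client('stepfunctions').
    Returns (status, output); output is parsed JSON on success, otherwise None.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = sfn_client.describe_execution(executionArn=execution_arn)
        status = description["status"]
        if status == "SUCCEEDED":
            # The execution output is delivered as a JSON string.
            return status, json.loads(description["output"])
        if status != "RUNNING":
            # FAILED, TIMED_OUT, ABORTED and similar: no output to return.
            return status, None
        if time.monotonic() >= deadline:
            return "RUNNING", None
        time.sleep(poll_seconds)
```

In the single-Lambda variant, the handler would pass response['executionArn'] from start_execution to this function and put the parsed output in the response body. If the status is still RUNNING when the budget runs out, the client would need to fall back to the two-call flow.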

How to set lifecycle for bucket in Google Cloud Storage

I want to change the lifecycle of "my-bucket". I have this piece of code:
from google.cloud import storage
client = storage.Client(project='my-project')
bucket = client.get_bucket('my-bucket')
# lifecycle_rules expects a list of rule dicts
rules = [{
    "action": {"type": "Delete"},
    "condition": {
        "age": 3
    }
}]
bucket.lifecycle_rules = rules
The assignment "bucket.lifecycle_rules = rules" runs successfully and sets the lifecycle on the bucket object, but the change is not committed to the remote side.
Can anyone help me with that?
Once you change the properties, you need to submit those changes.
Try adding this line:
bucket.patch()
https://googlecloudplatform.github.io/google-cloud-python/stable/storage-buckets.html#google.cloud.storage.bucket.Bucket.patch
You can also set lifecycle delete rules on your bucket as follows:
# age is in days
bucket.add_lifecycle_delete_rule(age=175)
# .patch() is needed to push changes to gcp
bucket.patch()
Similarly you can also change the storage class:
# set storage to nearline after 3 days
bucket.add_lifecycle_set_storage_class_rule(age=3, storage_class='NEARLINE')
bucket.patch()
See also:
https://cloud.google.com/storage/docs/samples/storage-enable-bucket-lifecycle-management
