AWS Batch Operation - completion reports, no message - python

I run S3 Batch Operations from my Lambda function on a huge number of CSV files, and I want the content of the error/exception to appear in my completion reports. Only about 5% of the files fail, so the Lambda itself works fine, but it doesn't write the errors to the report.
When I test my Lambda on a file that leads to errors, I see that "ResultMessage" is the same as the error or exception. I tried adding a string with the exception to the report, but the last column is always null.
Can you help me?
    except ClientError as e:
        # If the request timed out, mark it as a temporary failure and
        # S3 Batch Operations will retry the task. If any other exception
        # is received, mark it as a permanent failure.
        errorCode = e.response['Error']['Code']
        errorMessage = e.response['Error']['Message']
        if errorCode == 'RequestTimeout':
            resultCode = 'TemporaryFailure'
            resultString = 'Retry request to Amazon S3 due to timeout.'
        else:
            resultCode = 'PermanentFailure'
            resultString = '{}: {}'.format(errorCode, errorMessage)
    except Exception as e:
        # Catch all other exceptions to permanently fail the task
        resultCode = 'PermanentFailure'
        resultString = 'Exception: {}'.format(e)
    finally:
        results.append({
            'taskId': taskId,
            'resultCode': resultCode,
            'ResultMessage': resultString
        })
    return {
        'invocationSchemaVersion': invocationSchemaVersion,
        'invocationId': invocationId,
        'results': results
    }
Example rows of my report with a failed CSV:

There's nothing obviously wrong with your code.
I checked the docs:
Response and result codes
There are two levels of codes that S3 Batch Operations expect from Lambda functions. The first is the response code for the entire request, and the second is a per-task result code. The following table contains the response codes.
Succeeded: The task completed normally. If you requested a job completion report, the task's result string is included in the report.
TemporaryFailure: The task suffered a temporary failure and will be redriven before the job completes. The result string is ignored. If this is the final redrive, the error message is included in the final report.
PermanentFailure: The task suffered a permanent failure. If you requested a job completion report, the task is marked as Failed and includes the error message string.
Sounds to me like you'd need to look into the Job Completion Report to get more details.
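For reference, the sample handler in the AWS docs returns per-task results keyed resultCode and resultString (not ResultMessage). A minimal sketch of that documented shape, with placeholder values and a made-up helper name:

def build_batch_response(invocation_schema_version, invocation_id, task_id,
                         result_code, result_string):
    # Shape taken from the AWS docs sample for invoking Lambda from S3 Batch Operations.
    return {
        'invocationSchemaVersion': invocation_schema_version,
        'treatMissingKeysAs': 'PermanentFailure',
        'invocationId': invocation_id,
        'results': [
            {
                'taskId': task_id,
                'resultCode': result_code,      # Succeeded / TemporaryFailure / PermanentFailure
                'resultString': result_string,  # this string is what ends up in the completion report
            }
        ]
    }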

Related

Google cloud DLP API: How to get full dlp job inspection results when inspecting google cloud storage files

I am running DLP inspection jobs against Google Cloud Storage, and I was wondering if there is a method or way to get the full inspection results instead of just the summary, the same way as when inspecting external files. Here is a code snippet of how I am getting my inspection results when scanning external and local files:
# Print out the results.
results = []
if response.result.findings:
    for finding in response.result.findings:
        finding_dict = {
            "quote": finding.quote if "quote" in finding else None,
            "info_type": finding.info_type.name,
            "likelihood": finding.likelihood.name,
            "location_start": finding.location.byte_range.start,
            "location_end": finding.location.byte_range.end
        }
        results.append(finding_dict)
else:
    print("No findings.")
The output looks like this:
{
    "quote": "gitlab.com",
    "info_type": "DOMAIN_NAME",
    "likelihood": "LIKELY",
    "location_start": 3015,
    "location_end": 3025
},
{
    "quote": "www.makeareadme.com",
    "info_type": "DOMAIN_NAME",
    "likelihood": "LIKELY",
    "location_start": 3107,
    "location_end": 3126
}
But when scanning Google Cloud Storage items using the dlp_get_job method with Pub/Sub, this way:
def callback(message):
    try:
        if message.attributes["DlpJobName"] == operation.name:
            # This is the message we're looking for, so acknowledge it.
            message.ack()
            # Now that the job is done, fetch the results and print them.
            job = dlp_client.get_dlp_job(request={"name": operation.name})
            if job.inspect_details.result.info_type_stats:
                for finding in job.inspect_details.result.info_type_stats:
                    print(
                        "Info type: {}; Count: {}".format(
                            finding.info_type.name, finding.count
                        )
                    )
            else:
                print("No findings.")
            # Signal to the main thread that we can exit.
            job_done.set()
        else:
            # This is not the message we're looking for.
            message.drop()
    except Exception as e:
        # Because this is executing in a thread, an exception won't be
        # noted unless we print it manually.
        print(e)
        raise
The results are in this summary format:
Info type: LOCATION; Count: 18
Info type: DATE; Count: 12
Info type: LAST_NAME; Count: 4
Info type: DOMAIN_NAME; Count: 170
Info type: URL; Count: 20
Info type: FIRST_NAME; Count: 7
Is there a way to get the detailed inspection results (quote, info_type, likelihood, etc.) when scanning files on Google Cloud Storage, without them being summarized? I have tried a couple of methods and read through most of the docs, but I am not finding anything that helps. I am running the inspection job in a Windows environment with the DLP Python client API. I would appreciate anyone's help with this ;)
Yes, you can do this. Since the detailed inspection results can be sensitive, they are not kept in the job details/summary, but you can configure a job "action" to write the detailed results to a BigQuery table that you own/control. That way you get access to the details of every finding (file or table path, column name, byte offset, optional quote, etc.).
The API details for that are here: https://cloud.google.com/dlp/docs/reference/rest/v2/Action#SaveFindings
Below are some more docs on how to query the detailed findings:
https://cloud.google.com/dlp/docs/querying-findings
https://cloud.google.com/dlp/docs/analyzing-and-reporting
Also more details on DLP Job Actions: https://cloud.google.com/dlp/docs/concepts-actions
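As a rough sketch, the save_findings action in the inspect job's configuration could look something like this (the project, dataset, and table IDs below are placeholders, not values from the question):

# Hypothetical SaveFindings action for an inspect job; replace the IDs with your own.
actions = [
    {
        "save_findings": {
            "output_config": {
                "table": {
                    "project_id": "my-project",             # placeholder
                    "dataset_id": "dlp_findings",           # placeholder
                    "table_id": "gcs_inspection_results",   # placeholder
                }
            }
        }
    }
]

Once the job finishes, each finding row written to that table carries the info type, likelihood, and location (and the quote, if include_quote was enabled in the inspect config), which you can query directly in BigQuery.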

How to change Prometheus error message if token is invalid?

I have a Python file called main.py and I am trying to make a Prometheus connection with a specific token. However, if the token is expired, instead of the error message printing out as prometheus_api_client.exceptions.PrometheusApiClientException, how can I get the error message to print out like status_code: 500, reason: Invalid token using a try and except block?
Code:
#token = "V0aksn-as9ckcnblqc6bi3ans9cj1nsk"  # example, expired token
token = "c0ams7bnskd9dk1ndk7aKNYTVOVRBajs"  # example, valid token
pc = PrometheusConnect(url=url, headers={"Authorization": "bearer {}".format(token)}, disable_ssl=True)
try:
    # Not entirely sure what to put here and the except block
except:
I've tested a couple of snippets in the try and except blocks and could not get rid of the long error from Prometheus. Any suggestions?
How about putting your pc call inside the try block and catching PrometheusApiClientException in the except block? If that doesn't work, go to the library's source file and catch whatever exception the developers used for authorization.
This is how you catch that exception in a try/except block:
import prometheus_api_client.exceptions

try:
    # Interact with Prometheus here
    pass
except prometheus_api_client.exceptions.PrometheusApiClientException as e:
    print('status_code: 500, reason: Invalid token')
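For example, a minimal sketch that wraps an actual call on pc (custom_query is one of the PrometheusConnect methods; the URL, token, and query string here are just placeholders):

import prometheus_api_client.exceptions
from prometheus_api_client import PrometheusConnect

url = "https://prometheus.example.com"      # placeholder
token = "c0ams7bnskd9dk1ndk7aKNYTVOVRBajs"  # placeholder token from the question

pc = PrometheusConnect(url=url, headers={"Authorization": "bearer {}".format(token)}, disable_ssl=True)

try:
    # Any call that actually hits the server will raise if the token is rejected.
    result = pc.custom_query(query="up")
except prometheus_api_client.exceptions.PrometheusApiClientException as e:
    print("status_code: 500, reason: Invalid token")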

boto3 lambda payload always returns NULL [python, sync invoked]

I am using a Lambda function to trigger a Step Functions state machine. It works fine in terms of triggering the state machine, but I also need the Lambda to return the state machine execution ARN (NOT the state machine ARN). I need the execution ARN because I am implementing the whole process as a GitHub Actions workflow, so I need it to check the status (running/success/failed/aborted) of the state machine.
My code for getting the Lambda return value for the GitHub Action, wrapped as a docker-compose service:
client = boto3.client("lambda", region_name="us-west-1")
lambda_response = client.invoke(
    FunctionName="my-lambda",
    InvocationType="RequestResponse",
    Payload=json.dumps({"detail-type": "gh-action"}),
)
payload = json.loads(lambda_response["Payload"].read())  # tried .decode() too
print("payload:", payload)  # payload prints None as a whole, not {"sfn_exe_arn": None}
The relevant part of my lambda function:
try:
    client = boto3.client("stepfunctions")
    response = client.start_execution(
        stateMachineArn=STATE_MACHINE_ARN,
        name=run_name,
        input=json.dumps(
            {"runName": run_name, "model_name": MODEL_NAME, "queries": QUERIES}
        ),
    )
    sfn_exe_arn = response["executionArn"]
except Exception as e:
    raise e
return {"sfn_exe_arn": sfn_exe_arn}
# this `sfn_exe_arn` prints the expected value in the console,
# but it does not come back when called from the GitHub Action
When I invoke this Lambda from the console, most of the time it returns as expected, which is {"sfn_exe_arn": sfn_exe_arn}, but sometimes it also returns null.
When I invoke this Lambda as part of the GitHub Actions workflow, the return is always null (the lambda_response comes back; it is just the payload part that is always null).
Can anyone help me understand why there is this gap? Apparently my Lambda gets the executionArn, but it just doesn't return it to client.invoke().
The entire lambda_response (it is named response in the screenshot):
You have to decode the byte stream that you get from StreamingBody.read().
Add .decode() to the bytes object you get from reading the response payload:
payload = json.loads(lambda_response["Payload"].read().decode())
My bad, I didn't thoroughly try the answers from other posts. Thanks to #jarmod for pointing out the solution in the comments: you need to assign the StreamingBody to a variable before reading it. Link: Lambda Return Payload botocore.response.StreamingBody object prints but then empty in variable
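A minimal sketch of that fix on the caller side, assuming the same invoke call as in the question (the point is to read the stream exactly once and keep the result in a variable):

payload_stream = lambda_response["Payload"]
payload_bytes = payload_stream.read()          # a second .read() on the same StreamingBody returns b''
payload = json.loads(payload_bytes.decode())   # parse the JSON the Lambda returned
print("payload:", payload)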

Constantly polling SQS Queue using infinite loop

I have an SQS queue that I need to constantly monitor for incoming messages. Once a message arrives, I do some processing and continue to wait for the next message. I achieve this by setting up an infinite loop with a 2-second pause at the end of each iteration. This works, but I can't help feeling it isn't a very efficient way to meet the need to constantly poll the queue.
Code example:
while (1):
    response = sqs.receive_message(
        QueueUrl=queue_url,
        AttributeNames=[
            'SentTimestamp'
        ],
        MaxNumberOfMessages=1,
        MessageAttributeNames=[
            'All'
        ],
        VisibilityTimeout=1,
        WaitTimeSeconds=1
    )
    try:
        message = response['Messages'][0]
        receipt_handle = message['ReceiptHandle']
        # Delete received message from queue
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle
        )
        msg = message['Body']
        msg_json = eval(msg)
        value1 = msg_json['value1']
        value2 = msg_json['value2']
        process(value1, value2)
    except:
        pass
        #print('Queue empty')
    time.sleep(2)
In order to exit the script cleanly (which should run constantly), I catch the KeyboardInterrupt which gets triggered on Ctrl+C and do some clean-up routines to exit gracefully.
if __name__ == '__main__':
    try:
        main()
    except KeyboardInterrupt:
        logout()
Is there a better way to achieve the constant polling of the SQS queue, and is the 2-second delay necessary? I'm trying not to hammer the SQS service, but perhaps it doesn't matter?
This is ultimately the way that SQS works - it requires something to poll it to get the messages. But some suggestions:
Don't get just a single message each time. Do something more like:
messages = sqs.receive_messages(
    MessageAttributeNames=['All'],
    MaxNumberOfMessages=10,
    WaitTimeSeconds=10
)
for msg in messages:
    logger.info("Received message: %s: %s", msg.message_id, msg.body)
This changes things a bit for you. The first thing is that you're willing to get up to 10 messages (this is the maximum number for SQS in one call). The second is that you will wait up to 10 seconds to get the messages. From the SQS docs:
The duration (in seconds) for which the call waits for a message to arrive in the queue before returning. If a message is available, the call returns sooner than WaitTimeSeconds. If no messages are available and the wait time expires, the call returns successfully with an empty list of messages.
So you don't need your own sleep call - if there are no messages the call will wait until it expires. Conversely, if you have a ton of messages then you'll get them all as fast as possible as you won't have your own sleep call in the code.
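Note that the snippet above uses the boto3 resource (Queue) API. Putting the same ideas together with the low-level client from the question, a rough sketch could look like this (queue_url and process() are assumed to exist as in the question, and json.loads replaces eval for parsing the body):

import json
import boto3

sqs = boto3.client('sqs')

while True:
    # Long poll: the call blocks for up to 20 seconds (the maximum) waiting
    # for messages, so no extra time.sleep() is needed between iterations.
    response = sqs.receive_message(
        QueueUrl=queue_url,           # assumed defined, as in the question
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20
    )
    for message in response.get('Messages', []):
        body = json.loads(message['Body'])          # safer than eval for JSON bodies
        process(body['value1'], body['value2'])     # process() as in the question
        # Delete only after the message has been processed successfully
        sqs.delete_message(
            QueueUrl=queue_url,
            ReceiptHandle=message['ReceiptHandle']
        )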
Adding on to #stdunbar's answer:
You will find that MaxNumberOfMessages, as stated in the docs, might return fewer messages than the provided integer number, which was the case for me.
MaxNumberOfMessages (integer) -- The maximum number of messages to return. Amazon SQS never returns more messages than this value (however, fewer messages might be returned). Valid values: 1 to 10. Default: 1.
As a result, I made this solution to read from an SQS dead-letter queue:
def read_dead_letter_queue():
    """ This function is responsible for reading query execution IDs related to the insertion that happens on Athena Query Engine
    and that we weren't able to deal with in the source queue.
    Args:
        None
    Returns:
        Dictionary: Consists of execution_ids_list, mssg_receipt_handle_list and queue_url related to messages in a dead-letter queue that's related to the insertion operation into Athena Query Engine.
    """
    try:
        sqs_client = boto3.client('sqs')
        queue_url = os.environ['DEAD_LETTER_QUEUE_URL']
        execution_ids_list = list()
        mssg_receipt_handle_list = list()
        final_dict = {}
        # You can change the range stop number to whatever suits your scenario; you just need a number
        # larger than the number of messages that may be in the queue (e.g. 1 thousand or 1 million),
        # as the loop breaks out when there aren't any messages left in the queue before reaching the end of the range.
        for mssg_counter in range(1, 20, 1):
            sqs_response = sqs_client.receive_message(
                QueueUrl=queue_url,
                MaxNumberOfMessages=10,
                WaitTimeSeconds=10
            )
            print(f"This is the dead-letter-queue response --> {sqs_response}")
            try:
                for mssg in sqs_response['Messages']:
                    print(f"This is the message body --> {mssg['Body']}")
                    print(f"This is the message ID --> {mssg['MessageId']}")
                    execution_ids_list.append(mssg['Body'])
                    mssg_receipt_handle_list.append(mssg['ReceiptHandle'])
            except:
                print("Breaking out of the loop, as there isn't any message left in the queue.")
                break
        print(f"This is the execution_ids_list contents --> {execution_ids_list}")
        print(f"This is the mssg_receipt_handle_list contents --> {mssg_receipt_handle_list}")
        # We return the ReceiptHandle to be able to delete each message after we read it, in another function that's responsible for deletion.
        # We return a dictionary of the form --> {execution_ids_list: ['query_exec_id'], mssg_receipt_handle_list: ['ReceiptHandle']}
        final_dict['execution_ids_list'] = execution_ids_list
        final_dict['mssg_receipt_handle_list'] = mssg_receipt_handle_list
        final_dict['queue_url'] = queue_url
        return final_dict
        # TODO: We need to delete the messages after we finish reading, in another function that will delete messages from both the DLQ and the source queue.
    except Exception as ex:
        print(f"read_dead_letter_queue Function Exception: {ex}")

pub_sub action from google sample code errors with missing 1 required positional argument: 'callback'

I am setting up a Google DLP scan on a BigQuery table, to look for identifiable personal information. I have been working through the Google sample code for this, but have had problems with the pub/sub element of the code.
This is for a Python Google Cloud Function calling Google DLP, using the Google sample here with the method inspect_bigquery.
...
actions = [{
    'pub_sub': {'topic': '{}/topics/{}'.format(parent, topic_id)},
    'save_findings': {
        'output_config': {
            'table': {
                'project_id': project,
                'dataset_id': dataset_id,
                'table_id': table_id + '_inspection_results',
            }
        }
    },
}]
...
subscriber = google.cloud.pubsub.SubscriberClient()
subscription_path = subscriber.subscription_path(
    project, subscription_id)
# subscription = subscriber.subscribe(subscription_path, callback)
subscription = subscriber.subscribe(subscription_path)
...
def callback(message):
    try:
        if (message.attributes['DlpJobName'] == operation.name):
            # This is the message we're looking for, so acknowledge it.
            message.ack()
            # Now that the job is done, fetch the results and print them.
            job = dlp.get_dlp_job(operation.name)
            if job.inspect_details.result.info_type_stats:
                for finding in job.inspect_details.result.info_type_stats:
                    print('Info type: {}; Count: {}'.format(
                        finding.info_type.name, finding.count))
            else:
                print('No findings.')
            # Signal to the main thread that we can exit.
            job_done.set()
        else:
            # This is not the message we're looking for.
            message.drop()
    except Exception as e:
        # Because this is executing in a thread, an exception won't be
        # noted unless we print it manually.
        print(e)
        raise
# Register the callback and wait on the event.
subscription.open(callback)
finished = job_done.wait(timeout=timeout)
if not finished:
    print('No event received before the timeout. Please verify that the '
          'subscription provided is subscribed to the topic provided.')
There are two errors I get with this. When I call the subscribe method with just the subscription path, it errors with TypeError: subscribe() missing 1 required positional argument: 'callback'.
When I put the callback into the subscribe method, it fails with
Function execution took 60002 ms, finished with status: 'timeout'
No event received before the timeout. Please verify that the subscription provided is subscribed to the topic provided.
The save_findings action does work, however, and I am able to see the results in BigQuery after a couple of seconds.
Thanks
A couple of things:
1. Just so you know, you can leave table_id blank if you don't want to be in the business of generating table names.
But to your actual question:
Are you running this within Cloud Functions by chance, which has execution deadlines? (https://cloud.google.com/functions/docs/concepts/exec#timeout)
If yes, you actually want to have a Cloud Function subscribe to the pub/sub topic via a trigger (https://cloud.google.com/functions/docs/calling/pubsub), not inside your code, to avoid the timeouts. There is a specific DLP solution guide on that here: https://cloud.google.com/solutions/automating-classification-of-data-uploaded-to-cloud-storage#create_pubsub_topic_and_subscription
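As a rough sketch (assuming a 1st-gen Python Cloud Function deployed with a Pub/Sub trigger on the topic from your pub_sub action; the function name here is made up), the trigger-based version could look something like this:

from google.cloud import dlp_v2

def handle_dlp_job_done(event, context):
    # DLP publishes the finished job's name as a message attribute.
    job_name = (event.get('attributes') or {}).get('DlpJobName')
    if not job_name:
        return
    dlp = dlp_v2.DlpServiceClient()
    job = dlp.get_dlp_job(request={'name': job_name})
    if job.inspect_details.result.info_type_stats:
        for stat in job.inspect_details.result.info_type_stats:
            print('Info type: {}; Count: {}'.format(stat.info_type.name, stat.count))
    else:
        print('No findings.')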
Helpful at all?
