I'm currently using AWS Lambda to create snapshots of my database and delete snapshots older than 6 days, using the Boto3 library to interface with the AWS API. A CloudWatch rule triggers the deletion code once a day.
Normally this is working fine, but I've come across an issue where at the start of the month (first 6 days) the delete script does not appear to delete any snapshots, even though snapshots older than 6 days exist.
The code is below:
import json
import boto3
from datetime import datetime, timedelta, tzinfo

class Zone(tzinfo):
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name
    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)
    def dst(self, dt):
        return timedelta(hours=1) if self.isdst else timedelta(0)
    def tzname(self, dt):
        return self.name

UTC = Zone(10, False, 'UTC')

# Setting retention period of 6 days
retentionDate = datetime.now(UTC) - timedelta(days=6)

def lambda_handler(event, context):
    print("Connecting to RDS")
    rds = boto3.setup_default_session(region_name='ap-southeast-2')
    client = boto3.client('rds')
    snapshots = client.describe_db_snapshots(SnapshotType='manual')
    print('Deleting all DB Snapshots older than %s' % retentionDate)
    for i in snapshots['DBSnapshots']:
        if i['SnapshotCreateTime'] < retentionDate:
            print('Deleting snapshot %s' % i['DBSnapshotIdentifier'])
            client.delete_db_snapshot(DBSnapshotIdentifier=i['DBSnapshotIdentifier'])
The code looks perfectly fine, and you are following the documentation.
I would simply add
print(i['SnapshotCreateTime'], retentionDate)
in the for loop; the logs will quickly tell you what's going on at the beginning of every month.
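For context, a minimal sketch of where that debug line would sit, reusing the loop and names from the question's snippet:

for i in snapshots['DBSnapshots']:
    # Log both timestamps so the comparison can be inspected in CloudWatch Logs
    print(i['SnapshotCreateTime'], retentionDate)
    if i['SnapshotCreateTime'] < retentionDate:
        print('Deleting snapshot %s' % i['DBSnapshotIdentifier'])
        client.delete_db_snapshot(DBSnapshotIdentifier=i['DBSnapshotIdentifier'])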
By the way, are you using RDS from AWS? RDS supports automatic snapshot creation, and you can also define a retention period, so there is no need for custom Lambda scripts.
Due to the distributed nature of CloudWatch Events and the target services, the delay between the time the scheduled rule is triggered and the time the target service honors the execution of the target resource might be several seconds. Your scheduled rule will be triggered within that minute, but not on the precise 0th second.
In that case, the UTC "now" computed during execution may drift by a few seconds, so the retention date may also be off by a few seconds. The drift should be minimal, but there is still a small chance of a missed deletion. That said, the subsequent run should delete anything missed in the earlier run.
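As an illustration only (not part of this answer), one way to make the cutoff immune to that second-level jitter is to anchor it to a fixed point in the day, e.g. midnight UTC, so every run on a given day compares against the same instant:

from datetime import datetime, timedelta, timezone

def retention_cutoff(days=6):
    # Truncate "now" to midnight UTC so second-level trigger jitter cannot
    # move the comparison point between runs on the same day (sketch only).
    midnight = datetime.now(timezone.utc).replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=days)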
Related
I am making an API call to a private cloud to GET the alarm information and POST it to a ticketing tool. I already have my code written in Python, and I want this code to run whenever a new alarm comes into the system. For example, if I receive an alarm at time T1, the code should run and get the new alarm. Once a new alarm is received at time T2, it should only pull the new alarm. The idea is to create tickets whenever a new alarm comes into the system: whenever T2 is greater than T1, the code should run and pull the newest alarm. Here is my function to query the time (CreationTime is the time of alarm creation):
def query_res():
    search_period = datetime.now() - timedelta(days=1)
    date_str = f"{search_period.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3]}Z"
    query_filter = f"Severity eq Critical and CreationTime gt {date_str}"
    query_select = "CreationTime,Description,Name,OrigSeverity,AffectedMoDisplayName"
    alarm_query = api_instance.get_cond_alarm_list(filter=query_filter, select=query_select)
    alarm_res = alarm_query.results
    return alarm_res
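As an illustrative sketch only, one way to pull just the alarms newer than the previous poll is to keep a watermark timestamp and advance it on each run, building the filter the same way query_res does above. The polling loop, the interval, and the create_ticket helper are assumptions, not part of the question:

import time
from datetime import datetime, timedelta

def poll_new_alarms(api_instance, interval_seconds=60):
    # Watermark: only alarms created after this instant are fetched.
    last_seen = datetime.utcnow() - timedelta(days=1)
    while True:
        poll_started = datetime.utcnow()
        date_str = f"{last_seen.strftime('%Y-%m-%dT%H:%M:%S.%f')[:-3]}Z"
        query_filter = f"Severity eq Critical and CreationTime gt {date_str}"
        results = api_instance.get_cond_alarm_list(filter=query_filter).results
        for alarm in results:
            create_ticket(alarm)  # hypothetical call into the ticketing tool
        # Advance the watermark so the next poll only sees newer alarms.
        last_seen = poll_started
        time.sleep(interval_seconds)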
After I send a query to be executed in BigQuery, how can I find the execution time and other statistics about this job?
The Python API for BigQuery has a suggestively named field, timeline, but it is barely mentioned in the documentation, and in my case it returns an empty iterable.
So far, I have been running this Python code with the BigQuery Python API.
from google.cloud import bigquery

bq_client = bigquery.Client()
job = bq_client.query(some_sql_query, location="US")
ts = list(job.timeline)
if len(ts) > 0:
    for t in ts:
        print(t)
else:
    print('No timeline results ???')  # <-- no timeline results.
When your job is "DONE", the BigQuery API returns a JobStatistics object (described here) that is accessible in Python through the attributes of your job object.
The available attributes are listed here.
To get the time taken by your job, you mainly have the job.created, job.started and job.ended attributes.
To come back to your code snippet, you can try something like this:
from time import sleep
from google.cloud import bigquery

bq_client = bigquery.Client()
job = bq_client.query(some_sql_query, location="US")
while job.running():
    print("Running..")
    sleep(0.1)
print("The job duration was {}".format(job.ended - job.started))
In case you need job logs available later, ready for analysis, I'd suggest creating a log sink for your BigQuery jobs. It creates a table in BigQuery where job logs are dumped; after that you can easily run analysis to determine how long jobs took and how much they cost.
First create a sink in Operations/Logging:
Make sure to set the sink destination to a BigQuery dataset (pick one) and include query_job_completed logs in the sink. After a while you will see a new table in that dataset, similar to: {project}.{dataset}.cloudaudit_googleapis_com_data_access_{date}
Go to BigQuery and create a view. This view, for example, will show how much data each job consumed (and how much it cost):
SELECT
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes AS totalBilledBytes,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.endTime AS endTime,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.query,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.destinationTable.datasetId AS targetDataSet,
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobConfiguration.query.destinationTable.tableId AS targetTable,
  protopayload_auditlog.authenticationInfo.principalEmail,
  (protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes / 1000000000000.0) * 5.0 AS billed
FROM `{project}.{dataset}.cloudaudit_googleapis_com_data_access_*`
WHERE protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes > 0
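To get execution times from the same audit table, here is a hedged sketch run through the Python client; the field paths (including jobName.jobId and startTime) follow the v1 audit-log schema used in the view above and should be verified against your table:

from google.cloud import bigquery

bq_client = bigquery.Client()
# Compute per-job duration from the audit-log table created by the sink.
# Replace {project}.{dataset} with your sink's dataset, as in the view above.
duration_sql = """
SELECT
  protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobName.jobId AS jobId,
  TIMESTAMP_DIFF(
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.endTime,
    protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.startTime,
    SECOND) AS duration_seconds
FROM `{project}.{dataset}.cloudaudit_googleapis_com_data_access_*`
WHERE protopayload_auditlog.servicedata_v1_bigquery.jobCompletedEvent.job.jobStatistics.totalBilledBytes > 0
"""
for row in bq_client.query(duration_sql).result():
    print(row.jobId, row.duration_seconds)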
I'm trying to use the Azure blob service to upload video files to the cloud.
I'm trying to figure out what happens if my internet were to suddenly go out in the middle of a transfer.
There seem to be no exceptions thrown when the internet goes out.
from azure.common import AzureException
from azure.storage.blob import AppendBlobService, BlockBlobService, ContentSettings

try:
    self.append_blob_service.append_blob_from_path(self.container_name, blob_name, upload_queue.get(timeout=3))
except AzureException as ae:
    print("hey i caught something")  # <-- this line never seems to run
If I turn the internet back on, the blob seems to upload itself after about 30 minutes. I can't find any information about this in the docs. How long does the append_blob_from_path function keep trying?
There are LinearRetry, ExponentialRetry, NoRetry and custom retry policies.
The default is LinearRetry, which makes a maximum of 5 attempts, 5 seconds apart. So if your net connection was down for less than 25 seconds, your upload will continue.
I am not sure your internet connection was really down for 30 minutes; in that case it should have thrown an exception.
PS: You can look up the corresponding C# documentation for Retry policies.
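If the default window is too short for your use case, the retry policy can be replaced on the service object. A hedged sketch; the import path varies between azure-storage SDK versions, and the account credentials are placeholders:

from azure.storage.blob import AppendBlobService
from azure.storage.common.retry import LinearRetry  # path differs in older SDK versions

append_blob_service = AppendBlobService(account_name='myaccount', account_key='mykey')
# Retry up to 10 times, 5 seconds apart, instead of the default policy.
append_blob_service.retry = LinearRetry(backoff=5, max_attempts=10).retry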
The Python SDK for Azure Storage is open source: https://github.com/Azure/azure-storage-python
If we look at the calls made by append_blob_from_path(), we can see the following things:
There is a default socket timeout:
# Socket timeout in seconds
DEFAULT_SOCKET_TIMEOUT = 20
In the end it uses functions from StorageClient (AppendBlobService(BaseBlobService) -> BaseBlobService(StorageClient)), and StorageClient uses:
self.retry = ExponentialRetry().retry
ExponentialRetry has the following constructor:
def __init__(self, initial_backoff=15, increment_base=3, max_attempts=3,
             retry_to_secondary=False, random_jitter_range=3):
    '''
    Constructs an Exponential retry object. The initial_backoff is used for
    the first retry. Subsequent retries are retried after initial_backoff +
    increment_power^retry_count seconds. For example, by default the first retry
    occurs after 15 seconds, the second after (15+3^1) = 18 seconds, and the
    third after (15+3^2) = 24 seconds.

    :param int initial_backoff:
        The initial backoff interval, in seconds, for the first retry.
    :param int increment_base:
        The base, in seconds, to increment the initial_backoff by after the
        first retry.
    :param int max_attempts:
        The maximum number of retry attempts.
    :param bool retry_to_secondary:
        Whether the request should be retried to secondary, if able. This should
        only be enabled if RA-GRS accounts are used and potentially stale data
        can be handled.
    :param int random_jitter_range:
        A number in seconds which indicates a range to jitter/randomize for the back-off interval.
        For example, a random_jitter_range of 3 results in the back-off interval x varying between x+3 and x-3.
    '''
There is also a RetryContext, which is used by the _retry() function to decide whether a retry is needed.
If you enable INFO-level logging in your code, you will see all retries:
import logging

# Basic configuration: configure the root logger, including 'azure.storage'
logging.basicConfig(format='%(asctime)s %(name)-20s %(levelname)-5s %(message)s', level=logging.INFO)
To summarize:
You have a 20-second socket timeout plus a dynamic retry interval that starts at 15 seconds and is randomly jittered on each attempt, with 3 attempts in total. You can see exactly what is happening once you enable INFO-level logging.
I'm working on a Django application which reads a CSV file from Dropbox, parses the data and stores it in a database. For this purpose I need a background task which checks whether the file has been modified or updated and then updates the database.
I've tried Celery but failed to configure it with Django. Then I found django-background-tasks, which is much simpler to configure than Celery.
My question here is: how do I initialize repeating tasks?
It is described in the documentation, but I'm unable to find any example which explains how to use repeat, repeat_until or the other constants mentioned there.
Can anyone explain the following with examples, please?
notify_user(user.id, repeat=<number of seconds>, repeat_until=<datetime or None>)
repeat is given in seconds. The following constants are provided:
Task.NEVER (default), Task.HOURLY, Task.DAILY, Task.WEEKLY,
Task.EVERY_2_WEEKS, Task.EVERY_4_WEEKS.
You have to call the particular function (notify_user()) when you actually need to execute it.
Suppose you need to execute the task when a request comes to the server; then it would look like this:
from background_task import background

@background(schedule=60)
def get_csv(creds):
    # read csv from Dropbox with credentials "creds"
    # then update the DB
    ...

def myview(request):
    # do something with my view
    get_csv(creds, repeat=100)
    return SomeHttpResponse
Execution procedure
1. The request comes to the URL and is dispatched to the corresponding view, here myview().
2. The line get_csv(creds, repeat=100) executes and creates an async task in the DB (it won't execute the function now).
3. The HTTP response is returned to the user.
60 seconds after the task was created, get_csv(creds) will execute, and it will then repeat every 100 seconds.
For example, suppose you have this function from the documentation:

from background_task import background

@background(schedule=60)
def notify_user(user_id):
    # look up the user by id and send them a message
    user = User.objects.get(pk=user_id)
    user.email_user('Here is a notification', 'You have been notified')
Suppose you want to repeat this task daily until New Year's Day 2019; you would do the following:
import datetime
from background_task.models import Task

new_years_2019 = datetime.datetime(2019, 1, 1)
notify_user(some_id, repeat=Task.DAILY, repeat_until=new_years_2019)
Python 2.7
Boto3
I'm trying to get a timestamp of when the instance was stopped, OR the time the last state transition took place, OR a duration of how long the instance has been in its current state.
My goal is to test if an instance has been stopped for x hours.
For example,
import boto3

ec2 = boto3.resource('ec2')
instance = ec2.Instance('myinstanceID')
if int(instance.state['Code']) == 80:  # 80 = stopped
    stop_time = instance.state_change_time()  # Dummy method.
Or something similar to that.
I see that boto3 has a launch_time attribute, and lots of ways to analyze state changes using state_transition_reason and state_reason, but I'm not seeing anything regarding a state-transition timestamp.
I've got to be missing something.
Here are the Boto3 docs for the Instance "state" attributes...
state
(dict) --
The current state of the instance.
Code (integer) --
The low byte represents the state. The high byte is an opaque internal value and should be ignored.
0 : pending
16 : running
32 : shutting-down
48 : terminated
64 : stopping
80 : stopped
Name (string) --
The current state of the instance.
state_reason
(dict) --
The reason for the most recent state transition.
Code (string) --
The reason code for the state change.
Message (string) --
The message for the state change.
Server.SpotInstanceTermination : A Spot instance was terminated due to an increase in the market price.
Server.InternalError : An internal error occurred during instance launch, resulting in termination.
Server.InsufficientInstanceCapacity : There was insufficient instance capacity to satisfy the launch request.
Client.InternalError : A client error caused the instance to terminate on launch.
Client.InstanceInitiatedShutdown : The instance was shut down using the shutdown -h command from the instance.
Client.UserInitiatedShutdown : The instance was shut down using the Amazon EC2 API.
Client.VolumeLimitExceeded : The limit on the number of EBS volumes or total storage was exceeded. Decrease usage or request an increase in your limits.
Client.InvalidSnapshot.NotFound : The specified snapshot was not found.
state_transition_reason
(string) --
The reason for the most recent state transition. This might be an empty string.
The EC2 instance has an attribute StateTransitionReason, which also contains the time the transition happened. Use Boto3 to get the time the instance was stopped.
print status['StateTransitionReason']
...
User initiated (2016-06-23 23:39:15 GMT)
The code below prints the stopped time and the current time. Use Python to parse the times and find the difference; not very difficult if you know Python.
import boto3
import re

client = boto3.client('ec2')
rsp = client.describe_instances(InstanceIds=['i-03ad1f27'])
if rsp:
    status = rsp['Reservations'][0]['Instances'][0]
    if status['State']['Name'] == 'stopped':
        stopped_reason = status['StateTransitionReason']
        current_time = rsp['ResponseMetadata']['HTTPHeaders']['date']
        stopped_time = re.findall(r'.*\((.*)\)', stopped_reason)[0]
        print 'Stopped time:', stopped_time
        print 'Current time:', current_time
Output
Stopped time: 2016-06-23 23:39:15 GMT
Current time: Tue, 20 Dec 2016 20:33:22 GMT
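To finish the job from the question (testing whether the instance has been stopped for x hours), here is a minimal sketch that parses the two strings printed above; the format strings are assumptions based on that sample output:

from datetime import datetime

# Formats assumed from the sample output above.
stopped_dt = datetime.strptime(stopped_time, '%Y-%m-%d %H:%M:%S %Z')
current_dt = datetime.strptime(current_time, '%a, %d %b %Y %H:%M:%S %Z')

hours_stopped = (current_dt - stopped_dt).total_seconds() / 3600.0
if hours_stopped > 4:  # x = 4 hours, for example
    print 'Instance has been stopped for %.1f hours' % hours_stopped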
You might consider using AWS Config to view the configuration history of the instances.
AWS Config is a fully managed service that provides you with an AWS resource inventory, configuration history, and configuration change notifications to enable security and governance
The get-resource-config-history command can return information about an instance, so it probably has Stop & Start times. It will take a bit of parsing to extract the details.
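A hedged sketch of what that call looks like through Boto3 (the instance ID is reused from the snippet above; exactly what appears in the configuration history is for you to inspect):

import boto3

config = boto3.client('config')
history = config.get_resource_config_history(
    resourceType='AWS::EC2::Instance',
    resourceId='i-03ad1f27',
)
for item in history['configurationItems']:
    # Each configuration item carries a capture time and the recorded status.
    print item['configurationItemCaptureTime'], item['configurationItemStatus']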