I'm trying to use the Azure blob service to upload video files to the cloud.
I'm trying to figure out what happens if my internet connection were to suddenly go out in the middle of a transfer.
No exception seems to be thrown when the connection goes out.
from azure.common import AzureException
from azure.storage.blob import AppendBlobService, BlockBlobService, ContentSettings

try:
    self.append_blob_service.append_blob_from_path(self.container_name, blob_name, upload_queue.get(timeout=3))
except AzureException as ae:
    print("hey i caught something")  # <-- this line never seems to run
If I turn the internet back on, the blob seems to upload itself after about 30 minutes. I can't find any information about this in the docs. How long does the append_blob_from_path function keep trying?
There is LinearRetry, ExponentialRetry, NoRetry and Custom Retry Policy.
The default is Linear, which makes a maximum of 5 attempts, 5 seconds apart. So if your connection was down for less than 25 seconds, your upload will continue.
I am not sure whether your internet connection was down for 30 minutes. In that case it should have thrown an exception.
PS: You can look up the corresponding C# documentation for Retry policies.
The Python SDK for Azure Storage is open source: https://github.com/Azure/azure-storage-python
If we look at the calls made from append_blob_from_path(), we can see the following:
There is a default socket timeout:
# Socket timeout in seconds
DEFAULT_SOCKET_TIMEOUT = 20
In the end it uses functions from StorageClient (AppendBlobService(BaseBlobService) -> BaseBlobService(StorageClient)), and StorageClient uses:
self.retry = ExponentialRetry().retry
ExponentialRetry has the following constructor:
def __init__(self, initial_backoff=15, increment_base=3, max_attempts=3,
             retry_to_secondary=False, random_jitter_range=3):
    '''
    Constructs an Exponential retry object. The initial_backoff is used for
    the first retry. Subsequent retries are retried after initial_backoff +
    increment_power^retry_count seconds. For example, by default the first retry
    occurs after 15 seconds, the second after (15+3^1) = 18 seconds, and the
    third after (15+3^2) = 24 seconds.

    :param int initial_backoff:
        The initial backoff interval, in seconds, for the first retry.
    :param int increment_base:
        The base, in seconds, to increment the initial_backoff by after the
        first retry.
    :param int max_attempts:
        The maximum number of retry attempts.
    :param bool retry_to_secondary:
        Whether the request should be retried to secondary, if able. This should
        only be enabled of RA-GRS accounts are used and potentially stale data
        can be handled.
    :param int random_jitter_range:
        A number in seconds which indicates a range to jitter/randomize for the back-off interval.
        For example, a random_jitter_range of 3 results in the back-off interval x to vary between x+3 and x-3.
    '''
Also, there is a RetryContext, which is used by the _retry() function to decide whether a retry is needed.
If you enable INFO-level logging in your code, you will see all retries:
import logging

# Basic configuration: configure the root logger, including 'azure.storage'
logging.basicConfig(format='%(asctime)s %(name)-20s %(levelname)-5s %(message)s', level=logging.INFO)
To sum up:
You have 20 seconds of socket timeout plus a dynamic interval that starts at 15 seconds and is randomly incremented on each attempt, and you have 3 attempts. You can see exactly what is happening when you enable INFO-level logging.
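As a rough illustration of the arithmetic in the ExponentialRetry docstring quoted above, the nominal back-off intervals can be reproduced in plain Python. This is only a sketch of the documented formula, not the SDK's own code: the helper names nominal_backoffs and total_wait_range are made up here, and the totals ignore the socket timeout and any time spent inside each request.

```python
def nominal_backoffs(initial_backoff=15, increment_base=3, max_attempts=3):
    """Back-off before each retry, following the docstring's worked example."""
    delays = []
    for retry_count in range(max_attempts):
        # The first retry uses initial_backoff alone; later retries add
        # increment_base ** retry_count (15, 15 + 3^1, 15 + 3^2 by default).
        increment = 0 if retry_count == 0 else increment_base ** retry_count
        delays.append(initial_backoff + increment)
    return delays


def total_wait_range(delays, random_jitter_range=3):
    """Smallest and largest total back-off once the jitter is applied."""
    low = sum(max(0, d - random_jitter_range) for d in delays)
    high = sum(d + random_jitter_range for d in delays)
    return low, high


delays = nominal_backoffs()
print(delays)                    # [15, 18, 24]
print(total_wait_range(delays))  # (48, 66)
```

With the defaults, the back-off alone adds up to roughly one minute, which is far short of 30 minutes.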
I have a Python library which must be fast enough for an online application. If a particular request (function call) takes too long, I want to just bypass this request and return an empty result.
The function looks like the following:
def fast_function(text):
    result = mylibrary.process(text)
    ...
If mylibrary.process takes longer than a threshold, e.g. 100 milliseconds, I want to bypass this request and proceed to process the next 'text'.
What's the normal way to handle this? Is this a normal scenario? My application can afford to bypass a very small number of requests like this, if it takes too long.
One way is to use a signal timer. As an example:
import signal

def took_too_long(signum, frame):
    # Signal handlers are called with the signal number and the current stack frame
    raise TimeoutError

signal.signal(signal.SIGALRM, took_too_long)
signal.setitimer(signal.ITIMER_REAL, 0.1)  # 0.1 seconds
try:
    result = mylibrary.process(text)
    signal.setitimer(signal.ITIMER_REAL, 0)  # success, reset to 0 to disable the timer
except TimeoutError:
    # took too long, do something
    pass
You'll have to experiment to see if this does or does not add too much overhead.
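For reference, the same signal-timer idea can be wrapped in a small helper that returns a default value when the time budget is exceeded. This is a minimal sketch under a few assumptions: call_with_budget, TookTooLong, slow_process and quick_process are names invented for this example (slow_process stands in for mylibrary.process), SIGALRM is only available on Unix, and signal handlers can only be installed from the main thread.

```python
import signal
import time


class TookTooLong(Exception):
    """Raised by the alarm handler when the time budget is exceeded."""


def _on_alarm(signum, frame):
    raise TookTooLong()


def call_with_budget(func, arg, budget_sec=0.1, default=None):
    """Run func(arg); return default if it exceeds budget_sec seconds."""
    previous = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, budget_sec)
    try:
        return func(arg)
    except TookTooLong:
        return default
    finally:
        # Always disarm the timer and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous)


def slow_process(text):
    # Stand-in for a call that takes too long.
    time.sleep(0.5)
    return text.upper()


def quick_process(text):
    # Stand-in for a call that finishes well within the budget.
    return text.upper()


print(call_with_budget(slow_process, "abc", budget_sec=0.05, default=""))
print(call_with_budget(quick_process, "abc", budget_sec=0.5, default=""))
```

Disarming the timer in a finally block means a stray alarm is unlikely to fire later in unrelated code, whether the call succeeded, timed out, or raised something else.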
You can add a timeout to your function.
One way to implement it is to use a timeout decorator which will throw an exception if the function runs for more than the defined timeout. In order to pass to the next operation you can catch the exception thrown by the timeout.
Install this one for example: pip install timeout-decorator
import timeout_decorator

@timeout_decorator.timeout(5)  # timeout of 5 seconds
def fast_function(text):
    result = mylibrary.process(text)
I have the following function,
import requests

def get_url_type(data):
    x = {}
    for i in range(0, len(data)):
        print(i)
        try:
            x[i] = requests.head(data['url'][i]).headers.get('content-type')
        except:
            x[i] = 'Not Available'
    return(x)
This function returns the content type of each URL passed to it. Whenever there is no response, it raises an error, which is caught by the except clause. My problem is that some of the requests take more than 5-10 minutes, which is too long for a production environment. I want the function to return "Not Available" when a request takes more than 5 minutes. When I researched this, the suggestion was to convert the function to an asynchronous one. I have been trying to change it, without much success.
The following is what I have tried,
import asyncio
import time
from datetime import datetime

async def custom_sleep():
    print('SLEEP', datetime.now())
    time.sleep(5)
My objective is, whenever the request function takes more than 5 mins, it should return "Not available" and move to the next iteration.
Can anybody help me in doing this?
Thanks in advance !
It seems you just want a request to time out after a given time has passed without reply and move on to the next request. For this functionality there is a timeout parameter you can add to your request. The documentation on this: http://docs.python-requests.org/en/master/user/quickstart/#timeouts.
With a 300 seconds (5 minutes) timeout your code becomes:
requests.head(data['url'][i], timeout=300)
The asynchronous functionality you mention actually has a different objective: it would let your code continue execution without waiting out the 5 minutes at all, but I believe that would be a different question.
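Putting the timeout parameter into the loop from the question might look roughly like the sketch below. The function name get_url_types and its head parameter are assumptions made for this example: head exists only so the loop can be tried without network access, and it defaults to requests.head. Note also that the requests timeout limits how long the client waits for the server to connect or to send data; it is not a hard cap on the total duration of a request.

```python
def get_url_types(urls, timeout_sec=300, head=None):
    """Map each URL's index to its content type, or 'Not Available' on failure."""
    if head is None:
        import requests  # imported lazily so the sketch can be exercised offline
        head = requests.head

    results = {}
    for i, url in enumerate(urls):
        try:
            response = head(url, timeout=timeout_sec)
            results[i] = response.headers.get('content-type')
        except Exception:
            # Timeouts, connection errors and malformed URLs all end up here.
            results[i] = 'Not Available'
    return results
```

For example, get_url_types(list(data['url'])) would keep the 5-minute limit from the question, while a smaller timeout_sec would make slow hosts fail faster.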
Python 2.7
Boto3
I'm trying to get a timestamp of when the instance was stopped OR the time the last state transition took place OR a duration of how long the instance has been in the current state.
My goal is to test if an instance has been stopped for x hours.
For example,
instance = ec2.Instance('myinstanceID')
if int(instance.state['Code']) == 80:
    stop_time = instance.state_change_time()  # Dummy method.
Or something similar to that.
I see that boto3 has a launch_time attribute, and lots of ways to analyze state changes using state_transition_reason and state_reason, but I'm not seeing anything regarding the state transition timestamp.
I've got to be missing something.
Here are the Boto3 docs for the Instance "state" attributes:
state
    (dict) --
    The current state of the instance.
    Code (integer) --
        The low byte represents the state. The high byte is an opaque internal value and should be ignored.
        0 : pending
        16 : running
        32 : shutting-down
        48 : terminated
        64 : stopping
        80 : stopped
    Name (string) --
        The current state of the instance.
state_reason
    (dict) --
    The reason for the most recent state transition.
    Code (string) --
        The reason code for the state change.
    Message (string) --
        The message for the state change.
        Server.SpotInstanceTermination : A Spot instance was terminated due to an increase in the market price.
        Server.InternalError : An internal error occurred during instance launch, resulting in termination.
        Server.InsufficientInstanceCapacity : There was insufficient instance capacity to satisfy the launch request.
        Client.InternalError : A client error caused the instance to terminate on launch.
        Client.InstanceInitiatedShutdown : The instance was shut down using the shutdown -h command from the instance.
        Client.UserInitiatedShutdown : The instance was shut down using the Amazon EC2 API.
        Client.VolumeLimitExceeded : The limit on the number of EBS volumes or total storage was exceeded. Decrease usage or request an increase in your limits.
        Client.InvalidSnapshot.NotFound : The specified snapshot was not found.
state_transition_reason
    (string) --
    The reason for the most recent state transition. This might be an empty string.
The EC2 instance has an attribute StateTransitionReason which also has the time the transition happened. Use Boto3 to get the time the instance was stopped.
print status['StateTransitionReason']
...
User initiated (2016-06-23 23:39:15 GMT)
The code below prints stopped time and current time. Use Python to parse the time and find the difference. Not very difficult if you know Python.
import boto3
import re

client = boto3.client('ec2')
rsp = client.describe_instances(InstanceIds=['i-03ad1f27'])
if rsp:
    status = rsp['Reservations'][0]['Instances'][0]
    if status['State']['Name'] == 'stopped':
        stopped_reason = status['StateTransitionReason']
        current_time = rsp['ResponseMetadata']['HTTPHeaders']['date']
        # Raw string so the escaped parentheses reach the regex engine unchanged
        stopped_time = re.findall(r'.*\((.*)\)', stopped_reason)[0]
        print 'Stopped time:', stopped_time
        print 'Current time:', current_time
Output
Stopped time: 2016-06-23 23:39:15 GMT
Current time: Tue, 20 Dec 2016 20:33:22 GMT
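The remaining step, turning the two strings into a duration, could be sketched as follows. It assumes the timestamps always have the two formats shown in the output above (both in GMT) and uses the Python 3 standard library only; the helper names parse_stopped_time, parse_http_date and stopped_for_at_least are made up for this example.

```python
import re
from datetime import datetime, timedelta


def parse_stopped_time(state_transition_reason):
    """Extract the timestamp from e.g. 'User initiated (2016-06-23 23:39:15 GMT)'."""
    match = re.search(r'\((\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) GMT\)',
                      state_transition_reason)
    if match is None:
        return None  # the reason string may be empty or lack a timestamp
    return datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')


def parse_http_date(value):
    """Parse an HTTP 'date' header such as 'Tue, 20 Dec 2016 20:33:22 GMT'."""
    return datetime.strptime(value, '%a, %d %b %Y %H:%M:%S GMT')


def stopped_for_at_least(state_transition_reason, http_date, hours):
    """True if the instance has been stopped for at least the given hours."""
    stopped_at = parse_stopped_time(state_transition_reason)
    if stopped_at is None:
        return False
    return parse_http_date(http_date) - stopped_at >= timedelta(hours=hours)


reason = 'User initiated (2016-06-23 23:39:15 GMT)'
now = 'Tue, 20 Dec 2016 20:33:22 GMT'
print(parse_http_date(now) - parse_stopped_time(reason))
print(stopped_for_at_least(reason, now, hours=24))
```

Both values are treated as naive UTC datetimes, which is enough for a difference as long as both sources really report GMT.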
You might consider using AWS Config to view the configuration history of the instances.
AWS Config is a fully managed service that provides you with an AWS resource inventory, configuration history, and configuration change notifications to enable security and governance
The get-resource-config-history command can return information about an instance, so it probably has Stop & Start times. It will take a bit of parsing to extract the details.
I'm currently utilising AWS Lambda to create snapshots of my database and delete snapshots older than 6 days. I'm using the Boto3 library to interface with the AWS API. I'm using a CloudWatch rule to trigger the deletion code every day.
Normally this is working fine, but I've come across an issue where at the start of the month (first 6 days) the delete script does not appear to delete any snapshots, even though snapshots older than 6 days exist.
The code is below:
import json
import boto3
from datetime import datetime, timedelta, tzinfo

class Zone(tzinfo):
    def __init__(self, offset, isdst, name):
        self.offset = offset
        self.isdst = isdst
        self.name = name

    def utcoffset(self, dt):
        return timedelta(hours=self.offset) + self.dst(dt)

    def dst(self, dt):
        return timedelta(hours=1) if self.isdst else timedelta(0)

    def tzname(self, dt):
        return self.name

UTC = Zone(10, False, 'UTC')

# Setting retention period of 6 days
retentionDate = datetime.now(UTC) - timedelta(days=6)

def lambda_handler(event, context):
    print("Connecting to RDS")
    rds = boto3.setup_default_session(region_name='ap-southeast-2')
    client = boto3.client('rds')
    snapshots = client.describe_db_snapshots(SnapshotType='manual')
    print('Deleting all DB Snapshots older than %s' % retentionDate)
    for i in snapshots['DBSnapshots']:
        if i['SnapshotCreateTime'] < retentionDate:
            print('Deleting snapshot %s' % i['DBSnapshotIdentifier'])
            client.delete_db_snapshot(DBSnapshotIdentifier=i['DBSnapshotIdentifier'])
The code looks perfectly fine and you are following the documentation.
I would simply add
print(i['SnapshotCreateTime'], retentionDate)
in the for loop; the logs will quickly tell you what's going on at the beginning of every month.
By the way, are you using RDS from AWS? RDS supports automatic snapshot creation, and you can also define a retention period. There is no need to create custom Lambda scripts.
Due to the distributed nature of the CloudWatch Events and the target services, the delay between the time the scheduled rule is triggered and the time the target service honors the execution of the target resource might be several seconds. Your scheduled rule will be triggered within that minute but not on the precise 0th second.
In that case, your UTC "now" may be off by a few seconds during execution, and therefore the retention date may be off by a few seconds as well. The effect should be minimal, but there is still a chance of a missed deletion. Even so, the subsequent run should delete the ones missed in the earlier run.
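One more thing that may be worth ruling out: in the question's code, retentionDate is computed at module level, so it is evaluated when the module is loaded rather than on each invocation, and Lambda can reuse an already-loaded execution environment between invocations. Whether that explains the start-of-month behaviour is not established here. The sketch below simply computes the cutoff at call time and keeps the selection logic in a plain function that can be checked without AWS; the function name snapshots_to_delete and the sample data are assumptions for this example.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=6)


def snapshots_to_delete(snapshots, now=None):
    """Return identifiers of snapshots created more than RETENTION ago.

    `snapshots` is a list of dicts shaped like the 'DBSnapshots' entries in
    the question's code, with timezone-aware 'SnapshotCreateTime' values.
    """
    if now is None:
        # Evaluated on every call, so a reused environment cannot serve a stale cutoff.
        now = datetime.now(timezone.utc)
    cutoff = now - RETENTION
    return [s['DBSnapshotIdentifier'] for s in snapshots
            if s['SnapshotCreateTime'] < cutoff]


# Made-up sample data spanning a month boundary.
sample = [
    {'DBSnapshotIdentifier': 'old',
     'SnapshotCreateTime': datetime(2020, 2, 25, tzinfo=timezone.utc)},
    {'DBSnapshotIdentifier': 'recent',
     'SnapshotCreateTime': datetime(2020, 3, 1, tzinfo=timezone.utc)},
]
print(snapshots_to_delete(sample, now=datetime(2020, 3, 3, tzinfo=timezone.utc)))
```

Inside lambda_handler, the returned identifiers could then be passed one by one to client.delete_db_snapshot, as in the original loop.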
TL;DR:
How can I work around this bug in Appengine: sometimes is_shutting_down returns False, and in a second or two, the instance is shut down?
Details
I have a backend instance on a Google Appengine application (Python). The backend instance is used to generate reports, which sometimes takes minutes or even hours to finish.
To deal with unexpected shutdowns, I am watching for runtime.is_shutting_down() and store the report's intermediate state into DB when is_shutting_down returns True.
Here's the portion of code where I check it:
from google.appengine.api import runtime
#...
def my_report_function():
    #...
    # Check if we should interrupt and reschedule to avoid timeout error.
    duration_sec = time.time() - start
    too_long = MAX_SEC < duration_sec
    is_shutting_down = runtime.is_shutting_down()
    log.debug('Does this report iteration need to wrap it up soon? '
              'Too long? %s (%s sec). Shutting down? %s'
              % (too_long, duration_sec, is_shutting_down))
    if too_long or is_shutting_down:
        # save the state of report, reschedule next iteration, and return
Sometimes it works, but sometimes I see the following in the Appengine log:
D 2013-06-20 18:41:56.893 Does this report iteration need to wrap it up soon? Too long? False (348.865950108 sec). Shutting down? False
E 2013-06-20 18:42:00.248 Process terminated because the backend took too long to shutdown.
Clearly, the 30-second timeout has not passed between the time when I checked the value returned by runtime.is_shutting_down(), and when Appengine killed the backend.
Does anybody know why this is happening, and whether there is a workaround for this?
Thank you in advance!
There is demo code from Google IO here http://backends-io.appspot.com/
The included counter_v3_with_write_behind.py demonstrates a pattern:
On '/_ah/start' set a shutdown hook via
runtime.set_shutdown_hook(something_to_save_progress_and_requeue_task)
It looks like your code is 'are you shutting down right now, if not, go do something that may take a while'. This pattern should listen for 'shut down ASAP or you lose everything'.
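The shape of that pattern can be sketched without App Engine at all: the job keeps its progress in one place after every small unit of work, and a separate callable persists it on demand. Everything below is a stand-in written for illustration. ReportJob, its store dictionary and the sample items are hypothetical, and on App Engine the save_progress method would be the callable handed to runtime.set_shutdown_hook, with requeueing left out here.

```python
class ReportJob:
    """Toy report that sums numbers and can persist its progress at any time."""

    def __init__(self, items, store):
        self.items = items
        self.store = store  # stand-in for a datastore; a plain dict here
        self.progress = {'next_index': 0, 'total': 0}

    def step(self):
        """Process one item and publish the new progress in a single assignment."""
        i = self.progress['next_index']
        self.progress = {'next_index': i + 1,
                         'total': self.progress['total'] + self.items[i]}

    def done(self):
        return self.progress['next_index'] >= len(self.items)

    def save_progress(self):
        """Intended as the shutdown hook: persist whatever has been done so far."""
        self.store['report_state'] = dict(self.progress)


store = {}
job = ReportJob([1, 2, 3], store)
job.step()
job.step()
job.save_progress()  # simulates the platform invoking the hook mid-report
print(store['report_state'])
```

Keeping each unit of work short matters here, since the hook can only save what the last completed step has published.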