Deploying a static site with S3 and CloudFront using Python boto3

I'm trying to automate the deployment of a static website using boto3. I have a static website (Angular/JavaScript/HTML) sitting in a bucket, and need to put the AWS CloudFront CDN in front of it.
Creating the S3 bucket and copying in the HTML/JS is working fine.
import boto3

cf = boto3.client('cloudfront')
cf.create_distribution(DistributionConfig=dict(
    CallerReference='firstOne',
    Aliases=dict(Quantity=1, Items=['mydomain.com']),
    DefaultRootObject='index.html',
    Comment='Test distribution',
    Enabled=True,
    Origins=dict(
        Quantity=1,
        Items=[dict(
            Id='1',
            DomainName='mydomain.com.s3.amazonaws.com')
        ]),
    DefaultCacheBehavior=dict(
        TargetOriginId='1',
        ViewerProtocolPolicy='redirect-to-https',
        TrustedSigners=dict(Quantity=0, Enabled=False),
        ForwardedValues=dict(
            Cookies={'Forward': 'all'},
            Headers=dict(Quantity=0),
            QueryString=False,
            QueryStringCacheKeys=dict(Quantity=0),
        ),
        MinTTL=1000)
))
When I try to create the cloudfront distribution, I get the following error:
InvalidOrigin: An error occurred (InvalidOrigin) when calling the CreateDistribution operation: The specified origin server does not exist or is not valid.
Interestingly, it looks to be complaining about the origin, mydomain.com.s3.amazonaws.com; however, when I create a distribution for the S3 bucket in the web console, it has no problem with the same origin domain name.
Update:
I can get this to work with boto with the following, but would rather use boto3:
import boto
c = boto.connect_cloudfront()
origin = boto.cloudfront.origin.S3Origin('mydomain.com.s3.amazonaws.com')
distro = c.create_distribution(origin=origin, enabled=False, comment='My new Distribution')

Turns out there is a required parameter that is not documented properly.
Since the origin is an S3 bucket, you must include S3OriginConfig = dict(OriginAccessIdentity = '') even if you are not using an origin access identity; the value can just be an empty string.
The following command works. Note that you still need a bucket policy to make the objects accessible, and a Route 53 entry to alias the CNAME you want to the CloudFront-generated hostname (a sketch of those follow-up steps appears after the command below).
cf.create_distribution(DistributionConfig=dict(
    CallerReference='firstOne',
    Aliases=dict(Quantity=1, Items=['mydomain.com']),
    DefaultRootObject='index.html',
    Comment='Test distribution',
    Enabled=True,
    Origins=dict(
        Quantity=1,
        Items=[dict(
            Id='1',
            DomainName='mydomain.com.s3.amazonaws.com',
            S3OriginConfig=dict(OriginAccessIdentity=''))
        ]),
    DefaultCacheBehavior=dict(
        TargetOriginId='1',
        ViewerProtocolPolicy='redirect-to-https',
        TrustedSigners=dict(Quantity=0, Enabled=False),
        ForwardedValues=dict(
            Cookies={'Forward': 'all'},
            Headers=dict(Quantity=0),
            QueryString=False,
            QueryStringCacheKeys=dict(Quantity=0),
        ),
        MinTTL=1000)
))
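For completeness, here is a hedged sketch of those two follow-up steps via boto3. The bucket name, hosted zone ID and distribution domain name below are placeholders; a public-read bucket policy is assumed because the distribution uses an empty OriginAccessIdentity.

import json
import boto3

s3 = boto3.client('s3')
route53 = boto3.client('route53')

# Public-read policy so CloudFront (and anyone) can fetch the objects.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::mydomain.com/*"
    }]
}
s3.put_bucket_policy(Bucket='mydomain.com', Policy=json.dumps(policy))

# Alias mydomain.com to the CloudFront-generated hostname.
# Z2FDTNDATAQYW2 is the fixed hosted zone ID used for all CloudFront aliases.
route53.change_resource_record_sets(
    HostedZoneId='YOUR_HOSTED_ZONE_ID',  # placeholder: the zone for mydomain.com
    ChangeBatch={
        'Changes': [{
            'Action': 'UPSERT',
            'ResourceRecordSet': {
                'Name': 'mydomain.com',
                'Type': 'A',
                'AliasTarget': {
                    'HostedZoneId': 'Z2FDTNDATAQYW2',
                    'DNSName': 'dxxxxxxxxxxxxx.cloudfront.net',  # placeholder: distribution domain
                    'EvaluateTargetHealth': False,
                },
            },
        }],
    },
)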

Related

AWS CDK reference existing image on ECR

I'm new to AWS CDK and trying to create a load-balanced Fargate service with the ApplicationLoadBalancedFargateService construct.
I have an existing image on ECR that I would like to reference and use. I've found the ecs.ContainerImage.from_ecr_repository function, which I believe is what I should use in this case. However, this function takes an IRepository as a parameter and I cannot find anything under aws_ecr.IRepository or aws_ecr.Repository to reference a pre-existing image. These constructs all seem to be for making a new repository.
Anyone know what I should be using to get the IRepository object for an existing repo? Is this just not typically done this way?
Code is below. Thanks in Advance.
from aws_cdk import (
    # Duration,
    Stack,
    # aws_sqs as sqs,
)
from constructs import Construct
from aws_cdk import (aws_ec2 as ec2, aws_ecs as ecs,
                     aws_ecs_patterns as ecs_patterns,
                     aws_route53, aws_certificatemanager,
                     aws_ecr)


class NewStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)
        _repo = aws_ecr.Repository(self, 'id1', repository_uri=repo_uri)
        vpc = ec2.Vpc(self, "applications", max_azs=3)  # default is all AZs in region
        cluster = ecs.Cluster(self, "id2", vpc=vpc)
        hosted_zone = aws_route53.HostedZone.from_lookup(self,
            'id3',
            domain_name='domain'
        )
        certificate = aws_certificatemanager.Certificate.from_certificate_arn(self,
            'id4',
            'cert_arn'
        )
        image = ecs.ContainerImage.from_ecr_repository(self, _repo)
        ecs_patterns.ApplicationLoadBalancedFargateService(self, "id5",
            cluster=cluster,  # Required
            cpu=512,  # Default is 256
            desired_count=2,  # Default is 1
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=image,
                container_port=8000),
            memory_limit_mib=2048,  # Default is 512
            public_load_balancer=True,
            domain_name='domain_name',
            domain_zone=hosted_zone,
            certificate=certificate,
            redirect_http=True)
You are looking for from_repository_attributes() to create an instance of IRepository from an existing ECR repository.
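For illustration, a minimal sketch of that usage (the ARN, repository name and tag below are placeholders); note that from_ecr_repository takes the repository itself, plus an optional tag, rather than the scope:

# Hypothetical example: reference an existing ECR repository, then build a container image from it.
repo = aws_ecr.Repository.from_repository_attributes(
    self, 'ExistingRepo',
    repository_arn='arn:aws:ecr:us-east-1:123456789012:repository/my-repo',  # placeholder ARN
    repository_name='my-repo',                                               # placeholder name
)
image = ecs.ContainerImage.from_ecr_repository(repo, tag='latest')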

Create or Replace AWS Glue Crawler

Using boto3:
Is it possible to check whether an AWS Glue crawler already exists and create it if it doesn't?
If it already exists, I need to update it.
What would the crawler-creation script look like?
Would this be similar to CREATE OR REPLACE TABLE in an RDBMS?
Has anyone done this, or does anyone have recommendations?
Thank you :)
Michael
As far as I know, there is no single API call for this. We manually list the crawlers using list_crawlers and iterate through the list to decide whether to add or update each crawler (update_crawler).
Check out the API docs:
https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/glue.html
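As a rough sketch of that list-then-decide pattern (the crawler name, role, database and S3 path below are placeholders):

import boto3

glue = boto3.client('glue')

crawler_name = 'my-crawler'  # placeholder
crawler_config = dict(
    Name=crawler_name,
    Role='AWSGlueServiceRoleDefault',  # placeholder role
    DatabaseName='my_database',        # placeholder database
    Targets={'S3Targets': [{'Path': 's3://my-bucket/my/prefix/'}]},  # placeholder path
)

# "Create or replace": update the crawler if it already exists, otherwise create it.
# (list_crawlers is paginated; with many crawlers you would follow NextToken.)
existing = glue.list_crawlers()['CrawlerNames']
if crawler_name in existing:
    glue.update_crawler(**crawler_config)
else:
    glue.create_crawler(**crawler_config)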
Yes, you can do all of that using boto3; however, there is no single function that can do it all at once. Instead, you would have to make a series of the following API calls:
list_crawlers
get_crawler
update_crawler
create_crawler
Each of these functions returns a response that you need to parse/verify/check manually.
AWS documentation is pretty good, so definitely check it out. It might seem overwhelming, but in the beginning you may find it easiest to simply copy and paste the request syntax provided in the docs and then strip down the unnecessary parts. boto3 itself is not great for autocompletion/suggestions, but there is a project that can help with that: mypy_boto3_builder and its predecessors mypy_boto3 and boto3_type_annotations.
If something goes wrong, e.g. you haven't specified some parameters correctly, the error responses are pretty good and helpful.
Here is an example of how you can list all existing crawlers:

import boto3
from pprint import pprint

client = boto3.client('glue')

response = client.list_crawlers()
available_crawlers = response["CrawlerNames"]

for crawler_name in available_crawlers:
    response = client.get_crawler(Name=crawler_name)
    pprint(response)
Assuming that in IAM you have AWSGlueServiceRoleDefault with all required permissions for glue crawler, here is how you can create one:
response = client.create_crawler(
    Name='my-crawler-via-api',
    Role='AWSGlueServiceRoleDefault',
    Description='Crawler generated with Python API',  # optional
    Targets={
        'S3Targets': [
            {
                'Path': 's3://some/path/in/s3/bucket',
            },
        ],
    },
)
I ended up using standard Python exception handling:
import boto3

# Instantiate the Glue client.
glue_client = boto3.client(
    'glue',
    region_name='us-east-1'
)

# Attempt to create and start a glue crawler on the PSV table, or update and
# start it if it already exists.
try:
    glue_client.create_crawler(
        Name='crawler name',
        Role='role to be used by glue to create the crawler',
        DatabaseName='database where the crawler should create the table',
        Targets={
            'S3Targets': [
                {
                    'Path': 'full s3 path to the directory that crawler should process'
                }
            ]
        }
    )
    glue_client.start_crawler(
        Name='crawler name'
    )
except:
    glue_client.update_crawler(
        Name='crawler name',
        Role='role to be used by glue to create the crawler',
        DatabaseName='database where the crawler should create the table',
        Targets={
            'S3Targets': [
                {
                    'Path': 'full s3 path to the directory that crawler should process'
                }
            ]
        }
    )
    glue_client.start_crawler(
        Name='crawler name'
    )
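Note that a bare except also swallows unrelated failures (bad parameters, missing permissions). A hedged variation of the block above that only falls back to update_crawler when the crawler already exists (same placeholder values):

try:
    glue_client.create_crawler(
        Name='crawler name',
        Role='role to be used by glue to create the crawler',
        DatabaseName='database where the crawler should create the table',
        Targets={'S3Targets': [{'Path': 'full s3 path to the directory that crawler should process'}]}
    )
except glue_client.exceptions.AlreadyExistsException:
    # Only reached when a crawler with this name already exists.
    glue_client.update_crawler(
        Name='crawler name',
        Role='role to be used by glue to create the crawler',
        DatabaseName='database where the crawler should create the table',
        Targets={'S3Targets': [{'Path': 'full s3 path to the directory that crawler should process'}]}
    )
glue_client.start_crawler(Name='crawler name')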

How to query AMIs from AWS based on their state (Available) using Python boto3?

I need to get the details of AMIs from AWS based on their state: available.
When I try, it gets stuck and does not print any line.
Python code 1:
conn = boto3.resource('ec2')
image = conn.describe_images()
print(image)  # prints nothing

for img in image:
    image_count.append(img)

print("img count ->" + str(len(image_count)))
# prints nothing
Are there any exact keywords for these AMIs? Please correct me.
An important thing to realize about AMIs is that, by default, describe_images returns every AMI you have access to, including all public AMIs, not just your own.
If you only wish to list the AMIs belonging to your own account, use Owners=['self']:
import boto3
ec2_client = boto3.client('ec2')
images = ec2_client.describe_images(Owners=['self'])
available = [i['ImageId'] for i in images['Images'] if i['State'] == 'available']
If you want to do your own filtering, change describe_images(Filters=...) to describe_images(). Note: describe_images() without filters returns a lot of data, so be prepared to wait a few minutes; on my system it returned 89,362 images for us-east-1.
import boto3

client = boto3.client('ec2')
image_count = []

response = client.describe_images(Filters=[{'Name': 'state', 'Values': ['available']}])

if 'Images' in response:
    for img in response['Images']:
        image_count.append(img)

print("img count ->" + str(len(image_count)))

boto3 searching unused security groups

I am using the AWS Python SDK (boto3) and I am trying to find out which security groups are unused. I did this with boto2, but I do not know how to do the same with boto3.
from boto.ec2.connection import EC2Connection
from boto.ec2.regioninfo import RegionInfo
import boto.sns
import sys
import logging
from security_groups_config import config

# Get settings from config.py
aws_access_key = config['aws_access_key']
aws_secret_key = config['aws_secret_key']
ec2_region_name = config['ec2_region_name']
ec2_region_endpoint = config['ec2_region_endpoint']

region = RegionInfo(name=ec2_region_name, endpoint=ec2_region_endpoint)

if aws_access_key:
    conn = EC2Connection(aws_access_key, aws_secret_key, region=region)
else:
    conn = EC2Connection(region=region)

sgs = conn.get_all_security_groups()

## Searching unused SG if the instances number is 0
def search_unused_sg(event, context):
    for sg in sgs:
        print sg.name, len(sg.instances())
Use the power of Boto3 and Python's list comprehension and sets to get what you want in 7 lines of code:
import boto3
ec2 = boto3.resource('ec2') #You have to change this line based on how you pass AWS credentials and AWS config
sgs = list(ec2.security_groups.all())
insts = list(ec2.instances.all())
all_sgs = set([sg.group_name for sg in sgs])
all_inst_sgs = set([sg['GroupName'] for inst in insts for sg in inst.security_groups])
unused_sgs = all_sgs - all_inst_sgs
Debug information
print 'Total SGs:', len(all_sgs)
print 'SGS attached to instances:', len(all_inst_sgs)
print 'Orphaned SGs:', len(unused_sgs)
print 'Unattached SG names:', unused_sgs
Output
Total SGs: 289
SGS attached to instances: 129
Orphaned SGs: 160
Unattached SG names: set(['mysg', '...
First, I suggest you take another look at how boto3 deals with credentials. It's better to use a generic AWS credentials file, so that in the future, when required, you can switch to IAM role-based credentials or AWS STS without changing your code.
import boto3

# You should use the credential profile file
ec2 = boto3.client("ec2")

# In boto3, if you have more than 1000 entries, you need to handle the pagination
# using the NextToken parameter, which is not shown here.
all_instances = ec2.describe_instances()
all_sg = ec2.describe_security_groups()

instance_sg_set = set()
sg_set = set()

for reservation in all_instances["Reservations"]:
    for instance in reservation["Instances"]:
        for sg in instance["SecurityGroups"]:
            instance_sg_set.add(sg["GroupName"])

for security_group in all_sg["SecurityGroups"]:
    sg_set.add(security_group["GroupName"])

idle_sg = sg_set - instance_sg_set
Note: this code is not tested. Please debug it as required.
Note: if you have an ASG (Auto Scaling group) currently scaled to zero instances (count=0), the security groups in its launch configuration will look orphaned, but they will be picked up again as soon as the ASG scales up. Keep in mind that you need to check the ASG security groups as well.
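For the pagination mentioned in the comment inside the code above, boto3 paginators can replace manual NextToken handling; a minimal sketch, assuming default credentials:

import boto3

ec2 = boto3.client('ec2')

# Security group names attached to instances, across all pages.
instance_sg_set = set()
for page in ec2.get_paginator('describe_instances').paginate():
    for reservation in page['Reservations']:
        for instance in reservation['Instances']:
            for sg in instance['SecurityGroups']:
                instance_sg_set.add(sg['GroupName'])

# All security group names in the region, across all pages.
sg_set = set()
for page in ec2.get_paginator('describe_security_groups').paginate():
    for group in page['SecurityGroups']:
        sg_set.add(group['GroupName'])

idle_sg = sg_set - instance_sg_set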
I used an alternative approach. If we skip the credentials discussion and go back to the main question, "boto3 searching unused security groups", here is an option:
Enumerate a resource, in my case network interfaces, because if you think about it, a security group has to be associated with a resource in order to be used.
My example:
client = boto3.client('ec2', region_name=region,
                      aws_access_key_id=newsession_id,
                      aws_secret_access_key=newsession_key,
                      aws_session_token=newsession_token)

response = client.describe_network_interfaces()

for i in response["NetworkInterfaces"]:
    # Check if the network interface is attached
    if 'Attachment' in i and i['Attachment']['Status'] == 'attached':
        # Create a list with the attached SGs
        groups = [g['GroupId'] for g in i['Groups']]
I used the network interface resource because I needed to get public IPs for the accounts.
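To complete that approach, a hedged sketch that collects the security group IDs referenced by any network interface and diffs them against every group in the region (credentials handling omitted):

import boto3

ec2 = boto3.client('ec2')

# Groups referenced by any network interface (ENIs cover EC2, ELB, RDS, Lambda, etc.).
used_group_ids = set()
for page in ec2.get_paginator('describe_network_interfaces').paginate():
    for eni in page['NetworkInterfaces']:
        for group in eni['Groups']:
            used_group_ids.add(group['GroupId'])

# All security groups in the region.
all_group_ids = set()
for page in ec2.get_paginator('describe_security_groups').paginate():
    for group in page['SecurityGroups']:
        all_group_ids.add(group['GroupId'])

unused_group_ids = all_group_ids - used_group_ids
print('Unused security group IDs:', unused_group_ids)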

How to create a signed cloudfront URL with Python?

I would like to know how to create a signed URL for CloudFront. The current working solution is unsecured, and I would like to switch the system to secure URLs.
I have tried using boto 2.5.2 and Django 1.4.
Is there a working example of how to use the boto.cloudfront.distribution.create_signed_url method, or any other solution that works?
I have tried the following code using the boto 2.5.2 API:
def get_signed_url():
    import boto, time, pprint
    from boto import cloudfront
    from boto.cloudfront import distribution

    AWS_ACCESS_KEY_ID = 'YOUR_AWS_ACCESS_KEY_ID'
    AWS_SECRET_ACCESS_KEY = 'YOUR_AWS_SECRET_ACCESS_KEY'
    KEYPAIR_ID = 'YOUR_KEYPAIR_ID'
    KEYPAIR_FILE = 'YOUR_FULL_PATH_TO_FILE.pem'
    CF_DISTRIBUTION_ID = 'E1V7I3IOVHUU02'

    my_connection = boto.cloudfront.CloudFrontConnection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
    distros = my_connection.get_all_streaming_distributions()
    oai = my_connection.create_origin_access_identity('my_oai', 'An OAI for testing')
    distribution_config = my_connection.get_streaming_distribution_config(CF_DISTRIBUTION_ID)
    distribution_info = my_connection.get_streaming_distribution_info(CF_DISTRIBUTION_ID)
    my_distro = boto.cloudfront.distribution.Distribution(connection=my_connection, config=distribution_config,
                                                          domain_name=distribution_info.domain_name,
                                                          id=CF_DISTRIBUTION_ID, last_modified_time=None,
                                                          status='Active')

    s3 = boto.connect_s3()
    BUCKET_NAME = "YOUR_S3_BUCKET_NAME"
    bucket = s3.get_bucket(BUCKET_NAME)
    object_name = "FULL_URL_TO_MP4_EXCLUDING_S3_URL_DOMAIN_NAME (e.g. my/path/video.mp4)"
    key = bucket.get_key(object_name)
    key.add_user_grant("READ", oai.s3_user_id)

    SECS = 8000
    OBJECT_URL = 'FULL_S3_URL_TO_FILE.mp4'
    my_signed_url = my_distro.create_signed_url(OBJECT_URL, KEYPAIR_ID, expire_time=time.time() + SECS,
                                                valid_after_time=None, ip_address=None, policy_url=None,
                                                private_key_file=KEYPAIR_FILE, private_key_string=KEYPAIR_ID)
Everything seems fine until the method create_signed_url. It returns an error.
Exception Value: Only specify the private_key_file or the private_key_string not both
Omit the private_key_string:
my_signed_url = my_distro.create_signed_url(OBJECT_URL, KEYPAIR_ID,
                                            expire_time=time.time() + SECS,
                                            private_key_file=KEYPAIR_FILE)
That parameter is used to pass the actual contents of the private key file, as a string. The comments in the source explain that only one of private_key_file or private_key_string should be passed.
You can also omit all the kwargs which are set to None, since None is the default.
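If you are on boto3/botocore rather than boto 2, the signing helper lives in botocore instead; a minimal sketch using the third-party rsa package (key path, key pair ID and URL below are placeholders):

import datetime
import rsa
from botocore.signers import CloudFrontSigner

def rsa_signer(message):
    # Sign the policy with the CloudFront key pair's private key (placeholder path).
    with open('YOUR_FULL_PATH_TO_FILE.pem', 'rb') as key_file:
        private_key = rsa.PrivateKey.load_pkcs1(key_file.read())
    return rsa.sign(message, private_key, 'SHA-1')

signer = CloudFrontSigner('YOUR_KEYPAIR_ID', rsa_signer)
signed_url = signer.generate_presigned_url(
    'https://dxxxxxxxxxxxxx.cloudfront.net/my/path/video.mp4',  # placeholder URL
    date_less_than=datetime.datetime.utcnow() + datetime.timedelta(seconds=8000),
)
print(signed_url)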
