Cloudformation wildcard search with boto3 - python

I have been tasked with converting some bash scripting used by my team that performs various CloudFormation tasks into Python using the boto3 library. I am currently stuck on one item: I cannot seem to determine how to do a wildcard-type search where a CloudFormation stack name contains a string.
My bash version using the AWS CLI is as follows:
aws cloudformation --region us-east-1 describe-stacks --query "Stacks[?contains(StackName,'myString')].StackName" --output json > stacks.out
This works on the CLI, outputting the results to a JSON file, but I cannot find any examples online of doing a similar "contains" search using boto3 with Python. Is it possible?
Thanks!

Yes, it is possible. What you are looking for is the following:
import boto3
# create a boto3 client first
cloudformation = boto3.client('cloudformation', region_name='us-east-1')
# use client to make a particular API call
response = cloudformation.describe_stacks(StackName='myString')
print(response)
# as an aside, you'd need a different client to communicate
# with a different service
# ec2 = boto3.client('ec2', region_name='us-east-1')
# regions = ec2.describe_regions()
Here, response is a Python dictionary which, among other things, contains the description of the stack "myString".
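If what you need is the same contains() behavior as the CLI's --query expression (matching any stack whose name contains the string, rather than looking a stack up by its exact name), boto3 has no server-side filter for that, so one option is to filter client-side. A minimal sketch, reusing the region, search string, and stacks.out file name from the question:
import json
import boto3

cloudformation = boto3.client('cloudformation', region_name='us-east-1')

# describe_stacks is paginated, so walk every page before filtering
matching = []
paginator = cloudformation.get_paginator('describe_stacks')
for page in paginator.paginate():
    for stack in page['Stacks']:
        if 'myString' in stack['StackName']:
            matching.append(stack['StackName'])

# mirror the CLI's --output json > stacks.out
with open('stacks.out', 'w') as f:
    json.dump(matching, f)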

Inconsistent access to subfolder of a bucket between gsutil and storage Client

To avoid managing a large number of buckets for the data received from a lot of devices, I plan to have the devices write the files they capture into folders of a single bucket instead of having one bucket for each device.
To make sure each device can only write to its own subfolder, I have set the IAM condition described in this answer:
resource.name.startsWith('projects/_/buckets/dev_bucket/objects/test_folder')
My service account now has the Storage Object Creator and Storage Object Viewer roles with the condition above attached.
This is the output (truncated to just this service account) of the gcloud projects get-iam-policy <project> command:
- condition:
    expression: |-
      resource.name.startsWith("projects/_/buckets/dev_bucket/objects/test_folder/")
    title: only_test_subfolder
  members:
  - serviceAccount:myserviceaccount.iam.gserviceaccount.com
  role: roles/storage.objectCreator
- condition:
    expression: |-
      resource.name.startsWith("projects/_/buckets/dev_bucket/objects/test_folder/")
    title: only_test_subfolder
  members:
  - serviceAccount:myserviceaccount.iam.gserviceaccount.com
  role: roles/storage.objectViewer
When using gsutil, everything seems to work fine:
# Set the authentication via the service account json key
gcloud auth activate-service-account --key-file=/path/to/my/key.json
# all of these commands work fine
gsutil ls gs://dev_bucket/test_folder
gsutil cp gs://dev_bucket/test_folder/distant_file.txt local_file.txt
# These ones get a 403 as expected
gsutil ls gs://dev_bucket/
gsutil ls gs://another_bucket
gsutil cp gs://dev_bucket/another_subfolder/somefile.txt local_file.txt
However, when I try to use the Google Cloud Storage Python client (v2.1.0), I cannot get it to work, mainly because I am apparently supposed to define the bucket before getting an object in that bucket.
import os
from google.cloud import storage
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="path/to/my/key.json"
client = storage.Client()
client.get_bucket("dev_bucket")
>>> Forbidden: 403 GET https://storage.googleapis.com/storage/v1/b/dev_bucket?projection=noAcl&prettyPrint=false: <Service account> does not have storage.buckets.get access to the Google Cloud Storage bucket.
I have also tried to list all files using the prefix argument, but get the same error:
client.list_blobs("dev_bucket", prefix="test_folder")
Is there a way to use the python storage client with this type of permissions ?
This is expected behavior!
You are doing:
gsutil ls gs://dev_bucket/test_folder
gsutil cp gs://dev_bucket/test_folder/distant_file.txt local_file.txt
Neither command requires any permission other than storage.objects.get, which your SA has from the Storage Object Viewer role.
In your code, however, you are trying to access the bucket details (the bucket itself, not objects inside the bucket), so it won't work unless your SA has the storage.buckets.get permission.
This line:
client.get_bucket("dev_bucket")
will perform a GET on the JSON API's buckets.get method, which requires the IAM permission mentioned above.
So, you need to modify your code to read objects only, without accessing bucket details.
Here is sample code for downloading objects from a bucket.
Note: the bucket(bucket_name, user_project=None) method used in that sample will not perform any HTTP request, as quoted from the docs:
This will not make an HTTP request; it simply instantiates a bucket object owned by this client.
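As a minimal sketch of what that looks like, using the bucket name and object path from the question (adjust to your own layout), object-only access stays within the objectViewer/objectCreator permissions:
import os
from google.cloud import storage

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "path/to/my/key.json"

client = storage.Client()

# bucket() only instantiates a local Bucket object; unlike get_bucket(),
# it makes no HTTP request, so storage.buckets.get is never needed
bucket = client.bucket("dev_bucket")

# Download an object (needs storage.objects.get, granted by objectViewer)
bucket.blob("test_folder/distant_file.txt").download_to_filename("local_file.txt")

# Upload an object (needs storage.objects.create, granted by objectCreator)
bucket.blob("test_folder/new_file.txt").upload_from_filename("local_file.txt")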
BTW, you can try to run something like:
gsutil ls -L -b gs://dev_bucket
I expect this command to give you the same error which you get from your code.
References:
https://cloud.google.com/storage/docs/access-control/iam-gsutil
https://cloud.google.com/storage/docs/access-control/iam-json

Using boto3 to filter s3 objects so that caller is not filtering

My S3 bucket has thousands of files. They are all in the same "folder", so the S3 prefix is the same. I want to use Python boto3 to get a list of filenames that contain a certain word. I don't want the boto3 call to send back all the filenames and have the client filter out the names. I have seen examples using "yield" and ".filter", but those receive all the files and make the client do a lot of work.
To help give a better understanding, if I use the AWS CLI:
aws --profile test s3api list-objects-v2 --bucket mybucket --prefix tmp/ --output json --query "Contents[?contains(Key, 'foobar')]"
BUT I need to send a request using boto3 and AWS just send the filenames back with "foobar" in them.
BUT I need to send a request using boto3 and AWS just send the filenames back with "foobar" in them
You can't do this with regular boto3 S3 API calls, as that is not how the API works. So if you don't want to get all the names first and then filter them out yourself, there is no way to achieve what you want with just a single boto3 request.
The only help might come from Amazon S3 Inventory: you could request the inventory, get the resulting CSV file, and filter that. But you would still have to do the filtering yourself.
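If client-side filtering is acceptable after all, a minimal sketch that mirrors the CLI command above (profile, bucket, prefix, and search word all taken from the question) could look like this:
import boto3

session = boto3.Session(profile_name='test')
s3 = session.client('s3')

# list_objects_v2 returns at most 1000 keys per call, so paginate and
# filter each page as it arrives
matching = []
paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='mybucket', Prefix='tmp/'):
    for obj in page.get('Contents', []):
        if 'foobar' in obj['Key']:
            matching.append(obj['Key'])

print(matching)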

creating Azure Data factory linked service using Managed identity with python

I'm using Python scripts to create and manage Data Factory pipelines. When I want to create a linked service, I'm just using this code:
https://learn.microsoft.com/en-us/azure/data-factory/quickstart-create-data-factory-python#create-a-linked-service
but now I want to create the linked service using a managed identity instead of an account name and key, and I can't find any example of how to do it with Python.
I managed to do it manually, but I want to do it using Python.
Thanks!
service_endpoint (str, Required): Blob service endpoint of the Azure Blob Storage resource. It is mutually exclusive with the connectionString and sasUri properties.
According to the API documentation, you should use service_endpoint to create a linked service with a managed identity: pass the Blob service endpoint as service_endpoint.
The following is my test code:
# models from the azure-mgmt-datafactory package, as in the quickstart linked above
from azure.mgmt.datafactory.models import AzureBlobStorageLinkedService, LinkedServiceResource

ls_name = 'storageLinkedService001'
endpoint_string = 'https://<account name>.blob.core.windows.net'
ls_azure_storage = LinkedServiceResource(properties=AzureBlobStorageLinkedService(service_endpoint=endpoint_string))
ls = adf_client.linked_services.create_or_update(rg_name, df_name, ls_name, ls_azure_storage)
Result:

Accessing DynamoDB Local from boto3

I am doing the AWS tutorial for Python and DynamoDB. I downloaded and installed DynamoDB Local. I got the access key and secret access key. I installed boto3 for Python. The only step I have left is setting up authentication credentials. I do not have the AWS CLI installed, so where should I put the access key, the secret key, and also the region?
Do I include it in my python code?
Do I make a file in my directory where I put this info? Then should I write anything in my python code so it can find it?
You can try passing the access key and secret key in your code like this:
import boto3

session = boto3.Session(
    aws_access_key_id=ACCESS_KEY,
    aws_secret_access_key=SECRET_KEY,
)
client = session.client('dynamodb')
OR
dynamodb = session.resource('dynamodb')
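One thing worth noting for DynamoDB Local specifically: the client above would point at the real AWS service, so you also need to pass the local endpoint (assuming DynamoDB Local is running on its default port 8000):
client = session.client('dynamodb', endpoint_url='http://localhost:8000')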
From the AWS documentation:
Before you can access DynamoDB programmatically or through the AWS
Command Line Interface (AWS CLI), you must configure your credentials
to enable authorization for your applications. Downloadable DynamoDB
requires any credentials to work, as shown in the following example.
AWS Access Key ID: "fakeMyKeyId"
AWS Secret Access Key:"fakeSecretAccessKey"
You can use the aws configure command of the AWS
CLI to set up credentials. For more information, see Using the AWS
CLI.
So, you need to create a .aws folder in your home directory.
There, create the credentials and config files.
Here's how to do this:
https://docs.aws.amazon.com/cli/latest/userguide/cli-configure-files.html
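For example, using the dummy values from the documentation quote above (the region value is arbitrary as far as DynamoDB Local is concerned), the two files could look like this:

~/.aws/credentials:
[default]
aws_access_key_id = fakeMyKeyId
aws_secret_access_key = fakeSecretAccessKey

~/.aws/config:
[default]
region = us-east-1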
If you want to write portable code and keep in the spirit of developing 12-factor apps, consider using environment variables.
The advantage is that locally, both the CLI and the boto3 Python library in your code (and pretty much all the other official AWS SDKs: PHP, Go, etc.) are designed to look for these values.
An example using the official Docker image to quickly start DynamoDB local:
# Start a local DynamoDB instance on port 8000
docker run -p 8000:8000 amazon/dynamodb-local
Then in a terminal, set some defaults that the CLI and SDKs like boto3 are looking for.
Note that these will be available until you close your terminal session.
# Region doesn't matter, CLI will complain if not provided
export AWS_DEFAULT_REGION=us-east-1
# Set some dummy credentials, dynamodb local doesn't care what these are
export AWS_ACCESS_KEY_ID=abc
export AWS_SECRET_ACCESS_KEY=abc
You should then be able to run the following (in the same terminal session) if you have the CLI installed. Note the --endpoint-url flag.
# Create a new table in DynamoDB Local
aws dynamodb create-table \
    --endpoint-url http://127.0.0.1:8000 \
    --table-name tmp \
    --attribute-definitions AttributeName=id,AttributeType=S \
    --key-schema AttributeName=id,KeyType=HASH \
    --billing-mode PAY_PER_REQUEST
You should then be able to list the tables with:
aws dynamodb list-tables --endpoint-url http://127.0.0.1:8000
And get a result like:
{
    "TableNames": [
        "tmp"
    ]
}
So how do we get the endpoint URL that we've been specifying in the CLI to work in Python? Unfortunately, there isn't a default environment variable for the endpoint URL in the boto3 codebase, so we'll need to pass it in when the code runs. The docs for .NET and Java are comprehensive, but for Python they are a bit more elusive. Going by the boto3 GitHub repo (and also see this great answer), we need to create a client or resource with the endpoint_url keyword argument. In the code below, we look for a custom environment variable called AWS_DYNAMODB_ENDPOINT_URL. The point is that if it is specified it will be used, and otherwise the code falls back to whatever the platform default is, making your code portable.
# Run in the same shell as before
export AWS_DYNAMODB_ENDPOINT_URL=http://127.0.0.1:8000

# file test.py
import os
import boto3

# Get environment variable if it's defined
# Make sure to set the environment variable before running
endpoint_url = os.environ.get('AWS_DYNAMODB_ENDPOINT_URL', None)

# Using (high level) resource, same keyword for boto3.client
resource = boto3.resource('dynamodb', endpoint_url=endpoint_url)
tables = resource.tables.all()
for table in tables:
    print(table)
Finally, run this snippet with
# Run in the same shell as before
python3 test.py
# Should produce the following output:
# dynamodb.Table(name='tmp')

Secrets in a google cloud bucket

We want to have a production Airflow environment but do not know how to deal properly with secrets, in particular Google BigQuery client JSON key files.
We tried setting up Kubernetes secrets on the Kubernetes cluster that is created automatically when creating a Google Cloud Composer (Airflow) environment. We currently just put the files on the bucket, but would like a better way.
from os.path import join
from google.cloud import bigquery as bq

def get_bq_client():
    """ returns bq client """
    return bq.Client.from_service_account_json(
        join("volumes", "bigquery.json")
    )
We would like some form of proper management of the required secrets. Sadly, using Airflow Variables won't work, because we can't create the client object using the JSON file as text.
One solution that would work is to encrypt the JSON files and keep the decryption key on the bucket and nowhere else. You can then check the code, together with the encrypted secrets, into some source control, and in the environment that has access to the bucket check it out and decrypt.
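As an illustration only (the key object path, the encrypted file name, and the use of Fernet are assumptions for this sketch, not something prescribed above), decrypting such a file at runtime could look like this:
from cryptography.fernet import Fernet
from google.cloud import bigquery, storage

def get_bq_client():
    """ returns bq client after decrypting the service account file """
    # Hypothetical: a Fernet key stored only in the bucket
    key = storage.Client().bucket("my-composer-bucket").blob("keys/bigquery.key").download_as_bytes()
    # Hypothetical: the encrypted service account file checked into source control
    with open("volumes/bigquery.json.enc", "rb") as f:
        decrypted = Fernet(key).decrypt(f.read())
    # from_service_account_json needs a file path, so write the decrypted JSON out first
    with open("/tmp/bigquery.json", "wb") as f:
        f.write(decrypted)
    return bigquery.Client.from_service_account_json("/tmp/bigquery.json")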
