Test S3 operations with simulated credentials - python

I'm writing unit tests for a function that should authenticate to AWS S3 and then perform some operations on S3. I have a bunch of functions that do various things (like downloading/uploading files, checking for existence, etc).
Since in production these will need authentication, I want to simulate it in the tests, so that I can check that this part of the code is ok too.
For mocking the AWS environment I'm using moto. As of now I have the following code, below. A quick intro to it: my_head_bucket is an example function that needs to have unit tests. create_user_with_access_key_and_policy should mock the IAM user and policy needed for authenticated S3 access. It is almost the same as in the examples in the documentation.
Then there are two tests. The first should pass without errors (having the correct authentication). The second should fail with ClientError, because "invalid" keys are being passed.
For some reason I am not able to pass through the creation of the mock user and policy, getting botocore.exceptions.ClientError: An error occurred (InvalidClientTokenId) when calling the AttachUserPolicy operation: The security token included in the request is invalid. It seems that no authentication should be needed for create_access_key, so what am I doing wrong?
import os
import json
import unittest

import boto3
import botocore
from botocore.client import ClientError
from moto import mock_s3, mock_iam
from moto.core import set_initial_no_auth_action_count


def my_head_bucket(bucket, aws_access_key_id, aws_secret_access_key):
    """
    This is a sample function. In the real case, this function will do more than
    just heading a bucket (like uploading/downloading files or other actions).
    It would be imported from another module and should not have any decorators (like moto's).
    """
    s3_client = boto3.client(
        "s3",
        aws_access_key_id=aws_access_key_id,
        aws_secret_access_key=aws_secret_access_key,
    )
    s3_client.head_bucket(Bucket=bucket)
@mock_iam
def create_user_with_access_key_and_policy(user_name="test-user"):
    """
    Should create a user with an attached policy allowing read/write operations on S3.
    """
    policy_document = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:*", "Resource": "*"}
        ],
    }

    # Create client and user
    client = boto3.client("iam", region_name="us-east-1")
    client.create_user(UserName=user_name)

    # Create and attach the policy
    policy_arn = client.create_policy(
        PolicyName="policy1", PolicyDocument=json.dumps(policy_document)
    )["Policy"]["Arn"]
    client.attach_user_policy(UserName=user_name, PolicyArn=policy_arn)

    # Return the access keys
    return client.create_access_key(UserName=user_name)["AccessKey"]
class TestMyTest(unittest.TestCase):
    @set_initial_no_auth_action_count(0)
    @mock_s3
    def test_correct_credentials(self):
        """
        Sets up the environment (creates a user with keys and policy, creates the bucket),
        then calls the function under test and expects it to run without exceptions.
        """
        ### Arrange
        iam_keys = create_user_with_access_key_and_policy()
        print(iam_keys)
        s3 = boto3.client(
            "s3",
            aws_access_key_id=iam_keys["AccessKeyId"],
            aws_secret_access_key=iam_keys["SecretAccessKey"],
        )
        s3.create_bucket(Bucket="mock_bucket")

        my_head_bucket(
            "mock_bucket",
            aws_access_key_id=iam_keys["AccessKeyId"],
            aws_secret_access_key=iam_keys["SecretAccessKey"],
        )

    @set_initial_no_auth_action_count(0)
    @mock_s3
    def test_incorrect_credentials(self):
        """
        Sets up the environment (creates a user with keys and policy, creates the bucket),
        then calls the function under test with an invalid secret key and expects it to
        raise ClientError.
        """
        ### Arrange
        iam_keys = create_user_with_access_key_and_policy()
        print(iam_keys)
        s3 = boto3.client(
            "s3",
            aws_access_key_id=iam_keys["AccessKeyId"],
            aws_secret_access_key=iam_keys["SecretAccessKey"],
        )
        s3.create_bucket(Bucket="mock_bucket")

        with self.assertRaises(ClientError):
            my_head_bucket(
                "mock_bucket",
                aws_access_key_id=iam_keys["AccessKeyId"],
                aws_secret_access_key="invalid",
            )

Related

How do I handle job secrets with MLRun?

I have a job that requires secrets to connect to S3 and a relational database. I can use environment variables to pass the connection information, but I am looking for a more secure way to handle this. My current code does something like:
import mlrun

fn = mlrun.code_to_function(
    "db-load",
    kind="job",
    requirements=["psycopg2-binary"],
)
fn.set_env("DBUSER", "user")
fn.set_env("DBPASS", "pass")
Can you suggest a more secure way of handling this?
MLRun uses the concept of Tasks to encapsulate runtime parameters. Tasks are used to specify execution context such as hyper-parameters. They can also be used to pass details about the secrets that will be used at runtime.
To pass secret parameters, use the Task's with_secrets() function. For example, the following command passes secrets stored in a Kubernetes secret to the execution context:
function = mlrun.code_to_function(
    name="secret_func",
    filename="my_code.py",
    handler="test_function",
    kind="job",
    image="mlrun/mlrun",
)
task = mlrun.new_task().with_secrets("kubernetes", ["AWS_KEY", "DB_PASSWORD"])
run = function.run(task, ...)
Within the code in my_code.py, the handler can access these secrets by using the get_secret() API:
def test_function(context, db_name):
    context.logger.info("running function")
    db_password = context.get_secret("DB_PASSWORD")
    # The rest of the code can use db_password to perform processing.
    ...
To learn more about handling secrets in MLRun, see the MLRun documentation on secrets.
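As a rough sketch only (the filename "load.py" and handler "load_table" are hypothetical placeholders, not part of the original job), the asker's db-load function could be combined with with_secrets() so that the database credentials come from a Kubernetes secret instead of plain set_env() calls:
import mlrun

# Hypothetical adaptation of the db-load job above; the module and handler
# names are placeholders for the asker's own code.
fn = mlrun.code_to_function(
    "db-load",
    filename="load.py",
    handler="load_table",
    kind="job",
    requirements=["psycopg2-binary"],
)

# Make DBUSER and DBPASS available to the job at runtime from a Kubernetes
# secret, instead of baking them in with set_env().
task = mlrun.new_task().with_secrets("kubernetes", ["DBUSER", "DBPASS"])
run = fn.run(task)
Inside the handler, the values are then read with context.get_secret("DBUSER") and context.get_secret("DBPASS"), exactly as in the test_function example above.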

Enabling bucket logging using boto3

I have a list of bucket names that I need to enable logging on programmatically. I am using boto3 and Lambda. I can't seem to find the correct function in boto3/S3 to do what I need, although I'm sure it's possible. Here is my code so far:
import json
import boto3

s3 = boto3.client('s3')


def lambda_handler(event, context):
    # TODO implement
    # print("handler:event")
    # print(event)
    # bucketDump()
    setBucketPolicy(bucketDump())


def bucketDump():
    ## This function lists all existing buckets within an AWS account (Tommy's Personal Account)
    response = s3.list_buckets()
    buckets = []
    for bucket in response['Buckets']:
        value = bucket["Name"]
        buckets.append(value)
    return buckets


## setting a bucket policy
def setBucketPolicy(buckets):
    for bucket in buckets:
        value = s3.get_bucket_logging(Bucket=bucket)
        print(value)
        ## TODO if a bucket in buckets does not have logging enabled, enable it!
        # print(bucket)
My plan is to iterate over my list of buckets and enable logging for each of them. Thank you in advance.
As suggested by @jordanm in the comments, using a resource instead of the client makes your life much easier, as it provides a higher-level interface.
If the only goal of bucketDump in your question was to retrieve all the buckets in your account, you can remove it entirely and use s3.buckets.all(), which already returns an iterable of buckets.
Assuming that you want to enable logging on all of your buckets that don't already have it enabled, and that you want to deliver logs from all the buckets to the same target bucket, you can add a parameter to the setBucketPolicy function to specify this bucket. The implementation suggested below enables logging and results in logs being organized like so:
- name_of_bucket_in_which_to_store_logs
  - bucket_name_1
    - logs
  - bucket_name_2
    - logs
If you want to organize your logs differently you have to play with the TargetBucket and TargetPrefix parameters and, if needed, you can specify other parameters for grants as detailed in the docs.
import boto3

s3 = boto3.resource('s3')


def lambda_handler(event, context):
    # TODO implement
    setBucketPolicy(target_bucket='name_of_bucket_in_which_to_store_logs')


def setBucketPolicy(target_bucket: str):
    for bucket in s3.buckets.all():
        bucket_logging = s3.BucketLogging(bucket.name)
        if not bucket_logging.logging_enabled:
            bucket_logging.put(
                BucketLoggingStatus={
                    'LoggingEnabled': {
                        'TargetBucket': target_bucket,
                        'TargetPrefix': f'{bucket.name}/'
                    }
                }
            )
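If you also need the grants mentioned above, the same put() call accepts a TargetGrants list. The sketch below (the function name is illustrative, not part of the original answer) grants read access on the delivered log objects to the S3 log delivery group:
import boto3

s3 = boto3.resource('s3')


def enable_logging_with_grants(bucket_name: str, target_bucket: str):
    # Same BucketLogging.put() call as above, extended with TargetGrants so
    # the S3 log delivery group can read the delivered log objects.
    bucket_logging = s3.BucketLogging(bucket_name)
    if not bucket_logging.logging_enabled:
        bucket_logging.put(
            BucketLoggingStatus={
                'LoggingEnabled': {
                    'TargetBucket': target_bucket,
                    'TargetPrefix': f'{bucket_name}/',
                    'TargetGrants': [
                        {
                            'Grantee': {
                                'Type': 'Group',
                                'URI': 'http://acs.amazonaws.com/groups/s3/LogDelivery',
                            },
                            'Permission': 'READ',
                        },
                    ],
                }
            }
        )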

An effective way to handle a missing file on S3

clients.py
"""
Wraps boto3 to provide client and resource objects that connect to Localstack when
running tests if LOCALSTACK_URL env variable is set, otherwise connect to real AWS.
"""
import os

import boto3

# Port 4566 is the universal "edge" port for all services since localstack v0.11.0,
# so LOCALSTACK_URL can be of the form "http://hostname:4566".
# "None" means boto3 will use the default endpoint for the real AWS services.
ENDPOINT_URL = os.getenv("LOCALSTACK_URL", None)


def s3_resource():
    return boto3.resource("s3", endpoint_url=ENDPOINT_URL)


def s3_client():
    return boto3.client("s3", endpoint_url=ENDPOINT_URL)
mycode.py
import botocore

from mymod.aws.clients import s3_resource

s3 = s3_resource()


def __get_object_etag(self, s3_dir_to_download_file_from: str, file_name: str) -> str:
    bucket, key = s3.deconstruct_s3_path(
        f"{s3_dir_to_download_file_from}/{file_name}"
    )
    try:
        etag_value = s3_resource().Object(bucket, key).e_tag
        return etag_value
    except botocore.exceptions.ClientError:
        raise
I am wondering whether, in mycode.py, I am doing the correct error handling for the case where the file I am looking for does not exist on S3. I basically expect the key to be there, and if it is not I want to raise an error and not proceed any further, because this code will be used as part of a pipeline in which each step relies on the previous one.
I would love to know whether I am handling the error correctly and, if not, how one would handle this case.
From what I understand, exceptions are supposed to handle errors and let you proceed further, but in my case I do not want to proceed with the rest of my code if the file I am looking for on S3 is missing.
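A common pattern for this (only a sketch; the helper name and the choice of FileNotFoundError are illustrative, not taken from the code above) is to inspect the ClientError's error code so that a missing object fails fast with a clear message, while every other error still propagates unchanged:
import botocore.exceptions


def get_object_etag_or_fail(s3, bucket: str, key: str) -> str:
    # `s3` is a boto3 S3 resource, e.g. the one returned by clients.s3_resource().
    # Accessing .e_tag triggers a HEAD request, so a missing key surfaces as a
    # ClientError whose error code is "404".
    try:
        return s3.Object(bucket, key).e_tag
    except botocore.exceptions.ClientError as err:
        if err.response.get("Error", {}).get("Code") == "404":
            raise FileNotFoundError(f"s3://{bucket}/{key} does not exist") from err
        # Anything else (permissions, throttling, ...) is re-raised untouched.
        raise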

How to cache an S3 session generated with MFA

I am trying to write a Python script to access S3 objects.
I am able to get access to S3 (using an assumed role, boto3, and MFA enabled on my account).
My question is: do I need to enter the MFA token every single time I run the script? Is it possible to cache the session for a selected duration and reuse it when running the script a second time?
At the moment I have created a setup class in the test suite:
import unittest

import boto3


class TestS3Buckets(unittest.TestCase):
    s3 = None

    @classmethod
    def setUpClass(cls):
        credentials = get_role_credentials(
            "arn:aws:iam::123456:role/admin-assume",
            "test-env",
            "arn:aws:iam::111111111:mfa/test@test.com",
            "MFA token"
        )
        session = boto3.Session(
            aws_access_key_id=credentials["Credentials"]["AccessKeyId"],
            aws_secret_access_key=credentials["Credentials"]["SecretAccessKey"],
            aws_session_token=credentials["Credentials"]["SessionToken"]
        )
        cls.s3 = session.client('s3')
Is there a way I can skip entering the MFA token when running the script a second time?
Any help would be appreciated.
Thanks,
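One common approach (this is only a sketch: it assumes your get_role_credentials helper returns the usual STS response with a Credentials block that includes an Expiration timestamp, and the cache file location is arbitrary) is to cache the temporary credentials on disk and reuse them until they expire, so the MFA token is only needed when the cache is cold:
import json
import os
from datetime import datetime, timezone

import boto3

# Arbitrary cache location chosen for this sketch.
CACHE_FILE = os.path.expanduser("~/.cache/s3-mfa-session.json")


def load_cached_credentials():
    """Return the cached STS credentials if they exist and have not expired."""
    try:
        with open(CACHE_FILE) as f:
            creds = json.load(f)
        if datetime.fromisoformat(creds["Expiration"]) > datetime.now(timezone.utc):
            return creds
    except (FileNotFoundError, KeyError, ValueError):
        pass
    return None


def save_credentials(credentials):
    """Persist the Credentials block returned by STS, including its expiration."""
    os.makedirs(os.path.dirname(CACHE_FILE), exist_ok=True)
    with open(CACHE_FILE, "w") as f:
        json.dump(
            {
                "AccessKeyId": credentials["AccessKeyId"],
                "SecretAccessKey": credentials["SecretAccessKey"],
                "SessionToken": credentials["SessionToken"],
                "Expiration": credentials["Expiration"].isoformat(),
            },
            f,
        )


def get_session(mfa_token=None):
    creds = load_cached_credentials()
    if creds is None:
        # Only here is the MFA token actually needed.
        response = get_role_credentials(
            "arn:aws:iam::123456:role/admin-assume",
            "test-env",
            "arn:aws:iam::111111111:mfa/test@test.com",
            mfa_token,
        )
        save_credentials(response["Credentials"])
        creds = load_cached_credentials()
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
The setUpClass above could then call get_session(mfa_token) once and build the client with session.client('s3'); subsequent runs within the credentials' lifetime would not prompt for the token.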

Authenticate a Google API without a config file

I am trying to authenticate a Google API without a config file. I can't even find proof that it is possible, other than old code in my service that hasn't been used in years.
My class receives this dict:
self._connection_data = {
    "type": args,
    "project_id": args,
    "private_key_id": args,
    "private_key": args,
    "client_email": args,
    "client_id": args,
    "auth_uri": args,
    "token_uri": args,
    "auth_provider_x509_cert_url": args,
    "client_x509_cert_url": args
}
and the code is -
from google.cloud import bigquery
from google.oauth2 import service_account


def _get_client(self):
    credentials = service_account.Credentials.from_service_account_info(self._connection_data)
    return bigquery.Client(project=self._project_id, credentials=credentials, location='US')
I receive the error:
{"error":"invalid_grant","error_description":"Invalid grant: account not found"}
However, everything works when I use a helper file for the configs called config.json and an OS environment variable:
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "config.json"
self.job_config = bigquery.QueryJobConfig()
self.job_config.use_legacy_sql = True
return bigquery.Client()
I don't want a solution that relies on the env variable; I would like to use the Credentials class without a file path.
Well, in the end I managed to make my code work without any need for the environment variable or a file path. I had a problem with my configured credentials...
This is the code -
# init class here
self.job_config = bigquery.QueryJobConfig()
self.job_config.use_legacy_sql = True


def _get_client(self):
    credentials = service_account.Credentials.from_service_account_info(self._connection_data)
    return bigquery.Client(project=self._project_id, credentials=credentials)


# function to get columns
query_job = self._get_client().query(query, job_config=self.job_config)
results = query_job.result(timeout=self._current_timeout)
The only part I was missing was to pass the QueryJobConfig, with legacy SQL set to True, in all of my queries.
Unfortunately, there is no other way to authenticate your API request than either using an environment variable or specifying the key file path. There are several ways of authenticating your request to GCP using a key JSON file. Before anything, you should set up your service account and download the JSON file with your key, as described in the documentation.
Then, the first method is using default credentials, according to the documentation:
If you don't specify credentials when constructing the client, the
client library will look for credentials in the environment.
That means you just need to set your environment variable, and the Google Client Library will determine the credentials implicitly. This also allows you to provide credentials separately from your application, which eases the process of making changes to the code. You can set the environment variable as follows:
export GOOGLE_APPLICATION_CREDENTIALS="/home/user/Downloads/[FILE_NAME].json"
After setting it, you would be able to run the following code:
def implicit():
    from google.cloud import storage

    # If you don't specify credentials when constructing the client, the
    # client library will look for credentials in the environment.
    storage_client = storage.Client()

    # Make an authenticated API request
    buckets = list(storage_client.list_buckets())
    print(buckets)
Secondly, you can specify the file path within your code using the google.oauth2.service_account module. It is stated in the documentation that:
An OAuth 2.0 client identifies the application and lets end users
authenticate your application with Google. It allows your application
to access Google Cloud APIs on behalf of the end user.
To use the module, you can use either of the following snippets:
# It creates credentials using your .json file and the
# Credentials.from_service_account_file constructor
credentials = service_account.Credentials.from_service_account_file(
    'service-account.json')
Or
# If you set the environment variable, you can also use
# info = json.loads(os.environ['GOOGLE_APPLICATION_CREDENTIALS_JSON_STRING'])
# Otherwise, you specify the path inside json.load() as below
service_account_info = json.load(open('service_account.json'))
credentials = service_account.Credentials.from_service_account_info(
    service_account_info)
Finally, I encourage you to check the Authentication strategies in the documentation.
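Building on the commented-out line in the last snippet, here is a sketch of a file-less variant close to what the asker wanted: keep the whole service-account JSON in a single environment variable (the variable name is just an example) and build the credentials from the parsed dict.
import json
import os

from google.cloud import bigquery
from google.oauth2 import service_account

# The variable name is illustrative; any env var holding the full
# service-account JSON string works the same way.
service_account_info = json.loads(os.environ["GOOGLE_APPLICATION_CREDENTIALS_JSON_STRING"])

credentials = service_account.Credentials.from_service_account_info(service_account_info)
client = bigquery.Client(
    project=service_account_info["project_id"],
    credentials=credentials,
)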
