I have a job that requires secrets to connect to S3 and a relational database. I can use environment variables to pass the connection information, but I am looking for a more secure way to handle this. My current code does something like:
import mlrun

fn = mlrun.code_to_function(
    "db-load",
    kind="job",
    requirements=['psycopg2-binary'],
)
fn.set_env("DBUSER", "user")
fn.set_env("DBPASS", "pass")
Can you suggest a more secure way of handling this?
MLRun uses the concept of Tasks to encapsulate runtime parameters. Tasks are used to specify execution context such as hyper-parameters. They can also be used to pass details about secrets that are going to be used in the runtime.
To pass secret parameters, use the Task’s with_secrets() function. For example, the following command passes secrets provided by a Kubernetes secret to the execution context:
function = mlrun.code_to_function(
    name="secret_func",
    filename="my_code.py",
    handler="test_function",
    kind="job",
    image="mlrun/mlrun",
)

task = mlrun.new_task().with_secrets("kubernetes", ["AWS_KEY", "DB_PASSWORD"])
run = function.run(task, ...)
Within the code in my_code.py, the handler can access these secrets by using the get_secret() API:
def test_function(context, db_name):
    context.logger.info("running function")
    db_password = context.get_secret("DB_PASSWORD")
    # Rest of the code can use db_password to perform processing.
    ...
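Applied to the job in the question, a minimal sketch would look like this (assuming DBUSER and DBPASS exist as keys in a Kubernetes secret that is available to the runtime):

import mlrun

fn = mlrun.code_to_function(
    "db-load",
    kind="job",
    requirements=['psycopg2-binary'],
)

# Reference the secret keys through a Task instead of setting plain env vars.
task = mlrun.new_task().with_secrets("kubernetes", ["DBUSER", "DBPASS"])
run = fn.run(task)

Inside the handler, context.get_secret("DBUSER") and context.get_secret("DBPASS") then retrieve the values at runtime.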
To learn more about handling secrets in MLRun, see the MLRun documentation on secrets.
Short Version:
I am creating an Azure Active Directory group, an Azure Key Vault to which the group has access, a key in that vault, and a Postgres server whose principal is a member of the group. The server is wrapped in a ComponentResource.
The server is supposed to use the key for encryption, but it does not have access at first: it can only access the key if some time passes between creating the vault and trying to use the key.
Question: How can I make sure that the permissions have propagated before trying to use the key to encrypt?
Long Version
The infrastructure is the same as I have described above. It is created like this:
group = azuread.Group(
    "security_group",
    display_name=display_name,
    description=description,
    owners=None,
    opts=ResourceOptions(),
    security_enabled=True,
)

policy = azure.keyvault.AccessPolicyEntryArgs(
    object_id=group.object_id,
    tenant_id=tenant_id,
    permissions=azure.keyvault.PermissionsArgs(
        keys=['get', 'list', 'unwrapKey', 'wrapKey']
    ),
)

vault = azure.keyvault.Vault(
    "key_vault",
    resource_group_name=resource_group_name,
    location=location,
    properties=azure.keyvault.VaultPropertiesArgs(
        access_policies=[policy],
        # other properties
    ),
    opts=opts,
)
class Postgres(ComponentResource):
    def __init__(self, group, vault):
        server = azure.dbforpostgresql.Server(
            "postgres_server",
            identity=azure.dbforpostgresql.ResourceIdentityArgs(type="SystemAssigned"),
            # other properties
        )

        postgres_principal_group_membership = azuread.GroupMember(
            "postgres-group-member",
            group_object_id=group.object_id,
            member_object_id=server.identity.principal_id,
        )

        key = azure.keyvault.Key(
            "encryption-key",
            key_name="encryption-key",
            # other properties
        )

        def make_key_name(args):
            vault_name, key_name, key_uri_with_version = args
            key_version = key_uri_with_version.rsplit('/', 1)[-1]
            return f"{vault_name}_{key_name}_{key_version}"

        postgres_key_name = Output.all(
            vault.name,
            key.name,
            key.key_uri_with_version,
        ).apply(make_key_name)

        ### Everything before here is created without issues.
        ### This resource cannot be created during the first attempt to run this,
        ### but it is successful during the second run.
        ### We can also make it succeed on the first try if we do:
        # import time
        # time.sleep(120)
        ### The sleep does not need to be placed right here, or inside the constructor at all.
        ### It helps as long as it happens after creating the vault, but before completing the constructor.
        azure.dbforpostgresql.ServerKey(
            "server-use-encryption-key",
            key_name=postgres_key_name,
            server_key_type="AzureKeyVault",
            server_name=server.name,
            uri=key.key_uri_with_version,
            # other properties
        )
Postgres(group, vault)
When this is executed for the first time, the following error occurs:
Error: resource partially created but read failed autorest/azure: Service returned an error.
Status=404
Code="ResourceNotFound"
Message="The requested resource of type 'Microsoft.DBforPostgreSQL/servers/keys' with name '<name of the key>' was not found.":
Code="AzureKeyVaultMissingPermissions"
Message="The server '<name of the postgres server>' requires following Azure Key Vault permissions: 'Get, WrapKey, UnwrapKey'. Please grant any missing permissions to the service principal with ID '<postgres server principal ID>'."
I have verified that the problem is one of timing by bluntly using time.sleep(120) before attempting to connect the encryption key, which made the process work. In order to make the code more stable/quicker (rather than waiting for a fixed time and hoping that it will be long enough), I think that checking for the actual permissions is the way to go.
Currently, I don't see any way to do this with the azure-native provider by Pulumi.
Therefore, I am thinking of using the azure API directly - which would imply a new dependency, which is not ideal in itself.
Question: Is there a better way to achieve the desired result? I've tried various explicit dependsOn values when setting the encryption key, with no success.
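For completeness, those unsuccessful depends_on attempts looked roughly like this (a sketch, reusing the resource names from the snippets above):

azure.dbforpostgresql.ServerKey(
    "server-use-encryption-key",
    key_name=postgres_key_name,
    server_key_type="AzureKeyVault",
    server_name=server.name,
    uri=key.key_uri_with_version,
    # other properties
    opts=ResourceOptions(depends_on=[vault, postgres_principal_group_membership]),
)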
Thanks!
I have a Google App Engine Standard Environment application written in Python 3, using Flask as the framework and Firestore in native mode as the database. All of the database calls are done in the App Engine code, hidden behind Flask endpoints/views/handlers. Client browsers do not execute any JavaScript that directly calls the Firestore database; client-side JavaScript is basically 'dumb' code used for cosmetics. The only time client-side JavaScript does "anything" is when a user creates a new account or logs in using the Firebase Auth UI.
Having said that, I noticed that some online resources mention that it is absolutely necessary to secure the Firestore database, since anything that is not disallowed by security rules is basically allowed (i.e. the Firestore database is insecure by default). However, I suspect that this is only the case for apps with thick clients (i.e. where the client-side code or JavaScript is in charge of doing the heavy lifting of querying and writing to Firestore).
So my question is: is writing these security rules necessary only for mobile/web clients, and not for Firestore databases accessed only by server-side code? Or is it necessary for all Firestore projects to define these security rules? If so, then I would appreciate any pointers as to where to find reasonable default security rules to start securing my Firestore database.
I am including a caricature of my Flask main.py file for reference.
# main.py
from flask import Flask, render_template
from google.cloud import firestore
from mylibrary import function_that_fetches_user_data
from mylibrary2 import function_that_writes_user_content

app = Flask(__name__)

def validate_cookie(protected_function):
    def wrapper(*args, **kwargs):
        # handle cookie validation
        # run protected function
        ...
    return wrapper

# The dashboard is meant to display user data and user content to the user.
# It is not meant to be seen by other users.
@app.route("/user_dashboard")
@validate_cookie
def dashboard():
    user_id = get_uid_from_cookie()
    firestore_client = firestore.Client()
    user_data = function_that_fetches_user_data(user_id, firestore_client)
    return render_template('dashboard.html', user_data=user_data)

# The write function creates user content that should only be accessible to the author
# and the system/app.
@app.route("/write_user_content")
@validate_cookie
def write_user_content():
    user_id = get_uid_from_cookie()
    firestore_client = firestore.Client()
    result = function_that_writes_user_content(user_id, firestore_client)
    return render_template('success.html', result=result)
Security rules are only necessary to control access coming from web and mobile clients. Backend SDKs accessing Firestore bypass security rules altogether, so writing rules won't change the behavior of your backend code at all.
If you simply do not directly access the database from web or mobile, then you can set the security rules to reject all access, and that's fine.
match /{document=**} {
  allow read, write: if false;
}
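To illustrate the point, here is a minimal sketch (assuming the backend runs with its App Engine service-account credentials and the google-cloud-firestore client from the question; the collection and document names are made up). Even with the deny-all rules above, server-side reads and writes still succeed, because the server SDK does not go through security rules:

from google.cloud import firestore

# Runs with the service-account credentials; security rules are not consulted here.
client = firestore.Client()

# Both of these succeed even though the rules above deny all client access.
client.collection("user_content").document("doc-1").set({"owner": "some-user-id"})
snapshot = client.collection("user_content").document("doc-1").get()
print(snapshot.to_dict())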
When trying to create a sink using the Google Cloud Python3 API Client I get the error:
RetryError: GaxError(Exception occurred in retry method that was not classified as transient, caused by <_Rendezvous of RPC that terminated with (StatusCode.PERMISSION_DENIED, The caller does not have permission)>)
The code I used was this one:
import os
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path_to_json_secrets.json'

from google.cloud.bigquery.client import Client as bqClient

bqclient = bqClient()
ds = bqclient.dataset('dataset_name')
print(ds.access_grants)
# []

ds.delete()
ds.create()
print(ds.access_grants)
# [<AccessGrant: role=WRITER, specialGroup=projectWriters>,
#  <AccessGrant: role=OWNER, specialGroup=projectOwners>,
#  <AccessGrant: role=OWNER, userByEmail=id_1@id_2.iam.gserviceaccount.com>,
#  <AccessGrant: role=READER, specialGroup=projectReaders>]

from google.cloud.logging.client import Client as lClient

lclient = lClient()
dest = 'bigquery.googleapis.com%s' % (ds.path)
sink = lclient.sink('sink_test', filter_='jsonPayload.project=project_name', destination=dest)
sink.create()
I don't quite understand why this is happening. When I use lclient.log_struct() I can see the logs arriving in the Logging console, so I do have access to Stackdriver Logging.
Is there any mistake in this setup?
Thanks in advance.
Creating a sink requires different permissions than writing a log entry. By default service accounts are given project Editor (not Owner), which does not have permission to create sinks.
See the list of permissions required in the access control docs.
Make sure the service account you're using has the logging.sinks.create permission. The simplest way to do this is to switch the service account from Editor to Owner, but it would be better to grant a narrower role that includes that permission, such as Logs Configuration Writer, so you give it only what it needs.
I am using tkinter to create a GUI application that returns the security groups. Currently, if you want to change your credentials (e.g. if you accidentally entered the wrong ones), you have to restart the application; otherwise boto3 carries on using the old credentials.
I'm not sure why it keeps using the old credentials because I am running everything again using the currently entered credentials.
This is a snippet of the code that sets the environment variables and launches boto3. It works perfectly fine if you enter the right credentials the first time.
os.environ['AWS_ACCESS_KEY_ID'] = self.accessKey
os.environ['AWS_SECRET_ACCESS_KEY'] = self.secretKey

self.sts_client = boto3.client('sts')
self.assumedRoleObject = self.sts_client.assume_role(
    RoleArn=self.role,
    RoleSessionName="AssumeRoleSession1"
)
self.credentials = self.assumedRoleObject['Credentials']
self.ec2 = boto3.resource(
    'ec2',
    region_name=self.region,
    aws_access_key_id=self.credentials['AccessKeyId'],
    aws_secret_access_key=self.credentials['SecretAccessKey'],
    aws_session_token=self.credentials['SessionToken'],
)
The credentials variables are set using:
self.accessKey = str(self.AWS_ACCESS_KEY_ID_Form.get())
self.secretKey = str(self.AWS_SECRET_ACCESS_KEY_Form.get())
self.role = str(self.AWS_ROLE_ARN_Form.get())
self.region = str(self.AWS_REGION_Form.get())
self.instanceID = str(self.AWS_INSTANCE_ID_Form.get())
Is there a way to use different credentials in boto3 without restarting the program?
You need boto3.session.Session to override the access credentials. Just do this (reference: http://boto3.readthedocs.io/en/latest/reference/core/session.html):
import boto3

# Assign your own access keys
mysession = boto3.session.Session(aws_access_key_id='foo1', aws_secret_access_key='bar1')

# Or, if you want to use a different profile called "foobar" inside ~/.aws/credentials
mysession = boto3.session.Session(profile_name="foobar")

# Afterwards, just declare your AWS client/resource services from the session
sqs_resource = mysession.resource("sqs")
# or client
s3_client = mysession.client("s3")
Basically, this is only a small change to your code: you just create the client/resource from the session instead of calling boto3.client/boto3.resource directly.
self.sts_client = mysession.client('sts')
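Applied to the code in the question, a rough sketch might look like this (the method name build_ec2_resource is hypothetical; the attribute names are taken from the question, and the session is rebuilt from whatever credentials are currently entered in the form):

import boto3

def build_ec2_resource(self):
    # Build a fresh session from the currently entered credentials,
    # instead of relying on environment variables and the cached default session.
    base_session = boto3.session.Session(
        aws_access_key_id=self.accessKey,
        aws_secret_access_key=self.secretKey,
        region_name=self.region,
    )
    sts_client = base_session.client('sts')
    assumed = sts_client.assume_role(
        RoleArn=self.role,
        RoleSessionName="AssumeRoleSession1",
    )
    creds = assumed['Credentials']
    # A second session built from the temporary role credentials.
    role_session = boto3.session.Session(
        aws_access_key_id=creds['AccessKeyId'],
        aws_secret_access_key=creds['SecretAccessKey'],
        aws_session_token=creds['SessionToken'],
        region_name=self.region,
    )
    self.ec2 = role_session.resource('ec2')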
Sure, just create a different boto3.session.Session object for each set of credentials:
import boto3
s1 = boto3.session.Session(aws_access_key_id='foo1', aws_secret_access_key='bar1')
s2 = boto3.session.Session(aws_access_key_id='foo2', aws_secret_access_key='bar2')
You can also leverage botocore's set_credentials method to keep one session and change creds on the fly:
import botocore.session

session = botocore.session.Session()
session.set_credentials('foo', 'bar')
client = session.create_client('s3')
print(client._request_signer._credentials.access_key)
# u'foo'

session.set_credentials('foo1', 'bar')
client = session.create_client('s3')
print(client._request_signer._credentials.access_key)
# u'foo1'
The answers given by @mootmoot and @Vor clearly state the way of dealing with multiple credentials using a session.
@Vor's answer:
import boto3
s1 = boto3.session.Session(aws_access_key_id='foo1', aws_secret_access_key='bar1')
s2 = boto3.session.Session(aws_access_key_id='foo2', aws_secret_access_key='bar2')
But some of you would be curious about
why does the boto3 client or resource behave in that manner in the first place?
Let's clear out a few points about Session and Client as they'll actually lead us to the answer to the aforementioned question.
Session
A 'Session' stores configuration state and allows you to create service clients and resources
Client
if the credentials are not passed explicitly as arguments to the boto3.client method, then the credentials configured for the session will automatically be used. You only need to provide credentials as arguments if you want to override the credentials used for this specific client
Now let's get to the code and see what actually happens when you call boto3.client()
# Simplified from the boto3 source
def client(*args, **kwargs):
    return _get_default_session().client(*args, **kwargs)

def _get_default_session():
    if DEFAULT_SESSION is None:
        setup_default_session()
    return DEFAULT_SESSION

def setup_default_session(**kwargs):
    global DEFAULT_SESSION
    DEFAULT_SESSION = Session(**kwargs)
Learnings from the above
The function boto3.client() is really just a proxy for the boto3.Session.client() method
Once you use the client, the DEFAULT_SESSION is set up, and every subsequent client creation keeps using that DEFAULT_SESSION.
The credentials configured for the DEFAULT_SESSION are used if the credentials are not explicitly passed as arguments while creating the boto3 client.
Answer
The first call to boto3.client() sets up the DEFAULT_SESSION and configures it with oldCredsAccessKey and oldCredsSecretKey, the values already set for the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY respectively.
So even if you set new values of credentials in the environment, i.e do this
os.environ['AWS_ACCESS_KEY_ID'] = newCredsAccessKey
os.environ['AWS_SECRET_ACCESS_KEY'] = newCredsSecretKey
The upcoming boto3.client() calls still pick up the old credentials configured for the DEFAULT_SESSION
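A quick sketch of that behavior (the key values are placeholders):

import os
import boto3

os.environ['AWS_ACCESS_KEY_ID'] = 'OLD_KEY'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'OLD_SECRET'

boto3.client('sts')  # first call: DEFAULT_SESSION is created and reads OLD_KEY

os.environ['AWS_ACCESS_KEY_ID'] = 'NEW_KEY'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'NEW_SECRET'

# Still backed by the cached DEFAULT_SESSION, i.e. OLD_KEY
print(boto3.DEFAULT_SESSION.get_credentials().access_key)

# A brand-new Session re-reads the environment and picks up NEW_KEY
print(boto3.session.Session().get_credentials().access_key)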
NOTE
Throughout this answer, boto3.client() means a call with no credential arguments passed to the client method.
References
https://boto3.amazonaws.com/v1/documentation/api/latest/_modules/boto3.html#client
https://boto3.amazonaws.com/v1/documentation/api/latest/_modules/boto3/session.html#Session
https://ben11kehoe.medium.com/boto3-sessions-and-why-you-should-use-them-9b094eb5ca8e
I have a thorny problem that I can't seem to get to grips with. I am currently writing unit tests for a Django custom auth backend. On our system we actually have two backends: the built-in Django backend and a custom backend that sends requests to a Java-based API that returns user info in the form of XML. Now, I am writing unit tests, so I don't want to be sending requests outside the system like that; I'm not trying to test the Java API. So my question is how I can get around this and mock the side effects in the most robust way.
The function I am testing is something like this, where the url settings value is just the base url for the Java server that authenticates the username and password data and returns the XML, and the service value is just some magic for building the url query; it's unimportant for us:
@staticmethod
def get_info_from_api_with_un_pw(username, password, service=12345):
    url = settings.AUTHENTICATE_URL_VIA_PASSWORD
    if AUTH_FIELD == "username":
        params = {"nick": username, "password": password}
    elif AUTH_FIELD == "email":
        params = {"email": username, "password": password}
    params["service"] = service
    encoded_params = urlencode([(k, smart_str(v, "latin1")) for k, v in params.items()])
    try:
        # get the user's data from the api
        xml = urlopen(url + encoded_params).read()
        userinfo = dict((e.tag, smart_unicode(e.text, strings_only=True))
                        for e in ET.fromstring(xml).getchildren())
        if "nil" in userinfo:
            return userinfo
        else:
            return None
    except Exception:
        # (error handling elided in the question)
        return None
So, we get the XML, parse it into a dict, and if the key nil is present then we can return the dict and carry on, happy and authenticated.
Clearly, one solution is just to find a way to somehow override or monkeypatch the logic in the xml variable. I found this answer:
How can one mock/stub python module like urllib
I tried to implement something like that, but the details there are very sketchy and I couldn't seem to get it working.
I also captured the XML response and put it in a local file in the test folder, with the intention of finding a way to use it as a mock response that is passed into the url parameter of the test function; something like this will override the url:
@override_settings(AUTHENTICATE_URL_VIA_PASSWORD=(os.path.join(os.path.dirname(__file__), "{0}".format("response.xml"))))
def test_get_user_info_username(self):
    self.backend = RemoteAuthBackend()
    self.backend.get_info_from_api_with_un_pw("user", "pass")
But that also needs to take account of the url-building logic that the function defines (i.e. "url + encoded_params"). Again, I could rename the response file to match the concatenated url, but this is becoming less like a good unit test for the function and more of a "cheat". The whole thing is getting more and more brittle with these solutions, and it's really just a fixture anyway, which is also something I want to avoid if at all possible.
I also wondered if there might be a way to serve the xml on the django development server and then point the function at that? It seems like a saner solution, but much googling gave me no clues if such a thing would be possible or advisable and even then I don't think that would be a test to run outside of the development environment.
So, ideally, I need to be able to somehow mock a "server" to take the place of the Java API in the function call, or somehow serve up some XML payload that the function can open as its url, or monkeypatch the function from the test itself, or...
Does the mock library have the appropriate tools to do such things?
http://www.voidspace.org.uk/python/mock
So, there are two points to this question: 1) I would like to solve my particular problem in a clean way, and more importantly, 2) what are the best practices for cleanly writing Django unit tests when you depend on data, cookies, etc. for user authentication from a remote API that is outside of your domain?
The mock library should work if used properly. I prefer the minimock library and I wrote a small base unit testcase (minimocktest) that helps with this.
If you want to integrate this testcase with Django to test urllib you can do it as follows:
from minimocktest import MockTestCase
from django.test import TestCase
from django.test.client import Client

class DjangoTestCase(TestCase, MockTestCase):
    '''
    A TestCase class that combines minimocktest and django.test.TestCase
    '''
    def _pre_setup(self):
        MockTestCase.setUp(self)
        TestCase._pre_setup(self)
        # optional: shortcut client handle for quick testing
        self.client = Client()

    def _post_teardown(self):
        TestCase._post_teardown(self)
        MockTestCase.tearDown(self)
Now you can use this testcase instead of using the Django test case directly:
class MySimpleTestCase(DjangoTestCase):
    def setUp(self):
        self.file = StringIO.StringIO('MiniMockTest')
        self.file.close = self.Mock('file_close_function')

    def test_urldump_dumpsContentProperly(self):
        self.mock('urllib2.urlopen', returns=self.file)
        self.assertEquals(urldump('http://pykler.github.com'), 'MiniMockTest')
        self.assertSameTrace('\n'.join([
            "Called urllib2.urlopen('http://pykler.github.com')",
            "Called file_close_function()",
        ]))

        urllib2.urlopen('anything')
        self.mock('urllib2.urlopen', returns=self.file, tracker=None)
        urllib2.urlopen('this is not tracked')
        self.assertTrace("Called urllib2.urlopen('anything')")
        self.assertTrace("Called urllib2.urlopen('this is not tracked')", includes=False)
        self.assertSameTrace('\n'.join([
            "Called urllib2.urlopen('http://pykler.github.com')",
            "Called file_close_function()",
            "Called urllib2.urlopen('anything')",
        ]))
Here are the basics of the solution that I ended up with, for the record. I used the Mock library itself rather than Mockito in the end, but the idea is the same:
from mock import patch

@override_settings(AUTHENTICATE_LOGIN_FIELD="username")
@patch("mymodule.auth_backend.urlopen")
def test_get_user_info_username(self, urlopen_override):
    response = "file://" + os.path.join(os.path.dirname(__file__), "{0}".format("response.xml"))
    # mock patch replaces the API call
    urlopen_override.return_value = urlopen(response)
    # call the patched object
    userinfo = RemoteAuthBackend.get_info_from_api_with_un_pw("user", "pass")
    assert_equal(type(userinfo), dict)
    assert_equal(userinfo["nick"], "user")
    assert_equal(userinfo["pass"], "pass")