AWS security can it be more confusing - python

I read the AWS docs on Python and the AssumeRole operation and stumbled upon these lines, which look to me like a total security hole: "might not normally have access to". What am I missing?
Returns a set of temporary security credentials that you can use to access AWS resources that you might not normally have access to. These temporary credentials consist of an access key ID, a secret access key, and a security token
from here
https://docs.aws.amazon.com/STS/latest/APIReference/API_AssumeRole.html
I just don't understand some basic stuff. Someone told me to use AssumeRole instead of keeping credentials in my home folder (~/.aws),
but reading the boto docs about credentials reveals that in order to perform AssumeRole I still need credentials. So why bother assuming a role? Can't I just give my access key the right permissions and be done with it?
# In ~/.aws/credentials:
[development]
aws_access_key_id=foo
aws_secret_access_key=bar
# In ~/.aws/config
[profile crossaccount]
role_arn=arn:aws:iam:...
source_profile=development
here is the docs
https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#configuring-credentials

To answer your question: yes, AWS security - both the concepts and practices - can be more confusing especially when you're first learning about security in the cloud. Computer security practices that you might be used to outside the cloud often seem weird and opaque when applied to the context of cloud, however a lot of the time the underlying concepts hold true no matter where they're applied.
Secondly, there is no security hole in the AssumeRole mechanism. It was designed like this to adhere to the principle of least privilege, a universally accepted concept in computer security. The idea being that a particular entity (such as a developer or computer program) only be granted enough power to perform a finite set of operations.
For example, let's say I'm one of many developers contracted by a large company to build a social media app in their AWS infrastructure. The company gives me access keys that only have power over their EC2 instances (creating, deleting, etc) and S3 buckets. They make me assume a special role, DatabaseOperator, when I need to perform database maintenance. And they allow security auditors to audit their system with the role ApplicationSecurityAuditor. Every other resource in AWS is denied by virtue of them not being granted in the roles, therefore these roles give the people using them access to resources they would not normally have.
You asked "why would I bother with multiple roles when I can just assign permissions to the user and be done with it?". You can do this and there's nothing inherently bad about it. If your development environment is small enough and you can keep track of which users have certain permissions assigned then you may forego the overhead of separate roles.
However, this approach doesn't scale well and has serious security and maintenance implications:
you no longer have fine-grained permissions
privilege is lumped together and assigned at the user level
this quickly becomes unmanageable especially when you have hundreds of users
you cannot easily revoke privileges in case of an emergency
only manual inspection of each user would reveal who had certain permissions
In the example I gave, if the company detected malicious behaviour on their databases then they could instantly revoke the DatabaseOperator role, preventing any further damage. They could then bring in security auditors and let them assume the ApplicationSecurityAuditor role to check out the state of the system, removing their access once the audit is completed. Also, if they decide to lock down their databases then that's as easy as removing/disabling the DatabaseOperator role or removing destructive abilities from the role.
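As a rough sketch of what assuming the DatabaseOperator role from the example could look like in code: the helper below wraps the STS AssumeRole call and returns the temporary credentials in the shape boto3 clients accept. The helper name, the session name and the account ID in the usage comment are illustrative assumptions, and the STS client is passed in so the function can be exercised without real AWS access.

```python
from typing import Any, Dict


def assume_role_credentials(sts_client: Any, role_arn: str, session_name: str,
                            duration_seconds: int = 900) -> Dict[str, str]:
    """Call STS AssumeRole and return keyword arguments for boto3.client()."""
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,  # the temporary credentials expire after this
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Usage (needs boto3 and real base credentials; the account ID is a placeholder):
#   import boto3
#   temp = assume_role_credentials(
#       boto3.client("sts"),
#       "arn:aws:iam::123456789012:role/DatabaseOperator",
#       "db-maintenance",
#   )
#   rds = boto3.client("rds", **temp)
```

The base access key only needs permission to call sts:AssumeRole on that role, so revoking or tightening the role cuts off database access for everyone who uses it at once.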

You would typically use IAM roles for two things:
cross-account access (rather than creating an IAM user for someone in the other account to use in your account)
applications, for example running in EC2 or on AWS Lambda
One of the primary benefits of IAM roles is that credentials derived from an IAM role are short-term and will expire. A set of credentials being exposed a few days after they were created becomes a non-issue as they've already expired.
I think the phrase "use to access AWS resources that you might not normally have access to" relates mostly to #1 above (cross-account access).
For your situation, I think it's typical to use IAM User credentials and to apply appropriate best practices there, notably secure them properly and rotate them periodically.
For more, read IAM Best Practices.

Related

Storing decryptable passwords for automated usage

TLDR
I am making a REST Session management solution for industrial automation purposes and need to automatically log into devices to perform configurations.
NOTE:
These devices are 99% of the time going to be isolated to private networks/VPNs (i.e., Will not have a public IP)
Dilemma
I am being tasked with creating a service that can store hardware device credentials so automated configurations (& metrics scraping) can be done. The hardware in question only allows REST Session logins via a POST method where the user and (unencrypted) password are sent in the message body. This returns a Session cookie that my service then stores (in memory).
The service in question consists of:
Linux (Ubuntu 20.04) server
FastAPI python backend
SQLITE3 embedded file DB
Storing Credentials?
My background is not in Security so this is all very new to me but it seems that I should prefer storing a hash (e.g., bcrypt) of my password in my DB for future verification however there will not be any future verification as this is all automated.
This brings me to what seems like is the only solution - hashing the password and using that as the salt to encrypt the password, then storing the hashed password in the DB for decryption purposes later. I know this provides almost 0 security given the DB is compromised but I am at a loss for alternate solutions. Given the DB is embedded, maybe there is some added assurance that the server itself would have to be compromised before the DB itself is compromised? I don't know if there is a technical "right" approach to this, maybe not, however if anyone has any advice I am all ears.
You should consider using a hardware security module (HSM). There are cloud alternatives, such as AWS Secrets Manager, an encrypted secrets repository whose encryption keys are managed by AWS KMS, which in turn keeps them in actual HSMs. Or if your app is not hosted in a public cloud, you can consider buying an actual HSM too, but that's expensive. So it all comes down to the risk you want to accept vs the cost.
You can also consider building architecture to properly protect your secrets. If you build a secure secrets store service and apply appropriate protection (which would be too broad to describe for an answer here), you can at least provide auditing of secret usage, you can implement access control, you can easily revoke secrets, you can monitor usage patterns in that component and so on. Basically your secrets service would act like a very well protected "HSM", albeit it might not involve specialized hardware at all. This would not guarantee that secrets (secret encryption keys, typically) cannot ever be retrieved from the service like a real HSM would, but it would have many of the benefits as described above.
However, do note that applying appropriate protection is the key there, and that's not straightforward at all. One approach that you can take is to model your potential attackers, list the ways (attack paths) of compromising different aspects of different components, and then design protections against those, as long as it makes sense financially.

How to safely store users' credentials to third party websites when no authentication API exists?

I am developing a web app which depends on data from one or more third party websites. The websites do not provide any kind of authentication API, and so I am using unofficial APIs to retrieve the data from the third party sites.
I plan to ask users for their credentials to the third party websites. I understand this requires users to trust me and my tool, and I intend to respect that trust by storing the credentials as safely as possible as well as make clear the risks of sharing their credentials.
I know there are popular tools that address this problem today. Mint.com, for example, requires users' credentials to their financial accounts so that it may periodically retrieve transaction information. LinkedIn asks for users' e-mail credentials so that it can harvest their contacts.
What would be a safe design to store users' credentials? In particular, I am writing a Django application and will likely build on top of a PostgreSQL backend, but I am open to other ideas.
For what it's worth, the data being accessed from these third party sites is nowhere near the level of financial accounts, e-mail accounts, or social networking profiles/accounts. That said, I intend to treat this access with the utmost respect, and that is why I am asking for assistance here first.
There’s no such thing as a safe design when it comes to storing passwords/secrets. There’s only the question of how much security overhead you are willing to live with. Here is what I would consider the minimum that you should do:
HTTPS-only (all passwords should be encrypted in transit)
If possible keep passwords encrypted in memory when working with them except when you need to access them to access the service.
Encryption in the data store. All passwords should be strongly encrypted in the data store.
[Optional, but strongly recommended] Customer keying; the customer should hold the key to unlock their data, not you. This will mean that your communications with the third party services can only happen when the customer is interacting with your application. The key should expire after a set amount of time. This protects you from the rogue DBA or your DB being compromised.
And this is the hard one, auditing. All accesses of any of the customer's information should be logged and the customer should be able to view the log to verify / review the activity. Some go so far as to have this logging enabled at the database level as well so all row access at the DB level are logged.
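A minimal sketch of the customer-keying idea from the list above, assuming the third-party `cryptography` package is installed: the encryption key is derived from a passphrase that only the customer knows, so the stored salt and token are useless without it. The function names and the iteration count are illustrative choices, not part of any framework.

```python
import base64
import hashlib
import os
from typing import Tuple

from cryptography.fernet import Fernet, InvalidToken


def derive_key(passphrase: str, salt: bytes, iterations: int = 600_000) -> bytes:
    """Stretch the customer's passphrase into a 32-byte key in Fernet's format."""
    raw = hashlib.pbkdf2_hmac("sha256", passphrase.encode("utf-8"), salt,
                              iterations, dklen=32)
    return base64.urlsafe_b64encode(raw)


def encrypt_credential(passphrase: str, plaintext: str) -> Tuple[bytes, bytes]:
    """Return (salt, token); both can be stored, the passphrase must not be."""
    salt = os.urandom(16)
    token = Fernet(derive_key(passphrase, salt)).encrypt(plaintext.encode("utf-8"))
    return salt, token


def decrypt_credential(passphrase: str, salt: bytes, token: bytes) -> str:
    """Raises cryptography.fernet.InvalidToken if the passphrase is wrong."""
    return Fernet(derive_key(passphrase, salt)).decrypt(token).decode("utf-8")
```

With this layout a database dump alone does not reveal the third-party password, at the cost that the application can only talk to the third-party site while the customer has supplied the passphrase.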
In addition to what 2ps said:
There is a project called django-cryptographic-fields that handles storing encrypted data in a Postgres database using Django.
As noted in their README from the link above:
django-cryptographic-fields is set of fields that wrap standard Django fields with encryption provided by the python cryptography library. These fields are much more compatible with a 12-factor design since they take their encryption key from the settings file instead of a file on disk used by keyczar.
While keyczar is an excellent tool to use for encryption, it's not compatible with Python 3, and it requires, for hosts like Heroku, that you either check your key file into your git repository for deployment, or implement manual post-deployment processing to write the key stored in an environment variable into a file that keyczar can read.

AWS IAM Users vs Secure Token Services (Keys On Demand)

I'm trying to better understand the potential of AWS Secure Token Service, and looking for thoughts on how best to solve a particular problem.
Let's say I have a bunch of IAM users that are currently fairly well carved up into different groups that restrict access to the AWS services that are needed (admins, dba's, datawarehouse, etc.). However, even with this setup, there are still a bunch of long-lived keys that will inevitably end up hard coded into various utils, committed to version control for all to see, etc. You can manually rotate all of these keys, of course, but that's a lot of effort.
So, it seemed to me that a better solution was to adopt a stance of "keys on demand". The IAM user accounts remain intact, and associated with the groups that grant them whatever access comes with that group, but without active API keys, the accounts are useless. I started writing a python app that first authenticates users against LDAP internally, and then allows them to click a button to generate a new API key for their AWS account. Then, I would run a cron job every minute or hour or whatever, to find keys that are older than 12 hours, send an API call to AWS to delete that key.
I got about 80% done with that, and stumbled on STS. Is my scenario a perfect use-case for STS? Could I just create roles that mirror the various permission sets I need, and allow IAM users to have default stance of no access, but assume a role on demand?
Anything else I need to be considering? I.e. anything that might break by implementing something along these lines?
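For reference, the cron-style cleanup described above could be sketched as follows. The code assumes a boto3 IAM client is passed in and relies on the `AccessKeyMetadata` entries returned by `list_access_keys` (each carrying `AccessKeyId` and a timezone-aware `CreateDate`); the function names and the 12-hour default are illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Iterable, List, Optional


def expired_key_ids(key_metadata: Iterable[Dict[str, Any]], now: datetime,
                    max_age: timedelta = timedelta(hours=12)) -> List[str]:
    """Return the IDs of access keys created more than max_age before now."""
    return [k["AccessKeyId"] for k in key_metadata
            if now - k["CreateDate"] > max_age]


def delete_expired_keys(iam_client: Any, user_name: str,
                        now: Optional[datetime] = None,
                        max_age: timedelta = timedelta(hours=12)) -> List[str]:
    """Delete a user's access keys that are older than max_age; return their IDs."""
    now = now or datetime.now(timezone.utc)
    metadata = iam_client.list_access_keys(UserName=user_name)["AccessKeyMetadata"]
    expired = expired_key_ids(metadata, now, max_age)
    for key_id in expired:
        iam_client.delete_access_key(UserName=user_name, AccessKeyId=key_id)
    return expired
```

With STS the expiry is enforced by AWS itself, so a sweep like this would not be needed for role-based temporary credentials.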

Secure credential storage in python

The attack
One possible threat model, in the context of credential storage, is an attacker which has the ability to :
inspect any (user) process memory
read local (user) files
AFAIK, the consensus on this type of attack is that it's impossible to prevent (since the credentials must be stored in memory for the program to actually use them), but there's a couple of techniques to mitigate it:
minimize the amount of time the sensitive data is stored in memory
overwrite the memory as soon as the data is not needed anymore
mangle the data in memory, keep moving it, and other security through obscurity measures
Python in particular
The first technique is easy enough to implement, possibly through a keyring (hopefully kernel space storage)
The second one is not achievable at all without writing a C module, to the best of my knowledge (but I'd love to be proved wrong here, or to have a list of existing modules)
The third one is tricky.
In particular, python being a language with very powerful introspection and reflection capabilities, it's difficult to prevent access to the credentials to anyone which can execute python code in the interpreter process.
There seems to be a consensus that there's no way to enforce private attributes and that attempts at it will at best annoy other programmers who are using your code.
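As a partial, hedged counterpoint on the second technique: a mutable `bytearray` can be overwritten in place from pure Python, as the sketch below shows. The helper names are made up for illustration, and this gives no guarantee that the interpreter has not left other copies around (for example immutable `bytes` or `str` objects created while reading or using the secret), so it only narrows the window rather than closing it.

```python
from contextlib import contextmanager
from typing import Iterator


def wipe(buffer: bytearray) -> None:
    """Overwrite every byte of the buffer in place with zeros."""
    for i in range(len(buffer)):
        buffer[i] = 0


@contextmanager
def scoped_secret(buffer: bytearray) -> Iterator[bytearray]:
    """Yield the secret buffer and zero it when the block exits, even on error."""
    try:
        yield buffer
    finally:
        wipe(buffer)


# Usage: the secret is readable only inside the with-block.
password = bytearray(b"hunter2")
with scoped_secret(password) as secret:
    length_while_in_use = len(secret)  # stand-in for the real use of the secret
```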
The question
Taking all this into consideration, how does one securely store authentication credentials using python? What are the best practices? Can something be done about the language "everything is public" philosophy? I know "we're all consenting adults here", but should we be forced to choose between sharing our passwords with an attacker and using another language?
There are two very different reasons why you might store authentication credentials:
To authenticate your user: For example, you only allow the user access to the services after the user authenticates to your program
To authenticate the program with another program or service: For example, the user starts your program which then accesses the user's email over the Internet using IMAP.
In the first case, you should never store the password (or an encrypted version of the password). Instead, you should hash the password with a high-quality salt and ensure that the hashing algorithm you use is computationally expensive (to prevent dictionary attacks) such as PBKDF2 or bcrypt. See Salted Password Hashing - Doing it Right for many more details. If you follow this approach, even if the hacker retrieves the salted, slow-hashed token, they can't do very much with it.
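A minimal sketch of this first case using only the standard library: `hashlib.pbkdf2_hmac` with a random per-user salt, and a constant-time comparison on verification. The iteration count and function names are assumptions to tune for your own hardware and codebase.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor; raise it as hardware gets faster


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = ITERATIONS) -> Tuple[bytes, bytes]:
    """Return (salt, digest); store both, never the password itself."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected)
```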
In the second case, there are a number of things done to make secret discovery harder (as you outline in your question), such as:
Keeping secrets encrypted until needed, decrypting on demand, then re-encrypting immediately after
Using address space randomization so each time the application runs, the keys are stored at a different address
Using the OS keystores
Using a "hard" language such as C/C++ rather than a VM-based, introspective language such as Java or Python
Such approaches are certainly better than nothing, but a skilled hacker will break it sooner or later.
Tokens
From a theoretical perspective, authentication is the act of proving that the person challenged is who they say they are. Traditionally, this is achieved with a shared secret (the password), but there are other ways to prove yourself, including:
Out-of-band authentication. For example, where I live, when I try to log into my internet bank, I receive a one-time password (OTP) as an SMS on my phone. In this method, I prove who I am by virtue of owning a specific telephone number
Security token: To log in to a service, I have to press a button on my token to get an OTP which I then use as my password.
Other devices:
SmartCard, in particular as used by the US DoD where it is called the CAC. Python has a module called pyscard to interface to this
NFC device
And a more complete list here
The commonality between all these approaches is that the end-user controls these devices and the secrets never actually leave the token/card/phone, and certainly are never stored in your program. This makes them much more secure.
Session stealing
However (there is always a however):
Let us suppose you manage to secure the login so the hacker cannot access the security tokens. Now your application is happily interacting with the secured service. Unfortunately, if the hacker can run arbitrary executables on your computer, the hacker can hijack your session for example by injecting additional commands into your valid use of the service. In other words, while you have protected the password, it's entirely irrelevant because the hacker still gains access to the 'secured' resource.
This is a very real threat, as the multiple cross-site scripting attacks have shown (one example is U.S. Bank and Bank of America Websites Vulnerable, but there are countless more).
Secure proxy
As discussed above, there is a fundamental issue in keeping the credentials of an account on a third-party service or system so that the application can log onto it, especially if the only log-on approach is a username and password.
One way to partially mitigate this is to delegate the communication with the service to a secure proxy, and to develop a secure sign-on approach between the application and the proxy. In this approach:
The application uses a PKI scheme or two-factor authentication to sign onto the secure proxy
The user adds security credentials to the third-party system to the secure proxy. The credentials are never stored in the application
Later, when the application needs to access the third-party system, it sends a request to the proxy. The proxy logs on using the security credentials and makes the request, returning results to the application.
The disadvantages to this approach are:
The user may not want to trust the secure proxy with the storage of the credentials
The user may not trust the secure proxy with the data flowing through it to the third-party application
The application owner has additional infrastructure and hosting costs for running the proxy
Some answers
So, on to specific answers:
How does one securely store authentication credentials using python?
If storing a password for the application to authenticate the user, use a PBKDF2 algorithm, such as https://www.dlitz.net/software/python-pbkdf2/
If storing a password/security token to access another service, then there is no absolutely secure way.
However, consider switching authentication strategies to, for example the smartcard, using, eg, pyscard. You can use smartcards to both authenticate a user to the application, and also securely authenticate the application to another service with X.509 certs.
Can something be done about the language "everything is public" philosophy? I know "we're all consenting adults here", but should we be forced to choose between sharing our passwords with an attacker and using another language?
IMHO there is nothing wrong with writing a specific module in Python that does its damnedest to hide the secret information, making it a right bugger for others to reuse (annoying other programmers is its purpose). You could even code large portions in C and link to it. However, don't do this for other modules for obvious reasons.
Ultimately, though, if the hacker has control over the computer, there is no privacy on the computer at all. Theoretical worst-case is that your program is running in a VM, and the hacker has complete access to all memory on the computer, including the BIOS and graphics card, and can step your application through authentication to discover its secrets.
Given no absolute privacy, the rest is just obfuscation, and the level of protection is simply how hard it is obfuscated vs. how much a skilled hacker wants the information. And we all know how that ends, even for custom hardware and billion-dollar products.
Using Python keyring
While this will quite securely manage the key with respect to other applications, all Python applications share access to the tokens. This is not in the slightest bit secure to the type of attack you are worried about.
I'm no expert in this field and am really just looking to solve the same problem that you are, but it looks like something like Hashicorp's Vault might be able to help out quite nicely.
In particular with respect to the problem of storing credentials for third-party services, e.g.:
In the modern world of API-driven everything, many systems also support programmatic creation of access credentials. Vault takes advantage of this support through a feature called dynamic secrets: secrets that are generated on-demand, and also support automatic revocation.
For Vault 0.1, Vault supports dynamically generating AWS, SQL, and Consul credentials.
More links:
Github
Vault Website
Use Cases

How can I protect my AWS access id and secret key in my python application

I'm making an application in Python and using Amazon Web Services in some modules.
I'm now hard coding my AWS access id and secret key in a *.py file. I might move them out to a configuration file in the future.
But there's a problem: how can I protect the AWS information from other people? As far as I know, Python is a language that is easy to decompile.
Is there a way to do this?
Well what I'm making is an app to help user upload/download stuff from cloud. I'm using Amazon S3 as cloud storage. As I know Dropbox also using S3 so I'm wondering how they protects the key.
After a day's research I found something.
I'm now using boto (an AWS library for Python). I can use the function 'generate_url(X)' to get a URL that lets the app access an object in S3. The URL expires after X seconds.
So I can build a web service for my apps to provide them the urls. The AWS keys will not be set into the app but into the web service.
It sounds great, but so far I can only download objects with this function; upload doesn't work. Does anybody know how to use it for uploading?
Does anyone here know how to use key.generate_url() of boto to get a temporary url for uploading stuff to S3?
There's no way to protect your keys if you're going to distribute your code. They're going to be accessible to anyone who has access to your server or source code.
There are two things you can do to protect yourself against malicious use of your keys.
Use the amazon IAM service to create a set of keys that only has permission to perform the tasks that you require for your script. http://aws.amazon.com/iam/
If you have a mobile app or some other app that will require user accounts you can create a service to create temporary tokens for each user. The user must have a valid token and your keys to perform any actions. If you want to stop a user from using your keys you can stop generating new tokens for them. http://awsdocs.s3.amazonaws.com/STS/latest/sts-api.pdf
Specifically for S3: if you're creating an application that lets people upload content, the only way to protect your account and the information of the other users is to make them register an account with you.
The first step of the application would be to authenticate with your server.
Once your server has authenticated the user, it makes a request to Amazon's token service and returns a token.
Your application then makes a request using the keys built into the exe and the token.
Based on the permissions applied to this user he can upload only to the bucket that is assigned to him.
If this seems pretty difficult, then you're probably not ready to design an application that will help users upload data to S3. You're going to have significant security problems if you distribute only one key: even if you can hide that key from the users, they would be able to edit any data added by any other user.
The only way around this is to have each user create their own AWS account and your application will help them upload files to their S3 account. If this is the case then you don't need to worry about protecting the keys because the user will be responsible for adding their own keys after installing your application.
I've been trying to answer the same question... the generate_url(x) looks quite promising.
This link had a suggestion about creating a cloudfront origin access identity, which I'm guessing taps into the IAM authentication... meaning you could create a key for each application without giving away your main account details. With IAM, you can set permissions based on keys as to what they can do, so they can have limited access.
Note: I don't know if this really works, I haven't tried it yet, but it might be another avenue to explore.
2 - Create a Cloudfront "Origin Access Identity"
This identity can be reused for many different distributions and keypairs. It is only used to allow cloudfront to access your private S3 objects without allowing everyone. As of now, this step can only be performed using the API.
Boto code is here:
# Create a new Origin Access Identity
oai = cf.create_origin_access_identity(comment='New identity for secure videos')
print("Origin Access Identity ID: %s" % oai.id)
print("Origin Access Identity S3CanonicalUserId: %s" % oai.s3_user_id)
You're right, you can't upload using pre-signed URLs.
There is a different, more complex capability that you can use called GetFederationToken. This will return you some temporary credentials, to which you can apply any policy (permissions) that you like.
So for example, you could write a web service POST /upload that creates a new folder in S3, then creates temporary credentials with permissions to PutObject to only this folder, and returns the folder path and credentials to the caller. Presumably, some authorization check would be performed by this method as well.
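The server side of such a POST /upload handler might look roughly like the sketch below. It builds a policy that allows `s3:PutObject` only under one prefix and passes it to STS `get_federation_token`. The function names, bucket and prefix are illustrative assumptions, and the STS client is injected so the logic can be tried without AWS access.

```python
import json
from typing import Any, Dict


def upload_policy(bucket: str, prefix: str) -> str:
    """Build a session policy allowing PutObject only below bucket/prefix/."""
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
        }],
    })


def upload_credentials(sts_client: Any, user_name: str, bucket: str, prefix: str,
                       duration_seconds: int = 900) -> Dict[str, Any]:
    """Request temporary credentials scoped to a single upload folder."""
    response = sts_client.get_federation_token(
        Name=user_name,  # STS expects a name of 2 to 32 characters
        Policy=upload_policy(bucket, prefix),
        DurationSeconds=duration_seconds,
    )
    return response["Credentials"]


# Usage (needs boto3 and the long-term keys of an IAM user on the server):
#   import boto3
#   creds = upload_credentials(boto3.client("sts"), "alice", "my-bucket", "uploads/alice")
```

The resulting permissions are the intersection of this policy and the permissions of the IAM user whose keys call STS, so that user must itself be allowed to put objects into the bucket.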
You can't embed cloud credentials, or any other credentials, in your application code. Which isn't to say that nobody ever accidentally does this, even security professionals.
To safely distribute credentials to your infrastructure, you need tool support. If you use an AWS facility like CloudFormation, you can (somewhat more) safely give it your credentials. CloudFormation can also create new credentials on the fly. If you use a PaaS like Heroku, you can load your credentials into it, and Heroku will presumably treat them carefully. Another option for AWS is IAM Role. You can create an IAM Role with permission to do what you need, then "pass" the role to your EC2 instance. It will be able to perform the actions permitted by the role.
A final option is a dedicated secrets management service, such as Conjur. (Disclaimer: I'm a founder of the company). You load your credentials and other secrets into a dedicated virtual appliance, and you define access permissions that govern the modification and distribution of the credentials. These permissions can be granted to people or to "robots" like your EC2 box. Credentials can be retrieved via REST or client APIs, and every interaction with credentials is recorded to a permanent record.
Don't put it in applications you plan to distribute. It'll be visible, and people can launch instances that are billed directly to you or, worse, take down instances if you use it in production.
I would look at your programs design and seriously question why I need to include that information in the app. If you post more details on the design I'm sure we can help you figure out a way in which you don't need to bundle this information.
