I am aware that these questions have been asked separately several times before, and most of the answers I've found are "Python is not easy to obfuscate, because that's the nature of the language. If you really need obfuscation, use another tool" and "At some point you need a tradeoff" (see How do I protect Python code and How to keep the OAuth consumer secret safe, and how to react when it's compromised?).
However, I have just made a small Python app which makes use of Twitter's API (and therefore needs OAuth). OAuth requires a Consumer Secret, which is to be kept away from users. The app needs that information, but the user should not be able to access it easily. If that information cannot be protected (and I am using obfuscation and protection as synonyms, because I do not know of any other way), what is the point of having an OAuth API for Python in the first place?
The question(s) then are:
Would it be possible to hardcode the secret in the app and then obfuscate it in an effective manner?
If not, what would be the best way to use OAuth in Python? I have thought of:
"Shipping" the encrypted consumer secret along with the app and using a hardcoded key to recover it, but the problem remains the same (how to protect the key).
Having the consumer secret on a server and having the application retrieve it at startup. If the information is sent unencrypted, it would be even easier for a malicious attacker to just use Wireshark and get the consumer secret from the network traffic than to decompile the bytecode. Plus, how could I make sure that I am sending that secret to my app and not to a malicious attacker? Any form of authentication I know of would require having secret information on the app side, so the problem remains the same.
A mixture of both (have the server send the encryption key), which has the same problems as before.
The basic problem is the same: how can you have something secret if critical information cannot be hidden?
I have also seen comments saying that one should use a C/C++ extension for those critical parts, but I do not know anything about that, so if that were the answer, I'd appreciate some extra information.
If you want to deploy on servers (or a laptop) you own, you can store secrets in environment variables or files. If you want to deploy to users, the suggestion is that you or your users register an API key, generate an SSL key, or similar.
You can code your own simple symmetric encryption function with a lot of data manipulation to make it harder to reverse.
It is unclear why you'd need to ship your OAuth key with the script. That would mean giving anyone access to your Twitter account, whether or not the key itself is obfuscated inside the app.
The more typical scenario is that you develop some Twitter client, and anyone who wants to run it locally will have to input their own OAuth token before being able to run it. You simply do not hardcode the token and require any user to supply the token.
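A minimal sketch of that idea, assuming the person running the client supplies their own credentials either through environment variables or at an interactive prompt. The function name and the variable names `MYAPP_CONSUMER_KEY` and `MYAPP_CONSUMER_SECRET` are hypothetical, chosen only for illustration:

```python
import getpass
import os


def load_consumer_credentials(environ=os.environ, prompt=getpass.getpass):
    """Return (key, secret) supplied by whoever runs the app.

    Nothing is hardcoded. The environment variable names are
    hypothetical placeholders for this sketch.
    """
    key = environ.get("MYAPP_CONSUMER_KEY")
    secret = environ.get("MYAPP_CONSUMER_SECRET")
    # Fall back to a prompt that does not echo what the user types.
    if not key:
        key = prompt("Consumer key: ")
    if not secret:
        secret = prompt("Consumer secret: ")
    return key, secret
```

Passing `environ` and `prompt` as parameters is only a convenience so the function can be exercised without a real terminal.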
Related
I'm working on a personal project which makes use of Python, FastAPI and a microservices architecture.
I want to learn more about security so I'm trying to add some into this. I have read through the fastapi security intro and it mostly makes sense to me.
One thing I'm not sure about though is going about handling this cleanly in a microservices architecture.
Let's assume I have 2 services, user service and bankAccount service. The user service is supposed to handle everything with regards to a new user registering on my site, to logging them in, etc. At this point, it shouldn't be too difficult to authenticate the user as the user service can access its db.
The part where I'm not sure about the best way to go forward would be with the bankAccount service. If a user makes a request to an endpoint within that service, how should I go about authenticating/authorising them?
The two options I can think of are as follows:
Create an /authenticate endpoint which has the sole purpose of other services being able to call it. Then, create a wrapper function in the bankAccount service which wraps every endpoint and calls the /authenticate endpoint before running its function.
Create an /authenticate endpoint which has the sole purpose of other services being able to call it. Then, using something like NGINX or some sort of gateway, have this called before sending the request to the bankAccount service.
I lack experience/knowledge in this area so I'm not sure which of these would be the better option. I am leaning towards 2 so that I don't have to copy the wrapper code from the bankAccount service to any new service I create, but I don't know anything about NGINX or other gateways so any advice on how best to proceed here would be appreciated.
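For reference, option 1 could be sketched roughly as follows. This is only an illustration: `authenticate` is a hypothetical callable standing in for the HTTP call to the user service's /authenticate endpoint, and `fake_authenticate` and `get_balance` are made-up stand-ins, not part of any real service.

```python
import functools


class Unauthorized(Exception):
    """Raised when the user service rejects the token."""


def require_auth(authenticate):
    """Build a decorator that checks a token before running the handler.

    `authenticate` is a hypothetical callable standing in for an HTTP
    call to /authenticate. It takes a token and returns a user id, or
    None if the token is not valid.
    """
    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(token, *args, **kwargs):
            user_id = authenticate(token)
            if user_id is None:
                raise Unauthorized("invalid or missing token")
            # Hand the authenticated user id to the real handler.
            return handler(user_id, *args, **kwargs)
        return wrapper
    return decorator


# Stand-in for the remote call, for illustration only.
def fake_authenticate(token):
    return {"good-token": "user-42"}.get(token)


@require_auth(fake_authenticate)
def get_balance(user_id, account):
    return f"{user_id} may read {account}"
```

If the decorator lived in a small shared package, new services could import it instead of copying it, which addresses part of the duplication concern. In FastAPI, a dependency declared with `Depends` would be the more usual place for this kind of check.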
I'm not an expert in the subject, since I started recently diving into the microservices topic. So, take what I'm saying with a pinch of salt.
JWT AUTH WITH PUBLIC AND PRIVATE KEY
One thing you could do is to use JWT authentication in all of your microservices. Basically, every service is capable of verifying and reading the JWT token, handling the necessary checks, and responding accordingly.
The authentication service would be the one in charge of generating the tokens, so the idea is to use asymmetric cryptography, where one (private) key, owned by the authentication service, is used to sign the tokens, while the corresponding public key, copied to the other services, is used to verify the authenticity of the tokens provided by users. The public/private keys could also come from a certificate and its private key.
This is in no way a scalable approach, as every copy of the public key has to be updated whenever the key changes. Also, if the content of the token or the checks that are to be performed on the token change, then all the microservices have to be updated accordingly, which can be a tedious and long process.
Unfortunately, I haven't had the occasion to dive deeper, as the topic is not simple and experimenting with approaches in production isn't a good idea.
If someone more experienced than me can fill in missing details or other approaches, feel free to edit this answer or to comment below and I'll try to learn and update my answer.
(Note: Now, I know a lot of you might jump ahead and be like "Hey. Duplicate." Please read ahead!)
Background:
My goal is to make a Python app for PC that interacts with Spotify using their Python API, Spotipy. This obviously brings about the need to store the client_secret for purposes of user authentication. Based on my research, storing this as plaintext anywhere is a big no-no. The other solutions involved encrypting that data (but then, where to store that key?). The best solution is apparently to have the authentication request handled by the backend in a server (I, being a student, obviously have a million servers at my disposal ;) ...). But seriously, to be clear, I do NOT have a server to host this app on. And I do not want to spend money to buy resources from AWS or others. Also, to clarify, this is not to be a web application. It is meant to be downloadable, so that a user can install it, log in to Spotify, and voila.
Problem:
Basically, without a server, how do I store this key securely? And based on my usage, is there even a need to store the key securely?
It is meant to be downloadable, so that a user can install it, log in to Spotify, and voila. Basically, without a server, how do I store this key securely?
No secret should reside on the user side, or the user/hacker will be able to find it sooner or later. More about this here: How to store a secret API key in an application's binary?
And based on my usage, is there even a need to store the key securely?
If you work without a server, I see 2 options:
(safe but inconvenient) let the user use their own app ID / Secret,
(risky but convenient) decide to publish your app ID / Secret openly. Since everyone can create Spotify apps for free, there isn't really much that's secret about it, apart from the statistics your app will generate. At least, it shouldn't stop your app from working unless someone decided to use their own time and money to reach the rate limits of your app.
Edit: you might be interested in the Implicit Grant Flow, which works without any secret. However, it's not implemented yet.
I have a server implementing a python API. I am calling functions from a frontend that uses Angular.js. Is there any way to add an authentication key to my calls so that random people cannot see the key through the Angular exposed code? Maybe file structure? I am not really sure.
As long as you send the sensitive data outside, you are at risk. You can obfuscate your code so that novice attackers have a hard time finding the key, but breaking your security is basically just a matter of time: an attacker will have all the elements needed to analyse your protocol and the exchanged data, and to design malicious software that mimics your original client.
One possible (although not unbreakable) solution would be to authenticate the users themselves so that you keep a little control over who is accessing the data and revoke infected accounts.
The attack
One possible threat model, in the context of credential storage, is an attacker who has the ability to:
inspect any (user) process memory
read local (user) files
AFAIK, the consensus on this type of attack is that it's impossible to prevent (since the credentials must be stored in memory for the program to actually use them), but there are a couple of techniques to mitigate it:
minimize the amount of time the sensitive data is stored in memory
overwrite the memory as soon as the data is not needed anymore
mangle the data in memory, keep moving it, and other security through obscurity measures
Python in particular
The first technique is easy enough to implement, possibly through a keyring (hopefully kernel space storage)
The second one is not achievable at all without writing a C module, to the best of my knowledge (but I'd love to be proved wrong here, or to have a list of existing modules)
The third one is tricky.
In particular, Python being a language with very powerful introspection and reflection capabilities, it's difficult to prevent access to the credentials by anyone who can execute Python code in the interpreter process.
There seems to be a consensus that there's no way to enforce private attributes and that attempts at it will at best annoy other programmers who are using your code.
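On the second technique, a partial, best-effort version is possible in pure Python if the secret is only ever held in a mutable `bytearray`, which can be overwritten in place. This is a sketch under that assumption, not a guarantee: any `bytes` or `str` copy made along the way (including the literal used below for illustration) is immutable and cannot be wiped this way. The `wipe` helper is a hypothetical name.

```python
def wipe(buffer: bytearray) -> None:
    """Overwrite a mutable buffer in place with zero bytes."""
    for i in range(len(buffer)):
        buffer[i] = 0


# Illustration only: in real use the secret would be read straight into
# the bytearray (for example with a file's readinto method), because a
# bytes literal like this one stays in memory as a code constant.
secret = bytearray(b"hunter2")
# ... use the secret here, without converting it to bytes or str ...
wipe(secret)
```

After `wipe(secret)` the buffer keeps its length but contains only zero bytes. Whether other copies exist elsewhere in the process still depends on how the secret was obtained and used.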
The question
Taking all this into consideration, how does one securely store authentication credentials using Python? What are the best practices? Can something be done about the language's "everything is public" philosophy? I know "we're all consenting adults here", but should we be forced to choose between sharing our passwords with an attacker and using another language?
There are two very different reasons why you might store authentication credentials:
To authenticate your user: For example, you only allow the user access to the services after the user authenticates to your program
To authenticate the program with another program or service: For example, the user starts your program which then accesses the user's email over the Internet using IMAP.
In the first case, you should never store the password (or an encrypted version of the password). Instead, you should hash the password with a high-quality salt and ensure that the hashing algorithm you use is computationally expensive (to prevent dictionary attacks) such as PBKDF2 or bcrypt. See Salted Password Hashing - Doing it Right for many more details. If you follow this approach, even if the hacker retrieves the salted, slow-hashed token, they can't do very much with it.
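As a minimal sketch of the first case, the standard library's `hashlib.pbkdf2_hmac` can produce a salted, slow hash. The iteration count below is an assumed work factor for illustration and should be tuned to current guidance and your hardware. The function names are hypothetical.

```python
import hashlib
import hmac
import os

# Assumed work factor for this sketch; higher is slower and safer.
ITERATIONS = 200_000


def hash_password(password):
    """Return (salt, digest) to store instead of the password itself."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected):
    """Recompute the hash and compare it in constant time."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(digest, expected)
```

Only the salt and digest are stored. At login the candidate password is hashed again with the stored salt and compared.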
In the second case, there are a number of things done to make secret discovery harder (as you outline in your question), such as:
Keeping secrets encrypted until needed, decrypting on demand, then re-encrypting immediately after
Using address space randomization so each time the application runs, the keys are stored at a different address
Using the OS keystores
Using a "hard" language such as C/C++ rather than a VM-based, introspective language such as Java or Python
Such approaches are certainly better than nothing, but a skilled hacker will break it sooner or later.
Tokens
From a theoretical perspective, authentication is the act of proving that the person challenged is who they say they are. Traditionally, this is achieved with a shared secret (the password), but there are other ways to prove yourself, including:
Out-of-band authentication. For example, where I live, when I try to log into my internet bank, I receive a one-time password (OTP) as an SMS on my phone. In this method, I prove who I am by virtue of owning a specific telephone number
Security token: To log in to a service, I have to press a button on my token to get an OTP which I then use as my password.
Other devices:
SmartCard, in particular as used by the US DoD where it is called the CAC. Python has a module called pyscard to interface to this
NFC device
And a more complete list here
The commonality between all these approaches is that the end-user controls these devices and the secrets never actually leave the token/card/phone, and certainly are never stored in your program. This makes them much more secure.
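To make the OTP idea concrete, here is a sketch of one common scheme, HOTP from RFC 4226, in which the token and the service each derive the same short code from a shared secret and a counter. The source does not say which scheme the bank or token above actually uses, so this is an assumption for illustration only.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Compute an HMAC-based one-time password as described in RFC 4226."""
    # HMAC-SHA1 over the counter encoded as an 8-byte big-endian integer.
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte pick an offset.
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# First test vector from RFC 4226, Appendix D.
print(hotp(b"12345678901234567890", 0))  # prints 755224
```

The code changes with every counter value, so an intercepted OTP is of little use once it has been consumed, while the shared secret itself stays inside the token.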
Session stealing
However (there is always a however):
Let us suppose you manage to secure the login so the hacker cannot access the security tokens. Now your application is happily interacting with the secured service. Unfortunately, if the hacker can run arbitrary executables on your computer, the hacker can hijack your session for example by injecting additional commands into your valid use of the service. In other words, while you have protected the password, it's entirely irrelevant because the hacker still gains access to the 'secured' resource.
This is a very real threat, as the multiple cross-site scripting attacks have shown (one example is U.S. Bank and Bank of America Websites Vulnerable, but there are countless more).
Secure proxy
As discussed above, there is a fundamental issue in keeping the credentials of an account on a third-party service or system so that the application can log onto it, especially if the only log-on approach is a username and password.
One way to partially mitigate this is by delegating the communication with the service to a secure proxy, and developing a secure sign-on approach between the application and the proxy. In this approach:
The application uses a PKI scheme or two-factor authentication to sign onto the secure proxy
The user adds the security credentials for the third-party system to the secure proxy. The credentials are never stored in the application
Later, when the application needs to access the third-party system, it sends a request to the proxy. The proxy logs on using the security credentials and makes the request, returning results to the application.
The disadvantages to this approach are:
The user may not want to trust the secure proxy with the storage of the credentials
The user may not trust the secure proxy with the data flowing through it to the third-party application
The application owner has additional infrastructure and hosting costs for running the proxy
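The flow described in this section can be sketched in-process as follows. Everything here is hypothetical: `SecureProxy`, its method names, and the `third_party` callable are illustration only, and the opaque random token merely stands in for the PKI or two-factor sign-on between the application and the proxy.

```python
import secrets


class SecureProxy:
    """Holds third-party credentials so the application never sees them."""

    def __init__(self, third_party):
        # `third_party` is a hypothetical callable(credentials, request)
        # that logs on to the external system and returns a result.
        self._third_party = third_party
        self._credentials = {}  # opaque token -> third-party credentials

    def register(self, credentials):
        """Called by the user; returns an opaque token for the application."""
        token = secrets.token_urlsafe(32)
        self._credentials[token] = credentials
        return token

    def request(self, token, request):
        """Called by the application; the proxy logs on and relays the result."""
        credentials = self._credentials.get(token)
        if credentials is None:
            raise PermissionError("unknown or revoked token")
        return self._third_party(credentials, request)

    def revoke(self, token):
        """Cut off a compromised application without changing the password."""
        self._credentials.pop(token, None)
```

The application only ever holds the token, so a compromised application can be cut off with `revoke` while the third-party password stays unchanged. The disadvantages listed above still apply to the proxy itself.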
Some answers
So, on to specific answers:
How does one securely store authentication credentials using python?
If storing a password for the application to authenticate the user, use a PBKDF2 algorithm, such as https://www.dlitz.net/software/python-pbkdf2/
If storing a password/security token to access another service, then there is no absolutely secure way.
However, consider switching authentication strategies to, for example the smartcard, using, eg, pyscard. You can use smartcards to both authenticate a user to the application, and also securely authenticate the application to another service with X.509 certs.
Can something be done about the language's "everything is public" philosophy? I know "we're all consenting adults here", but should we be forced to choose between sharing our passwords with an attacker and using another language?
IMHO there is nothing wrong with writing a specific module in Python that does its damnedest to hide the secret information, making it a right bugger for others to reuse (annoying other programmers is its purpose). You could even code large portions in C and link to it. However, don't do this for other modules for obvious reasons.
Ultimately, though, if the hacker has control over the computer, there is no privacy on the computer at all. The theoretical worst case is that your program is running in a VM, and the hacker has complete access to all memory on the computer, including the BIOS and graphics card, and can step your application through authentication to discover its secrets.
Given no absolute privacy, the rest is just obfuscation, and the level of protection is simply how hard it is obfuscated vs. how much a skilled hacker wants the information. And we all know how that ends, even for custom hardware and billion-dollar products.
Using Python keyring
While this will quite securely manage the key with respect to other applications, all Python applications share access to the tokens. This is not in the slightest bit secure against the type of attack you are worried about.
I'm no expert in this field and am really just looking to solve the same problem that you are, but it looks like something like Hashicorp's Vault might be able to help out quite nicely.
In particular, with respect to the problem of storing credentials for 3rd-party services, e.g.:
In the modern world of API-driven everything, many systems also support programmatic creation of access credentials. Vault takes advantage of this support through a feature called dynamic secrets: secrets that are generated on-demand, and also support automatic revocation.
For Vault 0.1, Vault supports dynamically generating AWS, SQL, and Consul credentials.
More links:
Github
Vault Website
Use Cases
My app must read an SSL URL from a third party. How do I best store the third-party credentials in my own database in a way that protects them from being compromised? Consider both absolute security and practicality. One-way hashing the credentials is not useful, as I must restore the credentials to plaintext for the SSL call. I'm using Python on Google App Engine, and my app authenticates with Google credentials.
Encrypt the credentials using e.g. AES and save the encryption key somewhere else (just moves the problem), or derive it from the credentials and keep the algorithm secret (just moves the problem).
Encrypt the credentials using a synchronous stream cipher, derive the (not)entropy from the credentials and keep the algorithm secret (just moves the problem).
On a separate web app dedicated to storing third-party credentials, provide an SSL URL to receive the third-party credentials. This URL is accessed with Google credentials (same as my app) and can use AuthSub or something similar to transfer authorization to the other web app. This sounds more secure because it's harder to hack a trivially simple web app, and if my complex main app gets compromised the third-party credentials aren't exposed.
What do you think about these approaches?
How are the credentials being used? If their use is only triggered by the original owner (eg. you're storing a bank card number and they're making their 2nd purchase) then they can provide a password at that point which is used as your encryption key. You would then never need to store that key locally and the database content alone is useless to an attacker.
It's a difficult task, and no approach will save you the trouble of making sure that there is no weak link. For starters, I wouldn't know if hosting on Google is the best way to go, because you will be forfeiting control (I really don't know if App Engine is designed with the required level of security in mind; you should find that out) and probably cannot do penetration testing (which you should).
Having a separate small application is probably a good idea, but that doesn't save you from having to encrypt one way or the other the credentials themselves in this smaller app. It just buys you simplicity, which in turn makes things easier to analyze.
I personally would try to design the app so the key changes randomly after each use, having a kind of one time pad approach. You don't specify the app in enough detail to see if this is feasible.
If you need to reversibly store credentials, there simply is no solution. Use AES and keep the secret key under well-paid armed guard.
If you're using Windows, I would check out the Cred* Win32 API (advapi32.dll). It would at least allow you to punt key management to Windows syskey, where a TPM and/or boot-up passphrase can provide protection against low-level compromise (stolen disk drives).
Obviously if your application or the security context within which it runs is compromised none of the above would be of much help.
A decent book that covers this sort of situation is Cryptography In The Database.