Securely Erasing Password in Memory (Python)


How do you store a password entered by the user in memory and erase it securely after it is no longer needed?
To elaborate, currently we have the following code:
import getpass
import imaplib
username = raw_input('User name: ')
password = getpass.getpass()
mail = imaplib.IMAP4(MAIL_HOST)
mail.login(username, password)
After calling the login method, what do we need to do to fill the area of memory that contains password with garbled characters so that someone cannot recover the password by doing a core dump?
There is a similar question, however it is in Java and the solution uses character arrays:
How does one store password hashes securely in memory, when creating accounts?
Can this be done in Python?

Python doesn't have that low of a level of control over memory. Accept it, and move on. The best you can do is to del password after calling mail.login so that no references to the password string object remain. Any solution that purports to be able to do more than that is only giving you a false sense of security.
Python string objects are immutable; there's no direct way to change the contents of a string after it is created. Even if you were able to somehow overwrite the contents of the string referred to by password (which is technically possible with stupid ctypes tricks), there would still be other copies of the password that have been created in various string operations:
by the getpass module when it strips the trailing newline off of the inputted password
by the imaplib module when it quotes the password and then creates the complete IMAP command before passing it off to the socket
You would somehow have to get references to all of those strings and overwrite their memory as well.

There actually -is- a way to securely erase strings in Python; use the memset C function, as per Mark data as sensitive in python
Edited to add, long after the post was made: here's a deeper dive into string interning. There are some circumstances (primarily involving non-constant strings) where interning does not happen, making cleanup of the string value slightly more explicit, based on CPython reference counting GC. (Though still not a "scrubbing" / "sanitizing" cleanup.)

The correct solution is to use a bytearray() ... which is mutable, and you can safely clear keys and sensitive material from RAM.
However, there are some libraries, notably the python "cryptography" library that prevent "bytearray" from being used. This is problematic... to some extent these cryptographic libraries should ensure that only mutable types be used for key material.
There is SecureString which is a pip module that allows you to fully remove a key from memory...(I refactored it a bit and called it SecureBytes). I wrote some unit tests that demonstrate that the key is fully removed.
But there is a big caveat: if someone's password is "type", then the word "type" will get wiped from all of python... including in function definitions and object attributes.
In other words... mutating immutable types is a terrible idea, and unless you're extremely careful, can immediately crash any running program.
The right solution is: never use immutable types for key material, passwords, etc. Anyone building a cryptographic library or routine like "getpass" should be working with a "bytearray" instead of python strings.
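A minimal sketch of the bytearray approach, assuming the secret can be kept in a mutable buffer from the start. The helper name `wipe` is hypothetical, and it only clears the one buffer it is given; copies made elsewhere (for example inside getpass or imaplib) are not touched.

```python
def wipe(buf):
    """Overwrite a mutable byte buffer with zeros, in place."""
    for i in range(len(buf)):
        buf[i] = 0


# Stand-in secret for illustration; real code would read into the bytearray directly.
secret = bytearray(b"hunter2")
try:
    pass  # ... use `secret` here ...
finally:
    wipe(secret)  # runs even if the code above raises
```

The `try`/`finally` is there so the buffer is cleared on error paths too, not only on the happy path.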

If you don't need the mail object to persist once you are done with it, I think your best bet is to perform the mailing work in a subprocess (see the subprocess module.) That way, when the subprocess dies, so goes your password.
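A rough sketch of the subprocess idea, assuming Python 3 and that the child process can prompt for the password itself. `CHILD_SOURCE` and `run_in_child` are hypothetical names, and the mail work is left as a placeholder comment.

```python
import subprocess
import sys

# Hypothetical child program: it prompts for the password itself, so the
# secret only ever lives in the child's address space.
CHILD_SOURCE = """
import getpass
password = getpass.getpass()
# ... log in and do the mail work here ...
"""


def run_in_child(source):
    """Run Python source in a fresh interpreter and return its exit code."""
    completed = subprocess.run([sys.executable, "-c", source])
    return completed.returncode
```

Calling `run_in_child(CHILD_SOURCE)` from a terminal would prompt inside the child. Once the child exits, its memory goes back to the operating system. This does nothing against a core dump taken while the child is still running.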

This could be done using numpy chararray:
import imaplib
import numpy as np
username = raw_input('User name: ')
mail = imaplib.IMAP4(MAIL_HOST)
x = np.chararray((20,))
x[:] = list("{:<20}".format(raw_input('Password: ')))
mail.login(username, x.tobytes().strip())
x[:] = ''
You would have to determine the maximum size of password, but this should remove the data when it is overwritten.

EDIT: removed the bad advice...
You can also use arrays like the java example if you like, but just overwriting it should be enough.
http://docs.python.org/library/array.html

Store the password in a list, and if you just set the list to None, the memory of the array stored in the list is automatically freed.

Related

cryptography.fernet.InvalidToken error when decrypting existing salt

Preamble:
I did search StackOverflow and I know someone had a question like mine, but now (a couple months after first seeing it), I could not find that answer again. I'm fully aware there is likely a duplicate, but for the life of me, I cannot find it in my searches.
The problem:
I have a cryptography module which generates a hashed password used for encrypting and decrypting secrets stored in a TinyDB database. All the functionality works except for decrypting the secret. The password is verifying correctly, so I know that isn't the issue. I am almost positive my issue is in getting the salt encoded properly in the decryption function.
Encryption code:
pas = use_password(args[0])
salt = urandom(16)
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(),length=32,salt=salt,iterations=100000,)
sec = base64.b64encode(kdf.derive(bytes(pas,'utf-8')))
token = Fernet(sec).encrypt(bytes(key,'utf-8'))
salt = base64.b64encode(salt).decode('utf-8')
return token, salt
Decryption:
pas = use_password(pw)
salt = base64.b64decode(salt)  # the line I suspect
kdf = PBKDF2HMAC(algorithm=hashes.SHA256(),length=32,salt=salt,iterations=100000,)
sec = base64.b64encode(kdf.derive(bytes(pas,'utf-8')))
# The next line raises cryptography.fernet.InvalidToken
key = Fernet(sec).decrypt(bytes(token,'utf-8'))
return key # Returns a byte object which may need to be converted to a string.
What I tried:
I've already tried different variations of duplicating the salt encryption, but keep getting errors. Part of the problem is that base64.encode requires two "file objects" for input and output, and does not accept a string variable, which makes it unusable for my needs.
I could resolve this by creating a temp file, but that would be the worst solution because it:
creates a new file in the filesystem with partially decrypted information, outside of RAM and the database, weakening the security of the whole system
creates a new temp file, which is just a waste of system resources
adds a whole lot more code just to implement something that I am positive can be implemented differently with one or two words.
Despite knowing that there is another solution, I cannot figure it out and am lost as to which base64 or bytes function will accomplish the job.
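On the base64 point: `base64.encode` is the file-object variant, while `base64.b64encode` and `base64.b64decode` work on in-memory bytes, so no temp file is needed. Here is a minimal sketch of a salt round trip, assuming the salt is stored as text in the database. It does not by itself explain the InvalidToken error.

```python
import base64
import os

salt = os.urandom(16)

# bytes -> text that can be stored in a JSON-backed database
salt_text = base64.b64encode(salt).decode("ascii")

# text -> the original 16 bytes
salt_again = base64.b64decode(salt_text)
```

If `salt_again` equals `salt`, the salt handling is not the culprit and the stored token is the next thing worth checking.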

Passing a record over a socket

I have basic socket communication set up between Python and Delphi code (text only). Now I would like to send/receive a record of data on both sides. I have a "C compatible" record and would like to pass records back and forth and have them in a usable format in Python.
I use conn.send("text") in python to send the text but how do I send/receive a buffer with python and access the record items sent in python?
Record
TPacketData = record
pID : Integer;
dataType : Integer;
size : Integer;
value : Double;
end;

I don't know much about Python, but I have done a lot between Delphi, C++, C# and Java, even with COBOL. Anyway, to send a record from Delphi to C, first you need to pack the record at both ends,
in Delphi
MyRecord = packed record
in C++
#pragma pack(1)
I don't know the Python equivalent, but I guess there must be a similar one. Make sure that sizeof(MyRecord) is the same on both sides. Also, before sending the records, you should take care of byte ordering (you know, little-endian vs. big-endian): use socket.htonl() and socket.ntohl() in Python and the equivalents in Delphi, which are in the WinSock unit. Also, a "double" in Delphi might not be the same as in Python; in Delphi it is 8 bytes. Check this as well, and change it to Single (4 bytes) or Extended (10 bytes), whichever matches.
If all of that matches, then you can send/receive binary records in one shot; otherwise, I'm afraid, you have to send the individual fields one by one.

I know this answer is a bit late to the game, but may at least prove useful to other people finding this question in their search-results. Because you say the Delphi code sends and receives "C compatible data" it seems that for the sake of the answer about Python's handling it is irrelevant whether it is Delphi (or any other language) on the other end...
The python struct and socket modules have all the functionality for the basic usage you describe. To send the example record you would do something like the below. For simplicity and sanity I have presumed signed integers and doubles, and packed the data in "network order" (bigendian). This can easily be a one-liner but I have split it up for verbosity and reusability's sake:
import struct
t_packet_struc = '>iiid'
t_packet_data = struct.pack(t_packet_struc, pid, data_type, size, value)
mysocket.sendall(t_packet_data)
Of course the mentioned "presumptions" don't need to be made, given tweaks to the format string, data preparation, etc. See the struct inline help for a description of the possible format strings - which can even process things like Pascal-strings... By the way, the socket module allows packing and unpacking a couple of network-specific things which struct doesn't, like IP-address strings (to their bigendian int-blob form), and allows explicit functions for converting data bigendian-to-native and vice-versa. For completeness, here is how to unpack the data packed above, on the Python end:
t_packet_size = struct.calcsize(t_packet_struc)
t_packet_data = mysocket.recv(t_packet_size)
(pid, data_type, size, value) = struct.unpack(t_packet_struc, t_packet_data)
I know this works in Python version 2.x, and suspect it should work without changes in Python version 3.x too. Beware of one big gotcha (because it is easy to not think about, and hard to troubleshoot after the fact): Aside from different endianness, you can also distinguish between packing things using "standard size and alignment" (portably) or using "native size and alignment" (much faster) depending on how you prefix - or don't prefix - your format string. These can often yield wildly different results than you intended, without giving you a clue as to why... (there be dragons).
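One more thing to watch: `recv()` may return fewer bytes than asked for, so a fixed-size record is safer to read in a loop. Below is a sketch, assuming the Delphi side sends a packed record in its native little-endian layout. `PACKET_FORMAT`, `recv_exact` and `read_packet` are hypothetical names.

```python
import struct

# '<' = little-endian, standard sizes, no padding (assumed to match a
# Delphi "packed record" on x86); use '>' instead for network order.
PACKET_FORMAT = "<iiid"
PACKET_SIZE = struct.calcsize(PACKET_FORMAT)  # 4 + 4 + 4 + 8 = 20 bytes


def recv_exact(sock, count):
    """Read exactly `count` bytes; recv() may return fewer per call."""
    chunks = []
    remaining = count
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("socket closed before full record arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_packet(sock):
    """Receive one record and return (pid, data_type, size, value)."""
    return struct.unpack(PACKET_FORMAT, recv_exact(sock, PACKET_SIZE))
```

With an explicit `<` or `>` prefix, struct uses standard sizes and no alignment padding, which is why the size comes out as 20 rather than whatever the native layout would give.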

pefile How do I nullify the first 8 bytes of a file?

How do I nullify the first 8 bytes of a file?
this example does not work:
import pefile
pe = pefile.pe(In)
pe.set_dword_at_rva(0,0)
pe.set_dword_at_rva(0,4)
pe.write(Out)
pe.close()
How can I rename imported functions in the file?
this example does not work:
for entry in pe.DIRECTORY_ENTRY_IMPORT:
    print entry.dll
    for imp in entry.imports:
        imp.name = 'NewIMports'
pe.write(Out)
sorry for my english

I'd propose to use the standard way (i. e. not using pefile) of doing this:
# 'r+b' opens the existing file for update without truncating it
with open('filename.bla', 'r+b') as f:
    f.write(b'\0' * 8)

You've got multiple problems with your code, but I'm not sure which ones are causing whatever problems you're experiencing (since you haven't explained the problems).
First, you want to set the first 8 bytes to 0, but you're using set_dword_at_rva rather than set_dword_at_offset. An RVA ("Relative Virtual Address") is the offset in memory to the runtime address of a section (whatever that runtime address ends up being). An offset is the offset on disk from the start of the file. If that's what you want, use that. (While you're at it, the qword functions are the way to set 8 bytes at a time. Using dword instead can't cause any problems except for endianness, which can't possibly matter for just setting to 0… but still, why make it harder on yourself?)
So:
pe = pefile.PE(In)
pe.set_qword_at_offset(0, 0)
Meanwhile, if you're modifying arbitrary strings within the file, while pefile can make room for them, it does not adjust any other headers that need to be adjusted to compensate. See "Notes about the write support" in the docs for details. So, imp.name = 'NewIMports' may work, but generate a broken PE, if the old name was shorter.
On top of that, renaming all of the imports to have the same name will definitely generate a broken PE.

How to inspect mystery deserialized object in Python

I'm trying to load JSON back into an object. The "loads" method seems to work without error, but the object doesn't seem to have the properties I expect.
How can I go about examining/inspecting the object that I have (this is web-based code).
results = {"Subscriber": {"firstname": "Neal", "lastname": "Walters"}}
subscriber = json.loads(results)
for item in inspect.getmembers(subscriber):
    self.response.out.write("<BR>Item")
    for subitem in item:
        self.response.out.write("<BR> SubItem=" + subitem)
The attempt above returned this:
Item
SubItem=__class__
I don't think it matters, but for context:
The JSON is actually coming from a urlfetch in Google App Engine to
a rest web service created using this utility:
http://code.google.com/p/appengine-rest-server.
The data is being retrieved from a datastore with this definition:
class Subscriber(db.Model):
    firstname = db.StringProperty()
    lastname = db.StringProperty()
Thanks,
Neal
Update #1: Basically I'm trying to deserialize JSON back into an object.
In theory it was serialized from an object, and I want to now get it back into an object.
Maybe the better question is how to do that?
Update #2: I was trying to abstract a complex program down to a few lines of code, so I made a few mistakes in "pseudo-coding" it for purposes of posting here.
Here's a better code sample, now taken out of the website so that I can run it on a PC.
results = '{"Subscriber": {"firstname": "Neal", "lastname": "Walters"}}'
subscriber = json.loads(results)
for key, value in subscriber.items():
    print " %s: %s" %(key, value)
The above runs, but what it displays doesn't look any more structured than the JSON string itself. It displays this:
Subscriber: {u'lastname': u'Walters', u'firstname': u'Neal'}
I have more of a Microsoft background, so when I hear serialize/deserialize, I think going from an object to a string, and from a string back to an object. So if I serialize to JSON, and then deserialize, what do I get, a dictionary, a list, or an object? Actually, I'm getting the JSON from a REST webmethod, that is on my behalf serializing my object for me.
Ideally I want a subscriber object that matches my Subscriber class above, and ideally, I don't want to write one-off custom code (i.e. code that would be specific to "Subscriber"), because I would like to do the same thing with dozens of other classes. If I have to write some custom code, I will need to do it generically so it will work with any class.
Update #3: This is to explain more of why I think this is a needed tool. I'm writing a huge app, probably on Google App Engine (GAE). We are leaning toward a REST architecture for several reasons, but one is that our web GUI should access the data store via a REST web layer. (I'm a lot more used to SOAP, so switching to REST is a small challenge in itself.) So one of the classic ways of getting and updating data is through a business or data tier. By using the REST utility mentioned above, I have the choice of XML or JSON. (I'm hoping to do a small working prototype of both before we develop the huge app.) Then, suppose we have a successful app, and GAE doubles its prices. Then we can rewrite just the data tier, and take our Python/Django user tier (web code), and run it on Amazon or somewhere else.
If I'm going to do all that, why would I want everything to be dictionary objects. Wouldn't I want the power of full-blown class structure? One of the next tricks is sort of an object relational mapping (ORM) so that we don't necessarily expose our exact data tables, but more of a logical layer.
We also want to expose a RESTful API to paying users, who might be using any language. For them, they can use XML or JSON, and they wouldn't use the serialize routine discussed here.

json only encodes strings, floats, integers, javascript objects (python dicts) and lists.
You have to create a function to turn the returned dictionary into a class and then pass it to json.loads using the object_hook keyword argument along with the json string. Here's some code that fleshes it out:
import json
class Subscriber(object):
    firstname = None
    lastname = None
class Post(object):
    author = None
    title = None
def decode_from_dict(cls, vals):
    # Build an instance of cls and copy each key/value onto it as an attribute.
    obj = cls()
    for key, val in vals.items():
        setattr(obj, key, val)
    return obj
SERIALIZABLE_CLASSES = {'Subscriber': Subscriber,
                        'Post': Post}
def decode_object(d):
    # object_hook: called for every decoded JSON object, innermost first.
    for field in d:
        if field in SERIALIZABLE_CLASSES:
            cls = SERIALIZABLE_CLASSES[field]
            return decode_from_dict(cls, d[field])
    return d
results = '''[{"Subscriber": {"firstname": "Neal", "lastname": "Walters"}},
              {"Post": {"author": {"Subscriber": {"firstname": "Neal",
                                                  "lastname": "Walters"}},
                        "title": "Decoding JSON Objects"}}]'''
result = json.loads(results, object_hook=decode_object)
print(result)
print(result[1].author)
This will handle any class that can be instantiated without arguments to the constructor and for which setattr will work.
Also, this uses json. I have no experience with simplejson so YMMV but I hear that they are identical.
Note that although the values for the two subscriber objects are identical, the resulting objects are not. This could be fixed by memoizing the decode_from_dict function.

results in your snippet is a dict, not a string, so the json.loads would raise an exception. If that is fixed, each subitem in the inner loop is then a tuple, so trying to add it to a string as you are doing would raise another exception. I guess you've simplified your code, but the two type errors should already show that you simplified it too much (and incorrectly). Why not use an (equally simplified) working snippet, and the actual string you want to json.loads instead of one that can't possibly reproduce your problem? That course of action would make it much easier to help you.
Beyond peering at the actual string, and showing some obvious information such as type(subscriber), it's hard to offer much more help based on that clearly-broken code and such insufficient information:-(.
Edit: in "update2", the OP says
It displays this: Subscriber: {u'lastname': u'Walters', u'firstname': u'Neal'}
...and what else could it possibly display, pray?! You're printing the key as string, then the value as string -- the key is a string, and the value is another dict, so of course it's "stringified" (and all strings in JSON are Unicode -- just like in C# or Java, and you say you come from a MSFT background, so why does this surprise you at all?!). str(somedict), identically to repr(somedict), shows the repr of keys and values (with braces around it all and colons and commas as appropriate separators).
JSON, a completely language-independent serialization format though originally centered on Javascript, has absolutely no idea of what classes (if any) you expect to see instances of (of course it doesn't, and it's just absurd to think it possibly could: how could it possibly be language-independent if it hard-coded the very concept of "class", a concept which so many languages, including Javascript, don't even have?!) -- so it uses (in Python terms) strings, numbers, lists, and dicts (four very basic data types that any semi-decent modern language can be expected to have, at least in some library if not embedded in the language proper!). When you json.loads a string, you'll always get some nested combination of the four datatypes above, plus True, False and None for JSON's true, false and null (all strings will be unicode, and numbers come back as ints or floats, BTW;-).
If you have no idea (and don't want to encode by some arbitrary convention or other) what class's instances are being serialized, but absolutely must have class instances back (not just dicts etc) when you deserialize, JSON per se can't help you -- that metainformation cannot possibly be present in the JSON-serialized string itself.
If you're OK with the four fundamental types, and just want to see some printed results that you consider "prettier" than the default Python string printing of the fundamental types in question, you'll have to code your own recursive pretty-printing function depending on your subjective definition of "pretty" (I doubt you'd like Python's own pprint standard library module any more than you like your current results;-).
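For simply inspecting what json.loads returned, one low-effort option is to print the type and re-serialize with indentation. Here is a small sketch using the sample string from the question. Whether the result counts as "pretty" is, of course, subjective.

```python
import json

results = '{"Subscriber": {"firstname": "Neal", "lastname": "Walters"}}'
subscriber = json.loads(results)

print(type(subscriber).__name__)  # the top-level container type
print(json.dumps(subscriber, indent=2, sort_keys=True))
```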

My guess is that loads is returning a dictionary. To iterate over its content, use something like:
for key, value in subscriber.items():
    self.response.out.write("%s: %s" %(key, value))

Best practice for two way hashing in python?

I want to allow users to validate their email address by clicking on a link. The link would look something like
http://www.example.com/verifyemail?id=some-random-string
When I am sending this email, I want to be able to easily generate this 'some-random-string' from the user's row id, an integer, and when the user clicks on this link, generate that integer back.
Only requirement is this 'some-random-string' should be as opaque and non-guessable to the user as possible.
Finally, this is what I settled on
def p3_encrypt_safe(plain, key):
    return base64.urlsafe_b64encode(p3_encrypt(plain, key))
used the nice crypto library from http://www.nightsong.com/phr/crypto/p3.py
addition of base64 safe encoding is mine.

Use encryption, that's exactly what it's designed for. Blowfish, AES, even DES3 if you don't need particularly high security.
Alternatively, you could compute an SHA-256 or SHA-512 (or whatever) hash of the email address and store it in a database along with the email address itself. That way you can just look up the email address using the hash as a key.

Your best choice is to generate a hash (one-way function) of some of the user's data. For example, to generate a hash of user's row id, you could use something like:
>>> import hashlib
>>> hashlib.sha1('3').hexdigest()
'77de68daecd823babbb58edb1c8e14d7106e83bb'
However, basing your pseudorandom string only on a row id is not very secure, as the user could easily reverse the hash (try googling 77de68daecd823babbb58edb1c8e14d7106e83bb) of such a short string.
A simple solution here is to "salt" the hashed string, i.e. add the same secret string to every value that is hashed. For example:
>>> hashlib.sha1('3' + 'email#of.user' + 'somestringconstant').hexdigest()
'b3ca694a9987f39783a324f00cfe8279601decd3'
If you google b3ca694a9987f39783a324f00cfe8279601decd3, probably the only result will be a link to this answer :-), which is not a proof, but a good hint that this hash is quite unique.
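A variation on the salted-hash idea is a keyed hash (HMAC) from the standard library, with the row id carried in the link next to its signature so it can be read back. This is a sketch under assumptions: `SECRET_KEY`, `make_token` and `read_token` are hypothetical names. The id stays visible in the link but cannot be forged without the key. If the id itself must be hidden, encryption is still needed.

```python
import hashlib
import hmac

SECRET_KEY = b"change-me"  # hypothetical; load from configuration in practice


def _sign(user_id):
    """Hex HMAC-SHA256 over the canonical decimal form of the id."""
    msg = str(user_id).encode("ascii")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def make_token(user_id):
    """Return 'id-signature' for a non-negative row id."""
    return "%d-%s" % (user_id, _sign(user_id))


def read_token(token):
    """Return the row id if the signature checks out, else None."""
    id_part, _, sig = token.partition("-")
    try:
        user_id = int(id_part)
        sig_bytes = sig.encode("ascii")
    except ValueError:  # also covers UnicodeEncodeError
        return None
    expected = _sign(user_id).encode("ascii")
    # Constant-time comparison avoids leaking how many characters matched.
    if hmac.compare_digest(sig_bytes, expected):
        return user_id
    return None
```

The link would then carry `make_token(row_id)` as its `id` parameter, and the handler would call `read_token` on whatever comes back.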
