Link encryption with django and python - python

I'm having a download application and I want to encrypt the links for file downloads, so that the user doesn't know the id of the file. Furthermore I'd like to include date/time in the link, and check when serving the file if the link is still valid.
There's a similar question asked here, but I'm running into problems with the character encodings, since I'd like to have urls like /file/encrypted_string/ pointing to the views for downloading, so best would be if the encrypted result only contains letters and numbers. I prefer not using a hash, because I do not want to store a mapping hash <> file somewhere. I do not know if there's a good encryption out there that fulfills my needs...

Sounds like it would be easy, especially if you don't mind using the same encryption key forever. Just delimit a string (/ or : works as well as anything) for the file name, the date/time, and anything else you want to include, then encrypt and b64 it! Remember to use urlsafe_b64encode, not the regular b64encode, which will produce broken urls. It'll be a long string, but so what?
I've done this a few times, using a slight variation: Add a few random characters as the last piece of the key and include that at the beginning or end of the string - more secure than always reusing the same key, without the headaches of a database mapping. As long as your key is complex enough the exposed bits won't be enough to let crackers generate requests at will.
Of course, if the file doesn't exist, don't let them see the decoded result...

By far the easiest way to handle this is to generate a random string for each file, and store a mapping between the key strings and the actual file name or file id. No complex encryption required.
Edit:
You will need to store the date anyway to implement expiring the links. So, you can store the expiration date, a long with the key, and periodically cull out expired links from the table.

If your problem is just one of encryption and decryption of short strings, Python's Crypto module makes it a breeze.

You can encode any character into the url, with django, you may use it's urlencode filter.
However, generating a random string and saving the mapping is more secure.

Related

How to restrict access of a file from python

So I made a python phonebook program which allows the user to add contacts, change contact info, delete contacts, etc. and write this data to a text file which I can read from every time the program is opened again and get existing contact data. However, in my program, I write to the text file in a very specific manner so I know what the format is and I can set it up to be read very easily. Since it is all formatted in a very specific manner, I want to prevent the user from opening the file and accidentally messing the data up with even just a simple space. How can I do this?
I want to prevent the user from opening the file and accidentally messing the data up...
I will advise you not to prevent users from accessing their own files. Messing with file permissions might result in some rogue files that the user won't be able to get rid of. Trust your user. If they delete or edit a sensitive file, it is their fault. Think of it this way - you have plenty of software installed on your own computer, but how often do you open them in an editor and make some damaging changes? Even if you do edit these files, does the application developer prevent you from doing so?
If you do intent to allow users to change/modify that file give them a good documentation on how to do it. This is the most apt thing to do. Also, make a backup file during run-time (see tempfile below) as an added layer of safety. Backups are almost always a good idea.
However, you can take some precautions to hide that data, so that users can't accidentally open them in an editor by double-clicking on it. There are plenty of options to do this including
Creating a binary file in a custom format
Zipping the text file using zipfile module.
Using tempfile module to create a temporary file, and zipping it using the previous option. (easy to discard if no changes needs to be saved)
Encryption
Everything from here on is not about preventing access, but about hiding the contents of your file
Note that all the above options doesn't have to be mutually exclusive. The advantages of using a zip file is that it will save some space, and it is not easy to read and edit in a text editor (binary data). It can be easily manipulated in your Python Script:
with ZipFile('spam.zip') as myzip:
with myzip.open('eggs.txt') as myfile:
print(myfile.read())
It is as simple as that! A temp file on the other hand, is a volatile (delete=True/False) file and can be discarded once you are done with it. You can easily copy its contents to another file or zip it before you close it as mentioned above.
with open tempfile.NamedTemporaryFile() as temp:
temp.write(b"Binary Data")
Again, another easy process. However, you must zip or encrypt it to achieve the final result. Now, moving on to encryption. The easiest way is an XOR cipher. Since we are simply trying to prevent 'readability' and not concerned about security, you can do the following:
recommended solution (XOR cipher):
from itertools import cycle
def xorcize(data, key):
"""Return a string of xor mutated data."""
return "".join(chr(ord(a)^ord(b)) for a, b in zip(data, cycle(key)))
data = "Something came in the mail today"
key = "Deez Nuts"
encdata = xorcize(data, key)
decdata = xorcize(encdata, key)
print(data, encdata, decdata, sep="\n")
Notice how small that function is? It is quite convenient to include it in any of your scripts. All your data can be encrypted before writing them to a file, and save it using a file extension such as ".dat" or ".contacts" or any custom name you choose. Make sure it is not opened in an editor by default (such as ".txt", ".nfo").
It is difficult to prevent user access to your data storage completely. However, you can either make it more difficult for the user to access your data or actually make it easier not to break it. In the second case, your intention would be to make it clear to the user what the rules are hope that not destroying the data is in the user's own best interest. Some examples:
Using a well established, human-readable serialization format, e.g. JSON. This is often the best solution as it actually allows an experienced user to easily inspect the data, or even modify it. Inexperienced users are unlikely to mess with the data anyways, and an experienced user knowing the format will follow the rules. At the same time, your parser will detect inconsistencies in the file structure.
Using a non-human readable, binary format, such as Pickle. Those files are likely to be left alone by the user as it is pretty clear that they are not meant to be modified outside the program.
Using a database, such as MySQL. Databases provide special protocols for data access which can be used to ensure data consistency and also make it easier to prevent unwanted access.
Assuming that you file format has a comment character, or can be modified to have one, add these lines to the top of your text file:
# Do not edit this file. This file was automatically generated.
# Any change, no matter how slight, may corrupt this file beyond repair.
The contact file belongs to your user, not to you. The best you can do is to inform the user. The best you can hope for is that the user will make intelligent use of your product.
I think the best thing to do in your case is just to choose a new file extension for your format.
It obviously doesn't prevent editing, but it clearly states for user that it has some specific format and probably shouldn't be edited manually. And GUI won't open it by default probably (it will ask what to edit it with).
And that would be enough for any case I can imagine if what you're worrying about is user messing up their own data. I don't think you can win with user who actively tries to mess up their data. Also I doubt any program does anything more. The usual "contract" is that user's data is, well, user's so it can be destroyed by the user.
If you actually won't to prevent editing you could change permissions to forbid editing with os.chmod for example. User would still be able to lift them manually and there will be some time window when you are actually writing, so it will be neither clean nor significantly more effective. And I would expect more trouble than benefit from such a solution.
If you want to actually make it impossible for a user to read/edit a file you can run your process from a different user (or use some heavier like SELinux or other MAC mechanism) and so you could make it really impossible to damage the data (with user's permissions). But it is not worth the effort if it is only about protecting the user from the not-so-catastophic effects of being careless.

Sanitizing arbitrary user input

I'm working on a web service that accepts STL files, does some simple processing on them (count facets, calculate total volume, etc) and returns some stats to users. There's no database or persistence planned (although that might be added at some point in the future.) Users can either upload files or point to a URL.
What should I be thinking about in order to sanitize use input and secure the Tornado server?
I'm using the templating system which auto-escapes html.
I can also impliment logic that checks that input "looks like" valid STL format as I parse it: binary STL is just floats; I also know what the format for ascii STL looks like.
I've done a bit of initial research including:
When is it Best to Sanitize User Input?
What is the best way to sanitize user inputs?
Many others.
Am I missing anything obvious?
Without you giving the code, all we can do is to speculate:
Think about the boundary conditions while converting between representations (binary -> float, ascii -> binary)
Think about the consequent stats calculated which will be rendered in a web browser? Is it possible to print some UTF-8 code, or is it possible to insert javascript?

How to reliable tell the uploaded file type (text or binary)?

I have an application where users should be able to upload a wide variety of files, but I need to know for each file, if I can safely display its textual representation as plain text.
Using python-magic like
m = Magic(mime=True).from_buffer(cgi.FieldStorage.file.read())
gives me the correct MIME type.
But sometimes, the MIME type for scripts is application/*, so simply looking for m.startswith('text/') is not enough.
Another site suggested using
m = Magic().from_buffer(cgi.FieldStorage.file.read())
and checking for 'text' in m.
Would the second approach be reliable enough for a collection of arbitrary file uploads or could someone give me another idea?
Thanks a lot.
What is your goal? Do you want the real mime type? Is that important for security reasons? Or is it "nice to have"?
The problem is that the same file can have different mime types. When a script file has a proper #! header, python-magic can determine the script type and tell you. If the header is missing, text/plain might be the best you can get.
This means there is no general "will always work" magic solution (despite the name of the module). You will have to sit down and think what information you can get, what it means and how you want to treat it.
The secure solution would be to create a list of mime types that you accept and check them with:
allowed_mime_types = [ ... ]
if m in allowed_mime_types:
That means only perfect matches are accepted. It also means that your server will reject valid files which don't have the correct mime type for some reason (missing header, magic failed to recognize the file, you forgot to mention the mime type in your list).
Or to put it another way: Why do you check the mime type of the file if you don't really care?
[EDIT] When you say
I need to know for each file, if I can safely display its textual representation as plain text.
then this isn't as easy as it sounds. First of all, "text" files have no encoding stored in them, so you will need to know the encoding that the user used when they created the file. This isn't a trivial task. There are heuristics to do so but things get hairy when encodings like ISO 8859-1 and 8859-15 are used (the latter has the Euro symbol).
To fix this, you will need to force your users to either save the text files in a specific encoding (UTF-8 is currently the best choice) or you need to supply a form into which users will have to paste the text.
When using a form, the user can see whether the text is encoded correctly (they see it on the screen), they can fix any problems and you can make sure that the browser sends you the text encoded with UTF-8.
If you can't do that, your only choice is to check for any bytes below 0x20 in the input with the exception of \r, \n and \t. That is a pretty good check for "is this a text document".
But when users use umlauts (like when you write an application that is being used world wide), this approach will eventually fail unless you can enforce a specific encoding on the user's side (which you probably can't since you don't trust the user).
[EDIT2] Since you need this to check actual source code: If you want to make sure the source code is "safe", then parse it. Most languages allow to parse the code without actually executing it. That would give you some real information (because the parsers know what to look for) and you wouldn't need to make wild guesses :-)
After playing around a bit, I discovered that I can propably use the Magic(mime_encoding=True) results!
I ran a simple script on my Dropbox folder and grouped the results both by encoding and by extension to check for irregularities.
But it does seem pretty usable by looking for 'binary' in encoding.
I think I will hang on to that, but thank you all.

xml file encryption with Python

i am making a mail client and i have made an option in which user can save his/her profile
and i saving all details in an xml file using SXML lib in python . now i want that file to be encrypted otherwise any one can see the details...How do i Do dat?
I have been using a Recipe from Active state for some time, you can find stronger algorithms but if you just need to keep away the curious it will be ok :)
If you really need a higher degree of confidence you can try pyDES and use a TripleDES for the encryptation.
TripleDES
An easy way:
Accept the password from the user and then store it use base64.
>>> import base64
>>> print base64.b64encode("password")
cGFzc3dvcmQ=
>>> print base64.b64decode("cGFzc3dvcmQ=")
password
So encode the password and save it in the XML file and then when you want to read from it, decode it.
DOCS
PS: I am not saying this is highly secure, but still this will suffice for a casual glance at the file. Again if you need it to be really secure (is that even possible?), then you should find something else. This solution is more about being obscure.

Caching system for dynamically created files?

I have a web server that is dynamically creating various reports in several formats (pdf and doc files). The files require a fair amount of CPU to generate, and it is fairly common to have situations where two people are creating the same report with the same input.
Inputs:
raw data input as a string (equations, numbers, and
lists of words), arbitrary length, almost 99% will be less than about 200 words
the version of the report creation tool
When a user attempts to generate a report, I would like to check to see if a file already exists with the given input, and if so return a link to the file. If the file doesn't already exist, then I would like to generate it as needed.
What solutions are already out there? I've cached simple http requests before, but the keys were extremely simple (usually database id's)
If I have to do this myself, what is the best way. The input can be several hundred words, and I was wondering how I should go about transforming the strings into keys sent to the cache.
//entire input, uses too much memory, one to one mapping
cache['one two three four five six seven eight nine ten eleven...']
//short keys
cache['one two'] => 5 results, then I must narrow these down even more
Is this something that should be done in a database, or is it better done within the web app code (python in my case)
Thanks you everyone.
This is what Apache is for.
Create a directory that will have the reports.
Configure Apache to serve files from that directory.
If the report exists, redirect to a URL that Apache will serve.
Otherwise, the report doesn't exist, so create it. Then redirect to a URL that Apache will serve.
There's no "hashing". You have a key ("a string (equations, numbers, and lists of words), arbitrary length, almost 99% will be less than about 200 words") and a value, which is a file. Don't waste time on a hash. You just have a long key.
You can compress this key somewhat by making a "slug" out of it: remove punctuation, replace spaces with _, that kind of thing.
You should create an internal surrogate key which is a simple integer.
You're simply translating a long key to a "report" which either exists as a file or will be created as a file.
The usual thing is to use a reverse proxy like Squid or Varnish

Categories