I have a strange situation. According to my OS, the Django TemporaryUploadedFile that I received via a POST request no longer exists, yet the uploaded file can still be read.
Here is the code:
text_file = request.FILES['text_file']
print(text_file.temporary_file_path())
os.system('ls -l ' + text_file.temporary_file_path())
fs = FileSystemStorage()
file_new = fs.save(text_file.name, text_file)
print(text_file.temporary_file_path())
os.system('ls -l ' + text_file.temporary_file_path())
fs.delete(file_new)
for chunk in text_file.chunks():
    text += chunk.decode(encoding)
print('Got text OK.')
This gives the following output:
/tmp/tmp0tngal9t.upload foo.txt
-rw------- 1 mine machine 3072889 oct 18 19:29 /tmp/tmp0tngal9t.upload
/tmp/tmp0tngal9t.upload foo.txt
ls: cannot access '/tmp/tmp0tngal9t.upload': No such file or directory
Got text OK.
So the TemporaryUploadedFile disappears from disk once it has been saved as file_new, which is itself deleted afterwards. Nevertheless, text_file is read successfully chunk by chunk and I get all the text from the uploaded foo.txt. How is this possible? Where does text_file.chunks() get its data from if the file no longer exists?
I use:
python 3.5.2
django 1.10.2
ubuntu 16.04.1
I found that the same behaviour occurs in plain Python, so it is not specific to Django: in this example I simply read text_file, which was already open when I got it from request.FILES['text_file'].
I asked a similar question here, focusing on Python only. It turned out that the behaviour is not really specific to Python either, but comes from how Linux/Unix systems manage files. Here is the answer from Jean-François Fabre:
Nothing to do with Python. In C, Fortran, or Visual Cobol you'd have
the same behaviour as long as the code gets its handle from open
system call.
On Linux/Unix systems, once a process has a handle on a file, it can
read it, even if the file is deleted. For more details check that
question (I wasn't sure if it was OK to do that, it seems to be)
On Windows you just wouldn't be able to delete the file as long as
it's locked by a process.
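To make the quoted explanation concrete, here is a minimal sketch in Python 3, assuming a POSIX system (Linux, macOS). It has nothing to do with Django; the temporary file and the variable names are just for illustration. The file name is removed with os.unlink while a handle is still open, and the data can still be read through that handle.

```python
import os
import tempfile

# Create a real file on disk and keep a handle open on it.
fd, path = tempfile.mkstemp(suffix=".upload")
with os.fdopen(fd, "w+b") as handle:
    handle.write(b"uploaded bytes")
    handle.flush()

    os.unlink(path)                           # remove the directory entry
    name_still_exists = os.path.exists(path)  # the name is gone from the file system

    handle.seek(0)
    data_after_unlink = handle.read()         # the open handle still reaches the data

print(name_still_exists, data_after_unlink)
```

The disk space is only released once the last open handle on the file is closed. On Windows the os.unlink call would be expected to fail instead, as the quoted answer says.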
Goal
I am trying to create and edit a temporary file in vim (exactly the same behavior as a commit script in git/hg/svn).
Current code
I found a method to do so in this answer:
call up an EDITOR (vim) from a python script
import sys, tempfile, os
from subprocess import call
EDITOR = os.environ.get('EDITOR','vim')
initial_message = "write message here:"
with tempfile.NamedTemporaryFile(suffix=".tmp") as tmp:
    tmp.write(initial_message)
    tmp.flush()
    call([EDITOR, tmp.name])
    tmp.seek(0)
    print tmp.read()
The Issue
When I run the above code, the tempfile does not read the changes made in vim. Here is the output after I have added several other lines in vim:
fgimenez#dn0a22805f> ./note.py
Please edit the file:
fgimenez#dn0a22805f>
Now for the interesting (weird) part. If I change my editor to nano or emacs, the script works just fine! So far, this only seems to break when I use vim or textedit.
As another experiment, I tried calling a couple editors in a row to see what happens. The modified code is:
with tempfile.NamedTemporaryFile(suffix=".tmp") as tmp:
    tmp.write(initial_message)
    tmp.flush()
    # CALLING TWO EDITORS HERE, VIM THEN NANO
    call(['vim', tmp.name])
    raw_input("pausing between editors, just press enter")
    call(['nano', tmp.name])
    tmp.seek(0)
    print tmp.read()
I.e. I edit with vim then nano. What happens is that nano DOES register the changes made by vim, but python doesn't register anything (same result as before):
fgimenez#dn0a22805f> ./note.py
Please edit the file:
fgimenez#dn0a22805f>
BUT, if I edit with nano first, then vim, python still registers the nano edits but not the vim ones!
with tempfile.NamedTemporaryFile(suffix=".tmp") as tmp:
    tmp.write(initial_message)
    tmp.flush()
    # CALLING TWO EDITORS HERE, NANO THEN VIM
    call(['nano', tmp.name])
    raw_input("pausing between editors, just press enter")
    call(['vim', tmp.name])
    tmp.seek(0)
    print tmp.read()
Output from running the program and adding a\nb\nc in nano and d\ne\nf in vim:
fgimenez#dn0a22805f> ./note.py
Please edit the file:
a
b
c
fgimenez#dn0a22805f>
It seems as if using vim or textedit eliminates the ability to append to the file. I'm completely confused here, and I just want to edit my notes in vim...
Edit 1: Clarifications
I am on OS X Mavericks
I call vim from the shell (not MacVim) and end the session with ZZ (also tried :w :q)
I'm no Python expert, but it looks like you're keeping the handle to the temp file open while Vim is editing the file, and then attempting to read the edited contents from that handle. By default, Vim creates a copy of the original file, writes the new contents to another file, and then renames it to the original (see :help 'backupcopy' for the details; other editors like nano apparently don't do it this way). This means that the Python handle still points to the original file (even though it may have already been deleted from the file system, depending on the Vim settings), and you get the original content.
You either need to reconfigure Vim (see :help 'writebackup'), or (better) change the Python implementation to re-open the same temp file name after Vim has exited, in order to get a handle to the new written file contents.
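The effect described above can be reproduced without any editor. The following is only a sketch in Python 3, assuming a POSIX system; the file names are made up, and the write-then-rename step merely imitates the strategy attributed to Vim, it is not Vim's actual code.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "note.tmp")

with open(path, "w") as original:
    original.write("write message here:")

handle = open(path)                  # handle taken before the "edit"

# Imitate an editor that writes a new file and renames it over the old one.
replacement = path + ".new"
with open(replacement, "w") as f:
    f.write("text typed in the editor")
os.replace(replacement, path)

handle.seek(0)
via_old_handle = handle.read()       # still the original contents
handle.close()

with open(path) as reopened:
    via_reopen = reopened.read()     # contents written by the "editor"

print(repr(via_old_handle), repr(via_reopen))
```

The old handle keeps following the file it was opened on, while the name now refers to a different file. This is why re-opening by name after the editor exits picks up the changes.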
I had the same problem on OS X after my code worked fine on Linux. As Ingo suggests, you can get the latest contents by re-opening the file. To do this, you probably want to create a temporary file with delete=False and then explicitly delete the file when you're done:
import sys, tempfile, os
from subprocess import call
EDITOR = os.environ.get('EDITOR','vim')
initial_message = "write message here:"
with tempfile.NamedTemporaryFile(suffix=".tmp", delete=False) as tmp:
    tmp.write(initial_message)
    tmp.flush()
    call([EDITOR, tmp.name])
    tmp.close()
    # re-open by name to pick up whatever the editor wrote
    with open(tmp.name) as f:
        print f.read()
    os.unlink(tmp.name)
I have a server/client socket pair in Python. The server receives specific commands, then prepares the response and sends it to the client.
In this question, my only concern is possible injection: could a client make the server do something unintended through the 2nd parameter, i.e. are the checks on the command contents sufficient to avoid undesired behaviour?
EDIT:
Following the advice I received, I added the parameter shell=True when calling check_output on Windows. This should not be dangerous since the command is a plain 'dir'.
self.client, address = self.sock.accept()
...
cmd = bytes.decode(self.client.recv(4096))
ls: executes a system command but only reads the content of a directory.
if cmd == 'ls':
    if self.linux:
        output = subprocess.check_output(['ls', '-l'])
    else:
        output = subprocess.check_output('dir', shell=True)
    self.client.send(output)
cd: just calls os.chdir.
elif cmd.startswith('cd '):
    path = cmd.split(' ')[1].strip()
    if not os.path.isdir(path):
        self.client.send(b'is not path')
    else:
        os.chdir(path)
        self.client.send( os.getcwd().encode() )
get: send the content of a file to the client.
elif cmd.startswith('get '):
    file = cmd.split(' ')[1].strip()
    if not os.path.isfile(file):
        self.client.send(b'ERR: is not a file')
    else:
        try:
            with open(file) as f: contents = f.read()
        except IOError as er:
            res = "ERR: " + er.strerror
            self.client.send(res.encode())
            continue
        ... (send the file contents)
Implementation details aside, I cannot see any possibility of direct injection of arbitrary code, because the received parameters are never passed to the only external commands you run (ls -l and dir).
But you may still have some security problems:
You locate commands through the PATH instead of using absolute locations. What could happen if somebody managed to change the PATH environment variable? I advise you to use os.listdir('.') directly, which is portable and carries less risk.
You seem to have no control over which files are allowed. If I remember correctly, reading CON: or other special files on older Windows versions gave weird results. And you should never give access to sensitive files, configuration, ...
You could limit the length of requested file names, to keep users from trying to break the server with abnormally long ones.
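As a rough sketch of the first and last suggestions (Python 3): the listing is built with os.listdir instead of spawning ls or dir, and a length check is applied to names coming from the client. The helper names, the output layout and the 255-character limit are all assumptions made for this example, not part of the original server.

```python
import os
import tempfile

MAX_NAME_LENGTH = 255  # hypothetical limit, pick what suits the server


def list_directory(path="."):
    """Build a simple listing without running any external command."""
    lines = []
    for name in sorted(os.listdir(path)):
        full = os.path.join(path, name)
        kind = "d" if os.path.isdir(full) else "-"
        size = os.lstat(full).st_size  # lstat also works for broken symlinks
        lines.append("%s %10d %s" % (kind, size, name))
    return "\n".join(lines).encode()


def name_is_acceptable(name):
    """Reject empty or abnormally long names sent by the client."""
    return 0 < len(name) <= MAX_NAME_LENGTH


# Small demonstration on a throwaway directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "a.txt"), "w") as f:
    f.write("abc")
listing = list_directory(demo_dir)
print(listing.decode())
```

In the server, the 'ls' branch could then send list_directory() to the client on both Linux and Windows, which also removes the need for shell=True.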
Typical issues in a client-server scenario are:
Tricking the server into running a command that is determined by the client. In the most obvious form this happens if the server allows the client to run commands (yes, stupid). However, this can also happen if the client can supply only command parameters but shell=True is used. E.g. using subprocess.check_output('dir %s' % dir, shell=True) with a client-supplied dir variable would be a security issue, dir could have a value like c:\ && deltree c:\windows (a second command has been added thanks to the flexibility of the shell's command line interpreter). A relatively rare variation of this attack is the client being able to influence environment variables like PATH to trick the server into running a different command than intended.
Using unexpected functionality of built-in programming language functions. For example, fopen() in PHP won't just open files but fetch URLs as well. This allows passing URLs to functionality expecting file names and playing all kinds of tricks with the server software. Fortunately, Python is a sane language - open() works on files and nothing else. Still, database commands for example can be problematic if the SQL query is generated dynamically using client-supplied information (SQL Injection).
Reading data outside the allowed area. Typical scenario is a server that is supposed to allow only reading files from a particular directory, yet by passing in ../../../etc/passwd as parameter you can read any file. Another typical scenario is a server that allows reading only files with a particular file extension (e.g. .png) but passing in something like passwords.txt\0harmless.png still allows reading files of other types.
Out of these issues only the last one seems present in your code. In fact, your server doesn't check at all which directories and files the client should be allowed to read - this is a potential issue, a client might be able to read confidential files.
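One common way to address the last point is to resolve the requested name against a fixed base directory and refuse anything that ends up outside it. The sketch below (Python 3.5+) assumes the server has such a base directory; resolve_inside is a hypothetical helper, not something from the original code.

```python
import os
import tempfile


def resolve_inside(base_dir, requested):
    """Return the real path of `requested` if it lies inside `base_dir`, else None."""
    if "\0" in requested:                      # refuse embedded NUL bytes outright
        return None
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    try:
        inside = os.path.commonpath([base, target]) == base
    except ValueError:                         # e.g. different drives on Windows
        inside = False
    return target if inside else None


base = tempfile.mkdtemp()                      # stand-in for the served directory
allowed = resolve_inside(base, "notes.txt")
escaped = resolve_inside(base, "../../../etc/passwd")
with_nul = resolve_inside(base, "passwords.txt\0harmless.png")
print(allowed, escaped, with_nul)
```

The 'get' and 'cd' branches could call such a helper first and send an error when it returns None. Because realpath resolves symbolic links, a link inside the base directory that points elsewhere is refused as well.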
I feel like I am taking crazy pills. For security on an API I am using at work, I have to read two values from the registry, which I then pass to suds. The problem is reading the registry values. No matter what I do, I get "Error2 the system cannot find the file specified". I know that the registry key is there, yet I cannot read it. I have tried the code below on two different Windows Server 2008 R2 machines. On one Windows 7 box I am able to read the values, but only on that one machine. Below is the code, with the actual key I need changed (to protect anonymity):
import socket
from _winreg import *
key = OpenKey(HKEY_LOCAL_MACHINE, r"Software\a\b", 0, KEY_ALL_ACCESS)
devguid = QueryValueEx(key, "DeviceID")
devid = QueryValueEx(key, "DeviceGUID")
devnm = socket.gethostname()
If I change the key to something other than \a\b, it works fine. I have verified that the permissions on these keys are exactly the same as on keys I can read from.
Also, I can run the following command from cmd and get the output I need:
reg query HKLM\software\a\b /v DeviceGUID
But when I run it from a python script, it says cannot find file specified.
import os
cmd = "reg query HKEY_LOCAL_MACHINE\software\a\b /v DeviceGUID"
a = os.system(cmd)
print a
Running my script as admin doesn't help either. For some reason, Python is unable to read the registry...
First of all, you need to make sure that your backslashes are suitably escaped, or use raw strings as in the first code sample. I'm going to assume that you've done that.
The most likely explanation is that you are using 32-bit Python on a 64-bit system, and so are subject to the registry redirector serving up the 32-bit view of the registry.
Either use 64-bit Python, or explicitly open the key with a 64-bit view. Do the latter by specifying the KEY_WOW64_64KEY flag.
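A minimal sketch of the second option, written for Python 3 where the module is called winreg (on Python 2 it is _winreg, as in the question). The key path and value name are the placeholders from the question, and read_hklm_value_64 is just a name chosen for this example. KEY_READ is requested rather than KEY_ALL_ACCESS since the values are only read.

```python
import sys


def read_hklm_value_64(subkey, value_name):
    """Read a value from the 64-bit view of HKEY_LOCAL_MACHINE (Windows only)."""
    import winreg  # imported here so the module can still be loaded elsewhere

    access = winreg.KEY_READ | winreg.KEY_WOW64_64KEY
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, subkey, 0, access) as key:
        value, _value_type = winreg.QueryValueEx(key, value_name)
    return value


if sys.platform == "win32":
    try:
        print(read_hklm_value_64(r"Software\a\b", "DeviceGUID"))
    except OSError as err:
        print("could not read the value:", err)
```

If the value can be read this way but not with the default flags, that would point to the registry redirector as the cause.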
I am trying to return a zip file in a Django HTTP response. The code goes something like this:
archive = shutil.make_archive('testfolder', 'zip', MEDIA_ROOT, 'testfolder')
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
response['Content-Length'] = getsize(archive)
response['Content-Disposition'] = "attachment; filename=test %s.zip" % datetime.now()
return response
Now when this code is executed on Ubuntu, the downloaded file opens without any issue, but when it's executed on Windows, the resulting file does not open in WinZip (it gives the error 'Unsupported Zip Format').
Is there something very obvious I am missing here? Isn't python code supposed to be portable?
EDIT:
Thanks to J.F. Sebastian for his comment...
There was no problem in creating the archive; the problem was in reading it back into the response. So the solution is to change the second line of my code from
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
to,
response = HttpResponse(FileWrapper(open(archive, 'rb')), # notice extra 'rb'
content_type=mimetypes.guess_type(archive)[0])
Check out my answer to this question for more details...
The code you have written should work correctly. I've just run the following line from your snippet to generate a zip file and was able to extract on Linux and Windows.
archive = shutil.make_archive('testfolder', 'zip', MEDIA_ROOT, 'testfolder')
There is something funny and specific going on. I recommend you check the following:
Generate the zip file outside of Django with a script that contains just that one-liner. Then try to extract it on a Windows machine. This will help you rule out anything related to Django, the web server or the browser.
If that works, then look at exactly what is in the folder you compressed. Do the files have any funny characters in their names? Are there strange file types or very long filenames?
Run an MD5 checksum on the zip file on Windows and on Linux to make absolutely sure that the two files are byte-for-byte identical, and to rule out any file corruption that might have occurred.
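If no md5sum tool is at hand on the Windows machine, the checksum from the last step can be computed with Python itself. This is a small sketch in Python 3; md5_of_file is a hypothetical helper and the throwaway sample file only shows that it runs.

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it in binary mode."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Small self-check on a throwaway file.
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(b"hello")
sample_digest = md5_of_file(sample_path)
os.unlink(sample_path)
print(sample_digest)
```

Running the same function on the archive on both machines and comparing the two hex strings shows whether the files differ.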
Thanks to J.F. Sebastian for his comment...
I'll still write the solution here in detail...
There was no problem in creating the archive; the problem was in reading it back into the response. So the solution is to change the second line of my code from
response = HttpResponse(FileWrapper(open(archive)),
content_type=mimetypes.guess_type(archive)[0])
to,
response = HttpResponse(FileWrapper(open(archive, 'rb')), # notice extra 'rb'
content_type=mimetypes.guess_type(archive)[0])
because apparently, hidden somewhere in python 2.3 documentation on open:
The most commonly-used values of mode are 'r' for reading, 'w' for
writing (truncating the file if it already exists), and 'a' for
appending (which on some Unix systems means that all writes append to
the end of the file regardless of the current seek position). If mode
is omitted, it defaults to 'r'. The default is to use text mode, which
may convert '\n' characters to a platform-specific representation on
writing and back on reading. Thus, when opening a binary file, you
should append 'b' to the mode value to open the file in binary mode,
which will improve portability. (Appending 'b' is useful even on
systems that don’t treat binary and text files differently, where it
serves as documentation.) See below for more possible values of mode.
So, in simple terms, when reading binary files, using open(file, 'rb') increases the portability of your code (it certainly did in this case).
Now it extracts without trouble on Windows...
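For reference, here is a self-contained sketch in Python 3 of the round trip without Django. An archive is created with shutil.make_archive, read back in binary mode, and the bytes that would go into the response are checked to still form a valid zip. The folder and file names are made up for the example.

```python
import io
import os
import shutil
import tempfile
import zipfile

# Build a small folder to archive.
root = tempfile.mkdtemp()
os.mkdir(os.path.join(root, "testfolder"))
with open(os.path.join(root, "testfolder", "a.txt"), "w") as f:
    f.write("line one\nline two\n")

archive = shutil.make_archive(os.path.join(root, "out"), "zip", root, "testfolder")

# Read the archive back in binary mode, as the response body should be.
with open(archive, "rb") as f:
    payload = f.read()

# The bytes read must match the size on disk and still be a valid zip.
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(len(payload), os.path.getsize(archive), names)
```

The comparison with os.path.getsize matters here because the view sets Content-Length from the size on disk, so the body must contain exactly that many unmodified bytes.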
I'm adding functionality to our website so that users can download files stored in a database. The problem is that I cannot properly specify the filename for the user: the user is instead prompted to save the file under the name of the main Python script running the website. I am setting the Content-Disposition header, but it's not working as expected. I've edited the code down to the following, which still fails to work:
import sys, os
import mydatabasemodule
PDFReport = [...read file from database ...]
print('Content-Type: application/octet-stream\n')
print('Content-Disposition: attachment; filename=\"mytest.pdf\"\n')
print(PDFReport)
sys.stdout.close()
Running this code prompts the user to download the file as mysite.py. The PDF downloads correctly just with the wrong filename.
Can anyone tell what I'm doing wrong here? In the full version of the code, I also set Content-Description and Content-Length but that also fails. The files are small and I am trying to avoid saving them to disk but even when I do so, the same problem happens.
[edit]
The webserver is running CentOS 5.5, Python 2.4.3, Apache 2.2.3, and mod_python. I've tested this on an Ubuntu 11.04 client using Google Chrome 17.0.963.46 beta and Firefox 13. If I instead try to show the PDF inline:
print('Content-type: application/pdf\n')
print('Content-Disposition: inline; filename=\"mytest.pdf\"\n')
print("Content-Length: %d" % len(PDFReport))
then Chrome shows the PDF (with a plugin) and Firefox asks to save the file, recognizing it as a PDF but still with the wrong filename i.e. the filename is still the script name.
[edit]
The solution was given below by Mike. I think the problem was the newline I added in the first line above. Since print adds a newline, this second newline signaled the end of the header so the Content-Disposition line was never read. Thanks to all for the quick help!
In Python versions < 3.0, print is a statement, not a function, and it automatically appends a newline character. Try this:
import sys, os
import mydatabasemodule
PDFReport = [...read file from database ...]
print 'Content-Type: application/octet-stream'
print 'Content-Disposition: attachment; filename="mytest.pdf"'
print
sys.stdout.write(PDFReport)
sys.stdout.flush()
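The same idea can be written so that the header/body boundary is explicit. This is only a sketch for Python 3, assuming a CGI-style script and a trusted, plain ASCII filename without quotes; build_download_response and write_response are hypothetical helpers. All headers come first, then exactly one blank line, then the raw bytes.

```python
import io


def build_download_response(filename, payload):
    """Return headers, one blank line, then the body, as bytes."""
    headers = [
        "Content-Type: application/octet-stream",
        'Content-Disposition: attachment; filename="%s"' % filename,
        "Content-Length: %d" % len(payload),
    ]
    # Only the final blank line ends the header block.
    head = "\r\n".join(headers) + "\r\n\r\n"
    return head.encode("ascii") + payload


def write_response(stream, response):
    """Write the response to a binary stream, e.g. sys.stdout.buffer in a CGI script."""
    stream.write(response)
    stream.flush()


response = build_download_response("mytest.pdf", b"%PDF-1.4 example bytes")
demo_stream = io.BytesIO()          # stands in for the real output stream
write_response(demo_stream, response)
print(response.decode("ascii"))
```

Because the headers are joined before the blank line is added, a stray newline cannot end the header block early, which is what hid the Content-Disposition line in the question.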