I want to create empty file using Python script in Unix environment. Could see different ways mentioned of achieving the same. What are the benefits/pitfalls of one over the other.
os.system('touch abc')
open('abc','a').close()
open('abc','a')
subprocess.call(['touch','abc'])
Well, for a start, the ones that rely on touch are not portable. They won't work under standard Windows, for example, without the installation of CygWin, GNUWin32, or some other package providing a touch utility..
They also involve the creation of a separate process for doing the work, something that's totally unnecessary in this case.
Of the four, I would probably use open('abc','a').close() if the intent is to try and just create the file if it doesn't exist. In my opinion, that makes the intent clear.
But, if you're trying to create an empty file, I'd probably be using the w write mode rather than the a append mode.
In addition, you probably also want to catch the exception if, for example, you cannot actually create the file.
TLDR: use
open('abc','a').close()
(or 'w' instead of 'a' if the intent is to truncate the file if it already exists).
Invoking a separate process to do something Python can do itself is wasteful, and non-portable to platforms where the external command is not available. (Additionally, os.system uses two processes -- one more for a shell to parse the command line -- and is being deprecated in favor of subprocess.)
Not closing an open filehandle when you're done with it is bad practice, and could cause resource depletion in a larger program (you run out of filehandles if you open more and more files and never close them).
To create an empty file on Unix in Python:
import os
try:
os.close(os.open('abc', os.O_WRONLY | os.O_CREAT | os.O_EXCL |
getattr(os, "O_CLOEXEC", 0) |
os.O_NONBLOCK | os.O_NOCTTY))
except OSError:
pass # decide what to consider an error in your case and reraise
# 1. is it an error if 'abc' entry already exists?
# 2. is it an error if 'abc' is a directory or a symlink to a directory?
# 3. is it an error if 'abc' is a named pipe?
# 4. it is probably an error if the parent directory is not writable
# or the filesystem is read-only (can't create a file)
Or more portable variant:
try:
open('abc', 'ab', 0).close()
except OSError:
pass # see the comment above
Without the explicit .close() call, non-reference-counting Python implementations such as Pypy, Jython may delay closing the file until garbage collection is run (it may exhaust available file descriptors for your process).
The latter example may stuck on FIFO and follows symlinks. On my system, it is equivalent to:
from os import *
open("abc", O_WRONLY|O_CREAT|O_APPEND|O_CLOEXEC, 0666)
In addition, touch command updates the access and modification times of existing files to the current time.
In more recent Python 3 variants, we have Path.touch() from pathlib. This will create an empty file if it doesn't exist, and update the mtime if it does, in the same way as your example os.system('touch abc'), but it's much more portable:
from pathlib import Path
abc = Path('abc')
abc.touch()
Related
I try to make my code modular because it's too long, the problem is I don't know whether i'm doing it safely. I segmented my code into different files, so 1 python file runs the others, sometimes I have to call 1 file that will run another file that will run another file, so multiple chained commands.
The issue is that some of the files will process sensitive information like passwords, so I don't know whether I do it safely. Ideally after 1 file is executed, it should close itself, and delete all variables from it's memory, and free that space, like it normally would as if I were to just execute 1 file, the problem is that I don't know whether if I call multiple files nested into one another, this applies. Obviously only the file that is executed should clear itself, not the one that is active, but I don't know if this is the case.
I have been calling my modules like this
os.system('python3 ' + filename)
And in each file subsequently the same code calling another file with os.system, forming a nested or chained call system.
For example if I call the first file from shell:
python3 file1.py
and then file1 calls:
os.system('python3 file2.py')
and then file2 calls:
os.system('python3 file3.py')
I would want file3 cleaned from the memory and closed entirely after it runs, whereas file2 and file1 might still be active. I don't want file3 to be still inside the memory after it executed itself. So if file3 works with passwords, it should obviously clean them from the memory after it runs.
How to do this?
I have read about multiple options:
from subprocess import call
call(["python3", "file2.py"])
import subprocess
subprocess.call("file2.py", shell=True)
execfile('file2.py')
import subprocess
subprocess.Popen("file2.py", shell=True)
Which one is safer?
Python is heavily relying on the notion of importation. You should not try to reinvent the wheel on this one. Just import your scripts from the main script and use functions to trigger them. If you want to be sure variables are discarded you should include a del statement at the end of the functions or as soon as the variable is no longer in use.
On another hand, your problem with password is flawed from the start. If a .py file contains a password in plain text it's not, it will never be, in no scenario, secured. You should implement a secret : see this topic : I need to securely store a username and password in Python, what are my options?
I'm writing a program, which, inter alia, works with temporary file, created using tempfile library.
The temporary file creates and fills in function:
def func():
mod_script = tempfile.NamedTemporaryFile(dir='special')
dest = open(mod_script, 'w')
# filling dest
return mod_script
(I use open() and not with open() because I execute the temporary file after calling func())
After some operations with mod_script outside func(), I call mod_script.close(). And all works fine.
But I have one problem. If my program fails (or if I interrupt it), the temporary file doesn't remove.
How do I fix it ?
I really don't want to write try...except...finally clauses because I'll have to write it so many times (there are many points, where my program can fail).
First, use a with statement, and pass delete=False to the constructor.
Then you need to put the necessary error handling in your program. Catch exceptions (see try..finally) and clean up during program exit whether it is successful or crashes.
Alternatively, keep the file open while executing it to prevent the automatic deletion-on-close from deleting it before you have executed it. This may have issues on Windows where it tends to have conflicts using files that are open.
I'm trying to create a script in Python to back up some files. But, these files could be renamed or deleted at any time. I don't want my script to prevent that by locking the file; the file should be able to still be deleted at any time during the backup.
How can I do this in Python? And, what happens? Do my objects just become null if the stream cannot be read?
Thank you! I'm somewhat new to Python.
As mentioned by #kindall, this is a Windows-specific issue. Unix OSes allow deleting.
To do this in Windows, I needed to use win32file.CreateFile() to use the Windows-specific dwSharingMode flag (in Python's pywin32, it's just called shareMode).
Rough Example:
import msvcrt
import os
import win32file
py_handle = win32file.CreateFile(
'filename.txt',
win32file.GENERIC_READ,
win32file.FILE_SHARE_DELETE
| win32file.FILE_SHARE_READ
| win32file.FILE_SHARE_WRITE,
None,
win32file.OPEN_EXISTING,
win32file.FILE_ATTRIBUTE_NORMAL,
None
)
try:
with os.fdopen(
msvcrt.open_osfhandle(py_handle.handle, os.O_RDONLY)
) as file_descriptor:
... # read from `file_descriptor`
finally:
py_handle.Close()
Note: if you need to keep the win32-file open beyond the lifetime of the file-handle object returned, you should invoke PyHandle.detach() on that handle.
On UNIX-like OSs, including Linux, this isn't an issue. Well, some other program could write to the file at the same time you're reading it, which could cause problems (the file you are copying could end up corrupted) but this is solvable with a verification pass.
On Windows, use Volume Snapshot Service (aka Volume Shadow Copy). VSS creates a snapshot of the volume at a moment in time, and you can open files on the snapshot without locking the files on the original volume. A quick Google found a Python module for doing copies using VSS here: http://sourceforge.net/projects/pyvss/
In Python, and in general - does a close() operation on a file object imply a flush() operation?
Yes. It uses the underlying close() function which does that for you (source).
NB: close() and flush() won't ensure that the data is actually secure on the disk. It just ensures that the OS has the data == that it isn't buffered inside the process.
You can try sync or fsync to get the data written to the disk.
Yes, in Python 3 this is finally in the official documentation, but is was already the case in Python 2 (see Martin's answer).
As a complement to this question, yes python flushes before close, however if you want to ensure data is written properly to disk this is not enough.
This is how I would write a file in a way that it's atomically updated on a UNIX/Linux server, whenever the target file exists or not. Note that some filesystem will implicitly commit data to disk on close+rename (ext3 with data=ordered (default), and ext4 initially uncovered many application flaws before adding detection of write-close-rename patterns and sync data before metadata on those[1]).
# Write destfile, using a temporary name .<name>_XXXXXXXX
base, name = os.path.split(destfile)
tmpname = os.path.join(base, '.{}_'.format(name)) # This is the tmpfile prefix
with tempfile.NamedTemporaryFile('w', prefix=tmpname, delete=False) as fd:
# Replace prefix with actual file path/name
tmpname = str(fd.name)
try:
# Write fd here... ex:
json.dumps({}, fd)
# We want to fdatasync before closing, so we need to flush before close anyway
fd.flush()
os.fdatasync(fd)
# Since we're using tmpfile, we need to also set the proper permissions
if os.path.exists(destfile):
# Copy destination file's mask
os.fchmod(fd.fileno, os.stat(destfile).st_mode)
else:
# Set mask based on current umask value
umask = os.umask(0o22)
os.umask(umask)
os.fchmod(fd.fileno, 0o666 & ~umask) # 0o777 for dirs and executable files
# Now we can close and rename the file (overwriting any existing one)
fd.close()
os.rename(tmpname, destfile)
except:
# On error, try to cleanup the temporary file
try:
os.unlink(tmpname)
except OSError:
pass
raise
IMHO it would have been nice if Python provided simple methods around this... At the same time I guess if you care about data consistency it's probably best to really understand what is going on at a low level, especially since there are many differences across various Operating Systems and Filesystems.
Also note that this does not guarantee the written data can be recovered, only that you will get a consistent copy of the data (old or new). To ensure the new data is safely written and accessible when returning, you need to use os.fsync(...) after the rename, and even then if you have unsafe caches in the write path you could still lose data. this is common on consumer-grade hardware although any system can be configured for unsafe writes which boosts performance too. At least even with unsafe caches, the method above should still guarantee whichever copy of the data you get is valid.
filehandle.close does not necessarily flush. Surprisingly, filehandle.flush doesn't help either---it still can get stuck in the OS buffers when Python is running. Observe this session where I wrote to a file, closed it and Ctrl-Z to the shell command prompt and examined the file:
$ cat xyz
ghi
$ fg
python
>>> x=open("xyz","a")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.write("morestuff\n")
>>> x.flush
<built-in method flush of file object at 0x7f58e0044660>
>>> x.close
<built-in method close of file object at 0x7f58e0044660>
>>>
[1]+ Stopped python
$ cat xyz
ghi
Subsequently I can reopen the file, and that necessarily syncs the file (because, in this case, I open it in the append mode). As the others have said, the sync syscall (available from the os package) should flush all buffers to disk but it has possible system-wide performance implications (it syncs all files on the system).
I need to create a folder that I use only once, but need to have it exist until the next run. It seems like I should be using the tmp_file module in the standard library, but I'm not sure how to get the behavior that I want.
Currently, I'm doing the following to create the directory:
randName = "temp" + str(random.randint(1000, 9999))
os.makedirs(randName)
And when I want to delete the directory, I just look for a directory with "temp" in it.
This seems like a dirty hack, but I'm not sure of a better way at the moment.
Incidentally, the reason that I need the folder around is that I start a process that uses the folder with the following:
subprocess.Popen([command], shell=True).pid
and then quit my script to let the other process finish the work.
Creating the folder with a 4-digit random number is insecure, and you also need to worry about collisions with other instances of your program.
A much better way is to create the folder using tempfile.mkdtemp, which does exactly what you want (i.e. the folder is not deleted when your script exits). You would then pass the folder name to the second Popen'ed script as an argument, and it would be responsible for deleting it.
What you've suggested is dangerous. You may have race conditions if anyone else is trying to create those directories -- including other instances of your application. Also, deleting anything containing "temp" may result in deleting more than you intended. As others have mentioned, tempfile.mkdtemp is probably the safest way to go. Here is an example of what you've described, including launching a subprocess to use the new directory.
import tempfile
import shutil
import subprocess
d = tempfile.mkdtemp(prefix='tmp')
try:
subprocess.check_call(['/bin/echo', 'Directory:', d])
finally:
shutil.rmtree(d)
"I need to create a folder that I use only once, but need to have it exist until the next run."
"Incidentally, the reason that I need the folder around is that I start a process ..."
Not incidental, at all. Crucial.
It appears you have the following design pattern.
mkdir someDirectory
proc1 -o someDirectory # Write to the directory
proc2 -i someDirectory # Read from the directory
if [ %? == 0 ]
then
rm someDirectory
fi
Is that the kind of thing you'd write at the shell level?
If so, consider breaking your Python application into to several parts.
The parts that do the real work ("proc1" and "proc2")
A Shell which manages the resources and processes; essentially a Python replacement for a bash script.
A temporary file is something that lasts for a single program run.
What you need is not, therefore, a temporary file.
Also, beware of multiple users on a single machine - just deleting anything with the 'temp' pattern could be anti-social, doubly so if the directory is not located securely out of the way.
Also, remember that on some machines, the /tmp file system is rebuilt when the machine reboots.
You can also automatically register an function to completely remove the temporary directory on any exit (with or without error) by doing :
import atexit
import shutil
import tempfile
# create your temporary directory
d = tempfile.mkdtemp()
# suppress it when python will be closed
atexit.register(lambda: shutil.rmtree(d))
# do your stuff...
subprocess.Popen([command], shell=True).pid
tempfile is just fine, but to be on a safe side you'd need to safe a directory name somewhere until the next run, for example pickle it. then read it in the next run and delete directory. and you are not required to have /tmp for the root, tempfile.mkdtemp has an optional dir parameter for that. by and large, though, it won't be different from what you're doing at the moment.
The best way of creating the temporary file name is either using tempName.TemporaryFile(mode='w+b', suffix='.tmp', prifix='someRandomNumber' dir=None)
or u can use mktemp() function.
The mktemp() function will not actually create any file, but will provide a unique filename (actually does not contain PID).