I have an application written in Python that's writing large amounts of data to the %TEMP% folder. Oddly, every once in a while, it dies, returning IOError: [Errno 28] No space left on device. The drive has plenty of free space, %TEMP% is not its own partition, I'm an administrator, and the system has no quotas.
Does Windows artificially put some types of limits on the data in %TEMP%? If not, any ideas on what could be causing this issue?
EDIT: Following discussions below, I clarified the question to better explain what's going on.
What is the exact error you encounter?
Are you creating too many temp files?
The GetTempFileName method will raise an IOException if it is used to create more than 65535 files without deleting previous temporary files.
The GetTempFileName method will raise an IOException if no unique temporary file name is available. To resolve this error, delete all unneeded temporary files.
One thing to note: if you're using the Win32 API (even indirectly) only to get temp file names, be aware that calling it:
Creates a uniquely named, zero-byte temporary file on disk and returns the full path of that file.
If you're using that path but also changing the name returned, be aware you might actually be creating a 0-byte file and an additional file on top of it (e.g. My_App_tmpXXXX.tmp and tmpXXXX.tmp).
As Nestor suggested below, consider deleting your temp files after you're done using them.
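If the temp files are created from Python itself, a minimal sketch of the create-use-delete pattern using the standard tempfile module (the prefix and payload below are placeholders) could look like this:

    import os
    import tempfile

    # Ask the OS for a uniquely named temp file in %TEMP% and keep its path.
    fd, path = tempfile.mkstemp(prefix="my_app_", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"large payload goes here")  # placeholder data
    finally:
        os.remove(path)  # explicit cleanup stops tmp files from piling up

Deleting each file as soon as you are done with it keeps you well clear of the 65,535-name limit mentioned above.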
Using a FAT32 filesystem I can imagine this happening when:
You are writing a lot of data to one file and reach the 4GB file size cap.
You are creating a lot of small files and reach the 2^16-2 files per directory cap.
Apart from this, I don't know of any limitations the system can impose on the temp folder, other than the physical partition actually being full.
Another limitation, as Mike Atlas has suggested, is the GetTempFileName() function, which creates files of type tmpXXXX.tmp. Although you might not be using it directly, verify that the %TEMP% folder does not contain too many of them (the limit is 2^16).
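If you want to check that quickly, a small sketch (assuming the stray files follow the tmpXXXX.tmp naming pattern) might be:

    import glob
    import os
    import tempfile

    temp_dir = tempfile.gettempdir()  # resolves %TEMP% on Windows
    stray = glob.glob(os.path.join(temp_dir, "tmp*.tmp"))
    print("%d tmp*.tmp files in %s" % (len(stray), temp_dir))
    # A count anywhere near 65535 means GetTempFileName has run out of names.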
And maybe the obvious, have you tried emptying the %TEMP% folder before running the utility?
There shouldn't be any such space limitation on Temp. If you wrote the app, I would recommend creating your files in ProgramData...
There should be no trouble whatsoever with regard to your %TEMP% directory.
What is the disk quota set to for %TEMP%'s hosting volume? Depending in part on what the apps themselves are doing, one of them may be throwing an error because the disk quota has been reached, which is a pain if the quota is set unreasonably low. If the quota is very low, try raising it, which you can do as Administrator.
Related
In python I am creating a tempdir (using tempfile.TemporaryDirectory), writing a few text files within it, and then (after processing the files) calling tempdir.cleanup(). The program works, but I'm wondering if there are any dangers to this that aren't immediately visible, especially when working with a large number of files.
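For reference, roughly what the code looks like (the file names and the count are placeholders):

    import os
    import tempfile

    tempdir = tempfile.TemporaryDirectory()
    try:
        for i in range(10000):  # stand-in for "a large number of files"
            with open(os.path.join(tempdir.name, "part_%d.txt" % i), "w") as f:
                f.write("intermediate data\n")
        # ... process the files here ...
    finally:
        tempdir.cleanup()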
Is it a bad practice? It certainly is. Is it dangerous? Depending on the name, it might get overwritten by another application. Or it might not.
The entire temp folder might be wiped between reboots (Linux does this; I'm not sure about Windows). Or it might not. The user might delete files in the temp folder to free up space. Or they might not.
Generally, it's just bad practice, so don't do it. If your file is non-temporary, then don't put it in the temp folder.
I run several processes in Python (using multiprocessing.Process) on an Ubuntu machine.
Each of the processes writes various temporary files. Each process writes different files, but all files are in the same folder.
Is there any potential risk of error here?
The reason I think there might be a problem is that, AFAIK, a folder in Unix is just a file. So it's just like several processes writing to the same file at the same time, which might cause a loss of information.
Is this really a potential risk here? If so, how to solve it?
This has absolutely nothing to do with Python, as file operations in Python use OS-level system calls (unless run as root, your Python program would not have permission to do raw device writes anyway, and doing them as root would be incredibly stupid).
A little bit of file system theory if anyone cares to read:
Yes, if you study file system architecture and how data is actually stored on drives, there are similarities between files and directories, but only at the data storage level. The reason is that there is no need to separate the two. For example, the ext4 file system stores information about a file (metadata) in small units called inodes, separately from the actual file contents. An inode contains a pointer to the actual disk space where the file data can be found.
File systems generally are rather agnostic to directories. A file system is basically just this: it contains information about free disk space, information about files with pointers to their data, and the actual data. Part of the metadata is the directory where the file resides. In modern file systems (ancient FAT is the exception that is still in use) data storage on disk is not related to directories. Directories are used to allow both humans and the computer implementing the file system to locate files and folders quickly, instead of walking sequentially through the list of inodes until the correct file is found.
You may have read that directories are just files. Yes, they are "files" that contain a list of the files in them (actually a tree, but please do not confuse this with the directory tree; it is just a mechanism for storing information about large directories so that entries do not need to be searched sequentially within the directory entry). The reason this is a file is that this is the mechanism file systems use to store data; there is no need for a separate storage mechanism, as a directory only contains a list of file names and pointers to their inodes. You could think of it as a database or, even simpler, a text file. But in the end it is just a file containing pointers, not something allocated on the disk surface to contain the actual files stored in the directory.
That was the background.
The file system implementation on your computer is just a piece of software that knows how to deal with all this. When you open a file in a certain directory for writing, something like this usually happens:
1. A free inode is located and an entry is created there.
2. The free clusters/blocks database is queried to find storage space for the file contents.
3. The file data is stored and the blocks/clusters are marked "in use" in that database.
4. The inode is updated to contain the file metadata and a pointer to this disk space.
5. The "file" containing the directory data of the target directory is located.
6. That file is modified so that one record is added. The record has a pointer to the inode just created, as well as the file name.
7. The inode of the new file is updated to contain a link to the directory, too.
It is the job of operating system and file system driver within it to ensure all this happens consistently. In practice it means the file system driver queues operations. Writing several files into the same directory simultaneously is a routine operation - for example web browser cache directories get updated this way when you browse the internet. Under the hood the file system driver queues these operations and completes steps 1-7 for each new file before it starts processing the following operation.
To make it a bit more complex, there is a journal acting as an intermediate buffer. Your transactions are written to the journal, and when the file system is idle, the file system driver commits the journal transactions to the actual storage space, but the theory remains the same. This is done for performance and reliability.
You do not need to worry about this on application level, as it is the job of the operating system to do all that.
In contrast, if you create a lot of randomly named files in the same directory, in theory there could be a conflict at some point if your random name generator produced two identical file names. There are ways to mitigate this, and this would be the part you need to worry about in your application. But anything deeper than that is the task of the operating system.
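One common mitigation, sketched below, is to let the OS pick the name via tempfile.mkstemp: it creates the file with O_CREAT | O_EXCL under the hood, so two processes can never be handed the same path (the directory below is a placeholder):

    import os
    import tempfile

    # Each worker process asks for a unique file in the shared folder;
    # mkstemp opens it with O_CREAT | O_EXCL, so name collisions cannot happen.
    fd, path = tempfile.mkstemp(dir="/path/to/shared/folder", suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write("data from this worker\n")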
On linux, opening a file (with or without the O_CREAT flag set) is an atomic operation (see for example this list). In a nutshell, as long as your processes use different files, you should have no trouble at all.
Just for your information, appending to a file (up to a certain byte limit) is atomic as well. This article is interesting in this regard.
Writing to different files in the same folder won't cause a problem. Sure, a folder is a file in Linux, but you open the file for writing, not the folder.
On the other hand, writing to the same file from multiple processes can, depending on your log size, cause issues. See this question for more details: Does python logging support multiprocessing?
I'm currently writing a little program to "reset" hard drives. In this program the user should be able to choose whether they want to have everything deleted completely or just a part of it, e.g. a specific folder.
Since I want to provide anonymity for all previous owners, I want to completely delete the folder or the drive; essentially, I want to format a single folder.
The problem is that with file recovery tools it is very easy to restore deleted files, since they are mostly not erased but just removed from the file system. How can I set all bytes that were occupied by the folder and the files in it to zero, or at least make them unrecoverable?
I'm using Python 2.7 and Debian.
I found exactly the solution! Perfect!
https://manpages.debian.org/stretch/manpages-de/shred.1.de.html
You can shred files and whole partitions with it, and it is available on Debian! You can set all bytes to zero if you want, and there are many more options. A great command for jobs like this!
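From Python the command can simply be driven through subprocess; a rough sketch (the target path is a placeholder, and -n/-z/-u are standard shred options for the number of passes, a final pass of zeros, and unlinking the file afterwards):

    import subprocess

    # Overwrite the file three times, add a final pass of zeros, then remove it.
    subprocess.check_call(["shred", "-n", "3", "-z", "-u", "/path/to/secret_file"])

Note that shred itself operates on individual files (or block devices); to wipe a whole folder you would walk it with os.walk and shred each file before removing the directories.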
In Python 2.7, is it possible (and if so, how) to do the following in a single atomic (race-free) operation:
Open a file
If it doesn't exist, create it and then open it.
Acquire an exclusive lock on the file (no other process can open or delete the file).
Context: I have a single Python program that will fetch files given a list of URL/md5 pairs; if a file in the list exists and its md5 matches, it gets skipped. If not, it will be downloaded. Now, there may be multiple instances of this program processing different lists, which may overlap.
This question is almost what I need to do, but in my case I need to lock the file either way to check its md5, while preventing others from doing so as well. Also, I don't need to know whether the file existed prior to the operation; if it has just been created, the file will be empty and its md5 won't match, so it will be downloaded anyway.
I'm using this program on Linux specifically, but cross-platform solutions are welcome.
EDIT:
In the end I've solved my issue by:
Opening the file in a+b mode (not atomic; creates the file if it doesn't exist).
Try to lock the file exclusively (advisory):
If succeed, work on file.
If it fails, assume someone else is working on the file and skip to the next one. After there are no more files to process, come back and check whether whoever locked the file did the job right.
As it stands, the desired operation is not supported as a single atomic step, but it turns out not to be needed either.
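A rough sketch of that approach with fcntl advisory locks (the download step is a placeholder; assumes Linux and Python 2.7 as in the question):

    import fcntl
    import hashlib

    def try_process(path, expected_md5, download):
        # a+b creates the file if it is missing and never truncates it
        with open(path, "a+b") as f:
            try:
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)  # non-blocking exclusive lock
            except IOError:
                return False  # someone else holds the lock; revisit this file later
            f.seek(0)
            if hashlib.md5(f.read()).hexdigest() != expected_md5:
                download(path)  # placeholder for the actual fetch/replace
            return True  # the lock is released when the file is closed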
No, it is not possible as a basic operation supported by Linux/UNIX.
The O_CREAT|O_EXCL technique in the answer you referenced can work here. Instead of exclusively creating the target file, you exclusively create a lockfile whose name is predictably derived from the target file. E.g., os.path.join("/tmp", hashlib.md5(target_filename).hexdigest() + ".lock").
However, as others have suggested, it's not clear that you need to protect both the target file creation and its checksumming + possible replacement. An fcntl advisory lock will suit your needs.
It is not possible, at least according to this comprehensive report:
mv -T <oldsymlink> <newsymlink> atomically changes the target of <newsymlink> to the directory pointed to by <oldsymlink> and is indispensable when deploying new code. Updated 2010-01-06: both operands are symlinks. (So this isn't a system call, it's still useful.) A reader pointed out that ln -Tfs <directory> <symlink> accomplishes the same thing without the second symlink. Added 2010-01-06. Deleted 2010-01-06: strace(1) shows that ln -Tfs <directory> <symlink> actually calls symlink(2), unlink(2), and symlink(2) once more, disqualifying it from this page. mv -T <oldsymlink> <newsymlink> ends up calling rename(2) which can atomically replace <newsymlink>. Caveat 2013-01-07: this does not apply to Mac OS X, whose mv(1) doesn't call rename(2). mv(1).
link(oldpath, newpath) creates a new hard link called newpath pointing to the same inode as oldpath and increases the link count by one. This will fail with the error code EEXIST if newpath already exists, making this a useful mechanism for locking a file amongst threads or processes that can all agree upon the name newpath. I prefer this technique for whole-file locking because the lock is visible to ls(1). link(2).
symlink(oldpath, newpath) operates very much like link(2) but creates a symbolic link at a new inode rather than a hard link to the same inode. Symbolic links can point to directories, which hard links cannot, making this a perfect analogy to link(2) that also works when locking entire directories: it will fail with the error code EEXIST if newpath already exists. Be careful of symbolic links whose target inode has been removed ("dangling" symbolic links): open(2) will fail with the error code ENOENT. It should be mentioned that inodes are a finite resource (this particular machine has 1,245,184 inodes). symlink(2). Added 2010-01-07.
rename(oldpath, newpath) can change a pathname atomically, provided oldpath and newpath are on the same filesystem. This will fail with the error code ENOENT if oldpath does not exist, enabling interprocess locking much like link(oldpath, newpath) above. I find this technique more natural when the files in question will be unlinked later. rename(2).
open(pathname, O_CREAT | O_EXCL, 0644) creates and opens a new file. (Don't forget to set the mode in the third argument!) O_EXCL instructs this to fail with the error code EEXIST if pathname exists. This is a useful way to decide which process should handle a task: whoever successfully creates the file. open(2).
mkdir(dirname, 0755) creates a new directory but fails with the error code EEXIST if dirname exists. This provides for directories the same mechanism open(2) with O_EXCL provides for files. mkdir(2). Added 2010-01-06; edited 2013-01-07.
As you can see, open() can be used atomically only to create new files, not to open existing ones for reading. If you want to use this approach though, you might want to use Python's os.open(), which is a proxy for this syscall (not to be confused with the built-in open()).
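A hedged sketch of that, combined with the lockfile naming suggested earlier (the target path is illustrative; Python 2.7 as in the question):

    import hashlib
    import os

    target = "downloads/archive.bin"  # illustrative target file
    lock_path = os.path.join("/tmp", hashlib.md5(target).hexdigest() + ".lock")

    try:
        # O_CREAT | O_EXCL makes the creation fail atomically if the lock exists
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except OSError:
        pass  # another process holds the lock; skip this file for now
    else:
        try:
            pass  # check the md5 of target here and download it if needed
        finally:
            os.close(fd)
            os.remove(lock_path)  # release the lock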
You might also consider using databases for this task, since they should offer much more reliability (for example, what if your files are hosted on NFS, which implements no locking at all and IIRC the only atomic operation there is mkdir()?).
I am wondering if it is possible to compile a list of deleted files on a Windows file system, FAT or NTFS. I do not need to actually recover the files, only to have access to their names and any other accessible information (time deleted, time created, etc.).
Even running a command-line tool to achieve this would be acceptable.
The application is being developed in Python, however if another language has the capability I could always create a small component implemented in that language.
Thanks.
This is a very complex task. I would look at open-source forensic tools.
You should also analyze the Recycle Bin (files there are not completely deleted).
For FAT you will not be able to get the first character of a deleted file's name.
For some deleted files the metadata will be gone.
NTFS is much more complex and time consuming due to the more complex nature of this file system.