What is the expected behaviour of tarfile.add() when adding an archive to itself?

The question might sound strange because I know I am forcing a strange situation. It came up by accident (a bug, one might say) and I even know how to avoid it, so please skip that part.
I would really like to understand the behaviour I see.
The point of the function is to add all files with a given prefix in a directory to an archive. I noticed that despite the "bug", the program works correctly (sic!). I wanted to understand why.
The code is fairly simple, so I allow myself to post the whole function:
def pack(prefix, custom_meta_files = []):
    postfix = 'tgz'
    if prefix[-1] != '.':
        postfix = '.tgz'
    archive = tarfile.open(prefix+postfix, "w:gz")
    files = filter(lambda path: path.startswith(prefix), os.listdir())
    #print('files: {0}'.format(list(files)))
    for file in files:
        print('packing `{0}`'.format(file))
        archive_name = file[len(prefix):] #skip prefix + dot
        archive.add(file, archive_name)
    not_doubled_metas = set(custom_meta_files) - set(archive.getnames())
    print('metas to add: {0}'.format(not_doubled_metas))
    for meta in not_doubled_metas:
        print('packing `{0}`'.format(meta))
        archive.add(meta)
    print('contents:{0}'.format(archive.getnames()))
As one can notice, I create the archive with the prefix, and then I create the list of files to pack by listing everything in the cwd and filtering it via the lambda. Naturally the archive itself passes the filter. There is also a snippet to add fixed files if the names do not overlap, although I do not think it is important here.
So the output from such a run is e.g.:
packing `ga_run.seq_niche.N30.1.bt0_5K.params`
packing `ga_run.seq_niche.N30.1.bt0_5K.stats`
packing `ga_run.seq_niche.N30.1.bt0_5K.tgz`
metas to add: {'stats.meta'}
packing `stats.meta`
contents:['params', 'stats', 'stats.meta']
So the script tried adding itself, yet the archive does not appear in the final contents. I do not know what the expected behaviour is, but there is no warning at all and the documentation does not mention anything. I read the parts about the methods for adding members and searched for "itself" and "same name".
I would assume it is automatically skipped, but I don't know how to actually check it. I would personally expect a zero-length file to be added as a member, although I understand skipping, as it actually makes more sense.
Question: Is it the desired behaviour of tarfile.add() to ignore adding the archive to itself? Where is this documented?

Scanning the tarfile.py code from 3.2 down to 2.4, they all have code similar to:
# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
    self._dbg(2, "tarfile: Skipped %r" % name)
    return
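You can see the skip in action by turning on tarfile's debug output. A minimal sketch (the file names here are made up for the demonstration):

import os
import tarfile

# Create a dummy member file so the archive is not empty.
with open("member.txt", "w") as fh:
    fh.write("hello\n")

# debug=2 makes tarfile print its internal messages, including the skip.
archive = tarfile.open("self.tgz", "w:gz", debug=2)
archive.add("member.txt")
archive.add("self.tgz")      # prints "tarfile: Skipped 'self.tgz'" and returns
archive.close()

check = tarfile.open("self.tgz")
print(check.getnames())      # ['member.txt'] -- the archive itself was skipped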

Related

Using Python tempfiles permanently

I'm currently abusing tempfile a little bit by using it to generate unique names for permanent files. I'm using the following function to get a unique id:
def gen_id():
    tf = tempfile.mktemp()
    tfname = os.path.split(tf)[1]
    return tfname.strip(tempfile.gettempprefix())
Then I'm storing a file in a custom directory with a filename from that function. I use this function to give me more flexibility than the built-ins; with this function I can choose my own directory and remove the tmp prefix.
Since tempfiles are supposed to be "temporary files," are there any dangers to using their uniqueness for permanent files like this? Any reasons why my function would not be safe for generating unique ids?
EDIT: I got the idea to use tempfile for unique names from this SO answer.
help(tempfile.mktemp) -> "This function is unsafe and should not be used. The file name refers to a file that did not exist at some point, but by the time you get around to creating it, someone else may have beaten you to the punch."
i.e. you could get a filename from this that has the same name as an existing file.
The replacement is tempfile.mkstemp(), and it actually does create a file, which you normally have to remove after use ... but you can give it a parameter for your custom directory, tell it to use no prefix, and let it create the files for you, full stop. It will check for existing files of the same name and make new names until it finds an unused one.
tempfile.mkstemp(suffix="", prefix=template, dir=None, text=False)
(The tempfile module is written in Python; you can see its code in \Lib\tempfile.py.)
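For example, a rough sketch of that usage (the directory name here is just a placeholder):

import os
import tempfile

# mkstemp atomically creates the file, so the name is guaranteed to be unused
# at creation time; prefix="" drops the usual "tmp" prefix.
fd, path = tempfile.mkstemp(suffix=".dat", prefix="", dir="my_storage_dir")
os.close(fd)                 # keep the file, just close the low-level handle
print(os.path.basename(path))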
I highly suggest just using this comment from the same answer as the way of generating unique names.
No need to abuse mktemp for that (and it's deprecated anyway).
Keep in mind that mktemp only guarantees the file name doesn't exist at the time of the call; if you delete all of your temp files and cache, or even immediately after the call, the same file (or, in the case of mktemp, the same path) can be created twice.
Using a random name is less likely to cause collisions in that case and has no downsides. You should, however, check for the small chance that a collision occurs, and generate a new name if it does.
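A rough sketch of that approach (the directory and suffix are placeholders):

import os
import uuid

def unique_name(target_dir, suffix=".dat"):
    # Keep drawing random names until one does not collide with an existing file.
    while True:
        name = uuid.uuid4().hex + suffix
        if not os.path.exists(os.path.join(target_dir, name)):
            return name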

vim complete() with result from lvim

I am trying to create a function that will pop up a list of files that include the word "Module" (case-insensitive).
I tried :lvim /Module/gj *.f90 when all the *.f90 files are in the current dir, but I failed to make a globpath()-like expansion so that I can include subdirs as well.
So I turned to Python. From Python, I am getting the list perfectly. I am including the Python code, which should show my goal:
#!/usr/bin/python
import os
import re
flsts = []
path = "/home/rudra/Devel/dream/"
print("All files==>")
for dirs, subdirs, files in os.walk(path):
    for tfile in files:
        if tfile.endswith('f90'):
            print(os.path.splitext(tfile)[0])
            text = open(dirs+'/'+tfile).read()
            match = re.search("Module", text)
            if match:
                flsts.append(os.path.splitext(tfile)[0])
print("The list to be used ==>")
print(flsts)
After having the list, I want a
complete(col('.'), flsts)
The problem is, I am unable to include it inside vim function.
May I kindly have some help, so that I can get a list from vim and use it in the complete function?
I have checked this as a possible solution, but unfortunately it is not.
Kindly help.
edit: More explanation
So, say, in my work-dir, I have:
$tree */*.f90
OLD/dirac.f90
OLD/environment.f90
src/constants.f90
src/gencrystal.f90
src/geninp.f90
src/init.f90
Among them, only two have the word module in them:
$ grep Module */*.f90
OLD/dirac.f90: 10 :module mdirac
src/constants.f90: 2 :module constants
So I want, with an inoremap, complete() to pop up only constants and dirac.
Hence, Module is the keyword I am searching for in the subdirs of the present working directory, and only the files that match (dirac and constants in this example) should pop up in complete().
I'm not sure what your exact problem is.
With split(globpath('./**/Module/**', '*.f90'), '\n') you will obtain the list of all files that match *.f90, and which are somewhere within a directory named Module.
Then, using complete() has a few restrictions. It has to be called from a function that is invoked from insert mode and that returns an empty string.
By itself, complete() will insert the selected text; if we play with the {startcol} parameter, we can even remove what's before the cursor. This way, you can type Module, hit the key you want, and use Module to filter.
function! s:Complete()
  " From lh-vim-lib: word_tools.vim
  let key = GetCurrentKeyword()
  let files = split(glob('./**/*'.key.'*/**', '*.vim'), '\n')
  call complete(col('.')-len(key), files)
  return ''
endfunction
inoremap µ <c-R>=<sid>Complete()<cr>
However, if you want to trigger an action (instead of inserting text), it becomes much more complex. I did that in muTemplate. I've published the framework used to associate hooks to completion items in lh-vim-lib (See lh#icomplete#*() functions).
EDIT: OK, then I'll work with let files=split(system("grep --include=*.f90 -Ril module *"), '\n') to obtain the list of files, then call complete(col('.'), files) with that list. That should be the most efficient solution. This is quite similar to Ingo's solution; the difference is that we don't need Python if grep is available.
Regarding Python integration, well it's possible with :py vim.command(). See for instance jira-complete that integrates complete() with a Python script that builds the completion-list: https://github.com/mnpk/vim-jira-complete/blob/master/autoload/jira.vim#L116
Notes:
if "module:" can be pre-searched with ctags, it will to possible to extract your files from tags database with taglist().
It's also possible to fill dynamically the list of files with complete_add(), which is something that would make sense from a python script that tests each file one after the other.
There's an example at :help complete() that you can adapt. If you modify your Python script to output just the (newline-separated) files, you can invoke it via system():
inoremap <F5> <C-R>=FindFiles()<CR>
function! FindFiles()
  call complete(col('.'), split(system('python path/to/script.py'), '\n'))
  return ''
endfunction
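For completeness, a rough sketch of what such a script could look like, adapted from the Python code in the question (it prints only the matching names, one per line):

#!/usr/bin/python
import os
import re

for dirs, subdirs, files in os.walk("."):
    for tfile in files:
        if tfile.endswith('f90'):
            text = open(os.path.join(dirs, tfile)).read()
            if re.search("module", text, re.IGNORECASE):
                print(os.path.splitext(tfile)[0])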

"Batch" renaming one file at a time in Python

I would like to perform a sort of "manual" batch operation where Python looks in a directory, sees a list of files, then automatically displays them one at a time and waits for user input before moving on to the next file. I am going to assume the files have relatively random names (and the order in which Python chooses to display them doesn't really matter).
So, I might have pic001.jpg and myCalendar.docx. Is there a way to have Python move through these (in any order) so that I can prepend something to each one manually? For instance, it could look like
Please type a prefix for each of the following:
myCalendar.docx:
and when I typed "2014" the file would become 2014_myCalendar.docx. Python would then go on to say
Please type a prefix for each of the following:
myCalendar.docx: 2014
... myCalendar.docx renamed to 2014_myCalendar.docx
pic001.jpg:
then I could make it disneyland_pic001.jpg.
I know how to rename files, navigate directories, etc. I'm just not sure how to get Python to cycle through every file in a certain directory, one at a time, and let me modify each one. I think this would be really easy to do with a for loop if each of the files was numbered, but for what I'm trying to do, I can't assume that they will be.
Thank you in advance.
Additionally, if you could point me to some tutorials or documentation that might help me with this, I'd appreciate that as well. I've got http://docs.python.org open in a few tabs, but as someone who's relatively new to Python, and programming in general, I find their language to be a little over my head sometimes.
Something like this (untested):
import os

DIR = '/Volumes/foobar'
prefix = raw_input('Please type a prefix for each of the following: ')
for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    new_path = os.path.join(DIR, '%s%s' % (prefix, f))
    try:
        os.rename(path, new_path)
        print 'renamed', f
    except:
        raise
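If you want the per-file prompt described in the question instead of a single prefix for everything, a rough variant (untested, same Python 2 style as above; the directory is a placeholder):

import os

DIR = '/Volumes/foobar'

print 'Please type a prefix for each of the following:'
for f in os.listdir(DIR):
    prefix = raw_input('%s: ' % f)
    if not prefix:
        continue                      # empty answer: leave this file alone
    new_name = '%s_%s' % (prefix, f)
    os.rename(os.path.join(DIR, f), os.path.join(DIR, new_name))
    print '... %s renamed to %s' % (f, new_name)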

Python File Creation Date & Rename - Request for Critique

Scenario: When I photograph an object, I take multiple images from several angles. Multiplied by the number of objects I "shoot", I can generate a large number of images. Problem: The camera generates images identified as 'DSCN100001', 'DSCN100002', etc. Cryptic.
I put together a script that prompts for a directory specification (Windows), as well as a "prefix". The script reads each file's creation date and time and renames the file accordingly. The prefix is added to the front of the file name, so 'DSCN100002.jpg' can become 'FatMonkey 20110721 17:51:02'. The time detail is important to me for chronology.
The script follows. Please tell me whether it is Pythonic, whether or not it is poorly written and, of course, whether there is a cleaner - more efficient way of doing this. Thank you.
import os
import datetime

target = raw_input('Enter full directory path: ')
prefix = raw_input('Enter prefix: ')
os.chdir(target)
allfiles = os.listdir(target)
for filename in allfiles:
    t = os.path.getmtime(filename)
    v = datetime.datetime.fromtimestamp(t)
    x = v.strftime('%Y%m%d-%H%M%S')
    os.rename(filename, prefix + x + ".jpg")
The way you're doing it looks Pythonic. A few alternatives (not necessarily suggestions):
You could skip os.chdir(target) and do os.path.join(target, filename) in the loop.
You could do strftime('{0}-%Y-%m-%d-%H:%M:%S.jpg'.format(prefix)) to avoid string concatenation. This is the only one I'd recommend.
You could reuse a variable name like temp_date instead of t, v, and x. This would be OK.
You could skip storing temporary variables and just do:
for filename in os.listdir(target):
    os.rename(filename, datetime.datetime.fromtimestamp(
        os.path.getmtime(filename)).strftime(
            '{0}-%Y-%m-%d-%H:%M:%S.jpeg'.format(prefix)))
You could generalize your function to work for recursive directories by using os.walk().
You could detect the file extension so the script would be correct not just for .jpegs (a rough sketch follows this list).
You could make sure you only rename files of the form DSCN1#####.jpeg.
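A rough sketch combining the extension and DSCN-pattern points above (target and prefix are assumed to come from the same prompts as in the question):

import datetime
import os
import re

for filename in os.listdir(target):
    # Only touch files that look like camera output, e.g. DSCN100002.jpg
    if not re.match(r'DSCN\d+\.jpe?g$', filename, re.IGNORECASE):
        continue
    root, ext = os.path.splitext(filename)   # keep whatever extension the file had
    stamp = datetime.datetime.fromtimestamp(
        os.path.getmtime(os.path.join(target, filename))).strftime('%Y%m%d-%H%M%S')
    os.rename(os.path.join(target, filename),
              os.path.join(target, '{0}-{1}{2}'.format(prefix, stamp, ext)))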
Your code is nice and simple. A few possible improvements I can suggest:
Command-line arguments are preferable for dir names because of TAB autocompletion.
EXIF is a more accurate source of the date and time a photo was created. If you modify a photo in an image editor, the modification time will change, while the EXIF information is preserved. Here is a discussion about EXIF libraries for Python: Exif manipulation library for python
My only thought is that if you are going to have the computer do the work for you, let it do more of the work. My assumption is that you are going to shoot one object several times, then either move to another object or move another object into place. If so, you could consider grouping the photos by how close the timestamps are together (maybe any delta over 2 minutes is considered a new object). Then based on these pseudo clusters, you could name the photos by object.
May not be what you are looking for, but thought I'd add in the suggestion.
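A rough sketch of that grouping idea (the 2-minute gap and the directory layout are assumptions):

import datetime
import os

def group_by_time(target, gap=datetime.timedelta(minutes=2)):
    # Sort photos by modification time and start a new group whenever the
    # gap between consecutive shots exceeds the threshold.
    files = sorted(os.listdir(target),
                   key=lambda f: os.path.getmtime(os.path.join(target, f)))
    groups, last = [], None
    for f in files:
        taken = datetime.datetime.fromtimestamp(
            os.path.getmtime(os.path.join(target, f)))
        if last is None or taken - last > gap:
            groups.append([])
        groups[-1].append(f)
        last = taken
    return groups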

Detecting case mismatch on filename in Windows (preferably using python)?

I have some XML configuration files that we create in a Windows environment but deploy on Linux. These configuration files reference each other with file paths. We've had problems with case sensitivity and trailing spaces before, and I'd like to write a script that checks for these problems. We have Cygwin if that helps.
Example:
Let's say I have a reference to the file foo/bar/baz.xml; I'd do this:
<someTag fileref="foo/bar/baz.xml" />
Now if we by mistake do this:
<someTag fileref="fOo/baR/baz.Xml " />
It will still work on Windows, but it will fail on Linux.
What I want to do is detect the cases where the file reference in these files doesn't match the real file with respect to case.
os.listdir on a directory, in all case-preserving filesystems (including those on Windows), returns the actual case for the filenames in the directory you're listing.
So you need to do this check at each level of the path:
def onelevelok(parent, thislevel):
    for fn in os.listdir(parent):
        if fn.lower() == thislevel.lower():
            return fn == thislevel
    raise ValueError('No %r in dir %r!' % (
        thislevel, parent))
where I'm assuming that the complete absence of any case variation of a name is a different kind of error, and using an exception for that; and, for the whole path (assuming no drive letters or UNC paths, which wouldn't translate to Linux anyway):
def allpathok(path):
    # individual path components (empty pieces from a leading slash dropped)
    levels = [p for p in path.split('/') if p]
    if os.path.isabs(path):
        top = ['/']
    else:
        top = ['.']
    return all(onelevelok(p, t)
               for p, t in zip(top + levels, levels))
You may need to adapt this if, e.g., foo/bar is not to be taken to mean that foo is in the current directory, but somewhere else; or, of course, if UNC paths or drive letters are in fact needed (but as I mentioned, translating them to Linux is not trivial anyway ;-).
Implementation notes: I'm taking advantage of the fact that zip just drops "extra entries" beyond the length of the shortest of the sequences it's zipping, so I don't need to explicitly slice off the "leaf" (last entry) from levels in the first argument; zip does it for me. all will short-circuit where it can, returning False as soon as it detects a false value, so it's just as good as an explicit loop but faster and more concise.
It's hard to judge what exactly your problem is, but if you apply os.path.normcase along with str.strip before saving your file name, it should solve all your problems.
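For example, a rough sketch of that normalization (applied to the fileref strings before they are written out; the function name is made up):

import os.path

def normalize_ref(fileref):
    # Strip stray whitespace and normalize case the same way on both the
    # writing side and the checking side.
    return os.path.normcase(fileref.strip())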
As I said in a comment, it's not clear how you are ending up with such a mistake. However, it would be trivial to check for an existing file, as long as you have some sensible convention (all file names are lower case, for example):
try:
    open(fname)
except IOError:
    open(fname.lower())
