I'm currently abusing tempfile a little bit by using it to generate unique names for permanent files. I'm using the following function to get a unique id:
import os
import tempfile

def gen_id():
    tf = tempfile.mktemp()
    tfname = os.path.split(tf)[1]
    # note: str.strip removes any of these characters from both ends,
    # not the literal prefix string
    return tfname.strip(tempfile.gettempprefix())
Then I'm storing a file in a custom directory with a filename from that function. I use this function to give me more flexibility than the built-ins; with this function I can choose my own directory and remove the tmp prefix.
Since tempfiles are supposed to be "temporary files," are there any dangers to using their uniqueness for permanent files like this? Any reasons why my function would not be safe for generating unique ids?
EDIT: I got the idea to use tempfile for unique names from this SO answer.
help(tempfile.mktemp) -> "This function is unsafe and should not be used. The file name refers to a file that did not exist at some point, but by the time you get around to creating it, someone else may have beaten you to the punch."
i.e. you could get a filename from this that has the same name as an existing file.
The replacement is tempfile.mkstemp(), which actually creates the file (you normally have to remove it after use) ... but you can give it a parameter naming your custom directory, tell it to use no prefix, and simply let it create the files for you. It will check for existing files of the same name and keep generating new names until it finds an unused one.
tempfile.mkstemp(suffix="", prefix=template, dir=None, text=False)
(The tempfile module is written in Python, you can see the code for it in \Lib\tempfile.py )
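For instance, a minimal sketch of using mkstemp this way, assuming a directory named my_dir and a .dat suffix purely for illustration:

import os
import tempfile

# Create a uniquely named file directly in our own directory, with no
# "tmp" prefix; mkstemp guarantees the name was unused at creation time.
fd, path = tempfile.mkstemp(suffix=".dat", prefix="", dir="my_dir")
os.close(fd)  # keep the file itself; we only wanted the unique name
print(path)

Since mkstemp actually creates the file, there is no window in which another process can grab the same name.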
I highly suggest just using the approach from this comment on the same answer as the way of generating unique names.
No need to abuse mktemp for that (and it's deprecated anyway).
Keep in mind that mktemp only guarantees the file name doesn't exist at the moment of the call; immediately afterwards (or, say, after you delete all of your temp files and cache) the same name (or, in the case of mktemp, the same path) can be handed out again, so the same file can end up being created twice.
Using a random name is less likely to cause collisions in that case and has no downsides. You should, however, check for the small chance that a collision occurs, and generate a new name if it does.
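A minimal sketch of that idea, assuming random hex names checked against a target directory (gen_unique_name and target_dir are made-up names for illustration):

import os
import secrets

def gen_unique_name(target_dir, suffix=".dat"):
    # Retry in the (unlikely) case that the random name is already taken.
    while True:
        name = secrets.token_hex(8) + suffix
        if not os.path.exists(os.path.join(target_dir, name)):
            return name

Note that a check-then-create sequence still leaves a small race window; creating the file with os.open and the O_EXCL flag (which is exactly what mkstemp does) closes it.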
Related
I'm writing several related Python programs that need to access the same file; however, this file will be updated/replaced intermittently, and I need them all to access the new file. My current idea is to have a specific folder where the latest file is placed whenever it needs to be replaced, and I was curious how I could have Python select whatever text file is in the folder.
Or would I be better off creating a program with a class entirely dedicated to holding the information about the file, and having each program reference the file through that class? I could have the class use tkinter.filedialog to select a new file whenever necessary, and perhaps keep a text file that holds the path or name of the file I need to access, and have the other programs reference that.
Edit: I don't need to write to the file at all, just read from it. However, I would like to set it up so that I do not need to manually update the path to the file every time I run the program or the file is replaced.
Edit2: Changed title to suit the question more
If the requirement is to get the most recently modified file in a specific directory:
import os
mypath = r'C:\path\to\wherever'
myfiles = [(f, os.stat(os.path.join(mypath, f)).st_mtime) for f in os.listdir(mypath)]
mysortedfiles = sorted(myfiles, key=lambda x: x[1], reverse=True)
print('Most recently updated: %s' % mysortedfiles[0][0])
Basically, get a list of files in the directory, together with their modified time as a list of tuples, sort on modified date, then get the one you want.
It sounds like you're looking for a singleton pattern, which is a neat way of hiding a lot of logic into an 'only one instance' object.
This means the logic for identifying, retrieving, and delivering the file is all in one place, and your programs interact with it by saying 'give me the one instance of that thing'. If you need to alter how it identifies, retrieves, or delivers what that one thing is, you can keep that hidden.
It's worth noting that the singleton pattern can be considered an antipattern, since it's a form of global state; whether that's a deal breaker depends on the context of the program.
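A minimal sketch of that idea, assuming the one instance resolves the newest file in a known folder (FileSource and watch_dir are made-up names for illustration):

import os

class FileSource:
    _instance = None

    def __new__(cls, watch_dir="data"):
        # Only one instance is ever created; later calls return the same object.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.watch_dir = watch_dir
        return cls._instance

    def latest_path(self):
        # Identify the most recently modified file in the watched folder.
        paths = [os.path.join(self.watch_dir, n) for n in os.listdir(self.watch_dir)]
        files = [p for p in paths if os.path.isfile(p)]
        return max(files, key=os.path.getmtime)

    def read(self):
        with open(self.latest_path()) as f:
            return f.read()

Every program then just calls FileSource().read(), and the logic for locating the current file stays in one place.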
To "have python select whatever text file is in the folder", you could use the glob library to get a list of file(s) in the directory, see: https://docs.python.org/2/library/glob.html
You can also use os.listdir() to list all of the files in a directory, without matching pattern names.
Then, open() and read() whatever file or files you find in that directory.
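For instance, a small sketch along those lines (the folder name incoming is an assumption for illustration):

import glob

# Match every .txt file in the folder; with the scheme above there
# should only be one, the latest drop.
for path in glob.glob('incoming/*.txt'):
    with open(path) as f:
        contents = f.read()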
I am trying to create a function that will pop up a list of files that include the word "Module" (case insensitive).
I tried :lvim /Module/gj *.f90 when all the *.f90 files are in the current dir, but I failed to make a globpath()-like expansion so that I can include subdirs as well.
So I turned to Python. From Python, I am getting the list perfectly. Here is the Python code, which should show my goal:
#!/usr/bin/python
import os
import re

flsts = []
path = "/home/rudra/Devel/dream/"
print("All files==>")
for dirs, subdirs, files in os.walk(path):
    for tfile in files:
        if tfile.endswith('f90'):
            print(os.path.splitext(tfile)[0])
            text = open(os.path.join(dirs, tfile)).read()
            # case-insensitive, per the requirement above
            match = re.search("module", text, re.IGNORECASE)
            if match:
                flsts.append(os.path.splitext(tfile)[0])
print("The list to be used ==>")
print(flsts)
after having the list, I want a
complete(col('.'), flsts)
The problem is, I am unable to include it inside vim function.
May I kindly have some help, so that I can get a list from vim and use it in the complete function?
I have checked this as a possible solution, but unfortunately it is not.
Kindly help.
edit: More explanation
So, say, in my work-dir, I have:
$tree */*.f90
OLD/dirac.f90
OLD/environment.f90
src/constants.f90
src/gencrystal.f90
src/geninp.f90
src/init.f90
among them, only two have the word module in them:
$ grep Module */*.f90
OLD/dirac.f90: 10 :module mdirac
src/constants.f90: 2 :module constants
So I want, with an inoremap, complete() to pop up only constants and dirac.
Hence, Module is the keyword I am searching for in the subdirs of the present working directory, and only the files that match (dirac and constants in this example) should pop up in complete().
I'm not sure what your exact problem is.
With split(globpath('./**/Module/**', '*.f90'), '\n') you will obtain the list of all files that match *.f90, and which are somewhere within a directory named Module.
Then, using complete() has a few restrictions. It has to be from a function that will be called from insert mode, and that returns an empty string.
By itself, complete() will insert the selected text; if we play with the {startcol} parameter, we can even remove what's before the cursor. This way, you can type Module, hit the key you want, and use Module to filter.
function! s:Complete()
  " From lh-vim-lib: word_tools.vim
  let key = GetCurrentKeyword()
  " globpath(): look for *.vim files under directories matching the keyword
  let files = split(globpath('./**/*'.key.'*/**', '*.vim'), '\n')
  call complete(col('.') - len(key), files)
  return ''
endfunction
inoremap µ <c-R>=<sid>Complete()<cr>
However, if you want to trigger an action (instead of inserting text), it becomes much more complex. I did that in muTemplate. I've published the framework used to associate hooks to completion items in lh-vim-lib (See lh#icomplete#*() functions).
EDIT: OK, then, I'll work with let files=split(system("grep --include=*.f90 -Ril module *"), '\n') to obtain the list of files, then call complete(col('.'), files) with that list. That should be the most efficient solution. It is quite similar to Ingo's solution; the difference is that we don't need Python if grep is available.
Regarding Python integration, well it's possible with :py vim.command(). See for instance jira-complete that integrates complete() with a Python script that builds the completion-list: https://github.com/mnpk/vim-jira-complete/blob/master/autoload/jira.vim#L116
Notes:
if "module:" can be pre-searched with ctags, it will to possible to extract your files from tags database with taglist().
It's also possible to fill dynamically the list of files with complete_add(), which is something that would make sense from a python script that tests each file one after the other.
There's an example at :help complete() that you can adapt. If you modify your Python script to output just the (newline-separated) files, you can invoke it via system():
inoremap <F5> <C-R>=FindFiles()<CR>
function! FindFiles()
  call complete(col('.'), split(system('python path/to/script.py'), '\n'))
  return ''
endfunction
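A sketch of the script modified as suggested, based on the os.walk version from the question, printing just the newline-separated names (the path is the same placeholder used there):

#!/usr/bin/python
import os
import re

path = "/home/rudra/Devel/dream/"
for dirs, subdirs, files in os.walk(path):
    for tfile in files:
        if tfile.endswith('f90'):
            text = open(os.path.join(dirs, tfile)).read()
            if re.search("module", text, re.IGNORECASE):
                # one name per line, as expected by split(system(...), '\n')
                print(os.path.splitext(tfile)[0])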
The question might sound strange because I know I'm enforcing a strange situation: it came up by accident (a bug, one might say) and I even know how to avoid it, so please skip that part.
I would really like to understand the behaviour I see.
The point of the function is to add all files with a given prefix in a directory to an archive. I noticed that even despite a "bug", the program works correctly (sic!). I wanted to understand why.
The code is fairly simple so I allow myself to post whole function:
import os
import tarfile

def pack(prefix, custom_meta_files = []):
    postfix = 'tgz'
    if prefix[-1] != '.':
        postfix = '.tgz'
    archive = tarfile.open(prefix+postfix, "w:gz")
    files = filter(lambda path: path.startswith(prefix), os.listdir())
    #print('files: {0}'.format(list(files)))
    for file in files:
        print('packing `{0}`'.format(file))
        archive_name = file[len(prefix):] #skip prefix + dot
        archive.add(file, archive_name)
    not_doubled_metas = set(custom_meta_files) - set(archive.getnames())
    print('metas to add: {0}'.format(not_doubled_metas))
    for meta in not_doubled_metas:
        print('packing `{0}`'.format(meta))
        archive.add(meta)
    print('contents:{0}'.format(archive.getnames()))
As one can see, I create the archive with the prefix, and then build the list of files to pack by listing everything in the cwd and filtering it via the lambda. Naturally, the archive itself passes the filter. There is also a snippet to add fixed files if the names do not overlap, although I don't think it is important here.
So the output from such run is e.g:
packing `ga_run.seq_niche.N30.1.bt0_5K.params`
packing `ga_run.seq_niche.N30.1.bt0_5K.stats`
packing `ga_run.seq_niche.N30.1.bt0_5K.tgz`
metas to add: {'stats.meta'}
packing `stats.meta`
contents:['params', 'stats', 'stats.meta']
So the script tried adding the archive to itself, yet the archive does not appear in the final contents. I do not know what the expected behaviour is, but there is no warning at all, and the documentation does not mention anything. I read the parts about the methods for adding members and searched for "itself" and "same name".
I would assume it is automatically skipped, but I don't know how to actually check that. I would personally have expected it to add a zero-length file as a member; however, I understand the skipping, as it actually makes more sense.
Question: Is it desired behaviour for tarfile.add() to ignore adding the archive to itself? Where is this documented?
Scanning the tarfile.py code from 3.2 back to 2.4, they all have code similar to:
# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
    self._dbg(2, "tarfile: Skipped %r" % name)
    return
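To actually observe that skip, you can raise the debug level; self._dbg(2, ...) only prints when debug is at least 2. A minimal sketch (the file name self_test.tgz is just for illustration):

import tarfile

# debug=2 makes TarFile print its internal messages, including
# the "tarfile: Skipped ..." line quoted above.
archive = tarfile.open("self_test.tgz", "w:gz", debug=2)
archive.add("self_test.tgz")  # prints: tarfile: Skipped 'self_test.tgz'
archive.close()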
I am automatically generating filenames and I do not want there to be an overwrite. I am lazily using this little line of code
fd, filepath = tempfile.mkstemp(ext, prefix='odt_img_', dir=self.destPath)
os.close(fd) # just using the name and overwriting later
Later on I write to filepath, but I am not sure if mkstemp just adds some random letters or if it actually makes sure the name is unique.
tempfile.mkstemp only guarantees to create and open a new file with a name that does not exist. From the docs:
Creates a temporary file in the most secure manner possible. There are no race conditions in the file’s creation, assuming that the platform properly implements the os.O_EXCL flag for os.open().
and the O_EXCL flag specifies:
Ensure that this call creates the file: if this flag is specified in conjunction with O_CREAT, and the filename already exists, then open() will fail.
Internally, mkstemp just loops over a random sequence of candidate names, trying to create a file that does not exist yet, until it either succeeds or runs out of "ideas", in which case it fails with an IOError.
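A rough sketch of the shape of that loop, not the actual stdlib code, assuming random hex names in a target directory:

import os
import secrets

def my_mkstemp(dir, suffix="", tries=100):
    for _ in range(tries):
        path = os.path.join(dir, secrets.token_hex(8) + suffix)
        try:
            # O_CREAT | O_EXCL: fail atomically if the name already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o600)
            return fd, path
        except FileExistsError:
            continue  # name taken; try another
    raise IOError("could not create a unique file")

So yes: the name returned by your mkstemp call is guaranteed unique at creation time, not merely random.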
All,
I am working on creating an interface for dealing with some massive data and generating arff files to do some machine learning with. I can currently collect the features, but I have no way of associating them with the files they were derived from. I am currently using Dumbo:
def mapper(key, value):
    # do stuff to generate features
Is there any convenient method for determining the filename that was opened and had its contents passed to the mapper function?
Thanks again.
-Sam
If you're able to access the job configuration properties, then the mapreduce.job.input.file property should contain the file name of the current file.
I'm not sure how you get at these properties in Dumbo/mrjob, though. The docs specify that periods (in the conf names) are replaced with underscores, and looking through the source of PipeMapRed.java, it looks like every single job conf property is set as an environment variable, so try accessing an env variable named mapreduce_job_input_file (a sketch follows the link below).
http://hadoop.apache.org/mapreduce/docs/r0.21.0/mapred_tutorial.html#Configured+Parameters
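A minimal sketch of that lookup inside the mapper (it's an assumption that the variable is populated in your particular streaming setup):

import os

def mapper(key, value):
    # Hadoop streaming exports job conf properties as environment
    # variables, with dots replaced by underscores.
    input_file = os.environ.get('mapreduce_job_input_file', 'unknown')
    yield input_file, value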
As described here, you can use the -addpath yes option.
-addpath yes (replace each input key by a tuple consisting of the path of the corresponding input file and the original key)
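A small sketch of unpacking that key in the mapper, assuming the tuple shape described above:

def mapper(key, value):
    # With "-addpath yes", the key is (input_file_path, original_key).
    path, original_key = key
    yield path, value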