Python File Creation Date & Rename - Request for Critique - python

Scenario: When I photograph an object, I take multiple images from several angles. Multiplied by the number of objects I "shoot", I can generate a large number of images. Problem: the camera generates images identified as 'DSCN100001', 'DSCN100002', etc. Cryptic.
I put together a script that prompts for a directory (Windows) as well as a "Prefix". The script reads each file's creation date and time and renames the file accordingly, with the prefix added to the front of the file name. So 'DSCN100002.jpg' can become 'FatMonkey 20110721 17:51:02'. The time detail is important to me for chronology.
The script follows. Please tell me whether it is Pythonic, whether or not it is poorly written and, of course, whether there is a cleaner, more efficient way of doing this. Thank you.
import os
import datetime

target = raw_input('Enter full directory path: ')
prefix = raw_input('Enter prefix: ')
os.chdir(target)
allfiles = os.listdir(target)
for filename in allfiles:
    t = os.path.getmtime(filename)
    v = datetime.datetime.fromtimestamp(t)
    x = v.strftime('%Y%m%d-%H%M%S')
    os.rename(filename, prefix + x + '.jpg')

The way you're doing it looks Pythonic. A few alternatives (not necessarily suggestions):
You could skip os.chdir(target) and do os.path.join(target, filename) in the loop.
You could do strftime('{0}-%Y-%m-%d-%H:%M:%S.jpg'.format(prefix)) to avoid string concatenation. This is the only one I'd recommend.
You could reuse a variable name like temp_date instead of t, v, and x. This would be OK.
You could skip storing temporary variables and just do:

for filename in os.listdir(target):
    os.rename(filename, datetime.datetime.fromtimestamp(
        os.path.getmtime(filename)).strftime(
            '{0}-%Y-%m-%d-%H:%M:%S.jpeg'.format(prefix)))
You could generalize your function to work for recursive directories by using os.walk().
You could detect the file extension of files so it would be correct not just for .jpegs.
You could make sure you only rename files of the form DSCN1#####.jpeg.

Your code is nice and simple. A few possible improvements I can suggest:
Command-line arguments are preferable for directory names because of TAB autocompletion.
EXIF is a more accurate source of the date and time a photo was created. If you modify a photo in an image editor, the modification time will change while the EXIF information is preserved. Here is a discussion about EXIF libraries for Python: Exif manipulation library for python
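A minimal sketch of the command-line-argument suggestion using argparse from the standard library (the argument names are illustrative assumptions):

```python
import argparse

def parse_args(argv=None):
    # Build a parser for the two inputs the script currently prompts for.
    parser = argparse.ArgumentParser(
        description='Rename photos by creation time.')
    parser.add_argument('directory', help='directory containing the images')
    parser.add_argument('prefix', help='prefix to prepend to each new name')
    return parser.parse_args(argv)
```

Invoking the script as e.g. `python rename.py C:\photos FatMonkey` then lets the shell TAB-complete the directory path.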

My only thought is that if you are going to have the computer do the work for you, let it do more of the work. My assumption is that you are going to shoot one object several times, then either move to another object or move another object into place. If so, you could consider grouping the photos by how close the timestamps are together (maybe any delta over 2 minutes is considered a new object). Then based on these pseudo clusters, you could name the photos by object.
May not be what you are looking for, but thought I'd add in the suggestion.
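That grouping idea can be sketched as a small pure function, assuming the 2-minute delta mentioned above (the names are mine):

```python
def cluster_by_gap(timestamps, max_gap=120):
    """Group sorted POSIX timestamps; a gap over max_gap seconds
    starts a new cluster (i.e. a new photographed object)."""
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)   # close enough: same object
        else:
            clusters.append([t])     # large gap: new object
    return clusters
```

Each resulting cluster could then be given its own object name, with the per-photo timestamp appended for chronology.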


Run only if "if" statement is true

So I have a question. I am reading the fits file and then using the information from the header of the fits to define the other files which are related to the original fits file. But for some of the fits files, the other files (blaze_file, bis_file, ccf_table) are not available, and because of that my code gives the fairly obvious error No such file or directory.
import pandas as pd
import sys, os
import numpy as np
from glob import glob
from astropy.io import fits

PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        e2ds_hdu = fits.open(filename)
        e2ds_header = e2ds_hdu[0].header
        date = e2ds_header['DATE-OBS']
        date2 = date = date[0:19]
        blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
        bis_file = glob('HARPS.' + date2 + '*_bis_G2_A.fits')
        ccf_table = glob('HARPS.' + date2 + '*_ccf_G2_A.tbl')
        if not all(file in os.listdir(PATH) for file in [blaze_file, bis_file, ccf_table]):
            continue
So what I want to do is make my code run only if all the files are available, and otherwise skip. But the problem is that I define the other files as variables inside the for loop, since I use the header information. So how can I define them before the for loop and then use something like that?
So can anyone help me out of this?
The filenames returned by os.listdir() are always relative to the path given there.
In order to be used, they have to be joined with this path.
Example:
PATH = os.path.join("home", "Desktop", "2d_spectra")
for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        filepath = os.path.join(PATH, filename)
        e2ds_hdu = fits.open(filepath)
        …
Let the filenames be ['a', 'b', 'a_e2ds_A.fits', 'b_e2ds_A.fits']. The code excludes the first two names and then prepends the file path to the remaining two.
a_e2ds_A.fits becomes /home/Desktop/2d_spectra/a_e2ds_A.fits and
b_e2ds_A.fits becomes /home/Desktop/2d_spectra/b_e2ds_A.fits.
Now they can be accessed from everywhere, not just from the given file path.
I should become accustomed to reading a question in full before trying to answer it.
The problem I mentioned only shows up if you start the script from a path outside the said directory. Nevertheless, applying the fix will make your code much more consistent.
Your real problem, however, lies somewhere else: you examine a file and then, after checking its contents, want to read files whose names depend on information from that first file.
There are several ways to accomplish your goal:
Just extend your loop with the proper tests.
Pseudo code:
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if all files exist:
            proceed
or
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if not all files exist:
            continue  # actual keyword, no pseudo code!
        proceed
Put some functionality into functions (a variation of 1.)
Create a loop in a generator function which yields the "interesting information" of one fits file (or alternatively nothing) and have another loop run over it to actually work with the data.
If I am still missing some points or am not detailed enough, please let me know.
Since you have to read the fits file to know the names of the other dependent files, there's no way you can avoid reading the fits file first. The only thing you can do is test for the dependent files' existence before trying to read them, and skip the rest of the loop (using continue) if they are missing.
Edit this line
e2ds_hdu = fits.open(filename)
And replace with
e2ds_hdu = fits.open(os.path.join(PATH, filename))

How can I improve performance of finding all files in a folder created on a certain date?

There are 10,000 files in a folder. Some files were created on 2018-06-01, some on 2018-06-09, and so on.
I need to find all files created on 2018-06-09, but it is taking too much time (almost 2 hours) to read each file, get its creation date, and keep the ones created on 2018-06-09.
for file in os.scandir(Path):
    if file.is_file():
        file_ctime = datetime.fromtimestamp(os.path.getctime(file)).strftime('%Y-%m-%d %H:%M:%S')
        if file_ctime[0:10] == '2018-06-09':
            # ...
You could try using os.listdir(path) to get all the files and dirs from the given path.
Once you have all the files and directories you could use filter and a lambda function to create a new list of only the files with the desired timestamp.
You could then iterate through that list to do what work you need to on the correct files.
Let's start with the most basic thing: why are you building a datetime only to re-format it as a string and then do a string comparison?
Then there is the whole point of using os.scandir() over os.listdir(): os.scandir() returns os.DirEntry objects, which cache file stats through the os.DirEntry.stat() call.
Depending on the checks you need to perform, os.listdir() might even perform better if you expect to do a lot of filtering on the filename, as then you won't need to build up a whole os.DirEntry just to discard it.
So, to optimize your loop, if you don't expect a lot of filtering on the name:
for entry in os.scandir(Path):
    if entry.is_file() and 1528495200 <= entry.stat().st_ctime < 1528581600:
        pass  # do whatever you need with it
If you do, then better stick with os.listdir() as:
import stat

for entry in os.listdir(Path):
    # do your filtering on the entry name first...
    path = os.path.join(Path, entry)  # build path to the listed entry...
    stats = os.stat(path)  # cache the file entry statistics
    if stat.S_ISREG(stats.st_mode) and 1528495200 <= stats.st_ctime < 1528581600:
        pass  # do whatever you need with it
If you want to be flexible with the timestamps, use datetime.datetime.timestamp() beforehand to get the POSIX timestamps and then you can compare them against what stat_result.st_ctime returns directly without conversion.
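For instance (the date comes from the question; interpreting it in the local timezone is an assumption):

```python
from datetime import datetime

# POSIX timestamps for the start of 2018-06-09 and of 2018-06-10,
# in the local timezone; a file was created on 2018-06-09 iff
# day_start <= st_ctime < day_end.
day_start = datetime(2018, 6, 9).timestamp()
day_end = datetime(2018, 6, 10).timestamp()
```

entry.stat().st_ctime can then be compared against these bounds directly, with no per-file datetime or string round trip.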
However, even your original, non-optimized approach should be significantly faster than 2 hours for a mere 10k entries. I'd check the underlying filesystem, too, something seems wrong there.

Get file path of continuously updating file

I have found a few approaches to search for the newest file created by a user in a directory, but I need to determine whether an easier approach exists. Most posts on this topic work only in some instances or have major hurdles, so I am hoping to unmuddy the water.
I am having difficulty looking through a growing file system, as well as bringing more users in with more potential errors.
I get data from a Superlogics Winview CP 32 for a continuously streaming system. On each occasion of use of the system, I have the operator input a unique identifier for the file name containing a few of the initial conditions of the system we need to track. I would like to get that file name with no help from the operator/user.
Eventually, the end goal is to pare down a list of files I want to search, filtered based on keys, so my first instinct was to use only matching file types, trim all folders in a pathway into a list, and sort based on max timestamp. I used some pretty common functions from these pages:
def fileWalkIn(path='.', matches=[], filt='*.csv'):  # Useful for walking through a given directory
    """Iterates through all files under the given path using a filter."""
    for root, dirnames, filenames in os.walk(path):
        for filename in fnmatch.filter(filenames, filt):
            matches.append(os.path.join(root, filename))
            yield os.path.join(root, filename)

def getRecentFile(path='.', matches=[], filt='*.dat'):
    rr = max(fileWalkIn(path=path, matches=matches, filt=filt), key=os.path.getmtime)
    return rr
This got me far, but is rather bulky and slow, which means I cannot do this repeatedly if I want to explore the files that match, lest I have to carry around a bulky list of the matching files.
Ideally, I will be able to process the data on the fly, executing and printing live while it writes, so this approach is not usable in that instance.
I borrowed from these pages a new approach by alex-martelli, which does not use a filter, gives the option of handling files as opposed to directories, is much slimmer than fileWalkIn, and works more quickly when using the timestamp.
def all_subdirs_of(b='.'):  # Useful for walking through a given directory
    # Create hashable list of files or directories in the parent directory
    results = []
    for d in os.listdir(b):
        bd = os.path.join(b, d)
        if os.path.isfile(bd):
            results.append(bd)
        elif os.path.isdir(bd):
            results.append(bd)
    # return both
    return results

def newest(path='.'):
    rr = max(all_subdirs_of(b=path), key=os.path.getmtime)
    return rr

def getActiveFile(newFile='.'):
    while os.path.exists(newFile):
        newFile = newest(newFile)
        if os.path.isfile(newFile):
            return newFile
        else:
            if newFile:
                continue
            else:
                return newFile
This gets me the active file in a directory much more quickly, but only if no other files have been written since launching my data collection. I can see all kinds of problems here and need some help determining whether I have gone down a rabbit hole and a simpler solution exists, like testing file sizes, or whether a more cohesive solution with fewer potential snags exists.
I found other answers for different languages (java, how-to-get-the-path-of-a-running-jar-file), but would need something in Python. I have explored functions like watchdog and win32, but both require steep learning curves, and I feel like I am either very close, or need to change my paradigm entirely.
dircache might speed up the second approach a bit. It's a wrapper around listdir() that checks the directory timestamp and only re-reads directory contents if there has been a change. (Note that dircache exists only in Python 2; it was removed in Python 3.)
Beyond that you really need something that listens to file system events. A quick google turned up two pip packages: pyinotify (Linux only) and watchdog.
Hope this helps.

"Batch" renaming one file at a time in Python

I would like to perform a sort of "manual" batch operation where Python looks in a directory, sees a list of files, then automatically displays them one at a time and waits for user input before moving on to the next file. I am going to assume the files have relatively random names (and the order in which Python chooses to display them doesn't really matter).
So, I might have pic001.jpg and myCalendar.docx. Is there a way to have Python move through these (in any order) so that I can prepend something to each one manually? For instance, it could look like
Please type a prefix for each of the following:
myCalendar.docx:
and when I typed "2014" the file would become 2014_myCalendar.docx. Python would then go on to say
Please type a prefix for each of the following:
myCalendar.docx: 2014
... myCalendar.docx renamed to 2014_myCalendar.docx
pic001.jpg:
then I could make it disneyland_pic001.jpg.
I know how to rename files, navigate directories, etc. I'm just not sure how to get Python to cycle through every file in a certain directory, one at a time, and let me modify each one. I think this would be really easy to do with a for loop if each of the files was numbered, but for what I'm trying to do, I can't assume that they will be.
Thank you in advance.
Additionally, if you could point me to some tutorials or documentation that might help me with this, I'd appreciate that as well. I've got http://docs.python.org open in a few tabs, but as someone who's relatively new to Python, and programming in general, I find their language to be a little over my head sometimes.
Something like this (untested); the prompt goes inside the loop so each file gets its own prefix, as described:

import os

DIR = '/Volumes/foobar'
for f in os.listdir(DIR):
    prefix = raw_input('Please type a prefix for %s: ' % f)
    new_name = '%s_%s' % (prefix, f)
    os.rename(os.path.join(DIR, f), os.path.join(DIR, new_name))
    print f, 'renamed to', new_name

What is the expected behaviour of tarfile.add() when adding an archive to itself?

The question might sound strange because I know I am enforcing a strange situation. It came up by accident (a bug, one might say) and I even know how to avoid it, so please skip that part.
I would really like to understand the behaviour I see.
The point of the function is to add all files with a given prefix in a directory to an archive. I noticed that even despite the "bug", the program works correctly (sic!). I wanted to understand why.
The code is fairly simple, so I allow myself to post the whole function:
def pack(prefix, custom_meta_files=[]):
    postfix = 'tgz'
    if prefix[-1] != '.':
        postfix = '.tgz'
    archive = tarfile.open(prefix + postfix, "w:gz")
    files = filter(lambda path: path.startswith(prefix), os.listdir())
    # print('files: {0}'.format(list(files)))
    for file in files:
        print('packing `{0}`'.format(file))
        archive_name = file[len(prefix):]  # skip prefix + dot
        archive.add(file, archive_name)
    not_doubled_metas = set(custom_meta_files) - set(archive.getnames())
    print('metas to add: {0}'.format(not_doubled_metas))
    for meta in not_doubled_metas:
        print('packing `{0}`'.format(meta))
        archive.add(meta)
    print('contents:{0}'.format(archive.getnames()))
As one can notice, I create the archive with the prefix and then build the list of files to pack by listing everything in the cwd and filtering it via the lambda. Naturally the archive itself passes the filter. There is also a snippet to add fixed files if the names do not overlap, although that is not important, I think.
So the output from such run is e.g:
packing `ga_run.seq_niche.N30.1.bt0_5K.params`
packing `ga_run.seq_niche.N30.1.bt0_5K.stats`
packing `ga_run.seq_niche.N30.1.bt0_5K.tgz`
metas to add: {'stats.meta'}
packing `stats.meta`
contents:['params', 'stats', 'stats.meta']
So the script tried adding itself, yet the archive does not appear in the final contents. I do not know what the expected behaviour is, but there is no warning at all and the documentation does not mention anything. I read the parts about the methods for adding members and searched for "itself" and "same name".
I would assume it is automatically skipped, but I don't know how to actually check it. I would personally expect it to add a zero-length file as a member, though I understand the skipping, as it actually makes more sense.
Question: Is it desired behaviour in tarfile.add() to ignore adding the archive to itself? Where is it said?
Scanning the tarfile.py code from 3.2 down to 2.4, they all have code similar to:

# Skip if somebody tries to archive the archive...
if self.name is not None and os.path.abspath(name) == self.name:
    self._dbg(2, "tarfile: Skipped %r" % name)
    return
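A small sketch that exercises that guard (file names here are made up for the demonstration):

```python
import os
import tarfile
import tempfile

# Work in a scratch directory so relative names line up with the guard.
workdir = tempfile.mkdtemp()
os.chdir(workdir)
with open('data.txt', 'w') as f:
    f.write('payload')

archive = tarfile.open('self_test.tgz', 'w:gz')
archive.add('data.txt')
archive.add('self_test.tgz')  # silently skipped by the guard above
archive.close()

# Reopen and list the members that actually made it in.
with tarfile.open('self_test.tgz') as check:
    names = check.getnames()
print(names)
```

The reopened archive contains only data.txt; the archive itself was skipped exactly as the quoted code says, with nothing but a debug-level message.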
