I would like to perform a sort of "manual" batch operation where Python looks in a directory, sees a list of files, then automatically displays them one at a time and waits for user input before moving on to the next file. I am going to assume the files have relatively random names (and the order in which Python chooses to display them doesn't really matter).
So, I might have pic001.jpg and myCalendar.docx. Is there a way to have Python move through these (in any order) so that I can prepend something to each one manually? For instance, it could look like
Please type a prefix for each of the following:
myCalendar.docx:
and when I typed "2014" the file would become 2014_myCalendar.docx. Python would then go on to say
Please type a prefix for each of the following:
myCalendar.docx: 2014
... myCalendar.docx renamed to 2014_myCalendar.docx
pic001.jpg:
then I could make it disneyland_pic001.jpg.
I know how to rename files, navigate directories, etc. I'm just not sure how to get Python to cycle through every file in a certain directory, one at a time, and let me modify each one. I think this would be really easy to do with a for loop if each of the files was numbered, but for what I'm trying to do, I can't assume that they will be.
Thank you in advance.
Additionally, if you could point me to some tutorials or documentation that might help me with this, I'd appreciate that as well. I've got http://docs.python.org open in a few tabs, but as someone who's relatively new to Python, and programming in general, I find their language to be a little over my head sometimes.
Something like this (untested):
import os

DIR = '/Volumes/foobar'
prefix = raw_input('Please type a prefix for each of the following: ')

for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    new_path = os.path.join(DIR, '%s%s' % (prefix, f))
    try:
        os.rename(path, new_path)
        print 'renamed', f
    except OSError:
        raise
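Note that the snippet above asks for the prefix once, while the question asks for a prompt per file with an underscore between prefix and name. A sketch of that interactive loop (Python 3 here; the function name is mine, and the ask parameter exists only so the loop is easy to test without typing):

```python
import os

def prefix_files(directory, ask=input):
    """Prompt for a prefix for each file in directory and rename it
    to '<prefix>_<name>'; an empty answer leaves the file alone."""
    print('Please type a prefix for each of the following:')
    for name in sorted(os.listdir(directory)):
        prefix = ask('%s: ' % name)
        if not prefix:
            continue
        new_name = '%s_%s' % (prefix, name)
        os.rename(os.path.join(directory, name),
                  os.path.join(directory, new_name))
        print('... %s renamed to %s' % (name, new_name))
```

Called as plain `prefix_files('/Volumes/foobar')`, it prompts on stdin exactly as in the question's example dialogue.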
I didn't really find anything that suits my needs, and I am not sure how to proceed.
I have lots of photos scattered across different folders, and many of them are just duplicates. For example:
20180514.jpg(1).jpg
20180514.jpg
or
20180514(1).jpg
Now, I want to create a Python script which looks for files with a parenthesis in the name, checks whether a related file without the parenthesis exists, and deletes the file with the parenthesis. With my lack of Python skills, I managed to search for all such files with wildcards:
parenthesisList = glob.glob('**/*(*)*', recursive=True)
and could theoretically delete them from there, but with 30k+ pictures, I wouldn't dare just delete them without knowing whether they really have an original file or not.
The tricky part now is to compare that list to another list of every file, something like:
everythingList = glob.glob('**/*', recursive=True)
and to evaluate which of the files in parenthesisList have a counterpart with the same name, minus the parenthesis. Bonus points for only deleting the file if its size is the same or less than the original's, but I don't really need that. Thanks for any help!
EDIT: my post sounds like I want someone to do it for me, but if it wasn't clear, my question is: how do you check whether, for each item in parenthesisList, the same name without the parenthesis exists in everythingList?
import os

for filename in os.listdir(r"C:\Users\...\Pictures"):
    # check if the file name contains a parenthesis
    if "(" in filename:
        os.remove(os.path.join(r"C:\Users\...\Pictures", filename))
Note that os.listdir also returns folder names, so this will also hit folders whose names contain "(" (os.remove will raise an error on a directory).
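The snippet above deletes every name containing a parenthesis, which isn't quite what was asked: the question wants a file deleted only if its original exists. A minimal sketch of just that check (the function names and the regex patterns are mine, tuned to the two example name shapes in the question):

```python
import re

def possible_originals(name):
    """Return candidate original names for a duplicate like
    '20180514(1).jpg' or '20180514.jpg(1).jpg' by removing the '(n)'
    part (and, for the second form, the duplicated extension)."""
    return {
        re.sub(r'\(\d+\)', '', name),        # '20180514(1).jpg' -> '20180514.jpg'
        re.sub(r'\(\d+\)\.\w+$', '', name),  # '20180514.jpg(1).jpg' -> '20180514.jpg'
    }

def find_deletable(parenthesis_list, everything_list):
    """Return only those duplicates whose original actually exists."""
    existing = set(everything_list)
    return [dup for dup in parenthesis_list
            if possible_originals(dup) & existing]
```

Only the names returned by find_deletable would then be passed to os.remove, so nothing is deleted unless a matching original is really present.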
I have found a few approaches to search for the newest file created by a user in a directory, but I need to determine whether an easier approach exists. Most posts on this topic work in some instances or have major hurdles, so I am hoping to unmuddy the water.
I am having difficulty searching a growing file system, and bringing in more users introduces more potential errors.
I get data from a Superlogics Winview CP 32 for a continuously streaming system. On each occasion of use of the system, I have the operator input a unique identifier for the file name containing a few of the initial conditions of the system we need to track. I would like to get that file name with no help from the operator/user.
Eventually, the end goal is to pare down a list of files I want to search, filtered based on keys, so my first instinct was to use only matching file types, trim all folders in a pathway into a list, and sort based on max timestamp. I used some pretty common functions from these pages:
import os
import fnmatch

def fileWalkIn(path='.', matches=[], filt='*.csv'):  # Useful for walking through a given directory
    """Iterates through all files under the given path using a filter."""
    for root, dirnames, filenames in os.walk(path):
        for filename in fnmatch.filter(filenames, filt):
            matches.append(os.path.join(root, filename))
            yield os.path.join(root, filename)

def getRecentFile(path='.', matches=[], filt='*.dat'):
    rr = max(fileWalkIn(path=path, matches=matches, filt=filt),
             key=os.path.getmtime)
    return rr
This got me far, but it is rather bulky and slow, which means I cannot run it repeatedly if I want to explore the matching files, lest I have to carry around a bulky list of them.
Ideally, I will be able to process the data on the fly, executing and printing live while it writes, so this approach is not usable in that instance.
I borrowed a new approach from these pages, by Alex Martelli, which does not use a filter, gives the option of returning files as well as directories, is much slimmer than fileWalkIn, and works more quickly when using the timestamp.
def all_subdirs_of(b='.'):  # Useful for walking through a given directory
    # Collect both the files and the directories in the parent directory
    results = []
    for d in os.listdir(b):
        bd = os.path.join(b, d)
        if os.path.isfile(bd) or os.path.isdir(bd):
            results.append(bd)
    # return both
    return results

def newest(path='.'):
    rr = max(all_subdirs_of(b=path), key=os.path.getmtime)
    return rr
def getActiveFile(newFile='.'):
    while os.path.exists(newFile):
        newFile = newest(newFile)
        if os.path.isfile(newFile):
            return newFile
        else:
            if newFile:
                continue
            else:
                return newFile
This gets me the active file in a directory much more quickly, but only if no other files have been written since launching my data collection. I can see all kinds of problems here and need some help determining whether I have gone down a rabbit hole, or whether a simpler solution exists, like testing file sizes, or a more cohesive solution with fewer potential snags.
I found answers for other languages (Java, how-to-get-the-path-of-a-running-jar-file), but I need something in Python. I have explored packages like watchdog and win32, but both have steep learning curves, and I feel like I am either very close, or need to change my paradigm entirely.
dircache might speed up the second approach a bit (note that dircache exists only in Python 2; it was removed in Python 3). It's a wrapper around listdir that checks the directory timestamp and only re-reads the directory contents if there has been a change.
Beyond that, you really need something that listens to file system events. A quick Google search turned up two pip packages: pyinotify, which is Linux only, and watchdog.
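If pulling in a package is overkill, the "newest file" check can also be done in a single pass with just the standard library; this is only a sketch (the function name and the .dat default are mine, mirroring the getRecentFile filter above):

```python
import os

def newest_file(root, suffix=".dat"):
    """Walk root once and return the matching path with the latest
    modification time, or None if nothing matches."""
    candidates = (
        os.path.join(dirpath, name)
        for dirpath, dirnames, filenames in os.walk(root)
        for name in filenames
        if name.endswith(suffix)
    )
    # max(..., default=None) needs Python 3.4+
    return max(candidates, key=os.path.getmtime, default=None)
```

Polling this in a loop is still O(number of files) per call, which is exactly the cost a watcher like watchdog avoids, but it keeps no list around between calls.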
Hope this helps.
The question might sound strange because I know I am enforcing a strange situation. It came up by accident (a bug, one might say) and I even know how to avoid it, so please skip that part.
I would really like to understand the behaviour I see.
The point of the function is to add all files with a given prefix in a directory to an archive. I noticed that, even despite the "bug", the program works correctly (sic!). I wanted to understand why.
The code is fairly simple, so I allow myself to post the whole function:
import os
import tarfile

def pack(prefix, custom_meta_files=[]):
    postfix = 'tgz'
    if prefix[-1] != '.':
        postfix = '.tgz'
    archive = tarfile.open(prefix + postfix, "w:gz")
    files = filter(lambda path: path.startswith(prefix), os.listdir())
    #print('files: {0}'.format(list(files)))
    for file in files:
        print('packing `{0}`'.format(file))
        archive_name = file[len(prefix):]  # skip prefix + dot
        archive.add(file, archive_name)
    not_doubled_metas = set(custom_meta_files) - set(archive.getnames())
    print('metas to add: {0}'.format(not_doubled_metas))
    for meta in not_doubled_metas:
        print('packing `{0}`'.format(meta))
        archive.add(meta)
    print('contents:{0}'.format(archive.getnames()))
As one can notice, I create the archive with the prefix, then build the list of files to pack by listing everything in the cwd and filtering it via the lambda. Naturally the archive itself passes the filter. There is also a snippet to add fixed meta files if the names do not overlap, although I don't think that part is important.
So the output from such a run is, e.g.:
packing `ga_run.seq_niche.N30.1.bt0_5K.params`
packing `ga_run.seq_niche.N30.1.bt0_5K.stats`
packing `ga_run.seq_niche.N30.1.bt0_5K.tgz`
metas to add: {'stats.meta'}
packing `stats.meta`
contents:['params', 'stats', 'stats.meta']
So the script tried adding the archive to itself, yet the archive does not appear in the final contents. I do not know what the expected behaviour is, but there is no warning at all and the documentation does not mention anything. I read the parts about the methods that add members and searched for "itself" and "same name".
I would assume it is automatically skipped, but I don't know how to actually check that. I would personally have expected it to add a zero-length file as a member, though I understand the skipping, as it actually makes more sense.
Question: Is it intended behaviour for tarfile.add() to ignore adding the archive to itself? Where is that documented?
Scanning the tarfile.py code from 3.2 back to 2.4, they all have code similar to:

    # Skip if somebody tries to archive the archive...
    if self.name is not None and os.path.abspath(name) == self.name:
        self._dbg(2, "tarfile: Skipped %r" % name)
        return
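The skip is easy to observe directly; a small self-contained check (the file names here are made up, and everything happens in a scratch directory):

```python
import os
import tarfile
import tempfile

# Work in a temporary directory so nothing real is touched.
workdir = tempfile.mkdtemp()
os.chdir(workdir)

with open("data.txt", "w") as f:
    f.write("hello")

archive = tarfile.open("data.tgz", "w:gz")
archive.add("data.txt")
archive.add("data.tgz")    # silently skipped: an archive never adds itself
print(archive.getnames())  # ['data.txt']
archive.close()
```

No warning is printed unless the TarFile debug level is raised (e.g. archive.debug = 2, which makes the "tarfile: Skipped" message from the source above visible).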
I'm trying to get a homemade path navigation function working - basically I need to go through one folder, and explore every folder within it, running a function within each folder.
I reach a problem when I try to change directories within a for loop. I've got this "findDirectories" function:
def findDirectories(list):
    for files in os.listdir("."):
        print(files)
        list.append(files)
        os.chdir("y")
That last line causes the problems. If I remove it, the function just compiles a list with all the folders in that folder. Unfortunately, this means I have to run this each time I go down a folder, I can't just run the whole thing once. I've specified the folder "y" as that's a real folder, but the program crashes upon opening even with that. Doing os.chdir("y") outside of the for loop has no issues at all.
I'm new to Python, but not to programming in general. How can I get this to work, or is there a better way? The final result I need is to run a function on every "*Response.xml" file that exists within this folder, no matter how deeply nested it is.
Well, you don't post the traceback of the actual error, but clearly it doesn't work because you have specified y as a relative path. Thus it may be able to change to y in the first iteration of the loop, but in the second it will be trying to change to a subdirectory of y that is also called y, which you probably do not have.
You want to be doing something like
import os

for dirName, subDirs, fileNames in os.walk(rootPath):
    # it's not clear which files you want; I assume anything that ends with Response.xml?
    for f in fileNames:
        if f.endswith("Response.xml"):
            # this is the path you will want to use
            filePath = os.path.join(dirName, f)
            # now do something with it!
            doSomethingWithFilePath(filePath)
That's untested, but you get the idea...
As Dan said, os.walk would be better. See the example there.
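For reference, the same traversal can also be written with pathlib's recursive glob (Python 3.4+; the function name is mine, and doSomethingWithFilePath is the hypothetical handler from the answer above):

```python
from pathlib import Path

def find_response_files(root):
    """Return every *Response.xml file anywhere under root, sorted by path."""
    return sorted(Path(root).rglob("*Response.xml"))

# for path in find_response_files(rootPath):
#     doSomethingWithFilePath(path)
```

rglob("*Response.xml") does the walk and the suffix match in one call, so there is no need for the nested loops or the endswith check.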
Scenario: When I photograph an object, I take multiple images from several angles. Multiplied by the number of objects I shoot, I can generate a large number of images. Problem: the camera generates images identified as 'DSCN100001', 'DSCN100002', etc. Cryptic.
I put together a script that prompts for a directory specification (Windows) as well as a prefix. The script reads each file's creation date and time and renames the file accordingly, with the prefix added to the front of the file name. So 'DSCN100002.jpg' can become "FatMonkey 20110721 17:51:02". The time detail is important to me for chronology.
The script follows. Please tell me whether it is Pythonic, whether or not it is poorly written and, of course, whether there is a cleaner, more efficient way of doing this. Thank you.
import os
import datetime

target = raw_input('Enter full directory path: ')
prefix = raw_input('Enter prefix: ')

os.chdir(target)
allfiles = os.listdir(target)
for filename in allfiles:
    t = os.path.getmtime(filename)
    v = datetime.datetime.fromtimestamp(t)
    x = v.strftime('%Y%m%d-%H%M%S')
    os.rename(filename, prefix + x + ".jpg")
The way you're doing it looks Pythonic. A few alternatives (not necessarily suggestions):
You could skip os.chdir(target) and do os.path.join(target, filename) in the loop.
You could do strftime('{0}-%Y-%m-%d-%H-%M-%S.jpg'.format(prefix)) to avoid string concatenation (hyphens rather than colons, since colons aren't valid in Windows file names). This is the only one I'd recommend.
You could reuse a variable name like temp_date instead of t, v, and x. This would be OK.
You could skip storing temporary variables and just do:
for filename in os.listdir(target):
    os.rename(filename, datetime.datetime.fromtimestamp(
        os.path.getmtime(filename)).strftime(
            '{0}-%Y-%m-%d-%H-%M-%S.jpeg'.format(prefix)))
You could generalize your function to work for recursive directories by using os.walk().
You could detect the file extension of files so it would be correct not just for .jpegs.
You could make sure you only renamed files of the form DSCN1#####.jpeg
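Putting a few of those suggestions together, a sketch (Python 3 here; the helper name rename_with_timestamp is mine) that joins paths instead of chdir-ing and keeps each file's original extension rather than assuming .jpg:

```python
import datetime
import os

def rename_with_timestamp(target, prefix):
    """Rename every file in target to '<prefix>-YYYYmmdd-HHMMSS<ext>',
    preserving each file's original extension."""
    for filename in os.listdir(target):
        old_path = os.path.join(target, filename)
        if not os.path.isfile(old_path):
            continue  # skip subdirectories
        stamp = datetime.datetime.fromtimestamp(
            os.path.getmtime(old_path)).strftime('%Y%m%d-%H%M%S')
        ext = os.path.splitext(filename)[1]
        new_name = '{0}-{1}{2}'.format(prefix, stamp, ext)
        os.rename(old_path, os.path.join(target, new_name))
```

One caveat: two files with identical timestamps and extensions would collide on the same new name, so a real version would want a counter suffix as well.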
Your code is nice and simple. A few possible improvements I can suggest:
Command-line arguments are preferable to prompts for directory names, because of tab autocompletion.
EXIF is a more accurate source for the date and time a photo was created. If you modify a photo in an image editor, the modification time will change while the EXIF information is preserved. Here is a discussion about EXIF libraries for Python: Exif manipulation library for python
My only thought is that if you are going to have the computer do the work for you, let it do more of the work. My assumption is that you will shoot one object several times, then either move to another object or move another object into place. If so, you could consider grouping the photos by how close their timestamps are together (say, any gap over 2 minutes starts a new object). Then, based on these pseudo-clusters, you could name the photos by object.
May not be what you are looking for, but thought I'd add in the suggestion.
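A minimal sketch of that grouping idea (the function name and the 2-minute default are illustrative, not from the suggestion above):

```python
def cluster_timestamps(timestamps, gap=120):
    """Group timestamps (seconds) into clusters; a jump larger than
    `gap` seconds starts a new cluster, i.e. a new object."""
    clusters = []
    for t in sorted(timestamps):
        if clusters and t - clusters[-1][-1] <= gap:
            clusters[-1].append(t)  # close enough: same object
        else:
            clusters.append([t])    # big gap: start a new object
    return clusters
```

Feeding it the os.path.getmtime values of the photos, each resulting cluster could then be named "object1", "object2", and so on.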