renaming files to incremental values with correct digits - python

I would like to rename a list of pictures using the directory name as the root name (picture in this example), padding the existing numbering with the appropriate number of zeros based on the total number of files and the increment. I was thinking of using PowerShell or Python. Recommendations?
current 'C:\picture' directory contents
pic 1.jpg
...
pic 101.jpg
Result
picture 001.jpg
...
picture 101.jpg

Assuming
You already know how to traverse your directory
Access the file names in your script
Rename the files
Couple of Things to understand
Your file name has a format in which the number is padded with '0's if it has fewer than a certain number of digits (in your example, fewer than 3). str.format provides an elaborate format string specifier to achieve this.
You need to know how to get the relevant portions of your file name to be reformatted as required
The formatting would vary ultimately based on number of files.
Demo
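>>> import math
>>> # sample names, assumed here so that the demo runs on its own
>>> files = ['pic 1.jpg', 'pic 2.jpg', 'pic 10.jpg', 'pic 100.jpg']
>>> no_of_files = 100
>>> no_of_digits = int(math.log10(no_of_files)) + 1
>>> format_exp = "pictures {{:>0{}}}.{{}}".format(no_of_digits)
>>> for fname in files:
...     # Discard the irrelevant portion
...     fname = fname.rsplit()[-1]
...     print(format_exp.format(*fname.split('.')))
...
pictures 001.jpg
pictures 002.jpg
pictures 010.jpg
pictures 100.jpg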

Here's python solution:
import glob
import os
dirpath = r'c:\picture'
dirname = os.path.basename(dirpath)
filepath_list = glob.glob(os.path.join(dirpath, 'pic *.jpg'))
pad = len(str(len(filepath_list)))
for n, filepath in enumerate(filepath_list, 1):
    os.rename(
        filepath,
        os.path.join(dirpath, 'picture {:>0{}}.jpg'.format(n, pad))
    )
pad is calculated using file count len(filepath_list):
>>> len(str(100)) # if file count is 100
3
'picture {:>0{}}.jpg'.format(99, 3) is equivalent to 'picture {:>03}.jpg'.format(99). The format spec {:>03} zero-pads (0) and right-aligns (>) the input value (99 in the following example).
>>> 'picture {:>0{}}.jpg'.format(99, 3)
'picture 099.jpg'
>>> 'picture {:>03}.jpg'.format(99)
'picture 099.jpg'
Documentation for the functions used:
enumerate
glob.glob
os.path.basename
os.path.join
str.format
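The solution above renumbers the files in whatever order glob returns them. If each file should instead keep its own number, as in the question's expected result, a minimal sketch could look like the following. The helper name plan_renames is hypothetical, and taking the pad width from the largest existing number (rather than from the file count) is an assumption.

```python
import os
import re


def plan_renames(names, root):
    """Return (old, new) pairs that keep each file's own number, zero-padded."""
    numbered = []
    for name in names:
        stem, ext = os.path.splitext(name)
        match = re.search(r'(\d+)$', stem)  # trailing digits of the stem
        if match:
            numbered.append((name, int(match.group(1)), ext))
    if not numbered:
        return []
    # Assumption: the width comes from the largest number, not the file count.
    width = len(str(max(number for _, number, _ in numbered)))
    return [(name, '{} {:0{}d}{}'.format(root, number, width, ext))
            for name, number, ext in numbered]


print(plan_renames(['pic 1.jpg', 'pic 10.jpg', 'pic 101.jpg'], 'picture'))
```

Each pair can then be passed to os.rename after joining both names with the directory path. Printing the plan first gives a preview similar to PowerShell's -WhatIf.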

Here's a PowerShell solution:
$jpgs = Get-ChildItem C:\Picture\*.jpg
$numDigits = "$($jpgs.Length)".Length
$formatStr = "{0:$('0' * $numDigits)}"
$jpgs | Where {$_.BaseName -match '(\d+)'} |
    Rename-Item -NewName {$_.DirectoryName + '\' + $_.Directory.Name + ' ' + ($formatStr -f [int]$matches[1]) + $_.Extension} -WhatIf
Remove the -WhatIf parameter to actually execute the rename if the preview you get with -WhatIf looks good.

Related

How can I adjust my code to start with a new number?

import os
src = "/home/user/Desktop/images/"
ext = ".jpg"
for i, filename in enumerate(os.listdir(src)):
    # print(i,filename)
    if filename.endswith(ext):
        os.rename(src + filename, src + str(i) + ext)
        print(filename, src + str(i) + ext)
    else:
        os.remove(src + filename)
This code renames all the images in a folder, starting with 0.jpg, 1.jpg, etc., and removes non-JPG files. But what if I already had some images in that folder? Let's say I had images 0.jpg, 1.jpg, 2.jpg, and then I added a few others called im5.jpg and someImage.jpg.
What I want to do is adjust the code to read the number of the last image, in this case 2, and start counting from 3.
In other words, I'll ignore the already labeled images and proceed with the new ones, counting from 3.
Terse and semi-tested version:
import os
import glob
offset = sorted(int(os.path.splitext(os.path.basename(filename))[0])
                for filename in glob.glob(os.path.join(src, '*' + ext)))[-1] + 1
for i, filename in enumerate(os.listdir(src), start=offset):
    ...
This works provided all *.jpg files have only a number before their extension. Otherwise you will get a ValueError.
And if there happens to be a gap in the numbering, that gap will not be filled with new files. E.g., 1.jpg, 2.jpg, 3.jpg, 123.jpg will continue with 124.jpg (which is safer anyway).
If you need to filter out filenames such as im5.jpg or someImage.jpg, you could add an if-clause to the list comprehension, with a regular expression:
import os
import glob
import re
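# re.match on the base name rejects names such as im5.jpg, which re.search would let through
offset = sorted(int(os.path.splitext(os.path.basename(filename))[0])
                for filename in glob.glob(os.path.join(src, '*' + ext))
                if re.match(r'\d+' + re.escape(ext) + '$', os.path.basename(filename)))[-1] + 1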
Of course, by now the three lines are pretty unreadable, and may not win the code beauty contest.
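A more readable take on the same idea, as a sketch: the helper name next_index is hypothetical, and it assumes it is given bare file names (as returned by os.listdir) rather than full paths. It returns 0 when no numbered file exists yet.

```python
import re


def next_index(filenames, ext=".jpg"):
    """Return one more than the largest purely numeric stem, or 0 if none."""
    # match() anchors at the start, so 'im5.jpg' is rejected.
    pattern = re.compile(r'(\d+)' + re.escape(ext) + r'$')
    numbers = [int(m.group(1))
               for m in (pattern.match(name) for name in filenames) if m]
    return max(numbers) + 1 if numbers else 0


print(next_index(['0.jpg', '1.jpg', '2.jpg', 'im5.jpg', 'someImage.jpg']))  # 3
```

The result can be used as the start value for the counter. The renaming loop should still skip the files that already have numeric names.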

How to add numbers in front of each file without touching the filename using Python?

I have some files (800+) in folder as shown below:
test_folder
1_one.txt
2_two.txt
3_three.txt
4_power.txt
5_edge.txt
6_mobile.txt
7_test.txt
8_power1.txt
9_like.txt
10_port.txt
11_fire.txt
12_water.txt
I want to rename all these files using python like this:
test_folder
001_one.txt
002_two.txt
003_three.txt
004_power.txt
005_edge.txt
006_mobile.txt
007_test.txt
008_power1.txt
009_like.txt
010_port.txt
011_fire.txt
012_water.txt
Can we do this with Python? Please guide on how to do this.
Use zfill to pad zeros
import os,glob
src_folder = r"/user/bin/"
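for file_path in glob.glob(os.path.join(src_folder, "*.txt")):
    # Work on the base name only; the folder part is not a number and may contain '_'
    folder, file_name = os.path.split(file_path)
    lst = file_name.split('_')
    if len(lst) > 1:
        try:
            value = int(lst[0])
        except ValueError:
            continue
        lst[0] = lst[0].zfill(3)
        os.rename(file_path, os.path.join(folder, '_'.join(lst)))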
Using zfill:
Split on the underscore _ and then use zfill to pad zeros:
import os
os.chdir("test_folder")
for filename in os.listdir("."):
    os.rename(filename, filename.split("_")[0].zfill(3) + filename[filename.index('_'):])
Converting to integer:
Only renames if the prefix is a valid integer. Uses format(num, '03') to make sure the integer is padded with the appropriate leading zeros. Renames files such as 1_file.txt and 12_water.txt but skips a_baa.txt etc.
import os
os.chdir(r"E:\pythontest")
for filename in os.listdir("."):
    try:
        num = int(filename.split("_")[0])
        os.rename(filename, format(num, '03') + filename[filename.index('_'):])
    except ValueError:
        # raised by int() for a non-numeric prefix, or by index() when there is no '_'
        print('Skipped ' + filename)
EDIT: Both snippets ensure that if the filename contains multiple underscores then the later ones aren't snipped. So 1_file_new.txt gets renamed to 001_file_new.txt.
Examples:
# Before
'1_one.txt',
'12_twelve.txt',
'13_new_more_underscores.txt',
'a_baaa.txt',
'newfile.txt',
'onlycharacters.txt'
# After
'001_one.txt',
'012_twelve.txt',
'013_new_more_underscores.txt',
'a_baaa.txt',
'newfile.txt',
'onlycharacters.txt'
Here's a quick example to rename the files in the current directory:
import os
for f in os.listdir("."):
    if os.path.isfile(f) and "_" in f:
        # split only on the first underscore so later ones are kept
        number, suffix = f.split("_", 1)
        if number.isdigit():
            new_name = "%03d_%s" % (int(number), suffix)
            os.rename(f, new_name)
You can use glob.glob() to get a list of text files. Then use a regular expression to ensure that the file being renamed starts with digits and an underscore. Then split the file up and add leading zeros as follows:
import re
import glob
import os
src_folder = r"c:\source folder"
for filepath in glob.glob(os.path.join(src_folder, "*.txt")):
    path, filename = os.path.split(filepath)
    re_file = re.match(r"(\d+)(_.*)", filename)
    if re_file:
        prefix, base = re_file.groups()
        new_filename = os.path.join(path, "{:03}{}".format(int(prefix), base))
        # rename using the full source path, not just the base name
        os.rename(filepath, new_filename)
The {:03} tells Python to zero pad your number to 3 digits. Python's Format Specification Mini-Language is very powerful.
Note os.path.join() is used to safely concatenate path components, so you don't have to worry about trailing separators.
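All the snippets above hard-code a width of 3. As a sketch under the assumption that the width should instead follow the longest numeric prefix present, the mapping from old to new names could be computed first and applied afterwards. The helper name pad_prefixes and the sample name 800_x.txt are hypothetical.

```python
import re


def pad_prefixes(names):
    """Map each 'N_rest' name to its zero-padded form; other names are left out."""
    matched = []
    for name in names:
        m = re.match(r'(\d+)(_.*)', name)
        if m:
            matched.append((name, m.group(1), m.group(2)))
    if not matched:
        return {}
    # Width of the longest numeric prefix found in the folder.
    width = max(len(prefix) for _, prefix, _ in matched)
    return {name: prefix.zfill(width) + rest for name, prefix, rest in matched}


print(pad_prefixes(['1_one.txt', '12_water.txt', '800_x.txt', 'a_baaa.txt']))
```

Building the mapping before calling os.rename also makes it easy to print it and check the result before touching any file.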

How to find parenthesis bound strings in python

I'm learning Python and wanted to automate one of my assignments in a cybersecurity class.
I'm trying to figure out how to find the contents of a file that are bound by a set of parentheses. The contents of the (.txt) file look like:
cow.jpg : jphide[v5](asdfl;kj88876)
fish.jpg : jphide[v5](65498ghjk;0-)
snake.jpg : jphide[v5](poi098*/8!##)
test_practice_0707.jpg : jphide[v5](sJ*=tT#&Ve!2)
test_practice_0101.jpg : jphide[v5](nKFdFX+C!:V9)
test_practice_0808.jpg : jphide[v5](!~rFX3FXszx6)
test_practice_0202.jpg : jphide[v5](X&aC$|mg!wC2)
test_practice_0505.jpg : jphide[v5](pe8f%yC$V6Z3)
dog.jpg : negative
And here is my code so far:
import sys, os, subprocess, glob, shutil
# Finding the .jpg files that will be copied.
sourcepath = os.getcwd() + '\\imgs\\'
destpath = 'stegdetect'
rawjpg = glob.glob(sourcepath + '*.jpg')
# Copying the said .jpg files into the destpath variable
for filename in rawjpg:
    shutil.copy(filename, destpath)
# Asks user for what password file they want to use.
passwords = raw_input("Enter your password file with the .txt extension:")
shutil.copy(passwords, 'stegdetect')
# Navigating to stegdetect. Feel like this could be abstracted.
os.chdir('stegdetect')
# Preparing the arguments then using subprocess to run
args = "stegbreak.exe -r rules.ini -f " + passwords + " -t p *.jpg"
# Uses open to open the output file, and then write the results to the file.
with open('cracks.txt', 'w') as f: # opens cracks.txt and prepares to write
    subprocess.call(args, stdout=f)
# Processing whats in the new file.
f = open('cracks.txt')
If the text should just be bound by ( and ), you can use the following regex, which requires an opening ( and a closing ) with letters, digits, and spaces between them. You can add any other symbols you want to allow to the character class.
[\(][a-z A-Z 0-9]*[\)]
[\(] - starts the bracket
[a-z A-Z 0-9]* - all text inside bracket
[\)] - closes the bracket
So for input sdfsdfdsf(sdfdsfsdf)sdfsdfsdf , the output will be (sdfdsfsdf)
Test this regex here: https://regex101.com/
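Since the passwords in the question contain many symbols, listing every allowed character gets long. One alternative, shown here as a sketch rather than as this answer's method, is a negated character class that accepts anything except a closing parenthesis. It assumes the password itself never contains ")".

```python
import re

# [^)]* matches any run of characters that are not a closing parenthesis.
PAREN = re.compile(r'\(([^)]*)\)')

line = "snake.jpg : jphide[v5](poi098*/8!##)"
match = PAREN.search(line)
print(match.group(1))  # poi098*/8!##
```

For a line without parentheses, such as "dog.jpg : negative", search returns None, so the result should be checked before calling group.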
I'm learning Python
If you are learning you should consider alternative implementations, not only regexps.
To iterate line by line over a text file, you just open the file and loop over the file handle:
with open('file.txt') as f:
    for line in f:
        do_something(line)
Each line is a string with the line contents, including the end-of-line character '\n'. To find the start index of a specific substring in a string you can use find:
>>> A = "hello (world)"
>>> A.find('(')
6
>>> A.find(')')
12
To get a substring from the string you can use slice notation; note that the end index is excluded:
>>> A[6:12]
'(world'
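Putting find and slicing together, a minimal sketch of extracting only the text inside the parentheses might look like this. The function name between_parens is hypothetical. It uses rfind for the closing parenthesis, so it takes everything up to the last ")" on the line.

```python
def between_parens(line):
    """Return the text between the first '(' and the last ')', or None."""
    start = line.find('(')
    end = line.rfind(')')
    # find/rfind return -1 when the character is missing.
    if start == -1 or end < start:
        return None
    return line[start + 1:end]


print(between_parens("hello (world)"))       # world
print(between_parens("dog.jpg : negative"))  # None
```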
You should use regular expressions, which are implemented in the Python re module.
A simple regex like \(.*\) could match your "parenthesis string",
but it would be better with a group, \((.*)\), which lets you get only the content inside the parentheses.
import re
test_string = """cow.jpg : jphide[v5](asdfl;kj88876)
fish.jpg : jphide[v5](65498ghjk;0-)
snake.jpg : jphide[v5](poi098*/8!##)
test_practice_0707.jpg : jphide[v5](sJ*=tT#&Ve!2)
test_practice_0101.jpg : jphide[v5](nKFdFX+C!:V9)
test_practice_0808.jpg : jphide[v5](!~rFX3FXszx6)
test_practice_0202.jpg : jphide[v5](X&aC$|mg!wC2)
test_practice_0505.jpg : jphide[v5](pe8f%yC$V6Z3)
dog.jpg : negative"""
REGEX = re.compile(r'\((.*)\)', re.MULTILINE)
print(REGEX.findall(test_string))
# ['asdfl;kj88876', '65498ghjk;0-', 'poi098*/8!##', 'sJ*=tT#&Ve!2', 'nKFdFX+C!:V9' , '!~rFX3FXszx6', 'X&aC$|mg!wC2', 'pe8f%yC$V6Z3']
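If the image name is needed alongside each password, the same grouping idea extends to two groups. This is a sketch assuming every positive line has the "name : tool(password)" shape shown in the question. The helper name parse_cracks is hypothetical.

```python
import re

# Group 1: image name; group 2: everything between '(' and the final ')'.
LINE = re.compile(r'^(\S+) : \S+?\((.*)\)$')


def parse_cracks(lines):
    """Return {image name: password} for lines that report a password."""
    found = {}
    for line in lines:
        match = LINE.match(line.strip())
        if match:
            found[match.group(1)] = match.group(2)
    return found


sample = ["cow.jpg : jphide[v5](asdfl;kj88876)\n", "dog.jpg : negative\n"]
print(parse_cracks(sample))  # {'cow.jpg': 'asdfl;kj88876'}
```

Because an open file yields its lines when iterated, the file handle for cracks.txt can be passed directly as lines.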

Directory sizes and extensions

I'd like to create python command line code that is able to print directory tree with sizes of all subdirectories (from certain directory) and most frequent extensions... I will show the example output.
root_dir (5 GB, jpg (65 %): avi ( 30 %) : pdf (5 %))
-- aa (3 GB, jpg (100 %) )
-- bb (2 GB, avi (20 %) : pdf (2 %) )
--- bbb (1 GB, ...)
--- bb2 (1 GB, ...)
-- cc (1 GB, pdf (100 %) )
The format is :
nesting level, directory name (size of the directory with all files and subdirectories, most frequent extensions with size percentages in this directory).
I have this code snippet so far. The problem is that it counts only file sizes in directory, so the resulting size is smaller than real size of the directory. Other problem is how to put it all together to print the tree I defined above without redundant computations.
Calculating directory sizes really isn't python's strong suit, as explained in this post: very quickly getting total size of folder. If you have access to du and find, by all means use that. You can easily display the size of each directory with the following line:
find . -type d -exec du -hs "{}" \;
If you insist on doing this in Python, you may prefer post-order traversal over os.walk, as suggested by PableG. But using os.walk can be visually cleaner, if efficiency is not the utmost factor for you:
import os, sys
from collections import defaultdict

def walkIt(folder):
    for (path, dirs, files) in os.walk(folder):
        size = getDirSize(path)
        stats = getExtensionStats(files)
        # only get the top 3 extensions
        print('%s (%s, %s)' % (path, size, stats[:3]))

def getExtensionStats(files):
    # get all file extensions
    extensions = [f.rsplit(os.extsep, 1)[-1]
                  for f in files if len(f.rsplit(os.extsep, 1)) > 1]
    # count the extensions
    exCounter = defaultdict(int)
    for e in extensions:
        exCounter[e] += 1
    # convert count to percentage (float, so Python 2 does not truncate)
    percentPairs = [(e, 100.0*ct/len(extensions)) for e, ct in exCounter.items()]
    # sort them, most frequent first, so that [:3] really is the top 3
    percentPairs.sort(key=lambda i: i[1], reverse=True)
    return percentPairs

def getDirSize(root):
    size = 0
    for path, dirs, files in os.walk(root):
        for f in files:
            size += os.path.getsize( os.path.join( path, f ) )
    return size

if __name__ == '__main__':
    path = sys.argv[1] if len(sys.argv) > 1 else '.'
    walkIt(path)
I personally find os.listdir + a_recursive_function better suited for this task than os.walk:
import os, copy
from os.path import join, getsize, isdir, splitext
frequent_ext = { ".jpg": 0, ".pdf": 0 } # Frequent extensions
def list_dir(base_dir):
    dir_sz = 0 # directory size
    files = os.listdir(base_dir)
    ext_size = copy.copy(frequent_ext)
    for file_ in files:
        file_ = join(base_dir, file_)
        if isdir(file_):
            ret = list_dir(file_)
            dir_sz += ret[0]
            for k, v in frequent_ext.items(): # Add to freq.ext.sizes
                ext_size[k] += ret[1][k]
        else:
            file_sz = getsize(file_)
            dir_sz += file_sz
            ext = splitext(file_)[1].lower() # Frequent extension?
            if ext in frequent_ext.keys():
                ext_size[ext] += file_sz
    # build the whole report line first so it prints the same on Python 2 and 3
    ext_report = " ".join("%s: %5.2f%%" % (k, float(v) / max(1, dir_sz) * 100.)
                          for k, v in ext_size.items())
    print("%s %s %s" % (base_dir, dir_sz, ext_report))
    return (dir_sz, ext_size)
base_dir = "e:/test_dir/"
base_dir = os.path.abspath(base_dir)
list_dir(base_dir)
@Cldy is right: use os and os.path.
For example, os.walk (which replaces os.path.walk, removed in Python 3) walks through every directory below the argument and yields the files and folders in each directory.
Use os.path.getsize to get the sizes and split to get the extensions. Store the extensions in a list or dict and count them after going through each directory.
If you are on Linux, I would suggest looking at du instead.
That's the module you need. And also this.
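The question also asks how to avoid redundant computation. One way, sketched here under stated assumptions and not taken from any of the answers above, is a single bottom-up os.walk that adds each child's totals to its parent. The function names directory_stats and print_tree are hypothetical. Sizes are the byte counts reported by os.path.getsize, so they can differ from the disk usage reported by du, and symbolic links are skipped.

```python
import os
from collections import Counter


def directory_stats(root):
    """Return {dir path: (total bytes, Counter of bytes per extension)}."""
    stats = {}
    # Bottom-up walk: every child directory is visited before its parent.
    for path, dirs, files in os.walk(root, topdown=False):
        total, by_ext = 0, Counter()
        for name in files:
            full = os.path.join(path, name)
            if os.path.islink(full):
                continue
            size = os.path.getsize(full)
            total += size
            by_ext[os.path.splitext(name)[1].lower() or '(none)'] += size
        for sub in dirs:
            child = stats.get(os.path.join(path, sub))
            if child is not None:  # symlinked directories are never walked
                total += child[0]
                by_ext.update(child[1])  # Counter.update adds the counts
        stats[path] = (total, by_ext)
    return stats


def print_tree(root):
    """Print one line per directory with its size and top 3 extensions."""
    stats = directory_stats(root)
    for path in sorted(stats, key=lambda p: p.split(os.sep)):
        total, by_ext = stats[path]
        rel = os.path.relpath(path, root)
        depth = 0 if rel == os.curdir else rel.count(os.sep) + 1
        shares = ' : '.join('%s (%d %%)' % (ext, 100 * size // max(total, 1))
                            for ext, size in by_ext.most_common(3))
        name = os.path.basename(path) or path
        print('%s %s (%d bytes, %s)' % ('-' * (depth + 1), name, total, shares))
```

Calling print_tree('.') prints the tree for the current directory. Each file is measured exactly once, and the percentages are shares of size rather than of file count, which is what the question's format asks for.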

Mac to Windows Python

I have recently moved a set of near identical programs from my mac to my school's windows, and while the paths appear to be the same (or the tail end of them), they will not run properly.
import glob
import pylab
from pylab import *
def main():
    outfnam = "igdata.csv"
    fpout = open(outfnam, "w")
    nrows = 0
    nprocessed = 0
    nbadread = 0
    filenames = [s.split("/")[1] for s in glob.glob("c/Cmos6_*.IG")]
    dirnames = "c an0 an1 an2 an3 an4".split()
    for suffix in filenames:
        nrows += 1
        row = []
        row.append(suffix)
        for dirnam in dirnames:
            fnam = dirnam+"/"+suffix
            lines = [l.strip() for l in open(fnam).readlines()]
            nprocessed += 1
            if len(lines)<5:
                nbadread += 1
                print "warning: file %s contains only %d lines"%(fnam, len(lines))
                tdate = "N/A"
                irrad = dirnam
                Ig_zeroVds_largeVgs = 0.0
            else:
                data = loadtxt(fnam, skiprows=5)
                tdate = lines[0].split(":")[1].strip()
                irrad = lines[3].split(":")[1].strip()
                # pull out last column (column "-1") from second-to-last row
                Ig_zeroVds_largeVgs = data[-2,-1]
            row.append(irrad)
            row.append("%.3e"%(Ig_zeroVds_largeVgs))
        fpout.write(", ".join(row) + "\n")
    print "wrote %d rows to %s"%(nrows, outfnam)
    print "processed %d input files, of which %d had missing data"%( \
        nprocessed, nbadread)
This program worked fine on my Mac, but on Windows I keep getting the following for:
print "wrote %d rows to %s"%(nrows, outfnam)
print "processed %d input files, of which %d had missing data"%( \
    nprocessed, nbadread)
wrote 0 rows to file name
processed 0 input files, of which 0 had missing data
On my Mac I got 144 rows to file...
Does anyone have any suggestions?
If the script doesn't raise any errors, this piece of code is most likely returning an empty list.
glob.glob("c/Cmos6_*.IG")
Seeing as glob.glob works perfectly fine with forward slashes on Windows, the problem is most likely that it's not finding the files, which most likely means that the string you provided has an error somewhere in it. Make sure there isn't any error in "c/Cmos6_*.IG".
If the problem isn't caused by this, then unfortunately, I have no idea why it is happening.
Also, when I tried it, filenames returned by glob.glob have backslashes in them on Windows, so you should probably split by "\\" instead.
Off the top of my head, it looks like a problem of using / in the path. Windows uses \ instead.
os.path contains a number of functions to ease working with paths across platforms.
Your s.split("/") should definitely be s.split(os.sep). (Note that os.pathsep is a different thing: it is the separator between entries of the PATH variable.) I got bitten by this once… :)
In fact, glob returns paths with \ on Windows and / on Mac OS X, so you need to do your splitting with the appropriate path separator (os.sep).
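Rather than splitting on any separator at all, the os.path helpers mentioned above can do the work on both platforms. The following is a minimal sketch, not the asker's program: the function name input_suffixes is hypothetical, and its defaults are taken from the question's glob pattern.

```python
import glob
import os


def input_suffixes(first_dir="c", pattern="Cmos6_*.IG"):
    """Return the bare file names matching pattern inside first_dir."""
    matches = glob.glob(os.path.join(first_dir, pattern))
    # basename drops the directory part whatever the separator is.
    return sorted(os.path.basename(path) for path in matches)


suffixes = input_suffixes()
print("found %d input files" % len(suffixes))
# Per-directory paths can then be built with os.path.join(dirnam, suffix).
```

If the list comes back empty, printing os.getcwd() shows which directory the relative pattern is being resolved against, which is worth checking after moving the files to another machine.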
