I am attempting to take a file name such as 'OP 40 856101.txt' from a directory, remove the .txt, set each single word to a specific variable, then reorder the filename based on a required order such as '856101 OP 040'. Below is my code:
import os
dir = 'C:/Users/brian/Documents/Moeller'
orig = os.listdir(dir) #original names of the files in the folder
for orig_name in orig: #This loop splits each file name into a list of strings containing each word
    f = os.path.splitext(orig_name)[0]
    sep = f.split() #Separation is done by a space
    for t in sep: #Loops across each list of strings into an if statement that saves each part to a specific variable
        #print(t)
        if t.isalpha() and len(t) == 3:
            wc = t
        elif len(t) > 3 and len(t) < 6:
            wc = t
        elif t == 'OP':
            op = t
        elif len(t) >= 4:
            pnum = t
        else:
            opnum = t
            if len(opnum) == 2:
                opnum = '0' + opnum
    new_nam = '%s %s %s %s' % (pnum,op,opnum, wc) #This is the variable that contains the text for the new name
    print("The orig filename is %r, the new filename is %r" % (orig_name, new_nam))
    os.rename(orig_name, new_nam)
However I am getting an error with my last for loop where I attempt to rename each file in the directory.
FileNotFoundError: [WinError 2] The system cannot find the file specified: '150 856101 OP CLEAN.txt' -> '856101 OP 150 CLEAN'
The code runs perfectly until the os.rename() command, if I print out the variable new_nam, it prints out the correct naming order for all of the files in the directory. Seems like it cannot find the original file though to replace the filename to the string in new_nam. I assume it is a directory issue, however I am newer to python and can't seem to figure where to edit my code. Any tips or advice would be greatly appreciated!
Try this (just changed the last line):
os.rename(os.path.join(dir,orig_name), os.path.join(dir,new_nam))
You need to give os.rename the full path of each file. os.listdir returns bare file names, so without the directory prefix Python resolves them relative to the current working directory, not the folder you listed.
Incidentally, it's better not to use dir as a variable name, because that's the name of a built-in.
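The same idea as a self-contained sketch. It assumes a throwaway temporary directory in place of the real folder, and the helper name rename_in_folder is made up for illustration. The new name here keeps the .txt extension, which the new_nam string in the question drops.

```python
import os
import tempfile

def rename_in_folder(folder, old_name, new_name):
    """Rename old_name to new_name, both resolved inside folder."""
    src = os.path.join(folder, old_name)
    dst = os.path.join(folder, new_name)
    os.rename(src, dst)
    return dst

# Demo in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    open(os.path.join(folder, "OP 40 856101.txt"), "w").close()
    rename_in_folder(folder, "OP 40 856101.txt", "856101 OP 040.txt")
    remaining = os.listdir(folder)
    print(remaining)
```

Using a name such as folder instead of dir also avoids shadowing the built-in.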
I tried to solve some basic CS50 Python problems and got stuck on this one (I know it can be done with plain ifs). Basically, you need to print the file extension, so I split the filename to get ["filename", "extension"], and up to this point it works fine.
However, no matter what I input, it always prints "error". Here's the code:
filename = input("Enter filename: ")
extension = filename.split(".")
match extension:
    case ["jpg"]:
        print("this is a jpg file")
    case ["gif"]:
        print("this is a gif file")
    case _:
        print("error")
I also tried some examples from different sites and it always prints the last case. Python version is 3.10.6. Any ideas?
extension is a list of the form ["filename", "extension"], it will never match a single-element list.
You need to match only on the second element:
    case [_, "jpg"]:
        print("this is a jpg file")
    case [_, "gif"]:
        print("this is a gif file")
However, if the filename includes multiple . then this will break because extension will have an unknown number of elements. This can be resolved in several ways:
Using *_ instead of _
Using os.path.splitext(filename)[-1] to get the extension (with the .) in the safest manner:
import os
extension = os.path.splitext(filename)[-1]
match extension:
    case ".jpg":
        print("this is a jpg file")
    case ".gif":
        print("this is a gif file")
    case _:
        print("error")
When you split your string, it creates a list to contain all the fragments.
extension = filename.split(".")
>>> print(type(extension), extension)
<class 'list'> ['hello', 'jpg']
You should select the specific string you are interested in. In this case, it would be the last element in the list, presumably. To achieve this, you could use:
extension[-1]
>>> print(type(extension[-1]), extension[-1])
<class 'str'> jpg
Issue: You are using split() which returns a list (e.g. ['data', 'gif']). You are comparing this with a list that contains just the file extension (e.g. ['gif']). This will result in False.
Assumption: The filename is valid and has an extension e.g. image.gif.
Solution 1 - Using index: In this case, the last element in the list is the extension so you can use this for matching. An index of -1 will return the last element in the list.
filename = input('Enter filename: ')
extension = filename.split('.')
match extension[-1]:
    case 'jpg':
        print('this is a jpg file')
    case 'gif':
        print('this is a gif file')
    case _:
        print('error')
Solution 2 - Using list: Alternatively, if you wish to use the result of split() directly then you can do so as follows:
filename = input('Enter filename: ')
extension = filename.split(".")
match extension:
    case [*_, 'jpg']:
        print('this is a jpg file')
    case [*_, 'gif']:
        print('this is a gif file')
    case _:
        print('error')
Note, we use *_ as there could be file names with multiple . thus resulting in a list with more than two items.
Match on pathlib's Path.suffix:
from pathlib import Path
filename = input("Enter filename: ")
extension = Path(filename).suffix
match extension:
    case ".jpg":
        print("this is a jpg file")
    case ".gif":
        print("this is a gif file")
    case _:
        print("error")
Yet another option is to use os.path.splitext on the file name - although it includes the dot:
import os
path = input("Enter path: ")
match os.path.splitext(path)[-1]:
    case ".jpg":
        print("jpg")
    case ".png":
        print("png")
If you want to also match double extensions like in data.tar.gz, you need to use str.split combined with str.join - additionally, this code also checks for filenames with a single leading dot (so not having an extension):
import os
path = input("Enter path: ")
split = path.split(".")
if len(split) == 1 or os.path.split(path)[-1].startswith("."):
    ext = ""
elif len(split) == 2:
    ext = split[1]
else:
    ext = split[-2] + "." + split[-1]
# ext has no leading dot here, so the cases must not include one
match ext:
    case "jpg":
        print("jpg")
    case "png":
        print("png")
    case "tar.gz":
        print("tar.gz")
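As an alternative to splitting by hand, pathlib can report every suffix of a name. A minimal sketch, assuming you want all trailing suffixes joined together; the helper name full_extension is made up for illustration.

```python
from pathlib import Path

def full_extension(filename):
    """Return all suffixes joined, e.g. '.tar.gz'; '' if there are none."""
    return "".join(Path(filename).suffixes)

print(full_extension("data.tar.gz"))
print(full_extension("photo.jpg"))
print(full_extension(".bashrc"))
```

Note that every dot counts as a suffix boundary, so a name like report.v1.2.txt yields .v1.2.txt. Whether that is acceptable depends on the file names you expect.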
So I'm still new to coding and I've got a mess of for loops, conditional if statements, and a couple of while loops. My code loops over files and depending on my input, it moves the files to a location matching my input. However, I would like to be able to prompt the code to simply print a list but not move onto the next file. I've tried placing it into a while loop but whenever the while loop is satisfied, it passes onto the next file.
while True:
    try:
        if "print df" in answer:
            subset_folders_list = list()
            for folder in all_folders_list:
                if folder.startswith('A'):
                    subset_folders_list.append(folder)
            df = pd.DataFrame(subset_folders_list, columns=['Folders'])
            print(df)
But upon an input of "print-folders", it will print my dataframe and move onto the next file because the condition of the while loop is met. How can I get this code to print this dataframe without moving onto the next file. Note that this while loop is nested inside of another while loop inside of a function that is called inside of another loop that is inside of a function. But I think this is the only chunk of code I need to fix in order to implement this feature.
EDIT: other relevant code:
for filename in files_to_move:
    counter += 1
    matching_folders = list()
    iterating = True
    item_words = set(re.split('[. ,_-]', filename.lower()))
    source_file_path = os.path.join(paths[0], filename)
    all_folders_list = [g for g in os.listdir(paths[12]) if not g.startswith('.')]
    #Matching the filename with a folder
    for folder in all_folders_list:
        count = 0
        folder_words = folder.lower().split(' ')
        for word in item_words:
            if word in folder_words:
                count += 1
        if count >= 2:
            matching_folders.append(folder)
    #Multiple matching folders
    if len(matching_folders) >= 2:
        print("\n" + f"There is MORE than one folder for {filename}")
        if not filename in files_to_move:
            continue
        while iterating:
            try:
                pass
                answer_2 = input("\n" + f"MOVE IT TO ONE OR ELSEWHERE (type name of folder or print-subset for subset list)?: ")
                item_words = answer_2.lower().split(' ')
                if len(item_words) >= 2:
                    #Moving file to a matching or input folder
                    for folder in all_folders_list:
                        count = 0
                        folder_words = folder.lower().split(' ')
                        for word in item_words:
                            if word in folder_words:
                                count += 1
                        if count == 2:
                            folder_path = os.path.join(paths[12], folder)
                            destination_file_path = os.path.join(folder_path, filename)
                            shutil.move(source_file_path, destination_file_path)
                            print("\n" + f"File moved to {folder}")
                            print(FTG)
                            iterating = False
                            break
                if len(item_words) < 2:
                    file_mover(source_file_path, paths, filename, answer_2, iterating, all_folders_list, matching_folders, x, counter)
                    iterating = False
                    break
So my function file_mover() has many other conditional if statements in it, I will skip them because they work, but the part I want to add is to be able to print the subset without moving onto the next file. Here is file_mover:
def file_mover(source_file_path, paths, filename, answer_2, iterating, all_folders_list, matching_folders, x, counter):
    FTG = str("\n" + str(x - counter) + " files to go")
    while iterating:
        if "exit" in answer_2:
            print()
            iterating = False
            sys.exit()
        if "pass" in answer_2:
            print("Moving on to NEXT file" + "\n")
            print(FTG)
            iterating = False
            pass
        if "del" in answer_2:
            shutil.move(source_file_path, os.path.join(paths[15], filename))
            print(f"File moved to DELETE folder" + "\n")
            print(FTG)
            iterating = False
        #This is where I want the ability to just print something given the input "print-subset", or something like that, and have it re-prompt me for an input.
        else:
            break
The reason I want to add this is because there is a subset of non-matching folders that sometimes I want to move the files to, and I don't always remember what they are or I type them in wrong. Now I've got an error catcher in my main while loop, so that could easily function as a workaround for typing in the folder name wrong, but I'm new to coding and this would be good practice. I've tried a lot of things and I can get it to print over and over again endlessly or I can get it to print one time but it moves onto the next file. I don't want either of those to happen. I'd like the subset to print once and have it prompt me for an input again for the same file. I can also try to implement this in the len(item_words) >= 2 section and do the input as "print subset", but I might also run into the same problems.
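One way to get "print once, then ask again for the same file" is to keep the prompt inside its own loop and use continue after printing, so that only a real choice leaves the loop. This is a minimal sketch, not a drop-in change to the code above: the function name ask_for_folder, its prompt parameter, and the sample folder names are all hypothetical.

```python
def ask_for_folder(all_folders, prompt=input):
    """Keep asking until a known folder is entered; 'print-subset' only lists."""
    while True:
        answer = prompt("Folder name (or print-subset): ").strip()
        if answer == "print-subset":
            # Show the subset, then loop back to the prompt for the same file.
            for folder in all_folders:
                if folder.startswith("A"):
                    print(folder)
            continue
        if answer in all_folders:
            return answer  # only a real choice leaves the loop
        print(f"No folder named {answer!r}, try again")

# Scripted answers stand in for typing at the keyboard.
scripted = iter(["print-subset", "Archive"])
chosen = ask_for_folder(["Archive", "Audio", "Books"], prompt=lambda _: next(scripted))
print(chosen)
```

Because the input call sits inside the loop, printing the subset does not end the iteration for the current file; the outer loop over files only advances once the function returns.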
I am writing a simple Python script to tell me file size for a set of documents which I am importing from a CSV. I verified that none of the entries are over 100 characters, so this error "ValueError: scandir: path too long for Windows" does not make sense to me.
Here is my code:
# determine size of a given folder in MBytes
import os, subprocess, json, csv, platform
# Function to check if a Drive Letter exists
def hasdrive(letter):
    return "Windows" in platform.system() and os.system("vol %s: 2>nul>nul" % (letter)) == 0
# Define Drive to check for
letter = 'S'
# Check if Drive doesn't exist, if not then map drive
if not hasdrive(letter):
    subprocess.call(r'net use s: /del /Y', shell=True)
    subprocess.call(r'net use s: \\path_to_files', shell=True)
list1 = []
# Import spreadsheet to calculate size (raw string so the backslashes are kept literally)
with open(r'c:\Temp\files_to_delete_subset.csv') as f:
    reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE)
    for row in reader:
        list1.extend(row)
# Define variables
folder = "S:"
folder_size = 0
# Exporting outcome
for list1 in list1:
    folder = folder + str(list1)
    for root, dirs, files in os.walk(folder):
        for name in files:
            folder_size += os.path.getsize(os.path.join(root, name))
            print(folder)
            # print(os.path.join(root, name) + " " + chr(os.path.getsize(os.path.join(root, name))))
    print(folder_size)
From my understanding, the maximum path length in Windows is 260 characters, so one drive letter plus a 100-character path should NOT exceed the Windows maximum.
Here is an example of a path: '/Document/8669/CORRESP/1722165.doc'
The folder string you're trying to walk is growing forever. Simplifying the code to the problem area:
folder = "S:"
# Exporting outcome
for list1 in list1:
    folder = folder + str(list1)
You never set folder otherwise, so it starts out as S:<firstpath>, then on the next loop it's S:<firstpath><secondpath>, then S:<firstpath><secondpath><thirdpath>, etc. Simple fix: Separate drive from folder:
drive = "S:"
# Exporting outcome
for path in list1:
    folder = drive + path
Now folder is constructed from scratch on each loop, throwing away the previous path, rather than concatenating them.
I also gave the iteration value a useful name (and removed the str call, because the values should all be str already).
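Putting the fix together with the size calculation, here is a minimal sketch. It assumes a temporary directory standing in for the S: drive and two made-up subfolders, a and b; the helper name folder_size is hypothetical.

```python
import os
import tempfile

def folder_size(folder):
    """Total size in bytes of all files below folder."""
    total = 0
    for root, dirs, files in os.walk(folder):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total

# Demo with a temporary directory standing in for the S: drive.
with tempfile.TemporaryDirectory() as drive:
    for rel, payload in [("a", b"12345"), ("b", b"123")]:
        os.makedirs(os.path.join(drive, rel))
        with open(os.path.join(drive, rel, "file.bin"), "wb") as fh:
            fh.write(payload)
    sizes = {}
    for path in ["a", "b"]:
        folder = os.path.join(drive, path)  # rebuilt from scratch each time
        sizes[path] = folder_size(folder)
print(sizes)
```

The paths in the question start with a slash, so plain concatenation with the drive letter works there; os.path.join is used here only because the demo paths are relative.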
I was writing a little program that finds all files with given prefix, let's say 'spam' for this example, in a folder and locates gaps in numbering and renames subsequent folders to fill the gap. Below illustrates a portion of the program that locates the files using a regex and renames it:
prefix = 'spam'
newNumber = 005
# Regex for finding files with specified prefix + any numbering + any file extension
prefixRegex = re.compile(r'(%s)((\d)+)(\.[a-zA-Z0-9]+)' % prefix)
# Rename file by keeping group 1 (prefix) and group 4 (file extension),
# but substituting numbering with newNumber
newFileName = prefixRegex.sub(r'\1%s\4' % newNumber, 'spam006.txt')
What I was expecting from above was spam005.txt, but instead I got #5.txt
I figured out I could use r'%s%s\4' % (prefix, newNumber) instead and then it does work as intended, but I'd like to understand why this error is happening. Does it have something to do with the %s used during re.compile()?
There are two problems here:
Your newNumber needs to be a string if you want it to be 005. Written as an integer literal, the leading zeros are lost: Python 2 reads 005 as the octal literal for 5, and Python 3 rejects it with a SyntaxError.
Your next problem is indeed in your substitution. By using string formatting you effectively create the replacement template \15\4 (note the 5, which came from newNumber). When Python sees this, it looks for capturing group 15 rather than group 1 followed by a literal 5. You can write the reference as \g<1> to get the desired behavior: \g<1>5\4
So your code needs to be changed to this:
import re
prefix = 'spam'
newNumber = '005'
# Regex for finding files with specified prefix + any numbering + any file extension
prefixRegex = re.compile(r'(%s)((\d)+)(\.[a-zA-Z0-9]+)' % prefix)
# Rename file by keeping group 1 (prefix) and group 4 (file extension),
# but substituting numbering with newNumber
newFileName = prefixRegex.sub(r'\g<1>%s\4' % newNumber, 'spam006.txt')
More information about the \g<n> behavior can be found at the end of the re.sub documentation.
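If the new number is held as an integer, zero-padding it while formatting avoids the leading-zero problem as well. A small sketch under that assumption; the variable names are illustrative, the regex is simplified to three groups, and re.escape is added in case the prefix contains regex metacharacters.

```python
import re

prefix = "spam"
new_number = 5

pattern = re.compile(r"(%s)(\d+)(\.[a-zA-Z0-9]+)" % re.escape(prefix))
# %03d pads to three digits; \g<1> keeps the group reference unambiguous.
replacement = r"\g<1>%03d\g<3>" % new_number
new_name = pattern.sub(replacement, "spam006.txt")
print(new_name)
```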
I have recently moved a set of near-identical programs from my Mac to my school's Windows machine, and while the paths appear to be the same (or at least the tail end of them), the programs will not run properly.
import glob
import pylab
from pylab import *
def main():
    outfnam = "igdata.csv"
    fpout = open(outfnam, "w")
    nrows = 0
    nprocessed = 0
    nbadread = 0
    filenames = [s.split("/")[1] for s in glob.glob("c/Cmos6_*.IG")]
    dirnames = "c an0 an1 an2 an3 an4".split()
    for suffix in filenames:
        nrows += 1
        row = []
        row.append(suffix)
        for dirnam in dirnames:
            fnam = dirnam+"/"+suffix
            lines = [l.strip() for l in open(fnam).readlines()]
            nprocessed += 1
            if len(lines)<5:
                nbadread += 1
                print "warning: file %s contains only %d lines"%(fnam, len(lines))
                tdate = "N/A"
                irrad = dirnam
                Ig_zeroVds_largeVgs = 0.0
            else:
                data = loadtxt(fnam, skiprows=5)
                tdate = lines[0].split(":")[1].strip()
                irrad = lines[3].split(":")[1].strip()
                # pull out last column (column "-1") from second-to-last row
                Ig_zeroVds_largeVgs = data[-2,-1]
            row.append(irrad)
            row.append("%.3e"%(Ig_zeroVds_largeVgs))
        fpout.write(", ".join(row) + "\n")
    print "wrote %d rows to %s"%(nrows, outfnam)
    print "processed %d input files, of which %d had missing data"%( \
        nprocessed, nbadread)
This program worked fine on my Mac, but on Windows the two final print statements keep giving me:
wrote 0 rows to igdata.csv
processed 0 input files, of which 0 had missing data
On my Mac I got 144 rows written to the file.
Does anyone have any suggestions?
If the script doesn't raise any errors, this piece of code is most likely returning an empty list.
glob.glob("c/Cmos6_*.IG")
Seeing as glob.glob works perfectly fine with forward slashes on Windows, the problem is most likely that it's not finding the files, which most likely means that the string you provided has an error somewhere in it. Make sure there isn't any error in "c/Cmos6_*.IG".
If the problem isn't caused by this, then unfortunately, I have no idea why it is happening.
Also, when I tried it, filenames returned by glob.glob have backslashes in them on Windows, so you should probably split by "\\" instead.
Off the top of my head, it looks like a problem of using / in the path. Windows uses \ instead.
os.path contains a number of functions to ease working with paths across platforms.
Your s.split("/") should definitely be s.split(os.sep), or simply os.path.basename(s). I got bitten by this once… :)
In fact, glob returns paths with \ on Windows and / on Mac OS X, so you need to do your splitting with the appropriate path separator (os.sep).
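To see the difference without switching machines, the standard library's ntpath and posixpath modules apply Windows and POSIX rules on any platform. A short sketch; the file name Cmos6_001.IG is a made-up example matching the glob pattern in the question.

```python
import ntpath
import os
import posixpath

# os.sep separates path components; os.pathsep separates entries in PATH.
print(repr(os.sep), repr(os.pathsep))

# basename returns the last path component under each platform's rules.
windows_name = ntpath.basename("c\\Cmos6_001.IG")
posix_name = posixpath.basename("c/Cmos6_001.IG")
print(windows_name, posix_name)
```

In the script itself, os.path.basename(s) in place of s.split("/")[1] would pick the right rules for whichever system it runs on.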