Searching and deleting directories in python I made a big mistake

import os
os.chdir('I:\\Movies')
files = os.popen('dir').readlines()
disk = raw_input("Enter the disk: ")
while disk != "done":
    os.chdir(disk + ':\\' + 'Movies')
    files_in_disk = os.popen('dir').readlines()
    for each_file in files_in_disk:
        for item in files:
            if ' '.join(each_file.split()[3:]) in item:
                each_file = ' '.join(each_file.split()[3:])
                os.system('rmdir /q /s ' + '"' + each_file + '"')
                break
    disk = raw_input("Enter the disk: ")
I had two copies of the same movies on two different drives, so I wrote this script to delete one of the copies. But on the E drive it erased nearly all of my files. Why did this happen? Can someone please point out my mistake?

I think something here is not doing what you expect:
if ' '.join(each_file.split()[3:]) in item:
If a line has fewer than 4 space-delimited parts, the first operand of the if will be the empty string, and the test will return true, because the empty string is a substring of every string.
The problem is your loop. For each line of the E:\Movies listing, it checks whether any line of the I:\Movies listing contains it (well, everything past the third word). If a line of the E:\Movies listing happens to have fewer than 4 words (not entirely implausible), then the if will be true for that line, whatever I:\Movies contains.
I'm not sure what the intent is here, but this is my best guess as to what may be causing the problem.
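A minimal sketch of that pitfall, written for Python 3 (the script above is Python 2). The helper name name_part and the sample listing lines are made up for illustration and assume a 24-hour clock in the dir output. Slicing past the end of a short split() result gives an empty list, joining it gives '', and '' counts as a substring of any string:

```python
def name_part(listing_line):
    """Return everything after the third whitespace-separated field."""
    return ' '.join(listing_line.split()[3:])

# Hypothetical lines in the style of `dir` output (24-hour clock assumed).
full_line = "01/02/2010  10:15    <DIR>          Some Movie"
short_line = "\n"  # blank lines appear in the listing too

print(repr(name_part(full_line)))   # 'Some Movie'
print(repr(name_part(short_line)))  # ''

# The empty string is "in" any string, so the test in the script passes.
print(name_part(short_line) in full_line)  # True
```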

Your mistake was not initially running this program with a print each_file statement rather than immediately jumping to the rmdir command.
Though this may read as a snarky answer, it is truly meant to be helpful. Whenever making irreversible changes (like deleting items from a file system or DB), one should always take some step to verify that the appropriate instructions are being generated/executed.
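One way to build that verification step in is a dry-run switch that is on by default. This is only a sketch in Python 3; remove_dirs and its dry_run parameter are hypothetical names, and shutil.rmtree stands in for the rmdir /q /s call in the question:

```python
import shutil

def remove_dirs(paths, dry_run=True):
    """Delete each directory tree in paths, or only report it when dry_run is set."""
    handled = []
    for path in paths:
        if dry_run:
            # Nothing is deleted here; the path is only shown for review.
            print("would remove:", path)
        else:
            shutil.rmtree(path)
        handled.append(path)
    return handled
```

Run it once with the default, read the printed list, and only then call it again with dry_run=False.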

According to Microsoft's TechNet article about rmdir:
/s : Removes the specified directory and all subdirectories including any files. Use /s to remove a tree.
So if, as the other answers suggest, it is possible to pass non-matching paths to rmdir, it is not very difficult to delete whole subtrees on the disk. This is especially true if the listing also contains entries that point to parent directories (for instance i:\movies\..); in cases like that you could be in a world of hurt.
But I don't have access to a Windows machine with Python installed to prove it.

Related

Why is os.rename not renaming files when the only difference is capitalization?

Why is this basic rename script not doing what it should?
Just trying to capitalize first letter of each word.
import glob
import os
for filename in glob.glob("**/*.mp3", recursive = True):
    withcap = str(filename).title()
    print("nc " +(filename))
    print("wc " +(withcap))
    os.rename(filename, withcap)
The output from the print is correct but nothing happens at os.rename?
output:
nc BLOOD COMMAND - Return Of The Arsonist [Clean].mp3
wc Blood Command - Return Of The Arsonist [Clean].Mp3

This can happen if you're on an operating system with a case-insensitive filesystem -- like Windows -- where both original and destination names already show up as existing and pointing to the same file.
A workaround is simply to rename through a temporary name that differs in more than case:
for filename in glob.glob("**/*.mp3", recursive = True):
    withcap = str(filename).title()
    os.rename(filename, withcap+'.tmp')
    os.rename(withcap+'.tmp', withcap)

As mentioned in the comments, the cause is that you are working with a file system that is case insensitive. It considers the old and new name to be the same, so the "rename" becomes a no-op.
You'll have to do two renames for each file: first to a different, temporary name; then to the actual name with modified capitalization.
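The printed output in the question also shows str.title() turning the extension into .Mp3, and on a path with directories it would capitalize those too. A possible sketch that title-cases only the file name stem and then renames in two steps follows; title_case_path and rename_via_temp are hypothetical helper names, and the '.tmp' suffix is assumed not to clash with an existing file:

```python
import os

def title_case_path(path):
    """Title-case only the file name stem, leaving directory and extension alone."""
    directory, filename = os.path.split(path)
    stem, ext = os.path.splitext(filename)
    return os.path.join(directory, stem.title() + ext)

def rename_via_temp(src, dst):
    """Rename in two steps so a case-only change is not treated as a no-op."""
    if src == dst:
        return
    tmp = dst + ".tmp"  # assumed not to clash with an existing file
    os.rename(src, tmp)
    os.rename(tmp, dst)
```

In the question's loop this would be used as rename_via_temp(filename, title_case_path(filename)).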

Iterating through subdirectories to add unique strings to each file

My goal: To build a program that:
Opens a folder (provided by the user) from the user's computer
Iterates through that folder, opening each document in each subdirectory (named according to language codes; "AR," "EN," "ES," etc.)
Substitutes a string in for another string in each document. Crucially, the new string will change with each document (though the old string will not), according to the language code in the folder name.
My level of experience: Minimal; been learning python for a few months but this is the first program I'm building that's not paint-by-numbers. I'm building it to make a process at work faster. I'm sure I'm not building this as efficiently as possible; I've been throwing it together from my own knowledge and from reading stackexchange religiously while building it.
Research I've done on my own: I've been living in stackexchange the past few days, but I haven't found anyone doing quite what I'm doing (which was very surprising to me). I'm not sure if this is just because I lack the vocabulary to search (tried out a lot of search terms, but none of them totally match what I'm doing) or if this is just the wrong way of going about things.
The issue I'm running into:
I'm getting this error:
Traceback (most recent call last):
File "test5.py", line 52, in <module>
for f in os.listdir(src_dir):
OSError: [Errno 20] Not a directory: 'ExploringEduTubingEN(1).txt'
I'm not sure how to iterate through every file in the subdirectories and update a string within each file (not the file names) with a new and unique string. I thought I had it, but this error has totally thrown me off. Prior to this, I was getting an error for the same line that said "Not a file or directory: 'ExploringEduTubingEN(1).txt'" and it's surprising to me that the first error could request a file or a directory, and once I fixed that, it asked for just a directory; seems like it should've just asked for a directory at the beginning.
With no further ado, the code (placing at bottom because it's long to include context):
import os
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
#Asking for a PDF to which we'll iteratively append the language codes from below.
lst = ['_ar.pdf', '_cs.pdf', '_de.pdf', '_el.pdf', '_en_gb.pdf', '_es.pdf', '_es_419.pdf',
       '_fr.pdf', '_id.pdf', '_it.pdf', '_ja.pdf', '_ko.pdf', '_nl.pdf', '_pl.pdf', '_pt_br.pdf', '_pt_pt.pdf', '_ro.pdf', '_ru.pdf',
       '_sv.pdf', '_th.pdf', '_tr.pdf', '_vi.pdf', '_zh_tw.pdf', '_vn.pdf', '_zh_cn.pdf']
#list of language code PDF appending strings.
pdf_list=open('pdflist.txt','w+')
#creating a document to put this group of PDF filepaths in.
pdf2='pdflist.txt'
#making this an actual variable.
for word in lst:
    pdf_list.write(ex + word + "\n")
#creating a version of the PDF example for every item in the language list, and then appending the language codes.
pdf_list.seek(0)
langlist=pdf_list.readlines()
#creating a list of the PDF paths so that I can use it below.
for i in langlist:
    i=i.rstrip("\n")
#removing the line breaks.
pdf_list.close()
#closing the file after removing the line breaks.
file1=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
#the folder provided by the user to iterate through.
folder1=os.listdir(file1)
#creating a list of the files within the folder
pdfpath1="example.pdf"
langfile="example2.pdf"
#setting variables for below
#my thought here is that i'd need to make the variable the initial folder, then make it a list, then iterate through the list.
for ogfile in folder1:
    #want to iterate through all the files in the directory, including in subdirectories
    src_dir=ogfile.split("/",6)
    src_dir="/".join(src_dir[:6])
    #goal here is to cut off the language code folder name and then join it again, w/o language code.
    for f in os.listdir(src_dir):
        f = os.path.join(src_dir, f)
        #i admit this got a little convoluted–i'm trying to make sure the files put the right code in, I.E. that the document from the folder ending in "AR" gets the PDF that will now end in "AR"
        #the perils of pulling from lots of different questions in stackexchange
        with open(ogfile, 'r+') as f:
            content = f.read()
            f.seek(0)
            f.truncate()
            for langfile in langlist:
                f.write(content.replace(pdfpath1, langfile))
                #replacing the placeholder PDF link with the created PDF links from the beginning of the code
If you read this far, thanks. I've tried to provide as much information as possible, especially about my thought process. I'll keep trying things and reading, but I'd love to have more eyes on it.

You have to specify the full path to your directories/files. Use os.path.join to create a valid path to your file or directory (and platform-independent).
For replacing your string, simply modify your example string using the subfolder name. Assuming that ex has the format filename.pdf, you could use: newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'. That way, you do not have to specify the list of replacement strings nor loop through this list.
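As a small variation on that expression, os.path.splitext can split off the extension so that the code does not depend on it being exactly four characters long. This is a sketch only (Python 3); localized_name is a hypothetical helper name:

```python
import os

def localized_name(example, subfolder):
    """Build e.g. 'example_ar.pdf' from 'example.pdf' and the folder name 'AR'."""
    stem, ext = os.path.splitext(example)
    return stem + '_' + subfolder.lower() + ext

print(localized_name('example.pdf', 'AR'))  # example_ar.pdf
```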
Solution
To iterate over your directory and replace the content of your files as you'd like, you can do the following:
import os

# Get the name of the file: "example.pdf" (note the .pdf is assumed here)
ex=raw_input("Please provide an example PDF that we'll append a language code to. ")
# Get the folder to go through
folderpath=raw_input("Please provide the full filepath of the folder you'd like to convert. ")
# Get all subfolders and go through them (named: 'AR', 'DE', etc.)
subfolders=os.listdir(folderpath)
for subfolder in subfolders:
    # Get the full path to the subfolder
    fullsubfolder = os.path.join(folderpath,subfolder)
    # If it is a directory, go through it
    if os.path.isdir(fullsubfolder):
        # Find all files in subdirectory and go through each of them
        files = os.listdir(fullsubfolder)
        for filename in files:
            # Get full path to the file
            fullfile = os.path.join(fullsubfolder, filename)
            # If it is a file, process it (note: we do not check if it is a text file here)
            if os.path.isfile(fullfile):
                with open(fullfile, 'r+') as f:
                    content = f.read()
                    f.seek(0)
                    f.truncate()
                    # Create the replacing string based on the subdirectory name. Ex: 'example_ar.pdf'
                    newstring = ex[:-4] + '_' + str.lower(subfolder) + '.pdf'
                    f.write(content.replace(ex, newstring))
Note
Instead of asking the user to type the folder path, you could let them choose the directory with a dialog box. See this question for more info: Use GUI to open directory in Python 3

"Batch" renaming one file at a time in Python

I would like to perform a sort of "manual" batch operation where Python looks in a directory, sees a list of files, then automatically displays them one at a time and waits for user input before moving on to the next file. I am going to assume the files have relatively random names (and the order in which Python chooses to display them doesn't really matter).
So, I might have pic001.jpg and myCalendar.docx. Is there a way to have Python move through these (in any order) so that I can prepend something to each one manually? For instance, it could look like
Please type a prefix for each of the following:
myCalendar.docx:
and when I typed "2014" the file would become 2014_myCalendar.docx. Python would then go on to say
Please type a prefix for each of the following:
myCalendar.docx: 2014
... myCalendar.docx renamed to 2014_myCalendar.docx
pic001.jpg:
then I could make it disneyland_pic001.jpg.
I know how to rename files, navigate directories, etc. I'm just not sure how to get Python to cycle through every file in a certain directory, one at a time, and let me modify each one. I think this would be really easy to do with a for loop if each of the files was numbered, but for what I'm trying to do, I can't assume that they will be.
Thank you in advance.
Additionally, if you could point me to some tutorials or documentation that might help me with this, I'd appreciate that as well. I've got http://docs.python.org open in a few tabs, but as someone who's relatively new to Python, and programming in general, I find their language to be a little over my head sometimes.

Something like this (untested):
import os

DIR = '/Volumes/foobar'
prefix = raw_input('Please type a prefix for each of the following: ')
for f in os.listdir(DIR):
    path = os.path.join(DIR, f)
    new_path = os.path.join(DIR, '%s%s' % (prefix, f))
    try:
        os.rename(path, new_path)
        print 'renamed', f
    except:
        raise
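The snippet above asks for one prefix and applies it to every file. To prompt once per file, as the question describes, a sketch along these lines may be closer (Python 3, where input replaces raw_input). prefix_files and its ask parameter are hypothetical names, and the underscore separator is taken from the 2014_myCalendar.docx example in the question:

```python
import os

def prefix_files(directory, ask=input):
    """Prompt for a prefix for each file in directory and rename it.

    An empty answer leaves the file unchanged.
    """
    renamed = []
    # The listing is taken once, so files renamed in the loop are not revisited.
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories
        prefix = ask(name + ': ').strip()
        if not prefix:
            continue
        new_name = prefix + '_' + name
        os.rename(path, os.path.join(directory, new_name))
        print('...', name, 'renamed to', new_name)
        renamed.append((name, new_name))
    return renamed
```

Passing the prompt function in as ask keeps the loop easy to try out with canned answers before pointing it at real files.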

Python File System Reader Performance

I need to scan a file system for a list of files, and log those that don't exist. Currently I have an input file with a list of the 13 million files which need to be investigated. This script needs to be run from a remote location, as I do not have access/cannot run scripts directly on the storage server.
My current approach works, but is relatively slow. I'm still fairly new to Python, so I'm looking for tips on speeding things up.
import sys,os
from pz import padZero #prepends 0's to string until desired length
output = open('./out.txt', 'w')
input = open('./in.txt', 'r')
rootPath = '\\\\server\\share' #UNC path to storage
for ifid in input:
    ifid = padZero(str(ifid)[:-1], 8) #extracts/formats fileName
    dir = padZero(str(ifid)[:-3], 5) #extracts/formats the directory containing the file
    fPath = rootPath + '\\' + dir + '\\' + ifid + '.tif'
    try:
        size = os.path.getsize(fPath) #don't actually need size, better approach?
    except:
        output.write(ifid+'\n')
Thanks.

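import collections
import glob
import os

root = r'\\server\share'  # UNC path to storage, as in the question
dirs = collections.defaultdict(set)
for file_path in input:  # input is the open in.txt file from the question
    file_path = file_path.strip().rjust(8, "0")
    dir, name = file_path[:-3], file_path + ".tif"
    dirs[dir].add(name)
for dir, files in dirs.iteritems():
    # List the TIFF files actually present in this directory on the server
    present = set(os.path.basename(p) for p in glob.glob(os.path.join(root, dir, "*.tif")))
    for missing_file in files - present:
        print missing_file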
Explanation
First read the input file into a dictionary of directory: filename. Then for each directory, list all the TIFF files in that directory on the server, and (set) subtract this from the collection of filenames you should have. Print anything that's left.
EDIT: Fixed silly things. Too late at night when I wrote this!

That padZero and string concatenation stuff looks to me like it would take a good percent of time.
What you want it to do is spend all its time reading the directory, very little else.
Do you have to do it in python? I've done similar stuff in C and C++. Java should be pretty good too.

You're going to be I/O bound, especially on a network, so any changes you can make to your script will result in very minimal speedups, but off the top of my head:
import os
input, output = open("in.txt"), open("out.txt", "w")
root = r'\\server\share'
for fid in input:
    fid = fid.strip().rjust(8, "0")
    dir = fid[:-3] # no need to re-pad
    path = os.path.join(root, dir, fid + ".tif")
    if not os.path.isfile(path):
        output.write(fid + "\n")
I don't really expect that to be any faster, but it is arguably easier to read.
Other approaches may be faster. For example, if you expect to touch most of the files, you could just pull a complete recursive directory listing from the server, convert it to a Python set(), and check for membership in that rather than hitting the server for many small requests. I will leave the code as an exercise...
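A rough sketch of that exercise, in Python 3 and assuming the share can be walked from the client: collect every relative path once with os.walk, then test each id against the resulting set. list_all_files and missing_ids are hypothetical names; the padding and the .tif layout follow the question. Set membership is case-sensitive, so names on the server are assumed to match the expected case exactly.

```python
import os

def list_all_files(root):
    """Walk root once and return the set of file paths relative to root."""
    found = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            found.add(os.path.relpath(full, root))
    return found

def missing_ids(ids, existing):
    """Return the padded ids whose <dir>/<id>.tif is not in existing."""
    missing = []
    for fid in ids:
        fid = fid.strip().rjust(8, "0")
        if os.path.join(fid[:-3], fid + ".tif") not in existing:
            missing.append(fid)
    return missing
```

Whether this beats one request per file depends on how much of the share the 13 million ids cover; walking the whole tree only pays off if most directories have to be visited anyway.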

I would probably use a shell command to get the full listing of files in all directories and subdirectories in one hit. Hopefully this will minimise the amount of requests you need to make to the server.
You can get a listing of the remote server's files by doing something like:
Linux: mount the shared drive as /shared/directory/ and then do ls -R /shared/directory > ~/remote_file_list.txt
Windows: Use Map Network Drive to mount the shared drive as drive letter X:, then do dir /S X:\shared_directory > C:\remote_file_list.txt
Use the same methods to create a listing of your local folder's contents as local_file_list.txt. Your Python script will then reduce to an exercise in text processing.
Note: I did actually have to do this at work.

Detecting case mismatch on filename in Windows (preferably using python)?

I have some XML configuration files that we create in a Windows environment but that are deployed on Linux. These configuration files reference each other with file paths. We've had problems with case sensitivity and trailing spaces before, and I'd like to write a script that checks for these problems. We have Cygwin if that helps.
Example:
Let's say I have a reference to the file foo/bar/baz.xml, I'd do this
<someTag fileref="foo/bar/baz.xml" />
Now if we by mistake do this:
<someTag fileref="fOo/baR/baz.Xml " />
It will still work on Windows, but it will fail on Linux.
What I want to do is detect the cases where a file reference in these files doesn't match the real file with respect to case sensitivity.

os.listdir on a directory, in all case-preserving filesystems (including those on Windows), returns the actual case for the filenames in the directory you're listing.
So you need to do this check at each level of the path:
import os

def onelevelok(parent, thislevel):
    for fn in os.listdir(parent):
        if fn.lower() == thislevel.lower():
            return fn == thislevel
    raise ValueError('No %r in dir %r!' % (
        thislevel, parent))
where I'm assuming that the complete absence of any case variation of a name is a different kind of error, and using an exception for that; and, for the whole path (assuming no drive letters or UNC that wouldn't translate to Linux anyway):
def allpathok(path):
    levels = [p for p in path.split('/') if p]
    if os.path.isabs(path):
        top = '/'
    else:
        top = '.'
    # parents[i] is the directory that should contain levels[i]
    parents = [os.path.join(top, *levels[:i]) for i in range(len(levels))]
    return all(onelevelok(p, t)
               for p, t in zip(parents, levels))
You may need to adapt this if, e.g., foo/bar is not to be taken to mean that foo is in the current directory, but somewhere else; or, of course, if UNC or drive letters are in fact needed (but as I mentioned translating them to Linux is not trivial anyway;-).
Implementation notes: os.path.split only separates the last component from the rest of the path, so the path is split on '/' instead, and parents[i] is built as the directory expected to contain levels[i]; zip then pairs each level with its parent. all will short circuit where it can, returning False as soon as it detects a false value, so it's just as good as an explicit loop but faster and more concise.
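Since the question also mentions trailing spaces, here is a self-contained sketch (Python 3) that reports every problem in a reference instead of returning a single boolean. check_ref and its message strings are hypothetical, and references are assumed to be '/'-separated and relative to a known base directory:

```python
import os

def check_ref(base_dir, ref):
    """Return a list of problems found in a '/'-separated file reference."""
    problems = []
    if ref != ref.strip():
        problems.append('leading or trailing whitespace')
    parent = base_dir
    for part in [p for p in ref.strip().split('/') if p]:
        # os.listdir reports names with their real case, even on Windows.
        entries = os.listdir(parent) if os.path.isdir(parent) else []
        if part in entries:
            parent = os.path.join(parent, part)
            continue
        matches = [e for e in entries if e.lower() == part.lower()]
        if matches:
            problems.append('case mismatch: %r should be %r' % (part, matches[0]))
            parent = os.path.join(parent, matches[0])
        else:
            problems.append('not found: %r' % part)
            break
    return problems
```

An empty list means the reference matches the files on disk exactly; anything else can be logged next to the XML file that contains the reference.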

It's hard to judge what exactly your problem is, but if you apply os.path.normcase along with str.strip before saving your file name, it should solve all your problems.
As I said in a comment, it's not clear how you are ending up with such a mistake. However, it would be trivial to check for an existing file, as long as you have some sensible convention (all file names are lower case, for example):
try:
    open(fname)
except IOError:
    open(fname.lower())
