I need to move a directory from one location to another on the same filesystem. I'm aware of solutions like shutil.move(), but the filesystem in question is on an SD card (and therefore extremely slow), and there are a lot of files to move, so simply copying them and then deleting the originals is not acceptable. The Unix mv command can move a directory within a single filesystem without copying any files. Is there a way to do that in Python?
It turns out that the answer is yes. As you probably know, you can move a file (assuming it doesn't already exist in the destination) using os.rename(r'D:\path1\myfile.txt', r'D:\path2\myfile.txt'). You can do the same for directories:
os.rename(r'D:\long\path\to\mydir', r'D:\mydir')
but, of course, that only works if D:\mydir doesn't already exist. If it does exist, and you want to merge the files that are already there with the files that you're moving, you'll need to get a little bit more clever. Here's a snippet that'll do what you want:
import os
def movedir(src, dst):
    try:
        os.rename(src, dst)
        return
    except FileExistsError:
        pass
    for root, dirs, files in os.walk(src):
        dest_root = os.path.join(dst, os.path.relpath(root, src))
        done = []
        for dir_ in dirs:
            try:
                os.rename(os.path.join(root, dir_), os.path.join(dest_root, dir_))
                done.append(dir_)
            except FileExistsError:
                pass
        for dir_ in done:
            dirs.remove(dir_)
        for file in files:
            os.replace(os.path.join(root, file), os.path.join(dest_root, file))
    for root, dirs, files in os.walk(src, topdown=False):
        os.rmdir(root)
Here's a version with comments explaining what everything does:
import os
def movedir(src, dst):
    # if a directory of the same name does not exist in the destination, we can simply rename the directory
    # to a different path, and it will be moved: it will disappear from the source path and appear in the destination
    # path instantaneously, without any files being copied.
    try:
        os.rename(src, dst)
        return
    except FileExistsError:
        # if a directory of the same name already exists, we must merge them. This is what the algorithm below does.
        pass
    for root, dirs, files in os.walk(src):
        dest_root = os.path.join(dst, os.path.relpath(root, src))
        done = []
        for dir_ in dirs:
            try:
                os.rename(os.path.join(root, dir_), os.path.join(dest_root, dir_))
                done.append(dir_)
            except FileExistsError:
                pass
        # tell os.walk() not to recurse into subdirectories we've already moved. see the documentation on os.walk()
        # for why this works: https://docs.python.org/3/library/os.html#os.walk
        # a list shouldn't be modified while you iterate over it (items would be skipped), so we put all the items
        # we want to remove from the list into a second list, and then remove them after the loop.
        for dir_ in done:
            dirs.remove(dir_)
        # move files. os.replace() is a bit like os.rename() but if there's an existing file in the destination with
        # the same name, it will be deleted and replaced with the source file without prompting the user. It can't
        # merge directories, so we only use it for files.
        # You may want to change this to os.rename() and surround it with a try/except FileExistsError if you
        # want to prompt the user to overwrite files.
        for file in files:
            os.replace(os.path.join(root, file), os.path.join(dest_root, file))
    # clean up after ourselves.
    # Directories we were able to successfully move just by renaming them (directories that didn't exist in the
    # destination already) have already disappeared from the source. Directories we had to merge are still there in
    # the source, but their contents were moved. os.rmdir() will fail unless the directory is already empty.
    for root, dirs, files in os.walk(src, topdown=False):
        os.rmdir(root)
movedir(r'D:\long\path\to\mydir', r'D:\mydir')
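The except FileExistsError clauses above match what Windows raises when the destination exists. On Linux and macOS, renaming a directory onto an existing non-empty directory typically fails with ENOTEMPTY instead, which is a plain OSError and would slip past those clauses (and renaming onto an empty directory simply succeeds). Below is a minimal sketch of a variant that checks the errno instead; the name movedir_portable is hypothetical, and it assumes src and dst are on the same filesystem and that no name is a file on one side and a directory on the other.

```python
import errno
import os


def movedir_portable(src, dst):
    """Move the directory src to dst by renaming; merge into dst if it exists.

    Sketch only. Assumes src and dst are on the same filesystem and that no
    name is a file on one side and a directory on the other.
    """
    try:
        os.rename(src, dst)  # instant when dst doesn't exist (or is empty, on POSIX)
        return
    except OSError as e:
        # Windows reports an existing dst as EEXIST (FileExistsError);
        # Linux and macOS typically report a non-empty dst as ENOTEMPTY.
        if e.errno not in (errno.EEXIST, errno.ENOTEMPTY):
            raise
    for root, dirs, files in os.walk(src):
        dest_root = os.path.join(dst, os.path.relpath(root, src))
        moved = []
        for d in dirs:
            target = os.path.join(dest_root, d)
            if not os.path.exists(target):
                # Nothing to merge with: move the whole subtree in one rename.
                os.rename(os.path.join(root, d), target)
                moved.append(d)
        # Prune moved subtrees so os.walk doesn't try to descend into them.
        dirs[:] = [d for d in dirs if d not in moved]
        for f in files:
            # os.replace overwrites an existing file of the same name.
            os.replace(os.path.join(root, f), os.path.join(dest_root, f))
    # Only empty directories remain under src; remove them bottom-up.
    for root, _dirs, _files in os.walk(src, topdown=False):
        os.rmdir(root)
```

Checking os.path.exists() before each subdirectory rename, rather than catching an exception, sidesteps the platform differences in which error a failed rename raises.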
Please note that using os.rename() in this manner only works if the source path and the destination path are on the same filesystem (on Windows, this is true if they have the same drive letter). If they're on different drives (e.g. one is on C: and the other is on D:), or if one of the paths contains a reparse point (if you don't know what that is, don't worry about it; you'll probably never encounter one), you will need to use shutil.move(), which copies the files and then deletes them from the source. This is what Windows does when you move files between drives, and it takes about as long to finish.
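If you'd rather check up front whether a rename is possible, one option is to compare the st_dev field that os.stat() reports for the source and for the place the destination would live. A minimal sketch follows; same_filesystem is a hypothetical helper, and the assumption that st_dev distinguishes drives on Windows (Python 3 fills it with the volume serial number there) is worth verifying on your setup.

```python
import os


def same_filesystem(src, dst):
    """Return True if src and the place dst would live share a device (st_dev)."""
    # dst usually doesn't exist yet, so climb to its nearest existing ancestor.
    probe = os.path.abspath(dst)
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:  # reached the root without finding anything
            break
        probe = parent
    return os.stat(src).st_dev == os.stat(probe).st_dev
```

When same_filesystem(src, dst) returns False, os.rename() would fail (with EXDEV on POSIX), so that is the case where falling back to shutil.move() makes sense.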
Related
I copied a (presumably large) number of files onto an existing directory, and I need to reverse the action. The target directory contains a number of other files that I need to keep, which makes it impossible to simply remove all files from the directory. I was able to do it with Python. Here's the script:
import os, sys, shutil
source = "/tmp/test/source"
target = "/tmp/test/target"
for root, dirs, files in os.walk(source):  # for files and directories in source
    for dir in dirs:
        if dir.startswith("."):
            print(f"Removing Hidden Directory: {dir}")
        else:
            print(f"Removing Directory: {dir}")
        try:
            shutil.rmtree(f"{target}/{dir}")  # remove directories and sub-directories
        except FileNotFoundError:
            pass
    for file in files:
        if file.startswith("."):  # if filename starts with a dot, it's a hidden file
            print(f"Removing Hidden File: {file}")
        else:
            print(f"Removing File: {file}")
        try:
            os.remove(f"{target}/{file}")  # remove files
        except FileNotFoundError:
            pass
print("Done")
The script above lists the files in the original (source) directory. Then it looks in the directory you copied the files to (target) and removes only the listed files, as named in the source directory.
How can I do the same thing in Go? I tried filepath.WalkDir(), but as stated in the docs:
WalkDir walks the file tree rooted at root, calling fn for each file
or directory in the tree, including root.
If WalkDir() includes the root, then os.Remove() or os.RemoveAll() will delete the whole thing.
Answered by Cerise Limon: use os.ReadDir to read the source directory's entries. For each entry, call os.RemoveAll on the corresponding path in the target.
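The same idea carries back to Python. Because the Python script above walks source recursively, it joins even nested names onto the top level of target, so a file deep inside source can delete an unrelated file of the same name at the top of target. Listing only the top level of source, and removing each matching entry (and everything beneath it) from target, avoids that. A minimal sketch; undo_copy is a hypothetical name, and shutil.rmtree plays the role of Go's os.RemoveAll.

```python
import os
import shutil


def undo_copy(source, target):
    """Remove from target every top-level entry whose name exists in source."""
    # Only the top level of source is listed; nested contents disappear
    # together with their top-level directory.
    with os.scandir(source) as entries:
        for entry in entries:
            victim = os.path.join(target, entry.name)
            if os.path.isdir(victim) and not os.path.islink(victim):
                shutil.rmtree(victim)
            elif os.path.lexists(victim):
                os.remove(victim)  # regular file or symlink
```

Entries that exist only in target are never looked at, so the files you need to keep stay put.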
I would like to have a Folders_To_Skip.txt file containing a list of directories, one per line. For example:
A:\\stuff\a\b\
A:\\junk\a\b\
Some files are breaking the .csv record compilation this script is used for, and I want to exclude directories that I have no use for reading anyway.
In the locate function I tried to implement the approach from Excluding directories in os.walk, but I can't get it to work with a list of directories, let alone with one read from a text file: when I print the files accessed, it still includes files in the directories I tried to exclude.
Could you also explain whether the solution would only exclude the specific directories listed (not the end of the world) or whether it can also exclude their subdirectories (which would be more convenient)?
Right now, the code preceding locate makes it easy to find the control text files and load their contents as lists for the rest of the script. It assumes that all control files are in the same location, but that location can change depending on who runs the script and from where.
Also, for testing purposes, Drive_Locations.txt is set up as:
A
B
Here is the current script:
import os
from tkinter import filedialog
import fnmatch
input('Press Enter to select any file in writing directory or associated control files...')
fname = filedialog.askopenfilename()
fpath = os.path.split(fname)
# Set location for Drive Locations to scan
Disk_Locations = os.path.join(fpath[0], r'Drive_Locations.txt')
# Set location for Folders to ignore such as program files
Ignore = os.path.join(fpath[0], r'Folders_To_Skip.txt')
# Opens list of Drive Locations to be sampled
with open(Disk_Locations, 'r') as Drives:
    Drive = Drives.readlines()
    Drive = [x.replace('\n', '') for x in Drive]
# Iterable list for directories to be excluded
with open(Ignore, 'r') as SkipF1:
    Skip_Fld = SkipF1.readlines()
    Skip_Fld = [x.replace('\n', '') for x in Skip_Fld]
# Locates file in entire file tree from previously established parent directory.
def locate(pattern, root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        dirs[:] = [d for d in dirs if d not in Skip_Fld]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)
for disk in Drive:
    # Formats Drive Location for acceptance
    disk = str.upper(disk)
    if str.find(disk, ':') < 0:
        disk = disk + ':'
    # Changes the current disk drive
    if os.path.exists(disk):
        os.chdir(disk)
    # If disk incorrect skip to next disk
    else:
        continue
    for exist_csv in locate('*.csv'):
        # Skip compiled record output files in search
        print(exist_csv)
The central bug here is that the dirs list yielded by os.walk() contains bare directory names, not full paths. So, for example, when you are in the directory A:\stuff\a, the directory you want to skip is listed simply as b, not as A:\stuff\a\b, and so your skip logic never finds anything to remove from the list of subdirectories of the current directory.
Here's a refactoring which examines the current directory directly instead.
def locate(pattern, root=os.curdir):
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        if path not in Skip_Fld:
            for filename in fnmatch.filter(files, pattern):
                yield os.path.join(path, filename)
The abspath call is important to keep; good on you for including that in your attempt.
Your list of directories to skip should have single backslashes, or perhaps forward slashes, and probably no final directory separator (I fortunately have no way to check how these are reported by os.walk() on Windows).
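Checking path skips the files directly inside a listed directory, but os.walk() still descends into it. If you want a listed directory to exclude its subdirectories as well, you can keep the dirs[:] pruning from the question and compare full paths instead of bare names. A minimal sketch, assuming the skip list is normalized with os.path.normpath and os.path.normcase so that trailing or doubled separators and letter case don't matter; load_skip_list and the explicit skip parameter are hypothetical replacements for the global Skip_Fld.

```python
import fnmatch
import os


def load_skip_list(lines):
    """Turn raw lines from Folders_To_Skip.txt into comparable path keys."""
    # normpath drops a trailing separator and collapses doubled ones;
    # normcase lowercases and unifies slashes on Windows (no-op elsewhere).
    return {os.path.normcase(os.path.normpath(line.strip()))
            for line in lines if line.strip()}


def locate(pattern, root, skip):
    """Yield files matching pattern under root, never entering skipped dirs."""
    for path, dirs, files in os.walk(os.path.abspath(root), topdown=True):
        # Compare full paths, not bare names; pruning a directory here
        # also skips everything beneath it.
        dirs[:] = [d for d in dirs
                   if os.path.normcase(os.path.join(path, d)) not in skip]
        for filename in fnmatch.filter(files, pattern):
            yield os.path.join(path, filename)
```

Used with the question's file, that would be something like skip = load_skip_list(open(Ignore)) followed by locate('*.csv', disk, skip).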
I have this script, which I have no doubt is flawed:
import fnmatch, os, sys
def findit (rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print (folder)
    for filename in fnmatch.filter(files,pattern):
        with open(filename) as f:
            s = f.read()
            f.close()
        if find in s :
            print(filename)
findit(sys.argv[1], sys.argv[2], sys.argv[3])
When I run it I get Errno 2, no such file or directory, but the file exists. For instance, if I run findit.py c:\python "folder" *.py it works just fine, listing all the *.py files that contain the word "folder". But if I run findit.py c:\php\projects1 "include" *.php,
I get [Errno 2] No such file or directory: 'About.php' (for example), even though About.php exists. I don't understand what it's doing, or what I'm doing wrong.
If you look at any of the examples for os.walk, you'll see that they all do os.path.join(root, name). You need to do that too.
Why? Quoting from the docs:
filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
If you just use the filename as a path, it's going to look for a file of the same name in the current working directory. If there's no such file, you'll get a FileNotFoundError. If there is such a file, you'll open and read the wrong file. Only if you happen to be looking inside the current working directory will it work.
There's also another major problem in your code: os.walk walks a directory tree recursively, finding all files in the given top directory, or any subdirectory of top, or any subdirectory of… and so on, yielding once for each directory. But you're not doing anything useful with that (except printing out the folders). Instead, you wait until it finishes, and then use the files from whichever directory it happened to reach last.
If you just want to get a flat listing of the files directly in a directory, use os.listdir, not os.walk. (Or maybe use glob.glob instead of explicitly listing everything then filtering with fnmatch.)
On the other hand, if you want to walk the tree, you have to move your second for loop inside the first one.
You've also got a minor problem: You call f.close() inside a with open(…) as f:, which leads to f being closed twice. This is guaranteed to be completely harmless (at least in 2.5+, including 3.x), but it's still a bad idea.
Putting it together, here's a working version of your code:
import fnmatch, os, sys
def findit(rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print(folder)
        for filename in fnmatch.filter(files, pattern):
            pathname = os.path.join(folder, filename)
            with open(pathname) as f:
                s = f.read()
            if find in s:
                print(pathname)
findit(sys.argv[1], sys.argv[2], sys.argv[3])
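For comparison, here is a minimal sketch of the glob.glob route mentioned above; findit_glob and its recursive flag are hypothetical. The point it illustrates is that glob returns paths that already include rootdir, so there is no bare filename to misresolve against the current directory. One behavioral difference to be aware of: unlike os.walk plus fnmatch, glob's wildcards don't match names that begin with a dot.

```python
import glob
import os


def findit_glob(rootdir, find, pattern, recursive=False):
    """Yield paths of files matching pattern whose text contains find."""
    base = glob.escape(rootdir)  # treat any [ ] * ? in rootdir literally
    if recursive:
        # '**' with recursive=True matches rootdir itself and every subdirectory.
        spec = os.path.join(base, "**", pattern)
    else:
        spec = os.path.join(base, pattern)
    # glob returns paths that already include rootdir, so no join is needed.
    for pathname in glob.glob(spec, recursive=recursive):
        if os.path.isfile(pathname):
            with open(pathname, errors="replace") as f:
                if find in f.read():
                    yield pathname
```

For example, for p in findit_glob(r'c:\php\projects1', 'include', '*.php', recursive=True): print(p).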
You are using a relative filename, but your current directory does not contain the file (and you don't want to search there anyway). Use os.path.join(folder, filename) to build a path that points at the file inside the directory os.walk is currently visiting.
I'm uploading a zipped folder that contains a folder of text files, but it's not detecting that the folder that is zipped up is a directory. I think it might have something to do with requiring an absolute path in the os.path.isdir call, but can't seem to figure out how to implement that.
zipped = zipfile.ZipFile(request.FILES['content'])
for libitem in zipped.namelist():
    if libitem.startswith('__MACOSX/'):
        continue
    # If it's a directory, open it
    if os.path.isdir(libitem):
        print "You have hit a directory in the zip folder -- we must open it before continuing"
        for item in os.listdir(libitem):
The file you've uploaded is a single zip file which is simply a container for other files and directories. All of the Python os.path functions operate on files on your local file system which means you must first extract the contents of your zip before you can use os.path or os.listdir.
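Also note that a ZipFile object only gives you the names stored in the archive. Directory entries, when the archive contains them at all, are names ending in '/' (Python 3.6+ exposes this as ZipInfo.is_dir()), but many archives omit them, so you can't rely on every directory appearing as its own entry, and os.path can't tell you anything about these names.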
A rewrite of your code which extracts first may look something like this:
import os
import shutil
import tempfile
import zipfile
# Create a temporary directory into which we can extract zip contents.
tmpdir = tempfile.mkdtemp()
try:
    zipped = zipfile.ZipFile(request.FILES['content'])
    zipped.extractall(tmpdir)
    # Walk through the extracted directory structure doing what you
    # want with each file.
    for (dirpath, dirnames, filenames) in os.walk(tmpdir):
        # Look into subdirectories?
        for dirname in dirnames:
            full_dir_path = os.path.join(dirpath, dirname)
            # Do stuff in this directory
        for filename in filenames:
            full_file_path = os.path.join(dirpath, filename)
            # Do stuff with this file.
finally:
    # Clean up the temporary directory recursively.
    shutil.rmtree(tmpdir)
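To see the extract-then-walk approach end to end without a web request, here is a self-contained sketch in which an in-memory io.BytesIO archive stands in for request.FILES['content'] (an assumption: Django's uploaded-file objects are file-like and seekable, which is what ZipFile needs). list_zip_contents is a hypothetical name; tempfile.TemporaryDirectory handles the cleanup that the finally block does above.

```python
import io
import os
import tempfile
import zipfile


def list_zip_contents(fileobj):
    """Extract a zip to a temp dir; return (dirs, files) relative to its root."""
    dirs, files = [], []
    # TemporaryDirectory deletes the extracted tree when the with-block exits.
    with tempfile.TemporaryDirectory() as tmpdir:
        with zipfile.ZipFile(fileobj) as zipped:
            zipped.extractall(tmpdir)
        for dirpath, dirnames, filenames in os.walk(tmpdir):
            rel = os.path.relpath(dirpath, tmpdir)
            # os.path.isdir() works here because these paths exist on disk.
            dirs.extend(os.path.normpath(os.path.join(rel, d)) for d in dirnames)
            files.extend(os.path.normpath(os.path.join(rel, f)) for f in filenames)
    return sorted(dirs), sorted(files)


# Build a small archive in memory to try it out.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("docs/a.txt", "x")
    z.writestr("docs/sub/b.txt", "y")
buf.seek(0)
print(list_zip_contents(buf))
```

Even though the archive above has no explicit directory entries, extractall() creates docs and docs/sub on disk, so os.walk reports them.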
When a script needs to handle relative paths reliably, you'd usually want to build them with os.path.
It seems to me that you're reading item names from a ZipFile that you've never actually unzipped, so why would you expect those files/dirs to exist on disk?
Usually I'd print os.getcwd() to find out where I am, and use os.path.join to join names with the root of the data directory. Whether that is the same as the directory containing the script I can't tell; if it is, you can get it with something like scriptdir = os.path.dirname(os.path.abspath(__file__)).
I'd expect you would have to do something like
libitempath = os.path.join(scriptdir, libitem)
if os.path.isdir(libitempath):
    ...
But I'm guessing at what you're doing as it's a little unclear for me.
On a Mac, in Python 2.7, when walking through directories with os.walk, my script goes into 'apps' (i.e. appname.app), since those are really just directories themselves. Later on in processing I hit errors when going through them. I don't want to go through them anyway, so for my purposes it would be best to just ignore those types of 'directories'.
So this is my current solution:
for root, subdirs, files in os.walk(directory, True):
    for subdir in subdirs:
        if '.' in subdir:
            subdirs.remove(subdir)
    #do more stuff
As you can see, the second for loop runs on every iteration of the walk, which seems unnecessary, since the first pass removes everything I want to remove anyway.
There must be a more efficient way to do this. Any ideas?
You can do something like this (assuming you want to ignore directories containing '.'):
subdirs[:] = [d for d in subdirs if '.' not in d]
The slice assignment (rather than just subdirs = ...) is necessary because you need to modify the same list that os.walk is using, not create a new one.
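Note that your original code is incorrect because it modifies the list while iterating over it. That doesn't raise an error, but it silently skips items: after a removal, the next item shifts into the slot the loop has already passed, so two matching names in a row (say a.app followed by b.app) leave the second one in the list.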
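A small sketch that reproduces the skipped-item problem and shows the slice-assignment pruning on a real directory tree. Both function names are hypothetical, and the endswith(".app") test is a narrower filter than the '.' in d check above, chosen here so that ordinary dotted folder names are still walked; swap it back if you really want to skip every name containing a dot.

```python
import os


def remove_while_iterating(names):
    """Reproduce the question's pattern: remove items from the list being looped over."""
    names = list(names)
    for n in names:
        if "." in n:
            names.remove(n)  # the following item slides into this slot and is never visited
    return names


def walk_skipping_bundles(top):
    """Like os.walk(top), but never descends into directories named '*.app'."""
    for root, subdirs, files in os.walk(top):
        # Slice assignment mutates the very list os.walk recurses into.
        subdirs[:] = [d for d in subdirs if not d.endswith(".app")]
        yield root, subdirs, files
```

remove_while_iterating(["a.app", "b.app", "src"]) returns ["b.app", "src"]: b.app survives because it moved into position 0 after a.app was removed, and the loop had already moved on to position 1.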
Perhaps this example from the Python docs for os.walk will be helpful. It works from the bottom up (deleting).
# Delete everything reachable from the directory named in "top",
# assuming there are no symbolic links.
# CAUTION: This is dangerous! For example, if top == '/', it
# could delete all your disk files.
import os
for root, dirs, files in os.walk(top, topdown=False):
    for name in files:
        os.remove(os.path.join(root, name))
    for name in dirs:
        os.rmdir(os.path.join(root, name))
I am a bit confused about your goal. Are you trying to remove a directory subtree and encountering errors, or are you trying to walk a tree and just list simple file names (excluding directory names)?
I think all that is required is to remove the unwanted directories from subdirs before iterating over it:
for root, subdirs, files in os.walk(directory, True):
    subdirs[:] = [d for d in subdirs if '.' not in d]
    for subdir in subdirs:
        pass  # do more stuff