I have a directory, 'Dst Directory', which has files and folders in it and I have 'src Directory' which also has files and folders in it. What I want to do is move the contents of 'src Directory' to 'Dst Directory' and overwrite anyfiles that exist with the same name. So for example 'Src Directory\file.txt' needs to be moved to 'Dst Directory\' and overwrite the existing file.txt. The same applies for some folders, moving a folder and merging the contents with the same folder in 'dst directory'
I'm currently using shutil.move to move the contents of src to dst but it won't do it if the files already exist and it won't merge folders; it'll just put the folder inside the existing folder.
Update: To make things a bit clearer, what I'm doing is unzipping an archive to the Dst Directory and then moving the contents of Src Directory there and rezipping, effectively updating files in the zip archive. This will be repeated for adding new files or new versions of files etc which is why it needs to overwrite and merge
Solved: I solved my problem by using distutils.dir_util.copy_tree(src, dst), this copies the folders and files from src directory to dst directory and overwrites/merges where neccesary.
This will go through the source directory, create any directories that do not already exist in destination directory, and move files from source to the destination directory:
import os
import shutil
root_src_dir = 'Src Directory\\'
root_dst_dir = 'Dst Directory\\'
for src_dir, dirs, files in os.walk(root_src_dir):
dst_dir = src_dir.replace(root_src_dir, root_dst_dir, 1)
if not os.path.exists(dst_dir):
os.makedirs(dst_dir)
for file_ in files:
src_file = os.path.join(src_dir, file_)
dst_file = os.path.join(dst_dir, file_)
if os.path.exists(dst_file):
# in case of the src and dst are the same file
if os.path.samefile(src_file, dst_file):
continue
os.remove(dst_file)
shutil.move(src_file, dst_dir)
Any pre-existing files will be removed first (via os.remove) before being replace by the corresponding source file. Any files or directories that already exist in the destination but not in the source will remain untouched.
Use copy() instead, which is willing to overwrite destination files. If you then want the first tree to go away, just rmtree() it separately once you are done iterating over it.
http://docs.python.org/library/shutil.html#shutil.copy
http://docs.python.org/library/shutil.html#shutil.rmtree
Update:
Do an os.walk() over the source tree. For each directory, check if it exists on the destination side, and os.makedirs() it if it is missing. For each file, simply shutil.copy() and the file will be created or overwritten, whichever is appropriate.
Since none of the above worked for me, so I wrote my own recursive function. Call Function copyTree(dir1, dir2) to merge directories. Run on multi-platforms Linux and Windows.
def forceMergeFlatDir(srcDir, dstDir):
if not os.path.exists(dstDir):
os.makedirs(dstDir)
for item in os.listdir(srcDir):
srcFile = os.path.join(srcDir, item)
dstFile = os.path.join(dstDir, item)
forceCopyFile(srcFile, dstFile)
def forceCopyFile (sfile, dfile):
if os.path.isfile(sfile):
shutil.copy2(sfile, dfile)
def isAFlatDir(sDir):
for item in os.listdir(sDir):
sItem = os.path.join(sDir, item)
if os.path.isdir(sItem):
return False
return True
def copyTree(src, dst):
for item in os.listdir(src):
s = os.path.join(src, item)
d = os.path.join(dst, item)
if os.path.isfile(s):
if not os.path.exists(dst):
os.makedirs(dst)
forceCopyFile(s,d)
if os.path.isdir(s):
isRecursive = not isAFlatDir(s)
if isRecursive:
copyTree(s, d)
else:
forceMergeFlatDir(s, d)
You can use this to copy directory overwriting existing files:
import shutil
shutil.copytree("src", "dst", dirs_exist_ok=True)
dirs_exist_ok argument was added in Python 3.8.
See docs: https://docs.python.org/3/library/shutil.html#shutil.copytree
If you also need to overwrite files with read only flag use this:
def copyDirTree(root_src_dir,root_dst_dir):
"""
Copy directory tree. Overwrites also read only files.
:param root_src_dir: source directory
:param root_dst_dir: destination directory
"""
for src_dir, dirs, files in os.walk(root_src_dir):
dst_dir = src_dir.replace(root_src_dir, root_dst_dir, 1)
if not os.path.exists(dst_dir):
os.makedirs(dst_dir)
for file_ in files:
src_file = os.path.join(src_dir, file_)
dst_file = os.path.join(dst_dir, file_)
if os.path.exists(dst_file):
try:
os.remove(dst_file)
except PermissionError as exc:
os.chmod(dst_file, stat.S_IWUSR)
os.remove(dst_file)
shutil.copy(src_file, dst_dir)
Have a look at: os.remove to remove existing files.
I had a similar problem. I wanted to move files and folder structures and overwrite existing files, but not delete anything which is in the destination folder structure.
I solved it by using os.walk(), recursively calling my function and using shutil.move() on files which I wanted to overwrite and folders which did not exist.
It works like shutil.move(), but with the benefit that existing files are only overwritten, but not deleted.
import os
import shutil
def moverecursively(source_folder, destination_folder):
basename = os.path.basename(source_folder)
dest_dir = os.path.join(destination_folder, basename)
if not os.path.exists(dest_dir):
shutil.move(source_folder, destination_folder)
else:
dst_path = os.path.join(destination_folder, basename)
for root, dirs, files in os.walk(source_folder):
for item in files:
src_path = os.path.join(root, item)
if os.path.exists(dst_file):
os.remove(dst_file)
shutil.move(src_path, dst_path)
for item in dirs:
src_path = os.path.join(root, item)
moverecursively(src_path, dst_path)
Related
How can I extract all the .zip files in a certain directory to its parent directory?
I tried:
import zipfile
parent_directory = '../input'
directory = '../input/zip'
for f in os.listdir(directory):
with zipfile.ZipFile(os.path.join(directory,f), "r") as z:
z.extractall(parent_directory)
However the unzipped files are not saved in '..input/zip', they are saved in nested folders
This might be a bit exaggerated.
After files are unzipped, I run this to:
move the original .zip file up one directory level. (to avoid /src_filename' already exists error)
move all files from all subdirectories into the zip parent directory.
move the original .zip file back into the parent directory.
import os
import shutil
src = r'C:\Users\Owner\Desktop\PythonZip\PyUnzip01\child_dir\unzip_test2'
dest = r'C:\Users\Owner\Desktop\PythonZip\PyUnzip01\child_dir'
pdir = '../PyUnzip01'
os.replace(r"C:\Users\Owner\Desktop\PythonZip\PyUnzip01\child_dir\unzip_test2.zip", r"C:\Users\Owner\Desktop\PythonZip\PyUnzip01\unzip_test2.zip")
for root, subdirs, files in os.walk(src):
for file in files:
path = os.path.join(root, file)
shutil.move(path, dest)
os.replace(r"C:\Users\Owner\Desktop\PythonZip\PyUnzip01\unzip_test2.zip", r"C:\Users\Owner\Desktop\PythonZip\PyUnzip01\child_dir\unzip_test2.zip")
I am trying to collect all files with all sub-directories and move to another directory
Code used
#collects all mp3 files from folders to a new folder
import os
from pathlib import Path
import shutil
#run once
path = os.getcwd()
os.mkdir("empetrishki")
empetrishki = path + "/empetrishki" #destination dir
print(path)
print(empetrishki)
#recursive collection
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
for name in files:
filePath = Path(name)
if filePath.suffix.lower() == ".mp3":
print(filePath)
os.path.join
filePath.rename(empetrishki.joinpath(filePath))
I have trouble with the last line of moving files: filePath.rename() nor shutil.move nor joinpath() have worked for me. Maybe that's because I am trying to change the element in the tuple - the output from os.walk
Similar code works with os.scandir but this would collect files only in the current directory
How can I fix that, thanks!
If you use pathlib.Path(name) that doesn't mean that something exists called name. Hence, you do need to be careful that you have a full path, or relative path, and you need to make sure to resolve those. In particular I am noting that you don't change your working directory and have a line like this:
filePath = Path(name)
This means that while you may be walking down the directory, your working directory may not be changing. You should make your path from the root and the name, it is also a good idea to resolve so that the full path is known.
filePath = Path(root).joinpath(name).resolve()
You can also place the Path(root) outside the inner loop as well. Now you have an absolute path from '/home/' to the filename. Hence, you should be able to rename with .rename(), like:
filePath.rename(x.parent.joinpath(newname))
#Or to another directory
filePath.rename(other_dir.joinpath(newname))
All together:
from pathlib import os, Path
empetrishki = Path.cwd().joinpath("empetrishki").resolve()
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
root = Path(root).resolve()
for name in files:
file = root.joinpath(name)
if file.suffix.lower() == ".mp3":
file.rename(empetrishki.joinpath(file.name))
for root, dirs, files in os.walk(path, topdown=True, onerror=None, followlinks=True):
if root == empetrishki:
continue # skip the destination dir
for name in files:
basename, extension = os.path.splitext(name)
if extension.lower() == ".mp3":
oldpath = os.path.join(root, name)
newpath = os.path.join(empetrishki, name)
print(oldpath)
shutil.move(oldpath, newpath)
This is what I suggest. Your code is running on the current directory, and the file is at the path os.path.join(root, name) and you need to provide such path to your move function.
Besides, I would also suggest to use os.path.splitext for extracting the file extension. More pythonic. And also you might want to skip scanning your target directory.
I have the following directory structure:
-mailDir
-folderA
-sub1
-sub2
-inbox
-1.txt
-2.txt
-89.txt
-subInbox
-subInbox2
-folderB
-sub1
-sub2
-inbox
-1.txt
-2.txt
-200.txt
-577.txt
The aim is to copy all the txt files under inbox folder into another folder.
For this I tried the below code
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
for dirName in dirs:
if(dirName=="inbox"):
eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
for i in os.listdir(ii):
shutil.copy(path.join(ii,i),destDir)
If the inbox directory only has .txt files then the above code works fine. Since the inbox folder under folderA directory has other sub directory along with .txt files, the code returns permission denied error. What I understood is shutil.copy won't allow to copy the folders.
The aim is to copy only the txt files in every inbox folder to some other location. If the file names are same in different inbox folder I have to keep both file names. How we can improve the code in this case ? Please note other than .txt all others are folders only.
One simple solution is to filter for any i that does not have the .txt extension by using the string endswith() method.
import os
from os import path
import shutil
rootDir = "mailDir"
destDir = "destFolder"
eachInboxFolderPath = []
for root, dirs, files in os.walk(rootDir):
for dirName in dirs:
if(dirName=="inbox"):
eachInboxFolderPath.append(root+"\\"+dirName)
for ii in eachInboxFolderPath:
for i in os.listdir(ii):
if i.endswith('.txt'):
shutil.copy(path.join(ii,i),destDir)
This should ignore any folders and non-txt files that are found with os.listdir(ii). I believe that is what you are looking for.
Just remembered that I once wrote several files to solve this exact problem before. You can find the source code here on my Github.
In short, there are two functions of interest here:
list_files(loc, return_dirs=False, return_files=True, recursive=False, valid_exts=None)
copy_files(loc, dest, rename=False)
For your case, you could copy and paste these functions into your project and modify copy_files like this:
def copy_files(loc, dest, rename=False):
# get files with full path
files = list_files(loc, return_dirs=False, return_files=True, recursive=True, valid_exts=('.txt',))
# copy files in list to dest
for i, this_file in enumerate(files):
# change name if renaming
if rename:
# replace slashes with hyphens to preserve unique name
out_file = sub(r'^./', '', this_file)
out_file = sub(r'\\|/', '-', out_file)
out_file = join(dest, out_file)
copy(this_file, out_file)
files[i] = out_file
else:
copy(this_file, dest)
return files
Then just call it like so:
copy_files('mailDir', 'destFolder', rename=True)
The renaming scheme might not be exactly what you want, but it will at least not override your files. I believe this should solve all your problems.
Here you go:
import os
from os import path
import shutil
destDir = '<absolute-path>'
for root, dirs, files in os.walk(os.getcwd()):
# Filter out only '.txt' files.
files = [f for f in files if f.endswith('.txt')]
# Filter out only 'inbox' directory.
dirs[:] = [d for d in dirs if d == 'inbox']
for f in files:
p = path.join(root, f)
# print p
shutil.copy(p, destDir)
Quick and simple.
sorry, I forgot the part where, you also need unique file names as well. The above solution only works for distinct file names in a single inbox folder.
For copying files from multiple inboxes and having a unique name in the destination folder, you can try this:
import os
from os import path
import shutil
sourceDir = os.getcwd()
fixedLength = len(sourceDir)
destDir = '<absolute-path>'
filteredFiles = []
for root, dirs, files in os.walk(sourceDir):
# Filter out only '.txt' files in all the inbox directories.
if root.endswith('inbox'):
# here I am joining the file name to the full path while filtering txt files
files = [path.join(root, f) for f in files if f.endswith('.txt')]
# add the filtered files to the main list
filteredFiles.extend(files)
# making a tuple of file path and file name
filteredFiles = [(f, f[fixedLength+1:].replace('/', '-')) for f in filteredFiles]
for (f, n) in filteredFiles:
print 'copying file...', f
# copying from the path to the dest directory with specific name
shutil.copy(f, path.join(destDir, n))
print 'copied', str(len(filteredFiles)), 'files to', destDir
If you need to copy all files instead of just txt files, then just change the condition f.endswith('.txt') to os.path.isfile(f) while filtering out the files.
I built a script in Python to copy any files from a list of folders to a destination folder already made.
source = ['c:/test/source/', ]
destination = 'c:/test/destination/'
def copy(source, destination):
import os, shutil
try:
for folder in source:
files = os.listdir(folder)
for file in files:
current_file = os.path.join(folder, file)
shutil.copy(os.path.join(folder, file), destination)
except:
pass
The problem with this script is that it didn't copy the sub folders. Any suggestion to fix it ?
Thanks
I think you need to use shutil.copytree
shutil.copytree(os.path.join(folder, file), destination)
but shutil.copytree won't overwrite if folder exist,
if you want to overwrite all, use distutils.dir_util.copy_tree
from distutils import dir_util
dir_util.copy_tree(os.path.join(folder, file), destination)
Being completely new in python I'm trying to run a command over a set of files in python. The command requires both source and destination file (I'm actually using imagemagick convert as in the example below).
I can supply both source and destination directories, however I can't figure out how to easily retain the directory structure from the source to the destination directory.
E.g. say the srcdir contains the following:
srcdir/
file1
file3
dir1/
file1
file2
Then I want the program to create the following destination files on destdir: destdir/file1, destdir/file3, destdir/dir1/file1 and destdir/dir1/file2
So far this is what I came up with:
import os
from subprocess import call
srcdir = os.curdir # just use the current directory
destdir = 'path/to/destination'
for root, dirs, files in os.walk(srcdir):
for filename in files:
sourceFile = os.path.join(root, filename)
destFile = '???'
cmd = "convert %s -resize 50%% %s" % (sourceFile, destFile)
call(cmd, shell=True)
The walk method doesn't directly provide what directory the file is under srcdir other than concatenating the root directory string with the file name. Is there some easy way to get the destination file, or do I have to do some string manipulation in order to do this?
Change your loop to:
for root, dirs, files in os.walk(srcdir):
destroot = os.path.join(destdir, root[len(srcdir):])
for adir in dirs:
os.makedirs(os.path.join(destroot, adir))
for filename in files:
sourceFile = os.path.join(root, filename)
destFile = os.path.join(destroot, filename)
processFile(sourceFile, destFile)
There are a few relative path scripts out there that will do what you want -- namely find the relative path between two paths. E.g.:
http://www.voidspace.org.uk/python/pathutils.html
(relpath method)
http://code.activestate.com/recipes/302594-another-relative-filepath-script/
http://groups.google.com/group/comp.lang.python/browse_thread/thread/390d8d3e3ac8ef44/d8c74f96468c6a36?q=relative+path&rnum=1&pli=1
Unfortunately, I don't think this functionality has ever been added to core python.
While not pretty, this will preserve the directory structure of the tree:
_, _, subdirs = root.partition(srcdir)
destfile = os.path.join(destdir, subdirs[1:], filename)