Backup File Script - python

I am writing a script to back up files from one directory (Master) to another (Clone).
The script will monitor both directories.
If a file is missing from Clone, the script copies it over from Master.
Right now I have a problem creating the missing folder.
I read the documentation and understood that shutil.copyfile will create a directory if it
doesn't exist, but I am getting an IOError saying that the destination directory
does not exist. Below is the code.
import os,shutil,hashlib
master="C:\Users\Will Yan\Desktop\Master"
client="D:\Clone"
if(os.path.exists(client)):
    print "PATH EXISTS"
else:
    print "PATH Doesn't exists copying"
    shutil.copytree(master,client)
def walkLocation(location,option):
    aList = []
    for(path,dirs,files) in os.walk(location):
        for i in files:
            if option == "path":
                aList.append(path+"/"+i)
            else:
                aList.append(i)
    return aList
def getPaths(location):
    paths=[]
    files=[]
    result =[]
    paths = walkLocation(location,'path')
    files = walkLocation(location,'files')
    result.append(paths)
    result.append(files)
    return result
ma=walkLocation(master,"path")
cl=walkLocation(client,"path")
maf=walkLocation(master,"a")
clf=walkLocation(client,"a")
for i in range(len(ma)):
    count = 0
    for j in range(len(cl)):
        if maf[i]==clf[j]:
            break
        else:
            count= count+1
    if count==len(cl):
        dirStep1=ma[i][ma[i].find("Master")::]
        dirStep2=dirStep1.replace("Master",client)
        shutil.copyfile(ma[i],dirStep2)
Can anyone tell me what I did wrong?
Thanks

Sorry, but the documentation doesn't say that. Here's a reproduction of the full documentation for the function:
shutil.copyfile(src, dst)
Copy the contents (no metadata) of the file named src to a file named dst. dst must be the complete target file name; look at copy() for a copy that accepts a target directory path. If src and dst are the same files, Error is raised. The destination location must be writable; otherwise, an IOError exception will be raised. If dst already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function. src and dst are path names given as strings.
So you have to create the directory yourself.
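As a minimal sketch of what "create the directory yourself" could look like, the snippet below makes the destination's parent folders before calling shutil.copyfile. The helper names copy_with_parents and mirror_missing are hypothetical and not part of the original script; the sketch also assumes that a file counts as "missing" when the same relative path does not exist under the clone.

```python
import os
import shutil


def copy_with_parents(src, dst):
    """Copy src to dst, creating dst's parent directories first."""
    parent = os.path.dirname(dst)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)  # creates intermediate directories as well
    shutil.copyfile(src, dst)


def mirror_missing(master, clone):
    """Copy every file under master that has no counterpart under clone."""
    copied = []
    for root, dirs, files in os.walk(master):
        # Path of the current folder relative to the master root.
        rel_dir = os.path.relpath(root, master)
        for name in files:
            dst = os.path.normpath(os.path.join(clone, rel_dir, name))
            if not os.path.exists(dst):
                copy_with_parents(os.path.join(root, name), dst)
                copied.append(dst)
    return copied
```

Comparing relative paths rather than bare file names also avoids treating two different files with the same name in different folders as the same file.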

Related

How to skip existing files in sub folders and copy only new files

I am copying a folder and all of the subfolders inside it using shutil.copytree:
import shutil
import sys
import os
import re
SOURCE_FOLDER = sys.argv[1]
DESTINATION_FOLDER = sys.argv[2]
def copyDirectory(SOURCE_FOLDER, DESTINATION_FOLDER):
    try:
        print SOURCE_FOLDER
        print DESTINATION_FOLDER
        shutil.copytree(SOURCE_FOLDER, DESTINATION_FOLDER)
    # Directories are the same
    #except:
    #    print "Not copied"
    except shutil.Error as e:
        print('Directory not copied. Error: %s' % e)
    # Any error saying that the directory doesn't exist
    except OSError as e:
        print('Directory not copied. Error: %s' % e)
copyDirectory(SOURCE_FOLDER,DESTINATION_FOLDER)
The problem is that if the directory already exists, it throws an error:
Directory not copied. Error: [Errno 17] File exists: 'destination'
What I want is this: if the directory already exists, the script should check all of its subdirectories. If a subdirectory also exists, it should check the files in it, skip the ones that already exist, and copy only the new ones. If a subdirectory does not exist, the whole subdirectory should be copied.
Note: subdirectories might be nested (a subdirectory of a subdirectory).
The script above does not do this. What should I add to it?
shutil.copytree isn't written to skip existing destination files and directories. From the docs:
The destination directory must not already exist.
You will need to write your own solution. The existing copytree code is a good start.
To check whether a directory already exists, you can use os.path.exists(directory):
if not os.path.exists(DESTINATION_FOLDER):
    shutil.copytree(SOURCE_FOLDER, DESTINATION_FOLDER)
If the destination directory already exists, you can run your function on the subdirectories of the source directory.
You can get a list of all subdirectories of the source directory with the following function, which takes a directory name as input and returns a list of its subdirectories:
def SubDirPath (d):
    return filter(os.path.isdir, [os.path.join(d,f) for f in os.listdir(d)])
Using this list, you can call your function again on each subdirectory.
For each directory that exists in both src and dst, you'll need to check, for every file in the src directory, whether the file also exists in the dst directory.
Best Regards,
Yaron
With Python 3.8 or later, you can pass dirs_exist_ok=True to shutil.copytree so that existing directories no longer raise an error:
shutil.copytree(SOURCE_FOLDER, DESTINATION_FOLDER, dirs_exist_ok=True)
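Note that dirs_exist_ok=True overwrites files that already exist in the destination rather than skipping them. If skipping is what you need, a hand-written walk is one option. The following is only a sketch for Python 3: the function name copy_new_files is made up for illustration, and "existing" is assumed to mean "a file with the same relative path is already present", with no comparison of contents or timestamps.

```python
import os
import shutil


def copy_new_files(src_root, dst_root):
    """Copy files from src_root to dst_root, skipping files that already exist."""
    copied = []
    for root, dirs, files in os.walk(src_root):
        # Recreate the current source folder under the destination root.
        rel_dir = os.path.relpath(root, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel_dir))
        os.makedirs(dst_dir, exist_ok=True)
        for name in files:
            dst_file = os.path.join(dst_dir, name)
            if os.path.exists(dst_file):
                continue  # leave the existing file untouched
            shutil.copy2(os.path.join(root, name), dst_file)
            copied.append(dst_file)
    return copied
```

Because os.walk descends into every level, nested subdirectories are handled without explicit recursion.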

Find, renaming, and replacing files

I need to update an existing directory with files that are provided in a Patch directory.
This is what I'm starting with. It is all commented out for now, and I am trying to build it up line by line.
# $SourceDirectory = Patch folder that has files in any number of sub folders
# $DestDirectory = Application folder that has the files that need patching
# $UnMatchedFilesFolder = A Folder where SourceFiles go that don't have a match in $DestDirectory
# import os.path
# import os.listdir
#
# Create list1 of files from $SourceDirectory
# For each file (excluding directory names) in List1 (including subfolders), search for it in $DestDirectory and its subfolders;
# If you find the file by the same name, then create a backup of that file with .old;
# move $DestDirectoryPathAndFile to $DestDirectoryPathAndFile.old;
# print "Creating backup of file";
# After the backup is made, then copy the file from the $SourceDirectory to the;
# exact same location where it was found in the $DestDirectory. ;
# Else;
# move file to UnmatchedFilesDirectory.;
# If the number of files in $UnMatchedFilesDirectory =/ 0;
# Create list3 from $UnmatchedFilesDirectory
# print "The following files in $UnMatchedFilesDirectory will need to be installed individually";
# Print "Automated Patching completed.";
# Print "Script completed";
As mentioned in the previous post, I am skeptical of the course you are following based on the information given. Based on the document given, there are far better sites/tutorials available for free to help you learn Python/programming. That said, Stack Overflow is a friendly place, and so I hope to provide you with information which will help you on your way:
import os
source_dir =r"D:\temp"
dest_dir=r"D:\temp2"
for root, dirs, files in os.walk(source_dir):
    # os.walk 'root' steps through subdirectories as we iterate
    # this allows us to join 'root' and 'file' without missing any sub-directories
    for file in files:
        exist_path = os.path.join(root, file)
        # expected_file represents the fullpath of a file we are looking to create/replace
        expected_file = exist_path.replace(source_dir, dest_dir)
        current = os.path.join(root, file)
        if os.path.exists(expected_file):
            print "The file %s exists, os.rename with '.old' before copying %s" % (current, exist_path)
            # .. note:: we should rename to .bkp here, then we would correctly copy the file below without conflict
        print "Now %s doesn't exist, we are free to copy %s" % (expected_file, exist_path)
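The outline in the question (back up a match as .old, copy the patch file over it, set unmatched files aside) could be filled in roughly as follows. This is a sketch for Python 3.3 or later, not a tested patching tool. The function name apply_patch is hypothetical, matching is by bare file name anywhere under the destination as the question describes, and the sketch assumes the unmatched folder lies outside both other folders and that file names within the patch folder are unique.

```python
import os
import shutil


def apply_patch(source_dir, dest_dir, unmatched_dir):
    """Replace same-named files in dest_dir with files from source_dir."""
    # Index every file under dest_dir by its bare name.
    dest_index = {}
    for root, dirs, files in os.walk(dest_dir):
        for name in files:
            dest_index.setdefault(name, []).append(os.path.join(root, name))

    unmatched = []
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            src_file = os.path.join(root, name)
            targets = dest_index.get(name, [])
            for target in targets:
                # Keep the old version as <file>.old, then drop in the patch.
                os.replace(target, target + ".old")
                shutil.copy2(src_file, target)
            if not targets:
                # No file of that name in dest_dir: set it aside.
                os.makedirs(unmatched_dir, exist_ok=True)
                shutil.move(src_file, os.path.join(unmatched_dir, name))
                unmatched.append(name)
    return unmatched
```

The returned list can be printed at the end to report which files still need to be installed by hand.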

Python: folder creation when copying files

I'm trying to create a shell script that will copy files from one computer (employee's old computer) to another (employee's new computer). I have it to the point where I can copy files over, thanks to the lovely people here, but I'm running into a problem - if I'm going from, say, this directory that has 2 files:
C:\Users\specificuser\Documents\Test Folder
....to this directory...
C:\Users\specificuser\Desktop
...I see the files show up on the Desktop, but the folder those files were in (Test Folder) isn't created.
Here is the copy function I'm using:
#copy function
def dir_copy(srcpath, dstpath):
    #if the destination path doesn't exist, create it
    if not os.path.exists(dstpath):
        os.makedir(dstpath)
    #tag each file to the source path to create the file path
    for file in os.listdir(srcpath):
        srcfile = os.path.join(srcpath, file)
        dstfile = os.path.join(dstpath, file)
        #if the source file path is a directory, copy the directory
        if os.path.isdir(srcfile):
            dir_copy(srcfile, dstfile)
        else: #if the source file path is just a file, copy the file
            shutil.copyfile(srcfile, dstfile)
I know I need to create the directory on the destination, I'm just not quite sure how to do it.
Edit: I found that I had a typo (os.makedir instead of os.mkdir). I tested it, and it creates directories like it's supposed to. However, I'd like it to create the directory one level up from where it's starting. For example, in Test Folder there is Sub Test Folder. It has created Sub Test Folder but won't create Test Folder, because Test Folder is not part of the dstpath. Does that make sense?
You might want to look at shutil.copytree(). It performs the recursive copy functionality, including directories, that you're looking for. So, for a basic recursive copy, you could just run:
shutil.copytree(srcpath, dstpath)
However, to accomplish your goal of copying the source directory to the destination directory, creating the source directory inside of the destination directory in the process, you could use something like this:
import os
import shutil
def dir_copy(srcpath, dstdir):
    dirname = os.path.basename(srcpath)
    dstpath = os.path.join(dstdir, dirname)
    shutil.copytree(srcpath, dstpath)
Note that your srcpath must not contain a slash at the end for this to work. Also, the result of joining the destination directory and the source directory name must not already exist, or copytree will fail.
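The trailing-slash restriction comes from os.path.basename returning an empty string for a path such as "Test Folder/". One possible way around it, shown here only as a sketch, is to normalize the path first with os.path.normpath. The return value is an addition for illustration and is not part of the answer above.

```python
import os
import shutil


def dir_copy(srcpath, dstdir):
    """Copy the folder srcpath into dstdir, keeping the folder's own name."""
    # normpath drops a trailing separator, so basename is never empty here.
    dirname = os.path.basename(os.path.normpath(srcpath))
    dstpath = os.path.join(dstdir, dirname)
    shutil.copytree(srcpath, dstpath)
    return dstpath
```

The second caveat still applies: copytree fails if dstpath already exists.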
This is a common problem with file copy: do you intend to copy just the contents of the folder, or do you want the folder itself copied? Copy utilities typically have a flag for this, and you can too. I use os.makedirs so that any intermediate directories are created as well.
import os
import shutil
#copy function
def dir_copy(srcpath, dstpath, include_directory=False):
    if include_directory:
        dstpath = os.path.join(dstpath, os.path.basename(srcpath))
    os.makedirs(dstpath, exist_ok=True)
    #tag each file to the source path to create the file path
    for file in os.listdir(srcpath):
        srcfile = os.path.join(srcpath, file)
        dstfile = os.path.join(dstpath, file)
        #if the source file path is a directory, copy the directory
        if os.path.isdir(srcfile):
            dir_copy(srcfile, dstfile)
        else: #if the source file path is just a file, copy the file
            shutil.copyfile(srcfile, dstfile)
import shutil
import os
def dir_copy(srcpath, dstpath):
    try:
        shutil.copytree(srcpath, dstpath)
    except shutil.Error as e:
        print('Directory not copied. Error: %s' % e)
    except OSError as e:
        print('Directory not copied. Error: %s' % e)
dir_copy('/home/sergey/test1', '/home/sergey/test2')
I use this script to back up (copy) my working folder. It skips large files (10 MB or more), keeps the folder structure (hierarchy) and creates destination folders if they don't exist.
import os
import shutil
# the_folder_copy_from must be set to the path of the folder to back up
for root, dirs, files in os.walk(the_folder_copy_from):
    for name in files:
        if os.path.getsize(os.path.join(root, name))<10*1024*1024:
            target=os.path.join("backup", os.path.relpath(os.path.join(root, name),start=the_folder_copy_from))
            print(target)
            os.makedirs(os.path.dirname(target),exist_ok=True)
            shutil.copy(src=os.path.join(root, name),dst=target)
print("Done")

Extracting compressed files

The following code allows me to extract .tgz files. However, it stops extracting after about two levels down; there are other subfolders that have .tgz files that need extracting. Additionally, when I extract a file, I have to manually move it to another path or it will get overwritten by other .tgz files that I extract to that location (all .tgz that I'm using have the same file structure/folder names once extracted). Any help is appreciated. Thanks!
import os, sys, tarfile
def extract(tar_url, extract_path='.'):
    print tar_url
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
            extract(item.name, "./" + item.name[:item.name.rfind('/')])
try:
    extract(sys.argv[1] + '.tgz')
    print 'Done.'
except:
    name = os.path.basename(sys.argv[0])
    print name[:name.rfind('.')], '<filename>'
If I have not misinterpreted your question, here is what you want to do -
Extract a .tgz file which may have more .tgz files within it that need further extraction (and so on).
While extracting, you need to be careful not to replace an already existing directory in the folder.
If that is right, here is what my code does -
It extracts every .tgz file (recursively) into a separate folder with the same name as the .tgz file (without its extension), in the same directory.
While extracting, it makes sure that it does not overwrite or replace any already existing file or folder.
So if this is the directory structure of the .tgz file -
parent/
    xyz.tgz/
        a
        b
        c
        d.tgz/
            x
            y
            z
        a.tgz/ # note if I extract this directly, it will replace/overwrite contents of the folder 'a'
            m
            n
            o
            p
After extraction, the directory structure will be -
parent/
    xyz.tgz
    xyz/
        a
        b
        c
        d/
            x
            y
            z
        a 1/ # it extracts 'a.tgz' to the folder 'a 1' as folder 'a' already exists in the same folder.
            m
            n
            o
            p
Although I have provided plenty of documentation in the code below, I will briefly outline the structure of my program. Here are the functions I have defined -
FileExtension --> returns the extension of a file
AppropriateFolderName --> helps prevent overwriting/replacing already existing folders (how? you will see it in the program)
Extract --> extracts a .tgz file (safely)
WalkTreeAndExtract --> walks down a directory (passed as a parameter) and extracts all .tgz files (recursively) on the way down.
I cannot suggest changes to what you have done, as my approach is a bit different. I have used the extractall method of the tarfile module instead of the somewhat more complicated extract method that you used. (Just have a glance at http://docs.python.org/library/tarfile.html#tarfile.TarFile.extractall and read the warning associated with the extractall method. I don't think we will run into that problem in general, but keep it in mind.)
So here is the code that worked for me -
(I tried it for .tar files nested 5 levels deep (ie .tar within .tar within .tar ... 5 times), but it should work for any depth* and also for .tgz files.)
# extracting_nested_tars.py
import os
import re
import tarfile
file_extensions = ('tar', 'tgz')
# Edit this according to the archive types you want to extract. Keep in
# mind that these should be extractable by the tarfile module.
def FileExtension(file_name):
    """Return the file extension of file_name

    'file_name' should be a string. It can be either the full path of
    the file or just its name (or any string as long it contains
    the file extension.)
    Examples:
    input (file_name) --> 'abc.tar'
    return value --> 'tar'
    """
    match = re.compile(r"^.*[.](?P<ext>\w+)$",
                       re.VERBOSE|re.IGNORECASE).match(file_name)
    if match: # if match != None:
        ext = match.group('ext')
        return ext
    else:
        return '' # there is no file extension to file_name
def AppropriateFolderName(folder_name, parent_fullpath):
    """Return a folder name such that it can be safely created in
    parent_fullpath without replacing any existing folder in it.

    Check if a folder named folder_name exists in parent_fullpath. If no,
    return folder_name (without changing, because it can be safely created
    without replacing any already existing folder). If yes, append an
    appropriate number to the folder_name such that this new folder_name
    can be safely created in the folder parent_fullpath.
    Examples:
    folder_name = 'untitled folder'
    return value = 'untitled folder' (if no such folder already exists
                                      in parent_fullpath.)
    folder_name = 'untitled folder'
    return value = 'untitled folder 1' (if a folder named 'untitled folder'
                                        already exists but no folder named
                                        'untitled folder 1' exists in
                                        parent_fullpath.)
    folder_name = 'untitled folder'
    return value = 'untitled folder 2' (if folders named 'untitled folder'
                                        and 'untitled folder 1' both
                                        already exist but no folder named
                                        'untitled folder 2' exists in
                                        parent_fullpath.)
    """
    if os.path.exists(os.path.join(parent_fullpath,folder_name)):
        match = re.compile(r'^(?P<name>.*)[ ](?P<num>\d+)$').match(folder_name)
        if match: # if match != None:
            name = match.group('name')
            number = match.group('num')
            new_folder_name = '%s %d' %(name, int(number)+1)
            return AppropriateFolderName(new_folder_name,
                                         parent_fullpath)
            # Recursively call itself to check whether a folder named
            # new_folder_name already exists in parent_fullpath
            # or not.
        else:
            new_folder_name = '%s 1' %folder_name
            return AppropriateFolderName(new_folder_name, parent_fullpath)
            # Recursively call itself to check whether a folder named
            # new_folder_name already exists in parent_fullpath
            # or not.
    else:
        return folder_name
def Extract(tarfile_fullpath, delete_tar_file=True):
    """Extract the tarfile_fullpath to an appropriate* folder of the same
    name as the tar file (without an extension) and return the name
    of this folder.

    If delete_tar_file is True, it will delete the tar file after
    its extraction; if False, it won't. Default value is True as you
    would normally want to delete the (nested) tar files after
    extraction. Pass a False, if you don't want to delete the
    tar file (after its extraction) you are passing.
    """
    tarfile_name = os.path.basename(tarfile_fullpath)
    parent_dir = os.path.dirname(tarfile_fullpath)
    extract_folder_name = AppropriateFolderName(tarfile_name[:\
        -1*len(FileExtension(tarfile_name))-1], parent_dir)
    # (the slicing is to remove the extension (.tar) from the file name.)
    # Get a folder name (from the function AppropriateFolderName)
    # in which the contents of the tar file can be extracted,
    # so that it doesn't replace an already existing folder.
    extract_folder_fullpath = os.path.join(parent_dir,
                                           extract_folder_name)
    # The full path to this new folder.
    try:
        tar = tarfile.open(tarfile_fullpath)
        tar.extractall(extract_folder_fullpath)
        tar.close()
        if delete_tar_file:
            os.remove(tarfile_fullpath)
        return extract_folder_name
    except Exception as e:
        # Exceptions can occur while opening a damaged tar file.
        print 'Error occurred while extracting %s\n'\
              'Reason: %s' %(tarfile_fullpath, e)
        return
def WalkTreeAndExtract(parent_dir):
    """Recursively descend the directory tree rooted at parent_dir
    and extract each tar file on the way down (recursively).
    """
    try:
        dir_contents = os.listdir(parent_dir)
    except OSError as e:
        # Exception can occur if trying to open some folder that
        # this program does not have permission to read.
        print 'Error occurred. Could not open folder %s\n'\
              'Reason: %s' %(parent_dir, e)
        return
    for content in dir_contents:
        content_fullpath = os.path.join(parent_dir, content)
        if os.path.isdir(content_fullpath):
            # If content is a folder, walk it down completely.
            WalkTreeAndExtract(content_fullpath)
        elif os.path.isfile(content_fullpath):
            # If content is a file, check if it is a tar file.
            # If so, extract its contents to a new folder.
            if FileExtension(content_fullpath) in file_extensions:
                extract_folder_name = Extract(content_fullpath)
                if extract_folder_name: # if extract_folder_name != None:
                    dir_contents.append(extract_folder_name)
                    # Append the newly extracted folder to dir_contents
                    # so that it can be later searched for more tar files
                    # to extract.
        else:
            # Unknown file type.
            print 'Skipping %s. <Neither file nor folder>' % content_fullpath
if __name__ == '__main__':
    tarfile_fullpath = 'fullpath_path_of_your_tarfile' # pass the path of your tar file here.
    extract_folder_name = Extract(tarfile_fullpath, False)
    # tarfile_fullpath is extracted to extract_folder_name. Now descend
    # down its directory structure and extract all other tar files
    # (recursively).
    extract_folder_fullpath = os.path.join(os.path.dirname(tarfile_fullpath),
                                           extract_folder_name)
    WalkTreeAndExtract(extract_folder_fullpath)
    # If you want to extract all tar files in a dir, just execute the above
    # line and nothing else.
I have not added a command line interface to it. I guess you can add it if you find it useful.
Here is a slightly better version of the above program -
http://guanidene.blogspot.com/2011/06/nested-tar-archives-extractor.html
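The extractall warning mentioned earlier is about archives whose member names point outside the target folder (for example names starting with "../"). If the archives come from a source you don't fully trust, a check along the following lines may help. It is a sketch for Python 3 with a hypothetical helper name, safe_extractall, and it inspects member names only, not symlink or hard-link targets.

```python
import os
import tarfile


def safe_extractall(tar_path, dest_dir):
    """Extract tar_path into dest_dir, refusing members that would escape it."""
    dest_root = os.path.realpath(dest_dir)
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            # Resolve where this member would land on disk.
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError("unsafe path in archive: %s" % member.name)
        # Only reached when every member stays inside dest_root.
        tar.extractall(dest_root)
```

All members are checked before anything is written, so a rejected archive leaves the destination folder untouched.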

How to check contents of a folder using Python

How can you check the contents of a file with python, and then copy a file from the same folder and move it to a new location?
I have Python 3.1, but I can just as easily port to 2.6.
Thank you!
For example:
import os,shutil
root="/home"
destination="/tmp"
directory = os.path.join(root,"mydir")
os.chdir(directory)
for file in os.listdir("."):
    flag=""
    #check contents of file ?
    for line in open(file):
        if "something" in line:
            flag="found"
    if flag=="found":
        try:
            # or use os.rename() on local
            shutil.move(file,destination)
        except Exception,e: print e
        else:
            print "success"
If you look at the shutil doc, under .move() it says
shutil.move(src, dst)
Recursively move a file or directory to another location.
If the destination is on the current filesystem, then simply use rename.
Otherwise, copy src (with copy2()) to the dst and then remove src.
I guess you can use copy2() to move to another file system.
os.listdir() and shutil.move().
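Putting os.listdir() and shutil.move() together for Python 3 might look like the sketch below. The function name move_matching_files and its parameters are made up for illustration, and the sketch assumes the files are text files that can be searched line by line.

```python
import os
import shutil


def move_matching_files(directory, destination, needle):
    """Move each regular file in directory whose text contains needle."""
    moved = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        # Undecodable bytes are ignored so one odd file does not stop the scan.
        with open(path, errors="ignore") as handle:
            found = any(needle in line for line in handle)
        # The file is closed again before it is moved.
        if found:
            shutil.move(path, os.path.join(destination, name))
            moved.append(name)
    return moved
```

Calling move_matching_files("/home/mydir", "/tmp", "something") would mirror the example paths used in the answer above.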
