Finding, renaming, and replacing files - Python

I need to update an existing directory with files that are provided in a Patch directory.
This is what I'm starting with: the whole plan is written as comments, and I am trying to build the code one line at a time.
# $SourceDirectory = Patch folder that has files in any number of sub folders
# $DestDirectory = Application folder that has the files that need patching
# $UnMatchedFilesFolder = A Folder where SourceFiles go that don't have a match in $DestDirectory
# import os.path
# import os.listdir
#
# Create list1 of files from $SourceDirectory
# For each file (excluding directory names) in List1 (including subfolders), search for it in $DestDirectory and its subfolders;
# If you find the file by the same name, then create a backup of that file with .old;
# move $DestDirectoryPathAndFile to $DestDirectoryPathAndFile.old;
# print "Creating backup of file";
# After the backup is made, then copy the file from the $SourceDirectory to the;
# exact same location where it was found in the $DestDirectory. ;
# Else;
# move file to UnmatchedFilesDirectory.;
# If the number of files in $UnMatchedFilesDirectory =/ 0;
# Create list3 from $UnmatchedFilesDirectory
# print "The following files in $UnMatchedFilesDirectory will need to be installed individually";
# Print "Automated Patching completed.";
# Print "Script completed";

As mentioned in the previous post, I am skeptical of the course you are following based on the information given. Based on the document given, there are far better sites/tutorials available for free to help you learn Python/programming. That said, Stack Overflow is a friendly place, and so I hope to provide you with information which will help you on your way:
import os

source_dir = r"D:\temp"
dest_dir = r"D:\temp2"

for root, dirs, files in os.walk(source_dir):
    # os.walk yields each subdirectory as 'root' while we iterate,
    # so joining 'root' and 'file' never misses a sub-directory
    for file in files:
        exist_path = os.path.join(root, file)
        # expected_file is the full path of the file we want to create/replace
        expected_file = exist_path.replace(source_dir, dest_dir, 1)
        if os.path.exists(expected_file):
            print("The file %s exists, os.rename with '.old' before copying %s" % (expected_file, exist_path))
            # .. note:: rename to .old here, so the copy below happens without conflict
        print("Now %s doesn't exist, we are free to copy %s" % (expected_file, exist_path))
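The answer above assumes the patch folder mirrors the layout of the application folder. The question instead asks to find each patch file by name anywhere under the destination. A minimal sketch of that variant follows; the function name `apply_patch` and the demo file names are hypothetical, and it assumes file names are unique within the destination tree and that the unmatched folder lies outside both trees.

```python
import shutil
import tempfile
from pathlib import Path


def apply_patch(source_dir, dest_dir, unmatched_dir):
    """Replace same-named files under dest_dir with files from source_dir.

    Returns the names of patch files that had no match and were moved aside.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    unmatched_dir = Path(unmatched_dir)
    # Index destination files by bare name (assumes names are unique).
    targets = {p.name: p for p in dest_dir.rglob("*") if p.is_file()}
    unmatched = []
    # Materialise the list first, because files are moved while we loop.
    for patch_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        target = targets.get(patch_file.name)
        if target is None:
            unmatched_dir.mkdir(parents=True, exist_ok=True)
            shutil.move(str(patch_file), str(unmatched_dir / patch_file.name))
            unmatched.append(patch_file.name)
            continue
        print(f"Creating backup of {target}")
        # Rename to <name>.old; an older backup of the same name is overwritten.
        target.replace(target.with_name(target.name + ".old"))
        shutil.copy2(patch_file, target)
    return unmatched


# Demo on throwaway directories so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "patch" / "sub").mkdir(parents=True)
    (root / "app" / "bin").mkdir(parents=True)
    (root / "patch" / "sub" / "a.txt").write_text("new")
    (root / "patch" / "b.txt").write_text("orphan")
    (root / "app" / "bin" / "a.txt").write_text("old")
    leftover = apply_patch(root / "patch", root / "app", root / "unmatched")
    patched_text = (root / "app" / "bin" / "a.txt").read_text()
    backup_text = (root / "app" / "bin" / "a.txt.old").read_text()

print("Files to install individually:", leftover)
```

The returned list corresponds to the "will need to be installed individually" report in the question's plan.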


Moving all .csv files which match specific text pattern from multiple subfolders to new folder [Python]

I see many answers on here to similar questions, but I cannot seem to adapt them yet due to my budding Python skill. I'd like to save the time of individually grabbing the data sets that contain what I need for analysis in R, but my scripts either don't run or don't seem to do what I need.
I need to 1) loop through a sea of subfolders in a parent folder, 2) loop through the bagillion .csv files in those subs and pick out the 1 that matters (matching text below) and 3) copy it over to a new clean folder with only what I want.
What I have tried:
1)
import os, shutil, glob

src_fldr = 'C:/project_folder_withsubfolders'
dst_fldr = 'C:/project_folder_withsubfolders/subfolder_to_dump_into'
try:
    os.makedirs(dst_fldr)  ## it creates the destination folder
except:
    print("Folder already exist or some error")
for csv_file in glob.glob(src_fldr+'*statistics_Intensity_Sum_Ch=3_Img=1.csv*'):
    shutil.copy2(csv_file, dst_fldr)
Here the text statistics_Intensity_Sum etc. is the exact pattern a file name must contain to be copied over.
This didn't actually copy anything over.
2) Making a function that will do this:
srcDir = 'C:/project_folder_withsubfolders'
dstDir = 'C:/project_folder_withsubfolders/subfolder_to_dump_into'

def moveAllFilesinDir(srcDir, dstDir):
    files = os.listdir(srcDir)
    for f in files:
        if f.find("statistics_Intensity_Sum_Ch=3_Img=1"):
            shutil.move(f, dstDir)
        else:
            shutil.move(f, srcDir)

moveAllFilesinDir(srcDir, dstDir)
This returned the following error:
File "C:\Users\jbla12\AppData\Local\Programs\Python\Python39\lib\shutil.py", line 806, in move
os.rename(src, real_dst)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'F1 converted' -> 'C:/Users/jbla12/Desktop/R Analyses/p65_project/sum_files\\F1 converted'
That's because that's a sub-folder I want it to go through! I've tried other methods but don't have record of them in my scripts.
SOLVED:
Special thanks to "Automate the Boring Stuff"
import shutil
import os

dest = 'C:/Users/jbla12/Desktop/R Analyses/p65_project/sum_files/'
src = 'C:/Users/jbla12/Desktop/R Analyses/p65_project/'
txt_ID = 'statistics_Intensity_Sum_Ch=3_Img=1.csv'

def moveSpecFiles(txt_ID, src, dest):
    # txt_ID is the text the wanted file names end with
    # src is the folder that holds the original files (searched recursively)
    # dest is the folder the matching files are copied into
    for foldername, subfolders, filenames in os.walk(src):
        for file in filenames:
            if file.endswith(txt_ID):
                shutil.copy(os.path.join(foldername, file), dest)
    print('Your files are ready sir/madam!')

moveSpecFiles(txt_ID, src, dest)
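For comparison, the first glob attempt most likely matched nothing because its pattern had no path separator after the folder name, and because glob only descends into subfolders when the pattern contains `**` and `recursive=True` is passed. A minimal sketch of a recursive glob version follows; the function name `copy_matching` and the demo file names are made up for illustration. As with the solution above, two matching files with the same name in different subfolders would overwrite each other in the destination.

```python
import glob
import os
import shutil
import tempfile


def copy_matching(src, dest, suffix):
    """Copy every file under src whose name ends with suffix into dest."""
    os.makedirs(dest, exist_ok=True)
    # '**' together with recursive=True matches src itself and all subfolders.
    pattern = os.path.join(glob.escape(src), "**", "*" + glob.escape(suffix))
    dest_abs = os.path.abspath(dest)
    copied = []
    for path in glob.glob(pattern, recursive=True):
        # Skip directories and files already sitting in dest (dest may be inside src).
        if not os.path.isfile(path):
            continue
        if os.path.abspath(os.path.dirname(path)) == dest_abs:
            continue
        shutil.copy2(path, dest)
        copied.append(os.path.basename(path))
    return sorted(copied)


# Demo with hypothetical folder and file names in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for folder, name in [("F1 converted", "a_statistics_Sum.csv"),
                         ("F1 converted", "a_other.csv"),
                         ("F2 converted", "b_statistics_Sum.csv")]:
        os.makedirs(os.path.join(tmp, folder), exist_ok=True)
        open(os.path.join(tmp, folder, name), "w").close()
    copied = copy_matching(tmp, os.path.join(tmp, "sum_files"), "statistics_Sum.csv")

print(copied)
```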

Is there a simpler function or one liner to check if folder exists if not create it and paste a specific file into it?

I am aiming to create a function that does the following:
Declare a path with a file, not just a folder. e.g. 'C:/Users/Lampard/Desktop/Folder1/File.py'
Create a folder in same folder as the declared file path - Calling it 'Archive'
Cut the file and paste it into the new folder just created.
If the folder 'Archive' already exists - then simply cut and paste the file into there
I have spent approx. 15-20min going through these:
https://www.programiz.com/python-programming/directory
Join all except last x in list
https://docs.python.org/3/library/pathlib.html#operators
And here is what I got to:
import os
from pathlib import Path, PurePath
from shutil import copy

#This path will change every time - just trying to get function right first
path = 'C:/Users/Lampard/Desktop/Folder1/File.py'
#Used to allow suffix function
p = PurePath(path)
#Check if directory is a file not a folder
if not p.suffix:
    print("Not an extension")
#If it is a file
else:
    #Create new folder before last file
    #Change working directory
    split = path.split('/')
    new_directory = '/'.join(split[:-1])
    apply_new_directory = os.chdir(new_directory)
    #If folder does not exist create it
    try:
        os.mkdir('Archive')#Create new folder
    #If not, continue process to copy file and paste it into Archive
    except FileExistsError:
        copy(path, new_directory + '/Archive/' + split[-1])
Is this code okay? - does anyone know a simpler method?
Locate folder/file in path
print([name for name in os.listdir(".") if os.path.isdir(name)])
Create path
import os

# define the name of the directory to be created
path = "/tmp/year"
try:
    os.mkdir(path)
except OSError:
    print("Creation of the directory %s failed" % path)
else:
    print("Successfully created the directory %s " % path)
To move and cut files you can use this library
As you're already using pathlib, there's no need to use shutil:
from pathlib import Path
path = 'C:/Users/Lampard/Desktop/Folder1/File.py' # or whatever
p = Path(path)
target = Path(p.with_name('Archive')) # replace the filename with 'Archive'
target.mkdir() # create target directory
p.rename(target.joinpath(p.name)) # move the file to the target directory
Feel free to add appropriate try…except statements to handle any errors, for example when the Archive folder already exists.
Update: you might find this version more readable:
target = p.parent / 'Archive'
target.mkdir()
p.rename(target / p.name)
This works because pathlib overloads the / operator to join path components.
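Putting the pieces together, the requested behaviour (create Archive next to the file if needed, then move the file into it) can be sketched as a small function. This is only an illustration: the name `archive_file` is made up, and it uses `mkdir(exist_ok=True)` instead of a try…except so that an existing Archive folder is not an error. `Path.replace` overwrites a file of the same name already in Archive, which may or may not be what you want.

```python
import tempfile
from pathlib import Path


def archive_file(path, folder_name="Archive"):
    """Move the file at path into a sibling folder, creating it if needed."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} is not an existing file")
    target_dir = p.parent / folder_name
    target_dir.mkdir(exist_ok=True)   # no error if Archive already exists
    new_path = target_dir / p.name
    p.replace(new_path)               # move; overwrites an existing target
    return new_path


# Demo in a temporary directory: the second call finds Archive already there.
with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp) / "File.py"
    first.write_text("print('hi')")
    moved = archive_file(first)
    second = Path(tmp) / "Other.py"
    second.write_text("")
    archive_file(second)
    archived_names = sorted(f.name for f in (Path(tmp) / "Archive").iterdir())
    original_gone = not first.exists()

print(archived_names)
```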

Move pairs of files (.txt & .xml) into their corresponding folder using Python

I have been working this challenge for about a day or so. I've looked at multiple questions and answers asked on SO and tried to 'MacGyver' the code used for my purpose, but still having issues.
I have a directory (let's call it "src\") with hundreds of files (.txt and .xml). Each .txt file has an associated .xml file (let's call it a pair). Example:
src\text-001.txt
src\text-001.xml
src\text-002.txt
src\text-002.xml
src\text-003.txt
src\text-003.xml
Here's an example of how I would like it to turn out so each pair of files are placed into a single unique folder:
src\text-001\text-001.txt
src\text-001\text-001.xml
src\text-002\text-002.txt
src\text-002\text-002.xml
src\text-003\text-003.txt
src\text-003\text-003.xml
What I'd like to do is create an associated folder for each pair and then move each pair of files into its respective folder using Python. I've already tried working from code I found (thanks to a post from Nov '12 by Sethdd), but I am having trouble figuring out how to use the move function to grab pairs of files. Here's where I'm at:
import os
import shutil

srcpath = "PATH_TO_SOURCE"
srcfiles = os.listdir(srcpath)
destpath = "PATH_TO_DEST"

# grabs the name of the file before extension and uses as the dest folder name
destdirs = list(set([filename[0:9] for filename in srcfiles]))

def create(dirname, destpath):
    full_path = os.path.join(destpath, dirname)
    os.mkdir(full_path)
    return full_path

def move(filename, dirpath):
    shutil.move(os.path.join(srcpath, filename), dirpath)

# create destination directories and store their names along with full paths
targets = [
    (folder, create(folder, destpath)) for folder in destdirs
]

for dirname, full_path in targets:
    for filename in srcfiles:
        if dirname == filename[0:9]:
            move(filename, full_path)
I feel like it should be easy, but Python isn't something I work with everyday and it's been a while since my scripting days... Any help would be greatly appreciated!
Thanks,
WK2EcoD
Use the glob module to iterate over all of the 'txt' files. From that you can parse the names, create the folders, and copy the files.
The process should be as simple as it appears to you as a human.
for file_name in os.listdir(srcpath):
    dir = file_name[:9]
    # if dir doesn't exist, create it
    # move file_name to dir
You're doing a lot of intermediate work that seems to be confusing you.
Also, insert some simple print statements to track data flow and execution flow. It appears that you have no tracing output so far.
You can do it with os module. For every file in directory check if associated folder exists, create if needed and then move the file. See the code below:
import os

SRC = 'path-to-src'

for fname in os.listdir(SRC):
    filename, file_extension = os.path.splitext(fname)
    # os.path.splitext keeps the leading dot in the extension
    if file_extension not in ['.xml', '.txt']:
        continue
    folder_path = os.path.join(SRC, filename)
    if not os.path.exists(folder_path):
        os.mkdir(folder_path)
    os.rename(
        os.path.join(SRC, fname),
        os.path.join(folder_path, fname)
    )
My approach would be:
Find the pairs that I want to move (do nothing with files without a pair)
Create a directory for every pair
Move the pair to the directory
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import os, shutil
import re

def getPairs(files):
    pairs = []
    file_re = re.compile(r'^(.*)\.(.*)$')
    for f in files:
        match = file_re.match(f)
        if match:
            (name, ext) = match.groups()
            if ext == 'txt' and name + '.xml' in files:
                pairs.append(name)
    return pairs

def movePairsToDir(pairs):
    for name in pairs:
        os.mkdir(name)
        shutil.move(name+'.txt', name)
        shutil.move(name+'.xml', name)

files = os.listdir()
pairs = getPairs(files)
movePairsToDir(pairs)
NOTE: This script works when called inside the directory with the pairs.
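If you would rather not depend on the current working directory, the same pairing idea can be sketched with pathlib and an explicit source folder. The function name `move_pairs` and the demo file names are hypothetical; files without a partner are left where they are, as in the script above.

```python
import shutil
import tempfile
from pathlib import Path


def move_pairs(src):
    """Move each name.txt / name.xml pair in src into src/name/."""
    src = Path(src)
    moved = []
    # sorted() materialises the listing before any file is moved.
    for txt in sorted(src.glob("*.txt")):
        xml = txt.with_suffix(".xml")
        if not (txt.is_file() and xml.is_file()):
            continue  # not a complete pair, leave it alone
        folder = src / txt.stem
        folder.mkdir(exist_ok=True)
        for f in (txt, xml):
            shutil.move(str(f), str(folder / f.name))
        moved.append(txt.stem)
    return moved


# Demo in a temporary directory with two pairs and one unpaired file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("text-001.txt", "text-001.xml",
                 "text-002.txt", "text-002.xml", "lonely.txt"):
        (root / name).write_text("")
    moved = move_pairs(root)
    pair_ok = (root / "text-001" / "text-001.xml").is_file()
    lonely_untouched = (root / "lonely.txt").is_file()

print(moved)
```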

Python: folder creation when copying files

I'm trying to create a shell script that will copy files from one computer (employee's old computer) to another (employee's new computer). I have it to the point where I can copy files over, thanks to the lovely people here, but I'm running into a problem - if I'm going from, say, this directory that has 2 files:
C:\Users\specificuser\Documents\Test Folder
....to this directory...
C:\Users\specificuser\Desktop
...I see the files show up on the Desktop, but the folder those files were in (Test Folder) isn't created.
Here is the copy function I'm using:
#copy function
def dir_copy(srcpath, dstpath):
    #if the destination path doesn't exist, create it
    if not os.path.exists(dstpath):
        os.makedir(dstpath)
    #tag each file to the source path to create the file path
    for file in os.listdir(srcpath):
        srcfile = os.path.join(srcpath, file)
        dstfile = os.path.join(dstpath, file)
        #if the source file path is a directory, copy the directory
        if os.path.isdir(srcfile):
            dir_copy(srcfile, dstfile)
        else: #if the source file path is just a file, copy the file
            shutil.copyfile(srcfile, dstfile)
I know I need to create the directory on the destination, I'm just not quite sure how to do it.
Edit: I found that I had a typo (os.makedir instead of os.mkdir). I tested it, and it creates directories like it's supposed to. HOWEVER, I'd like it to create the directory one level up from where it's starting. For example, in Test Folder there is Sub Test Folder. It has created Sub Test Folder but won't create Test Folder, because Test Folder is not part of the dstpath. Does that make sense?
You might want to look at shutil.copytree(). It performs the recursive copy functionality, including directories, that you're looking for. So, for a basic recursive copy, you could just run:
shutil.copytree(srcpath, dstpath)
However, to accomplish your goal of copying the source directory to the destination directory, creating the source directory inside of the destination directory in the process, you could use something like this:
import os
import shutil

def dir_copy(srcpath, dstdir):
    dirname = os.path.basename(srcpath)
    dstpath = os.path.join(dstdir, dirname)
    shutil.copytree(srcpath, dstpath)
Note that your srcpath must not contain a slash at the end for this to work. Also, the result of joining the destination directory and the source directory name must not already exist, or copytree will fail.
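Both caveats can be softened. A minimal sketch, assuming Python 3.8 or newer for the `dirs_exist_ok` parameter of `shutil.copytree`: `os.path.normpath` strips a trailing slash before the base name is taken, and `dirs_exist_ok=True` lets the copy merge into an existing destination folder. The function name `copy_dir_into` and the demo contents are made up for illustration.

```python
import os
import shutil
import tempfile


def copy_dir_into(srcpath, dstdir):
    """Copy the folder srcpath itself (not just its contents) into dstdir."""
    # normpath drops a trailing slash, so basename is never empty.
    dirname = os.path.basename(os.path.normpath(srcpath))
    dstpath = os.path.join(dstdir, dirname)
    # dirs_exist_ok (Python 3.8+) merges into an existing destination.
    shutil.copytree(srcpath, dstpath, dirs_exist_ok=True)
    return dstpath


# Demo in a temporary directory, with a trailing separator on the source path.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "Documents", "Test Folder")
    os.makedirs(os.path.join(src, "Sub Test Folder"))
    with open(os.path.join(src, "Sub Test Folder", "inner.txt"), "w") as fh:
        fh.write("data")
    desktop = os.path.join(tmp, "Desktop")
    result = copy_dir_into(src + os.sep, desktop)
    copy_dir_into(src + os.sep, desktop)  # second run does not fail
    result_name = os.path.basename(result)
    inner_copied = os.path.isfile(
        os.path.join(desktop, "Test Folder", "Sub Test Folder", "inner.txt"))

print(result_name, inner_copied)
```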
This is a common question with file copying: do you intend to copy just the contents of the folder, or do you want the folder itself copied? Copy utilities typically have a flag for this, and your function can too. I use os.makedirs so that any intermediate directories are created as well.
import os
import shutil

#copy function
def dir_copy(srcpath, dstpath, include_directory=False):
    if include_directory:
        dstpath = os.path.join(dstpath, os.path.basename(srcpath))
    os.makedirs(dstpath, exist_ok=True)
    #tag each file to the source path to create the file path
    for file in os.listdir(srcpath):
        srcfile = os.path.join(srcpath, file)
        dstfile = os.path.join(dstpath, file)
        #if the source file path is a directory, copy the directory
        if os.path.isdir(srcfile):
            dir_copy(srcfile, dstfile)
        else: #if the source file path is just a file, copy the file
            shutil.copyfile(srcfile, dstfile)
import shutil
import os

def dir_copy(srcpath, dstpath):
    try:
        shutil.copytree(srcpath, dstpath)
    except shutil.Error as e:
        print('Directory not copied. Error: %s' % e)
    except OSError as e:
        print('Directory not copied. Error: %s' % e)

dir_copy('/home/sergey/test1', '/home/sergey/test2')
I use this script to backup (copy) my working folder. It will skip large files, keep folder structure (hierarchy) and create destination folders if they don't exist.
import os
import shutil

# the_folder_copy_from must hold the path of the folder to back up
for root, dirs, files in os.walk(the_folder_copy_from):
    for name in files:
        # skip files of 10 MiB or more
        if os.path.getsize(os.path.join(root, name)) < 10*1024*1024:
            target = os.path.join("backup", os.path.relpath(os.path.join(root, name), start=the_folder_copy_from))
            print(target)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy(src=os.path.join(root, name), dst=target)
print("Done")

Extracting compressed files

The following code allows me to extract .tgz files. However, it stops extracting after about two levels down; there are other subfolders that have .tgz files that need extracting. Additionally, when I extract a file, I have to manually move it to another path or it will get overwritten by other .tgz files that I extract to that location (all .tgz that I'm using have the same file structure/folder names once extracted). Any help is appreciated. Thanks!
import os, sys, tarfile

def extract(tar_url, extract_path='.'):
    print(tar_url)
    tar = tarfile.open(tar_url, 'r')
    for item in tar:
        tar.extract(item, extract_path)
        if item.name.find(".tgz") != -1 or item.name.find(".tar") != -1:
            extract(item.name, "./" + item.name[:item.name.rfind('/')])

try:
    extract(sys.argv[1] + '.tgz')
    print('Done.')
except:
    name = os.path.basename(sys.argv[0])
    print(name[:name.rfind('.')], '<filename>')
If I have not misinterpreted your question, then here is what you want to do -
Extract a .tgz file which may have
more .tgz files within it that needs further
extraction (and so on..)
While extracting, you need to be careful that you are not replacing an already existing directory in the folder.
If I have correctly interpreted your problem, then...
Here is what my code does -
Extracts every .tgz file (recursively) in a separate folder with the same name as the .tgz file (without its extension) in the same directory.
While extracting, it makes sure that it is not overwriting/replacing any already existing files/folder.
So if this is the directory structure of the .tgz file -
parent/
    xyz.tgz/
        a
        b
        c
        d.tgz/
            x
            y
            z
        a.tgz/ # note if I extract this directly, it will replace/overwrite contents of the folder 'a'
            m
            n
            o
            p
After extraction, the directory structure will be -
parent/
    xyz.tgz
    xyz/
        a
        b
        c
        d/
            x
            y
            z
        a 1/ # it extracts 'a.tgz' to the folder 'a 1' as folder 'a' already exists in the same folder.
            m
            n
            o
            p
Although I have provided plenty of documentation in my code below, I will briefly outline the structure of my program. Here are the functions I have defined -
FileExtension --> returns the extension of a file
AppropriateFolderName --> helps in preventing overwriting/replacing of already existing folders (how? you will see it in the program)
Extract --> extracts a .tgz file (safely)
WalkTreeAndExtract --> walks down a directory (passed as parameter) and extracts all .tgz files (recursively) on the way down.
I cannot suggest changes to what you have done, as my approach is a bit different. I have used the extractall method of the tarfile module instead of the somewhat more complicated extract method that you used. (Have a glance at http://docs.python.org/library/tarfile.html#tarfile.TarFile.extractall and read the warning associated with the extractall method. I don't think we will run into that problem in general, but keep it in mind.)
So here is the code that worked for me -
(I tried it for .tar files nested 5 levels deep (ie .tar within .tar within .tar ... 5 times), but it should work for any depth* and also for .tgz files.)
# extracting_nested_tars.py
import os
import re
import tarfile

file_extensions = ('tar', 'tgz')
# Edit this according to the archive types you want to extract. Keep in
# mind that these should be extractable by the tarfile module.

def FileExtension(file_name):
    """Return the file extension of file_name

    'file_name' should be a string. It can be either the full path of
    the file or just its name (or any string as long it contains
    the file extension.)

    Examples:
    input (file_name) --> 'abc.tar'
    return value --> 'tar'
    """
    match = re.compile(r"^.*[.](?P<ext>\w+)$",
                       re.VERBOSE|re.IGNORECASE).match(file_name)
    if match: # if match != None:
        ext = match.group('ext')
        return ext
    else:
        return '' # there is no file extension to file_name
def AppropriateFolderName(folder_name, parent_fullpath):
    """Return a folder name such that it can be safely created in
    parent_fullpath without replacing any existing folder in it.

    Check if a folder named folder_name exists in parent_fullpath. If no,
    return folder_name (without changing, because it can be safely created
    without replacing any already existing folder). If yes, append an
    appropriate number to the folder_name such that this new folder_name
    can be safely created in the folder parent_fullpath.

    Examples:
    folder_name = 'untitled folder'
    return value = 'untitled folder' (if no such folder already exists
                                      in parent_fullpath.)

    folder_name = 'untitled folder'
    return value = 'untitled folder 1' (if a folder named 'untitled folder'
                                        already exists but no folder named
                                        'untitled folder 1' exists in
                                        parent_fullpath.)

    folder_name = 'untitled folder'
    return value = 'untitled folder 2' (if folders named 'untitled folder'
                                        and 'untitled folder 1' both
                                        already exist but no folder named
                                        'untitled folder 2' exists in
                                        parent_fullpath.)
    """
    if os.path.exists(os.path.join(parent_fullpath,folder_name)):
        match = re.compile(r'^(?P<name>.*)[ ](?P<num>\d+)$').match(folder_name)
        if match: # if match != None:
            name = match.group('name')
            number = match.group('num')
            new_folder_name = '%s %d' %(name, int(number)+1)
            # Recursively call itself to check whether a folder named
            # new_folder_name already exists in parent_fullpath or not.
            return AppropriateFolderName(new_folder_name,
                                         parent_fullpath)
        else:
            new_folder_name = '%s 1' %folder_name
            # Recursively call itself to check whether a folder named
            # new_folder_name already exists in parent_fullpath or not.
            return AppropriateFolderName(new_folder_name, parent_fullpath)
    else:
        return folder_name
def Extract(tarfile_fullpath, delete_tar_file=True):
    """Extract the tarfile_fullpath to an appropriate* folder of the same
    name as the tar file (without an extension) and return the name
    of this folder.

    If delete_tar_file is True, it will delete the tar file after
    its extraction; if False, it won't. Default value is True as you
    would normally want to delete the (nested) tar files after
    extraction. Pass a False, if you don't want to delete the
    tar file (after its extraction) you are passing.
    """
    tarfile_name = os.path.basename(tarfile_fullpath)
    parent_dir = os.path.dirname(tarfile_fullpath)

    # (the slicing is to remove the extension (.tar) from the file name.)
    # Get a folder name (from the function AppropriateFolderName)
    # in which the contents of the tar file can be extracted,
    # so that it doesn't replace an already existing folder.
    extract_folder_name = AppropriateFolderName(
        tarfile_name[:-1*len(FileExtension(tarfile_name))-1], parent_dir)

    # The full path to this new folder.
    extract_folder_fullpath = os.path.join(parent_dir,
                                           extract_folder_name)

    try:
        tar = tarfile.open(tarfile_fullpath)
        tar.extractall(extract_folder_fullpath)
        tar.close()
        if delete_tar_file:
            os.remove(tarfile_fullpath)
        return extract_folder_name
    except Exception as e:
        # Exceptions can occur while opening a damaged tar file.
        print('Error occurred while extracting %s\n'
              'Reason: %s' % (tarfile_fullpath, e))
        return
def WalkTreeAndExtract(parent_dir):
    """Recursively descend the directory tree rooted at parent_dir
    and extract each tar file on the way down (recursively).
    """
    try:
        dir_contents = os.listdir(parent_dir)
    except OSError as e:
        # Exception can occur if trying to open some folder that
        # this program does not have permission to read.
        print('Error occurred. Could not open folder %s\n'
              'Reason: %s' % (parent_dir, e))
        return

    for content in dir_contents:
        content_fullpath = os.path.join(parent_dir, content)
        if os.path.isdir(content_fullpath):
            # If content is a folder, walk it down completely.
            WalkTreeAndExtract(content_fullpath)
        elif os.path.isfile(content_fullpath):
            # If content is a file, check if it is a tar file.
            # If so, extract its contents to a new folder.
            if FileExtension(content_fullpath) in file_extensions:
                extract_folder_name = Extract(content_fullpath)
                if extract_folder_name: # if extract_folder_name != None:
                    # Append the newly extracted folder to dir_contents
                    # so that it can be later searched for more tar files
                    # to extract.
                    dir_contents.append(extract_folder_name)
        else:
            # Unknown file type.
            print('Skipping %s. <Neither file nor folder>' % content_fullpath)
if __name__ == '__main__':
    tarfile_fullpath = 'fullpath_path_of_your_tarfile' # pass the path of your tar file here.
    extract_folder_name = Extract(tarfile_fullpath, False)

    # tarfile_fullpath is extracted to extract_folder_name. Now descend
    # down its directory structure and extract all other tar files
    # (recursively).
    extract_folder_fullpath = os.path.join(os.path.dirname(tarfile_fullpath),
                                           extract_folder_name)
    WalkTreeAndExtract(extract_folder_fullpath)
    # If you want to extract all tar files in a dir, just execute the above
    # line and nothing else.
I have not added a command line interface to it. I guess you can add it if you find it useful.
Here is a slightly better version of the above program -
http://guanidene.blogspot.com/2011/06/nested-tar-archives-extractor.html
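Regarding the extractall warning mentioned above: an archive from an untrusted source can contain member names such as "../evil.txt" that end up outside the target folder. One way to guard against that, sketched here under the assumption that skipping such members (and all links) is acceptable, is to pass a filtered member list to extractall. The helper name `safe_members` and the demo archive are made up for illustration; newer Python versions also offer a built-in `filter` argument on extractall for this purpose.

```python
import io
import os
import tarfile
import tempfile


def safe_members(tar, dest):
    """Yield only regular members whose extracted path stays inside dest."""
    dest_root = os.path.realpath(dest)
    for member in tar.getmembers():
        target = os.path.realpath(os.path.join(dest_root, member.name))
        inside = target == dest_root or target.startswith(dest_root + os.sep)
        if inside and not (member.issym() or member.islnk()):
            yield member
        else:
            print("Skipping suspicious member:", member.name)


# Demo: build a small archive with one harmless and one escaping member.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.tar")
    with tarfile.open(archive, "w") as tar:
        for name in ("good.txt", "../evil.txt"):
            data = b"hello"
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

    out_dir = os.path.join(tmp, "out")
    os.makedirs(out_dir)
    with tarfile.open(archive) as tar:
        tar.extractall(out_dir, members=safe_members(tar, out_dir))

    extracted = sorted(os.listdir(out_dir))
    escaped = os.path.exists(os.path.join(tmp, "evil.txt"))

print(extracted, escaped)
```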
