I have a folder with 1092 files. I need to move those files to new directories in batches of 10 (each new folder will hold at most 10 files, so 110 folders at most).
I tried this code, and the folders have been created, but now I can't find any of the original files. They are in neither the original folder nor the newly created ones.
path = "/home/user/Documents/MSc/Imagens/Dataset"
paths = []
for root, dirs, file in os.walk(path):
    for name in file:
        paths.append(os.path.join(root,name))
start = 0
end = 10
while end <= 1100:
    dest = str(os.mkdir("Dataset_" + str(start) + "_" + str(end)))
    for i in paths[start:end]:
        shutil.move(i, dest)
    start += 10
    end += 10
Any ideas?
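With your move command, you are moving all 10 files to one single destination path, not into the new folder, because the destination contains neither the folder path nor the filename. On top of that, dest is not the folder name: os.mkdir() returns None, so str(os.mkdir(...)) is the string "None". Every shutil.move(i, dest) therefore targets a file called None in the current working directory, and on Linux each move replaces the file left by the previous one.
You need to keep the folder path and append each filename to it: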
dataset_dirname = "Dataset_" + str(start) + "_" + str(end)
dataset_fullpath = os.path.join(path, dataset_dirname)
# create the folder at the same path the files are moved to
os.mkdir(dataset_fullpath)
for i in paths[start:end]:
    # append filename to dataset_fullpath and move the file
    shutil.move(i, os.path.join(dataset_fullpath, os.path.basename(i)))
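Putting the pieces together, here is a minimal sketch of the whole batching loop. The function name move_in_batches and its parameters are made up for illustration, and it assumes the files sit directly inside the source folder (no recursion into subfolders, unlike the os.walk call in the question).

```python
import os
import shutil


def move_in_batches(src_dir, batch_size=10):
    """Move the files directly inside src_dir into Dataset_<start>_<end> subfolders."""
    # Snapshot the file list first so the new folders do not affect the listing.
    files = sorted(
        entry.path for entry in os.scandir(src_dir) if entry.is_file()
    )
    created = []
    for start in range(0, len(files), batch_size):
        end = start + batch_size
        batch_dir = os.path.join(src_dir, "Dataset_{}_{}".format(start, end))
        os.makedirs(batch_dir, exist_ok=True)
        for file_path in files[start:end]:
            # Give shutil.move the full target path, filename included.
            target = os.path.join(batch_dir, os.path.basename(file_path))
            shutil.move(file_path, target)
        created.append(batch_dir)
    return created
```

Stepping with range(0, len(files), batch_size) means the last folder simply receives whatever is left over, so no hard-coded upper bound such as 1100 is needed.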
Related
I have a directory structure with a lot of files in it (~1 million) which I would like to archive in chunks of 10k files. So far I have this, which creates garbage files: when I unzip them, it looks like all of the files are glommed into one long file instead of individual files. I'm stuck, and any help would be greatly appreciated.
dirctr = 1
for root, dirs, files in os.walk(args.input_dir, followlinks=False):
    counter = 1
    curtar= args.output_dir + 'File' + str(dirctr) + '.gz'
    tar = tarfile.open(name=curtar, mode="w:gz")
    for filename in files:
        if ((counter -1) % args.files_per_dir) == 0:
            if tarfile.is_tarfile(curtar):
                tar.close(curtar)
            dirctr = dirctr + 1
            curtar= args.output_dir + 'File' + str(dirctr) + '.gz'
            tar.open(name=curtar, mode="w:gz")
        tar.add(os.path.join(root,filename))
        counter = counter + 1
tar.close(curtar)
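One likely reason the output looks like one long file is the extension: mode "w:gz" writes a gzip-compressed tar archive, so a file named .gz that is merely gunzipped yields the raw tar stream. Below is a minimal sketch of chunked archiving under that assumption. The function name tar_in_chunks, its parameters, and the .tar.gz naming are illustrative choices, and it assumes output_dir is not located inside input_dir.

```python
import os
import tarfile


def tar_in_chunks(input_dir, output_dir, files_per_archive=10000):
    """Write every file under input_dir into numbered .tar.gz archives."""
    os.makedirs(output_dir, exist_ok=True)
    archives = []
    tar = None
    count = 0
    for root, _dirs, files in os.walk(input_dir, followlinks=False):
        for filename in sorted(files):
            if count % files_per_archive == 0:
                # Start a new archive every files_per_archive files.
                if tar is not None:
                    tar.close()
                name = os.path.join(
                    output_dir, "File{}.tar.gz".format(len(archives) + 1)
                )
                tar = tarfile.open(name, mode="w:gz")
                archives.append(name)
            full_path = os.path.join(root, filename)
            # Store paths relative to input_dir inside the archive.
            tar.add(full_path, arcname=os.path.relpath(full_path, input_dir))
            count += 1
    if tar is not None:
        tar.close()
    return archives
```

The counter runs across the whole walk rather than being reset per directory, so chunks are filled to the requested size regardless of how the files are spread over subfolders. Note also that close() takes no arguments and that tarfile.open returns a new archive object which has to be kept.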
I am trying to figure out how to check whether a file in my source folder already exists in my destination folder, and then copy the file over to the destination folder.
If the file already exists in the destination folder, rename the file from the source folder with a "_1" suffix (or _i+1 if that name is taken too), then copy it to the destination folder.
For example (it will not be a .txt, I am just using this as an example; the files will be dynamic in nature):
1. I want to copy file.txt from folder a over to folder b.
2. file.txt already exists within folder b.
   a. If I attempted to copy file.txt over to folder b, I would receive a copy error.
3. Rename file.txt to file_1.txt.
   a. Copy file_1.txt to folder b.
   b. If file_1.txt exists then make it file_2.txt.
What I have so far is this:
for filename in files:
    filename_only = os.path.basename(filename)
    src = path + "\\" + filename
    failed_f = pathx + "\\Failed\\" + filename
    # This is where I am lost, I am not sure how to declare the i and add _i + 1 into the code.
    if path.exists(file_path):
        numb = 1
        while True:
            new_path = "{0}_{2}{1}".format(*path.splitext(file_path) + (numb,))
            if path.exists(new_path):
                numb += 1
                shutil.copy(src, new_path)
            else:
                shutil.copy(src, new_path)
    shutil.copy(src, file_path)
Thanks much in advance.
import os
import shutil

for filename in files:
    src = os.path.join(path, filename)
    i = 0
    while True:
        base = os.path.basename(src)
        # "_<i>".join((stem, ext)) inserts the counter before the extension
        name = base if i == 0 else "_{}".format(i).join(os.path.splitext(base))
        dst_path = os.path.join(dst, name)
        if not os.path.exists(dst_path):
            shutil.copy(src, dst_path)
            break
        i += 1
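The same idea can be wrapped in a small helper. This is a sketch only: the name copy_with_unique_name is made up here, and it assumes the destination folder already exists.

```python
import os
import shutil


def copy_with_unique_name(src, dst_dir):
    """Copy src into dst_dir, appending _1, _2, ... if the name is taken."""
    stem, ext = os.path.splitext(os.path.basename(src))
    candidate = os.path.join(dst_dir, stem + ext)
    i = 0
    while os.path.exists(candidate):
        # Try the next suffix until a free name is found.
        i += 1
        candidate = os.path.join(dst_dir, "{}_{}{}".format(stem, i, ext))
    shutil.copy(src, candidate)
    return candidate
```

Copying file.txt into the same folder three times with this helper would produce file.txt, file_1.txt and file_2.txt. The source file itself is never renamed; only the copy gets the suffix.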
How do I search a string in Python for a value and only return a match if the value is not directly preceded or followed by other characters? White space is acceptable, i.e.
I would like to search a folder with the subfolders below for "WO1" or "WO1 BLD2 L1 H3 Fitout"
Subfolder:
WO1
WO123
SP10152100 WO137
WO1 BLD2 L1 H3 Fitout
The code I am using returns "SP10152100 WO137":
temp_WO_dir1 = "WO" + wo
path = root + "\\"+ dir
for root, dirs, files in os.walk(path):
    for dir in dirs:
        print dir
        if temp_WO_dir1 in dir:
            find = True
            sp_path = root + "\\" + dir + "\\3 Finance\\Telstra Invoicing"
            print sp_path
            break
    if root.count(os.sep) - path.count(os.sep) == 0:
        del dirs[:]
Update:
I tried the following, but it is not working:
if re.match(temp_WO_dir1+"\s" , dir):
Match WO1 flanked by word boundaries, preceded by anything and followed by anything:
.*\bWO1\b.*
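As a quick check of the word-boundary approach against the folder names from the question, here is a small sketch. The helper name matches_work_order is made up for illustration; it uses re.search, so the leading and trailing .* are not needed, and re.escape in case the search value ever contains regex metacharacters.

```python
import re


def matches_work_order(folder_name, work_order):
    """Return True if work_order appears in folder_name as a whole word."""
    pattern = r"\b" + re.escape(work_order) + r"\b"
    return re.search(pattern, folder_name) is not None


folders = ["WO1", "WO123", "SP10152100 WO137", "WO1 BLD2 L1 H3 Fitout"]
matching = [name for name in folders if matches_work_order(name, "WO1")]
print(matching)  # ['WO1', 'WO1 BLD2 L1 H3 Fitout']
```

"WO123" and "WO137" are rejected because there is no word boundary between two digits, while a following space or the end of the string counts as one.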
I have a script that:
1. Loops through all the files in a directory + its subdirectories
2. Creates a folder for each unique year in the list of files
3. Moves files to their respective year folders
4. Renames them based on timestamp + unique number.
When I run parts 1-3 only, it moves the files to the folders correctly.
When I run parts 1-4 (including the os.rename part), it renames the files AFTER moving them back to the parent directory.
Start file structure:
parent_folder
--> file.txt modified 01-21-2012
--> file2.txt modified 09-30-2013
--> file3.txt modified 06-21-2017
Expected result:
parent_folder
--> '2012'
    --> 2012-01-21-1.txt
--> '2013'
    --> 2013-09-30-2.txt
--> '2017'
    --> 2017-06-21-3.txt
Actual result:
parent_folder
--> '2012'
--> '2013'
--> '2017'
--> '2012-01-21-1.txt'
--> '2013-09-30-2.txt'
--> '2017-06-21-4.txt'
As you can see, it renamed the files but moved them out of their folders. Why is it doing this?
My code (I inserted print statements for logging):
import os, datetime, sys, shutil
#PART 1 : Change to the inputted directory
#===============================
# This is the directory I will work on.
p = 'ENTER_FOLDER_PATH_HERE'
print('This is the directory that will be organized:')
print(os.getcwd())
if os.path.isdir(p): # check if directory exists
    print("Step 1: Changing directory")
    os.chdir(p)
    #PART 2 : Make a folder for each unique year
    #===========================================
    fileNames = next(os.walk(os.getcwd()))[2] # list files, excluding subdirectories
    f = {}
    filename = []
    dates = []
    # Loop through each file and grab the unique year.
    # Store the file (key) and its modified year (value) into dictionary 'f'.
    for name in fileNames:
        f[name] = datetime.datetime.fromtimestamp(os.path.getmtime(name)).strftime("%Y")
    dates = list(set(f.values()))
    # Create the list of unique folders from the dictionary.
    print("Step 2: Creating the following folders:\n", dates)
    print('\n')
    [os.mkdir(folder) for folder in dates]
    #PART 3: Move all files to their respective folders based on modified year.
    #==========================================================================
    if sys.platform == 'Windows':
        print("Step 3: moving files...")
        [shutil.move(key, os.getcwd() + '\\' + value) for key, value in f.items()]
    elif sys.platform == 'darwin':
        print("Step 3: moving files...")
        [shutil.move(key, os.getcwd() + '//' + value) for key, value in f.items()]
    else:
        print("Sorry, this script is not supported in your OS.")
else:
    print("Oops, seems like that directory doesn't exist. Please try again.")
#PART 4: Rename the files
#==========================================================================
# Get each file in directory and renames it to its modified date, Y-M-D format
count=1
for root, dir, files in os.walk(p):
    for file in files:
        if not file.startswith('.'): # ignore hidden files
            filePath = os.path.join(root,file)
            ext = os.path.splitext(filePath)[1]
            print("File number: ", count, file, ext)
            print('\n')
            os.rename(filePath, datetime.datetime.fromtimestamp(os.path.getmtime(filePath)).strftime("%Y-%m-%d") + '-' + str(count) + ext)
            count += 1
            print(filePath)
Logs:
This is the directory that will be organized:
TEST_PATH
Step 1: Changing directory
Step 2: Creating the following folders:
['2013', '2012', '2017']
Step 3: moving files...
File number: 1 2012-01-21-1.jpg TEST_PATH/2012/2012-01-21-1.jpg
TEST_PATH//2012/2012-01-21-1.jpg
File number: 2 2013-09-30-2.jpg TEST_PATH/2013/2013-09-30-2.jpg
TEST_PATH/2013/2013-09-30-2.jpg
TEST_PATH/2013/2013-09-30-2.jpg
File number: 4 June 21 2017.txt TEST_PATH/2017/June 21 2017.txt
TEST_PATH/2017/June 21 2017.txt
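It moves the file because of the working directory you are currently in. I guess it works just like the mv command: after the rename, the file ends up at the path given by the second argument of os.rename, and a relative path there is resolved against the current working directory. Your second argument is only a filename, so the file lands in the cwd, which is the parent folder. For it to work correctly, the second argument has to include the folder as well as the new filename, for example by joining root with the new name.
Btw, you can do steps 3 and 4 at once by renaming each file directly into its year folder.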
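A minimal sketch of doing the move and the rename in one call, assuming all files sit directly in the parent folder and stay on the same filesystem. The function name organize_by_year is made up for illustration, and os.path.join replaces the per-platform separator handling from the question.

```python
import datetime
import os


def organize_by_year(folder):
    """Move each file in folder into a year subfolder under a date-based name."""
    names = sorted(
        entry.name for entry in os.scandir(folder)
        if entry.is_file() and not entry.name.startswith(".")
    )
    results = []
    for count, name in enumerate(names, start=1):
        src = os.path.join(folder, name)
        modified = datetime.datetime.fromtimestamp(os.path.getmtime(src))
        year_dir = os.path.join(folder, modified.strftime("%Y"))
        os.makedirs(year_dir, exist_ok=True)
        ext = os.path.splitext(name)[1]
        new_name = "{}-{}{}".format(modified.strftime("%Y-%m-%d"), count, ext)
        dst = os.path.join(year_dir, new_name)
        # The destination includes the year folder, so the file stays there.
        os.rename(src, dst)
        results.append(dst)
    return results
```

Because every path is built from folder, the result does not depend on the current working directory, and no os.chdir call is needed.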
Currently I have 38 subfolders in my folder test. The subfolder names run from 01 to 38. Each subfolder has 2 wav files which are named randomly, and I want to rename them properly and sequentially.
For example:
Subfolder 01 has wav files My recording #1 and My recording #6. I want them to be renamed 01_test_01 and 01_test_02, so the last folder, 38, should have files 38_test_01 and 38_test_02.
Below is my code:
import os
name = 'test'
rootdir = r'C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test'
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        filepath = subdir+os.sep +file
        if filepath.endswith('.wav'):
            split_dir = subdir.split(os.sep)
            f_name, f_ext=(os.path.splitext(file))
            new_1 = split_dir[8]
            y=1
            while y < 3 :
                new_name= (new_1 +'_' + 'test_' + str(y).zfill(2) + f_ext)
                y = y +1
                print (filepath)
                print (subdir+os.sep+new_name)
                os.rename(filepath, subdir+os.sep+new_name)
However when os.rename is executed I get the following error
Traceback (most recent call last):
File "C:\Users\kushal\Desktop\final_earthquake\sikkim_demo\demo_sikkim_victor\sort_inner_wav.py", line 23, in <module>
os.rename(filepath, subdir+os.sep+new_name)
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'C:\\Users\\kushal\\Desktop\\final_earthquake\\demonstration_sikkim\\wav\\test\\01\\My recording #5.wav' -> 'C:\\Users\\kushal\\Desktop\\final_earthquake\\demonstration_sikkim\\wav\\test\\01\\01_test_02.wav'
Most likely it is trying to rename the same file twice instead of renaming each file in the subfolder once.
The output:
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\My recording #1.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\01_test_01.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\My recording #1.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\01_test_02.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\My recording #6.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\01_test_01.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\My recording #6.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\01\01_test_02.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\My recording #3.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\02_test_01.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\My recording #3.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\02_test_02.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\My recording #4.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\02_test_01.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\My recording #4.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\02\02_test_02.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\My recording #5.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\03_test_01.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\My recording #5.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\03_test_02.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\My recording #6.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\03_test_01.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\My recording #6.wav
C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test\03\03_test_02.wav
You have two for loops, which is understandable, but why is there an inner while loop? I guess you added it because, as you mentioned, each subfolder has 2 wav files.
The combination of the second for loop (for file in files:) and the inner while loop (while y < 3:) is what causes the error. The second for loop already iterates over all the files, so the while loop renames the same filepath twice, and the second os.rename fails because the file no longer exists under its old name.
Modify your program to drop the inner while loop, as follows.
import os
rootdir = './test'
for subdir, dirs, files in os.walk(rootdir):
    y = 1
    for file in files:
        filepath = subdir + os.sep + file
        if filepath.endswith('.wav'):
            split_dir = subdir.split(os.sep)
            f_name, f_ext = os.path.splitext(file)
            new_name = split_dir[-1] + '_' + 'test_' + str(y).zfill(2) + f_ext
            y = y + 1
            print(filepath)
            print(subdir + os.sep + new_name)
            os.rename(filepath, subdir + os.sep + new_name)
It outputs (in my scenario):
./test\01\yy.wav
./test\01\01_test_01.wav
./test\01\xx.wav
./test\01\01_test_02.wav
./test\02\yy.wav
./test\02\02_test_01.wav
./test\02\xx.wav
./test\02\02_test_02.wav
The inner while loop causes the problem: it uses the same filepath multiple times. Replace it with an if condition like this:
import os
name = 'test'
rootdir = r'C:\Users\kushal\Desktop\final_earthquake\demonstration_sikkim\wav\test'
for subdir, dirs, files in os.walk(rootdir):
    y=1
    for file in files:
        filepath = subdir+os.sep +file
        if filepath.endswith('.wav'):
            split_dir = subdir.split(os.sep)
            print(split_dir)
            f_name, f_ext=(os.path.splitext(file))
            # last path component is the subfolder name, e.g. '01'
            new_1 = split_dir[-1]
            new_name= (new_1 +'_' + 'test_' + str(y).zfill(2) + f_ext)
            y+=1
            # stop after two files per subfolder
            if y>3:
                break
            print (filepath)
            print (subdir+os.sep+new_name)
            os.rename(filepath, subdir+os.sep+new_name)
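For comparison, here is a sketch of the same renaming without a manual counter or fixed path index. The function name rename_wavs and the label parameter are made up for illustration; it assumes the wav files live only in the numbered subfolders and that none of the new names already exist there.

```python
import os


def rename_wavs(rootdir, label="test"):
    """Rename the .wav files in each subfolder to <folder>_<label>_NN.wav."""
    renamed = []
    for subdir, _dirs, files in os.walk(rootdir):
        folder = os.path.basename(subdir)
        wavs = sorted(f for f in files if f.lower().endswith(".wav"))
        for number, filename in enumerate(wavs, start=1):
            ext = os.path.splitext(filename)[1]
            new_name = "{}_{}_{:02d}{}".format(folder, label, number, ext)
            # Each file is renamed exactly once, so the source always exists.
            os.rename(
                os.path.join(subdir, filename),
                os.path.join(subdir, new_name),
            )
            renamed.append(os.path.join(subdir, new_name))
    return renamed
```

enumerate restarts at 1 for every subfolder, which replaces the y counter, and os.path.basename(subdir) picks up the folder name regardless of how deep rootdir is.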