I have 207 directories full of .WAV files, where each directory contains a certain number of files recorded on one day (the number varies from directory to directory). The names of the directories are just dates in YYYYMMDD format, and the filenames have already been modified so that they are in 'HHMMSS.WAV' format (the time the recording was taken, e.g. 024545.WAV) in each directory. Each directory has a different recording period, so for example, directory1 contains files that were recorded on a certain day between 2am and 11am, while directory2 contains files that were recorded on a certain day between 11am and 6pm, etc.
I need to concatenate the files by hourly intervals; so for example, in directory1 there are 1920 clips, and I need to move files in each hourly interval into a separate directory – so effectively, there will be x number of new subdirectories for directory1 where x is the number of hourly intervals that are present in directory1 (i.e. directory1_00-01 for all the files in directory1 that were recorded between 00am and 01am, directory1_01-02 for all the files in directory1 that were recorded between 01am and 02am, etc. and if there were 6 hour intervals in directory1, I will need 6 subdirectories, one for each hour interval). I need to have these separate directories because it’s the only way I’ve figured out how to concatenate .WAV files together (see Script 2). The concatenated files should also be in a separate directory to contain all stitched files for directory1.
Currently, I’m doing everything manually using two python scripts and it’s getting extremely cumbersome since I’m physically changing the numbers and intervals for every hour (silly, I know):
Script 1 (to move all files in an hour into another directory; in this particular bit of code, I'm finding all the clips between 01am and 02am and moving them to the subdirectory within directory1 so that the subdirectory only contains files from 01am to 02am):
import os
import shutil
origin = r'PATH/TO/DIRECTORY1'
destination = r'PATH/TO/DIRECTORY1/DIRECTORY1_01-02'
startswith_ = '01'
for i in os.listdir(origin):
    if i.startswith(startswith_):
        os.rename(os.path.join(origin, i), os.path.join(destination, i))
Script 2 (to concatenate all files in the folder and writing the output to another directory; in this particular bit of code, I'm in the subdirectory from Script 1, concatenating all the files within it, and saving the output file "directory1_01-02.WAV" in another subdirectory of directory1 called "directory1_concatenated"):
import os
import glob
from pydub import AudioSegment
os.chdir("PATH/TO/DIRECTORY1/DIRECTORY1_01-02")
wav_segments = [AudioSegment.from_wav(wav_file) for wav_file in glob.glob("*.wav")]
combined = AudioSegment.empty()
for clip in wav_segments:
    combined += clip
combined.export("PATH/TO/DIRECTORY1/DIRECTORY1_CONCATENATED/DIRECTORY1_01-02.WAV", format="wav")
The idea is that by the end of it, "directory1_concatenated" should contain all the concatenated files from each hour interval within directory1.
Can anyone please help me somehow automate this process so I don’t have to do it manually for all 207 directories? Feel free to ask any questions about the process just in case I haven't explained myself very well (sorry!).
Edit:
Figured out how to automate Script 1 to run thanks to the os.walk suggestions :) Now I have a follow-up question about Script 2. How do you increment the saved files so that they're numbered? When I try the following, I get an "invalid syntax" error.
rootdir = 'PATH/TO/DIRECTORY1'
for root, dirs, files in os.walk(rootdir):
    for i in dirs:
        wav_segments = [AudioSegment.from_wav(wav_file) for wav_file in glob.glob("*.wav")]
        combined = AudioSegment.empty()
        for clip in wav_segments:
            combined += clip
        combined.export("PATH/TO/DIRECTORY1/DIRECTORY1_CONCATENATED/DIRECTORY1_%s.wav", format = "wav", % i)
        i++
I've been reading some other stack overflow questions but they all seem to deal with specific files? Or maybe I'm just not understanding os.walk fully yet - sorry, beginner here.
I am pretty sure you could do something with os.walk() to walk through your directories and files. Look at this snippet:
import os
rootdir = '.'
for root, dirs, files in os.walk(rootdir):
    print(root, dirs, files)
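Putting the pieces together, here is one way the whole pipeline could be automated with os.listdir and pydub. This is a minimal sketch: the directory-naming scheme (directoryname_HH-HH for the hourly subdirectories, directoryname_concatenated for the output) and the fact that every clip name starts with its two-digit hour come from the question; the root path and everything else are placeholders to adapt.

```python
import os
import shutil

def hour_label(filename):
    # Files are named HHMMSS.WAV, so the first two characters give the
    # hour the clip belongs to: "013000.WAV" -> "01-02".
    hour = int(filename[:2])
    return f"{hour:02d}-{(hour + 1) % 24:02d}"

def group_and_concatenate(day_dir):
    # Imported here so the grouping logic is usable without pydub installed.
    from pydub import AudioSegment

    day_name = os.path.basename(os.path.normpath(day_dir))
    out_dir = os.path.join(day_dir, f"{day_name}_concatenated")
    os.makedirs(out_dir, exist_ok=True)

    # Step 1 (Script 1): move every clip into its hourly subdirectory.
    for name in sorted(os.listdir(day_dir)):
        if not name.upper().endswith(".WAV"):
            continue
        sub = os.path.join(day_dir, f"{day_name}_{hour_label(name)}")
        os.makedirs(sub, exist_ok=True)
        shutil.move(os.path.join(day_dir, name), os.path.join(sub, name))

    # Step 2 (Script 2): concatenate each hourly subdirectory in time order.
    for sub in sorted(os.listdir(day_dir)):
        sub_path = os.path.join(day_dir, sub)
        if (not os.path.isdir(sub_path)
                or not sub.startswith(f"{day_name}_")
                or sub.endswith("_concatenated")):
            continue
        combined = AudioSegment.empty()
        for wav in sorted(os.listdir(sub_path)):
            combined += AudioSegment.from_wav(os.path.join(sub_path, wav))
        combined.export(os.path.join(out_dir, f"{sub}.WAV"), format="wav")

# Run it over all 207 day directories (root path is a placeholder):
# root = "PATH/TO/ALL/DIRECTORIES"
# for day in os.listdir(root):
#     if os.path.isdir(os.path.join(root, day)):
#         group_and_concatenate(os.path.join(root, day))
```

On the "invalid syntax" in the follow-up: `format = "wav", % i` is the offending part, because `% i` must be attached to the string itself (`"...%s.wav" % i`, or use an f-string as above), and `i++` is not Python (`i += 1`); here no counter is needed at all, since the subdirectory name itself provides a unique label for each export.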
Related
I am trying to reorganize a large number of pdf files (3 million files, average file 300KB). Currently, the files are stored in randomly named folders, but I want to organize them by their file name. File names are 8-digit integers such as 12345678.pdf
Currently, the files are stored like this
/old/a/12345678.pdf
/old/a/12345679.pdf
/old/b/22345679.pdf
I want them to be stored like this
/new/12/345/12345678.pdf
/new/12/345/12345679.pdf
/new/22/345/22345679.pdf
I thought this was an easy task using shutil:
from pathlib import Path
import shutil
for path_old in Path('old').rglob('*.pdf'):
    r = int(path_old.stem)
    path_new = '/new/' + str(r // 1000**2) + '/' + str(r // 1000 % 1000) + '/' + path_old.name
    shutil.move(str(path_old), path_new)
Unfortunately, this takes forever. My script is only moving ~15 files per second, which means it will take days to complete.
I am not exactly sure whether this is a Python/shutil problem or a more general IO problem - sorry if I misplaced the question. I am open to any type of solution that makes this process faster.
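Two things usually dominate the cost here: shutil.move falls back to copy-and-delete when source and destination are on different filesystems, and re-deriving/creating the target directory for every file adds per-file overhead. A sketch that pre-creates each target directory once and uses os.rename (a pure metadata operation on the same filesystem) could look like this; the /new layout follows the question, but whether old/ and new/ live on the same filesystem is an assumption you would need to check.

```python
import os
from pathlib import Path

def dest_for(pdf_path, new_root='new'):
    # Same split as the question: 12345678.pdf -> new/12/345/12345678.pdf
    p = Path(pdf_path)
    r = int(p.stem)
    return Path(new_root) / str(r // 1000**2) / str(r // 1000 % 1000) / p.name

def reorganize(old_root='old', new_root='new'):
    made = set()  # directories already created, to avoid repeated mkdir calls
    for path_old in Path(old_root).rglob('*.pdf'):
        path_new = dest_for(path_old, new_root)
        parent = path_new.parent
        if parent not in made:
            parent.mkdir(parents=True, exist_ok=True)
            made.add(parent)
        # os.rename is near-instant on the same filesystem; fall back to
        # shutil.move if old/ and new/ are on different devices.
        os.rename(path_old, path_new)
```

There are at most 1000 × 1000 target directories, so caching the created ones in a set keeps the directory-creation work bounded regardless of the 3 million files.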
I had 120 files in my source folder that I needed to move to a new directory (destination). The destination is created inside the function I wrote, based on a string in the filename. Here is the code I used:
import os
import fnmatch
import shutil
import pandas as pd

path = '/path/to/source'
dropbox = '/path/to/dropbox'
files = [i for i in os.listdir(path) if i.startswith("SSE")]
sam_lis = list()
for sam in files:
    sam_list = sam.split('_')[5]
    sam_lis.append(sam_list)
sam_lis = pd.unique(sam_lis).tolist()
# Using the above list
ID = sam_lis

def filemover(ID, files, dropbox):
    """
    Function to move files from the common place to the destination folder
    """
    for samples in ID:
        for fs in files:
            if samples in fs:
                destination = dropbox + "/" + samples + "/raw/"
                if not os.path.isdir(destination):
                    os.makedirs(destination)
                for rawfiles in fnmatch.filter(files, pat="*"):
                    if samples in rawfiles:
                        shutil.move(os.path.join(path, rawfiles),
                                    os.path.join(destination, rawfiles))
In the function, I am creating the destination folders based on the IDs derived from the files list. When I tried to run this for the first time, it threw a FileNotFoundError.
However, when I later checked the source, all files starting with SSE were missing, even though they were there at the beginning. I would like some insight here:
Does shutil.move ever move files somewhere like a temp folder instead of the destination folder?
Does shutil.move delete files from the source under any circumstances?
Is there any way I can test my script to find the potential reasons for the missing files?
Any help or suggestions are much appreciated.
It is late, but people don't seem to understand the OP's question. If you move a file into a non-existent folder, the file seems to become an unreadable binary and be lost forever. It has happened to me twice: once in Git Bash, and once using shutil.move in Python. In the Python case, it happens when your shutil.move destination points to a folder path instead of the full destination file path.
For example, if you run the code below, a similar situation to what the op described will happen:
import os
import glob
import shutil

src_folder = r'C:/Users/name'
dst_folder = r'C:/Users/name/data_images'
file_names = glob.glob(r'C:/Users/name/*.jpg')
for file in file_names:
    file_name = os.path.basename(file)
    shutil.move(os.path.join(src_folder, file_name), dst_folder)
Note that dst_folder here is just a folder path; it should be the full destination, i.e. dst_folder joined with file_name. This will cause what the OP described in his question. I found something similar with a more detailed explanation of what went wrong here: File moving mistake with Python
shutil.move does not delete your files. If for any reason your files failed to move to a given location, check the directory where your code is stored; if there is a stray folder there (e.g. one named '+'), your files are most likely inside it.
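To rule this failure mode out entirely, always create the destination directory first and pass shutil.move the full destination file path rather than a bare folder. A small sketch (the paths are placeholders):

```python
import os
import shutil

def safe_move(src, dst_dir):
    # Create the destination directory if it doesn't exist, then build the
    # full destination file path, so the file can never be renamed onto a
    # non-existent folder path.
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, os.path.basename(src))
    shutil.move(src, dst)
    return dst
```

With this wrapper, a move into a brand-new subfolder creates the folder first and the source file ends up exactly where expected.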
Structure: 20170410.1207.te <- Date (2017 04 10 , 12:07)
There is a company folder that contains several folders. All folders with the above structure that are older than 30 days should be moved to the folder _History (basically archiving them), but at least 5 should be left, no matter their timestamps.
As the time value, the string must be taken from the folder name, converted to a date, and compared to today's date minus 30 days.
I also want to create a log file that records when which folders were moved, and to where.
The code below just shows me the filenames; can somebody help me, please?
import os
import shutil

for subdir, dirs, files in os.walk(r"C:\Python-Script\Spielwiese"):
    for file in files:
        print(os.path.join(file))
shutil.move(r"C:\Python-Script\Spielwiese", r"C:\Python-Script\Spielwiese2")
The following code will return a list of all files in a given timeframe, sorted by creation time, on Windows. Depending on how you want to filter, I can give you more information, and you can then work on the resulting list. One more thing: you should use pathlib for Windows file paths, so as not to run into problems with German paths and Unicode escapes in your path names.
import os

found_files = []
for subdir, dirs, files in os.walk(r"C:\Python-Script\Spielwiese"):
    for file in files:
        name = os.path.join(subdir, file)
        create_date = os.path.getctime(name)
        if create_date > some_time:  # Put the timeframe here
            found_files.append((name, create_date))
found_files.sort(key=lambda tup: tup[1])  # Sort the files according to creation time
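Building on that, here is a sketch of the full archiving step. It parses the timestamp out of the folder name itself (more reliable than ctime, which changes when folders are copied), always keeps the newest five folders regardless of age, moves the rest to _History, and logs each move. The folder-name format and the _History name come from the question; the paths and the log filename are placeholders.

```python
import datetime
import logging
import os
import shutil

def parse_stamp(name):
    # "20170410.1207.te" -> datetime(2017, 4, 10, 12, 7)
    date_part, time_part = name.split(".")[:2]
    return datetime.datetime.strptime(date_part + time_part, "%Y%m%d%H%M")

def to_archive(names, today, keep=5, max_age_days=30):
    cutoff = today - datetime.timedelta(days=max_age_days)
    newest_first = sorted(names, key=parse_stamp, reverse=True)
    # Always keep the newest `keep` folders; of the rest, archive only
    # those older than the cutoff.
    return [n for n in newest_first[keep:] if parse_stamp(n) < cutoff]

def archive_old(company_dir, history=r"C:\Python-Script\_History"):
    logging.basicConfig(filename="archive.log", level=logging.INFO,
                        format="%(asctime)s %(message)s")
    os.makedirs(history, exist_ok=True)
    names = [n for n in os.listdir(company_dir)
             if os.path.isdir(os.path.join(company_dir, n))]
    for name in to_archive(names, datetime.datetime.now()):
        shutil.move(os.path.join(company_dir, name), os.path.join(history, name))
        logging.info("moved %s -> %s", os.path.join(company_dir, name), history)
```

Keeping the selection logic (to_archive) separate from the moving makes it easy to dry-run: print the list first, then call archive_old once it looks right.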
I'm trying to write code that will move hundreds of PDF files from a :/Scans folder into another directory based on matching each client's name. I'm not sure if I should be using glob, or shutil, or a combination of both. My working theory is that glob should work for such a program, as the glob module "finds all the pathnames matching a specified pattern," and then shutil can physically move the files.
Here is a breakdown of my file folders to give you a better idea of what I'm trying to do.
Within :/Scans folder I have thousands of PDF files, manually renamed based on client and content, such that the folder looks like this:
lastName, firstName - [contentVariable]
(repeat the above 100,000x)
Within the :/J drive of my computer I have a folder named 'Clients' with sub-folders for each and every client, similar to the pattern above, named as 'lastName, firstName'
I'm looking to have the program go through the :/Scans folder and move each PDF to the matching client folder based on 'lastName, firstName'
I've been able to write a simple program to move files between folders, but not one that will do the aforesaid name matching.
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive','C:/Users/Kenny/Documents/Clients')
^ Moving a file from one folder to another.. quite easily done.
Is there a way to modify the above code to apply to a regex (below)?
shutil.copy('C:/Users/Kenny/Documents/Scan_Drive/\w*', 'C:/Users/Kenny/Documents/Clients/\w*')
EDIT: @Byberi - Something like this?
import os
import glob
import shutil

path = "C:/Users/Kenny/Documents/Scans"
dest_dir = "C:/Users/Kenny/Documents/Clients"

# This would print all the files and directories
for file in os.listdir(path):
    print(file)

for file in glob.glob(path + '/*'):
    print(file)
    shutil.copy(file, dest_dir)
I've consulted the following threads already, but I cannot seem to find how to match and move the files.
Select files in directory and move them based on text list of filenames
https://docs.python.org/3/library/glob.html
Python move files from directories that match given criteria to new directory
https://www.guru99.com/python-copy-file.html
https://docs.python.org/3/howto/regex.html
https://code.tutsplus.com/tutorials/file-and-directory-operations-using-python--cms-25817
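One way to do the matching: shutil.copy only takes literal paths, not regexes, so instead extract the "lastName, firstName" prefix from each scan's filename and use it to build the destination path. The filename pattern and drive layout below follow the question; the exact regex (splitting on the " - " separator) and the paths are assumptions to adapt.

```python
import os
import re
import shutil

def client_name(filename):
    # "lastName, firstName - [contentVariable].pdf" -> "lastName, firstName"
    m = re.match(r"(.+?,\s*.+?)\s+-\s+", filename)
    return m.group(1) if m else None

def sort_scans(scans_dir="C:/Scans", clients_dir="J:/Clients"):
    for name in os.listdir(scans_dir):
        if not name.lower().endswith(".pdf"):
            continue
        client = client_name(name)
        if client is None:
            continue  # no "lastName, firstName - " prefix; leave for manual review
        target = os.path.join(clients_dir, client)
        # Only move when a matching client folder already exists.
        if os.path.isdir(target):
            shutil.move(os.path.join(scans_dir, name),
                        os.path.join(target, name))
```

Files whose name doesn't parse, or whose client folder doesn't exist, are deliberately left in :/Scans rather than guessed at.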
I have a server that collects 3 CSV files every hour, thus generating 72 files in 24 hours. The files have a date and an hour extension; the hour extensions are the same every day (for that hour), just the date changes. They all go into a directory called completed.
Occasionally some files don't get collected in the hour, so I want to find out which CSV files are missing for a given date.
To begin with, I want to search for the files that were collected in the completed directory, using the glob function:
import sys
import glob
csv_files = glob.glob('*.csv')
print(csv_files)
Question: is it possible to make the wildcard an input variable? For example, if I could input a date into this wildcard, I could see which files have been generated for that day only; from there, I could compare against the 72 files that should be in the directory (based on the hour).
Any other ideas would be appreciated.
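Yes: glob patterns are plain strings, so a date can simply be interpolated into one. The exact filename format isn't given in the question, so treat the pattern and the hour-extraction below as placeholders; the idea is to list one day's files, count how many were collected per hour, and report the hours that came up short of 3.

```python
import glob
import os
from collections import Counter

def files_for_date(date_str, directory="completed"):
    # date_str e.g. "20230115"; assumes the date appears in the filename
    return sorted(glob.glob(os.path.join(directory, f"*{date_str}*.csv")))

def missing_hours(hour_counts, per_hour=3):
    # hour_counts: mapping hour (0-23) -> number of files collected that hour
    return [h for h in range(24) if hour_counts.get(h, 0) < per_hour]

def report(date_str, directory="completed", hour_of=lambda name: int(name[-6:-4])):
    # hour_of is a placeholder: adapt it to wherever the hour sits in your names
    counts = Counter(hour_of(os.path.basename(p))
                     for p in files_for_date(date_str, directory))
    return missing_hours(counts)
```

The comparison itself needs no file listing of the "expected" set: since exactly 3 files per hour are expected, any hour whose count is below 3 is incomplete.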