I need to compare two folders on a XP machine.
This is a radio station, we have all our music stored as high bitrate mp3, when new songs are acquired from CD they are wav. I need to be able to compare the mp3 and the wav folders for duplicates (naming will be identical except for the file extension). The object is to produce a list of items in the wav folder that don't have mp3 versions.
Python 2.7 is installed and my very limited experience of coding has been with python.
All help appreciated, even if it is just a kick in the right direction...
Thanks.
Use os.listdir to get the folder contents, and os.path.splitext to determine the base name:
import os
wavs = set(os.path.splitext(fn)[0] for fn in os.listdir('/path/to/wavs'))
mp3s = set(os.path.splitext(fn)[0] for fn in os.listdir('/path/to/mp3s'))
must_convert = wavs - mp3s
If you want to collate the mp3s and wavs of multiple folders (but not recursively), you'll have to store both basename and the full filename:
import os,collections
files = collections.defaultdict(dict)
for d in ['/path/to/wavs', '/more/wavs', '/some/mp3s', '/other/mp3s']:
for f in os.listdir(d):
basename,ext = os.path.splitext(f)
files[ext][basename] = os.path.join(d, f)
files_to_convert = [fn for basename,fn in files['.wav'].items()
if basename not in files['.mp3']]
import os
wav=[os.path.splitext(x)[0] for x in os.listdir(r'C:\Music\wav') if os.path.splitext(x)[1]=='.wav']
mp3=[os.path.splitext(x)[0] for x in os.listdir(r'C:\Music\mp3') if os.path.splitext(x)[1]=='.mp3']
#here wav is a list names of only those files whose extension is .wav
#here mp3 is a list names of only those files whose extension is .mp3
print(set(wav)-set(mp3))
Here is a solution that works recursively, slightly based on phihag's answer.
import os
sets = {}
for dirname in 'mp3_folder', 'wav_folder':
sets[dirname] = set()
for path, dirs, files in os.walk(dirname):
sets[dirname].update(os.path.join(path, os.path.splitext(fn)[0]).lstrip(dirname) for fn in files)
must_convert = sets['mp3_folder']-sets['wav_folder']
print('\n'.join(sorted(must_convert)))
Related
I am struggling with the paths and directories to solve this problem. Basically, I have a long list of .lammps files in one directory. My goal is to copy each file and move it into its own folder (which is one directory back) where its folder name is the file name minus the .lammps. All of the folders are already made, I just can't seem to figure out moving them. The entire list of files is in the Files directory. The individual folders are in the ROTATED FILES directory. Here is what I have. Any tips greatly appreciated.
Here is a file example
n-optimized.new.10_10-90-10_10.Ni00Nj01.lammps
The folder for this file is then named
n-optimized.new.10_10-90-10_10.Ni00Nj01
import os
file_directory = os.chdir("C:\Py Practice\ROTATED FILES\Files")
files = os.listdir()
for file in files:
# get the file -.lammps string
name1 = file.split('.')[0:4]
name2 = ".".join(name1)
# get the path for the files new respective folder (back a directory and paste folder name)
file_folder = "C:\Py Practice\ROTATED FILES/" + name2
# Move
combined_path = os.path.join(file, file_folder)
I've tried shutil and figured path join might be easier.
First of all, the code you have here shouldn't work since you either have to escape backslashes or use a raw string. Secondly, rather than using os for file system operations, it's much better to learn how to use pathlib (also a core python module) which provides a more modern object-oriented approach to file operations.
Using pathlib and shutil you can do something like
from pathlib import Path
from shutil import copyfile
file_directory = Path(r"C:\Py Practice\ROTATED FILES\Files")
# get the list of source files
source_files = [f for f in file_directory.glob('*.lammps')]
# create target file paths
target_files = [file_directory.parent / f.stem/ f.name for f in source_files]
for source, target in zip(source_files, target_files):
copyfile(str(source), str(target))
Here we're accessing different parts of file path using a convenient OOP structure. For example, if your file f is located in 'c:/foo/bar/boo.txt' then f.name is just the name of file: boo.txt, f.stem is the stem part of the file name (excluding the extension) boo, f.parent is its parent directory 'c:/foo/bar/' etc.
There's a really handy graphic of pathlib Path objects here.
The only inconvenience is that not all of core modules support Path objects yet so for copyfile we just need to get the string representation by calling str on the object.
And you don't even need to have target folders created beforehand, it's very easy to create the necessary folder structure as you go along:
from pathlib import Path
from shutil import copyfile
file_directory = Path(r"C:\Py Practice\ROTATED FILES\Files")
# get the list of source files
source_files = [f for f in file_directory.glob('*.lammps')]
# create target file paths
target_files = [file_directory.parent / f.stem/ f.name for f in source_files]
for source, target in zip(source_files, target_files):
# check that target directory exists
# and create a folder if not
if not target.parent.is_dir():
target.parent.mkdir()
copyfile(str(source), str(target))
I've got 2 folders, each with a different CSV file inside (both have the same format):
I've written some python code to search within the "C:/Users/Documents" directory for CSV files which begin with the word "File"
import glob, os
inputfile = []
for root, dirs, files in os.walk("C:/Users/Documents/"):
for datafile in files:
if datafile.startswith("File") and datafile.endswith(".csv"):
inputfile.append([os.path.join(root, datafile)])
print(inputfile)
That almost worked as it returns:
[['C:/Users/Documents/Test A\\File 1.csv'], ['C:/Users/Documents/Test B\\File 2.csv']]
Is there any way I can get it to return this instead (no sub list and shows / instead of \):
['C:/Users/Documents/Test A/File 1.csv', 'C:/Users/Documents/Test B/File 2.csv']
The idea is so I can then read both CSV files at once later, but I believe I need to get the list in the format above first.
okay, I will paste an option here.
I made use of os.path.abspath to get the the path before join.
Have a look and see if it works.
import os
filelist = []
for folder, subfolders, files in os.walk("C:/Users/Documents/"):
for datafile in files:
if datafile.startswith("File") and datafile.endswith(".csv"):
filePath = os.path.abspath(os.path.join(folder, datafile))
filelist.append(filePath)
filelist
Result:
['C:/Users/Documents/Test A/File 1.csv','C:/Users/Documents/Test B/File 2.csv']
I am trying to zip each folder on its own in Python. However, the first folder is being zipped and includes all folders within it. Could someone please explain what is going on? Should I not be using shutil for this?
#%% Set path variable
path = r"G:\Folder"
os.chdir(path)
os.getcwd()
#%% Zip all folders
def retrieve_file_paths(dirName):
# File paths variable
filePaths = []
# Read all directory, subdirectories and file lists
for root, directories, files in os.walk(dirName):
for filename in directories:
# Createthe full filepath by using os module
filePath = os.path.join(root, filename)
filePaths.append(filePath)
# return all paths
return filePaths
filepaths = retrieve_file_paths(path)
#%% Print folders and start zipping individually
for x in filepaths:
print(x)
shutil.make_archive(x, 'zip', path)
shutil.make_archive will make an archive of all files and subfolders - since this is what most people want. If you need more choice of what files are included, you must use zipfile directly.
You can do this right within the walk loop (that is what it's for).
import os
import zipfile
dirName = 'C:\...'
# Read all directory, subdirectories and file lists
for root, directories, files in os.walk(dirName):
zf = zipfile.ZipFile(os.path.join(root, "thisdir.zip"), "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
for name in files:
if name == 'thisdir.zip': continue
filePath = os.path.join(root, name)
zf.write(filePath, arcname=name)
zf.close()
This will create a file "thisdir.zip" in each subdirectory, containing only the files within this directory.
(edit: tested & corrected code example)
Following Torben's answer to my question, I modified the code to zip each directory recursively. I realised what had happened was that I was not specifying sub directories. Code below:
#Set path variable
path = r"insert root directory here"
os.chdir(path)
# Declare the functionto return all file paths in selected directory
def retrieve_file_paths(dirName):
for root, dirs, files in os.walk(dirName):
for dir in dirs:
zf = zipfile.ZipFile(os.path.join(root+dir, root+dir+'.zip'), "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9)
files = os.listdir(root+dir)
print(files)
filePaths.append(files)
for f in files:
filepath = root + dir +'/'+ f
zf.write(filepath, arcname=f)
zf.close()
retrieve_file_paths(path)
it's a relativly simple answer once you got a look onto the Docs.
You can see the following under shutil.make_archive:
Note This function is not thread-safe.
The way threading in computing works on a high level basis:
On a machine there are cores, which can process data. (e.g. AMD Ryzen 5 5800x with 8cores)
Within a process, there are threads (e.g. 16 Threads on the Ryzen 5800X).
However, in multiprocessing there is no data shared between the processes.
In multithreading within one process you can access data from the same variable.
Because this function is not thread-safe, you will share the variable "x" and access the same item. Which means there can only be one output.
Have a look into multithreading and works with locks in order to sequelize threads.
Cheers
I am new to python. I have successful written a script to search for something within a file using :
open(r"C:\file.txt) and re.search function and all works fine.
Is there a way to do the search function with all files within a folder? Because currently, I have to manually change the file name of my script by open(r"C:\file.txt),open(r"C:\file1.txt),open(r"C:\file2.txt)`, etc.
Thanks.
You can use os.walk to check all the files, as the following:
import os
for root, _, files in os.walk(path):
for filename in files:
with open(os.path.join(root, filename), 'r') as f:
#your code goes here
Explanation:
os.walk returns tuple of (root path, dir names, file names) in the folder, so you can iterate through filenames and open each file by using os.path.join(root, filename) which basically joins the root path with the file name so you can open the file.
Since you're a beginner, I'll give you a simple solution and walk through it.
Import the os module, and use the os.listdir function to create a list of everything in the directory. Then, iterate through the files using a for loop.
Example:
# Importing the os module
import os
# Give the directory you wish to iterate through
my_dir = <your directory - i.e. "C:\Users\bleh\Desktop\files">
# Using os.listdir to create a list of all of the files in dir
dir_list = os.listdir(my_dir)
# Use the for loop to iterate through the list you just created, and open the files
for f in dir_list:
# Whatever you want to do to all of the files
If you need help on the concepts, refer to the following:
for looops in p3: http://www.python-course.eu/python3_for_loop.php
os function Library (this has some cool stuff in it): https://docs.python.org/2/library/os.html
Good luck!
You can use the os.listdir(path) function:
import os
path = '/Users/ricardomartinez/repos/Salary-API'
# List for all files in a given PATH
file_list = os.listdir(path)
# If you want to filter by file type
file_list = [file for file in os.listdir(path) if os.path.splitext(file)[1] == '.py']
# Both cases yo can iterate over the list and apply the operations
# that you have
for file in file_list:
print(file)
#Operations that you want to do over files
as the title would imply I am looking to create a script that will allow me to print a list of file names in a directory to a CSV file.
I have a folder on my desktop that contains approx 150 pdf's. I'd like to be able to have the file names printed to a csv.
I am brand new to Python and may be jumping out of the frying pan and into the fire with this project.
Can anyone offer some insight to get me started?
First off you will want to start by grabbing all of the files in the directory, then simply by writing them to a file.
from os import listdir
from os.path import isfile, join
import csv
onlyfiles = [f for f in listdir("./") if isfile(join("./", f))]
with open('file_name.csv', 'w') as print_to:
writer = csv.writer(print_to)
writer.writerow(onlyfiles)
Please Note
"./" on line 5 is the directory you want to grab the files from.
Please replace 'file_name.csv' with the name of the file you want to right too.
The following will create a csv file with all *.pdf files:
from glob import glob
with open('/tmp/filelist.csv', 'w') as fout:
# write the csv header -- optional
fout.write("filename\n")
# write each filename with a newline characer
fout.writelines(['%s\n' % fn for fn in glob('/path/to/*.pdf')])
glob() is a nice shortcut to using listdir because it supports wildcards.
import os
csvpath = "csvfile.csv"
dirpath = "."
f = open("csvpath, "wb")
f.write(",".join(os.listdir(dirpath)))
f.close()
This may be improved to present filenames in way that you need, like for getting them back, or something. For instance, this most probably won't include unicode filenames in UTF-8 form but make some mess out of the encoding, but it is easy to fix all that.
If you have very big dir, with many files, you may have to wait some time for os.listdir() to get them all. This also can be fixed by using some other methods instead of os.listdir().
To differentiate between files and subdirectories see Michael's answer.
Also, using os.path.isfile() or os.path.isdir() you can recursively get all subdirectories if you wish.
Like this:
def getall (path):
files = []
for x in os.listdir(path):
x = os.path.join(path, x)
if os.path.isdir(x): files += getall(x)
else: files.append(x)
return files