I created a Python script to batch-convert PDF files using Ghostscript. It should work, but I am not sure why it doesn't: it goes through the input PDF files twice, and on the second pass it overwrites the output files.
Here's the script.
from __future__ import print_function
import os
import subprocess

try:
    os.mkdir('compressed')
except FileExistsError:
    pass

for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".pdf"):
            filename = os.path.join(root, file)
            arg1 = '-sOutputFile=' + './compressed/' + file
            print("compressing:", file)
            p = subprocess.Popen(['gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4', '-dPDFSETTINGS=/screen', '-dNOPAUSE', '-dBATCH', '-dQUIET', str(arg1), filename], stdout=subprocess.PIPE).wait()
Here's the output.
I can't see what I did wrong.
file is just the name of the file, without any directory component. You have several files with the same name in different directories, and os.walk recurses into subdirectories by default.
So you have to save the converted files under a directory or name that depends on root, and put the output directory outside the current directory, because os.walk will scan it too.
For instance, for flat output replace:
arg1= '-sOutputFile=' + './compressed/' + file
by
arg1= '-sOutputFile=' + '/somewhere/else/compressed/' + root.strip(".").replace(os.sep,"_")+"_"+file
The expression
root.strip(".").replace(os.sep,"_")
should create a "flat" version of the root tree, with the current directory's leading dot stripped and path separators converted to underscores; the extra "_" then separates it from the file name. That's one option that would work.
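If you would rather keep writing into ./compressed, another option (not shown in the answer above, and it does not solve the duplicate-name problem on its own) is to prune the output directory from dirs in place, so os.walk never descends into it. A minimal sketch:
import os

for root, dirs, files in os.walk("."):
    # Editing dirs in place stops os.walk from descending into the output directory.
    dirs[:] = [d for d in dirs if d != "compressed"]
    for file in files:
        if file.endswith(".pdf"):
            print("would compress:", os.path.join(root, file))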
An alternate version that won't scan ./compressed or any other subdirectory (maybe more what you're looking for) would be using os.listdir instead (no recursion)
root = "."
for file in os.listdir(root):
if file.endswith(".pdf"):
filename = os.path.join(root, file)
arg1= '-sOutputFile=' + './compressed/' + file
print ("compressing:", file )
Or os.scandir
root = "."
for entry in os.scandir(root):
file = entry.name
if file.endswith(".pdf"):
filename = os.path.join(root, file)
arg1= '-sOutputFile=' + './compressed/' + file
print ("compressing:", file )
Your problem is that os.walk will also retrieve the contents of the "compressed" directory. This is because the compressed files are created before os.walk lists the files in that directory. If you add print(os.path.join(root, file)) to your for-loop you will notice that.
Below is a snippet that works, since the files retrieved are only the ones in the current directory.
import os

os.makedirs("compressed", exist_ok=True)

for file in os.listdir("."):
    if not os.path.isfile(file):
        continue
    if not file.endswith(".pdf"):
        continue
    print(file)
os.walk will by definition enter into subdirectories, so you are compressing the files in the compressed subdirectory a second time.
Probably you simply want
for file in os.scandir("."):
    ...
As an aside, you almost certainly want to avoid Popen in favor of subprocess.run() or, on older Python versions, one of the legacy convenience functions such as subprocess.check_call().
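For example, a minimal sketch of the Ghostscript call with subprocess.run (the input.pdf and compressed/input.pdf paths are placeholders, not from the question):
import subprocess

filename = "input.pdf"            # placeholder input path
outfile = "compressed/input.pdf"  # placeholder output path

# run() waits for the process; check=True raises CalledProcessError on a non-zero exit status.
subprocess.run(
    ['gs', '-sDEVICE=pdfwrite', '-dCompatibilityLevel=1.4', '-dPDFSETTINGS=/screen',
     '-dNOPAUSE', '-dBATCH', '-dQUIET', '-sOutputFile=' + outfile, filename],
    check=True,
)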
On the first iteration of
for root, dirs, files in os.walk(".")
you find the files in the current directory, then you compress them into the
./compressed/*.pdf path.
After that the second iteration of the outer loop will find the already compressed files in the subdirectory.
The easiest fix is to move the output directory outside of the input directory (or create an input directory next to the compressed dir and read the files from there instead of .).
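For instance, a minimal sketch that writes into a sibling ../compressed directory (that location is just an illustration, not from the question), so os.walk(".") never sees the output:
import os

out_dir = os.path.abspath(os.path.join("..", "compressed"))  # outside the tree walked below
os.makedirs(out_dir, exist_ok=True)

for root, dirs, files in os.walk("."):
    for file in files:
        if file.endswith(".pdf"):
            filename = os.path.join(root, file)
            outfile = os.path.join(out_dir, file)  # note: same-named files from different folders still collide
            print("compressing:", filename, "->", outfile)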
Related
I'm trying to write a Python script to move all music files from my whole PC to one specific folder.
They are scattered everywhere and I want to get them all in one place, so I don't want to copy but completely move them.
I was already able to make a list of all the files with this script:
import os

targetfiles = []
extensions = (".mp3", ".wav", ".flac")
for root, dirs, files in os.walk('/'):
    for file in files:
        if file.endswith(extensions):
            targetfiles.append(os.path.join(root, file))
print(targetfiles)
This prints out a nice list of all the files but I'm stuck to now move them.
I made many different attempts with different code, and this was one of them:
import os
import shutil

targetfiles = []
extensions = (".mp3", ".wav", ".flac")
for root, dirs, files in os.walk('/'):
    for file in files:
        if file.endswith(extensions):
            targetfiles.append(os.path.join(root, file))
            new_path = 'C:/Users/Nicolaas/Music/All' + file
            shutil.move(targetfiles, new_path)
But everything I try gives me an error:
TypeError: rename: src should be string, bytes or os.PathLike, not list
I think I've reached my limit putting this all together, as I'm only starting out with Python, but I would be very grateful if anyone could point me in the right direction!
You are trying to move a list of files to a new location, but the shutil.move function expects a single file as the first argument. To move all the files in the targetfiles list to the new location, you have to use a loop to move each file individually.
for file in targetfiles:
    shutil.move(file, new_path)
Also, if needed, add a trailing slash to the new path: 'C:/Users/Nicolaas/Music/All/'
On a side note, are you sure that moving all files with those extensions is a good idea? I would suggest copying them instead, or at least keeping a backup.
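A minimal sketch of the copying variant (shutil.copy2 also preserves timestamps; the destination folder is the one from the question, and targetfiles is the list built above):
import os
import shutil

destination = 'C:/Users/Nicolaas/Music/All'
os.makedirs(destination, exist_ok=True)

for file in targetfiles:
    # Copy instead of move; switch to shutil.move once you are confident in the result.
    shutil.copy2(file, os.path.join(destination, os.path.basename(file)))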
Edit:
You can use an if statement to exclude certain folders from being searched.
for root, dirs, files in os.walk('/'):
    if any(folder in root for folder in excluded_folders):
        continue
    for file in files:
        if file.endswith(extensions):
            targetfiles.append(os.path.join(root, file))
Where excluded_folders is a list of the unwanted folders, like: excluded_folders = ['Program Files', 'Windows']
I would suggest using glob for matching:
import glob

def match(extension, root_dir):
    return glob.glob(f'**\\*.{extension}', root_dir=root_dir, recursive=True)

root_dirs = ['C:\\Path\\to\\Albums', 'C:\\Path\\to\\dir\\with\\music\\files']
excluded_folders = ['Bieber', 'Eminem']
extensions = ("mp3", "wav", "flac")

targetfiles = [
    f'{root_dir}\\{file_name}'
    for root_dir in root_dirs
    for extension in extensions
    for file_name in match(extension, root_dir)
    if not any(excluded_folder in file_name for excluded_folder in excluded_folders)
]
Then you can move these files to new_path
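For example (a minimal sketch; new_path here stands for the destination folder from the earlier answer):
import os
import shutil

new_path = 'C:/Users/Nicolaas/Music/All'
os.makedirs(new_path, exist_ok=True)

for file in targetfiles:
    shutil.move(file, os.path.join(new_path, os.path.basename(file)))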
I have written a small script to hopefully iterate through my directory/folder and replace .act with .csv. Essentially, I have 11 years' worth of files that have a .act extension, and I just want to replace it with .csv.
import os

files = os.listdir("S:\\folder\\folder1\\folder2\\folder3")
path = "S:\\folder\\folder1\\folder2\\folder3\\"
#print(files)

for x in files:
    new_name = x.replace("act", "csv")
    os.rename(path + x, path + new_name)
    print(new_name)
When I execute this, it worked for the first five files and then failed on the sixth with the following error:
FileNotFoundError: [WinError 2] The system cannot find the file specified: 'S:\\folder\\folder1\\folder2\\folder3\\file_2011_06.act' -> 'S:\\folder\\folder1\\folder2\\folder3\\file_2011_06.csv'
When I searched for "S:\folder\folder1\folder2\folder3\file_2011_06.act" in file explorer, the file opens. Are there any tips on what additional steps I can take to debug this issue?
Admittedly, this is my first programming script. I'm trying to do small/minor things to start learning. So, I likely missed something... Thank you!
In your solution, you use the string's replace to replace "act" with "csv". This could lead to problems if your path contains "act" somewhere else, e.g., S:\\facts\\file_2011_01.act would become S:\\fcsvs\\file_2011_01.csv, and rename will throw a FileNotFoundError because rename cannot create folders.
When dealing with file names (e.g., concatenating path fragments, extracting file extensions, ...), I recommend using os.path or pathlib instead of direct string manipulation.
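For example, a small pathlib sketch of the same rename (Path.with_suffix only touches the extension, so an "act" elsewhere in the name is left alone; the folder path is the one from the question):
from pathlib import Path

folder = Path("S:/folder/folder1/folder2/folder3")
for path in folder.glob("*.act"):
    path.rename(path.with_suffix(".csv"))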
I would like to propose another solution using os.walk. In contrast to os.listdir, it recursively traverses all sub-directories in a single loop.
import os

def act_to_csv(directory):
    for root, folders, files in os.walk(directory):
        for file in files:
            filename, extension = os.path.splitext(file)
            if extension == '.act':
                original_filepath = os.path.join(root, file)
                new_filepath = os.path.join(root, filename + '.csv')
                print(f"{original_filepath} --> {new_filepath}")
                os.rename(original_filepath, new_filepath)
Also, I'd recommend backing up your files before manipulating them with scripts. It would be annoying to lose data or see it become a mess because of a bug in a script.
import os

folder = "S:\\folder\\folder1\\folder2\\folder3\\"
count = 1

for file_name in os.listdir(folder):
    source = folder + file_name
    destination = folder + str(count) + ".csv"
    os.rename(source, destination)
    count += 1

print('All Files Renamed')
print('New Names are')
res = os.listdir(folder)
print(res)
I have read quite a few links on the site saying to use "os.path.abspath(#filename)". This method isn't exactly working for me. I am writing a program that will be able to search a given directory for files with certain extensions, save the name and absolute path as keys and values (respectively) into a dictionary, and then use the absolute path to open the files and make the edits that are required. The problem I am having is that when I use os.path.abspath() it isn't returning the full path.
Let's say my program is on the desktop. I have a file stored at "C:\Users\Travis\Desktop\Test1\Test1A\test.c". My program can easily locate this file, but when I use os.path.abspath() it returns "C:\Users\Travis\Desktop\test.c" which is the absolute path of where my source code is stored, but not the file I was searching for.
My exact code is:
import os

Files = {}  # Dictionary that will hold file names and absolute paths
root = os.getcwd()  # Finds starting point

for root, dirs, files in os.walk(root):
    for file in files:
        if file.endswith('.c'):  # Look for files that end in .c
            Files[file] = os.path.abspath(file)
Any tips or advice as to why it may be doing this and how I can fix it? Thanks in advance!
os.path.abspath() makes a relative path absolute relative to the current working directory, not to the file's original location. A path is just a string; Python has no way of knowing which directory the filename came from.
You need to supply the directory yourself. When you use os.walk, each iteration gives you the directory being listed (root in your code), the list of its subdirectories (just their names) and a list of filenames (again, just their names). Use root together with the filename to make an absolute path:
Files = {}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith('.c'):
            Files[file] = os.path.join(root, os.path.abspath(file))
Note that your code only records the one path for each unique filename; if you have foo/bar/baz.c and foo/spam/baz.c, it depends on the order the OS listed the bar and spam subdirectories which one of the two paths wins.
You may want to collect paths into a list instead:
Files = {}
cwd = os.path.abspath(os.getcwd())
for root, dirs, files in os.walk(cwd):
    for file in files:
        if file.endswith('.c'):
            full_path = os.path.join(root, os.path.abspath(file))
            Files.setdefault(file, []).append(full_path)
Per the docs for os.path.join,
If any component is an absolute path, all previous components (on
Windows, including the previous drive letter, if there was one) are
thrown away
So, for example, if the second argument is an absolute path, the first path, '/a/b/c' is discarded.
In [14]: os.path.join('/a/b/c', '/d/e/f')
Out[14]: '/d/e/f'
Therefore,
os.path.join(root, os.path.abspath(file))
will discard root no matter what it is, and return os.path.abspath(file) which will tack file on to the current working directory, which will not necessarily be the same as root.
Instead, to form the absolute path to the file:
fullpath = os.path.abspath(os.path.join(root, file))
Actually, I believe the os.path.abspath is unnecessary, since I believe root will always be absolute, but my reasoning for that depends on the source code for os.walk not just the documented (guaranteed) behavior of os.walk. So to be absolutely sure (pun intended), use os.path.abspath.
import os

samefiles = {}
root = os.getcwd()
for root, dirs, files in os.walk(root):
    for file in files:
        if file.endswith('.c'):
            fullpath = os.path.join(root, file)
            samefiles.setdefault(file, []).append(fullpath)

print(samefiles)
Glob is useful in these cases; you can do:
files = {f: os.path.join(os.getcwd(), f) for f in glob.glob("*.c")}
to get a similar result. Note that a plain "*.c" pattern only matches files directly in the current directory, not in subdirectories.
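If you want the recursive behaviour of os.walk, a minimal sketch using glob's recursive mode (Python 3.5+) could look like this:
import glob
import os

# "**/*.c" with recursive=True also descends into subdirectories, like os.walk does.
# As with the dict above, files with the same name in different folders overwrite each other.
files = {os.path.basename(f): os.path.abspath(f) for f in glob.glob("**/*.c", recursive=True)}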
I have this script, which I have no doubt is flawed:
import fnmatch, os, sys

def findit(rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print(folder)
    for filename in fnmatch.filter(files, pattern):
        with open(filename) as f:
            s = f.read()
            f.close()
        if find in s:
            print(filename)

findit(sys.argv[1], sys.argv[2], sys.argv[3])
When I run it I get Errno 2, no such file or directory. BUT the file exists. For instance, if I execute it as findit.py c:\python "folder" *.py it will work just fine, listing all the *.py files which contain the word "folder". BUT if I run findit.py c:\php\projects1 "include" *.php
as an example, I get [Errno 2] no such file or directory: 'About.php'. But About.php exists. I don't understand what it's doing, or what I'm doing wrong.
If you look at any of the examples for os.walk, you'll see that they all do os.path.join(root, name). You need to do that too.
Why? Quoting from the docs:
filenames is a list of the names of the non-directory files in dirpath. Note that the names in the lists contain no path components. To get a full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
If you just use the filename as a path, it's going to look for a file of the same name in the current working directory. If there's no such file, you'll get a FileNotFoundError. If there is such a file, you'll open and read the wrong file. Only if you happen to be looking inside the current working directory will it work.
There's also another major problem in your code: os.walk walks a directory tree recursively, finding all files in the given top directory, or any subdirectory of top, or any subdirectory of… and so on, yielding once for each directory. But you're not doing anything useful with that (except printing out the folders). Instead, you wait until it finishes, and then use the files from whichever directory it happened to reach last.
If you just want to get a flat listing of the files directly in a directory, use os.listdir, not os.walk. (Or maybe use glob.glob instead of explicitly listing everything then filtering with fnmatch.)
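For the flat case, a minimal glob-based sketch (the rootdir, find and pattern names mirror the question's arguments):
import glob
import os

def findit_flat(rootdir, find, pattern):
    # glob returns paths that already include rootdir, so they can be opened directly.
    for pathname in glob.glob(os.path.join(rootdir, pattern)):
        with open(pathname) as f:
            if find in f.read():
                print(pathname)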
On the other hand, if you want to walk the tree, you have to move your second for loop inside the first one.
You've also got a minor problem: You call f.close() inside a with open(…) as f:, which leads to f being closed twice. This is guaranteed to be completely harmless (at least in 2.5+, including 3.x), but it's still a bad idea.
Putting it together, here's a working version of your code:
def findit(rootdir, find, pattern):
    for folder, dirs, files in os.walk(rootdir):
        print(folder)
        for filename in fnmatch.filter(files, pattern):
            pathname = os.path.join(folder, filename)
            with open(pathname) as f:
                s = f.read()
            if find in s:
                print(pathname)
You are using a relative filename. But your current directory does not contain the file. And you don't want to search there anyway. Use os.path.join(folder, filename) to build the correct path to the file.
I want to copy a whole directory and its files, but also print each file name as it is being copied.
I was using a simple call to cp -rf dir dest with os.system, but obviously that way I can't print each filename separately.
I then thought about listing each directory's files by recursively calling ls with os.system, saving the whole string, splitting it into an array, and implementing a for loop that runs os.system("cp " + file1 + " des/") and prints the filename, but that looks like a lot of work.
Any better ideas to accomplish this?
You can use os.walk to get the entire directory listing and use that listing to copy all files iteratively. Something like
import os
import shutil

file_paths = [os.path.join(root, f) for root, _, files in os.walk('.') for f in files]
for path in file_paths:
    print(path)
    shutil.copy(path, target)  # target is the destination directory
Alternatively, according to MatthewFranglen's comment, you can just do shutil.copytree(src, dst). That will also allow you to ignore things, but you'll need to define a function to do that instead of using an if in the list comprehension.
# ignore all .DS_Store and *.txt files
file_paths = [os.path.join(root, f) for root, _, files in os.walk('.') for f in files if f != '.DS_Store' and not f.endswith('.txt')]
compared to
from shutil import copytree, ignore_patterns
ignore_func = ignore_patterns('.DS_Store', '*.txt') # ignore .DS_Store and *.txt files
copytree('/path/to/dir/', '/other/dir', ignore=ignore_func)
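If you also want copytree to print each file as it is copied, which is what the question asks for, a minimal sketch using its copy_function hook (the copy_verbose wrapper is an illustration, not part of the answer above):
import shutil

def copy_verbose(src, dst, *, follow_symlinks=True):
    # Print the file being copied, then defer to the normal copy2 behaviour.
    print("copying:", src)
    return shutil.copy2(src, dst, follow_symlinks=follow_symlinks)

shutil.copytree('/path/to/dir/', '/other/dir', copy_function=copy_verbose)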